# Pandas
- Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
- It is a powerful library built around data frames for working with 'relational' or 'labeled' data.
- It aims to be a practical tool for real-world data analysis in Python.

### Types of Data Structures:
1. Series: One-dimensional labeled array
2. DataFrame: Two-dimensional labeled data structure (like a table)

### Installing pandas
- !pip install pandas

In [1]:
import numpy as np
import pandas as pd

# Series
- A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python object, etc.).
- The axis labels are collectively called the index.
- A pandas Series can be created using the constructor:
    - pandas.Series(data, index, dtype)

### Creating Series
1. From a List
 - By default, the index labels are the positions 0, 1, 2, and so on.
 - If an index is passed, it must have the same length as the data.

In [2]:
s = pd.Series()
s

s_lst = pd.Series(data=[1,2,3,4,5,6])
s_lst

s_lst_index= pd.Series([10,20,30,40,50], index=['a','b','c','d','e'])
s_lst_index


a    10
b    20
c    30
d    40
e    50
dtype: int64
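
As a quick check of the two rules for list input, the sketch below shows the default 0-based index and the error raised when the index length does not match the data. The names `default_idx` and `err` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Without an explicit index, pandas uses the positions 0, 1, 2, ... as labels.
default_idx = pd.Series([10, 20, 30])
print(list(default_idx.index))

# An explicit index must supply exactly one label per value.
err = None
try:
    pd.Series([10, 20, 30], index=['a', 'b'])
except ValueError as exc:
    err = exc
    print('ValueError:', exc)
```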

2. From a Dictionary
  - If no index is passed, the dictionary keys become the index.
  - If an index is passed, values are looked up by key in the order of the index; labels with no matching key get NaN.
  

In [3]:
s_dict = pd.Series({
    'a': 12,
    'b':15,
    'c': 45,
    'd': 30
}, index = ['d','c','a','e'])
s_dict

d    30.0
c    45.0
a    12.0
e     NaN
dtype: float64

In [4]:
np.nan

nan
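
The output of `s_dict` shows two effects that are easy to miss: a label with no matching key becomes `NaN`, and the integer values are stored as floats because `NaN` is itself a float. A minimal sketch, with made-up names `prices` and `reordered`:

```python
import numpy as np
import pandas as pd

prices = {'a': 12, 'b': 15}

# Each index label is looked up in the dictionary; 'z' has no key, so it becomes NaN.
reordered = pd.Series(prices, index=['b', 'a', 'z'])
print(reordered)

# NaN is a float, which is why the whole Series is stored as float64.
print(type(np.nan), reordered.dtype)

# isna() flags the missing entries.
print(reordered.isna().tolist())
```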

3. From a NumPy Array

In [5]:
data=np.array(['a','b','c','d','e'])
s_array = pd.Series(data, index=[101,102,103,104,105])
s_array

101    a
102    b
103    c
104    d
105    e
dtype: object

In [6]:
# From a scalar: the value is repeated for every index label
pd.Series(5, index=range(0, 10))

0    5
1    5
2    5
3    5
4    5
5    5
6    5
7    5
8    5
9    5
dtype: int64
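
The constructor's third argument, `dtype`, is not exercised in the cells above. The following is a small sketch of its effect, assuming the values can be converted to the requested type. The names `inferred` and `as_float` are illustrative.

```python
import pandas as pd

# With no dtype given, pandas infers an integer type from the values.
inferred = pd.Series([1, 2, 3])

# Passing dtype requests a specific type at construction time.
as_float = pd.Series([1, 2, 3], dtype='float64')

print(inferred.dtype, as_float.dtype)
```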

In [7]:
# Accessing and Indexing
# - .loc selects by index label.
# - .iloc selects by integer position (0, 1, 2, ...).
#s_array[101]
s_array.iloc[0]

'a'

In [8]:
s_array.loc[101]

'a'

In [9]:
s_dict.loc['a']

np.float64(12.0)
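
The difference between `.loc` and `.iloc` is clearest when the labels are integers that do not start at 0, as in `s_array`. The sketch below rebuilds a shorter Series of the same shape (the names `s`, `by_label`, `by_position`, and `label_found` are illustrative) and shows that a valid position is not necessarily a valid label.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[101, 102, 103])

by_label = s.loc[101]     # looks up the label 101
by_position = s.iloc[0]   # takes the first element, whatever its label is
print(by_label, by_position)

# 0 is a valid position but not a label in this Series, so .loc raises KeyError.
try:
    s.loc[0]
    label_found = True
except KeyError:
    label_found = False
print(label_found)
```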

In [10]:
#boolean masking
s_lst_index>=40

a    False
b    False
c    False
d     True
e     True
dtype: bool

In [11]:
#boolean indexing
s_lst_index.loc[s_lst_index>=40]

d    40
e    50
dtype: int64
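
Boolean masks can also be combined. The sketch below uses the same data as `s_lst_index` and the element-wise operators `&`, `|`, and `~`; the result names are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# & means "and", | means "or", ~ means "not"; each condition needs its own parentheses.
middle = s.loc[(s >= 20) & (s < 50)]
edges = s.loc[(s < 20) | (s >= 50)]
not_large = s.loc[~(s >= 40)]

print(middle.index.tolist(), edges.index.tolist(), not_large.index.tolist())
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the symbolic operators are used here.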

In [12]:
s_lst_index.loc[['a','e','b']]

a    10
e    50
b    20
dtype: int64

In [13]:
s_lst_index[['a','e','b']]

a    10
e    50
b    20
dtype: int64

# DataFrame
- A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular format of rows and columns.
- Its main features:
    - columns can be of different types
    - size is mutable
    - both axes (rows and columns) are labeled
    - arithmetic operations can be performed along rows and columns
- A pandas DataFrame can be created using the constructor:
    - pandas.DataFrame(data, index, columns, dtype)
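
The properties listed above can be illustrated with a small, made-up table. This is only a sketch: the `scores` frame, its column names, and its row labels are assumptions for the example.

```python
import pandas as pd

scores = pd.DataFrame(
    {'math': [10, 20, 30], 'science': [5, 15, 25]},
    index=['r1', 'r2', 'r3'],
)

# Size is mutable: assigning to a new column name adds a column.
scores['total'] = scores['math'] + scores['science']

# Arithmetic can run down each column (axis=0) or across each row (axis=1).
col_sums = scores.sum(axis=0)
row_means = scores[['math', 'science']].mean(axis=1)

print(scores.shape)
print(col_sums.to_dict())
print(row_means.to_dict())
```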

### Creating DataFrames
1. From a List of Lists

In [14]:
data = [[10, 20, 30, 40, 50], ['Ram', 'Hari', 'Sita', 'Gita', 'Rita']]
data

[[10, 20, 30, 40, 50], ['Ram', 'Hari', 'Sita', 'Gita', 'Rita']]

In [15]:
pd.DataFrame(data, columns=[2020, 2021, 2022, 2023, 20240], index=['Marks', 'Name'])

       2020  2021  2022  2023  20240
Marks    10    20    30    40     50
Name    Ram  Hari  Sita  Gita   Rita


In [16]:
pd.DataFrame([['Ram',10],['hari',20],['shyam',30]],columns=['Name','Id'])

    Name  Id
0    Ram  10
1   hari  20
2  shyam  30


2. From a Dictionary of Lists

In [17]:
data = {
    'Name': ['Ram', 'Shyam', 'hari', 'gita'],
    'Age': [20, 30, 40, 50],
}
df = pd.DataFrame(data)
df

    Name  Age
0    Ram   20
1  Shyam   30
2   hari   40
3   gita   50


3. From a List of Dictionaries

In [18]:
data=[
    {'name': 'bob','Age':22},
    {'name':'ole','Age':23} ,
]
pd.DataFrame(data)

   name  Age
0   bob   22
1   ole   23


In [19]:
pd.DataFrame(
    {
        'Name':pd.Series(['Ram','Shya','hari','gita']),
        'Age': pd.Series([20,30,40,50])
    }
)


   Name  Age
0   Ram   20
1  Shya   30
2  hari   40
3  gita   50
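
When the dictionary values are Series, pandas aligns them by index label rather than by position. The cell above uses identical default indexes, so the alignment is not visible there. The sketch below uses deliberately mismatched, made-up labels to show it; the name `aligned` is illustrative.

```python
import pandas as pd

aligned = pd.DataFrame(
    {
        'Name': pd.Series(['Ram', 'hari'], index=['x', 'y']),
        'Age': pd.Series([20, 40], index=['y', 'z']),
    }
)

# The row index is the union of both Series' labels; gaps are filled with NaN.
print(aligned)
```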


# Accessing and Indexing

In [20]:
type(df)

pandas.core.frame.DataFrame

In [21]:
type(df['Name'])

pandas.core.series.Series

In [22]:
df['Name']

0      Ram
1    Shyam
2     hari
3     gita
Name: Name, dtype: object

In [23]:
df[['Age','Name']]

   Age   Name
0   20    Ram
1   30  Shyam
2   40   hari
3   50   gita


In [24]:
df.iloc[[0,3],[1]]

   Age
0   20
3   50


In [25]:
df.loc[[0,3],['Age','Name']]

   Age  Name
0   20   Ram
3   50  gita


In [26]:
df.iloc[0:2]

    Name  Age
0    Ram   20
1  Shyam   30


In [27]:
df.loc[0:2]

    Name  Age
0    Ram   20
1  Shyam   30
2   hari   40
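
The two slices above return different numbers of rows because `.iloc` and `.loc` treat the end of a slice differently. A short sketch using the same data as `df` (the names `by_position` and `by_label` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ram', 'Shyam', 'hari', 'gita'], 'Age': [20, 30, 40, 50]}
)

# iloc slices by position and excludes the stop position, like a Python list.
by_position = df.iloc[0:2]

# loc slices by label and includes the stop label.
by_label = df.loc[0:2]

print(len(by_position), len(by_label))
```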


In [28]:
# Boolean mask: True for the rows where Age is above 30
df['Age']>30

0    False
1    False
2     True
3     True
Name: Age, dtype: bool

In [29]:
# Pass the mask to df[...] to display the rows where Age is above 30
df[df['Age']>30]

   Name  Age
2  hari   40
3  gita   50
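
Row filters can combine several conditions, and `.loc` can pick columns in the same step. The following sketch uses the same data as `df`; the names `between` and `names_over_30` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ram', 'Shyam', 'hari', 'gita'], 'Age': [20, 30, 40, 50]}
)

# Combine conditions with & or |, wrapping each one in parentheses.
between = df[(df['Age'] > 20) & (df['Age'] < 50)]

# With .loc, the mask picks the rows and the second argument picks the columns.
names_over_30 = df.loc[df['Age'] > 30, 'Name']

print(between['Name'].tolist())
print(names_over_30.tolist())
```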


In [30]:
ind= df[df['Age']>30].index
ind

Index([2, 3], dtype='int64')

In [31]:
df.loc[ind]

   Name  Age
2  hari   40
3  gita   50


In [32]:
df

    Name  Age
0    Ram   20
1  Shyam   30
2   hari   40
3   gita   50


In [33]:
print('dimension: ',df.ndim)
print('Shape: ',df.shape)
print('Length: ',len(df))

dimension:  2
Shape:  (4, 2)
Length:  4
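
The values printed above are related to one another. The sketch below rebuilds the same `df` and shows that `shape` unpacks into row and column counts, and that `size` is their product; the names `n_rows`, `n_cols`, and `n_cells` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ram', 'Shyam', 'hari', 'gita'], 'Age': [20, 30, 40, 50]}
)

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = df.shape

# size counts every cell; len() counts rows only.
n_cells = df.size

print(n_rows, n_cols, n_cells, len(df))
```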


# Displaying the column (feature) names

In [34]:
df.columns

Index(['Name', 'Age'], dtype='object')

In [35]:
df.keys()

Index(['Name', 'Age'], dtype='object')
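
As the two outputs suggest, `df.keys()` and `df.columns` return the same labels. The sketch below checks this on a smaller, made-up frame and shows two common follow-ups, converting the labels to a list and testing whether a column exists. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ram', 'Shyam'], 'Age': [20, 30]})

# keys() gives the same column labels as the columns attribute.
same_labels = df.columns.equals(df.keys())

# Convert to a plain list when a regular Python container is needed.
column_names = df.columns.tolist()

# Membership tests on a DataFrame check the column labels.
has_age = 'Age' in df
has_city = 'City' in df

print(same_labels, column_names, has_age, has_city)
```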