### Pandas: DataFrame and Series

Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional, array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labelled axes (rows and columns).
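
As a quick illustration of these definitions, here is a minimal sketch that checks dimensionality, per-column dtypes, and size mutability. The names `s` and `frame` and all values are made up for illustration.

```python
import pandas as pd

# Illustrative values only; not taken from the notebook's data.
s = pd.Series([1.5, 2.5, 3.5])
frame = pd.DataFrame({"label": ["x", "y", "z"], "value": [1, 2, 3]})

print(s.ndim)        # a Series has one dimension
print(frame.ndim)    # a DataFrame has two dimensions
print(frame.dtypes)  # each column keeps its own dtype (heterogeneous)

# Size-mutable: a column can be added after construction.
frame["double"] = frame["value"] * 2
print(frame.shape)
```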

In [1]:
!pip install pandas



In [2]:
import pandas as pd

In [3]:
## Series
# A pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a single column in a table.


In [5]:
data = [1,2,3,4,5]
series = pd.Series(data)
print("Series:\n", series)

Series:
 0    1
1    2
2    3
3    4
4    5
dtype: int64


In [6]:
## Create a Series from a dictionary (the keys become the index labels)
data = {"name":"Talib","age":25,"surname":"Sayyed","company":"Coreflex"}
ser_dict = pd.Series(data)

In [7]:
print(ser_dict)

name          Talib
age              25
surname      Sayyed
company    Coreflex
dtype: object


In [8]:
da = {"a":1,"b":2,"c":3}
serr_dict = pd.Series(da)
print(serr_dict)

a    1
b    2
c    3
dtype: int64


In [9]:
data = [10,20,30]
index = ['a','b','c']
ser=pd.Series(data,index=index)
print(ser)

a    10
b    20
c    30
dtype: int64
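
Once a Series has custom labels, values can be looked up either by label or by position. The following is a small sketch using the same values as the cell above; the variable names `by_label`, `by_position`, and `subset` are illustrative.

```python
import pandas as pd

ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = ser["b"]         # label-based lookup
by_position = ser.iloc[1]   # position-based lookup
subset = ser[["a", "c"]]    # a list of labels returns a Series

print(by_label, by_position)
print(subset)
```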


In [11]:
## DataFrame
# Create a DataFrame from a dictionary of lists
data = {
    "Name":["Talib","John","Jack"],
    "Age":[25,30,45],
    "City":["Pune","Mumbai","Banglore"]
}
df = pd.DataFrame(data)
df

    Name  Age      City
0  Talib   25      Pune
1   John   30    Mumbai
2   Jack   45  Banglore


In [12]:
print(df)

    Name  Age      City
0  Talib   25      Pune
1   John   30    Mumbai
2   Jack   45  Banglore


In [13]:
# Convert the DataFrame into a NumPy array (df.to_numpy() is the equivalent pandas method)
import numpy as np
arr1=np.array(df)
print(arr1)

[['Talib' 25 'Pune']
 ['John' 30 'Mumbai']
 ['Jack' 45 'Banglore']]
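
The conversion can also be done with `DataFrame.to_numpy()`. The sketch below rebuilds the same DataFrame and suggests why the array above prints strings and integers together: mixed column types fall back to a generic `object` dtype, while a single numeric column keeps a numeric dtype. The names `mixed` and `ages` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Talib", "John", "Jack"],
    "Age": [25, 30, 45],
    "City": ["Pune", "Mumbai", "Banglore"],
})

mixed = df.to_numpy()         # mixed column types fall back to dtype=object
ages = df["Age"].to_numpy()   # a single numeric column keeps a numeric dtype

print(mixed.dtype)
print(ages.dtype, ages.sum())
```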


In [23]:
## Create a dataframe from a list of dictionaries
data = [
    {'Name':'Talib','Age':25,'City':'Pune'},
    {'Name':'John','Age':30,'City':'Mumbai'},
    {'Name':'Jacob','Age':35,'City':'Banglore'},
    {'Name':'Jibran','Age':40,'City':'Chennai'}
]
df = pd.DataFrame(data)
df

     Name  Age      City
0   Talib   25      Pune
1    John   30    Mumbai
2   Jacob   35  Banglore
3  Jibran   40   Chennai


In [15]:
print(df)

     Name  Age      City
0   Talib   25      Pune
1    John   30    Mumbai
2   Jacob   35  Banglore
3  Jibran   40   Chennai


In [18]:
df=pd.read_csv('salary_data.csv')
df.head(5)

   YearsExperience  Salary
0              1.1   39343
1              1.3   46205
2              1.5   37731
3              2.0   43525
4              2.2   39891


In [20]:
df.tail(5)

    YearsExperience  Salary
25              9.0  105582
26              9.5  116969
27              9.6  112635
28             10.3  122391
29             10.5  121872
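
Because `salary_data.csv` is not included here, the following sketch reads the same kind of data from an in-memory string instead. It assumes the file has just the two columns shown above with a header row; the rows are the five displayed by `df.head(5)`, and the names `csv_text` and `sample` are illustrative.

```python
import io
import pandas as pd

# Stand-in for salary_data.csv: the first five rows shown by df.head(5).
# The exact layout of the real file is an assumption.
csv_text = """YearsExperience,Salary
1.1,39343
1.3,46205
1.5,37731
2.0,43525
2.2,39891
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.head(2))   # first two rows
print(sample.tail(2))   # last two rows
print(sample.shape)     # (rows, columns)
```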


In [21]:
## Accessing data from dataframe
data

[{'Name': 'Talib', 'Age': 25, 'City': 'Pune'},
 {'Name': 'John', 'Age': 30, 'City': 'Mumbai'},
 {'Name': 'Jacob', 'Age': 35, 'City': 'Banglore'},
 {'Name': 'Jibran', 'Age': 40, 'City': 'Chennai'}]

In [24]:
df

     Name  Age      City
0   Talib   25      Pune
1    John   30    Mumbai
2   Jacob   35  Banglore
3  Jibran   40   Chennai


In [25]:
df['Name']

0     Talib
1      John
2     Jacob
3    Jibran
Name: Name, dtype: object

In [26]:
df['Age']

0    25
1    30
2    35
3    40
Name: Age, dtype: int64

In [27]:
type(df['Name'])

pandas.core.series.Series

In [30]:
df.loc[3]

Name     Jibran
Age          40
City    Chennai
Name: 3, dtype: object

In [31]:
df.iloc[2]

Name       Jacob
Age           35
City    Banglore
Name: 2, dtype: object
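
With the default 0..n-1 index, `loc` and `iloc` look interchangeable. The sketch below uses a hypothetical non-default index (10, 20, 30) to show the difference: `loc` selects by label, `iloc` by position, and label slices include the end point while positional slices do not.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Talib", "John", "Jacob"], "Age": [25, 30, 35]},
    index=[10, 20, 30],   # hypothetical non-default labels
)

by_label = df.loc[20]       # row whose index label is 20
by_position = df.iloc[2]    # third row, regardless of its label

label_slice = df.loc[10:20]   # label slice: end label is included
pos_slice = df.iloc[0:1]      # positional slice: end position is excluded

print(by_label["Name"], by_position["Name"])
print(len(label_slice), len(pos_slice))
```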

In [36]:
df.iloc[0, 1]  # row 0, column 1 by position; the chained form df.iloc[0][1] raises a FutureWarning

25

In [37]:
## Accessing a specific element
df['Name']


0     Talib
1      John
2     Jacob
3    Jibran
Name: Name, dtype: object

In [39]:
df.at[0,'Name']

'Talib'

In [40]:
df.at[2,'Age']

35

In [41]:
## Accessing a specific element using iat
df.iat[2,2]

'Banglore'
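
`at` and `iat` can also write a single cell, not only read it. This is a minimal sketch on a small illustrative DataFrame; the replacement values are made up.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Talib", "John"], "Age": [25, 30]})

df.at[0, "Age"] = 26      # write by row label and column label
df.iat[1, 0] = "Johnny"   # write by row position and column position

print(df)
```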

In [42]:
df

     Name  Age      City
0   Talib   25      Pune
1    John   30    Mumbai
2   Jacob   35  Banglore
3  Jibran   40   Chennai


In [43]:
## Data manipulation with a DataFrame
## Add a new column
df['Salary']=[90000,50000,60000,70000]
df

     Name  Age      City  Salary
0   Talib   25      Pune   90000
1    John   30    Mumbai   50000
2   Jacob   35  Banglore   60000
3  Jibran   40   Chennai   70000


In [44]:
df.drop('Salary', axis=1)  # axis=1 drops a column; the default axis=0 would look for a row labelled 'Salary'

     Name  Age      City
0   Talib   25      Pune
1    John   30    Mumbai
2   Jacob   35  Banglore
3  Jibran   40   Chennai


In [45]:
df

     Name  Age      City  Salary
0   Talib   25      Pune   90000
1    John   30    Mumbai   50000
2   Jacob   35  Banglore   60000
3  Jibran   40   Chennai   70000


In [46]:
## As shown above, drop() returned a copy and the Salary column is still in df; inplace=True modifies df itself
df.drop('Salary',axis = 1,inplace=True)

In [47]:
df

     Name  Age      City
0   Talib   25      Pune
1    John   30    Mumbai
2   Jacob   35  Banglore
3  Jibran   40   Chennai
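
An alternative to `inplace=True` is to keep the DataFrame returned by `drop` under a new name, which leaves the original untouched. A minimal sketch, using the `columns=` keyword in place of `axis=1` and an illustrative variable name `trimmed`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Talib", "John"],
    "Age": [25, 30],
    "Salary": [90000, 50000],
})

trimmed = df.drop(columns=["Salary"])   # returns a new DataFrame

print(list(df.columns))        # the original still has Salary
print(list(trimmed.columns))   # the copy does not
```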


In [48]:
## Increment every value in the Age column by 1
df['Age']=df['Age']+1
df

     Name  Age      City
0   Talib   26      Pune
1    John   31    Mumbai
2   Jacob   36  Banglore
3  Jibran   41   Chennai


In [49]:
# Drop a row using its index value
df.drop(1)

     Name  Age      City
0   Talib   26      Pune
2   Jacob   36  Banglore
3  Jibran   41   Chennai


In [50]:
df


     Name  Age      City
0   Talib   26      Pune
1    John   31    Mumbai
2   Jacob   36  Banglore
3  Jibran   41   Chennai


In [51]:
df.drop(1,inplace=True)

In [52]:
df

     Name  Age      City
0   Talib   26      Pune
2   Jacob   36  Banglore
3  Jibran   41   Chennai
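
After a row is dropped, the remaining index labels keep their original values (0, 2, 3 above). If a contiguous index is wanted, `reset_index(drop=True)` is one way to renumber the rows. The sketch below uses the same names and ages; `remaining` and `renumbered` are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Talib", "John", "Jacob", "Jibran"],
    "Age": [26, 31, 36, 41],
})

remaining = df.drop(index=1)                    # labels are now 0, 2, 3
renumbered = remaining.reset_index(drop=True)   # labels become 0, 1, 2

print(list(remaining.index))
print(list(renumbered.index))
```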


In [53]:
df=pd.read_csv('salary_data.csv')
df.head(5)

   YearsExperience  Salary
0              1.1   39343
1              1.3   46205
2              1.5   37731
3              2.0   43525
4              2.2   39891


In [55]:
## Display the data type of each column
print('Data Types :\n', df.dtypes)

# Describe the dataframe
print('Description :\n',df.describe())

# Group by a column and perform an aggregation
# (grouping by Salary itself simply returns each distinct salary as its own mean)
grouped = df.groupby('Salary')['Salary'].mean()
print("Mean value by Salary", grouped)

Data Types :
 YearsExperience    float64
Salary               int64
dtype: object
Description :
        YearsExperience         Salary
count        30.000000      30.000000
mean          5.313333   76003.000000
std           2.837888   27414.429785
min           1.100000   37731.000000
25%           3.200000   56720.750000
50%           4.700000   65237.000000
75%           7.700000  100544.750000
max          10.500000  122391.000000
Mean value by Salary Salary
37731      37731.0
39343      39343.0
39891      39891.0
43525      43525.0
46205      46205.0
54445      54445.0
55794      55794.0
56642      56642.0
56957      56957.0
57081      57081.0
57189      57189.0
60150      60150.0
61111      61111.0
63218      63218.0
64445      64445.0
66029      66029.0
67938      67938.0
81363      81363.0
83088      83088.0
91738      91738.0
93940      93940.0
98273      98273.0
101302    101302.0
105582    105582.0
109431    109431.0
112635    112635.0
113812    113812.0
116969    116969.0
1

In [57]:
df.describe()

       YearsExperience         Salary
count        30.000000      30.000000
mean          5.313333   76003.000000
std           2.837888   27414.429785
min           1.100000   37731.000000
25%           3.200000   56720.750000
50%           4.700000   65237.000000
75%           7.700000  100544.750000
max          10.500000  122391.000000
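
The `groupby` call in the earlier cell groups by `Salary` and then averages `Salary`, so every group's mean equals its own key. Grouping is more informative when the key is a categorical column. The sketch below uses hypothetical data: the `Department` column and all values are invented for illustration and are not part of `salary_data.csv`.

```python
import pandas as pd

# Hypothetical data: salary_data.csv has no Department column.
staff = pd.DataFrame({
    "Department": ["Eng", "Eng", "Ops", "Ops"],
    "Salary": [40000, 60000, 30000, 50000],
})

# One mean per distinct Department value.
mean_by_dept = staff.groupby("Department")["Salary"].mean()
print(mean_by_dept)
```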
