## What is Pandas?

Pandas is a Python library used for working with data sets.

It provides functions for analyzing, cleaning, exploring, and manipulating data, and it is built on top of the NumPy library. Pandas can import data from various formats, such as comma-separated values (CSV), JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
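Because pandas is built on NumPy, each column's values can be handed to NumPy directly. Here is a minimal sketch that uses a small made-up frame rather than any dataset from this tutorial:

```python
import numpy as np
import pandas as pd

# Small made-up frame built from a dictionary of lists
df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})

# A column's values can be exported as a NumPy array
ages = df["Age"].to_numpy()
print(type(ages))          # <class 'numpy.ndarray'>

# NumPy functions also accept pandas columns directly
print(np.mean(df["Age"]))  # 20.5
```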

# 1. DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, tabular data structure with labeled axes (rows and columns). A DataFrame consists of three principal components: the data, the rows, and the columns.
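These three components map onto DataFrame attributes. columns holds the column labels, index holds the row labels, and to_numpy() returns the data as a 2-D array. Here is a minimal sketch with a small, hypothetical frame:

```python
import pandas as pd

# Hypothetical example data: one dictionary key per column
df = pd.DataFrame({"Name": ["Tom", "nick", "krish"], "Age": [20, 21, 19]})

print(df.columns.tolist())  # ['Name', 'Age'] -> column labels
print(df.index.tolist())    # [0, 1, 2]       -> row labels (default RangeIndex)
print(df.to_numpy())        # the data itself as a 2-D array
print(df.shape)             # (3, 2)          -> (rows, columns)
```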

In [None]:
# Creating a DataFrame from a dictionary of lists.
# A DataFrame can be created in several ways: by loading a dataset from existing storage (a SQL database, a CSV file, an Excel file) or from Python objects such as lists and dictionaries.

# import the pandas library under the conventional alias pd
import pandas as pd

In [None]:
# dictionary of lists: each key becomes a column name, each list holds that column's values
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

In [None]:
# Calling the DataFrame constructor on the dictionary
df = pd.DataFrame(data)
print(df)

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18
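The constructor also accepts Python structures other than a dictionary of lists. Here is a sketch of two common alternatives, reusing the names and ages from the cell above as example data:

```python
import pandas as pd

# 1) From a list of lists: each inner list is one row; column names are passed separately
rows = [["Tom", 20], ["nick", 21], ["krish", 19], ["jack", 18]]
df_from_rows = pd.DataFrame(rows, columns=["Name", "Age"])

# 2) From a list of dictionaries: each dictionary is one row, its keys become columns
records = [{"Name": "Tom", "Age": 20}, {"Name": "nick", "Age": 21}]
df_from_records = pd.DataFrame(records)

print(df_from_rows)
print(df_from_records)
```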


# 2. Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. A Series is comparable to a single column in an Excel sheet.

- Series vs. DataFrame: a Series is a single column of a DataFrame, and a DataFrame can be viewed as a collection of Series that share the same index.
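The following sketch illustrates the index and the Series/DataFrame relationship using explicit index labels. The names and ages are example data:

```python
import pandas as pd

# A Series with explicit index labels instead of the default 0..n-1
ages = pd.Series([20, 21, 19], index=["Tom", "nick", "krish"], name="Age")
print(ages["nick"])     # 21 -> access by label
print(ages.iloc[0])     # 20 -> access by position

# Selecting one column of a DataFrame returns a Series
df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})
print(type(df["Age"]))  # <class 'pandas.core.series.Series'>
```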

In [None]:
# Creating a Series from a list.

# import pandas as pd
import pandas as pd

In [None]:
# list of strings
data = ['Welcome', 'to', 'the', 'pandas', 
            'tutorial', 'using', 'python']

In [None]:
# create a Series from the list
ser = pd.Series(data)
print(ser)

0     Welcome
1          to
2         the
3      pandas
4    tutorial
5       using
6      python
dtype: object


# 3. Reading data from different files

### 3.1 Read data from CSV

In [None]:
# Import pandas
import pandas as pd

In [None]:
# reading a CSV file
df = pd.read_csv("/content/sample_data/mnist_test.csv") # path of your CSV file
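read_csv accepts a file path, a URL, or any file-like object. The sketch below uses io.StringIO to stand in for a file on disk; the CSV contents are invented. It also shows two commonly used options, usecols and index_col:

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the contents are hypothetical
csv_text = """Name,Age,City
Tom,20,Delhi
nick,21,Pune
krish,19,Mumbai
"""

# Read everything
df = pd.read_csv(io.StringIO(csv_text))

# Load only some columns
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"])

# Use one column as the row labels
indexed = pd.read_csv(io.StringIO(csv_text), index_col="Name")

print(df.shape)                     # (3, 3)
print(indexed.loc["nick", "City"])  # Pune
```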

### 3.2 Read data from Excel

In [None]:
# reading an Excel file (.xlsx files need an engine such as openpyxl installed)
df = pd.read_excel("/content/sample_data/mnist_train_small.xlsx") # path of your Excel file

### 3.3 Read data from a text file

In [None]:
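# read_csv also reads plain-text files whose fields are comma-separated;
# pass sep= for other delimiters (for example sep='\t' for tab-separated text)
df = pd.read_csv("data.txt")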
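read_csv assumes commas unless told otherwise. Here is a sketch for a tab-separated text file and for a whitespace-aligned one. io.StringIO stands in for data.txt, and the contents are made up:

```python
import io
import pandas as pd

# Hypothetical tab-separated text
tsv_text = "Name\tAge\nTom\t20\nnick\t21\n"
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df)

# Whitespace-aligned text (one or more spaces between fields) can use a regex separator
ws_text = "Name  Age\nTom   20\nnick  21\n"
df_ws = pd.read_csv(io.StringIO(ws_text), sep=r"\s+")
print(df_ws)
```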

# 4. Viewing Data

### We will be using [rainfall.csv](https://drive.google.com/file/d/1YkpySCBa64nwoShEoJSe7KkJR_Y_l7Wk/view?usp=sharing)

In [None]:
# reading the rainfall CSV file
df = pd.read_csv("Rainfall.csv")

### 4.1 View the top 5 records in the dataset

In [None]:
df.head()

|   | STATE_UT_NAME | DISTRICT | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | ANNUAL | Jan-Feb | Mar-May | Jun-Sep | Oct-Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ANDAMAN And NICOBAR ISLANDS | NICOBAR | 107.3 | 57.9 | 65.2 | 117.0 | 358.5 | 295.5 | 285.0 | 271.9 | 354.8 | 326.0 | 315.2 | 250.9 | 2805.2 | 165.2 | 540.7 | 1207.2 | 892.1 |
| 1 | ANDAMAN And NICOBAR ISLANDS | SOUTH ANDAMAN | 43.7 | 26.0 | 18.6 | 90.5 | 374.4 | 457.2 | 421.3 | 423.1 | 455.6 | 301.2 | 275.8 | 128.3 | 3015.7 | 69.7 | 483.5 | 1757.2 | 705.3 |
| 2 | ANDAMAN And NICOBAR ISLANDS | N & M ANDAMAN | 32.7 | 15.9 | 8.6 | 53.4 | 343.6 | 503.3 | 465.4 | 460.9 | 454.8 | 276.1 | 198.6 | 100.0 | 2913.3 | 48.6 | 405.6 | 1884.4 | 574.7 |
| 3 | ARUNACHAL PRADESH | LOHIT | 42.2 | 80.8 | 176.4 | 358.5 | 306.4 | 447.0 | 660.1 | 427.8 | 313.6 | 167.1 | 34.1 | 29.8 | 3043.8 | 123.0 | 841.3 | 1848.5 | 231.0 |
| 4 | ARUNACHAL PRADESH | EAST SIANG | 33.3 | 79.5 | 105.9 | 216.5 | 323.0 | 738.3 | 990.9 | 711.2 | 568.0 | 206.9 | 29.5 | 31.7 | 4034.7 | 112.8 | 645.4 | 3008.4 | 268.1 |
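head() returns the first n rows, with n defaulting to 5. Here is a short sketch on a hypothetical 10-row frame:

```python
import pandas as pd

# Hypothetical 10-row frame
df = pd.DataFrame({"x": range(10)})

print(df.head())   # first 5 rows (the default n=5)
print(df.head(3))  # first 3 rows
```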


### 4.2 Select particular columns

In [None]:
df[['DISTRICT', 'ANNUAL']]

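|     | DISTRICT | ANNUAL |
|-----|----------|--------|
| 0 | NICOBAR | 2805.2 |
| 1 | SOUTH ANDAMAN | 3015.7 |
| 2 | N & M ANDAMAN | 2913.3 |
| 3 | LOHIT | 3043.8 |
| 4 | EAST SIANG | 4034.7 |
| ... | ... | ... |
| 636 | IDUKKI | 3302.5 |
| 637 | KASARGOD | 3621.6 |
| 638 | PATHANAMTHITTA | 2958.4 |
| 639 | WAYANAD | 3253.1 |
| 640 | LAKSHADWEEP | 1600.0 |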
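Single brackets with one column name return a Series. Double brackets (a list of names) return a DataFrame, even when the list holds just one name. Here is a sketch using a few values copied from the rainfall output above:

```python
import pandas as pd

# A few rows and columns copied from the rainfall data
df = pd.DataFrame({"DISTRICT": ["NICOBAR", "LOHIT"],
                   "JAN": [107.3, 42.2],
                   "ANNUAL": [2805.2, 3043.8]})

annual = df["ANNUAL"]                  # single brackets -> Series
two_cols = df[["DISTRICT", "ANNUAL"]]  # list of names -> DataFrame
one_col_frame = df[["ANNUAL"]]         # one-element list -> still a DataFrame

print(type(annual).__name__)     # Series
print(type(two_cols).__name__)   # DataFrame
```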


### 4.3 Indexing using loc

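loc selects data by the labels of the rows and columns. Here it is given a boolean condition instead of row labels, so it keeps every row whose JAN value equals 4.8.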
In [None]:
# retrieving rows with loc using a boolean condition on the JAN column
first = df.loc[df['JAN'] == 4.8]

In [None]:
first

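|   | STATE_UT_NAME | DISTRICT | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | ANNUAL | Jan-Feb | Mar-May | Jun-Sep | Oct-Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 115 | ORISSA | KALAHANDI | 4.8 | 15.0 | 15.3 | 25.6 | 44.2 | 237.5 | 368.5 | 371.8 | 229.8 | 72.3 | 13.4 | 4.8 | 1403.0 | 19.8 | 85.1 | 1207.6 | 90.5 |
| 384 | RAJASTHAN | BHILWARA | 4.8 | 3.8 | 3.9 | 3.0 | 10.4 | 60.0 | 213.6 | 217.7 | 89.6 | 11.7 | 8.0 | 3.7 | 630.2 | 8.6 | 17.3 | 580.9 | 23.4 |
| 639 | KERALA | WAYANAD | 4.8 | 8.3 | 17.5 | 83.3 | 174.6 | 698.1 | 1110.4 | 592.9 | 230.7 | 213.1 | 93.6 | 25.8 | 3253.1 | 13.1 | 275.4 | 2632.1 | 332.5 |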
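loc also takes a column selector after the comma, and its label slices include both endpoints. Here is a sketch on a three-row frame copied from the filtered output above, keeping its index labels 115, 384 and 639:

```python
import pandas as pd

df = pd.DataFrame({"STATE_UT_NAME": ["ORISSA", "RAJASTHAN", "KERALA"],
                   "DISTRICT": ["KALAHANDI", "BHILWARA", "WAYANAD"],
                   "JAN": [4.8, 4.8, 4.8],
                   "ANNUAL": [1403.0, 630.2, 3253.1]},
                  index=[115, 384, 639])

# Boolean mask for the rows, list of labels for the columns
wet = df.loc[df["ANNUAL"] > 1000, ["DISTRICT", "ANNUAL"]]
print(wet)

# One row label and one column label return a single value
print(df.loc[384, "DISTRICT"])  # BHILWARA

# Label slices include both endpoints: rows 115 and 384, columns JAN and ANNUAL
print(df.loc[115:384, "JAN":"ANNUAL"])
```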


### 4.4 Indexing using iloc
iloc retrieves rows and columns by integer position (starting at 0), regardless of their labels.

In [None]:
# retrieving the third row (position 2) with iloc
second = df.iloc[2]

In [None]:
second

STATE_UT_NAME    ANDAMAN And NICOBAR ISLANDS
DISTRICT                       N & M ANDAMAN
JAN                                     32.7
FEB                                     15.9
MAR                                      8.6
APR                                     53.4
MAY                                    343.6
JUN                                    503.3
JUL                                    465.4
AUG                                    460.9
SEP                                    454.8
OCT                                    276.1
NOV                                    198.6
DEC                                    100.0
ANNUAL                                2913.3
Jan-Feb                                 48.6
Mar-May                                405.6
Jun-Sep                               1884.4
Oct-Dec                                574.7
Name: 2, dtype: object
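iloc works the same way on both axes. Its slices exclude the end position, as in ordinary Python slicing. Here is a sketch on four rows copied from the rainfall data:

```python
import pandas as pd

df = pd.DataFrame({"DISTRICT": ["NICOBAR", "SOUTH ANDAMAN", "N & M ANDAMAN", "LOHIT"],
                   "JAN": [107.3, 43.7, 32.7, 42.2],
                   "ANNUAL": [2805.2, 3015.7, 2913.3, 3043.8]})

row = df.iloc[2]           # third row as a Series
block = df.iloc[0:2, 1:3]  # rows 0-1, columns 1-2 (end positions excluded)
value = df.iloc[-1, 0]     # last row, first column

print(block)
print(value)               # LOHIT
```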

### 4.5 View the bottom 5 records in the dataset

In [None]:
df.tail()

|   | STATE_UT_NAME | DISTRICT | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | ANNUAL | Jan-Feb | Mar-May | Jun-Sep | Oct-Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 636 | KERALA | IDUKKI | 13.4 | 22.1 | 43.6 | 150.4 | 232.6 | 651.6 | 788.9 | 527.3 | 308.4 | 343.2 | 172.9 | 48.1 | 3302.5 | 35.5 | 426.6 | 2276.2 | 564.2 |
| 637 | KERALA | KASARGOD | 2.3 | 1.0 | 8.4 | 46.9 | 217.6 | 999.6 | 1108.5 | 636.3 | 263.1 | 234.9 | 84.6 | 18.4 | 3621.6 | 3.3 | 272.9 | 3007.5 | 337.9 |
| 638 | KERALA | PATHANAMTHITTA | 19.8 | 45.2 | 73.9 | 184.9 | 294.7 | 556.9 | 539.9 | 352.7 | 266.2 | 359.4 | 213.5 | 51.3 | 2958.4 | 65.0 | 553.5 | 1715.7 | 624.2 |
| 639 | KERALA | WAYANAD | 4.8 | 8.3 | 17.5 | 83.3 | 174.6 | 698.1 | 1110.4 | 592.9 | 230.7 | 213.1 | 93.6 | 25.8 | 3253.1 | 13.1 | 275.4 | 2632.1 | 332.5 |
| 640 | LAKSHADWEEP | LAKSHADWEEP | 20.8 | 14.7 | 11.8 | 48.9 | 171.7 | 330.2 | 287.7 | 217.5 | 163.1 | 157.1 | 117.7 | 58.8 | 1600.0 | 35.5 | 232.4 | 998.5 | 333.6 |


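### 4.6 Describing a DataFrame

describe() generates summary statistics: count, mean, standard deviation, minimum, the 25th/50th/75th percentiles, and maximum. By default it summarizes only the numeric columns, which is why STATE_UT_NAME and DISTRICT (object dtype) are absent from the output below. Pass include='all' to summarize the object columns as well.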
In [None]:
df.describe()

|   | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | ANNUAL | Jan-Feb | Mar-May | Jun-Sep | Oct-Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 | 641.0 |
| mean | 18.35507 | 20.984399 | 30.034789 | 45.543214 | 81.535101 | 196.007332 | 326.033697 | 291.152262 | 194.609048 | 90.446334 | 34.117473 | 18.150858 | 1346.969579 | 39.33947 | 157.113105 | 1007.80234 | 142.714665 |
| std | 21.082806 | 27.729596 | 45.451082 | 71.556279 | 111.96039 | 196.556284 | 221.364643 | 152.647325 | 99.83054 | 74.990685 | 59.371274 | 32.711009 | 838.878874 | 47.212773 | 213.445888 | 629.33261 | 148.951752 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.9 | 3.8 | 11.6 | 14.1 | 8.6 | 3.1 | 1.2 | 0.0 | 94.6 | 0.0 | 1.5 | 39.6 | 5.6 |
| 25% | 6.9 | 7.0 | 7.0 | 5.0 | 12.1 | 68.8 | 206.4 | 194.6 | 128.8 | 34.3 | 6.6 | 5.3 | 830.4 | 14.7 | 27.8 | 625.4 | 51.6 |
| 50% | 13.3 | 12.3 | 12.7 | 15.1 | 33.9 | 131.9 | 293.7 | 284.8 | 181.3 | 62.6 | 12.9 | 7.9 | 1116.2 | 27.7 | 67.2 | 896.6 | 86.7 |
| 75% | 19.2 | 24.1 | 33.2 | 48.3 | 91.9 | 226.6 | 374.8 | 358.1 | 234.1 | 130.2 | 32.3 | 14.9 | 1530.9 | 41.1 | 172.4 | 1193.8 | 175.2 |
| max | 144.5 | 229.6 | 367.9 | 554.4 | 733.7 | 1476.2 | 1820.9 | 1522.1 | 826.3 | 517.7 | 475.1 | 297.7 | 7229.3 | 335.3 | 1256.5 | 5228.0 | 1048.5 |
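Here is a sketch contrasting the default with include='all', on a small frame built from values in the tail output:

```python
import pandas as pd

df = pd.DataFrame({"STATE_UT_NAME": ["KERALA", "KERALA", "LAKSHADWEEP"],
                   "ANNUAL": [3302.5, 3621.6, 1600.0]})

numeric_summary = df.describe()           # numeric columns only (ANNUAL)
full_summary = df.describe(include="all")  # adds count/unique/top/freq for STATE_UT_NAME

print(numeric_summary)
print(full_summary)
```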


### 4.7 DataFrame information

The info() method prints a concise summary of the DataFrame: the index range, the number of columns, and each column's name, non-null count, and dtype.

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 641 entries, 0 to 640
Data columns (total 19 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   STATE_UT_NAME  641 non-null    object 
 1   DISTRICT       641 non-null    object 
 2   JAN            641 non-null    float64
 3   FEB            641 non-null    float64
 4   MAR            641 non-null    float64
 5   APR            641 non-null    float64
 6   MAY            641 non-null    float64
 7   JUN            641 non-null    float64
 8   JUL            641 non-null    float64
 9   AUG            641 non-null    float64
 10  SEP            641 non-null    float64
 11  OCT            641 non-null    float64
 12  NOV            641 non-null    float64
 13  DEC            641 non-null    float64
 14  ANNUAL         641 non-null    float64
 15  Jan-Feb        641 non-null    float64
 16  Mar-May        641 non-null    float64
 17  Jun-Sep        641 non-null    float64
 18  Oct-Dec        641 non-null    float64
dtypes: float64(17), object(2)
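info() prints its report and returns None. To keep the text, write it into a buffer through the buf parameter. The same facts are also available as data through dtypes and count(). Here is a sketch on a made-up two-row frame with one missing value:

```python
import io
import pandas as pd

df = pd.DataFrame({"DISTRICT": ["IDUKKI", "KASARGOD"],
                   "JAN": [13.4, None]})  # None becomes NaN in a float column

df.info()  # prints to stdout

# Capture the report as a string instead
buffer = io.StringIO()
df.info(buf=buffer)
summary_text = buffer.getvalue()

# The same per-column facts as pandas objects
print(df.dtypes)
print(df.count())  # non-null values per column
```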

### 4.8 Checking for Missing Values

In [None]:
df.isnull()
# True means the value is missing (null); False means a value is present

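|   | STATE_UT_NAME | DISTRICT | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | ANNUAL | Jan-Feb | Mar-May | Jun-Sep | Oct-Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 636 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 637 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 638 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 639 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 640 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |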


In [None]:
df.isnull().sum()
# There are no missing values, so every column's count is 0

STATE_UT_NAME    0
DISTRICT         0
JAN              0
FEB              0
MAR              0
APR              0
MAY              0
JUN              0
JUL              0
AUG              0
SEP              0
OCT              0
NOV              0
DEC              0
ANNUAL           0
Jan-Feb          0
Mar-May          0
Jun-Sep          0
Oct-Dec          0
dtype: int64
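When a column does contain gaps, isnull().sum() reports them per column, and dropna() or fillna() can deal with them (isna() is an alias of isnull()). Here is a sketch with invented missing values. Filling JAN with its column mean and FEB with 0.0 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps
df = pd.DataFrame({"DISTRICT": ["IDUKKI", "KASARGOD", "WAYANAD"],
                   "JAN": [13.4, np.nan, 4.8],
                   "FEB": [22.1, 1.0, np.nan]})

print(df.isnull().sum())        # per column: DISTRICT 0, JAN 1, FEB 1
print(df.isnull().sum().sum())  # total missing values in the frame: 2

cleaned = df.dropna()  # drop every row that contains a missing value
filled = df.fillna({"JAN": df["JAN"].mean(), "FEB": 0.0})  # fill per column
print(filled)
```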