## Python Pandas - Indexing and Selecting Data

* Indexing in pandas means selecting particular rows and columns of data from a DataFrame.
* It is also called subset selection.

pandas provides two main multi-axis indexers:

1. loc[] - Label based
2. iloc[] - Integer position based

Note: A third indexer, ix[], accepted both labels and integer positions. It was deprecated in pandas 0.20.0 and removed in pandas 1.0, so use loc[] or iloc[] instead.
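
To make the distinction between the two indexers concrete, here is a minimal sketch. The `toy` DataFrame and its string row labels are made up for illustration and are not part of the COVID dataset used in this notebook.

```python
import pandas as pd

# Hypothetical toy data with string row labels (assumption, for illustration only)
toy = pd.DataFrame(
    {"cases": [10, 20, 30]},
    index=["a", "b", "c"],
)

# loc addresses the cell by its row label and column label
by_label = toy.loc["b", "cases"]

# iloc addresses the same cell by its 0-based row and column positions
by_position = toy.iloc[1, 0]

print(by_label, by_position)
```

Both lookups reach the same cell; only the way the cell is addressed differs.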

In [1]:
# Import the pandas package
import pandas as pd

In [2]:
# Read the CSV file and display its contents
df = pd.read_csv(r"covid_india.csv")
df

Unnamed: 0,S. No.,Name of State / UT,Active Cases,Cured/Discharged/Migrated,Deaths,Total Confirmed cases
0,1,Andaman and Nicobar,5,4964,62,5031
1,2,Andhra Pradesh,1400,883277,7184,891861
2,3,Arunachal Pradesh,3,16781,56,16840
3,4,Assam,1619,215080,1098,217797
4,5,Bihar,338,261136,1551,263025
5,6,Chandigarh,1088,21650,358,23096
6,7,Chhattisgarh,4006,309433,3890,317329
7,8,Dadra and Nagar Haveli and Daman and Diu,31,3405,2,3438
8,9,Delhi,2262,630493,10941,643696
9,10,Goa,749,54362,806,55917


In [3]:
# display all the columns
df.columns

Index(['S. No.', 'Name of State / UT', 'Active Cases',
       'Cured/Discharged/Migrated', 'Deaths', 'Total Confirmed cases'],
      dtype='object')

## Indexing a DataFrame using the indexing operator [ ]

In [4]:
# Select a single column (returns a Series)
df['Name of State / UT']

0                          Andaman and Nicobar
1                               Andhra Pradesh
2                            Arunachal Pradesh
3                                        Assam
4                                        Bihar
5                                   Chandigarh
6                                 Chhattisgarh
7     Dadra and Nagar Haveli and Daman and Diu
8                                        Delhi
9                                          Goa
10                                     Gujarat
11                                     Haryana
12                            Himachal Pradesh
13                           Jammu and Kashmir
14                                   Jharkhand
15                                   Karnataka
16                                      Kerala
17                                      Ladakh
18                                 Lakshadweep
19                                 Maharashtra
20                                     Manipur
21           

In [5]:
# Select multiple columns by passing a list of column names (returns a DataFrame)
df[["Name of State / UT","Total Confirmed cases"]]

Unnamed: 0,Name of State / UT,Total Confirmed cases
0,Andaman and Nicobar,5031
1,Andhra Pradesh,891861
2,Arunachal Pradesh,16840
3,Assam,217797
4,Bihar,263025
5,Chandigarh,23096
6,Chhattisgarh,317329
7,Dadra and Nagar Haveli and Daman and Diu,3438
8,Delhi,643696
9,Goa,55917


## 1. Label Based Selection - loc[]

* The loc indexer selects data by the labels of the rows and columns.

* loc takes two selectors separated by ','. Each one can be a single label, a list of labels, or a slice of labels. The first selects rows and the second selects columns.
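
The three selector forms can be sketched on a small, made-up DataFrame. The `toy` frame and its column names are assumptions for illustration. Note that label slices in loc include both endpoints.

```python
import pandas as pd

# Hypothetical toy data (assumption), using the default integer row labels 0..3
toy = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D"],
        "deaths": [1, 2, 3, 4],
        "total": [10, 20, 30, 40],
    }
)

# Single row label and single column label: returns one value
single = toy.loc[2, "state"]

# Lists of row labels and column labels: returns a DataFrame
listed = toy.loc[[0, 3], ["state", "total"]]

# Label slices: both the start and the stop label are included
sliced = toy.loc[1:3, "deaths":"total"]

print(single)
print(listed)
print(sliced)
```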

In [6]:
#select all rows and columns
df.loc[:,]

Unnamed: 0,S. No.,Name of State / UT,Active Cases,Cured/Discharged/Migrated,Deaths,Total Confirmed cases
0,1,Andaman and Nicobar,5,4964,62,5031
1,2,Andhra Pradesh,1400,883277,7184,891861
2,3,Arunachal Pradesh,3,16781,56,16840
3,4,Assam,1619,215080,1098,217797
4,5,Bihar,338,261136,1551,263025
5,6,Chandigarh,1088,21650,358,23096
6,7,Chhattisgarh,4006,309433,3890,317329
7,8,Dadra and Nagar Haveli and Daman and Diu,31,3405,2,3438
8,9,Delhi,2262,630493,10941,643696
9,10,Goa,749,54362,806,55917


In [7]:
#select all rows for a specific column
df.loc[:,'Name of State / UT']

0                          Andaman and Nicobar
1                               Andhra Pradesh
2                            Arunachal Pradesh
3                                        Assam
4                                        Bihar
5                                   Chandigarh
6                                 Chhattisgarh
7     Dadra and Nagar Haveli and Daman and Diu
8                                        Delhi
9                                          Goa
10                                     Gujarat
11                                     Haryana
12                            Himachal Pradesh
13                           Jammu and Kashmir
14                                   Jharkhand
15                                   Karnataka
16                                      Kerala
17                                      Ladakh
18                                 Lakshadweep
19                                 Maharashtra
20                                     Manipur
21           

In [8]:
# Select all rows for multiple columns, passed as a list
df.loc[:,['Name of State / UT','Total Confirmed cases']]

Unnamed: 0,Name of State / UT,Total Confirmed cases
0,Andaman and Nicobar,5031
1,Andhra Pradesh,891861
2,Arunachal Pradesh,16840
3,Assam,217797
4,Bihar,263025
5,Chandigarh,23096
6,Chhattisgarh,317329
7,Dadra and Nagar Haveli and Daman and Diu,3438
8,Delhi,643696
9,Goa,55917


In [9]:
# Select a few rows and multiple columns, both passed as lists
df.loc[[1,15,16,26,30,31],['Name of State / UT','Total Confirmed cases']]

Unnamed: 0,Name of State / UT,Total Confirmed cases
1,Andhra Pradesh,891861
15,Karnataka,960272
16,Kerala,1091270
26,Puducherry,40030
30,Tamil Nadu,859726
31,Telengana,301318


## 2. Integer Based Selection - iloc[]


* Like Python and NumPy, iloc uses 0-based indexing.

* The iloc indexer retrieves rows and columns by position: we specify the positions of the rows we want and the positions of the columns we want.

* The df.iloc indexer is very similar to df.loc but uses only integer positions to make its selections.
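
One practical difference from loc is how slices behave. The sketch below uses a made-up `toy` DataFrame (an assumption, not the notebook's dataset) to compare the two.

```python
import pandas as pd

# Hypothetical toy data (assumption), using the default integer row labels 0..4
toy = pd.DataFrame({"state": list("ABCDE"), "total": [10, 20, 30, 40, 50]})

# iloc slices follow Python rules: the stop position is excluded
by_position = toy.iloc[1:3]

# loc slices are label based: the stop label is included
by_label = toy.loc[1:3]

# Negative positions count from the end, as with Python lists
last_row = toy.iloc[-1]

print(len(by_position), len(by_label))
print(last_row["state"])
```

With the default index the labels happen to equal the positions, which is why the same slice `1:3` returns a different number of rows from each indexer.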

In [10]:
#select all rows and columns
df.iloc[:,]

Unnamed: 0,S. No.,Name of State / UT,Active Cases,Cured/Discharged/Migrated,Deaths,Total Confirmed cases
0,1,Andaman and Nicobar,5,4964,62,5031
1,2,Andhra Pradesh,1400,883277,7184,891861
2,3,Arunachal Pradesh,3,16781,56,16840
3,4,Assam,1619,215080,1098,217797
4,5,Bihar,338,261136,1551,263025
5,6,Chandigarh,1088,21650,358,23096
6,7,Chhattisgarh,4006,309433,3890,317329
7,8,Dadra and Nagar Haveli and Daman and Diu,31,3405,2,3438
8,9,Delhi,2262,630493,10941,643696
9,10,Goa,749,54362,806,55917


In [15]:
#select all rows for a specific column
df.iloc[:,1]

0                          Andaman and Nicobar
1                               Andhra Pradesh
2                            Arunachal Pradesh
3                                        Assam
4                                        Bihar
5                                   Chandigarh
6                                 Chhattisgarh
7     Dadra and Nagar Haveli and Daman and Diu
8                                        Delhi
9                                          Goa
10                                     Gujarat
11                                     Haryana
12                            Himachal Pradesh
13                           Jammu and Kashmir
14                                   Jharkhand
15                                   Karnataka
16                                      Kerala
17                                      Ladakh
18                                 Lakshadweep
19                                 Maharashtra
20                                     Manipur
21           

In [12]:
# Select all rows for multiple columns, passed as a list of positions
df.iloc[:,[1,5]]

Unnamed: 0,Name of State / UT,Total Confirmed cases
0,Andaman and Nicobar,5031
1,Andhra Pradesh,891861
2,Arunachal Pradesh,16840
3,Assam,217797
4,Bihar,263025
5,Chandigarh,23096
6,Chhattisgarh,317329
7,Dadra and Nagar Haveli and Daman and Diu,3438
8,Delhi,643696
9,Goa,55917


In [13]:
# Select a few rows and multiple columns, both passed as lists of positions
df.iloc[[1,15,16,26,30,31],[1,5]]

Unnamed: 0,Name of State / UT,Total Confirmed cases
1,Andhra Pradesh,891861
15,Karnataka,960272
16,Kerala,1091270
26,Puducherry,40030
30,Tamil Nadu,859726
31,Telengana,301318


In [14]:
# Integer slicing
df.iloc[2:10,1:4]

Unnamed: 0,Name of State / UT,Active Cases,Cured/Discharged/Migrated
2,Arunachal Pradesh,3,16781
3,Assam,1619,215080
4,Bihar,338,261136
5,Chandigarh,1088,21650
6,Chhattisgarh,4006,309433
7,Dadra and Nagar Haveli and Daman and Diu,31,3405
8,Delhi,2262,630493
9,Goa,749,54362


## Boolean Indexing

Boolean indexing checks a condition for each row and returns only the rows for which the condition is True.
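
As a minimal sketch of what happens under the hood: the comparison first produces a Series of True/False values (often called a mask), and that mask is then used to filter the rows. The `toy` DataFrame here is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data (assumption, not the notebook's dataset)
toy = pd.DataFrame({"state": ["A", "B", "C"], "deaths": [5, 15000, 12000]})

# The comparison is evaluated element-wise and yields a boolean Series
mask = toy["deaths"] > 10000

# Passing the mask to [] keeps only the rows where the mask is True
selected = toy[mask]

print(mask.tolist())
print(selected)
```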

In [21]:
# Deaths greater than 10000
df[df['Deaths']>10000]

Unnamed: 0,S. No.,Name of State / UT,Active Cases,Cured/Discharged/Migrated,Deaths,Total Confirmed cases
8,9,Delhi,2262,630493,10941,643696
15,16,Karnataka,8383,939499,12390,960272
19,20,Maharashtra,127480,2134072,52861,2314413
30,31,Tamil Nadu,4870,842309,12547,859726
35,36,West Bengal,3143,564912,10292,578347


In [24]:
# Total confirmed cases greater than 1 million (10 lakh)
df[df['Total Confirmed cases']>1000000]

Unnamed: 0,S. No.,Name of State / UT,Active Cases,Cured/Discharged/Migrated,Deaths,Total Confirmed cases
16,17,Kerala,29777,1057097,4396,1091270
19,20,Maharashtra,127480,2134072,52861,2314413
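
Conditions like the two above can also be combined. The sketch below uses a made-up `toy` DataFrame (an assumption, not the notebook's dataset). Each condition needs its own parentheses, with `&` for "and" and `|` for "or". A boolean mask can also be passed to loc to pick columns at the same time.

```python
import pandas as pd

# Hypothetical toy data (assumption, for illustration only)
toy = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D"],
        "deaths": [500, 15000, 12000, 200],
        "total": [2_000_000, 900_000, 1_500_000, 50_000],
    }
)

# Rows that satisfy both conditions
both = toy[(toy["deaths"] > 10000) & (toy["total"] > 1_000_000)]

# Rows that satisfy at least one condition
either = toy[(toy["deaths"] > 10000) | (toy["total"] > 1_000_000)]

# Boolean mask for the rows, list of labels for the columns
names = toy.loc[toy["deaths"] > 10000, ["state", "deaths"]]

print(both)
print(either)
print(names)
```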
