### Introduction to pandas:
<ul>
<li>This notebook introduces <b>pandas</b>, Python's data-handling library.</li>
<li>The dataset used here is provided by <b>WHO</b> to monitor <b>TB deaths in BRICS</b> countries.</li>
<li>Notes are written as comments inside the code cells.</li>
</ul> 
<h4><b>Author:</b> Vishva Patel</h4>
<h4><b>Github Repository:</b> Data Visualization with python</h4>

In [0]:
import pandas as pd #pandas is a third-party library for data analysis (it is not part of Python's standard library).

In [13]:
#Loading the dataset. Don't forget to upload the file first.
dataset = pd.read_excel('WHO POP TB all.xls')
dataset #displaying the dataset.

Unnamed: 0,Country,Population (1000s),TB deaths
0,Afghanistan,30552,13000.00
1,Albania,3173,20.00
2,Algeria,39208,5100.00
3,Andorra,79,0.26
4,Angola,21472,6900.00
...,...,...,...
189,Venezuela (Bolivarian Republic of),30405,480.00
190,Viet Nam,91680,17000.00
191,Yemen,24407,990.00
192,Zambia,14539,3600.00
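
After loading a file, it often helps to check its size and column names before working with it. The following is a minimal sketch that assumes a hypothetical five-row stand-in called `sample`, built from the first rows shown in the output above, so it runs without the spreadsheet.

```python
import pandas as pd

# Hypothetical stand-in for the WHO spreadsheet: the first five rows
# of the output above, typed in by hand.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'Population (1000s)': [30552, 3173, 39208, 79, 21472],
    'TB deaths': [13000.00, 20.00, 5100.00, 0.26, 6900.00],
})

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names
print(sample.head(3))        # first three rows only
print(sample.dtypes)         # data type of each column
```

The same calls should work on `dataset` once the real file is loaded.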


In [14]:
#Accessing a column from a dataset
dataset['TB deaths']

0      13000.00
1         20.00
2       5100.00
3          0.26
4       6900.00
         ...   
189      480.00
190    17000.00
191      990.00
192     3600.00
193     5700.00
Name: TB deaths, Length: 194, dtype: float64
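
The object returned by selecting a single column is a pandas `Series`: a column of values paired with the row index shown on the left. As a small sketch, assuming a hand-typed `deaths` Series with the first five values from the output above, individual entries can be read by position with `.iloc` or by index label with `.loc`.

```python
import pandas as pd

# Hypothetical stand-in: the first five 'TB deaths' values from the output above.
deaths = pd.Series([13000.00, 20.00, 5100.00, 0.26, 6900.00], name='TB deaths')

print(type(deaths).__name__)  # the class name of a single column
print(deaths.iloc[0])         # first value, selected by position
print(deaths.loc[3])          # value whose index label is 3
```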

In [15]:
#Calculations with columns
tbDeaths = dataset['TB deaths']
tbDeaths.sum() #calculates the total deaths. 

1072677.97

In [17]:
#Calculating the min and max deaths
print("Minimum deaths")
tbDeaths.min() #calculating the minimum.


Minimum deaths


0.0

In [18]:
print("Maximum deaths")
tbDeaths.max() #calculating the maximum of the column. 

Maximum deaths


240000.0
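
`min()` and `max()` return the values but not the rows they came from. One way to find the matching country is `idxmin()` / `idxmax()`, which return the index label of the extreme value. This is a sketch on a hypothetical five-row `sample` built from the rows shown earlier, not on the full dataset.

```python
import pandas as pd

# Hypothetical five-row stand-in for the WHO data.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'TB deaths': [13000.00, 20.00, 5100.00, 0.26, 6900.00],
})

# idxmin()/idxmax() give the row label; .loc then looks up the country name.
lowest = sample.loc[sample['TB deaths'].idxmin(), 'Country']
highest = sample.loc[sample['TB deaths'].idxmax(), 'Country']

print(lowest)   # country with the fewest deaths in the sample
print(highest)  # country with the most deaths in the sample
```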

In [20]:
#Calculating the average deaths across all the countries
tbDeaths.sum() / len(tbDeaths) #len() gives the number of rows (194), so the count is not hard-coded.

5529.267886597938

In [21]:
#One can also use the mean() method
tbDeaths.mean()

5529.267886597938

In [22]:
#Calculating the median
tbDeaths.median()

315.0
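
In the outputs above the mean (about 5529) is far larger than the median (315), which suggests that a few countries with very high counts pull the average up. The statistics computed one by one so far can also be obtained together with `describe()`. The sketch below assumes a hand-typed `deaths` Series holding only the first five values, so its numbers differ from the full dataset.

```python
import pandas as pd

# Hypothetical stand-in: the first five 'TB deaths' values from the dataset.
deaths = pd.Series([13000.00, 20.00, 5100.00, 0.26, 6900.00], name='TB deaths')

# describe() returns count, mean, std, min, quartiles and max in one Series.
summary = deaths.describe()
print(summary)

# The '50%' entry is the median.
print(summary['50%'] == deaths.median())
```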

In [23]:
#Sorting the values in a column
#pandas provides the sort_values() method, which takes a column name as an argument and returns a sorted DataFrame.
dataset.sort_values('TB deaths')
#Note that this does not change the original data; it returns a sorted copy.

Unnamed: 0,Country,Population (1000s),TB deaths
147,San Marino,31,0.00
125,Niue,1,0.01
111,Monaco,38,0.03
3,Andorra,79,0.26
129,Palau,21,0.36
...,...,...,...
128,Pakistan,182143,49000.00
78,Indonesia,249866,64000.00
13,Bangladesh,156595,80000.00
124,Nigeria,173615,160000.00
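
To illustrate that `sort_values()` leaves the original untouched, here is a short sketch on a hypothetical five-row `sample`. It also uses the `ascending=False` parameter to put the largest values first, and stores the result under a new name (`by_deaths`, chosen for this example).

```python
import pandas as pd

# Hypothetical five-row stand-in for the WHO data.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'TB deaths': [13000.00, 20.00, 5100.00, 0.26, 6900.00],
})

# Sort from highest to lowest; the result is a new DataFrame.
by_deaths = sample.sort_values('TB deaths', ascending=False)

print(by_deaths['Country'].tolist())  # sorted order
print(sample['Country'].tolist())     # original order is unchanged
```

To keep a sorted result, assign it to a variable as done here.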


In [24]:
#Sorting is not limited to numbers; it also works with string data types.
#sort_values() on a string column sorts in alphabetical order.
dataset.sort_values('Country')

Unnamed: 0,Country,Population (1000s),TB deaths
0,Afghanistan,30552,13000.00
1,Albania,3173,20.00
2,Algeria,39208,5100.00
3,Andorra,79,0.26
4,Angola,21472,6900.00
...,...,...,...
189,Venezuela (Bolivarian Republic of),30405,480.00
190,Viet Nam,91680,17000.00
191,Yemen,24407,990.00
192,Zambia,14539,3600.00


In [26]:
#Selecting specific columns
dataset[['Country', 'Population (1000s)']]

Unnamed: 0,Country,Population (1000s)
0,Afghanistan,30552
1,Albania,3173
2,Algeria,39208
3,Andorra,79
4,Angola,21472
...,...,...
189,Venezuela (Bolivarian Republic of),30405
190,Viet Nam,91680
191,Yemen,24407
192,Zambia,14539
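
The double square brackets matter here: the inner pair is a Python list of column names, and the result is a DataFrame, whereas single brackets with one name return a Series. A small sketch on a hypothetical `sample` frame shows the difference, and also that the output columns follow the order of the list.

```python
import pandas as pd

# Hypothetical five-row stand-in for the WHO data.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'Population (1000s)': [30552, 3173, 39208, 79, 21472],
    'TB deaths': [13000.00, 20.00, 5100.00, 0.26, 6900.00],
})

as_series = sample['Country']    # single brackets: one column as a Series
as_frame = sample[['Country']]   # list of names: a one-column DataFrame
subset = sample[['Population (1000s)', 'Country']]  # columns in list order

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(subset.columns))
```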


In [28]:
#Adding a new column
#We will add a column that holds each country's difference from the average population.
dataset['Diff_from_avg'] = dataset['Population (1000s)'] - dataset['Population (1000s)'].mean() 
dataset['Diff_from_avg']

0      -6180.469072
1     -33559.469072
2       2475.530928
3     -36653.469072
4     -15260.469072
           ...     
189    -6327.469072
190    54947.530928
191   -12325.469072
192   -22193.469072
193   -22582.469072
Name: Diff_from_avg, Length: 194, dtype: float64
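
The same pattern works for any column arithmetic. As an illustration that is not part of the original notebook, the sketch below adds a hypothetical `Deaths per 100k` column to a five-row `sample`. Because the population is given in thousands, deaths per 100,000 people equals `deaths / population * 100`.

```python
import pandas as pd

# Hypothetical five-row stand-in for the WHO data.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'Population (1000s)': [30552, 3173, 39208, 79, 21472],
    'TB deaths': [13000.00, 20.00, 5100.00, 0.26, 6900.00],
})

# Difference from the average population, as in the cell above.
sample['Diff_from_avg'] = sample['Population (1000s)'] - sample['Population (1000s)'].mean()

# Hypothetical extra column: deaths per 100,000 people.
# Population is in thousands, so deaths / (pop * 1000) * 100000 = deaths / pop * 100.
sample['Deaths per 100k'] = sample['TB deaths'] / sample['Population (1000s)'] * 100

print(sample[['Country', 'Diff_from_avg', 'Deaths per 100k']].round(2))
```

Differences from the mean sum to (approximately) zero, which is a quick sanity check for a column like `Diff_from_avg`.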