---   

<h1 align="center">Introduction to Data Analyst and Data Science for beginners</h1>
<h1 align="center">Lecture no 2.15(Pandas-06)</h1>

---
<h3><div align="right">Ehtisham Sadiq</div></h3>    

<img align="right" width="400" height="400"  src="images/pandas-apps.png"  >

## _Modifying Dataframes Part-I_

In [None]:
# To install this library in Jupyter notebook
#import sys
#!{sys.executable} -m pip install pandas

In [1]:
import pandas as pd
pd.__version__ , pd.__path__

('1.4.2', ['/home/dell/.local/lib/python3.8/site-packages/pandas'])

## Learning agenda of this notebook
1. Modifying Column labels of Dataframe
2. Modifying Row indices of Dataframe
3. Modifying Row(s) Data (Records) of a Dataframe
   - Modifying a single Row
   - Modifying multiple Rows
       - `map()` Method
       - `df.replace()` Method
       - `df.apply()` Method
       - `df.applymap()` Method

##  Read a Sample Dataframe

In [2]:
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df.head()

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


In [3]:
# The `shape` attribute of a dataframe returns a two-value tuple: (number of rows, number of columns)
# Note that the row count excludes the column labels, and the column count excludes the row index
df.shape

(16, 10)

In [4]:
# The `index` attribute of a dataframe returns its row index (the row labels and the index type)
df.index

RangeIndex(start=0, stop=16, step=1)

In [5]:
# The `columns` attribute of a dataframe returns its column labels along with their dtype
df.columns

Index(['roll no', 'name', 'age', 'address', 'session', 'group', 'gender',
       'subj1', 'subj2', 'scholarship'],
      dtype='object')

In [6]:
# The `dtypes` attribute of a dataframe returns the data type of each column
df.dtypes

roll no         object
name            object
age              int64
address         object
session         object
group           object
gender          object
subj1          float64
subj2          float64
scholarship    float64
dtype: object

## 1. Modifying Column Names of a Dataframe
- Every dataframe has a label for each of its columns.
- If no labels are supplied, pandas uses the integers 0, 1, 2, 3, ...
- When creating a dataframe from scratch or reading one from a file, you can set the labels to more meaningful strings.
- When reading a CSV file, the first row of the file is taken as the column labels by default.
- The column labels can be changed afterwards if needed.
- Let us see this in practice.

In [7]:
! cat datasets/groupdatawithoutcollables.csv

MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000
MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000
MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500
MS04,Hadeed,20,Lahore,MOR,group A,Male,82,84.3,4000
MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500
MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
MS07,Zobia,40,Sialkot,AFT,group B,Female,90.2,,4000
MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76,8000
MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500
MS10,Shahid,38,Lahore,AFTERNOON,group D,Male,90.5,81.3,3800
MS11,Khurram,35,Islamabad,MOR,group B,Male,90.5,81.3,6000
MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,
MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000
MS14,Sara,28,Multan,AFTERNOON,group A,Female,84.1,76,8000
MS15,Fatima,33,Sialkot,AFT,group C,Female,90.5,81.3,3500
MS16,Kakamanna,42,Multan,AFTERNOON,group A,Male,90.5,81.3,3800

### a. While Reading a Dataset in a Dataframe
- Pass a list of column names to the `names` argument of `pd.read_csv()`

In [8]:
import pandas as pd
df = pd.read_csv('datasets/groupdatawithoutcollables.csv', names = ['roll no', 'name', 'age', 'address', 'session', 
                                                                'group', 'gender','subj1', 'subj2', 'scholarship'])

df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


### b. After Dataframe is Loaded (Use `columns` attribute of dataframe)

In [9]:
df = pd.read_csv('datasets/groupdatawithoutcollables.csv', header = None)
df.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


In [10]:
df.columns = ['roll no', 'name', 'age', 'address', 'session', 'group', 'gender', 'subj1', 'subj2', 'scholarship']
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


>- Suppose some column labels of a dataframe contain spaces.
>- We want to rename all such columns by replacing each space character with an underscore.
>- One way to do this is to call the string method `replace()` on every column label through the `.str` accessor of `df.columns`.

In [11]:
df.columns

Index(['roll no', 'name', 'age', 'address', 'session', 'group', 'gender',
       'subj1', 'subj2', 'scholarship'],
      dtype='object')

In [13]:
# str.replace() returns a new Index; df.columns is not modified until we assign the result back
df.columns.str.replace(' ','_')

Index(['roll_no', 'name', 'age', 'address', 'session', 'group', 'gender',
       'subj1', 'subj2', 'scholarship'],
      dtype='object')

In [14]:
df.columns = df.columns.str.replace(' ', '_')

In [15]:
df.columns

Index(['roll_no', 'name', 'age', 'address', 'session', 'group', 'gender',
       'subj1', 'subj2', 'scholarship'],
      dtype='object')

In [16]:
df.head()

Unnamed: 0,roll_no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


>- Suppose we have a dataframe in which there are column labels having names in different cases.
>- We want to rename all such columns such that the names are all lower or all upper case.
>- One way to do this is to generate a new list as per the requirement using List comprehension.
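As a complement to the list comprehension, the `.str` accessor used earlier for `replace()` also offers `lower()` and `upper()`. The sketch below uses a small made-up set of column labels (not the course dataset) and suggests that both routes produce the same labels.

```python
import pandas as pd

# Empty toy dataframe with mixed-case labels, invented for this illustration
toy = pd.DataFrame(columns=['Roll_No', 'NAME', 'age'])

# Route 1: list comprehension over the column labels
via_comprehension = [c.lower() for c in toy.columns]

# Route 2: vectorized string method on the Index
via_accessor = toy.columns.str.lower()

print(via_comprehension)
print(list(via_accessor))
```

Either result can be assigned to `toy.columns` to commit the change.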

In [17]:
list1 = [x.upper() for x in df.columns]
list1

['ROLL_NO',
 'NAME',
 'AGE',
 'ADDRESS',
 'SESSION',
 'GROUP',
 'GENDER',
 'SUBJ1',
 'SUBJ2',
 'SCHOLARSHIP']

In [18]:
[x.lower() for x in df.columns]

['roll_no',
 'name',
 'age',
 'address',
 'session',
 'group',
 'gender',
 'subj1',
 'subj2',
 'scholarship']

In [19]:
df.columns = list1
df.head(3)

Unnamed: 0,ROLL_NO,NAME,AGE,ADDRESS,SESSION,GROUP,GENDER,SUBJ1,SUBJ2,SCHOLARSHIP
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


### c. After Dataframe is Loaded (Use `df.rename()` method)
- What if your dataframe has many columns with appropriate names, and you want to change only one or two of them?
- Use the `df.rename()` method to change one or more column names.
```
df.rename(mapper, axis=None, inplace=False)
```
- Where,
    - `mapper`: a dictionary of comma-separated key:value pairs, where the key is the old column name and the value is the new column name
    - `axis`: use `axis=1` to rename column labels; with the default, the mapper is applied to the row index labels instead
    - `inplace`: if True, the dataframe is modified in place and the method returns None
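A minimal sketch of the call described above, on a toy dataframe made up for illustration. It shows that the `columns=` keyword is an equivalent spelling of `mapper=..., axis=1`, and that the original dataframe stays untouched unless `inplace=True` is used or the result is assigned back.

```python
import pandas as pd

# Toy dataframe, invented for this illustration
toy = pd.DataFrame({'roll no': ['MS01', 'MS02'], 'name': ['Rauf', 'Arif']})

# Two spellings of the same rename
renamed_a = toy.rename(mapper={'roll no': 'rollno'}, axis=1)
renamed_b = toy.rename(columns={'roll no': 'rollno'})

print(list(renamed_a.columns))
print(list(renamed_b.columns))
print(list(toy.columns))  # the original labels are still in place
```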

In [20]:
df = pd.read_csv('datasets/groupdata.csv')
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


In [21]:
# Since `inplace` is False by default, rename() returns a new dataframe and leaves `df` unchanged
df.rename(mapper={'roll no': 'rollno', 'name':'fname'}, axis=1)

Unnamed: 0,rollno,fname,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0
5,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
6,MS07,Zobia,40,Sialkot,AFT,group B,Female,90.2,,4000.0
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
9,MS10,Shahid,38,Lahore,AFTERNOON,group D,Male,90.5,81.3,3800.0


In [22]:
df.columns

Index(['roll no', 'name', 'age', 'address', 'session', 'group', 'gender',
       'subj1', 'subj2', 'scholarship'],
      dtype='object')

In [23]:
# Since `inplace` is now set to True, rename() returns None
# and `df` itself is changed
df.rename(mapper={ 'roll no': 'rollno'}, axis=1, inplace=True)

In [24]:
df.columns

Index(['rollno', 'name', 'age', 'address', 'session', 'group', 'gender',
       'subj1', 'subj2', 'scholarship'],
      dtype='object')

## 2. Modifying Row Indices of a Dataframe
- Every row of a dataframe has a row index, which by default is an integer from 0, 1, 2, 3, ...
- After you filter a dataframe on a condition or sort it, the rows keep their original index labels, so the index is no longer a contiguous, ordered sequence.
- We saw in detail in our previous session two methods, `df.set_index()` and `df.reset_index()`, that handle this issue.
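As a brief reminder of those two methods, here is a sketch on a toy dataframe (made up for illustration): sorting leaves the index out of order, `reset_index(drop=True)` renumbers it, and `set_index()` promotes a column to the row index.

```python
import pandas as pd

# Toy dataframe, invented for this illustration
toy = pd.DataFrame({'name': ['Rauf', 'Arif', 'Zara'], 'age': [52, 51, 40]})

# After sorting, rows keep their original labels
by_age = toy.sort_values('age')
print(by_age.index.tolist())

# reset_index(drop=True) discards the old labels and renumbers from 0
renumbered = by_age.reset_index(drop=True)
print(renumbered.index.tolist())

# set_index() uses a column as the row index, so rows can be looked up by name
by_name = toy.set_index('name')
print(by_name.loc['Zara', 'age'])
```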

## 3. Modifying Data of a Single Row/Record of a Dataframe

In [25]:
df = pd.read_csv('datasets/groupdata.csv')
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


### a.  Select the row/record you want to modify
Let us suppose we want to change the `subj1` and `subj2` marks of Shaista.

In [26]:
# Returns a Series object
df.loc[2,:]

roll no             MS03
name             Shaista
age                   35
address          Karachi
session        AFTERNOON
group            group B
gender            Female
subj1               64.9
subj2               75.1
scholarship       8500.0
Name: 2, dtype: object

In [27]:
# Returns a Dataframe object
df.loc[df.name=='Shaista', :]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


### b.  Option 1:
- One way is to pass a new list of values and assign it to the appropriate series (row)

In [28]:
# Either of the following two lines of code will work
df.loc[2,:] = ['MS03', 'Shaista', 35, 'Karachi', 'AFTERNOON', 'group B', 'Female', 99, 99, 8500.0]
df.loc[df.name=='Shaista', :] = ['MS03', 'Shaista', 35, 'Karachi', 'AFTERNOON', 'group B', 'Female', 99, 99, 8500.0]
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,99.0,99.0,8500.0


### c.  Option 2:
- A better way is to assign only those two values that we want to change instead of assigning the complete list of values in that row

In [29]:
# Returns a series
df.loc[2, ['subj1', 'subj2']] 

subj1    99.0
subj2    99.0
Name: 2, dtype: object

In [30]:
# Returns a dataframe
df.loc[df.name=='Shaista', ['subj1', 'subj2']]

Unnamed: 0,subj1,subj2
2,99.0,99.0


In [31]:
df.loc[2, ['subj1', 'subj2']] = [100, 100]
df.loc[df.name=='Shaista', ['subj1', 'subj2']] = [100, 100]
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,100.0,100.0,8500.0


**Note: You can also use the `df.iloc[]` indexer instead of `df.loc[]` to change one or more values of a row. Besides these two, you may also use the `df.at[]` accessor to change a single value of a row.**
```
df.loc[filter, 'column(s)'] = 'value(s)'
```
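The three accessors mentioned in the note can be compared side by side. The sketch below uses a toy dataframe (values copied from the first rows of the sample data, but built by hand here) and is only meant to illustrate the difference between label-based and position-based assignment.

```python
import pandas as pd

# Toy dataframe, built by hand for this illustration
toy = pd.DataFrame({'name': ['Rauf', 'Arif', 'Shaista'],
                    'subj1': [78.3, 70.5, 64.9],
                    'subj2': [84.4, 60.5, 75.1]})

# .at sets a single cell by row label and column label
toy.at[2, 'subj1'] = 95.0

# .iloc uses integer positions: row 2, column position 2 ('subj2')
toy.iloc[2, 2] = 96.0

# .loc with a boolean filter updates every matching row
toy.loc[toy['name'] == 'Arif', ['subj1', 'subj2']] = [71.0, 61.0]

print(toy)
```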

In [35]:
# Integer positions: row 2, columns from third-from-last up to (not including) the last
df.iloc[2, -3:-1] = [80, 90]
df.head()

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,80.0,90.0,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


## 4. Modify Data of Multiple Rows
- So far we have learnt to modify one, several or all the values of a single row in a dataframe.
- What if we want to modify multiple rows at a time?
- The following methods will come to your rescue:
    - `map()`
    - `df.replace()`
    - `df.apply()`
    - `df.applymap()`

### a. The Python Built-in `map()` Function
- The ```map(aFunction, *iterables)``` function returns a map object (a lazy iterator) that applies `aFunction()` to each element of the `iterable(s)`.
- You can later convert the map object to an appropriate data structure such as a list, a tuple or a Series.
- The original iterable(s) remain unchanged.

In [11]:
list1 = [3, 4, 5, 6, 7, 78, 76, 56]

def check_even_odd(x):
    # Square even numbers, leave odd numbers unchanged
    if x % 2 == 0:
        return x**2
    else:
        return x

# map() is lazy: the function runs as the loop consumes the map object
a = map(check_even_odd, list1)
for i in a:
    print(i)

In [12]:
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


**Example:** Using built-in function with `map()`

In [13]:
# Passing a Series object (a column of the dataframe) to map() as an argument
# map() returns a map object; `len()` is applied to each value of the name column only when the object is consumed
map(len, df['name'])

<map at 0x7f61a4eee460>

In [14]:
# Type cast the map object to Series
pd.Series(map(len, df['name']))

0     4
1     4
2     7
3     6
4     4
5     5
6     5
7     6
8     5
9     6
10    7
11    4
12    7
13    4
14    6
15    9
dtype: int64

In [15]:
# Another way is to call the map() method by a Series object using dot notation
df['name'].map(len)

0     4
1     4
2     7
3     6
4     4
5     5
6     5
7     6
8     5
9     6
10    7
11    4
12    7
13    4
14    6
15    9
Name: name, dtype: int64

In [17]:
# A third way is to access the column with dot notation; here the result is stored in a new column
df['name_len']=df.name.map(len)

In [19]:
# df.head()

**Example:** Using a user-defined function with `map()`

In [20]:
df = pd.read_csv('datasets/groupdata.csv')
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


In [21]:
# Let us pass a user-defined function
def myfunc(x):
    if (x <= 50):
        return "Young"
    else:
        return "Old"

df['age'].map(myfunc)

0       Old
1       Old
2     Young
3     Young
4     Young
5     Young
6     Young
7       Old
8       Old
9     Young
10    Young
11    Young
12    Young
13    Young
14    Young
15    Young
Name: age, dtype: object

In [22]:
# If you want to save this as a new column in the dataframe you can do that
df['newcol'] = df['age'].map(myfunc)

In [23]:
df.head()

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship,newcol
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0,Old
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0,Old
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0,Young
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0,Young
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0,Young


In [26]:
# import seaborn as sns
# sns.set_style('darkgrid')
# sns.countplot(x='newcol', data=df)

**Example:** Using a Lambda function with `map()`

In [27]:
df['age'].map(lambda x: "Young" if x<=50 else "Old")

0       Old
1       Old
2     Young
3     Young
4     Young
5     Young
6     Young
7       Old
8       Old
9     Young
10    Young
11    Young
12    Young
13    Young
14    Young
15    Young
Name: age, dtype: object

**Example:** Using a string method with `map()`

In [28]:
# You cannot pass a bare `upper` to map() the way we passed `len`,
# because upper() is not a built-in function but a method of the str class.
# Pass `str.upper` or wrap the call in a lambda instead.
#df['name'].map(upper)

In [31]:
df['name'].map(lambda x: x.upper())
# df.name.map(lambda x:x.lower())

0          RAUF
1          ARIF
2       SHAISTA
3        HADEED
4          ZARA
5         MOHID
6         ZOBIA
7        IDREES
8         JAMIL
9        SHAHID
10      KHURRAM
11         MAAZ
12      MUJAHID
13         SARA
14       FATIMA
15    KAKAMANNA
Name: name, dtype: object

**Example:** Passing a Dictionary {oldval:newval} to `map()` for changing selected values of a categorical column

In [32]:
df = pd.read_csv('datasets/groupdata.csv')
df.head()

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


In [33]:
df.session.unique()

array(['MORNING', 'AFT', 'AFTERNOON', 'MOR'], dtype=object)

In [34]:
df['session'].map({'MORNING':'M', 'AFTERNOON':'A'})

0       M
1     NaN
2       A
3     NaN
4     NaN
5       M
6     NaN
7       M
8     NaN
9       A
10    NaN
11      A
12      M
13      A
14    NaN
15      A
Name: session, dtype: object

In [35]:
df['session'].map({'MORNING':'M', 'AFTERNOON':'A', 'AFT':'F','MOR':'M'})

0     M
1     F
2     A
3     M
4     F
5     M
6     F
7     M
8     F
9     A
10    M
11    A
12    M
13    A
14    F
15    A
Name: session, dtype: object

>**Limitations of the `map()` Method**
>- Values that have no match in the dictionary are not kept; they become NaN. The solution is to use the `df.replace()` method.
>- It works on an iterable or a Series object, not on an entire dataframe. The solution is to use `df.apply()` and `df.applymap()`.
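The first limitation can be seen on a four-value Series (made up here to mirror the `session` column). The sketch also shows one possible workaround, filling the gaps with the original values, next to the `replace()` route; treat it as an illustration rather than the only way to do this.

```python
import pandas as pd

# Toy Series mirroring the unique values of the session column
session = pd.Series(['MORNING', 'AFT', 'AFTERNOON', 'MOR'])
codes = {'MORNING': 'M', 'AFTERNOON': 'A'}

# map(): values without a match become NaN
mapped = session.map(codes)

# Workaround: fall back to the original value wherever map() produced NaN
kept = session.map(codes).fillna(session)

# replace(): values without a match are left as they are
replaced = session.replace(codes)

print(mapped.tolist())
print(kept.tolist())
print(replaced.tolist())
```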

### b. The `df.replace()` Method
- The `df.replace()` method replaces the values given in `to_replace` with `value`.
- Every matching value, wherever it occurs in the dataframe, is replaced with the new value.
- This differs from updating with ``.loc`` or ``.iloc``, which require you to specify the location to update.

```
df.replace(to_replace, value, inplace=False)
```

In [36]:
df = pd.read_csv('datasets/groupdata.csv')
df.head()

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


In [37]:
df['session'].replace({'MORNING':'M', 'AFTERNOON':'A'})

0       M
1     AFT
2       A
3     MOR
4     AFT
5       M
6     AFT
7       M
8     AFT
9       A
10    MOR
11      A
12      M
13      A
14    AFT
15      A
Name: session, dtype: object

>- Note that now there are no NaN values; the values that do not have a match remain unchanged.
>- Another important point is that the `replace()` method works equally well on an entire dataframe.

In [39]:
# Calling replace on entire dataframe
df.replace({'MORNING':'M', 'AFTERNOON':'A', 'group A':'GROUP-A','Idrees':'Ehtisham'})

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,M,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,GROUP-A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,A,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,GROUP-A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0
5,MS06,Mohid,16,Lahore,M,group C,Female,69.3,78.6,
6,MS07,Zobia,40,Sialkot,AFT,group B,Female,90.2,,4000.0
7,MS08,Ehtisham,51,Multan,M,group D,Male,84.1,76.0,8000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
9,MS10,Shahid,38,Lahore,A,group D,Male,90.5,81.3,3800.0


In [40]:
# Above operation is not inplace
df

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0
5,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
6,MS07,Zobia,40,Sialkot,AFT,group B,Female,90.2,,4000.0
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
9,MS10,Shahid,38,Lahore,AFTERNOON,group D,Male,90.5,81.3,3800.0


### c. The `df.apply()` Method
- The `df.apply()` method runs a function along the given axis of the dataframe.
- In simple words, `df.apply()` passes each column (or each row) of the dataframe to the function as a Series, while `apply()` called on a single Series runs the function on each of its elements.

```
df.apply(func, axis=0, args=())
```
- Where,
    - `func`: a built-in, user-defined or lambda function that is applied to every Series of the dataframe as per the axis argument (the objects passed to `func` are Series objects).
    - `axis`: the default value is 0, so `func` is applied to each column. To apply `func` to each row, pass `axis=1`.
    - `args`: a tuple of additional positional arguments to pass to `func` after the Series (or, when `apply()` is called on a Series, after each element).
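To make the three arguments concrete, here is a small sketch on a toy two-column dataframe (built by hand for this illustration). The helper `scale` and its `factor` parameter are hypothetical names introduced only for this example.

```python
import pandas as pd

# Toy dataframe, built by hand for this illustration
toy = pd.DataFrame({'subj1': [78.3, 70.5], 'subj2': [84.4, 60.5]})

# axis=0 (default): func receives each column as a Series
col_max = toy.apply(max)

# axis=1: func receives each row as a Series
row_total = toy.apply(lambda row: row['subj1'] + row['subj2'], axis=1)

# args: extra positional arguments handed to func after the Series
def scale(series, factor):
    # hypothetical helper: multiply every value of the Series by factor
    return series * factor

scaled = toy.apply(scale, args=(2,))

print(col_max)
print(row_total)
print(scaled)
```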

In [41]:
import pandas as pd
df = pd.read_csv('datasets/groupdata.csv')
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


In [42]:
# Let us pass the built-in function `len()` and compute the length of each name under the name column of df
# Here len() is applied to each value of a single column, and a Series object is returned
df['name'].apply(len)

0     4
1     4
2     7
3     6
4     4
5     5
6     5
7     6
8     5
9     6
10    7
11    4
12    7
13    4
14    6
15    9
Name: name, dtype: int64

In [45]:
# Let us pass a user-defined function, with an additional argument as well. This was not possible with map() method
def myfunc(x, age):
    if (x <= age):
        return "Young"
    else:
        return "Old"

df['age'].apply(myfunc, args = (30,))

0       Old
1       Old
2       Old
3     Young
4       Old
5     Young
6       Old
7       Old
8       Old
9       Old
10      Old
11    Young
12    Young
13    Young
14      Old
15      Old
Name: age, dtype: object

In [46]:
# Let us use Lambda function to convert each name under the name column of df to upper case
df['name'].apply(lambda x : x.upper())

0          RAUF
1          ARIF
2       SHAISTA
3        HADEED
4          ZARA
5         MOHID
6         ZOBIA
7        IDREES
8         JAMIL
9        SHAHID
10      KHURRAM
11         MAAZ
12      MUJAHID
13         SARA
14       FATIMA
15    KAKAMANNA
Name: name, dtype: object

In [48]:
# If you are satisfied with the result, you may assign it to the specific column
df['name'] = df['name'].apply(lambda x : x.upper())

In [49]:
# Verify
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,RAUF,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,ARIF,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,SHAISTA,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


In [50]:
# Can anyone guess what this LOC will do?
df['subj1'] = df['subj1'].apply(lambda x : x+5)

In [51]:
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,RAUF,52,Lahore,MORNING,group C,Male,83.3,84.4,5000.0
1,MS02,ARIF,51,Islamabad,AFT,group A,Male,75.5,60.5,6000.0
2,MS03,SHAISTA,35,Karachi,AFTERNOON,group B,Female,69.9,75.1,8500.0


>So far we have applied the `apply()` method to a specific column of a dataframe. Let us now apply it to rows of a dataframe.

In [52]:
# Since each row mixes different dtypes, let us create a dataframe having numeric columns only
df = pd.read_csv('datasets/groupdata.csv')
df.head()

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


In [54]:
df_numeric = df.loc[:,['age','subj1','subj2','scholarship']]
df_numeric.head()

Unnamed: 0,age,subj1,subj2,scholarship
0,52,78.3,84.4,5000.0
1,51,70.5,60.5,6000.0
2,35,64.9,75.1,8500.0
3,20,82.0,84.3,4000.0
4,40,65.9,72.8,3500.0


In [55]:
df_string = df.loc[:,['roll no','name','address','session', 'group', 'gender']]
df_string.head()

Unnamed: 0,roll no,name,address,session,group,gender
0,MS01,Rauf,Lahore,MORNING,group C,Male
1,MS02,Arif,Islamabad,AFT,group A,Male
2,MS03,Shaista,Karachi,AFTERNOON,group B,Female
3,MS04,Hadeed,Lahore,MOR,group A,Male
4,MS05,Zara,Peshawer,AFT,group D,Female


In [58]:
df_numeric.head()

Unnamed: 0,age,subj1,subj2,scholarship
0,52,78.3,84.4,5000.0
1,51,70.5,60.5,6000.0
2,35,64.9,75.1,8500.0
3,20,82.0,84.3,4000.0
4,40,65.9,72.8,3500.0


In [59]:
# Although not very meaningful, let us add a number to each value in the first four rows
df_numeric.loc[0:3].apply(lambda x : x+5)

Unnamed: 0,age,subj1,subj2,scholarship
0,57,83.3,89.4,5005.0
1,56,75.5,65.5,6005.0
2,40,69.9,80.1,8505.0
3,25,87.0,89.3,4005.0


In [None]:
# If you want to commit such a change to the dataframe, assign the result back

In [60]:
df_numeric.loc[0] = df_numeric.loc[0].apply(lambda x : x+5)

In [61]:
df_numeric.head()

Unnamed: 0,age,subj1,subj2,scholarship
0,57,83.3,89.4,5005.0
1,51,70.5,60.5,6000.0
2,35,64.9,75.1,8500.0
3,20,82.0,84.3,4000.0
4,40,65.9,72.8,3500.0


>Let us use the `df.apply()` method on entire dataframe

In [62]:
df_numeric.apply(lambda x: x+5).head()

Unnamed: 0,age,subj1,subj2,scholarship
0,62,88.3,94.4,5010.0
1,56,75.5,65.5,6005.0
2,40,69.9,80.1,8505.0
3,25,87.0,89.3,4005.0
4,45,70.9,77.8,3505.0


In [63]:
df.apply(min)

roll no             MS01
name                Arif
age                   16
address        Islamabad
session              AFT
group            group A
gender            Female
subj1               64.9
subj2               60.5
scholarship       3500.0
dtype: object

In [64]:
min(df['subj1'])

64.9

The `min()` function has been applied to each column of the dataframe; the minimum value of each column has been computed, and the `df.apply()` method has returned the results as a Series object.
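For common reductions such as the minimum, pandas also provides dedicated methods. The sketch below, on a made-up numeric dataframe without missing values, suggests that `df.min()` returns the same values as `df.apply(min)`. With missing values the two may differ, because `df.min()` skips NaN by default.

```python
import pandas as pd

# Toy numeric dataframe without missing values, invented for this illustration
toy = pd.DataFrame({'age': [52, 51, 35], 'subj1': [78.3, 70.5, 64.9]})

via_apply = toy.apply(min)   # Python's min() called once per column
via_method = toy.min()       # built-in pandas reduction

print(via_apply)
print(via_method)
```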

### d. The `df.applymap()` Method
- The `df.applymap()` method applies a function to a dataframe element-wise.

```
df.applymap(func)
```
- Where,
    - `func`: A function that is passed a single value and returns a single value.
    
Note: A Series object does not have an `applymap()` method, so you cannot call it on a Series object.
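This notebook runs on pandas 1.4.2. From pandas 2.1 onwards the same element-wise operation is available as `DataFrame.map()`, and `applymap()` is deprecated there. The sketch below, on a toy dataframe made up for illustration, picks whichever of the two exists; the variable name `elementwise` is only a label chosen for this example.

```python
import pandas as pd

# Toy string dataframe, invented for this illustration
toy = pd.DataFrame({'name': ['Rauf', 'Zara'], 'city': ['Lahore', 'Peshawer']})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap
elementwise = toy.map if hasattr(toy, 'map') else toy.applymap

# str.upper receives one cell value at a time and returns one value
upper = elementwise(str.upper)
print(upper)
```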

In [65]:
df = pd.read_csv('datasets/groupdata.csv')
df_string = df.loc[:,['roll no','name','address','session', 'group', 'gender']]
df_numeric = df.loc[:,['age','subj1','subj2','scholarship']]

In [66]:
df_string.head()

Unnamed: 0,roll no,name,address,session,group,gender
0,MS01,Rauf,Lahore,MORNING,group C,Male
1,MS02,Arif,Islamabad,AFT,group A,Male
2,MS03,Shaista,Karachi,AFTERNOON,group B,Female
3,MS04,Hadeed,Lahore,MOR,group A,Male
4,MS05,Zara,Peshawer,AFT,group D,Female


In [67]:
df_numeric.head()

Unnamed: 0,age,subj1,subj2,scholarship
0,52,78.3,84.4,5000.0
1,51,70.5,60.5,6000.0
2,35,64.9,75.1,8500.0
3,20,82.0,84.3,4000.0
4,40,65.9,72.8,3500.0


In [69]:
df_string.applymap(str.upper).head()

Unnamed: 0,roll no,name,address,session,group,gender
0,MS01,RAUF,LAHORE,MORNING,GROUP C,MALE
1,MS02,ARIF,ISLAMABAD,AFT,GROUP A,MALE
2,MS03,SHAISTA,KARACHI,AFTERNOON,GROUP B,FEMALE
3,MS04,HADEED,LAHORE,MOR,GROUP A,MALE
4,MS05,ZARA,PESHAWER,AFT,GROUP D,FEMALE


In [70]:
df_numeric.head(5)

Unnamed: 0,age,subj1,subj2,scholarship
0,52,78.3,84.4,5000.0
1,51,70.5,60.5,6000.0
2,35,64.9,75.1,8500.0
3,20,82.0,84.3,4000.0
4,40,65.9,72.8,3500.0


In [71]:
# The applymap() method will apply the lambda function (add 5) to each element of the dataframe
df_numeric.applymap(lambda x : x+5).head(5)

Unnamed: 0,age,subj1,subj2,scholarship
0,57,83.3,89.4,5005.0
1,56,75.5,65.5,6005.0
2,40,69.9,80.1,8505.0
3,25,87.0,89.3,4005.0
4,45,70.9,77.8,3505.0


## Practice Exercise no 01
#### Student Alcohol Consumption
#### Step 1. Import the necessary libraries

In [72]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/bsef19m521/DatasetsForProjects/master/student-mat.csv).

### Step 3. Assign it to a variable called df.

In [73]:
df = pd.read_csv('datasets/student-mat.csv')
df.head(2)

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,...,4,3,4,1,1,3,6,5,6,6
1,GP,F,17,U,GT3,T,1,1,at_home,other,...,5,3,3,1,1,3,4,5,5,6


### Step 4. For the purpose of this exercise slice the dataframe from 'school' until the 'guardian' column

In [75]:
df = df.loc[:,'school':'guardian']
df.head()

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,course,mother
1,GP,F,17,U,GT3,T,1,1,at_home,other,course,father
2,GP,F,15,U,LE3,T,1,1,at_home,other,other,mother
3,GP,F,15,U,GT3,T,4,2,health,services,home,mother
4,GP,F,16,U,GT3,T,3,3,other,other,home,father


### Step 5. Create a lambda function that will capitalize strings.

In [77]:
# Lambda that capitalizes a string (first character upper case, the rest lower case)
cap = lambda x:x.capitalize()
cap

<function __main__.<lambda>(x)>

### Step 6. Capitalize both Mjob and Fjob

In [81]:
df.Mjob = df.Mjob.apply(cap)
df.Fjob = df.Fjob.apply(cap)

In [82]:
df.head(2)

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian
0,GP,F,18,U,GT3,A,4,4,At_home,Teacher,course,mother
1,GP,F,17,U,GT3,T,1,1,At_home,Other,course,father
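
As an alternative sketch, the same result can be obtained without a custom function by using the string accessor `Series.str.capitalize()`. The series below is a made-up sample (`jobs` is a hypothetical name) that mirrors a few values of the Mjob column:

```python
import pandas as pd

# Hypothetical sample of job labels
jobs = pd.Series(["at_home", "teacher", "other"], name="Mjob")

# Approach used in the exercise: apply a Python function to each value
via_apply = jobs.apply(lambda x: x.capitalize())

# Alternative: the vectorized string accessor
via_str = jobs.str.capitalize()

print(via_str.tolist())  # ['At_home', 'Teacher', 'Other']
```

One practical difference: the `.str` accessor passes missing values through as missing, while the lambda would raise an error on a NaN.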


### Step 7. Print the last five elements/rows of the data set.

In [83]:
df.tail()

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian
390,MS,M,20,U,LE3,A,2,2,Services,Services,course,other
391,MS,M,17,U,LE3,T,3,1,Services,Services,course,mother
392,MS,M,21,R,GT3,T,1,1,Other,Other,course,other
393,MS,M,18,R,LE3,T,3,2,Services,Other,course,mother
394,MS,M,19,U,LE3,T,1,1,Other,At_home,course,father


### Step 8. Create a function called majority that returns a boolean value, and store the result in a new column called legal_drinker (consider majority as older than 17 years old)

In [85]:
def majority(x):
    # True when the age is strictly greater than 17, otherwise False
    return x > 17
    
df['legal_drinker']=df.age.apply(majority)
df.head()

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,legal_drinker
0,GP,F,18,U,GT3,A,4,4,At_home,Teacher,course,mother,True
1,GP,F,17,U,GT3,T,1,1,At_home,Other,course,father,False
2,GP,F,15,U,LE3,T,1,1,At_home,Other,other,mother,False
3,GP,F,15,U,GT3,T,4,2,Health,Services,home,mother,False
4,GP,F,16,U,GT3,T,3,3,Other,Other,home,father,False


In [87]:
df.legal_drinker.value_counts()

False    284
True     111
Name: legal_drinker, dtype: int64
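
A comparison on a whole column already produces a boolean Series, so the same column can be built without `apply()`. A minimal sketch, using the first five ages shown above as sample data (`ages` is a hypothetical name):

```python
import pandas as pd

# First five ages from the dataset preview
ages = pd.Series([18, 17, 15, 15, 16], name="age")

# Vectorized comparison: evaluated for every element at once
legal_drinker = ages > 17

print(legal_drinker.tolist())  # [True, False, False, False, False]
```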

In [89]:
# Alternative: a lambda does the same job (the comparison itself already yields a boolean)
# df.age.apply(lambda x: x > 17)

### Step 9. Multiply every number of the dataset by 10.
This makes little practical sense; remember that it is only an exercise.

In [96]:
# A conditional expression needs an else branch, otherwise it is a syntax error:
# df.applymap(lambda x: x*10 if type(x) is int else x)

In [97]:
def multiply(x):
    # Multiply plain integers by 10; leave everything else (strings, booleans) unchanged
    if type(x) is int:
        return x*10
    return x


In [98]:
df = df.applymap(multiply)
df.head()

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,legal_drinker
0,GP,F,180,U,GT3,A,40,40,At_home,Teacher,course,mother,True
1,GP,F,170,U,GT3,T,10,10,At_home,Other,course,father,False
2,GP,F,150,U,LE3,T,10,10,At_home,Other,other,mother,False
3,GP,F,150,U,GT3,T,40,20,Health,Services,home,mother,False
4,GP,F,160,U,GT3,T,30,30,Other,Other,home,father,False
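
Another way to reach the same result, sketched here on a made-up dataframe (`toy` and `numeric_cols` are hypothetical names), is to pick the numeric columns with `select_dtypes()` and multiply them in one vectorized step. Boolean columns are not treated as numeric by `include="number"`, so a column such as legal_drinker is left alone:

```python
import pandas as pd

# Hypothetical two-row frame with string, integer and boolean columns
toy = pd.DataFrame(
    {
        "school": ["GP", "GP"],
        "age": [18, 17],
        "Medu": [4, 1],
        "legal_drinker": [True, False],
    }
)

# Labels of the numeric columns only (strings and booleans are skipped)
numeric_cols = toy.select_dtypes(include="number").columns

# Multiply just those columns; the rest of the frame is untouched
toy[numeric_cols] = toy[numeric_cols] * 10

print(toy)
```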


## Practice Exercise no 02
#### United States - Crime Rates - 1960 - 2014
### Step 1. Import the necessary libraries

In [None]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/bsef19m521/DatasetsForProjects/master/US_Crime_Rates_1960_2014.csv).

### Step 3. Assign it to a variable called crime.

In [99]:
crime = pd.read_csv('datasets/US_Crime_Rates_1960_2014.csv')
crime.head(2)

Unnamed: 0,Year,Population,Total,Violent,Property,Murder,Forcible_Rape,Robbery,Aggravated_assault,Burglary,Larceny_Theft,Vehicle_Theft
0,1960,179323175,3384200,288460,3095700,9110,17190,107840,154320,912100,1855400,328200
1,1961,182992000,3488000,289390,3198600,8740,17220,106670,156760,949600,1913000,336000


In [100]:
crime.shape

(55, 12)

### Step 4. What is the type of the columns?

In [101]:
crime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Year                55 non-null     int64
 1   Population          55 non-null     int64
 2   Total               55 non-null     int64
 3   Violent             55 non-null     int64
 4   Property            55 non-null     int64
 5   Murder              55 non-null     int64
 6   Forcible_Rape       55 non-null     int64
 7   Robbery             55 non-null     int64
 8   Aggravated_assault  55 non-null     int64
 9   Burglary            55 non-null     int64
 10  Larceny_Theft       55 non-null     int64
 11  Vehicle_Theft       55 non-null     int64
dtypes: int64(12)
memory usage: 5.3 KB


##### Notice that the type of Year is int64, but pandas has a dedicated type for working with time series. Let's convert the column to it now.

### Step 5. Convert the type of the column Year to datetime64

In [102]:
crime.Year = pd.to_datetime(crime.Year, format="%Y")
crime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Year                55 non-null     datetime64[ns]
 1   Population          55 non-null     int64         
 2   Total               55 non-null     int64         
 3   Violent             55 non-null     int64         
 4   Property            55 non-null     int64         
 5   Murder              55 non-null     int64         
 6   Forcible_Rape       55 non-null     int64         
 7   Robbery             55 non-null     int64         
 8   Aggravated_assault  55 non-null     int64         
 9   Burglary            55 non-null     int64         
 10  Larceny_Theft       55 non-null     int64         
 11  Vehicle_Theft       55 non-null     int64         
dtypes: datetime64[ns](1), int64(11)
memory usage: 5.3 KB
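
To see what the conversion produces, here is a minimal sketch on a short made-up series of years (`years` and `as_dates` are hypothetical names). With `format="%Y"` only the year is supplied, so each value becomes 1 January of that year, and the `.dt` accessor can recover the individual parts:

```python
import pandas as pd

# Hypothetical integer years, like the Year column before conversion
years = pd.Series([1960, 1961, 1962])

# Parse the values as four-digit years
as_dates = pd.to_datetime(years, format="%Y")

print(as_dates.dt.year.tolist())   # [1960, 1961, 1962]
print(as_dates.dt.month.tolist())  # [1, 1, 1]
```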


### Step 6. Set the Year column as the index of the dataframe

In [103]:
crime = crime.set_index('Year')
crime.head()

Year,Population,Total,Violent,Property,Murder,Forcible_Rape,Robbery,Aggravated_assault,Burglary,Larceny_Theft,Vehicle_Theft
1960-01-01,179323175,3384200,288460,3095700,9110,17190,107840,154320,912100,1855400,328200
1961-01-01,182992000,3488000,289390,3198600,8740,17220,106670,156760,949600,1913000,336000
1962-01-01,185771000,3752200,301510,3450700,8530,17550,110860,164570,994300,2089600,366800
1963-01-01,188483000,4109500,316970,3792500,8640,17650,116470,174210,1086400,2297800,408300
1964-01-01,191141000,4564600,364220,4200400,9360,21420,130390,203050,1213200,2514400,472800
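
Once the dates form the index, rows can be selected by date strings. The sketch below rebuilds a tiny version of the dataframe from the Murder values of the first three rows shown above (`toy` is a hypothetical name) and selects rows with `.loc`:

```python
import pandas as pd

# Murder counts for 1960-1962, taken from the preview above
toy = pd.DataFrame(
    {
        "Year": pd.to_datetime(["1960", "1961", "1962"], format="%Y"),
        "Murder": [9110, 8740, 8530],
    }
)

# set_index returns a new dataframe, so the result is assigned back
toy = toy.set_index("Year")

# A year string selects every row that falls within that year
only_1961 = toy.loc["1961"]

# A slice of year strings includes both endpoints
first_two = toy.loc["1960":"1961"]

print(only_1961)
print(first_two)
```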


### Step 7. Delete the Total column

In [104]:
del crime['Total']
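
The `del` statement modifies the dataframe in place. When the original should stay intact, `DataFrame.drop()` returns a new dataframe instead. A minimal sketch using three values from the 1960 row (`toy` and `without_total` are hypothetical names):

```python
import pandas as pd

# Values from the 1960 row of the dataset
toy = pd.DataFrame({"Total": [3384200], "Violent": [288460], "Property": [3095700]})

# drop() leaves `toy` untouched and returns a copy without the column
without_total = toy.drop(columns="Total")

print(list(without_total.columns))  # ['Violent', 'Property']
print(list(toy.columns))            # ['Total', 'Violent', 'Property']
```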

### Step 8. Group the year by decades and sum the values
#### Pay attention to the Population column: summing it across the years of a decade is a mistake.
To learn more about [.resample](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html)

To learn more about [Offset Aliases](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases)

In [105]:
# Use resample to sum every column per decade ('10AS' means 10-year bins anchored at the start of the year)
crimes = crime.resample('10AS').sum()
crimes

Year,Population,Violent,Property,Murder,Forcible_Rape,Robbery,Aggravated_assault,Burglary,Larceny_Theft,Vehicle_Theft
1960-01-01,1915053175,4134930,45160900,106180,236720,1633510,2158520,13321100,26547700,5292100
1970-01-01,2121193298,9607930,91383800,192230,554570,4159020,4702120,28486000,53157800,9739900
1980-01-01,2371370069,14074328,117048900,206439,865639,5383109,7619130,33073494,72040253,11935411
1990-01-01,2612825258,17527048,119053499,211664,998827,5748930,10568963,26750015,77679366,14624418
2000-01-01,2947969117,13968056,100944369,163068,922499,4230366,8652124,21565176,67970291,11412834
2010-01-01,1570146307,6072017,44095950,72867,421059,1749809,3764142,10125170,30401698,3569080


In [106]:
# Uses resample to get the max value only for the "Population" column
population = crime['Population'].resample('10AS').max()
population

Year
1960-01-01    201385000
1970-01-01    220099000
1980-01-01    248239000
1990-01-01    272690813
2000-01-01    307006550
2010-01-01    318857056
Freq: 10AS-JAN, Name: Population, dtype: int64

In [107]:
# Updating the "Population" column
crimes['Population'] = population
crimes

Year,Population,Violent,Property,Murder,Forcible_Rape,Robbery,Aggravated_assault,Burglary,Larceny_Theft,Vehicle_Theft
1960-01-01,201385000,4134930,45160900,106180,236720,1633510,2158520,13321100,26547700,5292100
1970-01-01,220099000,9607930,91383800,192230,554570,4159020,4702120,28486000,53157800,9739900
1980-01-01,248239000,14074328,117048900,206439,865639,5383109,7619130,33073494,72040253,11935411
1990-01-01,272690813,17527048,119053499,211664,998827,5748930,10568963,26750015,77679366,14624418
2000-01-01,307006550,13968056,100944369,163068,922499,4230366,8652124,21565176,67970291,11412834
2010-01-01,318857056,6072017,44095950,72867,421059,1749809,3764142,10125170,30401698,3569080
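
The sum-then-overwrite steps can also be expressed in a single pass by giving each column its own aggregation. The sketch below is one possible way, not the notebook's method: it groups by a decade label computed from the index and passes a dictionary to `agg()`. The numbers are made up for illustration, and `toy`, `decade` and `by_decade` are hypothetical names. Grouping on a computed decade also avoids the `'10AS'` alias, which recent pandas releases appear to deprecate in favour of `'YS'`-based aliases.

```python
import pandas as pd

# Made-up numbers, for illustration only
toy = pd.DataFrame(
    {"Population": [100, 110, 120, 130], "Murder": [5, 6, 7, 8]},
    index=pd.to_datetime(["1968", "1969", "1970", "1971"], format="%Y"),
)

# Decade label for every row: 1968 -> 1960, 1971 -> 1970
decade = (toy.index.year // 10) * 10

# Population takes the maximum of the decade, Murder is summed
by_decade = toy.groupby(decade).agg({"Population": "max", "Murder": "sum"})

print(by_decade)
```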


### Step 9. What is the most dangerous decade to live in the US?

In [108]:
# Use resample to get the highest yearly value of the "Murder" column within each decade
murder_max = crime['Murder'].resample('10AS').max()
murder_max

Year
1960-01-01    14760
1970-01-01    21460
1980-01-01    23040
1990-01-01    24700
2000-01-01    17030
2010-01-01    14866
Freq: 10AS-JAN, Name: Murder, dtype: int64

In [None]:
crime.idxmax(0)
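
One way to answer the question at the decade level, sketched under the assumption that the murder total is the measure of "dangerous", is to call `idxmax()` on the per-decade sums. The values below are the Murder sums from the decade table above (`decade_murders` is a hypothetical name). Keep in mind that the data ends in 2014, so the 2010 bin covers only five years and is not directly comparable.

```python
import pandas as pd

# Murder totals per decade, copied from the resampled table
decade_murders = pd.Series(
    [106180, 192230, 206439, 211664, 163068, 72867],
    index=[1960, 1970, 1980, 1990, 2000, 2010],
    name="Murder",
)

# idxmax returns the index label of the largest value
worst_decade = decade_murders.idxmax()

print(worst_decade)  # 1990
```

Dividing by the population would give a rate rather than a raw count, which is usually a fairer basis for comparing decades.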

## Check Your Concepts:
- What is Pandas?
- Create a pandas column using for loop 
- How to get column names in Pandas dataframe 
- How to rename columns in Pandas DataFrame 
- Collapse multiple Columns in Pandas 
- Get unique values from a column in Pandas DataFrame 
- Conditional operation on Pandas DataFrame columns 
- Return the Index label if some condition is satisfied over a column in Pandas Dataframe 
- Using dictionary to remap values in Pandas DataFrame columns 
- Formatting integer column of Dataframe in Pandas 
- Create a new column in Pandas DataFrame based on the existing columns 
- Creating a Pandas dataframe column based on a given condition
- Split a column in Pandas dataframe and get part of it 
- Getting Unique values from a column in Pandas dataframe 
- Split a String into columns using regex in pandas DataFrame 
- Getting frequency counts of a column in Pandas DataFrame
- Change Data Type for one or more columns in Pandas Dataframe 
- Split a text column into two columns in Pandas DataFrame  
- Difference of two columns in Pandas dataframe 
- Get the index of maximum value in DataFrame column 
- Get the index of minimum value in DataFrame column 
- Get n-largest values from a particular column in Pandas DataFrame 
- Get n-smallest values from a particular column in Pandas DataFrame 
- How to drop one or multiple columns in Pandas Dataframe 
- How to lowercase column names in Pandas dataframe 
- Capitalize first letter of a column in Pandas dataframe 
- Apply uppercase to a column in Pandas dataframe 


# Pandas - Assignment no 06
- Here is the link to [Pandas - Assignment no 06]()