# 101 Pandas Exercises for Data Analysis

## 1. How to import pandas and check the version?

In [1]:
import numpy as np  # optional
import pandas as pd
print(pd.__version__)
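# show_versions() prints its report itself and returns None,
# so wrapping it in print() also emits the trailing "None" seen below
print(pd.show_versions(as_json=True))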

0.23.4
{'system': {'commit': None, 'python': '3.7.0.final.0', 'python-bits': 64, 'OS': 'Windows', 'OS-release': '7', 'machine': 'AMD64', 'processor': 'Intel64 Family 6 Model 30 Stepping 5, GenuineIntel', 'byteorder': 'little', 'LC_ALL': 'None', 'LANG': 'None', 'LOCALE': 'None.None'}, 'dependencies': {'pandas': '0.23.4', 'pytest': '3.8.0', 'pip': '10.0.1', 'setuptools': '40.2.0', 'Cython': '0.28.5', 'numpy': '1.15.1', 'scipy': '1.1.0', 'pyarrow': None, 'xarray': None, 'IPython': '6.5.0', 'sphinx': '1.7.9', 'patsy': '0.5.0', 'dateutil': '2.7.3', 'pytz': '2018.5', 'blosc': None, 'bottleneck': '1.2.1', 'tables': '3.4.4', 'numexpr': '2.6.8', 'feather': None, 'matplotlib': '2.2.3', 'openpyxl': '2.5.6', 'xlrd': '1.1.0', 'xlwt': '1.3.0', 'xlsxwriter': '1.1.0', 'lxml': '4.2.5', 'bs4': '4.6.3', 'html5lib': '1.0.1', 'sqlalchemy': '1.2.11', 'pymysql': None, 'psycopg2': None, 'jinja2': '2.10', 's3fs': None, 'fastparquet': None, 'pandas_gbq': None, 'pandas_datareader': None}}
None
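
When a script has to branch on the installed version, comparing the raw string is fragile, because text comparison ranks "0.9" above "0.23". A minimal sketch, assuming only the major and minor components matter (the names `major` and `minor` are illustrative, not part of pandas):

```python
import pandas as pd

# Take the first two dot-separated components, e.g. "0.23.4" -> (0, 23)
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
print((major, minor))

# Tuples of ints compare numerically, unlike the raw version strings
print((0, 23) > (0, 9))   # True
print("0.23" > "0.9")     # False
```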


## 2. How to create a series from a list, numpy array and dict?

In [8]:
# Inputs
import numpy as np
mylist = list('abcdefghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

# Solution
ser1 = pd.Series(mylist)
ser2 = pd.Series(myarr)
ser3 = pd.Series(mydict)
print(ser3.head())

a    0
b    1
c    2
d    3
e    4
dtype: int64
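
The three constructors differ mainly in the index they produce: a list or array gets a default integer index, while a dict's keys become the index. A small sketch on shortened inputs (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

letters = list("abc")
numbers = np.arange(3)

from_list = pd.Series(letters)                       # default integer index
from_array = pd.Series(numbers)                      # default integer index
from_dict = pd.Series(dict(zip(letters, numbers)))   # dict keys become the index

print(from_list.index.tolist())   # [0, 1, 2]
print(from_array.index.tolist())  # [0, 1, 2]
print(from_dict.index.tolist())   # ['a', 'b', 'c']
```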


## 3. How to convert the index of a series into a column of a dataframe?

Convert the series **_ser_** into a dataframe with its index as another column on the dataframe.

In [10]:
# Input
mylist = list('abcdefghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)

# Solution
df = ser.to_frame().reset_index()
print(ser.head())
print(df.head())

a    0
b    1
c    2
d    3
e    4
dtype: int64
  index  0
0     a  0
1     b  1
2     c  2
3     d  3
4     e  4
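
The default column labels (`index` and `0`) are rarely what you want downstream. One possible variant, assuming you prefer to name both columns during the conversion (the labels `letter` and `value` are made up for illustration):

```python
import pandas as pd

ser = pd.Series({"a": 0, "b": 1, "c": 2})

# rename_axis labels the index; reset_index(name=...) labels the value column
df = ser.rename_axis("letter").reset_index(name="value")
print(df.columns.tolist())  # ['letter', 'value']
```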


## 4. How to combine many series to form a dataframe?

Combine **_ser1_** and **_ser2_** to form a dataframe.

In [14]:
# Input
import numpy as np
ser1 = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

# Solution 1
df = pd.concat([ser1, ser2], axis=1)
print(df.head())

# Solution 2
df = pd.DataFrame({'col1': ser1, 'col2': ser2})
print(df.head())

   0  1
0  a  0
1  b  1
2  c  2
3  d  3
4  e  4
  col1  col2
0    a     0
1    b     1
2    c     2
3    d     3
4    e     4
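
Solution 1 leaves the columns labelled `0` and `1`. If you want the `pd.concat` route with named columns, passing `keys` is one way to do it. A short sketch on shortened inputs:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(list("abc"))
ser2 = pd.Series(np.arange(3))

# keys supplies the column labels when concatenating Series along axis=1
df = pd.concat([ser1, ser2], axis=1, keys=["col1", "col2"])
print(df.columns.tolist())  # ['col1', 'col2']
```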


## 5. How to assign name to the series’ index?
Give the series **_ser_** the name ‘alphabets’. Note that the solution sets the name of the series itself, not the name of its index.

In [16]:
# Input
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))

# Solution
ser.name = 'alphabets'
ser.head()

0    a
1    b
2    c
3    d
4    e
Name: alphabets, dtype: object
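
The series name and the index name are separate attributes, and both carry over as column labels when the series becomes a dataframe. A brief sketch (the index label `position` is made up for illustration):

```python
import pandas as pd

ser = pd.Series(list("abc"))
ser.name = "alphabets"        # name of the values
ser.index.name = "position"   # name of the index

print(ser.name)                             # alphabets
print(ser.index.name)                       # position
print(ser.reset_index().columns.tolist())   # ['position', 'alphabets']
```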

## 6. How to get the items of series A not present in series B?
From **_ser1_** remove items present in **_ser2_**.

In [17]:
# Input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Solution
ser1[~ser1.isin(ser2)]

0    1
1    2
2    3
dtype: int64
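
If the original index is not needed, NumPy's set routines offer an alternative. Note the difference in behaviour: `np.setdiff1d` returns a sorted array of unique values, so duplicates and index labels from **_ser1_** are lost. A sketch with the same inputs:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Sorted unique values that are in ser1 but not in ser2
diff = np.setdiff1d(ser1, ser2)
print(diff)  # [1 2 3]
```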

## 7. How to get the items not common to both series A and series B?
Get all items of **_ser1_** and **_ser2_** not common to both.

In [22]:
# Input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Solution
ser_u = pd.Series(np.union1d(ser1, ser2))  # union
ser_u

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: int64

In [23]:
ser_i = pd.Series(np.intersect1d(ser1, ser2))  # intersect
ser_i

0    4
1    5
dtype: int64

In [21]:
ser_u[~ser_u.isin(ser_i)] # delete common items

0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
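
The union-minus-intersection steps above compute a symmetric difference, which NumPy also exposes directly as `np.setxor1d`. A compact sketch with the same inputs; like the other set routines it returns sorted unique values:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Values that appear in exactly one of the two series
ser_xor = pd.Series(np.setxor1d(ser1, ser2))
print(ser_xor.tolist())  # [1, 2, 3, 6, 7, 8]
```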

## 8. How to get the minimum, 25th percentile, median, 75th percentile, and max of a numeric series?
Compute the minimum, 25th percentile, median, 75th percentile, and maximum of **_ser_**.

In [24]:
# Input
state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))

# Solution
np.percentile(ser, q=[0, 25, 50, 75, 100])

array([ 1.25117263,  7.70986507, 10.92259345, 13.36360403, 18.0949083 ])
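
The same five numbers can be obtained without leaving pandas via `Series.quantile`, which takes fractions in [0, 1] rather than percentages and returns a series indexed by the requested quantiles. A sketch on a small hand-made series, so the result is easy to check by eye:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

via_numpy = np.percentile(ser, q=[0, 25, 50, 75, 100])
via_pandas = ser.quantile([0, 0.25, 0.5, 0.75, 1])

print(via_numpy)            # [1. 2. 3. 4. 5.]
print(via_pandas.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```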

## 9. How to get frequency counts of unique items of a series?
Calculate the frequency count of each unique value in **_ser_**.

In [32]:
# Input
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

# Solution
ser.value_counts()

g    5
d    5
h    4
f    4
b    3
c    3
e    3
a    3
dtype: int64
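
Because the input above is drawn at random without a seed, the counts differ between runs. The sketch below uses a fixed input so the result is reproducible, and also shows `normalize=True`, which returns relative frequencies instead of counts:

```python
import pandas as pd

ser = pd.Series(list("aabbbc"))

counts = ser.value_counts()                # sorted by frequency, descending
shares = ser.value_counts(normalize=True)  # fractions that sum to 1

print(counts.to_dict())  # {'b': 3, 'a': 2, 'c': 1}
print(shares["b"])       # 0.5
```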

## 10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?
From **_ser_**, keep the 2 most frequent items as they are and replace everything else with ‘Other’.

In [40]:
# Input
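# Note: this creates a RandomState object and discards it; it does not seed
# np.random, so the values printed below come from one unseeded run
np.random.RandomState(100)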
ser = pd.Series(np.random.randint(1, 5, [12]))
print(ser)

# Solution
print("Top 2 Freq:", ser.value_counts())
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
ser

0     1
1     3
2     2
3     4
4     3
5     4
6     4
7     4
8     1
9     4
10    4
11    4
dtype: int32
Top 2 Freq: 4    7
3    2
1    2
2    1
dtype: int64


0     Other
1         3
2     Other
3         4
4         3
5         4
6         4
7         4
8     Other
9         4
10        4
11        4
dtype: object
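
The solution above overwrites **_ser_** in place. A possible non-mutating variant uses `Series.where`, which keeps values where the condition holds and substitutes the replacement elsewhere. The sketch below uses a fixed input without ties in the top 2, so the outcome is unambiguous:

```python
import pandas as pd

ser = pd.Series([4, 4, 4, 3, 3, 1, 2])

# value_counts() sorts by frequency, so the first two labels are the top 2
top2 = ser.value_counts().index[:2]

# Keep top-2 values, replace the rest; the original ser is left unchanged
result = ser.where(ser.isin(top2), "Other")
print(result.tolist())  # [4, 4, 4, 3, 3, 'Other', 'Other']
```

When several values tie for second place, which one counts as "top 2" depends on the order `value_counts()` returns them in, so the result should not be relied on in that case.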