# From 1 to 100 Pandas Exercises for Data Analysis (Python)

This is the first series of pandas exercises, collected from different sources.
>**_The goal:_** practicing, learning, and teaching with this set of exercises.

If you find an error or a better solution, please feel free to open an issue or a pull request :D

_Sources: [here](sources.json)_

** 1.- How to import pandas and check the version? **

In [1]:
import pandas as pd
pd.__version__

# Alternatively, show the versions of pandas and all its dependencies
# (show_versions prints its report itself and returns None, so no print() is needed)
# pd.show_versions(as_json=True)

'0.22.0'

** 2.- How to create a series from a list, numpy array and dict? **

In [2]:
import numpy as np

# input 
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

# output
serieList = pd.Series(mylist)
print("Series from list: \n ", serieList.head(5))
serieArray = pd.Series(myarr)
print("Series from numpy array: \n ", serieArray.head(5))
serieDict = pd.Series(mydict)
print("Series from dict: \n ", serieDict.head(5))

Series from list: 
  0    a
1    b
2    c
3    e
4    d
dtype: object
Series from numpy array: 
  0    0
1    1
2    2
3    3
4    4
dtype: int64
Series from dict: 
  a    0
b    1
c    2
d    4
e    3
dtype: int64


** 3.- How to convert the index of a series into a column of a dataframe? **
   - Convert the series ser into a dataframe with its index as another column on the dataframe.

In [25]:
# input
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)

# output

serDF = ser.to_frame().reset_index()
serDF.head()


|   | index | 0 |
|---|-------|---|
| 0 | a | 0 |
| 1 | b | 1 |
| 2 | c | 2 |
| 3 | d | 4 |
| 4 | e | 3 |
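The default column names above (`index` and `0`) are not very descriptive. As a minimal sketch, assuming you want named columns, `rename_axis` and the `name` argument of `reset_index` can set them in one chain. The names `letter` and `value` are illustrative only and not part of the exercise.

```python
import numpy as np
import pandas as pd

# Small illustrative series with a letter index
ser = pd.Series(np.arange(3), index=list("abc"))

# rename_axis names the index; reset_index(name=...) names the value column.
# "letter" and "value" are hypothetical names chosen for this sketch.
ser_df = ser.rename_axis("letter").reset_index(name="value")
print(ser_df)
```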


** 4.- How to combine many series to form a dataframe? **
   - Combine ser1 and ser2 to form a dataframe.



In [19]:
# input 
import numpy as np
ser1 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

# output
combinedSerieDF = pd.DataFrame({'column 1': ser1, 'column 2': ser2})
combinedSerieDF.head()

# Alternatively use (the columns are then named 0 and 1)
# combinedSerieDF = pd.concat([ser1, ser2], axis=1)
# combinedSerieDF.head()

|   | column 1 | column 2 |
|---|----------|----------|
| 0 | a | 0 |
| 1 | b | 1 |
| 2 | c | 2 |
| 3 | e | 3 |
| 4 | d | 4 |


** 5.- How to assign a name to a series? **
   - Give a name to the series ser calling it ‘alphabets’.

In [26]:
# input
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
# output
ser.name = "alphabets"
ser.head()

# Alternatively use (rename returns a new series, so assign the result)
# ser = ser.rename("alphabets")
# ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabets, dtype: object

** 6.- How to get the items of series A not present in series B? **
   - From ser1 remove items present in ser2.

In [31]:
# input 
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# output
ser1[~(ser1.isin(ser2))]

0    1
1    2
2    3
dtype: int64

** 7.- How to get the items not common to both series A and series B? **
   - Get all items of ser1 and ser2 not common to both.

In [37]:
# input 
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# output
serieU = pd.Series(np.union1d(ser1, ser2)) # sorted unique elements present in either series
serieI = pd.Series(np.intersect1d(ser1,ser2)) # sorted unique elements present in both series
serieU[~serieU.isin(serieI)]


0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
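The union-minus-intersection above is the symmetric difference of the two series. A shorter sketch, assuming the original index does not need to be preserved, uses NumPy's `np.setxor1d`, which returns the sorted unique values found in exactly one of the inputs.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Symmetric difference: values that appear in exactly one of the two series
not_common = pd.Series(np.setxor1d(ser1, ser2))
print(not_common.tolist())  # [1, 2, 3, 6, 7, 8]
```

Note that the result gets a fresh 0..n-1 index, unlike the masked `serieU` above, which keeps the positions from the union.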

** 8.- How to get the minimum, 25th percentile, median, 75th, and max of a numeric series? **
   - Compute the minimum, 25th percentile, median, 75th, and maximum of ser.

In [55]:
# input
np.random.seed(123)
ser = pd.Series(np.random.normal(10, 5, 25))

# output
print("Min: %f, 25th percentile: %f, median: %f, 75th percentile: %f, Max: %f, " % (
    ser.min(), ser.quantile(.25), ser.median(), ser.quantile(.75), ser.max()))

# Alternatively use
# np.percentile(ser, q=[0, 25, 50, 75, 100])


Min: -2.133396, 25th percentile: 6.605569, median: 9.526455, 75th percentile: 15.879145, Max: 21.029650, 
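The five numbers can also be requested in a single call. This is a small sketch, not the only way to do it: `Series.quantile` accepts a list of levels and returns a series indexed by those levels. The name `five_num` is just an illustrative variable.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
ser = pd.Series(np.random.normal(10, 5, 25))

# A list of quantile levels returns a Series indexed by those levels:
# 0 is the minimum, 0.5 the median, and 1 the maximum.
five_num = ser.quantile([0, 0.25, 0.5, 0.75, 1])
print(five_num)
```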


** 9.- How to get frequency counts of unique items of a series? **
   - Calculate the frequency counts of each unique value in ser.

In [69]:
# input 
np.random.seed(123)
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

# output
ser.value_counts()

g    7
b    6
a    5
c    3
d    3
e    2
f    2
h    2
dtype: int64

** 10.- How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’? **
   - From ser, keep the top 2 most frequent items as they are and replace everything else with ‘Other’.

In [82]:
# input data
np.random.seed(123)
ser = pd.Series(np.random.randint(1, 5, [12]))
print("Input data:\n", ser)
print(" More frequent values:\n", ser.value_counts())

# output
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
ser

Input data:
 0     3
1     2
2     3
3     3
4     1
5     3
6     3
7     2
8     4
9     3
10    4
11    2
dtype: int64
 More frequent values:
 3    6
2    3
4    2
1    1
dtype: int64


0         3
1         2
2         3
3         3
4     Other
5         3
6         3
7         2
8     Other
9         3
10    Other
11        2
dtype: object
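The solution above modifies `ser` in place and relies on pandas converting the integer series to `object` during assignment. A hedged alternative, assuming you would rather keep the original series untouched, is `Series.where`, which returns a new series. The input below hard-codes the values shown in the input data above so the sketch does not depend on the random seed.

```python
import pandas as pd

# Same values as the input data printed above
ser = pd.Series([3, 2, 3, 3, 1, 3, 3, 2, 4, 3, 4, 2])

# value_counts() sorts by frequency, so the first two labels are the top 2
top2 = ser.value_counts().index[:2]

# where() keeps values where the condition is True and substitutes 'Other' elsewhere;
# casting to object first makes the mixed int/str result explicit.
result = ser.astype(object).where(ser.isin(top2), "Other")
print(result.tolist())
```

Because `where` returns a new object, `ser` still holds the original integers afterwards.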

** 11.- How to bin a numeric series to 10 groups of equal size? **
   - Bin the series ser into 10 equal deciles and replace the values with the bin name.

In [85]:
# input
np.random.seed(123)
ser = pd.Series(np.random.random(20))
print("Input data: \n", ser)
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1], 
        labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()

Input data: 
 0     0.696469
1     0.286139
2     0.226851
3     0.551315
4     0.719469
5     0.423106
6     0.980764
7     0.684830
8     0.480932
9     0.392118
10    0.343178
11    0.729050
12    0.438572
13    0.059678
14    0.398044
15    0.737995
16    0.182492
17    0.175452
18    0.531551
19    0.531828
dtype: float64


0    8th
1    3rd
2    2nd
3    7th
4    9th
dtype: category
Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
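Listing the decile edges by hand works, but `pd.qcut` also accepts an integer `q` meaning "this many equal-frequency bins". The sketch below, using the same seeded input, checks that both spellings give the same labels. The variable names `deciles` and `explicit` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
ser = pd.Series(np.random.random(20))

labels = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']

# Integer q: ask for 10 equal-frequency bins
deciles = pd.qcut(ser, q=10, labels=labels)

# Explicit edges: 0.0, 0.1, ..., 1.0
explicit = pd.qcut(ser, q=np.linspace(0, 1, 11), labels=labels)

print(deciles.head())
```

With 20 distinct values and 10 bins, each decile ends up holding two values.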

** 12.- How to convert a numpy array to a dataframe of given shape? **
   - Reshape the series ser into a dataframe with 7 rows and 5 columns

In [96]:
# input
np.random.seed(123)
ser = pd.Series(np.random.randint(1, 10, 35))

# output
pd.DataFrame(ser.values.reshape(7,5))

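|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 3 | 3 | 7 | 2 | 4 |
| 1 | 7 | 2 | 1 | 2 | 1 |
| 2 | 1 | 4 | 5 | 1 | 1 |
| 3 | 5 | 2 | 8 | 4 | 3 |
| 4 | 5 | 8 | 3 | 5 | 9 |
| 5 | 1 | 8 | 4 | 5 | 7 |
| 6 | 2 | 6 | 7 | 3 | 2 |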


** 13.- How to find the positions of numbers that are multiples of 3 from a series? ** 
   - Find the positions of numbers that are multiples of 3 from ser.



In [111]:
# input
np.random.seed(123)
ser = pd.Series(np.random.randint(1, 10, 7))
print("Input data: \n", ser)

# output
ser[ser%3==0]

# Alternatively use
# np.where(ser%3==0)[0]

Input data: 
 0    3
1    3
2    7
3    2
4    4
5    7
6    2
dtype: int64


0    3
1    3
dtype: int64

** 14.- How to extract items at given positions from a series? **
   - From ser, extract the items at positions in list pos.

In [120]:
# input
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

# output
ser[pos]

# Alternatively use 
# ser.take(pos)

0     a
4     e
8     i
14    o
20    u
dtype: object
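`ser[pos]` gives the expected items here because the default index labels happen to equal the positions. As a sketch under the assumption that the index is something else (the 100-based index below is made up for illustration), `iloc` selects strictly by position regardless of the labels.

```python
import pandas as pd

# Illustrative index starting at 100, so labels no longer match positions
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'), index=range(100, 126))
pos = [0, 4, 8, 14, 20]

# iloc always interprets the list as integer positions
items = ser.iloc[pos]
print(items.tolist())  # ['a', 'e', 'i', 'o', 'u']
```

`ser.take(pos)` from the alternative above is also position-based and would give the same items.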

** 15.- How to stack two series vertically and horizontally? **
   - Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

In [160]:
# input
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

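# output
# vertical: one long series (not displayed, because a cell only shows its last expression)
v = pd.concat((ser1,ser2), axis=0)
v.to_frame()

# horizontal: a dataframe with one column per series
h = pd.concat((ser1,ser2), axis=1)
h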


|   | 0 | 1 |
|---|---|---|
| 0 | 0 | a |
| 1 | 1 | b |
| 2 | 2 | c |
| 3 | 3 | d |
| 4 | 4 | e |


** 16.- How to get the positions of items of series A in another series B? **
   - Get the positions of items of ser2 in ser1 as a list.

In [177]:
# input
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# output
[pd.Index(ser1).get_loc(i) for i in ser2]

[5, 4, 0, 8]
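The list comprehension calls `get_loc` once per item and raises `KeyError` if an item is missing. A possible alternative, assuming the values of `ser1` are unique, is `Index.get_indexer`, which looks up all items in one call and marks missing ones with `-1`. The value `99` below is a made-up example of a missing item.

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

idx = pd.Index(ser1)

# One vectorized lookup for every item of ser2
positions = idx.get_indexer(ser2.values).tolist()
print(positions)  # [5, 4, 0, 8]

# Values that are not in ser1 come back as -1 instead of raising KeyError
missing = idx.get_indexer([99]).tolist()
print(missing)  # [-1]
```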

** 17.- How to compute the mean squared error on a truth and predicted series? **
   - Compute the mean squared error of truth and pred series.

In [179]:
# input
np.random.seed(123)
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)

# output
np.mean((truth-pred)**2)

0.3434951102792991

** 18.- How to convert the first character of each element in a series to uppercase? **
   - Change the first character of each word in ser to upper case.

In [182]:
# input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

# output
ser.str.capitalize()

# Alternatively use
# pd.Series([i.title() for i in ser])

0     How
1      To
2    Kick
3    Ass?
dtype: object
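The two options above agree only because every element is a single lowercase word. The short sketch below, with made-up multi-word input, shows where `str.capitalize` and `str.title` differ: the first touches only the first character of the whole string and lowercases the rest, the second capitalizes every word.

```python
import pandas as pd

# Illustrative multi-word elements (not from the exercise)
ser = pd.Series(['how to', 'pandas SERIES'])

cap = ser.str.capitalize().tolist()
ttl = ser.str.title().tolist()

print(cap)  # ['How to', 'Pandas series']
print(ttl)  # ['How To', 'Pandas Series']
```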

** 19.- How to calculate the number of characters in each word in a series? **

In [199]:
# input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

# output
[len(i) for i in ser]

# Alternatively use (returns a series instead of a list)
# ser.map(len)

[3, 2, 4, 4]
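pandas also has a vectorized string method for this. As a minimal sketch, `Series.str.len` returns the lengths as a series and, unlike calling `len` directly, passes missing values through as `NaN` instead of raising. The series with `None` is an illustrative addition.

```python
import pandas as pd

ser = pd.Series(['how', 'to', 'kick', 'ass?'])

# Vectorized length of each string
lengths = ser.str.len()
print(lengths.tolist())  # [3, 2, 4, 4]

# A missing value yields NaN rather than a TypeError
with_missing = pd.Series(['how', None]).str.len()
print(with_missing.isna().tolist())  # [False, True]
```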

** 20.- How to compute the difference of differences between consecutive numbers of a series? **
   - Compute the difference of differences between the consecutive numbers of ser.

In [215]:
# input
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])
ser

# output
# np.diff(ser).tolist() # alternative, but the result is one element shorter: it has no leading nan for position 0
print(ser.diff().tolist())
print(ser.diff().diff().tolist())

[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]
[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]
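If the leading `nan` placeholders are not needed, NumPy can compute the second-order difference directly. The sketch below uses the `n` argument of `np.diff` and compares it with the chained pandas version after dropping the `nan` entries.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# n=2 applies the difference twice; the result is two elements shorter than ser
second_diff = np.diff(ser.values, n=2).tolist()
print(second_diff)  # [1, 1, 1, 1, 0, 2]

# Same numbers as the chained pandas call once the nan entries are dropped
pandas_version = ser.diff().diff().dropna().tolist()
print(pandas_version)  # [1.0, 1.0, 1.0, 1.0, 0.0, 2.0]
```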
