# 101 Pandas exercises for data analysis

This notebook is adapted from [101 Pandas Exercises for Data Analysis](https://www.machinelearningplus.com/python/101-pandas-exercises-python/).

## 1. How to import pandas and check the version?

In [1]:
import pandas as pd
pd.__version__

'0.24.2'

In [2]:
# numpy is used in several of the exercises below
import numpy as np

## 2. How to create a series from a list, numpy array and dict?

Create a pandas series from each of the items below: a list, a numpy array and a dictionary.

In [3]:
# input
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

In [4]:
# from a list
listSeries = pd.Series(mylist)
listSeries.head()

0    a
1    b
2    c
3    e
4    d
dtype: object

In [5]:
# from a numpy array
arraySeries = pd.Series(myarr)
arraySeries.head()

0    0
1    1
2    2
3    3
4    4
dtype: int64

In [6]:
# from a dict
dictSeries = pd.Series(mydict)
dictSeries.head()

a    0
b    1
c    2
e    3
d    4
dtype: int64

## 3. How to convert the index of a series into a column of a dataframe?

Convert the series ser into a dataframe with its index as another column on the dataframe.

In [7]:
# input
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)

In [8]:
# my solution
df = pd.DataFrame({'col1': ser.index, 'col2': ser.values})
df.head()

Unnamed: 0,col1,col2
0,a,0
1,b,1
2,c,2
3,e,3
4,d,4


In [9]:
# page solution
df = ser.to_frame().reset_index()
df.head()

Unnamed: 0,index,0
0,a,0
1,b,1
2,c,2
3,e,3
4,d,4
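The page solution leaves the columns named `index` and `0`. As a small sketch (the names `letter` and `position` are illustrative, not from the original page), `rename_axis` followed by `reset_index(name=...)` names both columns in one chain:

```python
import numpy as np
import pandas as pd

# Small stand-in for the series used above
letters = list('abcde')
ser = pd.Series(np.arange(5), index=letters)

# Name the index, then turn it into a column; `name=` labels the value column
df = ser.rename_axis('letter').reset_index(name='position')
print(df)
```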


## 4. How to combine many series to form a dataframe?

Combine ser1 and ser2 to form a dataframe.

In [10]:
# input
ser1 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

In [11]:
# my solution
df = pd.DataFrame({'col1': ser1, 'col2': ser2})
df.head()

Unnamed: 0,col1,col2
0,a,0
1,b,1
2,c,2
3,e,3
4,d,4


In [12]:
# another page solution
df = pd.concat((ser1, ser2), axis=1)
df.head()

Unnamed: 0,0,1
0,a,0
1,b,1
2,c,2
3,e,3
4,d,4


## 5. How to assign a name to the series’ index?

Give a name to the series ser calling it ‘alphabets’.

In [13]:
# input
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))

In [14]:
# my solution is the same as page solution
ser.name = 'alphabets'
ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabets, dtype: object

## 6. How to get the items of series A not present in series B?

From ser1 remove items present in ser2.

In [15]:
# input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

In [16]:
# my solution is the same as page solution
ser1[~ser1.isin(ser2)]

0    1
1    2
2    3
dtype: int64

## 7. How to get the items not common to both series A and series B?

Get all items of ser1 and ser2 not common to both.

In [17]:
# input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

In [18]:
# my solution
ser1[~ser1.isin(ser2)].append(ser2[~ser2.isin(ser1)])

0    1
1    2
2    3
2    6
3    7
4    8
dtype: int64

In [19]:
# page solution
ser_u = pd.Series(np.union1d(ser1, ser2))  # union
ser_i = pd.Series(np.intersect1d(ser1, ser2))  # intersect
ser_u[~ser_u.isin(ser_i)]

0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
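Two further variants, offered as a sketch rather than as the page's method: `np.setxor1d` computes the symmetric difference directly (sorted, without duplicates), and `pd.concat` can stand in for `Series.append`, which was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Symmetric difference in one call
sym_diff = pd.Series(np.setxor1d(ser1.to_numpy(), ser2.to_numpy()))

# Same idea as "my solution", but with pd.concat instead of Series.append
concat_version = pd.concat(
    [ser1[~ser1.isin(ser2)], ser2[~ser2.isin(ser1)]],
    ignore_index=True,
)
print(sym_diff.tolist())
print(concat_version.tolist())
```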

## 8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?

Compute the minimum, 25th percentile, median, 75th, and maximum of ser.

In [20]:
# input
ser = pd.Series(np.random.normal(10, 5, 25))

In [21]:
# my solution is the same as page solution
np.percentile(ser, [0, 25, 50, 75, 100])

array([ 1.6447494 ,  6.47471289, 10.1381234 , 14.68227108, 21.96743254])
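The same five numbers can also be obtained without leaving pandas. A minimal sketch using `Series.quantile`, with a fixed seed (my own choice) so the run is repeatable:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen only for repeatability
ser = pd.Series(rng.normal(10, 5, 25))

# quantile() takes fractions in [0, 1] rather than percentages
quantiles = ser.quantile([0, 0.25, 0.5, 0.75, 1])
print(quantiles)
```

The result is a Series indexed by the requested fractions, which is handy when the values need to be looked up by name later.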

## 9. How to get frequency counts of unique items of a series?

Calculate the frequency counts of each unique value in ser.

In [22]:
# input
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

In [23]:
# page solution
ser.value_counts()

e    6
d    6
b    4
a    4
c    4
f    3
h    2
g    1
dtype: int64

## 10. How to keep only top 2 most frequent values as it is and replace everything else as ‘Other’?

From ser, keep the top 2 most frequent items as they are and replace everything else with ‘Other’.

In [24]:
# input
np.random.RandomState(100)
ser = pd.Series(np.random.randint(1, 5, [12]))

In [25]:
# my solution
ser.replace(ser.value_counts()[2:].index, 'Other')

0         2
1         2
2     Other
3     Other
4         1
5         2
6     Other
7         1
8     Other
9         2
10        2
11        1
dtype: object

In [26]:
# page solution
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
ser

0         2
1         2
2     Other
3     Other
4         1
5         2
6     Other
7         1
8     Other
9         2
10        2
11        1
dtype: object
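Note that `np.random.RandomState(100)` in the input cell creates a generator object and discards it, so the draws above are not actually seeded. A hedged sketch of a reproducible version, which also uses `Series.where` so that `ser` itself is not overwritten (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(100)  # keep the generator and draw from it
ser = pd.Series(rng.randint(1, 5, 12))

top2 = ser.value_counts().index[:2]
# Cast to object first so integers and the string 'Other' can share one column
result = ser.astype(object).where(ser.isin(top2), 'Other')
print(result)
```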

## 11. How to bin a numeric series to 10 groups of equal size?

Bin the series ser into 10 equal deciles and replace the values with the bin name.

In [27]:
# input
ser = pd.Series(np.random.random(20))

In [28]:
# page solution
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1], 
        labels=['1st', '2nd', '3rd', '4th', '5th', '6th', 
                '7th', '8th', '9th', '10th']).head()

0    5th
1    9th
2    7th
3    4th
4    9th
dtype: category
Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
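When the bin names themselves do not matter, the explicit quantile list can be replaced by `q=10`. A sketch, assuming integer bin codes (0 to 9) are acceptable in place of the `'1st'`...`'10th'` labels:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen only for repeatability
ser = pd.Series(rng.random_sample(20))

# q=10 asks for deciles; labels=False returns the bin number instead of a label
deciles = pd.qcut(ser, q=10, labels=False)
print(deciles.value_counts().sort_index())
```

With 20 distinct values, each of the 10 bins ends up holding two of them.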

## 12. How to convert a numpy array to a dataframe of given shape? 

Reshape the series ser into a dataframe with 7 rows and 5 columns.

In [29]:
# input
ser = pd.Series(np.random.randint(1, 10, 35))

In [30]:
# my solution is the same as page solution
pd.DataFrame(ser.values.reshape(7, 5))

Unnamed: 0,0,1,2,3,4
0,9,2,5,3,2
1,9,8,3,9,6
2,1,8,4,4,7
3,3,5,3,5,9
4,8,6,7,2,6
5,8,8,6,8,7
6,4,5,4,5,9


## 13. How to find the positions of numbers that are multiples of 3 from a series?

Find the positions of numbers that are multiples of 3 from ser.

In [31]:
# input
ser = pd.Series(np.random.randint(1, 10, 7))

In [32]:
# my solution
ser[ser % 3 == 0].index

Int64Index([0, 2], dtype='int64')

In [33]:
# page solution
np.argwhere(ser % 3 == 0)


array([[0],
       [2]])
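Passing a Series straight to `np.argwhere` has behaved differently across pandas and NumPy versions. A sketch that sidesteps this by converting the boolean mask to a plain NumPy array first (the input values are fixed here for illustration):

```python
import numpy as np
import pandas as pd

ser = pd.Series([3, 4, 9, 5, 6])

# Convert the boolean mask to a NumPy array, then ask for the True positions
mask = (ser % 3 == 0).to_numpy()
positions = np.flatnonzero(mask)
print(positions)  # [0 2 4]
```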

## 14. How to extract items at given positions from a series

From ser, extract the items at positions in list pos.

In [34]:
# input
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

In [35]:
# my solution
ser[pos]

0     a
4     e
8     i
14    o
20    u
dtype: object

In [36]:
# page solution
ser.take(pos)

0     a
4     e
8     i
14    o
20    u
dtype: object
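The two solutions agree here only because `ser` has the default 0..n-1 index, so labels and positions coincide. With any other index, `ser[pos]` looks up labels, while `take` and `iloc` always work by position. A small sketch with a made-up index:

```python
import pandas as pd

# Index labels deliberately differ from positions
ser = pd.Series(list('abcde'), index=[10, 20, 30, 40, 50])
pos = [0, 2, 4]

by_take = ser.take(pos)   # positional
by_iloc = ser.iloc[pos]   # positional, equivalent here
print(by_iloc)
```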

## 15. How to stack two series vertically and horizontally?

Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

In [37]:
# input
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

In [38]:
# my solution
# vertical
pd.concat((ser1, ser2))

0    0
1    1
2    2
3    3
4    4
0    a
1    b
2    c
3    d
4    e
dtype: object

In [39]:
# horizontal
pd.DataFrame(zip(ser1, ser2))

Unnamed: 0,0,1
0,0,a
1,1,b
2,2,c
3,3,d
4,4,e


In [40]:
# page solution
# vertical
ser1.append(ser2)

0    0
1    1
2    2
3    3
4    4
0    a
1    b
2    c
3    d
4    e
dtype: object

In [41]:
# horizontal
pd.concat((ser1, ser2), axis=1)

Unnamed: 0,0,1
0,0,a
1,1,b
2,2,c
3,3,d
4,4,e


## 16. How to get the positions of items of series A in another series B?

Get the positions of items of ser2 in ser1 as a list.

In [42]:
# input
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

In [43]:
# my solution
[np.where(ser1 == i)[0][0] for i in ser2]

[5, 4, 0, 8]

In [44]:
# page solution
[pd.Index(ser1).get_loc(i) for i in ser2]

[5, 4, 0, 8]
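Both solutions loop in Python. As a sketch of a vectorized alternative, `Index.get_indexer` looks up all targets at once; it assumes the values of `ser1` are unique and returns -1 for any value that is not found:

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# One call instead of a list comprehension
positions = pd.Index(ser1).get_indexer(ser2).tolist()
print(positions)  # [5, 4, 0, 8]
```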

## 17. How to compute the mean squared error on a truth and predicted series?

Compute the mean squared error of truth and pred series.

In [45]:
# input
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)

In [46]:
# my solution is the same as page solution
np.mean((pred - truth)**2)

0.35596981535689054

## 18. How to convert the first character of each element in a series to uppercase?

Convert the first character of each word in ser to upper case.

In [47]:
# input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

In [48]:
# my solution
ser.str.title()

0     How
1      To
2    Kick
3    Ass?
dtype: object

In [49]:
# page solution, not the best choice
# Solution 1
ser.map(lambda x: x.title())

# Solution 2
ser.map(lambda x: x[0].upper() + x[1:])

# Solution 3
pd.Series([i.title() for i in ser])

0     How
1      To
2    Kick
3    Ass?
dtype: object

## 19. How to calculate the number of characters in each word in a series?

In [50]:
# input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

In [51]:
# my solution
ser.str.len()

0    3
1    2
2    4
3    4
dtype: int64

In [52]:
# page solution, not the best choice
ser.map(lambda x: len(x))

0    3
1    2
2    4
3    4
dtype: int64

## 20. How to compute difference of differences between consecutive numbers of a series?

Compute the difference of differences between the consecutive numbers of ser.

In [53]:
# input
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

In [54]:
# my solution
# difference
ser = np.array([np.nan] + ser.to_numpy().tolist())
d = ser[1:] - ser[:-1]
d

array([nan,  2.,  3.,  4.,  5.,  6.,  6.,  8.])

In [55]:
# difference of difference
d = np.array([np.nan] + d.tolist())
dd = d[1:] - d[:-1]
dd

array([nan, nan,  1.,  1.,  1.,  1.,  0.,  2.])

In [56]:
# page solution, better!
# difference
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])
ser.diff().to_numpy()

array([nan,  2.,  3.,  4.,  5.,  6.,  6.,  8.])

In [57]:
# difference of difference
ser.diff().diff().to_numpy()

array([nan, nan,  1.,  1.,  1.,  1.,  0.,  2.])

## 21. How to convert a series of date-strings to a timeseries?

In [58]:
# input
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', 
                 '2014-05-05', '2015-06-06T12:20'])

In [59]:
# my solution
pd.Series(pd.DatetimeIndex(ser))

0   2010-01-01 00:00:00
1   2011-02-02 00:00:00
2   2012-03-03 00:00:00
3   2013-04-04 00:00:00
4   2014-05-05 00:00:00
5   2015-06-06 12:20:00
dtype: datetime64[ns]

In [60]:
# page solution
pd.to_datetime(ser)

0   2010-01-01 00:00:00
1   2011-02-02 00:00:00
2   2012-03-03 00:00:00
3   2013-04-04 00:00:00
4   2014-05-05 00:00:00
5   2015-06-06 12:20:00
dtype: datetime64[ns]

## 22. How to get the day of month, week number, day of year and day of week from a series of date strings?

Get the day of month, week number, day of year and day of week from ser.

In [61]:
# input
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', 
                 '2013/04/04', '2014-05-05', '2015-06-06T12:20'])

In [62]:
# my solution
# day of month
pd.DatetimeIndex(ser).day

Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')

In [63]:
# week number
pd.DatetimeIndex(ser).week

Int64Index([53, 5, 9, 14, 19, 23], dtype='int64')

In [64]:
# day of year
pd.DatetimeIndex(ser).dayofyear

Int64Index([1, 33, 63, 94, 125, 157], dtype='int64')

In [65]:
# day of week
pd.DatetimeIndex(ser).day_name()

Index(['Friday', 'Wednesday', 'Saturday', 'Thursday', 'Monday', 'Saturday'], dtype='object')

In [66]:
# page solution, not the best choice

# Solution
from dateutil.parser import parse
ser_ts = ser.map(lambda x: parse(x))

# day of month
print("Date: ", ser_ts.dt.day.tolist())

# week number
print("Week number: ", ser_ts.dt.weekofyear.tolist())

# day of year
print("Day number of year: ", ser_ts.dt.dayofyear.tolist())

# day of week
print("Day of week: ", ser_ts.dt.weekday_name.tolist())

Date:  [1, 2, 3, 4, 5, 6]
Week number:  [53, 5, 9, 14, 19, 23]
Day number of year:  [1, 33, 63, 94, 125, 157]
Day of week:  ['Friday', 'Wednesday', 'Saturday', 'Thursday', 'Monday', 'Saturday']
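On recent pandas releases `weekofyear` and `weekday_name` are no longer available. A sketch of the same extraction with the `.dt` accessor, assuming pandas 1.1 or later for `isocalendar()` and using ISO-formatted strings (my substitution) to keep parsing unambiguous:

```python
import pandas as pd

ser = pd.Series(['2010-01-01', '2011-02-02', '2012-03-03'])
ser_ts = pd.to_datetime(ser)

day_of_month = ser_ts.dt.day.tolist()
week_number = ser_ts.dt.isocalendar().week.tolist()  # ISO week number
day_of_year = ser_ts.dt.dayofyear.tolist()
day_of_week = ser_ts.dt.day_name().tolist()

print(day_of_month, week_number, day_of_year, day_of_week)
```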


## 23. How to convert year-month string to dates corresponding to the 4th day of the month?

Change ser to dates that start with 4th of the respective months.

In [67]:
# input
ser = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])

In [68]:
# my solution 1
import datetime
pd.DatetimeIndex(ser) + datetime.timedelta(days=3)

DatetimeIndex(['2010-01-04', '2011-02-04', '2012-03-04'], dtype='datetime64[ns]', freq=None)

In [69]:
# my solution 2
pd.to_datetime(ser) + pd.DateOffset(days=3)

0   2010-01-04
1   2011-02-04
2   2012-03-04
dtype: datetime64[ns]
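Both solutions rely on the parser defaulting to the 1st of the month and then adding three days. A sketch of a more explicit route, which prepends the day to each string and parses with a fixed format (`%b` assumes English month abbreviations):

```python
import pandas as pd

ser = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])

# Build '04 Jan 2010' etc. and state the format explicitly
dates = pd.to_datetime('04 ' + ser, format='%d %b %Y')
print(dates)
```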

## 24. How to filter words that contain at least 2 vowels from a series?

From ser, extract words that contain at least 2 vowels.

In [70]:
# input
ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

In [71]:
# my solution
ser[ser.str.count(r'[aeiouAEIOU]') >= 2]

0     Apple
1    Orange
4     Money
dtype: object

## 25. How to filter valid emails from a series?

Extract the valid emails from the series emails. The regex pattern for valid emails is provided as reference.

In [72]:
# input
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 
                    'matt@t.co', 'narendra@modi.com'])
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

In [73]:
# my solution
emails[emails.str.match(pattern)]

1    rameses@egypt.com
2            matt@t.co
3    narendra@modi.com
dtype: object

In [74]:
# page solution, only the most elegant one
import re
emails.str.findall(pattern, flags=re.IGNORECASE)

0                     []
1    [rameses@egypt.com]
2            [matt@t.co]
3    [narendra@modi.com]
dtype: object
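One detail worth keeping in mind: `str.match` only tests whether the pattern matches at the start of each string, whereas `str.contains` searches anywhere in it. A small sketch with made-up inputs that shows the difference:

```python
import pandas as pd

pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'
emails = pd.Series(['rameses@egypt.com', 'mail me at matt@t.co', 'no email here'])

starts_with_email = emails.str.match(pattern)    # anchored at the start
contains_email = emails.str.contains(pattern)    # anywhere in the string

print(starts_with_email.tolist())  # [True, False, False]
print(contains_email.tolist())     # [True, True, False]
```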

## 26. How to get the mean of a series grouped by another series?

Compute the mean of weights of each fruit.

In [75]:
# input
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))
print(weights.tolist())
print(fruit.tolist())

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
['banana', 'banana', 'carrot', 'banana', 'banana', 'apple', 'carrot', 'banana', 'banana', 'carrot']


In [76]:
# my solution
df = pd.DataFrame({'fruit': fruit, 'weight': weights})
df.groupby('fruit').mean()

Unnamed: 0_level_0,weight
fruit,Unnamed: 1_level_1
apple,6.0
banana,4.833333
carrot,6.666667


In [77]:
# page solution
weights.groupby(fruit).mean()

apple     6.000000
banana    4.833333
carrot    6.666667
dtype: float64

## 27. How to compute the euclidean distance between two series?

Compute the euclidean distance between series (points) p and q, without using a packaged formula.

In [78]:
# input
p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

In [79]:
# my solution
np.sqrt(((p-q)**2).sum())

18.16590212458495

In [80]:
# page solution
np.linalg.norm(p-q)

18.16590212458495

## 28. How to find all the local maxima (or peaks) in a numeric series?

Get the positions of peaks (values surrounded by smaller values on both sides) in ser.

In [81]:
# input
ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

In [82]:
# my solution
arr = ser.to_numpy()
np.where((arr[1:-1]>arr[:-2]) & (arr[1:-1]>arr[2:]))[0]+1

array([1, 5, 7])

In [83]:
# page solution
dd = np.diff(np.sign(np.diff(ser)))
np.where(dd==-2)[0] + 1

array([1, 5, 7])

## 29. How to replace missing spaces in a string with the least frequent character?

Replace the spaces in my_str with the least frequent character.

In [84]:
# input
my_str = 'dbc deb abed gade'

In [85]:
# my solution
ser = pd.Series(list(my_str))
my_str.replace(' ', ser.groupby(ser).count().idxmin())

'dbccdebcabedcgade'

In [86]:
# page solution
ser = pd.Series(list('dbc deb abed gade'))
freq = ser.value_counts()
print(freq)
least_freq = freq.dropna().index[-1]
"".join(ser.replace(' ', least_freq))

d    4
b    3
e    3
     3
a    2
c    1
g    1
dtype: int64


'dbcgdebgabedggade'

## 30. How to create a TimeSeries starting ‘2000-01-01’ and 10 weekends (Saturdays) after that having random numbers as values?

In [87]:
# my solution
index = pd.date_range('2000-1-1', periods=10, freq='7D')
pd.Series(np.random.randint(1, 10, 10), index=index)

2000-01-01    4
2000-01-08    6
2000-01-15    3
2000-01-22    2
2000-01-29    3
2000-02-05    7
2000-02-12    8
2000-02-19    2
2000-02-26    3
2000-03-04    2
Freq: 7D, dtype: int64

In [88]:
# page solution
ser = pd.Series(np.random.randint(1,10,10), pd.date_range('2000-01-01', 
                                                          periods=10, freq='W-SAT'))
ser

2000-01-01    6
2000-01-08    3
2000-01-15    4
2000-01-22    5
2000-01-29    1
2000-02-05    9
2000-02-12    8
2000-02-19    7
2000-02-26    4
2000-03-04    6
Freq: W-SAT, dtype: int64

## 31. How to fill an intermittent time series so all missing dates show up with values of previous non-missing date?

ser has missing dates and values. Make all missing dates appear and fill up with value from previous date.

In [89]:
# input
ser = pd.Series([1,10,3,np.nan], 
                index=pd.to_datetime(['2000-01-01', '2000-01-03', 
                                      '2000-01-06', '2000-01-08']))

In [90]:
# my solution is the same as page solution
ser.resample('1D').ffill()

2000-01-01     1.0
2000-01-02     1.0
2000-01-03    10.0
2000-01-04    10.0
2000-01-05    10.0
2000-01-06     3.0
2000-01-07     3.0
2000-01-08     NaN
Freq: D, dtype: float64
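The same result can be reached without `resample`, by building the full daily index and reindexing onto it. A sketch, assuming the index is sorted; as with `resample('1D').ffill()`, the NaN that already exists on the last date is left alone:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 10, 3, np.nan],
                index=pd.to_datetime(['2000-01-01', '2000-01-03',
                                      '2000-01-06', '2000-01-08']))

# Every calendar day between the first and last observation
full_index = pd.date_range(ser.index.min(), ser.index.max(), freq='D')

# method='ffill' fills only the newly created dates from the previous label
filled = ser.reindex(full_index, method='ffill')
print(filled)
```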

## 32. How to compute the autocorrelations of a numeric series?

Compute autocorrelations for the first 10 lags of ser. Find out which lag has the largest correlation.

In [91]:
# input
ser = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))

In [92]:
# page solution
ac = [ser.autocorr(i).round(2) for i in range(1, 11)]
print(ac)
print('largest correlation lag: ', np.argmax(np.abs(ac)) + 1)

[0.27, 0.1, 0.51, -0.07, -0.19, -0.08, -0.54, -0.2, 0.08, -0.43]
largest correlation lag:  7


## 33. How to import only every nth row from a csv file to create a dataframe?

Import every 50th row of [BostonHousing dataset](https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv) as a dataframe.

In [93]:
# my solution, simpler than the page solution
df = pd.read_csv('BostonHousing.csv', skiprows=lambda x: x%50!=0)
df.head()

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0,0.21977,0,6.91,0,0.448,5.602,62.0,6.0877,3,233,17.9,396.9,16.2,19.4
1,0.0686,0,2.89,0,0.445,7.416,62.5,3.4952,2,276,18.0,396.9,6.19,33.2
2,2.73397,0,19.58,0,0.871,5.597,94.9,1.5257,5,403,14.7,351.85,21.45,15.4
3,0.0315,95,1.47,0,0.403,6.975,15.3,7.6534,3,402,17.0,396.9,4.56,34.9
4,0.19073,22,5.86,0,0.431,6.718,17.5,7.8265,7,330,19.1,393.74,6.56,26.2
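To see exactly which rows the `skiprows` callable keeps, here is a self-contained sketch that uses an in-memory CSV (a made-up single-column file, not the BostonHousing data). The callable receives the file line number, where line 0 is the header:

```python
import io
import pandas as pd

# Build a 200-row CSV in memory; data row k holds the value k
csv_text = 'value\n' + '\n'.join(str(i) for i in range(1, 201))

# Line 0 (the header) satisfies 0 % 50 == 0 and is kept, then lines 50, 100, ...
df = pd.read_csv(io.StringIO(csv_text), skiprows=lambda x: x % 50 != 0)
print(df)
```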


## 34. How to change column values when importing csv to a dataframe?

Import the Boston housing dataset, but while importing change the 'medv' (median house value) column so that values < 25 become ‘Low’ and values > 25 become ‘High’.

In [94]:
# my solution is the same as page solution
df = pd.read_csv('BostonHousing.csv', converters={
    'medv': lambda x: 'high' if float(x)>25 else 'low'})
df.head()

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,low
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,low
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,high
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,high
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,high


## 35. How to create a dataframe with rows as strides from a given series?

In [95]:
# input
L = pd.Series(range(15))

In [96]:
# page solution
def gen_strides(a, stride_len=5, window_len=5):
    n_strides = ((a.size-window_len)//stride_len) + 1
    return np.array([a[s:(s+window_len)] for s in 
                     np.arange(0, a.size, stride_len)[:n_strides]])

gen_strides(L, stride_len=2, window_len=4)

array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13]])
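As an alternative to the hand-written loop, NumPy can produce the windows directly. A sketch, assuming NumPy 1.20 or later for `sliding_window_view`; taking every second window gives a stride of 2:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

L = pd.Series(range(15))

# All windows of length 4, then keep every 2nd one (stride_len=2)
windows = sliding_window_view(L.to_numpy(), window_shape=4)[::2]
df = pd.DataFrame(windows)
print(df)
```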

## 36. How to import only specified columns from a csv file?

Import ‘crim’ and ‘medv’ columns of the BostonHousing dataset as a dataframe.

In [97]:
# my solution
df = pd.read_csv('BostonHousing.csv', usecols=['crim', 'medv'])
df.head()

Unnamed: 0,crim,medv
0,0.00632,24.0
1,0.02731,21.6
2,0.02729,34.7
3,0.03237,33.4
4,0.06905,36.2


## 37. How to get the nrows, ncolumns, datatype, summary stats of each column of a dataframe? Also get the array and list equivalent.

Get the number of rows, columns, datatype and summary statistics of each column of the [Cars93 dataset](https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv). Also get the numpy array and list equivalent of the dataframe.

In [98]:
# my solution
df = pd.read_csv('Cars93_miss.csv')
print('row number: ', len(df.index))
print('col number: ', len(df.columns))
print('data type: ', df.dtypes)
print(df.describe())
print(df.to_numpy())
print(df.to_numpy().tolist())

row number:  93
col number:  27
data type:  Manufacturer           object
Model                  object
Type                   object
Min.Price             float64
Price                 float64
Max.Price             float64
MPG.city              float64
MPG.highway           float64
AirBags                object
DriveTrain             object
Cylinders              object
EngineSize            float64
Horsepower            float64
RPM                   float64
Rev.per.mile          float64
Man.trans.avail        object
Fuel.tank.capacity    float64
Passengers            float64
Length                float64
Wheelbase             float64
Width                 float64
Turn.circle           float64
Rear.seat.room        float64
Luggage.room          float64
Weight                float64
Origin                 object
Make                   object
dtype: object
       Min.Price      Price  Max.Price   MPG.city  MPG.highway  EngineSize  \
count  86.000000  91.000000  88.000000  84.000000    9

## 38. How to extract the row and column number of a particular cell with given criterion?

Which manufacturer, model and type has the highest Price? What is the row and column number of the cell with the highest Price value?

In [99]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [100]:
# my solution
print(df.iloc[df['Price'].idxmax()][['Manufacturer', 'Model', 'Type']])
print(np.where(df.values==df['Price'].max()))

Manufacturer    Mercedes-Benz
Model                    300E
Type                  Midsize
Name: 58, dtype: object
(array([58]), array([4]))


## 39. How to rename specific columns in a dataframe?

Rename the column Type as CarType in df and replace the ‘.’ in column names with ‘_’.

In [101]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [102]:
# my solution
df.columns = df.columns.str.replace('Type', 'CarType').str.replace('.', '_', regex=False)
df.columns

Index(['Manufacturer', 'Model', 'CarType', 'Min_Price', 'Price', 'Max_Price',
       'MPG_city', 'MPG_highway', 'AirBags', 'DriveTrain', 'Cylinders',
       'EngineSize', 'Horsepower', 'RPM', 'Rev_per_mile', 'Man_trans_avail',
       'Fuel_tank_capacity', 'Passengers', 'Length', 'Wheelbase', 'Width',
       'Turn_circle', 'Rear_seat_room', 'Luggage_room', 'Weight', 'Origin',
       'Make'],
      dtype='object')
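`DataFrame.rename` is another way to do this, and it only touches the column that is named exactly `Type`. A sketch on an empty frame with three of the column names (a stand-in for the Cars93 data):

```python
import pandas as pd

# Stand-in frame with a few of the Cars93 column names
df = pd.DataFrame(columns=['Type', 'Min.Price', 'MPG.city'])

renamed = (
    df.rename(columns={'Type': 'CarType'})               # exact-name mapping
      .rename(columns=lambda c: c.replace('.', '_'))     # applied to every name
)
print(renamed.columns.tolist())
```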

## 40. How to check if a dataframe has any missing values?

Check if df has any missing values.

In [103]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [104]:
# my solution
np.any(df.isna())

True

## 41. How to count the number of missing values in each column?

Count the number of missing values in each column of df. Which column has the maximum number of missing values?

In [105]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [106]:
# my solution
pd.Series(np.count_nonzero(df.isna(), axis=0), index=df.columns)

Manufacturer           4
Model                  1
Type                   3
Min.Price              7
Price                  2
Max.Price              5
MPG.city               9
MPG.highway            2
AirBags                6
DriveTrain             7
Cylinders              5
EngineSize             2
Horsepower             7
RPM                    3
Rev.per.mile           6
Man.trans.avail        5
Fuel.tank.capacity     8
Passengers             2
Length                 4
Wheelbase              1
Width                  6
Turn.circle            5
Rear.seat.room         4
Luggage.room          19
Weight                 7
Origin                 5
Make                   3
dtype: int64

In [107]:
# page solution
print(df.isnull().sum())
df.isna().sum().idxmax()

Manufacturer           4
Model                  1
Type                   3
Min.Price              7
Price                  2
Max.Price              5
MPG.city               9
MPG.highway            2
AirBags                6
DriveTrain             7
Cylinders              5
EngineSize             2
Horsepower             7
RPM                    3
Rev.per.mile           6
Man.trans.avail        5
Fuel.tank.capacity     8
Passengers             2
Length                 4
Wheelbase              1
Width                  6
Turn.circle            5
Rear.seat.room         4
Luggage.room          19
Weight                 7
Origin                 5
Make                   3
dtype: int64


'Luggage.room'

## 42. How to replace missing values of multiple numeric columns with the mean?

Replace missing values in Min.Price and Max.Price columns with their respective mean.

In [108]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [109]:
df['Min.Price'] = df['Min.Price'].fillna(df['Min.Price'].mean())
df['Max.Price'] = df['Max.Price'].fillna(df['Max.Price'].mean())
df.head()

Unnamed: 0,Manufacturer,Model,Type,Min.Price,Price,Max.Price,MPG.city,MPG.highway,AirBags,DriveTrain,...,Passengers,Length,Wheelbase,Width,Turn.circle,Rear.seat.room,Luggage.room,Weight,Origin,Make
0,Acura,Integra,Small,12.9,15.9,18.8,25.0,31.0,,Front,...,5.0,177.0,102.0,68.0,37.0,26.5,,2705.0,non-USA,Acura Integra
1,,Legend,Midsize,29.2,33.9,38.7,18.0,25.0,Driver & Passenger,Front,...,5.0,195.0,115.0,71.0,38.0,30.0,15.0,3560.0,non-USA,Acura Legend
2,Audi,90,Compact,25.9,29.1,32.3,20.0,26.0,Driver only,Front,...,5.0,180.0,102.0,67.0,37.0,28.0,14.0,3375.0,non-USA,Audi 90
3,Audi,100,Midsize,17.118605,37.7,44.6,19.0,26.0,Driver & Passenger,,...,6.0,193.0,106.0,,37.0,31.0,17.0,3405.0,non-USA,Audi 100
4,BMW,535i,Midsize,17.118605,30.0,21.459091,22.0,30.0,,Rear,...,4.0,186.0,109.0,69.0,39.0,27.0,13.0,3640.0,non-USA,BMW 535i


In [110]:
# page solution
df[['Min.Price', 'Max.Price']] = df[['Min.Price', 'Max.Price']].apply(
    lambda x: x.fillna(x.mean()))
df.head()

Unnamed: 0,Manufacturer,Model,Type,Min.Price,Price,Max.Price,MPG.city,MPG.highway,AirBags,DriveTrain,...,Passengers,Length,Wheelbase,Width,Turn.circle,Rear.seat.room,Luggage.room,Weight,Origin,Make
0,Acura,Integra,Small,12.9,15.9,18.8,25.0,31.0,,Front,...,5.0,177.0,102.0,68.0,37.0,26.5,,2705.0,non-USA,Acura Integra
1,,Legend,Midsize,29.2,33.9,38.7,18.0,25.0,Driver & Passenger,Front,...,5.0,195.0,115.0,71.0,38.0,30.0,15.0,3560.0,non-USA,Acura Legend
2,Audi,90,Compact,25.9,29.1,32.3,20.0,26.0,Driver only,Front,...,5.0,180.0,102.0,67.0,37.0,28.0,14.0,3375.0,non-USA,Audi 90
3,Audi,100,Midsize,17.118605,37.7,44.6,19.0,26.0,Driver & Passenger,,...,6.0,193.0,106.0,,37.0,31.0,17.0,3405.0,non-USA,Audi 100
4,BMW,535i,Midsize,17.118605,30.0,21.459091,22.0,30.0,,Rear,...,4.0,186.0,109.0,69.0,39.0,27.0,13.0,3640.0,non-USA,BMW 535i


## 43. How to use apply function on existing columns with global variables as additional arguments?

In df, use the apply method to replace the missing values in Min.Price with the column’s mean and those in Max.Price with the column’s median.

In [111]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [112]:
# page solution
f = {'Min.Price': np.nanmean, 'Max.Price': np.nanmedian}
df[['Min.Price', 'Max.Price']] = df[['Min.Price', 'Max.Price']].apply(
    lambda x, f: x.fillna(f[x.name](x)), args=(f, ))
df.head()

Unnamed: 0,Manufacturer,Model,Type,Min.Price,Price,Max.Price,MPG.city,MPG.highway,AirBags,DriveTrain,...,Passengers,Length,Wheelbase,Width,Turn.circle,Rear.seat.room,Luggage.room,Weight,Origin,Make
0,Acura,Integra,Small,12.9,15.9,18.8,25.0,31.0,,Front,...,5.0,177.0,102.0,68.0,37.0,26.5,,2705.0,non-USA,Acura Integra
1,,Legend,Midsize,29.2,33.9,38.7,18.0,25.0,Driver & Passenger,Front,...,5.0,195.0,115.0,71.0,38.0,30.0,15.0,3560.0,non-USA,Acura Legend
2,Audi,90,Compact,25.9,29.1,32.3,20.0,26.0,Driver only,Front,...,5.0,180.0,102.0,67.0,37.0,28.0,14.0,3375.0,non-USA,Audi 90
3,Audi,100,Midsize,17.118605,37.7,44.6,19.0,26.0,Driver & Passenger,,...,6.0,193.0,106.0,,37.0,31.0,17.0,3405.0,non-USA,Audi 100
4,BMW,535i,Midsize,17.118605,30.0,19.15,22.0,30.0,,Rear,...,4.0,186.0,109.0,69.0,39.0,27.0,13.0,3640.0,non-USA,BMW 535i


## 44. How to select a specific column from a dataframe as a dataframe instead of a series?

Get the first column (a) in df as a dataframe (rather than as a Series).

In [113]:
# input
df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))

In [114]:
# my solution
df['a'].to_frame()

Unnamed: 0,a
0,0
1,5
2,10
3,15


In [115]:
# page solution
print(type(df[['a']]))
print(type(df.loc[:, ['a']]))
print(type(df.iloc[:, [0]]))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


## 45. How to change the order of columns of a dataframe?

Actually 3 questions.

1. In df, interchange columns 'a' and 'c'.

2. Create a generic function to interchange two columns, without hardcoding column names.

3. Sort the columns in reverse alphabetical order, that is column 'e' first through column 'a' last.

In [116]:
# input
df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))

In [117]:
# my solution
# Q1
df[['a', 'c']] = df[['c', 'a']]
df.columns = list('cbade')
df

Unnamed: 0,c,b,a,d,e
0,2,1,0,3,4
1,7,6,5,8,9
2,12,11,10,13,14
3,17,16,15,18,19


In [118]:
# Q2
def interchange(df, col1, col2):
    df[[col1, col2]] = df[[col2, col1]]
    fs = df.columns.tolist()
    ind1, ind2 = fs.index(col1), fs.index(col2)
    fs[ind1], fs[ind2] = fs[ind2], fs[ind1]
    df.columns = fs

interchange(df, 'b', 'e')
df

Unnamed: 0,c,e,a,d,b
0,2,4,0,3,1
1,7,9,5,8,6
2,12,14,10,13,11
3,17,19,15,18,16


In [119]:
# Q3
df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))
def desc_order(df):
    rev = df.columns.sort_values(ascending=False)
    return pd.DataFrame({i: df[i] for i in rev})

df = desc_order(df)
df

Unnamed: 0,e,d,c,b,a
0,4,3,2,1,0
1,9,8,7,6,5
2,14,13,12,11,10
3,19,18,17,16,15


In [120]:
# page solution
# Q1
df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))
df[list('cbade')]

Unnamed: 0,c,b,a,d,e
0,2,1,0,3,4
1,7,6,5,8,9
2,12,11,10,13,14
3,17,16,15,18,19


In [121]:
# Q2 almost the same as my solution
def switch_columns(df, col1=None, col2=None):
    colnames = df.columns.tolist()
    i1, i2 = colnames.index(col1), colnames.index(col2)
    colnames[i2], colnames[i1] = colnames[i1], colnames[i2]
    return df[colnames]

switch_columns(df, 'a', 'c')

Unnamed: 0,c,b,a,d,e
0,2,1,0,3,4
1,7,6,5,8,9
2,12,11,10,13,14
3,17,16,15,18,19


In [122]:
# Q3
df[sorted(df.columns, reverse=True)]

Unnamed: 0,e,d,c,b,a
0,4,3,2,1,0
1,9,8,7,6,5
2,14,13,12,11,10
3,19,18,17,16,15


## 46. How to set the number of rows and columns displayed in the output?

Change the pandas display settings so that printing the dataframe df shows a maximum of 10 rows and 10 columns.

In [123]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [124]:
# my solution
rows = pd.get_option('display.max_rows')
cols = pd.get_option('display.max_columns')
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 10)
df

Unnamed: 0,Manufacturer,Model,Type,Min.Price,Price,...,Rear.seat.room,Luggage.room,Weight,Origin,Make
0,Acura,Integra,Small,12.9,15.9,...,26.5,,2705.0,non-USA,Acura Integra
1,,Legend,Midsize,29.2,33.9,...,30.0,15.0,3560.0,non-USA,Acura Legend
2,Audi,90,Compact,25.9,29.1,...,28.0,14.0,3375.0,non-USA,Audi 90
3,Audi,100,Midsize,,37.7,...,31.0,17.0,3405.0,non-USA,Audi 100
4,BMW,535i,Midsize,,30.0,...,27.0,13.0,3640.0,non-USA,BMW 535i
...,...,...,...,...,...,...,...,...,...,...,...
88,Volkswagen,Eurovan,Van,16.6,19.7,...,34.0,,3960.0,,Volkswagen Eurovan
89,Volkswagen,Passat,Compact,17.6,20.0,...,31.5,14.0,2985.0,non-USA,Volkswagen Passat
90,Volkswagen,Corrado,Sporty,22.9,23.3,...,26.0,15.0,2810.0,non-USA,Volkswagen Corrado
91,Volvo,240,Compact,21.8,22.7,...,29.5,14.0,2985.0,non-USA,Volvo 240


In [125]:
# restore setup
pd.set_option('display.max_rows', rows)
pd.set_option('display.max_columns', cols)
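
As an alternative to saving and restoring the options by hand, `pd.option_context` applies display options only inside a `with` block. A minimal sketch, assuming a made-up 50x20 frame (`demo`) in place of `Cars93_miss.csv`:

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the Cars93 data: 50 rows, 20 columns
demo = pd.DataFrame(np.arange(1000).reshape(50, 20))

default_rows = pd.get_option('display.max_rows')

# the options apply only inside the with-block
with pd.option_context('display.max_rows', 10, 'display.max_columns', 10):
    rows_inside = pd.get_option('display.max_rows')
    print(demo)

# on exit the previous value is back without any manual restore
rows_after = pd.get_option('display.max_rows')
```

This avoids the separate "restore setup" cell, at the cost of having to print inside the block.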

## 47. How to format or suppress scientific notations in a pandas dataframe?

Suppress scientific notation like ‘e-03’ in df and print up to 4 digits after the decimal point.

In [126]:
# input
df = pd.DataFrame(np.random.random(4)**10, columns=['random'])
df

Unnamed: 0,random
0,0.01685
1,0.080788
2,0.001323
3,0.002978


In [127]:
# my solution
pd.set_option('display.float_format', lambda x: f'{x:.4f}')
df

Unnamed: 0,random
0,0.0169
1,0.0808
2,0.0013
3,0.003


In [128]:
# restore setup
pd.set_option('display.float_format', None)

In [129]:
# page solution
# Solution 1: Rounding
print(df.round(4))

# Solution 2: Use apply to change format
print(df.apply(lambda x: '%.4f' % x, axis=1))
# or
df.applymap(lambda x: '%.4f' % x)

   random
0  0.0169
1  0.0808
2  0.0013
3  0.0030
0    0.0169
1    0.0808
2    0.0013
3    0.0030
dtype: object


Unnamed: 0,random
0,0.0169
1,0.0808
2,0.0013
3,0.003
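
If only one rendering needs the fixed format, `DataFrame.to_string` accepts a `float_format` callable, which leaves both the data and the global options untouched. A small sketch with made-up values (not the random numbers from the cells above):

```python
import pandas as pd

# made-up small values, some of which would otherwise print as e-03 / e-07
df = pd.DataFrame({'random': [1.23456e-3, 8.0788e-2, 2.978e-3, 4.5e-7]})

# float_format takes a one-argument callable and affects only this rendering
text = df.to_string(float_format='{:.4f}'.format)
print(text)
```

Unlike the `applymap` variant, the column stays numeric, so later arithmetic still works.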


## 48. How to format all the values in a dataframe as percentages?

Format the values in column 'random' of df as percentages.

In [130]:
# input
df = pd.DataFrame(np.random.random(4), columns=['random'])
df

Unnamed: 0,random
0,0.175611
1,0.133848
2,0.723795
3,0.308585


In [131]:
# my solution
df.applymap(lambda x: f'{x*100:.2f}%')

Unnamed: 0,random
0,17.56%
1,13.38%
2,72.38%
3,30.86%


In [132]:
# page solution
df.style.format({'random': '{0:.2%}'.format})

Unnamed: 0,random
0,17.56%
1,13.38%
2,72.38%
3,30.86%
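
For a single column, `Series.map` with the `'{:.2%}'` format spec is another option. A minimal sketch, assuming the rounded values shown in the input cell above:

```python
import pandas as pd

# values copied from the displayed (rounded) input above
df = pd.DataFrame({'random': [0.175611, 0.133848, 0.723795, 0.308585]})

# '{:.2%}' multiplies by 100 and appends the percent sign
as_text = df['random'].map('{:.2%}'.format)
print(as_text.tolist())
```

Note that the result holds strings, whereas `df.style.format` only changes how the numbers are displayed.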


## 49. How to filter every nth row in a dataframe?

From df, select the 'Manufacturer', 'Model' and 'Type' columns for every 20th row, starting from the first (row 0).

In [133]:
# input
df = pd.read_csv('Cars93_miss.csv')

In [134]:
# my solution
df.iloc[::20][['Manufacturer', 'Model', 'Type']]

Unnamed: 0,Manufacturer,Model,Type
0,Acura,Integra,Small
20,Chrysler,LeBaron,Compact
40,Honda,Prelude,Sporty
60,Mercury,Cougar,Midsize
80,Subaru,Loyale,Small
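
The same selection can be written with a boolean mask on row positions, which is handy when the step condition gets more complex. A sketch on a hypothetical 93-row frame (the column contents are invented, only the column names follow the exercise):

```python
import numpy as np
import pandas as pd

# hypothetical frame standing in for Cars93_miss.csv
df = pd.DataFrame({
    'Manufacturer': [f'maker{i}' for i in range(93)],
    'Model': [f'model{i}' for i in range(93)],
    'Type': ['Small'] * 93,
    'Price': np.linspace(10, 40, 93),
})

cols = ['Manufacturer', 'Model', 'Type']

# positional step slicing: rows 0, 20, 40, ...
by_slice = df.iloc[::20][cols]

# equivalent boolean mask built from row positions
mask = np.arange(len(df)) % 20 == 0
by_mask = df.loc[mask, cols]
```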


## 50. How to create a primary key index by combining relevant columns?

In df, replace NaNs with ‘missing’ in columns 'Manufacturer', 'Model' and 'Type', create an index as a combination of these three columns, and check whether the index is a primary key.

In [135]:
# input
df = pd.read_csv('Cars93_miss.csv', usecols=[0,1,2,3,5])

In [136]:
# my solution
df[['Manufacturer', 'Model', 'Type']] = df[['Manufacturer', 'Model', 'Type']].fillna(
    'missing')
ind = pd.Index(df['Manufacturer'] + df['Model'] + df['Type'])
print(ind)
ind.is_unique

Index(['AcuraIntegraSmall', 'missingLegendMidsize', 'Audi90Compact',
       'Audi100Midsize', 'BMW535iMidsize', 'BuickCenturyMidsize',
       'BuickLeSabreLarge', 'BuickRoadmasterLarge', 'BuickRivieraMidsize',
       'CadillacDeVilleLarge', 'CadillacSevilleMidsize',
       'ChevroletCavalierCompact', 'ChevroletCorsicaCompact',
       'ChevroletCamaroSporty', 'ChevroletLuminaMidsize',
       'ChevroletLumina_APVVan', 'ChevroletAstroVan', 'ChevroletCapriceLarge',
       'ChevroletCorvetteSporty', 'missingConcordeLarge',
       'ChryslerLeBaronCompact', 'ChryslerImperialLarge', 'DodgeColtSmall',
       'DodgeShadowSmall', 'DodgeSpiritCompact', 'DodgeCaravanVan',
       'DodgeDynastyMidsize', 'DodgeStealthSporty', 'EagleSummitSmall',
       'EagleVisionLarge', 'FordFestivaSmall', 'FordEscortSmall',
       'FordTempoCompact', 'FordMustangSporty', 'FordProbeSporty',
       'FordAerostarVan', 'FordTaurusMidsize', 'FordCrown_VictoriaLarge',
       'GeoMetroSmall', 'GeoStormSporty', 'HondaPrelu

True
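
One caveat with plain string concatenation is that different field combinations can produce the same key. The sketch below uses made-up rows to show the collision and two ways around it; the `'_'` separator is an assumption and only helps if it cannot blur a field boundary in the real data.

```python
import pandas as pd

# made-up rows; None marks a missing value
df = pd.DataFrame({
    'Manufacturer': ['ab', 'a', None],
    'Model': ['c', 'bc', 'x'],
    'Type': ['Small', 'Small', 'Van'],
})
keys = ['Manufacturer', 'Model', 'Type']
filled = df[keys].fillna('missing')

# plain concatenation: 'ab' + 'c' and 'a' + 'bc' collide
joined = filled['Manufacturer'] + filled['Model'] + filled['Type']

# a separator keeps the parts distinguishable
joined_sep = filled['Manufacturer'].str.cat(
    [filled['Model'], filled['Type']], sep='_')

# a MultiIndex avoids building strings at all
multi = pd.MultiIndex.from_frame(filled)

print(joined.is_unique, joined_sep.is_unique, multi.is_unique)
```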

## 51. How to get the row number of the nth largest value in a column?

Find the row position of the 5th largest value of column 'a' in df.

In [137]:
# input
df = pd.DataFrame(np.random.randint(1, 30, 30).reshape(10,-1), columns=list('abc'))

In [138]:
# my solution
df['a'].sort_values(ascending=False).index[4]

6

In [139]:
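# page solution
# note: [4] is a label lookup on the reversed Series, so this returns the
# position of the 5th smallest value; .iloc[4] would give the 5th largest
df['a'].argsort()[::-1][4]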

2
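
The two cells above return different numbers (6 and 2). A sketch on fixed, tie-free data (made up for illustration) suggests why: indexing a Series that has an integer index with `[4]` looks up the label 4, not the 5th element.

```python
import numpy as np
import pandas as pd

# fixed, tie-free values so the result is reproducible
a = pd.Series([10, 50, 30, 20, 40, 60, 5, 25, 15, 35])

# label of the 5th largest value (equals its position with a default index)
by_sort = a.sort_values(ascending=False).index[4]

# positional version on the raw array: argsort ascending, reverse, take 5th
by_argsort = np.argsort(a.values)[::-1][4]

# label lookup on the reversed argsort Series picks label 4 instead,
# which is the position of the 5th smallest value
by_label = a.argsort()[::-1].loc[4]

print(by_sort, by_argsort, by_label)
```

Here the first two agree on the row holding 30 (the 5th largest), while the label lookup points at the row holding 25 (the 5th smallest).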

## 52. How to find the position of the nth largest value greater than a given value?

In ser, find the position of the 2nd largest value greater than the mean.

In [140]:
# input
ser = pd.Series(np.random.randint(1, 100, 15))

In [141]:
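# my solution
# note: sort_values() is ascending, so index[1] is the 2nd smallest value
# above the mean; pass ascending=False to get the 2nd largest
ser[ser>ser.mean()].sort_values().index[1]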

9
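
The wording of this question allows at least two readings. The sketch below, on a small made-up series, computes both: the 2nd largest of the values above the mean, and the 2nd value above the mean in original order.

```python
import numpy as np
import pandas as pd

ser = pd.Series([10, 90, 40, 70, 20, 80])   # mean is about 51.7
above = ser[ser > ser.mean()]               # 90, 70, 80 at labels 1, 3, 5

# reading 1: the 2nd largest of the values above the mean
second_largest = above.sort_values(ascending=False).index[1]

# reading 2: the 2nd value above the mean in original order
second_in_order = np.flatnonzero((ser > ser.mean()).values)[1]

print(second_largest, second_in_order)
```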

## 53. How to get the last n rows of a dataframe with row sum > 100?

Get the last two rows of df whose row sum is greater than 100.

In [142]:
# input
df = pd.DataFrame(np.random.randint(10, 40, 60).reshape(-1, 4))

In [143]:
# my solution
df[df.sum(axis=1)>100].tail(2)

Unnamed: 0,0,1,2,3
13,22,25,35,38
14,24,35,32,39


In [144]:
# page solution
df.iloc[np.where(df.sum(axis=1) > 100)[0][-2:]]

Unnamed: 0,0,1,2,3
13,22,25,35,38
14,24,35,32,39


## 54. How to find and cap outliers from a series or dataframe column?

Replace all values of ser below the 5th percentile or above the 95th percentile with the 5th and 95th percentile values, respectively.

In [145]:
# input
ser = pd.Series(np.logspace(-2, 2, 30))
ser

0       0.010000
1       0.013738
2       0.018874
3       0.025929
4       0.035622
5       0.048939
6       0.067234
7       0.092367
8       0.126896
9       0.174333
10      0.239503
11      0.329034
12      0.452035
13      0.621017
14      0.853168
15      1.172102
16      1.610262
17      2.212216
18      3.039195
19      4.175319
20      5.736153
21      7.880463
22     10.826367
23     14.873521
24     20.433597
25     28.072162
26     38.566204
27     52.983169
28     72.789538
29    100.000000
dtype: float64

In [146]:
# my solution
low_thresh, high_thresh = np.quantile(ser, [0.05, 0.95])
ser[ser<low_thresh] = low_thresh
ser[ser>high_thresh] = high_thresh
ser

0      0.016049
1      0.016049
2      0.018874
3      0.025929
4      0.035622
5      0.048939
6      0.067234
7      0.092367
8      0.126896
9      0.174333
10     0.239503
11     0.329034
12     0.452035
13     0.621017
14     0.853168
15     1.172102
16     1.610262
17     2.212216
18     3.039195
19     4.175319
20     5.736153
21     7.880463
22    10.826367
23    14.873521
24    20.433597
25    28.072162
26    38.566204
27    52.983169
28    63.876672
29    63.876672
dtype: float64
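
The same capping can be sketched with `Series.clip`, which returns a new series instead of modifying `ser` in place:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.logspace(-2, 2, 30))

# unpacking the quantile Series yields the two threshold values
low, high = ser.quantile([0.05, 0.95])

# clip returns a new Series; ser itself is left unchanged
capped = ser.clip(lower=low, upper=high)
print(capped.iloc[[0, 1, 28, 29]])
```

Keeping the original series around makes it easy to compare the values before and after capping.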

## 55. How to reshape a dataframe to the largest possible square after removing the negative values?

Reshape df to the largest possible square with negative values removed. Drop the smallest values if need be. The order of the positive numbers in the result should remain the same as the original.

In [147]:
# input
df = pd.DataFrame(np.random.randint(-20, 50, 100).reshape(10,-1))

In [148]:
# page solution (I am not sure what this question is intended to do)
print(df)

# Solution
# Step 1: keep only the positive values (df > 0 also drops zeros)
arr = df[df > 0].values.flatten()
arr_qualified = arr[~np.isnan(arr)]

# Step 2: find side-length of largest possible square
n = int(np.floor(arr_qualified.shape[0]**.5))

# Step 3: Take top n^2 items without changing positions
top_indexes = np.argsort(arr_qualified)[::-1]
output = np.take(arr_qualified, sorted(top_indexes[:n**2])).reshape(n, -1)
print(output)

    0   1   2   3   4   5   6   7   8   9
0  12  28  15  13  13  49  42   8  46  42
1  31  15  28  -9  10  28   3  -8  43  -9
2  31  49 -15  27   0  -2   6   2 -12  -8
3   4  21  -5  -6 -10  47   7 -20  -2  25
4  45   4  37  38 -20  -7  49  49 -13  48
5  -2  24  -3  20   6  46  47   9   1  31
6  -6  48  36  29  10  39   6  27  -3 -19
7   8  29  32  13  31  48   8  20  26  38
8  40  18 -20  -6  15  39   6  18  35  34
9   6  -8  -2  27   8  -6   3  16  36  32
[[12. 28. 15. 13. 13. 49. 42.  8.]
 [46. 42. 31. 15. 28. 10. 28. 43.]
 [31. 49. 27. 21. 47.  7. 25. 45.]
 [37. 38. 49. 49. 48. 24. 20. 46.]
 [47.  9. 31. 48. 36. 29. 10. 39.]
 [27.  8. 29. 32. 13. 31. 48.  8.]
 [20. 26. 38. 40. 18. 15. 39. 18.]
 [35. 34.  6. 27.  8. 16. 36. 32.]]
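
A tiny worked example may make the intent of the exercise clearer. The sketch below uses seven made-up numbers: five are positive, the largest square that fits is 2x2, so the smallest positive value is dropped and the remaining four keep their original order.

```python
import numpy as np

# made-up 1-D data; in the exercise this would be df.values.flatten()
arr = np.array([3, -1, 5, 2, -4, 7, 1])

# Step 1: keep the positive values, in their original order
pos = arr[arr > 0]                      # [3, 5, 2, 7, 1]

# Step 2: side length of the largest square that fits
n = int(np.sqrt(pos.size))              # 5 values -> n = 2

# Step 3: positions of the n*n largest values, re-sorted to original order
keep = np.sort(np.argsort(pos)[::-1][:n * n])
square = pos[keep].reshape(n, n)
print(square)
```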


## 56. How to swap two rows of a dataframe?

Swap rows 1 and 2 in df (the solutions below treat these as the first and second rows, i.e. positions 0 and 1).

In [149]:
# input
df = pd.DataFrame(np.arange(25).reshape(5, -1))
df

Unnamed: 0,0,1,2,3,4
0,0,1,2,3,4
1,5,6,7,8,9
2,10,11,12,13,14
3,15,16,17,18,19
4,20,21,22,23,24


In [150]:
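# my solution 1
# relies on df.values being a view of this single-dtype frame,
# so the in-place swap on arr writes through to df
arr = df.values
arr[[0, 1]] = arr[[1, 0]]
df.index = [1, 0, 2, 3, 4]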
df

Unnamed: 0,0,1,2,3,4
1,5,6,7,8,9
0,0,1,2,3,4
2,10,11,12,13,14
3,15,16,17,18,19
4,20,21,22,23,24


In [151]:
# my solution 2
df = pd.DataFrame(np.arange(25).reshape(5, -1))
df.iloc[[1, 0, 2, 3, 4]]

Unnamed: 0,0,1,2,3,4
1,5,6,7,8,9
0,0,1,2,3,4
2,10,11,12,13,14
3,15,16,17,18,19
4,20,21,22,23,24
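
For arbitrary positions, the reordering list can be built by a small helper. `swap_rows` below is a hypothetical name introduced for illustration; it returns a reordered copy rather than modifying `df`.

```python
import numpy as np
import pandas as pd

def swap_rows(df, i, j):
    """Return a copy of df with the rows at positions i and j exchanged."""
    order = list(range(len(df)))
    order[i], order[j] = order[j], order[i]
    return df.iloc[order]

df = pd.DataFrame(np.arange(25).reshape(5, -1))
swapped = swap_rows(df, 0, 1)
print(swapped)
```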


## 57. How to reverse the rows of a dataframe?

Reverse all the rows of dataframe df.

In [152]:
# input
df = pd.DataFrame(np.arange(25).reshape(5, -1))

In [153]:
# my solution
df.iloc[df.index[::-1]]

Unnamed: 0,0,1,2,3,4
4,20,21,22,23,24
3,15,16,17,18,19
2,10,11,12,13,14
1,5,6,7,8,9
0,0,1,2,3,4


In [154]:
# page solution
df.iloc[::-1]

Unnamed: 0,0,1,2,3,4
4,20,21,22,23,24
3,15,16,17,18,19
2,10,11,12,13,14
1,5,6,7,8,9
0,0,1,2,3,4
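
`df.iloc[df.index[::-1]]` works above because the index is the default 0..4, so labels and positions coincide. A sketch with a made-up string index shows that the positional slice does not depend on that, and how the rows can be renumbered afterwards if desired:

```python
import numpy as np
import pandas as pd

# same values as the exercise, but with a non-default (string) index
df = pd.DataFrame(np.arange(25).reshape(5, -1), index=list('vwxyz'))

# positional reverse works for any index
rev = df.iloc[::-1]

# optionally renumber the rows afterwards
rev_renumbered = rev.reset_index(drop=True)
print(rev_renumbered)
```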


## 58. How to create one-hot encodings of a categorical variable (dummy variables)?

Get one-hot encodings for column 'a' in the dataframe df and append it as columns.

In [155]:
# input
df = pd.DataFrame(np.arange(25).reshape(5,-1), columns=list('abcde'))