# From 1 to 100 Pandas Exercises for Data Analysis (Python)

This is the first series of pandas exercises, collected from different sources.
>**_The goal:_** practicing, learning and also teaching using this set of exercises.

If you find an error or a better solution, please feel free to open an issue or a pull request :D

_Sources: [here](sources.json)_

** 1.- How to import pandas and check the version? **

In [2]:
import pandas as pd
pd.__version__

# Alternatively, show the versions of pandas and all of its dependencies
# pd.show_versions()

'0.22.0'

** 2.- How to create a series from a list, numpy array and dict? **

In [3]:
import numpy as np

# input 
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

# output
serieList = pd.Series(mylist)
print("Series from list: \n ", serieList.head(5))
serieArray = pd.Series(myarr)
print("Series from numpy array: \n ", serieArray.head(5))
serieDict = pd.Series(mydict)
print("Series from dict: \n ", serieDict.head(5))

Series from list: 
  0    a
1    b
2    c
3    e
4    d
dtype: object
Series from numpy array: 
  0    0
1    1
2    2
3    3
4    4
dtype: int64
Series from dict: 
  a    0
b    1
c    2
d    4
e    3
dtype: int64


** 3.- How to convert the index of a series into a column of a dataframe? **
   - Convert the series ser into a dataframe with its index as another column on the dataframe.

In [25]:
# input
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)

# output

serDF = ser.to_frame().reset_index()
serDF.head()


  index  0
0     a  0
1     b  1
2     c  2
3     d  4
4     e  3
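
The resulting columns are called `index` and `0`. If you want meaningful names instead, one option is to name the index with `rename_axis` and the value column with `reset_index(name=...)`. This is a minimal sketch; the names `letter` and `value` are arbitrary choices made for illustration:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(5), index=list('abcde'))

# rename_axis names the index; reset_index(name=...) names the value column
serDF = ser.rename_axis('letter').reset_index(name='value')
print(serDF.columns.tolist())  # ['letter', 'value']
```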


** 4.- How to combine many series to form a dataframe? **
   - Combine ser1 and ser2 to form a dataframe.



In [19]:
# input 
import numpy as np
ser1 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

# output
combinedSerieDF = pd.concat([ser1, ser2], axis=1)
combinedSerieDF.head()

# Alternatively use
# combinedSerieDF = pd.DataFrame({'column 1': ser1, 'column 2': ser2})
# combinedSerieDF.head()

   0  1
0  a  0
1  b  1
2  c  2
3  e  3
4  d  4
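
When the series have no names, `pd.concat(..., axis=1)` labels the columns 0 and 1. The sketch below uses the `keys` argument to set the column names directly; the names themselves are only illustrative. It also shows that `concat` aligns rows on the index, so series whose indexes differ produce `NaN` wherever a label is missing from one of them:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(list('abcde'))
ser2 = pd.Series(np.arange(5))

# with axis=1, the keys become the column labels
combined = pd.concat([ser1, ser2], axis=1, keys=['letter', 'number'])
print(combined.head())

# index alignment: ser3 shares only the labels 3 and 4 with ser1
ser3 = pd.Series([30, 40, 50], index=[3, 4, 5])
aligned = pd.concat([ser1, ser3], axis=1)
print(aligned)
```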


** 5.- How to assign a name to a series? **
   - Give the series ser the name ‘alphabets’.

In [26]:
# input
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
# output
ser.name = "alphabets"
ser.head()

# alternatively use (rename returns a new series, so assign the result)
# ser = ser.rename("alphabets")
# ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabets, dtype: object
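
The `name` attribute labels the values of a series, and the index carries a separate name of its own. This small sketch sets both so the difference is visible; the index names `position` and `pos` are only examples:

```python
import pandas as pd

ser = pd.Series(list('abcde'))
ser.name = 'alphabets'        # name of the series (its values)
ser.index.name = 'position'   # name of the index

# rename_axis returns a new series and leaves the original unchanged
renamed = ser.rename_axis('pos')
print(ser.head())
```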

** 6.- How to get the items of series A not present in series B? **
   - From ser1 remove items present in ser2.

In [31]:
# input 
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# output
ser1[~(ser1.isin(ser2))]

0    1
1    2
2    3
dtype: int64

** 7.- How to get the items not common to both series A and series B? **
   - Get all items of ser1 and ser2 not common to both.

In [37]:
# input 
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# output
serieU = pd.Series(np.union1d(ser1, ser2)) # sorted unique values present in either series
serieI = pd.Series(np.intersect1d(ser1,ser2)) # sorted unique values present in both series
serieU[~serieU.isin(serieI)]


0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
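
NumPy can also compute this symmetric difference in one step. `np.setxor1d` returns the sorted unique values that occur in exactly one of the two inputs, which replaces the union-minus-intersection steps above. Unlike that version, the result gets a fresh 0-based index:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# values present in exactly one of the two series, sorted and de-duplicated
not_common = pd.Series(np.setxor1d(ser1, ser2))
print(not_common.tolist())  # [1, 2, 3, 6, 7, 8]
```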

** 8.- How to get the minimum, 25th percentile, median, 75th, and max of a numeric series? **
   - Compute the minimum, 25th percentile, median, 75th, and maximum of ser.

In [55]:
# input
np.random.seed(123)
ser = pd.Series(np.random.normal(10, 5, 25))

# output
print("Min: %f, 25th percentile: %f, median: %f, 75th percentile: %f, Max: %f, " % ((ser.min(), ser.quantile(.25), \
ser.median(), ser.quantile(.75), ser.max())))

# Alternatively use
# np.percentile(ser, q=[0, 25, 50, 75, 100])


Min: -2.133396, 25th percentile: 6.605569, median: 9.526455, 75th percentile: 15.879145, Max: 21.029650, 
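
`Series.quantile` also accepts a list of probabilities and returns a series indexed by them. `describe()` bundles the same five numbers together with the count, mean and standard deviation. Here is a quick check on easy data:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5])

# quantile with a list returns one value per requested probability
q = ser.quantile([0, .25, .5, .75, 1])
print(q.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]

summary = ser.describe()  # count, mean, std, min, 25%, 50%, 75%, max
```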


** 9.- How to get frequency counts of unique items of a series? **
   - Calculate the frequency counts of each unique value of ser.

In [69]:
# input 
np.random.seed(123)
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

# output
ser.value_counts()

g    7
b    6
a    5
c    3
d    3
e    2
f    2
h    2
dtype: int64
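
To get relative frequencies instead of raw counts, pass `normalize=True` to `value_counts`. A small sketch:

```python
import pandas as pd

ser = pd.Series(list('aabbbc'))

# normalize=True returns proportions that sum to 1, still sorted in descending order
freq = ser.value_counts(normalize=True)
print(freq)
```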

** 10.- How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’? **
   - From ser, keep the top 2 most frequent items as they are and replace everything else with ‘Other’.

In [82]:
# input data
np.random.seed(123)
ser = pd.Series(np.random.randint(1, 5, [12]))
print("Input data:\n", ser)
print(" More frequent values:\n", ser.value_counts())

# output
# where() keeps the values that pass the test and puts 'Other' everywhere else (result dtype: object)
ser = ser.where(ser.isin(ser.value_counts().index[:2]), 'Other')
ser

Input data:
 0     3
1     2
2     3
3     3
4     1
5     3
6     3
7     2
8     4
9     3
10    4
11    2
dtype: int64
 More frequent values:
 3    6
2    3
4    2
1    1
dtype: int64


0         3
1         2
2         3
3         3
4     Other
5         3
6         3
7         2
8     Other
9         3
10    Other
11        2
dtype: object

** 11.- How to bin a numeric series to 10 groups of equal size? **
   - Bin the series ser into 10 equal deciles and replace the values with the bin name.

In [85]:
# input
np.random.seed(123)
ser = pd.Series(np.random.random(20))
print("Input data: \n", ser)
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1], 
        labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()

Input data: 
 0     0.696469
1     0.286139
2     0.226851
3     0.551315
4     0.719469
5     0.423106
6     0.980764
7     0.684830
8     0.480932
9     0.392118
10    0.343178
11    0.729050
12    0.438572
13    0.059678
14    0.398044
15    0.737995
16    0.182492
17    0.175452
18    0.531551
19    0.531828
dtype: float64


0    8th
1    3rd
2    2nd
3    7th
4    9th
dtype: category
Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
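
If integer bin numbers are enough, `pd.qcut(..., labels=False)` returns the decile index (0 to 9) instead of a categorical. The sketch below uses evenly spaced data, where each decile should receive exactly two values:

```python
import pandas as pd

ser = pd.Series(range(20))

# labels=False returns integer bin codes instead of labelled categories
deciles = pd.qcut(ser, q=10, labels=False)
print(deciles.tolist())
```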

** 12.- How to convert a numpy array to a dataframe of given shape? (L1) **
   - Reshape the series ser into a dataframe with 7 rows and 5 columns

In [96]:
# input
np.random.seed(123)
ser = pd.Series(np.random.randint(1, 10, 35))

#output
pd.DataFrame(ser.values.reshape(7,5))

   0  1  2  3  4
0  3  3  7  2  4
1  7  2  1  2  1
2  1  4  5  1  1
3  5  2  8  4  3
4  5  8  3  5  9
5  1  8  4  5  7
6  2  6  7  3  2


** 13.- How to find the positions of numbers that are multiples of 3 from a series? ** 
   - Find the positions of numbers that are multiples of 3 from ser.



In [111]:
# input
np.random.seed(123)
ser = pd.Series(np.random.randint(1, 10, 7))
print("Input data: \n", ser)

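# output
# with the default RangeIndex, the index labels of the result are the positions
ser[ser%3==0]

# Alternatively use (returns only the positions)
# np.where(ser%3==0)[0]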

Input data: 
 0    3
1    3
2    7
3    2
4    4
5    7
6    2
dtype: int64


0    3
1    3
dtype: int64
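
With the default `RangeIndex`, index labels and positions coincide, which is why the filtered result above shows the positions. With any other index the two differ. This sketch uses a hypothetical letter index to show positions and labels side by side:

```python
import numpy as np
import pandas as pd

ser = pd.Series([3, 3, 7, 2, 4, 7, 6], index=list('abcdefg'))
mask = (ser % 3 == 0).to_numpy()

positions = np.flatnonzero(mask)    # integer positions of the multiples of 3
labels = ser.index[mask].tolist()   # index labels at those positions
print(positions, labels)
```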

** 14.- How to extract items at given positions from a series? **
   - From ser, extract the items at positions in list pos.

In [120]:
# input
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

# output
ser.iloc[pos]  # iloc selects by position; ser[pos] only works here because labels equal positions

# Alternatively use 
# ser.take(pos)

0     a
4     e
8     i
14    o
20    u
dtype: object

** 15.- How to stack two series vertically and horizontally? **
   - Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

In [160]:
# input
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

# output
v = pd.concat((ser1,ser2), axis=0)
print(v.to_frame())

h = pd.concat((ser1,ser2), axis=1)
h

   0
0  0
1  1
2  2
3  3
4  4
0  a
1  b
2  c
3  d
4  e


   0  1
0  0  a
1  1  b
2  2  c
3  3  d
4  4  e


** 16.- How to get the positions of items of series A in another series B? **
   - Get the positions of items of ser2 in ser1 as a list.

In [177]:
# input
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# output
[pd.Index(ser1).get_loc(i) for i in ser2]

[5, 4, 0, 8]
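
A vectorized alternative to the list comprehension is `Index.get_indexer`. It looks up all values at once and returns `-1` for a missing value instead of raising a `KeyError`. It requires the values of `ser1` to be unique:

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# one lookup for all of ser2; unmatched values map to -1
positions = pd.Index(ser1).get_indexer(ser2).tolist()
print(positions)  # [5, 4, 0, 8]

missing = pd.Index(ser1).get_indexer([1, 99]).tolist()  # [5, -1]
```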

** 17.- How to compute the mean squared error on a truth and predicted series? **
   - Compute the mean squared error of truth and pred series.

In [179]:
# input
np.random.seed(123)
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)

# output
np.mean((truth-pred)**2)

0.3434951102792991

** 18.- How to convert the first character of each element in a series to uppercase? **
   - Convert the first character of each word in ser to uppercase.

In [182]:
# input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

# output
ser.str.capitalize()

# Alternatively use
# pd.Series([i.title() for i in ser])

0     How
1      To
2    Kick
3    Ass?
dtype: object

** 19.- How to calculate the number of characters in each word in a series? **

In [199]:
# input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

# output
[len(i) for i in ser]

# alternatively use
# ser.str.len()

[3, 2, 4, 4]

** 20.- How to compute difference of differences between consecutive numbers of a series? **
   - Compute the difference of differences between the consecutive numbers of ser.

In [215]:
# input
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# output
# np.diff(ser).tolist() also works, but it drops the first element instead of returning NaN
print(ser.diff().tolist())
print(ser.diff().diff().tolist())

[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]
[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]
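
If the leading `NaN` values are not needed, `np.diff` with `n=2` computes the second differences directly. The result has two fewer elements than the input:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# n=2 applies the difference twice; the result has len(ser) - 2 elements
second_diff = np.diff(ser, n=2).tolist()
print(second_diff)  # [1, 1, 1, 1, 0, 2]
```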


** 21.- How to convert a series of date-strings to a timeseries? **

In [223]:
# input 
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])

# output
# on pandas >= 2.0, mixed formats like these need pd.to_datetime(ser, format='mixed')
pd.to_datetime(ser)

0   2010-01-01 00:00:00
1   2011-02-02 00:00:00
2   2012-03-03 00:00:00
3   2013-04-04 00:00:00
4   2014-05-05 00:00:00
5   2015-06-06 12:20:00
dtype: datetime64[ns]

** 22.- How to get the day of month, week number, day of year and day of week from a series of date strings? **
   - Get the day of month, week number, day of year and day of week from ser.

In [259]:
# input
ser = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])
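s = pd.to_datetime(ser)  # on pandas >= 2.0 these mixed formats need format='mixed'

# output

print("Day of the month: ", s.dt.day.tolist())
print("Week number: ", s.dt.isocalendar().week.tolist())  # older pandas: s.dt.week (removed in 2.0)
print("Day of the year: ", s.dt.dayofyear.tolist())
print("Day of week: ", s.dt.day_name().tolist())  # older pandas: s.dt.weekday_name (removed in 1.0)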


Day of the month:  [1, 2, 3, 4, 5, 6]
Week number:  [53, 5, 9, 14, 19, 23]
Day of the year:  [1, 33, 63, 94, 125, 157]
Day of week:  ['Friday', 'Wednesday', 'Saturday', 'Thursday', 'Monday', 'Saturday']
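
On pandas 1.1 and later, `Series.dt.isocalendar()` returns the ISO year, week and weekday together as a dataframe. It also explains the week number 53 above: 1 January 2010 belongs to ISO week 53 of 2009. A sketch:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(['2010-01-01', '2011-02-02']))

# columns: year, week, day (ISO weekday, Monday=1)
iso = s.dt.isocalendar()
print(iso)
```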


** 23.- How to convert year-month string to dates corresponding to the 4th day of the month? **
   - Convert ser to dates that fall on the 4th of the respective months.


In [265]:
# input data
from dateutil.parser import parse
ser = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])

# output
ser.map(lambda x: parse('04 ' + x))

0   2010-01-04
1   2011-02-04
2   2012-03-04
dtype: datetime64[ns]
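
A vectorized alternative avoids calling `dateutil` row by row: prepend the day and pass an explicit `format` to `pd.to_datetime`. This assumes every entry follows the "Mon YYYY" pattern with English month abbreviations:

```python
import pandas as pd

ser = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])

# '%d %b %Y' matches strings such as '04 Jan 2010'
dates = pd.to_datetime('04 ' + ser, format='%d %b %Y')
print(dates)
```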

** 24.- How to filter words that contain at least 2 vowels from a series? **
   - From ser, extract words that contain at least 2 vowels.

In [295]:
# input
ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])
vowels = ['a','e','i','o','u']

# output
# np.isin marks which vowels occur in the word, so each distinct vowel is counted once
for idx, val in enumerate(ser):
    v = np.isin(vowels, list(val.lower()))
    if v.sum() > 1:
        print(ser[idx])
    

Apple
Orange
Money
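
The loop above counts distinct vowels, so a word like "book" would count only one vowel. If repeated vowels should count, here is a vectorized sketch using `str.count` with a regex character class:

```python
import pandas as pd

ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

# count every vowel occurrence; lower() makes the match case-insensitive
n_vowels = ser.str.lower().str.count('[aeiou]')
result = ser[n_vowels >= 2]
print(result.tolist())  # ['Apple', 'Orange', 'Money']
```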


** 25.- How to filter valid emails from a series? **
   - Extract the valid emails from the series emails. The regex pattern for valid emails is provided as reference.

In [302]:
# input 
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])
pattern ='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}'

# output
emails[emails.str.contains(pattern)]

1    rameses@egypt.com
2            matt@t.co
3    narendra@modi.com
dtype: object
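
`str.contains` accepts a match anywhere in the string, so text that merely includes an address also passes. On pandas 1.1 and later, `str.fullmatch` requires the whole entry to match the pattern. This sketch uses one extra, hypothetical entry to show the difference:

```python
import pandas as pd

emails = pd.Series(['rameses@egypt.com', 'contact: matt@t.co'])
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

partial = emails.str.contains(pattern)   # True, True
whole = emails.str.fullmatch(pattern)    # True, False
```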

** 26.- How to get the mean of a series grouped by another series? **
   - Compute the mean of weights of each fruit.

In [326]:
# input
np.random.seed(123)
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weight = pd.Series(np.linspace(1, 10, 10))
print(weight.tolist())
print(fruit.tolist())

# output
print(weight.groupby(fruit).mean())

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
['carrot', 'banana', 'carrot', 'carrot', 'apple', 'carrot', 'carrot', 'banana', 'carrot', 'banana']
apple     5.000000
banana    6.666667
carrot    5.000000
dtype: float64


** 27.- How to compute the euclidean distance between two series? **
   - Compute the euclidean distance between series (points) p and q, without using a packaged formula.

In [328]:
# input
p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

# output
np.sqrt(((p-q)**2).sum())

# Alternatively use
# np.linalg.norm(p-q)

18.16590212458495

** 28.- How to find all the local maxima (or peaks) in a numeric series? **
   - Get the positions of peaks (values surrounded by smaller values on both sides) in ser.

In [352]:
# input
ser = pd.Series([22, 10, 3, 4, 9, 10, 2, 7, 3])

# output
for idx, val in enumerate(ser):
    if (idx !=0) & (idx<len(ser)-1):
        if (val > ser[idx-1]) & (val > ser[idx+1]):
            print(idx)

# Alternatively use
# dd = np.diff(np.sign(np.diff(ser)))
# peak_locs = np.where(dd == -2)[0] + 1
# peak_locs     

5
7
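
Here is a vectorized sketch using `shift`: a value is a peak when it is greater than both of its neighbours. Comparisons against the `NaN` that `shift` introduces at the ends evaluate to `False`, so the first and last items are never reported:

```python
import pandas as pd

ser = pd.Series([22, 10, 3, 4, 9, 10, 2, 7, 3])

# compare each value with its left and right neighbour
is_peak = (ser > ser.shift(1)) & (ser > ser.shift(-1))
peaks = ser[is_peak].index.tolist()
print(peaks)  # [5, 7]
```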


** 29.- How to replace spaces in a string with the least frequent character? **
   - Replace the spaces in my_str with the least frequent character.

In [400]:
# input
my_str = 'dbc deb abed gade'
my_str

#output
ser = pd.Series(list(my_str))
# 'c' and 'g' both occur once, so the character picked depends on how value_counts orders ties
lessFreq = ser.value_counts().index[-1]
my_str.replace(' ',lessFreq)

'dbcgdebgabedggade'

** 30.- How to create a time series of 10 consecutive weekends (Saturdays) starting ‘2000-01-01’, with random numbers as values? **

In [415]:
np.random.seed(123)
ts = pd.Series(np.random.randint(1,10,10), pd.date_range("2000-01-01", periods=10, freq="W-SAT"))
ts

2000-01-01    3
2000-01-08    3
2000-01-15    7
2000-01-22    2
2000-01-29    4
2000-02-05    7
2000-02-12    2
2000-02-19    1
2000-02-26    2
2000-03-04    1
Freq: W-SAT, dtype: int64

** 31.- How to fill an intermittent time series so all missing dates show up with values of previous non-missing date? **
   - ser has missing dates and values. Make all missing dates appear and fill up with value from previous date.



In [418]:
# input
ser = pd.Series([1,10,3,np.nan], index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))
print(ser)

# output
ser.resample('D').ffill()

2000-01-01     1.0
2000-01-03    10.0
2000-01-06     3.0
2000-01-08     NaN
dtype: float64


2000-01-01     1.0
2000-01-02     1.0
2000-01-03    10.0
2000-01-04    10.0
2000-01-05    10.0
2000-01-06     3.0
2000-01-07     3.0
2000-01-08     NaN
Freq: D, dtype: float64
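
`resample('D').ffill()` only fills the newly created dates, so the `NaN` already present on 2000-01-08 stays. If that value should be filled as well, one option is `asfreq('D')` followed by `ffill()`:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 10, 3, np.nan],
                index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))

upsampled = ser.resample('D').ffill()   # the existing NaN on 2000-01-08 is kept
filled = ser.asfreq('D').ffill()        # every NaN, old or new, is forward-filled
print(filled)
```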

** 32.- How to compute the autocorrelations of a numeric series? **
   - Compute autocorrelations for the first 10 lags of ser. Find out which lag has the largest correlation.

In [422]:
# input 
np.random.seed(123)
ser = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))

# output
autocorrelations = [round(ser.autocorr(lag), 2) for lag in range(1, 11)]
print(autocorrelations)
print('Lag having highest correlation: ', np.argmax(np.abs(autocorrelations))+1)

[0.34, 0.15, 0.48, 0.49, 0.07, 0.28, 0.42, 0.1, 0.21, -0.01]
Lag having highest correlation:  4


** 33.- How to import only every nth row from a csv file to create a dataframe? **
   - Import every 50th row of BostonHousing dataset as a dataframe.

In [16]:
# read in chunks of 50 rows and keep the first row of each chunk
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', chunksize=50)

# DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate them once
df = pd.concat([chunk.iloc[[0]] for chunk in data])
df.head()

        crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio       b  lstat  medv
0    0.00632  18.0   2.31     0  0.538  6.575  65.2    4.09    1  296     15.3   396.9   4.98  24.0
50   0.08873  21.0   5.64     0  0.439  5.963  45.7  6.8147    4  243     16.8  395.56  13.45  19.7
100  0.14866   0.0   8.56     0   0.52  6.727  79.9  2.7778    5  384     20.9  394.76   9.42  27.5
150   1.6566   0.0  19.58     0  0.871  6.122  97.3   1.618    5  403     14.7   372.8   14.1  21.5
200  0.01778  95.0   1.47     0  0.403  7.135  13.9  7.6534    3  402     17.0   384.3   4.45  32.9
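
Reading in chunks still parses the whole file. An alternative is to pass a callable to `skiprows`. It receives each row number (0 is the header) and returns `True` for rows to skip. In the sketch below, a small in-memory CSV stands in for the BostonHousing file. With `n = 50` and the URL above, it should select the same rows as the chunked version, although the resulting index would be 0, 1, 2, ... rather than 0, 50, 100:

```python
import io
import pandas as pd

# hypothetical data: a header line plus 10 rows holding the values 0..9
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

n = 3  # keep every 3rd data row, starting with the first
df = pd.read_csv(io.StringIO(csv_text),
                 skiprows=lambda i: i > 0 and (i - 1) % n != 0)
print(df['value'].tolist())  # [0, 3, 6, 9]
```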


** 34.- How to change column values when importing csv to a dataframe? **
   - Import the boston housing dataset, but while importing change the 'medv' (median house value) column so that values of 25 or less become ‘Low’ and values above 25 become ‘High’.

In [18]:
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', 
                 converters={'medv': lambda x: 'High' if float(x) > 25 else 'Low'})
data.head()

      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio       b  lstat  medv
0  0.00632  18.0   2.31     0  0.538  6.575  65.2    4.09    1  296     15.3   396.9   4.98   Low
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   396.9   9.14   Low
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03  High
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94  High
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   396.9   5.33  High


** 35.- How to create a dataframe with rows as strides from a given series? **

In [20]:
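# input 
L = pd.Series(range(15))

def gen_strides(a, stride_len=5, window_len=5):
    # number of complete windows that fit in the series
    n_strides = ((a.size-window_len)//stride_len) + 1
    # one row per window; consecutive windows start stride_len positions apart
    return np.array([a[s:(s+window_len)] for s in np.arange(0, a.size, stride_len)[:n_strides]])

# this returns a numpy array; wrap it in pd.DataFrame(...) to get a dataframe
gen_strides(L, stride_len=2, window_len=4)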

array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13]])
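
On NumPy 1.20 and later, `np.lib.stride_tricks.sliding_window_view` builds every window of a given length. Taking every `stride_len`-th window reproduces `gen_strides`, and wrapping the result in `pd.DataFrame` gives the dataframe the exercise asks for. A sketch:

```python
import numpy as np
import pandas as pd

L = pd.Series(range(15))

# all windows of length 4 (one per start position), then every 2nd one
windows = np.lib.stride_tricks.sliding_window_view(L.to_numpy(), 4)[::2]
strides_df = pd.DataFrame(windows)
print(strides_df)
```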

** 36.- How to import only specified columns from a csv file? **
   - Import ‘crim’ and ‘medv’ columns of the BostonHousing dataset as a dataframe.

In [22]:
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', usecols=['crim', 'medv'])
data.head()

      crim  medv
0  0.00632  24.0
1  0.02731  21.6
2  0.02729  34.7
3  0.03237  33.4
4  0.06905  36.2


** 37.- How to get the nrows, ncolumns, datatype, summary stats of each column of a dataframe? Also get the array and list equivalent. **
   - Get the number of rows, columns, datatype and summary statistics of each column of the Cars93 dataset. Also get the numpy array and list equivalent of the dataframe.

In [46]:
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
print("Number of rows: %d, Number of columns: %d" % (data.shape[0], data.shape[1]))
print(data.dtypes)
print("The data types: \n", data.dtypes.value_counts())
# summary statistics (only displayed when it is the last expression of a cell)
data.describe()
# to array 
dfArr = data.values
print(dfArr)

# to list
dfList = data.values.tolist()
print(dfList)

Number of rows: 93, Number of columns: 27
Manufacturer           object
Model                  object
Type                   object
Min.Price             float64
Price                 float64
Max.Price             float64
MPG.city              float64
MPG.highway           float64
AirBags                object
DriveTrain             object
Cylinders              object
EngineSize            float64
Horsepower            float64
RPM                   float64
Rev.per.mile          float64
Man.trans.avail        object
Fuel.tank.capacity    float64
Passengers            float64
Length                float64
Wheelbase             float64
Width                 float64
Turn.circle           float64
Rear.seat.room        float64
Luggage.room          float64
Weight                float64
Origin                 object
Make                   object
dtype: object
The data types: 
 float64    18
object      9
dtype: int64

** 38.- How to extract the row and column number of a particular cell with given criterion? **
   - Which manufacturer, model and type has the highest Price? What is the row and column number of the cell with the highest Price value?

In [71]:
# input
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
data.head()

# output
print(data.loc[data.Price == np.max(data.Price), ['Manufacturer', 'Model', 'Type']])
row, col = np.where(data.values == np.max(data.Price))
data.iloc[row, col]

     Manufacturer Model     Type
58  Mercedes-Benz  300E  Midsize


    Price
58   61.9
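
Instead of comparing every cell with `np.where`, you can use `idxmax` for the row label and `columns.get_loc` for the column position. This sketch assumes a single maximum (`idxmax` returns the first one if there are ties) and uses a small made-up frame in place of Cars93:

```python
import pandas as pd

# hypothetical stand-in for the Cars93 data
data = pd.DataFrame({'Model': ['A', 'B', 'C'], 'Price': [15.9, 61.9, 33.9]})

row_label = data['Price'].idxmax()           # index label of the highest Price
row_pos = data.index.get_loc(row_label)      # row position of that label
col_pos = data.columns.get_loc('Price')      # column position of 'Price'
print(row_pos, col_pos, data.iloc[row_pos, col_pos])
```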


** 39.- How to rename specific columns in a dataframe? **
   - Rename the column Type as CarType in df and replace the ‘.’ in column names with ‘_’.

In [87]:
# input 
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
print(data.columns)

# output
data = data.rename(columns={"Type": "CarType"})
# regex=False treats '.' literally instead of as "any character"
data.columns = data.columns.str.replace('.', '_', regex=False)

# Alternatively use
# data.columns = data.columns.map(lambda x: x.replace('.', '_'))

print(data.columns)

Index(['Manufacturer', 'Model', 'Type', 'Min.Price', 'Price', 'Max.Price',
       'MPG.city', 'MPG.highway', 'AirBags', 'DriveTrain', 'Cylinders',
       'EngineSize', 'Horsepower', 'RPM', 'Rev.per.mile', 'Man.trans.avail',
       'Fuel.tank.capacity', 'Passengers', 'Length', 'Wheelbase', 'Width',
       'Turn.circle', 'Rear.seat.room', 'Luggage.room', 'Weight', 'Origin',
       'Make'],
      dtype='object')
Index(['Manufacturer', 'Model', 'CarType', 'Min_Price', 'Price', 'Max_Price',
       'MPG_city', 'MPG_highway', 'AirBags', 'DriveTrain', 'Cylinders',
       'EngineSize', 'Horsepower', 'RPM', 'Rev_per_mile', 'Man_trans_avail',
       'Fuel_tank_capacity', 'Passengers', 'Length', 'Wheelbase', 'Width',
       'Turn_circle', 'Rear_seat_room', 'Luggage_room', 'Weight', 'Origin',
       'Make'],
      dtype='object')


** 40.- How to check if a dataframe has any missing values? **
   - Check if df has any missing values.

In [95]:
# input

data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
data.head()

# output
data.isna().values.any()
# or
data.isnull().values.any()

True
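
To see where the gaps are, rather than only whether any exist, `isna().sum()` counts the missing values in each column. A sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Price': [15.9, np.nan, 33.9], 'Type': ['Small', 'Midsize', None]})

# one count of missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)
print(df.isna().values.any())  # True
```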