### Numpy

- Numerical Python Library.
- Supports data storage and calculations by providing data structures, algorithms and other useful utilities.
- Contains : Basic Linear Algebra functions, Fourier Transforms, and Advanced Random Number capabilities.

### Scikit Learn

- Machine learning (ML) library.
- Classification Models : SVM, Random Forest, Decision Tree
- Regression Analysis : Linear Regression, Logistic Regression (despite its name, a classification model)
- Clustering Methods : K-means Clustering
- Data Reduction Models : Principal Component Analysis, Feature Selection
- Model Tuning and Selection : with features like Grid Search, Cross-Validation

### Pandas

- Powerful library that provides tools for data wrangling.
- Large variety of functions for : data import, export, indexing and manipulation.
- Provides a special data structure, the DataFrame, and efficient methods for handling it.
- Methods such as : Reshape, Merge, Split, and Aggregate Data.

### Matplotlib

- Library for Data Visualization.

### Seaborn

- Another library for data visualization, built on top of Matplotlib.

## Numpy

### 1. Creating Arrays

- Array from Lists

In [1]:
import numpy as np

In [3]:
arr = np.array([1,2,3,4])

In [4]:
type(arr)

numpy.ndarray

In [5]:
lists = [[0,1,2], [3,4,5], [6,7,8]]

In [6]:
arr2d = np.array(lists)

In [7]:
arr2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [8]:
lists

[[0, 1, 2], [3, 4, 5], [6, 7, 8]]

In [9]:
arr * 2

array([2, 4, 6, 8])

In [10]:
arr + 1

array([2, 3, 4, 5])

In [11]:
arr + arr

array([2, 4, 6, 8])
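The arithmetic cells above behave differently from the same operators on a plain Python list. The following is a minimal sketch of that contrast; the variable names are chosen here for illustration only.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# On a list, * repeats the list and + concatenates
repeated = values * 2          # [1, 2, 3, 4, 1, 2, 3, 4]
joined = values + values       # the same 8-element list

# On an ndarray, the same operators act element by element
doubled = arr * 2              # array([2, 4, 6, 8])
summed = arr + arr             # array([2, 4, 6, 8])

# The list equivalent of arr * 2 needs an explicit loop
doubled_list = [v * 2 for v in values]
```

This element-wise behaviour is what lets the later sections write whole-array expressions without explicit loops.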

- Arrays from Scratch

In [12]:
np.zeros(100, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [14]:
np.ones((3,3), dtype=float)

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [15]:
np.arange(0, 20, 3)

array([ 0,  3,  6,  9, 12, 15, 18])

In [18]:
np.linspace(0, 99, 100)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
       26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38.,
       39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51.,
       52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64.,
       65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77.,
       78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90.,
       91., 92., 93., 94., 95., 96., 97., 98., 99.])

In [20]:
np.random.random((3,3)) # 3x3 array of uniform random floats in [0, 1)

array([[0.72272688, 0.26273811, 0.91751955],
       [0.66655019, 0.06559376, 0.45631847],
       [0.84127621, 0.02784223, 0.22171306]])

In [21]:
np.random.randint(0, 10, (3,3)) # 3x3 array of random integers from 0 to 9 (the upper bound 10 is excluded)

array([[8, 2, 9],
       [7, 9, 4],
       [0, 5, 8]])

In [23]:
np.random.normal(0, 1, (3,3)) # 3x3 array of normally distributed values with mean 0 and SD 1

array([[ 0.96877465, -1.02894186,  0.1408769 ],
       [-0.92826259,  1.60744995, -1.03088496],
       [ 1.80911917, -0.97937305, -0.25256432]])

In [24]:
np.random.randint(10, size=6) # 1-D array of random integers

array([6, 1, 9, 3, 1, 6])

In [25]:
np.random.randint(10, size=(3,3)) # 2-D array of random integers

array([[3, 6, 0],
       [6, 7, 6],
       [4, 9, 0]])

In [26]:
np.random.randint(10, size=(3,3, 3)) # 3-D array of random integers

array([[[1, 3, 4],
        [8, 2, 7],
        [4, 7, 1]],

       [[2, 8, 6],
        [7, 4, 4],
        [2, 3, 1]],

       [[0, 4, 8],
        [9, 8, 7],
        [1, 5, 8]]])
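The random arrays above change on every run. If reproducible values are wanted, one option is NumPy's `Generator` interface with a fixed seed. This is a sketch only: the seed value 42 is arbitrary, and the original notebook uses the legacy `np.random.*` functions instead.

```python
import numpy as np

rng = np.random.default_rng(seed=42)          # arbitrary seed, chosen for illustration

uniform = rng.random((3, 3))                  # floats in [0, 1)
integers = rng.integers(0, 10, size=(3, 3))   # integers from 0 to 9, 10 excluded
normal = rng.normal(0, 1, size=(3, 3))        # mean 0, standard deviation 1

# A generator created with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
same_uniform = rng_again.random((3, 3))
```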

### 2. Array Attributes

- Each array has the following attributes:
- **ndim**: the number of dimensions
- **shape**: the size of each dimension
- **size**: the total number of elements in the array
- **dtype**: the data type of the array elements
- **itemsize**: the size in bytes of each array element
- **nbytes**: the total size in bytes of the array

In [30]:
arr.ndim

1

In [32]:
arr.shape

(4,)

In [33]:
arr.size

4

In [34]:
arr.dtype

dtype('int32')

In [35]:
arr.itemsize

4

In [36]:
arr.nbytes

16
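The attributes are related: `nbytes` equals `size * itemsize`. The `int32` shown above is a platform default and may be `int64` on other systems, so the sketch below fixes the dtype explicitly. The array `grid` is made up for illustration.

```python
import numpy as np

# Explicit dtype so the numbers do not depend on the platform default
grid = np.zeros((3, 4), dtype=np.int64)

print(grid.ndim)       # 2
print(grid.shape)      # (3, 4)
print(grid.size)       # 12
print(grid.itemsize)   # 8 (bytes per int64 element)
print(grid.nbytes)     # 96 = 12 elements * 8 bytes

# Converting to a narrower type halves the memory used
small = grid.astype(np.int32)
print(small.nbytes)    # 48
```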

### 3. Array Indexing and Slicing

In [37]:
x1 = np.array([1,3,4,4,6,4])

In [38]:
x1[0]

1

In [39]:
x2 = np.array([[3,2,5,5],[0,1,5,8],[3,0,5,0]])

In [45]:
x2[2,0]

3

In [47]:
x2[2,-2]

5

- Array Slicing

In [48]:
a1 = np.arange(10)
a1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [49]:
a1[:5]

array([0, 1, 2, 3, 4])

In [50]:
a1[4:]

array([4, 5, 6, 7, 8, 9])

In [51]:
a1[4:7]

array([4, 5, 6])

In [52]:
a1[::3]

array([0, 3, 6, 9])

In [57]:
a1[1::2]

array([1, 3, 5, 7, 9])

In [62]:
a1[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [63]:
a1[5::-2] 

array([5, 3, 1])

In [64]:
a2 = np.array([[0,1,2],[3,4,5],[6,7,8]])
a2

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [65]:
a2[:2,:2]

array([[0, 1],
       [3, 4]])

In [69]:
a2[1:,1:]

array([[4, 5],
       [7, 8]])

In [70]:
a2[:3,:2]

array([[0, 1],
       [3, 4],
       [6, 7]])

In [77]:
a2[::-1,]

array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

In [76]:
a2[::-1,::-1]

array([[8, 7, 6],
       [5, 4, 3],
       [2, 1, 0]])

In [78]:
a2_subcopy = a2[::-1,::-1].copy()

In [79]:
a2_subcopy

array([[8, 7, 6],
       [5, 4, 3],
       [2, 1, 0]])
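The `.copy()` call above matters because a basic slice of a NumPy array is a view on the same data, not an independent array. A small sketch of the difference, reusing the `a2` values from above:

```python
import numpy as np

a2 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

view = a2[:2, :2]              # basic slicing returns a view
view[0, 0] = 99                # writing through the view changes a2 as well
changed_original = a2[0, 0]    # 99

a2[0, 0] = 0                   # restore the original value
copy = a2[:2, :2].copy()       # an independent copy
copy[0, 0] = 99                # a2 is not affected this time

shares = (np.shares_memory(a2, view), np.shares_memory(a2, copy))  # (True, False)
```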

### 4. Reshaping and Concatenation

- Reshaping of Arrays

In [82]:
a1 = np.arange(1,10)
a1

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [83]:
reshaped  = a1.reshape((3,3))

In [84]:
reshaped

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [85]:
x = np.array([1,2,3])
x

array([1, 2, 3])

In [87]:
x_rv = x.reshape((1,3)) # row vector
x_rv 

array([[1, 2, 3]])

In [89]:
x_cv = x.reshape((3,1)) # column vector
x_cv

array([[1],
       [2],
       [3]])
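When one dimension can be inferred from the others, `reshape` accepts `-1` for it, and `np.newaxis` offers another way to build row and column vectors. A brief sketch with a made-up six-element array:

```python
import numpy as np

x = np.arange(1, 7)              # array([1, 2, 3, 4, 5, 6])

two_rows = x.reshape(2, -1)      # -1 is inferred as 3, giving shape (2, 3)
column = x.reshape(-1, 1)        # shape (6, 1)
row = x[np.newaxis, :]           # shape (1, 6)
also_column = x[:, np.newaxis]   # shape (6, 1)
```

The total number of elements must stay the same; a call such as `x.reshape(4, 2)` raises a `ValueError` here because 6 elements cannot fill 8 slots.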

- Concatenation and Splitting of Arrays

In [90]:
x = np.array([1,2,3])
y = np.array([3,2,1])
z = [11,11,11]

np.concatenate([x,y,z])

array([ 1,  2,  3,  3,  2,  1, 11, 11, 11])

In [91]:
grid = np.array([[1,2,3], [4,5,6]])
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [92]:
x = np.array([3,4,5])
grid = np.array([[1,2,3],[9,10,11]])

np.vstack([x, grid]) # vertically stack the arrays

array([[ 3,  4,  5],
       [ 1,  2,  3],
       [ 9, 10, 11]])

In [94]:
z = np.array([[19],[19]])
np.hstack([grid, z]) # horizontally stack the arrays

array([[ 1,  2,  3, 19],
       [ 9, 10, 11, 19]])

- Splitting

In [95]:
x = np.arange(10)

In [96]:
x1, x2, x3 = np.split(x, [3,6])

In [97]:
x1

array([0, 1, 2])

In [98]:
x2

array([3, 4, 5])

In [99]:
x3

array([6, 7, 8, 9])

In [101]:
grid = np.arange(16).reshape((4,4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [102]:
upper, lower = np.vsplit(grid, [2]) # vertical split

In [103]:
upper

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [104]:
lower

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [105]:
left, right = np.hsplit(grid, [2])

In [106]:
left

array([[ 0,  1],
       [ 4,  5],
       [ 8,  9],
       [12, 13]])

In [107]:
right

array([[ 2,  3],
       [ 6,  7],
       [10, 11],
       [14, 15]])
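Splitting and stacking are inverse operations, which gives a quick sanity check. The sketch below also shows passing an integer instead of a list of split points; `np.array_split` is used for the uneven case as an illustration, it does not appear in the original notebook.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

upper, lower = np.vsplit(grid, [2])
left, right = np.hsplit(grid, [2])

# Stacking the pieces back together recovers the original grid
rebuilt_rows = np.vstack([upper, lower])
rebuilt_cols = np.hstack([left, right])

# An integer asks for that many equal pieces
halves = np.split(np.arange(10), 2)          # two arrays of length 5
# array_split tolerates a remainder
thirds = np.array_split(np.arange(10), 3)    # lengths 4, 3 and 3
```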

### 5. Numpy Arithmetic and Statistics

#### Computation on Numpy Arrays

- Mathematical functions

In [110]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [109]:
np.add(x,5)

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [111]:
np.subtract(x, 5)

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

In [112]:
np.multiply(x, 5)

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])

In [113]:
np.divide(x, 5)

array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8])

In [114]:
np.power(x,2)

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32)

In [115]:
np.mod(x,2)

array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype=int32)

In [117]:
# trigonometric functions
theta = np.linspace(0, np.pi, 4)
theta

array([0.        , 1.04719755, 2.0943951 , 3.14159265])

In [118]:
np.sin(theta)

array([0.00000000e+00, 8.66025404e-01, 8.66025404e-01, 1.22464680e-16])

In [119]:
np.cos(theta)

array([ 1. ,  0.5, -0.5, -1. ])

In [120]:
np.tan(theta)

array([ 0.00000000e+00,  1.73205081e+00, -1.73205081e+00, -1.22464680e-16])

In [121]:
# exponential and logarithmic
x = [1,2,3]
x

[1, 2, 3]

In [122]:
np.exp(x)

array([ 2.71828183,  7.3890561 , 20.08553692])

In [123]:
np.exp2(x)

array([2., 4., 8.])

In [124]:
np.power(3,x)

array([ 3,  9, 27], dtype=int32)

In [125]:
np.log(x)

array([0.        , 0.69314718, 1.09861229])

In [126]:
np.log2(x)

array([0.       , 1.       , 1.5849625])

In [127]:
np.log10(x)

array([0.        , 0.30103   , 0.47712125])

- Universal Function (ufunc) Methods

In [128]:
# Calling the reduce method

x = np.arange(1,6)
sum_all = np.add.reduce(x)
x

array([1, 2, 3, 4, 5])

In [129]:
sum_all

15

In [130]:
# Calling the accumulate method

x = np.arange(1, 6)
sum_acc = np.add.accumulate(x)
x

array([1, 2, 3, 4, 5])

In [131]:
sum_acc

array([ 1,  3,  6, 10, 15], dtype=int32)
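`reduce` and `accumulate` are available on other binary ufuncs too, and they correspond to familiar aggregate functions. A short sketch of those equivalences; `outer` is included as a further ufunc method not shown above.

```python
import numpy as np

x = np.arange(1, 6)                           # array([1, 2, 3, 4, 5])

total = np.add.reduce(x)                      # 15, same as x.sum()
running = np.add.accumulate(x)                # same as np.cumsum(x)
product = np.multiply.reduce(x)               # 120, same as x.prod()
running_product = np.multiply.accumulate(x)   # same as np.cumprod(x)
table = np.multiply.outer(x, x)               # 5x5 multiplication table
```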

- Aggregation

In [133]:
x = np.random.random(100)
x

array([0.89471125, 0.0657693 , 0.91248548, 0.14457525, 0.19292728,
       0.34024681, 0.35205705, 0.57258303, 0.48464835, 0.01423418,
       0.38055698, 0.31257249, 0.18186752, 0.19951357, 0.99402595,
       0.74008555, 0.92756274, 0.73823772, 0.176844  , 0.86710231,
       0.35475982, 0.17282542, 0.00455679, 0.56005884, 0.63432314,
       0.88439103, 0.84654374, 0.98727242, 0.88376328, 0.42814916,
       0.41461516, 0.87539437, 0.67965076, 0.52317947, 0.41490397,
       0.06571871, 0.9863163 , 0.24098464, 0.33994013, 0.8291768 ,
       0.33378844, 0.55969569, 0.68087119, 0.10785153, 0.23575741,
       0.33310979, 0.84659849, 0.85763505, 0.30742126, 0.24203615,
       0.88965113, 0.93470026, 0.9283705 , 0.30955147, 0.87364746,
       0.44830455, 0.4076283 , 0.79200008, 0.86348504, 0.52444444,
       0.83499123, 0.92556441, 0.77154419, 0.85109436, 0.00958409,
       0.0512138 , 0.51824268, 0.96270201, 0.74807518, 0.01018514,
       0.41712431, 0.25742808, 0.87873799, 0.99013723, 0.30393

In [134]:
np.sum(x) # sum of all values

51.61916208161917

In [135]:
np.mean(x) # mean value

0.5161916208161917

In [136]:
x.sum() # or this way 

51.61916208161917

In [137]:
x.mean()

0.5161916208161917

In [138]:
x.max()

0.9972663508580849

In [139]:
x.min()

0.004556788634932585

In [161]:
grid = np.random.random((3,4))
grid

array([[0.04832776, 0.12082794, 0.24289419, 0.58899404],
       [0.46870267, 0.68293512, 0.59763599, 0.740813  ],
       [0.78061686, 0.50879808, 0.92457353, 0.70353401]])

In [141]:
grid.sum()

5.2526579790452885

In [142]:
grid.min()

0.0759996438585111

In [143]:
np.amin(grid, axis=0)

array([0.37883263, 0.14317914, 0.07599964, 0.09077294])

In [144]:
np.amin(grid, axis=1)

array([0.5330446 , 0.07599964, 0.14317914])
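Because the grid above is random, the effect of `axis` is easier to follow on fixed values. In this sketch (the array is made up), `axis=0` collapses the rows and yields one result per column, while `axis=1` yields one result per row.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

col_min = grid.min(axis=0)     # [1, 2, 3, 4]      one value per column
row_min = grid.min(axis=1)     # [1, 5, 9]         one value per row
col_sum = grid.sum(axis=0)     # [15, 18, 21, 24]
row_mean = grid.mean(axis=1)   # [2.5, 6.5, 10.5]
```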

### Comparisons and Boolean Masks

#### Comparisons

In [146]:
x = np.array([1,2,3,4,5])
x

array([1, 2, 3, 4, 5])

In [147]:
x<2

array([ True, False, False, False, False])

In [148]:
x>=4

array([False, False, False,  True,  True])

In [149]:
(2*x)==(x**2)

array([False,  True, False, False, False])

In [150]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [151]:
np.count_nonzero(x<6)

6

In [152]:
np.any(x>8)

True

In [155]:
np.all(x<10)

True

In [156]:
x = np.random.randint(0, 10, (3,3))
x

array([[9, 9, 6],
       [5, 0, 9],
       [6, 6, 8]])

In [157]:
x<6

array([[False, False, False],
       [ True,  True, False],
       [False, False, False]])

In [158]:
x[x<6]

array([5, 0])

In [187]:
x

array([[9, 9, 6],
       [5, 0, 9],
       [6, 6, 8]])
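Masks can be combined with `&` (and), `|` (or) and `~` (not), and they can also be used on the left-hand side of an assignment. A sketch using the same 3x3 values as above; each comparison needs its own parentheses because of operator precedence.

```python
import numpy as np

x = np.array([[9, 9, 6],
              [5, 0, 9],
              [6, 6, 8]])

between = x[(x > 4) & (x < 9)]      # [6, 5, 6, 6, 8]
extremes = x[(x == 0) | (x == 9)]   # [9, 9, 0, 9]
not_small = x[~(x < 6)]             # [9, 9, 6, 9, 6, 6, 8]

# Assigning through a mask changes only the selected elements
clipped = x.copy()
clipped[clipped < 6] = 6

# Summing a mask counts the True values, here per row
per_row = (x < 6).sum(axis=1)       # [0, 2, 0]
```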


## Pandas

### 1. Pandas Series from Lists and Arrays

- A Pandas Series is a 1D array of indexed data, essentially a single column.
- It can be created from a list or an array using ```pd.Series()```.

In [188]:
import pandas as pd

In [190]:
series = pd.Series([0,1,2,5])
series

0    0
1    1
2    2
3    5
dtype: int64

In [191]:
series.values

array([0, 1, 2, 5], dtype=int64)

In [192]:
series.index

RangeIndex(start=0, stop=4, step=1)

- Pandas Series are much more general and flexible than 1D Numpy arrays.
- The essential difference is the presence of the index.
- A Numpy array has an implicitly defined integer index.
- A Pandas Series has an explicitly defined index, which need not consist of integers.

In [193]:
data = pd.Series([12,24,13,54], index=['a','b','c','d'])

In [194]:
data

a    12
b    24
c    13
d    54
dtype: int64
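An explicit index makes label-based access possible, and arithmetic between two Series lines up values by label rather than by position. A minimal sketch; the second Series `other` is made up for illustration.

```python
import pandas as pd

data = pd.Series([12, 24, 13, 54], index=['a', 'b', 'c', 'd'])

by_label = data['b']           # 24, looked up by label
by_position = data.iloc[1]     # 24, looked up by position
subset = data[['a', 'd']]      # a list of labels returns a Series

# Arithmetic aligns on labels, not on order
other = pd.Series([1, 2], index=['d', 'a'])
total = data + other           # a: 14.0, d: 55.0, b and c: NaN
```

Labels present in only one of the two Series produce `NaN` in the result, which is why `total` has a float dtype.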

### 2. Pandas Series from Dictionaries

In [195]:
fruits_dict = {'apples':10,
               'oranges':8,
               'bananas':3,
               'strawberries':20}

fruits_dict

{'apples': 10, 'oranges': 8, 'bananas': 3, 'strawberries': 20}

In [196]:
fruits = pd.Series(fruits_dict)
fruits

apples          10
oranges          8
bananas          3
strawberries    20
dtype: int64

In [199]:
fruits['bananas':'strawberries']

bananas          3
strawberries    20
dtype: int64
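Note that the label slice above includes its end point `'strawberries'`. Slicing by position with `.iloc` follows the usual Python rule and excludes the end. A sketch of the difference on the same fruit data:

```python
import pandas as pd

fruits = pd.Series({'apples': 10, 'oranges': 8, 'bananas': 3, 'strawberries': 20})

by_label = fruits.loc['oranges':'bananas']   # end label included: oranges, bananas
by_position = fruits.iloc[1:2]               # end position excluded: oranges only
```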

### 3. The DataFrame Object

- While a Series is essentially a column, a DataFrame is a two-dimensional table made up of a collection of Series.
- DataFrames allow us to store and manipulate tabular data.
- **Rows consist of observations and columns represent variables**.

#### DataFrame from a Series Object

In [201]:
data_s1 = pd.Series([12,24,33,15], index=['apples','bananas','strawberries','oranges'])
data_s1

apples          12
bananas         24
strawberries    33
oranges         15
dtype: int64

In [202]:
dataFrame1 = pd.DataFrame(data_s1, columns=['quantity'])
dataFrame1

,quantity
apples,12
bananas,24
strawberries,33
oranges,15


#### Constructing a DataFrame from a Dictionary

In [203]:
# named `countries` rather than `dict`, which would shadow the built-in type
countries = {"country":["Norway", "Sweden", "Spain", "France"],
             "capital":["Oslo", "Stockholm", "Madrid", "Paris"],
             "SomeColumn":["100", "200", "300", "400"]}
countries

{'country': ['Norway', 'Sweden', 'Spain', 'France'],
 'capital': ['Oslo', 'Stockholm', 'Madrid', 'Paris'],
 'SomeColumn': ['100', '200', '300', '400']}

In [205]:
data = pd.DataFrame(countries)
data

,country,capital,SomeColumn
0,Norway,Oslo,100
1,Sweden,Stockholm,200
2,Spain,Madrid,300
3,France,Paris,400


In [208]:
quantity = pd.Series([12,24,33,15],
                    index=['apples', 'bananas', 'strawberries', 'oranges'])
price = pd.Series([4,4.5,8,7.5],
                 index=['apples', 'bananas', 'strawberries', 'oranges'])
df = pd.DataFrame({'quantity':quantity,
                   'price':price})
df

,quantity,price
apples,12,4.0
bananas,24,4.5
strawberries,33,8.0
oranges,15,7.5
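In the example above both Series share the same index. When they do not, the DataFrame constructor aligns them by label and fills the gaps with `NaN`. A sketch with deliberately mismatched, made-up data:

```python
import pandas as pd

quantity = pd.Series([12, 24, 33], index=['apples', 'bananas', 'strawberries'])
price = pd.Series([4, 4.5, 7.5], index=['apples', 'bananas', 'oranges'])

# The resulting index is the union of both indexes
df = pd.DataFrame({'quantity': quantity, 'price': price})

# Column arithmetic is element-wise; rows with a missing value stay NaN
df['total'] = df['quantity'] * df['price']
```

Here `oranges` has no quantity and `strawberries` has no price, so `total` is only defined for `apples` (48.0) and `bananas` (108.0).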


#### Constructing a DataFrame by Importing Data From a File

In [210]:
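# data1.csv is semicolon-delimited, so the default sep=',' reads each row
# into a single column, as the output below shows; passing sep=';' would split the fields
df = pd.read_csv('data1.csv')
df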

,"ICO;""year"";""def"";""class"";""acid.test"";""debt.ratio"";""asset.turn"";""returns"""
0,10559655;2008;0;0;2.83499361430396;0.076154063...
1,10559655;2009;0;0;4.53730056058646;0.075103170...
2,10559655;2010;0;0;5.79582183681514;0.082956868...
3,10559655;2011;0;0;5.65494106980961;0.090329008...
4,10559655;2012;0;0;3.1528676389955;0.1222033522...
5,10559655;2013;0;0;2.26041379310345;0.133867456...
6,112402;2008;0;0;1.50984346489964;0.35932796120...
7,112402;2009;0;0;0.568969527797162;0.3822041081...


### 4. Pandas DataFrame Operations - Read, View and Extract Information

#### Learning Pandas with IMDB - Movies Dataset

In [214]:
movies_df = pd.read_csv("IMDB-Movie-Data.csv")
# 1.Set the index at load time
# movies_df_title_indexed = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")
# 2.Set index after DF has been created
movies_df_title_indexed = movies_df.set_index("Title")

In [215]:
movies_df.head()

,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
3,4,Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
4,5,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0


In [216]:
movies_df.head(2)

,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0


In [217]:
movies_df_title_indexed.head()

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0


In [218]:
movies_df_title_indexed.head(2)

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0


In [219]:
movies_df_title_indexed.tail(2)

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Search Party,999,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881,,22.0
Nine Lives,1000,"Comedy,Family,Fantasy",A stuffy businessman finds himself trapped ins...,Barry Sonnenfeld,"Kevin Spacey, Jennifer Garner, Robbie Amell,Ch...",2016,87,5.3,12435,19.64,11.0


In [220]:
movies_df_title_indexed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, Guardians of the Galaxy to Nine Lives
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Genre               1000 non-null   object 
 2   Description         1000 non-null   object 
 3   Director            1000 non-null   object 
 4   Actors              1000 non-null   object 
 5   Year                1000 non-null   int64  
 6   Runtime (Minutes)   1000 non-null   int64  
 7   Rating              1000 non-null   float64
 8   Votes               1000 non-null   int64  
 9   Revenue (Millions)  872 non-null    float64
 10  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(4)
memory usage: 93.8+ KB


In [221]:
movies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Title               1000 non-null   object 
 2   Genre               1000 non-null   object 
 3   Description         1000 non-null   object 
 4   Director            1000 non-null   object 
 5   Actors              1000 non-null   object 
 6   Year                1000 non-null   int64  
 7   Runtime (Minutes)   1000 non-null   int64  
 8   Rating              1000 non-null   float64
 9   Votes               1000 non-null   int64  
 10  Revenue (Millions)  872 non-null    float64
 11  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB


In [223]:
movies_df.shape

(1000, 12)

In [224]:
movies_df_title_indexed.shape

(1000, 11)

In [225]:
movies_df.describe()

,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
count,1000.0,1000.0,1000.0,1000.0,1000.0,872.0,936.0
mean,500.5,2012.783,113.172,6.7232,169808.3,82.956376,58.985043
std,288.819436,3.205962,18.810908,0.945429,188762.6,103.25354,17.194757
min,1.0,2006.0,66.0,1.9,61.0,0.0,11.0
25%,250.75,2010.0,100.0,6.2,36309.0,13.27,47.0
50%,500.5,2014.0,111.0,6.8,110799.0,47.985,59.5
75%,750.25,2016.0,123.0,7.4,239909.8,113.715,72.0
max,1000.0,2016.0,191.0,9.0,1791916.0,936.63,100.0


In [226]:
movies_df_title_indexed.describe()

,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
count,1000.0,1000.0,1000.0,1000.0,1000.0,872.0,936.0
mean,500.5,2012.783,113.172,6.7232,169808.3,82.956376,58.985043
std,288.819436,3.205962,18.810908,0.945429,188762.6,103.25354,17.194757
min,1.0,2006.0,66.0,1.9,61.0,0.0,11.0
25%,250.75,2010.0,100.0,6.2,36309.0,13.27,47.0
50%,500.5,2014.0,111.0,6.8,110799.0,47.985,59.5
75%,750.25,2016.0,123.0,7.4,239909.8,113.715,72.0
max,1000.0,2016.0,191.0,9.0,1791916.0,936.63,100.0


In [227]:
genre_col = movies_df['Genre']
genre_col

0       Action,Adventure,Sci-Fi
1      Adventure,Mystery,Sci-Fi
2               Horror,Thriller
3       Animation,Comedy,Family
4      Action,Adventure,Fantasy
                 ...           
995         Crime,Drama,Mystery
996                      Horror
997         Drama,Music,Romance
998            Adventure,Comedy
999       Comedy,Family,Fantasy
Name: Genre, Length: 1000, dtype: object

In [228]:
col_as_series = movies_df['Genre']
col_as_series.head()

0     Action,Adventure,Sci-Fi
1    Adventure,Mystery,Sci-Fi
2             Horror,Thriller
3     Animation,Comedy,Family
4    Action,Adventure,Fantasy
Name: Genre, dtype: object

In [229]:
col_as_df = movies_df[['Genre']]
col_as_df

,Genre
0,"Action,Adventure,Sci-Fi"
1,"Adventure,Mystery,Sci-Fi"
2,"Horror,Thriller"
3,"Animation,Comedy,Family"
4,"Action,Adventure,Fantasy"
...,...
995,"Crime,Drama,Mystery"
996,Horror
997,"Drama,Music,Romance"
998,"Adventure,Comedy"


In [230]:
extracted_cols = movies_df_title_indexed[['Genre','Rating','Revenue (Millions)']]
extracted_cols.head()

Title,Genre,Rating,Revenue (Millions)
Guardians of the Galaxy,"Action,Adventure,Sci-Fi",8.1,333.13
Prometheus,"Adventure,Mystery,Sci-Fi",7.0,126.46
Split,"Horror,Thriller",7.3,138.12
Sing,"Animation,Comedy,Family",7.2,270.32
Suicide Squad,"Action,Adventure,Fantasy",6.2,325.02


In [236]:
gog = movies_df_title_indexed.loc["Guardians of the Galaxy"]
gog

Rank                                                                  1
Genre                                           Action,Adventure,Sci-Fi
Description           A group of intergalactic criminals are forced ...
Director                                                     James Gunn
Actors                Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...
Year                                                               2014
Runtime (Minutes)                                                   121
Rating                                                              8.1
Votes                                                            757074
Revenue (Millions)                                               333.13
Metascore                                                          76.0
Name: Guardians of the Galaxy, dtype: object

In [242]:
movies_df_title_indexed.iloc[0]

Rank                                                                  1
Genre                                           Action,Adventure,Sci-Fi
Description           A group of intergalactic criminals are forced ...
Director                                                     James Gunn
Actors                Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...
Year                                                               2014
Runtime (Minutes)                                                   121
Rating                                                              8.1
Votes                                                            757074
Revenue (Millions)                                               333.13
Metascore                                                          76.0
Name: Guardians of the Galaxy, dtype: object

In [240]:
movies_df_title_indexed.loc["Guardians of the Galaxy":"Sing"]

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0


In [241]:
movies_df_title_indexed.iloc[0:4]

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0


In [243]:
movies_df_title_indexed.loc[:'Sing', :'Director']

Title,Rank,Genre,Description,Director
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet


In [250]:
movies_df_title_indexed.iloc[:2,:3]

Title,Rank,Genre,Description
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te..."
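The cells above pair each `.loc` call with an `.iloc` equivalent. The key difference is that `.loc` selects by label and includes the end of a slice, while `.iloc` selects by integer position and excludes it. A sketch on a small stand-in table; the titles, years and ratings are copied from the `head()` output above, and the IMDB file itself is not read.

```python
import pandas as pd

# Small stand-in for movies_df_title_indexed
movies = pd.DataFrame({
    'Title': ['Guardians of the Galaxy', 'Prometheus', 'Split', 'Sing'],
    'Year': [2014, 2012, 2016, 2016],
    'Rating': [8.1, 7.0, 7.3, 7.2],
}).set_index('Title')

by_label = movies.loc['Prometheus':'Sing']   # 3 rows: the end label is included
by_position = movies.iloc[1:3]               # 2 rows: the end position is excluded

cell = movies.loc['Split', 'Rating']         # 7.3, row label and column label
same_cell = movies.iloc[2, 1]                # 7.3, row position and column position
```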


In [255]:
movies_df_title_indexed[movies_df_title_indexed['Year']==2016].shape

(297, 11)

In [257]:
movies_df_title_indexed[movies_df_title_indexed['Rating']>8.0].shape

(59, 11)
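The two filters above each use a single condition. Conditions can be combined with `&` and `|`, each wrapped in parentheses, and `isin` or `query` are alternatives for some cases. A sketch on a small stand-in table with values copied from the `head()` output above:

```python
import pandas as pd

movies = pd.DataFrame({
    'Title': ['Guardians of the Galaxy', 'Prometheus', 'Split', 'Sing', 'Suicide Squad'],
    'Year': [2014, 2012, 2016, 2016, 2016],
    'Rating': [8.1, 7.0, 7.3, 7.2, 6.2],
}).set_index('Title')

# Both conditions must hold
recent_and_good = movies[(movies['Year'] == 2016) & (movies['Rating'] > 7.0)]

# Either condition may hold
old_or_great = movies[(movies['Year'] < 2014) | (movies['Rating'] > 8.0)]

# Membership in a list of values
selected_years = movies[movies['Year'].isin([2012, 2014])]

# The first filter again, written as a query string
same_filter = movies.query('Year == 2016 and Rating > 7.0')
```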

In [259]:
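# numeric_only=True restricts the sum to numeric columns; recent pandas versions would otherwise also concatenate the text columns
movies_df.groupby('Director').sum(numeric_only=True)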

Director,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Aamir Khan,992,2007,165,8.5,102697,1.20,42.0
Abdellatif Kechiche,312,2013,180,7.8,103150,2.20,88.0
Adam Leon,784,2016,82,6.5,1031,0.00,77.0
Adam McKay,1910,8039,443,28.0,806827,438.14,262.0
Adam Shankman,1460,4019,240,12.6,167467,157.33,128.0
...,...,...,...,...,...,...,...
Xavier Dolan,1588,4030,236,15.1,44218,3.49,122.0
Yimou Zhang,6,2016,103,6.1,56036,45.13,42.0
Yorgos Lanthimos,479,4024,213,14.4,172259,8.81,155.0
Zack Snyder,904,10055,683,35.2,2301544,975.74,240.0


In [260]:
movies_df.groupby('Director')[['Rating']].mean()

Director,Rating
Aamir Khan,8.50
Abdellatif Kechiche,7.80
Adam Leon,6.50
Adam McKay,7.00
Adam Shankman,6.30
...,...
Xavier Dolan,7.55
Yimou Zhang,6.10
Yorgos Lanthimos,7.20
Zack Snyder,7.04


In [262]:
movies_df.groupby('Director')[['Revenue (Millions)']].sum().sort_values(['Revenue (Millions)'], ascending=False).head()

Director,Revenue (Millions)
J.J. Abrams,1683.45
David Yates,1630.51
Christopher Nolan,1515.09
Michael Bay,1421.32
Francis Lawrence,1299.81
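Several aggregates per group can be computed in one pass with named aggregation in `agg`. This is a sketch on made-up data: the director names are placeholders, and the column name `Revenue` is shortened from the dataset's `Revenue (Millions)`.

```python
import pandas as pd

movies = pd.DataFrame({
    'Director': ['A', 'A', 'B', 'B', 'C'],        # placeholder director names
    'Rating': [7.0, 8.0, 6.0, 6.5, 9.0],
    'Revenue': [100.0, 50.0, 20.0, None, 10.0],   # one missing value
})

summary = (
    movies.groupby('Director')
          .agg(films=('Rating', 'count'),          # number of rated films
               mean_rating=('Rating', 'mean'),
               total_revenue=('Revenue', 'sum'))   # missing values are skipped
          .sort_values('total_revenue', ascending=False)
)
```

Each keyword becomes a column in the result, so the output needs no renaming afterwards.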


In [264]:
movies_df_title_indexed.isnull().head()

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,False,False,False,False,False,False,False,False,False,False,False
Prometheus,False,False,False,False,False,False,False,False,False,False,False
Split,False,False,False,False,False,False,False,False,False,False,False
Sing,False,False,False,False,False,False,False,False,False,False,False
Suicide Squad,False,False,False,False,False,False,False,False,False,False,False


In [265]:
movies_df_title_indexed.isnull().sum()

Rank                    0
Genre                   0
Description             0
Director                0
Actors                  0
Year                    0
Runtime (Minutes)       0
Rating                  0
Votes                   0
Revenue (Millions)    128
Metascore              64
dtype: int64
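
With 1000 rows, the counts above correspond to 12.8% missing revenue and 6.4% missing Metascore. A small sketch of getting the share directly: `isnull()` yields booleans, so `mean()` gives the fraction of missing values per column. The `toy_df` frame below is a made-up stand-in.

```python
import numpy as np
import pandas as pd

# Toy stand-in with a few gaps (hypothetical values).
toy_df = pd.DataFrame({
    "Rating": [8.1, 7.0, 7.3, 6.2],
    "Revenue (Millions)": [333.13, np.nan, 138.12, np.nan],
    "Metascore": [76.0, 65.0, np.nan, 40.0],
})

# True counts as 1 and False as 0, so the mean is the missing fraction.
missing_share = toy_df.isnull().mean()

print(missing_share)
```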

In [267]:
# drop all rows with missing data
movies_df_title_indexed.dropna()

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0
...,...,...,...,...,...,...,...,...,...,...,...
Resident Evil: Afterlife,994,"Action,Adventure,Horror",While still out to destroy the evil Umbrella C...,Paul W.S. Anderson,"Milla Jovovich, Ali Larter, Wentworth Miller,K...",2010,97,5.9,140900,60.13,37.0
Project X,995,Comedy,3 high school seniors throw a birthday party t...,Nima Nourizadeh,"Thomas Mann, Oliver Cooper, Jonathan Daniel Br...",2012,88,6.7,164088,54.72,48.0
Hostel: Part II,997,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152,17.54,46.0
Step Up 2: The Streets,998,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699,58.01,50.0


In [268]:
# drop all columns with missing data
movies_df_title_indexed.dropna(axis=1)

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727
...,...,...,...,...,...,...,...,...,...
Secret in Their Eyes,996,"Crime,Drama,Mystery","A tight-knit team of rising investigators, alo...",Billy Ray,"Chiwetel Ejiofor, Nicole Kidman, Julia Roberts...",2015,111,6.2,27585
Hostel: Part II,997,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152
Step Up 2: The Streets,998,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699
Search Party,999,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881
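
Both `dropna` calls above return a new DataFrame and leave the original untouched. Dropping every row or column with any gap can discard a lot of data, so here is a hedged sketch of two narrower options, `subset` and `thresh`, on a made-up `toy_df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in indexed by title (hypothetical values).
toy_df = pd.DataFrame(
    {
        "Rating": [8.1, 7.0, 7.3, 6.0],
        "Revenue (Millions)": [333.13, np.nan, 138.12, np.nan],
        "Metascore": [76.0, 65.0, np.nan, np.nan],
    },
    index=pd.Index(["M1", "M2", "M3", "M4"], name="Title"),
)

# Drop rows only when the revenue itself is missing.
has_revenue = toy_df.dropna(subset=["Revenue (Millions)"])

# Keep rows that have at least two non-missing values.
at_least_two = toy_df.dropna(thresh=2)

print(has_revenue.index.tolist())
print(at_least_two.index.tolist())
```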


In [274]:
revenue = movies_df_title_indexed['Revenue (Millions)']
revenue_mean = revenue.mean()
print(revenue)
print(revenue_mean)

Title
Guardians of the Galaxy    333.13
Prometheus                 126.46
Split                      138.12
Sing                       270.32
Suicide Squad              325.02
                            ...  
Secret in Their Eyes          NaN
Hostel: Part II             17.54
Step Up 2: The Streets      58.01
Search Party                  NaN
Nine Lives                  19.64
Name: Revenue (Millions), Length: 1000, dtype: float64
82.95637614678898


In [275]:
revenue = revenue.fillna(revenue_mean)
movies_df_title_indexed['Revenue (Millions)'] = revenue  # write the filled column back explicitly

In [276]:
revenue

Title
Guardians of the Galaxy    333.130000
Prometheus                 126.460000
Split                      138.120000
Sing                       270.320000
Suicide Squad              325.020000
                              ...    
Secret in Their Eyes        82.956376
Hostel: Part II             17.540000
Step Up 2: The Streets      58.010000
Search Party                82.956376
Nine Lives                  19.640000
Name: Revenue (Millions), Length: 1000, dtype: float64
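
A small worked example of the same mean imputation on made-up data. `mean()` skips `NaN` by default, so the fill value is the mean of the observed revenues only. Whether a fill applied to a Series taken out of a DataFrame also shows up in the DataFrame can depend on the pandas version and its copy-on-write setting, so this sketch assigns the result back to the column explicitly.

```python
import numpy as np
import pandas as pd

# Toy stand-in with one missing revenue (hypothetical values).
toy_df = pd.DataFrame(
    {"Revenue (Millions)": [100.0, np.nan, 50.0]},
    index=pd.Index(["M1", "M2", "M3"], name="Title"),
)

# NaN is ignored here: (100 + 50) / 2
revenue_mean = toy_df["Revenue (Millions)"].mean()

# Assign the filled Series back so the DataFrame is updated for certain.
toy_df["Revenue (Millions)"] = toy_df["Revenue (Millions)"].fillna(revenue_mean)

print(toy_df)
```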

In [278]:
# revenue (in millions) per minute of runtime
movies_df_title_indexed['Revenue per Min'] = movies_df_title_indexed['Revenue (Millions)'] / movies_df_title_indexed['Runtime (Minutes)']

In [280]:
movies_df_title_indexed

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,Revenue per Min
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.130000,76.0,2.753140
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.460000,65.0,1.019839
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.120000,62.0,1.180513
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.320000,59.0,2.502963
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.020000,40.0,2.642439
...,...,...,...,...,...,...,...,...,...,...,...,...
Secret in Their Eyes,996,"Crime,Drama,Mystery","A tight-knit team of rising investigators, alo...",Billy Ray,"Chiwetel Ejiofor, Nicole Kidman, Julia Roberts...",2015,111,6.2,27585,82.956376,45.0,0.747355
Hostel: Part II,997,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152,17.540000,46.0,0.186596
Step Up 2: The Streets,998,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699,58.010000,50.0,0.591939
Search Party,999,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881,82.956376,22.0,0.892004


In [281]:
# pivot table: total revenue per director (rows) and year (columns)
movies_df_title_indexed.pivot_table('Revenue (Millions)', index='Director', aggfunc='sum', columns='Year')

Director \ Year,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Aamir Khan,,1.20,,,,,,,,,
Abdellatif Kechiche,,,,,,,,2.20,,,
Adam Leon,,,,,,,,,,,82.956376
Adam McKay,148.21,,100.47,,119.22,,,,,70.24,
Adam Shankman,,118.82,,,,,38.51,,,,
...,...,...,...,...,...,...,...,...,...,...,...
Xavier Dolan,,,,,,,,,3.49,,82.956376
Yimou Zhang,,,,,,,,,,,45.130000
Yorgos Lanthimos,,,,0.11,,,,,,8.70,
Zack Snyder,210.59,,,107.50,,36.38,,291.02,,,330.250000
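
The empty cells in the pivot table are `NaN`: that director has no movie in that year. As a sketch on made-up data, the same table can be built with `groupby` plus `unstack`, and `fill_value` can replace the gaps if zeros are preferred. `toy_df` is a hypothetical stand-in.

```python
import pandas as pd

# Toy stand-in (hypothetical values).
toy_df = pd.DataFrame({
    "Director": ["A", "A", "A", "B"],
    "Year": [2015, 2015, 2016, 2016],
    "Revenue (Millions)": [10.0, 20.0, 5.0, 7.0],
})

# Directors as rows, years as columns, summed revenue in the cells.
pivot = toy_df.pivot_table(
    values="Revenue (Millions)", index="Director", columns="Year", aggfunc="sum"
)

# Same numbers via groupby on both keys, then moving Year into the columns.
via_groupby = (
    toy_df.groupby(["Director", "Year"])["Revenue (Millions)"].sum().unstack("Year")
)

# Replace the "no movie that year" gaps with 0 instead of NaN.
filled = toy_df.pivot_table(
    values="Revenue (Millions)",
    index="Director",
    columns="Year",
    aggfunc="sum",
    fill_value=0,
)

print(pivot)
print(filled)
```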


- **Applying Functions**

In [282]:
# put movies in a bucket based on their Rating
def rating_bucket(x):
    if x >= 8.0:
        return "great"
    elif x >= 7.0:
        return "good"
    elif x >= 6.0:
        return "average"
    else:
        return "bad"

movies_df_title_indexed["Rating_Category"] = movies_df_title_indexed["Rating"].apply(rating_bucket)

movies_df_title_indexed.head(10)[["Rating", "Rating_Category"]]

Title,Rating,Rating_Category
Guardians of the Galaxy,8.1,great
Prometheus,7.0,good
Split,7.3,good
Sing,7.2,good
Suicide Squad,6.2,average
The Great Wall,6.1,average
La La Land,8.3,great
Mindhorn,6.4,average
The Lost City of Z,7.1,good
Passengers,7.0,good
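
`apply` calls the Python function once per value. For simple threshold buckets like these, `pd.cut` is a possible vectorized alternative. The sketch below uses made-up ratings and checks that it matches `rating_bucket`. Bins are closed on the left (`right=False`) so that 7.0 counts as "good" and 8.0 as "great", matching the `>=` comparisons.

```python
import numpy as np
import pandas as pd

def rating_bucket(x):
    # Same thresholds as the function above.
    if x >= 8.0:
        return "great"
    elif x >= 7.0:
        return "good"
    elif x >= 6.0:
        return "average"
    else:
        return "bad"

# Hypothetical ratings, including values that sit exactly on a boundary.
ratings = pd.Series([8.1, 7.0, 6.2, 5.9, 8.0])

# Intervals: [-inf, 6), [6, 7), [7, 8), [8, inf)
buckets = pd.cut(
    ratings,
    bins=[-np.inf, 6.0, 7.0, 8.0, np.inf],
    labels=["bad", "average", "good", "great"],
    right=False,
)

via_apply = ratings.apply(rating_bucket)
print(buckets.tolist())
```

One difference to keep in mind: a missing rating falls through to "bad" in `rating_bucket` (every comparison with `NaN` is false), while `pd.cut` leaves it as `NaN`.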


In [289]:
movies_df.iloc[0]

Rank                                                                  1
Title                                           Guardians of the Galaxy
Genre                                           Action,Adventure,Sci-Fi
Description           A group of intergalactic criminals are forced ...
Director                                                     James Gunn
Actors                Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...
Year                                                               2014
Runtime (Minutes)                                                   121
Rating                                                              8.1
Votes                                                            757074
Revenue (Millions)                                               333.13
Metascore                                                          76.0
Name: 0, dtype: object

In [311]:
top = movies_df[movies_df['Votes']>100000]
top = top[top['Rating']>8.5]

In [312]:
top

,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
36,37,Interstellar,"Adventure,Drama,Sci-Fi",A team of explorers travel through a wormhole ...,Christopher Nolan,"Matthew McConaughey, Anne Hathaway, Jessica Ch...",2014,169,8.6,1047747,187.99,74.0
54,55,The Dark Knight,"Action,Crime,Drama",When the menace known as the Joker wreaks havo...,Christopher Nolan,"Christian Bale, Heath Ledger, Aaron Eckhart,Mi...",2008,152,9.0,1791916,533.32,82.0
80,81,Inception,"Action,Adventure,Sci-Fi","A thief, who steals corporate secrets through ...",Christopher Nolan,"Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen...",2010,148,8.8,1583625,292.57,74.0
249,250,The Intouchables,"Biography,Comedy,Drama",After he becomes a quadriplegic from a paragli...,Olivier Nakache,"François Cluzet, Omar Sy, Anne Le Ny, Audrey F...",2011,112,8.6,557965,13.18,57.0
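
The two-step filter above can also be written as a single condition. A minimal sketch on made-up data: combine the boolean masks with `&` (each comparison needs its own parentheses), or express the same filter with `query`. `toy_df` and its titles are hypothetical.

```python
import pandas as pd

# Toy stand-in (hypothetical titles and values).
toy_df = pd.DataFrame({
    "Title": ["M1", "M2", "M3", "M4"],
    "Rating": [8.6, 9.0, 8.8, 7.0],
    "Votes": [1_000_000, 50_000, 500_000, 2_000_000],
})

# Element-wise AND of two boolean Series.
mask = (toy_df["Votes"] > 100_000) & (toy_df["Rating"] > 8.5)
top = toy_df[mask]

# The same filter as a query string; column names must be valid identifiers
# (names with spaces would need backticks inside the string).
top_q = toy_df.query("Votes > 100000 and Rating > 8.5")

print(top)
```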
