**Pandas** is a Python library for data analysis and manipulation.
It offers simple tools to work with structured data like tables and time series. 
Learning pandas is important because it makes data cleaning, transformation, and visualization easier. Whether you have small or large datasets, pandas is the key tool for efficient data processing in data science.

In this series of notebooks, we’ll cover the basics of pandas. You’ll learn enough to get started and perform complex data analyses with confidence.

For a more detailed understanding of pandas features and functions, visit the [pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html).
It’s an excellent resource with comprehensive examples and clear explanations.

# The pandas DataFrame

We’ll use this convention for pandas: `import pandas as pd`. Whenever you see `pd`, it refers to pandas.

In [1]:
import pandas as pd 

Pandas has two main data structures: `Series` and `DataFrame`.

- **`Series`**: A one-dimensional array-like object with a sequence of values, all of the same type, and an associated index of labels.

- **`DataFrame`**: A rectangular table of data with an ordered collection of columns, which can be of different types. It has both row and column labels.
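
As a minimal sketch of the two structures described above, the snippet below builds one of each from hand-written data. The fruit names, prices, and variable names are made up for illustration and are not part of the datasets used later in this notebook.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels
prices = pd.Series([1.50, 0.25, 3.00], index=['apple', 'banana', 'cherry'])

# A DataFrame: a table whose columns can hold different types
fruit = pd.DataFrame({
    'name': ['apple', 'banana', 'cherry'],   # text
    'price': [1.50, 0.25, 3.00],             # decimals
    'in_stock': [True, False, True],         # Booleans
})

print(prices)
print(fruit)
```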

**Table of Contents:**

- [Series introduction](#1.-Series-Introduction)
- [DataFrames introduction](#2.-DataFrame-Introduction)
- [Selecting a Series from a DataFrame](#3.-Selecting-a-Series-from-a-DataFrame)
- [Renaming columns in a pandas DataFrame](#4.-Renaming-columns-in-a-pandas-DataFrame)
- [Removing columns from a pandas DataFrame](#5.-Removing-columns-from-a-pandas-DataFrame)
- [Selecting multiple rows and columns from a pandas DataFrame](#6.-Selecting-multiple-rows-and-columns-from-a-pandas-DataFrame)
- [Handling missing values in pandas](#7.-Handling-missing-values-in-pandas)

## 1. Series Introduction

Series are used to represent one-dimensional data. You can create a Series from an array of values.

In [2]:
s1 = pd.Series([3,10,0,1,20])
s1

0     3
1    10
2     0
3     1
4    20
dtype: int64

The leftmost column is the index, which by default consists of monotonically increasing integers.
The rightmost column contains the values of the Series. The image below labels all the major components of a Series.

<img src="images/series_anatomy.png" alt="drawing" width="700"/>

A Series in pandas has several **attributes** that provide useful information about its structure. 
You can access these attributes using dot notation. 
For example, you can use `.name`, `.index`, `.values`, and `.dtype` to access key aspects of a Series. 

The `.name` attribute stores the name of the Series, `.index` gives the labels for the data, `.values` returns the data itself as an array, and `.dtype` shows the data type of the values. 

In [3]:
# Create a toy Series with an index of letters
s2 = pd.Series([3,10,0,1,20],
               index=['a','b','c','d','e'],
               name='my first series')
s2

a     3
b    10
c     0
d     1
e    20
Name: my first series, dtype: int64

In [4]:
# Access the name of the Series
s2.name

'my first series'

In [5]:
# Access the index of the Series
s2.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [6]:
# Access the values of the Series
s2.values

array([ 3, 10,  0,  1, 20], dtype=int64)

In [7]:
# Access the data type of the Series
s2.dtype

dtype('int64')

**Common `pandas` data types:**

| Type | Description |
| --- | :-- |
| `float64` | NumPy **float** (decimal) type |
| `int64` | NumPy **integer** type |
| `object` | NumPy type for storing **strings** (and other arbitrary Python objects) |
| `category` | pandas **categorical** type |
| `bool` | NumPy **Boolean** type |
| `datetime64[ns]` | NumPy **date and time** type |
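
To see several of these types in practice, here is a small sketch that creates one toy Series per type and prints its `.dtype`. All values and variable names are invented for illustration. The exact unit shown for the datetime type (for example `ns`) may depend on your pandas version.

```python
import pandas as pd

floats = pd.Series([1.5, 2.0, 3.25])
ints = pd.Series([1, 2, 3])
flags = pd.Series([True, False, True])
# Convert repeated text labels to the pandas categorical type
colors = pd.Series(['red', 'blue', 'red']).astype('category')
# Parse text into datetime values
dates = pd.to_datetime(pd.Series(['2024-01-01', '2024-02-15']))

for label, series in [('floats', floats), ('ints', ints), ('flags', flags),
                      ('colors', colors), ('dates', dates)]:
    print(label, series.dtype)
```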

You can use index labels to select a single value or a set of values.

In [8]:
# Select one value using its index label
s2['c']

0

When you want to select multiple values from a Series, pass a list of index labels inside the selection brackets, which produces the double brackets you see below. The list lets you specify more than one label at a time, returning the corresponding values as a new Series.

In [9]:
# Select multiple values using index labels
s2[['a','d']]

a    3
d    1
Name: my first series, dtype: int64

You can also select values in any order.

In [10]:
s2[['e','d','c']]

e    20
d     1
c     0
Name: my first series, dtype: int64

**Loading a Series from a CSV (comma-separated values) file:** 
Pandas makes it easy to load data from a CSV file using the `pd.read_csv` function. 
This function reads the file and parses its contents into a DataFrame; a single-column result can then be converted into a Series. 
For more details, check out the official documentation for [`read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).

The following Series contains data on reported crime incidents in Montana.

In [11]:
path = 'https://raw.githubusercontent.com/um-perez-alvaro/Data-Science-Practice/master/Data/incidents_MT.csv'
s = pd.read_csv(path, index_col='county').squeeze("columns")
s

county
YELLOWSTONE        10095
MISSOULA            6195
CASCADE             5740
FLATHEAD            3876
GALLATIN            3690
LEWIS AND CLARK     3246
SILVER BOW          2534
LAKE                1385
HILL                1165
RAVALLI             1040
ROOSEVELT            580
CUSTER               499
DEER LODGE           489
PARK                 409
RICHLAND             409
LINCOLN              378
GLACIER              325
CARBON               290
JEFFERSON            265
FERGUS               230
BIG HORN             226
VALLEY               224
STILLWATER           205
TOOLE                188
DAWSON               184
POWELL               153
MADISON              152
BROADWATER           145
SANDERS              142
PHILLIPS             141
BEAVERHEAD           129
MUSSELSHELL          112
SWEET GRASS          107
TETON                104
ROSEBUD               99
WHEATLAND             52
FALLON                45
MEAGHER               39
PONDERA               35
SHERIDAN          

In this example, we use ``.squeeze("columns")`` to convert the result into a Series instead of a DataFrame. By default, `pd.read_csv` returns a DataFrame, even if there’s only one column.
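
The effect of `.squeeze("columns")` can be checked without downloading anything. The sketch below reads a tiny CSV held in memory; the CSV text, its `incidents` column name, and the variable names are hypothetical stand-ins, not the contents of the Montana file.

```python
import io
import pandas as pd

# Hypothetical two-column CSV kept in a string instead of a file
csv_text = 'county,incidents\nA,10\nB,7\nC,3\n'

# Without squeeze: a DataFrame with a single column
as_frame = pd.read_csv(io.StringIO(csv_text), index_col='county')

# With squeeze: the single column is returned as a Series
as_series = pd.read_csv(io.StringIO(csv_text), index_col='county').squeeze('columns')

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
```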

## 2. DataFrame Introduction

Let’s load a CSV file containing information about movies.

In [12]:
path = 'https://raw.githubusercontent.com/um-perez-alvaro/Data-Science-Practice/master/Data/movies.csv'
df = pd.read_csv(path)
df

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


We use `df` as a common shorthand for "DataFrame." It’s a standard convention that keeps the code clear and concise.

The image below shows a labeled diagram of all the major components of a DataFrame.

<img src="dataframe_anatomy.png" alt="drawing" width="700"/>

<div class="admonition note alert alert-info">
<p class="first admonition-title" style="font-weight: bold;">Note</p>
<p>pandas uses NaN (Not a Number) to represent missing values.</p>
</div>

Like Series, DataFrames also have attributes that help you work with their structure.
You can use the `.columns`, `.index`, and `.values` attributes to access the columns, index, and data of a DataFrame, respectively. 

In [13]:
# dataframe columns
df.columns

Index(['color', 'director name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [14]:
# dataframe index
df.index

RangeIndex(start=0, stop=4916, step=1)

In [15]:
# dataframe data
df.values

array([['Color', 'James Cameron', 723.0, ..., 7.9, 1.78, 33000],
       ['Color', 'Gore Verbinski', 302.0, ..., 7.1, 2.35, 0],
       ['Color', 'Sam Mendes', 602.0, ..., 6.8, 2.35, 85000],
       ...,
       ['Color', 'Benjamin Roberds', 13.0, ..., 6.3, nan, 16],
       ['Color', 'Daniel Hsia', 14.0, ..., 6.3, 2.35, 660],
       ['Color', 'Jon Gunn', 43.0, ..., 6.6, 1.85, 456]], dtype=object)

You can use the `.dtypes` attribute to display the data type of each column along with its name.

In [16]:
df.dtypes

color                         object
director name                 object
num_critic_for_reviews       float64
duration                     float64
director_facebook_likes      float64
actor_3_facebook_likes       float64
actor_2_name                  object
actor_1_facebook_likes       float64
gross                        float64
genres                        object
actor_1_name                  object
movie title                   object
num_voted_users                int64
cast_total_facebook_likes      int64
actor_3_name                  object
facenumber_in_poster         float64
plot_keywords                 object
movie_imdb_link               object
num_user_for_reviews         float64
language                      object
country                       object
content_rating                object
budget                       float64
title_year                   float64
actor_2_facebook_likes       float64
imdb_score                   float64
aspect_ratio                 float64
m

When the data type is `object`, it typically refers to strings.
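
You may wonder why whole-number columns such as `title_year` are listed as `float64`. A likely explanation is that they contain missing values: NumPy integers cannot hold `NaN`, so pandas stores such columns as floats. The toy DataFrame below (invented column names and values) sketches this behavior.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'year_complete': [2009, 2012, 2015],     # no missing values
    'year_with_gap': [2009, np.nan, 2015],   # one missing value
})

# The complete column stays integer; the one with NaN becomes float
print(toy.dtypes)
```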

<div class="admonition note alert alert-info">
<p class="first admonition-title" style="font-weight: bold;">Note</p>
<p>In general, data can be classified as either continuous or categorical.</p>
<p> <b>Continuous</b> represents measurements, like duration or score, and can take on an infinite range of values.</p>
<p> <b>Categorical</b> represents discrete, finite values, such as car color or movie genre.</p>
</div>

Let’s start exploring our DataFrame by looking at some key details and summary information.

The `.head()` method displays the first five rows of the DataFrame by default. 

In [17]:
# examine the first 5 rows
df.head()

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


The `.head()` method includes parentheses because it is a method call, and it can take an optional argument specifying how many rows to display. For example, `.head(5)` shows the first 5 rows, which is the same as `.head()` since 5 is the default.

In [18]:
df.head(5)

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


However, we can use different arguments to display more or fewer rows.
For example, `.head(10)` shows the first 10 rows.

In [19]:
df.head(10)

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
5,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
6,Color,Sam Raimi,392.0,156.0,0.0,4000.0,James Franco,24000.0,336530303.0,Action|Adventure|Romance,...,1902.0,English,USA,PG-13,258000000.0,2007.0,11000.0,6.2,2.35,0
7,Color,Nathan Greno,324.0,100.0,15.0,284.0,Donna Murphy,799.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,...,387.0,English,USA,PG,260000000.0,2010.0,553.0,7.8,1.85,29000
8,Color,Joss Whedon,635.0,141.0,0.0,19000.0,Robert Downey Jr.,26000.0,458991599.0,Action|Adventure|Sci-Fi,...,1117.0,English,USA,PG-13,250000000.0,2015.0,21000.0,7.5,2.35,118000
9,Color,David Yates,375.0,153.0,282.0,10000.0,Daniel Radcliffe,25000.0,301956980.0,Adventure|Family|Fantasy|Mystery,...,973.0,English,UK,PG,250000000.0,2009.0,11000.0,7.5,2.35,10000


There are equivalent methods to display other parts of the DataFrame. 
You can use `.tail()` to show the last rows.

In [20]:
# examine the last 5 rows
df.tail()

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.0,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660
4915,Color,Jon Gunn,43.0,90.0,16.0,16.0,Brian Herzlinger,86.0,85222.0,Documentary,...,84.0,English,USA,PG,1100.0,2004.0,23.0,6.6,1.85,456


In [21]:
# examine the last 10 rows
df.tail(10)

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4906,Color,Shane Carruth,143.0,77.0,291.0,8.0,David Sullivan,291.0,424760.0,Drama|Sci-Fi|Thriller,...,371.0,English,USA,PG-13,7000.0,2004.0,45.0,7.0,1.85,19000
4907,Color,Neill Dela Llana,35.0,80.0,0.0,0.0,Edgar Tancangco,0.0,70071.0,Thriller,...,35.0,English,Philippines,Not Rated,7000.0,2005.0,0.0,6.3,,74
4908,Color,Robert Rodriguez,56.0,81.0,0.0,6.0,Peter Marquardt,121.0,2040920.0,Action|Crime|Drama|Romance|Thriller,...,130.0,Spanish,USA,R,7000.0,1992.0,20.0,6.9,1.37,0
4909,Color,Anthony Vallone,,84.0,2.0,2.0,John Considine,45.0,,Crime|Drama,...,1.0,English,USA,PG-13,3250.0,2005.0,44.0,7.8,,4
4910,Color,Edward Burns,14.0,95.0,0.0,133.0,Caitlin FitzGerald,296.0,4584.0,Comedy|Drama,...,14.0,English,USA,Not Rated,9000.0,2011.0,205.0,6.4,,413
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.0,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660
4915,Color,Jon Gunn,43.0,90.0,16.0,16.0,Brian Herzlinger,86.0,85222.0,Documentary,...,84.0,English,USA,PG,1100.0,2004.0,23.0,6.6,1.85,456


You can use `.sample()` to view a random sample of rows. By default, `.sample()` returns just one row, but you can specify a different number if needed.

In [22]:
# display a random sample of 10 rows
df.sample(10)

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4157,Color,Terence Young,167.0,115.0,92.0,177.0,Daniela Bianchi,559.0,24800000.0,Action|Adventure|Thriller,...,358.0,English,UK,Approved,2000000.0,1963.0,201.0,7.5,1.37,0
431,Color,Sylvester Stallone,424.0,113.0,13000.0,5000.0,Sylvester Stallone,26000.0,102981571.0,Action|Adventure|Thriller,...,741.0,English,USA,R,80000000.0,2010.0,13000.0,6.5,2.35,57000
1728,Black and White,Lawrence Kasdan,65.0,112.0,759.0,933.0,Alfre Woodard,11000.0,4554569.0,Comedy|Drama,...,123.0,English,USA,R,28000000.0,1999.0,1000.0,6.9,2.35,209
559,Color,Brian De Palma,117.0,98.0,0.0,697.0,Mike Starr,12000.0,55585389.0,Crime|Mystery|Thriller,...,241.0,English,USA,R,69000000.0,1998.0,854.0,5.9,2.35,880
4268,Color,Maggie Greenwald,39.0,109.0,9.0,280.0,David Patrick Kelly,767.0,3050934.0,Drama|Music,...,78.0,English,USA,PG-13,1800000.0,2000.0,380.0,7.3,1.85,0
2671,Color,Wil Shriner,49.0,91.0,6.0,596.0,Neil Flynn,8000.0,8080116.0,Adventure|Comedy|Family,...,66.0,English,USA,PG,15000000.0,2006.0,625.0,5.6,1.85,647
2506,Color,Peter Farrelly,84.0,113.0,137.0,481.0,Mike Starr,879.0,127175354.0,Comedy,...,438.0,English,USA,PG-13,16000000.0,1994.0,854.0,7.3,1.85,0
2625,Color,Elizabeth Allen Rosenbaum,81.0,103.0,20.0,125.0,Hutch Dano,512.0,26161406.0,Adventure|Comedy|Family|Fantasy,...,52.0,English,USA,G,15000000.0,2010.0,316.0,6.7,2.35,0
4553,Color,Victor Nunez,11.0,114.0,9.0,86.0,Todd Field,159.0,1001437.0,Drama|Romance,...,28.0,English,USA,R,800000.0,1993.0,143.0,7.2,1.85,81
1201,Color,Michael Mann,89.0,117.0,0.0,322.0,Terry Kinney,855.0,72455275.0,Action|Adventure|Drama|Romance|War,...,382.0,English,USA,R,40000000.0,1992.0,363.0,7.8,2.35,0
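
Because `.sample()` is random, rerunning the cell gives different rows each time. If you need a repeatable sample, you can pass the `random_state` parameter. The sketch below uses a made-up DataFrame with a single column `x`; the seed value 42 is arbitrary.

```python
import pandas as pd

toy = pd.DataFrame({'x': range(100)})

# The same seed produces the same random rows
first = toy.sample(n=5, random_state=42)
second = toy.sample(n=5, random_state=42)

print(first.equals(second))  # True
```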


To check the size of a DataFrame, you can use Python’s built-in `len()` function or the pandas `.shape` attribute.

In [23]:
# len(df) returns the number of rows in the DataFrame.
len(df)

4916

In [24]:
# .shape provides both the number of rows and columns as a tuple.
df.shape

(4916, 28)
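
Since `.shape` is an ordinary tuple, you can unpack it into separate variables. A short sketch on a toy DataFrame (hypothetical columns `a` and `b`):

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

n_rows, n_cols = toy.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)        # 3 2
print(len(toy) == n_rows)    # True: len() counts rows
```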

## 3. Selecting a Series from a DataFrame

When you select a single column from a DataFrame, it returns a pandas Series that shares the same index as the DataFrame. 
You can select a column as a Series using either **dictionary-like bracket notation or attribute dot notation**.

In [25]:
# Select the 'imdb_score' column using dot notation
df.imdb_score

0       7.9
1       7.1
2       6.8
3       8.5
4       7.1
       ... 
4911    7.7
4912    7.5
4913    6.3
4914    6.3
4915    6.6
Name: imdb_score, Length: 4916, dtype: float64

In [26]:
# Alternatively, use bracket notation
df['imdb_score']

0       7.9
1       7.1
2       6.8
3       8.5
4       7.1
       ... 
4911    7.7
4912    7.5
4913    6.3
4914    6.3
4915    6.6
Name: imdb_score, Length: 4916, dtype: float64

We can access more than one column by passing a list of column names inside the brackets. This returns a new DataFrame containing only the selected columns.

In [27]:
df[['director name','movie title','imdb_score']]

Unnamed: 0,director name,movie title,imdb_score
0,James Cameron,Avatar,7.9
1,Gore Verbinski,Pirates of the Caribbean: At World's End,7.1
2,Sam Mendes,Spectre,6.8
3,Christopher Nolan,The Dark Knight Rises,8.5
4,Doug Walker,Star Wars: Episode VII - The Force Awakens,7.1
...,...,...,...
4911,Scott Smith,Signed Sealed Delivered,7.7
4912,,The Following,7.5
4913,Benjamin Roberds,A Plague So Pleasant,6.3
4914,Daniel Hsia,Shanghai Calling,6.3


<div class="alert alert-block alert-danger"> 
<p><b>Warning</b></p>
<p>Bracket notation will always work, but dot notation has some limitations:</p> 
<ul>
  <li> Dot notation doesn’t work if the column name contains spaces (see Example 1 below).</li>
  <li> Dot notation doesn’t work if the column name conflicts with a DataFrame method or attribute (like 'head' or 'shape').</li>
  <li> Dot notation can’t be used to define a new column (see Example 2 below). </li>
</ul>
</div>
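
The second limitation can be illustrated with a toy DataFrame that has a column named `count`, which is also the name of a DataFrame method. The column names here are hypothetical and chosen only to trigger the conflict.

```python
import pandas as pd

toy = pd.DataFrame({'count': [3, 5], 'score': [7.9, 6.3]})

# Dot notation finds the DataFrame.count method, not the column
print(callable(toy.count))     # True

# Bracket notation always returns the column
print(toy['count'].tolist())   # [3, 5]
```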

**Example 1**

Let us select the ``director name`` column:

In [28]:
df['director name']

0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director name, Length: 4916, dtype: object

Since the column name contains a space, you must use bracket notation instead of dot notation.

In [29]:
df.director name

SyntaxError: invalid syntax (41898870.py, line 1)

**Example 2:** The DataFrame contains several columns with data on the number of Facebook likes:

In [30]:
df[['actor_1_facebook_likes','actor_2_facebook_likes','actor_3_facebook_likes','director_facebook_likes']]

Unnamed: 0,actor_1_facebook_likes,actor_2_facebook_likes,actor_3_facebook_likes,director_facebook_likes
0,1000.0,936.0,855.0,0.0
1,40000.0,5000.0,1000.0,563.0
2,11000.0,393.0,161.0,0.0
3,27000.0,23000.0,23000.0,22000.0
4,131.0,12.0,,131.0
...,...,...,...,...
4911,637.0,470.0,318.0,2.0
4912,841.0,593.0,319.0,
4913,0.0,0.0,0.0,0.0
4914,946.0,719.0,489.0,0.0


Let’s sum the Facebook likes for all actors and the director, then store the result in a new column called `total_likes`.

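If we try to create the new column using dot notation, pandas does not add a column. It issues a `UserWarning` and stores the result as a plain Python attribute on the object instead.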

In [31]:
df.total_likes = df.actor_1_facebook_likes + df.actor_2_facebook_likes + df.actor_3_facebook_likes + df.director_facebook_likes

  df.total_likes = df.actor_1_facebook_likes + df.actor_2_facebook_likes + df.actor_3_facebook_likes + df.director_facebook_likes
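
You can confirm this behavior on a toy DataFrame. The sketch below uses hypothetical columns `a` and `b`, silences the warning so the output stays clean, and then checks whether a `total` column exists after each kind of assignment.

```python
import warnings
import pandas as pd

toy = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Dot notation: pandas warns and sets an attribute, not a column
with warnings.catch_warnings():
    warnings.simplefilter('ignore', UserWarning)
    toy.total = toy['a'] + toy['b']
created_by_dot = 'total' in toy.columns

# Bracket notation: a real column is added
toy['total'] = toy['a'] + toy['b']
created_by_brackets = 'total' in toy.columns

print(created_by_dot, created_by_brackets)  # False True
```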


To create the new column, we must use bracket notation.

In [33]:
df['total_likes'] = df.actor_1_facebook_likes + df.actor_2_facebook_likes + df.actor_3_facebook_likes + df.director_facebook_likes

In [35]:
df.columns

Index(['color', 'director name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes', 'total_likes'],
      dtype='object')

**Challenge:** Subtract `budget` from `gross` and assign the result to a new column called `profit`.

In [None]:
# your code here


**Extra:** How to set the DataFrame index using existing columns

In [36]:
df.set_index('movie title')

Unnamed: 0_level_0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,total_likes
movie title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000,2791.0
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0,46563.0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000,11554.0
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000,95000.0
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,12.0,7.1,,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,English,Canada,,,2013.0,470.0,7.7,,84,1427.0
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,English,USA,TV-14,,,593.0,7.5,16.00,32000,
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,English,USA,,1400.0,2013.0,0.0,6.3,,16,0.0
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660,2154.0


Notice that ``movie title`` is now the index.
However, if we check the `df` DataFrame again, its index is still the original integer sequence. 

In [37]:
df

Unnamed: 0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,total_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000,2791.0
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0,46563.0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000,11554.0
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000,95000.0
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,12.0,7.1,,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,English,Canada,,,2013.0,470.0,7.7,,84,1427.0
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,English,USA,TV-14,,,593.0,7.5,16.00,32000,
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,English,USA,,1400.0,2013.0,0.0,6.3,,16,0.0
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660,2154.0


This happens because ``set_index`` returned a new DataFrame and left ``df`` unchanged; by default, the change is not made in place.
Many pandas methods have an ``inplace`` parameter that controls whether changes are applied directly to the original DataFrame or returned as a new one.

To modify the original DataFrame, you need to set the ``inplace`` parameter to ``True`` in the method.

In [39]:
df.set_index('movie title', inplace=True)

In [40]:
df

Unnamed: 0_level_0,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,total_likes
movie title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000,2791.0
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0,46563.0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000,11554.0
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000,95000.0
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,12.0,7.1,,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,English,Canada,,,2013.0,470.0,7.7,,84,1427.0
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,English,USA,TV-14,,,593.0,7.5,16.00,32000,
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,English,USA,,1400.0,2013.0,0.0,6.3,,16,0.0
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660,2154.0
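
The difference between returning a new DataFrame and modifying the original can be checked on a small example. This is a minimal sketch: the `toy` frame and its values are made up for illustration and are not part of the movies dataset.

```python
import pandas as pd

# Toy data standing in for the movies table (hypothetical values)
toy = pd.DataFrame({
    'movie title': ['A', 'B', 'C'],
    'imdb_score': [7.9, 7.1, 6.8],
})

# Without inplace=True, set_index returns a new DataFrame and leaves `toy` untouched
indexed = toy.set_index('movie title')
print(toy.index)      # still the default integer index
print(indexed.index)  # labelled by movie title

# Reassigning the result has the same effect as passing inplace=True
toy = toy.set_index('movie title')
print(toy.index.name)
```

Reassigning the result (`toy = toy.set_index(...)`) is a common alternative to `inplace=True`, and it makes the change visible in the code.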


To access the movie titles (which are now the index), we cannot use bracket or dot notation.

In [43]:
df['movie title']

KeyError: 'movie title'

Instead, we need to use the `.index` attribute.

In [44]:
df.index

Index(['Avatar', 'Pirates of the Caribbean: At World's End', 'Spectre',
       'The Dark Knight Rises', 'Star Wars: Episode VII - The Force Awakens',
       'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron',
       'Harry Potter and the Half-Blood Prince',
       ...
       'Primer', 'Cavite', 'El Mariachi', 'The Mongol King', 'Newlyweds',
       'Signed Sealed Delivered', 'The Following', 'A Plague So Pleasant',
       'Shanghai Calling', 'My Date with Drew'],
      dtype='object', name='movie title', length=4916)

We can use the ``.reset_index`` method to set the index back to monotonically increasing integers.

In [45]:
df.reset_index(inplace=True,
               drop=False)  # insert the index into the DataFrame as a column

In [47]:
df.index

RangeIndex(start=0, stop=4916, step=1)

The `movie title` is restored as a column in the DataFrame.

In [49]:
df['movie title']

0                                           Avatar
1         Pirates of the Caribbean: At World's End
2                                          Spectre
3                            The Dark Knight Rises
4       Star Wars: Episode VII - The Force Awakens
                           ...                    
4911                       Signed Sealed Delivered
4912                                 The Following
4913                          A Plague So Pleasant
4914                              Shanghai Calling
4915                             My Date with Drew
Name: movie title, Length: 4916, dtype: object
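
The effect of the `drop` parameter can be compared side by side. This is a small sketch on a made-up frame (`toy`, `kept` and `dropped` are illustrative names): `drop=False` moves the index back into a column, while `drop=True` discards it.

```python
import pandas as pd

# Toy frame whose index is named 'movie title' (hypothetical values)
toy = pd.DataFrame(
    {'imdb_score': [7.9, 7.1, 6.8]},
    index=pd.Index(['A', 'B', 'C'], name='movie title'),
)

kept = toy.reset_index(drop=False)    # index becomes a regular column again
dropped = toy.reset_index(drop=True)  # index labels are discarded

print(kept.columns.tolist())     # ['movie title', 'imdb_score']
print(dropped.columns.tolist())  # ['imdb_score']
```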

## 4. Renaming columns in a pandas DataFrame

In [57]:
# reload the movies dataframe
df = pd.read_csv(path)

In [52]:
# examine the column names
df.columns

Index(['color', 'director name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

Let’s rename the columns `director name` and `movie title` using the `rename` method. For more details, see the [rename documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html).

In [59]:
# Create a dictionary with old column names as keys and new names as values
new_column_names = {'director name':'director_name', 'movie title':'movie_title'}

# rename columns
df.rename(columns=new_column_names, inplace=True)
df.columns

Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')
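
When many columns follow the same pattern, `rename` also accepts a function instead of a dictionary. The sketch below assumes a made-up frame (`toy`) and replaces every space in a column name with an underscore; it is one possible approach, not the only one.

```python
import pandas as pd

# Toy frame with two column names that contain spaces (hypothetical values)
toy = pd.DataFrame({
    'director name': ['X'],
    'movie title': ['A'],
    'imdb_score': [7.9],
})

# Passing a function applies it to every column label
renamed = toy.rename(columns=lambda name: name.replace(' ', '_'))
print(renamed.columns.tolist())  # ['director_name', 'movie_title', 'imdb_score']
```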

## 5. Removing columns and/or rows from a pandas DataFrame

You can remove columns or rows from a pandas DataFrame using the `.drop` method. 
Use the `axis` parameter to specify whether you want to drop rows (`axis=0`) or columns (`axis=1`). 
For more details, see the [drop documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html).

In the following examples, we will set the `inplace` parameter to `True`, which modifies the original DataFrame directly. In general, use the `inplace` parameter with caution, as it changes the DataFrame without creating a new copy.

In [60]:
# remove a single column (axis=1 refers to columns)
df.drop('director_name', axis=1, inplace=True) 
df.columns

Index(['color', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [62]:
# remove multiple columns at once
df.drop(['color', 'duration'], axis=1, inplace=True)
df.columns

Index(['num_critic_for_reviews', 'director_facebook_likes',
       'actor_3_facebook_likes', 'actor_2_name', 'actor_1_facebook_likes',
       'gross', 'genres', 'actor_1_name', 'movie_title', 'num_voted_users',
       'cast_total_facebook_likes', 'actor_3_name', 'facenumber_in_poster',
       'plot_keywords', 'movie_imdb_link', 'num_user_for_reviews', 'language',
       'country', 'content_rating', 'budget', 'title_year',
       'actor_2_facebook_likes', 'imdb_score', 'aspect_ratio',
       'movie_facebook_likes'],
      dtype='object')

To remove rows, pass their index labels, which are the default integers in this example.

In [64]:
# remove multiple rows at once (axis=0 refers to rows)
df.drop([0, 3], axis=0, inplace=True) # drop rows 0 and 3
df.head()

,num_critic_for_reviews,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,movie_title,num_voted_users,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
1,302.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,Pirates of the Caribbean: At World's End,471220,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,602.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,Spectre,275868,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
4,,131.0,,Rob Walker,131.0,,Documentary,Doug Walker,Star Wars: Episode VII - The Force Awakens,8,...,,,,,,,12.0,7.1,,0
5,462.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,Daryl Sabara,John Carter,212204,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
6,392.0,0.0,4000.0,James Franco,24000.0,336530303.0,Action|Adventure|Romance,J.K. Simmons,Spider-Man 3,383056,...,1902.0,English,USA,PG-13,258000000.0,2007.0,11000.0,6.2,2.35,0
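
As an alternative to the `axis` parameter, `drop` also accepts the `columns` and `index` keywords, which some readers find easier to follow. The sketch below uses a made-up frame (`toy`) and returns new DataFrames instead of modifying the original.

```python
import pandas as pd

# Toy frame with the default integer index (hypothetical values)
toy = pd.DataFrame({
    'color': ['Color', 'Color', 'Color'],
    'duration': [178.0, 169.0, 148.0],
    'imdb_score': [7.9, 7.1, 6.8],
})

fewer_columns = toy.drop(columns=['color', 'duration'])  # same as axis=1
fewer_rows = toy.drop(index=[0, 2])                      # same as axis=0

print(fewer_columns.columns.tolist())  # ['imdb_score']
print(fewer_rows.index.tolist())       # [1]
```

Note that rows are removed by label, not by position, so the remaining row keeps its original label `1`.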


## 6. Selecting multiple rows and columns from a pandas DataFrame

You can perform various data selection operations on DataFrames using `loc` and `iloc`.
The `.loc` indexer is label-based: you select rows and columns by their labels.
The `.iloc` indexer is position-based: you select rows and columns by their integer positions.

- [The loc attribute](#6.1.-The-loc-attribute)
- [The iloc attribute](#6.2.-The-iloc-attribute)

In [65]:
# Reload the movies DataFrame, setting 'movie title' as the index
df = pd.read_csv(path, index_col='movie title')
df.head()

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


### 6.1. The loc attribute

The `.loc` attribute lets you filter rows and select columns using their labels (names).

In [66]:
# select Avatar, The Avengers and Toy Story, including all columns
df.loc[['Avatar','The Avengers','Toy Story'],:] 

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
The Avengers,Color,Joss Whedon,703.0,173.0,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,Action|Adventure|Sci-Fi,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000
Toy Story,Color,John Lasseter,166.0,74.0,487.0,802.0,John Ratzenberger,15000.0,191796233.0,Adventure|Animation|Comedy|Family|Fantasy,...,391.0,English,USA,G,30000000.0,1995.0,1000.0,8.3,1.85,0


In [68]:
# Select movies from 'Spectre' to 'Harry Potter and the Half-Blood Prince', including all columns
df.loc['Spectre':'Harry Potter and the Half-Blood Prince',:] 

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
John Carter,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
Spider-Man 3,Color,Sam Raimi,392.0,156.0,0.0,4000.0,James Franco,24000.0,336530303.0,Action|Adventure|Romance,...,1902.0,English,USA,PG-13,258000000.0,2007.0,11000.0,6.2,2.35,0
Tangled,Color,Nathan Greno,324.0,100.0,15.0,284.0,Donna Murphy,799.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,...,387.0,English,USA,PG,260000000.0,2010.0,553.0,7.8,1.85,29000
Avengers: Age of Ultron,Color,Joss Whedon,635.0,141.0,0.0,19000.0,Robert Downey Jr.,26000.0,458991599.0,Action|Adventure|Sci-Fi,...,1117.0,English,USA,PG-13,250000000.0,2015.0,21000.0,7.5,2.35,118000
Harry Potter and the Half-Blood Prince,Color,David Yates,375.0,153.0,282.0,10000.0,Daniel Radcliffe,25000.0,301956980.0,Adventure|Family|Fantasy|Mystery,...,973.0,English,UK,PG,250000000.0,2009.0,11000.0,7.5,2.35,10000


In [70]:
# Select all rows and the 'color' column
df.loc[:,'color'] 

movie title
Avatar                                        Color
Pirates of the Caribbean: At World's End      Color
Spectre                                       Color
The Dark Knight Rises                         Color
Star Wars: Episode VII - The Force Awakens      NaN
                                              ...  
Signed Sealed Delivered                       Color
The Following                                 Color
A Plague So Pleasant                          Color
Shanghai Calling                              Color
My Date with Drew                             Color
Name: color, Length: 4916, dtype: object

In [73]:
# Select all rows and the 'director name' and 'imdb_score' columns
df.loc[:,['director name','imdb_score']] 

movie title,director name,imdb_score
Avatar,James Cameron,7.9
Pirates of the Caribbean: At World's End,Gore Verbinski,7.1
Spectre,Sam Mendes,6.8
The Dark Knight Rises,Christopher Nolan,8.5
Star Wars: Episode VII - The Force Awakens,Doug Walker,7.1
...,...,...
Signed Sealed Delivered,Scott Smith,7.7
The Following,,7.5
A Plague So Pleasant,Benjamin Roberds,6.3
Shanghai Calling,Daniel Hsia,6.3


In [74]:
df.columns

Index(['color', 'director name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'num_voted_users', 'cast_total_facebook_likes', 'actor_3_name',
       'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link',
       'num_user_for_reviews', 'language', 'country', 'content_rating',
       'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score',
       'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [76]:
# Select all rows and columns from 'gross' through 'budget'
df.loc[:,'gross':'budget'] 

movie title,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget
Avatar,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,886204,4834,Wes Studi,0.0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_t...,3054.0,English,USA,PG-13,237000000.0
Pirates of the Caribbean: At World's End,309404152.0,Action|Adventure|Fantasy,Johnny Depp,471220,48350,Jack Davenport,0.0,goddess|marriage ceremony|marriage proposal|pi...,http://www.imdb.com/title/tt0449088/?ref_=fn_t...,1238.0,English,USA,PG-13,300000000.0
Spectre,200074175.0,Action|Adventure|Thriller,Christoph Waltz,275868,11700,Stephanie Sigman,1.0,bomb|espionage|sequel|spy|terrorist,http://www.imdb.com/title/tt2379713/?ref_=fn_t...,994.0,English,UK,PG-13,245000000.0
The Dark Knight Rises,448130642.0,Action|Thriller,Tom Hardy,1144337,106759,Joseph Gordon-Levitt,0.0,deception|imprisonment|lawlessness|police offi...,http://www.imdb.com/title/tt1345836/?ref_=fn_t...,2701.0,English,USA,PG-13,250000000.0
Star Wars: Episode VII - The Force Awakens,,Documentary,Doug Walker,8,143,,0.0,,http://www.imdb.com/title/tt5289954/?ref_=fn_t...,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,,Comedy|Drama,Eric Mabius,629,2283,Crystal Lowe,2.0,fraud|postal worker|prison|theft|trial,http://www.imdb.com/title/tt3000844/?ref_=fn_t...,6.0,English,Canada,,
The Following,,Crime|Drama|Mystery|Thriller,Natalie Zea,73839,1753,Sam Underwood,1.0,cult|fbi|hideout|prison escape|serial killer,http://www.imdb.com/title/tt2071645/?ref_=fn_t...,359.0,English,USA,TV-14,
A Plague So Pleasant,,Drama|Horror|Thriller,Eva Boehnke,38,0,David Chandler,0.0,,http://www.imdb.com/title/tt2107644/?ref_=fn_t...,3.0,English,USA,,1400.0
Shanghai Calling,10443.0,Comedy|Drama|Romance,Alan Ruck,1255,2386,Eliza Coupe,5.0,,http://www.imdb.com/title/tt2070597/?ref_=fn_t...,9.0,English,USA,PG-13,


In [80]:
# Select movies from 'Spectre' to 'Harry Potter and the Half-Blood Prince' and columns from 'gross' to 'budget'
df.loc['Spectre':'Harry Potter and the Half-Blood Prince','gross':'budget']   

movie title,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget
Spectre,200074175.0,Action|Adventure|Thriller,Christoph Waltz,275868,11700,Stephanie Sigman,1.0,bomb|espionage|sequel|spy|terrorist,http://www.imdb.com/title/tt2379713/?ref_=fn_t...,994.0,English,UK,PG-13,245000000.0
The Dark Knight Rises,448130642.0,Action|Thriller,Tom Hardy,1144337,106759,Joseph Gordon-Levitt,0.0,deception|imprisonment|lawlessness|police offi...,http://www.imdb.com/title/tt1345836/?ref_=fn_t...,2701.0,English,USA,PG-13,250000000.0
Star Wars: Episode VII - The Force Awakens,,Documentary,Doug Walker,8,143,,0.0,,http://www.imdb.com/title/tt5289954/?ref_=fn_t...,,,,,
John Carter,73058679.0,Action|Adventure|Sci-Fi,Daryl Sabara,212204,1873,Polly Walker,1.0,alien|american civil war|male nipple|mars|prin...,http://www.imdb.com/title/tt0401729/?ref_=fn_t...,738.0,English,USA,PG-13,263700000.0
Spider-Man 3,336530303.0,Action|Adventure|Romance,J.K. Simmons,383056,46055,Kirsten Dunst,0.0,sandman|spider man|symbiote|venom|villain,http://www.imdb.com/title/tt0413300/?ref_=fn_t...,1902.0,English,USA,PG-13,258000000.0
Tangled,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,Brad Garrett,294810,2036,M.C. Gainey,1.0,17th century|based on fairy tale|disney|flower...,http://www.imdb.com/title/tt0398286/?ref_=fn_t...,387.0,English,USA,PG,260000000.0
Avengers: Age of Ultron,458991599.0,Action|Adventure|Sci-Fi,Chris Hemsworth,462669,92000,Scarlett Johansson,4.0,artificial intelligence|based on comic book|ca...,http://www.imdb.com/title/tt2395427/?ref_=fn_t...,1117.0,English,USA,PG-13,250000000.0
Harry Potter and the Half-Blood Prince,301956980.0,Adventure|Family|Fantasy|Mystery,Alan Rickman,321795,58753,Rupert Grint,3.0,blood|book|love|potion|professor,http://www.imdb.com/title/tt0417741/?ref_=fn_t...,973.0,English,UK,PG,250000000.0


### 6.2. The iloc attribute

The `iloc` attribute is used to filter rows and select columns based on their integer positions.

In [82]:
# select all rows, columns 0 and 3
df.iloc[:,[0,3]] 

movie title,color,duration
Avatar,Color,178.0
Pirates of the Caribbean: At World's End,Color,169.0
Spectre,Color,148.0
The Dark Knight Rises,Color,164.0
Star Wars: Episode VII - The Force Awakens,,
...,...,...
Signed Sealed Delivered,Color,87.0
The Following,Color,43.0
A Plague So Pleasant,Color,76.0
Shanghai Calling,Color,100.0


When using `iloc`, the end index is excluded, meaning you select up to, but not including, the specified index. 
This is different from `loc`, where the end label is included.

In [84]:
# select all rows, columns 0 through 3
df.iloc[:,0:4] 

movie title,color,director name,num_critic_for_reviews,duration
Avatar,Color,James Cameron,723.0,178.0
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0
Spectre,Color,Sam Mendes,602.0,148.0
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,
...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0
The Following,Color,,43.0,43.0
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0
Shanghai Calling,Color,Daniel Hsia,14.0,100.0


In [85]:
# rows 0 through 2, all columns
df.iloc[0:3,:] 

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
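
The inclusive end of a `loc` slice and the exclusive end of an `iloc` slice are easy to confuse, so here is a small side-by-side check. The frame `toy` and its labels are made up for illustration.

```python
import pandas as pd

# Toy frame with string row labels (hypothetical values)
toy = pd.DataFrame(
    {'gross': [1, 2, 3, 4], 'budget': [5, 6, 7, 8]},
    index=['A', 'B', 'C', 'D'],
)

by_label = toy.loc['A':'C', :]   # end label 'C' is included
by_position = toy.iloc[0:2, :]   # end position 2 is excluded

print(by_label.index.tolist())     # ['A', 'B', 'C']
print(by_position.index.tolist())  # ['A', 'B']
```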


## 7. Handling missing values in pandas

- [Dropping rows/columns with missing values](#7.1.-Dropping-rows/columns-with-missing-values)
- [Filling in missing values](#7.2.-Filling-in-missing-values)

If we examine our DataFrame closely, we’ll notice some values marked as `NaN`.

In [86]:
df

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


What does ``NaN`` mean?

``NaN`` stands for "Not a Number" and represents a missing value.
It is not a string but a special value from numpy, specifically `numpy.nan`. 
When `read_csv` loads data, it automatically detects missing values and replaces them with `NaN`.
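
To see this behaviour in isolation, the sketch below reads a tiny CSV from an in-memory string instead of a file. The CSV text and the names `csv_text` and `toy` are assumptions made for illustration; the empty `score` field in the second row is loaded as `NaN`.

```python
import io

import numpy as np
import pandas as pd

# A two-row CSV in which the second row has no score (hypothetical data)
csv_text = "title,score\nA,7.9\nB,\n"
toy = pd.read_csv(io.StringIO(csv_text))

print(toy)
print(toy['score'].isna().tolist())  # [False, True]

# NaN is a float value that does not compare equal to itself,
# so use pd.isna rather than == to test for it
print(np.nan == np.nan)              # False
print(pd.isna(toy.loc[1, 'score']))  # True
```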

The `isna` method returns a DataFrame of boolean values: `True` for missing values and `False` for non-missing values.

In [88]:
df.isna()

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Pirates of the Caribbean: At World's End,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Spectre,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
The Dark Knight Rises,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Star Wars: Episode VII - The Force Awakens,True,False,True,True,False,True,False,False,True,False,...,True,True,True,True,True,True,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,False,False,False,False,False,False,False,False,True,False,...,False,False,False,True,True,False,False,False,True,False
The Following,False,True,False,False,True,False,False,False,True,False,...,False,False,False,False,True,True,False,False,False,False
A Plague So Pleasant,False,False,False,False,False,False,False,False,True,False,...,False,False,False,True,False,False,False,False,True,False
Shanghai Calling,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,True,False,False,False,False,False


The `notna` method returns the opposite of `isna`: `True` for non-missing values and `False` for missing values.

In [90]:
df.notna()

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
Pirates of the Caribbean: At World's End,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
Spectre,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
The Dark Knight Rises,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
Star Wars: Episode VII - The Force Awakens,False,True,False,False,True,False,True,True,False,True,...,False,False,False,False,False,False,True,True,False,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,True,True,True,True,True,True,True,True,False,True,...,True,True,True,False,False,True,True,True,False,True
The Following,True,False,True,True,False,True,True,True,False,True,...,True,True,True,True,False,False,True,True,True,True
A Plague So Pleasant,True,True,True,True,True,True,True,True,False,True,...,True,True,True,False,True,True,True,True,False,True
Shanghai Calling,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,False,True,True,True,True,True


For more details, see the documentation for [isna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isnull.html) and [notna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.notnull.html).

We can use the `sum` method to count the number of missing values in each column.

In [None]:
df.isna().sum()

This calculation works because:

- The `sum` method of a DataFrame operates on `axis=0` by default, so it adds down each column and produces one total per column.
- Python treats `True` as 1 and `False` as 0, so adding them counts the number of `True` values (that is, missing values).

This is an example of **method chaining**, where one method returns a DataFrame, and the next method is applied to that DataFrame, continuing the process.
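
The same chain can be tried on a small frame where the counts are easy to verify by eye. This is a sketch with made-up data (`toy`); the per-row count and the share of missing values are optional extras built from the same idea.

```python
import numpy as np
import pandas as pd

# Toy frame with a few missing values (hypothetical data)
toy = pd.DataFrame({
    'color': ['Color', np.nan, 'Color'],
    'gross': [760505847.0, np.nan, np.nan],
    'imdb_score': [7.9, 7.1, 6.8],
})

missing_per_column = toy.isna().sum()     # axis=0: color 1, gross 2, imdb_score 0
missing_per_row = toy.isna().sum(axis=1)  # axis=1: one count per row
share_missing = toy.isna().mean()         # fraction of missing values per column

print(missing_per_column)
print(missing_per_row.tolist())  # [0, 2, 1]
print(share_missing)
```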

**How you handle missing values** depends on your dataset and analysis needs. Here are some options:

### 7.1. Dropping rows/columns with missing values

You can use the `dropna` method to remove rows or columns with missing values.
By default, the `inplace` parameter is `False`, so `dropna` returns a new DataFrame and leaves the original unchanged unless you explicitly set `inplace=True`.
For more details, check the [dropna documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) 
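
Before applying `dropna` to the movies data, it can help to see on a small frame how the `how` parameter changes the result and that the original is left untouched. The frame `toy` below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Row 0 is complete, row 1 is entirely missing, row 2 is partly missing
toy = pd.DataFrame({
    'color': ['Color', np.nan, np.nan],
    'gross': [1.0, np.nan, 3.0],
})

all_missing_removed = toy.dropna(axis=0, how='all')  # drops only row 1
any_missing_removed = toy.dropna(axis=0, how='any')  # drops rows 1 and 2

# The original frame still has all three rows
print(len(toy), len(all_missing_removed), len(any_missing_removed))  # 3 2 1
```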

**Example 1:**  Drop a row if **all** of its values are missing.

In [91]:
df.dropna(axis=0, how='all')

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


**Example 2:** Drop a row if **any** of its values are missing.

In [92]:
df.dropna(axis=0, how='any')

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
John Carter,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Clean,Color,Olivier Assayas,81.0,110.0,107.0,45.0,Béatrice Dalle,576.0,136007.0,Drama|Music|Romance,...,39.0,French,France,R,4500.0,2004.0,133.0,6.9,2.35,171
The Circle,Color,Jafar Panahi,64.0,90.0,397.0,0.0,Nargess Mamizadeh,5.0,673780.0,Drama,...,26.0,Persian,Iran,Not Rated,10000.0,2000.0,0.0,7.5,1.85,697
Primer,Color,Shane Carruth,143.0,77.0,291.0,8.0,David Sullivan,291.0,424760.0,Drama|Sci-Fi|Thriller,...,371.0,English,USA,PG-13,7000.0,2004.0,45.0,7.0,1.85,19000
El Mariachi,Color,Robert Rodriguez,56.0,81.0,0.0,6.0,Peter Marquardt,121.0,2040920.0,Action|Crime|Drama|Romance|Thriller,...,130.0,Spanish,USA,R,7000.0,1992.0,20.0,6.9,1.37,0



**Example 3:** Drop a row if any values are missing in the `director name` or `country` columns.

In [95]:
df.dropna(axis=0, how='any', subset=['director name', 'country'])

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
John Carter,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Newlyweds,Color,Edward Burns,14.0,95.0,0.0,133.0,Caitlin FitzGerald,296.0,4584.0,Comedy|Drama,...,14.0,English,USA,Not Rated,9000.0,2011.0,205.0,6.4,,413
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


**Example 4:** Drop a row only if the values in **both** the `director name` and `country` columns are missing.

In [96]:
df.dropna(axis=0, how='all', subset=['director name', 'country'])

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660
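
The difference between `how='any'` and `how='all'` combined with `subset` can be hard to spot in a large table. The following sketch repeats Examples 3 and 4 on a hypothetical four-row DataFrame (`toy` and its values are made up for illustration) so the effect is visible at a glance.

```python
import pandas as pd

# Hypothetical toy data covering every combination of missing values
toy = pd.DataFrame({
    "director name": ["A", None, "C", None],
    "country": ["USA", "UK", None, None],
    "imdb_score": [7.0, 6.5, 8.1, 5.9],
})

cols = ["director name", "country"]

# how='any': drop a row if at least one of the two columns is missing
any_missing = toy.dropna(axis=0, how="any", subset=cols)

# how='all': drop a row only if both columns are missing
all_missing = toy.dropna(axis=0, how="all", subset=cols)

print(any_missing.index.tolist())
print(all_missing.index.tolist())
```

With `how='any'` only the fully labelled first row survives, while `how='all'` removes just the last row, where both columns are empty.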


**Example 5:** Drop a column if any of its values are missing.

In [97]:
df.dropna(axis=1, how='any')

movie title,genres,num_voted_users,cast_total_facebook_likes,movie_imdb_link,imdb_score,movie_facebook_likes
Avatar,Action|Adventure|Fantasy|Sci-Fi,886204,4834,http://www.imdb.com/title/tt0499549/?ref_=fn_t...,7.9,33000
Pirates of the Caribbean: At World's End,Action|Adventure|Fantasy,471220,48350,http://www.imdb.com/title/tt0449088/?ref_=fn_t...,7.1,0
Spectre,Action|Adventure|Thriller,275868,11700,http://www.imdb.com/title/tt2379713/?ref_=fn_t...,6.8,85000
The Dark Knight Rises,Action|Thriller,1144337,106759,http://www.imdb.com/title/tt1345836/?ref_=fn_t...,8.5,164000
Star Wars: Episode VII - The Force Awakens,Documentary,8,143,http://www.imdb.com/title/tt5289954/?ref_=fn_t...,7.1,0
...,...,...,...,...,...,...
Signed Sealed Delivered,Comedy|Drama,629,2283,http://www.imdb.com/title/tt3000844/?ref_=fn_t...,7.7,84
The Following,Crime|Drama|Mystery|Thriller,73839,1753,http://www.imdb.com/title/tt2071645/?ref_=fn_t...,7.5,32000
A Plague So Pleasant,Drama|Horror|Thriller,38,0,http://www.imdb.com/title/tt2107644/?ref_=fn_t...,6.3,16
Shanghai Calling,Comedy|Drama|Romance,1255,2386,http://www.imdb.com/title/tt2070597/?ref_=fn_t...,6.3,660


**Example 6 (advanced):** Drop a column only if 15% or more of its values are missing.

In [100]:
df.loc[:, 100 * df.isnull().sum() / len(df) < 15]

movie title,color,director name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,genres,actor_1_name,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,Action|Adventure|Fantasy,Johnny Depp,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,Action|Adventure|Thriller,Christoph Waltz,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,Action|Thriller,Tom Hardy,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,Documentary,Doug Walker,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Signed Sealed Delivered,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,Comedy|Drama,Eric Mabius,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
The Following,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,Crime|Drama|Mystery|Thriller,Natalie Zea,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
A Plague So Pleasant,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,Drama|Horror|Thriller,Eva Boehnke,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
Shanghai Calling,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,Comedy|Drama|Romance,Alan Ruck,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


How does it work?

- `100 * df.isnull().sum() / len(df)` calculates the percentage of missing values for each column by dividing the total number of missing values by the number of rows.
- `100 * df.isnull().sum() / len(df) < 15` creates a boolean Series that is `True` for columns where less than 15% of the values are missing.
- `df.loc[:, 100 * df.isnull().sum() / len(df) < 15]` selects all rows but keeps only the columns where less than 15% of the values are missing.

The result is a DataFrame without columns that have too many missing values. 
This is more advanced than what we’ve covered so far, but it will make more sense as we move forward. 
I wanted to give you a glimpse of what you’ll be able to do soon.
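
If it helps, the three steps above can be run one at a time on a small example. This is only a sketch: the DataFrame `toy`, its column names, and the 50% threshold are made up for illustration. It also uses `isna().mean()`, which gives the fraction of missing values per column directly and is equivalent to dividing the sum by the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 0%, 25% and 75% of values missing per column
toy = pd.DataFrame({
    "full": [1.0, 2.0, 3.0, 4.0],
    "some": [1.0, 2.0, 3.0, np.nan],
    "mostly": [np.nan, np.nan, np.nan, 4.0],
})

# Step 1: percentage of missing values in each column
pct_missing = toy.isna().mean() * 100

# Step 2: boolean Series, True for columns below the (assumed) 50% threshold
keep = pct_missing < 50

# Step 3: select all rows, but only the columns marked True
kept = toy.loc[:, keep]

print(pct_missing)
print(list(kept.columns))
```

Storing the intermediate results in named variables makes each step easy to inspect before combining them into a single expression.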

### 7.2. Filling in missing values

You can handle missing data by filling in the gaps using the `fillna` method. 
For more details, see the [fillna documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html).

Let’s check how many missing values are in the 'director name' column.

In [103]:
df['director name'].isna().sum()

102

We can fill in missing values in a column with a specific value. For example:

In [108]:
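# Fill in missing values in the 'director name' column with 'unknown'.
# Assigning the result back to the column is more reliable than calling
# fillna(..., inplace=True) on a selection, which may act on a temporary copy.
df['director name'] = df['director name'].fillna(value='unknown')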

We can now confirm that the column no longer has any missing values.

In [109]:
df['director name'].isna().sum()

0
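
`fillna` also accepts a dictionary that maps column names to fill values, which lets you treat each column differently in one call. The sketch below assumes a made-up DataFrame `toy` with hypothetical `director` and `budget` columns. Filling the numeric column with its median is just one possible choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
toy = pd.DataFrame({
    "director": ["A", None, "C"],
    "budget": [10.0, np.nan, 30.0],
})

# Median of the known budgets (missing values are skipped by default)
budget_median = toy["budget"].median()

# Fill each column with its own value; fillna returns a new DataFrame
filled = toy.fillna(value={"director": "unknown", "budget": budget_median})

print(filled)
print(filled.isna().sum())
```

As with `dropna`, the original `toy` is unchanged here because the result is stored in a new variable.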