# Counting missing rows with left join
The Movie Database is supported by volunteers going out into the world, collecting data, and entering it into the database. This includes financial data, such as movie budget and revenue. If you wanted to know which movies are still missing data, you could use a left join to identify them. Practice using a left join by merging the movies table and the financials table.

In [1]:
import pandas as pd

In [7]:
# Load datasets
movies = pd.read_pickle(r'datasets/movies.p')
financials = pd.read_pickle(r'datasets/financials.p')
print(f'Movies {movies.shape}:')
print(movies.head(3))
print(f'Financials {financials.shape}:')
print(financials.head(3))

Movies (4803, 4):
      id                 title  popularity release_date
0    257          Oliver Twist   20.415572   2005-09-23
1  14290  Better Luck Tomorrow    3.877036   2002-01-12
2  38365             Grown Ups   38.864027   2010-06-24
Financials (3229, 3):
       id     budget       revenue
0   19995  237000000  2.787965e+09
1     285  300000000  9.610000e+08
2  206647  245000000  8.806746e+08


Merge the movies table, as the left table, with the financials table using a left join, and save the result to movies_financials. Then count the rows of movies_financials whose budget value is missing.

In [13]:
# Merge the movies table with the financials table with a left join
movies_financials = pd.merge(movies, financials, on='id', how='left')

# Count the rows whose budget value is missing
number_of_missing_fin = movies_financials['budget'].isna().sum()

# Print the number of movies missing financials
print(number_of_missing_fin)

1574


Great job! You used a left join to find out which movies are missing financial data. In a left join, pd.merge() keeps every row of the left table and fills the right table's columns with null values wherever the key has no match in the right table. Here, 1,574 of the 4,803 movies have a missing budget. Wow! That sounds like a lot of work.
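
To see the null-filling behavior on a small scale, here is a minimal sketch using two made-up tables. The names movies_toy and financials_toy and all of their values are hypothetical and are not part of the course data. The sketch also passes indicator=True, an optional pd.merge() argument that adds a _merge column recording where each row came from. This may be a more direct way to spot unmatched rows than checking a single column for nulls.

```python
import pandas as pd

# Hypothetical toy tables standing in for movies and financials
movies_toy = pd.DataFrame({'id': [1, 2, 3, 4],
                           'title': ['A', 'B', 'C', 'D']})
financials_toy = pd.DataFrame({'id': [1, 3],
                               'budget': [100, 300]})

# Left join: every row of movies_toy is kept; budget is NaN where
# no matching id exists in financials_toy
merged = movies_toy.merge(financials_toy, on='id', how='left',
                          indicator=True)

# Rows found only in the left table have no financial record
missing = merged[merged['_merge'] == 'left_only']
n_missing_by_indicator = len(missing)

# Same count obtained the way the exercise does it
n_missing_by_nan = merged['budget'].isna().sum()

print(merged)
print(n_missing_by_indicator, n_missing_by_nan)
```

In this toy case both counts agree. They could differ on real data if the right table itself contained rows with a null budget, since isna() would count those too, while the _merge column only flags ids with no match at all.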