# Updated Data Dictionary

This notebook documents the cleaned and transformed IMDb datasets that were loaded into the `movies` database.

___
## Updated ER Diagram

![updated_model.png](attachment:ec4029cd-7b59-4960-bd91-a4ba073b39d6.png)

## Table Descriptions
The data validation and cleaning steps have been completed, resulting in the final tables below.

### movies

Contains all movies in the IMDb database, along with relevant information about each movie.

__Original Tables:__ title.basics, title.ratings

__Columns:__
- movie_id (string) - alphanumeric unique identifier of the movie
- movie_title (string) - title of the movie
- release_year (integer) - year the movie was released, in YYYY format
- length_minutes (integer) - length of the movie in minutes
- avg_rating (float) - average rating of the movie (out of 10), rounded to one decimal place
- num_ratings (integer) - number of ratings the movie has received

__Sample Output:__

|  | movie_id | movie_title | release_year | length_minutes | avg_rating | num_ratings |
|---|---|---|---|---|---|---|
| 0 | tt0000009 | Miss Jerry | 1894 | 45.0 | 5.3 | 206 |
| 1 | tt0000147 | The Corbett-Fitzsimmons Fight | 1897 | 100.0 | 5.3 | 475 |
| 2 | tt0000574 | The Story of the Kelly Gang | 1906 | 70.0 | 6.0 | 832 |
| 3 | tt0000591 | The Prodigal Son | 1907 | 90.0 | 4.4 | 20 |
| 4 | tt0000615 | Robbery Under Arms | 1907 |  | 4.3 | 24 |
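
As a rough illustration of how the two original tables could be combined into `movies`, the sketch below joins small stand-ins for title.basics and title.ratings on the title identifier. The raw column names (`tconst`, `primaryTitle`, `startYear`, `runtimeMinutes`, `averageRating`, `numVotes`) and the `\N` missing-value marker are assumed from the public IMDb files. The cleaning steps actually used for this database may differ.

```python
import pandas as pd

# Stand-ins for the raw files; values are taken from the sample rows in this
# notebook, but the raw column names and string encoding are assumptions.
basics = pd.DataFrame({
    "tconst": ["tt0000009", "tt0000615"],
    "primaryTitle": ["Miss Jerry", "Robbery Under Arms"],
    "startYear": ["1894", "1907"],
    "runtimeMinutes": ["45", "\\N"],  # assumed: raw files mark missing values as \N
})
ratings = pd.DataFrame({
    "tconst": ["tt0000009", "tt0000615"],
    "averageRating": [5.3, 4.3],
    "numVotes": [206, 24],
})

# Join on the shared identifier, then rename to the data-dictionary names.
movies = basics.merge(ratings, on="tconst", how="inner").rename(columns={
    "tconst": "movie_id",
    "primaryTitle": "movie_title",
    "startYear": "release_year",
    "runtimeMinutes": "length_minutes",
    "averageRating": "avg_rating",
    "numVotes": "num_ratings",
})

# Convert text to nullable integers; unparseable values such as \N become <NA>.
for col in ["release_year", "length_minutes"]:
    movies[col] = pd.to_numeric(movies[col], errors="coerce").astype("Int64")

print(movies)
```

Using the nullable `Int64` dtype keeps `length_minutes` an integer column even when some movies have no recorded length.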


___
### actors

Stores information related to actors in movies.

__Original Table:__ name.basics

__Columns:__

- actor_id (string) - alphanumeric unique identifier of the actor/actress
- actor_name (string) - name of the actor/actress
- birth_year (integer) - year person was born in YYYY format
- death_year (integer) - year person died, if applicable, in YYYY format

__Sample Output:__

|  | actor_id | actor_name | birth_year | death_year |
|---|---|---|---|---|
| 0 | nm0000001 | Fred Astaire | 1899 | 1987.0 |
| 1 | nm0000002 | Lauren Bacall | 1924 | 2014.0 |
| 2 | nm0000003 | Brigitte Bardot | 1934 |  |
| 3 | nm0000004 | John Belushi | 1949 | 1982.0 |
| 4 | nm0000005 | Ingmar Bergman | 1918 | 2007.0 |
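
The sample shows `death_year` values such as `1987.0` even though the column is documented as an integer. That is the usual pandas behaviour when an integer column contains missing values. The sketch below, which assumes the table is read back from a CSV export, shows one possible way to restore an integer type with pandas' nullable `Int64` dtype.

```python
import io
import pandas as pd

# Two rows copied from the sample output, in CSV form.
csv_text = """actor_id,actor_name,birth_year,death_year
nm0000001,Fred Astaire,1899,1987.0
nm0000003,Brigitte Bardot,1934,
"""

actors = pd.read_csv(io.StringIO(csv_text))
print(actors["death_year"].dtype)  # float64, because the missing value forces a float column

# Nullable integers keep the year as an integer and the gap as <NA>.
actors["death_year"] = actors["death_year"].astype("Int64")
print(actors)
```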


___
### movie_cast

Mapping table between the actors and movies tables, identifying the actors in each movie.

__Original Table:__ title.principals

__Columns:__

- movie_id (string) - alphanumeric unique identifier of the movie
- actor_id (string) - alphanumeric unique identifier of the actor/actress
- known_for (boolean) - indicates whether the actor is well known for the role in the movie

__Sample Output:__

|  | movie_id | actor_id | known_for |
|---|---|---|---|
| 0 | tt0000009 | nm0063086 | True |
| 1 | tt0000009 | nm0183823 | True |
| 2 | tt0000009 | nm1309758 | True |
| 3 | tt0000574 | nm0846887 | True |
| 4 | tt0000574 | nm0846894 | True |
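
To show how the mapping table is meant to be used, here is a minimal sketch that joins `movie_cast` to `movies` on `movie_id` and counts the listed actors per movie. It rebuilds a few sample rows from this notebook as in-memory DataFrames; querying the real database would look different.

```python
import pandas as pd

# Sample rows from the movies and movie_cast tables documented in this notebook.
movies = pd.DataFrame({
    "movie_id": ["tt0000009", "tt0000574"],
    "movie_title": ["Miss Jerry", "The Story of the Kelly Gang"],
})
movie_cast = pd.DataFrame({
    "movie_id": ["tt0000009", "tt0000009", "tt0000009", "tt0000574", "tt0000574"],
    "actor_id": ["nm0063086", "nm0183823", "nm1309758", "nm0846887", "nm0846894"],
    "known_for": [True, True, True, True, True],
})

# Each movie_cast row links one actor to one movie, so joining on movie_id
# and counting distinct actor_id values gives the cast size per movie.
cast_sizes = (
    movie_cast.merge(movies, on="movie_id", how="inner")
    .groupby("movie_title")["actor_id"]
    .nunique()
)
print(cast_sizes)
```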


___
### genres

Contains all distinct genres a movie can be tagged as.

__Original Table:__ title.basics (genres column)

__Columns:__

- genre_id (integer) - integer unique identifier of the genre
- genre_name (string) - name of the genre

__Sample Output:__

|  | genre_id | genre_name |
|---|---|---|
| 0 | 1 | Romance |
| 1 | 2 | Documentary |
| 2 | 3 | News |
| 3 | 4 | Sport |
| 4 | 5 | Action |
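
One possible way to derive a lookup table like this from the genres column is sketched below. It assumes the raw column holds comma-separated genre names, and the input rows are illustrative rather than copied from the source file. IDs are assigned in order of first appearance, which may or may not match how the real table was built.

```python
import pandas as pd

# Illustrative input: the genre strings here are assumptions for the example.
basics = pd.DataFrame({
    "movie_id": ["tt0000009", "tt0000147"],
    "genres": ["Romance", "Documentary,News,Sport"],
})

# Split each string into a list, put one genre per row, and keep distinct names.
names = (
    basics["genres"].str.split(",")
    .explode()
    .drop_duplicates()
    .reset_index(drop=True)
)

# Assign surrogate integer keys starting at 1.
genres = pd.DataFrame({
    "genre_id": list(range(1, len(names) + 1)),
    "genre_name": names,
})
print(genres)
```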


___
### movie_genres

Mapping table between the movies and genres tables, identifying all genres of each movie.

__Original Table:__ title.basics (genres column)

__Columns:__

- movie_id (string) - alphanumeric unique identifier of the movie
- genre_id (integer) - integer unique identifier of the genre

__Sample Output:__

|  | movie_id | genre_id |
|---|---|---|
| 0 | tt0000009 | 1 |
| 1 | tt0002423 | 1 |
| 2 | tt0003022 | 1 |
| 3 | tt0003442 | 1 |
| 4 | tt0003595 | 1 |
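
A mapping table of this shape could be built by splitting the genres column into one row per movie and genre, then replacing each genre name with its `genre_id`. The sketch below assumes comma-separated genre strings and uses illustrative input rows; the `genres` lookup reuses the IDs shown in this notebook.

```python
import pandas as pd

# Lookup table with the IDs listed in the genres sample output.
genres = pd.DataFrame({
    "genre_id": [1, 2, 3, 4, 5],
    "genre_name": ["Romance", "Documentary", "News", "Sport", "Action"],
})

# Illustrative input: the genre strings here are assumptions for the example.
basics = pd.DataFrame({
    "movie_id": ["tt0000009", "tt0000147"],
    "genres": ["Romance", "Documentary,News,Sport"],
})

movie_genres = (
    basics.assign(genre_name=basics["genres"].str.split(","))
    .explode("genre_name")                        # one row per (movie, genre name)
    .merge(genres, on="genre_name", how="inner")  # swap names for genre_id
    .loc[:, ["movie_id", "genre_id"]]
    .sort_values(["movie_id", "genre_id"])
    .reset_index(drop=True)
)
print(movie_genres)
```

Storing only the two identifiers keeps the genre names in a single place, so renaming a genre touches one row in `genres` rather than every tagged movie.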
