## Imports

In [1]:
import movielens_analysis

In [10]:
movies = movielens_analysis.Movies('data/movies.csv')
tags = movielens_analysis.Tags('data/tags.csv')
ratings = movielens_analysis.Ratings('data/ratings.csv')
links = movielens_analysis.Links('data/links.csv')

---
## Movies

In [None]:
%%timeit
movies.dist_by_release()

The dataset contains many movies and looks a bit messy at first glance, so we start by counting movies per release year. Sorted by count, this gives the following:
- the most common release year is 2002, with 311 movies
- top-5 years:
    - 2002
    - 2006
    - 2001
    - 2007
    - 2000
- 2002 is the only year with more than 300 movies
- in general, the earlier the release year, the fewer movies the dataset contains
- 8 years have only a single movie each
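
As a rough illustration of how such a per-year count can be built, here is a minimal sketch. It is not the code behind `movies.dist_by_release()`; it assumes that titles end with the release year in parentheses, and `sample_titles` and `count_by_release_year` are hypothetical names used only for this example.

```python
import re
from collections import Counter

# Hypothetical sample titles; the real ones would come from data/movies.csv.
sample_titles = [
    "Toy Story (1995)",
    "Spider-Man (2002)",
    "Minority Report (2002)",
    "Casino Royale (2006)",
    "Untitled Movie",  # no year in the title, so it is skipped
]

# Assumption: the release year is a 4-digit number in parentheses at the end.
YEAR_AT_END = re.compile(r"\((\d{4})\)\s*$")


def count_by_release_year(titles):
    """Return {year: number_of_movies}, sorted by count in descending order."""
    counts = Counter()
    for title in titles:
        match = YEAR_AT_END.search(title)
        if match:
            counts[int(match.group(1))] += 1
    return dict(counts.most_common())


release_counts = count_by_release_year(sample_titles)
print(release_counts)
```

Titles without a recognizable year are dropped here; whether the real module does the same is not shown in this notebook.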

In [11]:
%%timeit
movies.dist_by_genres()

263 ms ± 3.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Let's look at genres. The most popular genre is `Drama`: almost half of all movies are labeled as Drama. The least popular genre turns out to be `Film-Noir`.\
The top-5 genres, with their counts out of about 9.5k movies:
- Drama: 4361
- Comedy: 3756
- Thriller: 1894
- Action: 1828
- Romance: 1596
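
Because one movie can carry several genres, the genre counts add up to more than the number of movies. The sketch below shows one way to count them, assuming the genres of a movie are stored as a single `|`-separated string; `sample_genres` and `count_genres` are hypothetical names, not part of `movielens_analysis`.

```python
from collections import Counter

# Hypothetical genre fields, one per movie.
sample_genres = [
    "Comedy|Drama|Romance",
    "Drama",
    "Action|Thriller",
    "Drama|Film-Noir",
]


def count_genres(genre_fields):
    """Return {genre: number_of_movies}, sorted by count in descending order."""
    counts = Counter()
    for field in genre_fields:
        # Each movie contributes one count to every genre it lists.
        counts.update(field.split("|"))
    return dict(counts.most_common())


genre_counts = count_genres(sample_genres)
print(genre_counts)
```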


In [12]:
%%timeit
movies.most_genres()

13.4 ms ± 542 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Not every movie sticks to a single genre: quite a few have more than 5.\
`Rubber` has as many as 10 genres and ranks first by number of genres.\
`Patlabor` takes second place with 8.\
`Mulan`, `Who Framed Roger Rabbit?` and `Osmosis Jones` share third place with 7 each.
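
A ranking like this can be sketched by counting the genres per movie and sorting. The example assumes the same `|`-separated genre string as in the raw CSV; the sample rows and `rank_by_genre_count` are hypothetical and only illustrate the idea.

```python
# Hypothetical (title, genres) rows.
sample_movies = [
    ("Movie A (2000)", "Action|Comedy|Horror|Thriller"),
    ("Movie B (2001)", "Drama"),
    ("Movie C (2002)", "Animation|Children|Comedy"),
]


def rank_by_genre_count(movies, n):
    """Return the n movies with the most genres as {title: genre_count}."""
    counts = {title: len(genres.split("|")) for title, genres in movies}
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


top_genre_counts = rank_by_genre_count(sample_movies, 2)
print(top_genre_counts)
```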

---
## Tags

Now let's look at the tags that viewers left on movies in our dataset.

In [14]:
%%timeit
tags.most_words()

1.48 ms ± 48.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [15]:
%%timeit
tags.longest()

1.18 ms ± 53.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [16]:
%%timeit
tags.most_words_and_longest()

1.56 ms ± 46.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [17]:
%%timeit
tags.most_popular()

254 ms ± 3.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]:
%%timeit
tags.tags_with('funny')

321 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


The most popular tag is `In Netflix queue` (131). People seem to love watching movies on Netflix, and the tag itself reads as positive.\
The other tags in the top 5 are art (69), comedy (60), dark (46) and atmospheric (36).\
All of them are positive except dark; a quick look at the dataset suggests that dark refers to humor.\
The top longest tags and the top tags with the most words differ only slightly, and the tags that are both long and wordy overlap mostly with the longest ones. Most of them are quite negative.
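
To make the comparison between "most words", "longest" and their overlap concrete, here is a minimal sketch on made-up tags. It assumes word count means splitting on whitespace and length means number of characters; `sample_tags` and `top_by` are hypothetical and are not the methods of the `Tags` class.

```python
# Hypothetical tags; duplicates are allowed, as in a raw tags file.
sample_tags = [
    "funny",
    "In Netflix queue",
    "dark comedy",
    "funny",
    "visually appealing",
    "Something for everyone in this one",
]


def top_by(tags, measure, n):
    """Return the n unique tags with the largest measure (ties broken alphabetically)."""
    unique = set(tags)
    return sorted(unique, key=lambda tag: (-measure(tag), tag))[:n]


most_words = top_by(sample_tags, lambda tag: len(tag.split()), 3)
longest = top_by(sample_tags, len, 3)
# Tags that appear in both top lists.
both = sorted(set(most_words) & set(longest))

print(most_words)
print(longest)
print(both)
```

The two top lists can differ because a tag with few long words may outrank a tag with many short ones by character count.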

---
## Ratings

In [22]:
%%timeit
ratings.movies.dist_by_year()

105 ms ± 3.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


More than 10k ratings were posted in 2000, which makes it the year with the most active users.\
1998 is the only year with fewer than 1000 ratings (507).\
Over the last 4 years the level stays roughly constant, at about 6700 ratings per year on average.
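
A per-year count of ratings can be sketched as follows, assuming each rating carries a Unix timestamp in seconds. The sample values and `count_by_year` are hypothetical, and the years are taken in UTC, which may differ from what `dist_by_year()` does.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical Unix timestamps (seconds since 1970-01-01 UTC).
sample_timestamps = [964982703, 964981247, 1445714835, 1493848402]


def count_by_year(timestamps):
    """Return {year: number_of_ratings}, sorted by year in ascending order."""
    years = (
        datetime.fromtimestamp(ts, tz=timezone.utc).year for ts in timestamps
    )
    return dict(sorted(Counter(years).items()))


ratings_per_year = count_by_year(sample_timestamps)
print(ratings_per_year)
```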

In [23]:
%%timeit
ratings.movies.dist_by_rating()

{0.5: 1370,
 1.0: 2811,
 1.5: 1791,
 2.0: 7551,
 2.5: 5550,
 3.0: 20047,
 3.5: 13136,
 4.0: 26818,
 4.5: 8551,
 5.0: 13211}

The most common rating is `4.0` (26818): viewers who like a movie tend to give `4.0` rather than `4.5` or `5.0`.\
The lowest rating, `0.5`, is the rarest.\
The mid-scale rating `3.0` is also very popular; it is the second most common.
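
The counts in the output above can be turned into shares and an overall mean with a few lines. This sketch only reuses the numbers printed by `dist_by_rating()`; the variable names are chosen for the example.

```python
# Counts copied from the dist_by_rating() output above.
rating_counts = {
    0.5: 1370, 1.0: 2811, 1.5: 1791, 2.0: 7551, 2.5: 5550,
    3.0: 20047, 3.5: 13136, 4.0: 26818, 4.5: 8551, 5.0: 13211,
}

total = sum(rating_counts.values())
# Share of each rating value among all ratings.
shares = {rating: count / total for rating, count in rating_counts.items()}
most_common = max(rating_counts, key=rating_counts.get)
least_common = min(rating_counts, key=rating_counts.get)
# Mean rating, weighted by how often each value was given.
mean_rating = sum(r * c for r, c in rating_counts.items()) / total

print(f"total ratings: {total}")
print(f"most common: {most_common} ({shares[most_common]:.1%})")
print(f"least common: {least_common} ({shares[least_common]:.1%})")
print(f"mean rating: {mean_rating:.2f}")
```

So `4.0` accounts for roughly a quarter of all ratings, not a majority.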

In [24]:
%%timeit
ratings.movies.top_by_num_of_ratings()

{'Forrest Gump': 329,
 '"Shawshank Redemption, The"': 317,
 'Pulp Fiction': 307,
 '"Silence of the Lambs, The"': 279,
 '"Matrix, The"': 278}

The most rated film is `Forrest Gump`, with 329 ratings.\
Top-5 films by number of ratings:
1. Forrest Gump 329
1. Shawshank Redemption 317
1. Pulp Fiction 307
1. Silence of the Lambs 279
1. Matrix 278

In [25]:
# %%timeit
ratings.movies.top_by_ratings()

{'The Jinx: The Life and Deaths of Robert Durst': 5.0,
 'Galaxy of Terror (Quest)': 5.0,
 'Alien Contamination': 5.0,
 "I'm the One That I Want": 5.0,
 'Lesson Faust': 5.0}

Top by average rating:
1. The Jinx: The Life and Deaths of Robert Durst 5.0
1. Galaxy of Terror (Quest) 5.0
1. Alien Contamination 5.0
1. I'm the One That I Want 5.0
1. Lesson Faust 5.0

These movies have very few ratings, which is why their average rating reaches the maximum.
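
One common way to keep such rarely rated movies out of the top is to require a minimum number of ratings before a movie is ranked. The sketch below illustrates the effect on made-up data; `top_by_average` and its `min_count` parameter are hypothetical and are not part of `movielens_analysis`.

```python
from collections import defaultdict

# Hypothetical (title, rating) pairs.
sample_ratings = [
    ("Obscure Film", 5.0),
    ("Forrest Gump", 4.0),
    ("Forrest Gump", 4.5),
    ("Forrest Gump", 4.0),
]


def top_by_average(ratings, min_count=1):
    """Return {title: mean_rating} for titles with at least min_count ratings,
    sorted by mean rating in descending order."""
    by_title = defaultdict(list)
    for title, rating in ratings:
        by_title[title].append(rating)
    averages = {
        title: round(sum(values) / len(values), 2)
        for title, values in by_title.items()
        if len(values) >= min_count
    }
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))


unfiltered = top_by_average(sample_ratings)
filtered = top_by_average(sample_ratings, min_count=3)
print(unfiltered)
print(filtered)
```

Without the threshold, the single 5.0 rating puts `Obscure Film` on top; with `min_count=3` only the movie with enough ratings remains. The right threshold depends on the dataset and is a judgment call.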