# Scraping and Analyzing the Top 100 Movies of All Time

In this project, I am going to scrape the list of the top 100 movies from an archived page of Empire Online, a well-known source of movie rankings and reviews.

The goal is to:
1. Extract the movie titles from the webpage.
2. Reverse the order of the list so that it starts with the #1 movie.
3. Save the list to a file for further use.
4. Display the list in this notebook for quick reference.

This exercise showcases the power of web scraping, a technique that allows me to programmatically extract and process information from the web, turning it into structured data that I can analyze or store.

### Importing Libraries

In [None]:
from bs4 import BeautifulSoup

In [None]:
import requests

- `BeautifulSoup` (from the `bs4` package, Beautiful Soup 4): a library for parsing HTML and XML documents. I'll use it to navigate the webpage's structure and extract the movie titles.
- `requests`: a library for sending HTTP requests. I'll use it to download the webpage's HTML.

### Fetching and Parsing the Webpage Content

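Next, I request the specified URL, which points to an archived version (a Wayback Machine snapshot) of Empire Online's "Top 100 Movies" list. The `requests.get()` function sends an HTTP GET request and returns a `Response` object, whose `.text` attribute holds the page's HTML as a string.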

Once I have the HTML, I pass it to `BeautifulSoup` along with Python's built-in `"html.parser"`. `BeautifulSoup` turns the raw markup into a tree of tag objects, which lets me search and navigate the page's elements to find the information I'm looking for: in this case, the movie titles.

In [None]:
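# URL of the webpage to scrape (a Wayback Machine snapshot of Empire's list)
URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"

# Download the page; the timeout keeps the cell from hanging if the archive is slow,
# and raise_for_status() stops here on an HTTP error instead of parsing an error page
response = requests.get(URL, timeout=30)
response.raise_for_status()
website_html = response.text

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(website_html, "html.parser")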
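
To see what that tree looks like without touching the network, here is a small offline sketch. The HTML snippet and its titles are hypothetical and only loosely mimic a ranking page; they are not taken from Empire's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet (NOT the real Empire markup), used only to explore the parse tree
sample_html = """
<html>
  <head><title>Top Movies</title></head>
  <body>
    <h3 class="title">Movie C</h3>
    <h3 class="title">Movie B</h3>
    <h3 class="title featured">Movie A</h3>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Tags can be reached as attributes of the tree: soup.title is the first <title> tag
page_title = soup.title.string

# find() returns the first matching tag (or None if nothing matches)
first_text = soup.find("h3").get_text()

# "class" is a multi-valued attribute, so BeautifulSoup returns it as a list of class names
featured_classes = soup.find_all("h3")[-1].get("class")

print(page_title)
print(first_text)
print(featured_classes)
```

The `class` attribute comes back as a list because HTML allows several space-separated classes on a single element.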

### Extracting and Reversing the Movie Titles

With the parsed HTML in hand, I now look for the specific elements that contain the movie titles. By inspecting the HTML structure of the webpage, I find that each movie title is contained within an `h3` tag with the class `"title"`.

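Using `soup.find_all()`, I collect every matching tag, then call `getText()` on each one to pull out its text (the movie title). The titles are stored in a list. Note the trailing underscore in `class_`: `class` is a reserved word in Python, so BeautifulSoup uses `class_` as the keyword for filtering by CSS class.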

However, the page lists the movies in countdown order, so the extracted list starts with the 100th movie and ends with the 1st. To fix this, I reverse the list with the slice `[::-1]` so that it starts with the #1 movie.

In [None]:
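# Find every <h3 class="title"> element; each one holds a movie title
all_movies = soup.find_all(name="h3", class_="title")

# Extract the text of each tag
movie_titles = [movie.getText() for movie in all_movies]

# The page counts down from 100 to 1, so reverse the list to start at #1
movies = movie_titles[::-1]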
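
The extraction and reversal step can be tried on a hypothetical snippet in countdown order. This sketch uses made-up markup, not the real page, and also shows how `class_` matching behaves and an equivalent CSS-selector query via `soup.select()`.

```python
from bs4 import BeautifulSoup

# Hypothetical countdown-ordered markup (NOT the real Empire page): #3 first, #1 last
countdown_html = """
<h3 class="title">Movie C</h3>
<h3 class="title">Movie B</h3>
<h3 class="subtitle">Not a movie</h3>
<h3 class="title featured">Movie A</h3>
"""

soup = BeautifulSoup(countdown_html, "html.parser")

# class_="title" matches any tag whose class list contains "title",
# so "title featured" is included but "subtitle" is not
titles = [tag.get_text(strip=True) for tag in soup.find_all(name="h3", class_="title")]

# Slicing with a step of -1 returns a new list in reverse order
ranked = titles[::-1]

# Equivalent lookup with a CSS selector; results come back in document order
css_titles = [tag.get_text(strip=True) for tag in soup.select("h3.title")]

print(ranked)
print(css_titles == titles)
```

`get_text(strip=True)` additionally trims surrounding whitespace, which the notebook cell above does not do. `list(reversed(titles))` would produce the same reversed list as the `[::-1]` slice.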

### Saving the Movie Titles to a File

To preserve my list of movies for future reference, I save it to a text file named `movies.txt`. I open the file in write mode (`"w"`), which creates the file or overwrites any existing copy, and with UTF-8 encoding, which can represent any Unicode character, so titles containing accented or other non-ASCII characters are saved correctly.

Each movie title is written to the file on a new line, creating a clean and organized list.

In [None]:
with open("movies.txt", mode="w", encoding="utf-8") as file:
    for movie in movies:
        file.write(f"{movie}\n")
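
A quick way to check that the file round-trips correctly is to read it back and compare. This sketch writes to a temporary directory rather than the notebook's working directory, and the sample titles are illustrative only, chosen because they contain non-ASCII characters; they are not taken from the scraped page.

```python
import tempfile
from pathlib import Path

# Illustrative sample titles (not from the scraped list) with non-ASCII characters
movies = ["Amélie", "Spirited Away", "Léon"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "movies.txt"

    # Write one title per line, as in the notebook cell above
    with open(path, mode="w", encoding="utf-8") as file:
        for movie in movies:
            file.write(f"{movie}\n")

    # Read the file back with the same encoding; splitlines() drops the line endings
    loaded = path.read_text(encoding="utf-8").splitlines()

print(loaded == movies)
```

Reading with a different encoding than the one used for writing is a common source of garbled titles, so keeping `encoding="utf-8"` on both sides matters.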

### Displaying the Movie Titles in the Notebook

For quick reference and to verify my work, I also display the list of movies directly within this notebook. This allows me to see the final result immediately, without needing to open the text file.

In [None]:
for movie in movies:
    print(movie)
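
If the ranks themselves are useful in the printout, `enumerate()` can number the titles starting from 1. `format_ranked` below is a hypothetical helper, and the three sample titles only stand in for the scraped list.

```python
# Hypothetical sample data standing in for the scraped `movies` list
movies = ["Movie A", "Movie B", "Movie C"]


def format_ranked(titles):
    """Hypothetical helper: return display lines like '1. Movie A', numbering from 1."""
    return [f"{rank}. {title}" for rank, title in enumerate(titles, start=1)]


for line in format_ranked(movies):
    print(line)
```

Because the list was already reversed, position 1 in `enumerate()` lines up with the #1 movie.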

## Conclusion

Web scraping is a powerful tool that allows us to automate the process of gathering information from the web. In this notebook, I demonstrated how to scrape a list of the top 100 movies from a well-regarded source, process that data, and store it in a reusable format.

This project not only provided me with a list of movies to enjoy but also showcased the essential steps in web scraping: fetching content, parsing HTML, extracting relevant data, and saving it for future use. Whether you're compiling lists, analyzing trends, or gathering data for research, web scraping is an invaluable skill that opens up a world of possibilities.

Now that I have my list, the next step could be analyzing the genres, directors, or even the IMDb ratings of these movies. The potential applications are vast and exciting!