## IMDb Top 250 movie ranking: web scraping and saving to a TXT file

Overview of the code:

1. Create a class named `MovieData` to store movie information.
2. Scrape the IMDb Top 250 ranking page and store the information for each movie.
3. Generate a synthetic "Likes" value for each movie, so that the dataset contains information that cannot be found elsewhere (customized data).
4. Write the movie information to a TXT file for later use with ChatGPT, as an example of a customized database (using movie information as the example).

In [1]:
import requests
from bs4 import BeautifulSoup
import numpy as np

In [2]:
class MovieData:
    def __init__(self, rank, name, rating, director, stars, url, likes):
        """
        Initialize a new MovieData object with the given rank, name, rating, director, stars, URL, and likes.

        Args:
            rank (int): The movie's ranking on the list.
            name (str): The movie's name.
            rating (str): The movie's rating, as scraped text (e.g. "9.2").
            director (str): The movie's director.
            stars (str): The movie's stars.
            url (str): The URL of the movie's IMDb page.
            likes (int): The number of likes the movie has.
        """
        self.rank = rank
        self.name = name.strip()
        self.rating = rating.strip()
        self.director = director.strip()
        self.stars = stars.strip()
        self.url = url.strip()
        self.likes = likes

    def __str__(self):
        return f'{self.rank}. {self.name} ({self.rating}) {self.director}  {self.stars} | {self.url} | Likes: {self.likes}'

    def write_to_file(self, filename):
        """
        Write the movie data to a file.

        Args:
            filename (str): The name of the file to write to.
        """
        try:
            with open(filename, 'a', encoding='utf-8') as file:
                # Write the movie data as one line of plain-English sentences.
                # Adjacent f-strings are concatenated, so the source indentation
                # does not leak into the output (as it does with backslash continuations).
                file.write(
                    f'This is the Top {self.rank} movie in the IMDB top 250 ranking. '
                    f'The Chinese movie name is {self.name}. '
                    f'The rating of the movie in IMDB is {self.rating}. '
                    f'The director of the movie is {self.director}. '
                    f'The actors and movie stars of the movie are {self.stars}. '
                    f'The information web page of this movie in IMDB website is {self.url}. '
                    f'The number of likes of this movie is {self.likes}.\n'
                )
        except Exception as e:
            print(f'Error writing to file: {e}')


            
    def get_init_info(self):
        """
        Get information about each item in the initializer.

        Returns:
            dict: A dictionary containing the name and value of each item in the initializer.
        """
        return {"rank": self.rank, "name": self.name, "rating": self.rating, "director": self.director,
                "stars": self.stars, "url": self.url, "likes": self.likes}

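The same record can also be expressed as a dataclass. The following is a minimal sketch, not the notebook's implementation: `MovieRecord` is a hypothetical name, and the field values are made-up placeholders. It shows `dataclasses.asdict` playing the role of `get_init_info`, and `__post_init__` doing the whitespace trimming that `MovieData.__init__` does by hand.

```python
from dataclasses import dataclass, asdict


@dataclass
class MovieRecord:  # hypothetical name, not part of the notebook
    rank: int
    name: str
    rating: str
    director: str
    stars: str
    url: str
    likes: int

    def __post_init__(self):
        # Mirror MovieData.__init__: trim whitespace from the scraped text fields.
        for field_name in ("name", "rating", "director", "stars", "url"):
            setattr(self, field_name, getattr(self, field_name).strip())


# Placeholder values for illustration only.
record = MovieRecord(
    rank=1,
    name=" Example Movie ",
    rating="9.2 ",
    director="Jane Doe ",
    stars=" A. Actor, B. Actor",
    url="https://example.com/title/1/",
    likes=500,
)
print(asdict(record))  # same keys as MovieData.get_init_info()
```

The trade-off is that a dataclass generates `__init__`, `__repr__`, and `__eq__` automatically, while the hand-written class keeps full control over the `__str__` format used in the printed output.
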
In [3]:
def generate_like():
    """
    Generate a random number of likes for the movie.

    Returns:
        int: A random integer between 0 and 1000, generated from a normal distribution.
    """
    mean = 500
    std_dev = 150
    likes = int(np.random.normal(mean, std_dev))
    likes = min(max(likes, 0), 1000)  # ensure likes is between 0 and 1000
    return likes


def scrape_imdb_top_movies(url):
    """
    Scrapes the IMDb Top 250 movies list from the given URL and returns a list of MovieData objects, 
    each containing information about a single movie.

    Args:
        url (str): The URL of the IMDb Top 250 movies list.

    Returns:
        list: A list of MovieData objects, each containing information about a single movie.
    """
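    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table', {'class': 'chart full-width'})
    if table is None:
        raise ValueError('Could not find the chart table; the page layout may have changed.')
    rows = table.find_all('tr')
    movies = []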

    for row in rows:
        title_column = row.find('td', {'class': 'titleColumn'})
        if title_column is not None:
            movie_link = title_column.find('a')

            # get movie name
            movie_name = movie_link.text

            # the link's title attribute has the form "<director> (dir.), <star>, <star>"
            movie_director, _, movie_stars = movie_link["title"].partition('(dir.),')

            # get movie rating
            rating_column = row.find('td', {'class': 'ratingColumn'})
            movie_rating = rating_column.find('strong').text

            # get movie detail URL
            movie_url = f'https://www.imdb.com{movie_link["href"]}'

            # generate synthetic likes for the movie
            movie_likes = generate_like()

            # count movie rows only, so header rows cannot shift the rank
            movie_rank = len(movies) + 1

            # create a MovieData object to store all movie information
            movies.append(MovieData(movie_rank, movie_name, movie_rating, movie_director, movie_stars,
                                    movie_url, movie_likes))
            

    return movies


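def show_movie(file_path):
    """
    Scrape the IMDb Top 250 list, write each movie to a file, and print it.

    Args:
        file_path (str): The path of the TXT file to (re)create.
    """
    # Open the file in write mode to empty its contents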
    with open(file_path, 'w', encoding='utf-8'):
        pass

    # Scrape the top movies and write their info to the file
    imdb_movie_url = 'https://www.imdb.com/chart/top/?ref_=nv_mv_250'
    movies = scrape_imdb_top_movies(imdb_movie_url)
    for movie in movies:
        movie.write_to_file(file_path)
        print(movie)
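Two pieces of the cell above are pure logic and can be checked without any network access: splitting the link's `title` attribute into director and stars, and clamping the synthetic likes. The sketch below is an illustration under stated assumptions, not the notebook's code: `split_credits` and `clamped_likes` are hypothetical helper names, and the standard library's `random.gauss` stands in for `np.random.normal` so that the example has no third-party dependencies.

```python
import random


def split_credits(title_attr):
    """Split '<director> (dir.), <stars>' into a (director, stars) tuple."""
    director, marker, stars = title_attr.partition("(dir.),")
    if not marker:
        # No '(dir.),' marker found: treat the whole string as the director.
        return title_attr.strip(), ""
    return director.strip(), stars.strip()


def clamped_likes(rng, mean=500, std_dev=150, low=0, high=1000):
    """Draw from a normal distribution, truncate to int, and clamp to [low, high]."""
    return min(max(int(rng.gauss(mean, std_dev)), low), high)


rng = random.Random(42)  # fixed seed so that reruns produce the same likes
print(split_credits("Frank Darabont (dir.), Tim Robbins, Morgan Freeman"))
print([clamped_likes(rng) for _ in range(5)])
```

Passing a seeded generator makes the synthetic likes reproducible between runs, which the unseeded `generate_like` does not guarantee.
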

In [4]:
filename = "data/imdb_movie.txt"
show_movie(filename)

1. 刺激1995 (9.2) Frank Darabont  Tim Robbins, Morgan Freeman | https://www.imdb.com/title/tt0111161/ | Likes: 333
2. 教父 (9.2) Francis Ford Coppola  Marlon Brando, Al Pacino | https://www.imdb.com/title/tt0068646/ | Likes: 616
3. 黑暗騎士 (9.0) Christopher Nolan  Christian Bale, Heath Ledger | https://www.imdb.com/title/tt0468569/ | Likes: 546
4. 教父2 (9.0) Francis Ford Coppola  Al Pacino, Robert De Niro | https://www.imdb.com/title/tt0071562/ | Likes: 534
5. 十二怒漢 (9.0) Sidney Lumet  Henry Fonda, Lee J. Cobb | https://www.imdb.com/title/tt0050083/ | Likes: 672
6. 辛德勒的名單 (8.9) Steven Spielberg  Liam Neeson, Ralph Fiennes | https://www.imdb.com/title/tt0108052/ | Likes: 345
7. 魔戒三部曲：王者再臨 (8.9) Peter Jackson  Elijah Wood, Viggo Mortensen | https://www.imdb.com/title/tt0167260/ | Likes: 472
8. 黑色追緝令 (8.8) Quentin Tarantino  John Travolta, Uma Thurman | https://www.imdb.com/title/tt0110912/ | Likes: 336
9. 魔戒首部曲：魔戒現身 (8.8) Peter Jackson  Elijah Wood, Ian McKellen | https://www.imdb.com/title/tt012