### Cell 2: Data Loading and Preprocessing

**Markdown Explanation:**

This cell defines `load_data`, which reads the movies and ratings CSV files, converts rating timestamps from Unix seconds to datetimes, merges the two datasets on `movieId`, extracts the release year from each movie title, and one-hot encodes the genres. Any error raised during loading is logged and then re-raised so the caller can handle it.
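The timestamp, merge, and release-year steps can be tried on a few in-memory rows before pointing the function at real files. The sketch below is only an illustration: the sample rows are made up, and the `userId` and `rating` columns are assumptions about the ratings file, since the function itself only relies on `movieId`, `timestamp`, `title`, and `genres`.

```python
import pandas as pd

# Hypothetical sample rows (not real data)
movies = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Toy Story (1995)", "Untitled Film"],
    "genres": ["Adventure|Children", "Drama"],
})
ratings = pd.DataFrame({
    "userId": [10, 11, 12],
    "movieId": [1, 1, 2],
    "rating": [4.0, 5.0, 3.5],
    "timestamp": [0, 86400, 172800],  # seconds since the Unix epoch
})

# Step 1: interpret integer timestamps as seconds since 1970-01-01
ratings["timestamp"] = pd.to_datetime(ratings["timestamp"], unit="s")

# Step 2: inner join on movieId, so each rating row gains title and genres
merged = pd.merge(ratings, movies, on="movieId")

# Step 3: pull a four-digit year in parentheses out of the title;
# titles without one become NaN, which is why the column is cast to float
merged["release_year"] = (
    merged["title"].str.extract(r"\((\d{4})\)")[0].astype(float)
)

print(merged[["title", "timestamp", "release_year"]])
```

Because the merge is an inner join, a rating whose `movieId` is missing from the movies table would be dropped silently. Comparing `len(merged)` with `len(ratings)` is a quick way to notice that.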

In [None]:
# Imports for this cell (redundant but harmless if an earlier cell already ran them)
import itertools
import logging
import re

import pandas as pd


def load_data(movies_file, ratings_file):
    """
    Load and preprocess movies and ratings data.

    This function loads movie and rating data from CSV files, converts timestamps to datetime objects,
    merges the data on movie IDs, extracts release years from titles, and one-hot encodes genres.

    Parameters:
        movies_file (str): Path to the movies data file.
        ratings_file (str): Path to the ratings data file.

    Returns:
        tuple: Three DataFrames (merged_df, movies_df, ratings_df)
               - merged_df: DataFrame containing merged movies and ratings data with additional features
                 (release_year and one binary column per genre).
               - movies_df: DataFrame containing the movies data as loaded.
               - ratings_df: DataFrame containing the ratings data, with timestamp converted to datetime.
    """
    try:
        # Load movie data from CSV into a DataFrame
        movies_df = pd.read_csv(movies_file)

        # Load rating data from CSV into a DataFrame
        ratings_df = pd.read_csv(ratings_file)

        # Convert the timestamp column in ratings data to datetime format for easier manipulation
        ratings_df['timestamp'] = pd.to_datetime(ratings_df['timestamp'], unit='s')

        # Merge the movie and rating data on the movieId column to combine information from both datasets
        merged_df = pd.merge(ratings_df, movies_df, on='movieId')

        # Extract the release year from the title column using regular expressions, if not already present
        if 'release_year' not in merged_df.columns:
            merged_df['release_year'] = merged_df['title'].str.extract(r'\((\d{4})\)')[0].astype(float)

        # One-hot encode the genres by creating one binary column per genre.
        # str.get_dummies splits on the literal '|' separator, so labels that start or end
        # with punctuation are matched exactly (a \b word-boundary regex would miss them),
        # and rows with missing genres get zeros instead of raising an error.
        genre_dummies = merged_df['genres'].str.get_dummies(sep='|')
        merged_df = pd.concat([merged_df, genre_dummies], axis=1)

        # Return the preprocessed DataFrames
        return merged_df, movies_df, ratings_df

    except FileNotFoundError as fnf_error:
        # Log an error if a file is not found and re-raise the exception
        logging.error(f"File not found: {fnf_error}")
        raise

    except Exception as e:
        # Log any other exceptions that occur during data loading and re-raise the exception
        logging.error(f"Error loading data: {e}")
        raise
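
Genre one-hot encoding can be checked in isolation on a handful of labels. The following is a minimal sketch using pandas' built-in `Series.str.get_dummies`; the sample labels, including `(no genres listed)`, are assumed values chosen to exercise punctuation, not rows taken from the dataset used here. For comparison, it also shows that a regex built with `\b` word boundaries does not match a label that begins and ends with parentheses.

```python
import re

import pandas as pd

# Hypothetical genre strings, pipe-separated
genres = pd.Series([
    "Action|Sci-Fi",
    "Film-Noir",
    "(no genres listed)",
])

# Split on the literal '|' and build one 0/1 column per distinct label
dummies = genres.str.get_dummies(sep="|")
print(dummies)

# \b only matches between a word character and a non-word character.
# At the start of the string, '(' has no word character on either side,
# so a pattern wrapped in \b cannot match this label.
label = "(no genres listed)"
pattern = r"\b" + re.escape(label) + r"\b"
matches = genres.str.contains(pattern)
print(matches.tolist())
```

Each row of `dummies` has a 1 in exactly the columns for the labels it contains, so `dummies.sum()` gives a quick per-genre count.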
