## ETL Pipeline for Loading Historical YouTube Trending Data into BigQuery

1. **Extract:** Read the Parquet data from Google Cloud Storage
2. **Transform:** Aggregate the data per video and engineer features
3. **Load:** Load the transformed data into a BigQuery table

In [None]:
# !pip install --upgrade google-cloud-bigquery google-cloud-storage


In [3]:
from google.cloud import bigquery
from google.cloud import storage
import pandas as pd
import tempfile
import os
import numpy as np
from datetime import datetime
import logging
import statistics as st

In [6]:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "aesthetic-nova-454803-r7-94e7eb0af61c.json"

### Extracting the Parquet File Stored in Google Cloud Storage (GCS) bucket

In [None]:
# GCS configuration
BUCKET_NAME = "youtube-trending-videos-dataset"
PARQUET_BLOB_PATH = "youtube_trending_data/US_youtube_trending_data.parquet"

storage_client = storage.Client()

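def download_blob_to_temp(bucket_name, source_blob_name, suffix=""):
    """Download a GCS blob to a local temporary file and return its path."""
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    # Reserve a temp path, then close the handle before downloading so the
    # write works on every platform; the caller must delete the file afterwards.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as temp_file:
        temp_path = temp_file.name
    blob.download_to_filename(temp_path)
    return temp_path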

# Download Parquet file to a temporary file
parquet_temp_path = download_blob_to_temp(BUCKET_NAME, PARQUET_BLOB_PATH, suffix=".parquet")

df = pd.read_parquet(parquet_temp_path)
df.head()


Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description
0,3C66w5Z0ixs,I ASKED HER TO BE MY GIRLFRIEND...,2020-08-11 19:20:14+00:00,UCvtRTOMP2TqYqu51xNrqAzg,Brawadis,People & Blogs,2020-08-12 00:00:00+00:00,brawadis|prank|basketball|skits|ghost|funny vi...,1514614,156908,5855,35313,https://i.ytimg.com/vi/3C66w5Z0ixs/default.jpg,False,False,SUBSCRIBE to BRAWADIS ▶ http://bit.ly/Subscrib...
1,M9Pmf9AB4Mo,Apex Legends | Stories from the Outlands – “Th...,2020-08-11 17:00:10+00:00,UC0ZV6M2THA81QT9hrVWJG3A,Apex Legends,Gaming,2020-08-12 00:00:00+00:00,Apex Legends|Apex Legends characters|new Apex ...,2381688,146739,2794,16549,https://i.ytimg.com/vi/M9Pmf9AB4Mo/default.jpg,False,False,"While running her own modding shop, Ramya Pare..."
2,J78aPJ3VyNs,I left youtube for a month and THIS is what ha...,2020-08-11 16:34:06+00:00,UCYzPXprvl5Y-Sf0g4vX-m6g,jacksepticeye,Entertainment,2020-08-12 00:00:00+00:00,jacksepticeye|funny|funny meme|memes|jacksepti...,2038853,353787,2628,40221,https://i.ytimg.com/vi/J78aPJ3VyNs/default.jpg,False,False,I left youtube for a month and this is what ha...
3,kXLn3HkpjaA,XXL 2020 Freshman Class Revealed - Official An...,2020-08-11 16:38:55+00:00,UCbg_UMjlHJg_19SZckaKajg,XXL,Music,2020-08-12 00:00:00+00:00,xxl freshman|xxl freshmen|2020 xxl freshman|20...,496771,23251,1856,7647,https://i.ytimg.com/vi/kXLn3HkpjaA/default.jpg,False,False,Subscribe to XXL → http://bit.ly/subscribe-xxl...
4,VIUo6yapDbc,Ultimate DIY Home Movie Theater for The LaBran...,2020-08-11 15:10:05+00:00,UCDVPcEbVLQgLZX0Rt6jo34A,Mr. Kate,Howto & Style,2020-08-12 00:00:00+00:00,The LaBrant Family|DIY|Interior Design|Makeove...,1123889,45802,964,2196,https://i.ytimg.com/vi/VIUo6yapDbc/default.jpg,False,False,Transforming The LaBrant Family's empty white ...


In [5]:
df.shape

(268787, 16)

In [6]:
df['video_id'].nunique()

47142

The dataset has 268,787 rows but only 47,142 unique `video_id` values, so most videos appear more than once. We therefore inspect the repeated rows.

In [8]:
# Keep `video_id` values that appear more than 10 times
video_ids_with_10plus = df['video_id'].value_counts()
video_ids_with_10plus = video_ids_with_10plus[video_ids_with_10plus > 10].index

# Randomly pick one of those `video_id`
example_id = np.random.choice(video_ids_with_10plus)

df[df['video_id'] == example_id]

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description
58421,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-05 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,1614230,106457,1754,5555,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
58624,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-05 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2021264,124460,2019,6052,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
58843,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-06 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2198527,132458,2163,6211,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
59052,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-06 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2327535,137953,2246,6333,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
59266,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-07 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2446440,142586,2340,6412,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
59478,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-07 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2529782,145383,2375,6497,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
59706,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-08 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2611407,148382,2419,6580,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
59907,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-08 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2679077,150412,2457,6629,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
60136,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-09 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2744775,152488,2500,6697,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...
60338,xaKAn0zHMH0,"World's Hottest Gummy Bear (9,000,000 Scoville)",2021-06-03 20:03:16+00:00,UCd1fLoVFooPeWqCEYVUJZqg,Matt Stonie,Entertainment,2021-06-09 00:00:00+00:00,Matt Stonie|Megatoad|Competitive Eating|Food C...,2791302,153886,2514,6718,https://i.ytimg.com/vi/xaKAn0zHMH0/default.jpg,False,False,A portion of this video is sponsored by The Mo...


In the example above, the same video appears once per snapshot, and as `trending_date` advances its engagement metrics (`view_count`, `likes`, `dislikes`, `comment_count`) also increase. The repeated rows therefore trace how long the video stayed on the trending list, which we would like to explore further.
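The observation above can be checked programmatically. The following is a minimal sketch, assuming a small hand-built frame: the `video_id` value `"abc"` is a placeholder, the counts are copied from four of the rows shown above, and the helper name `metrics_non_decreasing` is hypothetical rather than part of the notebook.

```python
import pandas as pd

# Toy snapshots for one video; "abc" is a placeholder ID and the rows are
# deliberately out of date order to show that sorting is required.
snapshots = pd.DataFrame({
    "video_id": ["abc"] * 4,
    "trending_date": pd.to_datetime(
        ["2021-06-07", "2021-06-05", "2021-06-06", "2021-06-08"], utc=True
    ),
    "view_count": [2446440, 1614230, 2198527, 2611407],
    "likes": [142586, 106457, 132458, 148382],
})


def metrics_non_decreasing(group: pd.DataFrame, metrics: list[str]) -> bool:
    """Return True if no metric ever drops as trending_date advances."""
    ordered = group.sort_values("trending_date")
    # is_monotonic_increasing allows ties, i.e. it tests "non-decreasing".
    return all(ordered[m].is_monotonic_increasing for m in metrics)


result = {
    video_id: metrics_non_decreasing(group, ["view_count", "likes"])
    for video_id, group in snapshots.groupby("video_id")
}
print(result)
```

Running the same check over the full `df` would flag any video whose counts drop between snapshots, which is worth knowing before relying on min and max as "start" and "end" values.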

### ETL Pipeline from GCS -> BigQuery

After extracting the historical data from the GCS bucket, we apply one round of **Transformation**: we aggregate each video's features over its entire trending period to remove the duplication seen above (i.e. a video that trended for 10 days goes from 10 records to 1 record). We also engineer derived features from the original columns, along with new features we would like to analyse (`popularity_score`, `popularity_class`).
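To make the "many records to one record" step concrete, here is a small sketch on made-up data. It uses pandas named aggregation instead of the list-returning helper used in the pipeline class, so it illustrates the idea rather than reproducing the exact implementation; the toy values are assumptions.

```python
import pandas as pd

# Three snapshots of video "a" and one of video "b" (made-up values).
raw = pd.DataFrame({
    "video_id": ["a", "a", "a", "b"],
    "publishedAt": pd.to_datetime(
        ["2021-06-03 20:00", "2021-06-03 20:00", "2021-06-03 20:00", "2021-06-04 12:00"],
        utc=True,
    ),
    "trending_date": pd.to_datetime(
        ["2021-06-05", "2021-06-06", "2021-06-07", "2021-06-05"], utc=True
    ),
    "view_count": [100, 250, 400, 50],
})

# Collapse to one row per video, keeping the start and end of each metric.
per_video = raw.groupby("video_id").agg(
    publishedAt=("publishedAt", "min"),
    trending_date_start=("trending_date", "min"),
    trending_date_end=("trending_date", "max"),
    view_count_start=("view_count", "min"),
    view_count_end=("view_count", "max"),
).reset_index()

# Derived features, named as in the pipeline.
per_video["trendingDuration"] = (
    per_video["trending_date_end"] - per_video["trending_date_start"]
).dt.days
per_video["hoursToReachTrending"] = (
    (per_video["trending_date_start"] - per_video["publishedAt"]).dt.total_seconds() / 3600
).round(1)

print(per_video[["video_id", "view_count_end", "trendingDuration", "hoursToReachTrending"]])
```

Note that `trendingDuration` is the difference between the last and first trending dates, so a video seen on a single date gets a duration of 0 days.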

After Transformation, we **Load** the data into the BigQuery table `US_trending_videos_transformed`, where it can be queried for our downstream Machine Learning tasks.
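As an illustration of a downstream query, the sketch below counts videos per `popularity_class`. The helper names `build_class_count_query` and `fetch_class_counts` are hypothetical, and `fetch_class_counts` assumes valid credentials and that the table has already been loaded; the project, dataset, and table names are the ones used in this notebook.

```python
def build_class_count_query(project_id: str, dataset_id: str, table_id: str) -> str:
    """Build a SQL query that counts videos per popularity class."""
    table_ref = f"{project_id}.{dataset_id}.{table_id}"
    return (
        "SELECT popularity_class, COUNT(*) AS n_videos\n"
        f"FROM `{table_ref}`\n"
        "GROUP BY popularity_class\n"
        "ORDER BY popularity_class"
    )


def fetch_class_counts(project_id: str, dataset_id: str, table_id: str) -> list:
    """Run the query and return (popularity_class, n_videos) tuples."""
    # Imported here so the query builder can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    sql = build_class_count_query(project_id, dataset_id, table_id)
    rows = client.query(sql).result()  # Waits for the query job to finish
    return [(row["popularity_class"], row["n_videos"]) for row in rows]


sql = build_class_count_query(
    "aesthetic-nova-454803-r7",
    "youtube_trending_dataset",
    "US_trending_videos_transformed",
)
print(sql)
```

Calling `fetch_class_counts(...)` with the same three arguments would execute the query against BigQuery.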

In [None]:
# Set up basic logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class YouTubeTrendingETL:
    def __init__(self, gcp_project_id, dataset_id, bucket_name):
        """
        Initialize the ETL pipeline with GCP project, dataset ID, and bucket name.
        
        :param gcp_project_id: GCP Project ID
        :param dataset_id: BigQuery dataset ID
        :param bucket_name: Google Cloud Storage bucket name
        """
        self.gcp_project_id = gcp_project_id
        self.dataset_id = dataset_id
        self.bucket_name = bucket_name
        self.bq_client = bigquery.Client(project=gcp_project_id)
        self.storage_client = storage.Client(project=gcp_project_id)
        self.bucket = self.storage_client.bucket(bucket_name)

    def extract_from_gcs(self, parquet_blob_path):
        """
        Extract data from a parquet file in GCS
        
        :param parquet_blob_path: Path to the parquet file in the GCS bucket
        :return: Pandas DataFrame containing the data
        """
        logger.info(f"Extracting data from gs://{self.bucket_name}/{parquet_blob_path}")
        
        # Create a temporary file to download the parquet
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.parquet')
        blob = self.bucket.blob(parquet_blob_path)
        blob.download_to_filename(temp_file.name)
        
        df = pd.read_parquet(temp_file.name)
        
        # Clean up the temporary file
        temp_file.close()
        os.unlink(temp_file.name)
        
        logger.info(f"Extracted {len(df)} rows of data")
        return df

    def transform_data(self, df):
        """
        Transform the YouTube trending videos data
        
        :param df: Pandas DataFrame containing the raw data
        :return: Transformed Pandas DataFrame
        """
        logger.info("Starting data transformation")
        
        # Make a copy to avoid modifying the original dataframe
        df1 = df.copy()

        # Drop the `dislikes` column (YouTube stopped showing public dislike counts in late 2021)
        df1.drop(columns=['dislikes'], inplace=True, errors='ignore')

        def start_end_trending(x: pd.Series) -> list:
            # Smallest and largest value seen over the trending period
            return [min(x), max(x)]

        def description_transf(x: pd.Series) -> str:
            # First description in the group, or 'Unknown' if it is missing
            return pd.Series(x).fillna('Unknown').iloc[0]

        # Group by 'video_id' and aggregate each column with the functions defined above.
        # The string 'min' replaces np.min, which newer pandas versions warn about in agg().
        transformed_df = df1.groupby('video_id').agg({'title': st.mode, 'publishedAt': 'min', 'channelId': st.mode, 'channelTitle': st.mode, 'categoryId': st.mode, 'tags': st.mode,
                    'trending_date': start_end_trending, 'likes': start_end_trending, 'view_count': start_end_trending, 'comment_count': start_end_trending, 'comments_disabled': st.mode, 
                    'ratings_disabled': st.mode, 'description':description_transf}).reset_index()

        # Extract from (min, max)
        transformed_df['trending_date_start'] = transformed_df['trending_date'].apply(lambda x: min(x))
        transformed_df['trending_date_end'] = transformed_df['trending_date'].apply(lambda x: max(x))
        transformed_df['likes_start'] = transformed_df['likes'].apply(lambda x: min(x))
        transformed_df['likes_end'] = transformed_df['likes'].apply(lambda x: max(x))
        transformed_df['view_count_start'] = transformed_df['view_count'].apply(lambda x: min(x))
        transformed_df['view_count_end'] = transformed_df['view_count'].apply(lambda x: max(x))
        transformed_df['comment_count_start'] = transformed_df['comment_count'].apply(lambda x: min(x))
        transformed_df['comment_count_end'] = transformed_df['comment_count'].apply(lambda x: max(x))   

        # Number of days in trending
        transformed_df['trendingDuration'] = (transformed_df['trending_date_end'] - transformed_df['trending_date_start']).dt.days

        # Time Taken to Trend (in Hours)
        transformed_df['hoursToReachTrending'] = round((transformed_df['trending_date_start'] - transformed_df['publishedAt']).dt.total_seconds() / (60 * 60), 1)

        # Extracting the day of the week
        transformed_df['published_dayOfWeek'] = transformed_df['publishedAt'].dt.day_name()

        # Extracting published Year of each video
        transformed_df['published_year'] = transformed_df['publishedAt'].dt.year

        # Extracting published month of each video
        transformed_df['published_month'] = transformed_df['publishedAt'].dt.month   

        # Number of tags present
        transformed_df['tagCount'] = transformed_df['tags'].apply(lambda x: 0 if pd.isna(x) else len(x.split('|')))

        
        transformed_df.drop(['trending_date', 'likes', 'publishedAt', 'view_count', 'trending_date_start', 'trending_date_end', 'comment_count'], axis = 1, inplace = True)

        # Engagement rate: ratio of the final comment_count to the final view_count at the end of the trending period
        transformed_df['engagement_rate'] = (transformed_df['comment_count_end'] / 
                                            transformed_df['view_count_end']).replace([np.inf, -np.inf], 0)

        # Aggregating like-to-view ratio: Calculate the ratio based on the final counts of likes and view_count at the end of the trending period
        transformed_df['like_view_ratio'] = (transformed_df['likes_end'] / 
                                            transformed_df['view_count_end']).replace([np.inf, -np.inf], 0)

        # Popularity score calculation: We use the end values for likes, views, and comments for a final snapshot
        transformed_df['popularity_score'] = (transformed_df['comment_count_end'] / 
                                            transformed_df['view_count_end'] * 
                                            transformed_df['likes_end']).replace([np.inf, -np.inf], 0)

        # Calculate percentiles for popularity score
        Q1 = transformed_df['popularity_score'].quantile(0.25)
        Q3 = transformed_df['popularity_score'].quantile(0.75)

        thresholds = {'Q1': Q1, 'Q3': Q3}

        # Video popularity classification (adapted from the research paper)
        conditions = [
            # Class 0: Non-popular videos (views < 100,000)
            (transformed_df['view_count_end'] < 100000),
            
            # Class 1: "Bad views" based on popularity score
            (transformed_df['popularity_score'] < Q1),
            
            # Class 3: Videos with overwhelming praise (score > Q3)
            (transformed_df['popularity_score'] > Q3)
        ]

        # Default is Class 2 (neutral videos)
        choices = [0, 1, 3]
        transformed_df['popularity_class'] = np.select(conditions, choices, default=2)
        
        logger.info(f"Transformation complete. DataFrame now has {len(transformed_df.columns)} columns")
        return transformed_df

    def preview_dataframe(self, df, rows=5):
        """
        Simple preview of the dataframe
        
        :param df: DataFrame to preview
        :param rows: Number of rows to display
        """
        logger.info(f"DataFrame preview (first {rows} rows):")
        print("\nDataFrame Preview:")
        print(f"Shape: {df.shape}")
        print("\nSample Data:")
        print(df.head(rows))
        print("\nColumn Data Types:")
        print(df.dtypes)
        print("\nSummary Statistics:")
        print(df.describe().round(2))

    def load_to_gcs(self, df, output_blob_path):
        """
        Load the transformed DataFrame back to GCS as a parquet file
        
        :param df: Pandas DataFrame to save
        :param output_blob_path: Path where to save the parquet file in GCS
        :return: GCS URI of the saved file
        """
        logger.info(f"Loading transformed data to gs://{self.bucket_name}/{output_blob_path}")
        
        # Create a temporary file to save the parquet
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.parquet')
        
        df.to_parquet(temp_file.name, index=False)
        
        # Upload to GCS
        blob = self.bucket.blob(output_blob_path)
        blob.upload_from_filename(temp_file.name)
        
        # Clean up the temporary file
        temp_file.close()
        os.unlink(temp_file.name)
        
        uri = f"gs://{self.bucket_name}/{output_blob_path}"
        logger.info(f"Successfully saved transformed data to {uri}")
        return uri

    def load_to_bigquery(self, gcs_uri, table_id):
        """
        Load data from GCS to a BigQuery table
        
        :param gcs_uri: GCS URI of the parquet file
        :param table_id: Target BigQuery table ID
        """
        logger.info(f"Loading data from {gcs_uri} to BigQuery table {table_id}")
        
        # Define the job configuration for loading parquet
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.PARQUET,
            autodetect=True,  # Automatically infer schema from the parquet file
            write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE  # Replace existing table
        )

        # Create the full table reference (project_id.dataset_id.table_id)
        table_ref = f"{self.gcp_project_id}.{self.dataset_id}.{table_id}"

        # Load parquet data into BigQuery
        load_job = self.bq_client.load_table_from_uri(
            gcs_uri, table_ref, job_config=job_config
        )

        load_job.result()  # Waits for the job to complete
        
        # Get the table and print info
        table = self.bq_client.get_table(table_ref)
        logger.info(f"Loaded {table.num_rows} rows and {len(table.schema)} columns to {table_ref}")

    def run_pipeline(self, input_blob_path, transformed_blob_path, table_id):
        """
        Execute the complete ETL pipeline
        
        :param input_blob_path: Path to the input parquet file in GCS
        :param transformed_blob_path: Path where to save the transformed parquet file
        :param table_id: Target BigQuery table ID
        """
        logger.info("Starting ETL pipeline")
        
        # Extract
        raw_data = self.extract_from_gcs(input_blob_path)
        
        # Transform
        transformed_data = self.transform_data(raw_data)
        
        # Preview the transformed data before loading
        self.preview_dataframe(transformed_data)
        
        # Load to GCS
        transformed_uri = self.load_to_gcs(transformed_data, transformed_blob_path)
        
        # Load to BigQuery
        self.load_to_bigquery(transformed_uri, table_id)
        
        logger.info("ETL pipeline completed successfully!")

if __name__ == "__main__":
    GCP_PROJECT_ID = "aesthetic-nova-454803-r7"
    DATASET_ID = "youtube_trending_dataset"
    BUCKET_NAME = "youtube-trending-videos-dataset"
    
    # Initialize the ETL pipeline
    youtube_etl = YouTubeTrendingETL(GCP_PROJECT_ID, DATASET_ID, BUCKET_NAME)

    # Input and output paths
    input_blob_path = "youtube_trending_data/US_youtube_trending_data.parquet"
    transformed_blob_path = "youtube_trending_data/transformed/US_youtube_trending_data_transformed.parquet"
    table_id = "US_trending_videos_transformed"

    # Run the pipeline
    youtube_etl.run_pipeline(input_blob_path, transformed_blob_path, table_id)

2025-04-09 00:47:56,073 - INFO - Starting ETL pipeline
2025-04-09 00:47:56,073 - INFO - Extracting data from gs://youtube-trending-videos-dataset/youtube_trending_data/US_youtube_trending_data.parquet
2025-04-09 00:48:07,061 - INFO - Extracted 268787 rows of data
2025-04-09 00:48:07,063 - INFO - Starting data transformation
  transformed_df = df1.groupby('video_id').agg({'title': st.mode, 'publishedAt': np.min, 'channelId': st.mode, 'channelTitle': st.mode, 'categoryId': st.mode, 'tags': st.mode,
2025-04-09 00:48:11,485 - INFO - Transformation complete. DataFrame now has 25 columns
2025-04-09 00:48:11,488 - INFO - DataFrame preview (first 5 rows):
2025-04-09 00:48:11,511 - INFO - Loading transformed data to gs://youtube-trending-videos-dataset/youtube_trending_data/transformed/US_youtube_trending_data_transformed.parquet



DataFrame Preview:
Shape: (47142, 25)

Sample Data:
      video_id                                              title  \
0  --14w5SOEUs                 Migos - Avalanche (Official Video)   
1  --2O86Z0hsM                           MY TESLA PAYS FOR ITSELF   
2  --40TEbZ9Is         Supporting Actress in a Comedy: 73rd Emmys   
3  --47FjCWgrU  San Francisco 49ers vs. Arizona Cardinals Game...   
4  --5-brQiQFg  Washington Commanders vs. San Francisco 49ers ...   

                  channelId        channelTitle     categoryId  \
0  UCGIelM2Dj3zza3xyV3pL3WQ           MigosVEVO          Music   
1  UCXJEvxZSozjAAqhbMfhIArA             jf.okay  Entertainment   
2  UClBKH8yZRcM4AsRjDVEdjMg  Television Academy  Entertainment   
3  UCDVYQ4Zhbm3S2dlz7P1GBDg                 NFL         Sports   
4  UCDVYQ4Zhbm3S2dlz7P1GBDg                 NFL         Sports   

                                                tags  comments_disabled  \
0  Migos|Avalanche|Quality|Control|Music/Motown|R...        

2025-04-09 00:48:16,144 - INFO - Successfully saved transformed data to gs://youtube-trending-videos-dataset/youtube_trending_data/transformed/US_youtube_trending_data_transformed.parquet
2025-04-09 00:48:16,147 - INFO - Loading data from gs://youtube-trending-videos-dataset/youtube_trending_data/transformed/US_youtube_trending_data_transformed.parquet to BigQuery table US_trending_videos_transformed
2025-04-09 00:48:23,505 - INFO - Loaded 47142 rows and 25 columns to aesthetic-nova-454803-r7.youtube_trending_dataset.US_trending_videos_transformed
2025-04-09 00:48:23,506 - INFO - ETL pipeline completed successfully!
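The popularity classification inside `transform_data` relies on `np.select`, which assigns the choice of the first condition that matches. The sketch below replays that logic on six made-up rows (all values are assumptions chosen for easy arithmetic) to show that the view-count rule takes precedence over the score-based rules.

```python
import numpy as np
import pandas as pd

# Made-up end-of-trending counts; comments are always 1% of views here,
# so popularity_score is roughly likes_end / 100.
toy = pd.DataFrame({
    "view_count_end": [50_000, 200_000, 300_000, 400_000, 500_000, 600_000],
    "comment_count_end": [500, 2_000, 3_000, 4_000, 5_000, 6_000],
    "likes_end": [900_000, 100, 1_000, 2_000, 3_000, 500_000],
})

toy["popularity_score"] = (
    toy["comment_count_end"] / toy["view_count_end"] * toy["likes_end"]
).replace([np.inf, -np.inf], 0)

q1 = toy["popularity_score"].quantile(0.25)
q3 = toy["popularity_score"].quantile(0.75)

conditions = [
    toy["view_count_end"] < 100_000,   # Class 0: checked first
    toy["popularity_score"] < q1,      # Class 1: low score
    toy["popularity_score"] > q3,      # Class 3: high score
]
# Rows matching none of the conditions fall back to Class 2.
toy["popularity_class"] = np.select(conditions, [0, 1, 3], default=2)

print(toy[["view_count_end", "popularity_score", "popularity_class"]])
```

The first row has the highest score of all six but fewer than 100,000 views, so it is assigned Class 0 rather than Class 3.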
