# Silver Layer Processing

The Silver layer transforms and cleanses the data that was ingested into the Bronze layer.


**2. Importing Required Libraries:**

In [1]:
from pyspark.sql.functions import col
from pyspark.sql.types import TimestampType


**3. Date Setup for Filtering Data:**

In [4]:
from datetime import date, timedelta
# Date of the Bronze file to process: seven days before today
start_date = date.today() - timedelta(days=7)
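
To make the date arithmetic concrete, the sketch below uses a fixed reference date instead of `date.today()` so the result is reproducible. The helper name `bronze_file_path` and its `lookback_days` parameter are hypothetical and not part of the notebook. The path pattern is the one used when the file is read in the next step.

```python
from datetime import date, timedelta


def bronze_file_path(run_date: date, lookback_days: int = 7) -> str:
    """Build the Bronze file path for the day `lookback_days` before `run_date`."""
    start_date = run_date - timedelta(days=lookback_days)
    # Formatting a date inside an f-string yields ISO format (YYYY-MM-DD).
    return f"Files/{start_date}_earthquake_data.json"


print(bronze_file_path(date(2024, 9, 10)))  # Files/2024-09-03_earthquake_data.json
```

Passing the run date in as an argument, rather than calling `date.today()` inside the function, makes it easier to reprocess a specific day.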



**4. Appending Data to Silver Table:**

In [5]:
# Read the Bronze JSON file for start_date into a Spark DataFrame.
# The multiline option lets Spark parse a JSON document that spans several lines.
df = spark.read.option("multiline", "true").json(f"Files/{start_date}_earthquake_data.json")


**Reshape Earthquake Data and Extract Key Attributes for Further Analysis**

This step flattens the nested earthquake records: it selects specific attributes from the `geometry` and `properties` structures and gives them clear column names. These columns are the ones used in later analysis of earthquake events.

In [6]:
# Reshape earthquake data by extracting and renaming key attributes for further analysis.
df = df.select(
    'id',
    col('geometry.coordinates').getItem(0).alias('longitude'),
    col('geometry.coordinates').getItem(1).alias('latitude'),
    col('geometry.coordinates').getItem(2).alias('elevation'),
    col('properties.title').alias('title'),
    col('properties.place').alias('place_description'),
    col('properties.sig').alias('sig'),
    col('properties.mag').alias('mag'),
    col('properties.magType').alias('magType'),
    col('properties.time').alias('time'),
    col('properties.updated').alias('updated')
)
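
The same reshaping can be illustrated on a single record in plain Python, which may help when checking what each output column should contain. This is only a sketch: the function name `flatten_feature` is hypothetical, and the sample record holds made-up values that follow the field layout used in the `select` call, not real earthquake data.

```python
def flatten_feature(feature: dict) -> dict:
    """Flatten one nested earthquake record into the Silver column layout."""
    coords = feature["geometry"]["coordinates"]
    props = feature["properties"]
    return {
        "id": feature["id"],
        "longitude": coords[0],   # getItem(0)
        "latitude": coords[1],    # getItem(1)
        "elevation": coords[2],   # getItem(2)
        "title": props["title"],
        "place_description": props["place"],
        "sig": props["sig"],
        "mag": props["mag"],
        "magType": props["magType"],
        "time": props["time"],
        "updated": props["updated"],
    }


# Illustrative record with invented values (not real data).
sample = {
    "id": "example001",
    "geometry": {"coordinates": [-120.5, 36.1, 7.2]},
    "properties": {
        "title": "M 2.1 - example location",
        "place": "example location",
        "sig": 68,
        "mag": 2.1,
        "magType": "ml",
        "time": 1700000000000,
        "updated": 1700000000500,
    },
}

row = flatten_feature(sample)
print(row["longitude"], row["latitude"], row["place_description"])
```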


**Convert 'time' and 'updated' Columns from Milliseconds to Timestamp Format**

The `time` and `updated` columns arrive as UNIX epoch time in milliseconds. This step divides each value by 1000 to get seconds and then casts the result to a timestamp, which gives a readable date and time for each earthquake event.

In [7]:
# Convert 'time' and 'updated' from epoch milliseconds to timestamps:
# divide by 1000 to get seconds, then cast to TimestampType.
df = (
    df
    .withColumn('time', (col('time') / 1000).cast(TimestampType()))
    .withColumn('updated', (col('updated') / 1000).cast(TimestampType()))
)
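
As a worked example of the arithmetic, the plain-Python sketch below converts an epoch value in milliseconds to a UTC datetime. The function name `epoch_ms_to_utc` is hypothetical. Note one assumption: the output here is pinned to UTC, whereas Spark displays timestamps in the session time zone, so the rendered values in the notebook may differ by a fixed offset.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert milliseconds since the UNIX epoch to a timezone-aware UTC datetime."""
    # Dividing by 1000 turns milliseconds into (possibly fractional) seconds.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


print(epoch_ms_to_utc(1_700_000_000_000).isoformat())  # 2023-11-14T22:13:20+00:00
```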


**Appending the Data to the Silver Table**

This step appends the transformed earthquake data to the Silver table `earthquake_events_silver`, which stores the cleansed, structured data ready for further refinement or analysis. Because the write uses `append` mode, re-running the notebook for the same date adds the same rows to the table again.

In [8]:

df.write.mode('append').saveAsTable('earthquake_events_silver')
