# Gold Layer Processing

**2. Importing Required Libraries**

In [1]:
from pyspark.sql.functions import when, col, udf
from pyspark.sql.types import StringType
# ensure the below library is installed on your fabric environment
import reverse_geocoder as rg


**3. Loading and Filtering the Silver Layer Data**

In [2]:
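# start_date is not defined in this section; it must already exist
# (for example, from an earlier cell or a pipeline parameter)
df = spark.read.table("earthquake_events_silver").filter(col('time') > start_date)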

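The filter in the cell above depends on a `start_date` cutoff. The following is a minimal sketch of one way such a cutoff could be computed, assuming a seven-day lookback window. The window length and the name `lookback_days` are illustrative assumptions, not values taken from this notebook.

```python
from datetime import date, timedelta

# Hypothetical lookback window; the notebook does not state the real value.
lookback_days = 7

# Cutoff date: the filter keeps only events whose time is later than this.
start_date = date.today() - timedelta(days=lookback_days)
```

If the notebook is driven by a pipeline, passing the cutoff in as a parameter would keep reruns reproducible, since `date.today()` changes from day to day.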

**Function: get_country_code(lat, lon)**


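This function returns the two-letter country code (for example, 'FR') for a latitude/longitude pair. It uses the reverse_geocoder library, which performs the lookup offline against its bundled place data rather than calling a web service.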

In [3]:
def get_country_code(lat, lon):
    """
    Retrieve the country code for a given latitude and longitude.

    Parameters:
    lat (float or str): Latitude of the location.
    lon (float or str): Longitude of the location.

    Returns:
    str: Country code of the location, as returned by reverse_geocoder.

    Example:
    >>> get_country_code(48.8588443, 2.2943506)
    'FR'
    """
    coordinates = (float(lat), float(lon))
    return rg.search(coordinates)[0].get('cc')

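Spark passes SQL nulls to a Python UDF as `None`, and `float(None)` raises a `TypeError`, so a row with a missing coordinate would make the lookup fail. The following is a minimal sketch of a null-safe variant. `safe_country_code` and its `lookup` parameter are hypothetical names, and `lookup` stands in for `rg.search` so that the logic can be exercised without the library installed.

```python
def safe_country_code(lat, lon, lookup):
    """Return the country code for (lat, lon), or None if either is missing.

    `lookup` is a hypothetical stand-in for rg.search: it takes a
    (lat, lon) tuple and returns a list of dicts containing a 'cc' key.
    """
    if lat is None or lon is None:
        return None  # propagate nulls instead of raising inside float()
    results = lookup((float(lat), float(lon)))
    return results[0].get("cc") if results else None


# Stub lookup used only for demonstration; it always reports France.
def fake_lookup(coordinates):
    return [{"cc": "FR"}]


print(safe_country_code("48.8588443", 2.2943506, fake_lookup))  # FR
print(safe_country_code(None, 2.2943506, fake_lookup))          # None
```

In the notebook itself, `rg.search` would take the place of `fake_lookup`.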

**Enriching DataFrame with Country Code Using the UDF**

The next two cells wrap get_country_code as a Spark UDF, get_country_code_udf, and apply it to the DataFrame df. The UDF receives the latitude and longitude columns, looks up the country code for each row, and stores the result in a new column called country_code.

In [4]:
# wrap the function as a UDF so it can be applied to Spark DataFrame columns
get_country_code_udf = udf(get_country_code, StringType())


In [5]:
# adding the country_code attribute
df_with_location = df.withColumn(
    "country_code",
    get_country_code_udf(col("latitude"), col("longitude")),
)



**Adding Significance Classification to the DataFrame**

This code snippet adds a new column, sig_class, to the DataFrame df_with_location, classifying each earthquake event by its sig (significance) value: "Low" for values below 100, "Moderate" for values from 100 up to but not including 500, and "High" for everything else.

In [6]:
# adding significance classification
df_with_location_sig_class = df_with_location.withColumn(
    "sig_class",
    when(col("sig") < 100, "Low")
    .when((col("sig") >= 100) & (col("sig") < 500), "Moderate")
    .otherwise("High"),
)

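The threshold boundaries can be checked with a plain-Python mirror of the same rules. This is a sketch for illustration only; `classify_sig` is a hypothetical helper, not part of the notebook. It also mirrors one consequence of the Spark expression: a null `sig` satisfies neither `when` condition and therefore falls through to `otherwise("High")`.

```python
def classify_sig(sig):
    """Plain-Python mirror of the when/otherwise chain, for boundary checks."""
    if sig is not None and sig < 100:
        return "Low"
    if sig is not None and 100 <= sig < 500:
        return "Moderate"
    return "High"  # sig >= 500, or sig is None (a Spark null)


# Print the class for values on either side of each threshold.
for value in (99, 100, 499, 500, None):
    print(value, classify_sig(value))
```

If labelling null significance values as "High" is not intended, an explicit null check ahead of the other conditions would be one way to separate them.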

**Appending the Data to the Gold Table**

In [7]:
# appending the data to the gold table
df_with_location_sig_class.write.mode('append').saveAsTable('earthquake_events_gold')
