# Data Ingestion

In this section, we ingest the credit scoring datasets into our Databricks workspace. We read the three CSV files (applicants, loans, and repayments) from a Unity Catalog managed volume and write each one to a Delta table in the Bronze schema.

*Note: Adjust the catalog, schema, volume, and file path parameters as needed based on your Databricks environment setup.*
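The three source files and their Bronze targets can also be captured as data, which keeps the file-to-table mapping in one place if you later prefer a loop over three near-identical cells. The sketch below is plain Python; `SOURCES` and `build_ingestion_plan` are hypothetical names, while the file and table names are the ones this notebook uses.

```python
# Hypothetical mapping of source CSV files (in the volume) to Bronze table names,
# using the file and table names from this notebook.
SOURCES = {
    "1_data_nasabah.csv": "raw_applicants",
    "2_data_pinjaman.csv": "raw_loans",
    "3_data_pembayaran.csv": "raw_repayments",
}


def build_ingestion_plan(filepath: str, catalog: str, schema: str) -> list[tuple[str, str]]:
    """Return (source CSV path, fully qualified target table) pairs."""
    return [
        (f"{filepath}/{csv_name}", f"{catalog}.{schema}.{table}")
        for csv_name, table in SOURCES.items()
    ]


plan = build_ingestion_plan(
    "/Volumes/workspace/default/managed_vol_1/credit_datasets", "workspace", "bronze"
)
for source, target in plan:
    print(source, "->", target)
```

In the notebook, each pair could then drive `spark.read.csv(source, header=True, inferSchema=True).write.mode("overwrite").saveAsTable(target)`.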

In [0]:
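# Unity Catalog locations (adjust to your workspace)
CATALOG       = 'workspace'
BRONZE_SCHEMA = 'bronze'
SILVER_SCHEMA = 'silver'
GOLD_SCHEMA   = 'gold'

# The volume is assumed to live in the catalog's `default` schema
MANAGED_VOLUME_NAME = 'managed_vol_1'

VOLUMEPATH = f'/Volumes/{CATALOG}/default/{MANAGED_VOLUME_NAME}'
FILEPATH   = VOLUMEPATH + '/credit_datasets'  # folder holding the source CSV files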

In [None]:
# Create the bronze schema if it doesn't exist
display(spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.{BRONZE_SCHEMA}"))


# Read and Check Data

In [0]:
# read applicants data

df_applicants = spark.read.csv(FILEPATH + '/1_data_nasabah.csv', header=True, inferSchema=True)
df_applicants.limit(5).display()

In [0]:
# read loans data

df_loans = spark.read.csv(FILEPATH + '/2_data_pinjaman.csv', header=True, inferSchema=True)
df_loans.limit(5).display()

In [0]:
# read payment data

df_repayments = spark.read.csv(FILEPATH + '/3_data_pembayaran.csv', header=True, inferSchema=True)
df_repayments.limit(5).display()
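As a quick sanity check outside Spark, you can confirm that a file exists and inspect its header row. This sketch assumes the compute can open `/Volumes/...` paths with standard Python file APIs, as Unity Catalog volumes generally allow on supported clusters. `peek_csv` is a hypothetical helper, and the demo writes a throwaway file with made-up columns rather than reading the real datasets.

```python
import csv
import os
import tempfile


def peek_csv(path: str, n_rows: int = 3) -> tuple[list[str], list[list[str]]]:
    """Return the header and up to n_rows data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file is empty
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Demo on a temporary file with hypothetical columns (not the real dataset schema).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "w", newline="", encoding="utf-8") as f:
        f.write("id,name,amount\n1,A,100\n2,B,250\n")
    header, rows = peek_csv(sample)

print(header)  # ['id', 'name', 'amount']
print(rows)    # [['1', 'A', '100'], ['2', 'B', '250']]
```

On the cluster, the same call would be `peek_csv(FILEPATH + '/1_data_nasabah.csv')`.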


# Save Data into Tables

In [0]:
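# Write each DataFrame to a Bronze Delta table, replacing any previous load.
# The f prefix is required so that CATALOG and BRONZE_SCHEMA are interpolated.
df_applicants.write.mode("overwrite").saveAsTable(f"{CATALOG}.{BRONZE_SCHEMA}.raw_applicants")
df_loans.write.mode("overwrite").saveAsTable(f"{CATALOG}.{BRONZE_SCHEMA}.raw_loans")
df_repayments.write.mode("overwrite").saveAsTable(f"{CATALOG}.{BRONZE_SCHEMA}.raw_repayments")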
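Without the `f` prefix, the literal text `{CATALOG}.{BRONZE_SCHEMA}` reaches `saveAsTable` instead of `workspace.bronze`. A small guard can catch such names before any write. The helper below is hypothetical and assumes three-level `catalog.schema.table` names made only of letters, digits, and underscores, which is stricter than Unity Catalog's own naming rules.

```python
import re

# One simple identifier: letters, digits, underscores, not starting with a digit.
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
_QUALIFIED = re.compile(rf"{_IDENT}\.{_IDENT}\.{_IDENT}")


def check_table_name(name: str) -> str:
    """Raise ValueError unless name looks like catalog.schema.table."""
    if not _QUALIFIED.fullmatch(name):
        raise ValueError(f"Not a three-level table name: {name!r}")
    return name


CATALOG, BRONZE_SCHEMA = "workspace", "bronze"
good = check_table_name(f"{CATALOG}.{BRONZE_SCHEMA}.raw_applicants")
print(good)  # workspace.bronze.raw_applicants

try:
    check_table_name("{CATALOG}.{BRONZE_SCHEMA}.raw_applicants")  # missing f prefix
except ValueError as err:
    print(err)
```

In the save cell, each target could be wrapped the same way, e.g. `saveAsTable(check_table_name(f"{CATALOG}.{BRONZE_SCHEMA}.raw_loans"))`.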