# Get data for the raw layer

**Objective:** Ingest data from a remote source into the `raw` directory of DBFS.

## Configuration

Add your name to the configuration file:
<a href="$./includes/configuration" target="_blank">
includes/configuration</a>

`username = "your_name"`
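As a rough illustration of how `username` might feed into the paths used in this notebook, here is a minimal sketch. The path layout is an assumption inferred from the `dbfs:/<username>/northwind_dw/raw/` listing shown further down; the actual contents of `includes/configuration` may differ.

```python
# Hypothetical excerpt of a configuration file (not the actual includes/configuration).
username = "your_name"  # replace with your own name

# Assumed layout: every path for this project hangs off a per-user root folder.
northwind_path = f"/{username}/northwind_dw/"

# Assumed location of the raw layer inside that root.
raw_path = northwind_path + "raw/"
```

Keeping the user name in one variable means each person running the notebook writes to a separate folder.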

In [0]:
%run ./includes/configuration

Out[3]: DataFrame[]

## Clean up

In [0]:
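# Remove everything under northwind_path so the notebook starts from a clean state
dbutils.fs.rm(northwind_path, recurse=True)

# Drop the trusted tables if they exist
for table_name in northwind_tables_trusted:
  spark.sql(
    f"""
    DROP TABLE IF EXISTS {table_name}
    """
  )

# Drop the refined tables if they exist
for table_name in northwind_tables_refined:
  spark.sql(
    f"""
    DROP TABLE IF EXISTS {table_name}
    """
  )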

## Get data

The `retrieve_data` function fetches a single file from the remote source for ingestion.
It takes two arguments:
- `file_name: str`
- `raw_path: str`
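The real `retrieve_data` is defined outside this notebook, so its implementation is not shown here. The following is only a sketch of what such a helper could look like, using the standard library. The name `retrieve_data_sketch` and the extra `base_url` parameter are hypothetical, and the sketch writes to a local file system path (on Databricks that would typically mean a path under the `/dbfs/` mount rather than a `dbfs:/` URI).

```python
import os
import urllib.request


def retrieve_data_sketch(file_name: str, raw_path: str, base_url: str) -> str:
    """Download `file_name` from `base_url` and store it under `raw_path`.

    `base_url` is a hypothetical parameter; the original function takes
    only `file_name` and `raw_path`.
    """
    # Make sure the target directory exists before writing into it.
    os.makedirs(raw_path, exist_ok=True)

    # Build the source URL and the local target path for this file.
    source_url = f"{base_url.rstrip('/')}/{file_name}"
    target_path = os.path.join(raw_path, file_name)

    # Copy the remote file to the target path.
    urllib.request.urlretrieve(source_url, target_path)
    return target_path
```

Returning the target path makes it easy to log or verify each file right after it is downloaded.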

### Import CSV files

In [0]:
dbutils.fs.mkdirs(raw_path)

for file_name in northwind_files:
  retrieve_data(file_name, raw_path)

### Expected files

The list of expected files is defined in:
<a href="$./includes/configuration" target="_blank">
includes/configuration</a>

In [0]:
file_names = northwind_files

### Files in `raw` folder

In [0]:
display(dbutils.fs.ls(raw_path))

| path | name | size | modificationTime |
|---|---|---|---|
| dbfs:/tfukuda/northwind_dw/raw/categories.csv | categories.csv | 427 | 1654514509000 |
| dbfs:/tfukuda/northwind_dw/raw/customer_customer_demo.csv | customer_customer_demo.csv | 29 | 1654514510000 |
| dbfs:/tfukuda/northwind_dw/raw/customer_demographics.csv | customer_demographics.csv | 31 | 1654514511000 |
| dbfs:/tfukuda/northwind_dw/raw/customers.csv | customers.csv | 11563 | 1654514512000 |
| dbfs:/tfukuda/northwind_dw/raw/employee_territories.csv | employee_territories.csv | 417 | 1654514513000 |
| dbfs:/tfukuda/northwind_dw/raw/employees.csv | employees.csv | 4070 | 1654514513000 |
| dbfs:/tfukuda/northwind_dw/raw/order_details.csv | order_details.csv | 40625 | 1654514514000 |
| dbfs:/tfukuda/northwind_dw/raw/orders.csv | orders.csv | 98531 | 1654514515000 |
| dbfs:/tfukuda/northwind_dw/raw/products.csv | products.csv | 4339 | 1654514516000 |
| dbfs:/tfukuda/northwind_dw/raw/region.csv | region.csv | 71 | 1654514516000 |


Use an `assert` statement to verify that every expected file was ingested.

**Note**: `print()` and `display()` are not used in production. They appear here for testing purposes only.

In [0]:
# Validation of ingestion: list the raw folder once, then check each expected file
ingested_names = {item.name for item in dbutils.fs.ls(raw_path)}
for file_name in file_names:
  assert file_name in ingested_names, f"{file_name} not present in Raw Path"
print("Assertion passed - files ingested.")

Assertion passed - files ingested.
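The check above stops at the first missing file. As an optional variation, the sketch below collects every missing file so that a single failure message lists them all. The helper name `find_missing_files` is hypothetical and not part of the original notebook; in the notebook, `present` would be built from `dbutils.fs.ls(raw_path)`.

```python
from typing import Iterable, List


def find_missing_files(expected: Iterable[str], present: Iterable[str]) -> List[str]:
    """Return the expected file names that are absent from `present`, in order."""
    # A set gives constant-time membership checks.
    present_names = set(present)
    return [name for name in expected if name not in present_names]


# Example with plain lists standing in for the directory listing.
missing = find_missing_files(
    ["categories.csv", "orders.csv", "region.csv"],
    ["categories.csv", "region.csv"],
)
print(missing)
```

In the notebook this could be used as `assert not missing, f"Files not present in Raw Path: {missing}"`.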
