# Extract, Transform and Load

ETL is one of the most common tasks for data engineers (DEs): they integrate data from various heterogeneous sources so that data analysts and data scientists can work with it.

In this exercise, a DE needs to download sales data from a URL and store it locally.

## Extract
To extract data from a storage system, I send a request and the data is sent back.

In this case, the source data is available at a URL.

### 1. requests library

A **GET** request fetches data from a resource.

In [9]:
# requests library
import requests
import pandas as pd

# Download the ZIP file (a timeout keeps the call from hanging indefinitely)
path = "https://assets.datacamp.com/production/repositories/5899/datasets/19d6cf619d6a771314f0eb489262a31f89c424c2/ppr-all.zip"
response = requests.get(path, timeout=30)

# Print the status code; 200 means OK
print(response.status_code)

# Print the headers (metadata)
print(pd.DataFrame(response.headers.items(), columns=["Header", "Value"]))



200
              Header                                              Value
0               Date                      Sat, 29 Apr 2023 21:11:05 GMT
1       Content-Type                           application/octet-stream
2     Content-Length                                             249296
3         Connection                                         keep-alive
4         x-amz-id-2  XW89DzPlpbUccYH/eZ5vX66OyaaVjqG0VuwuzCmpovCOU+...
5   x-amz-request-id                                   SDQ1W0F2DF1G27D2
6      Last-Modified                      Sun, 30 May 2021 14:00:42 GMT
7   x-amz-version-id                   yT6365UyrWqhSlRsEPyBehY7HKsnLPmH
8               ETag                 "5840e486b3afdf58267d80163cb5d0cf"
9    CF-Cache-Status                                                HIT
10     Accept-Ranges                                              bytes
11        Set-Cookie  __cf_bm=RyVNG.wg5ECoSQfpItPfHOSnQHfJtnBYMawdgh...
12              Vary                                    Acce
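
The status code and headers printed above can also be checked programmatically. The following is a minimal sketch, not part of the original notebook: `summarize_response` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in for a real `requests` response so that the example runs without network access. It assumes only that the response exposes `status_code`, `headers` and `content`.

```python
from http import HTTPStatus
from types import SimpleNamespace


def summarize_response(response):
    """Describe an HTTP response (hypothetical helper).

    Accepts any object with `status_code`, `headers` and `content`
    attributes, such as the object returned by requests.get().
    """
    # HTTPStatus raises ValueError for codes it does not know
    status = HTTPStatus(response.status_code)
    declared = response.headers.get("Content-Length")
    return {
        "status": f"{status.value} {status.phrase}",
        "ok": 200 <= status.value < 300,
        "declared_bytes": int(declared) if declared is not None else None,
        "received_bytes": len(response.content),
    }


# Stand-in response for illustration only (assumption: a 4-byte body)
fake_response = SimpleNamespace(
    status_code=200,
    headers={"Content-Length": "4"},
    content=b"PK\x03\x04",
)

summary = summarize_response(fake_response)
print(summary)
```

With a real response, comparing `declared_bytes` and `received_bytes` is only meaningful when the server does not compress the body, because `Content-Length` then refers to the compressed size.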

### 2. f-strings to build a local path

An f-string is a concise way to format strings: expressions placed inside braces in the string literal are evaluated and inserted into the result.

In [6]:
# os library
import os

# Set paths (no trailing slash on root_dir, so the f-string below does not produce "//")
root_dir = 'D:/Learn_DS/Git/python-learning/DataCamp'
data_dir = 'data'
file_name = 'ETL_Demo.zip'

# Create the data directory under root_dir if it does not exist yet
os.makedirs(f"{root_dir}/{data_dir}", exist_ok=True)

# Build the file path using an f-string
file_path = f"{root_dir}/{data_dir}/{file_name}"

# Check whether the file already exists
if os.path.exists(file_path):
    print(f"File {file_name} exists at {file_path}")
else:
    print(f"File {file_name} does not exist at {file_path}")

File ETL_Demo.zip does not exist at D:/Learn_DS/Git/python-learning/DataCamp/data/ETL_Demo.zip
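
As an alternative to joining path segments with an f-string, the standard-library `pathlib` module takes care of separators, including a trailing slash on the root directory. The sketch below is an illustration rather than the notebook's method: `build_file_path` is a hypothetical helper, and a temporary directory stands in for the real project folder.

```python
import tempfile
from pathlib import Path


def build_file_path(root_dir, data_dir, file_name):
    """Create root_dir/data_dir if needed and return the full file path."""
    target_dir = Path(root_dir) / data_dir
    # parents=True creates missing parent folders; exist_ok=True avoids an
    # error when the directory is already there
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / file_name


# A temporary directory stands in for the real root directory (assumption)
with tempfile.TemporaryDirectory() as tmp_root:
    # The trailing slash is normalized away by pathlib
    file_path = build_file_path(tmp_root + "/", "data", "ETL_Demo.zip")
    if file_path.exists():
        print(f"File {file_path.name} exists at {file_path}")
    else:
        print(f"File {file_path.name} does not exist at {file_path}")
```

The returned `Path` object can be passed directly to `open()` and to `zipfile.ZipFile`.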


### 3. Save the online file locally

In [7]:
# Save the file locally: the with statement closes the file after writing,
# and mode "wb" writes the ZIP content as raw bytes
with open(file_path, "wb") as f:
    f.write(response.content)
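
After writing, it can be worth confirming that the whole payload reached the disk. The sketch below assumes a hypothetical helper `save_bytes` and uses a short byte string in place of `response.content`, so it runs on its own.

```python
import tempfile
from pathlib import Path


def save_bytes(content, dest):
    """Write bytes to dest in binary mode and return the number of bytes written."""
    dest = Path(dest)
    # Make sure the target folder exists before opening the file
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as f:
        written = f.write(content)
    return written


# Placeholder payload standing in for response.content (assumption)
payload = b"PK\x03\x04 demo bytes"

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "data" / "ETL_Demo.zip"
    written = save_bytes(payload, target)
    # The size on disk should match the size of the payload
    size_matches = target.stat().st_size == len(payload) == written
    print(f"Wrote {written} bytes, size matches: {size_matches}")
```

Binary mode matters here: opening the file in text mode (`"w"`) and passing bytes to `write` raises a `TypeError`.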

### 4. Unzip the ZIP file

In [10]:
# Unzip the ZIP file
from zipfile import ZipFile

# Open the archive in read mode
with ZipFile(file_path, mode="r") as f:
    # Get the list of files in the archive and print it
    file_names = f.namelist()
    print(file_names)
    # Extract the CSV file (the first and only entry) into data_dir
    csv_file_path = f.extract(file_names[0], path=data_dir)
    print(csv_file_path)



['ppr-all.csv']
data\ppr-all.csv
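
Taking `file_names[0]` works because this archive holds a single file. A slightly more defensive variant, sketched below, picks the first entry whose name ends in `.csv`. The helper `extract_first_csv`, the archive `demo.zip` and its contents are all made up for illustration, so the example runs without the downloaded file.

```python
import csv
import tempfile
import zipfile
from pathlib import Path


def extract_first_csv(zip_path, out_dir):
    """Extract the first .csv member of a ZIP archive and return its path."""
    with zipfile.ZipFile(zip_path, mode="r") as zf:
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError(f"No CSV file found in {zip_path}")
        return Path(zf.extract(csv_names[0], path=out_dir))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small demo archive (hypothetical contents)
    zip_path = Path(tmp_dir) / "demo.zip"
    with zipfile.ZipFile(zip_path, mode="w") as zf:
        zf.writestr("readme.txt", "not a csv")
        zf.writestr("sales.csv", "id,price\n1,100\n2,250\n")

    csv_path = extract_first_csv(zip_path, Path(tmp_dir) / "data")

    # Read the extracted file to confirm that it is usable
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))

    print(csv_path.name, rows)
```

The function raises a `ValueError` when the archive contains no CSV, which is easier to diagnose than an `IndexError` on an empty list.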


## Transform
