A Python script that runs an ETL pipeline on external data. The project extracts data from a URL, applies any necessary transformations, loads the transformed data into a SQLite database, and runs CRUD queries to analyze the stored data and retrieve preliminary insights.
The library directory contains extract.py, which extracts raw data from an online URL source; transform_load.py, which transforms the raw data and loads it from a .csv file into a .db SQLite database; and crud_query.py, which performs CRUD operations and basic SQL queries.
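The transform, load, and query steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `coerce` and `load_csv_to_sqlite`, the table name `icu`, and the sample rows are all assumptions made for the example, and an in-memory database stands in for the .db file.

```python
import csv
import io
import sqlite3


def coerce(value):
    """Transform step: turn numeric strings into int or float, leave other text as-is."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def load_csv_to_sqlite(csv_text, conn, table="icu"):
    """Load step: create `table` from the CSV header and insert every data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote column names so headers with spaces or mixed case stay valid SQL.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = [[coerce(v) for v in row] for row in reader if row]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Made-up sample rows for illustration, not values from the real dataset.
SAMPLE_CSV = "area,icu_beds\nExample City,120\nSample Town,45\n"

conn = sqlite3.connect(":memory:")  # the project writes to a .db file instead
inserted = load_csv_to_sqlite(SAMPLE_CSV, conn)

# CRUD: create, update, and delete rows, then read back what remains.
conn.execute("INSERT INTO icu VALUES (?, ?)", ("Test Metro", 10))
conn.execute("UPDATE icu SET icu_beds = ? WHERE area = ?", (50, "Sample Town"))
conn.execute("DELETE FROM icu WHERE area = ?", ("Test Metro",))
result = conn.execute("SELECT area, icu_beds FROM icu ORDER BY area").fetchall()
print(result)
```

Building the table from the CSV header keeps the sketch independent of the dataset's exact column names, and the `?` placeholders let SQLite handle value quoting.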
- Jupyter notebook
- icu.db
- library.py
- extract.py
- transform_load.py
- crud_query.py
- test_main.py
- requirements.txt
- CI/CD pipeline
- Makefile
- README.md
This dataset combines data from the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System (BRFSS) and the Kaiser Family Foundation to illustrate the number of people who were at high risk for hospitalization from the novel coronavirus COVID-19 in 2020.
URL: https://github.com/fivethirtyeight/data/blob/e6bbbb2d35310b5c63c2995a0d03d582d0c7b2e6/covid-geography/mmsa-icu-beds.csv
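The URL above points at GitHub's HTML "blob" page rather than the CSV itself, so an extract step usually needs the raw-file form of the link. The sketch below shows one way to do that with the standard library only. The helper names `to_raw_url` and `extract` and the default destination filename are assumptions for illustration, and the project's extract.py may work differently.

```python
import urllib.request


def to_raw_url(blob_url):
    """Rewrite a github.com 'blob' page URL to its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        return blob_url  # not a GitHub blob link; leave it unchanged
    repo, path = blob_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{repo}/{path}"


def extract(url, dest="mmsa-icu-beds.csv"):
    """Download the CSV behind `url` and save it to `dest` (hypothetical path)."""
    with urllib.request.urlopen(to_raw_url(url)) as response:
        data = response.read()
    with open(dest, "wb") as f:
        f.write(data)
    return dest


URL = (
    "https://github.com/fivethirtyeight/data/blob/"
    "e6bbbb2d35310b5c63c2995a0d03d582d0c7b2e6/covid-geography/mmsa-icu-beds.csv"
)
raw_url = to_raw_url(URL)
print(raw_url)
```

Calling `extract(URL)` would then download the file. It is left out of the sketch so the example runs without network access.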

