-
Notifications
You must be signed in to change notification settings - Fork 0
Transform
Dat Nguyen edited this page Feb 8, 2021
·
1 revision
This stage transform data from the raw tables to our fact and dimensional tables.
-
country_dim
table:- Dimensional table keeps general information of countries in the world.
- We built this table by extracting
world.country
provided in MySQL workbench.
-
covid19_global_fact
table:- Fact table keep series data of covid19 confirmed cases and deaths fro each country by day.
- Transformed from raw table
convid19_global_raw
-
dateid
is an integer in format ofyyyymmdd
computed fromdate
-
covid19_global_monthly_fact
table:- Fact table aggregated from
covid19_global_fact
monthly. - Used to join with other monthly fact tables in query.
- Fact table aggregated from
-
covid19_us_dim
table:- Dimensional table keeps general information of the US states and counties.
- Build this table during extracting process from raw data.
-
covid19_us_fact
table:- Fact table keep covid19 confirmed cases and deaths for each the US' county by day.
- Transformed from
covid19_us_raw
-
dateid
is an integer in format ofyyyymmdd
computed fromdate
-
covid19_us_monthly_fact
table:- Fact table aggregated from
covid19_us_fact
monthly. - Used to join with other monthly fact tables in query.
- Fact table aggregated from
-
stock_price_fact
table:- Fact table keep series data for daily stock prices
- Remind that we've built
stock_ticker_raw
in the Extract stage. This table use stock_ticker as a foreign key. -
dateid
is an integer in format ofyyyymmdd
computed fromdate
CREATE TABLE IF NOT EXISTS stock_price_fact( dateid BIGINT NOT NULL, -- number in YYYYmmdd format stock_ticker VARCHAR(16) NOT NULL, date datetime NOT NULL, high double NOT NULL, low double NOT NULL, open double NOT NULL, close double NOT NULL, volume double NOT NULL, adj_close double NOT NULL, PRIMARY KEY(dateid, stock_ticker), FOREIGN KEY(stock_ticker) REFERENCES stock_ticker_raw(ticker) );
-
bol_series_fact
table:- Fact table keep series data for monthly statistics feature provided by U.S Bureau of Labor
- Transformed from raw table
bol_raw
-
dateid
is an integer in format ofyyyymmdd
computed fromdate
. - we convert
date
frombol_raw.year
andbol_raw.period
. Since the data is monthly published, the day value always show as '01'.
CREATE TABLE IF NOT EXISTS bol_series_fact(
dateid BIGINT NOT NULL, -- id = YYYYMM e.g., 202009 is data for Sep of 2020
series_id VARCHAR(64) NOT NULL, -- matched with series_id from raw data
date datetime NOT NULL, -- monthly
value double,
footnotes varchar(128),
PRIMARY KEY(dateid, series_id),
FOREIGN KEY(series_id) references BOL_series_dim(series_id)
);