Skip to content

Transform

Dat Nguyen edited this page Feb 8, 2021 · 1 revision

This stage transform data from the raw tables to our fact and dimensional tables.

Covid-19

  • country_dim table:

    • Dimensional table keeps general information of countries in the world.
    • We built this table by extracting world.country provided in MySQL workbench.
  • covid19_global_fact table:

    • Fact table keep series data of covid19 confirmed cases and deaths fro each country by day.
    • Transformed from raw table convid19_global_raw
    • dateid is an integer in format of yyyymmdd computed from date
  • covid19_global_monthly_fact table:

    • Fact table aggregated from covid19_global_fact monthly.
    • Used to join with other monthly fact tables in query.
  • covid19_us_dim table:

    • Dimensional table keeps general information of the US states and counties.
    • Build this table during extracting process from raw data.
  • covid19_us_fact table:

    • Fact table keep covid19 confirmed cases and deaths for each the US' county by day.
    • Transformed from covid19_us_raw
    • dateid is an integer in format of yyyymmdd computed from date
  • covid19_us_monthly_fact table:

    • Fact table aggregated from covid19_us_fact monthly.
    • Used to join with other monthly fact tables in query.

Stock Prices

  • stock_price_fact table:
    • Fact table keep series data for daily stock prices
    • Remind that we've built stock_ticker_raw in the Extract stage. This table use stock_ticker as a foreign key.
    • dateid is an integer in format of yyyymmdd computed from date
    CREATE TABLE IF NOT EXISTS stock_price_fact(
         dateid BIGINT NOT NULL, -- number in YYYYmmdd format
      stock_ticker VARCHAR(16) NOT NULL,
      date datetime NOT NULL,
      high double NOT NULL,
      low double NOT NULL,
      open double NOT NULL,
      close double NOT NULL,
      volume double NOT NULL,
      adj_close double NOT NULL,
      
      PRIMARY KEY(dateid, stock_ticker),
      FOREIGN KEY(stock_ticker) REFERENCES stock_ticker_raw(ticker)
    );
    

Employment / Unemployment rate

  • bol_series_fact table:
    • Fact table keep series data for monthly statistics feature provided by U.S Bureau of Labor
    • Transformed from raw table bol_raw
    • dateid is an integer in format of yyyymmdd computed from date.
    • we convert date from bol_raw.year and bol_raw.period. Since the data is monthly published, the day value always show as '01'.
CREATE TABLE IF NOT EXISTS bol_series_fact(
	   dateid BIGINT NOT NULL, -- id = YYYYMM e.g., 202009 is data for Sep of 2020
    series_id VARCHAR(64) NOT NULL, -- matched with series_id from raw data
    date datetime NOT NULL, -- monthly
    value double,
    footnotes  varchar(128),
    PRIMARY KEY(dateid, series_id),
    
    FOREIGN KEY(series_id) references BOL_series_dim(series_id)
);
Clone this wiki locally