# Modeling in DBT

### Introduction

Now that we have connected our DBT account to our data warehouse and our GitHub repository, it's time to use DBT to perform some modeling. Modeling means transforming our raw data into other tables that make the data easier to work with. In an ELT paradigm, we load raw data directly into our database without DBT, and then use DBT to transform the data from there.

In this lesson, we'll learn more about this structure for transforming our data.

### Our Starting Point

With the ELT paradigm, we essentially begin with a data dump. This is the result of the extract and load steps in ELT: we simply take data from internal data sources or external APIs and dump it into our data warehouse.

For example, take a look at the diagram below.

> <img src="./rittman-pipeline.png" width="80%">

> Diagram courtesy of [Rittman Analytics](https://github.com/rittmananalytics/ra_data_warehouse).

Above we can see that data comes from various sources, and that DBT is used to transform the data so that it can ultimately feed dashboards or be sent as formatted data to various departments. Each of those dashboards is backed by a data mart, consisting of fact and dimension tables.

> <img src="./star_pagila.png" width="40%">

So we'll use DBT to transform our source data into data for our data marts. Remember that we'll start with our data already dumped into our data warehouse (e.g. via Fivetran or manually), and from there we can perform transformations to go from the source data to data marts.

### The transformations

Now let's take a look at the process for making these transformations.  

<img src="./dbt-pipeline.jpg" width="60%">

In the diagram above, we can see that we start with our **source tables**, which hold our raw data, then clean up that raw data with **staging**, combine data from different sources in **integration**, and finally organize the data into our star schema for stakeholders in the **mart** layer.

Let's discuss this in a bit more detail.

1. The source tables

We call the tables that house our untransformed data our **source tables**. These are the tables from our data dump. Some of our source tables will come from our OLTP database, but many will come from third-party sources. There isn't a lot of work to be done at this layer, as these tables simply consist of our raw data.

2. Staging 

When we first load our data, there will be transformations that we want to apply to each of these tables. The idea with staging is to use views to clean up each of our source tables, but not to combine any of our data at this point.

Many of the transformations that we apply are light transformations. For example, in the diagram below we start with a `customers` source table drawn from HubSpot data and create a new table with the same data, renaming the columns `first_name` and `last_name` to `first` and `last`, and leaving `address_id` unchanged.

> <img src="./customers_stg.png" width="40%">

> Notice that all of the data in a single staging table is derived from a single source table (just like we see with our customers table above).  
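
As a rough illustration of a staging transformation like the one described above, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the data warehouse. The table name `hubspot_customers`, the view name `stg_customers`, and the sample rows are assumptions made up for this example. In a DBT project, the `SELECT` statement would typically live in its own model file rather than in a Python string.

```python
import sqlite3

# In-memory SQLite database standing in for the data warehouse (an assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical raw source table, as it might look right after the load step.
conn.execute(
    "CREATE TABLE hubspot_customers "
    "(first_name TEXT, last_name TEXT, address_id INTEGER)"
)
conn.executemany(
    "INSERT INTO hubspot_customers VALUES (?, ?, ?)",
    [("Ada", "Lovelace", 1), ("Grace", "Hopper", 2)],
)

# Staging view: reads from exactly one source table and only renames columns.
# The new names are double-quoted because FIRST and LAST are SQL keywords.
conn.execute(
    """
    CREATE VIEW stg_customers AS
    SELECT
        first_name AS "first",
        last_name  AS "last",
        address_id
    FROM hubspot_customers
    """
)

staged_rows = conn.execute(
    'SELECT "first", "last", address_id FROM stg_customers ORDER BY address_id'
).fetchall()
print(staged_rows)
```

Because the staging layer here is a view, it stores no data of its own: querying `stg_customers` always reflects whatever is currently in the source table.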

3. Staging to Integration

Now take another look at what occurs from the staging to integration layer.

<img src="./integration.png" width="40%">

When we move from staging to integration, we combine data from multiple tables. For example, we may want to combine user information from HubSpot, Stripe, and Mailchimp into one users table.
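
To make the integration step more concrete, here is a minimal sketch that joins three staging tables into a single users view, again using `sqlite3` as a stand-in warehouse. The table names, the columns, and the choice of `email` as the shared key are all assumptions for illustration. Real sources often need more careful matching logic than a simple join on email.

```python
import sqlite3

# In-memory SQLite database standing in for the data warehouse (an assumption).
conn = sqlite3.connect(":memory:")

conn.executescript(
    """
    -- Hypothetical staging tables, one per source system.
    CREATE TABLE stg_hubspot_users   (email TEXT, full_name TEXT);
    CREATE TABLE stg_stripe_users    (email TEXT, stripe_customer_id TEXT);
    CREATE TABLE stg_mailchimp_users (email TEXT, is_subscribed INTEGER);

    INSERT INTO stg_hubspot_users VALUES
        ('ada@example.com', 'Ada Lovelace'),
        ('grace@example.com', 'Grace Hopper');
    INSERT INTO stg_stripe_users VALUES
        ('ada@example.com', 'cus_001');
    INSERT INTO stg_mailchimp_users VALUES
        ('ada@example.com', 1),
        ('grace@example.com', 0);

    -- Integration view: combines the staging tables into one row per user.
    -- LEFT JOINs keep users who are missing from Stripe or Mailchimp.
    CREATE VIEW int_users AS
    SELECT
        h.email,
        h.full_name,
        s.stripe_customer_id,
        m.is_subscribed
    FROM stg_hubspot_users AS h
    LEFT JOIN stg_stripe_users    AS s ON s.email = h.email
    LEFT JOIN stg_mailchimp_users AS m ON m.email = h.email;
    """
)

users = conn.execute(
    "SELECT email, full_name, stripe_customer_id, is_subscribed "
    "FROM int_users ORDER BY email"
).fetchall()
for user in users:
    print(user)
```

In this sketch the HubSpot table acts as the base list of users, so a user with no Stripe record still appears, with `None` for `stripe_customer_id`.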

4. Mart tables

Finally, with our mart data we have tables structured for various stakeholders. These are organized as fact and dimension tables, which feed data dashboards or are provided directly to various stakeholders.
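
The sketch below shows one way a small mart could be built from integrated data: a dimension table with descriptive attributes, a fact table with measurable events, and a dashboard-style query that joins the two. The table names (`int_users`, `int_orders`, `dim_users`, `fct_orders`), the columns, and the sample data are hypothetical, and `sqlite3` again stands in for the warehouse.

```python
import sqlite3

# In-memory SQLite database standing in for the data warehouse (an assumption).
conn = sqlite3.connect(":memory:")

conn.executescript(
    """
    -- Hypothetical integration-layer tables.
    CREATE TABLE int_users  (user_id INTEGER, full_name TEXT, country TEXT);
    CREATE TABLE int_orders (order_id INTEGER, user_id INTEGER, amount_usd REAL);

    INSERT INTO int_users VALUES
        (1, 'Ada Lovelace', 'UK'),
        (2, 'Grace Hopper', 'US');
    INSERT INTO int_orders VALUES
        (10, 1, 20.0),
        (11, 1, 5.0),
        (12, 2, 12.5);

    -- Dimension table: descriptive attributes about each user.
    CREATE TABLE dim_users AS
    SELECT user_id, full_name, country FROM int_users;

    -- Fact table: one row per order, with a measure and a key to the dimension.
    CREATE TABLE fct_orders AS
    SELECT order_id, user_id, amount_usd FROM int_orders;
    """
)

# A dashboard-style question answered by joining the fact table to the dimension.
revenue_by_country = conn.execute(
    """
    SELECT d.country, SUM(f.amount_usd) AS revenue_usd
    FROM fct_orders AS f
    JOIN dim_users AS d ON d.user_id = f.user_id
    GROUP BY d.country
    ORDER BY d.country
    """
).fetchall()
print(revenue_by_country)
```

The point of the star layout is that stakeholder questions such as "revenue by country" reduce to a join between a fact table and one or more dimension tables.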

### Summary

In this lesson, we saw some of the workflow and naming conventions in our data modeling process. We start with our data already loaded into our data warehouse in our **source tables**. Then, we use **staging tables** to apply light transformations to our source tables. Each staging table should reference only a single source table. Next, we use the integration layer to combine data from multiple staging tables. Finally, we use the integrated data to create our **mart tables**, which is the name given to our dimension and fact tables.