The purpose of this DBT project is to help folks with getting started with Heap data synced to a data warehouse via Heap Connect. This project comes with the following baked in:
- A heap_sources.yml file which includes documentation, freshness checks, and basic tests for the built-in views that ship with Heap Connect
 - Logic to handle incremental table transformations of the source data synced to your warehouse via Heap Connect
 - Macros which can be used to quickly spin up attribution and funnel analysis
 - An opinionated viewpoint on how to best structure DBT projects in terms of folder structure and naming conventions in order to promote adoption and collaboration
 
This table contains a row for each event synced to Heap. This table automatically includes pageviews but will also include events toggled to sync via Heap Connect in the Heap UI
| Column Name | Description | Source | 
|---|---|---|
| heap_user_id | Unique ID generated by Heap and stored as a cookie | Heap | 
| heap_session_id | Unique ID generated by Heap and stored as a cookie | Heap | 
| heap_event_name | The name of the event which will either be pageview or the category plus name of the event toggled to sync | Heap | 
| heap_event_time | The timestamp at which the event occurred | Heap | 
| heap_session_event_sequence | The order in which the event occurred within the session for the user | src_heap_events_all_eph | 
| heap_event_funnel_position | If the event name matches a string within the funnel_events variable found in the dbt_project yml file, the index value for that string within that list is returned | stg_heap_events_all_inc_fct | 
| heap_event_is_conversion | Flags whether or not the event name matches a string provided within the conversion events variable | stg_heap_events_all_inc_fct | 
This table contains a row for each pageview captured by Heap with some custom sessionization baked in to make joining Heap Connect data with other sources easier
| Column Name | Description | Source | 
|---|---|---|
| heap_pageview_id | The unique id generated for each pageview | Heap | 
| heap_user_id | The unique id generated for each user | Heap | 
| heap_session_id | The unique id generated for each session | Heap | 
| heap_pageview_time | The timestamp of each pageview | Heap | 
| heap_pageview_domain | The domain component of the url that points to the page that was viewed | Heap | 
| heap_pageview_path | The path component of the url that points to the page that was viewed | Heap | 
| heap_pageview_query | The query component of the url that points to the page that was viewed | Heap | 
| heap_pageview_hash | The hash component of the url that points to the page that was viewed | Heap | 
| heap_pageview_title | The value of the html title tag for the page that was viewed | Heap | 
| heap_pageview_previous_page | The url of the page the user was previously on | Heap | 
| heap_pageview_session_id | The unique id of the user concatenated with the index number of the session generated from custom sessionization logic | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_index | A sequential number that flags the beginning of a new session based on the previous pageview being more than 30 minutes away | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_sequence_number | The order in which each pageview occurred within a session defined by custom sessionization logic | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_start_time | The time of the first pageview for each user and session index number | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_end_time | The time of the last pageview for each user and session index number | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_first_pageview_id | The heap_pageview_id of the first pageview for each user and session index number | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_last_pageview_id | The heap_pageview_id of the last pageview for each user and session index number | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_start_epoch | The epoch time of the first pageview for each user and session index number | stg_heap_pageview_sessions_inc_fct | 
| heap_pageview_session_end_epoch | The epoch time of the last pageview for each user and session index number | stg_heap_pageview_sessions_inc_fct | 
This table contains a row for each session captured by Heap. Sessions are generated from a client side cookie. Server Side events will need to be sessionized with the stg_heap_pageview_sessions_inc_fct model
| Column Name | Description | Source | 
|---|---|---|
| heap_session_id | the unique id generated for each session | Heap | 
| heap_user_id | the unique id generated for each user | Heap | 
| heap_session_sequence_num | the order in which the session occurred for each user | src_heap_sessions_eph | 
| heap_session_start_time | the timestamp of the start of the session | Heap | 
| heap_session_conversion_lookback_14d | subtracting 14 days from the session start time | stg_heap_sessions_inc_fct | 
| heap_session_conversion_lookforward_14d | adding 14 days to the session start time | Heap | 
| heap_session_country | the country tied to the ip where the page was requested from | Heap | 
| heap_session_region | the region tied to the ip where the page was requested from | Heap | 
| heap_session_city | the city tied to the ip where the page was requested from | Heap | 
| heap_session_ip | the ip where the page was requested from | Heap | 
| heap_session_referrer | the url the user was on before landing on your domain | Heap | 
| heap_session_landing_page | the page the user landed on that initiated the session | Heap | 
| heap_session_landing_page_query | the query parameters of the page the user landed on that initiated the session | Heap | 
| heap_session_landing_page_hash | the hash value of the page the user landed on that initiated the session | Heap | 
| heap_session_browser | the browser the user was using when the session was initiated | Heap | 
| heap_session_utm_source | the utm_source pulled from the landing page query parameters | Heap | 
| heap_session_utm_campaign | the utm_campaign pulled from the landing page query parameters | Heap | 
| heap_session_utm_medium | the utm_medium pulled from the landing page query parameters | Heap | 
| heap_session_utm_term | the utm_term pulled from the landing page query parameters | Heap | 
| heap_session_utm_content | the utm_content pulled from the landing page query parameters | Heap | 
| heap_session_defined_channel_attribution | the marketing channel that was attributed with driving the session determined by the logic within the gen_defined_channel_attribution macro | gen_defined_channel_attribution macro | 
| heap_session_is_conversion | flags sessions that contain a conversion event based on the events passed within the conversion_events variable defined in the dbt_project yml file | stg_heap_sessions_inc_fct | 
Heap will automatically generate a user_id for each user tracked by Heap and store that user_id value as a cookie on the users browser/device. Any one user could have > multiple user_ids as the anonymous user_id generated by Heap is unique to each device/browser combination. In order to track user behavior across different device/ > browser combinations, a call to Heap's identify API will need to be made upon the user providing uniquely identifiable information. Once a user has logged into the same > account across different device/browser combinations and a subsequent identify call has been made on each, Heap will then merge the behavior of each unique anonymous > user_id under one newly generated user_id. This user identity resolution process happens downstream of the event stream view within the live feed.
Heap generates a user_migrations table in order to process these identity resolutins. Heap Connect ships with this process built in for BigQuery, Snowflake, and Redshift. However, this means that fact tables will have a mutating user_id value that will need to be updated upon incremental runs. As a catch all, this project is built to update all data for all active users since the last time this job ran.
Each incrementally materialized table will check for active users since the most recent timestamp within each respective incrementally configured table. This check is made within each ephemeral model that has a 1:1 mapping with its respective incrementally configured table. This was done to limit the complexity contained within each model file. It is important to note that the call to the get_active_users macro expects the timestamp column value of the table that is being incrementally materialized.
{{
  config(
    unique_key='replace_me_w_unique_id'
  )
}}
{{ config(
  partition_by={
      "field":"heap_pageview_time",
      "data_type":"timestamp",
      "granularity":"day"
    },
    cluster_by = ['replace_me_w_unique_id'],
    unique_key = 'replace_me_w_unique_id'
  )
}}
{{
  config(
    sort = 'heap_event_time',
    dist = 'heap_event_id',
    unique_key = 'replace_me_w_unique_id'
  )
}}
Heap Docs - Managing Event Syncing
Heap Connect ships with 4 built-in views (users, sessions, all_events, user_migrations). In order to leverage this package to facilitate funnel analysis, you'll need to > sync toggle the event to sync to your warehouse from the Heap Event Definition pane. Once the events have synced, they will appear within the all_events along with a view for each unique synced event.
Now that the events that correspond to your top line funnel have been synced to your warehouse from Heap via Heap Connect, you will want to update the funnel_events > variable found within the dbt_project yaml file. Make sure that the events are entered into the funnel_events list variable in the appropriate order and that the values entered into the funnel_events variable map to the unique event names found within the all_events view.