Skip to content

thedatagata/heap_package

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Heap Package Docs

Package Description

The purpose of this DBT project is to help folks with getting started with Heap data synced to a data warehouse via Heap Connect. This project comes with the following baked in:

  1. A heap_sources.yml file which includes documentation, freshness checks, and basic tests for the built-in views that ship with Heap Connect
  2. Logic to handle incremental table transformations of the source data synced to your warehouse via Heap Connect
  3. Macros which can be used to quickly spin up attribution and funnel analysis
  4. An opinionated viewpoint on how to best structure DBT projects in terms of folder structure and naming conventions in order to promote adoption and collaboration

Materialized Table Schemas

stg_heap_events_all_inc_fct

This table contains a row for each event synced to Heap. This table automatically includes pageviews but will also include events toggled to sync via Heap Connect in the Heap UI

Column Name Description Source
heap_user_id Unique ID generated by Heap and stored as a cookie Heap
heap_session_id Unique ID generated by Heap and stored as a cookie Heap
heap_event_name The name of the event which will either be pageview or the category plus name of the event toggled to sync Heap
heap_event_time The timestamp at which the event occurred Heap
heap_session_event_sequence The order in which the event occurred within the session for the user src_heap_events_all_eph
heap_event_funnel_position If the event name matches a string within the funnel_events variable found in the dbt_project yml file, the index value for that string within that list is returned stg_heap_events_all_inc_fct
heap_event_is_conversion Flags whether or not the event name matches a string provided within the conversion events variable stg_heap_events_all_inc_fct

stg_heap_pageview_sessions_inc_fct

This table contains a row for each pageview captured by Heap with some custom sessionization baked in to make joining Heap Connect data with other sources easier

Column Name Description Source
heap_pageview_id The unique id generated for each pageview Heap
heap_user_id The unique id generated for each user Heap
heap_session_id The unique id generated for each session Heap
heap_pageview_time The timestamp of each pageview Heap
heap_pageview_domain The domain component of the url that points to the page that was viewed Heap
heap_pageview_path The path component of the url that points to the page that was viewed Heap
heap_pageview_query The query component of the url that points to the page that was viewed Heap
heap_pageview_hash The hash component of the url that points to the page that was viewed Heap
heap_pageview_title The value of the html title tag for the page that was viewed Heap
heap_pageview_previous_page The url of the page the user was previously on Heap
heap_pageview_session_id The unique id of the user concatenated with the index number of the session generated from custom sessionization logic stg_heap_pageview_sessions_inc_fct
heap_pageview_session_index A sequential number that flags the beginning of a new session based on the previous pageview being more than 30 minutes away stg_heap_pageview_sessions_inc_fct
heap_pageview_session_sequence_number The order in which each pageview occurred within a session defined by custom sessionization logic stg_heap_pageview_sessions_inc_fct
heap_pageview_session_start_time The time of the first pageview for each user and session index number stg_heap_pageview_sessions_inc_fct
heap_pageview_session_end_time The time of the last pageview for each user and session index number stg_heap_pageview_sessions_inc_fct
heap_pageview_session_first_pageview_id The heap_pageview_id of the first pageview for each user and session index number stg_heap_pageview_sessions_inc_fct
heap_pageview_session_last_pageview_id The heap_pageview_id of the last pageview for each user and session index number stg_heap_pageview_sessions_inc_fct
heap_pageview_session_start_epoch The epoch time of the first pageview for each user and session index number stg_heap_pageview_sessions_inc_fct
heap_pageview_session_end_epoch The epoch time of the last pageview for each user and session index number stg_heap_pageview_sessions_inc_fct

stg_heap_sessions_inc_fct

This table contains a row for each session captured by Heap. Sessions are generated from a client side cookie. Server Side events will need to be sessionized with the stg_heap_pageview_sessions_inc_fct model

Column Name Description Source
heap_session_id the unique id generated for each session Heap
heap_user_id the unique id generated for each user Heap
heap_session_sequence_num the order in which the session occurred for each user src_heap_sessions_eph
heap_session_start_time the timestamp of the start of the session Heap
heap_session_conversion_lookback_14d subtracting 14 days from the session start time stg_heap_sessions_inc_fct
heap_session_conversion_lookforward_14d adding 14 days to the session start time Heap
heap_session_country the country tied to the ip where the page was requested from Heap
heap_session_region the region tied to the ip where the page was requested from Heap
heap_session_city the city tied to the ip where the page was requested from Heap
heap_session_ip the ip where the page was requested from Heap
heap_session_referrer the url the user was on before landing on your domain Heap
heap_session_landing_page the page the user landed on that initiated the session Heap
heap_session_landing_page_query the query parameters of the page the user landed on that initiated the session Heap
heap_session_landing_page_hash the hash value of the page the user landed on that initiated the session Heap
heap_session_browser the browser the user was using when the session was initiated Heap
heap_session_utm_source the utm_source pulled from the landing page query parameters Heap
heap_session_utm_campaign the utm_campaign pulled from the landing page query parameters Heap
heap_session_utm_medium the utm_medium pulled from the landing page query parameters Heap
heap_session_utm_term the utm_term pulled from the landing page query parameters Heap
heap_session_utm_content the utm_content pulled from the landing page query parameters Heap
heap_session_defined_channel_attribution the marketing channel that was attributed with driving the session determined by the logic within the gen_defined_channel_attribution macro gen_defined_channel_attribution macro
heap_session_is_conversion flags sessions that contain a conversion event based on the events passed within the conversion_events variable defined in the dbt_project yml file stg_heap_sessions_inc_fct

Incremental Logic

Heap will automatically generate a user_id for each user tracked by Heap and store that user_id value as a cookie on the users browser/device. Any one user could have > multiple user_ids as the anonymous user_id generated by Heap is unique to each device/browser combination. In order to track user behavior across different device/ > browser combinations, a call to Heap's identify API will need to be made upon the user providing uniquely identifiable information. Once a user has logged into the same > account across different device/browser combinations and a subsequent identify call has been made on each, Heap will then merge the behavior of each unique anonymous > user_id under one newly generated user_id. This user identity resolution process happens downstream of the event stream view within the live feed.


Heap generates a user_migrations table in order to process these identity resolutins. Heap Connect ships with this process built in for BigQuery, Snowflake, and Redshift. However, this means that fact tables will have a mutating user_id value that will need to be updated upon incremental runs. As a catch all, this project is built to update all data for all active users since the last time this job ran.


Each incrementally materialized table will check for active users since the most recent timestamp within each respective incrementally configured table. This check is made within each ephemeral model that has a 1:1 mapping with its respective incrementally configured table. This was done to limit the complexity contained within each model file. It is important to note that the call to the get_active_users macro expects the timestamp column value of the table that is being incrementally materialized.

Warehouse Specific Configs

Snowflake

{{
  config(
    unique_key='replace_me_w_unique_id'
  )
}}

Big Query

{{ config(
  partition_by={
      "field":"heap_pageview_time",
      "data_type":"timestamp",
      "granularity":"day"
    },
    cluster_by = ['replace_me_w_unique_id'],
    unique_key = 'replace_me_w_unique_id'
  )
}}

Redshift

{{
  config(
    sort = 'heap_event_time',
    dist = 'heap_event_id',
    unique_key = 'replace_me_w_unique_id'
  )
}}

Funnel Analysis

Heap Docs - Managing Event Syncing

Heap Connect ships with 4 built-in views (users, sessions, all_events, user_migrations). In order to leverage this package to facilitate funnel analysis, you'll need to > sync toggle the event to sync to your warehouse from the Heap Event Definition pane. Once the events have synced, they will appear within the all_events along with a view for each unique synced event.


Now that the events that correspond to your top line funnel have been synced to your warehouse from Heap via Heap Connect, you will want to update the funnel_events > variable found within the dbt_project yaml file. Make sure that the events are entered into the funnel_events list variable in the appropriate order and that the values entered into the funnel_events variable map to the unique event names found within the all_events view.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published