In [1]:
%%capture
!pip install -r ../requirements.txt

In [2]:
%load_ext sql
%sql postgresql://corise:corise@localhost:5432/dbt
%config SqlMagic.displaylimit=10
%config SqlMagic.displaycon = False

### Modeling challenge
Let’s say that the Director of Product at Greenery comes to us (the head Analytics Engineer) and asks two questions:

How are our users moving through the product funnel?

Which steps in the funnel have the largest drop-off?

The product funnel is defined with three levels for our dataset:
- Sessions with any event of type page_view / add_to_cart / checkout
- Sessions with any event of type add_to_cart / checkout
- Sessions with any event of type checkout
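
The three levels above can be sketched on toy data before touching the warehouse. This is a minimal illustration, not the project's model: it assumes an in-memory SQLite table named `events` with hypothetical session IDs standing in for `staging.stg_events`, and the helper names `funnel_counts` and `drop_offs` are made up for the example.

```python
import sqlite3

# Hypothetical toy events; the real data lives in staging.stg_events.
EVENTS = [
    ("s1", "page_view"),
    ("s1", "add_to_cart"),
    ("s1", "checkout"),
    ("s2", "page_view"),
    ("s2", "add_to_cart"),
    ("s3", "page_view"),
    ("s4", "page_view"),
]

# One row per session with a 0/1 flag per funnel level, then summed.
FUNNEL_SQL = """
with session_flags as (
  select
    session_id,
    max(case when event_type in ('page_view', 'add_to_cart', 'checkout') then 1 else 0 end) as level_1,
    max(case when event_type in ('add_to_cart', 'checkout') then 1 else 0 end) as level_2,
    max(case when event_type = 'checkout' then 1 else 0 end) as level_3
  from events
  group by session_id
)
select sum(level_1), sum(level_2), sum(level_3)
from session_flags
"""


def funnel_counts(events):
    """Return the number of sessions reaching each of the three funnel levels."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (session_id text, event_type text)")
    conn.executemany("insert into events values (?, ?)", events)
    counts = conn.execute(FUNNEL_SQL).fetchone()
    conn.close()
    return list(counts)


def drop_offs(counts):
    """Share of sessions lost between each level and the next (None if a level is empty)."""
    return [
        round(1 - nxt / prev, 2) if prev else None
        for prev, nxt in zip(counts, counts[1:])
    ]


counts = funnel_counts(EVENTS)
print(counts)             # [4, 2, 1]
print(drop_offs(counts))  # [0.5, 0.5]
```

Flagging each session once and summing the flags keeps every level tied directly to the definition, so a session with an `add_to_cart` but no recorded `page_view` still counts toward the first level.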

In [3]:
%%sql

with visitors as (
  select
    session_id, -- one row per session; used here as a proxy for a user
    min(created_at) as min_time -- earliest event in each session
  from staging.stg_events
  group by 1
),

page_views as (
  select
    distinct e.session_id
  from visitors v -- restricts to the sessions defined above
  inner join staging.stg_events e on e.session_id = v.session_id
  where e.event_type = 'page_view'
),

add_to_cart as (
  select
    distinct e.session_id
  from page_views s -- restricts to sessions that had a page view
  inner join staging.stg_events e on e.session_id = s.session_id
  where e.event_type = 'add_to_cart'
),

checkouts as (
  select
    distinct e.session_id
  from add_to_cart a -- restricts to sessions that added to cart
  inner join staging.stg_events e on e.session_id = a.session_id
  where e.event_type = 'checkout'
),


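steps as (
  -- step_order makes the funnel order explicit for the window function below
  select 1 as step_order, 'Visits' as step, count(*) as session_count from visitors
  union all
  select 2, 'Page Views', count(*) from page_views
  union all
  select 3, 'Add to Carts', count(*) from add_to_cart
  union all
  select 4, 'Checkouts', count(*) from checkouts
)

select
  step,
  session_count,
  lag(session_count, 1) over (order by step_order) as previous_count,
  round(1.0 - session_count::numeric / lag(session_count, 1) over (order by step_order), 2) as drop_off
from steps
order by step_order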

(psycopg2.errors.UndefinedTable) relation "staging.stg_events" does not exist
LINE 5:   from staging.stg_events
               ^


#### If your organization is thinking about using dbt, how would you pitch the value of dbt/analytics engineering to a decision maker at your organization?
- I would pitch the ability to version-control data modeling logic and to present it as a visual DAG, so other analysts can see where data comes from.

#### If you are thinking about moving to analytics engineering, what skills have you picked up that give you the most confidence in pursuing this next step?
- Learning new ways to model events/web data

#### Setting up for production / scheduled dbt run of your project

And finally, before you fly free into the dbt night, we will take a step back and reflect: after learning about the various options for dbt deployment and seeing your final dbt project, how would you go about setting up a production/scheduled dbt run of your project in an ideal state? You don’t have to actually set anything up - just jot down what you would do and why and post in a README file.

Hints: What steps would you have? Which orchestration tool(s) would you be interested in using? What schedule would you run your project on? Which metadata would you be interested in using? How/why would you use the specific metadata?

- I would look into Dagster, Prefect, and Airflow to set up the orchestration of dbt. Dagster appears to have an impressive dbt integration; however, only self-hosted options seem to exist currently.
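
Whichever orchestrator is chosen, the scheduled job boils down to running dbt commands in order and stopping at the first failure. The sketch below is a plain-Python stand-in for that logic, not a Dagster, Prefect, or Airflow integration. The step list is an assumption (it presumes the project uses packages, seeds, and snapshots), and `shell_runner` and `run_pipeline` are hypothetical helper names.

```python
import subprocess

# Assumed step order for a scheduled run; trim steps the project does not use.
DBT_STEPS = [
    ["dbt", "deps"],
    ["dbt", "seed"],
    ["dbt", "snapshot"],
    ["dbt", "run"],
    ["dbt", "test"],
]


def shell_runner(command):
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(command).returncode


def run_pipeline(steps, runner=shell_runner):
    """Run steps in order and stop at the first non-zero exit code.

    Returns (ok, completed), where completed lists the steps that succeeded.
    """
    completed = []
    for command in steps:
        if runner(command) != 0:
            return False, completed
        completed.append(" ".join(command))
    return True, completed


# Dry run with a stand-in runner that always succeeds, so no dbt install is needed.
ok, done = run_pipeline(DBT_STEPS, runner=lambda command: 0)
print(ok, done)
```

Calling `run_pipeline(DBT_STEPS)` without the stand-in runner would invoke the real dbt CLI from the project directory. In an orchestrator, each list entry would typically become its own task so that failures and retries are visible per step.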

### Week 4 DAG:

![Week4Dag](images/dbt-dag-week4.png)