Skip to content
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
356 lines (290 sloc) 9.07 KB
title: 'Funnel Charts in R with funneljoin'
author: ~
date: '2019-11-08'
slug: funnel-charts-funneljoin
categories: []
tags: []
draft: false
toc: true
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE,
results='show', cache=FALSE, autodep=FALSE)
## funneljoin
[Emily Robinson]( has an awesome new package
`funneljoin` has an intuitive interface to create and analyze funnels.
Be sure to check out her
and her slides:
```{r echo=FALSE}
blogdown::shortcode('tweet', '1193202114446516225')
I'm so excited about Emily's package,
because data analysts get requests for funnel charts all the time.
With `funneljoin`, you don't have to start from scratch:
you can use your time and energy to understand the data,
rather than creating the funnel.
## Customer usage funnels
In the funneljoin vignette, Emily uses ads as an example:
she wants to find the proportion of users who buy something
after they click an ad.
I don't work in ads, but customer usage funnels are also important for product design.
One theory is that customers purchase a product to do a
Suppose we see some customers who purchase the product,
but don't use it.
We need to ask ourselves:
- is the platform making it difficult for them to do their job?
- do we understand the users' job to be done?
### Usage example
Let's take a look at some simulated^[
If you want to see how I simulated this data,
checkout out the source for this blog post here:
] customer usage logs.
```{r, include=FALSE}
# starting to simulate customer log data
week_to_days <- list(
"one" = 0,
"two" = 7,
"three" = 14,
"four" = 21
```{r, include=FALSE}
user_weeks <- tibble(
user = rep(1:100, 3 * 4),
weeks = map(names(week_to_days), rep, 100 * 3) %>%
days_after_start = map(flatten_dbl(week_to_days), ~rep(., 100 * 3)) %>%
```{r, include=FALSE}
upload_weekly_props <- list(
"one" = 1,
"two" = 0.8,
"three" = 0.6,
"four" = 0.4
step_weekly_props <- list(
"one" = 0.4,
"two" = 0.6,
"three" = 0.8,
"four" = 1
wide_funnel <- user_weeks %>%
mutate(upload = map_dbl(weeks, ~rbinom(1, 1, upload_weekly_props[[.]]))) %>%
mutate(submit =
map_dbl(weeks, ~rbinom(1, 1, step_weekly_props[[.]])) * upload
) %>%
mutate(print =
map_dbl(weeks, ~rbinom(1, 1, step_weekly_props[[.]])) * submit
```{r, include=FALSE}
event_logs <- wide_funnel %>%
pivot_longer(upload:print, names_to = "event") %>%
filter(value != 0) %>%
```{r, include=FALSE}
# end of customer log simulation
logs <- event_logs %>%
mutate(skew = rbinom(n(), 1, 0.2)) %>%
mutate(date = case_when(
event == "upload" ~ as.Date("2020-01-06") + skew + days_after_start,
event == "submit" ~ as.Date("2020-01-08") + skew + days_after_start,
event == "print" ~ as.Date("2020-01-10") + skew + days_after_start,
)) %>%
arrange(date) %>%
select(date, user, event)
saveRDS(logs, "funneljoin-logs.RDS")
This dataset records each time a user
uploads, submits, or prints
using the website.
Here's the workflow. Customers
1. upload data
2. submit data to be analyzed
3. print off reports based on the analysis
Additionally, here's some assumptions about our customer.
1. They purchase our product to print reports.
2. Reports are only useful for data uploaded in the same week^[
For example, football coaches need scouting reports to prepare for their
next opponent. Next week is too late, because they've already played them
and they have to prepare for the next game.
Let's use `funneljoin` to analyze our data set and see how many
customers are successfully printing reports on data they've uploaded.
I'll make the funnel a function,
since I'll be reusing the same funnel through-out the blog post.
customer_funnel <- function(.data, type, max_gap = NULL) {
.data %>%
moment_type = "upload",
moment = "event",
tstamp = "date",
user = "user"
) %>%
moment_types = c("submit", "print"),
type = type,
gap_col = TRUE,
max_gap = max_gap
) %>%
mutate(days_to_submit = .gap.x / 60 / 60 / 24) %>%
mutate(days_to_print = .gap.y / 60 / 60 / 24)
logs %>%
customer_funnel("first-firstafter") %>%
At some point,
almost all our our customers are submitting and printing uploaded data.
But are they printing reports in time to be useful
(before the end of the week)?
logs %>%
customer_funnel("first-firstafter") %>%
avg_days_to_submit = mean(days_to_submit, na.rm = TRUE),
avg_days_to_print = mean(days_to_print, na.rm = TRUE),
avg_time_to_complete_funnel = mean(
days_to_submit + days_to_print,
na.rm = TRUE
Something is definitely wrong.
On average,
customers are printing 9 days after uploading.
This gap is too big: the data is out-of-date after 5-6 days and
is no longer useful. What's going on?
Are we using the right [funnel type](
### Funnel types
`first-firstafter` finds the first upload, and then the first submit
(then print) afterwards.
But this isn't a safe assumption. Our customers are uploading
throughout the month.
```{r, fig.height=2}
logs %>%
filter(event == "upload") %>%
mutate(week = floor_date(date, "week")) %>%
group_by(week) %>%
summarise(n = n()) %>%
ggplot(aes(week, n)) +
We care about each upload, not only the first one.
We want to know if they are printing reports after any upload.
Enter `any-firstafter`. This will find the first
submit/print event after any user upload.
logs %>%
customer_funnel("any-firstafter") %>%
avg_days_to_submit = mean(days_to_submit, na.rm = TRUE),
avg_days_to_print = mean(days_to_print, na.rm = TRUE),
avg_time_to_complete_funnel = mean(
days_to_submit + days_to_print,
na.rm = TRUE
This looks better better for our customer.
However they are still taking 6 days on average to complete the funnel. That's right on the edge of being useless.
### Gaps
One of our assumptions is that reports need to be printed in the
same week when the data was uploaded to be useful^[
Technically our assumption was slightly different.
Printing 5 days after the upload may still be too late.
The real constraint is that the print must occur in the same
calender week as the upload.
This adds some complications,
so in this blog post I've
stuck with gaps. But check out my [gap vs. deadline](
blog post for a full analysis.
Let's add this assumption to our funnel, using the `max_gap` parameter.
weekly_customer_funnel <- logs %>%
max_gap = as.difftime(5, units = "days")
weekly_customer_funnel %>%
avg_days_to_submit = mean(days_to_submit, na.rm = TRUE),
avg_days_to_print = mean(days_to_print, na.rm = TRUE),
avg_time_to_complete_funnel = mean(
days_to_submit + days_to_print,
na.rm = TRUE
So for customers who print reports at most 5 days after uploading,
on average
they print reports about 4 days after uploading.
This turnaround time is quick enough to be useful.
But this is only looking at users who print off reports. How many customers
never print reports, even though they upload data?
### Conversions
Let's look at how many uploads lead to a print.
weekly_customer_funnel %>%
Around 30% of data uploaded isn't printed off as a report.
What's going on?
Let's look at the trend over time.
This is a little messy,
but the goal is to calculate the conversation rate by week of upload.
weekly_customer_funnel %>%
mutate(upload_week = floor_date(date_upload, "week")) %>%
nest(funnel = -upload_week) %>%
mutate(funnel = map(funnel, summarize_conversions, date_print)) %>%
unnest(funnel) %>%
Early in the month,
our users print reports much less often.
But by the end of the month,
reports always follow within 5 days of an upload.
Maybe there isn't a problem with the platform,
but there is additional seasonality we need to understand.
## Conclusion
I absolutely love this package.
In my experience, business partners love funnel plots and dashboards.
The fact that this package provides a systematic interface
to cleaning and analyzing this type of data will save me time and brainpower.
Definitely check it out the next time you need to create a funnel!
You can’t perform that action at this time.