<a href="https://colab.research.google.com/github/theventurecity/data-toolkit/blob/master/Understanding_Event_Logs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://theventure.city"><img src="https://github.com/theventurecity/data-toolkit/blob/master/img/tvc_horiz_junglegreen.png?raw=true" alt='TheVentureCity' style="width: 400px;"></a>

# Understanding Event Logs

User-level tracking allows for an array of analyses that help us understand engagement, retention, and growth accounting. In fact, just a few fields—user id(s), timestamp, event name, transaction amount—can be used to derive a variety of useful insights. And you don't need expensive tools to do it. TheVentureCity's [Data Pipeline Toolkit for Early-Stage Startups](https://github.com/theventurecity/data-toolkit) contains a comprehensive discussion about what to do once you have a good event log.

## Definitions

**“Events”** are the things that users do in your product, whether it's an app, a website, a marketplace, a store, a platform, a game, or any other configuration. Some events such as purchases, transactions, registrations, referrals, posts, shares, or likes tie directly to the health of your product and should thus be considered **“key events.”**

An **“event log”** is a record of every date/time an event occurs and which user triggered that event. It captures the “who, what, when, where, and how much” of the activity in your product.

## Viewing a simple event log in Pandas

If an event log is tracking one type of key event only, it can contain as few as two columns:

* Unique user identifier (who)
* Event date or date/time (when)

Python's Pandas package is ideal for processing event logs. Once we load the data into memory as a Pandas DataFrame, Pandas offers a multitude of functions to help us filter, aggregate, and transform the data into DataFrames that yield insights. Here's what a simple event log looks like if we load it into a Pandas DataFrame:

```python
# Import Pandas
import pandas as pd

# Edit this filename to your local filename.csv if using a local CSV file
filename = 'https://raw.githubusercontent.com/theventurecity/analytics/master/data/SmileCo_transactions.csv'

t = pd.read_csv(filename)
t.tail(10)
```

| Unnamed: 0 | user_id | activity_date |
|---|---|---|
| 1209696 | 438E84E2-CDD3-4311-BC67-8B726149CFCB | 2024-04-29 02:31:37.000 |
| 1209697 | 8CC36A55-4B70-48D6-A67C-16C290D62988 | 2024-04-29 02:32:50.000 |
| 1209698 | 966294A9-F98E-491F-A5F2-2B07B07B6ED7 | 2024-04-29 02:33:11.000 |
| 1209699 | 8130537F-9317-48E5-BA62-19766B6A5032 | 2024-04-29 02:33:50.000 |
| 1209700 | 3AFC060B-B90A-4E5F-B3DF-0FEACA0B0252 | 2024-04-29 02:34:37.000 |
| 1209701 | FFA89731-278F-48A0-8433-231E7FD7B2C4 | 2024-04-29 02:34:59.000 |
| 1209702 | 7983860B-8D92-4DC3-ADC8-8AACD3A110B4 | 2024-04-29 02:35:03.000 |
| 1209703 | ffffffff-d707-9c07-0000-000000000000 | 2024-04-29 02:35:04.000 |
| 1209704 | 16A7BE74-F509-4AB8-B043-11533D8F3B5E | 2024-04-29 02:36:33.000 |
| 1209705 | 16A7BE74-F509-4AB8-B043-11533D8F3B5E | 2024-04-29 02:36:35.000 |
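
Even a two-column log like this one supports useful aggregations. As a minimal sketch, the snippet below counts distinct active users per day. The miniature `events` DataFrame and its user ids are made up for illustration; only the column names `user_id` and `activity_date` come from the log shown above.

```python
import pandas as pd

# Hypothetical miniature event log with the same two columns as the example above
events = pd.DataFrame({
    "user_id": ["A", "B", "A", "C", "C"],
    "activity_date": [
        "2024-04-28 09:15:00.000",
        "2024-04-28 10:02:11.000",
        "2024-04-28 17:40:03.000",
        "2024-04-29 02:31:37.000",
        "2024-04-29 02:36:35.000",
    ],
})

# read_csv leaves timestamps as strings by default; convert them to datetimes
events["activity_date"] = pd.to_datetime(events["activity_date"])

# Truncate each timestamp to midnight, then count distinct users per day
dau = (
    events
    .groupby(events["activity_date"].dt.normalize())["user_id"]
    .nunique()
    .rename("dau")
)
print(dau)
```

Counting with `nunique` rather than `count` matters here: a user who triggers several events on the same day should still count as one active user.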


## Commonly tracked event attributes

More often, an event log contains additional columns to flesh out what happens each time an event is triggered:

* Event or product type (what)
* Transaction amount and/or fee (how much)
* Geographic location (where)
* Marketplace seller and buyer identifiers (who)
* Marketplace listing identifier (what)

It is also common to augment an event log with information related to the users, such as:

* Marketing acquisition channel (e.g., paid vs. organic)
* Customer segment (e.g., B2C vs. B2B)

The more additional columns you have in your event log, the more flexibility you have to perform segmentation analysis to understand your business in more detail.
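
One way to add user-level columns is to join the event log to a table of user attributes. The sketch below assumes a hypothetical `users` table keyed by `client_id`; the `channel` column and all of the values are invented for illustration.

```python
import pandas as pd

# Hypothetical event log: one row per transaction
events = pd.DataFrame({
    "client_id": ["27902A", "34181A", "27902A", "99999X"],
    "date": ["2024-04-29", "2024-04-30", "2024-04-30", "2024-04-30"],
    "value_usd": [8.75, 18.97, 12.00, 5.00],
})

# Hypothetical user table: one row per client
users = pd.DataFrame({
    "client_id": ["27902A", "34181A"],
    "segment": ["Enterprise", "SMB"],
    "channel": ["paid", "organic"],
})

# A left join keeps every event, even when the user has no attributes on file.
# validate="many_to_one" raises if the user table contains duplicate client_ids,
# which would otherwise silently duplicate events.
enriched = events.merge(users, on="client_id", how="left", validate="many_to_one")

# Events from unknown users come back with missing attributes; label them explicitly
enriched[["segment", "channel"]] = enriched[["segment", "channel"]].fillna("unknown")
print(enriched)
```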

The example event log below shows a unique user identifier ("client_id"), event date ("date"), transaction amount ("value_usd"), and customer segment ("segment").

```python
# Edit this filename to your local filename.csv if using a local CSV file
filename = 'https://raw.githubusercontent.com/theventurecity/analytics/master/data/ServBiz_transactions.csv'

t = pd.read_csv(filename)
t.tail(10)
```

| Unnamed: 0 | client_id | date | value_usd | segment |
|---|---|---|---|---|
| 420781 | 27902A | 2024-04-30 | 8.75 | Enterprise |
| 420782 | 34181A | 2024-04-30 | 18.97 | SMB |
| 420783 | 30168A | 2024-04-30 | 17.73 | SMB |
| 420784 | 30844A | 2024-04-30 | 19.98 | SMB |
| 420785 | 35815A | 2024-04-30 | 17.98 | SMB |
| 420786 | 16958A | 2024-04-30 | 17.45 | SMB |
| 420787 | 13090A | 2024-04-30 | 13.48 | SMB |
| 420788 | 19162A | 2024-04-30 | 13.64 | Enterprise |
| 420789 | 28409A | 2024-04-30 | 14.72 | SMB |
| 420790 | 12080A | 2024-04-30 | 18.32 | SMB |
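
With a segment column in place, a basic segmentation analysis is a single `groupby`. The following is a sketch on a hand-made sample that borrows the column names from the log above; the rows are illustrative and are not taken from the real file.

```python
import pandas as pd

# Hypothetical sample using the same columns as the ServBiz log
tx = pd.DataFrame({
    "client_id": ["27902A", "34181A", "30168A", "19162A", "34181A"],
    "date": ["2024-04-30"] * 5,
    "value_usd": [8.75, 18.97, 17.73, 13.64, 10.00],
    "segment": ["Enterprise", "SMB", "SMB", "Enterprise", "SMB"],
})

# Per-segment counts of distinct clients, transactions, and total revenue
summary = tx.groupby("segment").agg(
    active_clients=("client_id", "nunique"),
    transactions=("value_usd", "count"),
    revenue_usd=("value_usd", "sum"),
)
summary["avg_transaction_usd"] = summary["revenue_usd"] / summary["transactions"]
print(summary.round(2))
```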


## Event log sources

Event logs can come from a variety of different sources:

* Most e-commerce shopping carts let you query the transactions table or export it to a CSV file
* Many apps track key events as a matter of course, writing them to a production database that can then be replicated and queried for analytics
* Stripe transaction logs can be downloaded from the Stripe site; find out more about how to do so [here](https://stripe.com/docs/reporting)
* In addition to enabling analytics within their customer-facing UIs, freemium event tracking services like Mixpanel and Amplitude allow you to extract raw events through their APIs. For scripts on how to do this, visit our [data engineering repository](https://site)
* Segment, a paid service, allows you to write events to any endpoint of your choosing, including cloud data storage for subsequent analysis


## Things to watch out for

* **Rogue events** -- At TheVentureCity, we have encountered some data sets that contained events that were not indicative of actual usage of a product. For example, Mixpanel records emails sent to users--a useful feature for sure. But in those cases, we choose not to count receiving an email--or even opening it--as meaningful usage of the product
* **Payments vs. triggered events** -- It is important to understand what kind of data you are looking at and how it maps to the nature of the product you are analyzing. Sales transactions capture the bottom of the funnel, but there may also be up-funnel events that indicate that users are engaging with and receiving value from the product. Often, mixing multiple event logs can help provide a fuller picture of what is happening in your product
* **Monthly vs. annual payments** -- We have seen SaaS payment transaction logs that mix annual subscription renewals with monthly renewals. This makes it tough to calculate monthly churn; just because a customer doesn't show up in the transaction log for a particular month does not mean they have quit the product. Therefore, in this situation, you may want to standardize all of the annual payments to be spread out as monthly payments. Or, you may want to use user-triggered events to indicate active usage rather than a stream of payment events

Oftentimes, it makes sense to perform some **pre-processing** to clean up your event data prior to feeding it into a data analysis pipeline. This could include filtering out extraneous event types; specifying a date range; or spreading annual payments to a monthly cadence.
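
The three pre-processing steps just mentioned can be sketched roughly as follows. The event names (`email_sent`, `annual_renewal`), the column names, and the data are all hypothetical, and spreading a payment evenly over twelve monthly rows is only one possible convention.

```python
import pandas as pd

# Hypothetical raw log mixing product events, email events, and payments
raw = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u2"],
    "event": ["purchase", "email_sent", "purchase", "annual_renewal", "purchase"],
    "date": ["2023-12-15", "2024-01-03", "2024-01-10", "2024-01-20", "2024-02-05"],
    "value_usd": [20.0, 0.0, 35.0, 120.0, 35.0],
})
raw["date"] = pd.to_datetime(raw["date"])

# 1. Drop event types that do not reflect real product usage
clean = raw[~raw["event"].isin(["email_sent"])]

# 2. Keep only the date range under analysis (both ends inclusive)
start, end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-12-31")
clean = clean[clean["date"].between(start, end)]

# 3. Spread each annual payment over 12 monthly rows
annual = clean[clean["event"] == "annual_renewal"]
monthly = clean[clean["event"] != "annual_renewal"]

spread = annual.loc[annual.index.repeat(12)].copy()
month_offset = spread.groupby(level=0).cumcount()  # 0..11 within each payment
spread["date"] = [
    d + pd.DateOffset(months=int(k)) for d, k in zip(spread["date"], month_offset)
]
spread["value_usd"] = spread["value_usd"] / 12
spread["event"] = "annual_renewal_monthly"

processed = (
    pd.concat([monthly, spread], ignore_index=True)
    .sort_values("date")
    .reset_index(drop=True)
)
print(processed)
```

Note that the spread rows can extend past the original date filter when a renewal falls late in the window, so you may want to re-apply the date range afterwards, depending on the analysis.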

## A note on Google Analytics
You might be asking: "Why go to the trouble of analyzing a raw event log when I can just use Google Analytics (GA)?" The answer is that a simple event log gives insights that GA can't.

For sure, GA is an amazing tool. With relatively little instrumentation effort, you can see how many people are coming to your site or app, where they come from, how long they stay, and which pages they visit. You can use it to understand funnel conversion, establish customer segments, measure acquisition channels, track key events, and establish goal conversions. Plus it's free! No wonder it has 83% market share. Most startups should begin by instrumenting with GA as an easy way to track usage and avoid “flying blind.”

**BUT**, GA does not allow for a comprehensive understanding of your startup's business. That's because the free version of GA does not let you analyze your events, conversions, and transactions at a user level (the paid version, Google Analytics 360, does, but it costs $150K and up).

## Next Step: Transform from Raw Event Log to "DAU Decorated"
To see what you can do with your event log, visit [Create the DAU Decorated Data Set](https://colab.research.google.com/drive/12uehG2EcIqxcTazKs-pNQRTQSckllOmE).