# How to think like a data scientist

1. **Clean the data**. Do not assume the data is clean.
2. **Normalize**. Let's say you're making a list of popular wedding destinations. You could count the number of people flying in for a wedding, but unless you consider the total number of air travellers coming to that city as well, you'll just get a list of cities with busy airports.
3. **Consider Outliers**. Blindly excluding outliers can be a mistake, and so can including them without question. Outliers can hold important information (e.g. customers who use your product much more often than the average → leads to **qualitative insights**). But they are not good for **building a model**.
4. **Consider Seasonality**
5. **Consider Context**. Do not ignore size when reporting growth. E.g. when you are just starting your product, technically, your dad signing up counts as doubling your user base.
6. **Avoid data vomit**. A dashboard is not of much use if you do not know where to look.
7. **Avoid metrics that cry wolf**. If you set alarm thresholds too low, the alarms become oversensitive and "whiny", and you'll start to ignore them.
8. **Combine data from other sources**. Avoid the "not collected here" syndrome; you gain insights by mashing up data from different sources.
9. **Do not focus on noise**. We as humans see patterns everywhere, because we are "hardwired" like that. Do not chase **vanity metrics**; step back and look at the bigger picture.
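
The normalization point (item 2) can be illustrated with a minimal sketch. The city labels and traveller counts below are invented purely for illustration; the only claim is that ranking by raw count and ranking by share of all travellers can give different answers.

```python
# Hypothetical arrival counts; labels and numbers are made up for illustration.
arrivals = {
    "City A": {"wedding_travellers": 15_000, "all_travellers": 10_000_000},
    "City B": {"wedding_travellers": 12_000, "all_travellers": 4_000_000},
    "City C": {"wedding_travellers": 3_000, "all_travellers": 200_000},
}


def wedding_share(stats):
    """Share of all arriving travellers who come for a wedding."""
    return stats["wedding_travellers"] / stats["all_travellers"]


# Ranking by raw count mostly rewards busy airports.
by_raw_count = sorted(
    arrivals, key=lambda city: arrivals[city]["wedding_travellers"], reverse=True
)

# Ranking by normalized share surfaces actual wedding destinations.
by_share = sorted(
    arrivals, key=lambda city: wedding_share(arrivals[city]), reverse=True
)

print("By raw count:", by_raw_count)
print("By share:    ", by_share)
```

In this made-up data the busiest city leads the raw ranking, while the smallest city leads once the counts are normalized.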

# OMTM - One Metric That Matters
> OMTM = a temporary focus metric. The **single most important metric** you should focus on **right now**, given your stage of growth, your business model, and your immediate goals.

## Example
1. Example for a product company looking into **new feature adoption**
* **Context**: You release a new collaboration feature in your SaaS tool.
* **OMTM**: “% of active users who try the new feature within 30 days of release.”
* **Why?**
  * It directly measures adoption.
  * If adoption is low, it triggers deeper questions: awareness? UX friction? Lack of value?
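
A minimal sketch of how this OMTM could be computed. The release date, user IDs, and first-use dates are hypothetical, and the definition of "active user" is assumed to be decided elsewhere.

```python
from datetime import date, timedelta

RELEASE_DATE = date(2024, 1, 1)   # hypothetical release date
WINDOW = timedelta(days=30)       # "within 30 days of release"

# Hypothetical data: all active users, and each user's first use of the
# new feature (None means the user has not tried it yet).
active_users = {"u1", "u2", "u3", "u4"}
first_use = {
    "u1": date(2024, 1, 3),
    "u2": None,
    "u3": date(2024, 2, 15),  # tried it, but after the 30-day window
    "u4": date(2024, 1, 30),
}


def adoption_rate(active_users, first_use, release, window):
    """Fraction of active users whose first use falls inside the window."""
    if not active_users:
        return 0.0
    deadline = release + window
    adopters = [
        user
        for user in active_users
        if first_use.get(user) is not None
        and release <= first_use[user] <= deadline
    ]
    return len(adopters) / len(active_users)


rate = adoption_rate(active_users, first_use, RELEASE_DATE, WINDOW)
print(f"Adoption within 30 days: {rate:.0%}")
```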

2. Example for a company rolling out features consistently across products
* This is trickier (more meta) but here’s a good OMTM-style approach:
* **Context**: You have multiple product lines (say, a suite like Microsoft 365). You want to measure not just adoption of one feature, but how well your teams are consistently rolling out and integrating features across the suite.
* **OMTM Idea**: “Median time (days) from feature release to 50% adoption across all products.”
* **Why?**
    * Captures both speed of rollout and depth of adoption.
    * Easy to track over time — you want that number to go down.
    * Forces teams to think about cross-product consistency and communication effectiveness.
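
One way this metric might be computed, as a sketch: the product names and adoption curves are hypothetical, and each curve is assumed to hold the cumulative adoption share per day since release.

```python
from statistics import median

# Hypothetical cumulative adoption share per day since release (day 1, day 2, ...).
adoption_curves = {
    "editor": [0.10, 0.30, 0.55, 0.70],
    "chat": [0.05, 0.20, 0.35, 0.50, 0.65],
    "calendar": [0.20, 0.60],
}


def days_to_threshold(curve, threshold=0.5):
    """First day (1-based) on which adoption reaches the threshold, else None."""
    for day, share in enumerate(curve, start=1):
        if share >= threshold:
            return day
    return None


days = [
    d
    for d in (days_to_threshold(curve) for curve in adoption_curves.values())
    if d is not None
]
omtm = median(days)
print(f"Median days to 50% adoption: {omtm}")
```

Note that this sketch drops products that never reach 50%, which makes the median look better than it is; in practice those products probably need to be reported separately.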

**The idea is**: instead of drowning in dashboards, focus your team around one number that drives learning and decision-making.

**In larger companies**, it’s often adapted into **North Star Metrics or Key Results (OKRs)**, because focusing on just one number is sometimes too narrow.

# 5 Stages of Lean Analytics

Think about: What stage are we at?

1. **EMPATHY**: Identify a real problem and a real solution. It addresses the riskiest question: Will anyone care?
   * Your job is to get inside someone else's head.
   * Get out of the building → qualitative feedback.
   * **Outcome**:
     * You have conducted enough qualitative interviews to feel **_confident that you've found a problem worth solving_**.
     * You understand your customers well enough.
     * You believe your **_solution will meet the needs of your customers_**.
2. **STICKINESS**: Will the dogs eat the dog food? Make your mistakes with a small, friendly audience before throwing the masses at it.
   * Focus is squarely on **retention and engagement**. 
   * daily/weekly/monthly active users; segment **metrics by cohort**
   * when do users become inactive; how to reactivate; features engaged users spend time with
   * Remember: the **MVP is a process**, not a product (instead it is a tool for figuring out which product to build). Improve and iterate with focus on your core metrics.
   * **Goal**:
     * Retention
3. **VIRALITY**: Sharing helps grow, but also verifies that what you have made is good.
   * 3 types: inherent, artificial, word of mouth
   * What's your viral coefficient?
4. **REVENUE**: Will people open their pocketbooks? Can you charge them enough to fund your ongoing operation?
   * The core equation for Revenue: money a customer brings in minus the cost of acquiring that customer.
   * You are moving from proving you have the right product to proving you have a real business.
   * **Goal**:
     * Figure out where to focus: more revenue per customer; more customers; more efficiencies; greater frequency etc.
5. **SCALE**: You need channels to amortize the cost of sales and distribution. You need an ecosystem to cross the "hole in the middle" from niche player to big company.
   * How to scale (M. Porter): a) focus on a niche market → **segmentation** strategy, b) focus on being efficient → **cost** strategy, or c) be unique → **differentiation** strategy
   * **Goal**: **THE THREE THREES** - a simple way of focusing on metrics that gives you the ability to change while avoiding **management-by-opinion**.
     * **_3 big, fundamental assumptions_** (e.g. we will make money from parents)
     * **_3 actions to take_** (for each board level assumption take 3 tactical actions, e.g. product enhancements, marketing strategies)
     * **_3 experiments to run_** (for each of the actions, perform 3 tasks, e.g. 3 experiments you are running and how to choose the winner)

# Map of (typical) KPI by Business Goal
## Terminology
* **Measure**: Change we observe
* **Metric**: Measure we track over time
* **KPI**: Important Metrics
* **Analytics**: Measures that computers track (a subset of our metrics, and quite often not helpful)

* **Leading indicator**
  * Leading indicators foretell the likelihood of future success, and the ones you track depend on your goals, business model, and product type.
  * E.g. Session duration, Number of sessions per user, Activation rate
* **Lagging indicator**
  * Lagging indicators provide insight into past outcomes and are typically revenue-related and standard across companies.
  * E.g. Monthly recurring revenue, Average revenue per user/unit (ARPU), Net revenue retention (NRR), Gross revenue retention (GRR)

## ▶︎ Growth & Marketing
| KPI                       | What it Tells You                                | Formula / Measurement                          |
|----------------------------|--------------------------------------------------|-----------------------------------------------|
| Customer Acquisition Cost (CAC) | Efficiency of acquiring new customers.          | Marketing + Sales Spend ÷ New Customers        |
| Conversion Rate            | How well leads/visitors turn into customers.     | Conversions ÷ Visitors (or Leads)              |
| Activation Rate            | How many new users reach first meaningful use.   | Activated Users ÷ New Signups                  |
| Uplift / Incremental Lift  | Impact of campaigns/features (A/B tests).        | ΔConversion (Treatment − Control)              |
| Virality Coefficient / k-factor | Measures how effectively users of a product spread it to new users through referrals or sharing.        | $k = \frac{\text{Total invitations sent}}{\text{Total existing users}} \times \frac{\text{New signups from invites}}{\text{Total invitations sent}} \;\Rightarrow\; k = \frac{\text{New signups from invites}}{\text{Total existing users}}$              |
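
The k-factor formula from the table can be sketched as follows. The function name and input numbers are illustrative assumptions; the two factors mirror the table (invitations per existing user times invite conversion rate).

```python
def viral_coefficient(existing_users, invitations_sent, signups_from_invites):
    """k = (invitations per existing user) x (invite conversion rate)."""
    if existing_users == 0 or invitations_sent == 0:
        return 0.0
    invites_per_user = invitations_sent / existing_users
    invite_conversion = signups_from_invites / invitations_sent
    return invites_per_user * invite_conversion


# Hypothetical numbers: 1,000 users send 5,000 invites, 600 of which convert.
k = viral_coefficient(
    existing_users=1_000, invitations_sent=5_000, signups_from_invites=600
)
print(f"k-factor: {k:.2f}")
```

Because the invitation count cancels out, the result equals new signups from invites divided by existing users. A value below 1 means each user brings in less than one new user on average.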


## ▶︎ Product & Engagement
| KPI                        | What it Tells You                                | Formula / Measurement                          |
|-----------------------------|--------------------------------------------------|-----------------------------------------------|
| DAU / MAU                  | Usage level (daily / monthly activity).          | Unique Active Users per Day/Month              |
| DAU/MAU Ratio (Stickiness) | How often monthly users return daily.            | DAU ÷ MAU                                     |
| Time to Value (TTV)        | How quickly users see product value.             | Time between signup and first value action     |
| Cohort Retention           | Long-term engagement by user cohorts.            | Retention % by months since signup             |
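
A minimal sketch of the DAU/MAU stickiness ratio. The event log is hypothetical, and using the *average* DAU over all calendar days of the month is an assumption; the table only says DAU ÷ MAU, and teams define the numerator differently.

```python
import calendar
from datetime import date

# Hypothetical activity log: one (user_id, day) pair per session.
events = [
    ("u1", date(2024, 3, 1)),
    ("u1", date(2024, 3, 2)),
    ("u2", date(2024, 3, 1)),
    ("u3", date(2024, 3, 15)),
    ("u1", date(2024, 3, 1)),  # second session on the same day, counted once
]


def stickiness(events, year, month):
    """Average DAU over the month divided by MAU for that month."""
    # Deduplicate so each user counts at most once per day.
    user_days = {
        (user, day)
        for user, day in events
        if day.year == year and day.month == month
    }
    if not user_days:
        return 0.0
    mau = len({user for user, _ in user_days})
    days_in_month = calendar.monthrange(year, month)[1]
    avg_dau = len(user_days) / days_in_month
    return avg_dau / mau


ratio = stickiness(events, 2024, 3)
print(f"DAU/MAU: {ratio:.3f}")
```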


## ▶︎ Financial Performance
| KPI                        | What it Tells You                                | Formula / Measurement                          |
|-----------------------------|--------------------------------------------------|-----------------------------------------------|
| Revenue                    | Total income generated.                          | Sum of transactions                           |
| ARPU (Avg. Revenue per User)| Monetization efficiency per user.                | Revenue ÷ Active Users                        |
| MRR / ARR                  | Predictable subscription revenue.                | Monthly / Annual recurring subscription fees   |
| Gross Margin               | Profitability after direct costs.                | (Revenue − COGS) ÷ Revenue                    |
| Contribution Margin        | Unit economics after variable costs.             | (Revenue − Variable Costs) ÷ Revenue          |


## ▶︎ Customer Experience & Loyalty
| KPI                        | What it Tells You                                | Formula / Measurement                          |
|-----------------------------|--------------------------------------------------|-----------------------------------------------|
| Churn Rate (e.g. weekly, monthly, yearly etc)                 | % of customers leaving.                          | $\text{Churn Rate} = \frac{\text{Customers lost during the period}}{\text{Customers at the start of the period}}$            |
| Retention Rate             | % of customers staying.                          | 1 − Churn Rate                                |
| Customer Lifetime Value (CLV)| Long-term value of a customer.                   | ARPU × Avg. Lifetime (months/years)           |
| Net Promoter Score (NPS)   | Loyalty & advocacy (likelihood to recommend).    | %Promoters − %Detractors (survey 0–10 scale)  |
| Customer Satisfaction (CSAT)| Satisfaction with a specific experience (e.g., support call, checkout). | Avg. satisfaction rating ÷ Max rating (survey) |
| Customer Effort Score (CES)| How easy it was for the customer to achieve their goal. | Avg. ease-of-use score from survey (1–5 or 1–7 scale) |
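
The churn, retention, and NPS formulas above can be sketched like this. The customer counts and survey scores are made up; the NPS buckets follow the usual convention of promoters scoring 9 to 10 and detractors scoring 0 to 6.

```python
def churn_rate(customers_at_start, customers_lost):
    """Customers lost during the period / customers at the start of the period."""
    return customers_lost / customers_at_start


def retention_rate(customers_at_start, customers_lost):
    """Retention is the complement of churn."""
    return 1 - churn_rate(customers_at_start, customers_lost)


def net_promoter_score(scores):
    """%Promoters (9-10) minus %Detractors (0-6), on a -100..100 scale."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical inputs.
monthly_churn = churn_rate(customers_at_start=200, customers_lost=10)
monthly_retention = retention_rate(customers_at_start=200, customers_lost=10)
nps = net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 10])

print(f"Churn: {monthly_churn:.1%}, retention: {monthly_retention:.1%}, NPS: {nps:.0f}")
```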

# Business Terms for Product Companies

## Revenue Metrics

| Term                     | Definition                                                                 | Why It Matters in Product Companies                   | Example (LinkedIn / Adobe) |
|--------------------------|-----------------------------------------------------------------------------|-------------------------------------------------------|-----------------------------|
| **Revenue**              | Total money earned from sales in a given period.                           | Shows top-line growth. For SaaS, split into recurring (subscriptions) and non-recurring (ads, services). | Adobe Creative Cloud subscriptions = **$60M revenue** in Q1. |
| **Recurring Revenue (MRR/ARR)** | Predictable income from subscriptions (Monthly/Annual).                       | SaaS lifeblood; tells how stable the business is.     | LinkedIn Premium MRR = **$50 per user × 2M users = $100M/month**. |
| **Deferred Revenue**     | Cash collected for services not yet delivered.                              | Critical in SaaS, as subscriptions are billed upfront but delivered over time. | Adobe bills $600 for a yearly plan upfront → booked as **deferred revenue**, recognized monthly. |
| **ARPU (Avg. Revenue per User)** | Average revenue per active user in a period.                           | Helps compare product lines or geos; used in SaaS and ad models. | LinkedIn: Revenue $1B ÷ 200M users = **$5 ARPU**. |
| **CLV (Customer Lifetime Value)** | Expected total revenue from a customer over their lifetime.              | Guides how much you can spend to acquire users (CAC).  | Adobe: $50/month × 36 months = **$1,800 CLV**. |
| **CAC (Customer Acquisition Cost)** | Cost to acquire one paying customer.                                 | Critical for growth efficiency. CAC < CLV is healthy. | Adobe spends $180M to get 100k customers → CAC = **$1,800 each**. |
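
The illustrative figures from the table can be checked with a short sketch. The function names are assumptions, and the deferred revenue helper assumes straight-line monthly recognition, which is a simplification of real accounting rules.

```python
def customer_lifetime_value(revenue_per_month, lifetime_months):
    """Simple CLV: average monthly revenue times expected lifetime."""
    return revenue_per_month * lifetime_months


def customer_acquisition_cost(spend, new_customers):
    """Acquisition spend divided by customers acquired."""
    return spend / new_customers


def deferred_revenue_balance(upfront_payment, term_months, months_elapsed):
    """Cash collected but not yet recognized, assuming equal monthly recognition."""
    recognized = upfront_payment / term_months * min(months_elapsed, term_months)
    return upfront_payment - recognized


# Figures taken from the table's illustrative examples.
clv = customer_lifetime_value(revenue_per_month=50, lifetime_months=36)
cac = customer_acquisition_cost(spend=180_000_000, new_customers=100_000)
clv_to_cac = clv / cac

# $600 annual plan billed upfront, three months into the term.
deferred = deferred_revenue_balance(600, term_months=12, months_elapsed=3)

print(f"CLV ${clv:,.0f}, CAC ${cac:,.0f}, CLV/CAC {clv_to_cac:.1f}")
print(f"Deferred revenue after 3 months: ${deferred:,.0f}")
```

With the table's own example numbers the CLV/CAC ratio comes out at 1.0, meaning lifetime revenue only just covers the acquisition cost.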

## Profitability Metrics
| Term                     | Definition                                                                 | Why It Matters in Product Companies                   | Example (LinkedIn / Adobe) |
|--------------------------|-----------------------------------------------------------------------------|-------------------------------------------------------|-----------------------------|
| **Gross Profit**         | Revenue minus Cost of Goods Sold (COGS: servers, hosting, delivery costs). | Shows core profitability of product delivery.          | Adobe earns $100, spends $20 on AWS → Gross Profit = **$80**. |
| **Gross Margin**         | Gross Profit ÷ Revenue (percentage).                                      | Key SaaS efficiency metric; high margins (>70%) are expected. | LinkedIn ads generate $200M with $40M infra cost → Margin = **80%**. |
| **Contribution Margin**  | Revenue − variable costs (before fixed costs like R&D); often also expressed as a ratio of revenue. | Useful for unit economics and product profitability.  | Adobe sells 1 license for $50, costs $5 to deliver → CM = **$45** (a 90% contribution margin ratio). |
| **Operating Profit (EBIT)** | Profit after operating expenses (marketing, R&D, salaries) but before taxes/interest. | Shows how efficiently the company runs operations.     | Adobe: Revenue $1B − OPEX $700M → Operating Profit = **$300M**. |
| **EBITDA**               | Earnings before Interest, Taxes, Depreciation, Amortization.               | Strips out non-cash (depreciation/amortization) and financing/tax effects. Used for comparing companies’ operating performance. | Adobe: Operating profit $300M + Depreciation $50M + Amortization $30M → **EBITDA = $380M**. |
| **Net Profit**           | Final profit after taxes, interest, and one-off costs.                      | What’s left for shareholders.                         | LinkedIn: $1B revenue − all costs/taxes = **$150M net profit**. |
| **Free Cash Flow**       | Cash generated after operating expenses and capital expenditures.       | True measure of financial flexibility.                | LinkedIn generates $200M in operating cash flow, invests $50M in infra → FCF = **$150M**. |
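
The arithmetic in the profitability examples can be reproduced with a small sketch. All amounts are in $M and come from the table's illustrative examples; the function names are assumptions, and free cash flow is simplified to operating cash flow minus capital expenditures.

```python
def gross_profit(revenue, cogs):
    """Revenue minus cost of goods sold."""
    return revenue - cogs


def gross_margin(revenue, cogs):
    """Gross profit as a fraction of revenue."""
    return gross_profit(revenue, cogs) / revenue


def ebitda(operating_profit, depreciation, amortization):
    """Add the non-cash charges back to operating profit (EBIT)."""
    return operating_profit + depreciation + amortization


def free_cash_flow(operating_cash_flow, capital_expenditures):
    """Simplified FCF: operating cash flow minus capital expenditures."""
    return operating_cash_flow - capital_expenditures


# Amounts in $M, taken from the table's examples.
margin = gross_margin(revenue=200, cogs=40)
operating_profit = 1_000 - 700
ebitda_value = ebitda(operating_profit, depreciation=50, amortization=30)
fcf = free_cash_flow(operating_cash_flow=200, capital_expenditures=50)

print(f"Gross margin: {margin:.0%}")
print(f"Operating profit: ${operating_profit}M, EBITDA: ${ebitda_value}M")
print(f"Free cash flow: ${fcf}M")
```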