### Users Table
- `id`: Unique identifier for each user (numeric)
- `created_at`: User creation timestamp (ISO date string)
- `attribution_source`: User acquisition source (tiktok, instagram, or organic)
- `country`: User's country (US, TR, or NL)
- `name`: User's name

### User Events Table
- `id`: Unique event identifier (numeric)
- `created_at`: Event timestamp (ISO date string)
- `user_id`: Reference to users table (numeric)
- `event_name`: Type of event (app_install, trial_started, trial_cancelled, subscription_started, subscription_renewed, subscription_cancelled)
- `amount_usd`: Transaction amount in USD (numeric)
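
To make the schema concrete, here is a minimal sketch that rebuilds both tables in an in-memory SQLite database. The column types (`INTEGER`, `TEXT`, `REAL`), the foreign key, and the sample rows are assumptions inferred from the descriptions above; they are not read from the actual `papcorns.sqlite` file.

```python
import sqlite3

# In-memory database, so the real papcorns.sqlite file is never touched.
demo = sqlite3.connect(":memory:")

# Assumed column types; the real file may declare them differently.
demo.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        attribution_source TEXT,
        country TEXT,
        name TEXT
    );
    CREATE TABLE user_events (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        user_id INTEGER REFERENCES users(id),
        event_name TEXT,
        amount_usd REAL
    );
    """
)

# Illustrative rows only: one user and three events without a payment.
demo.execute(
    "INSERT INTO users VALUES (1, '2024-05-07T00:00:00', 'instagram', 'US', 'Eve Brown')"
)
demo.executemany(
    "INSERT INTO user_events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-05-07T00:00:00", 1, "app_install", None),
        (2, "2024-05-12T00:00:00", 1, "trial_started", None),
        (3, "2024-05-24T00:00:00", 1, "trial_cancelled", None),
    ],
)

# user_events.user_id links each event back to its user.
rows = demo.execute(
    """
    SELECT u.name, e.event_name
    FROM user_events AS e
    JOIN users AS u ON u.id = e.user_id
    ORDER BY e.id
    """
).fetchall()
print(rows)
```

The join on `user_id` is the relationship that every per-user analysis of the events table relies on.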

In [1]:
import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

In [5]:
try:
    conn = sqlite3.connect('papcorns.sqlite')
except Exception as e:
    print(e)
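
One caveat with the cell above: `sqlite3.connect` creates a new, empty database when the file does not exist, so a wrong path does not raise here and only surfaces later as a "no such table" error. A minimal sketch of a stricter alternative, assuming a hypothetical helper `open_readonly` (not part of the original notebook) that opens the file through a read-only URI:

```python
import sqlite3
from pathlib import Path


def open_readonly(path):
    """Open an existing SQLite file read-only; fail if the file is missing.

    Hypothetical helper for illustration. With mode=ro, SQLite refuses to
    create the file and rejects any write through this connection.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With the notebook's file this would be called as `conn = open_readonly('papcorns.sqlite')`. A missing file then raises `sqlite3.OperationalError` at connection time instead of silently producing an empty database.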

In [7]:
# Preview the first five rows of the users table
users_df = pd.read_sql_query("SELECT * FROM users LIMIT 5;", conn)
print("Users table preview:")
display(users_df)

# Preview the first five rows of the events table
events_df = pd.read_sql_query("SELECT * FROM user_events LIMIT 5;", conn)
print("\nUser events table preview:")
display(events_df)

Users table preview:


| | id | created_at | attribution_source | country | name |
|---|---|---|---|---|---|
| 0 | 1 | 2024-05-07T00:00:00 | instagram | US | Eve Brown |
| 1 | 2 | 2024-10-12T00:00:00 | instagram | NL | Frank Moore |
| 2 | 3 | 2024-10-15T00:00:00 | tiktok | TR | Ivy Anderson |
| 3 | 4 | 2024-08-28T00:00:00 | tiktok | TR | Alice Brown |
| 4 | 5 | 2024-04-03T00:00:00 | organic | NL | Bob Moore |



User events table preview:


| | id | created_at | user_id | event_name | amount_usd |
|---|---|---|---|---|---|
| 0 | 1 | 2024-05-07T00:00:00 | 1 | app_install | |
| 1 | 2 | 2024-05-12T00:00:00 | 1 | trial_started | |
| 2 | 3 | 2024-05-24T00:00:00 | 1 | trial_cancelled | |
| 3 | 4 | 2024-10-12T00:00:00 | 2 | app_install | |
| 4 | 5 | 2024-10-13T00:00:00 | 2 | trial_started | |


Let's start by checking for missing values.

In [16]:
users_df = pd.read_sql_query("SELECT * FROM users", conn)
users_df.isna().any()

id                    False
created_at            False
attribution_source    False
country               False
name                  False
dtype: bool

In [18]:
events_df = pd.read_sql_query("SELECT * FROM user_events", conn)
events_df.isna().any()

The users table has no missing values. For the events table, the preview above shows empty `amount_usd` values on the app_install, trial_started, and trial_cancelled rows, so that column is expected to contain missing values for events that carry no payment.
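
A boolean `.any()` only says whether a column has gaps, not how many or where. A minimal sketch of a fuller check, assuming a hypothetical helper `missing_report` and a toy frame: the first three rows mirror the events preview above, and the final `subscription_started` row with its 9.99 amount is made up for illustration.

```python
import pandas as pd

# Toy data: rows 1-3 mirror the events preview; row 4 is a made-up example.
toy_events = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "user_id": [1, 1, 1, 2],
        "event_name": [
            "app_install",
            "trial_started",
            "trial_cancelled",
            "subscription_started",
        ],
        "amount_usd": [None, None, None, 9.99],
    }
)


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and share of missing values for each column."""
    counts = df.isna().sum()
    return pd.DataFrame({"n_missing": counts, "share_missing": counts / len(df)})


report = missing_report(toy_events)
print(report)

# Share of rows with a missing amount, broken down by event type.
by_event = toy_events["amount_usd"].isna().groupby(toy_events["event_name"]).mean()
print(by_event)
```

Run against the full tables, the same two calls would be `missing_report(users_df)` and `missing_report(events_df)`. The per-event breakdown helps tell whether gaps in `amount_usd` are structural (events that never carry a payment) or a data-quality problem.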