# Database Query Notebook

This notebook is a scratchpad for querying the email processing database with pandas.

In [1]:
import pandas as pd
import sys
import os

# Add the project root to Python path
sys.path.append(os.getcwd())

from email_processing.database.db_manager import EmailDatabaseManager

In [2]:
# Initialize database manager
db_manager = EmailDatabaseManager()
connection = db_manager.get_connection()

print(f"Connected to database: {db_manager.db_path}")

Connected to database: data/email_processing.db


In [23]:
import textwrap

pd.set_option("display.max_colwidth", None)
# Fetch all rows from the categorizations table
query = "SELECT * FROM categorizations"
categorizations_df = pd.read_sql(query, connection)


# def wrap_column(categorizations_df, col, width=80):
#     categorizations_df[col] = categorizations_df[col].apply(
#         lambda x: "\n".join(textwrap.wrap(str(x), width))
#     )
#     return categorizations_df


# # Usage:
# categorizations_df = wrap_column(categorizations_df, "ai_reasoning", width=80)


print(f"Fetched {len(categorizations_df)} categorizations from database")
categorizations_df.sort_values(by="created_at", ascending=True)[
    ["email_id", "ai_reasoning"]
].head(20)

Fetched 30 categorizations from database


Unnamed: 0,email_id,ai_reasoning
0,1990bb498a57e609,"This is a security alert from Netflix about a new device login. It requires action to verify if the login was authorized and suggests changing the password if it wasn't, making it an actionable security notification."
1,1990bb44ffd9e8bc,This is a newsletter email from Tim Ferriss containing updates about his latest podcast episode and weekly content. It's purely informational and doesn't require any action from the recipient.
2,1990b78f53342fe3,"This is a security alert email from GitHub about Dependabot vulnerability warnings that require attention and updates. Multiple dependencies need to be upgraded to address security vulnerabilities, including high severity issues, which demands action from the repository owner."
3,1990b4558397f0b4,"This is a shipping notification email from Temu informing about order delivery status. It's an automated message containing tracking information and standard policy links, requiring no direct action from the recipient."
4,1990b1a132ca27f0,This is a promotional email from LinkedIn notifying about job openings. It's an automated notification about job market trends and doesn't require any action from the recipient.
5,1990b180383e7498,"This is a notification email from H&M about terms and conditions update ('Uppdatering av villkor'). It contains links to view the email in browser and unsubscribe options, typical of informational marketing communications."
6,1990af6275cda803,"This is clearly a promotional email about a clearance sale (""Lagerrensning"") and lighting week discount offer (15% off) from a retailer. It's a marketing communication that requires no specific action from the recipient."
7,1990aee2541f2915,This is a promotional newsletter from Elite Hotels Rewards advertising hotel discounts and weekend offers. It's purely informational marketing content with no required action from the recipient.
8,1990aca8c0a2ad1c,"This is clearly a newsletter email from GAFFA Sweden, containing subscription information and unsubscribe options. It's purely informational content with no required actions except optional unsubscribe."
9,1990ab7d6d290b51,This appears to be an automated notification email from SEB bank regarding a customer application. No-reply sender addresses typically indicate informational updates rather than requiring action.
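
The reasoning text above is easier to review next to the sender and subject of each email. The following is a minimal sketch of such a join, assuming `email_id` is the shared key between the two tables, as the column listings in this notebook suggest. The sample frames and the helper name `attach_email_context` are hypothetical stand-ins, not part of the project.

```python
import pandas as pd

# Hypothetical stand-ins for emails_df and categorizations_df; in this
# notebook the real frames come from pd.read_sql.
emails_sample = pd.DataFrame(
    {
        "email_id": ["a1", "b2", "c3"],
        "sender": ["x@example.com", "y@example.com", "z@example.com"],
        "subject": ["Subject A", "Subject B", "Subject C"],
    }
)
categorizations_sample = pd.DataFrame(
    {
        "email_id": ["a1", "b2"],
        "ai_reasoning": ["Reasoning for A", "Reasoning for B"],
    }
)


def attach_email_context(categorizations, emails):
    """Left-join categorizations to emails on email_id (assumed shared key)."""
    return categorizations.merge(
        emails[["email_id", "sender", "subject"]],
        on="email_id",
        how="left",
        # Raises MergeError if email_id is not unique on the emails side.
        validate="many_to_one",
    )


joined = attach_email_context(categorizations_sample, emails_sample)
```

With the real data this would be called as `attach_email_context(categorizations_df, emails_df)`. The `validate` argument is optional; it is included here so that unexpected duplicate `email_id` values in `emails` fail loudly instead of silently multiplying rows.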


In [11]:
# Fetch all emails from the emails table
query = "SELECT * FROM emails"
emails_df = pd.read_sql(query, connection)

print(f"Fetched {len(emails_df)} emails from database")
emails_df.sort_values(by="date", ascending=True)

Fetched 30 emails from database


Unnamed: 0,email_id,date,sender,subject,body_markdown,body_clean,pdf_text,raw_email,category_id
25,19903f72de38e2cf,2025-09-01 00:26:54.000000,Willys <willys@kund.willys.se>,"Varsågod, veckans Willys Plus-erbjudanden!","Varsagod, veckans Willys Plus-erbjudanden! ͏ ...","Varsagod, veckans Willys Plus-erbjudanden! ͏ ...",,"{""id"": ""19903f72de38e2cf"", ""subject"": ""Vars\u0...",
29,199039fb3156b42b,2025-09-01 04:53:41.000000,Billy Oppenheimer <billy@billyoppenheimer.com>,"SIX at 6: Swing Your Swing, The Incorrect Way ...",96 ​ ​ ***************************************...,96 ​ ​ ***************************************...,,"{""id"": ""199039fb3156b42b"", ""subject"": ""SIX at ...",
28,19903aea85a240ee,2025-09-01 05:10:00.000000,Medium Daily Digest <noreply@medium.com>,Claude Code + SubAgents | Gabriel Sena,Stories for Christian Wahlström @christian.wah...,Stories for Christian Wahlström @christian.wah...,,"{""id"": ""19903aea85a240ee"", ""subject"": ""Claude ...",
27,19903c86c586ae2a,2025-09-01 05:38:10.000000,Nordnet <no-reply@mail.nordnet.se>,Experternas 7 favoritaktier just nu,1 september Visa i webbläsare (\nhttp://track....,1 september Visa i webbläsare (\nhttp://track....,,"{""id"": ""19903c86c586ae2a"", ""subject"": ""Experte...",
26,19903e49f272c89c,2025-09-01 06:08:57.000000,"""Hitta.se"" <noreply@hitta.se>","Verifiera Wahlström Eriksson, Christin kostnad...",Gör det enkelt för kunderna att välja er med e...,Gör det enkelt för kunderna att välja er med e...,,"{""id"": ""19903e49f272c89c"", ""subject"": ""Verifie...",
24,1990415001c36374,2025-09-01 07:01:49.000000,Kivra <noreply@notifications.kivra.com>,Ny faktura från Parkman i Sverige AB — betala ...,| | | \n--- \n \n| | \n--- \n \n# N...,| | |\n---\n| |\n---\n# Ny faktura från Par...,,"{""id"": ""1990415001c36374"", ""subject"": ""Ny fakt...",
8,1990563a309aa3db,2025-09-01 07:06:50.000000,Live Nation - Sweden <email@info.livenation.se>,💥 DEF LEPPARD TILL DALHALLA 2026!,https://view.mailing.livenation.com/?qs=ceaa6f...,https://view.mailing.livenation.com/?qs=ceaa6f...,,"{""id"": ""1990563a309aa3db"", ""subject"": ""\ud83d\...",
23,19904199cf7d9364,2025-09-01 07:06:51.000000,"""IMDb.com"" <do-not-reply@imdb.com>",See what's streaming in September,IMDb Streaming Guides Check out our watch guid...,IMDb Streaming Guides Check out our watch guid...,,"{""id"": ""19904199cf7d9364"", ""subject"": ""See wha...",
22,199042c12eaee11c,2025-09-01 07:27:00.000000,Amazon Web Services <aws-marketing-email-repli...,Watch sessions on-demand | AWS Summit Stockhol...,The keynote and 6 sessions are available to wa...,The keynote and 6 sessions are available to wa...,,"{""id"": ""199042c12eaee11c"", ""subject"": ""Watch s...",
21,1990444cdb4e3b38,2025-09-01 07:53:57.000000,LinkedIn Job Alerts <jobalerts-noreply@linkedi...,“project manager”: Ingersoll Rand - Senior IT ...,Your job alert for project manager in Greater ...,Your job alert for project manager in Greater ...,,"{""id"": ""1990444cdb4e3b38"", ""subject"": ""\u201cp...",


In [15]:
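# Keep rows dated after 2023-01-01. The comparison is lexicographic, which
# assumes "date" holds ISO-8601-formatted strings as in the output above.
mask = emails_df["date"] > "2023-01-01"
emails_df.loc[mask, ["date", "sender", "subject", "body_clean"]].head(20)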

Unnamed: 0,date,sender,subject,body_clean
0,2025-09-01 20:40:52.000000,Garmin <alerts@account.garmin.com>,Action Required: Download Your Data,| | | | | Data Export Request Complete ...
1,2025-09-01 18:45:22.000000,Google <no-reply@accounts.google.com>,Säkerhetsvarning för wahlstrom.kalle@gmail.com,Det här är en kopia av en säkerhetsvarning som...
2,2025-09-01 17:09:15.000000,Magnus Hoem <magnus@hoem.se>,I morgon är det dags!,Hej! Här kommer en liten påminnelse om morgond...
3,2025-09-01 14:38:41.000000,Autosport <news@e.autosport.com>,Why the drivers' title battle is becoming F1's...,https://www.autosport.com/ [All Series](https:...
4,2025-09-01 14:34:31.000000,Lovable <hi@lovable.dev>,"Lovable Update - student discount, security ce...",Lovable Logo ****************** Lovable Update...
5,2025-09-01 14:33:53.000000,GAFFA Sweden <noreply@gaffa.se>,Vi listar Bob hunds tio bästa låtar,"Hej, Du har fått ett nyhetsbrev från GAFFA Swe..."
6,2025-09-01 13:36:01.000000,Temu <temu@orders.temu.com>,Ditt temu-konto är nu aktivt!,\---------------------------------------------...
7,2025-09-01 15:18:13.000000,GANT <no-reply@message.digitalrecruiters.com>,We received your application!,
8,2025-09-01 07:06:50.000000,Live Nation - Sweden <email@info.livenation.se>,💥 DEF LEPPARD TILL DALHALLA 2026!,https://view.mailing.livenation.com/?qs=ceaa6f...
9,2025-09-01 12:18:57.000000,"""Svenska Golfförbundet"" <no-reply@mingolfutski...",Registrerad HCP-rond 01 september 2025,| | | |\n---\n|\n---\n| | |\n# Registrer...
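
The date filter above appears to compare strings, which only gives correct results while every value follows the same ISO-like format. A possible alternative, sketched here on a small hypothetical frame, is to parse the column with `pd.to_datetime` and compare against a `pd.Timestamp`. Whether the real `date` column is stored as text is an assumption based on the printed output.

```python
import pandas as pd

# Hypothetical sample in the same text format as the "date" column above.
sample = pd.DataFrame(
    {
        "date": ["2025-09-01 20:40:52.000000", "2022-12-31 23:59:59.000000"],
        "subject": ["newer", "older"],
    }
)

# Parse the text into datetime64 values so comparisons are chronological.
sample["date"] = pd.to_datetime(sample["date"])

cutoff = pd.Timestamp("2023-01-01")
recent = sample.loc[sample["date"] > cutoff, ["date", "subject"]]
```

After parsing, sorting and range filters no longer depend on the string format, and the `.dt` accessor becomes available for grouping by day or month.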


In [7]:
# Example: Query specific columns only
query = "SELECT email_id, sender, subject, date FROM emails"
emails_summary_df = pd.read_sql(query, connection)

print("Email Summary:")
emails_summary_df.head()

Email Summary:


Unnamed: 0,email_id,sender,subject,date
0,194a8aefa4f1a8cb,Autosport <news@e.autosport.com>,Newey expects F1 2026 to be engine-dominated,2025-01-27 16:53:45.000000
1,194a81de444c5c2d,"""Kjell & Company"" <noreply@medlem.kjell.com>",Din order har skickats! 9639607,2025-01-27 14:15:18.000000
2,194a81d483db0c3b,Ben Thompson <email@stratechery.com>,DeepSeek FAQ (Stratechery Article 1-27-2025),2025-01-27 14:14:38.000000
3,194a8095607c6b96,Dropbox <no-reply@em-s.dropbox.com>,"Christian, your storage space if filling up - ...",2025-01-27 13:52:51.000000
4,194a7e7f3dde141d,"""Kjell & Company"" <noreply@medlem.kjell.com>",Din order har skickats! 9639607,2025-01-27 13:16:22.000000


In [None]:
# Custom query space - modify as needed
custom_query = """
SELECT 
    email_id,
    sender,
    subject,
    date,
    category_id
FROM emails 
WHERE sender LIKE '%@%'
ORDER BY date DESC
LIMIT 10
"""

result_df = pd.read_sql(custom_query, connection)
print("Custom Query Results:")
result_df
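
When the filter value changes between runs, passing it as a query parameter avoids editing the SQL string by hand. The sketch below assumes a SQLite connection, which the `.db` path printed earlier suggests but does not confirm, so it uses an in-memory database with hypothetical rows. The `?` placeholder style is specific to `sqlite3`; other drivers use different markers. The helper name `emails_from_sender` is hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory database with hypothetical rows, standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (email_id TEXT, sender TEXT, subject TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?, ?)",
    [
        ("e1", "News <news@example.com>", "First", "2025-01-01 10:00:00"),
        ("e2", "News <news@example.com>", "Second", "2025-01-02 10:00:00"),
        ("e3", "Other <hi@example.org>", "Third", "2025-01-03 10:00:00"),
    ],
)
conn.commit()


def emails_from_sender(connection, pattern, limit=10):
    """Return the newest emails whose sender matches a SQL LIKE pattern."""
    query = """
    SELECT email_id, sender, subject, date
    FROM emails
    WHERE sender LIKE ?
    ORDER BY date DESC
    LIMIT ?
    """
    # The values are bound by the driver, not interpolated into the SQL text.
    return pd.read_sql(query, connection, params=(pattern, limit))


result = emails_from_sender(conn, "%@example.com%")
```

In this notebook the call would use the existing `connection` object in place of `conn`, provided it accepts the same placeholder style.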