# Automated Marketing Lead Screening Pipelines using Snowflake Cortex
Efficiently managing and qualifying marketing leads is crucial for business growth, but manually processing large volumes of form submissions can be time-consuming and inconsistent. In this notebook, you'll use Cortex AISQL to systematically analyze, cleanse, and enrich incoming marketing leads captured from web forms.

### Context
This is an example of how Snowflake's own marketing analytics team leverages Snowflake's Cortex AISQL to automate lead qualification processes. The system processes form submissions from marketing events and webinars, filtering out spam content, scoring leads based on their fit with the Ideal Customer Profile (ICP), and categorizing them by job seniority.

In this notebook, we will leverage multiple AISQL functions to build an intelligent lead management workflow that:
- Filters out spam and irrelevant submissions
- Scores lead quality based on ICP alignment
- Categorizes leads by seniority level
- Generates personalized outreach content

The primary business value is creating a more efficient and intelligent lead management workflow that saves manual effort, enables sales and marketing teams to prioritize high-potential prospects, and allows for personalized outreach at scale.
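The four stages above can be sketched in plain Python to show how they chain. Each stage only sees the rows that survived the previous one, and outreach is drafted only for the top quality tier, because the notebook's `AI_COMPLETE` step filters on `'High'`. This is a minimal, hypothetical sketch. The `Lead` and `QualifiedLead` classes and the `run_pipeline` helper are illustrative names, and the callables stand in for the AISQL functions, which in the notebook run inside Snowflake SQL rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Lead:
    # Hypothetical in-memory stand-in for one row of marketing_form_data
    person_id: int
    first_name: str
    last_name: str
    title: str
    company: str


@dataclass
class QualifiedLead:
    lead: Lead
    quality: str                # e.g. 'High', 'Medium', 'Low', 'Poor'
    seniority: str              # e.g. 'CXO', 'VP+', 'Director', ...
    email_draft: Optional[str]  # only populated for 'High' leads


def run_pipeline(
    leads: List[Lead],
    is_legitimate: Callable[[Lead], bool],      # stand-in for AI_FILTER
    score_icp: Callable[[Lead], str],           # stand-in for the ICP AI_CLASSIFY call
    classify_seniority: Callable[[Lead], str],  # stand-in for the seniority AI_CLASSIFY call
    draft_email: Callable[[Lead], str],         # stand-in for AI_COMPLETE
) -> List[QualifiedLead]:
    qualified = []
    for lead in leads:
        # Stage 1: drop spam and junk before any further model calls
        if not is_legitimate(lead):
            continue
        # Stages 2 and 3: score and categorize only the legitimate leads
        quality = score_icp(lead)
        seniority = classify_seniority(lead)
        # Stage 4: draft outreach only for the top quality tier
        draft = draft_email(lead) if quality == "High" else None
        qualified.append(QualifiedLead(lead, quality, seniority, draft))
    return qualified


# Usage with trivial rule-based stand-ins (not real model calls)
leads = [
    Lead(1, "Ada", "Lovelace", "CTO", "Acme Analytics"),
    Lead(2, "Test", "Test", "asdf", "asdf"),
]
results = run_pipeline(
    leads,
    is_legitimate=lambda lead: lead.company != "asdf",
    score_icp=lambda lead: "High" if lead.title == "CTO" else "Low",
    classify_seniority=lambda lead: "CXO" if lead.title.startswith("C") else "IC",
    draft_email=lambda lead: f"Hi {lead.first_name}, ...",
)
for r in results:
    print(r.lead.person_id, r.quality, r.seniority, r.email_draft)
```

Filtering first is the main design choice: every later stage is a model call, so discarding junk early keeps the downstream work proportional to the number of legitimate leads.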


In [None]:
import pandas as pd
import altair as alt
import numpy as np
from snowflake.snowpark.context import get_active_session


# Get the active Snowflake session
session = get_active_session()

session.query_tag = {"origin":"sf_sit-is", "name":"AISQL-HOL", "version":{"major":1, "minor":0}, "attributes":{"is_quickstart":3, "source":"notebook"}}
print(session)

In [None]:
-- Set the correct schema context for accessing data
-- Data assets are in the default schema, not the notebooks schema
USE SCHEMA DEFAULT_SCHEMA;

#### Overview of `marketing_form_data` table

In [None]:
SELECT * FROM marketing_form_data LIMIT 10;

## Filter out spam and irrelevant submissions

As a first step in our lead qualification process, we need to filter out spam, test entries, and irrelevant form submissions. We'll use the `AI_FILTER` function to automatically identify legitimate business leads from valid personas with companies, excluding junk data, spam, jokes, or student submissions.
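`AI_FILTER` evaluates a single natural-language string, so the SQL concatenates the instructions with labeled form fields. A small Python function that reproduces the same text can help you preview exactly what the model sees for a given row. The helper names `concat_ws` and `build_filter_input` are hypothetical, the instructions are abbreviated, and the sketch assumes every field is non-NULL; it does not model how SQL handles NULL inputs.

```python
def concat_ws(separator: str, *parts: str) -> str:
    # Joins parts with the separator; assumes no part is None (NULL)
    return separator.join(parts)


def build_filter_input(instructions: str, first_name: str, last_name: str,
                       title: str, company: str) -> str:
    # Mirrors: '<instructions>' || CONCAT_WS(' ', 'First Name:', first_name, ...)
    return instructions + concat_ws(
        " ",
        "First Name:", first_name,
        "Last Name:", last_name,
        "Job Title:", title,
        "Company:", company,
    )


# Abbreviated version of the instructions used in the SQL cell
INSTRUCTIONS = "Please confirm if this is a legitimate form fill. Please do not include students: "

print(build_filter_input(INSTRUCTIONS, "Ada", "Lovelace", "CTO", "Acme Analytics"))
```

Printing the result for a sample row also makes formatting issues easy to spot, such as a missing space between the instructions and the first field label.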

In [None]:
# 🤖 LEAD ANALYSIS DASHBOARD
# Visualize AISQL function outputs for comprehensive lead intelligence

# Load AI_FILTER results (spam detection)
ai_filter_query = """
CREATE OR REPLACE TEMP TABLE FILTERED_RESULTS AS
SELECT 
    person_id,
    first_name,
    last_name,
    title,
    company,
    AI_FILTER(
        'We are a B2B SAAS company, and this is the text entered to a demand gen form on our website to register for a marketing event. Please confirm if this is a form fill that appears to be legitimate data from a valid persona with a company, without any junk, spam, scams, jokes, or nonsensical entries that cannot be worked by sales. Please do not include students: ' 
        || CONCAT_WS(' ',
            'First Name:', first_name,
            'Last Name:', last_name,
            'Job Title:', title,
            'Company:', company
        )
    ) as is_legitimate_lead
FROM marketing_form_data
WHERE person_id IS NOT NULL
"""

# Execute AI_FILTER analysis
session.sql(ai_filter_query).collect()

# Get AI_FILTER results for visualization
filter_results_query = """
SELECT 
    person_id,
    first_name,
    last_name,
    title,
    company,
    is_legitimate_lead,
    CASE WHEN is_legitimate_lead THEN 'AI-Validated Clean' ELSE 'AI-Detected Spam' END as ai_data_quality
FROM FILTERED_RESULTS
"""

df_filter = session.sql(filter_results_query).to_pandas()

n_total = len(df_filter)
n_clean = int(df_filter['IS_LEGITIMATE_LEAD'].eq(True).sum())
n_spam = int(df_filter['IS_LEGITIMATE_LEAD'].eq(False).sum())

print("AI_FILTER Analysis")
print(f"✅ Clean leads: {n_clean} ({n_clean / n_total * 100:.1f}%)")
print(f"🚫 Spam detected: {n_spam} ({n_spam / n_total * 100:.1f}%)")

# AI FILTER Results Visualization
ai_filter_chart = alt.Chart(df_filter).mark_arc(innerRadius=60, outerRadius=120).encode(
    theta=alt.Theta(field="AI_DATA_QUALITY", type="nominal", aggregate="count"),
    color=alt.Color(
        field="AI_DATA_QUALITY", 
        type="nominal",
        scale=alt.Scale(domain=['AI-Validated Clean', 'AI-Detected Spam'], 
                       range=['#28a745', '#dc3545']),
        legend=alt.Legend(title="AI_FILTER Results", orient="bottom")
    ),
    tooltip=['AI_DATA_QUALITY:N', alt.Tooltip('count():Q', title='Count')]
).properties(
    title=alt.Title("🤖 AI_FILTER: Spam Detection Results", fontSize=14, anchor='start'),
    width=250,
    height=250
)

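# Only the last expression in a cell is rendered, so print the preview and display the chart last
print(df_filter.head().to_string(index=False))
ai_filter_chart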

## Score leads based on Ideal Customer Profile (ICP)

Now, let's classify each filtered lead into quality categories: "High", "Medium", "Low", or "Poor" based on their alignment with our Ideal Customer Profile. We'll use `AI_CLASSIFY` to systematically evaluate leads based on their decision-making role, company characteristics, and business need for data and AI services.
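`AI_CLASSIFY` returns an object whose `labels` array holds the chosen category, which is why the queries in the next cell read `classification_output_raw:labels[0]::TEXT` and drop rows where that path is NULL. If you pull the raw column into pandas instead, it may arrive as a JSON string. The hypothetical helper `first_label` below applies the same extraction in Python and additionally rejects labels outside the four ICP categories; that extra check is an added safeguard, not something the notebook does.

```python
import json
from typing import Optional

# The four ICP categories defined in the AI_CLASSIFY call
ICP_LABELS = {"High", "Medium", "Low", "Poor"}


def first_label(raw: Optional[str]) -> Optional[str]:
    """Return labels[0] from a serialized AI_CLASSIFY result, or None.

    Mirrors classification_output_raw:labels[0]::TEXT plus an extra check
    that the label is one of the expected ICP categories.
    """
    if raw is None:
        return None
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    labels = parsed.get("labels")
    if not isinstance(labels, list) or not labels:
        return None
    label = str(labels[0])
    return label if label in ICP_LABELS else None


print(first_label('{"labels": ["High"]}'))
print(first_label('{"labels": []}'))
```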

In [None]:
# 🎯 AI_CLASSIFY RESULTS: Lead Quality & Seniority Analysis
# Visualize AI-powered lead classification outputs

# Load AI_CLASSIFY results for lead quality
quality_classify_query = """
CREATE OR REPLACE TEMP TABLE MARKETING_LEADS_W_QUALITY AS
SELECT 
    person_id,
    first_name,
    last_name,
    title,
    company,
    AI_CLASSIFY(
        CONCAT_WS(' ',
            'Job Title:', title,
            'Company:', company
        ),
        [
            {
                'label': 'High',
                'description': 'The lead has a decision-making role, relevant persona, and comes from a company well-aligned with B2B SAAS target industries and size. They must have a clear business need for data and AI services.'
            },
            {
                'label': 'Medium',
                'description': 'The lead is a good fit but may lack full decision-making authority or strong company characteristics. However, they should have influence or future potential.'
            },
            {
                'label': 'Low',
                'description': 'The lead has minimal alignment, lacks decision-making power, or is from a less relevant company. The persona or company factors do not strongly align with B2B SAAS ICP.'
            },
            {
                'label': 'Poor',
                'description': 'The lead has no alignment with B2B SAAS ICP. Their role and company characteristics are irrelevant. A form fill that appears to contain test accounts, junk, spam, scams, jokes, or nonsensical entries should be labeled as poor.'
            }
        ],
        {
            'task_description': 'We are a B2B SAAS company. Return a classification for the Ideal Customer Profile of this lead based on the text entered to a demand gen form on our website to register for a marketing event.'
        }
    ) AS classification_output_raw
FROM FILTERED_RESULTS
WHERE is_legitimate_lead = TRUE
"""

# Execute AI_CLASSIFY quality analysis
session.sql(quality_classify_query).collect()

# Get quality results
quality_results_query = """
SELECT 
    person_id,
    title,
    company,
    classification_output_raw:labels[0]::TEXT AS ai_lead_quality
FROM MARKETING_LEADS_W_QUALITY
WHERE classification_output_raw:labels[0]::TEXT IS NOT NULL
"""

df_quality = session.sql(quality_results_query).to_pandas()

print(f"🎯 AI_CLASSIFY Lead Quality Analysis Complete!")
quality_dist = df_quality['AI_LEAD_QUALITY'].value_counts()
for quality, count in quality_dist.items():
    print(f"   {quality}: {count} leads ({count/len(df_quality)*100:.1f}%)")

# AI_CLASSIFY Quality Results Visualization
quality_chart = alt.Chart(df_quality).mark_bar().encode(
    x=alt.X('count():Q', title='Number of Leads'),
    y=alt.Y('AI_LEAD_QUALITY:N', title='AI-Classified Quality', 
            sort=['High', 'Medium', 'Low', 'Poor']),
    color=alt.Color(
        'AI_LEAD_QUALITY:N',
        scale=alt.Scale(domain=['High', 'Medium', 'Low', 'Poor'],
                       range=['#28a745', '#ffc107', '#fd7e14', '#dc3545']),
        legend=None
    ),
    tooltip=['AI_LEAD_QUALITY:N', 'count():Q']
).properties(
    title=alt.Title("🎯 AI_CLASSIFY: Lead Quality Distribution", fontSize=14, anchor='start'),
    width=400,
    height=250
)

quality_chart


## Categorize leads by job seniority

To further enhance our lead qualification, we can classify leads based on their job seniority level. This helps sales teams understand the decision-making authority and tailor their approach accordingly. We'll use `AI_CLASSIFY` to categorize job titles into seniority groups ranging from C-level executives to individual contributors.
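The seniority labels in the next cell form an ordinal scale, from `CXO` down to `IC`, with `Junk` as a catch-all. When you sort leads by decision-making authority, an explicit rank mapping keeps that order instead of falling back to alphabetical sorting. A minimal sketch, where `SENIORITY_RANK`, `seniority_rank`, and `sort_by_seniority` are hypothetical names and leads are plain dictionaries:

```python
from typing import Dict, List, Optional

# Seniority labels from the AI_CLASSIFY call, most senior first
SENIORITY_ORDER = ["CXO", "VP+", "Director", "Manager", "IC", "Junk"]
SENIORITY_RANK: Dict[str, int] = {label: rank for rank, label in enumerate(SENIORITY_ORDER)}


def seniority_rank(label: Optional[str]) -> int:
    # Unknown or missing labels sort after every known label
    if label is None:
        return len(SENIORITY_ORDER)
    return SENIORITY_RANK.get(label, len(SENIORITY_ORDER))


def sort_by_seniority(leads: List[dict]) -> List[dict]:
    # sorted() is stable, so leads with the same rank keep their input order
    return sorted(leads, key=lambda lead: seniority_rank(lead.get("seniority_level")))


leads = [
    {"title": "Data Engineer", "seniority_level": "IC"},
    {"title": "CEO", "seniority_level": "CXO"},
    {"title": "???", "seniority_level": None},
    {"title": "Head of Data", "seniority_level": "Director"},
]
print([lead["title"] for lead in sort_by_seniority(leads)])
```

In pandas, the same mapping could be applied with `df['SENIORITY_LEVEL'].map(SENIORITY_RANK)` before calling `sort_values`.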

In [None]:
CREATE OR REPLACE TEMP TABLE MARKETING_LEADS_W_SENIORITY AS
SELECT 
  d.*,
  AI_CLASSIFY(
    CONCAT_WS(' ', 'Job Title:', d.title),
    [
      {
        'label': 'CXO',
        'description': 'Any title that is a company C-level executive or founder.'
      },
      {
        'label': 'VP+',
        'description': 'Any title that is a company executive below C-level like a VP, President, or managing director.'
      },
      {
        'label': 'Director',
        'description': 'Any title that is director level or head of a department.'
      },
      {
        'label': 'Manager',
        'description': 'Any title that relates to managers or team leads.'
      },
      {
        'label': 'IC',
        'description': 'Any title that relates to an individual contributor.'
      },
      {
        'label': 'Junk',
        'description': 'Any title that seems like junk, spam, scams, jokes, or nonsensical entries that cannot be worked by sales.'
      }
    ],
    {
      'task_description': 'We are a B2B SAAS company. Use this data to classify job titles into seniority groupings. Consider all parts of the title and be careful of how the meaning changes based on parentheses or other punctuation.'
    }
  ) AS seniority_classification_raw,
  seniority_classification_raw:labels[0]::TEXT AS seniority_level,
  CLASSIFICATION_OUTPUT_RAW:labels[0]::TEXT AS leads_quality
FROM MARKETING_LEADS_W_QUALITY d;

SELECT * FROM MARKETING_LEADS_W_SENIORITY LIMIT 10;

## Generate personalized outreach content

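For high-quality leads, we can use company information to create personalized outreach messages. By joining lead data with the `company_info` table, we can use `AI_COMPLETE` to draft customized emails that reference specific company details and use cases, which can help improve response rates. Leads without a matching `company_info` row still get a draft, because the `LEFT JOIN` and `COALESCE` fall back to an empty company description.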
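`PROMPT` fills its `{0}`, `{1}` placeholders from the arguments that follow the template, in order. Python's `str.format` uses the same positional placeholder syntax, so you can preview the text a lead will produce before spending `AI_COMPLETE` calls on it. This is a sketch under that assumption: the template is abbreviated, the names `OUTREACH_TEMPLATE` and `outreach_prompt` are hypothetical, and `company_description or ""` plays the role of `COALESCE(company_description, '')`.

```python
from typing import Optional

# Abbreviated version of the template passed to PROMPT(...) in the SQL cell
OUTREACH_TEMPLATE = (
    "We are a B2B SAAS company. Please draft an outreach email. "
    "Here is the filled form: {0}; Here is the company info: {1}"
)


def outreach_prompt(first_name: str, last_name: str, title: str, company: str,
                    company_description: Optional[str]) -> str:
    # Same labeled fields the SQL builds with CONCAT_WS(' ', ...)
    form = " ".join([
        "First Name:", first_name,
        "Last Name:", last_name,
        "Job Title:", title,
        "Company:", company,
    ])
    # `or ""` plays the role of COALESCE(company_description, '')
    return OUTREACH_TEMPLATE.format(form, company_description or "")


print(outreach_prompt("Ada", "Lovelace", "CTO", "Acme Analytics", None))
```

`str.format` does not re-scan substituted values, so braces inside a company description pass through unchanged.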

In [None]:
WITH high_quality_leads AS (
  SELECT 
    ml.*,
    ci.description AS company_description
  FROM MARKETING_LEADS_W_SENIORITY ml
  LEFT JOIN company_info ci ON ml.company = ci.company
  WHERE ml.leads_quality = 'High'
)
SELECT 
  first_name,
  last_name,
  title,
  company,
  leads_quality,
  seniority_level,
  AI_COMPLETE(
    'llama4-maverick',
    PROMPT(
      'We are a B2B SAAS company focused on a data and analytics platform. This is a form filled out by a potential prospect. Please draft an outreach email asking whether they are interested in how data and analytics might supercharge their business. Return only the email draft and nothing else. Here is the filled form: {0}; Here is the company info: {1}',
      CONCAT_WS(' ',
        'First Name:', first_name,
        'Last Name:', last_name,
        'Job Title:', title,
        'Company:', company
      ),
      COALESCE(company_description, '')
    )
  ) AS personalized_email
FROM high_quality_leads
LIMIT 5;

In [None]:

# Create a comprehensive analysis table that combines all AISQL outputs
comprehensive_query = """
CREATE OR REPLACE TEMP TABLE COMPREHENSIVE_ANALYSIS AS
SELECT 
    fr.person_id,
    fr.first_name,
    fr.last_name,
    fr.title,
    fr.company,
    fr.is_legitimate_lead,
    qr.classification_output_raw:labels[0]::TEXT AS ai_lead_quality,
    sr.seniority_classification_raw:labels[0]::TEXT AS ai_seniority_level
FROM FILTERED_RESULTS fr
LEFT JOIN MARKETING_LEADS_W_QUALITY qr ON fr.person_id = qr.person_id
LEFT JOIN MARKETING_LEADS_W_SENIORITY sr ON fr.person_id = sr.person_id  
WHERE fr.is_legitimate_lead = TRUE
"""

# Execute comprehensive analysis
session.sql(comprehensive_query).collect()

# Get comprehensive results
comprehensive_results_query = """
SELECT 
    person_id,
    title,
    company,
    ai_lead_quality,
    ai_seniority_level,
    -- Create a composite score based on AI classifications
    CASE 
        WHEN ai_lead_quality = 'High' AND ai_seniority_level IN ('CXO', 'VP+') THEN 'Hot Lead'
        WHEN ai_lead_quality IN ('High', 'Medium') AND ai_seniority_level IN ('CXO', 'VP+', 'Director') THEN 'Warm Lead'
        WHEN ai_lead_quality IN ('Medium', 'Low') AND ai_seniority_level IN ('Manager', 'IC') THEN 'Cold Lead'
        ELSE 'Low Priority'
    END as lead_temperature
FROM COMPREHENSIVE_ANALYSIS
WHERE ai_lead_quality IS NOT NULL 
AND ai_seniority_level IS NOT NULL
"""

df_comprehensive = session.sql(comprehensive_results_query).to_pandas()

temperature_counts = df_comprehensive['LEAD_TEMPERATURE'].value_counts()

print("📊 Comprehensive AISQL Analysis Complete!")
print(f"🔥 Hot Leads: {temperature_counts.get('Hot Lead', 0)}")
print(f"🌡️ Warm Leads: {temperature_counts.get('Warm Lead', 0)}")
print(f"❄️ Cold Leads: {temperature_counts.get('Cold Lead', 0)}")
print(f"📉 Low Priority: {temperature_counts.get('Low Priority', 0)}")

# Lead Temperature Distribution
temp_chart = alt.Chart(df_comprehensive).mark_arc(innerRadius=50).encode(
    theta=alt.Theta(field="LEAD_TEMPERATURE", type="nominal", aggregate="count"),
    color=alt.Color(
        field="LEAD_TEMPERATURE", 
        type="nominal",
        scale=alt.Scale(domain=['Hot Lead', 'Warm Lead', 'Cold Lead', 'Low Priority'],
                       range=['#dc3545', '#fd7e14', '#17a2b8', '#6c757d']),
        legend=alt.Legend(title="Lead Temperature", orient="bottom")
    ),
    tooltip=['LEAD_TEMPERATURE:N', alt.Tooltip('count():Q', title='Count')]
).properties(
    title=alt.Title("🔥 AI-Powered Lead Temperature Classification", fontSize=14, anchor='start'),
    width=300,
    height=250
)

# Quality vs Seniority Heatmap
heatmap_data = df_comprehensive.groupby(['AI_LEAD_QUALITY', 'AI_SENIORITY_LEVEL']).size().reset_index(name='count')

heatmap_chart = alt.Chart(heatmap_data).mark_rect().encode(
    x=alt.X('AI_SENIORITY_LEVEL:N', title='AI-Classified Seniority'),
    y=alt.Y('AI_LEAD_QUALITY:N', title='AI-Classified Quality'),
    color=alt.Color('count:Q', scale=alt.Scale(scheme='viridis'), legend=alt.Legend(title="Lead Count")),
    tooltip=['AI_LEAD_QUALITY:N', 'AI_SENIORITY_LEVEL:N', 'count:Q']
).properties(
    title=alt.Title("🎯 Quality vs. Seniority Heatmap", fontSize=14, anchor='start'),
    width=350,
    height=250
)

# Company Opportunity Leaderboard
company_summary_query = """
WITH enriched_analysis AS (
    SELECT 
        person_id,
        title,
        company,
        ai_lead_quality,
        ai_seniority_level,
        CASE 
            WHEN ai_lead_quality = 'High' AND ai_seniority_level IN ('CXO', 'VP+') THEN 'Hot Lead'
            WHEN ai_lead_quality IN ('High', 'Medium') AND ai_seniority_level IN ('CXO', 'VP+', 'Director') THEN 'Warm Lead'
            WHEN ai_lead_quality IN ('Medium', 'Low') AND ai_seniority_level IN ('Manager', 'IC') THEN 'Cold Lead'
            ELSE 'Low Priority'
        END AS lead_temperature
    FROM COMPREHENSIVE_ANALYSIS
    WHERE ai_lead_quality IS NOT NULL 
      AND ai_seniority_level IS NOT NULL
)

-- Summarize by company, ranking companies by hot leads and then warm leads
SELECT 
    company,
    COUNT(*) AS total_leads,
    COUNT(CASE WHEN lead_temperature = 'Hot Lead' THEN 1 END) AS hot_leads,
    COUNT(CASE WHEN lead_temperature = 'Warm Lead' THEN 1 END) AS warm_leads,
    COUNT(CASE WHEN lead_temperature = 'Cold Lead' THEN 1 END) AS cold_leads,
    COUNT(CASE WHEN lead_temperature = 'Low Priority' THEN 1 END) AS low_priority_leads
FROM enriched_analysis
GROUP BY company
ORDER BY hot_leads DESC, warm_leads DESC;

"""

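df_company_summary = session.sql(company_summary_query).to_pandas()

# Show the top of the company leaderboard; the dashboard below is the cell's rendered output
print(df_company_summary.head(10).to_string(index=False))
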
# Final comprehensive dashboard
final_dashboard = alt.vconcat(
    alt.hconcat(temp_chart, heatmap_chart),
    
    title=alt.Title("📊 Comprehensive AISQL Marketing Intelligence Dashboard", fontSize=16, anchor='start')
).resolve_scale(color='independent')

final_dashboard
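The `CASE` expression that assigns `lead_temperature` appears twice in the cell above (once for the chart and once for the company leaderboard), so it is worth pinning down its behavior. A plain-Python mirror of the same branches makes the rules easy to unit test. Because the first matching branch wins, a `High`-quality lead with `Manager` or `IC` seniority falls through to `Low Priority`. The function names below are hypothetical, and the leaderboard mirrors the `COUNT(CASE WHEN ...)` rollup with `collections.Counter`.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def lead_temperature(quality: Optional[str], seniority: Optional[str]) -> str:
    # Same branch order as the SQL CASE expression: the first match wins
    if quality == "High" and seniority in ("CXO", "VP+"):
        return "Hot Lead"
    if quality in ("High", "Medium") and seniority in ("CXO", "VP+", "Director"):
        return "Warm Lead"
    if quality in ("Medium", "Low") and seniority in ("Manager", "IC"):
        return "Cold Lead"
    return "Low Priority"


def company_leaderboard(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, int, int, int]]:
    # One Counter of temperatures per company, like COUNT(CASE WHEN ...) ... GROUP BY company
    per_company: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        per_company[row["company"]][lead_temperature(row["quality"], row["seniority"])] += 1
    # Each tuple is (company, hot_leads, warm_leads, total_leads)
    summary = [
        (company, counts["Hot Lead"], counts["Warm Lead"], sum(counts.values()))
        for company, counts in per_company.items()
    ]
    # ORDER BY hot_leads DESC, warm_leads DESC
    return sorted(summary, key=lambda item: (-item[1], -item[2]))


rows = [
    {"company": "Acme", "quality": "High", "seniority": "CXO"},
    {"company": "Beta", "quality": "Medium", "seniority": "Director"},
    {"company": "Acme", "quality": "Low", "seniority": "IC"},
]
print(lead_temperature("High", "IC"))
print(company_leaderboard(rows))
```

If a High-quality individual contributor should rank above Low Priority, that is a change to the `CASE` rules themselves, and a mirror like this gives you a place to test the new rules before editing the SQL.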


## Key Takeaways

* **End-to-End Automation**: You can chain Cortex AI functions together (`AI_FILTER` -> `AI_CLASSIFY` -> `AI_COMPLETE`) to build a comprehensive lead qualification pipeline entirely within Snowflake.
* **Intelligent Filtering**: AI-powered spam detection reduces manual review of irrelevant submissions, so your sales team can focus on legitimate prospects.
* **ICP Scoring**: Automated lead scoring based on Ideal Customer Profile criteria helps prioritize sales efforts on the highest-potential opportunities.
* **Personalized Outreach**: By combining lead data with company intelligence, you can generate customized messaging that can help improve response rates.
* **Scalable Process**: Because the same prompts and label definitions are applied to every row in SQL, the workflow can scale to large lead volumes while keeping qualification criteria consistent.

## Additional Resources

* [Documentation: Cortex AI SQL Functions](https://docs.snowflake.com/en/user-guide/snowflake-cortex/aisql)