lead-qualifier

Qualify scraped leads, then measure the qualifier. Point it at a noisy scraped list and it splits the leads into kept and dropped, each carrying a 0-100 relevance score and a short reason. Then it does the part most lead-gen tooling skips: it tells you how good the qualifier actually is, reporting precision, recall, f1, and accuracy against a labeled set.

A scraper hands you everything that matched a search. The business question is narrower: which of these are real, relevant leads? A page that says "veneers all inclusive" might be a cosmetic dentist, or it might be selling toothpaste. lead-qualifier is the small, reliable layer that makes that call, explains it, and lets you prove on real data that it is making the call well.

Zero runtime dependencies. The whole core runs on the Python standard library. The LLM is injected as a plain Callable[[str], str], so the package never imports any provider SDK.


How it works

flowchart LR
    S["Scraped leads<br/>json / csv / list"] --> Q{Scorer}
    C[["Criteria<br/>include · exclude<br/>required · threshold"]] -.-> Q
    Q -->|"RuleScorer<br/>(deterministic)"| V[Verdicts]
    Q -->|"LLMScorer<br/>(injected callable)"| V
    V --> K["Kept<br/>score · reason"]
    V --> D["Dropped<br/>score · reason"]
    L[["Labeled set<br/>gold keep/drop"]] -.-> E
    Q --> E{"evaluate()"}
    E --> R["precision · recall<br/>f1 · accuracy"]

Every lead gets exactly one Verdict: a clamped 0-100 score, a keep/drop decision, the criteria terms that fired, and a one-line reason. The same criteria that drive qualification also drive the eval harness, so the metrics measure the thing you actually run in production, not a proxy.
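A minimal sketch of what such a Verdict could look like, assuming a frozen dataclass. The field names and the clamp_score helper are illustrative assumptions, not the package's actual definitions; what the sketch shows is that clamping happens once, at construction, so no code path can produce an out-of-range score.

```python
from dataclasses import dataclass


def clamp_score(value: float) -> int:
    """Force any score into the 0-100 range and round it to an int."""
    return max(0, min(100, int(round(value))))


@dataclass(frozen=True)
class Verdict:
    """Hypothetical verdict shape; field names are illustrative only."""

    score: int
    keep: bool
    matched: tuple[str, ...] = ()
    reason: str = ""

    def __post_init__(self) -> None:
        # Frozen dataclasses need object.__setattr__ to rewrite a field.
        object.__setattr__(self, "score", clamp_score(self.score))
```

With this shape, Verdict(score=250, keep=True).score comes out as 100, and a negative score comes out as 0.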


Quickstart

git clone https://github.com/vinimabreu/lead-qualifier
cd lead-qualifier
make install      # creates .venv and installs the package + dev tools
make demo         # runs the offline demo below

The demo has two acts. First it qualifies a noisy list of cosmetic-dentistry leads that includes traps a naive keyword filter gets wrong. Then it measures the qualifier against a labeled set with harder cases, so the metrics are honest instead of a suspicious 1.0:

=== Act 1: qualify a noisy scraped list ===

5 kept, 5 dropped (of 10) against 'cosmetic-dentistry'

KEPT:
  + [100] Bright Smile Dental - Porcelain Veneers & Cosmetic Dentistry
        matched veneers, cosmetic, dentist, smile; score 100 vs threshold 50 -> kept.
  + [100] Lakeside Implant & Cosmetic Dentistry Clinic
        matched veneers, cosmetic, dentist, implant, smile; score 100 vs threshold 50 -> kept.
  + [100] Downtown Smile Studio - Veneers, Implants, Invisalign
        matched veneers, cosmetic, dentist, implant, smile; score 100 vs threshold 50 -> kept.
  + [ 75] Family Dental Care of Riverton
        matched cosmetic, dentist, smile; score 75 vs threshold 50 -> kept.
  + [100] Premier Veneers of Hill Country
        matched veneers, cosmetic, dentist, implant; score 100 vs threshold 50 -> kept.

DROPPED:
  - [  0] WhiteGlow Veneers Whitening Toothpaste, 6-Pack
        Hard drop: matched excluded term(s): toothpaste, amazon.
  - [  0] Veneers (dental restoration) - Wikipedia
        Hard drop: matched excluded term(s): wikipedia.
  - [  0] Now Hiring: Cosmetic Dentist - Full Time Vacancy
        Hard drop: matched excluded term(s): job, vacancy.
  - [ 25] 10 Foods That Naturally Whiten Your Smile
        matched smile; score 25 vs threshold 50 -> dropped.
  - [  0] Local Plumber - 24/7 Emergency Service
        no include terms matched; score 0 vs threshold 50 -> dropped.

=== Act 2: measure the qualifier ===

eval report (rule)
  examples: 12
  confusion: tp=6 fp=1 tn=4 fn=1
  precision: 0.857
  recall:    0.857
  f1:        0.857
  accuracy:  0.833

No network, no API keys.
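The Act 2 metrics follow directly from the confusion counts. This sketch recomputes them with the standard formulas, treating "keep" as the positive class; returning 0.0 when a denominator is zero is an assumption, not necessarily what evaluate() does.

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics, treating 'keep' as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Confusion counts taken from the Act 2 report above.
report = metrics(tp=6, fp=1, tn=4, fn=1)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
# precision: 0.857
# recall: 0.857
# f1: 0.857
# accuracy: 0.833
```

Precision and recall are both 6/7 here, so f1 (their harmonic mean) is also 6/7, while accuracy is 10/12.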


Use it on your own data (CLI)

# qualify a scrape into kept / dropped, with a reason per lead
lead-qualifier qualify --source json:leads.json --criteria criteria.json --out kept.json

# measure the qualifier against a labeled set
lead-qualifier eval --labeled labeled.json --criteria criteria.json
  • --source is json:PATH or csv:PATH. JSON may be an array or an object with a leads key.
  • --criteria is the JSON spec below.
  • --out writes kept leads plus the full per-lead verdict list as JSON.
  • qualify and eval exit 0 on success and 2 on a usage or input error.
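The --source format is simple enough to sketch. The helpers below (parse_spec, parse_leads, load_source) are hypothetical, written only against the documented rules: a json: or csv: prefix, and JSON that is either a bare array or an object with a leads key.

```python
import csv
import io
import json
from pathlib import Path


def parse_spec(spec: str) -> tuple[str, str]:
    """Split 'json:PATH' or 'csv:PATH' into (kind, path)."""
    kind, sep, path = spec.partition(":")
    if not sep or kind not in ("json", "csv") or not path:
        raise ValueError(f"expected json:PATH or csv:PATH, got {spec!r}")
    return kind, path


def parse_leads(kind: str, text: str) -> list[dict]:
    """Turn file contents into a list of lead dicts."""
    if kind == "csv":
        # Each row becomes a dict keyed by the CSV header line.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    data = json.loads(text)
    if isinstance(data, dict):  # object form: the array lives under "leads"
        data = data.get("leads")
    if not isinstance(data, list):
        raise ValueError("JSON source must be an array or an object with a 'leads' key")
    return data


def load_source(spec: str) -> list[dict]:
    """Read the file named by the spec and parse it into leads."""
    kind, path = parse_spec(spec)
    return parse_leads(kind, Path(path).read_text(encoding="utf-8"))
```

Keeping parsing separate from file I/O makes the format rules testable without touching the filesystem.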

A criteria file is small and declarative:

{
  "name": "cosmetic-dentistry",
  "description": "real cosmetic dentistry practices, not products or articles",
  "include_keywords": ["veneers", "cosmetic", "dentist", "implant", "smile"],
  "exclude_keywords": ["toothpaste", "amazon", "wikipedia", "job", "vacancy"],
  "text_fields": ["title", "snippet"],
  "required_fields": ["url"],
  "threshold": 50
}
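One way to load and validate such a file with only the standard library is sketched below. CriteriaSketch and parse_criteria are hypothetical stand-ins for the package's load_criteria; the defaults (text_fields of title and snippet, threshold 50) and the validation rules are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CriteriaSketch:
    """Hypothetical in-memory form of a criteria file."""

    name: str
    include_keywords: tuple[str, ...]
    exclude_keywords: tuple[str, ...] = ()
    text_fields: tuple[str, ...] = ("title", "snippet")
    required_fields: tuple[str, ...] = ()
    threshold: int = 50
    description: str = ""


def parse_criteria(raw: str) -> CriteriaSketch:
    """Parse and validate criteria JSON; defaults and checks are assumptions."""
    data = json.loads(raw)
    # Keywords are lowercased once here so matching can be case-insensitive.
    include = [str(k).lower() for k in data.get("include_keywords", [])]
    if not include:
        raise ValueError("include_keywords must list at least one term")
    threshold = int(data.get("threshold", 50))
    if not 0 <= threshold <= 100:
        raise ValueError(f"threshold must be within 0-100, got {threshold}")
    return CriteriaSketch(
        name=str(data.get("name", "unnamed")),
        include_keywords=tuple(include),
        exclude_keywords=tuple(str(k).lower() for k in data.get("exclude_keywords", [])),
        text_fields=tuple(data.get("text_fields", ("title", "snippet"))),
        required_fields=tuple(data.get("required_fields", ())),
        threshold=threshold,
        description=str(data.get("description", "")),
    )
```

Rejecting an out-of-range threshold at load time means a typo in the file fails loudly instead of silently keeping or dropping everything.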

Qualify in code (library)

from lead_qualifier import RuleScorer, load_criteria, qualify

criteria = load_criteria("criteria.json")
leads = [
    {"title": "Bright Smile - Veneers & Cosmetic Dentistry", "snippet": "...", "url": "https://a.example"},
    {"title": "WhiteGlow Veneers Toothpaste", "snippet": "buy on amazon", "url": "https://b.example"},
]

result = qualify(leads, criteria, RuleScorer())
print(result.summary())          # {'total': 2, 'kept': 1, 'dropped': 1}
for verdict in result.verdicts:  # aligned with input order
    print(verdict.score, verdict.keep, verdict.reason)

The RuleScorer is deterministic and key-free, which is exactly why it is the default and the baseline you measure against.
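The documented rules are enough to sketch a rule scorer. This is not the package's RuleScorer: rule_score is a hypothetical function working on plain dicts, and several of its numbers are assumptions. The 25 points per matched include term (capped at 100) is consistent with every score in the demo output, the 50-point penalty for a missing required field is invented for illustration, and keeping a lead whose score equals the threshold is also assumed.

```python
def rule_score(lead: dict, criteria: dict, points_per_hit: int = 25,
               required_penalty: int = 50) -> dict:
    """Score one lead against criteria given as a plain dict (the JSON shape above).

    points_per_hit and required_penalty are hypothetical tuning knobs.
    """
    fields = criteria.get("text_fields", ["title", "snippet"])
    text = " ".join(str(lead.get(f, "")) for f in fields).lower()
    threshold = criteria.get("threshold", 50)

    # The toothpaste guard: any exclude hit is a hard drop, whatever else matched.
    excluded = [t for t in criteria.get("exclude_keywords", []) if t.lower() in text]
    if excluded:
        return {"score": 0, "keep": False, "matched": excluded,
                "reason": f"Hard drop: matched excluded term(s): {', '.join(excluded)}."}

    # Case-insensitive substring matching, so "dentist" also hits "Dentistry".
    matched = [t for t in criteria.get("include_keywords", []) if t.lower() in text]
    score = min(100, len(matched) * points_per_hit)
    missing = [f for f in criteria.get("required_fields", []) if not lead.get(f)]
    if missing:
        score = max(0, score - required_penalty)
    keep = score >= threshold  # assumption: a score equal to the threshold is kept

    head = f"matched {', '.join(matched)}" if matched else "no include terms matched"
    if missing:
        head += f"; missing required field(s): {', '.join(missing)}"
    reason = f"{head}; score {score} vs threshold {threshold} -> {'kept' if keep else 'dropped'}."
    return {"score": score, "keep": keep, "matched": matched, "reason": reason}
```

Because the function has no randomness and no I/O, the same lead and criteria always produce the same verdict, which is what makes a rule scorer a stable baseline.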


Plug in an LLM (no SDK required)

The LLM is just a function from prompt to text. Adapt whatever client you already use to that signature and inject it; the package never imports a provider library, so it adds nothing to your dependency tree:

from lead_qualifier import LLMScorer, load_criteria, qualify

def my_llm(prompt: str) -> str:
    # `client` is a placeholder for whatever provider SDK object you already use;
    # call it here and return its raw text response
    return client.complete(prompt)

scorer = LLMScorer(my_llm)
result = qualify(leads, load_criteria("criteria.json"), scorer)  # leads as in the library example

LLMScorer builds the prompt, parses a strict JSON verdict out of the response (even when the model wraps it in prose or a code fence), clamps the score to 0-100, and degrades gracefully: if the call raises or the output cannot be parsed, it returns a neutral below-threshold verdict whose reason records the fallback instead of crashing the batch.
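The parse-and-fallback step can be sketched with the standard library's json.JSONDecoder.raw_decode, which parses one complete JSON value from the start of a string and ignores whatever follows, so braces inside JSON strings are handled by the real parser rather than by brace counting. extract_json_object and llm_verdict are hypothetical names; deriving keep from the clamped score and using 0 as the fallback score are assumptions.

```python
import json
from typing import Callable, Optional

_DECODER = json.JSONDecoder()


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object found in text (bare, in prose, or in a code fence)."""
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # Parse one complete value starting at this brace; trailing text is ignored.
            obj, _ = _DECODER.raw_decode(text[start:])
        except json.JSONDecodeError:
            continue  # not valid JSON from here, try the next brace
        if isinstance(obj, dict):
            return obj
    return None


def llm_verdict(llm: Callable[[str], str], prompt: str, threshold: int = 50,
                fallback_score: int = 0) -> dict:
    """Turn an injected LLM's reply into a verdict dict without ever raising."""
    try:
        data = extract_json_object(llm(prompt))
        if data is None:
            raise ValueError("no JSON object in response")
        score = max(0, min(100, int(data["score"])))  # clamp runaway scores
        return {"score": score, "keep": score >= threshold,
                "reason": str(data.get("reason", ""))}
    except Exception as exc:  # deliberately broad: one bad reply must not sink a batch
        return {"score": fallback_score, "keep": False,
                "reason": f"fallback: {type(exc).__name__}: {exc}"}
```

A fake callable that returns a canned string is enough to exercise both paths offline, with no provider SDK involved.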


Measure the qualifier

This is the part that makes the tool trustworthy. Pass evaluate() a labeled set (each lead paired with its gold keep/drop boolean) and it returns the full confusion matrix plus four metrics: precision, recall, f1, and accuracy:

from lead_qualifier import RuleScorer, evaluate, load_criteria

labeled = [
    ({"title": "Bright Smile - Veneers", "url": "https://a.example"}, True),
    ({"title": "WhiteGlow Veneers Toothpaste", "url": "https://b.example"}, False),
]
report = evaluate(RuleScorer(), labeled, load_criteria("criteria.json"))
print(report.to_dict())
# {'tp': 1, 'fp': 0, 'tn': 1, 'fn': 0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'accuracy': 1.0}

Now you can compare a rule scorer against an LLM scorer, or tune a threshold, with numbers instead of a hunch. A "positive" is a lead the scorer keeps. Precision answers "of the leads we kept, how many were real?"; recall answers "of the real leads, how many did we keep?"
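To tune a threshold, one approach is to score the labeled set once and sweep candidate thresholds. The sweep function below is a hypothetical helper, not part of the package's documented API; it assumes scores are already computed and that a lead is kept when its score is at least the threshold. The toy scores are made up purely to show the trade-off.

```python
from typing import Iterable


def sweep(scored: list[tuple[int, bool]],
          thresholds: Iterable[int]) -> list[tuple[int, float, float]]:
    """For each threshold, return (threshold, precision, recall) over (score, gold_keep) pairs."""
    rows = []
    for t in thresholds:
        tp = sum(1 for s, gold in scored if s >= t and gold)
        fp = sum(1 for s, gold in scored if s >= t and not gold)
        fn = sum(1 for s, gold in scored if s < t and gold)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        rows.append((t, precision, recall))
    return rows


# Toy (score, gold) pairs, made up for illustration only.
scored = [(100, True), (75, True), (50, False), (25, True), (25, False)]
for t, p, r in sweep(scored, [25, 50, 75]):
    print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")
# threshold 25: precision 0.60, recall 1.00
# threshold 50: precision 0.67, recall 0.67
# threshold 75: precision 1.00, recall 0.67
```

Raising the threshold trades recall for precision; the labeled set shows where that trade stops paying off.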


Schedule it

cron

0 7 * * * cd /app && lead-qualifier qualify --source json:/data/scrape.json \
  --criteria /data/criteria.json --out /data/kept.json >> /var/log/qualifier.log 2>&1

Docker

docker build -t lead-qualifier .
docker run --rm -v "$PWD/data:/data" lead-qualifier qualify \
  --source json:/data/scrape.json --criteria /data/criteria.json --out /data/kept.json

Design notes

  • Two scorers, one interface. Anything with score(lead, criteria) -> Verdict is a Scorer. The deterministic RuleScorer is the baseline; the LLMScorer is the upgrade. You can measure both against the same labeled set with the same harness.
  • The toothpaste guard. Any exclude-keyword hit is a hard drop (keep=False, score=0), regardless of how many include words also matched. This is the exact case a naive if "veneers" in text filter gets wrong.
  • Scores are clamped. Every score is forced into 0-100 on construction, so an LLM returning score: 250 can never leak a runaway value downstream.
  • Graceful LLM fallback. A failed call or unparseable response yields a neutral, below-threshold verdict, not an exception, so one bad response does not sink a whole batch.
  • Order is preserved. qualify keeps input order within both the kept and dropped lists, and the verdict list is aligned one-to-one with the input.
  • No hidden dependencies. The LLM is injected, never imported, so the package installs with zero runtime dependencies and works fully offline in the demo and tests.
  • Measure, do not assume. The eval harness is a first-class part of the package, not an afterthought, because a qualifier you cannot measure is a qualifier you cannot trust.
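The "one interface" and order-preservation notes can be sketched with typing.Protocol, which checks structure rather than inheritance. Scorer, Result, qualify_sketch, and the toy LengthScorer here are hypothetical stand-ins that use plain dicts for verdicts; they illustrate the idea, not the package's code.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Scorer(Protocol):
    """Structural interface: any object with this method counts as a scorer."""

    def score(self, lead: dict, criteria: dict) -> dict: ...


@dataclass
class Result:
    kept: list[dict] = field(default_factory=list)
    dropped: list[dict] = field(default_factory=list)
    verdicts: list[dict] = field(default_factory=list)

    def summary(self) -> dict[str, int]:
        return {"total": len(self.verdicts), "kept": len(self.kept), "dropped": len(self.dropped)}


def qualify_sketch(leads: list[dict], criteria: dict, scorer: Scorer) -> Result:
    """Single pass over the input, so input order survives in every list."""
    result = Result()
    for lead in leads:
        verdict = scorer.score(lead, criteria)
        result.verdicts.append(verdict)  # verdicts stay aligned one-to-one with leads
        (result.kept if verdict["keep"] else result.dropped).append(lead)
    return result


class LengthScorer:
    """Toy scorer that never subclasses Scorer: keeps leads with long titles."""

    def score(self, lead: dict, criteria: dict) -> dict:
        keep = len(lead.get("title", "")) >= criteria.get("min_len", 5)
        return {"score": 100 if keep else 0, "keep": keep}
```

Swapping LengthScorer for a rule-based or LLM-backed object changes nothing else in the pipeline, which is the point of keeping the interface to a single method.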

Testing

make test     # pytest
make lint     # ruff

The suite covers the rule scorer (include hits, the exclude hard-drop, the threshold boundary, the missing-required-field penalty, the matched list, determinism), prompt parsing (clean JSON, JSON embedded in prose or a code fence, braces inside strings, graceful fallback), the LLM scorer with a fake injected callable, the pipeline (kept/dropped split, order preservation, summary counts), the eval harness (hand-built confusion matrices with exact precision/recall/f1/accuracy), criteria loading, and the CLI.


License

MIT. See LICENSE.

Built by Vinicius Pereira (github.com/vinimabreu)
