WinLossAnalyzer


NLP-driven win/loss analysis for B2B pharmaceutical and CDMO sales teams. Surfaces the real reasons deals are won or lost — from CRM notes your team never had time to read.


The Business Problem

In B2B pharmaceutical services and contract development and manufacturing (CDMO) sales, each deal represents months of technical proposals, executive meetings, scientific feasibility studies, and regulatory reviews. A typical gene therapy manufacturing contract takes 6–18 months to close and involves 8–15 stakeholders across manufacturing, regulatory affairs, quality, and executive leadership.

When a deal closes — won or lost — the account executive records their debriefing notes in the CRM. These notes contain the most valuable intelligence a commercial team can collect: the real voice-of-customer reasons why a sponsor chose or rejected your organisation. "Lost on price," "regulatory track record was the differentiator," "incumbent relationship with Lonza made it impossible to displace" — this is the raw data of commercial strategy.

The problem is that this intelligence never gets synthesised. Notes sit unread in Salesforce or HubSpot. Commercial leaders rely on gut feel and anecdote to understand win and loss patterns. The same strategic mistakes repeat quarter after quarter. Sales reps walk into competitive situations without knowing which competitors they are likely to face, or which capabilities to lead with in a proposal.

WinLossAnalyzer solves this by running Natural Language Processing on your closed-deal notes at scale. It extracts themes, scores them against outcomes, surfaces the drivers of winning and losing, tracks competitor mentions across your pipeline history, and delivers actionable intelligence through a clean web dashboard — all without requiring a data science team or expensive ML infrastructure.

For CGT CDMOs, the stakes are particularly high. Programmes are large ($2M–$15M), sales cycles are long, and a handful of theme-level insights ("our regulatory CMC support team is a differentiator in 78% of won deals") can meaningfully shift commercial strategy, proposal positioning, and capability investment priorities.


What This Program Does

WinLossAnalyzer provides four integrated analytical capabilities:

1. NLP Engine — Processes free-text closed-deal notes using TF-IDF keyword extraction, domain-specific theme scoring across 12 CGT/CDMO themes, and rule-based sentiment analysis. No external ML libraries required — pure Python.

2. Win/Loss Driver Analysis — Identifies which themes correlate with won versus lost deals across your deal corpus. Surfaces the top win drivers (regulatory track record, relationship, technical capability) and top loss drivers (pricing, capacity constraints, incumbent relationships) with win rate, frequency, and example deals for each.

3. Competitive Intelligence — Automatically extracts competitor mentions (Lonza, WuXi ATU, Catalent, Samsung Biologics, Charles River, Oxford Biomedica, Brammer Bio) from deal notes, computes your win rate against each named competitor, and tracks which therapy areas they compete in.

4. CRM Connector Layer — Pluggable ABC-based connector architecture supports Mock (30 realistic CGT deals included), REST API (Salesforce/HubSpot-style), and CSV file import. Add your own connector in minutes.


Architecture

WinLossAnalyzer/
│
├── connectors/              # CRM connector layer
│   ├── base.py              # CRMConnector ABC
│   └── crm/
│       ├── mock_connector.py    # 30 CGT/CDMO deals, 18 won / 12 lost
│       ├── api_connector.py     # Salesforce / HubSpot REST stub
│       └── file_connector.py    # CSV import connector
│
├── engine/                  # Analysis engines
│   ├── nlp_processor.py     # TF-IDF, theme extraction, sentiment (pure Python)
│   ├── driver_analyzer.py   # Win/loss driver identification
│   └── competitive_analyzer.py  # Competitor mention extraction + win rates
│
├── winloss/                 # Flask application
│   ├── app.py               # Factory: browser routes + REST API
│   ├── model.py             # SQLite persistence + orchestration
│   ├── seed.py              # Seeds from MockCRMConnector
│   └── templates/           # Jinja2 HTML (Chart.js, no external CSS)
│       ├── base.html
│       ├── dashboard.html
│       ├── deals.html
│       ├── deal_detail.html
│       ├── drivers.html
│       └── competitive.html
│
├── tests/                   # 110 tests
│   ├── test_nlp_processor.py
│   ├── test_driver_analyzer.py
│   ├── test_competitive_analyzer.py
│   ├── test_connectors.py
│   └── test_model.py
│
├── run.py                   # Entrypoint (port 5073)
├── requirements.txt         # Flask + pytest only
└── LICENSE                  # MIT

Data flow:

CRM Notes
    │
    ▼
CRMConnector (Mock / API / File)
    │  fetch_deals()
    ▼
NLPProcessor.analyze_corpus()
    │  TF-IDF + theme scoring + sentiment
    ▼
DriverAnalyzer.top_drivers()          CompetitiveAnalyzer.extract_mentions()
    │  win/loss driver ranking              │  competitor entity extraction
    ▼                                       ▼
SQLite (deals, deal_nlp,            competitive_mentions, drivers)
    │
    ▼
Flask Dashboard + REST API

NLP Engine

The NLP engine is implemented in pure Python — no NLTK, spaCy, scikit-learn, or any external ML library. This means it deploys anywhere Python 3.11+ runs, with no GPU, no model download, and no dependency conflicts.

TF-IDF Keyword Extraction

Term Frequency–Inverse Document Frequency (TF-IDF) is computed across the entire deal corpus. For each deal note:

  • TF (Term Frequency): how often a term appears in this deal's notes, normalised by total word count
  • IDF (Inverse Document Frequency): log((N+1) / (df(t)+1)) + 1, where N is the corpus size and df(t) is how many deals contain the term
  • TF-IDF score: TF × IDF — terms that are distinctive to a specific deal score highest

This surfaces the vocabulary that makes each deal unique relative to the rest of your pipeline — the specific capability, objection, or competitor that defined that deal.
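As a rough illustration of the formula above, the following sketch computes TF-IDF scores using only the standard library. The tokenizer, the function names, and the three sample notes are assumptions made for this example; they are not taken from engine/nlp_processor.py.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic runs only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(corpus: list[str]) -> list[dict[str, float]]:
    """Return one {term: score} dict per document in the corpus."""
    docs = [tokenize(text) for text in corpus]
    n_docs = len(docs)

    # df(t): number of documents that contain term t at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty notes
        scores.append({
            # TF x IDF, with IDF = log((N + 1) / (df(t) + 1)) + 1
            term: (count / total) * (math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1)
            for term, count in counts.items()
        })
    return scores


# Hypothetical deal notes, for illustration only.
notes = [
    "lost on price to lonza",
    "won on regulatory track record",
    "lost on capacity",
]
scores = tf_idf(notes)
for term, score in sorted(scores[1].items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:12s} {score:.3f}")
```

In this toy corpus the word "on" appears in every note, so its IDF is exactly 1 and it ranks below terms such as "regulatory" that appear in only one note.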

Domain Theme Scoring

Twelve CGT/CDMO domain themes are defined, each with a keyword lexicon:

| Theme | Type | Example Keywords |
|---|---|---|
| regulatory_track_record | Win | regulatory, GMP, IND, BLA, CMC, compliance, MSAT, audit |
| technical_capability | Win | process, analytical, platform, manufacturing, QC, titer, yield |
| relationship | Win | relationship, partnership, trust, existing, referral, prior |
| timeline_speed | Win | timeline, fast, rapid, schedule, flexibility, weeks, expedited |
| capacity_availability | Win | capacity, suite, cleanroom, bioreactor, slot, availability |
| quality_systems | Win | quality, deviation, CAPA, batch record, validation, SOP |
| scientific_support | Win | scientific, advisory, optimization, yield, troubleshoot |
| pricing | Loss | price, cost, budget, expensive, rate, fees, cheaper |
| capacity_constraints | Loss | no capacity, fully booked, no availability, waitlist, constrained |
| incumbent_relationship | Loss | incumbent, existing contract, long-term agreement, switching cost |
| capability_gap | Loss | lack of experience, not validated, no platform, unproven, gap |
| competitor_won | Loss | Lonza, WuXi, Catalent, Samsung, Charles River, awarded to |

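For each theme, a confidence score is computed as min(1.0, keyword_hits / threshold), where keyword_hits is the number of lexicon matches found in the deal's notes and threshold is the hit count at which confidence saturates at 1.0. Themes are ranked by confidence and stored per deal.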
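The scoring rule can be sketched as follows. This is a minimal example under stated assumptions: the lexicon is abbreviated to two themes, the threshold of 3 is a placeholder value, and matching on word boundaries is one possible way to count hits. None of these details are confirmed by the project source.

```python
import re

# Abbreviated lexicon: two of the twelve themes, keywords from the theme table.
THEME_LEXICON = {
    "pricing": ["price", "cost", "budget", "expensive", "rate", "fees", "cheaper"],
    "capacity_constraints": [
        "no capacity", "fully booked", "no availability", "waitlist", "constrained",
    ],
}


def score_themes(note: str, threshold: int = 3) -> list[tuple[str, float]]:
    """Return (theme, confidence) pairs, highest confidence first.

    threshold is a hypothetical default, not the project's actual setting.
    """
    text = note.lower()
    scored = []
    for theme, keywords in THEME_LEXICON.items():
        # Count whole-word (or whole-phrase) occurrences of every keyword.
        hits = sum(
            len(re.findall(rf"\b{re.escape(keyword)}\b", text))
            for keyword in keywords
        )
        if hits:
            scored.append((theme, min(1.0, hits / threshold)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = score_themes(
    "Their price was over budget and the cost gap was large. "
    "Suites were fully booked."
)
print(ranked)
```

Here the note contains three pricing keywords, which saturates that theme at 1.0, and one capacity phrase, which scores one third.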

Sentiment Analysis

A rule-based lexicon approach assigns each deal note a sentiment label:

  • Positive: presence of positive-outcome words (outstanding, reliable, awarded, preferred, satisfied)
  • Negative: presence of negative-outcome words (concern, risk, gap, lost, costly, constraint)
  • Mixed: both positive and negative signals present
  • Neutral: neither signal detected

A sentiment score in the range [-1.0, 1.0] is computed as (positive_count - negative_count) / total_sentiment_words.
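A small sketch of the labelling and scoring rules above. It assumes that total_sentiment_words means the sum of positive and negative hits and that a note with no hits scores 0.0; the word lists are the examples quoted above, not the full lexicon.

```python
import re

# Example words from the list above; the real lexicon is presumably larger.
POSITIVE = {"outstanding", "reliable", "awarded", "preferred", "satisfied"}
NEGATIVE = {"concern", "risk", "gap", "lost", "costly", "constraint"}


def sentiment(note: str) -> tuple[str, float]:
    """Return (label, score) for a deal note."""
    tokens = re.findall(r"[a-z]+", note.lower())
    positive_count = sum(token in POSITIVE for token in tokens)
    negative_count = sum(token in NEGATIVE for token in tokens)
    total = positive_count + negative_count  # assumed denominator

    if total == 0:
        return "neutral", 0.0  # assumed score when no signal is present

    score = (positive_count - negative_count) / total
    if positive_count and negative_count:
        label = "mixed"
    elif positive_count:
        label = "positive"
    else:
        label = "negative"
    return label, score


print(sentiment("Reliable partner, but pricing was a concern and a risk."))
print(sentiment("Awarded after the audit; sponsor satisfied."))
print(sentiment("Kick-off meeting scheduled."))
```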


Win/Loss Driver Analysis

The DriverAnalyzer correlates NLP theme presence with deal outcomes across the corpus. For each theme:

  1. Count how many won deals and how many lost deals mention the theme
  2. Compute win_rate = won_count / total_count
  3. Classify as a win driver if won_count >= lost_count and won_count > 0
  4. Classify as a loss driver if lost_count >= won_count and lost_count > 0
  5. Extract the top TF-IDF keywords from deals where this theme appeared

The result is a ranked list of drivers with frequency (how many deals), win rate, and example deal IDs for qualitative review.
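The counting and classification steps can be sketched like this. The input shape (a list of dicts with deal_id, outcome, and themes), the function name, and the cap of three example deals are assumptions for illustration. Note that, following the rules as written, a theme with equal won and lost counts satisfies both conditions and is tagged as both a win and a loss driver.

```python
from collections import defaultdict


def rank_drivers(deals: list[dict]) -> list[dict]:
    """Aggregate theme counts by outcome and rank themes by frequency."""
    counts = defaultdict(lambda: {"won": 0, "lost": 0, "examples": []})
    for deal in deals:
        # dict.fromkeys removes duplicate themes while keeping their order.
        for theme in dict.fromkeys(deal["themes"]):
            entry = counts[theme]
            entry[deal["outcome"]] += 1
            entry["examples"].append(deal["deal_id"])

    drivers = []
    for theme, entry in counts.items():
        won, lost = entry["won"], entry["lost"]
        types = []
        if won >= lost and won > 0:
            types.append("win")
        if lost >= won and lost > 0:
            types.append("loss")
        drivers.append({
            "theme": theme,
            "types": types,
            "frequency": won + lost,
            "win_rate": won / (won + lost),
            "examples": entry["examples"][:3],  # hypothetical cap
        })
    return sorted(drivers, key=lambda d: d["frequency"], reverse=True)


# Hypothetical mini-corpus.
deals = [
    {"deal_id": "D1", "outcome": "won", "themes": ["regulatory_track_record", "pricing"]},
    {"deal_id": "D2", "outcome": "lost", "themes": ["pricing"]},
    {"deal_id": "D3", "outcome": "lost", "themes": ["pricing", "capacity_constraints"]},
    {"deal_id": "D4", "outcome": "won", "themes": ["regulatory_track_record"]},
]
drivers = rank_drivers(deals)
for driver in drivers:
    print(driver["theme"], driver["types"], driver["frequency"], round(driver["win_rate"], 2))
```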

Typical output for a CGT CDMO corpus:

Win drivers (by frequency): Technical Capability → Regulatory Track Record → Quality Systems → Relationship → Timeline Speed

Loss drivers (by frequency): Pricing → Competitor Won → Capacity Constraints → Incumbent Relationship → Capability Gap


Competitive Intelligence

The CompetitiveAnalyzer performs named entity extraction for a predefined list of known competitors:

Lonza, WuXi ATU, Catalent, Samsung Biologics, Oxford Biomedica,
Brammer Bio, Charles River, Cellero, PCT, Thermo Fisher

For each deal note containing a competitor mention:

  • The relevant sentence is extracted as context
  • The deal outcome (won/lost) is recorded
  • won_against = True if the deal was won despite the competitor being mentioned

Mentions are aggregated into CompetitorProfile objects with the following fields:

  • total_mentions: how often this competitor appears in your pipeline
  • won_against: deals where you prevailed despite their presence
  • lost_to: deals awarded to this competitor
  • win_rate_against: your win rate in competitive situations involving this firm
  • therapy_areas: which modalities they compete in
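The aggregation might look roughly like the sketch below. It is not the project's CompetitiveAnalyzer: the dataclass layout, the contexts field, the sentence splitter, and the decision to count every lost deal that mentions a competitor as lost_to are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field

# Abbreviated competitor list for the example.
COMPETITORS = ["Lonza", "WuXi ATU", "Catalent"]


@dataclass
class CompetitorProfile:
    name: str
    total_mentions: int = 0
    won_against: int = 0
    lost_to: int = 0
    therapy_areas: set[str] = field(default_factory=set)
    contexts: list[str] = field(default_factory=list)  # hypothetical field

    @property
    def win_rate_against(self) -> float:
        decided = self.won_against + self.lost_to
        return self.won_against / decided if decided else 0.0


def build_profiles(deals: list[dict]) -> dict[str, CompetitorProfile]:
    """Count one mention per competitor per deal and aggregate by name."""
    profiles: dict[str, CompetitorProfile] = {}
    for deal in deals:
        sentences = re.split(r"(?<=[.!?])\s+", deal["notes_text"])
        for name in COMPETITORS:
            # First sentence that names the competitor serves as context.
            context = next((s for s in sentences if name.lower() in s.lower()), None)
            if context is None:
                continue
            profile = profiles.setdefault(name, CompetitorProfile(name))
            profile.total_mentions += 1
            profile.contexts.append(context)
            profile.therapy_areas.add(deal["therapy_area"])
            if deal["outcome"] == "won":
                profile.won_against += 1
            else:
                profile.lost_to += 1
    return profiles


# Hypothetical deals.
deals = [
    {"outcome": "won", "therapy_area": "LVV",
     "notes_text": "Sponsor also evaluated Lonza. Our CMC support won it."},
    {"outcome": "lost", "therapy_area": "CAR-T",
     "notes_text": "Awarded to Lonza on price."},
    {"outcome": "lost", "therapy_area": "AAV Gene Therapy",
     "notes_text": "WuXi ATU had an earlier slot."},
]
profiles = build_profiles(deals)
print(profiles["Lonza"].win_rate_against, sorted(profiles["Lonza"].therapy_areas))
```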

CRM Connectors

All connectors implement the CRMConnector ABC from connectors/base.py:

from abc import ABC

class CRMConnector(ABC):
    # Interface summary; method bodies are elided here.
    def fetch_deals(self, outcome: str | None = None) -> list[dict]: ...
    def fetch_deal(self, deal_id: str) -> dict | None: ...
    def fetch_contacts(self, deal_id: str | None = None) -> list[dict]: ...
    def fetch_competitors(self) -> list[dict]: ...
    def health_check(self) -> dict: ...

Mock Connector

The mock connector ships 30 CGT/CDMO deals across 8 therapy areas (CAR-T, AAV Gene Therapy, LVV, NK Cell, TCR-T, Allogeneic Cell, Plasmid DNA, mRNA), handled by 4 sales reps, with deal values from $1.8M to $12M and close dates in 2024–2025. The notes are engineered to surface all 12 NLP themes and use realistic sponsor names from the CGT biotech landscape.

API Connector

Salesforce / HubSpot-compatible REST connector using Python's urllib.request (no external HTTP library). Supports bearer token authentication. Configure your CRM's base URL:

from connectors.crm.api_connector import APICRMConnector
conn = APICRMConnector("https://your-crm-api.com", api_key="your-token")

File/CSV Connector

Import deals from a CSV export. Required columns: deal_id, name, account, outcome, deal_value_usd, close_date, sales_rep, therapy_area, modality, notes_text.

from connectors.crm.file_connector import FileCRMConnector
conn = FileCRMConnector("exported_deals.csv")
deals = conn.fetch_deals()
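Before pointing the connector at an export, it can help to confirm that the required columns are present. The check below is a standalone sketch using the standard csv module; the helper name and the sample data are hypothetical, and FileCRMConnector may perform its own validation.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "deal_id", "name", "account", "outcome", "deal_value_usd",
    "close_date", "sales_rep", "therapy_area", "modality", "notes_text",
]


def missing_columns(csv_file) -> list[str]:
    """Return the required columns absent from the CSV header row."""
    reader = csv.DictReader(csv_file)
    present = set(reader.fieldnames or [])
    return [column for column in REQUIRED_COLUMNS if column not in present]


# In practice: with open("exported_deals.csv", newline="") as handle: ...
sample = io.StringIO(
    "deal_id,name,account,outcome,notes_text\n"
    "D-001,AAV programme,Acme Bio,won,Strong regulatory fit\n"
)
missing = missing_columns(sample)
print("Missing columns:", missing)
```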

Quickstart

# Clone
git clone https://github.com/timjm25/WinLossAnalyzer.git
cd WinLossAnalyzer

# Install (no heavy ML dependencies)
pip install -r requirements.txt

# Run (auto-seeds with 30 mock CGT deals on first launch)
python run.py

# Open browser (macOS; use xdg-open on Linux or start on Windows)
open http://127.0.0.1:5073

Run tests:

python3 -m pytest tests/ -v

REST API Reference

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/deals | List all deals (optional ?outcome=won or ?outcome=lost) |
| GET | /api/v1/deals/<deal_id> | Get single deal |
| POST | /api/v1/deals | Add a deal (JSON body) |
| GET | /api/v1/drivers | List drivers (optional ?type=win or ?type=loss) |
| GET | /api/v1/competitive | Competitor profiles + summary |
| GET | /api/v1/stats | Dashboard statistics |
| POST | /api/v1/run_analysis | Trigger full NLP re-analysis |
| GET | /api/v1/connectors/health | Connector health status |

Example:

curl http://127.0.0.1:5073/api/v1/stats
# {"total_deals":30,"won_deals":18,"lost_deals":12,"win_rate":0.6,
#  "top_win_driver":"Technical Capability","top_loss_driver":"Pricing",...}

curl "http://127.0.0.1:5073/api/v1/drivers?type=win"

curl -X POST http://127.0.0.1:5073/api/v1/run_analysis
# {"nlp_results":30,"drivers":12,"mentions":9,"status":"ok"}

Mock Dataset

The included mock dataset contains 30 CGT manufacturing deals across a realistic CDMO pipeline:

| Metric | Value |
|---|---|
| Total Deals | 30 |
| Won | 18 (60% win rate) |
| Lost | 12 |
| Therapy Areas | CAR-T, AAV Gene Therapy, LVV, NK Cell, TCR-T, Allogeneic Cell, Plasmid DNA |
| Deal Range | $1.8M – $12M |
| Date Range | March 2024 – June 2025 |
| Sales Reps | Sarah Chen, Marcus Webb, Priya Sharma, David Torres |

Competitor scenarios included:

  • Lonza (3 mentions — LVV commercial scale, BLA incumbent, allogeneic capacity)
  • WuXi ATU (2 mentions — AAV capacity/timeline)
  • Catalent (2 mentions — AAV incumbent MSA)
  • Samsung Biologics (1 mention — allogeneic pricing)
  • Charles River (1 mention — autologous CAR-T incumbent)
  • Brammer Bio (1 mention — LVV pricing comparison)

Win scenarios include: regulatory differentiation, long-term partnership renewal, scientific advisory as deal catalyst, technical feasibility-to-award pipeline, and a quality audit with zero observations.

Loss scenarios include: commercial-scale capacity gap, pricing differential exceeding budget ceiling, incumbent relationship lock-in, analytical capability gap, timeline constraint, regulatory compliance concern.


Extending the CRM Connector

Implement CRMConnector to connect to your live CRM system:

from connectors.base import CRMConnector

class SalesforceCRMConnector(CRMConnector):
    def __init__(self, instance_url: str, access_token: str):
        self.instance_url = instance_url
        self.access_token = access_token

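    def fetch_deals(self, outcome=None):
        # IsWon must be selected, otherwise _map_opportunity cannot read it.
        soql = "SELECT Id, Name, Account.Name, Amount, CloseDate, Description, IsWon FROM Opportunity WHERE IsClosed=true"
        if outcome == "won":
            soql += " AND IsWon=true"
        elif outcome == "lost":
            soql += " AND IsWon=false"
        # _soql_query is a helper you supply; it should run the query and
        # return the parsed JSON response.
        raw = self._soql_query(soql)
        return [self._map_opportunity(r) for r in raw["records"]]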

    def _map_opportunity(self, opp: dict) -> dict:
        return {
            "deal_id": opp["Id"],
            "crm_id": opp["Id"],
            "name": opp["Name"],
            "account": (opp.get("Account") or {}).get("Name", ""),  # Account may be null
            "outcome": "won" if opp.get("IsWon") else "lost",
            "deal_value_usd": int(opp.get("Amount", 0) or 0),
            "close_date": opp.get("CloseDate", ""),
            "notes_text": opp.get("Description", ""),
            "sales_rep": "",
            "therapy_area": "",
            "modality": "",
            "deal_stage_lost": None,
        }

    def fetch_deal(self, deal_id): ...
    def fetch_contacts(self, deal_id=None): ...
    def fetch_competitors(self): return []
    def health_check(self): return {"connector": "SalesforceCRMConnector", "status": "healthy"}
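The example above calls a _soql_query helper that it does not define. One way to write it, sketched here as standalone functions using urllib.request, is to send the query to the Salesforce REST API query resource with a bearer token. The function names and the API version string are assumptions; use a version your org supports. Pagination via nextRecordsUrl and error handling are left out.

```python
import json
import urllib.parse
import urllib.request


def soql_request(instance_url: str, access_token: str, soql: str,
                 api_version: str = "v59.0") -> urllib.request.Request:
    """Build (but do not send) a GET request for the REST query resource.

    api_version is an assumed default; adjust it for your org.
    """
    query = urllib.parse.urlencode({"q": soql})
    url = f"{instance_url.rstrip('/')}/services/data/{api_version}/query?{query}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


def soql_query(instance_url: str, access_token: str, soql: str) -> dict:
    """Send the query and return the parsed JSON response.

    Only the first page of results is returned in this sketch.
    """
    request = soql_request(instance_url, access_token, soql)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Building the request does not touch the network.
request = soql_request(
    "https://example.my.salesforce.com/", "tok", "SELECT Id FROM Opportunity"
)
print(request.full_url)
```

Inside the connector class, _soql_query could simply delegate to soql_query(self.instance_url, self.access_token, soql).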

Pages

| Route | Description |
|---|---|
| / | Dashboard: stats, win rate chart, top drivers, recent deals |
| /deals | Full deal table with won/lost filter |
| /deal/<id> | Deal detail: NLP themes, TF-IDF keywords, sentiment, competitor mentions |
| /drivers | Win/loss driver analysis with bar charts and theme win rate table |
| /competitive | Competitor landscape: win rates against named competitors, deal feed |

License

MIT License — see LICENSE.

Copyright (c) 2024 Tim Maguire.
