WinLossAnalyzer

NLP-driven win/loss analysis for B2B pharmaceutical and CDMO sales teams. Surfaces the real reasons deals are won or lost — from CRM notes your team never had time to read.

The Business Problem

In B2B pharmaceutical services and contract development and manufacturing organisation (CDMO) sales, each deal represents months of technical proposals, executive meetings, scientific feasibility studies, and regulatory reviews. A typical gene therapy manufacturing contract takes 6–18 months to close and involves 8–15 stakeholders across manufacturing, regulatory affairs, quality, and executive leadership.

When a deal closes — won or lost — the account executive records their debriefing notes in the CRM. These notes contain the most valuable intelligence a commercial team can collect: the real voice-of-customer reasons why a sponsor chose or rejected your organisation. "Lost on price," "regulatory track record was the differentiator," "incumbent relationship with Lonza made it impossible to displace" — this is the raw data of commercial strategy.

The problem is that this intelligence never gets synthesised. Notes sit unread in Salesforce or HubSpot. Commercial leaders rely on gut feel and anecdote to understand win and loss patterns. The same strategic mistakes repeat quarter after quarter. Sales reps walk into competitive situations without knowing which competitors they are likely to face, or which capabilities to lead with in a proposal.

WinLossAnalyzer solves this by running Natural Language Processing on your closed-deal notes at scale. It extracts themes, scores them against outcomes, surfaces the drivers of winning and losing, tracks competitor mentions across your pipeline history, and delivers actionable intelligence through a clean web dashboard — all without requiring a data science team or expensive ML infrastructure.

For cell and gene therapy (CGT) CDMOs, the stakes are particularly high. Programmes are large ($2M–$15M), sales cycles are long, and a handful of theme-level insights (for example, "our regulatory CMC support team is a differentiator in 78% of won deals") can meaningfully shift commercial strategy, proposal positioning, and capability investment priorities.

What This Program Does

WinLossAnalyzer provides four integrated analytical capabilities:

1. NLP Engine — Processes free-text closed-deal notes using TF-IDF keyword extraction, domain-specific theme scoring across 12 CGT/CDMO themes, and rule-based sentiment analysis. No external ML libraries required — pure Python.

2. Win/Loss Driver Analysis — Identifies which themes correlate with won versus lost deals across your deal corpus. Surfaces the top win drivers (regulatory track record, relationship, technical capability) and top loss drivers (pricing, capacity constraints, incumbent relationships) with win rate, frequency, and example deals for each.

3. Competitive Intelligence — Automatically extracts competitor mentions (Lonza, WuXi ATU, Catalent, Samsung Biologics, Charles River, Oxford Biomedica, Brammer Bio) from deal notes, computes your win rate against each named competitor, and tracks which therapy areas they compete in.

4. CRM Connector Layer — Pluggable ABC-based connector architecture supports Mock (30 realistic CGT deals included), REST API (Salesforce/HubSpot-style), and CSV file import. Add your own connector in minutes.

Architecture

WinLossAnalyzer/
│
├── connectors/              # CRM connector layer
│   ├── base.py              # CRMConnector ABC
│   └── crm/
│       ├── mock_connector.py    # 30 CGT/CDMO deals, 18 won / 12 lost
│       ├── api_connector.py     # Salesforce / HubSpot REST stub
│       └── file_connector.py    # CSV import connector
│
├── engine/                  # Analysis engines
│   ├── nlp_processor.py     # TF-IDF, theme extraction, sentiment (pure Python)
│   ├── driver_analyzer.py   # Win/loss driver identification
│   └── competitive_analyzer.py  # Competitor mention extraction + win rates
│
├── winloss/                 # Flask application
│   ├── app.py               # Factory: browser routes + REST API
│   ├── model.py             # SQLite persistence + orchestration
│   ├── seed.py              # Seeds from MockCRMConnector
│   └── templates/           # Jinja2 HTML (Chart.js, no external CSS)
│       ├── base.html
│       ├── dashboard.html
│       ├── deals.html
│       ├── deal_detail.html
│       ├── drivers.html
│       └── competitive.html
│
├── tests/                   # 110 tests
│   ├── test_nlp_processor.py
│   ├── test_driver_analyzer.py
│   ├── test_competitive_analyzer.py
│   ├── test_connectors.py
│   └── test_model.py
│
├── run.py                   # Entrypoint (port 5073)
├── requirements.txt         # Flask + pytest only
└── LICENSE                  # MIT

Data flow:

CRM Notes
    │
    ▼
CRMConnector (Mock / API / File)
    │  fetch_deals()
    ▼
NLPProcessor.analyze_corpus()
    │  TF-IDF + theme scoring + sentiment
    ▼
DriverAnalyzer.top_drivers()          CompetitiveAnalyzer.extract_mentions()
    │  win/loss driver ranking              │  competitor entity extraction
    ▼                                       ▼
SQLite (deals, deal_nlp,            competitive_mentions, drivers)
    │
    ▼
Flask Dashboard + REST API

NLP Engine

The NLP engine is implemented in pure Python — no NLTK, spaCy, scikit-learn, or any external ML library. This means it deploys anywhere Python 3.11+ runs, with no GPU, no model download, and no dependency conflicts.

TF-IDF Keyword Extraction

Term Frequency–Inverse Document Frequency (TF-IDF) is computed across the entire deal corpus. For each deal note:

TF (Term Frequency): how often a term appears in this deal's notes, normalised by total word count
IDF (Inverse Document Frequency): log((N+1) / (df(t)+1)) + 1, where N is the corpus size and df(t) is how many deals contain the term
TF-IDF score: TF × IDF — terms that are distinctive to a specific deal score highest

This surfaces the vocabulary that makes each deal unique relative to the rest of your pipeline — the specific capability, objection, or competitor that defined that deal.
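
The formulas above can be sketched in a few lines of standard-library Python. This is an illustration only, not the code in engine/nlp_processor.py: the tokenizer, the function names (tokenize, tfidf, top_keywords), and the sample notes are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(corpus: list[str]) -> list[dict[str, float]]:
    """Return one {term: score} dict per note, using the smoothed IDF given above."""
    docs = [tokenize(text) for text in corpus]
    n_docs = len(docs)
    # df(t): number of notes that contain term t at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty notes
        scores.append({
            term: (count / total)
            * (math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1)
            for term, count in counts.items()
        })
    return scores


def top_keywords(score_map: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-scoring terms for one note."""
    return sorted(score_map, key=score_map.get, reverse=True)[:k]


# Hypothetical notes, not taken from the mock dataset
notes = [
    "lost on price to lonza",
    "regulatory track record won the deal",
    "price was the main concern",
]
scores = tfidf(notes)
print(top_keywords(scores[0]))
```

In this toy corpus "price" appears in two of the three notes, so it scores lower in the first note than "lonza", which appears only there.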

Domain Theme Scoring

Twelve CGT/CDMO domain themes are defined, each with a keyword lexicon:

| Theme | Type | Example Keywords |
| --- | --- | --- |
| `regulatory_track_record` | Win | regulatory, GMP, IND, BLA, CMC, compliance, MSAT, audit |
| `technical_capability` | Win | process, analytical, platform, manufacturing, QC, titer, yield |
| `relationship` | Win | relationship, partnership, trust, existing, referral, prior |
| `timeline_speed` | Win | timeline, fast, rapid, schedule, flexibility, weeks, expedited |
| `capacity_availability` | Win | capacity, suite, cleanroom, bioreactor, slot, availability |
| `quality_systems` | Win | quality, deviation, CAPA, batch record, validation, SOP |
| `scientific_support` | Win | scientific, advisory, optimization, yield, troubleshoot |
| `pricing` | Loss | price, cost, budget, expensive, rate, fees, cheaper |
| `capacity_constraints` | Loss | no capacity, fully booked, no availability, waitlist, constrained |
| `incumbent_relationship` | Loss | incumbent, existing contract, long-term agreement, switching cost |
| `capability_gap` | Loss | lack of experience, not validated, no platform, unproven, gap |
| `competitor_won` | Loss | Lonza, WuXi, Catalent, Samsung, Charles River, awarded to |

For each theme, a confidence score is computed as min(1.0, keyword_hits / threshold). Themes are ranked by confidence and stored per deal.
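
As a rough sketch of the scoring rule above, the snippet below counts whole-word keyword hits per theme and caps the ratio at 1.0. The trimmed lexicon, the threshold of 3, whole-word matching, and the names count_hits and score_themes are assumptions for illustration; the project's actual lexicons and threshold may differ.

```python
import re

# Trimmed, illustrative lexicon; the real engine defines 12 themes
THEME_LEXICON = {
    "pricing": ["price", "cost", "budget", "expensive"],
    "regulatory_track_record": ["regulatory", "gmp", "ind", "bla", "cmc"],
    "capacity_constraints": ["no capacity", "fully booked", "waitlist"],
}


def count_hits(text: str, keyword: str) -> int:
    """Count whole-word (or whole-phrase) occurrences of keyword in text."""
    return len(re.findall(r"\b" + re.escape(keyword) + r"\b", text))


def score_themes(note: str, threshold: int = 3) -> list[tuple[str, float]]:
    """Return (theme, confidence) pairs, highest confidence first."""
    text = note.lower()
    results = []
    for theme, keywords in THEME_LEXICON.items():
        hits = sum(count_hits(text, kw) for kw in keywords)
        if hits:
            # confidence = min(1.0, keyword_hits / threshold)
            results.append((theme, min(1.0, hits / threshold)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical note, not taken from the mock dataset
note = (
    "Sponsor said our price exceeded their budget and cost was the "
    "deciding factor. GMP audit history was not an issue."
)
themes = score_themes(note)
print(themes)
```

Whole-word matching is used here so that a short keyword such as "ind" does not fire on words like "industry"; plain substring matching would be simpler but noisier.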

Sentiment Analysis

A rule-based lexicon approach assigns each deal note a sentiment label:

Positive: presence of positive-outcome words (outstanding, reliable, awarded, preferred, satisfied)
Negative: presence of negative-outcome words (concern, risk, gap, lost, costly, constraint)
Mixed: both positive and negative signals present
Neutral: neither signal detected

A sentiment score on [-1.0, 1.0] is computed as (positive_count - negative_count) / total_sentiment_words.
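
A minimal sketch of the labelling and scoring rules follows. It assumes that total_sentiment_words means the number of positive plus negative lexicon hits, and it uses only the example words listed above; the function name sentiment is hypothetical.

```python
import re

# Example words from the lists above; the real lexicons are presumably larger
POSITIVE = {"outstanding", "reliable", "awarded", "preferred", "satisfied"}
NEGATIVE = {"concern", "risk", "gap", "lost", "costly", "constraint"}


def sentiment(note: str) -> tuple[str, float]:
    """Return (label, score) with score in [-1.0, 1.0]."""
    words = re.findall(r"[a-z]+", note.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg  # assumed meaning of total_sentiment_words
    if total == 0:
        return "neutral", 0.0
    score = (pos - neg) / total
    if pos and neg:
        label = "mixed"
    elif pos:
        label = "positive"
    else:
        label = "negative"
    return label, score


print(sentiment("Reliable partner, awarded the contract."))
print(sentiment("Lost on price; the capacity gap was a concern, but the team was reliable."))
print(sentiment("Contract signed in March."))
```

The second note contains three negative words and one positive word, so it is labelled mixed with a score of (1 - 3) / 4 = -0.5.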

Win/Loss Driver Analysis

The DriverAnalyzer correlates NLP theme presence with deal outcomes across the corpus. For each theme:

Count how many won deals and how many lost deals mention the theme
Compute win_rate = won_count / total_count
Classify as a win driver if won_count >= lost_count and won_count > 0
Classify as a loss driver if lost_count >= won_count and lost_count > 0
Extract the top TF-IDF keywords from deals where this theme appeared

The result is a ranked list of drivers with frequency (how many deals), win rate, and example deal IDs for qualitative review.
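
The classification steps above can be sketched as follows. The input shape (a list of dicts with deal_id, outcome, and themes keys) and the name rank_drivers are assumptions for the example, and the keyword-extraction step is omitted. Note that, applying the two rules as written, a theme with equal won and lost counts is flagged as both a win driver and a loss driver.

```python
from collections import defaultdict


def rank_drivers(deals: list[dict]) -> list[dict]:
    """Rank themes by frequency and attach win rate and driver flags."""
    counts = defaultdict(lambda: {"won": 0, "lost": 0, "examples": []})
    for deal in deals:
        # sorted(set(...)) counts each theme once per deal, in a stable order
        for theme in sorted(set(deal["themes"])):
            entry = counts[theme]
            entry[deal["outcome"]] += 1  # outcome is "won" or "lost"
            entry["examples"].append(deal["deal_id"])
    drivers = []
    for theme, c in counts.items():
        total = c["won"] + c["lost"]
        drivers.append({
            "theme": theme,
            "frequency": total,
            "win_rate": c["won"] / total,
            "is_win_driver": c["won"] >= c["lost"] and c["won"] > 0,
            "is_loss_driver": c["lost"] >= c["won"] and c["lost"] > 0,
            "example_deals": c["examples"][:3],
        })
    return sorted(drivers, key=lambda d: d["frequency"], reverse=True)


# Hypothetical deals, not taken from the mock dataset
deals = [
    {"deal_id": "D1", "outcome": "won", "themes": ["technical_capability", "regulatory_track_record"]},
    {"deal_id": "D2", "outcome": "won", "themes": ["technical_capability"]},
    {"deal_id": "D3", "outcome": "lost", "themes": ["pricing", "technical_capability"]},
    {"deal_id": "D4", "outcome": "lost", "themes": ["pricing"]},
]
drivers = rank_drivers(deals)
for d in drivers:
    print(d["theme"], d["frequency"], round(d["win_rate"], 2))
```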

Typical output for a CGT CDMO corpus:

Win drivers (by frequency): Technical Capability → Regulatory Track Record → Quality Systems → Relationship → Timeline Speed

Loss drivers (by frequency): Pricing → Competitor Won → Capacity Constraints → Incumbent Relationship → Capability Gap

Competitive Intelligence

The CompetitiveAnalyzer performs named entity extraction for a predefined list of known competitors:

Lonza, WuXi ATU, Catalent, Samsung Biologics, Oxford Biomedica,
Brammer Bio, Charles River, Cellero, PCT, Thermo Fisher

For each deal note containing a competitor mention:

The relevant sentence is extracted as context
The deal outcome (won/lost) is recorded
won_against = True if the deal was won despite the competitor being mentioned

Aggregated into CompetitorProfile objects with:

total_mentions: how often this competitor appears in your pipeline
won_against: deals where you prevailed despite their presence
lost_to: deals awarded to this competitor
win_rate_against: your win rate in competitive situations involving this firm
therapy_areas: which modalities they compete in
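
The extraction and aggregation described above might look roughly like this. The field names on CompetitorProfile follow the list above, but the dataclass layout, the sentence splitter, the shortened competitor list, and the helper names extract_mentions and build_profiles are assumptions. The sketch counts at most one mention per competitor per deal, and treats any lost deal that mentions a competitor as lost to that competitor.

```python
import re
from dataclasses import dataclass, field

# Shortened, illustrative competitor list
COMPETITORS = ["Lonza", "WuXi ATU", "Catalent", "Samsung Biologics"]


@dataclass
class CompetitorProfile:
    name: str
    total_mentions: int = 0
    won_against: int = 0
    lost_to: int = 0
    therapy_areas: set[str] = field(default_factory=set)

    @property
    def win_rate_against(self) -> float:
        if not self.total_mentions:
            return 0.0
        return self.won_against / self.total_mentions


def extract_mentions(deal: dict) -> list[dict]:
    """Return one mention per competitor named in the deal's notes."""
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", deal["notes_text"])
    mentions = []
    for name in COMPETITORS:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        context = next((s for s in sentences if pattern.search(s)), None)
        if context is not None:
            mentions.append({
                "competitor": name,
                "context": context,
                "won_against": deal["outcome"] == "won",
                "therapy_area": deal["therapy_area"],
            })
    return mentions


def build_profiles(deals: list[dict]) -> dict[str, CompetitorProfile]:
    """Aggregate mentions across deals into one profile per competitor."""
    profiles: dict[str, CompetitorProfile] = {}
    for deal in deals:
        for m in extract_mentions(deal):
            profile = profiles.setdefault(m["competitor"], CompetitorProfile(m["competitor"]))
            profile.total_mentions += 1
            if m["won_against"]:
                profile.won_against += 1
            else:
                profile.lost_to += 1
            profile.therapy_areas.add(m["therapy_area"])
    return profiles


# Hypothetical deals, not taken from the mock dataset
sample_deals = [
    {"outcome": "lost", "therapy_area": "LVV",
     "notes_text": "Sponsor needed commercial scale. Awarded to Lonza on capacity."},
    {"outcome": "won", "therapy_area": "AAV Gene Therapy",
     "notes_text": "Beat Lonza and Catalent on regulatory support."},
]
profiles = build_profiles(sample_deals)
print(profiles["Lonza"].win_rate_against)
```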

CRM Connectors

All connectors implement the CRMConnector ABC from connectors/base.py:

from abc import ABC

# Interface summary; method bodies are elided
class CRMConnector(ABC):
    def fetch_deals(self, outcome: str | None = None) -> list[dict]: ...
    def fetch_deal(self, deal_id: str) -> dict | None: ...
    def fetch_contacts(self, deal_id: str | None = None) -> list[dict]: ...
    def fetch_competitors(self) -> list[dict]: ...
    def health_check(self) -> dict: ...

Mock Connector

The mock connector ships 30 CGT/CDMO deals across 8 therapy areas (CAR-T, AAV Gene Therapy, LVV, NK Cell, TCR-T, Allogeneic Cell, Plasmid DNA, mRNA) and 4 sales reps, with deal values from $1.8M to $12M and close dates between March 2024 and June 2025. The notes are engineered to surface all 12 NLP themes and use realistic sponsor names from the CGT biotech landscape.

API Connector

Salesforce / HubSpot-compatible REST connector using Python's urllib.request (no external HTTP library). Supports bearer token authentication. Configure your CRM's base URL:

from connectors.crm.api_connector import APICRMConnector
conn = APICRMConnector("https://your-crm-api.com", api_key="your-token")

File/CSV Connector

Import deals from a CSV export. Required columns: deal_id, name, account, outcome, deal_value_usd, close_date, sales_rep, therapy_area, modality, notes_text.

from connectors.crm.file_connector import FileCRMConnector
conn = FileCRMConnector("exported_deals.csv")
deals = conn.fetch_deals()
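
Before importing an export, it can help to confirm that the header row contains every required column. The check below is a standalone sketch using the standard csv module; it is not part of FileCRMConnector, whose own validation behaviour is not described here, and the names REQUIRED_COLUMNS and missing_columns are hypothetical.

```python
import csv
import io

# Required columns as listed above
REQUIRED_COLUMNS = {
    "deal_id", "name", "account", "outcome", "deal_value_usd",
    "close_date", "sales_rep", "therapy_area", "modality", "notes_text",
}


def missing_columns(csv_file) -> set[str]:
    """Return the required columns absent from the CSV header row."""
    reader = csv.DictReader(csv_file)
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])


# Hypothetical export that lacks the notes_text column
sample = io.StringIO(
    "deal_id,name,account,outcome,deal_value_usd,close_date,"
    "sales_rep,therapy_area,modality\n"
    "D-001,Example Programme,Example Bio,won,2500000,2024-05-01,"
    "Sarah Chen,CAR-T,Autologous\n"
)
print(missing_columns(sample))
```

For a file on disk, pass an open file handle instead, for example open("exported_deals.csv", newline="", encoding="utf-8").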

Quickstart

# Clone
git clone https://github.com/timjm25/WinLossAnalyzer.git
cd WinLossAnalyzer

# Install (no heavy ML dependencies)
pip install -r requirements.txt

# Run (auto-seeds with 30 mock CGT deals on first launch)
python run.py

# Open browser (the open command is macOS-specific; on Linux use xdg-open, or paste the URL into any browser)
open http://127.0.0.1:5073

Run tests:

python3 -m pytest tests/ -v

REST API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| `GET` | `/api/v1/deals` | List all deals (optional `?outcome=won\|lost`) |
| `GET` | `/api/v1/deals/<deal_id>` | Get single deal |
| `POST` | `/api/v1/deals` | Add a deal (JSON body) |
| `GET` | `/api/v1/drivers` | List drivers (optional `?type=win\|loss`) |
| `GET` | `/api/v1/competitive` | Competitor profiles + summary |
| `GET` | `/api/v1/stats` | Dashboard statistics |
| `POST` | `/api/v1/run_analysis` | Trigger full NLP re-analysis |
| `GET` | `/api/v1/connectors/health` | Connector health status |

Example:

curl http://127.0.0.1:5073/api/v1/stats
# {"total_deals":30,"won_deals":18,"lost_deals":12,"win_rate":0.6,
#  "top_win_driver":"Technical Capability","top_loss_driver":"Pricing",...}

# Quote the URL so the shell does not treat ? as a glob character
curl "http://127.0.0.1:5073/api/v1/drivers?type=win"

curl -X POST http://127.0.0.1:5073/api/v1/run_analysis
# {"nlp_results":30,"drivers":12,"mentions":9,"status":"ok"}
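
The same endpoints can be called from Python with the standard library. The helper names api_url and get_json are hypothetical, and the sketch assumes the server is running locally on port 5073 as in the Quickstart; only the GET endpoints listed in the table are covered.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:5073"  # default local address from the Quickstart


def api_url(path: str, **params) -> str:
    """Build a full URL, dropping query parameters whose value is None."""
    query = urllib.parse.urlencode(
        {key: value for key, value in params.items() if value is not None}
    )
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path: str, **params):
    """GET an endpoint and decode the JSON response body."""
    with urllib.request.urlopen(api_url(path, **params), timeout=10) as resp:
        return json.load(resp)


# Usage, with the server running:
#   stats = get_json("/api/v1/stats")
#   win_drivers = get_json("/api/v1/drivers", type="win")
print(api_url("/api/v1/drivers", type="win"))
```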

Mock Dataset

The included mock dataset contains 30 CGT manufacturing deals across a realistic CDMO pipeline:

| Metric | Value |
| --- | --- |
| Total Deals | 30 |
| Won | 18 (60% win rate) |
| Lost | 12 |
| Therapy Areas | CAR-T, AAV Gene Therapy, LVV, NK Cell, TCR-T, Allogeneic Cell, Plasmid DNA |
| Deal Range | $1.8M – $12M |
| Date Range | March 2024 – June 2025 |
| Sales Reps | Sarah Chen, Marcus Webb, Priya Sharma, David Torres |

Competitor scenarios included:

Lonza (3 mentions — LVV commercial scale, BLA incumbent, allogeneic capacity)
WuXi ATU (2 mentions — AAV capacity/timeline)
Catalent (2 mentions — AAV incumbent MSA)
Samsung Biologics (1 mention — allogeneic pricing)
Charles River (1 mention — autologous CAR-T incumbent)
Brammer Bio (1 mention — LVV pricing comparison)

Win scenarios include regulatory differentiation, long-term partnership renewal, scientific advisory as a deal catalyst, a technical feasibility-to-award pipeline, and a quality audit with zero observations.

Loss scenarios include a commercial-scale capacity gap, a pricing differential exceeding the budget ceiling, incumbent relationship lock-in, an analytical capability gap, a timeline constraint, and a regulatory compliance concern.

Extending the CRM Connector

Implement CRMConnector to connect to your live CRM system:

from connectors.base import CRMConnector

class SalesforceCRMConnector(CRMConnector):
    def __init__(self, instance_url: str, access_token: str):
        self.instance_url = instance_url
        self.access_token = access_token

    def fetch_deals(self, outcome=None):
        # IsWon must be selected, because _map_opportunity reads it to set the outcome
        soql = "SELECT Id, Name, Account.Name, Amount, CloseDate, Description, IsWon FROM Opportunity WHERE IsClosed=true"
        if outcome == "won":
            soql += " AND IsWon=true"
        elif outcome == "lost":
            soql += " AND IsWon=false"
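        # _soql_query is a helper you supply: it should send the query to the
        # Salesforce REST API and return the decoded JSON response as a dict
        raw = self._soql_query(soql)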
        return [self._map_opportunity(r) for r in raw["records"]]

    def _map_opportunity(self, opp: dict) -> dict:
        return {
            "deal_id": opp["Id"],
            "crm_id": opp["Id"],
            "name": opp["Name"],
            # Account and Description can come back as null, so fall back explicitly
            "account": (opp.get("Account") or {}).get("Name", ""),
            "outcome": "won" if opp.get("IsWon") else "lost",
            "deal_value_usd": int(opp.get("Amount", 0) or 0),
            "close_date": opp.get("CloseDate", ""),
            "notes_text": opp.get("Description") or "",
            "sales_rep": "",
            "therapy_area": "",
            "modality": "",
            "deal_stage_lost": None,
        }

    def fetch_deal(self, deal_id): ...
    def fetch_contacts(self, deal_id=None): ...
    def fetch_competitors(self): return []
    def health_check(self): return {"connector": "SalesforceCRMConnector", "status": "healthy"}

Pages

| Route | Description |
| --- | --- |
| `/` | Dashboard: stats, win rate chart, top drivers, recent deals |
| `/deals` | Full deal table with won/lost filter |
| `/deal/<id>` | Deal detail: NLP themes, TF-IDF keywords, sentiment, competitor mentions |
| `/drivers` | Win/loss driver analysis with bar charts and theme win rate table |
| `/competitive` | Competitor landscape: win rates against named competitors, deal feed |

License

MIT License — see LICENSE.
