NLP-driven win/loss analysis for B2B pharmaceutical and CDMO sales teams. Surfaces the real reasons deals are won or lost — from CRM notes your team never had time to read.
In B2B pharmaceutical services and contract development and manufacturing (CDMO) sales, each deal represents months of technical proposals, executive meetings, scientific feasibility studies, and regulatory reviews. A typical gene therapy manufacturing contract takes 6–18 months to close and involves 8–15 stakeholders across manufacturing, regulatory affairs, quality, and executive leadership.
When a deal closes — won or lost — the account executive records their debriefing notes in the CRM. These notes contain the most valuable intelligence a commercial team can collect: the real voice-of-customer reasons why a sponsor chose or rejected your organisation. "Lost on price," "regulatory track record was the differentiator," "incumbent relationship with Lonza made it impossible to displace" — this is the raw data of commercial strategy.
The problem is that this intelligence never gets synthesised. Notes sit unread in Salesforce or HubSpot. Commercial leaders rely on gut feel and anecdote to understand win and loss patterns. The same strategic mistakes repeat quarter after quarter. Sales reps walk into competitive situations without knowing which competitors they are likely to face, or which capabilities to lead with in a proposal.
WinLossAnalyzer solves this by running Natural Language Processing on your closed-deal notes at scale. It extracts themes, scores them against outcomes, surfaces the drivers of winning and losing, tracks competitor mentions across your pipeline history, and delivers actionable intelligence through a clean web dashboard — all without requiring a data science team or expensive ML infrastructure.
For CGT CDMOs, the stakes are particularly high. Programmes are large ($2M–$15M), sales cycles are long, and a handful of theme-level insights ("our regulatory CMC support team is a differentiator in 78% of won deals") can meaningfully shift commercial strategy, proposal positioning, and capability investment priorities.
WinLossAnalyzer provides four integrated analytical capabilities:
1. NLP Engine — Processes free-text closed-deal notes using TF-IDF keyword extraction, domain-specific theme scoring across 12 CGT/CDMO themes, and rule-based sentiment analysis. No external ML libraries required — pure Python.
2. Win/Loss Driver Analysis — Identifies which themes correlate with won versus lost deals across your deal corpus. Surfaces the top win drivers (regulatory track record, relationship, technical capability) and top loss drivers (pricing, capacity constraints, incumbent relationships) with win rate, frequency, and example deals for each.
3. Competitive Intelligence — Automatically extracts competitor mentions (Lonza, WuXi ATU, Catalent, Samsung Biologics, Charles River, Oxford Biomedica, Brammer Bio) from deal notes, computes your win rate against each named competitor, and tracks which therapy areas they compete in.
4. CRM Connector Layer — Pluggable ABC-based connector architecture supports Mock (30 realistic CGT deals included), REST API (Salesforce/HubSpot-style), and CSV file import. Add your own connector in minutes.
```
WinLossAnalyzer/
│
├── connectors/                  # CRM connector layer
│   ├── base.py                  # CRMConnector ABC
│   └── crm/
│       ├── mock_connector.py    # 30 CGT/CDMO deals, 18 won / 12 lost
│       ├── api_connector.py     # Salesforce / HubSpot REST stub
│       └── file_connector.py    # CSV import connector
│
├── engine/                      # Analysis engines
│   ├── nlp_processor.py         # TF-IDF, theme extraction, sentiment (pure Python)
│   ├── driver_analyzer.py       # Win/loss driver identification
│   └── competitive_analyzer.py  # Competitor mention extraction + win rates
│
├── winloss/                     # Flask application
│   ├── app.py                   # Factory: browser routes + REST API
│   ├── model.py                 # SQLite persistence + orchestration
│   ├── seed.py                  # Seeds from MockCRMConnector
│   └── templates/               # Jinja2 HTML (Chart.js, no external CSS)
│       ├── base.html
│       ├── dashboard.html
│       ├── deals.html
│       ├── deal_detail.html
│       ├── drivers.html
│       └── competitive.html
│
├── tests/                       # 110 tests
│   ├── test_nlp_processor.py
│   ├── test_driver_analyzer.py
│   ├── test_competitive_analyzer.py
│   ├── test_connectors.py
│   └── test_model.py
│
├── run.py                       # Entrypoint (port 5073)
├── requirements.txt             # Flask + pytest only
└── LICENSE                      # MIT
```
Data flow:
```
CRM Notes
    │
    ▼
CRMConnector (Mock / API / File)
    │  fetch_deals()
    ▼
NLPProcessor.analyze_corpus()
    │  TF-IDF + theme scoring + sentiment
    ▼
DriverAnalyzer.top_drivers()          CompetitiveAnalyzer.extract_mentions()
    │  win/loss driver ranking            │  competitor entity extraction
    ▼                                     ▼
SQLite (deals, deal_nlp, competitive_mentions, drivers)
    │
    ▼
Flask Dashboard + REST API
```
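The four SQLite tables named in the diagram could be laid out roughly as below. This is a hypothetical schema: the table names come from the diagram and the `deals` columns mirror the deal dictionary returned by the connectors, but every other column (and `init_db` itself) is an assumption for illustration, not the contents of `winloss/model.py`.

```python
import sqlite3

# Hypothetical schema. Table names follow the data-flow diagram; deals columns
# mirror the connector deal dict; all other columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS deals (
    deal_id TEXT PRIMARY KEY,
    crm_id TEXT,
    name TEXT,
    account TEXT,
    outcome TEXT CHECK (outcome IN ('won', 'lost')),
    deal_value_usd INTEGER,
    close_date TEXT,
    sales_rep TEXT,
    therapy_area TEXT,
    modality TEXT,
    notes_text TEXT,
    deal_stage_lost TEXT
);
CREATE TABLE IF NOT EXISTS deal_nlp (
    deal_id TEXT PRIMARY KEY REFERENCES deals(deal_id),
    themes_json TEXT,      -- ranked [theme, confidence] pairs
    keywords_json TEXT,    -- top TF-IDF terms
    sentiment_label TEXT,
    sentiment_score REAL
);
CREATE TABLE IF NOT EXISTS competitive_mentions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    deal_id TEXT REFERENCES deals(deal_id),
    competitor TEXT,
    context TEXT,          -- sentence containing the mention
    won_against INTEGER    -- 1 if the deal was won despite the competitor
);
CREATE TABLE IF NOT EXISTS drivers (
    theme TEXT,
    driver_type TEXT,      -- 'win' or 'loss'
    frequency INTEGER,
    win_rate REAL,
    PRIMARY KEY (theme, driver_type)
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure all four tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keying `drivers` on `(theme, driver_type)` leaves room for a theme that qualifies as both a win and a loss driver.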
The NLP engine is implemented in pure Python — no NLTK, spaCy, scikit-learn, or any external ML library. This means it deploys anywhere Python 3.11+ runs, with no GPU, no model download, and no dependency conflicts.
Term Frequency–Inverse Document Frequency (TF-IDF) is computed across the entire deal corpus. For each deal note:
- TF (Term Frequency): how often a term appears in this deal's notes, normalised by total word count
- IDF (Inverse Document Frequency): $\mathrm{idf}(t) = \log\frac{N+1}{\mathrm{df}(t)+1} + 1$, where $N$ is the corpus size and $\mathrm{df}(t)$ is how many deals contain the term
- TF-IDF score: TF × IDF; terms that are distinctive to a specific deal score highest
This surfaces the vocabulary that makes each deal unique relative to the rest of your pipeline — the specific capability, objection, or competitor that defined that deal.
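A minimal pure-Python sketch of the TF-IDF computation described above, using the same smoothed IDF formula. The tokenizer (lowercased alphanumeric runs, no stop-word removal) and the helper names `tokenize`, `tfidf_corpus` and `top_keywords` are assumptions for illustration, not the `NLPProcessor` API.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a fuller pipeline might drop stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_corpus(notes: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return {deal_id: {term: tf-idf score}} for every deal note in the corpus."""
    tokenized = {deal_id: tokenize(text) for deal_id, text in notes.items()}
    n_docs = len(tokenized)
    # df(t): number of deals whose notes contain term t at least once
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores: dict[str, dict[str, float]] = {}
    for deal_id, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty notes
        scores[deal_id] = {
            # TF (count / total words) x smoothed IDF: log((N+1)/(df+1)) + 1
            term: (count / total) * (math.log((n_docs + 1) / (df[term] + 1)) + 1)
            for term, count in counts.items()
        }
    return scores


def top_keywords(scores: dict[str, float], k: int = 5) -> list[str]:
    # Highest score first; ties broken alphabetically for stable output
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


notes = {
    "D1": "lost on price to Lonza",
    "D2": "won on regulatory track record",
    "D3": "lost on price, budget too low",
}
scores = tfidf_corpus(notes)
print(top_keywords(scores["D2"], 3))  # → ['record', 'regulatory', 'track']
```

A word like "on" that appears in every deal gets IDF 1 (log 1 = 0, plus 1), so it never outranks terms unique to one deal.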
Twelve CGT/CDMO domain themes are defined, each with a keyword lexicon:
| Theme | Type | Example Keywords |
|---|---|---|
| regulatory_track_record | Win | regulatory, GMP, IND, BLA, CMC, compliance, MSAT, audit |
| technical_capability | Win | process, analytical, platform, manufacturing, QC, titer, yield |
| relationship | Win | relationship, partnership, trust, existing, referral, prior |
| timeline_speed | Win | timeline, fast, rapid, schedule, flexibility, weeks, expedited |
| capacity_availability | Win | capacity, suite, cleanroom, bioreactor, slot, availability |
| quality_systems | Win | quality, deviation, CAPA, batch record, validation, SOP |
| scientific_support | Win | scientific, advisory, optimization, yield, troubleshoot |
| pricing | Loss | price, cost, budget, expensive, rate, fees, cheaper |
| capacity_constraints | Loss | no capacity, fully booked, no availability, waitlist, constrained |
| incumbent_relationship | Loss | incumbent, existing contract, long-term agreement, switching cost |
| capability_gap | Loss | lack of experience, not validated, no platform, unproven, gap |
| competitor_won | Loss | Lonza, WuXi, Catalent, Samsung, Charles River, awarded to |
For each theme, a confidence score is computed as min(1.0, keyword_hits / threshold). Themes are ranked by confidence and stored per deal.
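The confidence formula can be sketched as follows. The lexicons here are a hypothetical three-theme subset of the table above, the default `threshold=3` is an assumption (the value used by the engine is not stated), and themes with zero hits are dropped in this sketch.

```python
import re

# Hypothetical subset of the theme lexicons from the table above.
THEME_LEXICONS: dict[str, list[str]] = {
    "pricing": ["price", "cost", "budget", "expensive", "cheaper"],
    "regulatory_track_record": ["regulatory", "gmp", "ind", "bla", "cmc", "audit"],
    "capacity_constraints": ["no capacity", "fully booked", "waitlist"],
}


def count_hits(text: str, keyword: str) -> int:
    # Whole-word/phrase match, so "ind" does not fire inside "industry"
    return len(re.findall(rf"\b{re.escape(keyword)}\b", text))


def score_themes(note: str, threshold: int = 3) -> list[tuple[str, float]]:
    """Confidence = min(1.0, keyword_hits / threshold), ranked highest first."""
    text = note.lower()
    scored = []
    for theme, keywords in THEME_LEXICONS.items():
        hits = sum(count_hits(text, kw) for kw in keywords)
        if hits:  # assumption: themes with no hits are not stored
            scored.append((theme, min(1.0, hits / threshold)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


note = ("Sponsor cited price and budget; our GMP audit history was strong, "
        "but price was the deciding factor.")
print(score_themes(note))
# → [('pricing', 1.0), ('regulatory_track_record', 0.6666666666666666)]
```

Capping at 1.0 means a note that repeats "price" ten times is not treated as ten times more about pricing than one that reaches the threshold.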
A rule-based lexicon approach assigns each deal note a sentiment label:
- Positive: presence of positive-outcome words (outstanding, reliable, awarded, preferred, satisfied)
- Negative: presence of negative-outcome words (concern, risk, gap, lost, costly, constraint)
- Mixed: both positive and negative signals present
- Neutral: neither signal detected
A sentiment score on [-1.0, 1.0] is computed as (positive_count - negative_count) / total_sentiment_words.
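A sketch of the labelling rules and score formula above, using only the example words listed as lexicons (the real word lists are presumably larger). Exact whole-word matching and returning 0.0 when no sentiment words are found are assumptions, since the formula is undefined when `total_sentiment_words` is zero.

```python
import re

# Illustrative lexicons built from the example words above; not the full lists.
POSITIVE = {"outstanding", "reliable", "awarded", "preferred", "satisfied"}
NEGATIVE = {"concern", "risk", "gap", "lost", "costly", "constraint"}


def sentiment(note: str) -> tuple[str, float]:
    """Return (label, score) with score = (pos - neg) / (pos + neg)."""
    words = re.findall(r"[a-z]+", note.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    if total == 0:
        return "neutral", 0.0  # assumption: no sentiment words scores 0.0
    if pos and neg:
        label = "mixed"
    else:
        label = "positive" if pos else "negative"
    return label, (pos - neg) / total


print(sentiment("Outstanding science but capacity risk and a timeline gap."))
```

Here one positive and two negative words give the label "mixed" with a score of (1 − 2) / 3 ≈ −0.33.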
The DriverAnalyzer correlates NLP theme presence with deal outcomes across the corpus. For each theme:
- Count how many won deals and how many lost deals mention the theme
- Compute `win_rate = won_count / total_count`
- Classify as a win driver if `won_count >= lost_count` and `won_count > 0`
- Classify as a loss driver if `lost_count >= won_count` and `lost_count > 0`
- Extract the top TF-IDF keywords from deals where this theme appeared
The result is a ranked list of drivers with frequency (how many deals), win rate, and example deal IDs for qualitative review.
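A sketch of the counting and classification rules above, assuming each deal arrives as a dict with a hypothetical `themes` set produced by the NLP step. The keyword-extraction step is omitted, and taking the first three deal IDs as examples is an assumption. Note that, as stated, a tie (`won_count == lost_count > 0`) satisfies both rules; the sketch keeps that behaviour rather than inventing a tie-break.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    theme: str
    won_count: int
    lost_count: int
    win_rate: float
    is_win_driver: bool
    is_loss_driver: bool
    example_deal_ids: list[str] = field(default_factory=list)


def analyze_drivers(deals: list[dict]) -> list[Driver]:
    """deals: [{"deal_id": str, "outcome": "won" | "lost", "themes": set[str]}, ...]"""
    stats: dict[str, dict] = {}
    for deal in deals:
        for theme in deal["themes"]:
            s = stats.setdefault(theme, {"won": 0, "lost": 0, "ids": []})
            s[deal["outcome"]] += 1
            s["ids"].append(deal["deal_id"])
    drivers = []
    for theme, s in stats.items():
        won, lost = s["won"], s["lost"]
        drivers.append(Driver(
            theme=theme,
            won_count=won,
            lost_count=lost,
            win_rate=won / (won + lost),
            # Rules as stated in the list above; a tie satisfies both.
            is_win_driver=won >= lost and won > 0,
            is_loss_driver=lost >= won and lost > 0,
            example_deal_ids=s["ids"][:3],  # assumption: keep first three for review
        ))
    # Rank by frequency: how many deals mention the theme
    return sorted(drivers, key=lambda d: d.won_count + d.lost_count, reverse=True)


deals = [
    {"deal_id": "D1", "outcome": "lost", "themes": {"pricing"}},
    {"deal_id": "D2", "outcome": "won", "themes": {"regulatory_track_record", "pricing"}},
    {"deal_id": "D3", "outcome": "lost", "themes": {"pricing"}},
    {"deal_id": "D4", "outcome": "won", "themes": {"regulatory_track_record"}},
]
for d in analyze_drivers(deals):
    print(d.theme, d.won_count, d.lost_count, round(d.win_rate, 2))
```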
Typical output for a CGT CDMO corpus:
Win drivers (by frequency): Technical Capability → Regulatory Track Record → Quality Systems → Relationship → Timeline Speed
Loss drivers (by frequency): Pricing → Competitor Won → Capacity Constraints → Incumbent Relationship → Capability Gap
The CompetitiveAnalyzer performs named entity extraction for a predefined list of known competitors:
Lonza, WuXi ATU, Catalent, Samsung Biologics, Oxford Biomedica, Brammer Bio, Charles River, Cellero, PCT, Thermo Fisher
For each deal note containing a competitor mention:
- The relevant sentence is extracted as context
- The deal outcome (won/lost) is recorded
- `won_against = True` if the deal was won despite the competitor being mentioned
Mentions are aggregated into `CompetitorProfile` objects with:
- `total_mentions`: how often this competitor appears in your pipeline
- `won_against`: deals where you prevailed despite their presence
- `lost_to`: deals awarded to this competitor
- `win_rate_against`: your win rate in competitive situations involving this firm
- `therapy_areas`: which modalities they compete in
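A sketch of the extraction and aggregation steps above. The competitor list is a subset, sentence splitting is a naive regex, and counting at most one mention per competitor per deal is an assumption. The sketch also counts every lost deal that mentions a competitor as `lost_to`, which approximates "awarded to this competitor"; the real analyzer may check the note more carefully.

```python
import re
from dataclasses import dataclass, field

KNOWN_COMPETITORS = ["Lonza", "WuXi ATU", "Catalent", "Samsung Biologics", "Charles River"]


@dataclass
class CompetitorProfile:
    name: str
    total_mentions: int = 0
    won_against: int = 0
    lost_to: int = 0
    therapy_areas: set[str] = field(default_factory=set)

    @property
    def win_rate_against(self) -> float:
        decided = self.won_against + self.lost_to
        return self.won_against / decided if decided else 0.0


def extract_mentions(deal: dict) -> list[dict]:
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", deal["notes_text"])
    mentions = []
    for name in KNOWN_COMPETITORS:
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        for sentence in sentences:
            if pattern.search(sentence):
                mentions.append({
                    "competitor": name,
                    "deal_id": deal["deal_id"],
                    "context": sentence.strip(),  # sentence kept as context
                    "won_against": deal["outcome"] == "won",
                })
                break  # assumption: one mention per competitor per deal
    return mentions


def build_profiles(deals: list[dict]) -> dict[str, CompetitorProfile]:
    profiles: dict[str, CompetitorProfile] = {}
    for deal in deals:
        for m in extract_mentions(deal):
            p = profiles.setdefault(m["competitor"], CompetitorProfile(m["competitor"]))
            p.total_mentions += 1
            if m["won_against"]:
                p.won_against += 1
            else:
                p.lost_to += 1  # approximation: lost deal that mentions them
            p.therapy_areas.add(deal["therapy_area"])
    return profiles


deals = [
    {"deal_id": "D1", "outcome": "lost", "therapy_area": "LVV",
     "notes_text": "Strong science. Lost to Lonza on commercial-scale capacity."},
    {"deal_id": "D2", "outcome": "won", "therapy_area": "AAV Gene Therapy",
     "notes_text": "Sponsor also evaluated Lonza and Catalent. We won on timeline."},
]
profiles = build_profiles(deals)
```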
All connectors implement the CRMConnector ABC from connectors/base.py:
```python
from abc import ABC


class CRMConnector(ABC):
    # Interface summary; see connectors/base.py for the full definition.
    def fetch_deals(self, outcome: str | None = None) -> list[dict]: ...
    def fetch_deal(self, deal_id: str) -> dict | None: ...
    def fetch_contacts(self, deal_id: str | None = None) -> list[dict]: ...
    def fetch_competitors(self) -> list[dict]: ...
    def health_check(self) -> dict: ...
```

The mock connector (MockCRMConnector) provides 30 CGT/CDMO deals across 8 therapy areas (CAR-T, AAV Gene Therapy, LVV, NK Cell, TCR-T, Allogeneic Cell, Plasmid DNA, mRNA), 4 sales reps, and deals ranging from $1.8M to $12M closing between 2024 and 2025. It is engineered to surface all 12 NLP themes, with realistic sponsor names from the CGT biotech landscape.
The REST API connector (APICRMConnector) is a Salesforce / HubSpot-compatible connector built on Python's urllib.request (no external HTTP library). It supports bearer-token authentication. Configure your CRM's base URL:

```python
from connectors.crm.api_connector import APICRMConnector

conn = APICRMConnector("https://your-crm-api.com", api_key="your-token")
```

The file connector (FileCRMConnector) imports deals from a CSV export. Required columns: deal_id, name, account, outcome, deal_value_usd, close_date, sales_rep, therapy_area, modality, notes_text.
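Before importing, it can help to check an export against the required columns. A minimal standard-library sketch; `validate_csv_columns` is a hypothetical helper, not part of FileCRMConnector.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "deal_id", "name", "account", "outcome", "deal_value_usd",
    "close_date", "sales_rep", "therapy_area", "modality", "notes_text",
}


def validate_csv_columns(stream) -> list[str]:
    """Return the required columns missing from the CSV header (empty means OK)."""
    reader = csv.DictReader(stream)
    header = set(reader.fieldnames or [])  # fieldnames is read from the first row
    return sorted(REQUIRED_COLUMNS - header)


# In-memory example; with a real export use open(path, newline="", encoding="utf-8").
sample = io.StringIO("deal_id,name,outcome,notes_text\nD1,Example,won,Strong CMC support\n")
print(validate_csv_columns(sample))
```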
```python
from connectors.crm.file_connector import FileCRMConnector

conn = FileCRMConnector("exported_deals.csv")
deals = conn.fetch_deals()
```

Quick start:

```bash
# Clone
git clone https://github.com/timjm25/WinLossAnalyzer.git
cd WinLossAnalyzer

# Install (no heavy ML dependencies)
pip install -r requirements.txt

# Run (auto-seeds with 30 mock CGT deals on first launch)
python run.py

# Open browser (macOS `open`; use `xdg-open` on Linux)
open http://127.0.0.1:5073
```

Run tests:

```bash
python3 -m pytest tests/ -v
```

The REST API exposes the following endpoints:

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/deals` | List all deals (optional `?outcome=won` or `?outcome=lost`) |
| GET | `/api/v1/deals/<deal_id>` | Get a single deal |
| POST | `/api/v1/deals` | Add a deal (JSON body) |
| GET | `/api/v1/drivers` | List drivers (optional `?type=win` or `?type=loss`) |
| GET | `/api/v1/competitive` | Competitor profiles + summary |
| GET | `/api/v1/stats` | Dashboard statistics |
| POST | `/api/v1/run_analysis` | Trigger full NLP re-analysis |
| GET | `/api/v1/connectors/health` | Connector health status |
Example:

```bash
curl http://127.0.0.1:5073/api/v1/stats
# {"total_deals":30,"won_deals":18,"lost_deals":12,"win_rate":0.6,
#  "top_win_driver":"Technical Capability","top_loss_driver":"Pricing",...}

curl "http://127.0.0.1:5073/api/v1/drivers?type=win"

curl -X POST http://127.0.0.1:5073/api/v1/run_analysis
# {"nlp_results":30,"drivers":12,"mentions":9,"status":"ok"}
```

The included mock dataset contains 30 CGT manufacturing deals across a realistic CDMO pipeline:
| Metric | Value |
|---|---|
| Total Deals | 30 |
| Won | 18 (60% win rate) |
| Lost | 12 |
| Therapy Areas | CAR-T, AAV Gene Therapy, LVV, NK Cell, TCR-T, Allogeneic Cell, Plasmid DNA |
| Deal Range | $1.8M – $12M |
| Date Range | March 2024 – June 2025 |
| Sales Reps | Sarah Chen, Marcus Webb, Priya Sharma, David Torres |
Competitor scenarios included:
- Lonza (3 mentions — LVV commercial scale, BLA incumbent, allogeneic capacity)
- WuXi ATU (2 mentions — AAV capacity/timeline)
- Catalent (2 mentions — AAV incumbent MSA)
- Samsung Biologics (1 mention — allogeneic pricing)
- Charles River (1 mention — autologous CAR-T incumbent)
- Brammer Bio (1 mention — LVV pricing comparison)
Win scenarios include: regulatory differentiation, long-term partnership renewal, scientific advisory as deal catalyst, technical feasibility-to-award pipeline, quality audit zero-observations.
Loss scenarios include: commercial-scale capacity gap, pricing differential exceeding budget ceiling, incumbent relationship lock-in, analytical capability gap, timeline constraint, regulatory compliance concern.
Implement CRMConnector to connect to your live CRM system:
```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

from connectors.base import CRMConnector


class SalesforceCRMConnector(CRMConnector):
    def __init__(self, instance_url: str, access_token: str):
        self.instance_url = instance_url.rstrip("/")
        self.access_token = access_token

    def fetch_deals(self, outcome=None):
        # IsWon must be selected, otherwise _map_opportunity marks every deal as lost
        soql = (
            "SELECT Id, Name, Account.Name, Amount, CloseDate, Description, IsWon "
            "FROM Opportunity WHERE IsClosed = true"
        )
        if outcome == "won":
            soql += " AND IsWon = true"
        elif outcome == "lost":
            soql += " AND IsWon = false"
        raw = self._soql_query(soql)
        return [self._map_opportunity(r) for r in raw["records"]]

    def _soql_query(self, soql: str) -> dict:
        # Salesforce REST query resource; use an API version your org supports.
        # Returns only the first page; follow "nextRecordsUrl" for large result sets.
        url = f"{self.instance_url}/services/data/v59.0/query/?{urlencode({'q': soql})}"
        req = Request(url, headers={"Authorization": f"Bearer {self.access_token}"})
        with urlopen(req, timeout=30) as resp:
            return json.load(resp)

    def _map_opportunity(self, opp: dict) -> dict:
        return {
            "deal_id": opp["Id"],
            "crm_id": opp["Id"],
            "name": opp["Name"],
            # Account is null when the opportunity has no linked account
            "account": (opp.get("Account") or {}).get("Name", ""),
            "outcome": "won" if opp.get("IsWon") else "lost",
            "deal_value_usd": int(opp.get("Amount") or 0),
            "close_date": opp.get("CloseDate") or "",
            "notes_text": opp.get("Description") or "",
            # Map these from custom fields if your org tracks them
            "sales_rep": "",
            "therapy_area": "",
            "modality": "",
            "deal_stage_lost": None,
        }

    def fetch_deal(self, deal_id): ...
    def fetch_contacts(self, deal_id=None): ...
    def fetch_competitors(self): return []
    def health_check(self): return {"connector": "SalesforceCRMConnector", "status": "healthy"}
```

Browser routes:

| Route | Description |
|---|---|
| `/` | Dashboard: stats, win rate chart, top drivers, recent deals |
| `/deals` | Full deal table with won/lost filter |
| `/deal/<id>` | Deal detail: NLP themes, TF-IDF keywords, sentiment, competitor mentions |
| `/drivers` | Win/loss driver analysis with bar charts and theme win rate table |
| `/competitive` | Competitor landscape: win rates against named competitors, deal feed |
MIT License — see LICENSE.
Copyright (c) 2024 Tim Maguire.