EvalBot — AI-Powered Startup Evaluation Pipeline

EvalBot is a multi-agent AI system that evaluates early-stage startup submissions through a structured pipeline. Each agent specializes in one aspect of venture analysis (intake parsing, venture scoring, market analysis, product positioning, founder fit, and strategic recommendations), producing comprehensive evaluation reports with actionable insights.

What It Does

EvalBot analyzes startup submissions and provides:

Structured scoring across 10 venture-critical dimensions
Verdict classification (Top VC Candidate, Promising, Small Business, Feature Not Company, etc.)
SWOT analysis with market, product, and founder assessments
Strategic recommendations including pivot options and 30/90-day action plans
Batch ranking when evaluating multiple startups together
Professional reports in JSON and Markdown formats
Web UI for browsing results and running batches

Architecture

Multi-Agent Pipeline

EvalBot uses 7 specialized agents that run sequentially:

Intake Parser (Agent 1) — Converts raw submissions into standardized JSON structure
Venture Analyst (Agent 2) — Core scoring: evaluates 10 dimensions (1-10 scale), produces verdict
Market & Competition Analyst (Agent 3) — Market dynamics, TAM/SAM/SOM, competitive landscape
Product & Positioning Analyst (Agent 4) — Feature vs company assessment, AI wrapper risk
Founder Fit Analyst (Agent 5) — Team assessment, execution confidence
Recommendation Agent (Agent 6) — Strategic synthesis with pivot options and action plans
Ranking Committee (Agent 7) — Comparative ranking across cohort (batch mode only)

Key Features

PDF/DOCX Support: Reads pitch decks, documents, and text submissions directly
Feedback Loop: Agents can request re-runs from earlier agents when they detect gaps in the available information
Flexible LLM Backend: Supports Anthropic, OpenAI, MiniMax, Gemini, Groq, and Ollama models
Automatic Retry & Fallback: Handles API failures gracefully with configurable fallback models
Database Tracking: SQLite database stores all outputs, iterations, and feedback events
Web UI: Reflex-based dashboard for browsing batches, startup reports, and running new evaluations

Web Interface

A full-featured web UI is available at http://167.99.43.130 (internal access only).

Pages

Route	Description
`/`	Dashboard — all batches, stats, top startups
`/batch/[batch_id]`	Batch leaderboard — scores, verdict distribution, shortlist
`/batch/[batch_id]/[startup_name]`	Startup detail — radar chart, SWOT, analyst tabs
`/run`	Run Batch — upload submissions, monitor live progress
`/roadmap`	Feature roadmap

Running the Frontend Locally

source .venv313/bin/activate

# First time only — downloads Node/React dependencies
reflex init

# Start the dev server
reflex run
# → Open http://localhost:3000

For production (managed by systemd on the server):

reflex run --env prod --loglevel warning

Quick Start

Prerequisites

Python 3.13 or earlier (CrewAI is incompatible with Python 3.14+)
API key for at least one provider (Anthropic, OpenAI, or MiniMax recommended)
Node.js (required by Reflex for the web UI — installed automatically by reflex init)

Installation

# Clone the repository
git clone <your-repo-url>
cd EvalBot

# Create virtual environment (Python 3.13 or earlier)
python3.13 -m venv .venv313
source .venv313/bin/activate  # or .venv313\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt

# Set up API keys
cp .env.example .env
# Edit .env and add your API key(s)

# Initialize Reflex (first time only)
reflex init

Environment Variables

Create a .env file in the project root:

# At least one of these is required:
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
MINIMAX_API_KEY=your_key_here

# Optional providers:
GEMINI_API_KEY=your_key_here
GROQ_API_KEY=your_key_here

Usage

Process a Single Startup

source .venv313/bin/activate

# From a text file
python main.py single path/to/submission.txt

# From a PDF pitch deck
python main.py single path/to/deck.pdf

# From a Word document
python main.py single path/to/pitch.docx

Process a Batch of Startups

# Process all startups in a directory and rank them
python main.py batch path/to/startups_folder

# Long-running batches: pipe through tee to capture output
python main.py batch Startups/ 2>&1 | tee output.log

Batch directory structure:

Startups/
├── Company1/
│   ├── pitch.pdf
│   └── notes.md
├── Company2/
│   └── submission.txt
└── Company3/
    └── deck.docx

Supported file types: .pdf, .docx, .txt, .md
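The layout above can be checked before starting a long batch. The following is a minimal sketch, not EvalBot's actual loader: `discover_startups` is a hypothetical helper, and it assumes that each sub-folder is one startup and that only the four listed file types count.

```python
from pathlib import Path

# File types listed as supported above.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def discover_startups(batch_dir: str) -> dict[str, list[Path]]:
    """Map each startup sub-folder name to its supported files (hypothetical helper)."""
    startups: dict[str, list[Path]] = {}
    for folder in sorted(Path(batch_dir).iterdir()):
        if not folder.is_dir():
            continue  # loose files at the top level are ignored in this sketch
        files = sorted(
            f for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in SUPPORTED_SUFFIXES
        )
        if files:  # folders without any supported file are skipped
            startups[folder.name] = files
    return startups
```

Calling `discover_startups("Startups/")` on the example tree would return one entry per company folder, which makes it easy to spot an empty or misnamed folder before spending API credits.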

Via the Web UI

Go to /run
Enter a startup name and upload its files
Repeat for each startup in the batch
Click Run Batch — progress streams live in the terminal monitor
When complete, click View Results to open the batch leaderboard

Output Structure

Results are saved to output/Batch/batch_N/:

output/
└── Batch/
    └── batch_1/
        ├── Startup1/
        │   ├── Startup1.json          # Full pipeline results (all agents)
        │   └── Startup1.md            # Human-readable report
        ├── Startup2/
        │   ├── Startup2.json
        │   └── Startup2.md
        ├── ranking.json               # Agent 7 comparative ranking + shortlist
        └── batch_summary.md           # Model usage, costs, retry stats

Each startup's .json file contains:

{
  "agent1": { /* structured intake data */ },
  "agent2": { /* scores, verdict, SWOT */ },
  "agent3": { /* market analysis */ },
  "agent4": { /* product positioning */ },
  "agent5": { /* founder fit */ },
  "agent6": { /* recommendations */ },
  "_tags": {
    "batch_id": "batch_1",
    "startup_name": "MyStartup",
    "total_iterations": 6,
    "processing_time_seconds": 142.5
  },
  "_usage": { /* token usage and costs per agent */ }
}
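A report file can be inspected with a few lines of Python. This is a sketch that relies only on the top-level keys shown above (`agent1` to `agent6`, `_tags`, `_usage`); `summarize_report` is a hypothetical helper, and the `/* ... */` placeholders in the listing are illustrative, so real files are assumed to be plain JSON.

```python
import json
from pathlib import Path


def summarize_report(json_path: str) -> dict:
    """Return a small summary of one startup's pipeline output (hypothetical helper)."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    tags = data.get("_tags", {})
    # Agent sections are the top-level keys whose names start with "agent".
    agents_present = sorted(key for key in data if key.startswith("agent"))
    return {
        "startup_name": tags.get("startup_name"),
        "batch_id": tags.get("batch_id"),
        "total_iterations": tags.get("total_iterations"),
        "processing_time_seconds": tags.get("processing_time_seconds"),
        "agents_present": agents_present,
    }
```

For example, `summarize_report("output/Batch/batch_1/Startup1/Startup1.json")` would show which agent sections were written and how many iterations the run took.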

Configuration

Model Assignment

Edit src/config.py to customize which LLM models are used:

DEFAULT_MODEL = "anthropic/claude-haiku-4-5"

AGENT_MODELS: dict[int, str | None] = {
    1: "gpt-4.1-nano",                      # Cheap extraction
    2: "anthropic/claude-opus-4-6",         # Best reasoning
    3: "minimax/MiniMax-M2.7",              # Good value
    4: "minimax/MiniMax-M2.7",
    5: "minimax/MiniMax-M2.7",
    6: "anthropic/claude-opus-4-6",         # Best strategic synthesis
    7: "minimax/MiniMax-M2.7",
}

See agent_models.md for detailed model recommendations and cost profiles.
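The `str | None` value type suggests that an agent without an explicit model falls back to `DEFAULT_MODEL`. The sketch below shows one way such a lookup could work; `resolve_model` is a hypothetical name, the mapping is a shortened copy, and the real resolution logic in src/config.py may differ.

```python
from typing import Optional

DEFAULT_MODEL = "anthropic/claude-haiku-4-5"

# Shortened copy of the mapping above; None means "no override" in this sketch.
AGENT_MODELS: dict[int, Optional[str]] = {
    1: "gpt-4.1-nano",
    2: "anthropic/claude-opus-4-6",
    3: None,
}


def resolve_model(agent_number: int) -> str:
    """Pick the agent's configured model, else the default (hypothetical helper)."""
    return AGENT_MODELS.get(agent_number) or DEFAULT_MODEL
```

With this mapping, agent 3 (set to `None`) and any agent missing from the dictionary both resolve to the default model.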

Supported Providers

Provider	Model examples	Cost per 1M tokens (input / output, USD)
Anthropic	`anthropic/claude-opus-4-6`	$15 / $75
Anthropic	`anthropic/claude-sonnet-4-6`	$3 / $15
Anthropic	`anthropic/claude-haiku-4-5`	$1 / $5
OpenAI	`gpt-4.1`, `gpt-4o`, `o3-mini`	$2-8 / $8-44
OpenAI	`gpt-4.1-mini`, `gpt-4o-mini`	$0.15-0.40 / $0.60-1.60
MiniMax	`minimax/MiniMax-M2.7`	$0.30 / $1.20
Gemini	`gemini/gemini-2.0-flash`	Free tier available
Groq	`groq/llama-3.3-70b-versatile`	Free tier available
Ollama	`ollama/qwen3:4b`, `ollama/llama3.2`	Free (local)

Retry & Fallback

# src/config.py
RETRY_ATTEMPTS = 3
RETRY_BASE_DELAY = 2          # Exponential backoff: 2s, 4s, 8s
FALLBACK_MODEL = "anthropic/claude-haiku-4-5"
RECOVERY_CHECK_INTERVAL = 3   # Successes before attempting recovery to primary
RECOVERY_COOLDOWN = 60        # Seconds before retrying primary
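To make the backoff settings concrete, here is a minimal sketch of retry with exponential backoff and a single fallback call. It is not the code in src/retry_utils.py: `backoff_delays` and `call_with_fallback` are hypothetical names, the recovery settings are not modeled, and the sketch assumes a wait follows every failed attempt (which matches the 2s, 4s, 8s comment above).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRY_ATTEMPTS = 3
RETRY_BASE_DELAY = 2  # seconds


def backoff_delays(attempts: int = RETRY_ATTEMPTS, base: float = RETRY_BASE_DELAY) -> list:
    """Wait time after each failed attempt: base * 2**i for i = 0, 1, 2, ..."""
    return [base * (2 ** i) for i in range(attempts)]


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try the primary call up to RETRY_ATTEMPTS times, then call the fallback once."""
    for delay in backoff_delays():
        try:
            return primary()
        except Exception:
            sleep(delay)  # wait before the next attempt (or before falling back)
    return fallback()
```

The `sleep` parameter is injectable so the behaviour can be tested without actually waiting; in normal use it defaults to `time.sleep`.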

Cost Estimates

Single startup evaluation (Agents 1-6):

Config	Cost
Premium (Opus on 2 & 6, Sonnet on 3-5)	~$0.77
Balanced (Sonnet on 2 & 6, MiniMax on 3-5)	~$0.13
Budget (MiniMax across all)	~$0.03

Batch of 10 startups + ranking: 10× single cost + ~$0.05 for Agent 7.
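As a worked example of that formula, a batch of 10 startups on the Balanced configuration comes to roughly 10 x $0.13 + $0.05 = $1.35. The sketch below wraps the same arithmetic; `estimate_batch_cost` is a hypothetical helper and the figures are the approximate per-startup estimates from the table above, not measured costs.

```python
RANKING_COST_USD = 0.05  # approximate Agent 7 cost per batch, from the estimate above

# Approximate cost of one startup evaluation (Agents 1-6) per configuration.
SINGLE_COST_USD = {"premium": 0.77, "balanced": 0.13, "budget": 0.03}


def estimate_batch_cost(n_startups: int, config: str) -> float:
    """Estimated batch cost in USD: n x single cost + ranking cost (hypothetical helper)."""
    return round(n_startups * SINGLE_COST_USD[config] + RANKING_COST_USD, 2)
```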

See agent_models.md for detailed cost breakdowns.

Project Structure

EvalBot/
├── agents/
│   ├── Agent1/prompt.md  # Intake Parser system prompt
│   ├── Agent2/prompt.md  # Venture Analyst system prompt
│   ├── Agent3/prompt.md  # Market & Competition Analyst
│   ├── Agent4/prompt.md  # Product & Positioning Analyst
│   ├── Agent5/prompt.md  # Founder Fit Analyst
│   ├── Agent6/prompt.md  # Recommendation Agent
│   └── Agent7/prompt.md  # Ranking Committee Agent
├── src/
│   ├── agents.py         # CrewAI agent factory
│   ├── config.py         # Model assignment, retry settings, constants
│   ├── db.py             # SQLite schema and all persistence operations
│   ├── docs.py           # Document loading (PDF/DOCX/text)
│   ├── models.py         # Pydantic output schemas for all 7 agents
│   ├── pipeline.py       # Sequential agent orchestration + feedback loop
│   ├── retry_utils.py    # Exponential backoff, fallback, recovery logic
│   └── tasks.py          # CrewAI task creation with structured outputs
├── frontend/
│   ├── frontend.py       # Reflex app entry point + page registration
│   ├── state/
│   │   ├── dashboard.py  # DashboardState — batch list, stats, top startups
│   │   ├── batch.py      # BatchState — leaderboard, charts, shortlist
│   │   ├── startup.py    # StartupState — agent outputs, radar, SWOT, tabs
│   │   └── run.py        # RunState — upload, subprocess, live progress stream
│   ├── pages/
│   │   ├── dashboard.py  # / route
│   │   ├── batch.py      # /batch/[batch_id] route
│   │   ├── startup.py    # /batch/[batch_id]/[startup_name] route
│   │   ├── run.py        # /run route
│   │   └── roadmap.py    # /roadmap route
│   └── components/
│       ├── navbar.py     # Top nav + page layout wrapper
│       ├── badges.py     # Verdict badge, score bar, status pill
│       └── charts.py     # Radar, bar, and pie chart wrappers (rx.recharts)
├── tests/
│   ├── conftest.py       # crewai/pdfplumber mocks
│   ├── test_agents.py
│   ├── test_config.py
│   ├── test_db.py
│   ├── test_docs.py
│   ├── test_frontend.py  # State helper unit tests
│   ├── test_models.py
│   ├── test_pages.py     # Page compilation + state structure tests
│   ├── test_retry_utils.py
│   └── test_tasks.py
├── deploy/
│   ├── evalbot-web.service  # systemd service file for production
│   └── nginx-evalbot.conf   # nginx reverse-proxy config
├── main.py               # CLI entry point (single / batch modes)
├── deploy.sh             # Server deployment script
├── rxconfig.py           # Reflex configuration
├── evalbot.db            # SQLite database (gitignored)
├── evalbot_run.json      # Run state persistence (survives server restarts)
├── requirements.txt      # Pinned dependencies
└── output/               # Generated results (gitignored)

Development

Running Tests

source .venv313/bin/activate
pytest                    # 253 tests, ~2s, no API keys needed
pytest tests/test_db.py   # single file
pytest -v -k "batch"      # filter by name

Tests cover: agent creation, config resolution, all 7 Pydantic schemas, SQLite operations, document loading, task building, retry logic, frontend state helpers, and page compilation.

Adding/Modifying Agents

Create/update agents/Agent{N}/prompt.md
Define output model in src/models.py (class Agent{N}Output)
Add metadata to _AGENT_META in src/agents.py
Add expected output to _EXPECTED_OUTPUT in src/tasks.py
Update AGENT_OUTPUT_MODELS in src/models.py

Updating Dependencies

source .venv313/bin/activate
pip install <new-package>
pip freeze > requirements.txt

Deployment (DigitalOcean — 167.99.43.130)

# SSH alias (configured in ~/.ssh/config)
ssh evalbot   # connects as root@167.99.43.130

# Pull updates, install deps, run tests
ssh evalbot "bash /opt/evalbot/deploy.sh"

# Service management
ssh evalbot "systemctl status evalbot-web"
ssh evalbot "systemctl restart evalbot-web"

The web service runs as evalbot-web (systemd), started with:

reflex run --env prod --loglevel warning

Note: Reflex requires Node.js on the server. deploy.sh handles installation on first run.

Advanced Features

Feedback Loop System

Agents 2-6 can request a re-run from an earlier agent when they detect insufficient information. When that happens, the pipeline:

Logs the feedback request to the database
Invalidates downstream outputs
Re-runs from the requested agent with feedback context
Enforces a maximum iteration limit (default: 18)

# Track feedback events
sqlite3 evalbot.db "SELECT * FROM feedback_log WHERE batch_id='batch_1';"
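The feedback steps listed above can be sketched as a small control loop. This is an illustration under stated assumptions, not the code in src/pipeline.py: `run_pipeline` and the agent call signature are hypothetical, the log is an in-memory list rather than the `feedback_log` table, and each agent call counts as one iteration.

```python
from typing import Callable, Optional

MAX_ITERATIONS = 18  # default limit quoted above

# An agent receives the outputs so far plus optional feedback text and returns
# (its output, the number of an earlier agent to re-run from, or None).
AgentFn = Callable[[dict, Optional[str]], tuple]


def run_pipeline(agents: dict) -> tuple:
    """Run agents in numeric order, honouring re-run requests (hypothetical sketch)."""
    outputs: dict = {}
    feedback_log: list = []  # (requesting agent, target agent) pairs
    order = sorted(agents)
    idx, iterations, feedback = 0, 0, None
    while idx < len(order) and iterations < MAX_ITERATIONS:
        n = order[idx]
        iterations += 1
        output, target = agents[n](outputs, feedback)
        feedback = None
        outputs[n] = output
        wants_rerun = target is not None and target in agents and target < n
        if wants_rerun and iterations < MAX_ITERATIONS:
            feedback_log.append((n, target))              # log the request
            for k in [k for k in outputs if k >= target]:
                del outputs[k]                            # invalidate downstream outputs
            feedback = f"agent {n} requested more detail"  # context for the re-run
            idx = order.index(target)                     # re-run from the requested agent
            continue
        idx += 1
    return outputs, feedback_log
```

In this sketch, a request that arrives on the last allowed iteration is ignored and the output is accepted as-is, so the loop always terminates within `MAX_ITERATIONS` agent calls.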

Run State Persistence

evalbot_run.json persists the current run's status so the UI can detect interrupted runs after a server restart and prompt the user to start a new batch.

Database Schema

Tables: batches, startups, agent_outputs (with iteration tracking), feedback_log, retry_log, hall_of_fame.

Troubleshooting

Python 3.14+ Detected

EvalBot automatically re-launches with .venv313/bin/python if Python 3.14+ is detected. Create the environment manually if needed:

python3.13 -m venv .venv313
source .venv313/bin/activate
pip install -r requirements.txt
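For readers curious how such a re-launch can work, the sketch below shows one common pattern: check `sys.version_info` and replace the current process with the virtual environment's interpreter. It is an assumption about the mechanism, not a copy of main.py; `needs_relaunch` and `relaunch_if_needed` are hypothetical names, and the path is the POSIX layout of `.venv313`.

```python
import os
import sys

# POSIX path to the venv interpreter; on Windows it would live under Scripts\.
VENV_PYTHON = os.path.join(".venv313", "bin", "python")


def needs_relaunch(version: tuple = sys.version_info[:2]) -> bool:
    """True when the running interpreter is Python 3.14 or newer."""
    return version >= (3, 14)


def relaunch_if_needed() -> None:
    """Re-execute the current script with the venv interpreter if required."""
    if needs_relaunch() and os.path.exists(VENV_PYTHON):
        # Replace the current process, keeping the same command-line arguments.
        os.execv(VENV_PYTHON, [VENV_PYTHON, *sys.argv])
```

A guard like this only helps if `.venv313` was created with Python 3.13 or earlier; otherwise the re-launched process would hit the same check again.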

API Errors

The system retries failed API calls automatically and then falls back to FALLBACK_MODEL. Check the retry stats in batch_summary.md, or query the retry log directly:

sqlite3 evalbot.db "SELECT * FROM retry_log ORDER BY timestamp DESC LIMIT 20;"

Incomplete Batch

List the startups that did not complete, then re-run each one in single mode:

sqlite3 evalbot.db "SELECT startup_name FROM startups WHERE batch_id='batch_1' AND pipeline_status != 'completed';"
python main.py single Startups/FailedStartupName/
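The same lookup can be done from Python with the standard `sqlite3` module. This sketch uses only the table and column names that appear in the query above (`startups`, `startup_name`, `batch_id`, `pipeline_status`); `unfinished_startups` is a hypothetical helper, and the `ORDER BY` is added here for stable output.

```python
import sqlite3


def unfinished_startups(db_path: str, batch_id: str) -> list:
    """Return names of startups in a batch whose pipeline did not complete."""
    query = (
        "SELECT startup_name FROM startups "
        "WHERE batch_id = ? AND pipeline_status != 'completed' "
        "ORDER BY startup_name"
    )
    conn = sqlite3.connect(db_path)
    try:
        # The batch id is passed as a bound parameter rather than formatted in.
        return [row[0] for row in conn.execute(query, (batch_id,))]
    finally:
        conn.close()
```

For example, `unfinished_startups("evalbot.db", "batch_1")` returns the names to pass to single mode one at a time.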

Reflex / Frontend Issues

# Re-initialize if .web/ is corrupted
rm -rf .web/
reflex init
reflex run

Documentation

agent_models.md — Model recommendations and cost analysis
RETRY_LOGGING.md — Retry and fallback system details
