DOE Optimization Assistant

A guided Design of Experiments (DOE) tool for bioprocess and engineering applications. The wizard walks users through experimental setup — from defining objectives and factors to statistical analysis and optimal factor settings — ensuring statistical principles are understood before results are calculated.

Business Case

Experimental design is one of the highest-leverage activities in product development and process optimisation. A poorly designed experiment wastes resources, produces misleading results, and delays development timelines.

Problems this tool solves:

Pain Point	How DOE Assistant Addresses It
Scientists run one-factor-at-a-time (OFAT) experiments	Wizard teaches multi-factor design before the matrix is built
Wrong design chosen for the objective	Automated recommendation engine based on objective and factor count
Statistics done post-hoc without checking assumptions	Guided setup enforces response/factor definition before analysis
Results interpreted without model adequacy checks	Dashboard surfaces R², Adj-R², Adequate Precision, and p-values prominently
Optimum never confirmed experimentally	Explicit warning on all results pages; confirmation run guidance built in

Target users: Process development scientists, bioprocess engineers, quality teams, and anyone running structured experiments in biomanufacturing, chemistry, or engineering.

ROI drivers:

Reduce experiments needed to characterise a process by 40–60% vs OFAT
Detect interaction effects invisible to OFAT studies
Generate defensible, statistically-grounded process characterisation data for regulatory filings

Features

6-Step Guided Wizard

Objective — Screening / Characterisation / Optimisation / Robustness
Factors — Name, units, low/high levels; inline validation
Responses — Name, units, goal (maximise/minimise/target)
Design — Automated recommendation + power preview for all design types
Review — Randomised run matrix with coded and actual values side-by-side
Data Entry — Response values entered per run; model type selected

Statistical Analysis

OLS model fitting via NumPy lstsq; supports main effects, 2-factor interaction (2FI), and full quadratic models
ANOVA table: SS, MS, F-value, p-value for model and residuals
Model adequacy metrics: R², Adj-R², Predicted R² (PRESS), Adequate Precision
Per-term statistics: coefficient, standard error, t-value, p-value, significance flag
Process optimum finder: grid search over coded space + L-BFGS-B refinement; 95% prediction interval
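
As a rough illustration of the fitting and ANOVA steps listed above, the sketch below fits a main-effects model with `numpy.linalg.lstsq` and derives the model F-test and per-term t-tests with `scipy.stats`. The function name `fit_ols`, the returned dictionary keys, and the data are assumptions made for this example; they are not the project's `DOEStatistics` API.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Least-squares fit of y = X @ beta with an ANOVA-style summary.

    X is the model matrix in coded units and must include an intercept column.
    (Hypothetical helper, not the project's DOEStatistics.fit.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Partition the total sum of squares into model and residual parts.
    ss_total = float(np.sum((y - y.mean()) ** 2))
    ss_resid = float(resid @ resid)
    ss_model = ss_total - ss_resid
    df_model, df_resid = p - 1, n - p
    ms_model, ms_resid = ss_model / df_model, ss_resid / df_resid
    f_value = ms_model / ms_resid

    # Per-term standard errors: sqrt of the diagonal of MS_resid * (X'X)^-1.
    std_err = np.sqrt(ms_resid * np.diag(np.linalg.inv(X.T @ X)))
    t_values = beta / std_err
    return {
        "coef": beta,
        "std_err": std_err,
        "t": t_values,
        "p_terms": 2.0 * stats.t.sf(np.abs(t_values), df_resid),
        "ss_model": ss_model,
        "ss_resid": ss_resid,
        "ms_model": ms_model,
        "ms_resid": ms_resid,
        "f": f_value,
        "p_model": float(stats.f.sf(f_value, df_model, df_resid)),
    }


# Illustrative data only: a 2^2 factorial plus 3 center points, coded units.
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)
y = np.array([9.1, 12.9, 7.0, 11.0, 10.1, 9.9, 10.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

fit = fit_ols(X, y)
print("coefficients:", fit["coef"])
print("model F:", fit["f"], "p:", fit["p_model"])
```

Adding 2FI or quadratic terms only changes how the columns of `X` are built (for example `x1 * x2` or `x1 ** 2`); the fitting code stays the same.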

Design Types

Design	Typical Use	Run Count (k=3)
Full Factorial (2^k)	Complete characterisation, ≤3 factors	11 (incl. 3 center)
Fractional Factorial	Screening / characterisation, ≥4 factors	11 (2^(4−1) = 8 runs for k=4, plus 3 center)
Central Composite (CCD)	Optimisation with curvature	17 (face-centred)
Box-Behnken	Optimisation, no extreme combinations	15
Plackett-Burman	Screening, many factors (up to 19)	12
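
The run counts in the table can be checked by building the coded matrices directly. The sketch below generates a two-level full factorial and a face-centred CCD, each with center points. The function names are hypothetical and independent of the project's `DesignBuilder`.

```python
from itertools import product

import numpy as np


def full_factorial(k, center_points=3):
    """2^k corner runs in coded units (-1/+1) plus replicated center points."""
    corners = np.array(list(product([-1.0, 1.0], repeat=k)))
    centers = np.zeros((center_points, k))
    return np.vstack([corners, centers])


def face_centred_ccd(k, center_points=3):
    """Corner runs, 2k axial runs on the cube faces (alpha = 1), and centers."""
    corners = np.array(list(product([-1.0, 1.0], repeat=k)))
    axial = np.vstack([sign * np.eye(k) for sign in (-1.0, 1.0)])
    centers = np.zeros((center_points, k))
    return np.vstack([corners, axial, centers])


ff = full_factorial(3)        # 8 corners + 3 centers
ccd = face_centred_ccd(3)     # 8 corners + 6 axial + 3 centers
print(ff.shape, ccd.shape)

# Randomise the run order before execution (seed fixed only for repeatability).
rng = np.random.default_rng(0)
run_order = rng.permutation(len(ccd))
```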

REST API

All wizard functionality is available programmatically:

POST /api/v1/studies                        create study
GET  /api/v1/studies                        list studies
POST /api/v1/studies/<id>/factors           save factors
POST /api/v1/studies/<id>/responses         save responses
POST /api/v1/studies/<id>/design            generate design matrix
POST /api/v1/studies/<id>/run_data          save response data
POST /api/v1/studies/<id>/analyze           run OLS + ANOVA
GET  /api/v1/studies/<id>/analyses          get analysis results
POST /api/v1/design/recommend               get design recommendation
POST /api/v1/design/preview                 get run count preview
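
For scripted use, the endpoints can also be called from Python's standard library. This is a minimal sketch: the request bodies reuse the field names from the curl examples in this README, while the helper names `build_post` and `send` and the assumption that responses are JSON are illustrative only.

```python
import json
from urllib import request

BASE_URL = "http://localhost:5090/api/v1"


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for an API path."""
    data = json.dumps(payload).encode("utf-8")
    return request.Request(
        f"{BASE_URL}{path}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a request and decode the body, assuming the server returns JSON."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


create_req = build_post("/studies", {"name": "My Study", "objective": "optimization"})
print(create_req.get_method(), create_req.full_url)

# With the app running, the request would be sent like this:
# study = send(create_req)
```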

Architecture

DOEOptimizationAssistant/
├── doe/
│   ├── design_builder.py   # DesignBuilder: all 5 design types + coded/actual conversion
│   ├── statistics.py       # DOEStatistics: OLS fit, ANOVA, find_optimum; FitResult dataclass
│   └── wizard.py           # WizardStep enum, WizardValidator, per-step guidance text
├── doeassist/
│   ├── model.py            # SQLite persistence: studies, factors, responses, designs, run_data, analyses
│   ├── seed.py             # 2 pre-analyzed example studies
│   ├── app.py              # Flask app: wizard routes + REST API
│   └── templates/
│       ├── base.html           dark theme, wizard progress bar, guidance box
│       ├── dashboard.html      stats tiles + studies table
│       ├── wizard_step1.html   objective selection
│       ├── wizard_step2.html   factor definition (dynamic add/remove rows)
│       ├── wizard_step3.html   response definition
│       ├── wizard_step4.html   design selection with run count previews
│       ├── wizard_step5.html   run matrix review (coded + actual)
│       ├── wizard_step6.html   data entry + model type selection
│       └── results.html        full ANOVA + coefficients + optimum
└── tests/
    ├── test_design_builder.py  23 tests covering all design types + coding helpers
    ├── test_statistics.py      18 tests for OLS fitting and optimum finding
    └── test_model.py           20 tests for SQLite layer + seed integration

Data flow:

User text → Wizard (6 steps) → SQLite
                                 ↓
                    DesignBuilder.build() → run matrix
                                 ↓
                    DOEStatistics.fit()  → FitResult (ANOVA)
                                 ↓
                    find_optimum()       → OptimumResult
                                 ↓
                    results.html         → ANOVA table + optimum
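
The optimum step in this flow combines a coarse grid search with L-BFGS-B refinement. The sketch below shows one way such a two-stage search could look, using `scipy.optimize.minimize`. The name `find_optimum_sketch`, its parameters, and the example response surface are assumptions for illustration, not the project's `find_optimum()`.

```python
from itertools import product

import numpy as np
from scipy.optimize import minimize


def find_optimum_sketch(predict, k, maximise=True, grid_points=5):
    """Grid search over the coded cube [-1, 1]^k, then L-BFGS-B refinement.

    predict: callable mapping a length-k coded vector to a predicted response.
    """
    sign = -1.0 if maximise else 1.0

    def objective(x):
        return sign * predict(np.asarray(x, dtype=float))

    # Stage 1: evaluate a coarse grid and keep the best point as the start.
    axis = np.linspace(-1.0, 1.0, grid_points)
    start = min(product(axis, repeat=k), key=objective)

    # Stage 2: bounded local refinement from that start.
    result = minimize(
        objective,
        x0=np.array(start),
        method="L-BFGS-B",
        bounds=[(-1.0, 1.0)] * k,
    )
    return result.x, sign * float(result.fun)


def example_surface(x):
    """Made-up quadratic surface with its maximum of 10 at (0.5, -0.25)."""
    return 10.0 - (x[0] - 0.5) ** 2 - 2.0 * (x[1] + 0.25) ** 2


x_opt, y_opt = find_optimum_sketch(example_surface, k=2)
print(x_opt, y_opt)
```

The grid stage guards against a poor starting point on surfaces with more than one stationary point; the bounded refinement keeps the answer inside the studied region.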

Test Cases

Test Case 1: Viral Vector Transduction Optimization (CCD, 3 factors)

Context: A cell and gene therapy team needs to maximise lentiviral transduction efficiency in T-cells.

Factors:

Seed Density: 0.3–1.0 ×10⁶ cells/mL
MOI (Multiplicity of Infection): 2–10
Transduction Time: 4–24 hours

Responses: Viable Transduced Cell Density (VCD), Viability %

Design: Face-centred CCD (α=1), 17 runs

Statistical model: Full Quadratic

Expected outputs:

Significant factors: MOI (+), Transduction Time (+), MOI² (−, diminishing returns)
R² > 0.95, Adequate Precision > 4
Optimum: MOI ≈ 8, Seed ≈ 0.5×10⁶, Time ≈ 18 h → predicted VCD ≈ 3.5×10⁶/mL
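
Optimum settings are found in coded units and reported in actual units. The standard linear coding maps each factor's low and high levels to −1 and +1, so the conversion can be sketched as follows (function names are hypothetical; factor ranges are those listed for this test case).

```python
def coded_to_actual(coded, low, high):
    """Map a coded value in [-1, 1] to actual units."""
    centre = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return centre + coded * half_range


def actual_to_coded(actual, low, high):
    """Map a value in actual units to the coded [-1, 1] scale."""
    centre = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (actual - centre) / half_range


moi_coded = actual_to_coded(8, low=2, high=10)        # MOI of 8
time_coded = actual_to_coded(18, low=4, high=24)      # 18 hours
seed_coded = actual_to_coded(0.5, low=0.3, high=1.0)  # 0.5 x10^6 cells/mL
print(moi_coded, time_coded, seed_coded)
```

All three expected optimum settings fall inside the coded cube, so the predicted optimum is an interpolation within the studied ranges rather than an extrapolation.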

Test Case 2: Media Screening Study (Plackett-Burman, 6 factors)

Context: A CHO cell culture team needs to identify which of 6 media components most affects peak viable cell density on day 7.

Factors: Glucose, Glutamine, Iron, Zinc, Copper, Vitamin B12

Response: Peak VCD (×10⁶/mL)

Design: 12-run Plackett-Burman + 3 center points

Statistical model: Main Effects only (screening design)

Expected outputs:

Glucose and Glutamine identified as significant (large coefficients)
Iron moderate effect; Zinc, Copper, B12 not significant
R² > 0.90 for the main-effects model
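
To make the screening logic concrete, the sketch below builds the classic 12-run Plackett-Burman matrix from its cyclic generator row and estimates main-effect coefficients. The function names are hypothetical, center points are omitted for brevity, and the response values are synthetic (constructed from two known effects purely to show that the design recovers them); they are not results from this study.

```python
import numpy as np

# Standard generator row for the 12-run Plackett-Burman design.
PB12_GENERATOR = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])


def plackett_burman_12(n_factors):
    """12-run design: 11 cyclic shifts of the generator plus a row of -1."""
    if not 1 <= n_factors <= 11:
        raise ValueError("the 12-run design supports 1 to 11 factors")
    rows = [np.roll(PB12_GENERATOR, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows)[:, :n_factors]


def main_effect_coefficients(design, y):
    """Columns are orthogonal and balanced, so each coefficient is x.y / N."""
    return design.T @ np.asarray(y, dtype=float) / design.shape[0]


full = plackett_burman_12(11)
design = plackett_burman_12(6)  # Glucose, Glutamine, Iron, Zinc, Copper, B12

# Synthetic response: only the first two factors have an effect.
y_synthetic = 5.0 + 1.5 * design[:, 0] + 1.0 * design[:, 1]
coefs = main_effect_coefficients(design, y_synthetic)
print(np.round(coefs, 3))
```

Because Plackett-Burman main effects are partially aliased with two-factor interactions, a follow-up design on the significant factors is the usual next step.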

Installation & Running

# Install dependencies
pip install -r requirements.txt

# Run tests (61 tests, 0 failures)
python3 -m pytest tests/ -v

# Start the app (seeds 2 example studies on first run)
python3 run.py
# → Visit http://localhost:5090

Quick API example

# Create a study
curl -s -X POST http://localhost:5090/api/v1/studies \
  -H "Content-Type: application/json" \
  -d '{"name":"My Study","objective":"optimization"}' | python3 -m json.tool

# Save factors
curl -s -X POST http://localhost:5090/api/v1/studies/<study_id>/factors \
  -H "Content-Type: application/json" \
  -d '[{"name":"Temp","units":"C","low_val":30,"high_val":37},
       {"name":"pH","units":"","low_val":6.8,"high_val":7.4}]'

# Generate CCD design
curl -s -X POST http://localhost:5090/api/v1/studies/<study_id>/design \
  -H "Content-Type: application/json" \
  -d '{"design_type":"ccd","center_points":3}'

Statistical Notes

Model adequacy thresholds (industry standard, e.g., Design-Expert):

R² ≥ 0.80 for a good predictive model
Predicted R² within 0.20 of Adj-R² (a larger gap suggests overfitting)
Adequate Precision ≥ 4 (required to use model for design space navigation)
Model p-value < 0.05 for significance
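
The metrics behind these thresholds can be computed from the model matrix and the responses alone. The sketch below uses the commonly cited definitions: Predicted R² from the PRESS statistic, and Adequate Precision as the range of fitted values divided by the average standard error of prediction, sqrt(p·MSE/n). The function name and data are assumptions for illustration, and the project's own implementation may differ in detail.

```python
import numpy as np


def adequacy_metrics(X, y):
    """R², Adj-R², Predicted R² (PRESS) and Adequate Precision for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted

    ss_total = float(np.sum((y - y.mean()) ** 2))
    ss_resid = float(resid @ resid)
    mse = ss_resid / (n - p)

    # Leverages h_ii = x_i' (X'X)^-1 x_i give leave-one-out residuals cheaply.
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    press = float(np.sum((resid / (1.0 - leverage)) ** 2))

    return {
        "r2": 1.0 - ss_resid / ss_total,
        "adj_r2": 1.0 - mse / (ss_total / (n - 1)),
        "pred_r2": 1.0 - press / ss_total,
        "adequate_precision": float(
            (fitted.max() - fitted.min()) / np.sqrt(p * mse / n)
        ),
    }


# Illustrative data only: 2^2 factorial plus 3 center points, main effects.
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)
y = np.array([9.1, 12.9, 7.0, 11.0, 10.1, 9.9, 10.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

metrics = adequacy_metrics(X, y)
print({name: round(value, 4) for name, value in metrics.items()})
```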

Center points serve two purposes: (1) estimate pure experimental error (replication), (2) test for curvature. Three is the recommended minimum.
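
Both roles of the center points can be shown in a few lines. The sketch below estimates pure error from the center replicates and applies the textbook single-degree-of-freedom curvature test, which compares the mean of the factorial runs with the mean of the center runs. The function name and the response values are made up for illustration.

```python
import numpy as np
from scipy import stats


def curvature_test(y_factorial, y_center):
    """F-test for curvature using pure error from replicated center points."""
    yf = np.asarray(y_factorial, dtype=float)
    yc = np.asarray(y_center, dtype=float)
    nf, nc = len(yf), len(yc)

    # (1) Pure error: sample variance of the center replicates (nc - 1 df).
    pure_error_ms = float(yc.var(ddof=1))

    # (2) Curvature: one-df sum of squares for the factorial-vs-center gap.
    ss_curvature = nf * nc * (yf.mean() - yc.mean()) ** 2 / (nf + nc)
    f_value = ss_curvature / pure_error_ms
    p_value = float(stats.f.sf(f_value, 1, nc - 1))
    return f_value, p_value


y_corners = [9.1, 12.9, 7.0, 11.0]

# Center responses close to the corner average: no evidence of curvature.
_, p_flat = curvature_test(y_corners, [10.1, 9.9, 10.0])

# Center responses well above the corner average: curvature is flagged.
_, p_curved = curvature_test(y_corners, [11.1, 10.9, 11.0])
print(p_flat, p_curved)
```

With only three center points the pure-error estimate has 2 degrees of freedom, which is why three is treated as a minimum rather than a comfortable number.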

Confirmation runs: Always run at least 3 experiments at the predicted optimum settings before implementation. The predicted response is reported with a 95% prediction interval, but only confirmation runs convert that statistical prediction into experimental evidence.
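
One way to judge confirmation runs is to check whether their mean falls inside the prediction interval at the optimum. The sketch below uses the standard OLS formula, where the variance of the difference between the mean of m new runs and the prediction is MSE·(1/m + x0ᵀ(XᵀX)⁻¹x0). The function name, the `n_confirm` parameter, and the data are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def prediction_interval(X, y, x0, n_confirm=1, alpha=0.05):
    """Interval for the mean of n_confirm future runs at model-matrix row x0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = float(resid @ resid) / (n - p)

    y_hat = float(x0 @ beta)
    # Future-run noise (1 / n_confirm) plus uncertainty in the fitted surface.
    variance = mse * (1.0 / n_confirm + x0 @ np.linalg.inv(X.T @ X) @ x0)
    half_width = float(stats.t.ppf(1.0 - alpha / 2.0, n - p) * np.sqrt(variance))
    return y_hat, y_hat - half_width, y_hat + half_width


# Illustrative data only: 2^2 factorial plus 3 center points, main effects.
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)
y = np.array([9.1, 12.9, 7.0, 11.0, 10.1, 9.9, 10.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

x_opt = [1.0, 1.0, -1.0]  # intercept, x1 = +1, x2 = -1
pred, lo1, hi1 = prediction_interval(X, y, x_opt, n_confirm=1)
_, lo3, hi3 = prediction_interval(X, y, x_opt, n_confirm=3)
print(pred, (lo1, hi1), (lo3, hi3))
```

Averaging three confirmation runs narrows the interval relative to a single run, which gives a sharper check on the model.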

Dependencies

Package	Version	Purpose
Flask	≥ 3.0	Web framework / REST API
numpy	≥ 1.24	OLS fitting via `lstsq`, model matrix construction
scipy	≥ 1.10	F- and t-distribution CDFs, optimisation via `minimize`
pytest	≥ 8.0	Test runner (61 tests, 0 failures)

Python ≥ 3.9 required. SQLite is used for persistence (stdlib, no separate install needed).
