A guided Design of Experiments (DOE) tool for bioprocess and engineering applications. The wizard walks users through experimental setup — from defining objectives and factors to statistical analysis and optimal factor settings — ensuring statistical principles are understood before results are calculated.
Experimental design is one of the highest-leverage activities in product development and process optimisation. A poorly designed experiment wastes resources, produces misleading results, and delays development timelines.
Problems this tool solves:
| Pain Point | How DOE Assistant Addresses It |
|---|---|
| Scientists run one-factor-at-a-time (OFAT) experiments | Wizard teaches multi-factor design before the matrix is built |
| Wrong design chosen for the objective | Automated recommendation engine based on objective and factor count |
| Statistics done post-hoc without checking assumptions | Guided setup enforces response/factor definition before analysis |
| Results interpreted without model adequacy checks | Dashboard surfaces R², Adj-R², Adequate Precision, and p-values prominently |
| Optimum never confirmed experimentally | Explicit warning on all results pages; confirmation run guidance built in |
Target users: Process development scientists, bioprocess engineers, quality teams, and anyone running structured experiments in biomanufacturing, chemistry, or engineering.
ROI drivers:
- Reduce experiments needed to characterise a process by 40–60% vs OFAT
- Detect interaction effects invisible to OFAT studies
- Generate defensible, statistically-grounded process characterisation data for regulatory filings
Wizard steps:
- Objective: Screening / Characterisation / Optimisation / Robustness
- Factors: name, units, low/high levels; inline validation
- Responses: name, units, goal (maximise/minimise/target)
- Design: automated recommendation + power preview for all design types
- Review: randomised run matrix with coded and actual values side-by-side
- Data Entry: response values entered per run; model type selected
Analysis features:
- OLS model fitting via NumPy lstsq; supports main effects, 2-factor interaction (2FI), and full quadratic models
- ANOVA table: SS, MS, F-value, p-value for model and residuals
- Model adequacy metrics: R², Adj-R², Predicted R² (PRESS), Adequate Precision
- Per-term statistics: coefficient, standard error, t-value, p-value, significance flag
- Process optimum finder: grid search over coded space + L-BFGS-B refinement; 95% prediction interval
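The fitting and adequacy metrics listed above can be illustrated with a minimal sketch. It is not the code in statistics.py: the function name `fit_ols`, the returned dictionary, and the noise-free example data are assumptions made for illustration. It fits a main-effects model with `numpy.linalg.lstsq` and derives R², Adj-R², and a PRESS-based Predicted R² from the leverages.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = X @ beta plus basic adequacy metrics (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Ordinary least squares via NumPy
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (ss_res / (n - p)) / (ss_tot / (n - 1))

    # PRESS uses leave-one-out residuals e_i / (1 - h_ii),
    # where h_ii is the diagonal of the hat matrix X @ pinv(X)
    hat_diag = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    press = float(((resid / (1.0 - hat_diag)) ** 2).sum())
    pred_r2 = 1.0 - press / ss_tot

    return {"beta": beta, "r2": r2, "adj_r2": adj_r2, "pred_r2": pred_r2}

# 2^2 factorial in coded units plus 3 centre points; columns: intercept, A, B
X = np.array([
    [1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1],
    [1, 0, 0], [1, 0, 0], [1, 0, 0],
])
# Made-up, noise-free response generated from y = 10 + 2A - 3B
y = 10 + 2 * X[:, 1] - 3 * X[:, 2]

fit = fit_ols(X, y)
print(fit["beta"], fit["r2"])
```

With real, noisy data the three R² values diverge, and the size of the gap between them is what the adequacy checks look at.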
| Design | Typical Use | Run Count (k=3) |
|---|---|---|
| Full Factorial (2^k) | Complete characterisation, ≤3 factors | 11 (incl. 3 center) |
| Fractional Factorial | Screening / characterisation, ≥4 factors | 11 (2^(4−1) = 8, plus 3 center; shown for k=4) |
| Central Composite (CCD) | Optimisation with curvature | 17 (face-centred) |
| Box-Behnken | Optimisation, no extreme combinations | 15 |
| Plackett-Burman | Screening, many factors (up to 19) | 12 |
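As a rough illustration of where the CCD run count in the table comes from, the sketch below builds a face-centred CCD in coded units and maps coded levels back to actual values. The function names `face_centred_ccd` and `coded_to_actual` are hypothetical and are not taken from design_builder.py; randomisation of run order is omitted.

```python
from itertools import product

def face_centred_ccd(k, center_points=3):
    """Coded runs: 2^k corners, 2k face-centre axial points, then centre points."""
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for level in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = level  # alpha = 1, so axial points sit on the cube faces
            axial.append(point)
    centers = [[0.0] * k for _ in range(center_points)]
    return corners + axial + centers

def coded_to_actual(coded, low, high):
    """Map a coded level in [-1, 1] onto the actual range [low, high]."""
    mid = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return mid + coded * half_range

runs = face_centred_ccd(3)
print(len(runs))  # 8 corners + 6 axial + 3 centre = 17

# Actual levels for a factor ranging from 2 to 10 (second column of the design)
moi_levels = sorted({coded_to_actual(r[1], 2, 10) for r in runs})
print(moi_levels)  # [2.0, 6.0, 10.0]
```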
All wizard functionality is available programmatically:
POST /api/v1/studies create study
GET /api/v1/studies list studies
POST /api/v1/studies/<id>/factors save factors
POST /api/v1/studies/<id>/responses save responses
POST /api/v1/studies/<id>/design generate design matrix
POST /api/v1/studies/<id>/run_data save response data
POST /api/v1/studies/<id>/analyze run OLS + ANOVA
GET /api/v1/studies/<id>/analyses get analysis results
POST /api/v1/design/recommend get design recommendation
POST /api/v1/design/preview get run count preview
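The endpoints above can also be called from Python. The following is a minimal sketch using only the standard library; the helper names `build_post` and `send` are hypothetical, the base URL assumes the default local port from the quick-start, and the shape of the JSON responses is not documented here, so the sketch does not assume any particular response field.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5090"  # assumption: app running locally on the default port

def build_post(path, payload):
    """Build a JSON POST request for one of the /api/v1 endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))

create_req = build_post(
    "/api/v1/studies",
    {"name": "My Study", "objective": "optimization"},
)
# study = send(create_req)  # uncomment once the app is running
```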
DOEOptimizationAssistant/
├── doe/
│   ├── design_builder.py   # DesignBuilder: all 5 design types + coded/actual conversion
│   ├── statistics.py       # DOEStatistics: OLS fit, ANOVA, find_optimum; FitResult dataclass
│   └── wizard.py           # WizardStep enum, WizardValidator, per-step guidance text
├── doeassist/
│   ├── model.py            # SQLite persistence: studies, factors, responses, designs, run_data, analyses
│   ├── seed.py             # 2 pre-analyzed example studies
│   ├── app.py              # Flask app: wizard routes + REST API
│   └── templates/
│       ├── base.html           # dark theme, wizard progress bar, guidance box
│       ├── dashboard.html      # stats tiles + studies table
│       ├── wizard_step1.html   # objective selection
│       ├── wizard_step2.html   # factor definition (dynamic add/remove rows)
│       ├── wizard_step3.html   # response definition
│       ├── wizard_step4.html   # design selection with run count previews
│       ├── wizard_step5.html   # run matrix review (coded + actual)
│       ├── wizard_step6.html   # data entry + model type selection
│       └── results.html        # full ANOVA + coefficients + optimum
└── tests/
    ├── test_design_builder.py  # 23 tests covering all design types + coding helpers
    ├── test_statistics.py      # 18 tests for OLS fitting and optimum finding
    └── test_model.py           # 20 tests for SQLite layer + seed integration
Data flow:
User text → Wizard (6 steps) → SQLite
↓
DesignBuilder.build() → run matrix
↓
DOEStatistics.fit() → FitResult (ANOVA)
↓
find_optimum() → OptimumResult
↓
results.html → ANOVA table + optimum
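The find_optimum() step in the flow above is described in this README as a grid search over coded space followed by L-BFGS-B refinement. A minimal sketch of that two-stage idea is shown below. The function name `find_optimum_sketch`, its parameters, and the example response surface are assumptions for illustration, and the 95% prediction interval is left out.

```python
from itertools import product

import numpy as np
from scipy.optimize import minimize

def find_optimum_sketch(predict, k, grid_points=5, maximise=True):
    """Coarse grid search over [-1, 1]^k, then bounded L-BFGS-B refinement."""
    sign = -1.0 if maximise else 1.0

    def objective(x):
        # minimize() always minimises, so flip the sign when maximising
        return sign * predict(np.asarray(x, dtype=float))

    # Stage 1: evaluate every grid node and keep the best as the starting point
    levels = np.linspace(-1.0, 1.0, grid_points)
    start = min(product(levels, repeat=k), key=objective)

    # Stage 2: local refinement constrained to the coded design space
    result = minimize(
        objective,
        x0=np.array(start),
        method="L-BFGS-B",
        bounds=[(-1.0, 1.0)] * k,
    )
    return result.x, sign * result.fun

# Made-up quadratic surface with its maximum of 10 at coded (0.5, -0.25)
def example_model(x):
    return 10.0 - (x[0] - 0.5) ** 2 - 2.0 * (x[1] + 0.25) ** 2

x_opt, y_opt = find_optimum_sketch(example_model, k=2)
print(x_opt, y_opt)
```

Starting the local optimiser from the best grid node reduces the chance of settling in a poor local optimum when the fitted surface has a saddle or several peaks.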
Context: A cell and gene therapy team needs to maximise lentiviral transduction efficiency in T-cells.
Factors:
- Seed Density: 0.3–1.0 ×10⁶ cells/mL
- MOI (Multiplicity of Infection): 2–10
- Transduction Time: 4–24 hours
Responses: Viable Transduced Cell Density (VCD), Viability %
Design: Face-centred CCD (α=1), 17 runs
Statistical model: Full Quadratic
Expected outputs:
- Significant factors: MOI (+), Transduction Time (+), MOI² (−, diminishing returns)
- R² > 0.95, Adequate Precision > 4
- Optimum: MOI ≈ 8, Seed ≈ 0.5×10⁶, Time ≈ 18 h → predicted VCD ≈ 3.5×10⁶/mL
Context: A CHO cell culture team needs to identify which of 6 media components most affects peak viable cell density on day 7.
Factors: Glucose, Glutamine, Iron, Zinc, Copper, Vitamin B12
Response: Peak VCD (×10⁶/mL)
Design: 12-run Plackett-Burman + 3 center points
Statistical model: Main Effects only (screening design)
Expected outputs:
- Glucose and Glutamine identified as significant (large coefficients)
- Iron moderate effect; Zinc, Copper, B12 not significant
- R² > 0.90 for the main-effects model
# Install dependencies
pip install -r requirements.txt
# Run tests (61 tests, 0 failures)
python3 -m pytest tests/ -v
# Start the app (seeds 2 example studies on first run)
python3 run.py
# → Visit http://localhost:5090
# Create a study
curl -s -X POST http://localhost:5090/api/v1/studies \
-H "Content-Type: application/json" \
-d '{"name":"My Study","objective":"optimization"}' | python3 -m json.tool
# Save factors
curl -s -X POST http://localhost:5090/api/v1/studies/<study_id>/factors \
-H "Content-Type: application/json" \
-d '[{"name":"Temp","units":"C","low_val":30,"high_val":37},
{"name":"pH","units":"","low_val":6.8,"high_val":7.4}]'
# Generate CCD design
curl -s -X POST http://localhost:5090/api/v1/studies/<study_id>/design \
-H "Content-Type: application/json" \
-d '{"design_type":"ccd","center_points":3}'
Model adequacy thresholds (industry standard, e.g., Design-Expert):
- R² ≥ 0.80 for a good predictive model
- Predicted R² within 0.20 of Adj-R² (a larger gap suggests overfitting)
- Adequate Precision ≥ 4 (required to use model for design space navigation)
- Model p-value < 0.05 for significance
Center points serve two purposes: (1) estimate pure experimental error (replication), (2) test for curvature. Three is the recommended minimum.
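The two roles of center points can be shown with the standard single-degree-of-freedom curvature test: the factorial mean is compared with the center-point mean, and the spread of the replicated center points supplies the pure-error estimate. This is a sketch under assumptions, not the tool's own code; the function name `curvature_test` and the response values are illustrative.

```python
import numpy as np
from scipy import stats

def curvature_test(y_factorial, y_center):
    """F-test for curvature using center points as the pure-error estimate."""
    y_f = np.asarray(y_factorial, dtype=float)
    y_c = np.asarray(y_center, dtype=float)
    n_f, n_c = y_f.size, y_c.size

    # Curvature sum of squares (1 degree of freedom)
    ss_curv = n_f * n_c * (y_f.mean() - y_c.mean()) ** 2 / (n_f + n_c)

    # Pure error from replicated center points (n_c - 1 degrees of freedom)
    ms_pure_error = y_c.var(ddof=1)

    f_value = ss_curv / ms_pure_error
    p_value = stats.f.sf(f_value, 1, n_c - 1)
    return f_value, p_value

# Illustrative responses: 4 factorial runs and 5 center-point replicates
f_value, p_value = curvature_test(
    [39.3, 40.0, 40.9, 41.5],
    [40.3, 40.5, 40.7, 40.2, 40.6],
)
print(f_value, p_value)
```

A small p-value would indicate that a first-order model misses curvature, which is a signal to move to a design that supports quadratic terms, such as a CCD or Box-Behnken.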
Confirmation runs: Always run at least 3 experiments at the predicted optimum settings before implementation. The predicted response is reported with a 95% prediction interval, but only confirmation runs convert a statistical prediction into experimental evidence.
| Package | Version | Purpose |
|---|---|---|
| Flask | ≥ 3.0 | Web framework / REST API |
| numpy | ≥ 1.24 | OLS fitting via lstsq, model matrix construction |
| scipy | ≥ 1.10 | F- and t-distribution CDFs, optimisation via minimize |
| pytest | ≥ 8.0 | Test runner (61 tests, 0 failures) |
Python ≥ 3.9 required. SQLite is used for persistence (stdlib, no separate install needed).