
timjm25/DOEOptimizationAssistant

DOE Optimization Assistant

A guided Design of Experiments (DOE) tool for bioprocess and engineering applications. The wizard walks users through experimental setup — from defining objectives and factors to statistical analysis and optimal factor settings — ensuring statistical principles are understood before results are calculated.


Business Case

Experimental design is one of the highest-leverage activities in product development and process optimisation. A poorly designed experiment wastes resources, produces misleading results, and delays development timelines.

Problems this tool solves:

Pain Point | How DOE Assistant Addresses It
--- | ---
Scientists run one-factor-at-a-time (OFAT) experiments | Wizard teaches multi-factor design before the matrix is built
Wrong design chosen for the objective | Automated recommendation engine based on objective and factor count
Statistics done post-hoc without checking assumptions | Guided setup enforces response/factor definition before analysis
Results interpreted without model adequacy checks | Dashboard surfaces R², Adj-R², Adequate Precision, and p-values prominently
Optimum never confirmed experimentally | Explicit warning on all results pages; confirmation run guidance built in

Target users: Process development scientists, bioprocess engineers, quality teams, and anyone running structured experiments in biomanufacturing, chemistry, or engineering.

ROI drivers:

  • Reduce experiments needed to characterise a process by 40–60% vs OFAT
  • Detect interaction effects invisible to OFAT studies
  • Generate defensible, statistically-grounded process characterisation data for regulatory filings

Features

6-Step Guided Wizard

  1. Objective — Screening / Characterisation / Optimisation / Robustness
  2. Factors — Name, units, low/high levels; inline validation
  3. Responses — Name, units, goal (maximise/minimise/target)
  4. Design — Automated recommendation + power preview for all design types
  5. Review — Randomised run matrix with coded and actual values side-by-side
  6. Data Entry — Response values entered per run; model type selected
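Step 5 lists each run in both coded and actual units. The snippet below is a minimal sketch of the conventional linear coding (low level → −1, midpoint → 0, high level → +1) plus a seeded shuffle for the run order. `to_coded`, `to_actual` and `randomised_order` are hypothetical names, not the `DesignBuilder` API.

```python
import random
from typing import List, Optional


def to_coded(actual: float, low: float, high: float) -> float:
    """Map an actual factor level onto the coded scale (low -> -1, high -> +1)."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (actual - center) / half_range


def to_actual(coded: float, low: float, high: float) -> float:
    """Inverse of to_coded: convert a coded level back to engineering units."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


def randomised_order(n_runs: int, seed: Optional[int] = None) -> List[int]:
    """Return a random execution order for standard-order runs 1..n_runs."""
    order = list(range(1, n_runs + 1))
    random.Random(seed).shuffle(order)  # seeded so the order is reproducible
    return order


# Example: a Temp factor defined with low = 30 C and high = 37 C
print(to_coded(30, 30, 37), to_actual(0, 30, 37))  # -1.0 33.5
```

Randomising the run order keeps slow drifts (instrument, operator, cell passage) from being confounded with factor effects; recording the seed makes the printed run sheet reproducible.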

Statistical Analysis

  • OLS model fitting via NumPy lstsq; supports main effects, 2-factor interaction (2FI), and full quadratic models
  • ANOVA table: SS, MS, F-value, p-value for model and residuals
  • Model adequacy metrics: R², Adj-R², Predicted R² (PRESS), Adequate Precision
  • Per-term statistics: coefficient, standard error, t-value, p-value, significance flag
  • Process optimum finder: grid search over coded space + L-BFGS-B refinement; 95% prediction interval
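The statistics listed above can be sketched with NumPy and SciPy as follows, assuming a model matrix `X` in coded units whose first column is the intercept. `fit_ols` and its dictionary keys are hypothetical and not the `DOEStatistics`/`FitResult` API. Predicted R² uses the PRESS shortcut (residual divided by 1 − leverage), and Adequate Precision uses its common definition: the range of fitted values divided by the square root of the average prediction variance, p·MSE/n.

```python
import numpy as np
from scipy import stats


def fit_ols(X: np.ndarray, y: np.ndarray) -> dict:
    """OLS fit with ANOVA, per-term statistics and adequacy metrics.

    X is the n x p model matrix in coded units; its first column is all ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted

    # ANOVA: model vs residual sums of squares
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    ss_model = ss_tot - ss_res
    df_model, df_res = p - 1, n - p
    ms_model, ms_res = ss_model / df_model, ss_res / df_res
    f_value = ms_model / ms_res
    model_p = float(stats.f.sf(f_value, df_model, df_res))

    # Per-term statistics from the coefficient covariance MSE * (X'X)^-1
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(ms_res * np.diag(xtx_inv))
    t_values = beta / se
    term_p = 2 * stats.t.sf(np.abs(t_values), df_res)

    # Adequacy metrics
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (ss_res / df_res) / (ss_tot / (n - 1))
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # diagonal of the hat matrix
    press = float(((resid / (1 - leverage)) ** 2).sum())  # leave-one-out residuals
    pred_r2 = 1 - press / ss_tot
    adeq_precision = float((fitted.max() - fitted.min()) / np.sqrt(p * ms_res / n))

    return {
        "coef": beta, "se": se, "t": t_values, "term_p": term_p,
        "F": f_value, "model_p": model_p,
        "R2": r2, "adj_R2": adj_r2, "pred_R2": pred_r2,
        "adeq_precision": adeq_precision,
    }
```

Coefficients come from `lstsq`, which is numerically safer than solving with an explicit inverse; the inverse of X'X is used only for standard errors and leverages, which assumes the model matrix has full column rank.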

Design Types

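Design | Typical Use | Run Count (k=3 unless noted, including 3 center points)
--- | --- | ---
Full Factorial (2^k) | Complete characterisation, ≤3 factors | 11 (2^3 = 8 corners + 3 center)
Fractional Factorial | Screening / characterisation, ≥4 factors | 11 for k=4 (2^(4−1) = 8 runs + 3 center)
Central Composite (CCD) | Optimisation with curvature | 17 (face-centred: 8 factorial + 6 axial + 3 center)
Box-Behnken | Optimisation, no extreme combinations | 15 (12 edge midpoints + 3 center)
Plackett-Burman | Screening, many factors (up to 19) | 12 (12-run design, up to 11 factors)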
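The run counts in the table can be reproduced from the coded (−1/0/+1) matrices below. This is a minimal sketch assuming 3 center points and a 2^(4−1) half fraction with generator D = ABC (resolution IV); the function names are hypothetical, not the `DesignBuilder` API, and runs are in standard order before randomisation. The pairwise Box-Behnken construction matches the standard design only for 3 to 5 factors.

```python
import itertools

import numpy as np


def full_factorial(k: int, center_points: int = 0) -> np.ndarray:
    """2^k corner runs in standard order, followed by center points."""
    corners = np.array(list(itertools.product([-1, 1], repeat=k)), dtype=float)
    return np.vstack([corners, np.zeros((center_points, k))])


def half_fraction_4(center_points: int = 3) -> np.ndarray:
    """2^(4-1) fractional factorial: full 2^3 in A, B, C with D = ABC."""
    base = full_factorial(3)
    d = base.prod(axis=1, keepdims=True)  # generator D = ABC
    return np.vstack([np.hstack([base, d]), np.zeros((center_points, 4))])


def face_centred_ccd(k: int, center_points: int = 3) -> np.ndarray:
    """Face-centred CCD (alpha = 1): corners + 2k axial points on the faces + centers."""
    axial = np.vstack([sign * np.eye(k)[i] for i in range(k) for sign in (-1.0, 1.0)])
    return np.vstack([full_factorial(k), axial, np.zeros((center_points, k))])


def box_behnken(k: int, center_points: int = 3) -> np.ndarray:
    """Every factor pair at +/-1 with the others at 0, then centers (standard for k = 3..5)."""
    runs = []
    for i, j in itertools.combinations(range(k), 2):
        for a, b in itertools.product([-1.0, 1.0], repeat=2):
            run = np.zeros(k)
            run[i], run[j] = a, b
            runs.append(run)
    return np.vstack([np.array(runs), np.zeros((center_points, k))])


for name, design in [
    ("Full factorial, k=3", full_factorial(3, center_points=3)),
    ("Fractional factorial, k=4", half_fraction_4()),
    ("Face-centred CCD, k=3", face_centred_ccd(3)),
    ("Box-Behnken, k=3", box_behnken(3)),
]:
    print(f"{name}: {len(design)} runs")  # prints 11, 11, 17 and 15 runs respectively
```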

REST API

All wizard functionality is available programmatically:

POST /api/v1/studies                        create study
GET  /api/v1/studies                        list studies
POST /api/v1/studies/<id>/factors           save factors
POST /api/v1/studies/<id>/responses         save responses
POST /api/v1/studies/<id>/design            generate design matrix
POST /api/v1/studies/<id>/run_data          save response data
POST /api/v1/studies/<id>/analyze           run OLS + ANOVA
GET  /api/v1/studies/<id>/analyses          get analysis results
POST /api/v1/design/recommend               get design recommendation
POST /api/v1/design/preview                 get run count preview
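For scripting, the endpoints above can be called with Python's standard library. This is a minimal sketch: the base URL and the create-study payload come from the curl examples in this README, while `api_request` and `send_json` are hypothetical helpers. The JSON response schema (for example, which field holds a new study's id) is not documented here, so the sketch does not assume one.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:5090/api/v1"  # default address when running run.py


def api_request(method: str, path: str, payload: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the /api/v1 endpoints."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    return req


def send_json(req: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the app to be running):
# study = send_json(api_request("POST", "/studies", {"name": "My Study", "objective": "optimization"}))
# studies = send_json(api_request("GET", "/studies"))
```

Separating request construction from sending keeps the payload logic testable without a running server.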

Architecture

DOEOptimizationAssistant/
├── doe/
│   ├── design_builder.py   # DesignBuilder: all 5 design types + coded/actual conversion
│   ├── statistics.py       # DOEStatistics: OLS fit, ANOVA, find_optimum; FitResult dataclass
│   └── wizard.py           # WizardStep enum, WizardValidator, per-step guidance text
├── doeassist/
│   ├── model.py            # SQLite persistence: studies, factors, responses, designs, run_data, analyses
│   ├── seed.py             # 2 pre-analyzed example studies
│   ├── app.py              # Flask app: wizard routes + REST API
│   └── templates/
│       ├── base.html           # dark theme, wizard progress bar, guidance box
│       ├── dashboard.html      # stats tiles + studies table
│       ├── wizard_step1.html   # objective selection
│       ├── wizard_step2.html   # factor definition (dynamic add/remove rows)
│       ├── wizard_step3.html   # response definition
│       ├── wizard_step4.html   # design selection with run count previews
│       ├── wizard_step5.html   # run matrix review (coded + actual)
│       ├── wizard_step6.html   # data entry + model type selection
│       └── results.html        # full ANOVA + coefficients + optimum
└── tests/
    ├── test_design_builder.py  # 23 tests covering all design types + coding helpers
    ├── test_statistics.py      # 18 tests for OLS fitting and optimum finding
    └── test_model.py           # 20 tests for SQLite layer + seed integration

Data flow:

User input → Wizard (6 steps) → SQLite
                                 ↓
                    DesignBuilder.build() → run matrix
                                 ↓
                    DOEStatistics.fit()  → FitResult (ANOVA)
                                 ↓
                    find_optimum()       → OptimumResult
                                 ↓
                    results.html         → ANOVA table + optimum
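The `find_optimum()` step (grid search over the coded space, then L-BFGS-B refinement) can be sketched for a full quadratic model as below. This assumes coefficients ordered as intercept, linear terms, two-factor interactions, then squared terms, and a search restricted to the coded box [−1, +1]^k. The term ordering, function names and grid density are assumptions, and the 95% prediction interval the app reports is not computed here.

```python
import itertools
from typing import Sequence, Tuple

import numpy as np
from scipy.optimize import minimize


def quadratic_terms(x: Sequence[float]) -> np.ndarray:
    """Full quadratic model row: 1, x_i, x_i*x_j (i < j), x_i^2."""
    x = np.asarray(x, dtype=float)
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return np.concatenate([[1.0], x, pairs, x ** 2])


def find_optimum_sketch(beta: np.ndarray, k: int, maximise: bool = True,
                        grid_points: int = 11) -> Tuple[np.ndarray, float]:
    """Grid search over [-1, 1]^k, then refine the best grid point with L-BFGS-B."""
    sign = -1.0 if maximise else 1.0  # minimize() minimises, so flip the sign to maximise

    def objective(x: Sequence[float]) -> float:
        return sign * float(quadratic_terms(x) @ beta)

    grid = np.linspace(-1.0, 1.0, grid_points)
    start = min(itertools.product(grid, repeat=k), key=objective)  # coarse global search
    res = minimize(objective, x0=np.array(start), method="L-BFGS-B",
                   bounds=[(-1.0, 1.0)] * k)  # local refinement inside the coded box
    return res.x, sign * float(res.fun)


# y = 10 + x1 - x1^2 - x2^2 has its maximum at x1 = 0.5, x2 = 0 (y = 10.25)
x_opt, y_opt = find_optimum_sketch(np.array([10, 1, 0, 0, -1, -1]), k=2)
```

The grid pass guards against L-BFGS-B settling in a poor local optimum or on the wrong face of the box; the bounded refinement then recovers an optimum that falls between grid points, such as x1 = 0.5 here.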

Test Cases

Test Case 1: Viral Vector Transduction Optimization (CCD, 3 factors)

Context: A cell and gene therapy team needs to maximise lentiviral transduction efficiency in T-cells.

Factors:

  • Seed Density: 0.3–1.0 ×10⁶ cells/mL
  • MOI (Multiplicity of Infection): 2–10
  • Transduction Time: 4–24 hours

Responses: Viable Transduced Cell Density (VCD), Viability %

Design: Face-centred CCD (α=1), 17 runs

Statistical model: Full Quadratic

Expected outputs:

  • Significant factors: MOI (+), Transduction Time (+), MOI² (−, diminishing returns)
  • R² > 0.95, Adequate Precision > 4
  • Optimum: MOI ≈ 8, Seed Density ≈ 0.5×10⁶ cells/mL, Time ≈ 18 h → predicted VCD ≈ 3.5×10⁶ cells/mL

Test Case 2: Media Screening Study (Plackett-Burman, 6 factors)

Context: A CHO cell culture team needs to identify which of 6 media components most strongly affect peak viable cell density (VCD) on day 7.

Factors: Glucose, Glutamine, Iron, Zinc, Copper, Vitamin B12

Response: Peak VCD (×10⁶/mL)

Design: 12-run Plackett-Burman + 3 center points (15 runs total)

Statistical model: Main Effects only (screening design)

Expected outputs:

  • Glucose and Glutamine identified as significant (large coefficients)
  • Iron moderate effect; Zinc, Copper, B12 not significant
  • R² > 0.90 for the main-effects model
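A 12-run Plackett-Burman matrix like the one in Test Case 2 can be built from the standard N = 12 generator row: 11 cyclic shifts followed by a final row of all −1. This is a sketch of the textbook construction, not necessarily how `DesignBuilder` generates it, and the function names are hypothetical. Because the ±1 columns are orthogonal and balanced, each main-effect coefficient in coded units is simply the column-response dot product divided by 12.

```python
import numpy as np

# Standard generator row for the 12-run Plackett-Burman design
PB12_GENERATOR = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])


def plackett_burman_12(n_factors: int, center_points: int = 0) -> np.ndarray:
    """12-run Plackett-Burman design for up to 11 two-level factors (coded -1/+1)."""
    if not 1 <= n_factors <= 11:
        raise ValueError("a 12-run Plackett-Burman design screens 1 to 11 factors")
    rows = [np.roll(PB12_GENERATOR, shift) for shift in range(11)]  # 11 cyclic shifts
    rows.append(-np.ones(11, dtype=int))  # final run: every factor at its low level
    design = np.array(rows)[:, :n_factors]  # keep the first n_factors columns
    return np.vstack([design, np.zeros((center_points, n_factors), dtype=int)])


def main_effects(design: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Main-effect coefficients for an orthogonal +/-1 design (center rows excluded)."""
    corner = np.any(design != 0, axis=1)
    return design[corner].T @ y[corner] / corner.sum()


# Test Case 2 layout: 6 media components, 12 runs + 3 center points = 15 runs
media_design = plackett_burman_12(6, center_points=3)
```

Main-effects-only estimates from this design are aliased with two-factor interactions, which is why the test case treats it strictly as a screening step.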

Installation & Running

# Install dependencies
pip install -r requirements.txt

# Run tests (61 tests, 0 failures)
python3 -m pytest tests/ -v

# Start the app (seeds 2 example studies on first run)
python3 run.py
# → Visit http://localhost:5090

Quick API example

# Create a study
curl -s -X POST http://localhost:5090/api/v1/studies \
  -H "Content-Type: application/json" \
  -d '{"name":"My Study","objective":"optimization"}' | python3 -m json.tool

# Save factors (set STUDY_ID to the id returned when the study was created)
STUDY_ID="your-study-id"
curl -s -X POST "http://localhost:5090/api/v1/studies/$STUDY_ID/factors" \
  -H "Content-Type: application/json" \
  -d '[{"name":"Temp","units":"C","low_val":30,"high_val":37},
       {"name":"pH","units":"","low_val":6.8,"high_val":7.4}]'

# Generate CCD design
curl -s -X POST "http://localhost:5090/api/v1/studies/$STUDY_ID/design" \
  -H "Content-Type: application/json" \
  -d '{"design_type":"ccd","center_points":3}'

Statistical Notes

Model adequacy thresholds (common rules of thumb; Design-Expert uses similar guidelines):

  • R² ≥ 0.80 for a good predictive model
  • Predicted R² within 0.20 of Adj-R² (a larger gap suggests overfitting or a problem with the model or data)
  • Adequate Precision > 4 (signal-to-noise ratio; above 4 the model can be used to navigate the design space)
  • Model p-value < 0.05 for significance

Center points serve two purposes: (1) estimate pure experimental error (replication), (2) test for curvature. Three is the recommended minimum.
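Both roles of center points can be made concrete with the standard single-degree-of-freedom curvature test: replicated center runs give a pure-error mean square, and the gap between the factorial-point mean and the center-point mean gives the curvature sum of squares, SS_curv = n_F·n_C·(ȳ_F − ȳ_C)² / (n_F + n_C). This is a minimal sketch assuming pure error comes only from the center replicates; `curvature_test` is a hypothetical name.

```python
from typing import Sequence, Tuple

import numpy as np
from scipy import stats


def curvature_test(y_factorial: Sequence[float],
                   y_center: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (MS pure error, SS curvature, F, p) from factorial and center-point responses."""
    yf = np.asarray(y_factorial, dtype=float)
    yc = np.asarray(y_center, dtype=float)
    nf, nc = len(yf), len(yc)
    if nc < 2:
        raise ValueError("at least two center points are needed to estimate pure error")
    # (1) Pure error: scatter of the replicated center runs, nc - 1 degrees of freedom
    ms_pure_error = float(((yc - yc.mean()) ** 2).sum() / (nc - 1))
    # (2) Curvature: single-df contrast between the factorial mean and the center mean
    ss_curvature = nf * nc * (yf.mean() - yc.mean()) ** 2 / (nf + nc)
    f_value = ss_curvature / ms_pure_error
    p_value = float(stats.f.sf(f_value, 1, nc - 1))
    return ms_pure_error, float(ss_curvature), float(f_value), p_value
```

With three center points the pure-error estimate has 2 degrees of freedom; with a single center point no pure-error estimate is possible at all.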

Confirmation runs: Always run at least 3 experiments at the predicted optimum settings before implementation. The tool reports the predicted response with a 95% prediction interval, but only confirmation runs turn a statistical prediction into experimental evidence.
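Confirmation results can be judged against the prediction at the optimum. As a minimal sketch, the standard OLS interval for the mean of `m` new runs at model row `x0` is ŷ₀ ± t(1 − α/2, n − p)·√(MSE·(1/m + x0ᵀ(XᵀX)⁻¹x0)). Whether the app uses m = 1 or averages over confirmation runs is not stated; `prediction_interval` and `confirmation_agrees` are hypothetical names, and `x0` must contain the intercept and every model term in the same order as the columns of `X`.

```python
from typing import Sequence, Tuple

import numpy as np
from scipy import stats


def prediction_interval(X, y, x0, m: int = 1,
                        alpha: float = 0.05) -> Tuple[float, float, float]:
    """(lower, predicted, upper) for the mean of m future runs at model row x0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = float(resid @ resid) / (n - p)
    leverage0 = float(x0 @ np.linalg.inv(X.T @ X) @ x0)  # x0' (X'X)^-1 x0
    y0 = float(x0 @ beta)
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    half_width = float(t_crit * np.sqrt(mse * (1.0 / m + leverage0)))
    return y0 - half_width, y0, y0 + half_width


def confirmation_agrees(runs: Sequence[float],
                        interval: Tuple[float, float, float]) -> bool:
    """True if the mean of the confirmation runs falls inside the interval."""
    lower, _, upper = interval
    return lower <= float(np.mean(runs)) <= upper
```

A confirmation mean outside the interval suggests the model is missing a term, the process has shifted, or the optimum lies outside the region the model describes well.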


Dependencies

Package | Version | Purpose
--- | --- | ---
Flask | ≥ 3.0 | Web framework / REST API
numpy | ≥ 1.24 | OLS fitting via lstsq, model matrix construction
scipy | ≥ 1.10 | F- and t-distribution CDFs, optimisation via minimize
pytest | ≥ 8.0 | Test runner (61 tests, 0 failures)

Python ≥ 3.9 required. SQLite is used for persistence (stdlib, no separate install needed).
