Disclaimer: Flagship is experimental software. APIs, workflows, and behavior may change without notice, and it is not recommended for production-critical use without independent validation.
Flagship is an Agent Skill (skills/flagship/SKILL.md) that helps teams run product experiments faster and with less risk by using feature flags and cohort analysis.
Given a budget, an agent using this skill runs the experiment, analyzes cohort results, and keeps the option that performs best.

    flowchart TD
        A["Define experiment (objective, KPI, budget)"] --> B["Create initial change and open PR"]
        B --> C["Human review and merge"]
        C --> D["Runtime serves variants and records events in PostHog"]
        D --> E["Daily analyze loop runs metrics + policy gates"]
        E --> F{"Decision"}
        F -->|"HOLD / SHIP"| D
        F -->|"ITERATE"| G["Generate follow-up PR and merge"]
        G --> D

Flagship helps you test product ideas without ad-hoc scripts. It combines cohort data from PostHog (via MCP), deterministic safety gates, and Codex-generated treatment updates. The result is a repeatable loop: measure, decide, propose, review, and merge. You keep control through budgets, guardrails, and normal PR review.
- Create a GitHub Environment named `flagship`.
- Add secrets: `OPENAI_API_KEY`, `POSTHOG_API_KEY`, `POSTHOG_MCP_URL`.
- Create an experiment in Codex: `Use the flagship skill and create a new flagship experiment for onboarding activation.`
- Commit the generated files:
  - `.flagship/experiments/<experiment_id>.yaml`
  - `.flagship/state/<experiment_id>.yaml`
- Generate or update `.github/workflows/flagship-loop.yml` from the template and run it once manually.
- Review the PR on `codex/flagship/<experiment_id>`, then merge when ready.

- Use PostHog as source of truth for exposure assignment and experiment results.
- Use repo manifest/state as source of truth for budget, guardrails, and automation state.
- Reuse an existing feature-flag provider when detected in the repo.
- Default to PostHog only when no provider is clearly present.
- Force `HOLD` when critical settings drift between manifest and PostHog.
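
The drift rule above can be illustrated with a short Python sketch. It is not Flagship's actual implementation: the list of "critical" keys, the function names, and the shape of the two dictionaries are all assumptions made for illustration, since the source does not say which settings count as critical.

```python
from typing import Any, List, Mapping, Sequence

# Hypothetical list of critical settings; the real set is not specified here.
CRITICAL_KEYS = ("flag_key", "primary_kpi", "rollout_percent")


def find_drift(
    manifest: Mapping[str, Any],
    remote: Mapping[str, Any],
    keys: Sequence[str] = CRITICAL_KEYS,
) -> List[str]:
    """Return the critical keys whose values differ between the repo
    manifest and the settings read back from PostHog."""
    return [k for k in keys if manifest.get(k) != remote.get(k)]


def apply_drift_gate(proposed_action: str, drifted: Sequence[str]) -> str:
    """Force HOLD whenever at least one critical setting has drifted."""
    return "HOLD" if drifted else proposed_action


manifest = {
    "flag_key": "onboarding.copy_variant",
    "primary_kpi": "activation_24h_rate",
    "rollout_percent": 50,
}
remote = dict(manifest, rollout_percent=100)  # someone edited the flag in PostHog

drifted = find_drift(manifest, remote)
action = apply_drift_gate("SHIP", drifted)
```

Comparing key by key keeps the check deterministic and makes it easy to name the drifted settings in the daily report.
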

- `.flagship/experiments/`: experiment manifests
- `.flagship/state/`: runtime state per experiment
- `.flagship/reports/`: daily run reports
- `.flagship/ledger/`: decision history
- `.github/workflows/`: scheduled loop workflow

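
As a rough illustration of this layout, the following Python sketch resolves the per-experiment files from an experiment ID. The helper name `experiment_paths` and its return shape are hypothetical. Only the manifest and state file names follow the `<experiment_id>.yaml` pattern stated above; report and ledger file naming is not specified, so only their directories are returned.

```python
from pathlib import PurePosixPath
from typing import Dict


def experiment_paths(experiment_id: str, root: str = ".") -> Dict[str, PurePosixPath]:
    """Build repo-relative paths for one experiment (hypothetical helper)."""
    base = PurePosixPath(root) / ".flagship"
    return {
        "manifest": base / "experiments" / f"{experiment_id}.yaml",
        "state": base / "state" / f"{experiment_id}.yaml",
        # File naming inside these directories is not documented here.
        "reports_dir": base / "reports",
        "ledger_dir": base / "ledger",
    }


paths = experiment_paths("onboarding-copy-v1")
```
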
- Validate manifest and state.
- Read cohort metrics through PostHog MCP.
- Evaluate policy gates (sample size, guardrails, budget).
- Propose treatment iteration only when final action is `ITERATE`.
- Write report + ledger entry and open/update the PR.
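
The gate evaluation step in the loop above might look roughly like the Python sketch below. It rests on several assumptions that are not part of the source: the `CohortMetrics` fields, the `min_sample_size` and `min_lift` thresholds, the simple "lift over control" rule for choosing between `SHIP` and `ITERATE`, and the `STOP` label used for an exhausted budget. The real rules live in the policy-gates reference.

```python
from dataclasses import dataclass


@dataclass
class CohortMetrics:
    """Hypothetical summary of what the loop reads from PostHog."""
    control_n: int
    treatment_n: int
    control_kpi: float
    treatment_kpi: float
    guardrails_ok: bool


def decide(
    metrics: CohortMetrics,
    budget_remaining_usd: float,
    min_sample_size: int = 1000,  # assumed threshold
    min_lift: float = 0.02,       # assumed absolute KPI lift needed to ship
) -> str:
    """Apply the gates in a fixed order so the outcome is deterministic."""
    if budget_remaining_usd <= 0:
        return "STOP"  # assumed label; budget is exhausted
    if min(metrics.control_n, metrics.treatment_n) < min_sample_size:
        return "HOLD"  # not enough data in at least one cohort
    if not metrics.guardrails_ok:
        return "HOLD"  # a guardrail metric failed
    lift = metrics.treatment_kpi - metrics.control_kpi
    return "SHIP" if lift >= min_lift else "ITERATE"


healthy = CohortMetrics(2000, 2000, 0.30, 0.35, True)
decision = decide(healthy, budget_remaining_usd=500.0)
```

Checking budget first, then sample size, then guardrails means a treatment proposal is only considered once every safety gate has passed.
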

- Create + initial A/B code PR: `docs/diagrams/01-create-and-initial-pr.md`
- Live runtime cohort routing: `docs/diagrams/02-live-runtime-cohorts.md`
- GitHub Actions analyze loop: `docs/diagrams/03-github-actions-analyze-loop.md`
- Iterate PR loop: `docs/diagrams/04-iterate-pr-loop.md`

Use small manifests in README and keep full schema details in references.

    # .flagship/experiments/onboarding-copy-v1.yaml
    experiment_id: onboarding-copy-v1
    primary_kpi: activation_24h_rate
    max_budget_usd: 1000
    feature_flag:
      provider: posthog
      key: onboarding.copy_variant
    status: active

    # .flagship/state/onboarding-copy-v1.yaml
    spent_usd_total: 0
    budget_remaining_usd: 1000
    last_decision: HOLD
    open_pr_number: null

- Keep `objective`, `primary_kpi`, and `max_budget_usd` immutable after creation.
- Stop immediately when budget remaining is `<= 0`.
- Hold when sample size is insufficient or any guardrail fails.
- Ship repository changes through PRs only.
- Track each run in reports and ledger for auditability.
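
Two of these rules, immutable manifest fields and budget tracking, can be sketched in Python as follows. The field names come from the manifest and state examples in this README; the function names `immutable_violations` and `record_spend` are hypothetical, and the sketch assumes manifests and state have already been loaded into plain dictionaries.

```python
from typing import Any, Dict, List, Mapping

IMMUTABLE_FIELDS = ("objective", "primary_kpi", "max_budget_usd")


def immutable_violations(
    original: Mapping[str, Any], updated: Mapping[str, Any]
) -> List[str]:
    """List the immutable fields whose values changed after creation."""
    return [f for f in IMMUTABLE_FIELDS if original.get(f) != updated.get(f)]


def record_spend(
    state: Mapping[str, Any], cost_usd: float, max_budget_usd: float
) -> Dict[str, Any]:
    """Return a new state dict with spend and remaining budget updated.

    The input state is left untouched so the previous run stays auditable.
    """
    spent = state["spent_usd_total"] + cost_usd
    return {
        **state,
        "spent_usd_total": spent,
        "budget_remaining_usd": max_budget_usd - spent,
    }


original = {
    "objective": "improve onboarding activation",
    "primary_kpi": "activation_24h_rate",
    "max_budget_usd": 1000,
}
edited = dict(original, max_budget_usd=2000)
violations = immutable_violations(original, edited)

state = {
    "spent_usd_total": 0,
    "budget_remaining_usd": 1000,
    "last_decision": "HOLD",
    "open_pr_number": None,
}
new_state = record_spend(state, cost_usd=250, max_budget_usd=1000)
```

A loop built this way would stop as soon as `budget_remaining_usd` in the new state is `<= 0`.
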

- Skill workflow: `skills/flagship/SKILL.md`
- Experiment schema: `skills/flagship/references/experiment-schema.md`
- Policy gates: `skills/flagship/references/policy-gates.md`
- PostHog MCP queries: `skills/flagship/references/posthog-mcp-queries.md`
- Provider + hybrid model: `skills/flagship/references/provider-and-hybrid.md`
- Advanced local scripts: `skills/flagship/scripts/`