velvetmonkey/flywheel-concept


Flywheel Concept

A falsifiable study of whether cross-model activations reveal structured concept geometry — or shared training-corpus artefact.

Status: v0.1 working draft, post-Trial-2 (2026-05-10). RESHAPE-NARROW is active; the repo freeze is pending a council pass. The sibling flywheel-geometry Method 6 falsifier resolved FAIL on 2026-05-10 (full audit: trial2-postmortem.md). Concept's gate-spec then went through two turns on the same day: turn 1 resolved to stay the course, and turn 2 superseded it with RESHAPE-NARROW. The active gate-spec is in the vault at tech/flywheel/flywheel-concept-falsification-gate.md (Resolution v2 section).

The narrowed bridge claim, the Bet 0 BM25 safety floor, the three-control decision rule (B1 + B2 + C1 lexical/corpus-frequency, all simultaneously), and the panel of 6 structured concept families are not yet reflected in the body of this README or in docs/pre-registration.md. Those updates ship at the v0.1-prereg-frozen freeze tag, after a council pass on the README and pre-registration doc rewrites. Until then, the body below describes the original v0.1-09 spec; the active spec is the vault gate-spec named above.

A telescope's value is judged by what it reveals about stars, not by what it tells you about itself. Neural networks reveal traces — activations, behaviours, distances. The science is measuring which traces faithfully reveal latent concept structure, and which are artefacts of the instrument.


What this is

Flywheel Concept is a research programme. Not a product, not a benchmark, not a model evaluator, not a cosmology. Recent interpretability work — Goodfire AI's manifold-steering programme (Lubana et al., 2025–2026), the Platonic Representation Hypothesis (Huh et al., 2024), Hindupur–Lubana–Fel–Ba's Projecting Assumptions (NeurIPS 2025) — shows neural networks across architectures encode meaning on curved manifolds and converge on shared latent geometry without coordinated training. The question Concept addresses is the next one: when is that convergence revealing real structure, and when is it shared training-corpus artefact?

The work is vault-first and pre-pilot. When the pilot ships, numbers will be published regardless of outcome.


How we'll know this works

Concept commits, before any experiment runs, to a single bridge claim:

Pre-registered cross-model latent alignment under structural transforms predicts task transfer with incremental coefficient of determination ΔR² ≥ 0.10, bootstrap 95% CI excluding 0, holding on ≥2 of 3 task domains AND on the code-heavy holdout model.

Full metric definitions, bootstrap procedure, and unit-of-analysis discussion live in benchmark/metrics.md. Protocol freeze rules are in docs/pre-registration.md.
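
For orientation, the incremental coefficient of determination in the bridge claim can be read in the standard way: the gain in explained variance of task transfer when the alignment score is added to a baseline predictor. The sketch below assumes that reading. The symbols $y_i$ (observed transfer), $\hat{y}_i$ (a model's prediction), $\bar{y}$ (mean observed transfer), and the subscript names are illustrative, not taken from benchmark/metrics.md, which remains the authoritative definition.

```latex
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
\Delta R^2 = R^2_{\text{baseline} + \text{alignment}} - R^2_{\text{baseline}},
\qquad
\text{claim: } \Delta R^2 \ge 0.10 .
```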

Tasks (three relational-structure domains, all literature-grounded — full scope in benchmark/tasks.md):

  • BATS semantic subsets (Gladkova et al. 2016) — relational linguistic structure. Lexicographic + encyclopedic subsets only; inflectional and derivational morphology subsets excluded as token-level rather than concept-level.
  • WordNet taxonomic distance — hierarchical concept structure.
  • Color-circumplex ordering — perceptual concept geometry, the canonical curved-manifold case.
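
Two of these domains come with a natural ground-truth distance: path length in a hierarchy and angular separation on a circle. The following is a minimal sketch of both, assuming a toy taxonomy and hue angles in degrees. The names `TOY_PARENT`, `tree_distance`, and `hue_distance` are hypothetical; the actual task scoring is defined in benchmark/tasks.md and may differ.

```python
from typing import Dict, List

# Toy child -> parent taxonomy (illustrative only, not real WordNet data).
TOY_PARENT: Dict[str, str] = {
    "dog": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Number of edges on the path from a to b through their lowest common ancestor."""
    steps_from_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for steps_from_a, n in enumerate(ancestors(a, parent)):
        if n in steps_from_b:
            return steps_from_a + steps_from_b[n]
    raise ValueError("nodes share no ancestor")


def hue_distance(h1: float, h2: float) -> float:
    """Shortest angular separation between two hues, in degrees (0 to 180)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)
```

The circular case is the one where a purely linear probe is expected to struggle: hues at 350 and 10 degrees are close on the circle but far apart on a line.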

Models (five open-weight models, ≥2 model families, ≥2 training distributions — full table in benchmark/models.md):

  • Llama 3.1 8B (Meta, RedPajama-style corpus)
  • Gemma 2 9B (Google, proprietary public-corpus mix)
  • Pythia 12B (EleutherAI, the Pile)
  • Qwen 2.5 Coder 7B (Alibaba, code-heavy corpus) — cross-distribution stress test
  • Mistral 7B (Mistral, mixed corpus)

The Qwen-Coder run is a cross-distribution stress holdout, not an independently decisive falsifier. A code-heavy model still has substantial natural-language pretraining (documentation, comments, package names, tutorials), so alignment success does not prove cross-corpus concept geometry, and alignment failure does not prove the effect is corpus artefact (could be tokenization, scale, post-training, layer choice, or task mismatch). Outcome is diagnostic and feeds the interpretation; it is not on its own the gate.

Baselines (two, both literature-grounded):

  • B1 — Raw per-model probe: per-model linear/MLP probe on residual stream. The artefact-y baseline Concept claims to beat.
  • B2 — Cross-model linear probe transfer: train probe on model A, evaluate on model B's activations after a best-fit linear map. Literature-grounded (Conneau et al. cross-lingual transfer; Bansal et al. model stitching).
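
The B2 baseline can be sketched as follows: fit a linear map from model B's activation space into model A's on a set of shared items, then apply a probe trained on A to the mapped activations. This is a minimal pure-Python illustration that assumes ordinary least squares as the "best-fit" map and an already-trained linear probe. The function names and the toy 2-D activations are hypothetical, and the pre-registered procedure may use a different fit (for example ridge or orthogonal Procrustes).

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; use more (or more varied) items")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(acts_b: Matrix, acts_a: Matrix) -> Matrix:
    """Least-squares W with acts_b @ W ~ acts_a, via the normal equations."""
    bt = transpose(acts_b)
    return solve(matmul(bt, acts_b), matmul(bt, acts_a))


def probe_predict(acts: Matrix, weights: List[float], bias: float) -> List[float]:
    """Apply a linear probe (assumed to have been trained on model A)."""
    return [sum(x * w for x, w in zip(row, weights)) + bias for row in acts]


# Toy data: model B's activations are a rotated, rescaled copy of model A's.
acts_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
acts_b = matmul(acts_a, [[0.0, 2.0], [-2.0, 0.0]])
b_to_a = fit_linear_map(acts_b, acts_a)
transferred = probe_predict(matmul(acts_b, b_to_a), [1.0, -1.0], 0.5)
```

In this toy case the two spaces differ by an exact linear transform, so the transferred probe reproduces its predictions on A. In a real run the map would presumably be fit on items held out from the evaluation set, so that the baseline is not scored on the same items it was aligned on.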

Decision rule (precise version in benchmark/metrics.md): ΔR² ≥ 0.10 over both B1 and B2, bootstrap 95% CI (BCa, 1000 resamples, clustered by concept-item / relation-group / model-pair) excluding 0, on ≥2 of 3 domains AND on the Qwen-Coder cross-distribution holdout. Any post-hoc parameter selection or task-set change after the protocol-freeze tag is automatic refutation. Pre-registration in Flywheel Ideas lands when the protocol freezes.
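
The shape of this rule can be sketched in code. The example below is an illustration under stated assumptions, not the pre-registered implementation: it uses a plain percentile bootstrap rather than BCa, resamples a single level of clusters, and computes ΔR² from per-item predictions of a baseline-only and a baseline-plus-alignment model. All function names and the row layout are hypothetical; benchmark/metrics.md is authoritative.

```python
import random
from typing import Dict, List, Sequence, Tuple

# One row per evaluated item:
# (observed transfer, baseline-only prediction, baseline + alignment prediction)
Row = Tuple[float, float, float]
Estimate = Tuple[float, float, float]  # (point estimate, CI low, CI high)


def r_squared(y: Sequence[float], pred: Sequence[float]) -> float:
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, pred))
    return 1.0 - ss_res / ss_tot


def delta_r2(rows: Sequence[Row]) -> float:
    """R^2 of the full model minus R^2 of the baseline-only model."""
    y = [r[0] for r in rows]
    return r_squared(y, [r[2] for r in rows]) - r_squared(y, [r[1] for r in rows])


def cluster_bootstrap_ci(clusters: Dict[str, List[Row]], n_resamples: int = 1000,
                         alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile CI for delta_r2, resampling whole clusters with replacement."""
    rng = random.Random(seed)
    ids = sorted(clusters)
    stats = []
    for _ in range(n_resamples):
        rows = [row for cid in rng.choices(ids, k=len(ids)) for row in clusters[cid]]
        if len({row[0] for row in rows}) > 1:  # skip resamples with constant y
            stats.append(delta_r2(rows))
    if not stats:
        raise ValueError("every resample had zero variance in the outcome")
    stats.sort()
    low = stats[int(alpha / 2 * (len(stats) - 1))]
    high = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return low, high


def beats_baseline(est: Estimate, threshold: float = 0.10) -> bool:
    point, ci_low, ci_high = est
    return point >= threshold and not (ci_low <= 0.0 <= ci_high)


def domain_passes(vs_b1: Estimate, vs_b2: Estimate) -> bool:
    """A domain passes only if the claim beats both B1 and B2."""
    return beats_baseline(vs_b1) and beats_baseline(vs_b2)


def gate(domain_results: Dict[str, bool], holdout_passes: bool) -> bool:
    """At least 2 of the 3 domains must pass, AND the holdout must pass."""
    return sum(domain_results.values()) >= 2 and holdout_passes
```

Resampling whole clusters rather than individual rows keeps correlated items (the same concept or the same model pair) together, which is why the interval is usually wider than a row-level bootstrap would give.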

If the bridge claim fails to beat both baselines on the agreed criteria, the negative result is the launch artefact.


What this is not

  • Not a leaderboard. The benchmark exists only to falsify the bridge claim. There is no ranking of models by score.
  • Not the Universal Semantic Coordinate System. USCS is held in docs/philosophy.md as the upper bound this work could become; the bridge claim is the necessary condition, not the conclusion.
  • Not a finance product. No finance-domain experiments before the method is proven on cyclic time, scalar order, and concept-heavy taxonomies.
  • Not a cosmological claim about reality. The cosmological reading lives in docs/philosophy.md, owned and linked, not on this front door.
  • Not a rotator-tier claim. Rotator usefulness — can we move along the geometry, can behaviour follow geodesic interventions — is Tier 3 of the model-comparison ladder. Concept is Tier 1: instrument fidelity.

How this slots in

Concept is the umbrella for Geometry's research lane. Geometry's adversarial-screen discipline produced the falsified probe that motivated Concept. Concept does not replace Geometry; it is what Geometry's empirical floor was pointing at.

The Flywheel Suite

  • flywheel-memory — local-first MCP server. Hybrid BM25 + semantic search, knowledge graph, safe writes over an Obsidian vault.
  • flywheel-crank — Obsidian plugin. Visual layer over Memory's graph: sidebar, vault health, semantic search UI.
  • flywheel-ideas — falsifiable decision ledger. Pre-registered assumptions, multi-model AI council dissent, outcome-driven refutation propagation.
  • flywheel-geometry — geodesic retrieval extension. Pre-registered study of cross-domain bridge-finding via activation manifolds.
  • flywheel-concept (this repo) — research programme on whether cross-model activations reveal structured concept geometry.

Status

Draft pre-registration. No pilot run yet. This repo holds:

  • The bridge claim, the task list, the model list, the baseline list, and the decision rule (above).
  • The protocol document at docs/pre-registration.md. Currently DRAFT; it moves to FROZEN at a tagged commit when the user signs off.
  • The evidence record at evidence/cheap-probe-360/ — the failed introspective-probe sweep that motivated this programme. Mirrored from flywheel-geometry.
  • No baseline numbers, no pilot results, no claim of empirical outcome.

Concept follows Flywheel Geometry: Geometry's pilot lands first, Concept's protocol freezes after it, and pre-registration in Flywheel Ideas happens at protocol freeze.


Evidence

The programme stands on one piece of preserved evidence at evidence/cheap-probe-360/: the 360-call adversarial sweep on @slashreboot's introspective coordinate elicitation probe (12 concepts × 6 prompt variants × 5 runs, against claude-sonnet-4-6, 2026-05-08). The pre-registered pass criterion was a Spearman rank correlation of ρ > 0.5 between the core variant and each adversarial variant. All four screens failed (A: 0.078, B: 0.290, C: 0.159, E: 0.108). The pre-registered failure-signal trap (variant D, coherence-pressure framing) fired at ρ = 0.759, with 18× the core variant's ground-truth pair separation. Two load-bearing assumptions in Flywheel Ideas were refuted: asm-HvE9muhM and asm-VotY4n8g, both parented to idea idea-b4ZeRCoa.
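
The pass criterion is simple enough to restate in code. Below is a minimal sketch, assuming Spearman's ρ is computed as the Pearson correlation of average ranks (the usual tie-handling convention). The function names are hypothetical, and the ρ values are the ones reported above, not recomputed from the raw sweep.

```python
from typing import Dict, List, Sequence

PASS_THRESHOLD = 0.5  # pre-registered criterion: rho must exceed this


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def screen_passes(rho: float) -> bool:
    return rho > PASS_THRESHOLD


# Reported correlations between the core variant and each adversarial screen.
REPORTED_SCREENS: Dict[str, float] = {"A": 0.078, "B": 0.290, "C": 0.159, "E": 0.108}
failed = [name for name, rho in REPORTED_SCREENS.items() if not screen_passes(rho)]
```

With the reported values, `failed` contains all four screens, matching the outcome stated above.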

The bridge claim above is the next assumption in that lineage. The chain itself is the work.


Credit & Collaboration

Concept exists because of work that is not Concept's:

  • Goodfire AI — Ekdeep Singh Lubana, Thomas Fel, @GoodfireAI — for the manifold-steering programme that established representation-and-behaviour geometry as evidence of structure in data.
  • @slashreboot — Matthew — for publishing the introspective coordinate elicitation probe with full chain-of-thought traces, the data that made the falsification possible.
  • Hindupur, Lubana, Fel & Ba — Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry (NeurIPS 2025, arXiv:2503.01822) — for the instrument-fidelity framing.
  • Anthropic NLA team — for the Natural Language Autoencoders publication that defined the cleanest available baseline for introspective decoding.
  • BetaTomorrow / Deep Manifold — Single Token Geometry 04 — for the framing of activations and behaviour as traces of boundary-conditioned neural computation, not the manifold itself.
  • Huh, Cheung, Wang & Isola — for the Platonic Representation Hypothesis.

Coauthorship on derivative academic work belongs upstream.


Philosophy

The cosmological reading, the Universal Semantic Coordinate System reframe, and the four-piece suite headline are held in docs/philosophy.md. Owned, not hidden. Read it for the story so far. Read this README for what is being claimed and how it could be wrong.


Related repositories

Part of the Flywheel suite — local-first knowledge infrastructure over a plain-markdown Obsidian vault:

  • vault-core — Shared infrastructure for the Flywheel ecosystem.
  • flywheel-memory — Persistent knowledge-graph memory MCP server: semantic search, read, and write over your vault.
  • flywheel-crank — Desktop window into your vault's Flywheel MCP server.
  • flywheel-gravity — A compressed, reality-filtered context field over a vault.
  • flywheel-ideas — Local-first decision ledger: falsifiable bets, accepted outcomes, reusable lessons.
  • mega-monkey — Telegram-native AI research cockpit over an Obsidian vault.
  • roundtable — Local MCP server for delegating tasks to multiple AI models.

Research and experiments:

  • flywheel-concept (this repo) — A falsifiable study of cross-model concept geometry.
  • flywheel-geometry — A pre-registered study of cross-domain knowledge retrieval.
  • flywheel-universe — Lean 4 / Mathlib-verified core of the descent argument.
  • flywheel-velvetgram — Local widescreen Telegram reader for long-form reading.

Verified-cognition demo: mcp-seal (verified MCP approval gate) and canary (the seal demo host).

License

Apache 2.0. See LICENSE.
