Computational linguistics analysis of AI-generated text.
Modeltell systematically measures the lexical and syntactic fingerprints of large language models. Unlike word-level "AI detector" lists, this project analyzes grammatical constructions - the sentence structures, rhetorical patterns, and compositional habits that define each model's linguistic identity.
Everyone knows LLMs overuse "delve" and "leverage." That's table stakes. Modeltell goes deeper:
- Syntactic pattern detection: Tricolon frequency, hedging constructions, pseudo-inclusive openers ("Whether you're X or Y"), em-dash dramatic pivots, power verb stacking density
- Structural analysis: Opener/closer classification, bullet-to-prose ratios, formatting density, sentence length variance
- Cross-model fingerprinting: Radar profiles that reveal each model's unique combination of linguistic habits
- Evolution tracking: How do these patterns shift across model versions? Are models converging or diverging?
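Two of the structural measures listed above, sentence length variance and bullet-to-prose ratio, can be sketched in a few lines. This is a minimal illustration, not the code in src/analysis/: the function names, the crude sentence splitter, and the choice of population variance are all assumptions.

```typescript
// Hypothetical helpers for illustration; names are not taken from the repo.

/** Split on runs of text ending in ., ! or ? (a crude heuristic). */
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) || [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

/** Population variance of sentence lengths, measured in words. */
function sentenceLengthVariance(text: string): number {
  const lengths = splitSentences(text).map((s) => s.split(/\s+/).length);
  if (lengths.length === 0) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const squared = lengths.reduce((acc, n) => acc + (n - mean) * (n - mean), 0);
  return squared / lengths.length;
}

/** Bullet lines divided by non-bullet, non-empty lines. */
function bulletToProseRatio(markdown: string): number {
  const lines = markdown
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  const bullets = lines.filter((l) => /^([-*+]|\d+\.)\s+/.test(l)).length;
  const prose = lines.length - bullets;
  if (prose === 0) return bullets > 0 ? Infinity : 0;
  return bullets / prose;
}
```

Low variance means uniformly sized sentences; a high ratio means the output leans on lists rather than paragraphs.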
prompts/catalog.json 30 standardized content generation prompts
patterns/definitions.json 15 syntactic patterns + lexical watchlists
src/runner/ Multi-provider API runner (3 runs per prompt per model)
src/analysis/ Lexical (cross-model TF-IDF, Burrows's Delta) + Syntactic (regex patterns)
src/publish/ Transforms analysis → community JSON (published/)
src/cli/ Zero-dependency content linter (npm run check)
scripts/mock-run.ts Synthetic dataset generator (no API keys needed)
data/{run-id}/ Raw outputs + analysis results per run
published/ Versioned community dataset (the static JSON API)
frontend/ Scrollytelling visualization (React + Vite + D3)
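The layout above names Burrows's Delta as one of the lexical measures. As a rough sketch of the textbook technique (not the code in src/analysis/): z-score each word's relative frequency across all texts, then average the absolute z-score differences between two texts. The function names, the tokenizer, and the use of population standard deviation are assumptions made for illustration.

```typescript
type Freqs = Record<string, number>;

/** Relative word frequencies for one text (simple ASCII tokenizer, an assumption). */
function relativeFrequencies(text: string): Freqs {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  // Prototype-free object so words like "constructor" are counted safely.
  const freqs: Freqs = Object.create(null);
  for (const t of tokens) freqs[t] = (freqs[t] || 0) + 1;
  for (const w of Object.keys(freqs)) freqs[w] /= tokens.length;
  return freqs;
}

/** Mean absolute difference of z-scores between profiles a and b over vocab. */
function burrowsDelta(profiles: Freqs[], a: number, b: number, vocab: string[]): number {
  let total = 0;
  let used = 0;
  for (const w of vocab) {
    const values = profiles.map((p) => p[w] || 0);
    const mean = values.reduce((x, y) => x + y, 0) / values.length;
    const variance =
      values.reduce((acc, v) => acc + (v - mean) * (v - mean), 0) / values.length;
    const sd = Math.sqrt(variance);
    if (sd === 0) continue; // the word does not vary across texts, so it carries no signal
    total += Math.abs((values[a] - mean) / sd - (values[b] - mean) / sd);
    used++;
  }
  return used === 0 ? 0 : total / used;
}
```

A Delta of 0 means the two profiles are indistinguishable on the chosen vocabulary; larger values mean more distinct word usage.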
| Tier | Models |
|---|---|
| Frontier | Claude Opus 4.8 / 4.7 / 4.6, GPT-5.5, Gemini 3.1 Pro |
| Mid | Claude Sonnet 4.6, GPT-5.4 Mini, Gemini 3.5 Flash |
| Open Source | Llama 4 Maverick, DeepSeek V4 Pro / V3.2, Mistral Large (2512), Qwen3.7 Max |
The current registry lives in src/runner/models.ts (13 models, each with a family and version for drift tracking). Adding a model takes one entry there.
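For orientation, a registry entry might look roughly like the sketch below. The real shape is whatever src/runner/models.ts defines; every field name here is an assumption, apart from the family/version pair, the three tiers, and the "openrouter" provider value mentioned in this README. The Opus ids are made up in the style of the ids used with ONLY_MODELS.

```typescript
type Tier = "frontier" | "mid" | "opensource";

// Hypothetical entry shape; check src/runner/models.ts for the real one.
interface ModelEntry {
  id: string; // e.g. "gpt-5.5"
  family: string; // groups versions of one model line for drift tracking
  version: string;
  tier: Tier;
  provider: string; // "openrouter" in the default registry
}

const REGISTRY: ModelEntry[] = [
  { id: "claude-opus-4.8", family: "claude-opus", version: "4.8", tier: "frontier", provider: "openrouter" },
  { id: "claude-opus-4.6", family: "claude-opus", version: "4.6", tier: "frontier", provider: "openrouter" },
  { id: "gpt-5.5", family: "gpt", version: "5.5", tier: "frontier", provider: "openrouter" },
];

/** Versions of one family in ascending order, as a drift view would need them. */
function versionsOfFamily(registry: ModelEntry[], family: string): string[] {
  return registry
    .filter((m) => m.family === family)
    .map((m) => m.version)
    .sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
}
```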
| Pattern | What It Detects | Severity |
|---|---|---|
| Tricolon | "innovative, scalable, and transformative" | High |
| Whether-You're | "Whether you're a startup or enterprise..." | Critical |
| Hedging Opener | "It's worth noting that..." | High |
| Not-Just-But | "not just a tool, but a partner" | High |
| In-Today's-Landscape | "In today's fast-paced digital landscape..." | Critical |
| Em-Dash Pivot | "one platform - endless possibilities" | Medium |
| Power Verb Stacking | "drive, boost, and accelerate" | High |
| Future-Forward Closing | "As we move forward..." | High |
| ... and 7 more | See patterns/definitions.json | |
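To make the table concrete, here is a minimal sketch of regex-based detection for three of the patterns. The expressions, ids, and object shape are illustrative assumptions; the real definitions live in patterns/definitions.json and will differ.

```typescript
// Hypothetical pattern definitions; not copied from patterns/definitions.json.
interface PatternDef {
  id: string;
  source: string;
  flags: string;
}

const PATTERNS: PatternDef[] = [
  { id: "whether_youre", source: "\\bwhether you['’]re\\b", flags: "gi" },
  { id: "not_just_but", source: "\\bnot just\\b[^.!?]{1,60}?\\bbut\\b", flags: "gi" },
  { id: "in_todays_landscape", source: "\\bin today['’]s\\b[^.!?]{0,40}?\\blandscape\\b", flags: "gi" },
];

/** Count matches of every pattern in a text. */
function countPatterns(text: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const p of PATTERNS) {
    const matches = text.match(new RegExp(p.source, p.flags));
    counts[p.id] = matches ? matches.length : 0;
  }
  return counts;
}

/** Normalize a raw count to hits per 1,000 whitespace-separated words. */
function perThousandWords(count: number, text: string): number {
  const words = (text.match(/\S+/g) || []).length;
  return words === 0 ? 0 : (count * 1000) / words;
}
```

Normalizing by length keeps long and short outputs comparable; the per-1,000-words unit is a choice made for this sketch.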
# Install
npm install
# Configure keys: copy and edit .env (auto-loaded by the runner)
cp .env.example .env

Keys: set OPENROUTER_API_KEY. The default registry routes all 13 models
through OpenRouter's single OpenAI-compatible API, so that one key is all you
need. Open models are provider-pinned (and the serving backend is recorded in
each result) for reproducibility.
The runner also ships direct-provider adapters (callAnthropic, callOpenAI, …), but they're only used if you change a model's provider in src/runner/models.ts away from "openrouter". With the default registry the direct keys (ANTHROPIC_API_KEY, …) are not consulted.
# Run generation (all models you have keys for)
npm run run:all
# Fast smoke run: a few models, 1 generation per prompt
RUNS_PER_PROMPT=1 ONLY_MODELS=claude-sonnet-4.6,gpt-5.5 npm run run:all
# German corpus instead of English (prompts, patterns, stopwords)
LOCALE=de npm run run:all
# Convenience model groups (just preset ONLY_MODELS filters - everything
# still runs through OpenRouter, these don't hit providers directly)
npm run run:anthropic # the Claude models (run:openai, run:google, run:opensource too)
# Analyze results (set LOCALE=de for German lexical stopwords)
npm run analyze:lexical -- data/<run-id>
npm run analyze:syntactic -- data/<run-id>
# Publish the community dataset → published/<locale>/
npm run publish:data -- data/<run-id>

For developing the analysis, publisher, or frontend without spending anything, generate a full synthetic dataset (model-specific pattern density, no API calls):
npx tsx scripts/mock-run.ts
RUN=$(ls -td data/*_mock | head -1)
npx tsx src/analysis/lexical.ts "$RUN" && npx tsx src/analysis/syntactic.ts "$RUN" && npx tsx src/publish/publish.ts "$RUN"

A zero-dependency, bilingual (EN/DE) linter that flags AI patterns in any
text. The language is auto-detected; override with --locale en|de. Exits
non-zero on grade C or below, so it drops straight into CI:
npm run check -- "In today's fast-paced landscape, whether you're a pro or a beginner..."
npx modeltell check --locale de "In der heutigen schnelllebigen Welt optimieren wir nahtlos."
npx modeltell check --file landing-page.md --json
echo "some text" | npx modeltell check -

It embeds its own patterns/words per language (no external files), so it runs
standalone after npm run build:cli (output: dist/cli/check.js).
A GitHub Action runs the full pipeline monthly and commits results. See .github/workflows/monthly-run.yml.
A scroll-driven data story built with React, Vite, Framer Motion, and custom
SVG/D3 visualizations. It reads the published/ dataset - no runtime API. See
frontend/README.md for details.
cd frontend && npm install && npm run dev # auto-syncs published/ first

Sections, in scroll order:
- Hook - an annotated AI sentence, every "tell" highlighted inline
- The Word Cloud - frequent AI-associated vocabulary, sized by frequency
- Model Fingerprints - a radar chart per model across 8 syntactic dimensions
- Head to Head - overlay any two models + a similarity score
- Version Drift - one model family across its own versions (e.g. Opus 4.6→4.8)
- Pattern Deep-Dive - each construction with a real example, CI whiskers, significance markers, the fix
- Tier Analysis - Frontier vs Mid vs Open Source averages
- Similarity Heatmap - every pair, scored (radar similarity ⇄ Burrows's Delta)
- The Watchlist - full pattern list + CLI call-to-action
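The Head to Head and Similarity Heatmap sections score pairs of radar profiles. This README does not say which measure is used; cosine similarity over the 8 dimension values is one common choice, sketched here purely as an assumption.

```typescript
/**
 * Cosine similarity between two radar profiles of equal length.
 * Illustrative only: the metric actually used by the pipeline may differ.
 */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("profiles must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // an all-zero profile has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For non-negative profiles the score runs from 0 (no shared habits) to 1 (identical proportions), and it ignores overall magnitude, so a uniformly "louder" model still scores 1 against a quieter one with the same shape.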
A language switcher (EN/DE) flips both the UI and the underlying corpus.
The pipeline publishes a versioned, self-contained JSON dataset under
published/<locale>/, served as a static API via GitHub Pages
(.github/workflows/pages.yml). Each corpus language gets its own namespace:
published/locales.json discovery: which languages exist
published/<locale>/index.json entry point (locale = en | de | …)
published/<locale>/models/{model-id}.json per-model fingerprint (self-contained)
published/<locale>/watchlist/words.json frequent AI-associated words
published/<locale>/watchlist/constructions.json syntactic constructions
published/<locale>/comparisons/tier-summary.json frontier/mid/opensource averages
published/<locale>/comparisons/similarity-matrix.json radar similarity matrix
published/<locale>/comparisons/delta-matrix.json Burrows's Delta matrix
published/<locale>/validation/regex-precision.json manual precision audit (optional)
validation/regex-precision.json is optional and currently English-only - a
locale's index.json lists files.regex_precision only when that audit exists,
so always read it from the index rather than assuming the path.
Start at locales.json, then published/en/index.json (or …/de/index.json).
On GitHub Pages that's e.g. https://<owner>.github.io/modeltell/en/index.json.
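A consumer can follow that advice with something like the sketch below, which reads files.regex_precision from the index instead of hard-coding the path. Only the files.regex_precision key comes from this README; the rest of the index shape, and the assumption that its values are paths relative to index.json, are guesses. It relies on the global fetch and URL available in Node 18+ and browsers.

```typescript
// Assumed shape: only `files.regex_precision` is documented above.
interface LocaleIndex {
  files: { regex_precision?: string; [key: string]: string | undefined };
}

/** Fetch a locale's entry point, e.g. <baseUrl>/en/index.json. */
async function loadIndex(baseUrl: string, locale: string): Promise<LocaleIndex> {
  const res = await fetch(`${baseUrl}/${locale}/index.json`);
  if (!res.ok) throw new Error(`index.json request failed: ${res.status}`);
  return (await res.json()) as LocaleIndex;
}

/** Resolve the optional precision audit URL, or null when the locale has none. */
function regexPrecisionUrl(indexUrl: string, index: LocaleIndex): string | null {
  const relative = index.files.regex_precision;
  return relative ? new URL(relative, indexUrl).toString() : null;
}
```

Returning null rather than throwing lets a client render German data without the audit panel instead of failing on a missing file.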
Raw outputs: data/{run-id}/{model-id}.json
Lexical analysis: data/{run-id}/_lexical_analysis.json
Syntactic analysis: data/{run-id}/_syntactic_analysis.json
All data is committed to the repo for full transparency and reproducibility.
We try to be precise about what these numbers do and don't support. The full
methodology & limitations page (in the frontend, route #/methodology) is the
canonical reference. In short:
- Not an AI detector. This describes tendencies of model output in aggregate, not the provenance of any single text. Published AI detectors are unreliable and biased; don't use this to judge whether a person wrote something.
- No human baseline (yet). The lexical scores measure cross-model distinctiveness (TF-IDF vs the aggregate of all models), not "overuse vs humans." A matched human corpus is future work.
- Heuristic detection, partially validated. Patterns are regular expressions. A manual precision audit (validation/regex-precision.json) measured ~71% micro-precision with a wide spread - clean for some patterns (not-just-but, power verbs), but ~0% for parenthetical_aside and ~0.75 for tricolon/em_dash_pivot (they over-count in long lists). Recall is not yet measured, and the audit is single-annotator.
- Significance is pattern-level. Each frequency carries a 95% bootstrap CI and an FDR-corrected permutation test (model vs. field); full pairwise tests are not yet done.
- Task/prompt bias. One domain (business/marketing copy), one system prompt, default decoding - "model style" is entangled with "task style."
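For readers unfamiliar with the 95% bootstrap CI mentioned above, here is a minimal percentile-bootstrap sketch. It assumes the resampled unit is one per-output pattern frequency and uses a small seeded PRNG (mulberry32) for repeatability; the pipeline's actual resampling unit, iteration count, and interval method are not stated here and may differ.

```typescript
/** Small seeded PRNG (mulberry32) so the sketch is repeatable. */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Percentile bootstrap interval for the mean of `samples`. */
function bootstrapCI(
  samples: number[],
  iterations = 2000,
  seed = 42
): { mean: number; lo: number; hi: number } {
  if (samples.length === 0) throw new Error("need at least one sample");
  const rand = mulberry32(seed);
  const mean = (xs: number[]) => xs.reduce((x, y) => x + y, 0) / xs.length;
  const means: number[] = [];
  for (let i = 0; i < iterations; i++) {
    // Resample with replacement, same size as the original sample.
    const resample = samples.map(() => samples[Math.floor(rand() * samples.length)]);
    means.push(mean(resample));
  }
  means.sort((x, y) => x - y);
  const at = (q: number) => means[Math.min(means.length - 1, Math.floor(q * means.length))];
  return { mean: mean(samples), lo: at(0.025), hi: at(0.975) };
}
```

A wide interval signals that a model's frequency for that pattern rests on few or highly variable outputs, which matters with only 3 runs per prompt.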
PRs welcome - see CONTRIBUTING.md. In short: add patterns to
patterns/definitions.json, models to src/runner/models.ts, prompts to
prompts/catalog.json. Keep the pipeline zero-dependency and tsc --noEmit
clean. (Pattern regexes are JavaScript - no Python-style (?i) inline flags.)
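The note about inline flags can be enforced with a small guard when compiling contributed patterns. The sketch below is a hypothetical validator, not part of the repo: it rejects a leading Python-style inline-flag group such as (?i) with a readable message and expects flags to be passed separately.

```typescript
interface CompiledPattern {
  id: string;
  regex: RegExp;
}

/**
 * Compile one pattern, refusing a leading inline-flag group like "(?i)".
 * Hypothetical helper; the repo's own loader may work differently.
 */
function compilePattern(id: string, source: string, flags = "g"): CompiledPattern {
  const inline = /^\(\?([a-z]+)\)/.exec(source);
  if (inline) {
    throw new Error(
      `${id}: inline flags "(?${inline[1]})" are not valid JavaScript; pass flags separately`
    );
  }
  return { id, regex: new RegExp(source, flags) };
}
```

For example, compilePattern("tricolon", "(?i)foo") fails fast, while compilePattern("tricolon", "foo", "gi") matches case-insensitively.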
MIT
Built by Third Shift Lab as an open research project.