Integrate Gemma 4 with proper context window utilization by thehwang · Pull Request #1 · thehwang/Scripta

thehwang · 2026-05-16T15:27:47Z

Summary

Adds Gemma 4 E2B and E4B as recommended models for AI summary generation,
and along the way fixes a long-standing bug where Scripta's summaries were
silently truncated to the last ~5 minutes of every meeting.

The bug (commit `c211678`)

SummaryService.swift sent transcripts to Ollama without specifying num_ctx,
which made Ollama silently apply its default cap of 2048 tokens regardless
of the model's actual context window. Combined with a hardcoded 3000-character
tail truncation in buildPrompt(), every Scripta summary was effectively
generated from just the last few minutes of any meeting.

The fix is to look up the selected model's true context window
(SummaryModelManager.contextWindow(for:)) and pass it as num_ctx, then
size the transcript tail truncation dynamically:

```swift
let contextTokens = SummaryModelManager.contextWindow(for: modelName)
// ...
"options": [
    "num_ctx": contextTokens,
    // ...
]
```
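To make the two halves of the fix concrete, here is a minimal sketch of sizing the transcript tail from the context window and passing `num_ctx` in the request body. It is not Scripta's actual code: `truncatedTail`, `makeRequestBody`, the 1024-token reserve, and the 4-characters-per-token ratio are all assumptions made for illustration. The `model`, `prompt`, `stream`, and `options.num_ctx` fields are the ones Ollama's `/api/generate` endpoint accepts.

```swift
import Foundation

/// Hypothetical helper: keeps only the tail of the transcript that fits the budget.
/// Assumes roughly 4 characters per token, which is a coarse heuristic, and
/// reserves some tokens for the instructions and the generated summary.
func truncatedTail(of transcript: String,
                   contextTokens: Int,
                   reservedTokens: Int = 1024,
                   charsPerToken: Int = 4) -> String {
    let budgetTokens = max(contextTokens - reservedTokens, 0)
    let maxChars = budgetTokens * charsPerToken
    // suffix(_:) returns the whole string when it is shorter than maxChars.
    return String(transcript.suffix(maxChars))
}

/// Hypothetical helper: builds a JSON body for Ollama's /api/generate endpoint.
/// Without "num_ctx" in "options", Ollama falls back to its own default.
func makeRequestBody(model: String, prompt: String, contextTokens: Int) throws -> Data {
    let body: [String: Any] = [
        "model": model,
        "prompt": prompt,
        "stream": false,
        "options": ["num_ctx": contextTokens]
    ]
    return try JSONSerialization.data(withJSONObject: body)
}
```

Under these assumed numbers, a 2048-token window leaves room for about 4,096 characters of transcript, while a 32K window leaves room for roughly 127,000.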

Two changes that compound:

1. Fix num_ctx default. Ollama silently caps every model at 2048 tokens when num_ctx is not specified, regardless of the model's true capability. SummaryService now passes the model's actual context window (32K for Qwen 2.5, 128K for Llama 3.2 and Gemma 4) so the summarizer actually sees the full transcript. The 3000-char tail truncation in buildPrompt() is replaced with a dynamic limit derived from the context.

2. Add Gemma 4 E2B (7.2 GB, 128K context) and E4B (9.6 GB, 128K context) as recommended models. Both are flagged isNew so the setup UI shows a green "NEW" badge and surfaces the context window inline.

Default model remains qwen2.5:3b to keep first-time install footprint small; users can opt into Gemma 4 by clicking it in the picker, or by setting SCRIPTA_INSTALL_GEMMA4=1 in the install script.

Bumps version to 3.2.0.

Co-authored-by: Cursor <cursoragent@cursor.com>
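The per-model context windows listed in this commit could be held in a small catalog like the sketch below. Only `isNew`, `contextWindow(for:)`, and the `qwen2.5:3b` tag come from this PR. The `SummaryModel` and `ModelCatalog` types, the other field names, the `gemma4:*` and `llama3.2:3b` tags, the label format, and the 2048-token fallback for unknown models are assumptions for illustration.

```swift
/// Hypothetical catalog entry for a recommended summary model.
struct SummaryModel {
    let tag: String          // Ollama model tag (the Gemma 4 tags are assumed)
    let displayName: String
    let diskSizeGB: Double?  // nil where the PR does not state a size
    let contextWindow: Int   // in tokens
    let isNew: Bool          // drives the "NEW" badge in the setup UI
}

enum ModelCatalog {
    static let defaultTag = "qwen2.5:3b"
    /// Assumed conservative fallback for models missing from the catalog.
    static let fallbackContext = 2048

    static let models: [SummaryModel] = [
        SummaryModel(tag: "qwen2.5:3b", displayName: "Qwen 2.5 3B",
                     diskSizeGB: nil, contextWindow: 32_768, isNew: false),
        SummaryModel(tag: "llama3.2:3b", displayName: "Llama 3.2 3B",
                     diskSizeGB: nil, contextWindow: 131_072, isNew: false),
        SummaryModel(tag: "gemma4:e2b", displayName: "Gemma 4 E2B",
                     diskSizeGB: 7.2, contextWindow: 131_072, isNew: true),
        SummaryModel(tag: "gemma4:e4b", displayName: "Gemma 4 E4B",
                     diskSizeGB: 9.6, contextWindow: 131_072, isNew: true),
    ]

    /// Returns the context window for a tag, or the fallback if it is unknown.
    static func contextWindow(for tag: String) -> Int {
        models.first(where: { $0.tag == tag })?.contextWindow ?? fallbackContext
    }
}

/// Builds a picker label that surfaces the context window inline.
func pickerLabel(for model: SummaryModel) -> String {
    let base = "\(model.displayName) (\(model.contextWindow / 1024)K context)"
    return model.isNew ? base + " NEW" : base
}
```

Falling back to a small window for unknown tags trades summary coverage for safety: over-requesting context on a model that cannot support it is the riskier failure.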

scripts/benchmark_models.sh runs Ollama against the same transcript across any set of installed models and a configurable num_ctx, recording wall clock, tokens/second, and prompt/eval token counts. Designed to make the "Ollama defaults to 2048" finding reproducible: run once with NUM_CTX=2048 and once with NUM_CTX=32768 to see the difference firsthand.

The prompt template is kept in sync with SummaryService.swift:buildPrompt() (instructions embedded in prompt, no system field) so the benchmark matches production behavior and avoids tripping Gemma 4 into a thinking-mode pattern that consumes the num_predict budget without emitting visible output.

benchmarks/synthetic-transcript.md is a 60-minute fictional Atlas Robotics all-hands transcript. All names, projects, customers, and numbers are invented. Real meeting recordings must never be committed to this directory; .gitignore enforces that the only tracked content is the synthetic fixture, the README, and the findings report.

benchmarks/findings.md documents the qualitative outputs for the default ctx=2048 vs ctx=32768 runs with both Qwen 2.5 3B and Gemma 4 E2B, including the interesting result that Gemma 4 reports the truncation back to the user.

Co-authored-by: Cursor <cursoragent@cursor.com>
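The metrics the benchmark records can be derived from the statistics Ollama includes in a non-streaming `/api/generate` response. The sketch below is not the script itself, which is a shell script: it is a Swift illustration of the arithmetic, and `GenerateStats` is a hypothetical type. The JSON keys are the ones Ollama returns, with durations in nanoseconds.

```swift
import Foundation

/// Hypothetical decoder for the timing fields in an Ollama generate response.
struct GenerateStats: Decodable {
    let promptEvalCount: Int?   // prompt tokens evaluated; treated as optional here
    let evalCount: Int          // tokens generated
    let evalDuration: Int64     // nanoseconds spent generating
    let totalDuration: Int64    // nanoseconds for the whole request

    enum CodingKeys: String, CodingKey {
        case promptEvalCount = "prompt_eval_count"
        case evalCount = "eval_count"
        case evalDuration = "eval_duration"
        case totalDuration = "total_duration"
    }

    /// Generation throughput: generated tokens divided by generation time in seconds.
    var tokensPerSecond: Double {
        evalDuration > 0 ? Double(evalCount) / (Double(evalDuration) / 1e9) : 0
    }

    /// Wall-clock time for the request, in seconds.
    var wallClockSeconds: Double {
        Double(totalDuration) / 1e9
    }
}
```

Comparing `promptEvalCount` between a NUM_CTX=2048 run and a NUM_CTX=32768 run on the same transcript is the most direct way to see how much of the prompt the model actually evaluated.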

thehwang and others added 2 commits May 16, 2026 10:24

thehwang merged commit 4281a0f into main May 16, 2026

thehwang merged 2 commits into main from feature/gemma-4-integration
