Integrate Gemma 4 with proper context window utilization #1

Merged
thehwang merged 2 commits into main from feature/gemma-4-integration
May 16, 2026
@thehwang
Owner

Summary

Adds Gemma 4 E2B and E4B as recommended models for AI summary generation,
and along the way fixes a long-standing bug where Scripta's summaries were
silently truncated to the last ~5 minutes of every meeting.

The bug (commit c211678)

SummaryService.swift sent transcripts to Ollama without specifying num_ctx,
which made Ollama silently apply its default cap of 2048 tokens regardless
of the model's actual context window. Combined with a hardcoded 3000-character
tail truncation in buildPrompt(), every Scripta summary was effectively
generated from just the last few minutes of any meeting.

The fix is to look up the selected model's true context window
(SummaryModelManager.contextWindow(for:)) and pass it as num_ctx, then
size the transcript tail truncation dynamically:

let contextTokens = SummaryModelManager.contextWindow(for: modelName)
// ...
"options": [
    "num_ctx": contextTokens,
    // ...
]
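
The dynamic tail truncation mentioned above could look roughly like the following. This is a minimal sketch, not the code from SummaryService.swift: the function name `transcriptTail`, the 4-characters-per-token estimate, and the 1024-token reserve for instructions and output are all assumptions made for illustration.

```swift
// Hypothetical helper; the real buildPrompt() logic may differ.
// Assumes ~4 characters per token and reserves some of the context
// for the instructions and the generated summary (both values are guesses).
func transcriptTail(_ transcript: String,
                    contextTokens: Int,
                    reservedTokens: Int = 1024,
                    charsPerToken: Int = 4) -> String {
    // Tokens left for the transcript after the reserve is subtracted.
    let budgetTokens = max(contextTokens - reservedTokens, 0)
    let maxChars = budgetTokens * charsPerToken
    // Keep the end of the transcript when it exceeds the budget;
    // shorter transcripts are returned unchanged.
    return String(transcript.suffix(maxChars))
}
```

Under these assumed numbers, a 2048-token window leaves room for only about 4,096 characters of transcript, while a 32,768-token window leaves room for roughly 127,000.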

thehwang and others added 2 commits May 16, 2026 10:24
Two changes that compound:

1. Fix num_ctx default. Ollama silently caps every model at 2048 tokens
   when num_ctx is not specified, regardless of the model's true capability.
   SummaryService now passes the model's actual context window
   (32K for Qwen 2.5, 128K for Llama 3.2 and Gemma 4) so the summarizer
   actually sees the full transcript. The 3000-char tail truncation in
   buildPrompt() is replaced with a dynamic limit derived from the context.

2. Add Gemma 4 E2B (7.2 GB, 128K context) and E4B (9.6 GB, 128K context)
   as recommended models. Both are flagged isNew so the setup UI shows a
   green "NEW" badge and surfaces the context window inline. Default model
   remains qwen2.5:3b to keep first-time install footprint small; users can
   opt into Gemma 4 by clicking it in the picker, or by setting
   SCRIPTA_INSTALL_GEMMA4=1 in the install script.
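
A context-window lookup of the kind item 1 describes might be sketched as below. The real `SummaryModelManager.contextWindow(for:)` is not shown in this PR, so the function name, the tag prefixes `llama3.2` and `gemma4`, the reading of 128K as 131,072 tokens, and the 2048 fallback are assumptions; only `qwen2.5:3b` and the 32K/128K figures come from the text above.

```swift
// Illustrative only; not the actual SummaryModelManager implementation.
func contextWindowSketch(for modelName: String) -> Int {
    let name = modelName.lowercased()
    // 32K for Qwen 2.5, per the commit message.
    if name.hasPrefix("qwen2.5") { return 32_768 }
    // 128K for Llama 3.2 and Gemma 4 (tag prefixes are assumed).
    if name.hasPrefix("llama3.2") || name.hasPrefix("gemma4") { return 131_072 }
    // Assumed fallback for unknown models: the 2048-token default described above.
    return 2_048
}
```

Matching on a prefix means every size variant of a model family (for example `qwen2.5:3b`) resolves to the same window without listing each tag.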

Bumps version to 3.2.0.

Co-authored-by: Cursor <cursoragent@cursor.com>

scripts/benchmark_models.sh runs Ollama against the same transcript across
any set of installed models and a configurable num_ctx, recording wall-clock
time, tokens/second, and prompt/eval token counts. It is designed to make the
"Ollama defaults to 2048" finding reproducible: run once with
NUM_CTX=2048 and once with NUM_CTX=32768 to see the difference firsthand.
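
For reference, the metrics named above can be derived from the fields Ollama returns on a non-streaming /api/generate call (`prompt_eval_count`, `eval_count`, `eval_duration`, `total_duration`, with durations in nanoseconds). The sketch below shows that arithmetic in Swift; it is not a transcription of benchmark_models.sh, and the type name `GenerateMetrics` is hypothetical.

```swift
import Foundation

// Subset of the Ollama /api/generate response used for benchmarking.
// The struct itself is a hypothetical illustration.
struct GenerateMetrics: Decodable {
    let promptEvalCount: Int?   // may be absent, so kept optional
    let evalCount: Int
    let evalDuration: Int64     // nanoseconds spent generating tokens
    let totalDuration: Int64    // nanoseconds for the whole request

    enum CodingKeys: String, CodingKey {
        case promptEvalCount = "prompt_eval_count"
        case evalCount = "eval_count"
        case evalDuration = "eval_duration"
        case totalDuration = "total_duration"
    }

    // Generated tokens per second of generation time.
    var tokensPerSecond: Double {
        guard evalDuration > 0 else { return 0 }
        return Double(evalCount) / (Double(evalDuration) / 1_000_000_000)
    }
}
```

Comparing `promptEvalCount` between a NUM_CTX=2048 run and a NUM_CTX=32768 run on the same transcript is one way to see how much of the prompt each run actually processed.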

The prompt template is kept in sync with SummaryService.swift:buildPrompt()
(instructions embedded in prompt, no system field) so the benchmark
matches production behavior — and avoids tripping Gemma 4 into a
thinking-mode pattern that consumes the num_predict budget without
emitting visible output.
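
A request shaped the way this paragraph describes, with the instructions folded into `prompt` and no `system` key, might be built as follows. This is a sketch under assumptions: the function name, parameter names, and the "Transcript:" separator are invented for illustration, and the actual template lives in SummaryService.swift:buildPrompt(). The JSON keys (`model`, `prompt`, `stream`, `options`, `num_ctx`, `num_predict`) are standard Ollama /api/generate fields.

```swift
import Foundation

// Hypothetical request builder; the real prompt template is not shown in this PR.
func generateRequestBody(model: String,
                         instructions: String,
                         transcript: String,
                         numCtx: Int,
                         numPredict: Int) throws -> Data {
    // Instructions are embedded directly in the prompt text.
    let prompt = instructions + "\n\nTranscript:\n" + transcript
    let payload: [String: Any] = [
        "model": model,
        "prompt": prompt,
        "stream": false,
        // No "system" key is set, by design.
        "options": ["num_ctx": numCtx, "num_predict": numPredict],
    ]
    return try JSONSerialization.data(withJSONObject: payload)
}
```

Keeping one builder shared between the app and the benchmark is one way to ensure the two cannot drift apart.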

benchmarks/synthetic-transcript.md is a 60-minute fictional Atlas
Robotics all-hands transcript. All names, projects, customers, and
numbers are invented. Real meeting recordings must never be committed
to this directory; .gitignore enforces that the only tracked content
is the synthetic fixture, the README, and the findings report.

benchmarks/findings.md documents the qualitative outputs for the
default ctx=2048 vs ctx=32768 runs with both Qwen 2.5 3B and Gemma 4
E2B, including the interesting result that Gemma 4 reports the
truncation back to the user.

Co-authored-by: Cursor <cursoragent@cursor.com>
thehwang merged commit 4281a0f into main on May 16, 2026