Integrate Gemma 4 with proper context window utilization #1
Merged
Two changes that compound:

1. Fix num_ctx default. Ollama silently caps every model at 2048 tokens when num_ctx is not specified, regardless of the model's true capability. SummaryService now passes the model's actual context window (32K for Qwen 2.5, 128K for Llama 3.2 and Gemma 4) so the summarizer actually sees the full transcript. The 3000-char tail truncation in buildPrompt() is replaced with a dynamic limit derived from the context window.

2. Add Gemma 4 E2B (7.2 GB, 128K context) and E4B (9.6 GB, 128K context) as recommended models. Both are flagged isNew so the setup UI shows a green "NEW" badge and surfaces the context window inline. The default model remains qwen2.5:3b to keep the first-time install footprint small; users can opt into Gemma 4 by clicking it in the picker, or by setting SCRIPTA_INSTALL_GEMMA4=1 in the install script.

Bumps version to 3.2.0.

Co-authored-by: Cursor <cursoragent@cursor.com>
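The first change above can be sketched roughly as follows. This is a minimal illustration, not the code in SummaryService.swift: the free function contextWindow(for:) stands in for SummaryModelManager.contextWindow(for:), the llama3.2:3b and gemma4:e2b tags are hypothetical placeholders, and the 4-characters-per-token ratio and 1,024-token reserve are assumptions chosen only to show how a character limit could be derived from num_ctx.

```swift
import Foundation

/// Context windows in tokens, using the sizes quoted in the commit message.
/// Only "qwen2.5:3b" appears in the source; the other tags are hypothetical.
let contextWindows: [String: Int] = [
    "qwen2.5:3b": 32_768,
    "llama3.2:3b": 131_072,
    "gemma4:e2b": 131_072,
]

/// The cap Ollama applies when num_ctx is omitted, per the commit message.
let ollamaDefaultContext = 2_048

/// Stand-in for SummaryModelManager.contextWindow(for:).
func contextWindow(for model: String) -> Int {
    contextWindows[model] ?? ollamaDefaultContext
}

/// Character budget for the transcript tail.
/// Assumptions: about 4 characters per token, and a fixed token reserve
/// for the instructions plus the generated summary.
func transcriptCharLimit(numCtx: Int,
                         reservedTokens: Int = 1_024,
                         charsPerToken: Int = 4) -> Int {
    max(0, numCtx - reservedTokens) * charsPerToken
}

/// Keeps only the most recent `limit` characters of the transcript.
func tail(of transcript: String, limit: Int) -> String {
    String(transcript.suffix(limit))
}

/// Builds a JSON body for Ollama's /api/generate with an explicit num_ctx,
/// so the server does not fall back to its default context size.
func generateRequestBody(model: String, prompt: String) throws -> Data {
    let body: [String: Any] = [
        "model": model,
        "prompt": prompt,
        "stream": false,
        "options": ["num_ctx": contextWindow(for: model)],
    ]
    return try JSONSerialization.data(withJSONObject: body)
}
```

Under these assumed constants, a 2,048-token window leaves room for 4,096 transcript characters, while a 32,768-token window leaves room for 126,976. The real limits in Scripta may be computed differently.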
scripts/benchmark_models.sh runs Ollama against the same transcript across any set of installed models and a configurable num_ctx, recording wall clock, tokens/second, and prompt/eval token counts. It is designed to make the "Ollama defaults to 2048" finding reproducible: run once with NUM_CTX=2048 and once with NUM_CTX=32768 to see the difference firsthand.

The prompt template is kept in sync with SummaryService.swift:buildPrompt() (instructions embedded in prompt, no system field) so the benchmark matches production behavior and avoids tripping Gemma 4 into a thinking-mode pattern that consumes the num_predict budget without emitting visible output.

benchmarks/synthetic-transcript.md is a 60-minute fictional Atlas Robotics all-hands transcript. All names, projects, customers, and numbers are invented. Real meeting recordings must never be committed to this directory; .gitignore enforces that the only tracked content is the synthetic fixture, the README, and the findings report.

benchmarks/findings.md documents the qualitative outputs for the default ctx=2048 vs ctx=32768 runs with both Qwen 2.5 3B and Gemma 4 E2B, including the interesting result that Gemma 4 reports the truncation back to the user.

Co-authored-by: Cursor <cursoragent@cursor.com>
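The metrics the benchmark records can be derived from the timing fields that Ollama returns in a non-streaming /api/generate response. The sketch below shows that derivation in Swift; it is not a port of scripts/benchmark_models.sh, and the GenerateMetrics type and parseMetrics function are hypothetical names introduced for illustration. It assumes durations are reported in nanoseconds and that prompt_eval_count may be absent in some responses.

```swift
import Foundation

/// Subset of the timing fields in an Ollama /api/generate response.
/// Durations are assumed to be in nanoseconds.
struct GenerateMetrics: Decodable {
    let totalDuration: Int64
    let promptEvalCount: Int?   // optional: may be missing from a response
    let evalCount: Int
    let evalDuration: Int64

    enum CodingKeys: String, CodingKey {
        case totalDuration = "total_duration"
        case promptEvalCount = "prompt_eval_count"
        case evalCount = "eval_count"
        case evalDuration = "eval_duration"
    }

    /// End-to-end wall clock time in seconds.
    var wallClockSeconds: Double { Double(totalDuration) / 1e9 }

    /// Generation throughput: output tokens divided by generation time.
    var tokensPerSecond: Double {
        evalDuration > 0 ? Double(evalCount) / (Double(evalDuration) / 1e9) : 0
    }
}

/// Decodes the metrics from a raw JSON response body.
func parseMetrics(_ json: Data) throws -> GenerateMetrics {
    try JSONDecoder().decode(GenerateMetrics.self, from: json)
}
```

When comparing a NUM_CTX=2048 run with a NUM_CTX=32768 run on the same transcript, the prompt token count is the field to watch: one would expect it to stay near the cap in the first run and to grow with the transcript in the second.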
Summary
Adds Gemma 4 E2B and E4B as recommended models for AI summary generation,
and along the way fixes a long-standing bug where Scripta's summaries were
silently truncated to the last ~5 minutes of every meeting.
The bug (commit c211678)
SummaryService.swift sent transcripts to Ollama without specifying num_ctx,
which made Ollama silently apply its default cap of 2048 tokens regardless
of the model's actual context window. Combined with a hardcoded 3000-character
tail truncation in buildPrompt(), every Scripta summary was effectively
generated from just the last few minutes of any meeting.
The fix is to look up the selected model's true context window
(SummaryModelManager.contextWindow(for:)) and pass it as num_ctx, then
size the transcript tail truncation dynamically: