Gemini Nexus gives your browser a native AI layer by combining Gemini Web, the Google Gemini API, OpenAI-compatible APIs, and dedicated third-party API providers in one Chrome extension. It is more than a side panel: the extension includes an injected floating toolbar, image and screenshot input, Chrome DevTools Protocol based browser-control tools, and optional external MCP tools for browser-native AI workflows.
Gemini Nexus currently focuses on these browser AI workflows:
- Switch among Gemini Web, Gemini API, OpenAI Compatible API, OpenAI Official API, DeepSeek API, OpenRouter API, Qwen / DashScope API, Anthropic API, and Zhipu API, with provider-specific
Base URL,API Key, andModel IDs. - Enable Gemini Web temporary chats so Web-provider requests are not added to Gemini Recent chats.
- Use Gemini API Google Search grounding and show web sources in responses.
- Use OpenAI-compatible web search through Responses API
web_searchor Chat Completionsweb_search_options, depending on the current endpoint. - Limit side-panel scope by tab to reduce distraction on pages where the assistant is not needed.
- Re-edit historical user messages and continue from that point; this feature is enabled for API providers only.
- Manage context with summary compression and recent-turn trimming to reduce the risk of exceeding model context limits.
- Mark browser-control tasks with Chrome native tab groups and keep
list_pages/select_pagefocused on the controlled scope. - Open external links in new browser tabs to avoid failed third-party loading inside the side panel.
- Preserve settings as much as possible across extension identity and local upgrade paths.
The project includes provider drivers under services/providers and adapts behavior dynamically in code:
| Provider | Entry | Models | Strength | Requirement |
|---|---|---|---|---|
| Web Client | web.js |
Current Gemini Web chat modes | No API key; reuses the Gemini web session; optional temporary chats | Keep a Google account signed in |
| Official API | official.js |
Gemini Flash/Pro preview models | Fast responses with Thinking and Google Search grounding | Google AI Studio key |
| OpenAI Compatible | openai_compatible.js |
GPT, Claude, and compatible models | Highly extensible; supports Chat Completions / Responses API and optional web search | Third-party service key |
| OpenAI Official | openai_compatible.js |
GPT reasoning/search models | Dedicated Responses API path with reasoning summary and optional web search | OpenAI API key |
| DeepSeek API | openai_compatible.js |
DeepSeek chat/reasoning models | Dedicated defaults for DeepSeek Chat Completions and reasoning_content display |
DeepSeek API key |
| OpenRouter API | openai_compatible.js |
OpenRouter model IDs | Fetches /models, supports provider routing JSON, and sends native reasoning |
OpenRouter API key |
| Qwen / DashScope | openai_compatible.js |
Qwen text and VL models | Dedicated DashScope compatible endpoint with enable_thinking and VL image input |
DashScope API key |
| Anthropic API | anthropic.js |
Claude models | Native Messages API adapter with image input and extended-thinking stream display | Anthropic API key |
| Zhipu API | openai_compatible.js |
GLM models | Dedicated GLM Chat Completions profile with native thinking toggle payloads | Zhipu API key |
Built on background/control/ and Chrome DevTools Protocol, Gemini Nexus lets AI perform agentic browser tasks through a local tool loop:
| Category | Core commands | Implementation |
|---|---|---|
| Navigation | navigate_page, new_page, close_page, list_pages, select_page |
Manages page lifecycle through chrome.tabs |
| Interaction | click, hover, fill, fill_form, press_key, type_text |
Uses Accessibility Tree UIDs for precise actions, hover, batch form fill, shortcuts, and input |
| Observation | take_snapshot, wait_for, handle_dialog |
Extracts reusable accessibility-tree UIDs, waits for target text, and handles blocking dialogs |
| Script execution | evaluate_script |
Runs custom JavaScript in the page context |
After browser control is enabled, Gemini Nexus locks onto a target tab and uses a Chrome native tab group to show the current task title. select_page switches inside the controlled tab group by default; regular new_page tabs join the group, while background: true opens a separate popup window to reduce focus interruption.
Gemini Nexus can connect to one or more external MCP servers through SSE, streamable HTTP, or WebSocket, then expose their tools to the existing tool loop.
Chrome extensions cannot directly run stdio-based MCP servers, so the recommended setup is to run a local proxy, such as MCP SuperAssistant Proxy. Configure your MCP servers, including stdio servers, in the proxy, then connect Gemini Nexus to the proxy endpoint.
Common proxy endpoints:
- SSE:
http://127.0.0.1:3006/sse - Streamable HTTP:
http://127.0.0.1:3006/mcp - WebSocket:
ws://127.0.0.1:3006/mcp
-
Start your MCP proxy and configure MCP servers inside it.
-
In Settings -> Connection -> External MCP Tools:
- Enable External MCP Tools.
- Add or select a server entry. Active Server means the entry currently being edited; conversations use all enabled servers.
- Choose the transport and set the server URL: SSE, streamable HTTP, or WebSocket.
- Use SSE or streamable HTTP if you need custom request headers; browser-extension WebSocket transport does not support custom headers.
- Click Test Connection and Refresh Tools.
-
Optional, and recommended when many tools exist: set Expose Tools to Selected tools only, then enable only the tools you want the model to see or use.
-
Start a normal conversation. When the model needs tools, it outputs a JSON tool block like the one below. In multi-server mode, the model may use unique tool names in the
serverId__toolNameformat to route calls to a specific server:{ "tool": "tool_name", "args": { "key": "value" } }
- Smart side panel: Built on the
sidePanelAPI for fast conversation access and full-text history search. - Selection toolbar: Injected content scripts let selected text be translated, summarized, explained, grammar-fixed, or inserted back into forms.
- Image and screenshot input:
- OCR and screenshot translation: Canvas cropping extracts and translates selected image regions.
- Screen or window capture: The side panel can use
display-captureto select another screen or app window as image input. - Floating image detection: Detects page images and shows a floating AI analysis button.
- Generated image display: Shows fetched Gemini images without local pixel rewriting.
- Gemini Web currently supports image attachments through the reverse provider. Use Gemini API for PDF/text/document attachments.
- Safe rendering: Markdown, LaTeX, and code blocks render inside the isolated
sandboxenvironment.
Gemini Web is reverse engineered and can change without notice. The current contract is documented in docs/gemini-web-reverse.md, including the verified tokens, RPC paths, upload flow, model hashes, temporary-chat markers, unsupported image-preview model routes, and the manual drift check command.
The repository root is the runnable Chrome extension project root. package.json, manifest.json, Vite config, source code, tests, and packaging scripts all live at the root. Cross-runtime shared utilities live in shared/ and are grouped by capability under shared/attachments/, shared/config/, shared/dom/, shared/logging/, shared/mcp/, shared/media/, shared/messaging/, shared/models/, shared/settings/, shared/text/, shared/ui/, and shared/utils/; the project no longer keeps top-level shared/*.js compatibility entry points. Directory aggregation entry points consistently use an index.js inside the directory to avoid sibling foo.js and foo/ modules. Runtime entry points remain at each runtime root, such as background/index.js, content/index.js, sandbox/index.js, sidepanel/index.js, and the standalone settings page settings/index.js. Runtime code uses snake_case filenames, while repository tooling scripts and workflow files may use kebab-case.
- Download the latest ZIP from Releases and unzip it.
- Open
chrome://extensions/in Chrome and enable Developer mode in the top-right corner. - Click Load unpacked and select the extracted folder.
npm install
npm run package:extensionAfter packaging, choose artifacts/chrome-extension when using Chrome Load unpacked. For development, you can also load the repository root directly, but releases and manual installs should use the packaged directory. npm run build only creates the Vite UI output in dist/; it is not a complete extension directory. The package step merges multiple content scripts into a single content/index.js in manifest.json order and rewrites the packaged manifest, avoiding reliance on a long manual script list in release artifacts.
Chrome Web Store credentials should stay on your local machine and must not be committed:
cp .env.chrome-webstore.example .env.chrome-webstoreEdit .env.chrome-webstore and fill in CHROME_WEBSTORE_PUBLISHER_ID, CHROME_WEBSTORE_ITEM_ID, and CHROME_WEBSTORE_ACCESS_TOKEN with the https://www.googleapis.com/auth/chromewebstore scope. After preparing the ZIP, run:
npm run publish:chrome-webstoreThe script uploads the ZIP pointed to by CHROME_WEBSTORE_ZIP_PATH through Chrome Web Store API v2, then submits it for review.
- Build tools: Vite + TypeScript
- Architecture protocols: Chrome MV3 + Chrome DevTools Protocol + local/external MCP tool calls
- Core libraries: Marked.js, KaTeX, Highlight.js, Fuse.js
This project is open sourced under the MIT License.
This project has been shared in the LINUX DO community. Thanks to the community for support and feedback.