Gemini Nexus

Give your browser a native AI layer

Project Overview

Gemini Nexus gives your browser a native AI layer by combining Gemini Web, the Google Gemini API, OpenAI-compatible APIs, and dedicated third-party API providers in one Chrome extension. It is more than a side panel: the extension includes an injected floating toolbar, image and screenshot input, Chrome DevTools Protocol based browser-control tools, and optional external MCP tools for browser-native AI workflows.

Capability Overview

Gemini Nexus currently focuses on these browser AI workflows:

Switch among Gemini Web, Gemini API, OpenAI Compatible API, OpenAI Official API, DeepSeek API, OpenRouter API, Qwen / DashScope API, Anthropic API, and Zhipu API, with provider-specific Base URL, API Key, and Model IDs.
Enable Gemini Web temporary chats so Web-provider requests are not added to Gemini Recent chats.
Use Gemini API Google Search grounding and show web sources in responses.
Use OpenAI-compatible web search through Responses API web_search or Chat Completions web_search_options, depending on the current endpoint.
Limit side-panel scope by tab to reduce distraction on pages where the assistant is not needed.
Re-edit historical user messages and continue from that point; this feature is enabled for API providers only.
Manage context with summary compression and recent-turn trimming to reduce the risk of exceeding model context limits.
Mark browser-control tasks with Chrome native tab groups and keep list_pages / select_page focused on the controlled scope.
Open external links in new browser tabs to avoid failed third-party loading inside the side panel.
Preserve settings as much as possible across extension identity and local upgrade paths.

Provider Comparison

The project includes provider drivers under services/providers and adapts behavior dynamically in code:

Provider	Entry	Models	Strength	Requirement
Web Client	`web.js`	Current Gemini Web chat modes	No API key; reuses the Gemini web session; optional temporary chats	Keep a Google account signed in
Official API	`official.js`	Gemini Flash/Pro preview models	Fast responses with Thinking and Google Search grounding	Google AI Studio key
OpenAI Compatible	`openai_compatible.js`	GPT, Claude, and compatible models	Highly extensible; supports Chat Completions / Responses API and optional web search	Third-party service key
OpenAI Official	`openai_compatible.js`	GPT reasoning/search models	Dedicated Responses API path with reasoning summary and optional web search	OpenAI API key
DeepSeek API	`openai_compatible.js`	DeepSeek chat/reasoning models	Dedicated defaults for DeepSeek Chat Completions and `reasoning_content` display	DeepSeek API key
OpenRouter API	`openai_compatible.js`	OpenRouter model IDs	Fetches `/models`, supports provider routing JSON, and sends native `reasoning`	OpenRouter API key
Qwen / DashScope	`openai_compatible.js`	Qwen text and VL models	Dedicated DashScope compatible endpoint with `enable_thinking` and VL image input	DashScope API key
Anthropic API	`anthropic.js`	Claude models	Native Messages API adapter with image input and extended-thinking stream display	Anthropic API key
Zhipu API	`openai_compatible.js`	GLM models	Dedicated GLM Chat Completions profile with native thinking toggle payloads	Zhipu API key

Browser Control

Built on background/control/ and Chrome DevTools Protocol, Gemini Nexus lets AI perform agentic browser tasks through a local tool loop:

Category	Core commands	Implementation
Navigation	`navigate_page`, `new_page`, `close_page`, `list_pages`, `select_page`	Manages page lifecycle through `chrome.tabs`
Interaction	`click`, `hover`, `fill`, `fill_form`, `press_key`, `type_text`	Uses Accessibility Tree UIDs for precise actions, hover, batch form fill, shortcuts, and input
Observation	`take_snapshot`, `wait_for`, `handle_dialog`	Extracts reusable accessibility-tree UIDs, waits for target text, and handles blocking dialogs
Script execution	`evaluate_script`	Runs custom JavaScript in the page context

After browser control is enabled, Gemini Nexus locks onto a target tab and uses a Chrome native tab group to show the current task title. select_page switches inside the controlled tab group by default; regular new_page tabs join the group, while background: true opens a separate popup window to reduce focus interruption.

External MCP Tools

Gemini Nexus can connect to one or more external MCP servers through SSE, streamable HTTP, or WebSocket, then expose their tools to the existing tool loop.

Recommended Setup: Local Proxy for stdio Servers

Chrome extensions cannot directly run stdio-based MCP servers, so the recommended setup is to run a local proxy, such as MCP SuperAssistant Proxy. Configure your MCP servers, including stdio servers, in the proxy, then connect Gemini Nexus to the proxy endpoint.

Common proxy endpoints:

SSE: http://127.0.0.1:3006/sse
Streamable HTTP: http://127.0.0.1:3006/mcp
WebSocket: ws://127.0.0.1:3006/mcp

Setup Steps

Start your MCP proxy and configure MCP servers inside it.
In Settings -> Connection -> External MCP Tools:
- Enable External MCP Tools.
- Add or select a server entry. Active Server means the entry currently being edited; conversations use all enabled servers.
- Choose the transport and set the server URL: SSE, streamable HTTP, or WebSocket.
- Use SSE or streamable HTTP if you need custom request headers; browser-extension WebSocket transport does not support custom headers.
- Click Test Connection and Refresh Tools.
Optional, and recommended when many tools exist: set Expose Tools to Selected tools only, then enable only the tools you want the model to see or use.
Start a normal conversation. When the model needs tools, it outputs a JSON tool block like the one below. In multi-server mode, the model may use unique tool names in the serverId__toolName format to route calls to a specific server:
```
{ "tool": "tool_name", "args": { "key": "value" } }
```

Key Features

Smart side panel: Built on the sidePanel API for fast conversation access and full-text history search.
Selection toolbar: Injected content scripts let selected text be translated, summarized, explained, grammar-fixed, or inserted back into forms.
Image and screenshot input:
- OCR and screenshot translation: Canvas cropping extracts and translates selected image regions.
- Screen or window capture: The side panel can use display-capture to select another screen or app window as image input.
- Floating image detection: Detects page images and shows a floating AI analysis button.
- Generated image display: Shows fetched Gemini images without local pixel rewriting.
- Gemini Web currently supports image attachments through the reverse provider. Use Gemini API for PDF/text/document attachments.
Safe rendering: Markdown, LaTeX, and code blocks render inside the isolated sandbox environment.

Gemini Web Maintenance

Gemini Web is reverse engineered and can change without notice. The current contract is documented in docs/gemini-web-reverse.md, including the verified tokens, RPC paths, upload flow, model hashes, temporary-chat markers, unsupported image-preview model routes, and the manual drift check command.

Quick Start

Repository Structure

The repository root is the runnable Chrome extension project root. package.json, manifest.json, Vite config, source code, tests, and packaging scripts all live at the root. Cross-runtime shared utilities live in shared/ and are grouped by capability under shared/attachments/, shared/config/, shared/dom/, shared/logging/, shared/mcp/, shared/media/, shared/messaging/, shared/models/, shared/settings/, shared/text/, shared/ui/, and shared/utils/; the project no longer keeps top-level shared/*.js compatibility entry points. Directory aggregation entry points consistently use an index.js inside the directory to avoid sibling foo.js and foo/ modules. Runtime entry points remain at each runtime root, such as background/index.js, content/index.js, sandbox/index.js, sidepanel/index.js, and the standalone settings page settings/index.js. Runtime code uses snake_case filenames, while repository tooling scripts and workflow files may use kebab-case.

Install from Release

Download the latest ZIP from Releases and unzip it.
Open chrome://extensions/ in Chrome and enable Developer mode in the top-right corner.
Click Load unpacked and select the extracted folder.

Build and Package from Source

npm install
npm run package:extension

After packaging, choose artifacts/chrome-extension when using Chrome Load unpacked. For development, you can also load the repository root directly, but releases and manual installs should use the packaged directory. npm run build only creates the Vite UI output in dist/; it is not a complete extension directory. The package step merges multiple content scripts into a single content/index.js in manifest.json order and rewrites the packaged manifest, avoiding reliance on a long manual script list in release artifacts.

Publish to Chrome Web Store

Chrome Web Store credentials should stay on your local machine and must not be committed:

cp .env.chrome-webstore.example .env.chrome-webstore

Edit .env.chrome-webstore and fill in CHROME_WEBSTORE_PUBLISHER_ID, CHROME_WEBSTORE_ITEM_ID, and CHROME_WEBSTORE_ACCESS_TOKEN with the https://www.googleapis.com/auth/chromewebstore scope. After preparing the ZIP, run:

npm run publish:chrome-webstore

The script uploads the ZIP pointed to by CHROME_WEBSTORE_ZIP_PATH through Chrome Web Store API v2, then submits it for review.

Tech Stack

Build tools: Vite + TypeScript
Architecture protocols: Chrome MV3 + Chrome DevTools Protocol + local/external MCP tool calls
Core libraries: Marked.js, KaTeX, Highlight.js, Fuse.js

License

This project is open sourced under the MIT License.

Acknowledgements

This project has been shared in the LINUX DO community. Thanks to the community for support and feedback.

Name		Name	Last commit message	Last commit date
Latest commit History 147 Commits
.github		.github
assets		assets
background		background
content		content
css		css
docs		docs
sandbox		sandbox
scripts		scripts
services		services
settings		settings
shared		shared
sidepanel		sidepanel
test-pages		test-pages
test		test
vendor		vendor
.env.chrome-webstore.example		.env.chrome-webstore.example
.gitignore		.gitignore
.prettierignore		.prettierignore
.prettierrc		.prettierrc
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md
SECURITY.md		SECURITY.md
knip.json		knip.json
logo.png		logo.png
manifest.json		manifest.json
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gemini Nexus

Give your browser a native AI layer

Project Overview

Capability Overview

Provider Comparison

Browser Control

External MCP Tools

Recommended Setup: Local Proxy for stdio Servers

Setup Steps

Key Features

Gemini Web Maintenance

Quick Start

Repository Structure

Install from Release

Build and Package from Source

Publish to Chrome Web Store

Tech Stack

License

Acknowledgements

About

Uh oh!

Releases 17

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Gemini Nexus

Give your browser a native AI layer

Project Overview

Capability Overview

Provider Comparison

Browser Control

External MCP Tools

Recommended Setup: Local Proxy for stdio Servers

Setup Steps

Key Features

Gemini Web Maintenance

Quick Start

Repository Structure

Install from Release

Build and Package from Source

Publish to Chrome Web Store

Tech Stack

License

Acknowledgements

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 17

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages