
🚧 Transformers.js Chrome Extension 🚧

This is an example Chrome extension for Transformers.js, a library for running machine learning models (including LLMs) in the browser, built on top of Plasmo.

⚠️ Please note that this project is still under development and is not ready for production or enterprise use. APIs, features, and code structures may change without notice. The Chrome extension process could also be stopped by the browser at any time. Thank you for your understanding! πŸ™

Examples

Demo videos are available covering intro and advanced usage (LLM, multi-modal, TTS, and reasoning). Each scenario is listed below.

| Task | Example |
| --- | --- |
| Text Summarization | *Text Summarization example* |
| Code Generation | *Code Generation example* |
| Multi-Modal LLM | *Image Understanding example* |
| Image Generation | *Image Generation example* |
| Speech to Text | *Speech to Text example* |
| Reasoning | *Reasoning example* |
| Text to Speech | WIP |
| Text Classification | TODO |
| Image Segmentation | TODO |
| Remove Background | TODO |

Features

  • Integrate Transformers.js with Chrome extension
  • Use modern web development tooling (TypeScript, Parcel, Tailwind CSS, Shadcn, etc.)
  • Change generation parameters (e.g. max_tokens, temperature, top_p)
  • Load LLaMA variants
  • Load other LLM models
  • Release extension to Chrome Web Store
  • Load multi-modal LLM models
  • Load Whisper (Speech-to-Text)
  • Load DeepSeek R1 (Reasoning)
  • Load OuteTTS (Text-to-Speech)
  • SAM (Segment Anything Model), Text-classification, etc.
  • Chat history (save to local storage, export to CSV)
  • Call 3rd party LLM APIs
  • Error handling
  • Resource management (e.g. orchestrate and stop generations, unload models)
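Several of these features come down to the same pattern: a generation loop that can be stopped cooperatively. The sketch below models that flow with a plain flag; `InterruptFlag` and `generate` are illustrative names, not the extension's actual API (Transformers.js exposes its own stopping-criteria hooks).

```typescript
// Illustrative sketch only: InterruptFlag and generate() are not the
// extension's real API; they model cooperative "stop generation".
class InterruptFlag {
  private interrupted = false;
  interrupt(): void { this.interrupted = true; }
  reset(): void { this.interrupted = false; }
  get isInterrupted(): boolean { return this.interrupted; }
}

// Simulated token loop that checks the flag between tokens, the way a
// stopping criterion would be consulted during generation.
function generate(flag: InterruptFlag, maxTokens: number): string[] {
  const tokens: string[] = [];
  for (let i = 0; i < maxTokens; i++) {
    if (flag.isInterrupted) break;
    tokens.push(`tok${i}`);
    if (i === 2) flag.interrupt(); // simulate the user clicking "Stop"
  }
  return tokens;
}

const flag = new InterruptFlag();
console.log(generate(flag, 10)); // ["tok0", "tok1", "tok2"]
```

Checking a shared flag between decode steps is what lets a "Stop" button take effect promptly without killing the service worker.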

Performance

All numbers below were measured on a MacBook Pro (M1 Max, 32 GB RAM).

Prompt: "Write python code to compute the nth fibonacci number."

| Model | Throughput |
| --- | --- |
| Llama-3.2-1B (q4f16) | 40.3 tokens/sec |
| Phi-3.5-mini (q4f16) | 32.9 tokens/sec |
| SmolLM2-1.7B (q4f16) | 46.2 tokens/sec |
| Qwen2.5-Coder-1.5B (q4f16) | 36.1 tokens/sec |
| Janus 1.3B (q4f16) | 30.9 tokens/sec |
| Whisper Base (fp32 + q4) | 30.5 tokens/sec |
| DeepSeek R1 (q4f16) | 32.7 tokens/sec |
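Throughput figures like these can be derived from per-token timestamps collected in a streamer callback. The helper below is a hypothetical sketch (`tokensPerSecond` is not part of Transformers.js); it only turns a list of timestamps into a tokens/sec figure.

```typescript
// Hypothetical helper (not part of Transformers.js): derive tokens/sec
// from per-token timestamps, e.g. recorded in a text-streamer callback.
function tokensPerSecond(timestampsMs: number[]): number {
  if (timestampsMs.length < 2) return 0;
  const elapsedS =
    (timestampsMs[timestampsMs.length - 1] - timestampsMs[0]) / 1000;
  // N timestamps correspond to N - 1 decode steps.
  return (timestampsMs.length - 1) / elapsedS;
}

// 11 tokens spaced 25 ms apart => 10 decode steps over 0.25 s = 40 tok/s.
const stamps = Array.from({ length: 11 }, (_, i) => i * 25);
console.log(tokensPerSecond(stamps)); // 40
```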

Installation

Chrome Web Store

Install [Private AI Assistant](https://chromewebstore.google.com/detail/private-ai-assistant-runn/jojlpeliekadmokfnikappfadbjiaghp) from the Chrome Web Store.

From source

You need Node.js and pnpm installed to build the project.

First, install the dependencies:

pnpm install

Then, start the development server:

pnpm dev

Open Chrome, navigate to chrome://extensions, enable Developer mode, and load the appropriate development build with "Load unpacked". For example, if you are developing for Chrome with Manifest V3, use: build/chrome-mv3-dev.

For further guidance, visit Plasmo's Documentation or create an issue.

Deployment

Making production build

Run the following:

pnpm build && pnpm package

This should create a production bundle for your extension, ready to be zipped and published to the stores.

Submit to the webstores

The easiest way to deploy your Plasmo extension is the built-in bpp GitHub Action. Before using this action, build your extension and upload the first version to the store manually to establish the basic credentials. Then follow the setup instructions and automated submission should work.
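As a sketch, a submission workflow using bpp might look like the following; treat the action versions, inputs, and secret name as assumptions and check the bpp README for current usage.

```yaml
# .github/workflows/submit.yml — sketch based on Plasmo's bpp docs;
# verify inputs against the PlasmoHQ/bpp README before relying on it.
name: "Submit to Web Store"
on:
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          run_install: true
      - name: Build and package
        run: pnpm build && pnpm package
      - name: Publish
        uses: PlasmoHQ/bpp@v3
        with:
          keys: ${{ secrets.SUBMIT_KEYS }}
```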

Debugging

Debug service worker

Open chrome://extensions and find the "Inspect views" section for the extension.

Inspect views

Memory usage for inference

Open Chrome > More Tools > Task Manager.

Task manager

Cached checkpoints

Run the Chrome extension, open the inspector, go to the Application tab, open the Cache Storage section, and find the transformers-cache entry (Transformers.js caches downloaded model files via the browser Cache API).

Cache storage

References