This is an example Chrome extension for Transformers.js, a library for running machine-learning models (including LLMs) directly in the browser, built on top of Plasmo.
Demo videos are available covering both intro and advanced usage (LLMs, multimodal models, TTS, and reasoning). Each scenario is listed below.
- Integrate Transformers.js with Chrome extension
- Use modern web development tooling (TypeScript, Parcel, Tailwind CSS, shadcn/ui, etc.)
- Change generation parameters (e.g. `max_tokens`, `temperature`, `top_p`, etc.)
- Load LLaMA variants
- Load other LLM models
- Release extension to Chrome Web Store
- Load multimodal LLMs
- Load Whisper (Speech-to-Text)
- Load DeepSeek R1 (Reasoning)
- Load OuteTTS (Text-to-Speech)
- SAM (Segment Anything Model), text classification, etc.
- Chat history (save to local storage, export to CSV)
- Call 3rd party LLM APIs
- Error handling
- Resource management (e.g. orchestrate and stop generations, unload models)
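As an illustration of the chat-history bullet above, exporting saved messages to CSV can be sketched in plain JavaScript. The message shape and field names here are assumptions for illustration, not the extension's actual types:

```javascript
// Hypothetical message shape: { timestamp, role, content }.
// Fields containing commas, quotes, or newlines are quoted per RFC 4180.
function escapeCsvField(value) {
  const s = String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function chatHistoryToCsv(messages) {
  const header = ["timestamp", "role", "content"];
  const rows = messages.map((m) =>
    [m.timestamp, m.role, m.content].map(escapeCsvField).join(",")
  );
  return [header.join(","), ...rows].join("\n");
}

// Example usage:
const csv = chatHistoryToCsv([
  { timestamp: "2024-01-01T00:00:00Z", role: "user", content: "Hello, world" },
  { timestamp: "2024-01-01T00:00:05Z", role: "assistant", content: "Hi!" },
]);
```

The resulting string can then be handed to a `Blob` and downloaded, or persisted alongside the local-storage history.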
All the numbers below were measured on a MacBook Pro with an M1 Max chip and 32 GB RAM.
Prompt: "Write python code to compute the nth fibonacci number."
| Model | Throughput |
|---|---|
| Llama-3.2-1B (q4f16) | 40.3 tokens/sec |
| Phi-3.5-mini (q4f16) | 32.9 tokens/sec |
| SmolLM2-1.7B (q4f16) | 46.2 tokens/sec |
| Qwen2.5-Coder-1.5B (q4f16) | 36.1 tokens/sec |
| Janus 1.3B (q4f16) | 30.9 tokens/sec |
| Whisper Base (fp32 + q4) | 30.5 tokens/sec |
| DeepSeek R1 (q4f16) | 32.7 tokens/sec |
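The throughput figures above are tokens generated divided by wall-clock decode time. A minimal sketch of that calculation (not the actual benchmark harness used for the table):

```javascript
// tokens: number of tokens emitted during generation
// elapsedMs: wall-clock decode time in milliseconds
// Returns tokens/sec rounded to one decimal place, matching the table.
function tokensPerSecond(tokens, elapsedMs) {
  return Math.round((tokens / (elapsedMs / 1000)) * 10) / 10;
}

// e.g. 403 tokens generated over 10 seconds:
const throughput = tokensPerSecond(403, 10000); // 40.3
```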
Install [Private AI Assistant](https://chromewebstore.google.com/detail/private-ai-assistant-runn/jojlpeliekadmokfnikappfadbjiaghp) from the Chrome Web Store.
You should install Node.js and pnpm to build the project.
First, install the dependencies:

```bash
pnpm install
```
Then, start the development server:

```bash
pnpm dev
```
Open your Chrome browser (i.e. `chrome://extensions`) and load the appropriate development build. For example, if you are developing for Chrome with Manifest V3, use `build/chrome-mv3-dev`.
For further guidance, visit Plasmo's Documentation or create an issue.
Run the following:

```bash
pnpm build && pnpm package
```
This should create a production bundle for your extension, ready to be zipped and published to the stores.
The easiest way to deploy your Plasmo extension is to use the built-in bpp GitHub Action. Prior to using this action, however, make sure to build your extension and upload the first version to the store to establish the basic credentials. Then simply follow the setup instructions and you should be on your way to automated submission!
Open `chrome://extensions` and find the "Inspect views" section for the extension.
Open Chrome > More Tools > Task Manager.
Run the Chrome extension, open the inspector, go to the Application tab, find the Local Storage section, and locate the `transformers-cache` entry.
- Transformers.js Example
- Transformers.js V2 Chrome Extension
- Plasmo Documentation
- WebLLM and its Chrome Extension
- gpu.cpp
- huggingface/transformers.js#986
- microsoft/onnxruntime#20876
- https://github.com/ggaabe/extension
- https://github.com/xenova/whisper-web
- https://www.mathjax.org/
- Ilya Sutskever's NeurIPS 2024 full talk (YouTube)