rakers

A lightweight, single-binary JS renderer for JavaScript SPAs and SSR sites, where startup latency (milliseconds vs. 1-2 seconds) and memory footprint (~10 MB vs. ~300 MB) matter more than compatibility breadth.

rakers renders JavaScript into HTML. Give it an HTML file, a URL, or a bare JS script and it returns the post-execution HTML — including content rendered by React, Vue, Angular, Svelte, Preact, Mithril, Elm, Riot, and other JS frameworks.

Built on html5ever (Servo's HTML5 parser) with a choice of JS engine: QuickJS via rquickjs (default) or boa_engine (pure-Rust, no C compiler required).

Install

Pre-built binaries (recommended)

Download the latest release binary for your platform from the releases page:

| Platform | Binary |
|---|---|
| Linux x86-64 | rakers-linux-x86_64 |
| macOS Apple Silicon | rakers-macos-aarch64 |

# Linux example
curl -L https://github.com/tbro/rakers/releases/latest/download/rakers-linux-x86_64 -o rakers
chmod +x rakers
sudo mv rakers /usr/local/bin/

Build from source

Requires Rust and a C compiler (for the default QuickJS engine).

cargo install --path .

For a pure-Rust build without a C compiler, use the boa engine instead:

cargo install --path . --no-default-features --features boa

Usage

rakers [OPTIONS] [INPUT]

INPUT is a file path or an http/https URL; omit it to read from stdin.

| Input type | Example |
|---|---|
| URL | rakers https://example.com |
| HTML file | rakers page.html |
| JS file | rakers script.js |
| stdin | echo '<script>document.write("hi")</script>' \| rakers |

By default output goes to stdout. Use -o to write to a file:

rakers https://example.com -o rendered.html

Options

| Flag | Description |
|---|---|
| -o FILE | Write output to FILE instead of stdout |
| -A UA | Set the User-Agent header for all HTTP requests |
| -H "Name: Value" | Add a custom request header (repeatable) |
| --clean | Strip <script> elements and unwrap <noscript> (see Clean mode) |
| --pretty | Format output HTML with two-space indentation |
| --json | Emit {"raw_bytes":N,"rendered_bytes":N,"html":"..."} instead of bare HTML |
| --diff | Show a unified diff of raw vs rendered HTML (both sides are pretty-printed first) |
| --selector SELECTOR | Filter output to elements matching a CSS selector; multiple matches are newline-separated |
| --max-scripts N | Limit the number of remote <script src> fetches (inline scripts are not counted) |
| --timeout SECS | Per-script wall-clock timeout in seconds; fractions allowed (e.g. 0.5). Default: 30 |
| --no-timeout | Remove the per-script timeout entirely (conflicts with --timeout) |
| --verbose | Print informational messages to stderr: [fetch], [skip], [console], [module-shim] |
| --forward-headers | Forward custom -H headers on XHR requests made by page scripts (off by default; see note below) |

Note on custom headers: Headers passed via -H (including Authorization, cookies, and API keys) are sent on the page fetch and any external script fetches, but are not forwarded on XHR requests the page's JavaScript initiates. Use --forward-headers to opt in to forwarding them on XHR too — but avoid this when rendering untrusted HTML, as page scripts could trigger XHR requests to arbitrary cross-origin destinations carrying your credentials.
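
For scripted use, the flags above can be assembled programmatically. The following is a minimal Python sketch, not part of rakers: the helper names build_argv and render are hypothetical, and it assumes a rakers binary on PATH that exits non-zero on failure. Only flags listed in the table are used, and --forward-headers stays off unless requested explicitly.

```python
import subprocess
from typing import List, Optional, Sequence


def build_argv(
    source: str,
    output: Optional[str] = None,
    headers: Sequence[str] = (),
    user_agent: Optional[str] = None,
    clean: bool = False,
    pretty: bool = False,
    timeout: Optional[float] = None,
    forward_headers: bool = False,
) -> List[str]:
    """Assemble a rakers command line from the documented flags (hypothetical helper)."""
    argv = ["rakers"]
    if output is not None:
        argv += ["-o", output]
    if user_agent is not None:
        argv += ["-A", user_agent]
    for header in headers:  # -H is repeatable, one flag per header
        argv += ["-H", header]
    if clean:
        argv.append("--clean")
    if pretty:
        argv.append("--pretty")
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if forward_headers:
        # Opt-in only: also sends the -H headers on XHR made by page scripts.
        argv.append("--forward-headers")
    argv.append(source)
    return argv


def render(source: str, **options) -> str:
    """Run rakers and return whatever it writes to stdout.

    Assumption: a non-zero exit status signals failure, so check=True raises.
    """
    result = subprocess.run(
        build_argv(source, **options), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, build_argv("page.html", headers=["X-Test: 1"], clean=True) yields the same arguments as typing rakers -H "X-Test: 1" --clean page.html.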

How it works

  1. Fetches the page (or reads from file/stdin)
  2. Parses HTML with html5ever into a DOM tree
  3. Collects <script> tags — inline and external (src="...") — and fetches any external scripts
    • External scripts that open with import/export (ES module files requiring a module loader) are automatically skipped; self-contained bundles tagged type="module" still execute
    • Cloudflare Rocket Loader (type="<hash>-text/javascript") is recognized and executed
  4. Executes all scripts in order in a sandboxed JS context with browser globals stubbed out
  5. Flushes any deferred callbacks (setTimeout, requestAnimationFrame, MessageChannel, queueMicrotask) so async-rendered frameworks have a chance to run
  6. Reads back document.body.innerHTML and serializes the final HTML
    • Large server-rendered bodies (SSR sites) are preserved when the JS-rendered body is substantially smaller, which prevents measurement/analytics divs from clobbering real content
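
The script-selection rules in step 3 can be illustrated with a small Python sketch. This is not the rakers implementation: it assumes that "opens with import/export" means the first token after leading whitespace, and that the Rocket Loader hash is a run of hexadecimal digits. Both function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: module syntax is detected on the first token after leading whitespace.
_MODULE_OPENING = re.compile(r"\s*(?:import|export)\b")
# Assumption: the Rocket Loader hash consists of hexadecimal digits.
_ROCKET_LOADER_TYPE = re.compile(r"[0-9a-fA-F]+-text/javascript")


def opens_with_module_syntax(source: str) -> bool:
    """True when an external script begins with an import or export keyword.

    Limitation of this sketch: a file starting with a comment, or with a
    dynamic import(...) call, is not handled specially.
    """
    return _MODULE_OPENING.match(source) is not None


def is_rocket_loader_type(type_attr: Optional[str]) -> bool:
    """True for type attributes shaped like "<hash>-text/javascript"."""
    return type_attr is not None and _ROCKET_LOADER_TYPE.fullmatch(type_attr) is not None
```

Note that the decision depends on the file contents rather than on the type attribute, which is why a self-contained bundle tagged type="module" would still pass the first check.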

.js files are automatically wrapped in a minimal HTML document before processing.

console.log, console.warn, and console.error print to stderr with a [console] prefix when --verbose is set. Script errors are non-fatal — execution continues with the next script.
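
The deferred-callback flush in step 5 can be pictured as a virtual-time timer queue. The toy model below is written in Python for illustration only; the README does not say how rakers orders or bounds the flush, so the due-time ordering, the DeferredQueue name, and the max_callbacks limit are all assumptions.

```python
import heapq
from itertools import count
from typing import Callable


class DeferredQueue:
    """Toy virtual-time timer queue: nothing sleeps, callbacks run in due-time order."""

    def __init__(self) -> None:
        self._heap = []      # entries: (due_ms, sequence, callback)
        self._seq = count()  # tie-breaker keeps FIFO order for equal due times
        self.now_ms = 0.0

    def set_timeout(self, callback: Callable[[], None], delay_ms: float = 0) -> None:
        """Queue a callback relative to the current virtual time."""
        heapq.heappush(self._heap, (self.now_ms + delay_ms, next(self._seq), callback))

    def flush(self, max_callbacks: int = 10_000) -> int:
        """Run queued callbacks, including ones scheduled while flushing."""
        ran = 0
        while self._heap and ran < max_callbacks:
            due_ms, _, callback = heapq.heappop(self._heap)
            self.now_ms = due_ms  # jump the virtual clock instead of waiting
            callback()
            ran += 1
        return ran


queue = DeferredQueue()
log = []


def first_render() -> None:
    log.append("first render")
    # A framework scheduling more work from inside a callback.
    queue.set_timeout(lambda: log.append("follow-up render"), 5)


queue.set_timeout(lambda: log.append("late timer"), 10)
queue.set_timeout(first_render, 0)
ran = queue.flush()
print(log)  # ['first render', 'follow-up render', 'late timer']
```

The point of the model is that work scheduled during the flush is itself flushed, which is what lets a framework that renders in several asynchronous steps finish before the HTML is read back.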

Output modes

--pretty

Pretty-print the rendered HTML with two-space indentation. Block elements each start on their own line; inline elements and their content stay together.

rakers --pretty https://example.com

--json

Emit a JSON object useful for scripting and size comparisons:

{"raw_bytes":645,"rendered_bytes":4210,"html":"<html>..."}

--json and --pretty can be combined — the HTML field will contain pretty-printed, JSON-escaped HTML.
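
As a sketch of the scripting use case, the Python snippet below parses a report with the three documented fields and derives how much markup rendering added. The summarize_report helper is hypothetical; the sample string is the example object shown above.

```python
import json


def summarize_report(report_text: str) -> dict:
    """Parse a --json report and add two derived size figures."""
    report = json.loads(report_text)
    raw, rendered = report["raw_bytes"], report["rendered_bytes"]
    return {
        "raw_bytes": raw,
        "rendered_bytes": rendered,
        "added_bytes": rendered - raw,
        # Guard against an empty input document.
        "growth": rendered / raw if raw else None,
        "html": report["html"],
    }


sample = '{"raw_bytes":645,"rendered_bytes":4210,"html":"<html>..."}'
summary = summarize_report(sample)
print(summary["added_bytes"])  # 3565
```

A growth factor close to 1 suggests the page was already server-rendered, while a large factor suggests most content came from client-side JavaScript.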

--diff

Show a unified diff of the raw vs rendered HTML. Both sides are pretty-printed before diffing for a readable result:

rakers --diff https://example.com/spa
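
For readers unfamiliar with the format, the Python sketch below produces a unified diff with the standard library's difflib. It only illustrates the general shape of such output; the file labels "raw" and "rendered" are chosen for this sketch, and the exact headers rakers prints may differ.

```python
import difflib


def unified_html_diff(raw_html: str, rendered_html: str) -> str:
    """Return a unified diff between two already pretty-printed HTML strings."""
    lines = difflib.unified_diff(
        raw_html.splitlines(),
        rendered_html.splitlines(),
        fromfile="raw",       # label chosen for this sketch
        tofile="rendered",    # label chosen for this sketch
        lineterm="",
    )
    return "\n".join(lines)


raw = '<section id="root">\n</section>'
rendered = '<section id="root">\n  <h1>todos</h1>\n</section>'
diff_text = unified_html_diff(raw, rendered)
print(diff_text)
```

Lines prefixed with + exist only in the rendered HTML, so they are the markup that JavaScript execution added.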

--selector

Extract specific elements from the rendered output using a CSS selector. All matching elements are printed, newline-separated:

# Extract just the article elements from a news site
rakers --selector "article" https://example.com

# Combine with --pretty for readable output
rakers --selector "#root" --pretty https://example.com/spa

Returns an empty string (exit 0) when no elements match. Returns an error for an invalid selector.

Clean mode

--clean applies a post-processing pass that produces a static, crawlable snapshot, similar to what prerendering services (Prerender.io, Rendertron) deliver to search-engine bots:

  • Removes all <script> elements (inline and external)
  • Removes <link rel="modulepreload"> and <link rel="preload" as="script">
  • Unwraps <noscript>: strips the tags but keeps the inner content, so crawlers see any fallback markup (meta redirects, image links, etc.)

rakers --clean https://example.com -o static.html

The output is self-contained HTML with no executable code — safe to serve directly to crawlers or store as a static snapshot.
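
The three clean-mode rules can be sketched with Python's standard html.parser. This is an illustration of the rules only: rakers applies them to its own html5ever DOM, and the CleanSketch class and clean function here are hypothetical names.

```python
from html.parser import HTMLParser


class CleanSketch(HTMLParser):
    """Re-emit HTML while dropping scripts, script preloads, and <noscript> wrappers."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    @staticmethod
    def _is_script_preload(tag, attrs) -> bool:
        if tag != "link":
            return False
        values = {name: (value or "").lower() for name, value in attrs}
        rel = values.get("rel", "")
        return rel == "modulepreload" or (rel == "preload" and values.get("as") == "script")

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True   # drop the tag and everything inside it
        elif tag != "noscript" and not self._is_script_preload(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in ("script", "noscript") and not self._is_script_preload(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag != "noscript":     # unwrap: the children were already emitted
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean(html: str) -> str:
    parser = CleanSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Feeding it a page with a modulepreload link, two scripts, and a noscript image fallback leaves only the static markup and the unwrapped fallback image.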

JS engine choice

rakers supports two JS engines selectable at compile time.

| | rquickjs (default) | boa |
|---|---|---|
| Build deps | Requires a C compiler | Pure Rust, no C compiler |
| ES standard | ES2023 | ES2021 (partial) |
| Real-world bundles | Good | Limited; may stack-overflow on large bundles |
| React / Vue SPAs | Works | Often hits stack limits |
| When to use | Real-world sites (default) | CI without C toolchain |

Building

# rquickjs (default — recommended)
cargo build
cargo install --path .

# boa (pure Rust, no C compiler needed)
cargo build --no-default-features --features boa
cargo install --path . --no-default-features --features boa

Only one engine can be enabled at a time; the build will fail with a clear error if both or neither are selected.

Running tests

Unit tests run with either engine:

cargo test                                       # rquickjs (default)
cargo test --no-default-features --features boa  # boa

Integration tests that fetch real SPAs require rquickjs (boa overflows the native stack on large React/Rocket Loader bundles):

cargo test --test integration

Browser environment

The following globals are stubbed so typical JS bundles run without errors:

  • document: createElement, getElementById, querySelector / querySelectorAll (including compound comma-separated selectors and script[type="X"] queries), body, head, currentScript, and the full DOM manipulation API (appendChild with move semantics, insertBefore, removeChild, setAttribute, innerHTML, firstChild, lastChild, childNodes, etc.)
  • window.location: all fields (href, pathname, hostname, protocol, host, port, search, hash, origin) are parsed from the page URL; setting hash fires onhashchange via the deferred-callback queue; assign, replace, and reload are no-ops
  • window.history: pushState and replaceState update history.state; navigation methods are no-ops
  • window: navigator, screen, performance, localStorage, sessionStorage, matchMedia, getComputedStyle, and all standard event/observer constructors
  • URL / URLSearchParams: relative URL resolution against the page URL; searchParams with full get/set/has
  • fetch: returns Promise.resolve(response) with an empty 200 OK body; .then() chains run and apps don't crash, but no data is loaded
  • XMLHttpRequest: synchronous mode (open(method, url, false)) fetches via the same HTTP client as the main page fetch; async mode schedules onload / onreadystatechange callbacks with the real response body, and they fire during the deferred-callback flush pass
  • Script injection: appendChild evals child.text when the child is a <script> element, supporting compilers (e.g. Riot 2.x) that register components via dynamic script injection
  • DOMException / customElements: DOM exception constructor and Web Components registry
  • process: Node.js-style globals for webpack/Vite bundler compatibility
  • Timers: setTimeout, setInterval, requestAnimationFrame, queueMicrotask, and MessageChannel callbacks are collected and flushed after scripts finish
  • import(): dynamic imports return Promise.resolve({}) (a stub module); .then() chains run but no real module is loaded
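
To make the window.location and URL entries concrete, the Python sketch below derives the same field names from a page URL with urllib.parse, and resolves a relative URL against it. It is an illustration of the mapping, not rakers code: location_fields is a hypothetical helper, it assumes an http/https URL with a host, and it does not drop default ports the way a browser does.

```python
from urllib.parse import urljoin, urlsplit


def location_fields(page_url: str) -> dict:
    """Split a page URL into the fields exposed on window.location."""
    parts = urlsplit(page_url)
    # Only an explicit port is reported; default ports are not normalized here.
    port = "" if parts.port is None else str(parts.port)
    host = parts.hostname + (f":{port}" if port else "")
    protocol = parts.scheme + ":"
    return {
        "href": page_url,
        "protocol": protocol,
        "host": host,
        "hostname": parts.hostname,
        "port": port,
        "pathname": parts.path or "/",
        "search": f"?{parts.query}" if parts.query else "",
        "hash": f"#{parts.fragment}" if parts.fragment else "",
        "origin": f"{protocol}//{host}",
    }


page = "https://example.com:8080/app/index.html?q=1#top"
fields = location_fields(page)
print(fields["origin"])               # https://example.com:8080
print(urljoin(page, "../api/items"))  # https://example.com:8080/api/items
```

The second print shows the kind of relative resolution a bundle relies on when it calls new URL(path, location.href).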

Comparison

| | rakers | headless Chrome | Playwright / Puppeteer | Splash |
|---|---|---|---|---|
| JS compatibility | Good (QuickJS / ES2023) | Full | Full | Full (WebKit) |
| Requires browser | No | Yes | Yes | Yes (via Docker) |
| Startup time | ~10 ms | ~1–2 s | ~1–2 s | ~500 ms |
| Memory | ~10 MB | ~150–300 MB | ~150–300 MB | ~200 MB |
| Network calls from JS | Limited (XHR only; fetch is stubbed) | Yes | Yes | Yes |
| CSS / layout | No | Yes | Yes | Yes |
| Embeddable as library | Yes (Rust crate) | No | No | No |
| Installation | Single binary | Chrome + chromedriver | Browser + Node | Docker image |
| Language | Rust | Any | JS / many bindings | Python / Lua |

When to use rakers — fast HTML extraction in a scraping pipeline, CI environments without a browser, embedding in a Rust service, or anywhere startup latency and memory footprint matter more than pixel-perfect rendering.

When to use a headless browser — pages that rely on CSS-driven layout, canvas, WebGL, WebSockets, or JavaScript that makes authenticated network requests during render.

Demo

TodoMVC React is the canonical demo. The server returns a 645-byte skeleton:

<section class="todoapp" id="root"></section>

rakers executes the React bundle and returns the fully rendered app:

<section class="todoapp" id="root">
  <header class="header">
    <h1>todos</h1>
    <div class="input-container">
      <input class="new-todo" type="text">
      ...
    </div>
  </header>
  ...
</section>

Compatibility

Tested against real-world sites with rquickjs:

| Site | Framework | Result |
|---|---|---|
| react.dev | Next.js (SSR) | ✓ no errors |
| svelte.dev | SvelteKit (SSR) | ✓ no errors |
| vuejs.org | Vite (SSR) | ✓ no errors |
| tailwindcss.com | Next.js (SSR) | ✓ no errors |
| remix.run | Remix (SSR) | ✓ no errors |
| jsbench.me | React SPA | ✓ full render |
| babylonbee.com | Cloudflare Rocket Loader | ✓ articles intact |
| linear.app | Next.js | ✓ renders (1 minor error) |
| github.com | Custom SSR | ✓ renders (4 minor errors) |

TodoMVC sweep

19 of 20 TodoMVC examples render correctly. The sweep runs automatically on every push via the todomvc-compat CI job.

| Framework | Result |
|---|---|
| React | ✓ full render |
| React + Redux | ✓ full render |
| Vue | ✓ full render |
| Preact | ✓ full render |
| Svelte | ✓ full render |
| Angular | ✓ full render |
| Mithril | ✓ full render |
| Elm | ✓ full render |
| Riot | ✓ template rendered |
| Ember | ✓ app shell rendered |
| Backbone, KnockoutJS, jQuery, Dojo, Aurelia, Backbone Marionette, Vanilla ES5/ES6, Web Components | ✓ prerendered content preserved |
| Lit | ✗ native ES-module bundle (no IIFE fallback); needs a full module loader |
