
@yawlabs/fetch-mcp

A comprehensive HTTP fetch MCP server for AI assistants. Bring-your-own client: runs as a stdio MCP server so any MCP-compatible client (Claude Code, Claude Desktop, Cursor, mcph, …) can fetch web content safely.

Add to mcp.hosting

One click adds this to your mcp.hosting account so it syncs to every MCP client you use. Or install manually below.

What it gives the model

Tool What it does
http_get / http_head / http_options Bare HTTP requests with headers, auth, timeout, size cap, retry
http_post / http_put / http_patch / http_delete Write-method HTTP with JSON or raw body
fetch_html_to_markdown GET a page and convert to clean markdown (3–8× smaller than raw HTML)
fetch_html_to_text GET a page and convert to plain text with block structure preserved
fetch_reader Reader-mode extraction — isolates the article body and returns title + markdown
fetch_meta Extract <head> metadata: title, description, OpenGraph, Twitter cards, JSON-LD, feeds, icons
fetch_links Extract every outbound link, resolved to absolute URLs, classified internal/external
fetch_sitemap Parse sitemap.xml (including gzipped and sitemap-index chaining)
fetch_feed Parse an RSS 2.0 or Atom 1.0 feed into entries
fetch_robots Parse a site's robots.txt, return the verdict for a given path & user-agent

Safety

SSRF protection is on by default. The server refuses requests to:

  • Loopback (127.0.0.0/8, ::1)
  • RFC1918 private ranges (10/8, 172.16/12, 192.168/16)
  • Link-local (169.254/16, fe80::/10) — including the cloud metadata endpoint 169.254.169.254
  • CGNAT (100.64/10)
  • Unique-local IPv6 (fc00::/7)
  • Multicast / broadcast
  • IPv4-mapped IPv6 (::ffff:0:0/96) re-checked against the IPv4 rules
  • Non-http/https schemes (file://, gopher://, javascript:, …)
  • Hostname localhost and any *.localhost

DNS is resolved once per redirect hop, every returned address is checked, and the verified IP is pinned into the HTTP dispatcher so the subsequent TCP connection dials that exact address — closing the DNS-rebinding TOCTOU window. Authorization headers are stripped on cross-origin redirects. A 302 to http://127.0.0.1 through a public host gets caught. Set allow_private_hosts: true per-request when you really do need internal access (e.g. development).
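
Conceptually, the per-address check works like the sketch below. This is illustrative only, not the server's actual code, and it covers just the IPv4 half of the rules above; the shipped implementation also handles IPv6, IPv4-mapped addresses, and the hostname rules.

// Sketch of an IPv4 deny-list check (illustrative, not the shipped implementation)
function isBlockedIPv4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  if (a === 127) return true;                         // loopback 127.0.0.0/8
  if (a === 10) return true;                          // RFC1918 10/8
  if (a === 172 && b >= 16 && b <= 31) return true;   // RFC1918 172.16/12
  if (a === 192 && b === 168) return true;            // RFC1918 192.168/16
  if (a === 169 && b === 254) return true;            // link-local, incl. 169.254.169.254
  if (a === 100 && b >= 64 && b <= 127) return true;  // CGNAT 100.64/10
  if (a >= 224) return true;                          // multicast / reserved / broadcast
  return false;
}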

Install & run

# One-off
npx -y @yawlabs/fetch-mcp

# Or globally
npm i -g @yawlabs/fetch-mcp
fetch-mcp

Requires Node ≥20.

Configure in Claude Code / Claude Desktop

Add to your client's MCP config (usually claude_desktop_config.json or ~/.claude.json):

{
  "mcpServers": {
    "fetch": {
      "command": "npx",
      "args": ["-y", "@yawlabs/fetch-mcp"]
    }
  }
}

Or via mcph:

mcph add fetch

Tool reference

http_get, http_post, http_put, http_patch, http_delete, http_head, http_options

Common parameters:

Field Type Default Meaning
url string Absolute URL
headers object Custom request headers
timeout_ms int 10000 Request timeout
max_bytes int 5242880 (5 MiB) Truncate body if larger
max_redirects int 5 Redirect hops allowed
retries int 0 Retry count on 408/425/429/5xx with backoff (honors Retry-After)
user_agent string @yawlabs/fetch-mcp/<v> User-Agent override
basic_auth {username,password} Injects Authorization: Basic …
bearer_token string Injects Authorization: Bearer …
allow_private_hosts bool false Bypass SSRF block
decode_text bool auto When unset, auto-detects by response Content-Type (text for text/*, JSON, XML, JS, form-urlencoded; binary otherwise). Set explicitly true to force text decoding, false to force base64 in body_base64.

Body-capable tools (POST/PUT/PATCH/DELETE) also take:

Field Type Meaning
body string Raw request body
body_json any Structured body — encoded as JSON, Content-Type: application/json set automatically
content_type string Overrides Content-Type
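
As an illustration, a JSON POST with a bearer token could be invoked with arguments along these lines (endpoint and token are placeholders; all fields come from the tables above):

{
  "url": "https://api.example.com/v1/items",
  "bearer_token": "<token>",
  "body_json": { "name": "example", "tags": ["docs"] },
  "timeout_ms": 15000,
  "retries": 2
}

With retries: 2, a 408/425/429/5xx response is retried with backoff, honoring Retry-After as noted above.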

Response shape:

{
  ok: boolean;
  status: number;
  statusText: string;
  url: string;              // final URL after redirects
  headers: Record<string,string>;
  body_text?: string;
  body_base64?: string;     // when decode_text=false
  json?: unknown;           // auto-parsed when response is application/json
  truncated?: boolean;      // set when max_bytes hit
  redirects?: string[];     // chain of intermediate URLs
  duration_ms: number;
  error?: string;
}

fetch_html_to_markdown

GET the URL, strip scripts/styles/iframes/svg/canvas plus <nav>, <footer>, <aside>, convert to atx-headed markdown with fenced code blocks and dash bullets. Intended for feeding pages into an LLM without blowing the context budget.
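
A minimal call only needs the URL. The example below also passes size and time limits on the assumption that the fetch_* tools accept the same shared parameters as the http_* tools (names taken from the table above; check the tool schema if in doubt):

{
  "url": "https://example.com/blog/some-post",
  "max_bytes": 2097152,
  "timeout_ms": 10000
}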

fetch_html_to_text

Same fetch, but emits plain text with block-level structure preserved as newlines. Useful when the model doesn't need markdown formatting.

fetch_reader

Isolates the main article body using, in order: <article>, <main>, itemprop="articleBody", common CMS class names (post-content, entry-content, etc.), then <body> as fallback. Returns:

{
  url: string;          // final URL after redirects
  title?: string;       // og:title, then <title>, then <h1>
  byline?: string;      // meta[name=author] / article:author
  wordCount: number;
  markdown: string;     // main content converted to markdown
}

fetch_meta

GET a URL and return its head metadata without downloading the full body (caps at 2 MiB by default):

{
  url: string;
  title?: string;
  description?: string;
  canonical?: string;
  language?: string;
  robots?: string;
  og:      Record<string, string>;       // first value per key (og:title, og:image, og:type, ...)
  twitter: Record<string, string>;       // first value per key
  article: Record<string, string>;       // first value per key
  ogAll:      Record<string, string[]>;  // only keys that appear > once (e.g. multiple og:image)
  twitterAll: Record<string, string[]>;
  articleAll: Record<string, string[]>;
  icons:   Array<{ rel: string; href: string; sizes?: string }>;
  feeds:   Array<{ href: string; title?: string; type?: string }>;   // RSS/Atom
  jsonLd:  unknown[];                    // parsed application/ld+json blocks
}

fetch_links

GET a page and return every <a href> with text, resolved to absolute URLs. Respects <base href>. Skips #, javascript:, mailto:, tel:, data:, file:. Each link is classified internal or external vs. the page host. Optional filter/dedupe/limit.
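
An illustrative call using those options; the option names below mirror the description above, so consult the tool's schema for their exact spelling and semantics:

{
  "url": "https://example.com/docs/",
  "filter": "guide",
  "dedupe": true,
  "limit": 100
}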

fetch_sitemap

Fetch a sitemap.xml or sitemap-index and return the URL list:

{
  sitemaps: string[];       // indexes followed, in order
  urlCount: number;
  truncated: boolean;       // hit max_urls
  urls: Array<{
    loc: string;
    lastmod?: string;
    changefreq?: string;
    priority?: number;
  }>;
}

Gzipped .xml.gz sitemaps are auto-decompressed. max_depth controls how many levels of sitemap-index to follow (default 1). Setting max_depth: 0 on a sitemap-index returns the index's childSitemaps list without fetching any child (useful to discover structure cheaply). Partial failures — one child sitemap 500s while others succeed — are returned under warnings rather than aborting the whole call.
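
For example, to cheaply map a large site you might first call (illustrative values):

{
  "url": "https://example.com/sitemap.xml",
  "max_depth": 0
}

On a sitemap-index this returns the childSitemaps list without fetching any child; a follow-up call with max_depth: 1 and a max_urls cap then pulls only the URLs you actually need.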

fetch_feed

Parse an RSS 2.0 or Atom 1.0 feed:

{
  kind: "rss" | "atom" | "unknown";
  title?: string;
  description?: string;
  link?: string;
  updated?: string;
  entryCount: number;
  truncated: boolean;       // hit limit
  entries: Array<{
    title?: string;
    link?: string;
    id?: string;
    published?: string;
    updated?: string;
    author?: string;
    summary?: string;
    content?: string;
    categories?: string[];
  }>;
}
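
An illustrative call; the truncated flag above implies an entry cap, named limit here as an assumption:

{
  "url": "https://example.com/feed.xml",
  "limit": 20
}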

fetch_robots

Fetches <origin>/robots.txt, parses it, and returns:

{
  robotsUrl: string;
  status: number;
  userAgent: string;
  path: string;
  allowed: boolean;
  matchedRule: string | null;  // the longest-match Allow/Disallow that decided it
  crawlDelay: number | null;   // from the matched group
  sitemaps: string[];          // top-level Sitemap: declarations
  rawRobotsText: string;       // first 512KB
}

The parser follows Google's rules: the longest matching rule wins, * matches any sequence of characters, $ anchors the end of the path, and a specific user-agent group beats the * group when the UA matches (specificity is compared by the length of the actually-matched agent token, not by the group's first listed agent). Allow beats Disallow on equal-length ties.
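
As a worked example, suppose the * group of a site's robots.txt contains Disallow: /private/ and Allow: /private/public-page$. Checking the path /private/public-page resolves to the Allow rule, because its pattern is longer than the Disallow pattern. The verdict would look roughly like this (trimmed; exact matchedRule formatting may differ):

{
  allowed: true,
  matchedRule: "Allow: /private/public-page$",
  crawlDelay: null
}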

Development

npm install
npm run build
npm test         # vitest
npm run lint     # biome
npm run typecheck

Tests spin up a local loopback HTTP server on 127.0.0.1:0 to exercise the real request/response path — no mocking of HTTP. SSRF tests verify that the default-deny still applies to that local server unless the request opts into allow_private_hosts.

License

MIT © Yaw Labs
