A self-hosted scholarly RSS feed generator that pulls article metadata from CrossRef, enriches abstracts via Semantic Scholar and OpenAlex, and serves RSS/Atom feeds compatible with FreshRSS or any feed reader.
- Journal autocomplete — instant local search across 167K+ journals (SQLite FTS5), with online CrossRef fallback
- Multiple add methods — by name, ISSN, DOI lookup, or bulk ISSN import
- Tiered abstract enrichment — CrossRef for articles, then Semantic Scholar batch API (one request for all DOIs), then OpenAlex individual lookups for anything still missing
- RSS/Atom/JSON feeds — one feed per journal, plus OPML export for bulk import into feed readers
- Daily auto-refresh — background scheduler with configurable interval
- Configurable lookback — set how far back to fetch articles (default 365 days), adjustable from the UI
- MCP server — expose cached articles to LLMs via Model Context Protocol
- Docker ready — single container, bind-mount data directory, reverse-proxy friendly
1. Clone and configure:

   ```bash
   cd ScholRSS
   cp docker-compose.example.yml docker-compose.yml
   # Edit docker-compose.yml: set MAILTO, OPENALEX_API_KEY, BASE_URL
   ```

2. Get API keys:
   - OpenAlex (required): free key from https://openalex.org/settings/api
   - Semantic Scholar (optional, improves abstract coverage): from https://www.semanticscholar.org/product/api#api-key

3. Build and run:

   ```bash
   docker compose up -d --build
   ```

4. Open the UI at http://localhost:8844
5. Add journals: start typing a name in the autocomplete box, or switch tabs for ISSN/DOI/bulk import.
6. Add feeds to your reader:
   - Individual (Atom, the default): http://localhost:8844/feed/0028-0836
   - RSS format: http://localhost:8844/feed/0028-0836?format=rss
   - Bulk: download OPML from http://localhost:8844/opml
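To check a feed outside a reader, a short stdlib script can fetch it and list the entry titles. This is a minimal sketch assuming the default feed is standard Atom 1.0 (namespace `http://www.w3.org/2005/Atom`); `feed_titles` and `fetch_feed` are hypothetical helper names, not part of ScholRSS.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Standard Atom 1.0 namespace (assumed to be what /feed/{issn} emits by default).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def feed_titles(xml_text: str) -> list:
    """Return the title of every <entry> in an Atom document, whitespace-trimmed."""
    root = ET.fromstring(xml_text)
    return [
        entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip()
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


def fetch_feed(url: str, timeout: float = 30.0) -> str:
    """Download a feed such as http://localhost:8844/feed/0028-0836 as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the container running, `feed_titles(fetch_feed("http://localhost:8844/feed/0028-0836"))` should list the cached Nature articles.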
| Variable | Default | Description |
|---|---|---|
| `MAILTO` | `scholrss@example.com` | Email for polite API pool access (CrossRef/OpenAlex) |
| `OPENALEX_API_KEY` | (empty) | Free API key from OpenAlex (required) |
| `SEMANTIC_SCHOLAR_API_KEY` | (empty) | Optional API key for faster Semantic Scholar access |
| `BASE_URL` | `http://localhost:8844` | External URL for feed self-links |
| `INTERNAL_URL` | (empty) | Optional internal URL (e.g. `http://scholrss:8844`) shown alongside `BASE_URL` for container-to-container readers that bypass reverse-proxy auth |
| `UPDATE_INTERVAL_HOURS` | `24` | Hours between automatic feed refreshes (used when no daily refresh time is set) |
| `LOOKBACK_DAYS` | `365` | Default lookback window in days (overridden by UI setting) |
| `MAX_ARTICLES` | `100` | Max articles fetched/cached per journal (1–1000; overridden by UI setting) |
| `DATA_DIR` | `/data` | Where journals config and cache are stored |
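The defaults in the table can be mirrored in a small loader. This is a minimal sketch: `load_config` and the `Config` dataclass are hypothetical names, and clamping `MAX_ARTICLES` into the documented 1 to 1000 range (rather than rejecting bad values) is an assumption about how out-of-range input is handled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    mailto: str
    openalex_api_key: str
    semantic_scholar_api_key: str
    base_url: str
    internal_url: str
    update_interval_hours: int
    lookback_days: int
    max_articles: int
    data_dir: str


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read the documented environment variables, falling back to their defaults."""
    env = os.environ if env is None else env
    # MAX_ARTICLES is documented as 1 to 1000; clamp anything outside that range.
    max_articles = min(max(int(env.get("MAX_ARTICLES", "100")), 1), 1000)
    return Config(
        mailto=env.get("MAILTO", "scholrss@example.com"),
        openalex_api_key=env.get("OPENALEX_API_KEY", ""),
        semantic_scholar_api_key=env.get("SEMANTIC_SCHOLAR_API_KEY", ""),
        base_url=env.get("BASE_URL", "http://localhost:8844"),
        internal_url=env.get("INTERNAL_URL", ""),
        update_interval_hours=int(env.get("UPDATE_INTERVAL_HOURS", "24")),
        lookback_days=int(env.get("LOOKBACK_DAYS", "365")),
        max_articles=max_articles,
        data_dir=env.get("DATA_DIR", "/data"),
    )
```

Passing an explicit mapping (as in `load_config({"MAX_ARTICLES": "250"})`) keeps the loader easy to test without touching the real environment.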
All runtime data lives in the bind-mounted `./data` directory:
- `journals.json`: tracked journals
- `settings.json`: UI-configurable settings (lookback days, etc.)
- `cache/`: cached article data per journal (JSON files)
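Because `LOOKBACK_DAYS` and `MAX_ARTICLES` are "overridden by UI setting", the effective value is the environment default unless `settings.json` holds one. A minimal sketch of that precedence, assuming `settings.json` is a flat JSON object with keys like `lookback_days` (as in the `/api/settings` example) and that a `null` value means "not set"; both function names are hypothetical.

```python
import json
from pathlib import Path


def load_saved_settings(data_dir: str) -> dict:
    """Return the UI-saved settings from <data_dir>/settings.json, or {} if absent."""
    path = Path(data_dir) / "settings.json"
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def effective_settings(env_defaults: dict, saved: dict) -> dict:
    """UI-saved values win over environment defaults; None is treated as unset."""
    merged = dict(env_defaults)
    merged.update({key: value for key, value in saved.items() if value is not None})
    return merged
```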
The journal autocomplete database (journals/journals.db, ~58MB) is baked into the Docker image. Use the Update DB button in the UI to rebuild it from online sources.
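The autocomplete box relies on SQLite FTS5 prefix matching. The real schema of `journals.db` is not documented here, so this sketch builds a throwaway in-memory index with a hypothetical `journals(title, issn UNINDEXED)` table; it assumes your Python's bundled SQLite was compiled with FTS5 (true for most current builds). Each typed word becomes a quoted prefix token, and FTS5 ANDs them, so `nature comm` matches "Nature Communications" but not "Nature".

```python
import sqlite3


def fts_prefix_query(user_text: str) -> str:
    """Turn 'nature comm' into '"nature"* "comm"*' (implicit AND of prefix tokens)."""
    # Quoting each token (with embedded quotes doubled) keeps FTS5 operators inert.
    return " ".join('"' + tok.replace('"', '""') + '"*' for tok in user_text.split())


def autocomplete(conn: sqlite3.Connection, user_text: str, limit: int = 10) -> list:
    """Return (issn, title) rows ranked by FTS5's built-in bm25 rank."""
    query = fts_prefix_query(user_text)
    if not query:
        return []
    return conn.execute(
        "SELECT issn, title FROM journals WHERE journals MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


def build_demo_index(rows) -> sqlite3.Connection:
    """Hypothetical schema: titles are searchable, ISSNs are stored but not indexed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE journals USING fts5(title, issn UNINDEXED)")
    conn.executemany("INSERT INTO journals (title, issn) VALUES (?, ?)", rows)
    return conn
```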
ScholRSS includes an MCP (Model Context Protocol) server so LLMs can query your cached research.
Tools available:
| Tool | Description |
|---|---|
| `list_journals` | List tracked journals with article/abstract counts |
| `latest_articles(count, journal?)` | Most recent articles, optionally filtered by journal |
| `search_articles(query, count?)` | Keyword search across titles and abstracts |
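To illustrate what a tool like `search_articles` might do over the cached data, here is a minimal sketch; it is not the project's `mcp_server.py`. It assumes each cached article is a dict with `title`, `abstract` and an ISO-format `published` date string, and that every query term must appear (AND). The extra `articles` parameter and the default `count` of 10 are hypothetical.

```python
def search_articles(articles: list, query: str, count: int = 10) -> list:
    """Case-insensitive keyword match over title + abstract, newest first."""
    terms = [term.lower() for term in query.split()]
    if not terms:
        return []
    hits = []
    for art in articles:
        # Abstracts may be missing (None) when enrichment found nothing.
        text = f"{art.get('title') or ''} {art.get('abstract') or ''}".lower()
        if all(term in text for term in terms):
            hits.append(art)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    hits.sort(key=lambda art: art.get("published", ""), reverse=True)
    return hits[:count]
```

Plain substring matching keeps the sketch dependency-free; a real implementation could swap in tokenization or ranking without changing the tool's interface.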
Setup for Claude Code / Claude Desktop:
```json
{
  "mcpServers": {
    "scholrss": {
      "command": "python3",
      "args": ["/path/to/ScholRSS/mcp_server.py"],
      "env": {
        "SCHOLRSS_DATA_DIR": "/path/to/ScholRSS/data"
      }
    }
  }
}
```

HTTP API endpoints:

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/feed/{issn}` | GET | Atom feed (`?format=rss` for RSS 2.0) |
| `/feed/{issn}/json` | GET | Raw JSON feed data |
| `/opml` | GET | OPML export of all feeds |
| `/api/autocomplete?q=` | GET | Local journal autocomplete (FTS5) |
| `/api/search/journal?q=` | GET | Online journal search via CrossRef |
| `/api/search/doi?doi=` | GET | Look up journal from a DOI |
| `/api/journal` | POST | Add a journal `{issn, title, publisher}` |
| `/api/journal/bulk` | POST | Bulk import `{issns: ["1234-5678", ...]}` |
| `/api/journal/{issn}` | DELETE | Remove a journal |
| `/api/journal/{issn}/filter` | PUT | Set/clear keyword+author filter `{keywords: [...], authors: [...], match}`, where `match` is `"any"` or `"all"` |
| `/api/journal/filtered` | POST | Create a new filtered feed variant `{issn, title, publisher, label, keywords, authors, match}`; lets you stack multiple filters on the same ISSN |
| `/api/refresh/{issn}` | POST | Refresh one journal |
| `/api/refresh-all` | POST | Refresh all journals |
| `/api/settings` | GET/PUT | Read/update settings (e.g. `{lookback_days: 365, refresh_hour_utc: 10, refresh_minute_utc: 30}`) |
| `/api/logs` | GET | Get log file tail (`?lines=N`, `?level=ERROR`) |
| `/api/logs` | DELETE | Clear log file |
| `/api/update-journal-db` | POST | Rebuild journal autocomplete database |
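The endpoints can also be scripted. This stdlib sketch assumes request and response bodies are JSON (as the `{...}` payloads in the table suggest) and that no authentication sits in front of the API; `BASE_URL`, `api_request` and `send` are hypothetical names for illustration.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8844"  # set to your own BASE_URL


def api_request(method: str, path: str, body: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) a request; a non-None body is serialized as JSON."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(req: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send a request and decode the JSON response body, if there is one."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None
```

For example, `send(api_request("PUT", "/api/journal/1556-5068/filter", {"keywords": ["privacy", "regulation"], "authors": [], "match": "any"}))` sets a filter, and `send(api_request("POST", "/api/refresh/1556-5068"))` refreshes that journal. Building the `Request` separately from sending it makes the payloads easy to inspect before anything hits the server.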
Mega-journals and preprint servers like SSRN Electronic Journal (1556-5068) or arXiv (2331-8422) publish thousands of papers per week. Fetching them unfiltered would flood your reader and waste API calls. Instead, click the ⌕ Filter button on the journal card and set:
- Keywords — comma-separated terms matched against title + abstract (OR by default, switch to AND if you need all)
- Authors — comma-separated name fragments matched against author display names (OR)
When a filter is set, ScholRSS switches that journal's fetch path from CrossRef to OpenAlex's `/works` endpoint with server-side filtering:

```
filter=primary_location.source.issn:1556-5068,
from_publication_date:2025-04-01,
title_and_abstract.search:privacy|regulation,
authorships.author.display_name.search:jane+smith
```
Only matching works transit the wire — one request per refresh, no client-side culling. OpenAlex usually returns abstracts inline; anything still missing goes through the normal Semantic Scholar → OpenAlex enrichment fallback. Clearing all fields reverts the journal to the standard CrossRef pipeline.
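The filter value above can be assembled from the Filter dialog fields. In this minimal sketch the filter keys are copied from the example; the function names are hypothetical, and how ScholRSS expresses AND matching (here: repeating the filter once per keyword) and joins several authors (here: `|`, i.e. OR) are assumptions. URL-encoding with `,`, `:` and `|` left readable reproduces the example, including `jane+smith`.

```python
from urllib.parse import urlencode


def openalex_filter(issn, from_date, keywords=(), authors=(), match="any", source_id=None):
    """Build an OpenAlex /works `filter` value from the journal's filter settings."""
    # A configured OpenAlex source ID takes precedence over the ISSN lookup.
    parts = [
        f"primary_location.source.id:{source_id}" if source_id
        else f"primary_location.source.issn:{issn}",
        f"from_publication_date:{from_date}",
    ]
    if keywords:
        if match == "all":
            # Assumption: AND is expressed by repeating the filter per keyword.
            parts += [f"title_and_abstract.search:{kw}" for kw in keywords]
        else:
            parts.append("title_and_abstract.search:" + "|".join(keywords))
    if authors:
        # Authors are always OR'd, per the Authors field description.
        parts.append("authorships.author.display_name.search:" + "|".join(authors))
    return ",".join(parts)


def works_query(filter_value: str) -> str:
    """Encode as a query string; spaces become '+', while , : | stay readable."""
    return urlencode({"filter": filter_value}, safe=",:|")
```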
Two additional options improve coverage for tricky sources:
- OpenAlex source ID: some sources (notably SSRN) don't map cleanly from ISSN to the correct OpenAlex source record. If results seem too few, supply the OpenAlex source ID directly (e.g. `S4210172589` for SSRN Electronic Journal; find it at `openalex.org/sources`). When set, the query uses `primary_location.source.id:` instead of `primary_location.source.issn:`.
- Also search Semantic Scholar: enables a parallel keyword search via the Semantic Scholar API, which crawls SSRN directly and catches papers that never get DOIs or CrossRef registration. Results are merged and deduplicated by DOI and title. Use the S2 venue field (e.g. `SSRN`) to restrict S2 results to a specific venue.
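Merging the two result sets "deduplicated by DOI and title" can be sketched as below. Assumptions, all hypothetical: results are dicts with `doi` and `title` keys; OpenAlex DOIs arrive as `https://doi.org/...` URLs while S2 DOIs are bare, so both are normalized; titles are compared after lowercasing and collapsing punctuation; and OpenAlex records win over S2 duplicates because they are listed first.

```python
import re


def _norm_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so trivial variants match."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def merge_results(openalex: list, s2: list) -> list:
    """Keep the first occurrence of each work, matching on DOI or normalized title."""
    seen_dois, seen_titles, merged = set(), set(), []
    for work in [*openalex, *s2]:
        doi = (work.get("doi") or "").lower().removeprefix("https://doi.org/")
        title = _norm_title(work.get("title") or "")
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # duplicate of something already kept
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        merged.append(work)
    return merged
```

Matching on title as well as DOI is what lets non-DOI'd SSRN uploads from S2 collapse onto their OpenAlex counterparts.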
SSRN's ISSN (1556-5068) resolves to the wrong OpenAlex source by default. For reliable results:
- Set OpenAlex source ID to `S4210172589`
- Enable Also search Semantic Scholar
- Set S2 venue to `SSRN`
This queries both OpenAlex (with the correct source ID) and Semantic Scholar (restricted to SSRN papers), giving the best coverage of SSRN's mix of DOI'd and non-DOI'd uploads.
To track several independent slices of the same mega-journal (e.g. one SSRN feed for "privacy" and another for "AI safety"), open the Add Journal panel and switch to the Filtered feed tab. Give each variant a label and its own keywords/authors. Every submission creates a separate entry keyed by `<issn>__<slug>` with its own cache, feed URL (`/feed/1556-5068__privacy`, `/feed/1556-5068__ai_safety`, …), and OPML line. The original unfiltered entry keeps working unchanged.
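One slug rule that reproduces the two example keys is sketched below: lowercase the label and collapse each run of non-alphanumeric characters into a single underscore. This is an assumption that matches `privacy` and `ai_safety`; ScholRSS's actual rule (for example its handling of accented characters) may differ, and `variant_key` is a hypothetical name.

```python
import re


def variant_key(issn: str, label: str) -> str:
    """Derive a '<issn>__<slug>' key, e.g. ('1556-5068', 'AI safety') -> '1556-5068__ai_safety'."""
    # Lowercase, then replace every run of non [a-z0-9] characters with one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    if not slug:
        # An empty slug would collide with the unfiltered '<issn>' entry.
        raise ValueError("label must contain at least one letter or digit")
    return f"{issn}__{slug}"
```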
For each refresh of an unfiltered journal:
1. CrossRef: fetch up to `MAX_ARTICLES` (default 100) recent articles (primary source, freshest metadata)
2. Semantic Scholar: a single batch POST for all DOIs still missing abstracts (up to 500 DOIs per request)
3. OpenAlex: individual DOI lookups for anything still missing (rate-limited at 150 ms per call)
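The tiered order can be expressed with the two lookups injected as callables, which keeps the sketch testable offline. Assumptions: articles arrive from CrossRef as dicts with `doi` and optional `abstract` keys; `s2_batch` returns a DOI-to-abstract mapping for whatever it found; DOIs are chunked at 500 per batch (which only matters when `MAX_ARTICLES` exceeds 500); pacing of OpenAlex calls is left to the caller. All names are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Optional

S2_BATCH_LIMIT = 500  # max paper IDs per Semantic Scholar batch request


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def enrich_abstracts(
    articles: List[dict],
    s2_batch: Callable[[List[str]], Dict[str, str]],
    openalex_one: Callable[[str], Optional[str]],
) -> List[dict]:
    """Fill missing abstracts in place: S2 batch lookups first, then OpenAlex per DOI."""
    missing = [a["doi"] for a in articles if a.get("doi") and not a.get("abstract")]
    found: Dict[str, str] = {}
    # Tier 2: one S2 batch call per 500 DOIs (a single call at the default 100 articles).
    for batch in chunks(missing, S2_BATCH_LIMIT):
        found.update({doi: text for doi, text in s2_batch(batch).items() if text})
    # Tier 3: individual OpenAlex lookups only for DOIs that S2 could not fill.
    for doi in missing:
        if doi not in found:
            text = openalex_one(doi)
            if text:
                found[doi] = text
    for a in articles:
        if not a.get("abstract") and a.get("doi") in found:
            a["abstract"] = found[a["doi"]]
    return articles
```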
ScholRSS is polite to upstream APIs:
- Semantic Scholar: 1 batch request per journal
- OpenAlex: 150ms between individual DOI lookups
- CrossRef: 500ms delay after fetching works
- 1s delay between journals during bulk refresh
- All requests include `mailto` or API keys for polite pool access
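The fixed delays above amount to a minimum interval between successive calls. A minimal sketch of such a pacer follows; `MinInterval` is hypothetical, and ScholRSS may simply call `time.sleep` with fixed values instead. Measuring from the previous call means a slow request is not followed by a redundant full-length pause.

```python
import time


class MinInterval:
    """Enforce a minimum gap between successive calls (e.g. 0.15 s for OpenAlex)."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last = None  # monotonic timestamp of the previous call, if any

    def wait(self) -> None:
        """Sleep just long enough that `seconds` have passed since the last wait()."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.seconds - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Usage: create `openalex_pace = MinInterval(0.15)` and call `openalex_pace.wait()` before each DOI lookup; a `MinInterval(1.0)` between journals covers the bulk-refresh delay.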
The Lookback setting (default 365 days) controls which articles are fetched and displayed. ScholRSS uses publication date rather than CrossRef's index date to determine relevance:
- CrossRef queries use `from-pub-date` (publication date) instead of `from-index-date` (when the record was added/modified in CrossRef)
- At ingestion time, works published before the lookback window are dropped regardless of source (CrossRef, OpenAlex, or Semantic Scholar)
This prevents decades-old papers that were recently back-indexed or assigned DOIs from flooding your feed.
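Both halves of the lookback rule can be sketched in a few lines: the CrossRef `filter` value and an ingestion-time date check. Assumptions: dates arrive as CrossRef-style `date-parts` lists (`[y]`, `[y, m]` or `[y, m, d]`); partial dates are padded to the latest possible day so a year-only date is not dropped early; undated works are rejected. The function names are hypothetical.

```python
import calendar
from datetime import date, timedelta
from typing import List, Optional


def lookback_cutoff(lookback_days: int, today: Optional[date] = None) -> date:
    """Earliest publication date still inside the lookback window."""
    return (today or date.today()) - timedelta(days=lookback_days)


def crossref_filter(cutoff: date) -> str:
    """CrossRef `filter` value keyed on publication date, not from-index-date."""
    return f"from-pub-date:{cutoff.isoformat()}"


def within_lookback(date_parts: List[int], cutoff: date) -> bool:
    """Ingestion-time check applied to works from any source."""
    if not date_parts:
        return False  # assumption: works without a publication date are dropped
    year = int(date_parts[0])
    month = int(date_parts[1]) if len(date_parts) > 1 else 12
    # Pad a missing day with the last day of that month.
    day = int(date_parts[2]) if len(date_parts) > 2 else calendar.monthrange(year, month)[1]
    return date(year, month, day) >= cutoff
```

With a 365-day window on 2025-04-01, the cutoff is 2024-04-01, so a back-indexed work with `date-parts` `[1998, 5, 2]` is dropped even though CrossRef indexed it recently.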
```bash
pip install pytest
pytest tests/ -v                        # all tests
pytest tests/ -v -m "not integration"   # unit tests only (fast, no API calls)
pytest tests/ -v -m integration         # integration tests (hits real APIs)
```

ScholRSS works with Cosmos Server's reverse proxy. Either:
- URL mode: add a route in Cosmos pointing to `scholrss:8844`
- Labels mode: uncomment the `cosmos.hostname`/`cosmos.port` labels in `docker-compose.yml`
Set BASE_URL to match your external hostname.