Index and search RSS feeds using semantic search techniques. This API provides endpoints to embed RSS feed entries into a vector database and search them based on semantic similarity.
Index RSS feeds by posting their URLs to the /embed endpoint. This will
parse the feed entries and store title, summary, published date, link, and
vector embeddings in the database. The vector embeddings are based on the
content of the title and summary.
The endpoint requires authorization with a bearer token, which must be set
as an environment variable SEMANTIC_RSS_SEARCH_API_TOKEN.
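As a rough client-side sketch of the request described above, the following Python builds the POST to /embed with the bearer token, using only the standard library. The helper names `build_embed_request` and `embed_feed` and the default base URL `http://localhost:8000` are assumptions for illustration and are not part of the project.

```python
import json
import os
import urllib.request


def build_embed_request(feed_url, token, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /embed endpoint."""
    body = json.dumps({"url": feed_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def embed_feed(feed_url, token=None):
    """Send the request and return the decoded JSON response."""
    # Assumption: the client reads the token from the same variable
    # the server is configured with.
    token = token or os.environ["SEMANTIC_RSS_SEARCH_API_TOKEN"]
    with urllib.request.urlopen(build_embed_request(feed_url, token)) as resp:
        return json.load(resp)
```

With a server running locally, `embed_feed("https://www.theregister.com/headlines.atom")` would be expected to return a small JSON object reporting the number of indexed entries.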
Search indexed RSS feeds by posting a query to the /search endpoint.
The search will return the top k entries that match the query based on
semantic similarity. The results include the title, summary, published date,
link, distance from the query vector, and the number of tokens of the indexed
content.
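To illustrate what "top k by distance" means, here is a minimal, self-contained sketch that ranks a few toy vectors against a query vector. It assumes Euclidean distance and an in-memory list; the actual service delegates this work to the vector database and may use a different metric. The function `top_k` and the sample data are hypothetical.

```python
import heapq
import math


def top_k(query_vec, entries, k=3):
    """Return the k entries closest to query_vec, nearest first.

    entries is a list of (title, vector) pairs. A smaller distance
    means a closer semantic match.
    """
    scored = (
        {"title": title, "distance": math.dist(query_vec, vec)}
        for title, vec in entries
    )
    return heapq.nsmallest(k, scored, key=lambda item: item["distance"])


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
entries = [
    ("AI energy use", (0.9, 0.1)),
    ("Smartphone news", (0.1, 0.9)),
    ("Data center power", (0.8, 0.3)),
]
results = top_k((1.0, 0.0), entries, k=2)
for item in results:
    print(f'{item["distance"]:.3f}  {item["title"]}')
```

The entry farthest from the query ("Smartphone news") is dropped because only the two nearest entries are requested.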
- FastAPI: For building the API endpoints.
- SQLite Vector: For storing vector embeddings in a SQLite database.
- Sentence Transformers: For generating vector embeddings from text.
- all-MiniLM-L6-v2: A pre-trained model for generating sentence and paragraph embeddings for semantic search.
To run the container image, you need an environment for containerized applications, such as Docker or Kubernetes.
For local development, you need Python 3.11 or later.
Set up the necessary environment by running the script setup.sh. This will
create a virtual environment and install the required dependencies.
bash setup.sh
To run the tests, execute the setup script with the --with-tests flag.
This will install additional dependencies.
bash setup.sh --with-tests
docker run --detach \
--name semantic-rss-search \
--publish 8000:8000 \
--env SEMANTIC_RSS_SEARCH_API_TOKEN=your_token_here \
--volume ./models:/app/models \
--volume ./db:/app/db \
ghcr.io/tschaefer/semantic-rss-search:latest
Export the required environment variables.
export SEMANTIC_RSS_SEARCH_API_TOKEN=your_token_here
export NLTK_DATA_DIR=venv/nltk_data
export HF_HOME=models
Activate the virtual environment.
source venv/bin/activate
Run the production server using FastAPI and Uvicorn.
python run.py
For development purposes, you can run FastAPI with hot reloading and additional logging information.
fastapi dev app/main.py
Note: On first run, the application will download the pre-trained model.
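Before starting the server, it can help to confirm that the environment variables exported earlier are actually set. The following is a small pre-flight sketch, not part of the project; the function name `missing_variables` is hypothetical, and it assumes all three variables are required, as the export step suggests.

```python
import os

REQUIRED_VARIABLES = (
    "SEMANTIC_RSS_SEARCH_API_TOKEN",
    "NLTK_DATA_DIR",
    "HF_HOME",
)


def missing_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "SEMANTIC_RSS_SEARCH_API_TOKEN": "your_token_here",
    "HF_HOME": "models",
}
print(missing_variables(example_env))
```

Calling `missing_variables()` without arguments checks the current process environment instead of the example mapping.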
To run the tests, first install the additional dependencies (see the
Prerequisites section). Then execute the tests using pytest with a
temporary SQLite database.
SEMANTIC_RSS_SEARCH_DB=/tmp/semantic_rss_search.db pytest -v tests/
rm -f /tmp/semantic_rss_search.db
Find the API documentation at http://localhost:8000/redoc.
curl --silent --include \
--header "Authorization: Bearer token_48e46df9b9e3dc6251877724f8328c39da2158fc892846ab6710b1f29afe98eb" \
--json '{ "url": "https://www.theregister.com/headlines.atom" }' \
http://localhost:8000/embed
HTTP/1.1 201 Created
date: Wed, 28 May 2025 19:14:49 GMT
server: uvicorn
content-length: 14
content-type: application/json

{"entries":50}
curl --silent \
--json '{ "query": "Is using AI an energy waste?", "k": 3 }' \
http://localhost:8000/search
{
  "query": "Is using AI an energy waste?",
  "k": 3,
  "results": [
    {
      "title": "AI's enormous energy appetite can be curbed, but only through lateral thinking",
      "summary": "...",
      "published": 1748331006,
      "link": "https://go.theregister.com/feed/www.theregister.com/2025/05/27/opinion_column_ai_energy/",
      "distance": 0.7886583209037781,
      "token": 117
    },
    {
      "title": "'Some Signs of AI Model Collapse Begin To Reveal Themselves'",
      "summary": "...",
      "published": 1748433783,
      "link": "https://slashdot.org/story/25/05/28/0242240/some-signs-of-ai-model-collapse-begin-to-reveal-themselves?utm_source=rss1.0mainlinkanon&utm_medium=feed",
      "distance": 0.8932643532752991,
      "token": 525
    },
    {
      "title": "Nothing's Carl Pei Says Your Smartphone's OS Will Replace All of Its Apps",
      "summary": "...",
      "published": 1748433783,
      "link": "https://mobile.slashdot.org/story/25/05/28/0316239/nothings-carl-pei-says-your-smartphones-os-will-replace-all-of-its-apps?utm_source=rss1.0mainlinkanon&utm_medium=feed",
      "distance": 0.906494677066803,
      "token": 421
    }
  ]
}
Contributions are welcome! Please fork the repository and submit a pull request. For major changes, open an issue first to discuss what you would like to change.
Ensure that your code adheres to the existing style and includes appropriate tests.
This project is licensed under the MIT License.