
AniDB title embeddings semantic search API, intended for Plex and HamaTV.


What is this?

This repo contains a simple Python Flask web server that hosts a single API endpoint:

/api/anidb/id?name={series_name}

This API uses pre-generated PyTorch embeddings and a Hugging Face dataset of AniDB series titles from this Hugging Face repo: https://huggingface.co/datasets/khellific/anidb-series-embeddings. The API loads the embeddings into memory, generates a new embedding for the user's query with the same sentence-transformers model, performs a cosine similarity search of the query embedding against the stored embeddings, and returns the highest-ranked matches' mapped AniDB IDs as JSON of the form:

[{ "id": "anidb-id", "name": "anidb entry match title", "score": "similarity score" }]

By default, the API will return up to five matches.
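The search described above can be sketched in plain Python. This is an illustration of the technique, not the repo's actual code: in the real server the embeddings come from a sentence-transformers model rather than the toy two-dimensional vectors used here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_matches(query_embedding, corpus, k=5):
    """Rank corpus entries by cosine similarity to the query embedding.

    `corpus` maps an AniDB id to a (title, embedding) pair.
    """
    scored = [
        {"id": anidb_id, "name": title,
         "score": cosine_similarity(query_embedding, emb)}
        for anidb_id, (title, emb) in corpus.items()
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:k]

# Toy 2-dimensional "embeddings", purely for illustration.
corpus = {
    "1": ("Cowboy Bebop", [1.0, 0.1]),
    "2": ("Trigun", [0.2, 1.0]),
}
print(top_matches([0.9, 0.2], corpus, k=1)[0]["name"])  # prints "Cowboy Bebop"
```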

This API is intended to be used with my forked version of the HamaTV Plex agent to match anime series with AniDB entries, allowing users to disregard the naming conventions that agent normally requires.

Note that if you choose to run this server yourself (see below), you will need to download updated versions of the embeddings periodically to keep it up to date.

Do I need to run it myself?

I'm hosting a version of it (and keeping it updated where possible) on spare capacity here:

https://anidb.khell.net/api/anidb/id

It is behind Cloudflare, so you may get rate-limited. I make no guarantees about its availability, reliability, or latency. While I don't explicitly retain any logs, they are kept in Docker memory for the lifetime of the container, so I can theoretically see what you query.
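If you query the hosted instance, remember to percent-encode the series name. For example, using only Python's standard library (the commented-out fetch is subject to the rate-limiting caveats above):

```python
from urllib.parse import urlencode

BASE = "https://anidb.khell.net/api/anidb/id"

def build_query_url(series_name):
    """Encode the series name into the `name` query parameter."""
    return f"{BASE}?{urlencode({'name': series_name})}"

url = build_query_url("Cowboy Bebop")
print(url)  # https://anidb.khell.net/api/anidb/id?name=Cowboy+Bebop
# To actually perform the request:
#   import json, urllib.request
#   matches = json.load(urllib.request.urlopen(url))
```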

Running manually

  1. Set up a virtual environment with Python 3.10.9 (other versions will most likely work, but I didn't test them).
  2. Install requirements: pip install -r requirements.txt
  3. If you are running on an Apple Silicon Mac:
gunicorn 'main:app' --workers 1 --timeout 60 --bind 127.0.0.1:8080
  4. Otherwise, you must set TORCH_DEVICE as an environment variable to either cpu or cuda (if available). On Unix systems, you can launch like this:
TORCH_DEVICE=cpu gunicorn 'main:app' --workers 1 --timeout 60 --bind 127.0.0.1:8080
  5. You may want to set TRUST_X_FORWARDED to an integer n, where n is the number of reverse proxies you are running behind (if any).
  6. First startup may be slow, as the embeddings and dataset must be downloaded from Hugging Face.
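A sketch of how a server like this might interpret the environment variables named in the steps above. This is an assumption for illustration, not the repo's actual code:

```python
import os

def read_server_config(environ=os.environ):
    """Read TORCH_DEVICE and TRUST_X_FORWARDED with sensible defaults.

    Hypothetical helper: the real server may validate these differently.
    """
    device = environ.get("TORCH_DEVICE", "cpu")
    if device not in ("cpu", "cuda", "mps"):
        raise ValueError(f"unsupported TORCH_DEVICE: {device!r}")
    # Number of trusted reverse proxies (0 = ignore X-Forwarded-For).
    proxy_depth = int(environ.get("TRUST_X_FORWARDED", "0"))
    return {"device": device, "proxy_depth": proxy_depth}

print(read_server_config({"TORCH_DEVICE": "cuda", "TRUST_X_FORWARDED": "1"}))
```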

Running with Docker

  1. You can just use the prebuilt image with Docker Compose: docker compose up -d
  2. You might want to change the TORCH_DEVICE environment variable in the Compose file. It's set to run on cpu by default.
  3. Note that mps is not available through Docker even if running on Apple Silicon: pytorch/pytorch#81224
  4. By default TRUST_X_FORWARDED is set to trust reverse proxies to a depth of 1. This is suitable for the default Compose configuration.
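For reference, a Compose configuration along the lines described above might look like this. The service and image names here are placeholders, not the real ones; consult the docker-compose.yml shipped in this repo for the actual values:

```yaml
# Illustrative fragment only -- service/image names are assumptions;
# see this repo's own Compose file for the real configuration.
services:
  anidb-search:
    image: example/anidb-semantic-search-api   # hypothetical image name
    environment:
      TORCH_DEVICE: cpu        # mps is unavailable inside Docker
      TRUST_X_FORWARDED: "1"   # trust one reverse proxy (the default depth)
    ports:
      - "8080:8080"
```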

Increasing number of results

  1. Set the RESULTS_COUNT environment variable to an integer value n to return up to n results.
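In other words, the server can be expected to treat RESULTS_COUNT as an integer override of the default of five. A minimal sketch of that behavior (an assumption, not the repo's actual code):

```python
import os

def results_count(environ=os.environ, default=5):
    """How many matches to return; RESULTS_COUNT overrides the default of 5."""
    return int(environ.get("RESULTS_COUNT", default))

print(results_count({"RESULTS_COUNT": "10"}))  # prints 10
```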
