
AniDB title embeddings semantic search API, intended for Plex and HamaTV.


What is this?

This repo contains a simple Python Flask web server that hosts a single API endpoint:

/api/anidb/id?name={series_name}

This API uses pre-generated PyTorch embeddings and a Hugging Face dataset of AniDB series titles from this Hugging Face repo: https://huggingface.co/datasets/khellific/anidb-series-embeddings. The API loads the embeddings into memory, generates a new embedding for the user's query with the same sentence-transformers model, performs a cosine similarity search of the query embedding against the stored embeddings, and returns the highest-ranked matches' mapped AniDB IDs as JSON of the form:

[{ "id": "anidb-id", "name": "anidb entry match title", "score": "similarity score" }]

By default, the API will return up to five matches.
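The search described above can be sketched in plain Python. This is an illustration of the technique, not the repo's actual code: in the real server the embeddings come from a sentence-transformers model rather than the toy two-dimensional vectors used here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_matches(query_embedding, corpus, k=5):
    """Rank corpus entries by cosine similarity to the query embedding.

    `corpus` maps an AniDB id to a (title, embedding) pair.
    """
    scored = [
        {"id": anidb_id, "name": title,
         "score": cosine_similarity(query_embedding, emb)}
        for anidb_id, (title, emb) in corpus.items()
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:k]

# Toy 2-dimensional "embeddings", purely for illustration.
corpus = {
    "1": ("Cowboy Bebop", [1.0, 0.1]),
    "2": ("Trigun", [0.2, 1.0]),
}
print(top_matches([0.9, 0.2], corpus, k=1)[0]["name"])  # prints "Cowboy Bebop"
```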

This API is intended to be used with my forked version of the HamaTV Plex agent to match anime series with AniDB entries, allowing users to disregard the naming conventions that agent normally requires.

Note that if you choose to run this server yourself (see below), you will need to download updated versions of the embeddings periodically to keep it up to date.

Do I need to run it myself?

I'm hosting a version of it (and keeping it updated where possible) on spare capacity here:

https://anidb.khell.net/api/anidb/id

It is behind Cloudflare, so you may get rate-limited. I make no guarantees about its availability, reliability, or latency. While I don't explicitly retain any logs, they are kept in Docker memory for the lifetime of the container, so I can theoretically see what you query.
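If you query the hosted instance, remember to percent-encode the series name. For example, using only Python's standard library (the commented-out fetch is subject to the rate-limiting caveats above):

```python
from urllib.parse import urlencode

BASE = "https://anidb.khell.net/api/anidb/id"

def build_query_url(series_name):
    """Encode the series name into the `name` query parameter."""
    return f"{BASE}?{urlencode({'name': series_name})}"

url = build_query_url("Cowboy Bebop")
print(url)  # https://anidb.khell.net/api/anidb/id?name=Cowboy+Bebop
# To actually perform the request:
#   import json, urllib.request
#   matches = json.load(urllib.request.urlopen(url))
```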

Running manually

  1. Set up a virtual environment with Python 3.10.9 (other versions will most likely work, but I didn't test them).
  2. Install requirements: pip install -r requirements.txt
  3. If you are running on an Apple Silicon Mac:
gunicorn 'main:app' --workers 1 --timeout 60 --bind 127.0.0.1:8080
  4. Otherwise, you must set TORCH_DEVICE as an environment variable to either cpu or cuda (if available). On Unix systems, you can launch like this:
TORCH_DEVICE=cpu gunicorn 'main:app' --workers 1 --timeout 60 --bind 127.0.0.1:8080
  5. You may want to set TRUST_X_FORWARDED to an integer n, where n is the number of reverse proxies you are running behind (if any).
  6. First startup may be slow, as the embeddings and dataset must be downloaded from Hugging Face.
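A sketch of how a server like this might interpret the environment variables named in the steps above. This is an assumption for illustration, not the repo's actual code:

```python
import os

def read_server_config(environ=os.environ):
    """Read TORCH_DEVICE and TRUST_X_FORWARDED with sensible defaults.

    Hypothetical helper: the real server may validate these differently.
    """
    device = environ.get("TORCH_DEVICE", "cpu")
    if device not in ("cpu", "cuda", "mps"):
        raise ValueError(f"unsupported TORCH_DEVICE: {device!r}")
    # Number of trusted reverse proxies (0 = ignore X-Forwarded-For).
    proxy_depth = int(environ.get("TRUST_X_FORWARDED", "0"))
    return {"device": device, "proxy_depth": proxy_depth}

print(read_server_config({"TORCH_DEVICE": "cuda", "TRUST_X_FORWARDED": "1"}))
```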

Running with Docker

  1. You can just use the prebuilt image with Docker Compose: docker compose up -d
  2. You might want to change the TORCH_DEVICE environment variable in the Compose file. It's set to run on cpu by default.
  3. Note that mps is not available through Docker even if running on Apple Silicon: pytorch/pytorch#81224
  4. By default TRUST_X_FORWARDED is set to trust reverse proxies to a depth of 1. This is suitable for the default Compose configuration.
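For reference, a Compose configuration along the lines described above might look like this. The service and image names here are placeholders, not the real ones; consult the docker-compose.yml shipped in this repo for the actual values:

```yaml
# Illustrative fragment only -- service/image names are assumptions;
# see this repo's own Compose file for the real configuration.
services:
  anidb-search:
    image: example/anidb-semantic-search-api   # hypothetical image name
    environment:
      TORCH_DEVICE: cpu        # mps is unavailable inside Docker
      TRUST_X_FORWARDED: "1"   # trust one reverse proxy (the default depth)
    ports:
      - "8080:8080"
```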

Increasing number of results

  1. Set the RESULTS_COUNT environment variable to an integer value n to return up to n results.
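In other words, the server can be expected to treat RESULTS_COUNT as an integer override of the default of five. A minimal sketch of that behavior (an assumption, not the repo's actual code):

```python
import os

def results_count(environ=os.environ, default=5):
    """How many matches to return; RESULTS_COUNT overrides the default of 5."""
    return int(environ.get("RESULTS_COUNT", default))

print(results_count({"RESULTS_COUNT": "10"}))  # prints 10
```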
