Skip to content

willwade/morphologyAPIwLLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Morphology API (LLM-backed)

FastAPI service that provides morphology endpoints (conjugations, lemmas, plurals, number spelling, etc.). The primary backend uses Simon Willison's llm library so you can swap between Gemini, Ollama, OpenAI, or any other supported provider. A lightweight deterministic rule engine ships with the service for offline or forced-deterministic flows: We want to extend this to HFSTs in the future so less is by LLM and more by these.

Features

  • /v1/conjugations/{lang}/{form} – full verb paradigms with metadata
  • /v1/lemmas/{lang}/{form} – lemma candidates + POS
  • /v1/plurals/{lang}/{form}, /v1/singulars/{lang}/{form} – noun/adj inflection
  • /v1/numbers/{lang}/{digits} – number → words
  • /v1/definitions, /v1/terms – dictionary-style lookups (LLM-driven today)
  • /v1/help/* – catalogs mirroring ULAPI (partsofspeech, tenses, etc.)
  • Hybrid orchestration: LLM first, deterministic rule engine fallback on low confidence
  • English deterministic backend implemented with HFST finite-state transducers (falls back gracefully if HFST is unavailable)
  • force_deterministic=true shortcut for rule-only responses (returns HTTP 422 if unavailable)

Getting Started

pipx install uv  # one-time, if uv is not already available
uv venv
source .venv/bin/activate
uv pip install -e '.[dev]'

Configure your preferred LLM provider. For Gemini:

export GEMINI_API_KEY="AIza..."
uv run llm install llm-gemini  # installs the Gemini plugin once per environment

Prefer not to keep secrets in your shell history? Drop the key into a file and point the service at it:

mkdir -p .secrets
echo "AIza..." > .secrets/gemini.key
echo "MORPHOLOGY_GEMINI_API_KEY_FILE=.secrets/gemini.key" >> .env

The settings loader will read the key on startup and expose it to the llm plugin automatically.

HFST finite-state support

The deterministic English backend now prefers HFST transducers. The Python bindings are pulled in automatically through hfst when you install the project dependencies. On platforms without prebuilt HFST wheels, the service will fall back to the legacy heuristic rules while still exposing the same API surface. To force HFST usage, ensure the package installs cleanly in your environment:

uv pip install hfst

If HFST bindings are present but unstable in your environment, disable them and stick to deterministic rules by setting:

export MORPHOLOGY_DISABLE_HFST=1

A bundled German HFST transducer (from https://master.dl.sourceforge.net/project/hfst/resources/morphological-transducers/hfst-german.tar.gz?viasf=1) is included under src/morphology_service/data/hfst. Rule-based German conjugation/lemmatization/pluralization will use this transducer when source_preference=rule or force_deterministic=true.

To validate that your HFST installation is healthy, run the diagnostics script (uses the project venv):

uv run python scripts/hfst_diagnose.py

At runtime you can confirm the backend selection via /v1/help/languages, which now annotates each language with its deterministic provider.

Run the API:

uv run uvicorn morphology_service.main:app --reload

Tip: using uv run … ensures you pick up the project virtualenv (where HFST is installed) instead of a system Python without bindings.

Example request:

curl "http://localhost:8000/v1/conjugations/en/slept?expand_compound=true&variety=en-GB"

Configuration

Settings are loaded via environment variables using the MORPHOLOGY_ prefix. Key options:

Env var Default Description
MORPHOLOGY_LLM_MODEL_NAME gemini-2.0-flash Model identifier passed to llm.get_model
MORPHOLOGY_LLM_TEMPERATURE 0.0 Temperature for generations
MORPHOLOGY_HYBRID_CONFIDENCE_THRESHOLD 0.8 Confidence cut-off before falling back to deterministic rules
MORPHOLOGY_MAX_LLM_RETRIES 2 Retries for transient LLM failures

Optional .env files are supported.

Tests

uv run pytest

tests/ contains coverage around the hybrid orchestration and rule engine heuristics. Add your own gold lists for target languages as you plug in richer rule datasets.

To exercise the English/German demo prompts against a running API instance, start the server locally and run:

RUN_DEMO_SMOKE=1 uv run pytest tests/test_demo_examples.py

You can point at a remote instance by setting MORPHOLOGY_BASE_URL (defaults to http://localhost:8000/v1).

Extending

  • Swap models by installing the relevant llm-* plugin and updating MORPHOLOGY_LLM_MODEL_NAME
  • Implement richer deterministic engines via the RuleEngine class (e.g., integrate Pattern, Apertium)
  • Add caching or persistence by wrapping calls in your storage layer of choice

Roadmap Ideas

  • JSON Schema validation + structured retries for LLM responses
  • More granular provenance (per conjugated form)
  • Batch endpoints (/v1/batch/*) for higher throughput
  • Sandboxed offline bundle with Ollama or llama.cpp backends

About

a API for word morphology

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors