WizardLangID

WizardLangID is a Python library for language identification based on character n-gram profiles. It first narrows the candidate set using priors and linguistic cues, then estimates a probability for each remaining language. It supports 161 languages and returns either a single top-1 ISO code or a list of (language, probability) pairs ordered by probability.
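
To make the idea of character n-gram profiles concrete, here is a toy sketch in pure Python. It is not WizardLangID's implementation: the function names (`char_ngrams`, `build_profile`, `toy_detect`), the two tiny training sets, the add-one smoothing, and the softmax step are all assumptions chosen for illustration, and the candidate-gating stage is left out entirely.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(samples, n=3):
    """Count n-grams over a few training samples for one language."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n))
    return counts


def score(text, profile, vocab_size, n=3):
    """Log-likelihood of the text under a profile, with add-one smoothing."""
    total = sum(profile.values())
    return sum(
        math.log((profile[g] + 1) / (total + vocab_size))
        for g in char_ngrams(text, n)
    )


def toy_detect(text, profiles, n=3):
    """Return (lang, probability) pairs, most probable first."""
    vocab = set().union(*(p.keys() for p in profiles.values()))
    scores = {lang: score(text, p, len(vocab), n) for lang, p in profiles.items()}
    # A softmax over the log-likelihoods turns scores into probabilities.
    top = max(scores.values())
    exp = {lang: math.exp(s - top) for lang, s in scores.items()}
    norm = sum(exp.values())
    return sorted(((lang, e / norm) for lang, e in exp.items()),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical miniature training data, far too small for real use.
profiles = {
    "en": build_profile(["the quick brown fox", "hello world, how are you today"]),
    "it": build_profile(["ciao, come stai oggi", "buongiorno a tutti, come va"]),
}
print(toy_detect("come stai", profiles))
```

With these two toy profiles, "come stai" ranks `it` first because every one of its trigrams occurs in the Italian samples and none occurs in the English ones. Real profiles are trained on much larger corpora.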


Installation

Requires Python 3.9+.

pip install wizardlangid

Quick start

import wizardlangid as wl

text = "hello world"
lang = wl.lang_detect(text, return_top1=True)
print(lang) 

Language detection

Parameters

  • text: Input string (Unicode).
  • top_k: Number of candidates to return (default 3).
  • profiles_dir: Override the bundled profiles directory.
  • use_mmap: If True, memory-map the profile tries (lower RAM; first access may be slightly slower).
  • return_top1: If True, return only the best language code; otherwise a list of (lang, prob).
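
The parameters above can be wrapped in a small helper that guards against empty input. This is a sketch under assumptions: the README does not say how `lang_detect` treats empty or whitespace-only strings, so returning `None` here is a design choice of this example, and `detect_language` and its `detect` argument are hypothetical names, not part of the library.

```python
def detect_language(text, detect=None, **kwargs):
    """Return a top-1 code, or None for empty or whitespace-only input.

    Extra keyword arguments (for example profiles_dir or use_mmap) are
    forwarded unchanged to the underlying detector.
    """
    if detect is None:
        # Imported lazily so the helper can be exercised with a stub detector.
        import wizardlangid as wl
        detect = wl.lang_detect
    if not text or not text.strip():
        return None
    return detect(text, return_top1=True, **kwargs)
```

Calling `detect_language("Ciao, come stai oggi?", use_mmap=True)` would then delegate to `wl.lang_detect` with `return_top1=True`.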

Examples

Top-1 (single code)

import wizardlangid as wl

text = "Ciao, come stai oggi?"
lang = wl.lang_detect(text, return_top1=True)
print(lang) 

Output

it

Top-k distribution

import wizardlangid as wl

text = "The quick brown fox jumps over the lazy dog."
langs = wl.lang_detect(text, top_k=5, return_top1=False)
print(langs) 

Output

[('en', 0.9999376335362183), ('mg', 4.719212057614953e-05), ('fy', 1.4727973350205069e-05), ('rm', 2.8718519851832537e-07), ('la', 1.5918465665694727e-07)]
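
A top-k list like the one above can be post-processed to accept a result only when it is clearly ahead. The helper below is a sketch, not part of WizardLangID: `pick_confident` and its thresholds (`min_prob`, `min_margin`) are assumptions to tune for your own data.

```python
def pick_confident(candidates, min_prob=0.9, min_margin=0.5):
    """Return the best code if it is clearly ahead of the runner-up, else None."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    best_lang, best_prob = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_prob >= min_prob and best_prob - runner_up >= min_margin:
        return best_lang
    return None


# Sample values copied from the top-k output shown above.
sample = [("en", 0.9999376335362183), ("mg", 4.719212057614953e-05),
          ("fy", 1.4727973350205069e-05)]
print(pick_confident(sample))
```

For the sample above the helper returns `en`; for a near-tie such as `[("fr", 0.55), ("de", 0.45)]` it returns `None`, which a caller can treat as "undetermined".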

Batch example

import wizardlangid as wl

tests = [
    "これは日本語のテスト文です。",
    "Alex parle un peu français, aber nicht so viel.",
    "¿Dónde está la estación de tren?",
]
for s in tests:
    print("TOP1:", wl.lang_detect(s, return_top1=True))

Output

TOP1: ja
TOP1: fr
TOP1: es
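
For larger batches it can be useful to tally the detected codes. The sketch below takes the detector as an argument so it does not depend on any particular library call; `tally_languages` is a hypothetical helper name, not part of WizardLangID.

```python
from collections import Counter


def tally_languages(texts, detect):
    """Count how often each language code is returned for a list of texts."""
    return Counter(detect(s) for s in texts)
```

With the library installed, a call might look like `tally_languages(tests, lambda s: wl.lang_detect(s, return_top1=True))`.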

Custom profiles & mmap

from pathlib import Path
import wizardlangid as wl

langs = wl.lang_detect(
    "Buongiorno a tutti!",
    profiles_dir=Path("/opt/wizardlangid/profiles"),  # custom profiles
    use_mmap=True,                                   # lower RAM
    top_k=3,
)
print(langs)

License

AGPL-3.0-or-later.

Contact & Author

Author: Mattia Rubino
Email: textwizard.dev@gmail.com
