whichlang

whichlang is a Python library that identifies which Indian language a given text is written in.

Installation

Use the package manager pip to install whichlang.

pip install whichlang

Usage

from whichlang import whichlang as wl

# Read the sample as UTF-8 so Indic scripts decode correctly on any platform
with open('sample-test-files\\sample-hindi.txt', 'r', encoding='utf-8') as f:
    data = f.read()

# Returns a tuple of the 3 most probable languages, most probable first
print(wl.which_lang(data))
# Output: ('Hindi', 'Marathi', 'Punjabi'), so Hindi is the most probable

# To train a language model:
# assamese.txt is the training data
# Assamese is the name of the language model to create
python train_lang_models.py -f train-data\as\assamese.txt -l Assamese
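
Since which_lang returns the candidates ordered from most to least probable, the first element is the best guess. A minimal sketch of a file-level helper follows; detect_file is a hypothetical name that is not part of whichlang, and the detector is passed in as an argument so the snippet does not assume anything about the library beyond the call shown above.

```python
from pathlib import Path


def detect_file(path, detector):
    """Read a UTF-8 text file and return (best_guess, all_candidates).

    `detector` is any callable that takes a string and returns a sequence
    of language names ordered from most to least probable.
    """
    text = Path(path).read_text(encoding="utf-8")
    candidates = detector(text)
    return candidates[0], candidates
```

With whichlang installed, this would be called as detect_file('sample-test-files/sample-hindi.txt', wl.which_lang).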

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Available Languages

Hindi, Telugu, Tamil, Kannada, Malayalam, Punjabi, Marathi, Gujarati, Oriya, Assamese.

Acknowledgements

We would like to thank the Leipzig Corpora Collection, from which we collected the data used to train the models: Dirk Goldhahn, Thomas Eckart and Uwe Quasthoff (2012): Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012.

whichlang is based on N-gram-based text categorization: Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. 1994. The same approach is used in the langdetect library. We found it quite effective and wanted to explore it for Indian languages, which have been less explored. In whichlang, we train and optimize models for these languages and make them readily available.
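
To illustrate the idea behind the cited paper, here is a simplified sketch of rank-order N-gram categorization: each language is summarized by its most frequent character N-grams, and a text is assigned to the language whose ranked profile is closest under the "out-of-place" distance. This is not whichlang's actual implementation; the function names, the N-gram range (1 to 3), the profile size and the tiny training sentences are all assumptions chosen for the example.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    for token in text.split():
        padded = f"_{token}_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so results are stable
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile, max_penalty=300):
    """Sum of rank differences; n-grams absent from lang_profile cost max_penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def rank_languages(text, profiles, top=3):
    """Return up to `top` language names, closest profile first."""
    doc = ngram_profile(text)
    ordered = sorted(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
    return tuple(ordered[:top])


# Toy training data (illustrative only; real models need far larger corpora)
profiles = {
    "Hindi": ngram_profile("यह एक छोटा हिंदी वाक्य है और यह केवल उदाहरण है"),
    "Tamil": ngram_profile("இது ஒரு சிறிய தமிழ் வாக்கியம் மற்றும் இது ஒரு எடுத்துக்காட்டு"),
}

print(rank_languages("यह हिंदी है", profiles))
```

Languages that share a script, such as Hindi and Marathi, overlap heavily in their N-gram profiles, so telling them apart in practice depends on much larger training data than this toy example uses.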
