KPT: Kannada Pre-trained Transformer (Python; updated Jul 7, 2024)
Finite-state script normalization and processing utilities
OCR Tamil detects and recognizes Tamil text in images, including natural scenes, with high accuracy
Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM"
Resources and tools for Indian language Natural Language Processing
Indic evals for quantised models AWQ / GPTQ / EXL2
Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for the various NLP tasks that an application developer might need
A pipeline for transliteration, spell correction, POS tagging, and word sense disambiguation of Hinglish code-mixed data into Hindi in Devanagari script.
English to Marathi thesaurus.
Python program to transliterate Malayalam text into ISO Latin script equivalents.
Code for the ACL 2020 Paper on Schwa Deletion in Hindi and Punjabi
Showcase fork: open-source transliteration models for Indian languages (Roman script to native scripts)
Parse tokens from Bangla text
Establish semantic relatedness across languages. Documentation: http://kshitijkarthick.github.io/tvecs
Handy Web App for performing OCR for Indian languages, a.k.a Indic Vision UI
Repository containing an experimentation platform for training wav2vec2 models and running inference with them.
Generate large textual corpora for almost any language by crawling the web
Created a multilingual training corpus across 15 Indian languages (including English) by compiling different sources
Transliteration for Indic languages