# Transliteration

### 1. What?

Transliteration converts text from one **script** to another by replacing letters so that the words sound the same. Unlike translation, the language and the meaning stay unchanged; only the writing system changes.

### 2. Example

**"mera naam ankur hai"** =====> **"मेरा नाम अंकुर है"**

### 3. How is it useful?

1. Helps readers pronounce names from other languages, since names are transliterated rather than translated

2. Lets users search for Indic-language content while typing on an English keyboard

3. Helps learners pick up the sounds of a foreign language by reading them in their native script

## Libraries

1. [Indic Transliteration](https://github.com/indic-transliteration/indic_transliteration_py)
2. [AI4Bharat Transliteration](https://github.com/AI4Bharat/IndianNLP-Transliteration)

## 1. Indic Transliteration

In [1]:
from indic_transliteration import sanscript
from indic_transliteration.sanscript import transliterate, SCHEMES

The package supports the following scripts:

- Bengali
- Devanagari
- Gujarati
- Kannada
- Malayalam
- Telugu
- Tamil
- Oriya
- Gurmukhi (Punjabi/Panjabi)

and the following romanizations:

- HK = 'hk'
- IAST = 'iast'
- ITRANS = 'itrans'
- OPTITRANS = 'optitrans'
- KOLKATA = 'kolkata'
- SLP1 = 'slp1'
- VELTHUIS = 'velthuis'
- WX = 'wx'

Reference: https://indic-transliteration.github.io/indic_transliteration_py/build/html/indic_transliteration_sanscript.html

In [4]:
transliterate('merA nAma ankura hai', sanscript.ITRANS, sanscript.DEVANAGARI)

'मेरा नाम अन्कुर है'

In [5]:
%%timeit
transliterate('merA nAma ankur hai', sanscript.ITRANS, sanscript.DEVANAGARI)

45.9 µs ± 4.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


### Pros

1. Fast conversion: about 46 µs per sentence in the timing above
2. Supports romanized (English-keyboard) input to 9 Indic scripts and vice versa
3. Supports Indic-to-Indic conversion
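
As a rough illustration of why Indic-to-Indic conversion suits a rule-based approach: the Unicode blocks for these scripts share a parallel layout inherited from ISCII, so a naive converter can shift each code point by the distance between block starts. The sketch below is not how `indic_transliteration` works internally; `shift_script` and `BLOCK_START` are hypothetical names, and the approach ignores real-world gaps (not every code point is assigned in every script, and some punctuation is shared across scripts).

```python
# Start of each script's Unicode block; every block spans 0x80 code points.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def shift_script(text, source, target):
    """Naively map characters from one Indic block to the same offset in another."""
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            # Same offset inside the target block
            out.append(chr(cp - src + dst))
        else:
            # Spaces, digits, Latin letters, etc. pass through unchanged
            out.append(ch)
    return "".join(out)


print(shift_script("नाम", "devanagari", "gujarati"))  # નામ
```

For real text, a per-script mapping table such as the one the library ships is the safer choice, because the offset trick silently produces unassigned code points for letters that the target script lacks.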

### Cons

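1. Rule-based: every character is mapped by fixed rules, with no model of how people actually spell words
2. Does not handle free-form, casually typed input. The input must follow the rules of the chosen romanization scheme (for example `merA nAma` rather than `mera naam` in ITRANS)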
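
To make the second point concrete, here is a toy rule-based transliterator covering only the handful of letters in the example sentence. It is a sketch under simplifying assumptions, not the library's implementation: `toy_transliterate` and its tables are hypothetical, and the rules are a small ITRANS-like subset (a consonant carries an inherent `a`, a vowel after a consonant becomes a vowel sign, and two consonants in a row are joined with a virama).

```python
CONSONANTS = {"m": "म", "n": "न", "r": "र", "k": "क", "h": "ह"}
# Each vowel maps to (independent form, sign used after a consonant)
VOWELS = {
    "a": ("अ", ""),   # inherent vowel: no sign needed after a consonant
    "A": ("आ", "ा"),
    "e": ("ए", "े"),
    "u": ("उ", "ु"),
    "ai": ("ऐ", "ै"),
}
VIRAMA = "्"  # suppresses the inherent vowel of the preceding consonant


def toy_transliterate(text):
    out = []
    i = 0
    pending_consonant = False  # True while a consonant still awaits its vowel
    while i < len(text):
        for size in (2, 1):  # longest match first, so "ai" wins over "a"
            chunk = text[i:i + size]
            if len(chunk) < size:
                continue
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                out.append(sign if pending_consonant else independent)
                pending_consonant = False
                break
            if chunk in CONSONANTS:
                if pending_consonant:
                    out.append(VIRAMA)  # consonant cluster
                out.append(CONSONANTS[chunk])
                pending_consonant = True
                break
        else:
            # Unknown character (e.g. a space): close any open consonant, copy as-is
            if pending_consonant:
                out.append(VIRAMA)
            out.append(text[i])
            pending_consonant = False
            size = 1
        i += size
    if pending_consonant:
        out.append(VIRAMA)
    return "".join(out)


print(toy_transliterate("merA nAma ankura hai"))  # strict input
print(toy_transliterate("mera"))                  # casual input loses the long vowel
```

With strict input the toy reproduces the library output shown earlier, including the `न्क` cluster in place of the anusvara. With the casual spelling `mera`, the final `a` is read as the inherent vowel, so the result is `मेर` instead of `मेरा`.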

## 2. AI4Bharat Transliteration

In [7]:
from ai4bharat.transliteration import XlitEngine

The package supports the following languages:

- Bengali - বাংলা
- Gujarati - ગુજરાતી
- Hindi - हिंदी
- Kannada - ಕನ್ನಡ
- Konkani Goan - कोंकणी
- Maithili - मैथिली
- Malayalam - മലയാളം
- Marathi - मराठी
- Panjabi Eastern - ਪੰਜਾਬੀ
- Sindhi - سنڌي‎
- Sinhala - සිංහල
- Telugu - తెలుగు
- Tamil - தமிழ்
- Urdu - اُردُو

In [8]:
e = XlitEngine("hi")
out = e.translit_word("mera naam ankur hai", topk=5, beam_width=10)

Loading hi...


In [9]:
out

{'hi': ['मेरा नाम अंकुर है']}

In [10]:
%%timeit
out = e.translit_word("mera naam ankur hai", topk=5, beam_width=10)

993 ms ± 17.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Pros

1. Handles free-form, casually typed romanized input (e.g. `mera naam`)
2. Supports English to Indic transliteration for the 14 languages listed above
3. Provides an HTTP API for web applications

### Cons

1. Much slower than the rule-based library: about 1 s per sentence in the timing above, versus about 46 µs
2. Does not support Indic-to-English or Indic-to-Indic conversion
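
One common way to soften the speed cost is to cache results per word, so that repeated words are only run through the slow engine once. The sketch below assumes each word can be transliterated independently of its neighbours. `slow_translit_word` is a stand-in for any slow word-level function and is not the AI4Bharat API; `make_cached_sentence_transliterator` is likewise a hypothetical helper.

```python
from functools import lru_cache

calls = []  # records which words actually reach the slow function


def slow_translit_word(word):
    """Stand-in for an expensive model call; just upper-cases the word."""
    calls.append(word)
    return word.upper()


def make_cached_sentence_transliterator(translit_word_fn, maxsize=10_000):
    """Wrap a word-level function with an LRU cache and apply it word by word."""
    cached_word = lru_cache(maxsize=maxsize)(translit_word_fn)

    def translit_sentence(sentence):
        return " ".join(cached_word(word) for word in sentence.split())

    return translit_sentence


fast = make_cached_sentence_transliterator(slow_translit_word)
print(fast("mera naam mera"))  # MERA NAAM MERA
print(len(calls))              # 2: the second "mera" is served from the cache
```

Caching helps most when the same words recur often, such as in search boxes or chat input; it does nothing for the first occurrence of a word.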