Tamizh

Machine and deep learning with Tamil, one of the oldest living languages.

The aim is to build ML-specific applications and enhancements for the Tamil language.

W2V_Model

Code Usage

tawiki_convert_bz2_to_xml.ipynb

Extracts the bz2-compressed article dump to an XML file. It was created mainly to avoid large transfers to the cloud: upload the compressed file and extract it in the cloud. Download the article dump from WikiMedia Tamil Data.
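A minimal sketch of the extraction step, using only the Python standard library. The function name `decompress_bz2` and the file names in the usage note are assumptions for illustration; the notebook may do this differently.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src, dst, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file to dst without loading it all into memory."""
    src, dst = Path(src), Path(dst)
    with bz2.open(src, "rb") as compressed, open(dst, "wb") as extracted:
        # copyfileobj reads and writes in chunks, so large dumps stay cheap on RAM.
        shutil.copyfileobj(compressed, extracted, length=chunk_size)
    return dst
```

Usage, assuming the dump was uploaded under a hypothetical name: `decompress_bz2("tawiki-articles.xml.bz2", "tawiki-articles.xml")`.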

tawiki_data_extraction_cleaning.ipynb

Extracts each article's "Title" and cleaned "Content" from the XML tree and exports them as tabular data for easy use. The cleaning relies on many regex rules.
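The extraction and cleaning could look roughly like the sketch below. It assumes the standard MediaWiki export layout (`page`, `title`, `text` elements); the regex rules are a small illustrative subset, not the notebook's actual rules, and all function names are hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace prefix, e.g. '{ns}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def clean_wikitext(text):
    """Apply a few illustrative regex rules to strip wiki markup."""
    text = re.sub(r"<ref\b[^>]*?/>", " ", text)                    # self-closing refs
    text = re.sub(r"<ref\b[^>]*>.*?</ref>", " ", text, flags=re.S)  # paired refs
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)                    # non-nested templates
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # keep link labels
    text = re.sub(r"'{2,}", "", text)                              # bold / italic quotes
    text = re.sub(r"={2,}", " ", text)                             # heading markers
    return re.sub(r"\s+", " ", text).strip()


def iter_articles(xml_source):
    """Yield (title, cleaned content) per <page>; accepts a path or binary file object."""
    title, text = None, None
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, clean_wikitext(text or "")
            title, text = None, None
            elem.clear()  # free memory held by the finished page


def write_csv(rows, fileobj):
    """Export rows as a two-column table."""
    writer = csv.writer(fileobj)
    writer.writerow(["Title", "Content"])
    writer.writerows(rows)


# Tiny made-up sample in the export format; real dumps escape markup the same way.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>தமிழ்</title>
    <revision>
      <text>'''தமிழ்''' ஒரு [[திராவிட மொழிகள்|திராவிட மொழி]].{{Infobox}}&lt;ref&gt;source&lt;/ref&gt;</text>
    </revision>
  </page>
</mediawiki>"""

rows = list(iter_articles(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Streaming with `iterparse` keeps memory use flat, which matters because the full dump is far larger than this sample.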

Tamizh_Word2Vec.ipynb

Builds word embeddings for Tamil words using the gensim library. The embeddings provide similarity scores between words.

The model's training parameters can be adjusted to extend its usage.

Phonetic_Translation

The idea and character mappings are copied from this repo: https://github.com/wickkiey/open-tamil/

This approach converts Tamil to English and vice versa (phonetic translation).
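To make the idea concrete, here is a minimal sketch of the Tamil-to-English direction only. The mapping tables are deliberately tiny and hypothetical; they are not the tables from open-tamil, and the romanization choices are assumptions.

```python
# Hypothetical, partial mapping tables for illustration only.
VOWELS = {"அ": "a", "ஆ": "aa", "இ": "i", "ஈ": "ii", "உ": "u", "ஊ": "uu",
          "எ": "e", "ஏ": "ee", "ஐ": "ai"}
CONSONANTS = {"க": "k", "ச": "s", "ட": "t", "த": "th", "ப": "p", "ம": "m",
              "ய": "y", "ர": "r", "ல": "l", "வ": "v", "ழ": "zh", "ன": "n"}
VOWEL_SIGNS = {"ா": "aa", "ி": "i", "ீ": "ii", "ு": "u", "ூ": "uu",
               "ெ": "e", "ே": "ee", "ை": "ai"}
PULLI = "்"  # marks a consonant with no vowel


def tamil_to_latin(text):
    """Transliterate Tamil text to Latin letters using the tables above."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            if nxt == PULLI:            # pure consonant
                out.append(CONSONANTS[ch])
                i += 2
            elif nxt in VOWEL_SIGNS:    # consonant + dependent vowel sign
                out.append(CONSONANTS[ch] + VOWEL_SIGNS[nxt])
                i += 2
            else:                       # bare consonant carries inherent 'a'
                out.append(CONSONANTS[ch] + "a")
                i += 1
        elif ch in VOWELS:              # independent vowel
            out.append(VOWELS[ch])
            i += 1
        else:                           # unknown characters pass through
            out.append(ch)
            i += 1
    return "".join(out)
```

With these tables, `tamil_to_latin("தமிழ்")` gives `"thamizh"`. The reverse direction is harder because several Latin spellings are ambiguous, so it is left out of this sketch.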

The data folder contains a pickle file that can be used to continue the work.

Note: Work in progress

Word Mappings

The idea is to create mappings between words, covering:

root word
synonyms
antonyms
related words
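One possible shape for such a mapping is sketched below. The `WordEntry` class, the `lookup` helper, and the sample entries are all hypothetical and have not been linguistically vetted; the repository does not specify a format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordEntry:
    """Relations recorded for a single word."""
    root: str
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


# Illustrative entries only.
MAPPINGS: Dict[str, WordEntry] = {
    "வீடுகள்": WordEntry(root="வீடு", related=["கதவு", "சுவர்"]),
    "பெரிய": WordEntry(root="பெரிய", antonyms=["சிறிய"]),
}


def lookup(word, relation, mappings=MAPPINGS):
    """Return the requested relation as a list; empty if the word is unknown."""
    entry = mappings.get(word)
    if entry is None:
        return []
    value = getattr(entry, relation)
    return [value] if isinstance(value, str) else list(value)
```

For example, `lookup("வீடுகள்", "root")` returns `["வீடு"]`, and an unknown word returns an empty list for any relation.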





wickkiey/Tamizh
