sumukha

Project sumukha: the official repository for building domain-adapted embeddings for robust NLP tasks.

Motivation

Generic embeddings for text representation may not be accurate in every domain of interest. This project leverages knowledge from the domain to build embeddings adapted to it.

Description

sumukha applies standard text preprocessing techniques, trains embeddings with fastText models, and uses the scikit-learn decomposition module for further processing.
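
The exact preprocessing steps sumukha applies are not listed here, so the following is only a minimal sketch of what "standard preprocessing" on review-style text often looks like. The function name `preprocess` and the chosen steps (HTML unescaping, tag removal, lowercasing, punctuation stripping, whitespace collapsing) are assumptions for illustration, not the project's actual code.

```python
import html
import re

# Assumed cleaning rules; sumukha's real rules may differ.
_TAG_RE = re.compile(r"<[^>]+>")            # HTML tags such as <br />
_NON_WORD_RE = re.compile(r"[^a-z0-9\s]")   # anything except letters, digits, whitespace


def preprocess(text: str) -> str:
    """Return a lowercased, tag-free, punctuation-free version of text."""
    text = html.unescape(text)          # turn entities like &amp; into characters
    text = _TAG_RE.sub(" ", text)       # replace markup with a space
    text = text.lower()
    text = _NON_WORD_RE.sub(" ", text)  # replace punctuation and symbols with a space
    return " ".join(text.split())       # collapse runs of whitespace


if __name__ == "__main__":
    print(preprocess("A <br />GREAT movie, really!"))
```

Cleaned text of this kind is what an embedding trainer would then consume; which steps are worth keeping depends on the domain (for example, punctuation can carry meaning in source code or chemical names).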

How to Use

Run pip install . from the repository root to install the package and all its requirements.
Run sumukha -r ./ preprocess --input dataset_path --output preprocess_path
Run sumukha -r ./ train --input preprocess_path --output trained_results_path
Run sumukha -r ./ encode --input preprocess_path --gen_emb_path general_embeddings_path --dom_emb_path domain_embeddings_path
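
The three sumukha steps above can also be chained from a script. The sketch below only assembles the same command lines shown above and, optionally, runs them in order. The helper names `build_commands` and `run_pipeline` are hypothetical, and every path argument is a placeholder to be replaced with real locations.

```python
import subprocess
from typing import List


def build_commands(
    dataset_path: str,
    preprocess_path: str,
    trained_results_path: str,
    general_embeddings_path: str,
    domain_embeddings_path: str,
    root: str = "./",
) -> List[List[str]]:
    """Build the preprocess, train, and encode invocations in order."""
    base = ["sumukha", "-r", root]
    return [
        base + ["preprocess", "--input", dataset_path, "--output", preprocess_path],
        base + ["train", "--input", preprocess_path, "--output", trained_results_path],
        base + [
            "encode",
            "--input", preprocess_path,
            "--gen_emb_path", general_embeddings_path,
            "--dom_emb_path", domain_embeddings_path,
        ],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in sequence; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_commands("dataset_path", "preprocess_path", "trained_results_path", "general_embeddings_path", "domain_embeddings_path"))` would execute the steps in the order listed above, assuming the `sumukha` command is installed and on the PATH.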
