Kaggle Competition: https://www.kaggle.com/competitions/nlp-getting-started
Comparative analysis of the performance of BERT, DistilBERT, ULMFiT, and a logistic regression baseline. The models rely on different libraries (PyTorch, Hugging Face, fastai, and scikit-learn).
To tackle the classification task, a new data-augmentation pipeline was implemented. First, a PEGASUS sequence-to-sequence model fine-tuned on a paraphrasing task generates n paraphrases of the input text. Because a pre-trained, fine-tuned large language model rephrases the whole sentence, the generated data stays syntactically and semantically consistent, unlike word-level augmentation methods. The n generated sentences are then ranked by their similarity to the original sentence using cosine similarity, i.e. the cosine of the angle between two vectors in a high-dimensional space: cos(u, v) = (u · v) / (‖u‖ ‖v‖), where a value closer to 1 means more similar sentences. To compute it, we used the spaCy library and its “en_core_web_sm” pre-trained model to embed the sentences and measure the cosine similarity between them.
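Below is a minimal sketch of this augmentation pipeline. The README does not name the exact PEGASUS checkpoint, so the sketch assumes the publicly available paraphrase-fine-tuned checkpoint `tuner007/pegasus_paraphrase` from the Hugging Face Hub; the generation parameters (`max_length`, `num_beams`) are illustrative choices, not the project's actual settings.

```python
# Sketch of the augmentation pipeline: PEGASUS paraphrasing + spaCy cosine-similarity ranking.
# Assumes: pip install transformers torch spacy && python -m spacy download en_core_web_sm
import spacy
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Assumed checkpoint; the README only says "PEGASUS fine-tuned on a paraphrasing task".
MODEL_NAME = "tuner007/pegasus_paraphrase"
tokenizer = PegasusTokenizer.from_pretrained(MODEL_NAME)
model = PegasusForConditionalGeneration.from_pretrained(MODEL_NAME)
nlp = spacy.load("en_core_web_sm")  # used to embed sentences for cosine similarity


def paraphrase(text: str, n: int = 5) -> list[str]:
    """Generate n paraphrases of `text` with beam search."""
    batch = tokenizer([text], truncation=True, padding="longest", return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **batch,
            max_length=60,
            num_beams=max(10, n),   # num_beams must be >= num_return_sequences
            num_return_sequences=n,
        )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def rank_by_similarity(original: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate paraphrases by cosine similarity to the original sentence.
    spaCy's Doc.similarity() is the cosine between the averaged sentence vectors."""
    orig_doc = nlp(original)
    scored = [(c, orig_doc.similarity(nlp(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sentence = "Forest fire near La Ronge Sask. Canada"  # example tweet from the competition data
    for text, score in rank_by_similarity(sentence, paraphrase(sentence, n=5)):
        print(f"{score:.3f}  {text}")
```

Note that “en_core_web_sm” ships without static word vectors, so spaCy computes similarity from the model's context-sensitive token tensors (and emits a warning); swapping in “en_core_web_md” or “en_core_web_lg” would give true word-vector cosine similarity with the same code.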