Note: Due to the nature of toxic comments, please consider this project explicit.

Toxic Comment Test

Python script to train a classification TensorFlow model, and a streamlit app to use the model.

Data is from Kaggle's Toxic Comment Classification Challenge:
https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/
Original data:
https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data?select=train.csv.zip
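
As a quick sanity check on the downloaded data, the per-label counts can be tallied with the standard library alone. This is a minimal sketch that assumes the original Kaggle train.csv layout (an id column, a comment_text column, and six 0/1 label columns). The helper names count_labels and count_labels_in_file are hypothetical and not part of this repository, and the cleaned-up file may use a different layout.

```python
import csv
from collections import Counter

# Label columns of the original Kaggle train.csv (an assumption for any other file).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def count_labels(rows):
    """Count how many rows carry each label; rows are dicts as yielded by csv.DictReader."""
    counts = Counter({label: 0 for label in LABELS})
    for row in rows:
        for label in LABELS:
            if row.get(label) == "1":
                counts[label] += 1
    return counts


def count_labels_in_file(path):
    """Open a CSV file and tally its labels."""
    # newline="" lets the csv module handle line breaks inside quoted comments.
    with open(path, newline="", encoding="utf-8") as f:
        return count_labels(csv.DictReader(f))
```

Calling count_labels_in_file("data/train.csv") after placing the file in the data dir would return a Counter keyed by label name.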


Demo running instance: https://huggingface.co/spaces/vluz/Tox


To use the pretrained model, please download toxmodel.keras and vectorizer.pkl from the HuggingFace link:
https://huggingface.co/vluz/toxmodel30/tree/main/model
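
For using the downloaded files outside the streamlit app, the sketch below shows one way they might be loaded. It assumes the files sit in a model directory, that toxmodel.keras is readable by tf.keras.models.load_model, and that vectorizer.pkl is an ordinary pickle. The function name load_artifacts is hypothetical, and the structure of the pickled object is not documented here, so it is returned as-is. Only unpickle files from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_artifacts(model_dir="model"):
    """Load the Keras model and the pickled vectorizer data from model_dir."""
    model_dir = Path(model_dir)
    model_path = model_dir / "toxmodel.keras"
    vectorizer_path = model_dir / "vectorizer.pkl"

    # Fail early with a clear message before the slow TensorFlow import.
    for path in (model_path, vectorizer_path):
        if not path.is_file():
            raise FileNotFoundError(f"Expected file not found: {path}")

    import tensorflow as tf  # imported lazily; must be installed separately

    model = tf.keras.models.load_model(str(model_path))
    with open(vectorizer_path, "rb") as f:
        vectorizer_data = pickle.load(f)  # contents are an unknown; inspect before use
    return model, vectorizer_data
```

How the vectorizer data is applied to text before calling the model depends on how toxtrain.py saved it, so toxtest.py remains the reference for actual inference.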


To download the cleaned-up data, please go here:
https://huggingface.co/datasets/vluz/Tox/blob/main/alt_format/train.csv


Open a command prompt and cd to a new directory of your choosing.

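Create and activate a virtual environment (the commands below are for Windows; on Linux or macOS, activate with source venv/bin/activate):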

python -m venv "venv"
venv\Scripts\activate

To install do:

git clone https://github.com/vluz/ToxTest.git
cd ToxTest
pip install -r requirements.txt

Put train.csv into the data dir, and/or put toxmodel.keras and vectorizer.pkl into the model dir.
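
Before running either script, it can help to confirm the files landed in the right place. The following is a minimal sketch, run from the ToxTest directory. It assumes training needs only data/train.csv and the app needs only the two files in model, which is inferred from the steps here rather than from the scripts themselves. The function name check_setup is hypothetical.

```python
from pathlib import Path


def check_setup(root="."):
    """Report which workflow has its input files in place under root."""
    root = Path(root)
    # Assumed input for toxtrain.py.
    train_ready = (root / "data" / "train.csv").is_file()
    # Assumed inputs for toxtest.py.
    app_ready = all(
        (root / "model" / name).is_file()
        for name in ("toxmodel.keras", "vectorizer.pkl")
    )
    return {"train": train_ready, "app": app_ready}


if __name__ == "__main__":
    for workflow, ready in check_setup().items():
        print(f"{workflow}: {'ready' if ready else 'missing files'}")
```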

To train do:

python toxtrain.py

To test using existing model do:

streamlit run toxtest.py

To exit the virtual environment do:

venv\Scripts\deactivate

The helper script dataclean.py provides text cleaning for the original data.
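
To illustrate the kind of normalisation a cleaning step typically applies to comment text, here is a minimal sketch. It is not the code in dataclean.py, whose exact rules are not described here. The function name clean_text and the specific rules (lowercasing, dropping URLs, replacing punctuation with spaces, collapsing whitespace) are assumptions chosen for illustration.

```python
import re


def clean_text(text):
    """Lowercase, strip URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z0-9\s']", " ", text)  # replace other symbols with spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


print(clean_text("Hello,   WORLD!! see http://example.com now"))
# prints: hello world see now
```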

The helper script renderwordcloud.py renders word clouds for the data as a whole and for the toxic comments alone.
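
A word cloud is essentially a picture of word frequencies, so the same information can be inspected as plain counts. The sketch below uses only the standard library and is not taken from renderwordcloud.py. The function name top_words, its tokenisation rule, and the optional stopword set are assumptions for illustration.

```python
import re
from collections import Counter


def top_words(comments, n=10, stopwords=frozenset()):
    """Return the n most frequent words across comments as (word, count) pairs."""
    counts = Counter()
    for comment in comments:
        # Treat runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)
```

Running it once over all comments and once over only those flagged as toxic gives the two views the word clouds present.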

