New Knowledge's Semantic Email Classifier

Assumptions in data labeling:

419 emails are labeled as spam.
Enron emails are labeled as not spam.
All emails in the JPL data abuse dataset are treated as foe, with three exceptions that were dropped: FalsePositive (for self-explanatory reasons), Recon (because the category is not yet fully understood), and Unknown.
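
The labeling rules above can be sketched as a small function. This is only an illustration: the source identifiers ("419", "enron", "jpl_abuse") and the label strings are hypothetical stand-ins, not names taken from the training scripts.

```python
from typing import Optional

# JPL categories that are dropped rather than labeled (per the list above).
DROPPED_JPL_CATEGORIES = {"FalsePositive", "Recon", "Unknown"}


def assign_label(source: str, category: Optional[str] = None) -> Optional[str]:
    """Return the training label for an email, or None if it is dropped.

    `source` and the returned label strings are hypothetical names used
    only for this sketch.
    """
    if source == "419":
        return "spam"
    if source == "enron":
        return "not_spam"
    if source == "jpl_abuse":
        # Every JPL category counts as foe except the dropped ones.
        if category in DROPPED_JPL_CATEGORIES:
            return None
        return "foe"
    raise ValueError(f"unknown source: {source!r}")
```

With this sketch, assign_label("jpl_abuse", "Recon") returns None, so a caller would filter out None results before training.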

Built on top of SIMON, the New Knowledge character-level convolutional neural network text classification system.

Switching Binary and Multiclass Classifiers

The multiclass multilabel model is enabled by default. To swap it for the binary model, modify config.ini as specified in deployed_checkpoints/checkpoint_descriptions.txt

If you are using the dockerized version of the code, be sure to rebuild the Docker images after making this edit.
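
Before editing config.ini, it can help to see which sections and keys it currently defines. The following is a minimal sketch using Python's standard configparser module. It assumes config.ini is a conventional INI file with section headers, and it deliberately names no specific keys, since those are documented in deployed_checkpoints/checkpoint_descriptions.txt rather than here. The function name is hypothetical.

```python
import configparser


def config_sections(ini_text):
    """Parse INI text and return {section: {key: value}} for inspection."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # dict(parser[section]) also includes any values from a [DEFAULT] section.
    return {section: dict(parser[section]) for section in parser.sections()}
```

One way to use it is to pass the file contents, for example config_sections(open("config.ini").read()), and print the result before and after the edit to confirm the change took effect.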

gRPC Dockerized Classifier

The gRPC interface consists of the following components:

*) grapevine.proto in protos/, which generates grapevine_pb2.py and grapevine_pb2_grpc.py according to the instructions in protos/README.md. These have to be regenerated every time grapevine.proto is changed.
*) spam_clf_server.py, which is the main gRPC server, serving on port 50052 (configurable via config.ini).
*) spam_clf_client.py, which is an example script demonstrating how the main gRPC server can be accessed to classify emails.

To build the corresponding Docker image: sudo docker build -t nk-email-classifier:latest .

To run the Docker image: sudo docker run -it -p 50052:50052 nk-email-classifier:latest

Finally, edit spam_clf_client.py to include the example email you want to classify, then run it with python3 spam_clf_client.py
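
If the client fails to connect, one quick check is whether the published port accepts TCP connections at all. The sketch below uses only the standard library; the function name, the localhost default, and the timeout are assumptions for illustration. It confirms only that something is listening on the port, not that the gRPC service itself is healthy.

```python
import socket


def port_is_open(host="localhost", port=50052, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The same check applies to the REST server by passing port=5000.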

REST Dockerized Classifier

Comment out the gRPC server command at the bottom of the Dockerfile (gRPC is the default serving protocol), and uncomment the REST server command.

To build the Docker image: sudo docker build -t nk-email-classifier:latest .

To run the Docker image: sudo docker run -it -p 5000:5000 nk-email-classifier:latest

Finally, edit clientRestScriptExample.py to fetch the JSONL email of interest, then run it with python3 clientRestScriptExample.py
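
JSONL stores one JSON object per line, so the email of interest can be picked out by parsing the file line by line. The following is a minimal sketch of that parsing step only. The function name is hypothetical, and the structure of each email record and the way clientRestScriptExample.py submits it are not shown here.

```python
import json


def read_jsonl_records(lines):
    """Parse an iterable of JSONL lines into a list of objects, skipping blanks."""
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line is malformed instead of failing opaquely.
            raise ValueError(f"line {number} is not valid JSON: {err}") from err
    return records
```

An open file object can be passed directly, since iterating over it yields one line at a time.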

Batch classification capabilities will be added next, for both serving protocols.

