Phishing project

A project in three parts:

A tool that uses ML to detect emails that may contain phishing scams.
A webservice that provides an endpoint for the ML tool.
A proof-of-concept service that monitors a Gmail inbox, sending new messages to the webservice and recording the results.

Phishing detector

See /phish_detector. This is a sequential neural net written in Python using the TensorFlow library, trained on the dataset at https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset.

The raw emails are first converted to a matrix of Term Frequency-Inverse Document Frequency (TF-IDF) features, which measure how significant each word is to a given email relative to the rest of the corpus. The term frequency measures how often a term appears in an email; the inverse document frequency measures how rare a term is across the entire corpus, so words that occur in almost every email receive little weight.
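As a rough illustration of the calculation, the sketch below computes TF-IDF weights by hand, using tf = count / email length and idf = log(N / df), where N is the number of emails and df is the number of emails containing the term. Tokenisation is a naive whitespace split. This is an assumption for illustration only: the project's actual vectoriser is not shown here, and library implementations such as scikit-learn's TfidfVectorizer use a smoothed idf and normalise each row by default.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed, unnormalised)."""
    # Naive tokenisation: lower-case and split on whitespace.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))

    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf (share of this email's tokens) times idf (rarity across the corpus)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


emails = [
    "verify your account now",
    "meeting notes attached",
    "verify the meeting time",
]
for weights in tf_idf(emails):
    print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

In the first email, "account" (present in one email) outweighs "verify" (present in two). A word that appears in every email gets an idf, and therefore a weight, of zero.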

The TF-IDF feature vectors are fed into a neural network with this structure:

Model: "sequential"
+---------------------+-----------------+-----------------+
| Layer (type)        | Output Shape    |         Param # |
+---------------------+-----------------+-----------------+
| dense (Dense)       | (None, 128)     |         640,128 |
+---------------------+-----------------+-----------------+
| dropout (Dropout)   | (None, 128)     |               0 |
+---------------------+-----------------+-----------------+
| dense_1 (Dense)     | (None, 64)      |           8,256 |
+---------------------+-----------------+-----------------+
| dropout_1 (Dropout) | (None, 64)      |               0 |
+---------------------+-----------------+-----------------+
| dense_2 (Dense)     | (None, 1)       |              65 |
+---------------------+-----------------+-----------------+

Test Loss: 0.0241, Test Accuracy: 0.9964
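The parameter counts in the summary follow from the layer sizes: a Dense layer has one weight per input-output pair plus one bias per unit, and Dropout layers have none. Working backwards from the first layer's 640,128 parameters suggests the TF-IDF vectors have 5,000 features; that figure is inferred from the table, not stated elsewhere in this README. The helper names below are hypothetical.

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out


def infer_input_dim(first_layer_params, units):
    # invert params = (n_in + 1) * units
    n_in_plus_bias, remainder = divmod(first_layer_params, units)
    if remainder:
        raise ValueError("parameter count is not a multiple of the unit count")
    return n_in_plus_bias - 1


n_features = infer_input_dim(640_128, 128)
layers = [(n_features, 128), (128, 64), (64, 1)]  # Dropout layers add 0 parameters
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(n_features, counts, sum(counts))  # 5000 [640128, 8256, 65] 648449
```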

TODO

Evaluate for overfitting on other corpora.
Add subject and URL features.

Webservice

See /webservice. This is a Flask app that provides a single endpoint, /check, which accepts POST data with a field named email and returns 1 (phishing scam) or 0 (ham).

Run the app:

$ flask --app webservice run --debug

Sending a request:

$ curl -X POST http://127.0.0.1:5000/check -d "email=value"

$ curl -X POST http://127.0.0.1:5000/check --data-urlencode "email@data/emails/good0.txt"
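The same request can be made from Python with only the standard library. This sketch assumes the service is running locally on Flask's default port and that the response body is the bare digit 1 or 0; the README does not say whether the body is plain text or JSON, so adjust the parsing if needed. The function names are hypothetical. Form-encoding the body with urlencode stops characters such as & and = inside the email from being read as field separators.

```python
import urllib.parse
import urllib.request

CHECK_URL = "http://127.0.0.1:5000/check"  # local Flask dev server (assumed)


def build_check_request(email_text, url=CHECK_URL):
    # Encode as application/x-www-form-urlencoded with a single "email" field.
    body = urllib.parse.urlencode({"email": email_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def is_phishing(email_text, url=CHECK_URL, timeout=10):
    # Assumes the body is "1" (phishing) or "0" (ham), possibly with whitespace.
    with urllib.request.urlopen(build_check_request(email_text, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip() == "1"
```

For example, passing the contents of data/emails/good0.txt to is_phishing returns True only if the service labels that message as phishing.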

Mail service

See /gmail_app/app/gmail.py. The script runs as a daemon, using the Gmail API to poll a Gmail account periodically. The first time it runs, it opens a browser asking the user to authorise access to the account. Because the app is still in testing, only users listed as test users can run it.

New messages are sent to the webservice and the results are logged.
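The poll-and-classify cycle can be sketched independently of the Gmail API. In this sketch fetch_new and classify are hypothetical callables: the first would wrap the Gmail API call that lists recent messages, and the second would call the /check endpoint. Remembering processed message IDs in an in-memory set is also an assumption; the real script may track state differently, for example with Gmail history IDs or labels.

```python
import logging
import time


def poll_once(fetch_new, classify, seen, log):
    """Classify every message not seen before; return how many were processed."""
    processed = 0
    for msg_id, body in fetch_new():
        if msg_id in seen:
            continue  # already classified on an earlier poll
        verdict = classify(body)
        seen.add(msg_id)  # mark as seen only after a successful classification
        log.info("message %s phishing=%s", msg_id, verdict)
        processed += 1
    return processed


def run_forever(fetch_new, classify, interval_s, log):
    seen = set()
    while True:
        try:
            poll_once(fetch_new, classify, seen, log)
        except Exception:
            # Keep the daemon alive if one poll fails, e.g. on a network error.
            log.exception("poll failed")
        time.sleep(interval_s)
```

Adding an ID to the seen set only after classify succeeds means a message that hits a transient error is retried on the next poll rather than silently skipped.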

Start the service:

$ python gmail_app/app/gmail.py &

Watch the log:

$ tail -f gmail_app/logs/access.log

By default, access and error logs are written to gmail_app/logs/. The log location, the frequency of calls to the Gmail API, the webservice URL and a few other settings can be changed by editing the file gmail_app.ini.
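Since gmail_app.ini is an INI file, Python's configparser should be able to read it. The section and key names below, and the 60-second default poll interval, are hypothetical placeholders; check the real file for the names the script expects. Only the gmail_app/logs/ default comes from this README.

```python
import configparser

# Hypothetical layout; the real gmail_app.ini may use different sections and keys.
EXAMPLE_INI = """
[logging]
log_dir = gmail_app/logs

[gmail]
poll_interval_seconds = 120

[webservice]
url = http://127.0.0.1:5000/check
"""


def load_settings(text):
    config = configparser.ConfigParser()
    config.read_string(text)
    # fallback= supplies a default when a section or key is missing
    return {
        "log_dir": config.get("logging", "log_dir", fallback="gmail_app/logs"),
        "poll_interval": config.getint("gmail", "poll_interval_seconds", fallback=60),
        "webservice_url": config.get(
            "webservice", "url", fallback="http://127.0.0.1:5000/check"
        ),
    }


print(load_settings(EXAMPLE_INI))
```

To read the actual file rather than a string, call config.read with its path instead of read_string.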

jimburton/phish
