jimburton/phish

Phishing project

A project in three parts:

  • A tool that uses ML to detect emails that may contain phishing scams.
  • A webservice that provides an endpoint for the ML tool.
  • A proof-of-concept service that monitors a Gmail inbox, sending new messages to the webservice and recording the results.

Phishing detector

See /phish_detector. This is a sequential neural network written in Python using the TensorFlow library and trained on the dataset at https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset.

The raw emails are first converted to matrices of Term Frequency-Inverse Document Frequency (TF-IDF) features, which measure how significant individual words are within an individual email and across the corpus. The term frequency measures how often a term appears in an email. The inverse document frequency measures how rare or common a term is across the entire corpus.
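
To make the two factors concrete, here is a minimal sketch of one common TF-IDF formulation, using only the standard library. It is not the project's own feature extraction: the tokenisation (lowercase, split on whitespace), the unsmoothed logarithm and the function name `tf_idf` are assumptions made for illustration, and library implementations typically add smoothing and normalisation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    # Assumed tokenisation: lowercase, split on whitespace.
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


emails = [
    "verify your account now",
    "your meeting notes",
    "your invoice is attached",
]
scores = tf_idf(emails)
```

In this toy corpus "your" occurs in every email, so its inverse document frequency is log(1) = 0 and it carries no weight, while "verify" occurs in only one email and receives a positive weight there.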

The array of TF-IDF features is fed into a neural network with this structure:

Model: "sequential"
+---------------------+-----------------+-----------------+
| Layer (type)        | Output Shape    |         Param # |
+---------------------+-----------------+-----------------+
| dense (Dense)       | (None, 128)     |         640,128 |
+---------------------+-----------------+-----------------+
| dropout (Dropout)   | (None, 128)     |               0 |
+---------------------+-----------------+-----------------+
| dense_1 (Dense)     | (None, 64)      |           8,256 |
+---------------------+-----------------+-----------------+
| dropout_1 (Dropout) | (None, 64)      |               0 |
+---------------------+-----------------+-----------------+
| dense_2 (Dense)     | (None, 1)       |              65 |
+---------------------+-----------------+-----------------+
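
The parameter counts in the summary can be checked by hand. The sketch below assumes every Dense layer has one bias per unit (the Keras default); under that assumption the first layer's count implies an input of 5,000 TF-IDF features per email. The helper `dense_params` is a name introduced here for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 640,128 = (n_features + 1) * 128, so the first layer sees 5,000 inputs.
n_features = 640_128 // 128 - 1

layer_params = [
    dense_params(n_features, 128),  # dense
    dense_params(128, 64),          # dense_1
    dense_params(64, 1),            # dense_2
]
total_params = sum(layer_params)
```

The Dropout layers contribute no parameters, which matches the zeros in the table.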

Test Loss: 0.0241, Test Accuracy: 0.9964

TODO

  • Evaluate for overfitting on other corpora.
  • Add subject and URL features.

Webservice

See /webservice. This is a Flask app that provides a single endpoint, /check, which accepts POST data with a field named email and returns 1 (phishing scam) or 0 (ham).

Run the app:

$ flask --app webservice run --debug

Send a request:

$ curl -X POST http://127.0.0.1:5000/check -d "email=value"

$ curl -X POST http://127.0.0.1:5000/check --data-urlencode "email@data/emails/good0.txt"
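
The same request can be made from Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the response body is short plain text such as "1" or "0", which the description above suggests but does not spell out.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

CHECK_URL = "http://127.0.0.1:5000/check"  # address used in the curl examples


def build_check_request(email_text, url=CHECK_URL):
    """Build a form-encoded POST request with a single `email` field."""
    body = urlencode({"email": email_text}).encode("ascii")
    return Request(url, data=body, method="POST")


def check_email(email_text, url=CHECK_URL):
    """Send the email to the webservice and return the raw response body.

    Assumption: the body is short plain text such as "1" or "0".
    """
    with urlopen(build_check_request(email_text, url)) as response:
        return response.read().decode("utf-8").strip()
```

URL-encoding the field matters because email text often contains characters such as & and =, which would otherwise be read as form delimiters.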

Mail service

See /gmail_app/app/gmail.py. The script runs as a daemon, using the Gmail API to poll a Gmail account periodically. The first time the script runs, it launches a browser asking the user to allow access to the account. As the app is in testing, only listed users can run it.

New messages are sent to the webservice and the results are logged.
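
The poll, classify and log cycle can be sketched as below. This is an illustration of the flow only, not the code in gmail.py: the function names, the (message_id, body) tuple shape, the logger name and the 60-second interval are all assumptions, and the Gmail and webservice calls are passed in as plain callables.

```python
import logging
import time


def poll_once(fetch_new_messages, check_email, log):
    """Classify every new message once and log the result for each."""
    results = []
    for message_id, body in fetch_new_messages():
        verdict = check_email(body)
        log.info("message=%s verdict=%s", message_id, verdict)
        results.append((message_id, verdict))
    return results


def run_forever(fetch_new_messages, check_email, interval_seconds=60):
    """Poll at a fixed interval; the 60 s default is an assumption."""
    log = logging.getLogger("gmail_app")  # hypothetical logger name
    while True:
        poll_once(fetch_new_messages, check_email, log)
        time.sleep(interval_seconds)
```

Keeping the mail and webservice calls as parameters means a single polling pass can be exercised with stand-in functions, without touching a real inbox.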

Start the service:

$ python gmail_app/app/gmail.py &

Watch the log:

$ tail -f gmail_app/logs/access.log

By default, the access and error logs are written to gmail_app/logs/. To change this location, the polling frequency of the Gmail API, the location of the webservice and a few other settings, edit the file gmail_app.ini.
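
As an illustration of how settings like these are usually read from an INI file, here is a sketch using Python's `configparser`. The section and key names below are hypothetical; the real gmail_app.ini may use different ones, so check the file itself before editing.

```python
import configparser

# Hypothetical contents; the real gmail_app.ini may use other names.
EXAMPLE_INI = """
[gmail_app]
log_dir = gmail_app/logs
poll_interval_seconds = 60
webservice_url = http://127.0.0.1:5000/check
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)  # config.read(path) loads from a file instead

settings = config["gmail_app"]
log_dir = settings.get("log_dir", fallback="gmail_app/logs")
poll_interval = settings.getint("poll_interval_seconds", fallback=60)
webservice_url = settings.get("webservice_url")
```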
