Python-based utility that uses supervised machine learning to detect phishing domains from the Certificate Transparency log network.

StreamingPhish

StreamingPhish is a utility that uses supervised machine learning to detect phishing domains from the Certificate Transparency log network. The firehose of domain names and SSL certificates is made available by the certstream network (certstream.calidog.io). All of the data required to train the initial predictive model is included in this project.
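
As a rough illustration of what consuming that firehose can look like, the sketch below pulls domain names out of a single certstream message. It assumes the message layout published by the certstream project (a `message_type` field, with names listed under `data.leaf_cert.all_domains`). The helper names `extract_domains` and `stream_domains` are hypothetical and are not taken from this project's code.

```python
def extract_domains(message):
    """Return the de-duplicated domain names carried by one certstream message.

    Assumption: certificate updates have message_type == "certificate_update"
    and list their names under data -> leaf_cert -> all_domains. Any other
    message type (for example a heartbeat) yields an empty list.
    """
    if message.get("message_type") != "certificate_update":
        return []
    leaf_cert = message.get("data", {}).get("leaf_cert", {})
    domains = leaf_cert.get("all_domains") or []

    cleaned = []
    for raw in domains:
        name = raw.lower()
        # Wildcard entries such as "*.example.com" are reduced to the base name.
        if name.startswith("*."):
            name = name[2:]
        if name not in cleaned:
            cleaned.append(name)
    return cleaned


def stream_domains(url="wss://certstream.calidog.io/"):
    """Print domains from the live stream (needs the third-party certstream package)."""
    import certstream  # imported lazily so extract_domains has no dependencies

    def on_message(message, context):
        for domain in extract_domains(message):
            print(domain)

    certstream.listen_for_events(on_message, url=url)
```

Keeping the parsing in a plain function, separate from the network callback, makes it possible to exercise it against recorded messages without a live connection.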

Also included is a Jupyter notebook to help explain each step of the supervised machine learning lifecycle (as it pertains to this project).
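
One early step in such a lifecycle is turning a raw domain name into numeric features. The sketch below shows one plausible way to do that. The feature set and the keyword list are assumptions chosen for illustration, not the features this project or its notebook actually uses.

```python
import math
from collections import Counter

# Hypothetical keyword list for illustration only; not the project's list.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "paypal")


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(domain):
    """Map a domain name to a dictionary of simple numeric features."""
    domain = domain.lower().strip(".")
    labels = domain.split(".")
    return {
        "length": len(domain),
        "num_labels": len(labels),
        "num_hyphens": domain.count("-"),
        "num_digits": sum(ch.isdigit() for ch in domain),
        "entropy": shannon_entropy(domain),
        # Number of distinct keywords that appear anywhere in the name.
        "keyword_hits": sum(kw in domain for kw in SUSPICIOUS_KEYWORDS),
    }
```

A dictionary per domain is convenient because scikit-learn's `DictVectorizer` can convert a list of such dictionaries into a feature matrix.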

Overview

[Figure: StreamingPhish system diagram]

This application consists of three main components:

  • Jupyter notebook
    • Demonstrates how to train a phishing classifier from start to finish.
  • CLI utility
    • Trains classifiers and evaluates domains in manual mode or against the Certificate Transparency log network (via certstream).
  • Database
    • Stores trained classifiers, performance metrics, and code for feature extraction.
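
To make the "train a classifier, then evaluate domains" flow concrete, here is a minimal scikit-learn sketch. It swaps in character n-gram counts with logistic regression as a stand-in technique; this is not claimed to be the model or feature pipeline the project ships. The training examples are toy, hand-written values, not drawn from the project's `training_data`, and the function names are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples for illustration only (1 = phishing, 0 = benign).
TOY_SAMPLES = [
    ("paypal-login-secure.com", 1),
    ("secure-account-verify.net", 1),
    ("apple-id-login-verify.com", 1),
    ("account-update-paypal.info", 1),
    ("wikipedia.org", 0),
    ("github.com", 0),
    ("python.org", 0),
    ("example.com", 0),
]


def train_classifier(samples):
    """Fit a character n-gram logistic regression model on (domain, label) pairs."""
    domains = [domain for domain, _ in samples]
    labels = [label for _, label in samples]
    model = make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(2, 3)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(domains, labels)
    return model


def score_domain(model, domain):
    """Return the model's estimated probability that a domain is phishing."""
    phishing_column = list(model.classes_).index(1)
    return float(model.predict_proba([domain])[0][phishing_column])
```

In a manual mode, `score_domain(model, "some-domain.example")` would be called on names supplied by the user; in a streaming mode the same call would be applied to each name arriving from certstream.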

Each component runs in its own Docker container. The application is designed to be built and operated via Docker Compose.

Install and Operational Instructions

Components

  • Docker - Containers that run the application.
  • Docker Compose - Tool for orchestrating the containers and their respective services.
  • Python3 - Programming language.
  • Scikit-learn - Open source library for training classifiers using Python.

Author

  • Wes Connell

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for further details.

Resources/Acknowledgments
