
Phishing Detection System

Machine Learning and Regex Matching based Phishing Detection System with a phishing attack scenario
Developed by Umut Sevdi, İsmet Güngör, Semih Yazıcı and Oğuzhan Ercan


Table of Contents

Project Definition
System Architecture
Hardware Requirements
Installation
License
Contact

1. Project Definition

Phishing is a cyber attack involving carefully crafted emails or websites to trick individuals into revealing sensitive information such as login credentials or financial information. These attacks often take the form of fake login pages or emails purporting to be from legitimate organizations, and they can have severe consequences for both individuals and organizations.

In our project, we developed a phishing scenario and a program that protects against it. In the scenario, we hosted an SMTP server and, for the attacker, a phishing server. The phishing server tricks users into thinking that the website is legitimate.

When the victim clicks the link in the email, the server returns a login page that imitates "edevlet.gov.tr". When the user logs in, however, all credentials are sent to the attacker. The phishing site then responds with a fake dashboard so that the theft goes unnoticed.

To counter such attacks, we developed a phishing detection system based on machine learning and regex matching. Together, the two techniques let the system analyze and classify email content and identify patterns and keywords commonly used in phishing attacks. This approach has the potential to detect phishing attacks effectively, because it can flag suspicious emails as they arrive so that they can be blocked.

2. System Architecture

Attacker

On the attacker's side, we developed a web server in Go to host the phishing site. The site serves a web page that looks like edevlet.gov.tr. Unlike the original page, however, it does not encrypt any data in transit, and it sends the submitted data directly to the attacker.

Victim

We used MailHog as the SMTP server. It runs as a container from a docker-compose file for testing purposes.
To protect the victim against phishing attacks, we implemented a system that listens to the ongoing traffic and parses SMTP to extract the mail body. The mail body is first scanned with Yara, using rules written specifically to detect phishing mail. After Yara has checked for possibly malicious keywords, the plain-text body is passed to a Python program, where a machine learning model determines whether the incoming mail is a phishing attack or harmless.
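The two-stage check described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's code: it uses the standard-library `email` parser instead of packet-level SMTP parsing, a few hypothetical regular expressions in place of the Yara rules, and a placeholder `classify` callable in place of the trained model. The names `SUSPICIOUS_PATTERNS`, `extract_body`, `match_rules`, and `inspect_mail` are invented for illustration, and the sketch simply runs both stages on every mail.

```python
import re
from email import message_from_string
from typing import Callable, Dict, List

# Hypothetical patterns standing in for the project's Yara rules.
SUSPICIOUS_PATTERNS = [
    re.compile(r"verify your account", re.IGNORECASE),
    re.compile(r"password", re.IGNORECASE),
    re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}"),  # link to a raw IP address
]


def extract_body(raw_mail: str) -> str:
    """Return the plain-text body of a raw mail message."""
    msg = message_from_string(raw_mail)
    if msg.is_multipart():
        # Collect every text/plain part of a multipart message.
        parts = [
            part.get_payload()
            for part in msg.walk()
            if part.get_content_type() == "text/plain"
        ]
        return "\n".join(parts)
    return msg.get_payload()


def match_rules(body: str) -> List[str]:
    """Return the source of every pattern that matches the body."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(body)]


def inspect_mail(raw_mail: str, classify: Callable[[str], bool]) -> Dict[str, object]:
    """Run the rule stage, then hand the body to the classifier stage."""
    body = extract_body(raw_mail)
    return {"rule_hits": match_rules(body), "ml_phishing": classify(body)}


RAW = (
    "From: attacker@example.com\n"
    "To: victim@example.com\n"
    "Subject: Action required\n"
    "\n"
    "Please verify your account at http://192.168.1.10/login\n"
)

# The lambda is a stand-in for the real model's prediction function.
result = inspect_mail(RAW, classify=lambda text: "http://" in text)
print(result)
```

Keeping the rule stage and the classifier stage as separate functions mirrors the description above, where Yara and the Python model are separate programs.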

For the machine learning model, we used Long Short-Term Memory (LSTM), a type of recurrent neural network (RNN) well suited to modeling long-term dependencies in time series or sequential data. It can retain information over long spans and handle variable-length input sequences. The attention layer weighs the input sequences, and the classifier predicts based on the weighted input. The model also has methods for generating initial hidden states for the LSTM layer, encoding input text using the embedding layer and LSTM layer, and applying attention to the output of the LSTM layer. Because the attention layer sits between the LSTM and the linear classifier, we can also detect which words contribute most to a phishing verdict.

The text that arrives over TCP and is converted to a string is not in a format that can be fed into our LSTM model. For this reason, we perform the text preprocessing steps commonly used in natural language processing tasks. The utils_preprocess_text function cleans and preprocesses text by removing punctuation, lower-casing, removing stop words, and optionally applying stemming or lemmatization. The textCleaner function applies utils_preprocess_text to a column of a pandas DataFrame and stores the processed text in a new column.
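The preprocessing steps can be illustrated with a standard-library sketch. It reuses the name `utils_preprocess_text` from the description, but the signature, the tiny `STOP_WORDS` set, and the optional `stemmer` parameter are assumptions made for this example; the project itself relies on nltk, whose stop-word list is much larger.

```python
import re
from typing import Callable, Iterable, Optional

# Tiny illustrative stop-word set (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "to", "is", "your", "and", "of", "in"}


def utils_preprocess_text(
    text: str,
    stop_words: Optional[Iterable[str]] = None,
    stemmer: Optional[Callable[[str], str]] = None,
) -> str:
    """Lower-case, strip punctuation, drop stop words, optionally stem."""
    # Lower-case and remove everything that is not a word character or space.
    cleaned = re.sub(r"[^\w\s]", "", text.lower().strip())
    tokens = cleaned.split()
    if stop_words is not None:
        blocked = set(stop_words)
        tokens = [t for t in tokens if t not in blocked]
    if stemmer is not None:
        # Any callable mapping a word to its stem or lemma can be passed here.
        tokens = [stemmer(t) for t in tokens]
    return " ".join(tokens)


sample = "Please VERIFY your account, click the link!"
print(utils_preprocess_text(sample, STOP_WORDS))
```

With nltk installed, `nltk.stem.PorterStemmer().stem` could be passed as `stemmer`. On a pandas DataFrame, the same function could be applied column-wise, for example `df["text_clean"] = df["text"].apply(utils_preprocess_text)`, where both column names are hypothetical.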

3. Installation

Requirements:

Yara v4
nltk
NumPy
pandas
Docker and docker-compose
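
Before continuing, it may help to confirm that the requirements are present. The following is a small sketch, not part of the repository: it assumes the Python packages are importable as `nltk`, `numpy`, and `pandas`, and that the command-line tools are on the PATH as `yara`, `docker`, `docker-compose`, and `go` (Go is not listed above but is needed by the build commands below). It does not check versions.

```python
import importlib.util
import shutil
from typing import Dict

# Assumed import names and executable names (see the note above).
PYTHON_MODULES = ["nltk", "numpy", "pandas"]
COMMANDS = ["yara", "docker", "docker-compose", "go"]


def check_requirements() -> Dict[str, bool]:
    """Map each requirement to True if it was found on this machine."""
    status = {}
    for name in PYTHON_MODULES:
        # find_spec returns None when a top-level module is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    for name in COMMANDS:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


status = check_requirements()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```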

Clone the repository.

    git clone https://github.com/umutsevdi/pds.git

Run the mail server.

    cd victim
    docker-compose up

Compile and run the phishing detection programs.

    cd victim 
    cd mail-detect 
    python mail_detect.py &
    cd ..
    go build smtp_phishing_detection
    sudo smtp_phishing_detection/smtp_phishing_detection &

Execute the attacker programs from an external device or locally.

    cd attacker/phishing_server/cmd
    go run . &

Now you can send phishing emails using our mail script.

    cd attacker/
    python mail_sender.py

5. License

Distributed under the MIT License. See LICENSE for more information.

6. Contact

You can contact any of the project's developers with suggestions or questions.

Project: umutsevdi/pds

