Machine Learning and Regex Matching based Phishing Detection System with a phishing attack scenario
Developed by Umut Sevdi, İsmet Güngör, Semih Yazıcı and Oğuzhan Ercan
Phishing is a cyber attack involving carefully crafted emails or websites to trick individuals into revealing sensitive information such as login credentials or financial information. These attacks often take the form of fake login pages or emails purporting to be from legitimate organizations, and they can have severe consequences for both individuals and organizations.
In our project, we developed a phishing scenario and a program that protects against it. In the scenario, we hosted an SMTP server and a phishing server for the attacker. The phishing server tricks users into thinking that the website is legitimate.
When the victim clicks the link in the phishing email, a login page that imitates "edevlet.gov.tr" is returned. When the user logs in, however, all credentials are sent to the attacker. The phishing site then responds with a fake dashboard so that the attack goes unnoticed.
To counter such attacks, we aimed to develop a phishing detection system based on machine learning and regex matching. Regex matching identifies patterns and keywords commonly used in phishing emails, while the machine learning model analyzes and classifies the email content. This approach has the potential to be effective against phishing, because it can flag suspicious emails and take action to block them.
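The regex-matching half of this idea can be sketched in a few lines of Python. This is a minimal illustration and not the project's implementation: the real system uses Yara rules that are not reproduced here, and the pattern names, the patterns themselves, and the helper functions `extract_body` and `match_rules` are all assumptions made for the example.

```python
import re
from email import message_from_string

# Hypothetical patterns for illustration only; the project's actual
# Yara rules are not shown in this README.
SUSPICIOUS_PATTERNS = {
    "urgency": re.compile(r"\b(urgent|immediately|within 24 hours)\b", re.IGNORECASE),
    "credential_request": re.compile(
        r"\b(verify|confirm|update)\s+your\s+(account|password|identity)\b",
        re.IGNORECASE,
    ),
    "plain_http_link": re.compile(r"http://[^\s\"'>]+", re.IGNORECASE),
}


def extract_body(raw_message):
    """Return the text/plain content of a raw email message as one string."""
    message = message_from_string(raw_message)
    chunks = []
    # walk() yields the message itself and, for multipart mail, every sub-part.
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def match_rules(body):
    """Return the sorted names of all patterns that occur in the body."""
    return sorted(
        name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(body)
    )
```

Calling `match_rules(extract_body(raw))` on a captured message returns the names of the rules that fired; an empty list means no pattern matched, which by itself does not prove the mail is harmless.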
- On the attacker's side, we developed a web server in Go to host the phishing site. The site serves a web page that looks like edevlet.gov.tr. Unlike the original page, however, it does not encrypt any data in transit, and it sends the submitted data directly to the attacker.
- We used a MailHog server to host an SMTP server. It runs from a docker-compose file as a container for testing purposes.
- To protect the victim against phishing attacks, we implemented a system that listens to network traffic and parses SMTP to extract the mail body. The mail body is first scanned with Yara, using rules written specifically to detect phishing mail. After Yara has checked for possibly malicious keywords, the plain-text body is passed to a Python program, where a machine learning model determines whether the incoming mail is a phishing attempt or benign.
- We used Long Short-Term Memory (LSTM), a type of recurrent neural network (RNN) well-suited for modeling long-term dependencies in time series or sequential data. It can effectively retain information over long periods and handle variable-length input sequences. The attention layer weighs the input sequence, and the classifier predicts based on the weighted input. The model also has methods for generating initial hidden states for the LSTM layer, encoding input text using the embedding layer and LSTM layer, and applying attention to the output of the LSTM layer. In addition, because the attention layer sits between the LSTM and the linear classifier, we can identify which words contribute most to a phishing prediction.
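How an attention layer can point at influential words can be illustrated with a small, dependency-free sketch. It is an assumption-laden toy and not the project's model: the hidden states would normally come from the LSTM, the `query` vector stands in for a learned parameter, and dot-product scoring is only one possible scoring function (the text does not say which one the project uses). The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by its dot-product similarity to `query`.

    Returns the per-token attention weights and the weighted sum of the
    hidden states, which a classifier would take as its input.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    context = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(len(query))
    ]
    return weights, context


def top_tokens(tokens, weights, k=3):
    """Return the k tokens with the largest attention weights."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return [token for token, _ in ranked[:k]]
```

Given one hidden-state vector per token, `attention_pool` yields one weight per token, and `top_tokens` lists the tokens the weights emphasize most. Inspecting those weights is the kind of signal that lets a model report which words pushed a mail toward the phishing class.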
The text that arrived over TCP and was converted to a string was not in a format that could be fed into our LSTM model. For this reason, we performed the text preprocessing steps frequently used in natural language processing tasks. The utils_preprocess_text function cleans and preprocesses text by removing punctuation, lower-casing, removing stop words, and optionally applying stemming or lemmatization. The textCleaner function applies the utils_preprocess_text function to a column of a pandas DataFrame and stores the processed text in a new column.
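The preprocessing steps described above can be sketched as follows. This is a simplified stand-in and not the project's code: `preprocess_text` and `clean_column` are hypothetical names, the stop-word list is a tiny illustrative sample, stemming and lemmatization are left out, and plain dictionaries are used in place of a pandas DataFrame so the example has no dependencies.

```python
import string

# Tiny illustrative stop-word list; a real pipeline would likely use a
# fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "is", "your", "you", "in", "at", "for"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_text(text, stop_words=STOP_WORDS):
    """Lower-case the text, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    tokens = [token for token in cleaned.split() if token not in stop_words]
    return " ".join(tokens)


def clean_column(rows, column, new_column="text_clean"):
    """Apply preprocess_text to one field of every row.

    Each row is a dict; the result is stored under `new_column` and the
    original field is left untouched, mirroring the "new column" idea.
    """
    return [{**row, new_column: preprocess_text(row[column])} for row in rows]
```

For example, `preprocess_text("Please VERIFY your account, now!")` returns `"please verify account now"`. With a DataFrame, the same function could be applied to a column and the result assigned to a new column.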
Requirements:
- Clone the repository.
git clone https://github.com/umutsevdi/pds.git
- Run the mail server.
cd victim
docker-compose up
- Compile and execute the Phishing detection programs.
cd victim
cd mail-detect
python mail_detect.py &
cd ..
go build smtp_phishing_detection
sudo smtp_phishing_detection/smtp_phishing_detection &
- Execute the attacker programs from an external device or locally.
cd attacker/phishing_server/cmd
go run . &
- Now you can send phishing emails using our mail script.
cd attacker/
python mail_sender.py
Distributed under the MIT License. See LICENSE for more information.
You can contact any of the project's developers with suggestions or questions.
Project: umutsevdi/pds