Anomaly Detection use case
The goal is to develop a classifier that can distinguish fraudulent websites from legitimate ones.
- dga.ipynb : Data preparation; the same steps are also available in the script dga-prep.py
- dga-tfidf.ipynb : Extracts features with TF-IDF and trains a support vector machine (SVM) classifier
- cleanup.sh : Does some rough cleanup
- dga-prep.py : The contents of dga.ipynb extracted into a script
I trained the classifier on the data after some feature extraction and feature engineering, and obtained some results. Future steps might include trying better feature extraction (e.g. word2vec) and models that may outperform the SVM.
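
To make the TF-IDF feature extraction step more concrete, here is a minimal from-scratch sketch over character bigrams of domain names. It is an illustration only and is not the code from dga-tfidf.ipynb: the function names `char_ngrams` and `tfidf_vectors`, the choice of bigrams, the unsmoothed `log(N / df)` weighting, and the sample domains are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Return one {n-gram: TF-IDF weight} dict per document."""
    tokenized = [char_ngrams(d, n) for d in docs]

    # Document frequency: in how many documents each n-gram appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for short strings
        vectors.append({
            gram: (count / total) * math.log(n_docs / df[gram])
            for gram, count in counts.items()
        })
    return vectors


# Hypothetical sample domains, not taken from the project's dataset.
domains = ["google", "goggle", "xkqzvtw"]
vectors = tfidf_vectors(domains)
```

With this weighting, a bigram that occurs in only one domain (such as "oo" in "google") receives a higher weight than one shared with other domains (such as "go"). Vectors of this kind could then be passed to a classifier such as an SVM.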