# Data Preparation & Machine Learning for eBay Shill Bidding data
<b>by Victor Ferreira Silva<br>January 2023</b>
* [Introduction](#Introduction)
    * Data preparation
    * Data characterisation
    * Exploratory Data Analysis
    * Data cleaning
    * Feature engineering
    * Data scaling
* Dimensionality reduction
    * Principal Component Analysis (PCA)
    * Linear Discriminant Analysis (LDA)
* Machine Learning
    * Clustering algorithms
    * Classification algorithms
* Conclusion
* References

SBD Dataset Web Page

## <a id="Introduction"></a>Introduction ##
The ability to predict normal and abnormal bidding behavior of eBay users can help companies identify scammers and other undesirable users on the platform. The Shill Bidding Dataset (SBD) consists of eBay auction records described by behavioral features, such as auction duration and bidder tendency, together with a class label. The goal of this report is to prepare and characterize the data and then apply supervised and unsupervised machine learning techniques to it. To improve the results, data scaling and dimensionality reduction methods were used, and the applied machine learning methods were compared in terms of performance and accuracy. At the end of the report, the supervised and unsupervised methods that performed best on this dataset are identified.
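The workflow outlined above (scale the data, reduce its dimensionality, then fit and compare supervised and unsupervised models) can be sketched with scikit-learn. This is a minimal illustration on synthetic data from `make_classification`, not the code used later in this report and not the SBD data; the choice of `StandardScaler`, `PCA(n_components=5)`, `LogisticRegression` and `KMeans(n_clusters=2)` is an assumption made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 9 numeric features and an imbalanced binary label
X, y = make_classification(
    n_samples=1000, n_features=9, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Supervised: scale -> PCA -> classifier, scored on held-out data
clf = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
clf_acc = accuracy_score(y_test, clf.predict(X_test))

# Unsupervised: scale -> PCA -> k-means; labels are used only afterwards to score agreement
km = make_pipeline(StandardScaler(), PCA(n_components=5), KMeans(n_clusters=2, n_init=10, random_state=0))
clusters = km.fit_predict(X)
ari = adjusted_rand_score(y, clusters)

print(f"classifier accuracy: {clf_acc:.3f}")
print(f"k-means ARI vs. true labels: {ari:.3f}")
```

Wrapping each model in a pipeline keeps the scaling and PCA steps fitted only on the data each model sees, which makes the comparison between methods more even.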

### Data preparation

### Import / Configuration


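```python
# Silence library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

import pandas as pd

# Automated exploratory-data reports (the package was later renamed ydata-profiling)
import pandas_profiling
from pandas_profiling import ProfileReport
```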

### Data Understanding

```python
df = pd.read_csv('Shill Bidding Dataset.csv')
```
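Because later steps rely on specific column names, it can help to check the loaded frame against the schema that `df.info()` reports for this file. A minimal sketch: the helper name `check_columns` and the choice to raise a `ValueError` on mismatch are illustrative assumptions, and the small in-memory CSV (with made-up values) stands in for the real file.

```python
import io

import pandas as pd

# Column names as listed in the df.info() output for the SBD file
EXPECTED_COLUMNS = [
    "Record_ID", "Auction_ID", "Bidder_ID", "Bidder_Tendency", "Bidding_Ratio",
    "Successive_Outbidding", "Last_Bidding", "Auction_Bids",
    "Starting_Price_Average", "Early_Bidding", "Winning_Ratio",
    "Auction_Duration", "Class",
]


def check_columns(frame: pd.DataFrame) -> None:
    """Hypothetical helper: raise if expected columns are missing or unexpected ones appear."""
    missing = set(EXPECTED_COLUMNS) - set(frame.columns)
    extra = set(frame.columns) - set(EXPECTED_COLUMNS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")


# Tiny in-memory stand-in for 'Shill Bidding Dataset.csv' (values are made up)
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    + "1,732,bidder_a,0.2,0.4,0.0,0.0,0.0,0.99,0.0,0.67,5,0\n"
)
sample = pd.read_csv(sample_csv)
check_columns(sample)  # passes silently when the schema matches
```

Calling `check_columns(df)` right after `pd.read_csv` would fail fast if the file had been exported with different headers.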

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6321 entries, 0 to 6320
Data columns (total 13 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Record_ID               6321 non-null   int64  
 1   Auction_ID              6321 non-null   int64  
 2   Bidder_ID               6321 non-null   object 
 3   Bidder_Tendency         6321 non-null   float64
 4   Bidding_Ratio           6321 non-null   float64
 5   Successive_Outbidding   6321 non-null   float64
 6   Last_Bidding            6321 non-null   float64
 7   Auction_Bids            6321 non-null   float64
 8   Starting_Price_Average  6321 non-null   float64
 9   Early_Bidding           6321 non-null   float64
 10  Winning_Ratio           6321 non-null   float64
 11  Auction_Duration        6321 non-null   int64  
 12  Class                   6321 non-null   int64  
dtypes: float64(8), int64(4), object(1)
memory usage: 642.1+ KB
```
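The summary shows 6,321 rows, 13 columns and no missing values. Three columns (`Record_ID`, `Auction_ID`, `Bidder_ID`) identify records rather than describe bidding behavior, and `Class` is the target. A minimal sketch of separating features from the target and checking class balance follows; treating the three ID columns as non-features is an assumption for this illustration, the helper `split_features_target` is hypothetical, and the toy frame contains made-up values, not SBD records.

```python
import pandas as pd

ID_COLUMNS = ["Record_ID", "Auction_ID", "Bidder_ID"]  # assumed non-predictive identifiers
TARGET = "Class"


def split_features_target(frame: pd.DataFrame):
    """Hypothetical helper: return (X, y) with identifiers and the label removed from X."""
    X = frame.drop(columns=ID_COLUMNS + [TARGET])
    y = frame[TARGET]
    return X, y


# Made-up toy frame with the same column layout as the SBD data
toy = pd.DataFrame({
    "Record_ID": [1, 2, 3, 4],
    "Auction_ID": [10, 10, 11, 11],
    "Bidder_ID": ["a", "b", "c", "d"],
    "Bidder_Tendency": [0.2, 0.0, 0.5, 0.1],
    "Bidding_Ratio": [0.4, 0.2, 0.6, 0.1],
    "Successive_Outbidding": [0.0, 0.0, 1.0, 0.0],
    "Last_Bidding": [0.0, 0.1, 0.9, 0.3],
    "Auction_Bids": [0.0, 0.0, 0.5, 0.0],
    "Starting_Price_Average": [0.99, 0.99, 0.0, 0.0],
    "Early_Bidding": [0.0, 0.1, 0.9, 0.3],
    "Winning_Ratio": [0.67, 0.0, 0.9, 0.0],
    "Auction_Duration": [5, 5, 7, 7],
    "Class": [0, 0, 1, 0],
})

X, y = split_features_target(toy)
print(X.shape)                         # (4, 9)
print(toy.isna().sum().sum())          # 0 (no missing values)
print(y.value_counts(normalize=True))  # share of each class
```

Applied to `df`, the same split would leave nine numeric feature columns (the eight `float64` columns plus `Auction_Duration`), matching the dtypes in the summary above.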
