This project focuses on analyzing and predicting loan defaults for Bondora, a leading European peer-to-peer (P2P) lending company. By preprocessing the dataset, transforming target variables, and creating a binary classification model, we aim to provide actionable insights into reducing financial risks for lenders and borrowers.
- Clean and preprocess the dataset to handle missing values, duplicates, and outliers.
- Create a binary target variable (`Default` or `Not Default`) based on loan statuses.
- Encode categorical variables and transform the data into a model-ready format.
- Perform exploratory data analysis (EDA) to understand key features influencing loan defaults.
- Build predictive models to assess default risks.
The dataset contains historical loan data from Bondora, including:
- Loan statuses
- Borrower details
- Loan amount and interest rates
- Payment history and more
The data required for preprocessing, cleaning and labeling tasks can be downloaded from the following Google Drive link:
The `Status` column is transformed into a binary variable:
- 1 (Default): Includes statuses like "Charged Off," "Late," and "Defaulted."
- 0 (Not Default): Includes statuses like "Fully Paid" and "Current."
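
The mapping above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the status strings are copied from the examples in this README, and the `add_default_flag` helper, the `STATUS_TO_DEFAULT` dictionary, and the `Default` output column name are hypothetical. The real dataset may use different labels or additional statuses.

```python
import pandas as pd

# Assumption: status labels as listed in this README; adjust to the real data.
STATUS_TO_DEFAULT = {
    "Charged Off": 1,
    "Late": 1,
    "Defaulted": 1,
    "Fully Paid": 0,
    "Current": 0,
}


def add_default_flag(df: pd.DataFrame, status_col: str = "Status") -> pd.DataFrame:
    """Return a copy of df with a binary `Default` column (1 = default, 0 = not)."""
    # Fail loudly on statuses that have no agreed mapping instead of guessing.
    unknown = set(df[status_col].dropna().unique()) - set(STATUS_TO_DEFAULT)
    if unknown:
        raise ValueError(f"Unmapped statuses: {sorted(unknown)}")
    out = df.copy()
    # Nullable integer dtype keeps rows with a missing status as <NA>.
    out["Default"] = out[status_col].map(STATUS_TO_DEFAULT).astype("Int64")
    return out


loans = pd.DataFrame({"Status": ["Current", "Late", "Fully Paid", "Defaulted"]})
labeled = add_default_flag(loans)
print(labeled)
```

Raising on unmapped statuses is a deliberate choice in this sketch: silently treating an unknown status as a non-default would bias the target variable.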
- Handle missing values using appropriate imputation techniques.
- Remove duplicates and standardize column names.
- Convert columns (e.g., date columns) to their appropriate formats.
- Map the `Status` column into binary values:
  - `1` for loan defaults.
  - `0` for non-defaults.
- Apply label encoding for binary categorical columns.
- Use one-hot encoding for multi-category columns.
- Detect outliers in numeric columns using statistical methods.
- Cap extreme values to the 1st and 99th percentiles.
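
The cleaning, encoding, and outlier steps listed above might look something like the sketch below. It rests on several assumptions that are not stated in this README: median imputation for numeric columns, mode imputation for categorical columns, and a hypothetical `preprocess` helper with made-up example columns. Names in `date_cols` and `exclude` refer to the standardized column names.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def preprocess(df: pd.DataFrame, date_cols=(), exclude=()) -> pd.DataFrame:
    """Clean, cap, and encode a loan table (illustrative sketch only)."""
    out = df.copy()
    # Standardize column names, then drop exact duplicate rows.
    out.columns = [str(c).strip().replace(" ", "_") for c in out.columns]
    out = out.drop_duplicates().reset_index(drop=True)

    # Convert date columns; unparseable values become NaT.
    for col in date_cols:
        out[col] = pd.to_datetime(out[col], errors="coerce")

    # Columns in `exclude` (e.g. the target) are left untouched.
    numeric_cols = [c for c in out.select_dtypes(include="number").columns
                    if c not in exclude]
    categorical_cols = [c for c in out.columns
                        if c not in exclude
                        and not is_numeric_dtype(out[c])
                        and not is_datetime64_any_dtype(out[c])]

    # Impute: median for numeric columns, most frequent value for categorical.
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        mode = out[col].mode(dropna=True)
        if not mode.empty:
            out[col] = out[col].fillna(mode.iloc[0])

    # Cap numeric values to the 1st and 99th percentiles.
    for col in numeric_cols:
        low, high = out[col].quantile([0.01, 0.99])
        out[col] = out[col].clip(lower=low, upper=high)

    # Label-encode binary categoricals; one-hot encode the rest.
    multi_cols = []
    for col in categorical_cols:
        if out[col].nunique() <= 2:
            out[col] = out[col].astype("category").cat.codes
        else:
            multi_cols.append(col)
    return pd.get_dummies(out, columns=multi_cols, dtype=int)


# Made-up rows for illustration; the last row duplicates the first.
raw = pd.DataFrame({
    "Loan Amount": [1000.0, 2000.0, None, 2000.0, 500000.0, 1000.0],
    "Gender": ["M", "F", "F", "F", "M", "M"],
    "Country": ["EE", "FI", "ES", None, "EE", "EE"],
    "Loan Date": ["2020-01-05", "2020-02-10", "not a date",
                  "2020-02-10", "2021-03-01", "2020-01-05"],
})
clean = preprocess(raw, date_cols=["Loan_Date"])
print(clean)
```

Capping runs before encoding so that the integer codes and one-hot indicators are never clipped. When the target column is already present, passing its name in `exclude` keeps it out of imputation, capping, and encoding.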
The processed dataset will be used to build classification models (e.g., Logistic Regression, Random Forest, XGBoost) to predict loan defaults. The models will be evaluated on metrics like:
- Accuracy
- Precision
- Recall
- F1 Score
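
As a rough illustration of how these metrics could be computed with scikit-learn, the sketch below trains two of the model types mentioned above on synthetic data. The data from `make_classification` is only a stand-in for the processed Bondora dataset, and the hyperparameters, split ratio, and class balance are arbitrary assumptions. XGBoost is omitted because it needs a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the processed loan data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
# Stratify so both splits keep the same share of defaults.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

When defaults are the minority class, accuracy alone can look good for a model that rarely predicts a default, which is why precision, recall, and F1 are reported alongside it.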
- Clone the repository: `git clone https://github.com/Technocolabs100/Analysin-and-Building-Financial-Risk-System-For-P2P-Lending.git`
- Install the required Python libraries: `pip install -r requirements.txt`
- Programming Language: Python
- Libraries and Tools:
  - Pandas, NumPy (Data Preprocessing)
  - Scikit-learn (Modeling)
  - Matplotlib, Seaborn (Visualization)
  - Jupyter Notebook
  - Power BI and Tableau Dashboards (Visualization)
We welcome contributions! If you'd like to contribute:
- Fork this repository.
- Create a new branch.
- Commit your changes and push to your branch.
- Create a pull request, and we’ll review it.
This project is licensed under the MIT License. See the LICENSE file for details.