---
title:  Project Card
subtitle: Credit Risk Assessment
version: v0.1
card version: v0.1
author: Tathagata Talukdar
date: 24-Oct-2024
objective: >
    The purpose of Project Cards is twofold. During development, a card helps the developer think about the problem in a structured way with respect to framing the problem, assessing the business value and viability, and many other aspects.

    It can also serve as a document giving a high-level overview of the system developed and deployed. With proper versioning, one can also see the evolution of the problem. It is meant to be a high-level document; as details emerge, documents such as Model Cards and Data Cards can be linked.
tag: >
    This notebook uses tags to render the output. Each cell has a tag. There are three tags: objective, instruction, response.
    The cell with the objective tag explains the purpose of this project card. Cells with the instruction tag are the key sections of the document that must be filled in. The cell immediately following each of them has the response tag. Fill in only the cells with the response tag. DO NOT MODIFY the cells with the instruction tag while completing a card. The template itself can be adapted to your needs, but once the format is agreed upon, stick to it.
format:
    html:
        code-fold: true
---

# Business View

## Background

Financial institutions face significant challenges in assessing credit risk for loan applications. Traditional credit scoring methods often miss nuanced patterns and can be biased against certain demographic groups. This leads both to missed opportunities when good borrowers are denied and to increased risk when bad loans are approved. In today's competitive banking environment, institutions need more sophisticated ways to evaluate credit applications while maintaining regulatory compliance and fairness.

## Problem

Banks need to accurately assess the creditworthiness of loan applicants to minimize default risk while maximizing approved loans to qualified borrowers. Currently, credit officers spend significant time manually reviewing applications and may make inconsistent decisions based on limited information. This results in both lost revenue from good applications being rejected and losses from bad loans being approved.

## Customer

Primary customers are:

- Credit risk officers at medium to large retail banks
- Loan application processing teams
- Banking compliance officers responsible for fair lending practices
- Financial institutions serving diverse demographic populations

## Value Proposition

- Reduce loan default rates through more accurate risk assessment
- Increase loan approval rates for qualified borrowers
- Reduce application processing time
- Improve consistency and fairness in lending decisions
- Enable regulatory compliance through transparent decision-making

## Product

Credit officers will receive a streamlined workflow in which they:

1. Input standard loan application data into a familiar interface
2. Receive an immediate risk assessment score with key contributing factors
3. View detailed explanations of risk factors in an intuitive dashboard
4. Override recommendations with documented justification when needed
5. Generate standardized reports for audit and compliance purposes

## Objectives

1. Deploy MVP risk assessment system to pilot branch by Q4 2024
2. Achieve 90% user adoption among credit officers within 6 months of launch
3. Demonstrate statistical fairness across demographic groups within one quarter of launch
4. Obtain regulatory approval by mid-2025

## Risks & Challenges

1. Data Privacy & Security
    - Handling sensitive financial and personal information
    - Ensuring compliance with data protection regulations
    - Securing data transfer between systems

2. User Adoption
    - Resistance from experienced credit officers
    - Training requirements for new system
    - Integration with existing workflows

3. Regulatory Compliance
    - Meeting fair lending requirements
    - Providing required transparency in decisions
    - Maintaining audit trails

# ML View

## Task

This is a binary classification problem: predict credit risk from the provided dataset (dataset_id_96.csv). Based on the code and data samples, the target is 'y' (True/False), which indicates whether a loan application represents a good or bad credit risk.

## Metrics

- Primary: F1 score (currently tracked in the evaluate_model function)

- Supporting metrics:
  - Accuracy
  - Precision
  - Recall
  - Target drift score (monitored via Evidently)
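
As an illustration of how the classification metrics above relate to one another, here is a minimal, dependency-free sketch. It is not the project's evaluate_model function; the name `classification_metrics` and the choice of True as the positive class are assumptions made for this example.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating True as the positive class.

    Hypothetical helper for illustration only; the real pipeline may rely on
    a metrics library instead.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

F1 is the harmonic mean of precision and recall, so it drops sharply when either one is low. That is one reason it is a reasonable primary metric when both wrongly rejected and wrongly approved applications are costly.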

## Evaluation

Based on the provided pipeline code, evaluation happens through:

1. Train/test split with configurable test size
2. Target drift detection between training and test sets using Evidently
3. Model performance tracking via MLflow
4. Production monitoring through FastAPI logging
5. Regular retraining evaluation
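
The first two steps above can be sketched without any external library. This is a simplified stand-in, not the Evidently check used in the pipeline: it compares the positive-class rate of the target between the two splits. The function names, the default `test_size` of 0.2, and the seed are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(
    n_rows: int, test_size: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) from a seeded random shuffle."""
    if n_rows < 2:
        raise ValueError("need at least two rows to split")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Keep at least one row on each side of the split.
    n_test = min(n_rows - 1, max(1, round(n_rows * test_size)))
    return indices[n_test:], indices[:n_test]


def positive_rate(labels: Sequence[bool]) -> float:
    """Share of True labels in a non-empty sequence."""
    if len(labels) == 0:
        raise ValueError("labels must not be empty")
    return sum(1 for value in labels if value) / len(labels)


def target_rate_gap(y_train: Sequence[bool], y_test: Sequence[bool]) -> float:
    """Absolute difference in positive-class rate between train and test.

    A crude proxy for target drift (assumption: a dedicated drift tool
    would apply a proper statistical test instead).
    """
    return abs(positive_rate(y_train) - positive_rate(y_test))
```

A large gap between the two rates suggests the split is not representative of the target distribution. In that case a stratified split or a larger test set may be worth considering before trusting the reported metrics.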

## Data

- Primary dataset: dataset_id_96.csv

- Features include:
  - Credit history attributes (checking_status, credit_history, etc.)
  - Personal information (age, employment, etc.)
  - Loan details (duration, credit_amount, etc.)
  - Generated features (X_1 through X_10)

- Data Pipeline:
  - Raw data ingestion via Kedro
  - Data quality checks
  - Preprocessing including scaling and encoding
  - Feature engineering
  - Train/test splitting
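
The scaling and encoding step can be illustrated with a small, dependency-free sketch. It is not the Kedro node from the pipeline. All function names are hypothetical, and the example assumes that scaling means standardization and encoding means one-hot encoding, which the source does not specify.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def fit_scaler(train_values: Sequence[float]) -> Tuple[float, float]:
    """Learn mean and population standard deviation from training data."""
    return mean(train_values), pstdev(train_values)


def apply_scaler(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Standardize values with previously fitted statistics."""
    if sigma == 0:
        # A constant training column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


def fit_categories(train_values: Sequence[str]) -> List[str]:
    """Collect the sorted set of categories seen in training data."""
    return sorted(set(train_values))


def one_hot(
    values: Sequence[str], categories: Sequence[str], prefix: str
) -> List[Dict[str, int]]:
    """One-hot encode values; categories unseen in training become all zeros."""
    return [
        {f"{prefix}={category}": int(value == category) for category in categories}
        for value in values
    ]
```

Fitting the statistics and category list on the training split only, then applying them to the test split, avoids leaking test-set information into preprocessing. Numeric columns such as duration and categorical columns such as checking_status would be candidates for these two transformations.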

## Plan/Roadmap

1. Phase 1 - Initial Development
    - Enhance current data pipeline
    - Implement additional data quality checks
    - Develop model monitoring dashboard
    - Complete initial model training

2. Phase 2 - Pilot
    - Deploy to pilot branch
    - Gather user feedback
    - Refine model based on real usage
    - Implement A/B testing framework

3. Phase 3 - Scale
    - Roll out to additional branches
    - Implement automated retraining pipeline
    - Enhance monitoring and alerting
    - Develop fallback procedures

## Continuous Improvement

1. Automated monitoring via:
    - MLflow experiment tracking
    - FastAPI request logging
    - Evidently drift detection

2. Regular Updates:
    - Model retraining based on drift detection
    - Feature importance analysis
    - Performance metric tracking
    - User feedback incorporation

## Resources

### Human Resources

1. Data Science Team:
    - 1 ML Engineer (model development)
    - 1 Data Engineer (pipeline maintenance)
    - 1 MLOps Engineer (deployment/monitoring)

2. Product Team:
    - 1 Product Manager
    - 1 UX Designer
    - 1 Full-stack Developer

3. Domain Experts:
    - 1 Credit Risk Officer
    - 1 Compliance Officer

### Compute Resources

1. Development:
    - Kedro pipeline execution environment
    - MLflow tracking server
    - Development databases

2. Production:
    - FastAPI server for model serving
    - Model artifact storage
    - Monitoring infrastructure
    - Load balancing for high availability
    - Backup and disaster recovery systems

3. Storage:
    - Secure data warehouse for sensitive information
    - Model registry
    - Log storage
    - Backup storage