# Phase 1: Understanding the data (LendingClub)

## Objective
Load the LendingClub file and get a first understanding of:
- the size of the dataset
- the available columns
- the loan statuses (`loan_status`)

No transformation is applied at this stage.


In [1]:
import pandas as pd


In [2]:
df = pd.read_csv("/Users/master/Downloads/credit-climate-risk-lab/data/raw/lendingclub.csv", low_memory=False)
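Because the file is large, it can help to peek at a few columns and rows before loading everything. The sketch below illustrates `usecols` and `nrows` on a tiny in-memory CSV; `sample_csv` and its values are a hypothetical stand-in for `lendingclub.csv`, not the real data.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for lendingclub.csv (illustrative values only).
sample_csv = io.StringIO(
    "id,loan_amnt,term,loan_status\n"
    "1,3600.0,36 months,Fully Paid\n"
    "2,24700.0,36 months,Fully Paid\n"
    "3,35000.0,60 months,Current\n"
)

# usecols restricts parsing to the named columns; nrows caps the number of rows read.
preview = pd.read_csv(sample_csv, usecols=["loan_amnt", "loan_status"], nrows=2)
print(preview)
```

On the real file, the same two parameters would be passed to the `pd.read_csv` call above, with the column names adjusted to whatever needs inspecting.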


In [4]:
df.head(10)

|   | id | member_id | loan_amnt | funded_amnt | funded_amnt_inv | term | int_rate | installment | grade | sub_grade | ... | hardship_payoff_balance_amount | hardship_last_payment_amount | disbursement_method | debt_settlement_flag | debt_settlement_flag_date | settlement_status | settlement_date | settlement_amount | settlement_percentage | settlement_term |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68407277 |  | 3600.0 | 3600.0 | 3600.0 | 36 months | 13.99 | 123.03 | C | C4 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 1 | 68355089 |  | 24700.0 | 24700.0 | 24700.0 | 36 months | 11.99 | 820.28 | C | C1 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 2 | 68341763 |  | 20000.0 | 20000.0 | 20000.0 | 60 months | 10.78 | 432.66 | B | B4 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 3 | 66310712 |  | 35000.0 | 35000.0 | 35000.0 | 60 months | 14.85 | 829.9 | C | C5 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 4 | 68476807 |  | 10400.0 | 10400.0 | 10400.0 | 60 months | 22.45 | 289.91 | F | F1 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 5 | 68426831 |  | 11950.0 | 11950.0 | 11950.0 | 36 months | 13.44 | 405.18 | C | C3 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 6 | 68476668 |  | 20000.0 | 20000.0 | 20000.0 | 36 months | 9.17 | 637.58 | B | B2 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 7 | 67275481 |  | 20000.0 | 20000.0 | 20000.0 | 36 months | 8.49 | 631.26 | B | B1 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 8 | 68466926 |  | 10000.0 | 10000.0 | 10000.0 | 36 months | 6.49 | 306.45 | A | A2 | ... |  |  | Cash | N |  |  |  |  |  |  |
| 9 | 68616873 |  | 8000.0 | 8000.0 | 8000.0 | 36 months | 11.48 | 263.74 | B | B5 | ... |  |  | Cash | N |  |  |  |  |  |  |


In [5]:
df.shape

(2260701, 151)

##### The dataset contains 2,260,701 loans (rows) and 151 columns

In [6]:
# overview of the columns
df.columns

Index(['id', 'member_id', 'loan_amnt', 'funded_amnt', 'funded_amnt_inv',
       'term', 'int_rate', 'installment', 'grade', 'sub_grade',
       ...
       'hardship_payoff_balance_amount', 'hardship_last_payment_amount',
       'disbursement_method', 'debt_settlement_flag',
       'debt_settlement_flag_date', 'settlement_status', 'settlement_date',
       'settlement_amount', 'settlement_percentage', 'settlement_term'],
      dtype='object', length=151)
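The preview above shows several columns (for example `member_id` and the settlement fields) that are empty in the first ten rows. A read-only way to gauge how sparse each column is could look like the sketch below; the `toy` frame and its values are hypothetical and only mimic that pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the column names come from the dataset.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "member_id": [np.nan, np.nan, np.nan, np.nan],
    "settlement_amount": [np.nan, 100.0, np.nan, np.nan],
})

# isna() flags missing cells; the mean of each boolean column is its missing share.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `df`, the same expression would rank the 151 columns by their share of missing values without modifying the data.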

### Extracting the business columns for our study
___

In [10]:
important_columns = ["loan_status", "loan_amnt", "term", "int_rate", "annual_inc", "grade", "addr_state"]

df_metier = df[important_columns]
df_metier.head(10)

|   | loan_status | loan_amnt | term | int_rate | annual_inc | grade | addr_state |
|---|---|---|---|---|---|---|---|
| 0 | Fully Paid | 3600.0 | 36 months | 13.99 | 55000.0 | C | PA |
| 1 | Fully Paid | 24700.0 | 36 months | 11.99 | 65000.0 | C | SD |
| 2 | Fully Paid | 20000.0 | 60 months | 10.78 | 63000.0 | B | IL |
| 3 | Current | 35000.0 | 60 months | 14.85 | 110000.0 | C | NJ |
| 4 | Fully Paid | 10400.0 | 60 months | 22.45 | 104433.0 | F | PA |
| 5 | Fully Paid | 11950.0 | 36 months | 13.44 | 34000.0 | C | GA |
| 6 | Fully Paid | 20000.0 | 36 months | 9.17 | 180000.0 | B | MN |
| 7 | Fully Paid | 20000.0 | 36 months | 8.49 | 85000.0 | B | SC |
| 8 | Fully Paid | 10000.0 | 36 months | 6.49 | 85000.0 | A | PA |
| 9 | Fully Paid | 8000.0 | 36 months | 11.48 | 42000.0 | B | RI |


In [14]:
# closer look at the dtypes of the business DataFrame
df_metier.dtypes

loan_status     object
loan_amnt      float64
term            object
int_rate       float64
annual_inc     float64
grade           object
addr_state      object
dtype: object
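The dtypes above split the business columns into numeric ones and text ones. A small sketch of how that split could be listed programmatically, still without transforming anything, is given below; the one-row `toy` frame is a hypothetical stand-in that reuses the first row of the preview.

```python
import pandas as pd

# Hypothetical one-row stand-in for df_metier (values taken from the preview's first row).
toy = pd.DataFrame({
    "loan_status": ["Fully Paid"],
    "loan_amnt": [3600.0],
    "term": ["36 months"],
    "int_rate": [13.99],
})

# select_dtypes filters columns by dtype family; nothing is converted.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
text_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, text_cols)
```

Columns such as `term` land in the text group because values like `36 months` are stored as strings; converting them would be a later-phase decision.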

In [13]:
# closer look at the target column 'loan_status'
df_metier.loan_status.value_counts()


loan_status
Fully Paid                                             1076751
Current                                                 878317
Charged Off                                             268559
Late (31-120 days)                                       21467
In Grace Period                                           8436
Late (16-30 days)                                         4349
Does not meet the credit policy. Status:Fully Paid        1988
Does not meet the credit policy. Status:Charged Off        761
Default                                                     40
Name: count, dtype: int64
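To read these counts as proportions, and to check them against the row count from `df.shape`, one option is the sketch below. It rebuilds the counts by hand from the output above rather than reloading the file. Note that `value_counts()` excludes missing values by default, so any gap between the sum of counts and the number of rows would be consistent with rows that have no `loan_status`; this is an inference to verify, for instance with `value_counts(dropna=False)` on the real data.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {
        "Fully Paid": 1076751,
        "Current": 878317,
        "Charged Off": 268559,
        "Late (31-120 days)": 21467,
        "In Grace Period": 8436,
        "Late (16-30 days)": 4349,
        "Does not meet the credit policy. Status:Fully Paid": 1988,
        "Does not meet the credit policy. Status:Charged Off": 761,
        "Default": 40,
    },
    name="count",
)

total_rows = 2260701  # first element of df.shape

# Share of each status among rows that have a status.
shares = (counts / counts.sum()).round(4)

# Rows not covered by any status (value_counts drops NaN by default).
unaccounted = total_rows - counts.sum()

print(shares)
print("rows without a counted status:", unaccounted)
```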

### Exporting the results

In [16]:
summary = pd.DataFrame({
    "rows": [df.shape[0]],
    "columns": [df.shape[1]]
})

summary.to_csv("/Users/master/Downloads/credit-climate-risk-lab/reports/tableau_exports/01_dataset_summary.csv", index=False)

In [17]:
status = df["loan_status"].value_counts().reset_index()
status.columns = ["loan_status", "count"]

status.to_csv("/Users/master/Downloads/credit-climate-risk-lab/reports/tableau_exports/01_loan_status_counts.csv", index=False)
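The exports above write to absolute paths on one machine. A more portable variant could build the output path with `pathlib` and create the folder if it is missing. The helper name `export_table` is hypothetical, and a temporary directory stands in for the project root so that the sketch runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_table(table: pd.DataFrame, out_dir: Path, filename: str) -> Path:
    """Write `table` as CSV under `out_dir`, creating the folder when needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    table.to_csv(target, index=False)
    return target


# A temporary directory stands in for the project root (assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    summary = pd.DataFrame({"rows": [2260701], "columns": [151]})  # values from df.shape
    out_dir = Path(tmp) / "reports" / "tableau_exports"
    written = export_table(summary, out_dir, "01_dataset_summary.csv")
    round_trip = pd.read_csv(written)  # read back to confirm what was written

print(round_trip)
```

In the notebook, `out_dir` would point at the project's own `reports/tableau_exports` folder instead of a temporary one.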


### Phase 1 complete: data loaded and understood.