<!DOCTYPE HTML>
<!--
Zohaib Aftab, Data Scientist
zohaibdr.github.io |
To showcase my projects
-->
<html>
<head>
<title>Project: Bank loan recovery </title>
<meta content="" name="description">
<meta content="" name="keywords">
<!-- Favicons -->
<link href="assets/img/favicon.png" rel="icon">
<link href="assets/img/apple-touch-icon.png" rel="apple-touch-icon">
<!-- Vendor CSS Files -->
<link href="assets/vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">
<link href="assets/vendor/bootstrap-icons/bootstrap-icons.css" rel="stylesheet">
<link href="assets/vendor/glightbox/css/glightbox.min.css" rel="stylesheet">
<link href="assets/vendor/swiper/swiper-bundle.min.css" rel="stylesheet">
<!-- Template Main CSS File -->
<link href="assets/css/style.css" rel="stylesheet">
<style>
img{
max-width: 90%;
display: block; /* remove extra space below image */
}
</style>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-TSB82NSVFW"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-TSB82NSVFW');
</script>
</head>
<body>
<!-- ======= Header ======= -->
<header id="header" class="fixed-top">
<div class="container d-flex align-items-center justify-content-between">
<!-- <h1 class="logo"><a href="index.html">DevFolio</a></h1> -->
<!-- Uncomment below if you prefer to use an image logo -->
<a href="index.html" class="logo"><img src="assets/img/logo.png" alt="" class="img-fluid"></a>
<nav id="navbar" class="navbar">
<ul>
<li><a class="nav-link scrollto active" href="index.html">Home</a></li>
<li><a class="nav-link scrollto" href="index.html#about">About</a></li>
<!-- <li><a class="nav-link scrollto" href="index.html#services">Services</a></li> -->
<li><a class="nav-link scrollto " href="index.html#portfolio">Portfolio</a></li>
<!-- <li><a class="nav-link scrollto " href="index.html#blog">Blog</a></li> -->
<li><a class="nav-link scrollto" href="index.html#contact">Contact</a></li>
</ul>
<i class="bi bi-list mobile-nav-toggle"></i>
</nav><!-- .navbar -->
</div>
</header><!-- End Header -->
<div class="hero hero-single route bg-image" style="background-image: url(assets/img/overlay-bg.jpg)">
<div class="overlay-mf"></div>
<div class="hero-content display-table">
<div class="table-cell">
<div class="container">
<h2 class="hero-title mb-4">
Bank Loan Recovery Model
</h2>
<ol class="breadcrumb d-flex justify-content-center">
<li class="breadcrumb-item">
<a href="index.html#Portfolio">Portfolio</a>
</li>
<li class="breadcrumb-item active">
Bank Loan Recovery Model
</li>
</ol>
</div>
</div>
</div>
</div>
<main id="main">
<!-- ======= Portfolio Details Section ======= -->
<section id="portfolio-details" class="portfolio-details">
<div class="container">
<div class="row gy-4">
<div class="col-lg-8">
<div class="portfolio-details-slider swiper">
<div class="swiper-wrapper align-items-center">
<div class="swiper-slide">
<img src="images/the-collectors.png" alt="">
</div>
<div class="swiper-slide">
<img src="images/bank_recovery/RDD model.jpg" alt="">
</div>
<div class="swiper-slide">
<img src="images/bank_recovery/scatterActual.png" alt="">
</div>
</div>
<div class="swiper-pagination"></div>
</div>
</div>
<div class="col-lg-4">
<div class="portfolio-info">
<h3>Project information</h3>
<ul>
<li><strong>Category</strong>: Banking </li>
<li><strong>Tags</strong>: Kruskal-Wallis H test, Regression Discontinuity </li>
<li><strong>Data source </strong>:<a href="" target="_blank"> <em> Datacamp </em> </a> </li>
</ul>
<p class="pt-3"><a class="btn btn-primary btn js-scroll px-4" href="https://github.com/zohaibdr/Portfolio/tree/master/Bank-Loan-Recovery-Paradox" target="_blank" role="button"> See Code on GitHub </a></p>
</div>
<div class="portfolio-description">
</div>
</div>
<h2>Context</h2>
<p> After a debt has been legally declared "uncollectable" by a bank, the account is considered "charged-off." But that doesn't mean the bank walks away from the debt. It still wants to collect some of the money it is owed, using internal collections staff or outside collection agencies. The bank scores the account to assess the expected recovery amount, that is, the amount the bank expects to receive from the customer in the future. This amount is a function of the probability of the customer paying, the total debt, and other factors that affect the customer's ability and willingness to pay.</p>
<h3>Recovery strategies and their cost </h3>
<p>Banks implement different recovery strategies at different thresholds ($1000, $2000, etc.): the greater the expected recovery amount, the more effort the bank puts into contacting the customer. Higher recovery strategies cost more, because the bank devotes more staff time to obtaining payments. Suppose <strong>each additional level of recovery strategy requires an additional $50 per customer</strong>, so that customers in Recovery Strategy Level 1 cost the company $50 more than those in Level 0, customers in Level 2 cost $50 more than those in Level 1, and so on. </p>
<h2>The Big Question </h2>
<p> <strong>Does the extra amount that is recovered at the higher strategy level exceed the extra $50 in costs?</strong>
In other words, was there a jump (also called a <strong>"discontinuity"</strong>) of more than $50 in the amount recovered at the higher strategy level? </p>
<h2> Data set </h2>
<p>The data set contains historical expected and actual loan recovery amounts (1882 entries), together with the recovery strategy and a few other factors such as age and sex:</p>
<p> <img src="images/bank_recovery/datahead.png" alt="First rows of the loan recovery data set" /> </p>
<p> The complete data set is available on my <a href="" target="_blank">GitHub</a>. </p>
<p> Here's a quick summary of the Levels and thresholds: </p>
<ul>
<li><strong>Level 0:</strong> Expected recovery amounts &gt;$0 and &le;$1000</li>
<li><strong>Level 1:</strong> Expected recovery amounts &gt;$1000 and &le;$2000</li>
<li><strong>Level 2:</strong> Expected recovery amounts &gt;$2000 and &le;$3000</li>
<li><strong>Level 3:</strong> Expected recovery amounts &gt;$3000 and &le;$5000</li>
<li><strong>Level 4:</strong> Expected recovery amounts &gt;$5000 </li>
</ul>
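<p>The level boundaries above can be expressed directly in code. The sketch below is a minimal illustration, not the bank's actual scoring logic: it assumes a pandas Series of expected recovery amounts, uses <code>pd.cut</code> with right-closed bins to match the "above X and up to Y" ranges, and applies the assumed $50 incremental cost per level. The helper name <code>assign_level</code> is hypothetical.</p>
<pre>
<code>
import numpy as np
import pandas as pd

# Bin edges taken from the level list above; Level 4 is open-ended
BINS = [0, 1000, 2000, 3000, 5000, np.inf]
COST_PER_LEVEL = 50  # assumed extra cost per customer for each level above Level 0

def assign_level(expected_amount):
    """Hypothetical helper: map expected recovery amounts to strategy levels 0-4.

    right=True makes each bin (lower, upper], so $1000 falls in Level 0
    and $1000.01 falls in Level 1, matching the ranges listed above.
    """
    return pd.cut(expected_amount, bins=BINS, labels=False, right=True)

amounts = pd.Series([250.0, 1000.0, 1000.01, 2999.0, 4200.0, 7500.0])
levels = assign_level(amounts)
extra_cost = levels * COST_PER_LEVEL  # extra cost relative to a Level 0 customer
print(pd.DataFrame({'expected': amounts, 'level': levels, 'extra_cost': extra_cost}))
</code>
</pre>
<p>One boundary detail: the regression code further down codes an amount of exactly $1000 as at or above the threshold (indicator 1), whereas this list puts $1000 in Level 0. With continuous amounts, this edge case likely affects very few accounts, if any.</p>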
<header>
<h2> Major Concept </h2>
<br/>
<h3>Regression Discontinuity Design (RDD):</h3>
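<p> A quasi-experimental impact evaluation method, with roots in education research, for establishing causal inference. It is used to evaluate programs that have a <strong>cutoff point</strong> determining who is eligible to participate. The idea is that units just below and just above the cutoff should be otherwise similar, so a jump in the outcome at the cutoff can be attributed to the program. </p>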
<p> <img src="images/bank_recovery/RDD model.jpg" alt="Regression discontinuity design diagram" /> </p>
<blockquote> <strong>The goal of this project is to determine whether this kind of discontinuity exists at the key cut-off points.</strong> </blockquote>
</header>
<hr />
<h1>Data Analysis</h1>
<p><strong>To answer the big question</strong>, we first check that other variables do not change abruptly at the thresholds, so that any jump in recovery can be attributed to the recovery strategy rather than to differences between customers.</p>
<p>For example, does customer age show a jump (discontinuity) at the $1000 threshold, or does it vary smoothly? A scatter plot lets us check this visually:</p>
<p> <img src="images/bank_recovery/scatterAge.png" alt="Scatter plot of age against expected recovery amount" /> </p>
<p> There certainly is a correlation (Pearson's r: 0.79), but there is no visible jump at $1000 on the x-axis (the first vertical dotted line). </p>
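<p>Pearson's r only summarizes the overall linear trend, so a high value is compatible with a smooth relationship and no jump at all. A minimal sketch of the computation with <code>scipy.stats.pearsonr</code>, on synthetic data (made-up numbers, not the project's data set), assuming the reported r was computed between the expected recovery amount and age:</p>
<pre>
<code>
import numpy as np
from scipy import stats

# Synthetic stand-in for the real columns (hypothetical data)
rng = np.random.default_rng(0)
expected_recovery_amount = rng.uniform(0, 5000, size=500)
# Age rises smoothly with the expected amount; there is no jump anywhere
age = 25 + 0.005 * expected_recovery_amount + rng.normal(0, 5, size=500)

# Pearson's r measures the strength of the overall linear trend
r, p_value = stats.pearsonr(expected_recovery_amount, age)
print(f"Pearson's r = {r:.2f}, p = {p_value:.2g}")
</code>
</pre>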
<p> To make sure that age is similar above and below the $1000 recovery threshold, a statistical test is warranted in a narrow window around the threshold ($900 to $1100).
</p>
<pre>
<code>
# Import libraries
import pandas as pd
from scipy import stats

# Load the data (assumed file name; the data set is on GitHub)
df = pd.read_csv('bank_data.csv')

# Keep accounts with an expected recovery amount between $900 and $1100
era_900_1100 = df.loc[(df['expected_recovery_amount'] < 1100) &
                      (df['expected_recovery_amount'] >= 900)]
print(era_900_1100.head())

# Summary statistics of age for each recovery strategy in the window
by_recovery_strategy = era_900_1100.groupby('recovery_strategy')
print(by_recovery_strategy['age'].describe().unstack())

# Perform the Kruskal-Wallis test on age (Level 0 vs Level 1)
Level_0_age = era_900_1100.loc[era_900_1100['recovery_strategy'] == "Level 0 Recovery", 'age']
Level_1_age = era_900_1100.loc[era_900_1100['recovery_strategy'] == "Level 1 Recovery", 'age']
print(stats.kruskal(Level_0_age, Level_1_age))
</code>
</pre>
<p>
The <b>Kruskal-Wallis H test</b> (also known as one-way ANOVA on ranks, a non-parametric test) indicates that there is <strong>no statistically significant difference</strong> in age between customers just below and just above the $1000 threshold (H = 3.45, p = 0.06).
</p>
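<p>To show what "ANOVA on ranks" means, here is a minimal sketch that computes the H statistic by hand: all observations are ranked together, and H measures how far each group's rank sum departs from what equal groups would give. The ages below are made up and contain no ties; with ties, <code>scipy.stats.kruskal</code> also applies a tie correction, so the hand-rolled <code>kruskal_h</code> (a hypothetical helper) would no longer match it exactly.</p>
<pre>
<code>
import numpy as np
from scipy import stats

def kruskal_h(*groups):
    """Hypothetical helper: Kruskal-Wallis H without the tie correction.

    H = 12 / (N (N + 1)) * sum(R_i**2 / n_i) - 3 (N + 1),
    where R_i is the rank sum of group i in the pooled ranking.
    """
    pooled = np.concatenate(groups)
    n_total = pooled.size
    ranks = stats.rankdata(pooled)  # rank all observations together
    total = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        total += group_ranks.sum() ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Made-up ages just below and just above the threshold (no ties)
level_0_age = np.array([27, 31, 35, 38, 42, 45, 49])
level_1_age = np.array([29, 33, 36, 40, 44, 47, 51, 53])

h_manual = kruskal_h(level_0_age, level_1_age)
h_scipy, p_scipy = stats.kruskal(level_0_age, level_1_age)
print(h_manual, h_scipy, p_scipy)
</code>
</pre>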
<p> A similar test is performed on <strong>gender</strong>, starting from a cross-tabulation of recovery strategy by gender. Since gender is a categorical variable, we use the <strong>chi-square test</strong> of independence.
</p>
<pre>
<code>
# Cross-tabulate recovery strategy against sex within the $900-$1100 window
crosstab = pd.crosstab(era_900_1100['recovery_strategy'], era_900_1100['sex'])
print(crosstab)
# Chi-square test of independence
chi2_stat, p_val, dof, ex = stats.chi2_contingency(crosstab)
print(p_val)
</code>
</pre>
<h3> Test result </h3>
<p> <img src="images\bank_recovery\chiSq.png" alt="" /> </p>
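<p>The chi-square statistic compares the observed counts with the counts expected if strategy level and gender were independent (row total times column total, divided by the grand total). A minimal sketch on a made-up 2x2 table follows. One detail worth knowing: for a table with one degree of freedom, <code>scipy.stats.chi2_contingency</code> applies Yates' continuity correction by default, so matching the plain Pearson statistic requires <code>correction=False</code>. Assuming the gender cross-tabulation above has two strategy levels and two gender categories, the p-value it prints includes this correction.</p>
<pre>
<code>
import numpy as np
from scipy import stats

# Made-up 2x2 table: rows = strategy level, columns = sex
observed = np.array([[32, 30],
                     [39, 41]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Plain Pearson statistic (no continuity correction)
chi2_plain, p_plain, dof, ex = stats.chi2_contingency(observed, correction=False)
# Default call: Yates' correction is applied because dof == 1
chi2_yates, p_yates, _, _ = stats.chi2_contingency(observed)
print(chi2_manual, chi2_plain, chi2_yates, dof)
</code>
</pre>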
<blockquote> <strong>Both of these tests were repeated at the other key thresholds ($2000, $3000 and $5000), with similar results: we fail to reject the null hypothesis (no age or gender effect).</strong> </blockquote>
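<p>Repeating both checks at every threshold is easy to automate. The sketch below is one way to do it, not the exact code used for this project: it assumes a DataFrame with the columns <code>expected_recovery_amount</code>, <code>age</code> and <code>sex</code>, uses a hypothetical window of ±$100 around each threshold, and splits accounts by whether the expected amount is at or above the threshold rather than by the <code>recovery_strategy</code> label. The demo runs on synthetic data.</p>
<pre>
<code>
import numpy as np
import pandas as pd
from scipy import stats

def balance_checks(df, thresholds=(1000, 2000, 3000, 5000), half_width=100):
    """Hypothetical helper: age and sex balance checks around each threshold.

    Assumes columns 'expected_recovery_amount', 'age' and 'sex'.
    Returns one row per threshold with the Kruskal-Wallis (age) and
    chi-square (sex) p-values.
    """
    amount = df['expected_recovery_amount']
    rows = []
    for t in thresholds:
        window = df[(amount >= t - half_width) & (amount < t + half_width)]
        above = window['expected_recovery_amount'] >= t  # True at or above the threshold
        _, p_age = stats.kruskal(window.loc[~above, 'age'], window.loc[above, 'age'])
        _, p_sex, _, _ = stats.chi2_contingency(pd.crosstab(above, window['sex']))
        rows.append({'threshold': t, 'n': len(window), 'p_age': p_age, 'p_sex': p_sex})
    return pd.DataFrame(rows)

# Synthetic data: age varies smoothly and sex is random, so there is no real jump
rng = np.random.default_rng(1)
n = 4000
amounts = rng.uniform(0, 6000, size=n)
demo = pd.DataFrame({
    'expected_recovery_amount': amounts,
    'age': 25 + 0.004 * amounts + rng.normal(0, 6, size=n),
    'sex': rng.choice(['Female', 'Male'], size=n),
})
print(balance_checks(demo))
</code>
</pre>
<p>At a 5% significance level, about one check in twenty would flag a difference by chance even when none exists, so a single small p-value among several thresholds deserves a second look rather than an immediate conclusion.</p>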
<hr />
<h2> Back to business </h2>
<p> Now that we are confident that age and gender do not differ significantly across the thresholds, we can move to the key variable of interest, <strong>'actual_recovery_amount'</strong>. </p>
<p>A first step in examining the relationship between the actual and the expected recovery amount is a scatter plot.</p>
<p> <img src="images/bank_recovery/scatterActual.png" alt="Scatter plot of actual against expected recovery amount" /> </p>
<p> Since the extra cost of moving up a recovery strategy level is only $50, a jump of that size (if any) is hard to notice on this plot at the key thresholds ($1000, $2000, etc.). We also need to measure the size of the effect, and for this, <strong>Regression Discontinuity Design (RDD)</strong> is a suitable option. </p>
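<p>Averaging the outcome within narrow bins of expected recovery amount often makes a small jump easier to see than a raw scatter plot. A minimal sketch, assuming the column names used in this project and hypothetical bin settings ($50 bins within $200 of the threshold), demonstrated on synthetic data with a built-in jump:</p>
<pre>
<code>
import numpy as np
import pandas as pd

def binned_means(df, center=1000, half_width=200, bin_width=50):
    """Hypothetical helper: mean actual recovery in fixed-width bins around a threshold.

    Bins are left-closed, so an account at exactly `center` starts the first bin above it.
    """
    edges = np.arange(center - half_width, center + half_width + bin_width, bin_width)
    bins = pd.cut(df['expected_recovery_amount'], bins=edges, right=False)
    return df.groupby(bins, observed=False)['actual_recovery_amount'].mean()

# Synthetic data: smooth trend plus a made-up jump of 300 at $1000
rng = np.random.default_rng(2)
amounts = rng.uniform(800, 1200, size=2000)
actual = 100 + 1.1 * amounts + 300 * (amounts >= 1000) + rng.normal(0, 50, size=2000)
demo = pd.DataFrame({'expected_recovery_amount': amounts, 'actual_recovery_amount': actual})

means = binned_means(demo)
print(means)
</code>
</pre>
<p>The step between the last bin below $1000 and the first bin above it mixes the jump with the slope across one bin width, which is why a regression that models the slope explicitly is the better tool for measuring the jump.</p>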
<p> To set up the RDD, we first add to the model an indicator of the true threshold (starting at $1000), which is 0 when the expected recovery amount is below the threshold and 1 when it is at or above it. </p>
<p><strong> When we add the true threshold to the model, the regression coefficient for the threshold indicator represents the additional amount recovered due to the higher recovery strategy.
</strong></p>
<blockquote> If the higher recovery strategy helped recover more money, then the regression coefficient of the true threshold will be significantly greater than zero. If it did not, then the coefficient will not be statistically significant. </blockquote>
<pre>
<code>
# Import libraries
import numpy as np
import statsmodels.api as sm

# Create indicator (0 or 1) for expected recovery amount >= $1000
df['indicator_1000'] = np.where(df['expected_recovery_amount'] < 1000, 0, 1)
era_900_1100 = df.loc[(df['expected_recovery_amount'] < 1100) &
                      (df['expected_recovery_amount'] >= 900)]
# Define X and y
X = era_900_1100[['expected_recovery_amount', 'indicator_1000']]
y = era_900_1100['actual_recovery_amount']
X = sm.add_constant(X)
# Build linear regression model
model = sm.OLS(y, X).fit()
# Print the model summary
print(model.summary())
</code>
</pre>
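<p>The same specification can be wrapped in a function and applied to each threshold in turn. This is a minimal sketch, not necessarily the code behind the results table: the helper <code>rdd_jump</code> and the default window of ±$100 are assumptions, and the demo checks on synthetic data (with a made-up jump of 300) that the indicator coefficient recovers the jump.</p>
<pre>
<code>
import numpy as np
import pandas as pd
import statsmodels.api as sm

def rdd_jump(df, threshold, half_width=100):
    """Hypothetical helper: local linear RDD estimate of the jump at `threshold`.

    Same specification as the $1000 model above (constant, expected amount and a
    0/1 indicator), fitted only on accounts within half_width of the threshold.
    """
    amount = df['expected_recovery_amount']
    window = df[(amount >= threshold - half_width) & (amount < threshold + half_width)].copy()
    window['indicator'] = (window['expected_recovery_amount'] >= threshold).astype(int)
    X = sm.add_constant(window[['expected_recovery_amount', 'indicator']])
    fit = sm.OLS(window['actual_recovery_amount'], X).fit()
    return fit.params['indicator'], fit.pvalues['indicator'], fit

# Synthetic check: a made-up jump of 300 at $1000
rng = np.random.default_rng(4)
amounts = rng.uniform(800, 1200, size=2000)
actual = 100 + 1.1 * amounts + 300 * (amounts >= 1000) + rng.normal(0, 50, size=2000)
demo = pd.DataFrame({'expected_recovery_amount': amounts, 'actual_recovery_amount': actual})

jump, p_value, _ = rdd_jump(demo, 1000)
print(f"estimated jump = {jump:.1f}, p = {p_value:.2g}")
</code>
</pre>
<p>On the project data, calling <code>rdd_jump(df, t)</code> for each threshold t in 1000, 2000, 3000 and 5000 would give one jump estimate and p-value per threshold; whether the published table used the same ±$100 window is an assumption.</p>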
<h1>Results</h1>
<p> <img src="images\bank_recovery\result1.png" alt="" /> </p>
<p>
<strong> The regression coefficient for the true threshold was statistically significant, with an estimated impact of around $278. This is much larger than the $50 per customer needed to run the higher recovery strategy. </strong> I performed a similar analysis for all thresholds, with the results summarized in the table below: </p>
<p> <img src="images\bank_recovery\result2.png" alt="" /> </p>
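<p>Strictly speaking, the business question is whether the jump exceeds the $50 cost, not whether it exceeds zero. A minimal sketch of a one-sided t-test (null: the jump is at most $50; alternative: the jump is above $50), using the coefficient and standard error from a fitted statsmodels OLS model. The helper <code>exceeds_cost</code> is hypothetical, and the demo uses synthetic data with a made-up jump of 120.</p>
<pre>
<code>
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

COST_PER_LEVEL = 50  # extra cost per customer of the next strategy level

def exceeds_cost(fit, term='indicator', cost=COST_PER_LEVEL):
    """Hypothetical helper: one-sided t-test, H0 jump at most cost vs H1 jump above cost."""
    estimate = fit.params[term]
    t_stat = (estimate - cost) / fit.bse[term]
    # Upper-tail probability of the t distribution with the model's residual df
    p_one_sided = stats.t.sf(t_stat, df=fit.df_resid)
    return estimate - cost, t_stat, p_one_sided

# Synthetic data with a made-up jump of 120 at $1000
rng = np.random.default_rng(3)
x = rng.uniform(900, 1100, size=1000)
demo = pd.DataFrame({'expected_recovery_amount': x,
                     'indicator': (x >= 1000).astype(int)})
demo['actual_recovery_amount'] = (100 + 1.1 * x + 120 * demo['indicator']
                                  + rng.normal(0, 60, size=1000))
X = sm.add_constant(demo[['expected_recovery_amount', 'indicator']])
fit = sm.OLS(demo['actual_recovery_amount'], X).fit()

net_gain, t_stat, p = exceeds_cost(fit)
print(f"net gain over cost = {net_gain:.1f}, t = {t_stat:.2f}, one-sided p = {p:.2g}")
</code>
</pre>
<p>Applied to the $1000 model in this project, the call would be <code>exceeds_cost(model, term='indicator_1000')</code>.</p>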
<h1>Conclusion</h1>
<div class="box">
<p>To conclude, it is worth chasing the smaller loans by stepping up the strategy at each of the first three thresholds ($1000, $2000 and $3000). However, for loans greater than $5000, it is not worth investing in the Level 4 strategy.</p>
</div>
</div>
</div>
</section>
<!-- End Portfolio Details Section -->
</main><!-- End #main -->
<!-- ======= Footer ======= -->
<footer>
<div class="container">
<div class="row">
<div class="col-sm-12">
<div class="copyright-box">
<p class="copyright">© All Rights Reserved</p>
<div class="credits">
<!--
All the links in the footer should remain intact.
You can delete the links only if you purchased the pro version.
Licensing information: https://bootstrapmade.com/license/
Purchase the pro version with working PHP/AJAX contact form: https://bootstrapmade.com/buy/?theme=DevFolio
-->
Design credit to <a href="https://bootstrapmade.com/" target="_blank"> BootstrapMade</a>
</div>
</div>
</div>
</div>
</div>
</footer><!-- End Footer -->
<div id="preloader"></div>
<a href="#" class="back-to-top d-flex align-items-center justify-content-center"><i class="bi bi-arrow-up-short"></i></a>
<!-- Vendor JS Files -->
<script src="assets/vendor/purecounter/purecounter_vanilla.js"></script>
<script src="assets/vendor/bootstrap/js/bootstrap.bundle.min.js"></script>
<script src="assets/vendor/glightbox/js/glightbox.min.js"></script>
<script src="assets/vendor/swiper/swiper-bundle.min.js"></script>
<script src="assets/vendor/typed.js/typed.min.js"></script>
<script src="assets/vendor/php-email-form/validate.js"></script>
<!-- Template Main JS File -->
<script src="assets/js/main.js"></script>
</body>
</html>