# Project Solution - Secondary

You have to work on the [Kiva](https://drive.google.com/file/d/1-tJtnIbo1Rt-F1XfoWGVkmBXiI-ciuRx/view) dataset. Some information about the dataset is available on the [Kaggle](https://www.kaggle.com/gaborfodor/additional-kiva-snapshot) web page.

## Basic tasks

**1.** Normalize the `loan_lenders` table. In the normalized table, each row must have one `loan_id` and one `lender`.

First, the CSV file is read into a data frame that holds the unnormalized data:

In [1]:
import pandas as pd
import numpy as np
import time as tm
FINAL = True # This is final calculation, on complete data
start_time = tm.time()
loans_lenders_un = pd.read_csv('../../Datasets/kiva/loans_lenders.csv')
if not FINAL:
    loans_lenders_un = loans_lenders_un.head(10000) 
print("--- %s seconds ---" % (tm.time()-start_time)) 
loans_lenders_un

--- 8.599493741989136 seconds ---


Unnamed: 0,loan_id,lenders
0,483693,"muc888, sam4326, camaran3922, lachheb1865, reb..."
1,483738,"muc888, nora3555, williammanashi, barbara5610,..."
2,485000,"muc888, terrystl, richardandsusan8352, sherri4..."
3,486087,"muc888, james5068, rudi5955, daniel9859, don92..."
4,534428,"muc888, niki3008, teresa9174, mike4896, david7..."
...,...,...
9995,45940,"helga4707, james6963, jimjams, andreas2382, si..."
9996,247491,"priyaram, christian9832, john9242, sandra1434,..."
9997,345274,"priyaram, nicola1093, bobby9744, simon7848, di..."
9998,125945,"joseph1859, matt5349, reese3555, stanley3312, ..."


The normalized data frame is built from an intermediate object: a list of dictionaries, each holding one (`loan_id`, `lender`) pair:

In [2]:
import time as tm
start_time = tm.time()
lis = []
for index, row in loans_lenders_un.iterrows(): 
    ls = row['lenders'].split(',')
    for l in ls:
        lis.append({ 'loan_id' : row['loan_id'], 'lender': l.strip() })
loans_lender = pd.DataFrame(lis) 
print("--- %s seconds ---" % (tm.time()-start_time)) 
loans_lender 

--- 6.878727197647095 seconds ---


Unnamed: 0,loan_id,lender
0,483693,muc888
1,483693,sam4326
2,483693,camaran3922
3,483693,lachheb1865
4,483693,rebecca3499
...,...,...
245000,225434,wongacom3393
245001,225434,marleneanddel8151
245002,225434,joanne4956
245003,225434,juddie7070
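
The row-by-row loop above can also be expressed with vectorized pandas operations. The following is a minimal sketch, assuming the same column names (`loan_id`, `lenders`); the toy frame `toy_un` and the helper `normalize_lenders` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Toy stand-in for loans_lenders_un (hypothetical data, same column names).
toy_un = pd.DataFrame({
    "loan_id": [483693, 483738],
    "lenders": ["muc888, sam4326", "muc888, nora3555, williammanashi"],
})

def normalize_lenders(df):
    """Return one row per (loan_id, lender) pair."""
    # Split the comma-separated string into a list of names per loan.
    out = df.assign(lender=df["lenders"].str.split(","))
    # Turn each list element into its own row, repeating loan_id.
    out = out.explode("lender")
    # Remove the blanks that follow each comma.
    out["lender"] = out["lender"].str.strip()
    return out[["loan_id", "lender"]].reset_index(drop=True)

toy_norm = normalize_lenders(toy_un)
print(toy_norm)
```

On the full file this avoids `iterrows`, which is usually the slowest way to walk a data frame; the actual speed-up has not been measured here.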


**2.** For each loan, add a column `duration` corresponding to the number of days between the disburse time and the planned expiration time. If either of those two dates is missing, the duration must also be missing.

First, the `loans` data frame is loaded and its columns are inspected:

In [3]:
import pandas as pd
import numpy as np
import time as tm
start_time = tm.time()
loans = pd.read_csv('../../Datasets/kiva/loans.csv')
print("--- %s seconds ---" % (tm.time()-start_time)) 
loans.columns

--- 64.78130149841309 seconds ---


Index(['loan_id', 'loan_name', 'original_language', 'description',
       'description_translated', 'funded_amount', 'loan_amount', 'status',
       'activity_name', 'sector_name', 'loan_use', 'country_code',
       'country_name', 'town_name', 'currency_policy',
       'currency_exchange_coverage_rate', 'currency', 'partner_id',
       'posted_time', 'planned_expiration_time', 'disburse_time',
       'raised_time', 'lender_term', 'num_lenders_total',
       'num_journal_entries', 'num_bulk_entries', 'tags', 'borrower_genders',
       'borrower_pictured', 'repayment_interval', 'distribution_model'],
      dtype='object')

After that, the contents of the data frame are displayed:

In [4]:
if not FINAL:
    loans = loans.head(20000)
loans

Unnamed: 0,loan_id,loan_name,original_language,description,description_translated,funded_amount,loan_amount,status,activity_name,sector_name,...,raised_time,lender_term,num_lenders_total,num_journal_entries,num_bulk_entries,tags,borrower_genders,borrower_pictured,repayment_interval,distribution_model
0,657307,Aivy,English,"Aivy, 21 years of age, is single and lives in ...",,125.0,125.0,funded,General Store,Retail,...,2014-01-15 04:48:22.000 +0000,7.0,3,2,1,,female,true,irregular,field_partner
1,657259,Idalia Marizza,Spanish,"Doña Idalia, esta casada, tiene 57 años de eda...","Idalia, 57, is married and lives with her husb...",400.0,400.0,funded,Used Clothing,Clothing,...,2014-02-25 06:42:06.000 +0000,8.0,11,2,1,,female,true,monthly,field_partner
2,658010,Aasia,English,Aasia is a 45-year-old married lady and she ha...,,400.0,400.0,funded,General Store,Retail,...,2014-01-24 23:06:18.000 +0000,14.0,16,2,1,"#Woman Owned Biz, #Supporting Family, user_fav...",female,true,monthly,field_partner
3,659347,Gulmira,Russian,"Гулмире 36 лет, замужем, вместе с супругом вос...",Gulmira is 36 years old and married. She and ...,625.0,625.0,funded,Farming,Agriculture,...,2014-01-22 05:29:28.000 +0000,14.0,21,2,1,user_favorite,female,true,monthly,field_partner
4,656933,Ricky\t,English,Ricky is a farmer who currently cultivates his...,,425.0,425.0,funded,Farming,Agriculture,...,2014-01-14 17:29:27.000 +0000,7.0,15,2,1,"#Animals, #Eco-friendly, #Sustainable Ag",male,true,bullet,field_partner
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,341198,Guillermo Antonio,Spanish,"El Señor Guillermo tiene 42 años, la ubicación...",Guillermo is 42 years old. His business is lo...,375.0,375.0,funded,Shoe Sales,Retail,...,2011-10-13 00:00:41.000 +0000,14.0,10,1,1,,male,true,monthly,field_partner
19996,341771,ERLINDA,English,Erlinda is from the village of Sillawit. She i...,,175.0,175.0,funded,General Store,Retail,...,2011-10-14 06:02:23.000 +0000,8.0,3,2,1,,female,true,irregular,field_partner
19997,344237,Ma Guadalupe,Spanish,La señora Ma Guadalupe es madre de 2 hijos de ...,\r\nMaría Guadalupe is the mother of two child...,625.0,625.0,funded,Property,Housing,...,2011-10-27 19:23:44.000 +0000,14.0,23,1,1,,female,true,monthly,field_partner
19998,344735,Blanca Lidia,Spanish,"Blanca, vive con su esposo, tiene tres hijos d...",Blanca lives with her husband and has three ch...,350.0,350.0,funded,Food Production/Sales,Food,...,2011-10-12 02:10:27.000 +0000,8.0,10,4,2,,female,true,monthly,field_partner


We are now mainly interested in the columns `planned_expiration_time` and `disburse_time`:

In [5]:
loans_sel = loans[ ['loan_id', 'loan_name','planned_expiration_time', 'disburse_time'] ]
loans_sel

Unnamed: 0,loan_id,loan_name,planned_expiration_time,disburse_time
0,657307,Aivy,2014-02-14 03:30:06.000 +0000,2013-12-22 08:00:00.000 +0000
1,657259,Idalia Marizza,2014-03-26 22:25:07.000 +0000,2013-12-20 08:00:00.000 +0000
2,658010,Aasia,2014-02-15 21:10:05.000 +0000,2014-01-09 08:00:00.000 +0000
3,659347,Gulmira,2014-02-21 03:10:02.000 +0000,2014-01-17 08:00:00.000 +0000
4,656933,Ricky\t,2014-02-13 06:10:02.000 +0000,2013-12-17 08:00:00.000 +0000
...,...,...,...,...
19995,341198,Guillermo Antonio,,2011-09-13 07:00:00.000 +0000
19996,341771,ERLINDA,,2011-09-16 07:00:00.000 +0000
19997,344237,Ma Guadalupe,,2011-10-07 07:00:00.000 +0000
19998,344735,Blanca Lidia,,2011-09-16 07:00:00.000 +0000


In [6]:
from datetime import datetime as dt
import time as tm
start_time = tm.time()
loans['duration'] = None
for index, row in loans.iterrows(): 
    s2 = row['planned_expiration_time']
    s1 = row['disburse_time']
    if( pd.notna(s1) and pd.notna(s2) and s1 != '' and s2 != ''):
        d2 = dt.strptime(s2, "%Y-%m-%d %H:%M:%S.%f %z")
        d1 = dt.strptime(s1, "%Y-%m-%d %H:%M:%S.%f %z")
        loans.loc[index,'duration'] = (d2 - d1).days 
loans_sel = loans[ ['loan_id', 'loan_name','planned_expiration_time', 'disburse_time', 'duration'] ]
print("--- %s seconds ---" % (tm.time()-start_time)) 
loans_sel

--- 37.69523572921753 seconds ---


Unnamed: 0,loan_id,loan_name,planned_expiration_time,disburse_time,duration
0,657307,Aivy,2014-02-14 03:30:06.000 +0000,2013-12-22 08:00:00.000 +0000,53
1,657259,Idalia Marizza,2014-03-26 22:25:07.000 +0000,2013-12-20 08:00:00.000 +0000,96
2,658010,Aasia,2014-02-15 21:10:05.000 +0000,2014-01-09 08:00:00.000 +0000,37
3,659347,Gulmira,2014-02-21 03:10:02.000 +0000,2014-01-17 08:00:00.000 +0000,34
4,656933,Ricky\t,2014-02-13 06:10:02.000 +0000,2013-12-17 08:00:00.000 +0000,57
...,...,...,...,...,...
19995,341198,Guillermo Antonio,,2011-09-13 07:00:00.000 +0000,
19996,341771,ERLINDA,,2011-09-16 07:00:00.000 +0000,
19997,344237,Ma Guadalupe,,2011-10-07 07:00:00.000 +0000,
19998,344735,Blanca Lidia,,2011-09-16 07:00:00.000 +0000,
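
The same duration can be computed without an explicit loop. This is a sketch, assuming the timestamp format used in the cell above; `toy_loans`, `expiry` and `disbursed` are hypothetical names, and the two rows are copied from the output shown above.

```python
import pandas as pd

# Two rows taken from the table above: one complete, one with a missing date.
toy_loans = pd.DataFrame({
    "loan_id": [657307, 341198],
    "planned_expiration_time": ["2014-02-14 03:30:06.000 +0000", None],
    "disburse_time": ["2013-12-22 08:00:00.000 +0000",
                      "2011-09-13 07:00:00.000 +0000"],
})

fmt = "%Y-%m-%d %H:%M:%S.%f %z"
# Missing strings become NaT, so no explicit null check is needed.
expiry = pd.to_datetime(toy_loans["planned_expiration_time"], format=fmt, utc=True)
disbursed = pd.to_datetime(toy_loans["disburse_time"], format=fmt, utc=True)

# Whole days of the difference; NaT propagates to a missing duration.
toy_loans["duration"] = (expiry - disbursed).dt.days
print(toy_loans[["loan_id", "duration"]])
```

Because one value is missing, the resulting column is stored as floating point, with `NaN` marking the missing duration.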


**3.** Find the lenders that have funded at least twice.

These are the lenders that appear more than once in the normalized `loans_lender` data frame.


- This solution counts occurrences using a dictionary object:

In [7]:
import time as tm
start_time = tm.time()
stat={}
for index, row in loans_lender.iterrows(): 
    lender = row['lender']
    stat[lender] = 1 + stat.get(lender, 0)
lenders_funded_2_more = []
for k in stat.keys():
    if( stat[k] >= 2):
        lenders_funded_2_more.append((k,stat[k]))
print("--- %s seconds ---" % (tm.time()-start_time)) 
lenders_funded_2_more

--- 52.25188136100769 seconds ---


[('muc888', 11),
 ('sam4326', 6),
 ('rebecca3499', 167),
 ('karlheinz4543', 13),
 ('jerrydb', 2),
 ('paula8951', 3),
 ('gmct', 981),
 ('r3922', 95),
 ('brian9451', 6),
 ('shree8053', 101),
 ('alan5513', 62),
 ('oisin3389', 20),
 ('helle8622', 2),
 ('bo3186', 7),
 ('ric8947', 2),
 ('daniel98469874', 113),
 ('deborah12671549', 7),
 ('matthew9831', 2),
 ('john6330', 19),
 ('john9479', 6),
 ('mattiaslaven', 73),
 ('jason3883', 13),
 ('highgrovechurch', 260),
 ('dino5102', 14),
 ('jonathan7946', 7),
 ('ann8187', 6),
 ('bryan2669', 8),
 ('eddyphil', 19),
 ('don9212', 171),
 ('carolineandcolin9686', 3),
 ('bent8782', 2),
 ('raph8817', 4),
 ('danielle2350', 5),
 ('barbara5610', 341),
 ('danhostetler', 3),
 ('daniel1104', 9),
 ('amirali5409', 259),
 ('oceanwest', 15),
 ('trolltech4460', 1295),
 ('thedragonflykeeper', 10),
 ('patrick8466', 2),
 ('terrystl', 89),
 ('sherri4341', 3),
 ('gooddogg1', 1427),
 ('danny6470', 46),
 ('jacqueline4838', 8),
 ('diederik8163', 9),
 ('ryan54597608', 7),
 ('ja
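
As a possible alternative to the dictionary loop, the counting can be delegated to `value_counts`. The sketch below uses a hypothetical toy frame `toy_pairs` with the same columns as `loans_lender`; like the loop, it counts rows, so a lender listed twice on the same loan would be counted twice.

```python
import pandas as pd

# Toy stand-in for loans_lender (hypothetical data).
toy_pairs = pd.DataFrame({
    "loan_id": [1, 1, 2, 2, 3],
    "lender": ["muc888", "sam4326", "muc888", "nora3555", "muc888"],
})

# Number of rows per lender.
counts = toy_pairs["lender"].value_counts()
# Keep only lenders that appear at least twice.
repeat_lenders = counts[counts >= 2]
print(repeat_lenders)
```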

**4.** For each country, compute how many loans have involved that country as borrowers.

- This solution counts loans using a dictionary object:

In [8]:
import time as tm
start_time = tm.time()
stat={}
for index, row in loans.iterrows(): 
    cn = row['country_name']
    cc = row['country_code']
    stat[(cc,cn)] = 1 + stat.get((cc,cn), 0)
print("--- %s seconds ---" % (tm.time()-start_time)) 
stat

--- 4.853782653808594 seconds ---


{('PH', 'Philippines'): 4542,
 ('HN', 'Honduras'): 222,
 ('PK', 'Pakistan'): 610,
 ('KG', 'Kyrgyzstan'): 165,
 ('SV', 'El Salvador'): 1073,
 ('BI', 'Burundi'): 35,
 ('ML', 'Mali'): 180,
 ('MN', 'Mongolia'): 134,
 ('PE', 'Peru'): 1059,
 ('GE', 'Georgia'): 120,
 ('AM', 'Armenia'): 212,
 ('GH', 'Ghana'): 313,
 ('TZ', 'Tanzania'): 176,
 ('KH', 'Cambodia'): 955,
 ('TG', 'Togo'): 205,
 ('GT', 'Guatemala'): 134,
 ('LB', 'Lebanon'): 230,
 ('PY', 'Paraguay'): 372,
 ('NI', 'Nicaragua'): 608,
 ('KE', 'Kenya'): 1876,
 ('UG', 'Uganda'): 641,
 ('RW', 'Rwanda'): 233,
 ('MZ', 'Mozambique'): 77,
 ('AZ', 'Azerbaijan'): 109,
 ('IL', 'Israel'): 12,
 ('TJ', 'Tajikistan'): 659,
 ('BO', 'Bolivia'): 356,
 ('MX', 'Mexico'): 196,
 ('ID', 'Indonesia'): 124,
 ('NG', 'Nigeria'): 175,
 ('EC', 'Ecuador'): 427,
 ('MG', 'Madagascar'): 57,
 ('VN', 'Vietnam'): 309,
 ('CO', 'Colombia'): 458,
 ('JO', 'Jordan'): 230,
 ('YE', 'Yemen'): 88,
 ('IN', 'India'): 484,
 ('CL', 'Chile'): 16,
 ('AL', 'Albania'): 50,
 ('PS', 'Palesti
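
The count per country can also be sketched with `groupby`. The toy frame `toy_loans` is hypothetical. One assumption worth stating: `groupby` drops rows whose key is missing unless `dropna=False` is passed, whereas the dictionary loop keeps them. This may matter if a country code such as `NA` was parsed as a missing value by the default settings of `read_csv`.

```python
import pandas as pd

# Toy stand-in for loans (hypothetical data).
toy_loans = pd.DataFrame({
    "country_code": ["PH", "PH", "HN", "PK"],
    "country_name": ["Philippines", "Philippines", "Honduras", "Pakistan"],
    "loan_amount": [125.0, 400.0, 400.0, 625.0],
})

# One entry per (code, name) pair with the number of loans.
loans_per_country = toy_loans.groupby(
    ["country_code", "country_name"], dropna=False
).size()
print(loans_per_country)
```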

**5.** For each country, compute the overall amount of money borrowed.


- This solution sums the loan amounts using a dictionary object:

In [9]:
import time as tm
start_time = tm.time()
stat={}
for index, row in loans.iterrows(): 
    cn = row['country_name']
    cc = row['country_code']
    stat[(cc,cn)] = stat.get((cc,cn), 0) + row['loan_amount']
print("--- %s seconds ---" % (tm.time()-start_time)) 
stat

--- 4.855634450912476 seconds ---


{('PH', 'Philippines'): 1523875.0,
 ('HN', 'Honduras'): 186600.0,
 ('PK', 'Pakistan'): 325000.0,
 ('KG', 'Kyrgyzstan'): 215175.0,
 ('SV', 'El Salvador'): 689500.0,
 ('BI', 'Burundi'): 108675.0,
 ('ML', 'Mali'): 201575.0,
 ('MN', 'Mongolia'): 235425.0,
 ('PE', 'Peru'): 1003125.0,
 ('GE', 'Georgia'): 191675.0,
 ('AM', 'Armenia'): 371075.0,
 ('GH', 'Ghana'): 245825.0,
 ('TZ', 'Tanzania'): 324150.0,
 ('KH', 'Cambodia'): 672900.0,
 ('TG', 'Togo'): 136325.0,
 ('GT', 'Guatemala'): 225375.0,
 ('LB', 'Lebanon'): 326675.0,
 ('PY', 'Paraguay'): 907925.0,
 ('NI', 'Nicaragua'): 486375.0,
 ('KE', 'Kenya'): 982100.0,
 ('UG', 'Uganda'): 524325.0,
 ('RW', 'Rwanda'): 358875.0,
 ('MZ', 'Mozambique'): 44300.0,
 ('AZ', 'Azerbaijan'): 205675.0,
 ('IL', 'Israel'): 42525.0,
 ('TJ', 'Tajikistan'): 619850.0,
 ('BO', 'Bolivia'): 666475.0,
 ('MX', 'Mexico'): 303550.0,
 ('ID', 'Indonesia'): 93400.0,
 ('NG', 'Nigeria'): 62175.0,
 ('EC', 'Ecuador'): 470950.0,
 ('MG', 'Madagascar'): 17700.0,
 ('VN', 'Vietnam'): 37770
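
A grouped sum gives the same totals in one expression. Again a sketch on a hypothetical toy frame `toy_loans`, using the column names of `loans`:

```python
import pandas as pd

# Toy stand-in for loans (hypothetical data).
toy_loans = pd.DataFrame({
    "country_code": ["PH", "PH", "HN", "PK"],
    "country_name": ["Philippines", "Philippines", "Honduras", "Pakistan"],
    "loan_amount": [125.0, 400.0, 400.0, 625.0],
})

# Total amount borrowed per (code, name) pair.
borrowed = toy_loans.groupby(
    ["country_code", "country_name"], dropna=False
)["loan_amount"].sum()
print(borrowed)
```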

**6.** Like the previous point, but expressed as a percentage of the overall amount lent.


- This solution is based on a dictionary object:

In [10]:
import time as tm
start_time = tm.time()
stat={}
total = 0
for index, row in loans.iterrows(): 
    cn = row['country_name']
    cc = row['country_code']
    stat[(cc,cn)] = stat.get((cc,cn), 0) + row['loan_amount']
    total += row['loan_amount']
for k in stat.keys():
    stat[k] = stat[k]/total*100
print("--- %s seconds ---" % (tm.time()-start_time)) 
stat

--- 5.539268732070923 seconds ---


{('PH', 'Philippines'): 9.053238179401303,
 ('HN', 'Honduras'): 1.1085779635969375,
 ('PK', 'Pakistan'): 1.9308029912594034,
 ('KG', 'Kyrgyzstan'): 1.278340103520745,
 ('SV', 'El Salvador'): 4.096272807610334,
 ('BI', 'Burundi'): 0.6456308156157404,
 ('ML', 'Mali'): 1.1975434245018899,
 ('MN', 'Mongolia'): 1.3986439822069077,
 ('PE', 'Peru'): 5.959497694175658,
 ('GE', 'Georgia'): 1.138728194921988,
 ('AM', 'Armenia'): 2.204531446097179,
 ('GH', 'Ghana'): 1.4604296779272088,
 ('TZ', 'Tanzania'): 1.925753198820725,
 ('KH', 'Cambodia'): 3.9976533317490848,
 ('TG', 'Togo'): 0.8098975931798097,
 ('GT', 'Guatemala'): 1.3389376127848862,
 ('LB', 'Lebanon'): 1.9407540528297402,
 ('PY', 'Paraguay'): 5.393920941043674,
 ('NI', 'Nicaragua'): 2.889520938073207,
 ('KE', 'Kenya'): 5.834589592971877,
 ('UG', 'Uganda'): 3.1149793181294974,
 ('RW', 'Rwanda'): 2.132052072271441,
 ('MZ', 'Mozambique'): 0.2631833000393587,
 ('AZ', 'Azerbaijan'): 1.2219012468531625,
 ('IL', 'Israel'): 0.2526381452409419,
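
The percentage follows from dividing each country's total by the grand total. A minimal sketch on hypothetical toy data (`toy_loans`, `borrowed` and `share` are illustrative names):

```python
import pandas as pd

# Toy stand-in for loans (hypothetical data).
toy_loans = pd.DataFrame({
    "country_code": ["PH", "PH", "HN", "PK"],
    "country_name": ["Philippines", "Philippines", "Honduras", "Pakistan"],
    "loan_amount": [125.0, 400.0, 400.0, 625.0],
})

borrowed = toy_loans.groupby(
    ["country_code", "country_name"], dropna=False
)["loan_amount"].sum()

# Each country's share of the overall amount, in percent.
share = borrowed / borrowed.sum() * 100
print(share)
```

By construction the shares add up to 100, which is a convenient sanity check on the full data as well.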


**7.** Like the three previous points, but split for each year (with respect to `disburse time`).


- This solution is based on a dictionary object; each value is a triplet (number of loans, amount borrowed, percentage of the overall amount):

In [12]:
from datetime import datetime as dt
import time as tm
start_time = tm.time()
stat={}
total = 0
for index, row in loans.iterrows(): 
    cn = row['country_name']
    cc = row['country_code']
    y = dt.strptime(row['disburse_time'], "%Y-%m-%d %H:%M:%S.%f %z").year
    triplet = stat.get((cc,cn,y), (0,0,0))
    stat[(cc,cn,y)] =  (triplet[0] + 1, triplet[1] + row['loan_amount'], None)
    total += row['loan_amount']
for k in stat.keys():
    triplet = stat[k]
    stat[k]= (triplet[0], triplet[1], triplet[1]/total*100)
print("--- %s seconds ---" % (tm.time()-start_time)) 
stat

--- 8.115744590759277 seconds ---


{('PH', 'Philippines', 2013): (693, 265225.0, 1.5756837641746932),
 ('HN', 'Honduras', 2013): (65, 57250.0, 0.3401183730756949),
 ('PK', 'Pakistan', 2014): (138, 66950.0, 0.3977454161994371),
 ('KG', 'Kyrgyzstan', 2014): (37, 45175.0, 0.26838161578505704),
 ('PH', 'Philippines', 2014): (907, 306900.0, 1.8232721169769568),
 ('SV', 'El Salvador', 2014): (278, 183275.0, 1.0888243637632835),
 ('BI', 'Burundi', 2014): (9, 23575.0, 0.14005747851981673),
 ('ML', 'Mali', 2014): (57, 59500.0, 0.35348547070749076),
 ('MN', 'Mongolia', 2013): (57, 121450.0, 0.721526225502937),
 ('PE', 'Peru', 2013): (220, 228575.0, 1.3579485960834403),
 ('HN', 'Honduras', 2014): (62, 50875.0, 0.30224492978560663),
 ('PH', 'Philippines', 2015): (467, 162950.0, 0.9680749151560608),
 ('PK', 'Pakistan', 2015): (65, 32700.0, 0.19426848558209997),
 ('GE', 'Georgia', 2015): (8, 13975.0, 0.08302452862415434),
 ('SV', 'El Salvador', 2015): (93, 63125.0, 0.3750213502253841),
 ('AM', 'Armenia', 2015): (26, 36925.0, 0.219368

- Another way to solve the problem, based on pandas functions:

In [14]:
from datetime import datetime as dt
import time as tm
start_time = tm.time()
loans_total = loans['loan_amount'].sum()
print("--- %s seconds ---" % (tm.time()-start_time)) 
loans       
#loans['disburse_time_year'] = loans.groupby(['loan_id']).agg({'disburse_time':[lambda x: dt.strptime(x.avg(), "%Y-%m-%d %H:%M:%S.%f %z").year]}) 
#loans_agg = loans.groupby(['country_code','country_name','disburse_time_year'], as_index = False).agg({'loan_amount':['count', 'sum', lambda x: x.sum()/total*100]})
#loans_agg.columns = loans_agg.columns.droplevel(0)
#print("--- %s seconds ---" % (tm.time()-start_time)) 
#loans_agg
                                                                                                       
                                                                                                       

--- 0.0 seconds ---


Unnamed: 0,loan_id,loan_name,original_language,description,description_translated,funded_amount,loan_amount,status,activity_name,sector_name,...,num_lenders_total,num_journal_entries,num_bulk_entries,tags,borrower_genders,borrower_pictured,repayment_interval,distribution_model,duration,disburse_time_year
0,657307,Aivy,English,"Aivy, 21 years of age, is single and lives in ...",,125.0,125.0,funded,General Store,Retail,...,3,2,1,,female,true,irregular,field_partner,53,2013
1,657259,Idalia Marizza,Spanish,"Doña Idalia, esta casada, tiene 57 años de eda...","Idalia, 57, is married and lives with her husb...",400.0,400.0,funded,Used Clothing,Clothing,...,11,2,1,,female,true,monthly,field_partner,96,2013
2,658010,Aasia,English,Aasia is a 45-year-old married lady and she ha...,,400.0,400.0,funded,General Store,Retail,...,16,2,1,"#Woman Owned Biz, #Supporting Family, user_fav...",female,true,monthly,field_partner,37,2014
3,659347,Gulmira,Russian,"Гулмире 36 лет, замужем, вместе с супругом вос...",Gulmira is 36 years old and married. She and ...,625.0,625.0,funded,Farming,Agriculture,...,21,2,1,user_favorite,female,true,monthly,field_partner,34,2014
4,656933,Ricky\t,English,Ricky is a farmer who currently cultivates his...,,425.0,425.0,funded,Farming,Agriculture,...,15,2,1,"#Animals, #Eco-friendly, #Sustainable Ag",male,true,bullet,field_partner,57,2013
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,341198,Guillermo Antonio,Spanish,"El Señor Guillermo tiene 42 años, la ubicación...",Guillermo is 42 years old. His business is lo...,375.0,375.0,funded,Shoe Sales,Retail,...,10,1,1,,male,true,monthly,field_partner,,2011
19996,341771,ERLINDA,English,Erlinda is from the village of Sillawit. She i...,,175.0,175.0,funded,General Store,Retail,...,3,2,1,,female,true,irregular,field_partner,,2011
19997,344237,Ma Guadalupe,Spanish,La señora Ma Guadalupe es madre de 2 hijos de ...,\r\nMaría Guadalupe is the mother of two child...,625.0,625.0,funded,Property,Housing,...,23,1,1,,female,true,monthly,field_partner,,2011
19998,344735,Blanca Lidia,Spanish,"Blanca, vive con su esposo, tiene tres hijos d...",Blanca lives with her husband and has three ch...,350.0,350.0,funded,Food Production/Sales,Food,...,10,4,2,,female,true,monthly,field_partner,,2011
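
One way the pandas-based approach could be completed is sketched below. It is not the original author's finished cell: the toy frame `toy_loans` and the column name `disburse_year` are hypothetical, and the percentage is taken relative to the overall amount across all years, as in the dictionary-based loop.

```python
import pandas as pd

# Toy stand-in for loans (hypothetical data, real timestamp format).
toy_loans = pd.DataFrame({
    "country_code": ["PH", "PH", "PH", "HN"],
    "country_name": ["Philippines", "Philippines", "Philippines", "Honduras"],
    "loan_amount": [125.0, 400.0, 300.0, 175.0],
    "disburse_time": ["2013-12-22 08:00:00.000 +0000",
                      "2013-12-20 08:00:00.000 +0000",
                      "2014-01-09 08:00:00.000 +0000",
                      "2013-12-17 08:00:00.000 +0000"],
})

fmt = "%Y-%m-%d %H:%M:%S.%f %z"
# Extract the disburse year for every loan at once.
year = pd.to_datetime(toy_loans["disburse_time"], format=fmt, utc=True).dt.year

by_year = (
    toy_loans.assign(disburse_year=year)
    .groupby(["country_code", "country_name", "disburse_year"])["loan_amount"]
    # "count" counts non-missing loan_amount values per group.
    .agg(loans="count", amount="sum")
)
# Share of the overall amount (all countries, all years), in percent.
by_year["percent"] = by_year["amount"] / toy_loans["loan_amount"].sum() * 100
print(by_year)
```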


**8.** For each lender, compute the overall amount of money lent. For each loan that has more than one lender, you must assume that all lenders contributed the same amount.


- This solution is based on dictionary objects:

In [15]:
import time as tm
start_time = tm.time()
loan_ids_by_lender = {}
number_of_lenders_by_loan_id = {}
for index, row in loans_lender.iterrows(): 
    loanId = row['loan_id']
    lender = row['lender']
    s = loan_ids_by_lender.get(lender,set())
    s.add(loanId)
    loan_ids_by_lender[lender] = s 
    number_of_lenders_by_loan_id[loanId] = number_of_lenders_by_loan_id.get(loanId, 0) + 1
amount_by_loan_id = {}
for index, row in loans.iterrows(): 
    loanId = row['loan_id']
    amount = row['loan_amount']
    amount_by_loan_id[loanId] = amount
#(loan_ids_by_lender, number_of_lenders_by_loan_id, amount_by_loan_id)  
money_lent = {}
for lender in loan_ids_by_lender:
    s = loan_ids_by_lender[lender]
    amm = 0
    for loanId in s:
        amm += amount_by_loan_id.get(loanId,0) / number_of_lenders_by_loan_id.get(loanId,1)
    money_lent[lender] = amm
print("--- %s seconds ---" % (tm.time()-start_time)) 
money_lent


--- 61.58185791969299 seconds ---


{'muc888': 30.434782608695652,
 'sam4326': 0.0,
 'camaran3922': 0.0,
 'lachheb1865': 0.0,
 'rebecca3499': 495.03070528303584,
 'karlheinz4543': 39.166666666666664,
 'jerrydb': 0.0,
 'paula8951': 0.0,
 'gmct': 1679.3040736176308,
 'amra9383': 0.0,
 'r3922': 379.7326854942665,
 'brian9451': 0.0,
 'shree8053': 428.4476431869338,
 'alan5513': 172.56036110096213,
 'oisin3389': 95.12117346938776,
 'helle8622': 0.0,
 'bo3186': 53.55750487329435,
 'ric8947': 0.0,
 'daniel98469874': 840.084585223973,
 'nick9464': 0.0,
 'deborah12671549': 0.0,
 'matthew9831': 0.0,
 'john6330': 124.68932563513674,
 'john9479': 26.785714285714285,
 'mattiaslaven': 312.77470740469715,
 'jonathan2867': 0.0,
 'jason3883': 0.0,
 'highgrovechurch': 1581.8699298403642,
 'maria3124': 0.0,
 'dino5102': 67.85714285714286,
 'jonathan7946': 28.571428571428573,
 'ann8187': 0.0,
 'bryan2669': 33.8768115942029,
 'john88459657': 0.0,
 'eddyphil': 29.081632653061224,
 'don9212': 738.9803032088435,
 'carolineandcolin9686': 0.0,
 '
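
The equal-split rule can also be sketched with a merge and two grouped operations. The toy frames are hypothetical. Unlike the loop above, this version removes duplicated (`loan_id`, `lender`) rows before counting the lenders of a loan, which is an assumption about how duplicates should be treated.

```python
import pandas as pd

# Toy stand-ins for loans_lender and loans (hypothetical data).
toy_pairs = pd.DataFrame({"loan_id": [1, 1, 2], "lender": ["a", "b", "a"]})
toy_loans = pd.DataFrame({"loan_id": [1, 2], "loan_amount": [100.0, 50.0]})

# Assumption: a lender listed twice on one loan is counted once.
pairs = toy_pairs.drop_duplicates()
# Attach the loan amount to every (loan_id, lender) row.
pairs = pairs.merge(toy_loans[["loan_id", "loan_amount"]], on="loan_id", how="left")
# Number of lenders on each loan, repeated on every row of that loan.
n_lenders = pairs.groupby("loan_id")["lender"].transform("count")
# Loans that are absent from toy_loans contribute 0.
pairs["share"] = pairs["loan_amount"].fillna(0) / n_lenders
money_lent = pairs.groupby("lender")["share"].sum()
print(money_lent)
```

The left merge combined with `fillna(0)` mirrors the loop's `amount_by_loan_id.get(loanId, 0)`: a lender whose loans are not present in `loans` ends up with 0.0.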

**9.** For each country, compute the difference between the overall amount of money lent and the overall amount of money borrowed. Since the country of the lender is often unknown, you can assume that the true distribution among the countries is the same as the one computed from the rows where the country is known.


**10.** Which country has the highest ratio between the difference computed at the previous point and the population?


**11.** Which country has the highest ratio between the difference computed at point 9 and the population that is not below the poverty line?


**12.** For each year, compute the total amount of loans. Each loan that has planned expiration time and disburse time in different years must have its amount distributed proportionally to the number of days in each year. For example, a loan with disburse time December 1st, 2016, planned expiration time January 30th 2018, and amount 5000USD has an amount of 5000USD * 31 / (31+365+30) = 363.85 for 2016, 5000USD * 365 / (31+365+30) = 4284.04 for 2017, and 5000USD * 30 / (31+365+30) = 352.11 for 2018.
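
The arithmetic of the example can be checked with a small helper. This is a sketch only: `split_amount_by_year` is a hypothetical name, and it assumes that both the disburse day and the expiration day are counted, which is what reproduces the 31 + 365 + 30 days of the example.

```python
from datetime import date

def split_amount_by_year(start, end, amount):
    """Distribute amount over calendar years in proportion to the days in each year.

    Both start and end days are counted (assumption taken from the example).
    """
    total_days = (end - start).days + 1
    shares = {}
    for year in range(start.year, end.year + 1):
        # Clip the loan period to the current calendar year.
        first = max(start, date(year, 1, 1))
        last = min(end, date(year, 12, 31))
        days = (last - first).days + 1
        shares[year] = amount * days / total_days
    return shares

shares = split_amount_by_year(date(2016, 12, 1), date(2018, 1, 30), 5000)
for year, value in shares.items():
    print(year, round(value, 2))
# 2016 363.85
# 2017 4284.04
# 2018 352.11
```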