# Project Data Sample Review 

In [12]:
import pandas as pd
import numpy as np
import os
from collections import Counter
%matplotlib inline
pd.set_option('display.max_rows', 300)

In [13]:
projects = pd.read_csv('../Data/EWS_Project_Listing_01JUN17-31MAY18.csv')
projects = projects[projects['EWS ID'].notnull()]

In [14]:
projects.shape

(2954, 77)

**Example Row**

In [15]:
projects.iloc[0]

EWS ID                                                               32549
ProjectNumber                                                    FMO-53557
Published                                                        Published
Bank Risk Rating                                                         B
Project Status                                                    Approved
EWS URL                  https://ews.rightsindevelopment.org/projects/5...
Detailed Analysis URL                                                  NaN
Project Name             SUNFARMING EURASIA ASSET ENERJI YATIRIMLARI VE...
City                                                                   NaN
Country Count                                                            1
Country 1                                                           Turkey
Country 2                                                              NaN
Country 3                                                              NaN
Country 4                

**Null Check** (fraction of non-null values per column)

In [16]:
projects.count()/len(projects.index)

EWS ID                   1.000000
ProjectNumber            1.000000
Published                1.000000
Bank Risk Rating         1.000000
Project Status           0.957684
EWS URL                  1.000000
Detailed Analysis URL    0.000000
Project Name             1.000000
City                     0.193297
Country Count            1.000000
Country 1                0.899797
Country 2                0.040961
Country 3                0.020650
Country 4                0.013541
Country 5                0.010156
Country 6                0.005755
Country 7                0.003047
Country 8                0.001693
Country 9                0.001693
Country 10               0.001693
Country 11               0.001354
Country 12               0.001016
Borrower or Client       0.851388
Private Actor Count      1.000000
Private Actor 1          0.152674
Private Actor 2          0.030467
Private Actor 3          0.008802
Private Actor 4          0.005416
Private Actor 5          0.003724
Private Actor 

## Project Description Column Will Likely Be Most Useful for Matching

**Notes**

* Some descriptions are very short, so matching against them may be unreliable.
* Other fields (Country, Borrower or Client, etc.) will likely help with matching as well.
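
As a rough way to gauge how many descriptions are too short to match on, the sketch below counts words per description and flags those under a cutoff. The helper name `flag_short_descriptions`, the `min_words=30` threshold, and the demo rows are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

def flag_short_descriptions(df, column='Project Description', min_words=30):
    """Add a word count and a short-description flag (hypothetical helper)."""
    # Missing descriptions are treated as empty strings, i.e. zero words
    word_counts = df[column].fillna('').str.split().str.len()
    return df.assign(word_count=word_counts, is_short=word_counts < min_words)

# Made-up rows standing in for the real `projects` frame
demo = pd.DataFrame({'Project Description': ['Loan to a bank.', None, 'word ' * 40]})
flagged = flag_short_descriptions(demo)
print(flagged[['word_count', 'is_short']])
```

Running `flag_short_descriptions(projects)['is_short'].mean()` on the real data would give the share of projects below the cutoff; the right cutoff would need to be tuned by reading a sample.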

In [7]:
for i in projects.sample(15)['Project Description']:
    print(i)
    print('*****\n')

According to FMO website, FMO provides ACBA with a USD 15 million multi-currency facility that ACBA can choose to drawdown in either US Dollar or Armenian Dram. The facility will improve access to finance for Armenian MSMEs and farmers that remain underserved. It will thereby contribute to development of the real sector in the country, which continues to be one of the poorest in the region. With its clear focus on MSMEs and farmers and its wide regional outreach, ACBA is well positioned to finance these underserved segments of the Armenian economy. In rural areas where many other banks are not present, ACBA is often the first financial institution working with the local entrepreneurs and individuals. Strict on-lending criteria will ensure that FMO's funds will be channeled exclusively to MSMEs and farmers.
*****

Summarized from the IIC: The IIC, as member of the IDB Group, will support Locfund II's expansion through a Loan of up to US$10 million (the "IIC A Loan"). Thus, the number of

## Looking at the Sector Data 

This dataset could help tag news articles with sector information.

In [8]:
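def get_category_cols(category, additional_removes=None):
    # Collect every column whose name contains the category (e.g. 'Country 1', 'Country 2', ...)
    cols = [i for i in projects.columns if category in i]
    # Drop the '<category> Count' summary column
    cols.remove(category + ' Count')
    # Drop any other columns that match by name but do not hold category values
    for col in additional_removes or []:
        cols.remove(col)
    return cols

sector_cols = get_category_cols('Sector')
# .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0
all_sectors = projects[sector_cols].values.flatten()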


In [9]:
Counter(all_sectors)

Counter({'Agriculture and Forestry': 337,
         'Climate and Environment': 172,
         'Communications': 77,
         'Construction': 342,
         'Education and Health': 245,
         'Energy': 488,
         'Finance': 673,
         'Humanitarian Response': 29,
         'Hydropower': 60,
         'Industry and Trade': 323,
         'Infrastructure': 255,
         'Law and Government': 277,
         'Mining': 25,
         'Technical Cooperation': 527,
         'Transport': 322,
         'Water and Sanitation': 264,
         nan: 16262})
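
The `nan` entry dominates the counts above because every empty sector slot is counted too. The sketch below, using made-up sample values, drops nulls before counting. It also shows that two distinct NaN objects do not compare equal, so a `Counter` can end up with several separate NaN keys. The helper `count_non_null` is hypothetical.

```python
from collections import Counter
import math

def count_non_null(values):
    """Count values, skipping None and float NaN (hypothetical helper)."""
    def is_null(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return Counter(v for v in values if not is_null(v))

# Two separate NaN objects: NaN != NaN, so Counter keeps them as distinct keys
nan_a, nan_b = float('nan'), float('nan')
sample = ['Energy', nan_a, 'Finance', nan_b, 'Energy']

raw = Counter(sample)           # includes one key per NaN object
clean = count_non_null(sample)  # only real sector labels
print(len(raw), clean)
```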

# NOTE 

The same approach might also work for classifying articles by bank and country.

**Countries**

In [10]:
country_cols = get_category_cols('Country')
all_countries = projects[country_cols].values.flatten()
country_counter = Counter(all_countries)
print(len(country_counter))
country_counter

170


Counter({'Afghanistan': 25,
         'Albania': 10,
         'Angola': 6,
         'Argentina': 57,
         'Armenia': 26,
         'Austria': 11,
         'Azerbaijan': 15,
         'Bahamas': 8,
         'Bangladesh': 64,
         'Barbados': 3,
         'Belarus': 7,
         'Belgium': 10,
         'Belize': 9,
         'Benin': 6,
         'Bhutan': 17,
         'Bolivia': 30,
         'Bosnia and Herzegovina': 15,
         'Brazil': 79,
         'Bulgaria': 9,
         'Burkina Faso': 7,
         'Burundi': 1,
         'Cambodia': 42,
         'Cameroon': 11,
         'Cape Verde': 3,
         'Central African Republic': 7,
         'Chad': 10,
         'Chile': 22,
         'China': 128,
         'Colombia': 53,
         'Congo, Democratic Republic of': 7,
         'Congo, Republic of': 3,
         'Cook Islands': 1,
         'Costa Rica': 23,
         'Croatia': 13,
         'Cyprus': 1,
         'Czech Republic': 11,
         'Denmark': 9,
         'Djibouti': 3,
         'Do

**Banks**

In [11]:
banks_cols = get_category_cols('Bank', ['Bank Risk Rating'])
all_banks = projects[banks_cols].values.flatten()
all_banks = [i for i in all_banks if pd.notnull(i)]  # Drop nulls first; something is off with how nulls are counted for these columns
banks_counter = Counter(all_banks)
print(len(banks_counter))
banks_counter

13


Counter({'African Development Bank (AFDB)': 45,
         'Asian Development Bank (ADB)': 644,
         'Asian Infrastructure Investment Bank (AIIB)': 51,
         'European Bank for Reconstruction and Development (EBRD)': 179,
         'European Investment Bank (EIB)': 462,
         'Green Climate Fund (GCF)': 60,
         'Inter-American Development Bank (IADB)': 574,
         'Inter-American Investment Corporation (IIC)': 67,
         'International Finance Corporation (IFC)': 332,
         'Multilateral Investment Guarantee Agency (MIGA)': 39,
         'Netherlands Development Finance Company (FMO)': 206,
         'New Development Bank (NDB)': 12,
         'World Bank (WB)': 337})
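
One possible way to use the bank and country labels for tagging is plain string matching against article text. The sketch below is a minimal illustration under several assumptions: labels follow the `Name (ACRONYM)` pattern seen in the bank list, full names are matched case-insensitively, acronyms are matched case-sensitively, and the function name `find_mentions` and the sample sentence are made up.

```python
import re

def find_mentions(text, labels):
    """Return the labels whose name or acronym appears in text (hypothetical helper)."""
    found = set()
    for label in labels:
        # Split 'World Bank (WB)' into name and acronym; plain labels have no acronym
        match = re.fullmatch(r'(.+?)\s*\(([^)]+)\)', label)
        name, acronym = match.groups() if match else (label, None)
        if re.search(r'\b' + re.escape(name) + r'\b', text, flags=re.IGNORECASE):
            found.add(label)
        elif acronym and re.search(r'\b' + re.escape(acronym) + r'\b', text):
            found.add(label)
    return sorted(found)

labels = ['World Bank (WB)', 'Asian Development Bank (ADB)', 'Turkey']
text = 'The ADB and the world bank approved a loan for a solar plant in Turkey.'
mentions = find_mentions(text, labels)
print(mentions)
```

In the notebook, `labels` could be built from the keys of `banks_counter` and `country_counter` after dropping nulls. Short acronyms can collide with unrelated words, so matches would need spot-checking.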