In [81]:
import pandas as pd
import numpy as np
import os
from collections import Counter
%matplotlib inline
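# Use the fully qualified option names; the short forms are ambiguous in newer pandas.
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 100)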

# Feedly Data 

In [12]:
feedly = pd.read_pickle('../../Data/Feedly_Processed_DF_cleaned.pkl')

In [13]:
for i in feedly.columns:
    print(i)

article_id
title
url
feed_label
content
published
summary
article_text
article_keywords
article_text_len
top_lang


## Feedly Data 

```
article_id          : unique identifier for article (LINKING FIELD)
title               : Title of article as provided by the feedly API
url                 : url of article 
feed_label          : Label of the Feedly Feed (created by IAP)
content             : article content - as provided by feedly API (sparsely populated)
published           : date article was published 
summary             : summary info on article provided by feedly API 
article_text        : text of article, scraped with the Newspaper3K library
article_keywords    : keywords of article, computed by the Newspaper3K library
article_text_len    : length of article text
top_lang            : detected language (all English)
```

## Projects Data 

In [55]:
projects=pd.read_csv('../../Data/EWS_Published Project_Listing_DD.csv', encoding='ISO-8859-1')

In [56]:
for i in projects.columns:
    print(i)

EWS ID
ProjectNumber
Published
Bank Risk Rating
Project Status
EWS URL
Detailed Analysis URL
Project Name
City
Country Count
Country 1
Country 2
Country 3
Country 4
Country 5
Country 6
Country 7
Country 8
Country 9
Country 10
Country 11
Country 12
Borrower or Client
Private Actor Count
Private Actor 1
Private Actor 2
Private Actor 3
Private Actor 4
Private Actor 5
Private Actor 6
Private Actor 7
Private Actor 8
Private Actor 9
Private Actor 10
Private Actor 11
Private Actor 12
Private Actor 13
Private Actor 14
Private Actor 15
Bank Count
Bank 1
Bank 2
Bank 3
Bank 4
Bank 5
Sector Count
Sector 1
Sector 2
Sector 3
Sector 4
Sector 5
Sector 6
Sector 7
Last Edited
Date Scraped
Date Disclosed
Board Date
Source URL
Project Cost
Investment Amount
Project Description
Contact Information


```
EWS ID                 : ID used internally by IAP's system (ignore)
ProjectNumber          : ID used to match projects in this data to labeled projects (LINKING FIELD)
Published              : Date project was published to IAP system
Bank Risk Rating       : Bank Risk Rating for project
Project Status         : Project Status
EWS URL                : URL on EWS system
Detailed Analysis URL  : Detailed Analysis URL
Project Name           : Project Name 
City                   : City of project (often null)
Country Count          : Count of Countries involved in project
Country 1              : First Country 
Country 2              : Second Country (if more than one)
Country 3              : Third Country (if more than one)
Country 4              : etc. (above)
Country 5              : etc. (above)
Country 6              : etc. (above)
Country 7              : etc. (above)
Country 8              : etc. (above)
Country 9              : etc. (above)
Country 10             : etc. (above)
Country 11             : etc. (above)
Country 12             : etc. (above)
Borrower or Client     : Entity that 
Private Actor Count    : Count of Private Actors involved with project (often null)
Private Actor 1        : First Private Actor
Private Actor 2        : etc.
Private Actor 3        : etc.
Private Actor 4        : etc.
...
Bank Count             : Count of Banks Involved
Bank 1                 : First Bank
Bank 2                 : Second Bank 
Bank 3                 : etc.
Bank 4                 : etc.
Bank 5                 : etc.
Sector Count           : Count of Relevant Sectors
Sector 1               : First Sector
Sector 2               : Second Sector
Sector 3               : etc.
Sector 4               : etc.
Sector 5               : etc.
Sector 6               : etc.
Sector 7               : etc.
Last Edited            : Date project information was last edited 
Date Scraped           : Date project information was scraped
Date Disclosed         : Date the project was disclosed by bank
Board Date             : Date the board will vote/review.
Source URL             : Scrape url
Project Cost           : Project Cost
Investment Amount      : Amount expected to be invested by bank
Project Description    : Text description of project (USEFUL !!)
Contact Information    : More text description of contacts for the bank
```
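
The numbered columns (`Country 1` ... `Country 12`, and likewise the `Bank N` and `Sector N` groups) are easier to filter and join on once they are in long form, with one row per project and value. The following is a minimal sketch on a toy frame; `projects_demo` and its values are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Toy stand-in for the projects table (hypothetical values).
projects_demo = pd.DataFrame({
    "ProjectNumber": ["P-001", "P-002"],
    "Country Count": [1, 3],
    "Country 1": ["India", "Albania"],
    "Country 2": [None, "Greece"],
    "Country 3": [None, "Italy"],
})

# Keep only the numbered slots ("Country 1", "Country 2", ...);
# "Country Count" is skipped because its last token is not a digit.
country_cols = [
    c for c in projects_demo.columns
    if c.startswith("Country ") and c.split()[-1].isdigit()
]

# One row per (project, country); empty slots are dropped.
countries_long = (
    projects_demo
    .melt(id_vars="ProjectNumber", value_vars=country_cols,
          var_name="slot", value_name="country")
    .dropna(subset=["country"])
    .sort_values(["ProjectNumber", "slot"])
    .reset_index(drop=True)
)
print(countries_long)
```

The same pattern should work for the other numbered groups by changing the `"Country "` prefix.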


## Labeled Data 

In [32]:
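import os
from os import path

# Load every labeled CSV into a dict keyed by file name (e.g. 'banks.csv').
labeled_data = {}
pth = '../../Data/Labeled_Data'
for f in sorted(os.listdir(pth)):
    if f.endswith('.csv'):  # skip hidden or non-CSV files in the folder
        labeled_data[f] = pd.read_csv(path.join(pth, f))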

### Banks 

In [33]:
for i in labeled_data['banks.csv'].columns:
    print(i)

article_id
published
title
url
feed_label
Bank1
Bank2


```
article_id     : unique identifier for article (LINKING FIELD)
published      : date article was published
title          : Title of article as provided by the feedly API
url            : url of article 
feed_label     : Label of the Feedly Feed (created by IAP)
Bank1          : Bank associated with article
Bank2          : Second bank associated with article
```

### Sectors 

In [34]:
for i in labeled_data['sectors.csv'].columns:
    print(i)

article_id
published
title
url
feed_label
Sectors
cl_Sector
top_sector


```
article_id     : unique identifier for article (LINKING FIELD)
published      : date article was published
title          : Title of article as provided by the feedly API
url            : url of article 
feed_label     : Label of the Feedly Feed (created by IAP)
Sectors        : Sector(s) associated with article
cl_Sector      : Sector(s) but lowercased and whitespace stripped
top_sector     : First sector if multiple, or just the one sector listed. 
```
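
As a rough illustration of how `cl_Sector` and `top_sector` relate to `Sectors`, the sketch below rebuilds both on a toy frame. The comma delimiter and the sample values are assumptions; the actual labeling file may separate multiple sectors differently.

```python
import pandas as pd

# Toy stand-in for sectors.csv (hypothetical values).
sectors_demo = pd.DataFrame({"Sectors": [" Energy, Transport ", "Agriculture"]})

# cl_Sector: lowercased, with surrounding whitespace stripped.
sectors_demo["cl_Sector"] = sectors_demo["Sectors"].str.lower().str.strip()

# top_sector: first entry when several are listed.
# ASSUMPTION: multiple sectors are comma-separated.
sectors_demo["top_sector"] = (
    sectors_demo["cl_Sector"].str.split(",").str[0].str.strip()
)
print(sectors_demo)
```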

### Countries

In [35]:
for i in labeled_data['countries.csv'].columns:
    print(i)

article_id
published
title
url
feed_label
Country1


```
article_id     : unique identifier for article (LINKING FIELD)
published      : date article was published
title          : Title of article as provided by the feedly API
url            : url of article 
feed_label     : Label of the Feedly Feed (created by IAP)
Country1       : Country (possibly more than one) associated with article
```
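
Because `Country1` can hold several countries in one string (for example `albania, greece, italy`), it may help to split it into one row per country before joining against the projects data. A minimal sketch, assuming a comma is always the separator; `countries_demo` is a small stand-in frame, not the real file.

```python
import pandas as pd

# Small stand-in for countries.csv.
countries_demo = pd.DataFrame({
    "article_id": ["ef56fb55", "ee5530a3"],
    "Country1": ["albania, greece, italy", "india"],
})

# Split on commas (ASSUMPTION: comma-separated), then expand to one row
# per (article, country).
countries_long = (
    countries_demo
    .assign(country=countries_demo["Country1"].str.split(","))
    .explode("country")
    .reset_index(drop=True)
)

# Remove the spaces left over after splitting.
countries_long["country"] = countries_long["country"].str.strip()
print(countries_long[["article_id", "country"]])
```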

### Labeled Projects

In [36]:
for i in labeled_data['projects.csv'].columns:
    print(i)

article_id
published
title
url
feed_label
ProjectNumber
EWS Project Name
EWS hyperlink
Matched


```
article_id       : unique identifier for article (LINKING FIELD)
published        : date article was published
title            : Title of article as provided by the feedly API
url              : url of article 
feed_label       : Label of the Feedly Feed (created by IAP)
ProjectNumber    : Project ID number in EWS System (LINKING FIELD)
EWS Project Name : Project Name in EWS System
EWS hyperlink    : Link to Project in EWS System
Matched          : !IMPORTANT! Indicates whether a project is matched to the article: 1 = match exists,
                    0 = article has no matching project. Not all articles will match a project.
```
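
Since `Matched` separates articles with a known project from those without one, a typical first step is to split the labels on it. The sketch below uses a toy frame; the values, and the idea that `ProjectNumber` is empty for unmatched rows, are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for projects.csv (hypothetical values).
labeled_projects_demo = pd.DataFrame({
    "article_id": ["a1", "a2", "a3"],
    "ProjectNumber": ["P-001", None, "P-002"],  # ASSUMPTION: empty when unmatched
    "Matched": [1, 0, 1],
})

# Articles that have a matching project, and those that do not.
matched = labeled_projects_demo[labeled_projects_demo["Matched"] == 1]
unmatched = labeled_projects_demo[labeled_projects_demo["Matched"] == 0]

# Class balance of the label.
print(labeled_projects_demo["Matched"].value_counts())
```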

# End

## Scratch Link Up - Sanity Check 

**Project Data to Labeled Projects**

In [59]:
projects[projects.columns[0:2]].merge(labeled_data['projects.csv'],on='ProjectNumber',how='inner').shape

(70, 10)

**Feedly Data to Labeled Projects**

In [65]:
feedly[feedly.columns[0:2]].merge(labeled_data['projects.csv'], on='article_id',how='inner').shape

(112, 10)
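
The `.shape` of an inner merge shows how many rows link up, but not which labeled rows fail to find a partner. One way to see that is a left merge with `indicator=True`. This is a sketch on toy frames; `feedly_demo` and `labels_demo` are hypothetical stand-ins, not the real tables.

```python
import pandas as pd

# Toy stand-ins for the Feedly table and the labeled projects (hypothetical values).
feedly_demo = pd.DataFrame({
    "article_id": ["a1", "a2", "a3"],
    "title": ["first", "second", "third"],
})
labels_demo = pd.DataFrame({
    "article_id": ["a2", "a3", "a4"],
    "Matched": [1, 0, 1],
})

# Keep every labeled row; the "_merge" column records whether a Feedly row was found.
# validate="many_to_one" raises if article_id is duplicated on the Feedly side.
check = labels_demo.merge(
    feedly_demo, on="article_id", how="left",
    indicator=True, validate="many_to_one",
)
print(check["_merge"].value_counts())

# Labeled articles with no corresponding Feedly row.
orphans = check.loc[check["_merge"] == "left_only", "article_id"].tolist()
print(orphans)
```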

In [71]:
for i in feedly.merge(labeled_data['projects.csv'], on='article_id',how='inner').article_text:
    print('********\n',i)

********
 The African Development Bank (AfDB), Green Climate Fund (GCF), and the Africa50 investment fund are collaborating to bring solar energy to the Sahel, in support of the priority set by countries in the region.

The three organisations said they would share ideas and resources about opportunities to make solar power available throughout the region, transforming African deserts into new sources of renewable energy.

The Desert-to-Power scheme, initiated by the AfDB, aims to develop 10,000megawatts (Mw) of solar energy across the Sahel region. It is intended to provide solar generated electricity to 250 million people, including 90 million through off grid solutions, thereby enabling the development of agriculture and other economic activities.

GCF Executive Director, Howard Bamsey, said: “Sahel countries have identified the potential of solar power to bring green energy to people across the region. Renewable energy investment is a priority in their Nationally Determined Contrib

In [88]:
projects[(projects['Sector 1'].notnull()) & (projects['Project Description'].notnull())][['Sector 1','Project Description']].shape

(5583, 2)

In [93]:
feedly[feedly.columns[0:2]].merge(labeled_data['countries.csv'], on='article_id', how='inner')

,article_id,title_x,published,title_y,url,feed_label,Country1
0,ee5530a3,AIIB approves $1.5 bln of loans to India for i...,2018-02-27,AIIB approves $1.5 bln of loans to India for i...,https://www.reuters.com/article/aiib-india-inv...,NEWS AIIB - All Streams,india
1,185416ad,"India, ADB sign USD 250 million loan agreement...",2018-02-01,"India, ADB sign USD 250 million loan agreement...",https://steelguru.com/auto/india-adb-sign-usd-...,NEWS ADB - All Streams,india
2,ef56fb55,EIB approves $1.8bn financing for Trans-Adriat...,2018-02-07,EIB approves $1.8bn financing for Trans-Adriat...,http://transportandstorage.energy-business-rev...,NEWS EIB - All streams,"albania, greece, italy"
3,85f28676,EBRD increases its portfolio in Azerbaijan,2018-04-09,EBRD increases its portfolio in Azerbaijan,https://en.trend.az/business/economy/2884342.html,NEWS EBRD - All streams,azerbaijan
4,8eba7336,AIIB approves loan to Bangladesh Independent P...,2018-03-06,AIIB approves loan to Bangladesh Independent P...,https://www.theasset.com/belt-road-online/3420...,NEWS AIIB - All Streams,bangladesh
5,dfef62fb,AIIB approves $1.5 billion in loans to India f...,2018-02-27,AIIB approves $1.5 billion in loans to India f...,https://in.reuters.com/article/aiib-india-inve...,NEWS AIIB - All Streams,india
6,64dcdae9,ADB commits US$175.3 million in geothermal ene...,2018-03-26,ADB commits US$175.3 million in geothermal ene...,https://www.opengovasia.com/articles/adb-commi...,NEWS ADB - All Streams,indonesia
7,a356bd17,"Minsk, EBRD launch Green City project",2018-05-04,"Minsk, EBRD launch Green City project",http://eng.belta.by/society/view/minsk-ebrd-la...,NEWS EBRD - All streams,ukraine
8,c645ddb5,AfDB approves $1.5 million for Jigawa power pr...,2018-05-05,AfDB approves $1.5 million for Jigawa power pr...,https://www.today.ng/news/nigeria/111381/afdb-...,NEWS AFDB- All Streams,nigeria
9,c0f4e98d,Panel to fix payment for leaving projects,2018-04-30,Panel to fix payment for leaving projects,http://kathmandupost.ekantipur.com/news/2018-0...,NEWS ADB - All Streams,nepal
