<img src="http://eikon.tpq.io/refinitiv_logo.png" width="28%" align="left" style="vertical-align: top; padding-top: 23px;">
<img src="http://hilpisch.com/tpq_logo_long.png" width="36%" align="right" style="vertical-align: top;">

# Eikon Data API

**Sentiment Scoring for News**

Dr. Yves J. Hilpisch | The Python Quants GmbH

<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:training@tpq.io">training@tpq.io</a>

<img src="http://hilpisch.com/images/tr_eikon_02.png" width=350px align=left>

## The Agenda

This tutorial covers **natural language processing (NLP)** based on news from the Eikon Data API:

* Reading News Headlines
* Extracting and Storing Raw Texts
* Sentiment Scoring Examples
* Sentiment Scoring Over Time
* Combining Sentiment with Index Levels

## Imports and Versions

The following cell imports the **packages** used throughout this tutorial.

In [None]:
import numpy as np  # NumPy numerical computing
import pandas as pd  # pandas data analysis
import nltk, bs4  # NLP toolkit & BeautifulSoup
import eikon as ek  # the Eikon Python wrapper package
import cufflinks as cf  # interactive plotting
from bs4 import BeautifulSoup  # HTML parsing
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # sentiment analysis
import configparser as cp  # reading the configuration file

If necessary, download the data files required by `nltk`.

In [None]:
nltk.download('punkt')
nltk.download('vader_lexicon')

The following **Python and package versions** are used.

In [None]:
import sys
print(sys.version)

In [None]:
np.__version__

In [None]:
pd.__version__

In [None]:
ek.__version__

In [None]:
cf.__version__

In [None]:
nltk.__version__

In [None]:
bs4.__version__

## Connecting to Eikon Data API

This code reads the app key from a configuration file and sets it for the session. The **Eikon Data API Proxy** needs to be running locally. The code requires the previously created text file `eikon.cfg`, with the key stored as `app_id` in an `[eikon]` section, to be in the current working directory.

In [None]:
cfg = cp.ConfigParser()
cfg.read('eikon.cfg')  # adjust for different file location

In [None]:
ek.set_app_key(cfg['eikon']['app_id'])  # set_app_id() is deprecated
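To make the expected layout of `eikon.cfg` concrete, here is a minimal sketch that writes and reads back such a file with `configparser`. The section and key names (`eikon`, `app_id`) follow the code above; the value `YOUR_APP_KEY` and the temporary file location are placeholders for illustration only.

```python
import configparser as cp
import os
import tempfile

# Placeholder value (assumption); the real key comes from your Eikon account
cfg = cp.ConfigParser()
cfg['eikon'] = {'app_id': 'YOUR_APP_KEY'}

# Write the file to a temporary directory for demonstration purposes
path = os.path.join(tempfile.mkdtemp(), 'eikon.cfg')
with open(path, 'w') as f:
    cfg.write(f)  # writes an "[eikon]" section with an "app_id = ..." line

# Read it back the same way the tutorial does
check = cp.ConfigParser()
check.read(path)
app_key = check['eikon']['app_id']
print(app_key)
```

In practice, the file only needs to be created once and kept out of version control, since it holds a personal key.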

## Reading News Headlines

The function `ek.get_news_headlines()` allows you to search for and retrieve **news headlines**, including `storyId` values needed to retrieve the full news text.

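The `query` string can contain RICs (e.g. `R:.SPX`), quoted search terms (e.g. `"TRUMP"`), and filters such as `Language:LEN` to restrict the results to English-language news.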

In [None]:
news = ek.get_news_headlines('R:.SPX "TRUMP" Language:LEN',
                             date_from='2018-02-01',
                             date_to='2018-05-28',
                             count=100
                            )
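When several queries are needed, the query string can be assembled programmatically. The sketch below uses a hypothetical helper, `build_news_query()`, that is not part of the `eikon` package; it simply reproduces the space-separated pattern of the query above (RIC filter, quoted term, language filter). Consult the Eikon news search documentation for the full query syntax.

```python
def build_news_query(rics=(), terms=(), language='LEN'):
    """Hypothetical helper: join RIC filters, quoted search terms and an
    optional language filter with spaces, mirroring the query used above."""
    parts = [f'R:{ric}' for ric in rics]        # e.g. R:.SPX
    parts += [f'"{term}"' for term in terms]    # e.g. "TRUMP"
    if language:
        parts.append(f'Language:{language}')    # e.g. Language:LEN
    return ' '.join(parts)

query = build_news_query(rics=['.SPX'], terms=['TRUMP'])
print(query)
```

The resulting string could then be passed as the first argument of `ek.get_news_headlines()`.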

In [None]:
news.info()

In [None]:
news.head()

## Collecting Raw Texts

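The analyses that follow are based on **all news stories** identified above. To this end, the raw texts are collected in a `list` object, added as a new `story` column to the `DataFrame`, and cached in the file `eikon_news.pkl`. If that file already exists, the data is loaded from it instead.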

In [None]:
import pickle

In [None]:
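%%time
try:
    # load previously retrieved stories if they are cached on disk
    with open('eikon_news.pkl', 'rb') as f:
        news = pickle.load(f)
except FileNotFoundError:
    stories = []
    for storyId in news['storyId']:
        try:
            html = ek.get_news_story(storyId)  # raw HTML of the story
            story = BeautifulSoup(html, 'html5lib').get_text()  # strip tags
        except Exception:
            story = ''  # keep rows aligned if a story cannot be retrieved
        stories.append(story)
    news['story'] = stories
    with open('eikon_news.pkl', 'wb') as f:
        pickle.dump(news, f)  # cache for later runs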
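The try/except caching in the cell above follows a common load-or-compute pattern. A minimal sketch of that pattern as a reusable function; `load_or_compute()` and `fetch()` are hypothetical names, and `fetch()` merely stands in for the slow `ek.get_news_story()` loop.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object cached at `path`; if there is no such file yet,
    call `compute()`, pickle its result to `path` and return it."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except FileNotFoundError:
        result = compute()
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result

calls = []

def fetch():
    """Hypothetical stand-in for the expensive retrieval of news stories."""
    calls.append(1)
    return ['story one', 'story two']

path = os.path.join(tempfile.mkdtemp(), 'cache.pkl')
first = load_or_compute(path, fetch)   # computes and writes the cache
second = load_or_compute(path, fetch)  # served from the cache
print(first == second, len(calls))
```

Deleting the cache file forces a fresh retrieval, which is the same way to refresh the data in the tutorial.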

## Sentiment Scoring Examples

First, a `SentimentIntensityAnalyzer` object is instantiated.

In [None]:
sid = SentimentIntensityAnalyzer()

The following example illustrates a **negative sentiment** score.

In [None]:
scores = sid.polarity_scores(
    '''This is absolute rubbish. I really didn't like. It was so bad.''')
scores

This one yields a **positive sentiment** score.

In [None]:
scores = sid.polarity_scores(
    '''I really liked it. It was amazing. Real fun.''')
scores

In the same way, the sentiment of a news text can be scored.

In [None]:
text = news.iloc[0]['story']
text[200:350]

In [None]:
scores = sid.polarity_scores(text)
scores
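The `compound` value in these results is, according to VADER's published implementation, the (adjusted) sum of the valence scores of the words in the text, mapped into the interval from `-1` to `+1` via x / sqrt(x² + α) with α = 15. VADER's authors suggest treating compound scores ≥ 0.05 as positive, ≤ -0.05 as negative, and anything in between as neutral. The sketch below re-implements only these two final steps, not the lexicon-based scoring itself; the inputs are illustrative valence sums, not real data.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1] via x / sqrt(x^2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))  # clamp as a safeguard

def label(compound, threshold=0.05):
    """Three-way label using the +/-0.05 thresholds suggested by VADER's authors."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

for raw in (-4.0, 0.0, 0.3, 3.0):  # illustrative valence sums
    c = normalize(raw)
    print(f'{raw:5.1f} -> {c:+.4f} ({label(c)})')
```

The square-root normalization means long texts with many sentiment-laden words quickly saturate toward `-1` or `+1`, which is worth keeping in mind when scoring full news stories rather than short sentences.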

## Sentiment Over Time

The following code scores all retrieved news texts for sentiment and collects the results in a `DataFrame` object indexed by the stories' timestamps.

In [None]:
story_scores = [sid.polarity_scores(story) for story in news['story']]  # one dict per story

In [None]:
sentiment = pd.DataFrame(story_scores, index=news['versionCreated'])  # index by timestamp

In [None]:
sentiment.index = pd.DatetimeIndex(sentiment.index)

In [None]:
sentiment.sort_index(inplace=True)

In [None]:
sentiment.head()

Summary statistics of the **sentiment scores** for all news texts.

In [None]:
sentiment.describe()

The **frequency distribution** of the compound sentiment scores.

In [None]:
sentiment['compound'].iplot(kind='histogram', bins=15)

The compound sentiment scores **over time** (raw values).

In [None]:
sentiment['compound'].iplot(mode='markers')

The cumulative compound sentiment scores **over time** (raw values).

In [None]:
sentiment['compound'].cumsum().iplot(mode='markers')

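The cumulative compound sentiment scores **over time** with values transformed to **`+1` and `-1`**: strictly positive scores become `+1`, all others (including neutral scores of exactly `0`) become `-1`.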

In [None]:
sentiment['transform'] = sentiment['compound'].apply(lambda x: 1 if x > 0 else -1)

In [None]:
sentiment['transform'].value_counts()

In [None]:
sentiment['transform'].cumsum().iplot(mode='markers')
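After this transformation, the cumulative sum at any point equals the number of positive stories minus the number of non-positive stories seen so far, so a rising curve means positive news dominates. A minimal plain-Python sketch of the same rule, using illustrative scores rather than real data; `to_sign()` and `net_positive_count()` are hypothetical names.

```python
from itertools import accumulate

def to_sign(compound):
    """Same rule as the lambda above: strictly positive -> +1, otherwise -1."""
    return 1 if compound > 0 else -1

def net_positive_count(compounds):
    """Running (#positive - #non-positive), i.e. the cumsum of the +1/-1 values."""
    return list(accumulate(to_sign(c) for c in compounds))

toy = [0.6, -0.3, 0.0, 0.8, 0.9]  # illustrative compound scores
print(net_positive_count(toy))    # [1, 0, -1, 0, 1]
```

Note how the neutral score `0.0` pulls the count down; a three-way mapping (for example with the ±0.05 thresholds suggested by VADER's authors) would treat it as `0` instead.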

## Combining Sentiment with Index Levels

First, retrieve **historical closing index levels** for the S&P 500.

In [None]:
data = ek.get_timeseries('.SPX',
                         start_date='2018-02-01',
                         end_date='2018-05-28',
                         interval='daily',
                         fields='CLOSE')

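Second, **resample the cumulative compound sentiment scores** to daily frequency, keeping the last value of each day, and forward-fill days without news. With `label='right'`, each daily value is stamped with the midnight that ends that day.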

In [None]:
daily = sentiment['compound'].cumsum().resample('D', label='right').last().ffill()

Third, **join** the cumulative compound sentiment scores with the historical index levels and **plot** both time series, with the sentiment scores on a secondary y-axis.

In [None]:
data.join(daily).iplot(secondary_y='compound', width=2.5)
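As we read the pandas chain above, the net effect is an "as-of" alignment: each trading date receives the cumulative sentiment of all stories stamped strictly before that date's midnight. A minimal standard-library sketch of this alignment follows; `asof_cumulative()` is a hypothetical name and the timestamps and scores are illustrative only. One difference: the pandas version leaves trading dates after the day following the last news story empty, whereas this sketch carries the final value forward.

```python
from bisect import bisect_left
from datetime import date, datetime
from itertools import accumulate

def asof_cumulative(news_times, scores, dates):
    """For each date, return the cumulative score of all news stamped
    strictly before that date's midnight (None if there is none yet)."""
    order = sorted(range(len(news_times)), key=lambda i: news_times[i])
    times = [news_times[i] for i in order]
    cum = list(accumulate(scores[i] for i in order))  # running sum in time order
    result = []
    for d in dates:
        midnight = datetime(d.year, d.month, d.day)
        k = bisect_left(times, midnight)  # number of stories before midnight
        result.append(cum[k - 1] if k > 0 else None)
    return result

# Illustrative inputs (not real data); the timestamps need not be sorted
times = [datetime(2018, 2, 1, 10), datetime(2018, 2, 3, 9), datetime(2018, 2, 1, 15)]
scores = [0.5, 0.4, -0.2]
dates = [date(2018, 2, 1), date(2018, 2, 2), date(2018, 2, 5)]
print(asof_cumulative(times, scores, dates))
```

Writing the alignment out this way makes explicit that news published during a trading day does not enter that day's value, which matters when interpreting any co-movement in the plot.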

## Conclusions

This tutorial covers the following **natural language processing (NLP)** tasks based on the Eikon Data API and respective Python packages:

* Reading News Headlines
* Extracting and Storing Raw Texts
* Sentiment Scoring Examples
* Sentiment Scoring Over Time
* Combining Sentiment with Index Levels

## Eikon Data API Developer Resources

* [Overview](https://developers.thomsonreuters.com/eikon-data-apis)
* [Quick Start](https://developers.thomsonreuters.com/eikon-data-apis/quick-start)
* [Documentation](https://developers.thomsonreuters.com/eikon-data-apis/docs)
* [Downloads](https://developers.thomsonreuters.com/eikon-data-apis/downloads)
* [Tutorials](https://developers.thomsonreuters.com/eikon-data-apis/learning)
* [Q&A Forums](https://developers.thomsonreuters.com/eikon-data-apis/qa)

Data Item Browser application: type `DIB` into the Eikon search bar.

* [Article on Chains](https://developers.thomsonreuters.com/article/simple-chain-objects-ema-part-1)
