# Financial News API

This tutorial notebook shows how to access news text related to stocks. It is an example client that connects to a news search engine instance hosting the news headline dataset found here:
 * https://www.kaggle.com/miguelaenlle/massive-stock-news-analysis-db-for-nlpbacktests
 

## Terrier Search Client

For the hackathon, we have hosted a search index backed by the open source Terrier search engine (http://terrier.org) containing the documents in the above dataset. To connect to this search back-end in Python, we need a library called PyTerrier. You can find documentation for PyTerrier here:
 * https://pyterrier.readthedocs.io/en/latest/
 
If you are running this notebook on the javiersanzcruza/pyterrier-jupyter Docker container image, PyTerrier is already installed and you are good to go. Otherwise, run the command below:

In [None]:
# Install PyTerrier
!pip install python-terrier

Next, we need to import and initialize PyTerrier. This checks that all the components of the library are available and functioning:

In [None]:
import pyterrier as pt
import requests
pt.init()

Since we will be using a remote pre-built index, we need to tell PyTerrier where to find the access end-point for that index. We do this with a search index reference, or IndexRef for short. In this case, 'http://superfp7.terrier.org/newspt/' points to where the search index is currently hosted; if you are using a different index, point it at that index's location instead.

In [None]:
indexref = pt.IndexRef.of("http://superfp7.terrier.org/newspt/")

Now that PyTerrier knows where the index is, we can issue queries to it. You can find out more about retrieval here:
 * https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html
 
However, in short, we first create a 'Retriever' object, which you can think of as a delivery service that carries our queries. The retriever has a target index address, set by our IndexRef, and can carry additional instructions for the remote index to apply when executing our queries. In this case, we set a weighting model (wmodel) that specifies how documents are scored (DPH), and set the number of top results to return to 20.

In [None]:
retriever = pt.BatchRetrieve(indexref, wmodel="DPH", num_results = 20)

Once this is done, we can use the search method on the Retriever to issue a query. This returns a pandas DataFrame whose columns include 'qid', 'docno', 'rank' and 'score': qid is the id of the query (used if we send multiple queries at once), docno is the id of a retrieved document, rank is the rank of that document in the results (starting from 0), and score is the weighting model score for that document.

In [72]:
results = retriever.search("ipad")
results

Unnamed: 0,qid,docid,docno,rank,score,query
0,1,0,1615315,0,5.734333,ipad
1,1,0,1076704,1,5.70137,ipad
2,1,0,8415,2,5.656921,ipad
3,1,0,991459,3,5.656921,ipad
4,1,0,1394454,4,5.656921,ipad
5,1,0,1840194,5,5.656921,ipad
6,1,0,210249,6,5.597101,ipad
7,1,0,947895,7,5.597101,ipad
8,1,0,1440495,8,5.597101,ipad
9,1,0,1514808,9,5.597101,ipad
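
Because qid distinguishes one query from another, several queries can be sent in a single call. The following is a minimal sketch, assuming the retriever accepts a DataFrame with 'qid' and 'query' columns through its transform() method, as PyTerrier retrievers normally do. The helper name build_topics and the example queries are illustrative and not part of the original notebook.

```python
import pandas as pd


def build_topics(queries):
    """Build a topics frame with string 'qid' and 'query' columns.

    build_topics is a hypothetical helper; qids are simply 1, 2, 3, ...
    """
    return pd.DataFrame(
        {
            "qid": [str(i) for i in range(1, len(queries) + 1)],
            "query": list(queries),
        }
    )


topics = build_topics(["ipad", "kindle fire"])

# With the retriever defined earlier, the whole batch could then be run with:
#     batch_results = retriever.transform(topics)
# and the rows for the second query selected with:
#     batch_results[batch_results["qid"] == "2"]
```

The returned frame would contain one block of ranked rows per qid, so filtering on qid recovers the results of an individual query.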


## Getting the News Article Data

As we can see, we get back a ranked list of document identifiers for the specified query. In most cases, however, what we want are the contents of the documents, not just their ids, so we also need to fetch those. For efficiency reasons, we keep the document contents separate from the search index, so we need to request the content for a given document id from a separate service.

In particular, we have a key-value store set up that allows requests for the headline for a document with a given id:

In [73]:
x = requests.get('http://superfp7.terrier.org/newsapi/headline/1615315')
x.text

"Dow 30 Stock Roundup: Apple's New iPad, Japanese Label Expansion for Merck/ Eisai's Lenvima"

We can simply iterate over the results dataframe, retrieve each headline in turn, and then append the results to the dataframe:

In [74]:
headlines = []
for index, row in results.iterrows():
    headline = requests.get('http://superfp7.terrier.org/newsapi/headline/'+row['docno']).text
    headlines.append(headline)
    
results.insert(5, "Headline", headlines, True)
results

Unnamed: 0,qid,docid,docno,rank,score,Headline,query
0,1,0,1615315,0,5.734333,"Dow 30 Stock Roundup: Apple's New iPad, Japane...",ipad
1,1,0,1076704,1,5.70137,Amazon (AMZN) Kindle Fire Leaps To No. 2 Behin...,ipad
2,1,0,8415,2,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",ipad
3,1,0,991459,3,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",ipad
4,1,0,1394454,4,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",ipad
5,1,0,1840194,5,5.656921,ZAGG Unveils Slim Book Wireless Keyboard and C...,ipad
6,1,0,210249,6,5.597101,BMO Launches iPad App with Integrated Access t...,ipad
7,1,0,947895,7,5.597101,"Top 7 Stocks Apple iPad May Help, Chips To Tes...",ipad
8,1,0,1440495,8,5.597101,3D Systems' New iSense to Enhance iPad Photogr...,ipad
9,1,0,1514808,9,5.597101,3D Systems' New iSense to Enhance iPad Photogr...,ipad
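
The loop above issues one bare request per document and stores whatever text comes back, including any error page. As a hedged alternative, the sketch below wraps the request in small helpers that apply a timeout and raise on HTTP errors. The names BASE_URL, fetch_field and fetch_column are hypothetical, and the session argument is assumed to be any object with a requests-style get(url, timeout=...) method, such as requests.Session().

```python
BASE_URL = "http://superfp7.terrier.org/newsapi"  # service root used in this notebook


def fetch_field(session, field, docno, timeout=10):
    """Return the text served at <BASE_URL>/<field>/<docno>.

    `session` is assumed to behave like requests.Session().
    """
    url = f"{BASE_URL}/{field}/{docno}"
    response = session.get(url, timeout=timeout)
    # Surface HTTP errors instead of silently storing an error page as a headline.
    response.raise_for_status()
    return response.text


def fetch_column(session, field, docnos):
    """Fetch one field for every docno, preserving the input order."""
    return [fetch_field(session, field, docno) for docno in docnos]
```

With requests imported, `fetch_column(requests.Session(), "headline", results["docno"])` would produce the list that is passed to results.insert(...) above. Reusing one session lets the underlying connection be shared across requests rather than reopened for each document.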


We can similarly ask for other information about the news article from the same end-point:
 * headline/docno: Given the numerical identifier of a news item, returns its headline.
 * url/docno: Given the numerical identifier of a news item, returns its URL.
 * date/docno: Given the numerical identifier of a news item, returns its publication date.
 * stock/docno: Given the numerical identifier of a news item, returns its ticker.
 * news/docno: Given the numerical identifier of a news item, returns its information (headline, url, date and ticker) in JSON format.
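
Since news/docno bundles all four fields, one request per document may be enough when several fields are needed. The sketch below builds the URL for any of the listed end-points and parses the news response as JSON. The helper names are hypothetical, the session is assumed to behave like requests.Session(), and the key names inside the returned JSON are not documented here, so inspect the parsed dictionary before relying on them.

```python
import json

BASE_URL = "http://superfp7.terrier.org/newsapi"  # service root used in this notebook
DOCNO_ENDPOINTS = ("headline", "url", "date", "stock", "news")  # from the list above


def endpoint_url(endpoint, docno):
    """Build the URL for one of the docno-based end-points listed above."""
    if endpoint not in DOCNO_ENDPOINTS:
        raise ValueError(f"unknown end-point: {endpoint!r}")
    return f"{BASE_URL}/{endpoint}/{docno}"


def fetch_news(session, docno, timeout=10):
    """Return the parsed JSON record for one news item.

    `session` is assumed to offer a requests-style get(url, timeout=...).
    """
    response = session.get(endpoint_url("news", docno), timeout=timeout)
    response.raise_for_status()
    return json.loads(response.text)
```

For example, `fetch_news(requests.Session(), "1615315")` would return a dictionary for the top-ranked document in the earlier results.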
 
 
For example, if we wanted the predicted stock ticker added for each document, we could run:

In [75]:
stocktickers = []
for index, row in results.iterrows():
    ticker = requests.get('http://superfp7.terrier.org/newsapi/stock/'+row['docno']).text
    stocktickers.append(ticker)
    
results.insert(6, "Ticker", stocktickers, True)
results

Unnamed: 0,qid,docid,docno,rank,score,Headline,Ticker,query
0,1,0,1615315,0,5.734333,"Dow 30 Stock Roundup: Apple's New iPad, Japane...",TOT,ipad
1,1,0,1076704,1,5.70137,Amazon (AMZN) Kindle Fire Leaps To No. 2 Behin...,MMI,ipad
2,1,0,8415,2,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",ABTL,ipad
3,1,0,991459,3,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",LIQD,ipad
4,1,0,1394454,4,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",RMBS,ipad
5,1,0,1840194,5,5.656921,ZAGG Unveils Slim Book Wireless Keyboard and C...,ZAGG,ipad
6,1,0,210249,6,5.597101,BMO Launches iPad App with Integrated Access t...,BMO,ipad
7,1,0,947895,7,5.597101,"Top 7 Stocks Apple iPad May Help, Chips To Tes...",KLAC,ipad
8,1,0,1440495,8,5.597101,3D Systems' New iSense to Enhance iPad Photogr...,SCSC,ipad
9,1,0,1514808,9,5.597101,3D Systems' New iSense to Enhance iPad Photogr...,SPLS,ipad


Where possible, we have also collected metadata for each company ticker:

In [76]:
import json

tickerdata = json.loads(requests.get('http://superfp7.terrier.org/newsapi/ticker/AAPL').text)
print(json.dumps(tickerdata, indent=4, sort_keys=True))

{
    "52WeekChange": {
        "10": null
    },
    "SandP52WeekChange": {
        "10": null
    },
    "address1": {
        "10": "One Apple Park Way"
    },
    "address2": {
        "10": null
    },
    "algorithm": {
        "10": null
    },
    "annualHoldingsTurnover": {
        "10": null
    },
    "annualReportExpenseRatio": {
        "10": null
    },
    "ask": {
        "10": 157.87
    },
    "askSize": {
        "10": 1100.0
    },
    "averageDailyVolume10Day": {
        "10": 74388970.0
    },
    "averageDailyVolume3Month": {
        "10": null
    },
    "averageVolume": {
        "10": 76245489.0
    },
    "averageVolume10days": {
        "10": 74388970.0
    },
    "beta": {
        "10": 1.202797
    },
    "beta3Year": {
        "10": null
    },
    "bid": {
        "10": 157.85
    },
    "bidSize": {
        "10": 1000.0
    },
    "bondHoldings": {
        "10": null
    },
    "bondPosition": {
        "10": null
    },
    "bondRatings": {
        "10

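In the output above, every field wraps its value in a small dictionary keyed by what appears to be a row identifier ("10" for AAPL). A minimal sketch for collapsing that nesting and listing only the populated fields follows. The function name flatten_ticker_record is hypothetical, and the sample record copies a few fields from the output above rather than calling the service.

```python
def flatten_ticker_record(record):
    """Collapse {"field": {"<row id>": value}} into {"field": value}."""
    flat = {}
    for field, wrapped in record.items():
        if isinstance(wrapped, dict):
            # Take the single wrapped value; an empty wrapper counts as missing.
            flat[field] = next(iter(wrapped.values()), None)
        else:
            # Leave anything that is not wrapped untouched.
            flat[field] = wrapped
    return flat


# A few fields copied from the AAPL output above.
sample = {
    "address1": {"10": "One Apple Park Way"},
    "address2": {"10": None},
    "ask": {"10": 157.87},
    "beta": {"10": 1.202797},
}

flat = flatten_ticker_record(sample)
populated = {field: value for field, value in flat.items() if value is not None}
print(populated)
```

Applied to the full response, `flatten_ticker_record(tickerdata)` would give one plain value per field, which is easier to attach to a DataFrame.
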
We could similarly take fields from this metadata and attach them to our search results, for example each company's long name:

In [77]:
tickerdatas = []
for index, row in results.iterrows():
    ticker = requests.get('http://superfp7.terrier.org/newsapi/ticker/'+row['Ticker']).text
    tickerJSON = json.loads(ticker)
    tickerdatas.append(tickerJSON['longName'])
    
results.insert(7, "Company Name", tickerdatas, True)
results

Unnamed: 0,qid,docid,docno,rank,score,Headline,Ticker,Company Name,query
0,1,0,1615315,0,5.734333,"Dow 30 Stock Roundup: Apple's New iPad, Japane...",TOT,{'3641': None},ipad
1,1,0,1076704,1,5.70137,Amazon (AMZN) Kindle Fire Leaps To No. 2 Behin...,MMI,"{'2374': 'Marcus & Millichap, Inc.'}",ipad
2,1,0,8415,2,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",ABTL,{},ipad
3,1,0,991459,3,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",LIQD,{},ipad
4,1,0,1394454,4,5.656921,"Apple Now a Strong Buy, New iPhones & iPad Key...",RMBS,{'3176': 'Rambus Inc.'},ipad
5,1,0,1840194,5,5.656921,ZAGG Unveils Slim Book Wireless Keyboard and C...,ZAGG,{'4050': None},ipad
6,1,0,210249,6,5.597101,BMO Launches iPad App with Integrated Access t...,BMO,{'499': 'Bank of Montreal'},ipad
7,1,0,947895,7,5.597101,"Top 7 Stocks Apple iPad May Help, Chips To Tes...",KLAC,{'2085': 'KLA Corporation'},ipad
8,1,0,1440495,8,5.597101,3D Systems' New iSense to Enhance iPad Photogr...,SCSC,"{'3280': 'ScanSource, Inc.'}",ipad
9,1,0,1514808,9,5.597101,3D Systems' New iSense to Enhance iPad Photogr...,SPLS,{},ipad
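
Note that the Company Name column above holds the wrapped dictionaries (for example {'499': 'Bank of Montreal'}), and is empty or None where no metadata was collected. The sketch below is one hedged way to obtain plain names instead, requesting each distinct ticker only once. The helper names are hypothetical, the session is assumed to behave like requests.Session(), and the response is assumed to be a JSON object as in the ticker example earlier.

```python
import json

BASE_URL = "http://superfp7.terrier.org/newsapi"  # service root used in this notebook


def unwrap(wrapped):
    """Return the value inside {"<row id>": value}, or None if empty or missing."""
    if not isinstance(wrapped, dict):
        return wrapped
    return next(iter(wrapped.values()), None)


def company_names(session, tickers, timeout=10):
    """Look up 'longName' for each ticker, fetching each distinct ticker once."""
    cache = {}
    names = []
    for ticker in tickers:
        if ticker not in cache:
            response = session.get(f"{BASE_URL}/ticker/{ticker}", timeout=timeout)
            response.raise_for_status()
            metadata = json.loads(response.text)
            # .get() tolerates records that lack a 'longName' field.
            cache[ticker] = unwrap(metadata.get("longName"))
        names.append(cache[ticker])
    return names
```

Calling `company_names(requests.Session(), results["Ticker"])` would give a list of strings and None values suitable for results.insert(...), in place of the dictionaries shown in the table.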
