In [1]:
import os
from datetime import datetime
import pandas as pd

# Parameters

In [2]:
# Parameters dictionary.
pm = {
    'organization': 'Google Inc',
    'start_date_train': datetime(2022,1,1),
    'end_date_train': datetime(2022,11,1),
    'start_date_test': datetime(2022,11,1),
    'end_date_test': datetime(2023,1,1),
    'n_articles': 5,
    'text_columns': [ 'abstract', 'lead_paragraph', 'snippet', 'headline.main', ],
}

⭕ **Possible Improvements:**

* The date range for the market data (dependent variable) could be larger than the date range for the news, since the market may respond to news with a time lag.
* Test over multiple years.
* Experiment with which text columns are included.
* Test whether weighting articles by how often a company is mentioned in the article improves predictions.
* Inspect how the number of articles published affects the predictions.
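
The first point above can be sketched as follows. This is only an illustration: the helper name `market_window` and the default lag of five days are assumptions, not values used elsewhere in this notebook.

```python
from datetime import datetime, timedelta

def market_window(news_start, news_end, max_lag_days=5):
    """Return a (start, end) window for market data.

    The window starts with the news window and extends past its end by
    `max_lag_days` (a hypothetical value), so that price moves lagging
    the news are still covered.
    """
    return news_start, news_end + timedelta(days=max_lag_days)

# Example with the training dates from the parameters dictionary
market_start, market_end = market_window(datetime(2022, 1, 1), datetime(2022, 11, 1))
```

The lag itself could then be treated as one more parameter to vary.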

# NYTimes Data

## Retrieve Data
To access the NYTimes API we will be using the `pynytimes` package, for which the BibTeX citation is:
```
@software{Den_Heijer_pynytimes_2023,
    author = {Den Heijer, Micha},
    license = {MIT},
    title = {{pynytimes}},
    url = {https://github.com/michadenheijer/pynytimes},
    version = {0.10.0},
    year = {2023},
    doi = {10.5281/zenodo.7821090}
}
```

Our API key is stored in the environment variable `NYTIMES_KEY`, which is set in e.g. `~/.bash_profile` or `~/.zshrc`.
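
Since `os.environ.get` returns `None` when a variable is unset, it may help to fail early with a clear message. A minimal sketch, where the helper `require_env` is hypothetical and not part of `pynytimes`:

```python
import os

def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it in e.g. ~/.bash_profile or ~/.zshrc."
        )
    return value

# Intended usage (not executed here, since it raises when the key is unset):
# api_key = require_env('NYTIMES_KEY')
```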

In [3]:
from pynytimes import NYTAPI

In [4]:
nytapi = NYTAPI( os.environ.get( 'NYTIMES_KEY' ), parse_dates=True )

In [5]:
results = nytapi.article_search(
    query=pm['organization'],
    results=pm['n_articles'],
    dates={ 'begin':pm['start_date_train'], 'end':pm['end_date_train'] }
)

⭕ **Possible Improvement:**

The search currently matches on keywords. A more advanced option is to use the filter query feature of the NYTimes API, e.g.
```
options={
    'fq': 'organizations:("Google Inc")',
},
```
This also requires filtering on the "rank" of the organization with regard to the article, as found in the `rank` field of the entries in `article['keywords']`. Otherwise we'll get articles only tangentially related to the target company.
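
A rough sketch of such a rank filter is given below. It assumes each article carries a list of keyword dictionaries with `name`, `value`, and `rank` fields. The function name, the `max_rank` threshold, and the sample articles are all hypothetical and only illustrate the idea.

```python
def is_primary_organization(article, organization, max_rank=1):
    """Return True if `organization` appears among the article's keywords
    with a rank of at most `max_rank` (lower rank = more central)."""
    for keyword in article.get('keywords', []):
        if (
            keyword.get('name') == 'organizations'
            and keyword.get('value') == organization
            and keyword.get('rank', float('inf')) <= max_rank
        ):
            return True
    return False

# Hypothetical articles, shaped like the assumed API response
sample_articles = [
    {
        'headline': {'main': 'Hypothetical article about the company'},
        'keywords': [{'name': 'organizations', 'value': 'Google Inc', 'rank': 1}],
    },
    {
        'headline': {'main': 'Hypothetical article mentioning the company in passing'},
        'keywords': [
            {'name': 'organizations', 'value': 'Twitter', 'rank': 1},
            {'name': 'organizations', 'value': 'Google Inc', 'rank': 4},
        ],
    },
]

# Keep only the articles where the company is ranked first or second
focused = [
    article for article in sample_articles
    if is_primary_organization(article, 'Google Inc', max_rank=2)
]
```

The threshold would likely need tuning, since a strict cutoff reduces the number of articles available for training.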

## Format Data

In [6]:
# Create storage dictionary
nyt_data = {
    'pub_date': [],
}
for column in pm['text_columns']:
    nyt_data[column] = []

In [7]:
# Collect the requested fields from each article
for result in results:
    for column in nyt_data.keys():
        
        # Parse column
        if '.' in column:
            column_keys = column.split( '.' )
            column_val = result[column_keys[0]][column_keys[1]]
        else:
            column_val = result[column]
            
        # Store
        nyt_data[column].append( column_val )
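
The parsing above handles exactly one level of nesting (e.g. `headline.main`). If deeper keys were ever needed, a small helper could walk a dotted path of any depth. This is a sketch, and the name `get_nested` is hypothetical:

```python
from functools import reduce

def get_nested(record, dotted_key):
    """Look up a value in nested dictionaries using a dotted path.

    'headline.main' is resolved as record['headline']['main'];
    a key without dots is a plain lookup.
    """
    return reduce(lambda value, key: value[key], dotted_key.split('.'), record)

# Hypothetical record shaped like one search result
example = {'abstract': 'Some abstract', 'headline': {'main': 'Some headline'}}
headline = get_nested(example, 'headline.main')
abstract = get_nested(example, 'abstract')
```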

In [8]:
# Turn into a dataframe
nyt = pd.DataFrame( nyt_data )

In [9]:
# Concatenate the text columns into one string per article
nyt['text'] = ( nyt[pm['text_columns']] + ' ' ).sum( axis=1 )
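
The concatenation above assumes every text column holds a string. If the API returned a missing value for some field, one possible alternative is to fill the gaps before joining. A sketch on a small made-up frame (the data below is hypothetical):

```python
import pandas as pd

# Made-up data with one missing abstract
example = pd.DataFrame({
    'abstract': ['first abstract', None],
    'snippet': ['first snippet', 'second snippet'],
})
text_columns = ['abstract', 'snippet']

# Replace missing values with empty strings, then join the non-empty parts
example['text'] = (
    example[text_columns]
    .fillna('')
    .apply(lambda row: ' '.join(part for part in row if part), axis=1)
)
```

Unlike the `sum` approach, this leaves no trailing space at the end of each string.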

In [10]:
nyt.head()

| | pub_date | abstract | lead_paragraph | snippet | headline.main | text |
|---|---|---|---|---|---|---|
| 0 | 2022-10-25 20:37:03+00:00 | Google’s parent company reported earnings that... | Even Alphabet, the parent company of Google an... | Google’s parent company reported earnings that... | Alphabet’s Profit Drops 27 Percent From a Year... | Google’s parent company reported earnings that... |
| 1 | 2022-10-26 22:47:44+00:00 | A series of quarterly earnings reports is show... | Google this week reported a steep decline in p... | A series of quarterly earnings reports is show... | Tech’s Biggest Companies Are Sending Worrying ... | A series of quarterly earnings reports is show... |
| 2 | 2022-10-20 15:05:58+00:00 | Ken Paxton, the state attorney general, said p... | The Texas attorney general filed a privacy law... | Ken Paxton, the state attorney general, said p... | Texas Sues Google for Collecting Biometric Dat... | Ken Paxton, the state attorney general, said p... |
| 3 | 2022-10-28 12:06:38+00:00 | The social network’s new owner has just a few ... | Elon Musk closes his purchase of Twitter and f... | The social network’s new owner has just a few ... | Elon Musk Faces Another Big Decision at Twitter | The social network’s new owner has just a few ... |
| 4 | 2022-10-25 18:24:56+00:00 | Apple has rejected Spotify’s new app three tim... | Daniel Ek, the chief executive of Spotify, wan... | Apple has rejected Spotify’s new app three tim... | Spotify Wants to Get Into Audiobooks but Says ... | Apple has rejected Spotify’s new app three tim... |
