# Sentiment analysis of NYT articles

## Using the NYT Article API and the Alchemy API to analyse sentiment targeted towards Apple in 2014

### Gathering articles

Articles can be accessed via the NYT API: http://developer.nytimes.com/docs/read/article_search_api_v2

I've used the python wrapper available here: https://pypi.python.org/pypi/nytimesarticle/0.1.0

In [1]:
from nytimesarticle import articleAPI
api = articleAPI('YOUR_API_KEY')  # Insert your own NYT API key here
articles = api.search( q = 'Apple' , fq = {'organizations.contains':'Apple Inc'}, begin_date = '20140101', end_date = '20150101') 
#print articles

The above request only returns the first page of articles (the API currently limits results to 10 per page). So let's see how many hits we have and how many pages that makes.

In [2]:
hits = articles['response']['meta']['hits']
print hits

435


In [3]:
import math
def roundup(x):
    return int(math.ceil(x / 10.0)) * 10
hits_num = roundup(hits)/10
print hits_num

44
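The page count is a ceiling division of the hit count by the page size. A minimal sketch of the same calculation without floats, where `pages_needed` and its `per_page` default are names chosen here for illustration (the value 10 comes from the page size noted above):

```python
def pages_needed(hits, per_page=10):
    """Return how many pages are required to hold `hits` results."""
    # Integer ceiling division: -(-a // b) rounds up without using floats
    return -(-hits // per_page)

print(pages_needed(435))  # 44, matching the value computed above
```

This gives the same answer as `roundup(hits)/10` and behaves identically on Python 2 and Python 3.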


Here we're repeating the request and storing it page by page. There seems to be a 1000 article limit overall.

In [4]:
from time import sleep
articles_all = []

for x in range(hits_num):
    articles = api.search( q = 'Apple' , fq = {'organizations.contains':['Apple Inc']}, begin_date = '20140101', end_date = '20150101', page = x) 
    sleep(2)
    try:
        articles['response']
    except (KeyError, TypeError):
        # No 'response' key: the API returned an error message or ran out of pages
        break
    else:
        articles_all.append(articles)
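The same loop can be wrapped in a reusable function. This is only a sketch: `fetch_all_pages`, its parameters and the `max_pages` default of 100 are hypothetical names and values chosen here (100 pages of 10 results reflects the apparent 1000 article limit mentioned above, which is an observation rather than a documented guarantee). The `search` argument is any callable that takes a page number and returns the decoded response.

```python
from time import sleep

def fetch_all_pages(search, n_pages, delay=2, max_pages=100):
    """Call `search(page)` for each page and collect the successful responses."""
    collected = []
    for page in range(min(n_pages, max_pages)):
        result = search(page)
        # Stop at the first reply that lacks a 'response' key (error or limit reached)
        if not isinstance(result, dict) or 'response' not in result:
            break
        collected.append(result)
        sleep(delay)  # pause between requests to stay under the rate limit
    return collected
```

With the objects defined above it could be called as `fetch_all_pages(lambda p: api.search(q='Apple', fq={'organizations.contains': ['Apple Inc']}, begin_date='20140101', end_date='20150101', page=p), hits_num)`.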

In [5]:
responses = [d['response']['docs'] for d in articles_all]

In [6]:
results = [item for sublist in responses for item in sublist]
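The two comprehensions above first pull the `docs` list out of every page and then flatten the list of lists. An equivalent sketch using `itertools.chain.from_iterable` is shown below; `flatten_docs` and the sample pages are made up for illustration and only mimic the `response` / `docs` nesting used above.

```python
from itertools import chain

def flatten_docs(pages):
    """Collect the 'docs' list of every page into one flat list."""
    return list(chain.from_iterable(page['response']['docs'] for page in pages))

# Illustrative stand-in for articles_all, not real API output
sample_pages = [
    {'response': {'docs': [{'_id': 'a'}, {'_id': 'b'}]}},
    {'response': {'docs': [{'_id': 'c'}]}},
]
flat = flatten_docs(sample_pages)
print([doc['_id'] for doc in flat])  # ['a', 'b', 'c']
```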

Let's have a quick look at the headlines.

In [7]:
for r in results:
    print r['headline']['main']

Apple’s Cook Makes Case for Equal Rights
Power Outage
The Apple Chronicles
App Smart | Must-Haves for iOS 8
With iPhone 6 and Smartwatch, Apple Is Back and Better Than Ever
Apple iPhone Sales Expected to Break Records
Apple's Tim Cook Talks of Retail Expansion in China
New Apple Tool Checks iPhones for 'Kill Switch' Security
Apple Versus Cops
Apple's Midlife Crisis
Investors and Customers Yearn for an Apple iThingamajig
Apple’s iCloud Storage Service Is Aim of Attack in China
Apple Pulls iOS 8 Software Update After iPhone Problems 
The iPhone 6: Is Bigger Better?
The Digital Wallet Revolution
Tech Shares Lead Nasdaq Lower in Quiet Day of Trading 
Tim Cook, Making Apple His Own
Analysts Share High Expectations for Bigger iPhones
Daily Report: Apple's Unsplashy Acquisitions Point to Future Plans
Lawyers in iPod Trial Await Jury Decision
Apple iPod Case: Steve Jobs Deposition
Apple Releases Web Tool for iPhone Switchers
Tim Cook of Apple: Being Gay in Corporate America 
Daily Report: The 

In [8]:
import nltk, unicodedata
from nltk.tokenize import RegexpTokenizer
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)

In [9]:
import pandas as pd
results_list = []
for a in results:
    # Fields missing from the API response stay as empty lists (shown as [] in the table)
    headline, snippet, lead_para, abstract = [], [], [], []
    # NFKD-normalise, then drop any character that cannot be encoded as ASCII
    headline = unicodedata.normalize('NFKD', a['headline']['main']).encode('ascii','ignore')
    if a['snippet'] is not None:
        snippet = unicodedata.normalize('NFKD', a['snippet']).encode('ascii','ignore')
    if a['lead_paragraph'] is not None:
        lead_para = unicodedata.normalize('NFKD', a['lead_paragraph']).encode('ascii','ignore')
    if a['abstract'] is not None:
        abstract = unicodedata.normalize('NFKD', a['abstract']).encode('ascii','ignore')
    results_list.append({'headline': headline, 'snippet': snippet, 'lead_paragraph': lead_para, 'abstract':abstract, 
    'pub_date': a['pub_date'],
    'web_url': a['web_url'],
    'keywords': a['keywords'],
    'section_name': a['section_name'],  
    '_id': a['_id']})

df = pd.DataFrame(results_list)
df = df.set_index('_id')
df

_id,abstract,headline,keywords,lead_paragraph,pub_date,section_name,snippet,web_url
5452a2fa38f0d81b603ecd2e,[],Apples Cook Makes Case for Equal Rights,"[{u'rank': u'1', u'is_major': u'N', u'value': ...","In a speech on Oct. 27, Timothy D. Cook, Apple...",2014-10-30T16:42:11Z,U.S.,"In a speech on Oct. 27, Timothy D. Cook, Apple...",http://www.nytimes.com/video/us/10000000320699...
5362b46a38f0d84d9e2711a6,Brad Stone reviews book Haunted Empire: Apple ...,Power Outage,"[{u'value': u'Jobs, Steven P', u'is_major': u'...",A journalist looks at the challenges facing Ap...,2014-05-04T00:00:00Z,Books,A journalist looks at the challenges facing Ap...,http://www.nytimes.com/2014/05/04/books/review...
534892e738f0d85faac97cfa,Joe Nocera Op-Ed column asserts that Apple's a...,The Apple Chronicles,"[{u'rank': u'1', u'is_major': u'Y', u'value': ...","These days, the tech industry is battling over...",2014-04-12T00:00:00Z,Opinion,"These days, the tech industry is battling over...",http://www.nytimes.com/2014/04/12/opinion/noce...
5422f9af38f0d84b4e7e9d05,[],App Smart | Must-Haves for iOS 8,"[{u'value': u'iOS (Operating System)', u'is_ma...",Kit Eaton explores three apps that show off wh...,2014-09-24T13:04:19Z,Technology,Kit Eaton explores three apps that show off wh...,http://www.nytimes.com/video/technology/person...
540fa32838f0d87641c67fe6,Farhad Manjoo State of the Art column; Apple u...,"With iPhone 6 and Smartwatch, Apple Is Back an...","[{u'rank': u'1', u'is_major': u'Y', u'value': ...",Any question about how well Tim Cook is managi...,2014-09-10T00:00:00Z,Technology,Any question about how well Tim Cook is managi...,http://www.nytimes.com/2014/09/10/technology/p...
52df0f7838f0d8031784098b,Apple is expected to announce record iPhone sa...,Apple iPhone Sales Expected to Break Records,"[{u'value': u'Apple Inc', u'name': u'organizat...",[],2014-01-21T19:19:16Z,Technology,Apple is expected to announce record iPhone sa...,http://bits.blogs.nytimes.com/2014/01/21/apple...
5449562238f0d875ddacb622,"For Apple, greater China has been one of its f...",Apple's Tim Cook Talks of Retail Expansion in ...,"[{u'rank': u'1', u'name': u'persons', u'value'...",[],2014-10-23T15:22:40Z,Technology,"For Apple, greater China has been one of its f...",http://bits.blogs.nytimes.com/2014/10/23/apple...
542dc07638f0d87d7534cf49,A new law will soon require every smartphone s...,New Apple Tool Checks iPhones for 'Kill Switch...,"[{u'value': u'Apple Inc', u'name': u'organizat...",[],2014-10-02T16:09:57Z,Technology,A new law will soon require every smartphone s...,http://bits.blogs.nytimes.com/2014/10/02/apple...
541c3a4b38f0d8296cb10bab,The company wont provide information such as p...,Apple Versus Cops,"[{u'value': u'Apple Inc', u'name': u'organizat...",[],2014-09-19T10:13:23Z,Opinion,The company wont provide information such as p...,http://takingnote.blogs.nytimes.com/2014/09/19...
536d11df38f0d852c67d492a,"As it approaches its 40th year, Apple is displ...",Apple's Midlife Crisis,"[{u'value': u'Cook, Timothy D', u'name': u'per...",[],2014-05-09T13:34:12Z,Business Day,"As it approaches its 40th year, Apple is displ...",http://dealbook.nytimes.com/2014/05/09/apples-...
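Note that the first headline now reads "Apples Cook" rather than "Apple’s Cook". That is a side effect of the ASCII conversion: the curly apostrophe has no ASCII decomposition, so `encode('ascii', 'ignore')` silently drops it, while accented letters keep their base letter. A small sketch of the behaviour, written for Python 3 (where `encode` returns bytes, hence the extra `decode`); `to_ascii` is a helper name introduced here:

```python
import unicodedata

def to_ascii(text):
    """NFKD-normalise `text` and drop every character with no ASCII form."""
    decomposed = unicodedata.normalize('NFKD', text)
    # encode() yields bytes on Python 3, so decode to get a str back
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii(u'Apple\u2019s Cook Makes Case for Equal Rights'))
# Apples Cook Makes Case for Equal Rights
print(to_ascii(u'caf\u00e9'))
# cafe
```

If the apostrophes matter for later tokenisation, one option would be to replace `\u2019` with a plain `'` before normalising.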


Let's check which sections they come from, and remove any that might not be relevant to our interests.

In [10]:
df['section_name'].unique()

array([u'U.S.', u'Books', u'Opinion', u'Technology', u'Business Day',
       u'The Upshot', u'Travel', u'Sunday Review', None, u'false',
       u'Your Money', u'World', u'Fashion & Style', u'Style',
       u'Home & Garden', u'Multimedia/Photos', u'Automobiles', u'Arts',
       u'Real Estate'], dtype=object)

In [11]:
sec = 'Multimedia/Photos'

df = df[df.section_name != sec]
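To drop several sections at once, `isin` can replace the single comparison. The sketch below is an illustration only: `drop_sections` and the demo frame are invented here, and which sections count as irrelevant is a judgement call. Taking an explicit `.copy()` of the filtered frame also makes it clear that later column assignments act on a new frame, which is what the `SettingWithCopyWarning` is about.

```python
import pandas as pd

def drop_sections(frame, unwanted):
    """Return a copy of `frame` without rows whose section_name is in `unwanted`."""
    keep = ~frame['section_name'].isin(unwanted)
    # .copy() gives an independent frame, so adding columns later is unambiguous
    return frame[keep].copy()

# Illustrative data, not real API output
demo = pd.DataFrame({
    'headline': ['A', 'B', 'C', 'D'],
    'section_name': ['Technology', 'Multimedia/Photos', 'Travel', None],
})
filtered = drop_sections(demo, ['Multimedia/Photos', 'Travel'])
print(list(filtered['headline']))  # ['A', 'D']
```

Rows with a missing section name are kept, as they are with the `!=` comparison above.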

Add all the relevant text fields into one column.

In [12]:
df['all_text'] = df['headline'].map(str) + ' ' + df['abstract'].map(str) + ' ' + df['lead_paragraph'].map(str) 
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


_id,abstract,headline,keywords,lead_paragraph,pub_date,section_name,snippet,web_url,all_text
5452a2fa38f0d81b603ecd2e,[],Apples Cook Makes Case for Equal Rights,"[{u'rank': u'1', u'is_major': u'N', u'value': ...","In a speech on Oct. 27, Timothy D. Cook, Apple...",2014-10-30T16:42:11Z,U.S.,"In a speech on Oct. 27, Timothy D. Cook, Apple...",http://www.nytimes.com/video/us/10000000320699...,Apples Cook Makes Case for Equal Rights [] In ...
5362b46a38f0d84d9e2711a6,Brad Stone reviews book Haunted Empire: Apple ...,Power Outage,"[{u'value': u'Jobs, Steven P', u'is_major': u'...",A journalist looks at the challenges facing Ap...,2014-05-04T00:00:00Z,Books,A journalist looks at the challenges facing Ap...,http://www.nytimes.com/2014/05/04/books/review...,Power Outage Brad Stone reviews book Haunted E...
534892e738f0d85faac97cfa,Joe Nocera Op-Ed column asserts that Apple's a...,The Apple Chronicles,"[{u'rank': u'1', u'is_major': u'Y', u'value': ...","These days, the tech industry is battling over...",2014-04-12T00:00:00Z,Opinion,"These days, the tech industry is battling over...",http://www.nytimes.com/2014/04/12/opinion/noce...,The Apple Chronicles Joe Nocera Op-Ed column a...
5422f9af38f0d84b4e7e9d05,[],App Smart | Must-Haves for iOS 8,"[{u'value': u'iOS (Operating System)', u'is_ma...",Kit Eaton explores three apps that show off wh...,2014-09-24T13:04:19Z,Technology,Kit Eaton explores three apps that show off wh...,http://www.nytimes.com/video/technology/person...,App Smart | Must-Haves for iOS 8 [] Kit Eaton ...
540fa32838f0d87641c67fe6,Farhad Manjoo State of the Art column; Apple u...,"With iPhone 6 and Smartwatch, Apple Is Back an...","[{u'rank': u'1', u'is_major': u'Y', u'value': ...",Any question about how well Tim Cook is managi...,2014-09-10T00:00:00Z,Technology,Any question about how well Tim Cook is managi...,http://www.nytimes.com/2014/09/10/technology/p...,"With iPhone 6 and Smartwatch, Apple Is Back an..."
52df0f7838f0d8031784098b,Apple is expected to announce record iPhone sa...,Apple iPhone Sales Expected to Break Records,"[{u'value': u'Apple Inc', u'name': u'organizat...",[],2014-01-21T19:19:16Z,Technology,Apple is expected to announce record iPhone sa...,http://bits.blogs.nytimes.com/2014/01/21/apple...,Apple iPhone Sales Expected to Break Records A...
5449562238f0d875ddacb622,"For Apple, greater China has been one of its f...",Apple's Tim Cook Talks of Retail Expansion in ...,"[{u'rank': u'1', u'name': u'persons', u'value'...",[],2014-10-23T15:22:40Z,Technology,"For Apple, greater China has been one of its f...",http://bits.blogs.nytimes.com/2014/10/23/apple...,Apple's Tim Cook Talks of Retail Expansion in ...
542dc07638f0d87d7534cf49,A new law will soon require every smartphone s...,New Apple Tool Checks iPhones for 'Kill Switch...,"[{u'value': u'Apple Inc', u'name': u'organizat...",[],2014-10-02T16:09:57Z,Technology,A new law will soon require every smartphone s...,http://bits.blogs.nytimes.com/2014/10/02/apple...,New Apple Tool Checks iPhones for 'Kill Switch...
541c3a4b38f0d8296cb10bab,The company wont provide information such as p...,Apple Versus Cops,"[{u'value': u'Apple Inc', u'name': u'organizat...",[],2014-09-19T10:13:23Z,Opinion,The company wont provide information such as p...,http://takingnote.blogs.nytimes.com/2014/09/19...,Apple Versus Cops The company wont provide inf...
536d11df38f0d852c67d492a,"As it approaches its 40th year, Apple is displ...",Apple's Midlife Crisis,"[{u'value': u'Cook, Timothy D', u'name': u'per...",[],2014-05-09T13:34:12Z,Business Day,"As it approaches its 40th year, Apple is displ...",http://dealbook.nytimes.com/2014/05/09/apples-...,Apple's Midlife Crisis As it approaches its 40...
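In the `all_text` column above, missing fields show up as a literal `[]` (for example "Apples Cook Makes Case for Equal Rights [] In ..."), because the empty-list placeholders are converted with `str`. One way to avoid sending that noise to the sentiment API is to join only the values that are non-empty strings. A sketch under that assumption, written for Python 3; `combine_text` and the demo rows are invented for illustration:

```python
import pandas as pd

def combine_text(frame, columns):
    """Join the non-empty string values of `columns` for each row."""
    def join_row(row):
        # Skip placeholders such as [] or None and blank strings
        parts = [row[c] for c in columns
                 if isinstance(row[c], str) and row[c].strip()]
        return ' '.join(parts)
    return frame.apply(join_row, axis=1)

# Illustrative rows that mimic the missing-field placeholders
demo = pd.DataFrame([
    {'headline': 'Headline one', 'abstract': 'An abstract.', 'lead_paragraph': []},
    {'headline': 'Headline two', 'abstract': [], 'lead_paragraph': 'A lead paragraph.'},
])
demo['all_text'] = combine_text(demo, ['headline', 'abstract', 'lead_paragraph'])
print(list(demo['all_text']))
# ['Headline one An abstract.', 'Headline two A lead paragraph.']
```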


Remove duplicate entries and sort by publication date.

In [23]:
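df2 = df
df2["index"] = df2.index
# Newer pandas releases use `subset`/`keep` in place of `cols`/`take_last`
df2.drop_duplicates(subset='index', keep='last', inplace=True)
del df2["index"]
# `sort_values` replaces `DataFrame.sort` in newer pandas releases
df2 = df2.sort_values('pub_date')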
#df2
#df2.count()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from IPython.kernel.zmq import kernelapp as app
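Since the duplicates are identified by the index (`_id`), the temporary `index` column can be avoided by asking the index itself which labels are repeated. A sketch assuming a pandas version that provides `Index.duplicated` with the `keep` argument and `DataFrame.sort_values`; `dedupe_by_index` and the demo frame are made up for illustration:

```python
import pandas as pd

def dedupe_by_index(frame, date_column='pub_date'):
    """Keep the last row for each index label, then sort by `date_column`."""
    # duplicated(keep='last') marks every occurrence except the final one
    unique = frame[~frame.index.duplicated(keep='last')]
    return unique.sort_values(date_column)

# Illustrative data with the label 'x' appearing twice
demo = pd.DataFrame(
    {'pub_date': ['2014-03-01', '2014-01-01', '2014-03-01'],
     'headline': ['first copy', 'other', 'second copy']},
    index=['x', 'y', 'x'],
)
clean = dedupe_by_index(demo)
print(list(clean.index))     # ['y', 'x']
print(list(clean['headline']))  # ['other', 'second copy']
```

Because this builds a new frame rather than modifying `df` in place, the original data stays untouched.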


### Sentiment analysis

The Alchemy API (http://www.alchemyapi.com/) can be accessed using their Python SDK, available here: https://github.com/AlchemyAPI/alchemyapi_python

In [14]:
from alchemyapi import AlchemyAPI
alchemyapi = AlchemyAPI()

Here we're looking for sentiment targeted towards 'Apple' in our article text.

In [15]:
senti_all = []
for index, row in df2.iterrows():
    if 'Apple' in toker.tokenize(row['all_text']):
        response = alchemyapi.sentiment_targeted("text", row['all_text'], "Apple")
        response['id'] = index
        senti_all.append(response)
    else:
        senti_all.append({'id': index, 'status': 'None found'})
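The loop above only calls the API when 'Apple' occurs as a separate token, which is stricter than a substring test: a plain `'Apple' in text` would also match "Apples" or "Pineapple". The sketch below illustrates the difference with a simplified stand-in based on `re` (it is not the `RegexpTokenizer` pattern defined earlier and may split some edge cases differently); `mentions_token` is a name introduced here.

```python
import re

def mentions_token(text, target='Apple'):
    """True if `target` appears in `text` as a whole word-token."""
    # \w+ splits on anything that is not a letter, digit or underscore
    return target in re.findall(r'\w+', text)

print(mentions_token("Apple's Midlife Crisis"))   # True  (tokens: Apple, s, ...)
print(mentions_token('Apples Cook Makes Case'))   # False (token is 'Apples')
print('Apple' in 'Apples Cook Makes Case')        # True  (substring match)
```

This also shows why the first headline, whose apostrophe was stripped during ASCII conversion, relies on its other text fields to pass the check.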

Append results to dataframe, and change formats.

In [20]:
senti_df = pd.DataFrame(senti_all).set_index('id')

In [17]:
df2 = pd.concat([df2, senti_df], axis=1)
df2 = df2.dropna(subset = ['docSentiment'])
for index, row in df2.iterrows():
    # Give neutral results an explicit score of '0'
    if row['docSentiment']['type'] == 'neutral':
        row['docSentiment']['score'] = '0'
df2['pub_date'] = pd.to_datetime(df2['pub_date'])

KeyError: ['docSentiment']
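The `KeyError` above is raised by `dropna(subset=['docSentiment'])` when `senti_df` has no `docSentiment` column at all, that is, when none of the collected responses contained that key. Why that happened in this run is not shown; inspecting the `status` values in `senti_all` would be one way to find out. A defensive sketch, assuming response dicts shaped the way the code above expects (a `docSentiment` dict with `type` and an optional `score`); the helper name and the sample dicts are made up:

```python
def extract_sentiment(response):
    """Return (type, score) from one response dict, or None if it has no sentiment."""
    sentiment = response.get('docSentiment')
    if not isinstance(sentiment, dict):
        return None
    kind = sentiment.get('type')
    # Treat a missing score as 0.0, as the loop above does for neutral results
    score = float(sentiment.get('score', 0))
    return kind, score

# Illustrative responses, not real API output
responses = [
    {'id': 'a', 'status': 'OK', 'docSentiment': {'type': 'positive', 'score': '0.25'}},
    {'id': 'b', 'status': 'OK', 'docSentiment': {'type': 'neutral'}},
    {'id': 'c', 'status': 'None found'},
]
usable = {r['id']: extract_sentiment(r) for r in responses if extract_sentiment(r)}
print(usable)  # {'a': ('positive', 0.25), 'b': ('neutral', 0.0)}
```

Building the score and type columns from such a dictionary sidesteps the missing-column error, and an empty result makes it obvious that no sentiment came back.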

Add column with sentiment results.

In [None]:
docs = []
for index, row in df2.iterrows():
    docs.append({'id': index, 'score' : row['docSentiment']['score'], 'type': row['docSentiment']['type']})
docs_df = pd.DataFrame(docs).set_index('id')
df2 = pd.concat([df2, docs_df], axis=1)
df2

Save results to tsv file.

In [None]:
with open('nyt_apple_sentiment.tsv' , 'w') as f:
    f.write('index'+ '\t' +'date' + '\t' + 'score'+ '\t' + 'headline' + '\n')
    for index, row in df2.iterrows():
        f.write(index+ '\t' +str(row['pub_date']) + '\t' + row['score']+ '\t' + row['headline'] + '\n')
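Concatenating fields by hand breaks if a headline happens to contain a tab, or if `score` is not a string. The `csv` module handles quoting and conversion automatically. A sketch for Python 3, where `write_tsv` and the sample row are hypothetical and the column names follow the header written above:

```python
import csv
import io

def write_tsv(rows, handle):
    """Write dict rows to `handle` as tab-separated values with a header line."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(['index', 'date', 'score', 'headline'])
    for row in rows:
        writer.writerow([row['index'], row['date'], row['score'], row['headline']])

# Illustrative row whose headline contains a tab character
buffer = io.StringIO()
write_tsv([{'index': 'abc', 'date': '2014-01-21', 'score': 0.5,
            'headline': 'Tabs\tinside'}], buffer)
print(buffer.getvalue())
```

The field containing a tab is wrapped in quotes, so a TSV reader recovers it intact. For a DataFrame, `df2.to_csv('nyt_apple_sentiment.tsv', sep='\t')` is another option, though it writes every column.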