# NYTimes Comment Collection

API link: https://developer.nytimes.com/apis

API needs signup and an API key: https://developer.nytimes.com/get-started

Output: save comments to CSV for now. The volume shouldn't be too big.
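
A minimal sketch of the CSV output step, assuming the comments arrive as a list of dicts like the ones returned by the API functions below. The name `save_comments_to_csv` and the choice of the standard-library `csv` module are illustrative assumptions, not part of the original notebook.

```python
import csv

def save_comments_to_csv(comments, path):
    """Write a list of comment dicts to a CSV file and return the row count."""
    # Collect the union of keys in first-seen order, since not every
    # comment is guaranteed to carry the same fields.
    fieldnames = []
    for comment in comments:
        for key in comment:
            if key not in fieldnames:
                fieldnames.append(key)

    # newline='' is the setting the csv module recommends; utf-8 keeps
    # emoji and curly quotes in names and comment bodies intact.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(comments)
    return len(comments)
```

Missing fields are written as empty strings, so rows from different articles can share one file.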

Foreseeable issues: rate limiting. Rejected requests will probably need a wait and a retry.
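
One way the wait-and-retry idea could look, as a sketch only. `call_with_retry` is a hypothetical helper, and the retry count, the 10-second starting wait, and the doubling schedule are assumptions rather than values published by the API. It takes any zero-argument callable that returns an object with a `status_code` attribute.

```python
import time

def call_with_retry(fetch, max_retries=3, wait_seconds=10, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 429 or retries run out."""
    response = fetch()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Wait longer after each rejected attempt: 10s, 20s, 40s by default.
        sleep(wait_seconds * 2 ** attempt)
        response = fetch()
    return response
```

With `requests`, a call could be wrapped as `call_with_retry(lambda: requests.get(url))`. The last response is returned even if it is still a 429, so the caller decides what to do after the retries are exhausted.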

In [1]:
import os
import json
import requests
import time
import tqdm
import urllib.parse
from datetime import datetime

In [3]:
api_key = os.environ.get('NYT_API_KEY')  # the environment variable can have any name; this notebook uses NYT_API_KEY

## API Call Functions

In [3]:
def get_most_emailed_articles(days=7):
    """Return the most emailed articles over the last `days` days."""
    # can't use offset on this endpoint
    url = f'https://api.nytimes.com/svc/mostpopular/v2/emailed/{days}.json'
    response = requests.get(url, params={'api-key': api_key})
    if response.ok:
        print('ok')
        return response.json()['results']
    # Report the failure and return an empty list so callers can still iterate
    print(response.status_code, response.content)
    return []

In [4]:
def get_most_shared_articles(days=7):
    """Return the most shared articles over the last `days` days."""
    # can't use offset on this endpoint
    url = f'https://api.nytimes.com/svc/mostpopular/v2/shared/{days}.json'
    response = requests.get(url, params={'api-key': api_key})
    if response.ok:
        print('ok')
        return response.json()['results']
    # Report the failure and return an empty list so callers can still iterate
    print(response.status_code, response.content)
    return []

In [10]:
def get_front_page_articles(section='home'):
    """
    Return the top stories for a section.
    section could be home, us, world
    """
    url = f'https://api.nytimes.com/svc/topstories/v2/{section}.json'
    response = requests.get(url, params={'api-key': api_key})
    if response.ok:
        return response.json()['results']
    # Report the failure and return an empty list so callers can still iterate
    print(response.status_code, response.content)
    return []

In [19]:
def get_comments_for_article(article_url):
    """Return the first page of comments for an article, tagged with its URL."""
    url = 'https://api.nytimes.com/svc/community/v3/user-content/url.json'
    # Passing values via params lets requests URL-encode the article URL
    params = {'api-key': api_key, 'offset': 0, 'url': article_url}
    response = requests.get(url, params=params)
    if response.ok:
        comments = []
        for comment in response.json()['results']['comments']:
            comment['article_url'] = article_url
            comments.append(comment)
        return comments
    else:
        print(response.status_code, response.content)
        return []

def has_comment(article_url):
    """Return True if the article has at least one comment."""
    comments = get_comments_for_article(article_url)
    return len(comments) > 0
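
The request above always sends `offset=0`, so it only ever sees the first page of a thread. The sketch below pages through larger offsets. It assumes that the endpoint returns at most 25 comments per call and that a short page marks the end of the thread; neither is confirmed in this notebook. `collect_all_comments` and its `fetch_page` argument (a callable mapping an offset to a list of comments) are hypothetical names.

```python
import time

def collect_all_comments(fetch_page, page_size=25, max_pages=40, pause=0.0):
    """Gather comments page by page until a short or empty page appears."""
    all_comments = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size)
        all_comments.extend(batch)
        # A page shorter than page_size is taken to mean the end of the thread.
        if len(batch) < page_size:
            break
        time.sleep(pause)  # space out calls to stay under the rate limit
    return all_comments
```

The `max_pages` cap keeps a very long thread from consuming the whole request quota in one run.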

## Examples

In [8]:
most_emailed_articles = get_most_emailed_articles()  # fetch the list this cell relies on
most_emailed_articles_urls = [article['url'] for article in most_emailed_articles]
most_emailed_articles_urls

['https://www.nytimes.com/2021/12/17/nyregion/bomber-pilot-christmas-trees.html',
 'https://www.nytimes.com/2021/12/17/t-magazine/new-york-best-food-restaurants.html',
 'https://www.nytimes.com/2021/12/11/well/family/rude-child-development-behavior.html',
 'https://www.nytimes.com/article/testing-positive-covid-omicron-variant.html',
 'https://www.nytimes.com/2021/12/14/dining/new-yorks-top-10-new-restaurants-of-2021.html',
 'https://www.nytimes.com/2021/12/20/well/mind/how-to-declutter.html',
 'https://www.nytimes.com/2021/12/15/magazine/grieving-loss-closure.html',
 'https://www.nytimes.com/2021/12/14/science/james-webb-telescope-launch.html',
 'https://www.nytimes.com/2021/12/19/opinion/omicron-breakthroughs.html',
 'https://www.nytimes.com/2021/12/20/us/holocaust-librarian-elementary-school.html',
 'https://www.nytimes.com/2021/12/17/arts/music/classical-music-tommasini.html',
 'https://www.nytimes.com/2021/12/14/well/live/eye-drops-reading-glasses-fda.html',
 'https://www.nytimes.

In [9]:
for url in tqdm.tqdm(most_emailed_articles_urls[:5]):
    comments = get_comments_for_article(url)
    print(len(comments))
    time.sleep(1)

  0%|          | 0/5 [00:00<?, ?it/s]

ok
25


 20%|██        | 1/5 [00:01<00:06,  1.66s/it]

ok
25


 40%|████      | 2/5 [00:03<00:05,  1.71s/it]

ok
25


 60%|██████    | 3/5 [00:04<00:03,  1.65s/it]

ok
25


 80%|████████  | 4/5 [00:06<00:01,  1.59s/it]

ok
25


100%|██████████| 5/5 [00:07<00:00,  1.59s/it]


### Comment data structure

In [11]:
comments[0]

{'commentID': 115955886,
 'status': 'approved',
 'commentSequence': 115955886,
 'userID': 106268319,
 'userDisplayName': 'Peter',
 'userLocation': 'NY',
 'userTitle': 'NULL',
 'userURL': 'NULL',
 'picURL': None,
 'commentTitle': '<br\\//>',
 'commentBody': 'Love Pete’s writing, and recommendations. He’s steered me toward many delicious evenings, and I’m excited to try some of the restaurants on this list. However, if there’s one area I think Pete lacks some credibility in, it’s his judgement of plant based food - as evidenced by the inclusion of Cadence on this list. Being a former meat eater and now a vegetarian, I know how difficult it is to be truly discerning within the realm of the latter, but the heart of great plant based cuisine is no different than any other - fresh ingredients creatively cooked from scratch. Food that you know you can’t make yourself at home (part of the fun of going out, to me). In my two experience at Cadence that was definitely not the case. So it’s disapp

### Rate limit error

429 b'{"fault":{"faultstring":"Rate limit quota violation. Quota limit  exceeded. Identifier : 3a2ff40b-ecff-4334-815c-613a6333b1f9","detail":{"errorcode":"policies.ratelimit.QuotaViolation"}}}'
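
The error body above is JSON, so the fault code can be read out rather than matched as a string. This is a sketch based only on the single response shown here; `get_fault_code` is a hypothetical helper, and other error responses may be shaped differently.

```python
import json

def get_fault_code(body):
    """Return the fault errorcode from an error body, or None if absent."""
    try:
        payload = json.loads(body)  # accepts both str and bytes
    except (ValueError, TypeError):
        return None  # body was not valid JSON
    if not isinstance(payload, dict):
        return None
    fault = payload.get('fault')
    detail = fault.get('detail') if isinstance(fault, dict) else None
    return detail.get('errorcode') if isinstance(detail, dict) else None
```

A caller could compare the result with `'policies.ratelimit.QuotaViolation'` to decide whether waiting and retrying is worthwhile.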


In [95]:
import pandas as pd
pd.DataFrame(comments)

Unnamed: 0,commentID,status,commentSequence,userID,userDisplayName,userLocation,userTitle,userURL,picURL,commentTitle,...,editorsSelection,parentID,parentUserDisplayName,depth,commentType,trusted,recommendedFlag,permID,isAnonymous,article_url
0,114399051,approved,114399051,67222984,Sari,USA,,,,<br\//>,...,False,,,1,comment,0,0,114399051,False,https://www.nytimes.com/article/breakthrough-i...
1,114399007,approved,114399007,21960982,muser,danville ca,,,,<br\//>,...,False,,,1,comment,0,0,114399007,False,https://www.nytimes.com/article/breakthrough-i...
2,114388900,approved,114388900,20118907,DocCDN,Toronto,,,,<br\//>,...,False,,,1,comment,0,0,114388900,False,https://www.nytimes.com/article/breakthrough-i...
3,114399377,approved,114399377,25887268,EveT,New England,,,,<br\//>,...,False,,,1,comment,0,0,114399377,False,https://www.nytimes.com/article/breakthrough-i...
4,114397950,approved,114397950,70250914,Kevin,"Gardner, MA",,,,<br\//>,...,False,,,1,comment,0,0,114397950,False,https://www.nytimes.com/article/breakthrough-i...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,114393468,approved,114393468,71651962,JL,Bucks County PA,,,,<br\//>,...,False,,,1,comment,0,0,114393468,False,https://www.nytimes.com/article/breakthrough-i...
496,114394821,approved,114394821,87678737,Mooz,Liberal Island In Sea Of Red,,,,<br\//>,...,False,,,1,comment,0,0,114394821,False,https://www.nytimes.com/article/breakthrough-i...
497,114394144,approved,114394144,367276,French 🎻 🎻 Violin 🎻,New York 🗽🐜🦅❤️,,,,<br\//>,...,False,,,1,comment,0,0,114394144,False,https://www.nytimes.com/article/breakthrough-i...
498,114396033,approved,114396033,10664125,Max,NYC,,,,<br\//>,...,False,,,1,comment,0,0,114396033,False,https://www.nytimes.com/article/breakthrough-i...
