# Data gathering, problem statement, stakeholders, KPIs

## Data gathering

For each headline listed on [fivethirtyeight/politics/features](https://fivethirtyeight.com/politics/features/), both the main headline at the top and those under "Latest Politics", we store the type of post, its title, its url, the author(s), the date and time posted, the list of tags 538 assigns to the article, and the number of comments.  We scrape all headlines from the features pages except the live blogs, which don't have any comments.  The hardest part to scrape is the number of comments, since 538 uses the Facebook comments plugin, so the count only appears after JavaScript has rendered an iframe.  First we import the necessary Python modules.

In [1]:
# Get the html
import requests

# Parse the html
from bs4 import BeautifulSoup

# Render JavaScript to scrape the comments
from selenium import webdriver
from selenium.webdriver.common.by import By

# Delays (time.sleep) and timing the execution of the code
import time

# Get the date and time
from datetime import datetime

# For splitting using more than one delimiter
import re

# Makes a csv file quickly
import pandas as pd



Since scraping the comments is the hardest part, we write a function that does it.  It only works for posts from [fivethirtyeight.com/features](https://fivethirtyeight.com/features).  Each call takes a while, so the function prints messages to track its progress.  Any line in the code marked with the comment "for debugging" can be commented out.
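
To give a sense of why progress tracking matters, the arithmetic below estimates the total run time. The figure of about 21 seconds per post is read off the logged output further down, and the cap of 1000 posts comes from the scraping loop; both are approximations rather than guarantees.

```python
# Rough run-time estimate for the full scrape (all values approximate)
seconds_per_post = 21   # typical "Time elapsed" per post in the logged run
max_posts = 1000        # the scraping loop stops at about this many posts

total_seconds = seconds_per_post * max_posts
total_hours = total_seconds / 3600
print(f"Estimated run time: about {total_hours:.1f} hours")
```

About half of each call is the fixed 10-second `time.sleep` inside the function, so the estimate is dominated by deliberate delay rather than by parsing.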

In [2]:
# Input is the url of one of the features posts on fivethirtyeight.com/politics/features pages.
# Output is the number of comments on the post.
def num_comments_538_post(url):
    # Start the timer to time the execution of each iteration of this function
    start = time.time() # for debugging
    # Function only works when the input is a features article from fivethirtyeight.com
    print("Comments scraping current url:", url) # for debugging
    # Create a webdriver object with selenium that will get the required html    
    # Here Chrome will be used, but modifications to the code for other browsers exist
    driver = webdriver.Chrome()
    # Open the 538 webpage after 10 seconds
    time.sleep(10)
    driver.get(url)
    # Click the expand comments button
    driver.find_element(By.CLASS_NAME, "fte-expandable-icon").click()
    # Execute the JavaScript after clicking the button
    article_html = driver.execute_script("return document.documentElement.outerHTML;")
    # Close the 538 webpage
    driver.quit()
    # Parse the html
    article_soup = BeautifulSoup(article_html, "lxml")
    # Find the iframe corresponding to the comments
    comments_frame = article_soup.find('iframe', attrs = {'data-testid':"fb:comments Facebook Social Plugin"})
    # Get the source attribute in the iframe 
    comments_url = comments_frame['src']
    # Redefine the webdriver object (needed to avoid errors)
    driver = webdriver.Chrome()
    # Open the Facebook comments plugin url
    driver.get(comments_url)
    # Execute the JavaScript on that page
    comments_html = driver.execute_script("return document.documentElement.outerHTML;")
    # Close the comments page
    driver.quit()
    # Parse the rendered code
    comments_soup = BeautifulSoup(comments_html,"lxml")
    # Find the element that contains the number of comments
    number = comments_soup.find('span',  attrs = {'class':"_50f7"}).text.strip(" comments")
    print("The number of comments is "+str(number)+".") # for debugging
    # End the timer
    end = time.time() # for debugging
    print("Time elapsed:", end-start, "seconds\n") # for debugging 
    return number
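
The function above returns the count as a string, obtained by stripping the characters of `" comments"` from the label. A minimal sketch of an alternative, assuming the label always begins with the number (as in "27 comments"), is to pull the digits out with a regular expression and return an integer. The helper name `parse_comment_count` is hypothetical, and the handling of thousands separators is an assumption about how large counts might be displayed.

```python
import re

def parse_comment_count(label):
    """Return the integer at the start of a label such as '27 comments'.

    Hypothetical helper, not part of the original notebook. Returns None
    when the label does not start with a number.
    """
    match = re.match(r"\s*(\d[\d,]*)", label)
    if match is None:
        return None
    # Drop thousands separators (an assumed format) before converting
    return int(match.group(1).replace(",", ""))

print(parse_comment_count("27 comments"))
print(parse_comment_count("1 comment"))
```

Returning an integer means the "No. of comments" column can be sorted and averaged without a later conversion step.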

Now we extract the desired data from each headline on the [538 features](https://www.fivethirtyeight.com/politics/features) page(s): the main article and every post under "Latest Politics".  In the following code, the authors and tags are first collected as lists.  However, when we convert all the data into a data frame later, the data needs to have the right shape: a list of lists, one inner list per post, with no further nesting.  For authors and tags we therefore turn each list into a single string whose items are separated by semicolons instead of commas.  This makes it possible to create a `.csv` file with the data.
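
The flattening step can be sketched in isolation. The example below uses placeholder values (the author names, tags, title, url, and date are invented for illustration) and the standard `csv` module rather than pandas, to show that a row with semicolon-joined fields survives a round trip through CSV as exactly seven columns. The helper name `join_items` is hypothetical.

```python
import csv
import io

def join_items(items):
    """Flatten a list of strings into one semicolon-separated string."""
    return "; ".join(items)

# Placeholder values, not scraped data
authors = join_items(["Author One", "Author Two"])
tags = join_items(["Tag A", "Tag B", "Tag C"])
row = ["article", "Example title", "https://example.com/post",
       authors, "Jan. 1, 2023", tags, "5"]

# Write the flat row to an in-memory CSV file, then read it back
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
buffer.seek(0)
parsed = list(csv.reader(buffer))

print(parsed[0][3])    # the authors field, still a single column
print(len(parsed[0]))  # number of columns in the row
```

Splitting a field on `"; "` recovers the original list later, during data exploration.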

In [3]:
# Get the date and time to put in the name of the output file
now = datetime.now()

# Set timer for full execution
start_full = time.time() # for debugging

# How many pages of features to extract data from
features_num_pages = 110 #input("How many features pages to scrape?  Each has about 10 posts.  ")
#print("This code will scrape data from", features_num_pages, "page(s) worth of posts in 538's politics/features section.\n") # for debugging

# Here is where all the data will go
posts = []
# Get the data for each post
for i in range(features_num_pages): 
    print("\nPage "+str(i+1)+"...\n") # for debugging
    # Get the html for each headline
    features_url = "https://fivethirtyeight.com/politics/features/page/"+str(i+1)
    features_html = requests.get(features_url)
    # Parse the html
    features_soup = BeautifulSoup(features_html.content, "lxml")
    # Gather the data for each of the articles
    features = features_soup.find_all('h2', attrs = {'class':["article-title entry-title", "title entry-title"]})
    for post in features:
        # Get post title from the features page
        title = post.a.text.strip('\n''\t')
        # Get post url from the features page
        url = post.find('a').get('href')
        # Screen for live blogs, which don't have comments
        if "live-blog" in url:
            continue
        # Go to the url to get more data
        post_code = requests.get(url)
        post_soup = BeautifulSoup(post_code.content, "lxml")
        # Get author(s)
        author_bios = post_soup.find_all('div', attrs = {'class':"mini-bio"})
        if author_bios == []:
            authors = "None/All"
        else:    
            authors_list = []
            for author in author_bios:
                # Extract the author name
                to_extract = author.p.text
                to_extract_list = re.split(" is | reports", to_extract)
                authors_list.append(to_extract_list[0])
            authors = "; ".join(authors_list)
        # Get date and time of post
        date = post_soup.find('time').text.strip('\n''\t')
        # Get tags
        tags_list = []
        for tag in post_soup.find_all('a', attrs = {'class':"tag"}):
            tags_list.append(tag.text.split(" (")[0])
        tags = "; ".join(tags_list)
        # Use the tags to get the post type
        if "Politics Podcast" in tags:
            post_type = "podcast"
        else:    
            post_type = post.find('a').get('data-content-type') 
        if post_type is None:
            post_type = "feature"
        # Change the name "feature" to "article"    
        if post_type == "feature":
            post_type = "article"
        # Get number of comments
        num_comments = num_comments_538_post(url)    
        # Add all attributes to list
        posts.append([post_type, title, url, authors, date, tags, num_comments])
    # Stop once at least 1000 posts are collected (checked after each page)
    if len(posts) >= 1000:
        break

# End the timer for the full execution
end_full = time.time() # for debugging

# Compute time elapsed in seconds
total_time_seconds = end_full-start_full # for debugging
# In minutes 
total_time_minutes = total_time_seconds/60 # for debugging
if total_time_minutes < 60: # for debugging
    print("Total time elapsed =", total_time_minutes, "minutes") # for debugging
else: # for debugging
    # In hours
    total_time_hours = total_time_minutes/60 # for debugging
    # Print the time elapsed in hours
    print("Total time elapsed =", total_time_hours, "hours") # for debugging

# The data
print("Number of posts scraped:", len(posts)) # for debugging
#posts # for debugging


Page 1...

Comments scraping current url: https://fivethirtyeight.com/features/which-republican-candidate-should-biden-be-most-afraid-of/
The number of comments is 27.
Time elapsed: 20.897101879119873 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/partisan-gerrymandering-is-legal-again-in-north-carolina/
The number of comments is 6.
Time elapsed: 20.4435293674469 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-where-biden-stands-heading-into-2024/
The number of comments is 4.
Time elapsed: 20.892934560775757 seconds

Comments scraping current url: https://fivethirtyeight.com/features/the-real-reason-presidential-candidates-form-exploratory-committees/
The number of comments is 6.
Time elapsed: 21.110576629638672 seconds

Comments scraping current url: https://fivethirtyeight.com/features/asa-hutchinson-2024-republican-presidential-announcement/
The number of comments is 21.
Time elapsed: 21.09902548789978 seconds

The number of comments is 14.
Time elapsed: 20.97903299331665 seconds

Comments scraping current url: https://fivethirtyeight.com/features/2024-republican-reaction-trump-indictment/
The number of comments is 8.
Time elapsed: 20.762823820114136 seconds

Comments scraping current url: https://fivethirtyeight.com/features/trump-indictment-charges-chat/
The number of comments is 16.
Time elapsed: 21.71370530128479 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/progressives-won-in-the-midwest-while-trump-was-arrested-in-manhattan/
The number of comments is 10.
Time elapsed: 21.357578992843628 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/what-nyc-protesters-think-of-trumps-arrest/
The number of comments is 6.
Time elapsed: 20.591123580932617 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-the-takeaways-from-2023s-super-tuesday/
The number of comments is 6.
Time elapsed: 20.663459300994873 seconds

Comments scraping current url: https://fivethirtyeight.com/features/wisconsin-state-supreme-court-spending/
The number of comments is 6.
Time elapsed: 21.349160194396973 seconds

Comments scraping current url: https://fivethirtyeight.com/features/dads-caucus-parenting-survey/
The number of comments is 7.
Time elapsed: 21.975295782089233 seconds

Comments scraping current url: https://fivethirtyeight.com/features/ron-desantis-is-doubling-down-on-his-education-crusade-will-it-work-with-gop-voters-in-2024/
The number of comments is 16.
Time elapsed: 20.724247932434082 seconds


Page 10...

Comments scraping current url: https://fivethirtyeight.com/videos/republican-politicians-are-struggling-to-define-drag/
The number of comments is 15.
Time elapsed: 22.21114754676819 seconds

Comments scraping current url: https://fivethirtyeight.com/features/republicans-lawmakers-are-trying-to-ban-drag-first-they-have-to-define-it/
The number of comments is 8.
Time elapsed: 20.88834047317505 seconds

Co

Comments scraping current url: https://fivethirtyeight.com/features/which-republicans-could-vote-for-a-debt-ceiling-increase/
The number of comments is 12.
Time elapsed: 21.336053371429443 seconds

Comments scraping current url: https://fivethirtyeight.com/features/bipartisan-tiktok-bans-state-legislatures/
The number of comments is 3.
Time elapsed: 21.091973543167114 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/kyrsten-sinemas-odds-of-reelection-dont-look-great/
The number of comments is 5.
Time elapsed: 21.1814181804657 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-some-republicans-are-souring-on-aid-to-ukraine/
The number of comments is 5.
Time elapsed: 20.52989435195923 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/why-democrats-are-worried-about-2024-senate-elections/
The number of comments is 11.
Time elapsed: 21.233766555786133 seconds

Comments scraping current url: https:

The number of comments is 4.
Time elapsed: 21.799777030944824 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/do-gop-leaders-want-trump-in-2024/
The number of comments is 3.
Time elapsed: 21.20190143585205 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-the-elections-happening-in-2023/
The number of comments is 3.
Time elapsed: 21.011884212493896 seconds

Comments scraping current url: https://fivethirtyeight.com/features/are-blue-states-ready-to-relax-their-bans-on-later-abortions/
The number of comments is 10.
Time elapsed: 22.87244939804077 seconds

Comments scraping current url: https://fivethirtyeight.com/features/would-putting-south-carolina-first-give-black-democrats-a-stronger-voice/
The number of comments is 9.
Time elapsed: 21.23995876312256 seconds

Comments scraping current url: https://fivethirtyeight.com/features/which-parents-are-the-most-tired/
The number of comments is 5.
Time elapsed: 21.64988636

Comments scraping current url: https://fivethirtyeight.com/features/when-might-other-republicans-challenge-trump-for-the-2024-nomination/
The number of comments is 9.
Time elapsed: 20.743409872055054 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/did-the-jan-6-committee-succeed/
The number of comments is 7.
Time elapsed: 20.658615112304688 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-the-politics-of-prosecuting-trump/
The number of comments is 5.
Time elapsed: 20.880903005599976 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/the-number-of-election-denying-republicans-defined-the-2022-midterms/
The number of comments is 7.
Time elapsed: 21.75059151649475 seconds


Page 23...

Comments scraping current url: https://fivethirtyeight.com/videos/the-number-that-captures-the-impact-of-the-dobbs-decision/
The number of comments is 7.
Time elapsed: 22.365968465805054 seconds

Comments scrapi

The number of comments is 20.
Time elapsed: 22.99202036857605 seconds


Page 27...

Comments scraping current url: https://fivethirtyeight.com/features/rogers-democracy-on-ballot-1129/
The number of comments is 18.
Time elapsed: 20.661518812179565 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/if-biden-doesnt-run-in-2024-who-will/
The number of comments is 14.
Time elapsed: 22.011504411697388 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-where-the-georgia-runoff-stands/
The number of comments is 5.
Time elapsed: 20.931797742843628 seconds

Comments scraping current url: https://fivethirtyeight.com/features/few-midterm-voters-backed-different-parties-for-senate-and-governor/
The number of comments is 6.
Time elapsed: 20.942546129226685 seconds

Comments scraping current url: https://fivethirtyeight.com/features/what-can-the-2022-midterms-tell-us-about-2024/
The number of comments is 16.
Time elapsed: 20.96262836

Comments scraping current url: https://fivethirtyeight.com/features/the-3-big-questions-i-still-have-about-election-day/
The number of comments is 31.
Time elapsed: 21.10218048095703 seconds

Comments scraping current url: https://fivethirtyeight.com/features/ballot-measures-abortion/
The number of comments is 4.
Time elapsed: 21.78146457672119 seconds

Comments scraping current url: https://fivethirtyeight.com/features/why-democrats-shouldnt-take-the-asian-american-vote-for-granted/
The number of comments is 17.
Time elapsed: 21.334871530532837 seconds

Comments scraping current url: https://fivethirtyeight.com/features/control-of-the-senate-could-rest-on-abortion-and-inflation-in-nevada/
The number of comments is 11.
Time elapsed: 21.412578105926514 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/republicans-are-doing-notably-well-in-these-two-governors-races/
The number of comments is 6.
Time elapsed: 21.799954652786255 seconds


Page 32...

Comments scrap

The number of comments is 9.
Time elapsed: 20.822925567626953 seconds

Comments scraping current url: https://fivethirtyeight.com/features/state-government-elections/
The number of comments is 11.
Time elapsed: 20.97775435447693 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/a-quarter-of-latino-adults-dont-favor-any-midterm-candidate/
The number of comments is 4.
Time elapsed: 20.994978666305542 seconds


Page 36...

Comments scraping current url: https://fivethirtyeight.com/videos/how-much-do-campaign-ads-really-matter/
The number of comments is 2.
Time elapsed: 21.150009155273438 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/american-turning-point-politics-in-public-education/
The number of comments is 7.
Time elapsed: 20.712821006774902 seconds

Comments scraping current url: https://fivethirtyeight.com/features/most-candidates-who-think-2020-was-rigged-was-are-probably-going-to-win-in-november/
The number of comments is 55.
T

The number of comments is 10.
Time elapsed: 20.58345603942871 seconds


Page 40...

Comments scraping current url: https://fivethirtyeight.com/features/were-looking-for-a-freelance-audio-editor-for-our-politics-podcast/
The number of comments is 6.
Time elapsed: 20.533984899520874 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/who-is-john-fetterman/
The number of comments is 24.
Time elapsed: 21.204177856445312 seconds

Comments scraping current url: https://fivethirtyeight.com/features/2022-women-candidates-data/
The number of comments is 10.
Time elapsed: 21.12386155128479 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/how-our-midterm-forecast-takes-candidates-scandals-into-account/
The number of comments is 9.
Time elapsed: 20.984519958496094 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/why-our-model-thinks-democrats-are-unlikely-to-hold-the-house/
The number of comments is 7.
Time elapsed: 20.5533

The number of comments is 14.
Time elapsed: 20.763927698135376 seconds

Comments scraping current url: https://fivethirtyeight.com/features/latino-voters-shifted-right-in-2020-what-does-that-mean-for-arizona-and-nevada-this-year/
The number of comments is 15.
Time elapsed: 20.748008728027344 seconds

Comments scraping current url: https://fivethirtyeight.com/features/2022-candidates-race-data/
The number of comments is 14.
Time elapsed: 20.53258228302002 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/latino-voters-swung-right-in-2020-will-they-again-in-2022/
The number of comments is 7.
Time elapsed: 20.619486093521118 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/lindsey-graham-and-chuck-schumer-have-opposite-midterm-strategies/
The number of comments is 4.
Time elapsed: 21.554381370544434 seconds

Comments scraping current url: https://fivethirtyeight.com/features/wisconsin-senate-polls/
The number of comments is 14.
Time elaps

Comments scraping current url: https://fivethirtyeight.com/features/meet-6-democrats-of-color-who-want-to-see-their-party-change/
The number of comments is 10.
Time elapsed: 21.227624893188477 seconds

Comments scraping current url: https://fivethirtyeight.com/features/why-republican-voters-support-ballot-initiatives-their-red-states-do-not/
The number of comments is 18.
Time elapsed: 21.129876613616943 seconds

Comments scraping current url: https://fivethirtyeight.com/features/more-democrats-are-leaving-the-house-and-that-could-help-republicans-win/
The number of comments is 8.
Time elapsed: 21.073489665985107 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-whats-behind-bidens-rising-approval/
The number of comments is 10.
Time elapsed: 20.695841073989868 seconds

Comments scraping current url: https://fivethirtyeight.com/features/trumps-endorsees-have-started-losing-more-but-dont-read-into-that-for-2024/
The number of comments is 10.
Tim

The number of comments is 4.
Time elapsed: 20.714371919631958 seconds

Comments scraping current url: https://fivethirtyeight.com/features/can-progressive-candidates-in-minnesota-vermont-and-wisconsin-win-their-primaries/
The number of comments is 8.
Time elapsed: 20.85115671157837 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-could-the-inflation-reduction-act-save-bidens-approval-rating/
The number of comments is 11.
Time elapsed: 20.993164539337158 seconds


Page 53...

Comments scraping current url: https://fivethirtyeight.com/videos/wisconsin-spent-months-investigating-the-2020-election-these-candidates-still-believe-it-was-fraudulent/
The number of comments is 20.
Time elapsed: 21.047111749649048 seconds

Comments scraping current url: https://fivethirtyeight.com/features/3-republican-primaries-and-a-special-election-to-watch-in-minnesota-and-wisconsin/
The number of comments is 5.
Time elapsed: 20.87253975868225 seconds

Comments sc

Comments scraping current url: https://fivethirtyeight.com/features/at-least-120-republicans-who-deny-the-2020-election-results-will-be-on-the-ballot-in-november/
The number of comments is 22.
Time elapsed: 21.462868213653564 seconds


Page 57...

Comments scraping current url: https://fivethirtyeight.com/videos/wait-who-are-kelly-schulz-and-dan-cox/
The number of comments is 13.
Time elapsed: 22.182283401489258 seconds

Comments scraping current url: https://fivethirtyeight.com/features/one-of-the-few-things-americans-agree-on-space-is-cool/
The number of comments is 3.
Time elapsed: 20.307780027389526 seconds

Comments scraping current url: https://fivethirtyeight.com/features/biden-is-very-unpopular-it-may-not-tell-us-much-about-the-midterms/
The number of comments is 12.
Time elapsed: 21.97046709060669 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/will-democrats-continue-to-win-in-georgia-in-2022/
The number of comments is 4.
Time elapsed: 21.9358413219

Comments scraping current url: https://fivethirtyeight.com/videos/the-new-legal-fault-lines-for-abortion/
The number of comments is 6.
Time elapsed: 26.538220643997192 seconds

Comments scraping current url: https://fivethirtyeight.com/features/emergency-politics-podcast-supreme-court-overturns-roe-v-wade/
The number of comments is 5.
Time elapsed: 30.059600830078125 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/supreme-court-gun-rights/
The number of comments is 6.
Time elapsed: 25.729260683059692 seconds

Comments scraping current url: https://fivethirtyeight.com/features/roe-v-wade-defined-an-era-the-supreme-court-just-started-a-new-one/
The number of comments is 25.
Time elapsed: 28.372469902038574 seconds

Comments scraping current url: https://fivethirtyeight.com/features/the-supreme-courts-argument-for-overturning-roe-v-wade/
The number of comments is 23.
Time elapsed: 32.37429857254028 seconds

Comments scraping current url: https://fivethirtyeight.

The number of comments is 5.
Time elapsed: 21.574033975601196 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/what-would-make-the-jan-6-hearings-change-minds/
The number of comments is 9.
Time elapsed: 21.491077661514282 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/the-democratic-divides-in-california/
The number of comments is 2.
Time elapsed: 21.767457246780396 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-the-california-primary-races-to-watch/
The number of comments is 3.
Time elapsed: 21.412523984909058 seconds

Comments scraping current url: https://fivethirtyeight.com/features/suicide-prevention-could-prevent-mass-shootings/
The number of comments is 8.
Time elapsed: 21.101152896881104 seconds


Page 66...

Comments scraping current url: https://fivethirtyeight.com/features/why-high-turnout-in-georgia-doesnt-mean-voting-restrictions-havent-had-an-effect/
The number of comments

Comments scraping current url: https://fivethirtyeight.com/features/how-every-senator-and-governor-ranks-according-to-popularity-above-replacement/
The number of comments is 9.
Time elapsed: 21.37013077735901 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/how-critical-race-theory-became-part-of-the-culture-war/
The number of comments is 8.
Time elapsed: 20.904179334640503 seconds


Page 70...

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-the-politics-of-anti-critical-race-theory-laws/
The number of comments is 7.
Time elapsed: 21.110110998153687 seconds

Comments scraping current url: https://fivethirtyeight.com/features/why-trans-rights-became-the-gops-latest-classroom-target/
The number of comments is 16.
Time elapsed: 21.728649139404297 seconds

Comments scraping current url: https://fivethirtyeight.com/features/trumps-candidate-lost-in-nebraska-but-trump-is-still-winning-most-of-his-primaries/
The number of comment

Comments scraping current url: https://fivethirtyeight.com/features/more-than-70-percent-of-trumps-endorsees-believe-the-2020-election-was-fraudulent/
The number of comments is 23.
Time elapsed: 21.637430667877197 seconds


Page 74...

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-how-old-is-too-old-for-elected-office/
The number of comments is 6.
Time elapsed: 21.269503831863403 seconds

Comments scraping current url: https://fivethirtyeight.com/features/it-can-already-take-weeks-to-get-an-abortion/
The number of comments is 13.
Time elapsed: 22.076496362686157 seconds

Comments scraping current url: https://fivethirtyeight.com/features/what-happens-when-an-election-official-believes-the-big-lie/
The number of comments is 23.
Time elapsed: 21.069854974746704 seconds

Comments scraping current url: https://fivethirtyeight.com/features/do-americans-care-about-the-latest-covid-19-wave-in-the-northeast/
The number of comments is 10.
Time elapsed: 21.

The number of comments is 12.
Time elapsed: 21.500108242034912 seconds


Page 78...

Comments scraping current url: https://fivethirtyeight.com/videos/democrats-gamble-with-their-nevada-gerrymander/
The number of comments is 9.
Time elapsed: 21.163382530212402 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/title-ix-and-the-fight-for-womens-equality-in-sports/
The number of comments is 8.
Time elapsed: 21.54646921157837 seconds

Comments scraping current url: https://fivethirtyeight.com/features/why-early-senate-and-governor-polls-have-plenty-to-tell-us-about-november/
The number of comments is 7.
Time elapsed: 21.453094005584717 seconds

Comments scraping current url: https://fivethirtyeight.com/features/ketanji-brown-jacksons-nomination-may-not-be-enough-to-turn-out-black-voters-for-democrats/
The number of comments is 11.
Time elapsed: 20.88355779647827 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/americans-liked-daylight-savi

The number of comments is 3.
Time elapsed: 21.320149660110474 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/what-democrats-and-republicans-think-of-russias-invasion-of-ukraine/
The number of comments is 3.
Time elapsed: 22.38959002494812 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-americans-are-unified-against-russias-invasion-of-ukraine/
The number of comments is 5.
Time elapsed: 21.04315686225891 seconds

Comments scraping current url: https://fivethirtyeight.com/features/where-ohio-republicans-attacks-on-dr-fauci-came-from/
The number of comments is 10.
Time elapsed: 21.116266012191772 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/texas-may-have-the-worst-gerrymander-in-the-country/
The number of comments is 7.
Time elapsed: 21.5452778339386 seconds

Comments scraping current url: https://fivethirtyeight.com/features/how-to-watch-texass-primaries-like-a-pro/
The number of comm

The number of comments is 14.
Time elapsed: 21.658075094223022 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-why-the-gop-has-made-gains-with-latino-voters/
The number of comments is 3.
Time elapsed: 21.241938591003418 seconds

Comments scraping current url: https://fivethirtyeight.com/features/how-a-court-ruling-in-alabama-could-boost-black-political-power-throughout-the-south/
The number of comments is 12.
Time elapsed: 21.127323627471924 seconds

Comments scraping current url: https://fivethirtyeight.com/features/why-democrats-keep-losing-culture-wars/
The number of comments is 49.
Time elapsed: 20.987796306610107 seconds

Comments scraping current url: https://fivethirtyeight.com/features/its-harder-than-ever-to-confirm-a-supreme-court-justice/
The number of comments is 9.
Time elapsed: 21.45792818069458 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/the-7-most-important-senate-races-ranked/
The number of co

The number of comments is 8.
Time elapsed: 21.473565578460693 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/how-the-2022-midterms-might-play-out/
The number of comments is 9.
Time elapsed: 21.30981755256653 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-will-exhausted-americans-tune-out-from-politics/
The number of comments is 8.
Time elapsed: 21.142656564712524 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/do-you-buy-that-jan-6-strengthened-trumps-hold-on-the-republican-party/
The number of comments is 5.
Time elapsed: 21.38621211051941 seconds

Comments scraping current url: https://fivethirtyeight.com/features/why-democrats-keep-bringing-up-voting-rights/
The number of comments is 15.
Time elapsed: 20.9342839717865 seconds

Comments scraping current url: https://fivethirtyeight.com/features/what-might-democrats-voting-rights-bill-entail/
The number of comments is 7.
Time elapsed:

The number of comments is 22.
Time elapsed: 21.174745082855225 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/why-stacey-abrams-and-beto-orourke-are-going-for-it-in-2022/
The number of comments is 7.
Time elapsed: 21.33620285987854 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-most-americans-dont-blame-god-for-all-the-bad-stuff-that-keeps-happening/
The number of comments is 3.
Time elapsed: 21.433066606521606 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/do-you-buy-that-stacey-abrams-and-beto-orourke-will-win-their-governors-races/
The number of comments is 8.
Time elapsed: 21.889546394348145 seconds


Page 95...

Comments scraping current url: https://fivethirtyeight.com/features/what-2021s-biggest-upset-elections-tell-us-about-the-losing-parties/
The number of comments is 6.
Time elapsed: 20.962190866470337 seconds

Comments scraping current url: https://fivethirtyeight.com/featu

The number of comments is 14.
Time elapsed: 21.529645681381226 seconds

Comments scraping current url: https://fivethirtyeight.com/features/how-republicans-won-the-virginia-governors-race/
The number of comments is 32.
Time elapsed: 21.29790449142456 seconds

Comments scraping current url: https://fivethirtyeight.com/features/how-the-supreme-court-could-make-it-easier-to-carry-guns-in-public/
The number of comments is 7.
Time elapsed: 141.2483263015747 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-a-good-election-night-for-republicans/
The number of comments is 3.
Time elapsed: 21.07268190383911 seconds


Page 99...

Comments scraping current url: https://fivethirtyeight.com/videos/election-day-2021-virginia-and-beyond/
The number of comments is 3.
Time elapsed: 22.23638653755188 seconds

Comments scraping current url: https://fivethirtyeight.com/features/the-u-s-has-a-lot-of-guns-involved-in-crimes-but-very-little-data-on-where-they-came

Comments scraping current url: https://fivethirtyeight.com/videos/is-government-about-to-regulate-facebook/
The number of comments is 0.
Time elapsed: 21.749254941940308 seconds

Comments scraping current url: https://fivethirtyeight.com/features/politics-podcast-the-politics-of-the-debt-ceiling/
The number of comments is 0.
Time elapsed: 20.803318977355957 seconds

Comments scraping current url: https://fivethirtyeight.com/features/kyrsten-sinema-is-confounding-her-own-party-but-why/
The number of comments is 0.
Time elapsed: 21.535048484802246 seconds


Page 103...

Comments scraping current url: https://fivethirtyeight.com/features/americans-want-the-government-to-act-on-climate-change-whats-the-hold-up/
The number of comments is 0.
Time elapsed: 22.042104959487915 seconds

Comments scraping current url: https://fivethirtyeight.com/videos/why-the-u-s-was-unprepared-for-covid-according-to-the-former-fda-chief/
The number of comments is 0.
Time elapsed: 21.60987401008606 seconds

Comm

AttributeError: 'NoneType' object has no attribute 'text'
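
The `AttributeError` above is what Python raises when `.text` is read from `None`, which is what BeautifulSoup's `find` returns when no matching element exists. The log is cut off just before the error, so which lookup failed is not shown. A minimal sketch of one way to keep a long run alive, assuming a missing element should be recorded as `None` rather than stop the scrape, is a small guard. The names `text_or_none` and `FakeTag` are hypothetical; `FakeTag` only stands in for a parsed element so the example runs without a browser.

```python
def text_or_none(element):
    """Return element.text stripped of whitespace, or None if element is None."""
    if element is None:
        return None
    return element.text.strip()

class FakeTag:
    """Stand-in for a parsed HTML element, used only for this demonstration."""
    def __init__(self, text):
        self.text = text

print(text_or_none(FakeTag(" 27 comments ")))  # element found
print(text_or_none(None))                      # element missing
```

Wrapping each `find(...)` result this way lets the loop append a row with a missing value and move on, so the posts already scraped are not lost to a single unexpected page.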

Now we put the scraped data into a data frame and save it to a `.csv` file to use in the data exploration phase.

In [4]:
# Use pandas to make a data frame 
df = pd.DataFrame(posts)
df.columns = ["Post type", "Title", "Post url", "Author(s)", "Date and time posted", "Tags", "No. of comments"]
# Then save it as a .csv file, with the index column removed
df.to_csv("ProblemStatementOutputs/"+str(len(posts))+"_"+now.strftime("%d-%m-%Y_%H-%M-%S")+".csv", index = False)

The name of the file has the form (number of posts)\_(day)-(month)-(year in 4 digits)\_(hour in 24-hour time)-(minute)-(seconds), followed by the `.csv` extension.
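
As a small illustration of that naming scheme, the sketch below builds a file name with the same `strftime` pattern used in the cell above. The function name `output_filename`, the post count, and the timestamp are hypothetical values chosen for the example.

```python
from datetime import datetime

def output_filename(num_posts, when):
    """Build '<num_posts>_<DD-MM-YYYY>_<HH-MM-SS>.csv' from a count and a datetime."""
    return str(num_posts) + "_" + when.strftime("%d-%m-%Y_%H-%M-%S") + ".csv"

# Hypothetical example: 250 posts saved on 7 April 2023 at 14:05:09
example_time = datetime(2023, 4, 7, 14, 5, 9)
print(output_filename(250, example_time))  # 250_07-04-2023_14-05-09.csv
```

Putting the post count first makes it easy to tell partial runs from full ones at a glance, while the timestamp keeps repeated runs from overwriting each other.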

## Problem statement

Which 538 features posts get the most traffic?

## Stakeholders

News has become more polarized and sensationalized in recent years, all in the name of more clicks. This data analysis could give news organizations some insight into what kinds of articles and other content (podcasts and videos) get more traffic, without their having to compromise their neutrality and factual correctness.

## Key performance indicators (KPIs)

- Number of comments a post gets, used here as a proxy for traffic