# Fox News (Tucker Carlson) Web Scraping

All of the transcripts for Tucker Carlson Tonight are listed on a single page. That page also links to other media related to the show, so the code below filters the links to keep only the full show transcripts. Because the page uses a "show more" button to expand the number of visible items, Selenium is used for the scraping.

In [37]:
#importing libraries
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
import re
import os

## Connect to Chrome web driver

In [2]:
driver = webdriver.Chrome(executable_path=r'C:/Users/lambe/Downloads/chromedriver_win32/chromedriver.exe')

## Connect to Tucker Carlson Transcript page and display more results

In [3]:
driver.get('https://www.foxnews.com/category/shows/tucker-carlson-tonight/transcript')

In [4]:
# Click "show more" repeatedly to load more results.
# A counter caps the number of iterations (otherwise the loop could run indefinitely).
# Takes about 5 minutes to run because of the pause between clicks (needed to avoid overloading the website).

import time

time.sleep(3)
iter_count = 0
while iter_count < 75:
    try:
        show_more = driver.find_element_by_xpath("//div[@class='button load-more js-load-more']")
        show_more.click()
        iter_count += 1
        time.sleep(3)
    except Exception:
        # Stop when the button can no longer be found or clicked
        break
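
The capped click loop above can also be written as a small helper that returns how many clicks it made, which makes it possible to tell whether it stopped because the button disappeared or because the cap was reached. This is only a sketch and not part of the original notebook: `click_until_gone` and `FakePage` are hypothetical names, and the Selenium call is replaced by a stand-in so the example runs without a browser.

```python
import time
from typing import Callable


def click_until_gone(click_once: Callable[[], bool],
                     max_clicks: int = 75,
                     pause_seconds: float = 3.0) -> int:
    """Call click_once until it returns False or max_clicks is reached.

    Returns the number of successful clicks.
    """
    clicks = 0
    while clicks < max_clicks and click_once():
        clicks += 1
        time.sleep(pause_seconds)  # pause between clicks to go easy on the site
    return clicks


class FakePage:
    """Hypothetical stand-in for the browser: the button works a fixed number of times."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining

    def click_show_more(self) -> bool:
        if self.remaining == 0:
            return False  # button is no longer present
        self.remaining -= 1
        return True


page = FakePage(remaining=4)
clicks = click_until_gone(page.click_show_more, max_clicks=75, pause_seconds=0)
print(clicks)
```

With Selenium, `click_once` would be a function that finds the "show more" button, clicks it and returns `True`, and returns `False` when the lookup or the click raises. If the returned count equals `max_clicks`, more results may still be hidden on the page.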

In [32]:
# Code to "show more" manually
#btn = driver.find_element_by_xpath("//div[@class='button load-more js-load-more']") 
#btn.click()

## Scrape the list of URLs from the Tucker Carlson Transcript page

In [5]:
articles = driver.find_elements_by_xpath('//div[@class="content article-list"]//article[@class="article"]//div[@class="m"]//a[@href]')

# Collect the URLs of the show transcripts

url_list = []
for article in articles:
    href = article.get_attribute("href")
    # Keep only links to full show transcripts (and not the monologue transcripts)
    if "transcript" in href and "monologue" not in href:
        url_list.append(href)

print(url_list)
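
The link filter can be checked without a browser by pulling it into a plain function. The sketch below is an illustration under stated assumptions: the function names and the `example.com` URLs are hypothetical, and the de-duplication step is an addition that the original cell does not perform (it may help if the page links the same article more than once).

```python
from typing import Iterable, List, Optional


def is_full_show_transcript(url: str) -> bool:
    """Same rule as the cell above: 'transcript' in the URL, 'monologue' not in it."""
    return "transcript" in url and "monologue" not in url


def select_transcript_urls(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Filter hrefs and drop duplicates while preserving page order."""
    seen = set()
    selected = []
    for href in hrefs:
        # Skip missing hrefs, non-transcript links, and repeats
        if href and is_full_show_transcript(href) and href not in seen:
            seen.add(href)
            selected.append(href)
    return selected


# Hypothetical sample links, not taken from the real page
sample_hrefs = [
    "https://example.com/transcript/show-one-transcript",
    "https://example.com/transcript/host-monologue-transcript",
    "https://example.com/video/some-clip",
    "https://example.com/transcript/show-one-transcript",
    None,
    "https://example.com/transcript/show-two-transcript",
]

filtered = select_transcript_urls(sample_hrefs)
print(filtered)
```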




## Scrape the date and transcript text from each transcript page

In [33]:
# Scrape the date and transcript text from the first 5 transcript pages
transcript_dates = []
transcripts = []
for url in url_list[:5]:
    driver.get(url)
    date = driver.find_element_by_tag_name("time").text
    transcript_text = driver.find_elements_by_class_name("speakable")
    transcript_dates.append(date)
    # Index 1 (the second "speakable" element) is used as the transcript text
    transcripts.append(transcript_text[1].text)
    print(transcript_text[1].text)
    

TUCKER CARLSON, FOX NEWS CHANNEL HOST: Good evening and welcome to TUCKER CARLSON TONIGHT.

When the Russian military invaded Ukraine last month, the most highly credentialed people in the world seemed stunned by it and that was not very reassuring to the rest of us.

Quote: "It was a shock to many of the leading experts and policymakers in the United States, Europe and even Ukraine," explained a fellow expert and policymaker at the Atlantic Council. Quote: "The head of German intelligence was so caught off guard that he was still in Kyiv and had to be evacuated."

That's pretty weird if you think about it, because for weeks, Joe Biden had been speaking in a very loud voice about a potential Russian invasion of Ukraine. They seemed ready for it and yet it turns out that nobody in Washington, including Biden himself, really thought it was going to happen and when it did happen, official Washington concluded that Putin must be insane.

"The casual speculation about Vladimir Putin's menta

TUCKER CARLSON, HOST: Good evening, and welcome to TUCKER CARLSON TONIGHT.

For decades and decades, the Human Rights Campaign has been, by far, the most powerful gay rights lobby in Washington. You may have just heard of them recently, but they've been around for over 40 years, and for most of that time, HRC's central goal, and they said it many times, was winning the right of gay people to get legally married.

Then, in the summer of 2015, they finally succeeded. They reached their goal. The Supreme Court issued a decision in a case called Obergefell versus Hodges, and overnight, all 50 states were required by law to recognize same-sex marriage.

So, for the Human Rights Campaign, this should have been a moment of unbridled celebration, a dream come true, but it wasn't. It was a crisis and if you don't understand why it was a crisis, then you don't live in Washington surrounded by non-profits.

So, by this point, the Human Rights Campaign had evolved from a scrappy little lobby into 

TUCKER CARLSON, FOX NEWS CHANNEL HOST: Well, good evening and welcome to TUCKER CARLSON TONIGHT.

We're trying to find things to be happy about and actually, if you look hard enough, there are a lot of them. One of the best things about, say, a Supreme Court confirmation hearing is that you get to see the U.S. Senate in action and this is new.

If you're like most people, you know the Senate has a hundred members or two from every state, and you know that they are somehow important. They're in the Constitution. So, they wear dark suits and red ties to work. They talk about laws. Every summer, they fly to foreign countries and act like they're President.

Some of them you may even know by name -- the famous ones like Chuck Schumer and Mitch McConnell, obviously, the publicity addicts like Lindsey Graham and of course, the ones you see on your own ballot every six years. So, that's what you know.

But do you really know these people? Who are they, really? What are they like? Well, unless

TUCKER CARLSON, FOX NEWS CHANNEL HOST: Good evening and welcome to TUCKER CARLSON TONIGHT.

In the Senate Judiciary Committee yesterday, Marsha Blackburn of Tennessee asked Joe Biden's Supreme Court nominee, Ketanji Jackson, what should have been the easiest question ever posed. "Can you define what a woman is?"

Now imagine, put yourself in her position and imagine how relieved Jackson must have been when she heard that question. Here, she stayed up all night trying to memorize obscure case law from 19th Century and Court precedent and when she finally gets to the hearing room, all the Republicans want is a recap of Day One of ninth grade Biology: What's a woman?

For a world-famous scholar like Ketanji Jackson, that should have been effortless. Talk about slow and steady right down the plate. "A woman," Jackson might have said, looking incredulous, "That's simple. A woman is a human being with two X chromosomes. Ask any geneticist. It's detectable in a blood test, but if you're still

TUCKER CARLSON, FOX NEWS CHANNEL HOST: Good evening and welcome to TUCKER CARLSON TONIGHT.

As if this country's core institutions have not been degraded or diminished enough with pregnant flight suits and F.B.I. that behaves like Nancy Pelosi's Praetorian Guard, as if that weren't bad enough, Joe Biden announced in January his plan to choose the next Supreme Court Justice on the basis of appearance, the Supreme Court, you never thought that would happen.

Sociology Department maybe, your company perhaps, but the Supreme Court really matters. What does appearance have to do with ability or fealty to the Constitution? Joe Biden never explained. He did indicate, to be fair, that he would prefer a lawyer for the job, maybe even a clever one. But mostly he said he wanted a Black woman. Genetics being the single most important factor in what we used to call judicial temperament.

How does that work exactly? How do genes determine your ability as a Supreme Court Justice or a surgeon or an ai

In [43]:
#print(transcript_dates[:5])
print(len(transcripts[:5]))

test_string = transcripts[1]
print(type(test_string))

5
<class 'str'>


In [44]:
# Write one transcript to a text file as a test
filename = "test_tucker_carlson_transcript.txt"
repo_path = os.path.dirname(os.getcwd())
output_path = os.path.join(repo_path, "data", "01-raw", "tucker_carlson", filename)

#print(output_path)

# Export the file (explicit UTF-8 so non-ASCII characters in the transcript do not fail on Windows)
with open(output_path, "w", encoding="utf-8") as f:
    f.write(transcripts[1])
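
The cell above writes a single test file. A natural next step would be to write every scraped transcript to its own file; the following is a minimal sketch of one way to do that, not the notebook's own method. The helper names, the file-naming scheme, and the sample date strings are all assumptions, since the exact format of the scraped dates is not shown here.

```python
import os
import re
from typing import List, Sequence


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumeric characters with '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    return slug or "undated"


def write_transcripts(dates: Sequence[str],
                      texts: Sequence[str],
                      out_dir: str,
                      prefix: str = "tucker_carlson") -> List[str]:
    """Write each transcript to its own UTF-8 text file and return the paths."""
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if needed
    paths = []
    for index, (date, text) in enumerate(zip(dates, texts)):
        # The index keeps file names unique even if two dates are identical
        name = f"{prefix}_{index:03d}_{slugify(date)}_transcript.txt"
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    return paths
```

In the notebook this could be called as `write_transcripts(transcript_dates, transcripts, os.path.join(repo_path, "data", "01-raw", "tucker_carlson"))`, reusing the lists and `repo_path` defined in the cells above.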