# Module 2: Reddit & Bing Search APIs

This demo shows how to use the Reddit API and the Bing Search API to pull news articles and posts as sources of external data.

First, I will show how to create a Reddit personal use script for accessing the Reddit API. This requires a Reddit account; if you don't have one, follow along using the provided Excel file.

Then we will each create a university account on Azure and create a Bing Search resource to access the Bing Search API.

Use this link to create a personal use script for the Reddit API: [Reddit app preferences](https://www.reddit.com/prefs/apps)

## Load in Dependencies (pip install praw)

In [1]:
import praw

from datetime import datetime
from datetime import date
import os
import pandas as pd
import re
import string

## Specify Reddit credentials and subreddits to be scraped

In [9]:
#Create a Reddit instance
reddit = praw.Reddit(client_id= os.environ['REDDIT_CLIENT_ID'],
                     client_secret= os.environ['REDDIT_CLIENT_SECRET'],
                     user_agent='reddit_appv1')


# Credentials are read from environment variables rather than hard-coded in the notebook.
# A separate config file or a key vault are other options; never commit secrets to source control.
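
If one of the environment variables is not set, `os.environ[...]` fails with a bare `KeyError`. The sketch below shows one way to check for both variables up front and report which are missing. The helper name `load_reddit_credentials` and the `REQUIRED_VARS` tuple are hypothetical, introduced here only for illustration.

```python
import os

# Names of the environment variables used in the cell above
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def load_reddit_credentials(env=None):
    """Return the Reddit credentials found in `env`, or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
    }


# Example with a stand-in mapping (placeholder values, not real credentials)
creds = load_reddit_credentials(
    {"REDDIT_CLIENT_ID": "abc", "REDDIT_CLIENT_SECRET": "xyz"}
)
print(sorted(creds))
```

Called with no argument, the helper reads from `os.environ`, and the result could then be passed along as `praw.Reddit(user_agent='reddit_appv1', **creds)`.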

In [5]:
# Specify the subreddit names you want to retrieve posts from
left_subreddit_names = ['politics', 'democrats', 'liberal']
right_subreddit_names = ['conservative', 'libertarian', 'republican']

In [6]:
subreddit_names = ['politics', 'democrats', 'liberal','conservative', 'libertarian', 'republican']
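
The combined list repeats the names already in the left and right lists, so the copies can drift apart when a subreddit is added. As a minimal sketch, the combined list and a name-to-leaning lookup can both be derived from the two lists:

```python
left_subreddit_names = ['politics', 'democrats', 'liberal']
right_subreddit_names = ['conservative', 'libertarian', 'republican']

# Combined list, in the same order as the hand-written version
subreddit_names = left_subreddit_names + right_subreddit_names

# Lookup from subreddit name to political leaning, built from the same lists
subreddit_political_leanings = {
    **{name: 'left' for name in left_subreddit_names},
    **{name: 'right' for name in right_subreddit_names},
}

print(subreddit_names)
print(subreddit_political_leanings['libertarian'])
```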

## Pull in selected Post Attributes, store and convert to dataframe

In [10]:
post_attributes = []# create an empty post_attributes list

for subreddit_name in subreddit_names:
    subreddit = reddit.subreddit(subreddit_name)# set subreddits
    posts = subreddit.top(limit = 100) # set post parameters

    for post in posts: # pull in the following post attributes
        post_attributes.append({
            'Title': post.title,
            'Content': post.selftext,
            'URL': post.url,
            'Date': datetime.utcfromtimestamp(post.created_utc).strftime('%Y-%m-%d'),
            'Provider': subreddit_name
        })

df_red = pd.DataFrame(post_attributes)  # create dataframe

df_red['All_Text'] = df_red['Title'] + " " + df_red['Content']  # create all_text column
df_red['Source'] = 'Reddit'  # create source column

# display dataframe (after the new columns have been added)
df_red.head(10)
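
The row-building step can be tried without network access or credentials. The sketch below moves it into a small function and feeds it a stand-in object; `post_to_row` and `fake_post` are hypothetical names, and `SimpleNamespace` only mimics the four attributes read from a real PRAW post. It also uses a timezone-aware `datetime.fromtimestamp(..., tz=timezone.utc)` in place of `datetime.utcfromtimestamp`, which is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

import pandas as pd


def post_to_row(post, subreddit_name):
    """Map one post-like object to the row layout used for the DataFrame."""
    created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
    return {
        'Title': post.title,
        'Content': post.selftext,
        'URL': post.url,
        'Date': created.strftime('%Y-%m-%d'),
        'Provider': subreddit_name,
    }


# Stand-in for a PRAW submission (made-up values, for illustration only)
fake_post = SimpleNamespace(
    title='Example title',
    selftext='',
    url='https://example.com/story',
    created_utc=1604707200,  # 2020-11-07 00:00:00 UTC
)

df_demo = pd.DataFrame([post_to_row(fake_post, 'politics')])
print(df_demo.loc[0, 'Date'])
```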

In [5]:
post_attributes = []  # create an empty post_attributes list

# Define a dictionary that maps subreddit names to political leanings
subreddit_political_leanings = {
    'politics': 'left',
    'democrats': 'left',
    'liberal': 'left',
    'conservative': 'right',
    'libertarian': 'right',
    'republican': 'right'
    # Add more subreddits and their political leanings as needed
}

for subreddit_name in subreddit_names:
    subreddit = reddit.subreddit(subreddit_name)  # set subreddits
    posts = subreddit.top(limit=100)  # set post parameters

    for post in posts:  # pull in the following post attributes
        post_attributes.append({
            'Title': post.title,
            'Content': post.selftext,
            'URL': post.url,
            'Date': datetime.utcfromtimestamp(post.created_utc).strftime('%Y-%m-%d'),
            'Provider': subreddit_name,
            'Political Lean': subreddit_political_leanings.get(subreddit_name, 'neutral')
        })

df_red = pd.DataFrame(post_attributes)  # create dataframe

df_red['All_Text'] = df_red['Title'] + " " + df_red['Content']  # create all_text column
df_red['Source'] = 'Reddit'  # create source column

# Display dataframe
df_red.head(5)

| | Title | Content | URL | Date | Provider | Political Lean | All_Text | Source |
|---|---|---|---|---|---|---|---|---|
| 0 | Megathread: Joe Biden Projected to Defeat Pres... | Former Vice President Joseph Biden has secured... | https://www.reddit.com/r/politics/comments/jpt... | 2020-11-07 | politics | left | Megathread: Joe Biden Projected to Defeat Pres... | Reddit |
| 1 | Mitch McConnell Will Lose Control Of The Senat... | | https://www.buzzfeednews.com/article/paulmcleo... | 2021-01-06 | politics | left | Mitch McConnell Will Lose Control Of The Senat... | Reddit |
| 2 | Megathread: House Votes to Impeach President D... | The United States House of Representatives has... | https://www.reddit.com/r/politics/comments/ecm... | 2019-12-19 | politics | left | Megathread: House Votes to Impeach President D... | Reddit |
| 3 | Trump Threatens to ‘Leave the Country’ if He L... | | https://www.thedailybeast.com/trump-threatens-... | 2020-10-17 | politics | left | Trump Threatens to ‘Leave the Country’ if He L... | Reddit |
| 4 | Demands for Kushner to Resign Over 'Staggering... | | https://www.commondreams.org/news/2020/07/31/d... | 2020-07-31 | politics | left | Demands for Kushner to Resign Over 'Staggering... | Reddit |
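
Link posts have an empty `Content` field. When the data comes from the API this is an empty string, but if the same table is loaded from a saved file (for example the provided Excel file), empty cells typically come back as `NaN`, and `Title + " " + Content` then yields `NaN` for those rows. A minimal sketch of a concatenation that tolerates missing content, using a made-up two-row frame:

```python
import pandas as pd

# Stand-in data: one link post (no body) and one text post
df_demo = pd.DataFrame({
    'Title': ['Link post title', 'Text post title'],
    'Content': [None, 'Body of the post'],
})

# Replace missing content with an empty string, then trim the trailing space
df_demo['All_Text'] = (
    df_demo['Title'] + ' ' + df_demo['Content'].fillna('')
).str.strip()

print(df_demo['All_Text'].tolist())
```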


In [6]:
df_red.shape

(600, 8)
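
A shape of `(600, 8)` is consistent with six subreddits at up to 100 posts each and eight columns. To see how rows are distributed rather than only the total, a grouped count is one option. The sketch below uses a small made-up frame in place of `df_red`:

```python
import pandas as pd

# Stand-in for df_red with only the two columns needed for the count
df_demo = pd.DataFrame({
    'Provider': ['politics', 'politics', 'liberal', 'republican'],
    'Political Lean': ['left', 'left', 'left', 'right'],
})

# Number of rows per (leaning, subreddit) pair
counts = df_demo.groupby(['Political Lean', 'Provider']).size()
print(counts)
```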

## Filter dataframe to external URLs

In [7]:
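# Flag posts hosted on Reddit itself (discussion threads, images, videos)
is_reddit_hosted = (
    df_red['URL'].str.startswith('https://www.reddit.com')
    | df_red['URL'].str.startswith('https://i.redd')
    | df_red['URL'].str.startswith('https://v.redd.it')
)

# Keep only posts that link to external sites
filtered_df = df_red[~is_reddit_hosted]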

In [8]:
filtered_df

| | Title | Content | URL | Date | Provider | Political Lean | All_Text | Source |
|---|---|---|---|---|---|---|---|---|
| 1 | Mitch McConnell Will Lose Control Of The Senat... | | https://www.buzzfeednews.com/article/paulmcleo... | 2021-01-06 | politics | left | Mitch McConnell Will Lose Control Of The Senat... | Reddit |
| 3 | Trump Threatens to ‘Leave the Country’ if He L... | | https://www.thedailybeast.com/trump-threatens-... | 2020-10-17 | politics | left | Trump Threatens to ‘Leave the Country’ if He L... | Reddit |
| 4 | Demands for Kushner to Resign Over 'Staggering... | | https://www.commondreams.org/news/2020/07/31/d... | 2020-07-31 | politics | left | Demands for Kushner to Resign Over 'Staggering... | Reddit |
| 5 | Over A million people sign petition calling fo... | | https://www.newsweek.com/kkk-petition-terroris... | 2020-06-12 | politics | left | Over A million people sign petition calling fo... | Reddit |
| 6 | Report: Biden Admin Discovers Trump Had Zero P... | | https://talkingpointsmemo.com/news/report-bide... | 2021-01-21 | politics | left | Report: Biden Admin Discovers Trump Had Zero P... | Reddit |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 496 | Woman who won’t get vaccinated due to religiou... | | https://www.independent.co.uk/news/world/ameri... | 2021-10-08 | libertarian | right | Woman who won’t get vaccinated due to religiou... | Reddit |
| 498 | Only Five Republicans vote for bill to decrimi... | | https://thehill.com/homenews/house/528806-five... | 2020-12-05 | libertarian | right | Only Five Republicans vote for bill to decrimi... | Reddit |
| 557 | Accurate | | https://jssocial.pw/ppkey/fget/pic8/upload/j4I... | 2021-02-24 | republican | right | Accurate | Reddit |
| 596 | Thousands Sign Petition Calling For Nancy Pelo... | | https://www.dailywire.com/news/thousands-sign-... | 2020-09-11 | republican | right | Thousands Sign Petition Calling For Nancy Pelo... | Reddit |


In [9]:
filtered_df['URL']

1      https://www.buzzfeednews.com/article/paulmcleo...
3      https://www.thedailybeast.com/trump-threatens-...
4      https://www.commondreams.org/news/2020/07/31/d...
5      https://www.newsweek.com/kkk-petition-terroris...
6      https://talkingpointsmemo.com/news/report-bide...
                             ...                        
496    https://www.independent.co.uk/news/world/ameri...
498    https://thehill.com/homenews/house/528806-five...
557    https://jssocial.pw/ppkey/fget/pic8/upload/j4I...
596    https://www.dailywire.com/news/thousands-sign-...
599                       https://i.imgtc.ws/5pNI5Xk.jpg
Name: URL, Length: 348, dtype: object
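
The prefix-based filter depends on the exact scheme and subdomain, so a URL such as `https://old.reddit.com/...` would pass through as external. A possible alternative, not the approach used in the cells above, is to compare hostnames with the standard library's `urllib.parse`. In the sketch below, `REDDIT_HOSTS` and `is_reddit_hosted` are hypothetical names, the host list is an assumption about which domains count as Reddit-hosted, and the URLs are made-up examples.

```python
from urllib.parse import urlparse

import pandas as pd

# Assumption: these base domains cover Reddit-hosted links
REDDIT_HOSTS = ('reddit.com', 'redd.it')


def is_reddit_hosted(url):
    """Return True if the URL's hostname is a Reddit domain or a subdomain of one."""
    host = (urlparse(url).hostname or '').lower()
    return any(host == base or host.endswith('.' + base) for base in REDDIT_HOSTS)


# Made-up example URLs
urls = pd.Series([
    'https://www.reddit.com/r/politics/comments/abc',
    'https://i.redd.it/picture.jpg',
    'https://v.redd.it/clip',
    'https://old.reddit.com/r/politics',
    'https://thehill.com/homenews/house/example',
])

external = urls[~urls.map(is_reddit_hosted)]
print(external.tolist())
```

Image hosts outside Reddit, such as the `i.imgtc.ws` link in the output above, still count as external under either filter, so a further check on file extension or domain may be needed before treating every remaining URL as a news article.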