![reddit banner](https://cdn.dribbble.com/users/1761084/screenshots/3587716/reddit.gtif)

In [13]:
# Importing the required libraries
import praw
import pandas as pd
import configparser

In [14]:
# Reading the Reddit API credentials from a configuration file
config = configparser.ConfigParser()
config.read('reddit_credentials.ini')

# Storing credential info in local variables
user_agent = config.get('credentials', 'user_agent')
client_id = config.get('credentials', 'client_id')
client_secret = config.get('credentials', 'client_secret')
redirect_url = config.get('credentials', 'redirect_url')
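The cell above assumes a `reddit_credentials.ini` file with a `[credentials]` section holding four keys. A minimal sketch of that layout follows; every value is a hypothetical placeholder, and `load_credentials` is an illustrative helper, not part of the original notebook. Note that `ConfigParser.read` silently skips files it cannot open (it returns the list of files it actually read), so a missing file only shows up later as a `NoSectionError` from `config.get`.

```python
import configparser

# Hypothetical contents of reddit_credentials.ini; replace the placeholder
# values with the credentials of your own Reddit "script" app.
SAMPLE_INI = """
[credentials]
user_agent = my-reddit-scraper/0.1 by u/your_username
client_id = YOUR_CLIENT_ID
client_secret = YOUR_CLIENT_SECRET
redirect_url = http://localhost:8080
"""

def load_credentials(text):
    """Parse INI text and return the four values the notebook reads (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["credentials"]
    return {key: section[key]
            for key in ("user_agent", "client_id", "client_secret", "redirect_url")}

creds = load_credentials(SAMPLE_INI)
print(creds["client_id"])  # YOUR_CLIENT_ID
```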

In [15]:
# Creating a read-only Reddit instance (no username/password is supplied)
reddit = praw.Reddit(user_agent = user_agent,
                     client_id = client_id,
                     client_secret = client_secret,
                     redirect_uri = redirect_url)  # PRAW's keyword is redirect_uri

## Extracting Comments
For this project we use three popular Reddit communities focused on machine learning, AI and data science:
* Machine Learning - [r/MachineLearning](https://www.reddit.com/r/MachineLearning/)
* Artificial Intelligence - [r/artificial](https://www.reddit.com/r/Artificial/)
* Data Science - [r/DataScience](https://www.reddit.com/r/DataScience/)

We will extract the top posts of all time from these subreddits (3000 in total, ranked across the three communities together) to create our dataset, along with other useful attributes: post ID and URL, post title, subreddit, flair, time created, score, upvote ratio and number of comments.

In [36]:
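# Extracting the top 3000 posts of all time from the three subreddits combined
# (the combined listing ranks all posts together, so the per-subreddit split is not fixed)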
posts = reddit.subreddit('MachineLearning+artificial+datascience').top(time_filter = 'all', limit = 3000)
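The combined listing `MachineLearning+artificial+datascience` ranks posts from all three subreddits together, so it does not guarantee the same number of posts from each community. A minimal sketch of fetching each subreddit separately and chaining the results, assuming the `reddit` instance created above; `top_posts_per_subreddit` is a hypothetical helper, not a PRAW function. PRAW's documentation notes that most Reddit listings return at most about 1000 items, so `limit=1000` is also the practical ceiling per subreddit.

```python
from itertools import chain

def top_posts_per_subreddit(reddit, names, limit=1000, time_filter="all"):
    """Lazily yield top posts, one listing per subreddit (hypothetical helper).

    `reddit` is expected to be a praw.Reddit instance; any object exposing
    .subreddit(name).top(time_filter=..., limit=...) works the same way.
    """
    listings = (reddit.subreddit(name).top(time_filter=time_filter, limit=limit)
                for name in names)
    return chain.from_iterable(listings)

# Usage (requires network access and valid credentials); the result can feed
# the same loop that builds posts_list:
# posts = top_posts_per_subreddit(reddit, ["MachineLearning", "artificial", "datascience"])
```

Fetching per subreddit trades one combined request stream for three, but it makes the size of each community's share explicit instead of depending on relative scores.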

In [37]:
# Creating a DataFrame of the top posts and their attributes for analysis

posts_list = []

for post in posts:
    posts_list.append({
        'post_id' : post.id,
        'post_title' : post.title,
        'subreddit' : post.subreddit.display_name,  # store the name, not the Subreddit object
        'time_created' : post.created_utc,
        'post_url' : post.url,
        'flair_text' : post.link_flair_text,
        'score' : post.score,
        'comments' : post.num_comments,
        'upvote_ratio' : post.upvote_ratio
    })
    
posts_df = pd.DataFrame(posts_list)

In [39]:
# Displaying a random sample of 10 posts
posts_df.sample(10)

|  | post_id | post_title | subreddit | time_created | post_url | flair_text | score | comments | upvote_ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1313 | vjpew4 | Working with data is like... | datascience | 1656080000.0 | https://www.reddit.com/r/datascience/comments/... | Discussion | 395 | 32 | 0.94 |
| 1379 | 9lprhw | The Intro to Data Science course at UC Berkele... | datascience | 1538770000.0 | https://i.redd.it/mh4zp1hxbfq11.jpg |  | 377 | 92 | 0.98 |
| 2842 | bc0lka | A Google Brain Program Is Learning How to Program | artificial | 1554993000.0 | https://medium.com/syncedreview/a-google-brain... |  | 91 | 23 | 0.94 |
| 104 | 10y2rrx | Thoughts? | datascience | 1675969000.0 | https://i.redd.it/l269tf8x39ha1.jpg | Discussion | 1690 | 193 | 0.97 |
| 1909 | 9psua7 | If you've been wondering about the disappearan... | datascience | 1540029000.0 | https://www.reddit.com/r/datascience/comments/... |  | 276 | 30 | 0.98 |
| 2894 | oyjemy | Small and wide data is important and relevant:... | artificial | 1628174000.0 | https://signum.ai/blog/small-and-wide-data-is-... | Discussion | 84 | 5 | 0.95 |
| 1897 | vx7mx0 | Every higher level management - "We have data,... | datascience | 1657620000.0 | https://i.redd.it/x2d2akh160b91.png | Fun/Trivia | 277 | 24 | 0.98 |
| 1618 | da2cna | [N] Amidst controversy regarding his most rece... | MachineLearning | 1569600000.0 | https://www.reddit.com/r/MachineLearning/comme... | News | 345 | 113 | 0.92 |
| 1099 | t37al0 | [R] Robotic Telekinesis: Controlling Multifing... | MachineLearning | 1646024000.0 | https://v.redd.it/820q8hyv8ik81 | Research | 430 | 9 | 0.97 |
| 2917 | xoqe06 | AI audio is on the rise and will spark new deb... | artificial | 1664215000.0 | https://the-decoder.com/ai-audio-is-on-the-ris... | News | 84 | 28 | 0.91 |
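The `time_created` column holds PRAW's `created_utc`, a Unix timestamp in seconds (UTC), which is hard to read as-is. A minimal sketch of converting it to timezone-aware datetimes with pandas; `add_created_datetime` and the `created_at` column name are our own illustrative choices, and the demo uses two timestamps copied from the sample above.

```python
import pandas as pd

def add_created_datetime(df, source_col="time_created", target_col="created_at"):
    """Return a copy of df with Unix-second timestamps converted to UTC datetimes."""
    out = df.copy()
    out[target_col] = pd.to_datetime(out[source_col], unit="s", utc=True)
    return out

# Two timestamps taken from the sample output
demo = add_created_datetime(pd.DataFrame({"time_created": [1656080000.0, 1538770000.0]}))
print(demo["created_at"].dt.year.tolist())  # [2022, 2018]
```

Working on a copy keeps the raw timestamps intact, which is convenient if the data is later exported or compared against Reddit's API values. On the real data this would be `posts_df = add_created_datetime(posts_df)`.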


Next, we use each `post_id` to extract the comments on the top posts.
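In PRAW, a submission's comment tree can contain `MoreComments` placeholders; `submission.comments.replace_more(limit=0)` removes them without fetching more, and `submission.comments.list()` flattens the remaining tree into a single list. A minimal sketch of turning one submission's comments into rows; `comment_rows` and the selected fields are hypothetical choices for illustration.

```python
def comment_rows(submission):
    """Flatten one PRAW submission's comments into a list of dicts (hypothetical helper).

    replace_more(limit=0) removes the "load more comments" placeholders instead
    of fetching them, so very large threads are only partially collected.
    """
    submission.comments.replace_more(limit=0)
    return [
        {
            "post_id": submission.id,
            "comment_id": comment.id,
            "body": comment.body,
            "score": comment.score,
        }
        for comment in submission.comments.list()
    ]

# Usage (requires the `reddit` instance and network access):
# rows = [row for pid in posts_df["post_id"] for row in comment_rows(reddit.submission(pid))]
# comments_df = pd.DataFrame(rows)
```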

In [None]:
comments_list = []

for post_id in posts_df['post_id']:
    submission = reddit.submission(post_id)
    
    submissi