<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

# Reddit and HuggingFace Starter Kit

## Part I: [Reddit API](https://www.reddit.com/dev/api/)
The first part of this exercise is to figure out how to instantiate a Reddit API object using the Python Reddit API Wrapper, [PRAW](https://praw.readthedocs.io/en/stable/). PRAW is a Python library that provides a simple interface for interacting with the Reddit API.

### Your Task
You will first need to instantiate a [Reddit instance](https://praw.readthedocs.io/en/stable/code_overview/reddit_instance.html).
Hint: you only need to pass `client_id`, `client_secret`, and `user_agent`.

#### Make sure everyone in the group does this part! 

Follow the guide below on how to get your `client_id` and `client_secret`.

#### Follow these steps:
1. Pull the `FourthBrain/ML03` repo locally so you can start development.
2. Open `reddit_and_huggingface.ipynb` and install the necessary packages for this lesson by running:

    ```
    cd code_student/Week_2
    conda activate {your_virtual_environment_name}
    pip install transformers praw torch torchvision torchaudio
    ```
    
3. Obtain your `client_id` and `client_secret`

* Make a Reddit account
* Follow the steps in the screenshot below, which are the first steps of this [guide](https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c).

![instructions to set up reddit api](../../images/reddit_get_access.JPG)

* Create a `secrets.py` file and include the following:

    ```
    REDDIT_API_CLIENT_ID = ""
    REDDIT_API_CLIENT_SECRET = ""
    REDDIT_API_USER_AGENT = ""  # can be any string, for example "teslabot"
    ```
    Get it?  [Teslabot :)](https://www.tesla.com/AI)
    

* Put `secrets.py` in `Week_2` so you can easily import it

4. Complete the code in the `# YOUR CODE HERE` space below so that it creates a Reddit instance object for interacting with the Reddit API.  Note that the `subreddit` object for the `r/TSLA` subreddit has already been created for you.

In [7]:
import praw
from transformers import pipeline
import secrets


reddit = praw.Reddit(
    client_id=secrets.REDDIT_API_CLIENT_ID,
    client_secret=secrets.REDDIT_API_CLIENT_SECRET,
    user_agent=secrets.REDDIT_API_USER_AGENT
)

subreddit = reddit.subreddit('TSLA')
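
Values left blank in `secrets.py` can go unnoticed until the first request fails. As a minimal sketch, the hypothetical helper `missing_credentials` below (not part of PRAW or this starter kit) lists which of the three settings are still absent or blank. A `SimpleNamespace` with made-up values stands in for the imported `secrets` module so the snippet runs on its own.

```python
from types import SimpleNamespace

REQUIRED_SETTINGS = (
    "REDDIT_API_CLIENT_ID",
    "REDDIT_API_CLIENT_SECRET",
    "REDDIT_API_USER_AGENT",
)


def missing_credentials(settings):
    """Return the names of required settings that are absent or blank."""
    missing = []
    for name in REQUIRED_SETTINGS:
        value = getattr(settings, name, "")
        # Treat non-strings and whitespace-only strings as missing.
        if not isinstance(value, str) or not value.strip():
            missing.append(name)
    return missing


# Stand-in for `import secrets`; these values are invented for illustration.
example_settings = SimpleNamespace(
    REDDIT_API_CLIENT_ID="abc123",
    REDDIT_API_CLIENT_SECRET="",
    REDDIT_API_USER_AGENT="teslabot",
)

print(missing_credentials(example_settings))  # ['REDDIT_API_CLIENT_SECRET']
```

In the notebook, a call such as `missing_credentials(secrets)` could run just before `praw.Reddit(...)` is created.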

## Part II:  [r/TSLA Subreddit](https://www.reddit.com/r/TSLA/)
The second part of this exercise is to figure out how the following code parses comments using the r/TSLA `subreddit` instance object.

### Your Task
1. Work with your group to comment each line of the following code so that you describe what each piece is doing.
2. Create one comment at the top of the code that describes what the outer `for` loop is iterating over.
3. (Optional) How many comments will I get from this?

A few resources that might help!
* How do I find the top 10 posts of all time from your favorite subreddit(s)? (hint: look at ["Obtain Submission Instances from a Subreddit"](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html))
* How do I parse comments from the post? (hint: look at ["Obtain Comment Instances"](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html))

In [37]:
from praw.models import MoreComments

top_comments = []

# Outer loop: iterate over the 10 top-scoring submissions in the subreddit (count set by `limit`)
for submission in subreddit.top(limit=10):
    # Inner loop: iterate over the top-level comments of this submission
    for top_level_comment in submission.comments:
        # A MoreComments object is a placeholder for comments that have not been loaded; skip it
        if isinstance(top_level_comment, MoreComments):
            continue

        # Otherwise this is an actual top-level comment, so store its text in the output list
        top_comments.append(top_level_comment.body)
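
An alternative to skipping placeholders by hand is PRAW's `replace_more` method on the comment forest; with `limit=0` it removes the `MoreComments` placeholders without fetching additional comments, so the inner loop only sees real comments. The sketch below assumes that behavior. The stand-in classes (`FakeComment`, `FakeForest`, `FakeSubmission`) and the helper `collect_top_level_bodies` are hypothetical and exist only so the snippet runs without network access; with real data you would pass `subreddit.top(limit=10)` instead.

```python
def collect_top_level_bodies(submissions):
    """Return the text of every top-level comment across the given submissions."""
    bodies = []
    for submission in submissions:
        # Drop MoreComments placeholders up front instead of checking each item.
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            bodies.append(comment.body)
    return bodies


class FakeComment:
    """Stand-in for a PRAW comment; only the `body` attribute is modelled."""

    def __init__(self, body):
        self.body = body


class FakeForest(list):
    """Stand-in for a comment forest: iterable, with a no-op replace_more."""

    def replace_more(self, limit=32):
        return []


class FakeSubmission:
    """Stand-in for a PRAW submission holding top-level comments."""

    def __init__(self, bodies):
        self.comments = FakeForest(FakeComment(b) for b in bodies)


demo = [FakeSubmission(["first", "second"]), FakeSubmission(["third"])]
print(collect_top_level_bodies(demo))  # ['first', 'second', 'third']
```

Either way, the number of comments collected depends on how many top-level comments Reddit returns in the initial load of each submission, so it can vary between runs; the run recorded in this notebook collected 167.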
    

In [10]:
t = top_comments[0]

In [12]:
t

'ho lee fuk \n\nyou got anymore insider information? 👀👀'

## Part III:  [HuggingFace](https://huggingface.co/docs/transformers/quicktour)
The third part of this exercise is to analyze the sentiment of each comment scraped from `r/TSLA`, using a pre-trained HuggingFace model for inference.

### Your Task
1. Implement the [Sentiment Analysis](https://huggingface.co/docs/transformers/quicktour) Model in the `# YOUR CODE HERE` section. 
2. (Optional) What is the net sentiment of the entire list of comments?

In [38]:
from transformers import pipeline

sentiment_model = pipeline("sentiment-analysis")


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
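
The message above means the pipeline fell back to a default checkpoint. Naming the checkpoint explicitly with `model=` is one way to keep results stable if that default ever changes. The sketch wraps the call in a hypothetical helper, `build_sentiment_model`, and imports `transformers` inside it so that merely defining the function does not load the library or download anything; the checkpoint name is the one reported in the log above.

```python
DEFAULT_CHECKPOINT = "distilbert-base-uncased-finetuned-sst-2-english"


def build_sentiment_model(checkpoint=DEFAULT_CHECKPOINT):
    """Create a sentiment-analysis pipeline pinned to a specific checkpoint."""
    # Imported here so transformers is only loaded when the function is called.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=checkpoint)
```

In the notebook, `sentiment_model = build_sentiment_model()` would take the place of the bare `pipeline("sentiment-analysis")` call.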


In [39]:
import random
def get_random_comment(conversations):
    comment = random.choice(conversations)
    return comment

# Run sentiment analysis on one randomly chosen comment
sentiment_query_sentence = get_random_comment(top_comments)  # pick a random comment from the top-level comments list
sentiment = sentiment_model(sentiment_query_sentence)  # returns a list containing one {'label', 'score'} dict
print(f"Sentiment test: {sentiment_query_sentence} === {sentiment}")

Sentiment test: Would be great but Elon said he wasn't in a hurry to do that again === [{'label': 'NEGATIVE', 'score': 0.9818630814552307}]
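
`random.choice` draws from the module's global generator, so the comment picked here changes from run to run. If a repeatable pick is useful while debugging, one option is a dedicated `random.Random` instance with a fixed seed. The `seed` parameter and the sample list below are illustrative additions, not part of the original notebook.

```python
import random


def get_random_comment(conversations, seed=None):
    """Pick one comment; passing a seed makes the pick repeatable."""
    # With seed=None, Python seeds the generator itself, so picks differ per call.
    rng = random.Random(seed)
    return rng.choice(conversations)


sample_comments = ["to the moon", "selling everything", "holding for now"]

first = get_random_comment(sample_comments, seed=42)
second = get_random_comment(sample_comments, seed=42)
print(first == second)  # True: same seed, same pick
```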


In [40]:
sentiment_all = sentiment_model(top_comments)
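
Reddit comments can be long, and the DistilBERT checkpoint used here accepts at most 512 tokens per input, so a very long comment may cause an error at this step. To the best of my understanding, the text-classification pipeline forwards tokenizer options such as `truncation=True`; the hypothetical helper `classify_comments` below relies on that assumption. `echo_model` is a stand-in so the snippet runs without downloading a model.

```python
def classify_comments(model, comments):
    """Run the model over all comments, asking for over-long inputs to be truncated."""
    # truncation=True is assumed to be passed through to the tokenizer.
    return model(list(comments), truncation=True)


def echo_model(texts, **kwargs):
    """Stand-in for a real pipeline: labels every text POSITIVE."""
    return [{"label": "POSITIVE", "score": 1.0} for _ in texts]


results = classify_comments(echo_model, ["short comment", "another one"])
print(len(results))  # 2
```

With the real pipeline, the call would be `classify_comments(sentiment_model, top_comments)`.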

In [41]:
len(top_comments)

167

In [42]:
import pandas as pd

df = pd.DataFrame(sentiment_all)

In [43]:
df.head()

|   | label | score |
|---|----------|----------|
| 0 | NEGATIVE | 0.99374 |
| 1 | NEGATIVE | 0.999329 |
| 2 | NEGATIVE | 0.99597 |
| 3 | NEGATIVE | 0.992491 |
| 4 | NEGATIVE | 0.997287 |


In [44]:
df['label'].value_counts()

NEGATIVE    106
POSITIVE     61
Name: label, dtype: int64

In [45]:
df['label'].value_counts() / df['label'].count()

NEGATIVE    0.634731
POSITIVE    0.365269
Name: label, dtype: float64
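
"Net sentiment" is not defined in the exercise, so the sketch below takes two possible readings as assumptions: the share of positive labels minus the share of negative labels, and the mean of the scores with negative labels counted as negative numbers. Both helpers are hypothetical and operate on the list-of-dicts format the pipeline returns; the sample scores are made up.

```python
from collections import Counter


def net_sentiment_by_count(results):
    """(positive - negative) / total, in the range [-1, 1]. Assumes a non-empty list."""
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    return (counts["POSITIVE"] - counts["NEGATIVE"]) / total


def net_sentiment_by_score(results):
    """Mean score, with NEGATIVE scores counted as negative. Assumes a non-empty list."""
    signed = [
        r["score"] if r["label"] == "POSITIVE" else -r["score"]
        for r in results
    ]
    return sum(signed) / len(signed)


# Invented sample in the same shape as the pipeline output.
sample = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.6},
    {"label": "NEGATIVE", "score": 0.8},
]

print(net_sentiment_by_count(sample))
print(net_sentiment_by_score(sample))
```

With the counts shown above (61 positive, 106 negative), the count-based reading gives (61 - 106) / 167, roughly -0.27, which suggests a moderately negative overall tone for this run.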