### **1. Install PRAW (Python Reddit API Wrapper)**

```python
#!pip install praw
```
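The code later in this notebook imports `pandas` and `dotenv` (installed as `python-dotenv`) in addition to `praw`. As a minimal sketch, the check below reports which of those modules are not importable in the current environment. The helper name `find_missing_modules` is hypothetical and not part of PRAW.

```python
import importlib.util

# Import names used later in this notebook.
# Note: the pip package that provides `dotenv` is called `python-dotenv`.
REQUIRED_MODULES = ["praw", "pandas", "dotenv"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {missing}")
else:
    print("All required modules are installed.")
```

If anything is reported as missing, install it with `pip` before running the rest of the notebook.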

#### From the [PRAW documentation](https://praw.readthedocs.io/en/stable/getting_started/quick_start.html):

#### Prerequisites

* **Python Knowledge**:
You need to know at least a little Python to use PRAW. PRAW supports Python 3.7+. If you have any issues, feel free to discuss with the instructional team.

* **Reddit Knowledge**:
A basic understanding of how Reddit works is a must. If you are not already familiar with Reddit, start at [**Reddit Help**](https://support.reddithelp.com/hc/en-us).

* **Reddit Account**:
A Reddit account is required to access Reddit’s API. Create one at [**reddit.com**](https://www.reddit.com/).

* **Client ID & Client Secret**:
These two values are needed to access Reddit’s API as a script application (see [Authenticating via OAuth](https://praw.readthedocs.io/en/stable/getting_started/authentication.html#oauth) for other application types). If you don’t already have a client ID and client secret, follow [**Reddit’s First Steps Guide**](https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example#first-steps) to create them.

* **User Agent**:
A user agent is a unique identifier that helps Reddit determine the source of network requests. To use Reddit’s API, you need a unique and descriptive user agent. The recommended format is `<platform>:<app ID>:<version string> (by u/<Reddit username>)`. For example, `android:com.example.myredditapp:v1.2.3 (by u/kemitche)`.

Read more about user agents at [Reddit’s API wiki page](https://github.com/reddit-archive/reddit/wiki/API).
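To make the recommended format concrete, here is a small sketch that assembles a user agent string from its four parts. The function name `build_user_agent` is hypothetical (it is not a PRAW function), and the values are the ones from the example above.

```python
def build_user_agent(platform, app_id, version, username):
    """Build a string in the format <platform>:<app ID>:<version string> (by u/<Reddit username>)."""
    return f"{platform}:{app_id}:{version} (by u/{username})"


user_agent = build_user_agent("android", "com.example.myredditapp", "v1.2.3", "kemitche")
print(user_agent)
# prints: android:com.example.myredditapp:v1.2.3 (by u/kemitche)
```

For a script run from a laptop, you would swap in your own platform, app name, version, and Reddit username.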

### **2. Create a Reddit App**

As mentioned, to access the Reddit API, you'll need to create an application on Reddit and obtain your API credentials. Follow these steps:

- Go to the [**Reddit**](https://www.reddit.com/) website and either [**sign up for an account**](https://www.reddit.com/register) or log in to your existing account. *Feel free to create a throwaway account for this project!*
- Navigate to the [**Reddit Apps page**](https://www.reddit.com/prefs/apps).
- Click the "are you a developer? create an app..." button in the top left.
- Provide a name for your app (e.g., "PRAW"), select the app type ("script"), and optionally add a description. Use `http://localhost:8080` as your redirect URI.
- After submitting the form, you will reach a page that looks like the following image. You'll see your application's details, including the client ID and client secret. Keep these credentials handy for the next step.

![Praw](https://www.honchosearch.com/hubfs/Imported_Blog_Media/Client-ID-Client-Secret.png)

#### Read-only Reddit Instances

To create a read-only Reddit instance, you need three pieces of information:
* Client ID
* Client secret
* User agent

```python
import praw

# Creating a read-only Reddit instance
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",         # From your Reddit API application
    client_secret="YOUR_CLIENT_SECRET", # From your Reddit API application
    user_agent="YOUR_USER_AGENT")       # Identifies your script to Reddit's servers
```

**Reddit uses your user agent to identify the source of requests. You have some flexibility, but choose something descriptive and unique; including your Reddit username is recommended. See the User Agent bullet point under Prerequisites.**

I have removed my own credentials from this workbook. We can show you how to hide your credentials before submitting the project! **The following code will need your own Reddit credentials in order to successfully work.**

### **3. Create a `.env` file to hide your Reddit credentials**

**Mac Users**
- Open your Terminal
- If not in your personal Project #3 folder already, navigate to that project folder using the `cd` command
- Create a **.env** ("env" is short for "environment" variables) file in your project folder using **touch .env** via the terminal
- That is the file's full name: **.env**
- The dot (.) in front of .env makes it a hidden file on Mac/Linux systems, just like .DS_Store or .git. You can still edit it with any text editor, but it won't be visible in Finder by default. Run `ls -a` in the Terminal to list hidden files and confirm that the file was created.
- Paste in these variable names while replacing "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", and "YOUR_USER_AGENT" with your actual client id, client secret, and user agent:
```python
REDDIT_CLIENT_ID="YOUR_CLIENT_ID"         # From your Reddit API application
REDDIT_CLIENT_SECRET="YOUR_CLIENT_SECRET" # From your Reddit API application
REDDIT_USER_AGENT="YOUR_USER_AGENT"       # Identifies your script to Reddit's servers
```    

**Windows Users**

- Open Notepad or VSCode (`code .env` if VSCode is added to your system PATH) or any other text editor.
- Create a new file and paste in these variable names while replacing "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", and "YOUR_USER_AGENT" with your actual client id, client secret, and user agent:
```python
REDDIT_CLIENT_ID="YOUR_CLIENT_ID"         # From your Reddit API application
REDDIT_CLIENT_SECRET="YOUR_CLIENT_SECRET" # From your Reddit API application
REDDIT_USER_AGENT="YOUR_USER_AGENT"       # Identifies your script to Reddit's servers
```    
- When saving, follow these exact steps:
    - Click "Save As"
    - Change "Save as type" to "All Files (`*.*`)"
    - Name the file exactly as: .env
    - Select your project folder location
    - Click Save

**Important: Make sure to select "All Files" in the save type, otherwise Windows will add a hidden .txt extension, creating ".env.txt" instead of ".env"**
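Because a stray `.txt` extension is easy to miss, a quick check from Python can confirm the file name. The sketch below uses only the standard library; the helper name `find_env_file_problem` is hypothetical, and it assumes you run it from your project folder.

```python
from pathlib import Path


def find_env_file_problem(folder="."):
    """Return a message describing a .env naming problem, or None if .env exists."""
    folder = Path(folder)
    if (folder / ".env").is_file():
        return None
    if (folder / ".env.txt").is_file():
        return "Found .env.txt; rename it to .env"
    return "No .env file found in this folder"


# Check the current working directory
print(find_env_file_problem() or ".env file found")
```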

### **4. Initialize PRAW**
```python
reddit = praw.Reddit(
    client_id=os.getenv('REDDIT_CLIENT_ID'),
    client_secret=os.getenv('REDDIT_CLIENT_SECRET'),
    user_agent=os.getenv('REDDIT_USER_AGENT')
)
```
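`os.getenv` returns `None` when a variable is not set, so a typo in the `.env` file can go unnoticed until a request fails. As a sketch (the helper name `missing_credentials` is hypothetical, not part of PRAW or python-dotenv), you could check the three variable names used above before creating the `praw.Reddit` instance:

```python
import os

# The environment variable names read by os.getenv in the initialization code
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Demonstration with a made-up dictionary instead of the real environment
example_env = {"REDDIT_CLIENT_ID": "abc123", "REDDIT_USER_AGENT": ""}
print(missing_credentials(example_env))
# prints: ['REDDIT_CLIENT_SECRET', 'REDDIT_USER_AGENT']
```

In the notebook, you would call `missing_credentials()` with no arguments after `load_dotenv()` and fix the `.env` file if the returned list is not empty.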

```python
import os
from datetime import datetime, timezone

import pandas as pd
import praw
from dotenv import load_dotenv

# Load environment variables from the .env file.
# load_dotenv() returns True if it set at least one variable, otherwise
# False (for example, when no .env file is found).
success = load_dotenv()

# Print the result to help debug any issues with your .env file configuration
print(f"Load successful: {success}")

# Initialize PRAW (Python Reddit API Wrapper)
reddit = praw.Reddit(
    client_id=os.getenv('REDDIT_CLIENT_ID'),
    client_secret=os.getenv('REDDIT_CLIENT_SECRET'),
    user_agent=os.getenv('REDDIT_USER_AGENT')
)

print(f"Read-only: {reddit.read_only}")

def fetch_subreddit_data(reddit, subreddit_name, post_limit=1000):
    """Return a DataFrame of posts from a subreddit's "hot" listing."""
    subreddit = reddit.subreddit(subreddit_name)
    posts = []
    for post in subreddit.hot(limit=post_limit):
        # Convert the Unix timestamp (seconds since the epoch, UTC) to a datetime
        created_date = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)

        posts.append({
            "title": post.title,
            "score": post.score,
            "id": post.id,
            "url": post.url,
            "comms_num": post.num_comments,
            "created": created_date,  # Timezone-aware datetime in UTC
            "body": post.selftext
        })
    return pd.DataFrame(posts)

post_limit = 1000

# Assign `subreddit_list` a list of your two chosen subreddits for Project #3.
# Example:
subreddit_list = ["Airbus", "Boeing"]

# Create the output folder if it does not exist yet
os.makedirs("data", exist_ok=True)

for sub in subreddit_list:
    print(f'Pulling subreddit {sub}')
    data = fetch_subreddit_data(reddit, sub, post_limit)
    data.to_csv(f'data/subreddit_{sub}_data.csv', index=False)
    print(f"Fetched {len(data)} posts from {sub}")
```

```
Load successful: True
Read-only: True
Pulling subreddit Airbus
Fetched 810 posts from Airbus
Pulling subreddit Boeing
Fetched 511 posts from Boeing
```


### **5. Rejoice and bask in the sweet glory of Reddit!**

### Notes: 

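- You can use the `created_utc` attribute of a post to keep pulls from overlapping: record the newest `created_utc` value from one pull, then keep only posts with a later timestamp in the next pull. The `created_utc` attribute is the post's creation time as a Unix timestamp in UTC (Coordinated Universal Time).
- Example: Thailand follows UTC+07:00, which is 7 hours ahead of UTC, so a post created at 12:00 UTC was created at 19:00 Thailand time.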
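The two notes above can be sketched with the standard library alone. The example assumes each post is stored as a dictionary with `id` and `created_utc` keys; that layout, the helper names `keep_new_posts` and `to_local_time`, and the sample rows are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def keep_new_posts(posts, last_seen_utc):
    """Keep only posts created after the newest timestamp from the previous pull."""
    return [post for post in posts if post["created_utc"] > last_seen_utc]


def to_local_time(created_utc, utc_offset_hours):
    """Convert a UTC Unix timestamp to a datetime at a fixed UTC offset."""
    utc_time = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Made-up rows standing in for two consecutive pulls
previous_pull = [
    {"id": "a1", "created_utc": 1700000000.0},
    {"id": "b2", "created_utc": 1700003600.0},
]
latest_pull = [
    {"id": "b2", "created_utc": 1700003600.0},
    {"id": "c3", "created_utc": 1700007200.0},
]

last_seen = max(post["created_utc"] for post in previous_pull)
new_posts = keep_new_posts(latest_pull, last_seen)
print([post["id"] for post in new_posts])
# prints: ['c3']

# The same instant expressed in Thailand time (UTC+07:00)
print(to_local_time(1700000000.0, 7).isoformat())
# prints: 2023-11-15T05:13:20+07:00
```

Comparing raw `created_utc` values avoids timezone mistakes; converting to local time is only needed for display.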

### Last one:
- **Rather than working in this template notebook, make a brand new "scraping" notebook (or script in a .py file), with your own comments, so you can use this project in a portfolio!**