- File: pushshift.ipynb
- Input: None
- Output: ..\github-data\[focal_subr_name]\[subr_name]_comments.csv
- Process: Scrape all comments and posts within 90 days of bot implementation
- File: reddi_api.ipynb
- Input: None
- Output: None
- Intended process:
- Retrieve all deleted comments/posts
- Retrieve reactions for current comments/posts
- File: datacleaning.ipynb
- Input: ..\data\[focal_subr_name]\[subr_name]_comments.csv
- Output: ..\data\[focal_subr_name]\[subr_name]_clean_comments.csv
- Process:
- Convert epoch timestamps to human-readable datetimes
- Keep only comments/posts within 30 days of bot implementation
- Keep only the relevant variables from the scraped data
- Comments:
- Posts:
- Report number of cases before, after, cleaned, left out.
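The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the column names (`id`, `author`, `body`, `created_utc`), the helper name `clean_comments`, and the reading of "within 30 days" as 30 days on either side of the implementation date are all assumptions.

```python
import pandas as pd


def clean_comments(raw, bot_date, window_days=30,
                   keep_cols=("id", "author", "body", "created_utc")):
    """Hypothetical helper: convert timestamps, filter by window, keep columns."""
    df = raw.copy()
    # Epoch seconds -> timezone-aware UTC datetimes.
    df["created"] = pd.to_datetime(df["created_utc"], unit="s", utc=True)

    # Assumed window: window_days before and after the bot implementation date.
    pivot = pd.Timestamp(bot_date, tz="UTC")
    start = pivot - pd.Timedelta(days=window_days)
    end = pivot + pd.Timedelta(days=window_days)
    in_window = df["created"].between(start, end)

    # Keep only the assumed relevant variables plus the readable timestamp.
    cleaned = df.loc[in_window, list(keep_cols) + ["created"]]

    # Report the number of cases before, kept, and left out.
    print(f"before: {len(df)}, kept: {len(cleaned)}, "
          f"left out: {len(df) - len(cleaned)}")
    return cleaned
```

A call such as `clean_comments(pd.read_csv(path), "2020-06-01")` would return the filtered frame, where `path` and the date are placeholders for the subreddit's raw CSV and its bot implementation date.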
- File: perspectiveapi.ipynb
- Input: NA
- Output: NA
- Process: Problem: need to get the right data types
- File: detoxify.ipynb
- Input: ..\data\[focal_subr_name]\[subr_name]_clean.csv
- Output: ..\data\[focal_subr_name]\[subr_name]_res.csv
- Process:
- Get scores for each comment with the Detoxify model
- Flag the comment as harassment if its score exceeds a threshold
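The scoring and flagging steps might look roughly like the sketch below. `Detoxify("original").predict(...)` is the call shown in the Detoxify README and returns a dict of per-label score lists. The choice of the `toxicity` label, the 0.5 threshold, and the helper names `score_comments` and `flag_harassment` are assumptions, since these notes do not state them.

```python
from typing import Dict, List


def score_comments(texts: List[str]) -> Dict[str, List[float]]:
    """Score comments with Detoxify (requires `pip install detoxify`)."""
    # Imported lazily because the first call downloads model weights.
    from detoxify import Detoxify
    return Detoxify("original").predict(texts)


def flag_harassment(scores: Dict[str, List[float]],
                    threshold: float = 0.5,      # assumed cutoff
                    label: str = "toxicity"      # assumed label
                    ) -> List[bool]:
    """Return True for each comment whose score meets the threshold."""
    return [score >= threshold for score in scores[label]]
```

Keeping the thresholding separate from the model call makes it possible to re-flag already scored comments with a different cutoff without re-running the model.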
- File: visual_RD.r
- Input: TBD
- Output: TBD
- Process: Visualizing Regression Discontinuity
- File: BSTS.r
- Input: ..\data\[focal_subr_name]\[subr_name]_res.csv
- Output: visualization
- Process:
- Group comments and scores by date
- Take the average score for each date, or the percentage of comments flagged as toxic each day
- Construct BSTS
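The daily aggregation that feeds the BSTS model can be illustrated as follows. The script itself is written in R, so this pandas version is only an illustrative sketch of the grouping logic, and the column names (`created`, `toxicity`, `flagged`) and the helper name `daily_toxicity` are assumptions. It does not build the BSTS model.

```python
import pandas as pd


def daily_toxicity(res, time_col="created", score_col="toxicity",
                   flag_col="flagged"):
    """Hypothetical helper: one row per calendar date with summary stats."""
    # Reduce each timestamp to its calendar date.
    day = pd.to_datetime(res[time_col]).dt.date
    daily = res.groupby(day).agg(
        mean_score=(score_col, "mean"),   # average score per date
        pct_flagged=(flag_col, "mean"),   # share of flagged comments (0..1)
        n_comments=(score_col, "count"),  # comments with a score that day
    )
    # Express the flagged share as a percentage.
    daily["pct_flagged"] = daily["pct_flagged"] * 100
    daily.index.name = "date"
    return daily
```

Either `mean_score` or `pct_flagged` could then serve as the daily time series passed to the model.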
PRAW: Newest documentation
Caution: install the latest version of PRAW and follow the latest documentation.
Perspective API: Sample Request | Installation Guide
- Open Anaconda Prompt
- Script
\pip.exe install google-api-python-client
The Perspective API only handles one request at a time. The rate limit is one request per second.
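One way to stay under that limit is to space calls at least one second apart. The sketch below is a hedged illustration: `throttled_map` and `perspective_request` are hypothetical helper names, the request body follows the shape of the Perspective API sample request, and the commented client setup assumes a valid `API_KEY`.

```python
import time


def perspective_request(text):
    """Build a request body asking only for the TOXICITY attribute."""
    return {"comment": {"text": text},
            "requestedAttributes": {"TOXICITY": {}}}


def throttled_map(fn, items, min_interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call fn on each item, starting calls at least min_interval seconds apart."""
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            # Sleep for whatever remains of the interval since the last call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fn(item))
    return results


# Usage with google-api-python-client (not run here; API_KEY is a placeholder):
# from googleapiclient import discovery
# client = discovery.build(
#     "commentanalyzer", "v1alpha1", developerKey=API_KEY,
#     discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
#     static_discovery=False)
# responses = throttled_map(
#     lambda t: client.comments().analyze(body=perspective_request(t)).execute(),
#     texts)
```

The `clock` and `sleep` parameters exist only so the pacing can be checked without actually waiting.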