This notebook uses the OpenAI API to gather AI-generated responses to our Reddit threads. It is split into two parts because I collected the first chunk of Reddit data (AskReddit and AskScience) first, and generated its responses while I was still collecting the second chunk. The second chunk is processed in the second half of this notebook.

In [1]:
import openai
import os
openai.api_key = os.getenv("OPENAI_API_KEY")

import time

import pandas as pd

In [2]:
df = pd.read_csv('../../data/cleaned_reddit_data/cleaned_ask_reddit_science.csv')

In [3]:
df.shape[0]

1782

In [4]:
# Chunking approach from https://stackoverflow.com/a/49563326
n = 10  # rows per chunk
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]

In [5]:
list_df[-1] # This method preserves the last bit that doesn't fit in sets of 10

Unnamed: 0,subreddit,title,comment
1780,AskScience,"Is there a term for lake bottoms that ""hour gl...",I don’t know any specific term. A lake like yo...
1781,AskScience,When a military helicopter fires thousands of ...,"You don't really slow the rotor, changing its ..."
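The chunking pattern above can be checked on a small stand-in. This is a minimal sketch, not part of the original run: `chunk_rows` and `toy_rows` are hypothetical names, and a plain list is used in place of the DataFrame because both support `len()` and row slicing.

```python
def chunk_rows(rows, n):
    """Split a sliceable sequence into consecutive chunks of at most n items."""
    return [rows[i:i + n] for i in range(0, len(rows), n)]

# Hypothetical stand-in for the DataFrame: 23 items, so 3 are left over.
toy_rows = list(range(23))
chunks = chunk_rows(toy_rows, 10)
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)  # [10, 10, 3]
```

By the same arithmetic, 1782 rows with n = 10 give 179 chunks, the last holding the 2 rows shown above.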


In [36]:
# responses = []
# for data in list_df:
#     prompts = ['Respond to the following post as a user of {}: {}'.format(sub, title) for sub, title in zip(data['subreddit'], data['title'])]
#     dv_responses = openai.Completion.create(
#         model = 'text-davinci-003',
#         prompt = prompts,
#         max_tokens = 500,
#         temperature = 0.6
#     )
#     responses.append(dv_responses)
#     time.sleep(20)
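The loop above waits a fixed 20 seconds between calls. One alternative, which this notebook did not use, is to retry a failed batch with exponential backoff. The sketch below assumes the request is wrapped in a callable. `build_prompts`, `call_with_retry`, and `flaky_request` are hypothetical names, and `flaky_request` only simulates an API call so that the example runs offline.

```python
import time

PROMPT_TEMPLATE = "Respond to the following post as a user of {}: {}"

def build_prompts(subreddits, titles):
    """Build one prompt per (subreddit, title) pair, as in the loop above."""
    return [PROMPT_TEMPLATE.format(sub, title) for sub, title in zip(subreddits, titles)]

def call_with_retry(request_fn, prompts, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(prompts); on an exception, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return request_fn(prompts)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)

# Simulated request (assumption): fails once, then succeeds.
calls = {"count": 0}

def flaky_request(prompts):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated rate limit")
    return ["reply to: " + p for p in prompts]

delays = []  # record the waits instead of actually sleeping
prompts = build_prompts(["AskScience"], ["Why is the sky blue?"])
replies = call_with_retry(flaky_request, prompts, sleep=delays.append)
print(delays, len(replies))
```

Passing `sleep=delays.append` keeps the demonstration instant. In a real run the default `time.sleep` would be used.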

In [38]:
# Quick function to extract only the response text from each batch of calls
def cleaner(batch):
    return [text['text'].strip() for text in batch.choices]

In [39]:
cleaned_responses = [cleaner(batch) for batch in responses]

In [41]:
from itertools import chain
# https://realpython.com/python-flatten-list/

In [44]:
flattened_responses = list(chain.from_iterable(cleaned_responses))

In [45]:
cleaned_responses[1][3]  # With chunks of 10, batch 1, item 3 should match flattened_responses[13]

"I think a good example of this is the story of Gavrilo Princip, the Bosnian Serb who assassinated Archduke Franz Ferdinand of Austria-Hungary and his wife in 1914. While he was labeled a terrorist and was seen as the bad guy in the story, his assassination was actually a catalyst for the liberation of the oppressed people of the Austro-Hungarian empire, and he's now seen as a hero in the Balkans."

In [47]:
flattened_responses[13]

"I think a good example of this is the story of Gavrilo Princip, the Bosnian Serb who assassinated Archduke Franz Ferdinand of Austria-Hungary and his wife in 1914. While he was labeled a terrorist and was seen as the bad guy in the story, his assassination was actually a catalyst for the liberation of the oppressed people of the Austro-Hungarian empire, and he's now seen as a hero in the Balkans."
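The spot check above relies on a simple index mapping: item `i` of batch `b` lands at position `b * n + i` in the flattened list, as long as every earlier batch holds exactly `n` items. A minimal sketch on toy data (`flat_index` and `nested` are hypothetical names):

```python
from itertools import chain

def flat_index(batch_idx, item_idx, batch_size):
    """Position in the flattened list, assuming all earlier batches are full."""
    return batch_idx * batch_size + item_idx

# Toy nested list: 3 batches of 10 labelled strings.
nested = [["b{}-i{}".format(b, i) for i in range(10)] for b in range(3)]
flat = list(chain.from_iterable(nested))

position = flat_index(1, 3, 10)
print(position, flat[position])  # 13 b1-i3
```

The same mapping with a batch size of 20 gives position 23 for batch 1, item 3.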

In [48]:
df['ai_response'] = flattened_responses

In [49]:
df.head()

Unnamed: 0,subreddit,title,comment,ai_response
0,AskReddit,Would you support a mandatory retirement age o...,70 and as for president no one can run over 65...,"No, I don't think a mandatory retirement age o..."
1,AskReddit,Now that Reddit are killing 3rd party apps on ...,"Without this app, I will actually stop using R...",I'm not sure what the best alternatives to Red...
2,AskReddit,Bernie Sanders Says US Should Confiscate 100% ...,So maybe someone can try to explain this a bit...,I think Bernie Sanders' suggestion is extreme ...
3,AskReddit,"Richard Feynman said, “Never confuse education...",My wife's stepfather was a chemist who current...,I think a great example of this is people who ...
4,AskReddit,How much would you pay for a list of everyone ...,"$20. Realistically it would be interesting, bu...",I wouldn't pay anything for a list like that. ...


In [51]:
#df.to_csv('../../data/ai_responses/ask_reddit_science_final.csv', index = False)

In [52]:
# Gather responses for the remaining subreddits
dat = pd.read_csv('../../data/cleaned_reddit_data/cleaned_remaining_subs.csv')

In [53]:
dat.shape

(3631, 3)

In [55]:
n = 20  # Double the chunk size; the maximum number of prompts per call should be 20
list_dat = [dat[i:i+n] for i in range(0, dat.shape[0], n)]

In [59]:
list_dat[-1].shape

(11, 3)

In [60]:
# responses = []
# for data in list_dat:
#     prompts = ['Respond to the following post as a user of {}: {}'.format(sub, title) for sub, title in zip(data['subreddit'], data['title'])]
#     dv_responses = openai.Completion.create(
#         model = 'text-davinci-003',
#         prompt = prompts,
#         max_tokens = 500,
#         temperature = 0.6
#     )
#     responses.append(dv_responses)
#     time.sleep(3)  # I was low on credits and had to upgrade. I think the current rate limit is loose enough that no sleep is needed, but I set 3 seconds to play it safe

In [63]:
len(responses) == len(list_dat)

True
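Matching batch counts does not guarantee that every batch returned one reply per prompt, and the later column assignment needs exactly one reply per row. A stricter check compares sizes batch by batch. This is a sketch on toy lists: `mismatched_batches`, `toy_chunks`, and `toy_batches` are hypothetical names. In the notebook the inputs would be the cleaned batches and `list_dat`.

```python
def mismatched_batches(batches, chunks):
    """Return the indices where a batch's reply count differs from its chunk's row count."""
    return [
        i for i, (batch, chunk) in enumerate(zip(batches, chunks))
        if len(batch) != len(chunk)
    ]

# Toy data: three chunks of titles and the replies received for each.
toy_chunks = [["t0", "t1", "t2"], ["t3", "t4", "t5"], ["t6"]]
toy_batches = [["r0", "r1", "r2"], ["r3", "r4"], ["r6"]]  # batch 1 is short by one reply

bad = mismatched_batches(toy_batches, toy_chunks)
print(bad)  # [1]
```

An empty result means the flattened replies line up one-to-one with the rows.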

In [64]:
cleaned_responses = [cleaner(batch) for batch in responses]

In [65]:
flattened_responses = list(chain.from_iterable(cleaned_responses))

In [66]:
cleaned_responses[1][3]  # With chunks of 20, batch 1, item 3 should match flattened_responses[23]

'No, the Supreme Court justices did not reach out to the founding fathers to help interpret their intent when writing the Constitution while deliberating cases. The founding fathers were not brought in to give expert testimony. However, Supreme Court justices did rely on the writings of the founding fathers, such as the Federalist Papers, to help them interpret the intent of the Constitution. In addition, the justices would also look to other sources, such as legal treatises, for guidance on Constitutional interpretation.'

In [68]:
flattened_responses[23]

'No, the Supreme Court justices did not reach out to the founding fathers to help interpret their intent when writing the Constitution while deliberating cases. The founding fathers were not brought in to give expert testimony. However, Supreme Court justices did rely on the writings of the founding fathers, such as the Federalist Papers, to help them interpret the intent of the Constitution. In addition, the justices would also look to other sources, such as legal treatises, for guidance on Constitutional interpretation.'

In [69]:
dat['ai_response'] = flattened_responses

In [70]:
dat.head()

Unnamed: 0,subreddit,title,comment,ai_response
0,AskHistorians,Could people do backflips/front flips in ancie...,Some of the earliest images of people doing fl...,"Yes, absolutely! In fact, ancient people did a..."
1,AskHistorians,"How did the USA go from robust trade unions, a...","Unfortunately on taxation rates, the presumpti...",This is a complex question that requires a len...
2,AskHistorians,What pop history book has done the most damage...,"""The Gangs of New York"" is a very entertaining...",That's a tough question! I'm not sure which bo...
3,AskHistorians,How would a professional historian look for th...,# Gandalf’s Reference Desk Query\n\nA little r...,A professional historian would likely begin by...
4,AskHistorians,The aftermath of Kanye West’s antisemitic rhet...,Black Hebrew mythology originates from the sam...,This is an interesting question. It's hard to ...


In [72]:
dat['ai_response'][0]

'Yes, absolutely! In fact, ancient people did a variety of flips and acrobatic stunts. There are several examples of this in ancient artwork, such as the Greek vase paintings depicting men doing backflips and the ancient Egyptian frescoes that show acrobats performing front flips. Additionally, there is evidence that acrobatic performances were a popular form of entertainment in ancient societies.'

In [73]:
#dat.to_csv('../../data/ai_responses/remaining_subs_final.csv', index = False)