# Load API Key and OPENAI Package

The cells below load the packages used in this notebook and read the OpenAI API key. Store your OpenAI API key in a `.env` file.
If you are missing any of these packages, run the following commands:
```
$   pip3 install PyPDF2
$   pip3 install python-dotenv
$   pip3 install openai
```
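
A missing or misnamed entry in `.env` tends to surface only later, as an authentication error, so it can help to check the variables right after loading them. The sketch below is one way to do that. The helper name `missing_env_vars` is hypothetical; `OPENAI_API_KEY` and `ORGANIZATION_KEY` are the variable names this notebook reads.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Usage: call after load_dotenv() so values from .env are already loaded.
missing = missing_env_vars(["OPENAI_API_KEY", "ORGANIZATION_KEY"])
if missing:
    print("Missing from the environment or .env file:", ", ".join(missing))
```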

In [15]:
import os
import openai
import PyPDF2
from dotenv import load_dotenv

load_dotenv()

True

In [16]:
openai.organization = os.getenv("ORGANIZATION_KEY")
openai.api_key = os.getenv("OPENAI_API_KEY")

In [14]:
# Sample API Call
# response = openai.Completion.create(
#     engine="text-davinci-002",  # Specify the engine (model) you want to use
#     prompt="Translate the following English text to French: 'Hello, how are you?'",
#     max_tokens=50,  # Limit the length of the generated text
# )

# Lesson Proposals

Lesson proposals are extracted with the PyPDF2 package. For this code to work, the proposal PDFs must be downloaded locally and placed inside a
`Lesson_Proposals` folder. The code extracts all of the text from each PDF and stores it in the <i style="color: red"><b>lesson_proposal</b></i> variable.

## Retrieving Lesson Proposals

In [17]:
# Stores the path of All the PDFs in the Lesson_Proposals Folder
pdf_file_paths = []

proposal_list = os.listdir("./Lesson_Proposals/")

# Extracts all the PDF Paths
for i in proposal_list:
    if i == ".DS_Store":
        continue
    pdf_file_paths.append(f"Lesson_Proposals/{i}")
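
The loop above skips only `.DS_Store`, so any other non-PDF file in the folder would still be passed to the PDF reader. A hedged alternative, assuming every proposal file ends in `.pdf`, is to filter on the extension instead. The helper name `list_pdf_paths` is hypothetical.

```python
from pathlib import Path


def list_pdf_paths(folder):
    """Return sorted paths of the .pdf files directly inside folder (as strings)."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Usage (assumes the folder exists next to the notebook):
# pdf_file_paths = list_pdf_paths("Lesson_Proposals")
```

Sorting makes the order deterministic, which matters here because the prompt later numbers the proposals by their position in the list.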

In [27]:
# print(pdf_file_paths)

['Lesson_Proposals/114233851983_A_Path_to_Open_Inclusive_and_Collaborative_Science_for_Librarians.pdf', 'Lesson_Proposals/114233826722_Open_Hardware_for_librarians.pdf', 'Lesson_Proposals/114219657654_Data_Management_and_Sharing_Plans_for_Librarians_101.pdf', 'Lesson_Proposals/114229483598_Research Community Outreach with Open Science Team Agreements_Open_Science_Team_Agreements-Lessons_for_Librarians_in_Open_Science_Proposal.pdf', 'Lesson_Proposals/114232854610_Understanding_CARE_Principles_for_research_data.pdf', 'Lesson_Proposals/114233727582_Reproducible_research_workflows_2023_01_31.pdf', 'Lesson_Proposals/114205095243_Open_Qualitative_Research.pdf']


In [18]:
# EXTRACT TEXT FROM EACH PROPOSAL

lesson_proposal = []

for path in pdf_file_paths:
    # The with-block closes the file even if extraction raises an error
    with open(path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)

        # Concatenate the text of every page in the PDF
        text = ''
        for page in pdf_reader.pages:
            text += page.extract_text()

    lesson_proposal.append(text)


## Cleaning Proposal Data

# Prompting GPT


As far as I know, OpenAI's API does not keep a continuous chat between calls: every request has to carry all of the context it needs. Many repositories address this problem, but the solution I am going with is a single long prompt that first provides a rubric for scoring similarities, then an output format, and finally all of the proposals. We can test this prompt in the ChatGPT web interface before using the API to see whether it generates the result we want.

## Creating the Prompt

### Opening a Chat Log

In [None]:
# class Chat:
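
One way to approximate a continuous chat is to resend the full message history with every request. The sketch below fills in the `Chat` placeholder along those lines. It is an illustration, not the approach this notebook ends up using, and the `send` parameter is a hypothetical callable standing in for the actual API call.

```python
class Chat:
    """Keeps the running message list so each request carries the full history."""

    def __init__(self, send):
        # `send` is a hypothetical callable: it takes the message list and
        # returns the assistant's reply as a string.
        self.send = send
        self.messages = []

    def ask(self, content):
        """Append a user message, send the whole history, and record the reply."""
        self.messages.append({"role": "user", "content": content})
        reply = self.send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With the `openai.ChatCompletion.create` call used later in this notebook, `send` could plausibly be `lambda messages: openai.ChatCompletion.create(model="gpt-4", messages=messages).choices[0].message.content`.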

### Providing a Rubric

You can find the Rubric that was used to score similarity [here](https://docs.google.com/document/d/1x18mVubT2H4Gj_GvDM3nUupCqYQHaH8Qk8UleAQjPyU/edit).

In [19]:
# Telling GPT to use this Rubric when Grading the Proposals

rubric_message = """
The rubric will be displayed in the following format:
Measurement:
    Point Value. Description for that point value 

Keywords
    1. Keywords are completely unrelated to each other and have no correlation.
    2. Keywords can be seen in the same context in the English language
    3. They contain one or two similar/same keywords
    4. They contain very similar keywords but the context used throughout the proposal may be a little different 
    5. They have the same keywords with the addition/subtraction of one or more keywords
Title
    1. Titles have no correlation or do not cover similar topics
    2. Titles cover the same branch of science
    3. Titles contain similar words and cover the same topic
Lesson Objectives
    1. <30% of the lesson objectives align with each other
    2. 30-50% of the lesson objectives align with each other
    3. 50-70% of the lesson objectives align with each other
    4. 70-90% of the lesson objectives align with each other
    5. >90% of the lesson objectives align with each other
Lesson Audience
    1. Audiences are completely different
    2. Audience can be grouped into a similar room
    3. Audience may be interested in the same subject
    4. The lesson is heavily ingrained in both audience’s work/field
    5. Teaching to the same audience
Description
    1. Descriptions are not similar with any context
    2. Descriptions have similar words
    3. Same concept is mentioned but different context surrounding the lesson creation
    4. Same concept is mentioned as well as similar context for the creation of the lesson
    5. Both lessons describe the same problem and issue that they plan to tackle
"""

In [None]:
# print(rubric_message)

### Telling GPT How to Output the Results

In [20]:
format_message = """
Please respond in the following format: 

1. Proposal 1 Title
    Most Similar: (Proposal Most Similar to)
        3-5 Sentences of Context comparing the 2 proposals
        Rubric Score:
            ...
    Least Similar: (Proposal Least Similar to)
        3-5 Sentences of Content comparing the 2 proposals
        Rubric Score:
            ...
2. Proposal 2 Title
    Same format as above
3. The same format until all proposals have been considered
"""


### Combining all of the messages

We will take all of the different messages created above and combine them together into one singular prompt that we can send to GPT.
It will HOPEFULLY output an appropriate response.

In [23]:
prompt = f"""You are going to receive several lesson proposals and your job is to compare them to find proposals that are most similar to each other.
You will have a total of 7 proposals, numbered Proposal 1 through Proposal 7. For each proposal, determine which of the other proposals is the most similar
and which is the least similar by comparing it against every other proposal individually. Use the rubric provided below to score
the similarity between each pair of proposals, and include this score in your response. Before analyzing and comparing the proposals, identify and separate each
proposal into Title, Keywords, Lesson Objectives, Lesson Audience, and Description. Then use the newly partitioned proposals in your analysis.

The Rubric:
{rubric_message}

{format_message}

Here are the proposals:

Proposal 1:
{lesson_proposal[0]}

Proposal 2:
{lesson_proposal[1]}

Proposal 3:
{lesson_proposal[2]}

Proposal 4:
{lesson_proposal[3]}

Proposal 5:
{lesson_proposal[4]}

Proposal 6:
{lesson_proposal[5]}

Proposal 7:
{lesson_proposal[6]}

"""

In [None]:
# print(prompt)
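
The prompt above indexes `lesson_proposal` by hand, so it raises an error if the folder holds fewer than seven PDFs and silently ignores any beyond the seventh. A minimal sketch of building that part of the prompt from the list itself follows. The function name `build_proposals_section` is hypothetical.

```python
def build_proposals_section(proposals):
    """Format each proposal text under a numbered 'Proposal N:' label."""
    blocks = [
        f"Proposal {number}:\n{text}\n"
        for number, text in enumerate(proposals, start=1)
    ]
    return "\n".join(blocks)


# Usage: stands in for the seven hand-written entries in the prompt.
# proposals_section = build_proposals_section(lesson_proposal)
```

If this approach is used, the proposal count stated in the prompt text would also need to come from `len(lesson_proposal)` rather than being fixed at 7.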

## Sending the Prompt

### Sending API Call

In [27]:
# Using OPENAI's API to prompt gpt-4 to analyze the text
response = openai.ChatCompletion.create(model="gpt-4", messages=[{"role": "user", "content": prompt}])

In [None]:
# print(response)

### Extracting the Message

The following code creates the file "GPT_Analysis.txt", or overwrites it if it already exists, with GPT's text analysis of the lesson proposals.

In [30]:
# print(response['choices'][0]['message']['content'])
# The with-block closes the file even if the write fails
with open("GPT_Analysis.txt", 'w') as GPT_Analysis:
    GPT_Analysis.write(response.choices[0].message.content)
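
The extraction step can also be wrapped in small helpers so it is easy to check without calling the API. This is a sketch under the assumption that the response can be indexed like a dictionary with the shape `{"choices": [{"message": {"content": ...}}]}`. The names `extract_reply` and `save_analysis` are hypothetical.

```python
def extract_reply(response):
    """Return the text of the first choice from a chat-completion style response."""
    return response["choices"][0]["message"]["content"]


def save_analysis(response, path="GPT_Analysis.txt"):
    """Write the reply text to path, overwriting any existing file."""
    text = extract_reply(response)
    with open(path, "w", encoding="utf-8") as analysis_file:
        analysis_file.write(text)
    return text
```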