# Exploring GPT-3

In this homework assignment we will walk you through how to use GPT-3, a large pre-trained neural language model developed by OpenAI.

You will learn about the following topics:
* Prompts and completions.  You should observe that the generated text is high quality, but not necessarily factually accurate.
* Probabilities.  You'll learn how to inspect probabilities assigned to words in the model's output.
* Few shot learning.  We'll see an example of few-shot learning with a small handful of examples.
* Zero shot learning.  We will explore the zero-shot capabilities of pre-trained LMs.  You'll design zero-shot prompts for:
  1. summarization
  2. question-answering
  3. simplification
  4. translation
* How to fine-tune a model.  You will learn how to fine-tune GPT-3 to take a Wikipedia infobox as input and generate the text of a biography as its output.  You'll then write your own code to do the reverse task: given a biography, extract the attributes and values in the style of a Wikipedia infobox.



# Prompt Completion

As a warm-up we'll have you play with [the OpenAI Playground](https://beta.openai.com/playground).  Try inputting this prompt:

> One of my favorite professors at the University of Pennsylvania is 

Then click the "Submit" button to generate a completion.

Copy and paste the generated text (including your prompt) into the cell below.

You might notice that the text GPT-3 generates ends mid-sentence.  GPT-3 keeps generating text until it either produces the special end-of-text token `<|endoftext|>` (or a stop sequence that you specify), or reaches the number of tokens set by the `Maximum length` slider.
You can press Submit again to have it continue generating, or you can increase the maximum length with the slider on the right.
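The two stopping conditions can be illustrated with a toy decoding loop. This is a minimal sketch, not how OpenAI implements generation: `generate` and `scripted_model` are hypothetical helpers, and a fixed list of tokens stands in for a real model's predictions.

```python
from typing import Callable, Iterable

END_OF_TEXT = "<|endoftext|>"

def generate(next_token: Callable[[str], str], max_tokens: int,
             stop: Iterable[str] = (END_OF_TEXT,)) -> str:
  """Append one token at a time until a stop sequence appears or max_tokens is reached."""
  stop = tuple(stop)
  text = ""
  for _ in range(max_tokens):
    text += next_token(text)
    for s in stop:
      if s in text:
        # Cut the text at the stop sequence; the stop sequence itself is not returned.
        return text[: text.index(s)]
  # Ran out of tokens: the text may end mid-sentence.
  return text

def scripted_model(tokens: list) -> Callable[[str], str]:
  """Hypothetical stand-in for a model: returns the next token from a fixed list."""
  it = iter(tokens)
  return lambda _text_so_far: next(it, END_OF_TEXT)

TOKENS = ["Dr.", " Smith", " teaches", " physics", ".", END_OF_TEXT]
print(generate(scripted_model(TOKENS), max_tokens=3))   # Dr. Smith teaches
print(generate(scripted_model(TOKENS), max_tokens=10))  # Dr. Smith teaches physics.
```

With `max_tokens=3` the loop stops mid-sentence, just like a short `Maximum length` in the Playground; with a larger budget it stops at the end-of-text token.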

In [None]:
favorite_professor_completion_1 = """
One of my favorite professors at the University of Pennsylvania is 

Dr. Robert Kurzban. He teaches courses in evolutionary psychology, and his classes are always thought-provoking and engaging. I've learned a lot from him, and I'm grateful for the opportunity to have had him as a professor.
"""

GPT-3 generates fluent text, but it is not always grounded in fact.  Do a Google search for the person that GPT-3 generated as your favorite professor and check:
* Are they actually a professor?
* Where do they work?

In [None]:
# Extract the professor's name
professor_name_1 = "Dr. Robert Kurzban"

# Do a Google search and answer these questions
actually_a_professor_1 = True

# Institution where they work
instituion_1 = "University of Pennsylvania (He was fired for 'inappropriate relationships with students')"

When it generates its completions, GPT-3 produces each new word/token according to its probability distribution: it draws each token at random in proportion to its probability.  That randomness means that it can produce different completions for the same prompt, so you can re-generate and get a different completion each time.
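Here is a minimal sketch of "drawing a token in proportion to its probability". The distribution is made up for illustration (it is not real GPT-3 output), and `sample_next_token` is a hypothetical helper built on Python's `random.choices`; it ignores settings such as `temperature` and `top_p`.

```python
import random
from collections import Counter

# Hypothetical next-token distribution (illustrative numbers, not real GPT-3 output).
next_token_probs = {"Dr.": 0.5, "Professor": 0.3, "my": 0.15, "the": 0.05}

def sample_next_token(probs, rng):
  """Draw one token at random, weighted by its probability."""
  tokens = list(probs)
  weights = [probs[t] for t in tokens]
  return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_next_token(next_token_probs, rng) for _ in range(10_000)]
counts = Counter(samples)
for token, p in next_token_probs.items():
  print(f"{token!r}: target {p:.2f}, sampled {counts[token] / len(samples):.3f}")
```

Each individual draw can differ, which is why regenerating gives new completions, but over many draws the frequencies approach the model's probabilities.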

Generate another 4 completions for the professor prompt:

> One of my favorite professors at the University of Pennsylvania is 

and do Google searches for them.

*Tip: You can generate another response with the Regenerate button to the right of the Submit button.  The Regenerate button has a recycle symbol on it.*

In [None]:
favorite_professor_completion_2 = """
Dr. David Danks. He is a professor in the Department of Philosophy and the Cognitive Science Program. He specializes in philosophy of science, artificial intelligence, and cognitive science. He is an incredibly engaging and inspiring teacher, and he is always willing to help students understand the material and think critically about the topics. His classes are always interesting and thought-provoking, and he encourages discussion and debate among his students. He has also been a great mentor to me and has provided me with invaluable advice and guidance throughout my time at Penn.
"""

favorite_professor_completion_3 = """
Dr. Zibo Xu. He is an Associate Professor in the Department of Materials Science and Engineering. Dr. Xu's teaching style is engaging and his enthusiasm for his research and the material he is teaching is contagious. He is an expert in the field of materials science and engineering and his research focuses on the development of new materials and their applications. In addition to his expertise, he is an incredibly kind person who is always willing to help his students and provide an encouraging and supportive environment.
"""

favorite_professor_completion_4 = """
Dr. Evelyn Alsultany. She is an Associate Professor in the Department of American studies. Dr. Alsultany is an inspiring and engaging professor who teaches courses on race, gender, and ethnicity in the United States. She is passionate about her subject matter and is an excellent and knowledgeable instructor. She is always willing to help her students and encourages them to think critically about the topics they are studying. Dr. Alsultany is also very personable and creates a comfortable and welcoming learning environment. I am privileged to have had the opportunity to learn from her and I highly recommend her to any student looking for an engaging and educational experience.
"""

favorite_professor_completion_5 = """
Professor Richard W. Smith. Professor Smith is an incredible professor and a great mentor. He is an expert in American History, and he has a passion for teaching and for helping students succeed. He is always open to helping students understand course material and is available for any questions or concerns that students may have. He also encourages students to think critically and to engage in meaningful class discussions. He is a great example of how a professor can truly make a difference in the lives of their students.
"""

# Do a Google search for these professors

professor_name_2 = "David Danks"
actually_a_professor_2 = True
instituion_2 = "University of California, San Diego"

professor_name_3 = "Zibo Xu"
actually_a_professor_3 = True
instituion_3 = "ESD, SUTD, Singapore"

professor_name_4 = "Evelyn Alsultany"
actually_a_professor_4 = True
instituion_4 = "University of Southern California"

professor_name_5 = "Richard W. Smith"
actually_a_professor_5 = True
instituion_5 = "Ohio Wesleyan University"

## Probabilities

Just like the n-gram language models that we studied earlier in the course, neural language models like GPT-3 assign probabilities to each token in a sequence.

In the playground, you can see the probabilities for the top-5 words predicted at each position by choosing the `Full Spectrum` option from the `Show probabilities` dropdown menu in the controls.  Try selecting that option and then generate a completion for the prompt

> My favorite class in the Computer Science Department was taught by Professor

If you mouse over the word after *Professor*, you'll see something like this:
```
Joe = 8.21%
John = 4.25%
Nancy = 2.27%
David = 2.09%
Barbara = 2.05%
Total: -2.50 logprob on 1 tokens
(18.87% probability covered in top 5 logits)
```
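The numbers in that tooltip are related by simple arithmetic, sketched below. The sketch assumes the log probabilities are natural logs and that the chosen token was "Joe"; both assumptions are consistent with the numbers shown, since e^-2.50 ≈ 8.21%.

```python
import math

# Top-5 probabilities from the Playground tooltip above.
top5 = {"Joe": 0.0821, "John": 0.0425, "Nancy": 0.0227, "David": 0.0209, "Barbara": 0.0205}

# Log probability of the chosen token; exp() converts it back to a probability.
chosen_logprob = -2.50
print(f"exp(-2.50) = {math.exp(chosen_logprob):.4f}")  # 0.0821, the 8.21% for "Joe"

# "probability covered in top 5" is the sum of the top-5 probabilities.
covered = sum(top5.values())
print(f"top-5 coverage = {covered:.2%}")  # 18.87%
```

The remaining ~81% of the probability mass is spread over all the other tokens in the vocabulary.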

One critical observation about language models is that they often encode societal biases that appear in their training data.  For instance, after the discovery that word embeddings could be used to solve word analogy problems like "**man** is to **woman** as **king** is to ___" (the model predicts **queen**), researchers discovered that the embeddings gave a surprisingly sexist answer to the analogy problem "**man** is to **woman** as **computer programmer** is to ___" (the model predicts **homemaker**).  These kinds of biases are prevalent and pernicious.
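Analogies like these are typically solved with vector arithmetic: compute the vector for **woman** − **man** + **king** and return the nearest remaining word by cosine similarity. The sketch below uses hand-made 2-D vectors, invented purely for illustration (real embeddings have hundreds of learned dimensions); `solve_analogy` is a hypothetical helper.

```python
import math

# Hand-made 2-D "embeddings" for illustration only (dimension 0 ~ royalty, 1 ~ gender).
vectors = {
  "man":   (0.0, 1.0),
  "woman": (0.0, -1.0),
  "king":  (1.0, 1.0),
  "queen": (1.0, -1.0),
  "apple": (-1.0, 0.0),
}

def cosine(u, v):
  dot = sum(a * b for a, b in zip(u, v))
  return dot / (math.hypot(*u) * math.hypot(*v))

def solve_analogy(a, b, c, vectors):
  """Answer 'a is to b as c is to ?' using the offset vector b - a + c."""
  target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
  # The three query words are excluded from the candidates, as is common practice.
  candidates = [w for w in vectors if w not in (a, b, c)]
  return max(candidates, key=lambda w: cosine(vectors[w], target))

print(solve_analogy("man", "woman", "king", vectors))  # queen
```

The same arithmetic that recovers **queen** will faithfully reproduce any gender associations present in the training data, which is how answers like **homemaker** arise.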

Let's examine the most probable names that GPT-3 predicts in different contexts and analyze their gender.  We'll see whether it associates different genders with different academic disciplines.  (You can also see this for different careers like *nurse*, *plumber*, or *school teacher*.)

Please create dictionaries mapping GPT-3's top predictions for the first names of professors in these departments to the gender of each name:
* Computer Science
* Gender Studies
* Physics
* Linguistics
* Bioengineering
Use the prompt:
> My favorite class in the {department_name} Department was taught by Professor

**Note: you can also add a stop sequence of `.` to get the model to complete only a single sentence.**



In [None]:

# Classify each name as male, female, partial word, or unknown
computer_science_genders = {
  "Joe" : "male",
  "John" : "male",
  "Nancy" : "female",
  "David" : "male",
  "Barbara" : "female",
}

gender_studies_genders = {
  "Rosemary" : "female",
  "Laura" : "female",
  "Dana" : "female",
  "Judith" : "female",
  "Anne" : "female",
}

physics_genders = {
  "Arin" : "male",
  "James" : "male",
  "David" : "male",
  "John" : "male",
  "Michael" : "male",

}

lingusitics_genders = {
  "Donald" : "male",
  "John" : "male",
  "James" : "male",
  "David" : "male",
  "Harry" : "male",

}

bioengineering_genders = {
  "Vijay" : "male",
  "John" : "male",
  "David" : "male",
  "Chang" : "male",
}
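One simple way to compare departments is the fraction of predicted names in each gender category. The sketch below is unweighted (it ignores the probabilities the model assigned to each name), and `gender_fractions` is a hypothetical helper; two of the dictionaries above are copied in so the cell runs on its own.

```python
from collections import Counter

def gender_fractions(name_to_gender):
  """Fraction of the predicted names that carry each gender label."""
  counts = Counter(name_to_gender.values())
  total = sum(counts.values())
  return {label: n / total for label, n in counts.items()}

# Copies of two of the dictionaries above, so this cell is self-contained.
results = {
  "Computer Science": {"Joe": "male", "John": "male", "Nancy": "female",
                       "David": "male", "Barbara": "female"},
  "Gender Studies": {"Rosemary": "female", "Laura": "female", "Dana": "female",
                     "Judith": "female", "Anne": "female"},
}

for department, names in results.items():
  fractions = gender_fractions(names)
  print(f"{department}: {fractions.get('female', 0.0):.0%} female")
```

Weighting each name by its probability from the Playground would give a more faithful picture, since a top-1 name at 8% and a top-5 name at 2% count equally here.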

(If you wanted to systematically explore the predictions of the model, you could use the API's `logprobs` argument, which returns the log probabilities of the `logprobs` most likely tokens at each position, as well as those of the chosen tokens.)
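A minimal sketch of that approach, assuming the legacy (pre-1.0) `openai` library and `text-davinci-002` model used elsewhere in this notebook, where `response['choices'][0]['logprobs']['top_logprobs']` holds one `{token: logprob}` dict per generated token. `top_token_probabilities` and `query_department` are hypothetical helpers; the parser is exercised on a hand-written stand-in response so it can be tried without an API key.

```python
import math

def top_token_probabilities(response):
  """Convert the top_logprobs at the first generated position into {token: probability}."""
  first_position = response["choices"][0]["logprobs"]["top_logprobs"][0]
  # Tokens usually carry a leading space (" Joe"), so strip it for readability.
  return {token.strip(): math.exp(logprob) for token, logprob in first_position.items()}

def query_department(department_name):
  """Ask for the 5 most likely tokens that follow the professor prompt."""
  import openai  # imported here so the parser above can be tried without the library
  prompt = f"My favorite class in the {department_name} Department was taught by Professor"
  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=prompt,
      max_tokens=1,   # only the first token after "Professor" is needed
      logprobs=5,     # return the 5 most likely tokens at each position
      temperature=0,
  )
  return top_token_probabilities(response)

# Offline check with a hand-written response that mimics the legacy layout.
fake_response = {"choices": [{"logprobs": {"top_logprobs": [
    {" Joe": -2.50, " John": math.log(0.0425)}]}}]}
print(top_token_probabilities(fake_response))
```

Calling `query_department("Physics")` would then give the probabilities directly. Because only the first token is inspected, a name that the tokenizer splits into several pieces shows up as a partial word.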

# Few Shot Learning

One of the remarkable properties of large language models is a consequence of the fact that they have been trained on so much language data.  That training data gives them background knowledge that lets them learn new tasks and generalize patterns from only a few examples.  This is called "few-shot learning".

Here is an example.  Imagine that we want to build a system that allows a student to say something they want to learn, and the system will recommend the subject for them to study.  Here are examples of inputs and outputs to our program:

```
how to program in Python - computer science
factors leading up to WW2 - history
branches of government - political science
Shakespeare's plays - English
cellular respiration - biology
respiratory disease - medical
how to sculpt - art
```

We can use these 7 examples (and probably fewer!) as a prompt to GPT-3, and it will perform few-shot learning by inferring our pattern and applying it to new inputs.
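A few-shot prompt is just the examples, one per line, followed by the new input with its answer left blank. This sketch builds that string in the same "topic - subject" format used by `generate_subject_few_shot` later in the notebook; `build_few_shot_prompt` is a hypothetical helper.

```python
# The seven (topic, subject) pairs shown above.
EXAMPLES = [
  ("how to program in Python", "computer science"),
  ("factors leading up to WW2", "history"),
  ("branches of government", "political science"),
  ("Shakespeare's plays", "English"),
  ("cellular respiration", "biology"),
  ("respiratory disease", "medical"),
  ("how to sculpt", "art"),
]

def build_few_shot_prompt(examples, new_topic):
  """One 'topic - subject' line per example, then the new topic with its subject left blank."""
  lines = [f"{topic} - {subject}" for topic, subject in examples]
  # The model is expected to continue this last line with a subject.
  lines.append(f"{new_topic} - ")
  return "\n".join(lines)

print(build_few_shot_prompt(EXAMPLES, "planetary orbits"))
```

Passing a slice such as `EXAMPLES[:3]` is an easy way to test whether fewer examples still work.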

Try pasting those examples into the Playground, then adding a few new topics like the ones below to see which subject the model outputs for each.

```
cellular respiration
respiratory disease
how to play saxophone
autonomic system
how write a screenplay
perform in a play
stock market
planetary orbits
relativity
```



Fill in the dictionary below using the playground by replacing the TODOs with the model's predictions. 

In [None]:
few_shot_subject_classification_results = {
  "cellular respiration" : "Biology",
  "respiratory disease" : "Medical",
  "how to play saxophone" : "Music",
  "autonomic system" : "Biology",
  "how write a screenplay" : "Creative Writing",
  "perform in a play" : "Theater",
  "stock market" : "Economics",
  "planetary orbits" : "Astronomy",
  "relativity" : "Physics",
}

## Using the API

Now let's take a look at how to call the OpenAI API from our code, so that we don't have to manually enter inputs into the Playground.  

If you click on the "View code" button on the playground, you'll see a sample of code for whatever prompt you have.  For example, here's the code that we have for our few-shot learning that generates a subject to study for a topic that someone is interested in:

```python
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="text-davinci-002",
  prompt="how to program in Python - computer science\nfactors leading up to WW2 - history\nbranches of government - political science\nShakespeare's plays - English\ncellular respiration - biology\nrespiratory disease - medical\nhow to sculpt - art",
  temperature=0.7,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
```
This is Python code, so it's easy to use it as a starting point and modify it into a function that we can call.


First, you'll need to install the OpenAI Python library via pip.  You can run pip and other Unix commands in a Colab notebook by prefixing them with an exclamation point.  (The `%%capture` magic at the top of the cell just suppresses the output of the Unix command.  You can remove it if you want to see the progress of the command.)


In [None]:
%%capture
!pip install openai

Next, enter your secret key for the OpenAI API.  You can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).

We will enter it as a password so that the raw text of the key doesn't get saved in your notebook.  Otherwise, if you accidentally made your notebook public, other people could use your key and you would pay for their usage.

In [None]:
from getpass import getpass
import os

print('Enter OpenAI API key:')
openai_api_key = getpass()

os.environ['OPENAI_API_KEY']=openai_api_key

Enter OpenAI API key:
··········


Now let's write a function that takes a topic as input and then outputs a subject to study if you want to learn about that topic.

In [None]:
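import openai
import os
import time

# Use the key that the previous cell stored in the OPENAI_API_KEY environment variable.
openai.api_key = os.getenv("OPENAI_API_KEY")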

def generate_subject_few_shot(topic):
  few_shot_prompt = """how to program in Python - computer science
factors leading up to WW2 - history
branches of government - political science
Shakespeare's plays - English
cellular respiration - biology
respiratory disease - medical
how to sculpt - art
"""

  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=few_shot_prompt + topic + " - ", # We'll append our topic and a dash to the end of the few shot prompt.
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n"]
  )
  # I recommend putting a short wait after each call, 
  # since the rate limit for the platform is 60 requests/min.
  # (This increases to 3000 requests/min after you've been using the platform for 2 days).
  time.sleep(1)

  # the response from OpenAI's API is a JSON object that contains 
  # the completion to your prompt plus some other information.  Here's how to access
  # just the text of the completion. 
  return response['choices'][0]['text'].strip()

topic = "cellular respiration"
generate_subject_few_shot(topic)

'biology'

That's it!  That's an example of how to call the OpenAI API from a function that outputs a subject for a topic.

Here is some information about the different arguments that we pass to the `openai.Completion.create` call:
 * `model` – OpenAI offers four different sized versions of the GPT-3 model: davinci, curie, babbage and ada.  Davinci has the largest number of parameters and is [the most expensive to run](https://openai.com/api/pricing/).  Ada has the fewest parameters, is the fastest to run and is the least expensive.
 * `prompt` - this is the prompt that the model will generate a completion for.
 * `temperature` - controls how random the sampling is.  The model's scores for the candidate tokens are divided by the temperature before being turned into probabilities, so values below 1.0 sharpen the distribution toward the most likely tokens and values above 1.0 flatten it.  1.0 samples from the model's unmodified distribution, and 0.0 makes the output (nearly) deterministic by always choosing the most probable token for each context.
 * `top_p` - an alternative way of controlling the sampling, called nucleus sampling.  The model samples only from the smallest set of most likely tokens whose probabilities add up to `top_p`; for example, 0.7 keeps only the tokens that make up the top 70% of the probability mass.  OpenAI recommends adjusting either `temperature` or `top_p`, but not both.
 * `frequency_penalty` and `presence_penalty` are two ways of discouraging the model from repeating the same words in one output.  The frequency penalty grows with the number of times a token has already appeared, while the presence penalty applies once a token has appeared at all.  You can set these to be >0 if you're seeing a lot of repetition in your output.
 * `max_tokens` is the maximum number of tokens that the call will generate.  A token is a subword unit; for English text, one token is roughly 4 characters, or about three quarters of a word, on average.
 * `stop` is a list of stop sequences.  The model will stop generating output once it generates one of these strings, even if it hasn't reached `max_tokens`, and the stop sequence itself is not included in the returned text.  Even without a `stop` argument, the model stops when it generates the special end-of-text token `<|endoftext|>`.
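The effect of `temperature` and `top_p` on a single next-token distribution can be sketched in a few lines. This is an illustrative reimplementation under simplifying assumptions, not OpenAI's code: the logits are invented, `apply_temperature` and `nucleus` are hypothetical helpers, and temperature 0 is treated as pure greedy selection.

```python
import math

def apply_temperature(logits, temperature):
  """Softmax over logits / temperature; lower temperature sharpens the distribution."""
  if temperature == 0:
    # Treated here as greedy decoding: all probability on the highest-scoring token.
    best = max(logits, key=logits.get)
    return {tok: (1.0 if tok == best else 0.0) for tok in logits}
  scaled = {tok: v / temperature for tok, v in logits.items()}
  m = max(scaled.values())  # subtract the max for numerical stability
  exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
  z = sum(exps.values())
  return {tok: e / z for tok, e in exps.items()}

def nucleus(probs, top_p):
  """Keep the most likely tokens until their total reaches top_p, then renormalize."""
  kept, total = {}, 0.0
  for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    kept[tok] = p
    total += p
    if total >= top_p:
      break
  return {tok: p / total for tok, p in kept.items()}

# Invented scores for the token after "cellular respiration - ".
LOGITS = {"biology": 3.0, "chemistry": 2.0, "medicine": 1.0, "art": -1.0}
for t in (1.0, 0.7, 0):
  print(t, {tok: round(p, 3) for tok, p in apply_temperature(LOGITS, t).items()})
print("top_p=0.7:", nucleus(apply_temperature(LOGITS, 1.0), 0.7))
```

Note that `top_p=0.7` here keeps two tokens (biology and chemistry) rather than dropping a fixed 30% of tokens: the cutoff is on cumulative probability mass, not on the number of tokens.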

You can read more about [the Completion API call in the documentation](https://beta.openai.com/docs/api-reference/completions).

# Zero shot learning

In addition to few-shot learning, GPT-3 can sometimes perform "zero-shot learning": instead of giving it several examples of what we want it to do, we give it instructions describing the task.

For example, for our topic - subject task we could give GPT-3 the prompt

> Given a topic, output the subject that a student should study if they want to know more about that topic.

Then if we append 
> cellular respiration -

GPT3 will output biology.

Try to adapt the `generate_subject_few_shot` function to do a zero-shot version.

In [None]:
def generate_subject_zero_shot(topic):
  # The trailing newline keeps the instruction from running into the topic.
  zero_shot_prompt = """Given a topic, output the subject that a student should study if they want to know more about that topic.
"""

  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=zero_shot_prompt + topic + " - ", # Append our topic and a dash after the instruction.
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n"]
  )
  # I recommend putting a short wait after each call,
  # since the rate limit for the platform is 60 requests/min.
  # (This increases to 3000 requests/min after you've been using the platform for 2 days).
  time.sleep(1)

  # The response from OpenAI's API is a JSON object that contains
  # the completion to your prompt plus some other information.  Here's how to access
  # just the text of the completion.
  return response['choices'][0]['text'].strip()

topic = "cellular respiration"
generate_subject_zero_shot(topic)

'biology'

A very cool recent finding is that the training procedure for large language models can be changed to improve this instruction-following behavior.  If large LMs are [trained to do multiple tasks through prompting](https://arxiv.org/abs/2110.08207), they generalize better to new tasks in a zero-shot fashion.  The current version of GPT-3 (text-davinci-002) was trained with a related form of instruction tuning.

Try writing zero-shot prompts to do the following tasks:
1. Summarize a Wikipedia article.
2. Answer questions about an article.
3. Re-write an article so that it's suitable for a young child who is just learning how to read (age 8 or so).
4. Translate an article from Russian into English.

You should experiment with a few prompts in the playground to find a good prompt that seems to work well.

In [None]:
def complete(prompt):
  """Send a prompt to the Completions API and return the text of the completion."""
  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=prompt,
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
  )
  # Short wait after each call to stay under the rate limit
  # (60 requests/min, rising to 3000 requests/min after 2 days on the platform).
  time.sleep(1)
  # The response is a JSON object; the generated text is in the first choice.
  return response['choices'][0]['text'].strip()

def summarize(article_text):
  # The instruction comes after the article, so the completion is the summary.
  zero_shot_prompt = "Provide a brief summary (no more than 2 sentences) of the above Wikipedia article:\n\n"
  return complete(article_text + '\n' + zero_shot_prompt)

def answer_question(article_text, question):
  # Separate the instruction from the question so the two don't run together.
  prompt = article_text + '\n\n\nAnswer this question about the above article: ' + question + '\n'
  return complete(prompt)

def simplify(article_text):
  # Re-write an article so that it's suitable for a young child.
  prompt = article_text + '\n\n\n Rewrite the above article so it is suitable for a young child'
  return complete(prompt)

def translate(article_text, source_language, target_language):
  # Translate an article from a source language to a target language.
  instruction = "Translate the above article from " + source_language + " to " + target_language
  return complete(article_text + '\n\n\n' + instruction)

Show the outputs of your prompts.  The Colab notebook that you turn in should include these outputs for the TAs and professor to review.

In [None]:
article_text = """
China Construction Bank Corporation (CCB) is one of the "big four" banks in China. In 2015, CCB was the 2nd largest bank in the world by market capitalization and 6th largest company in the world.[3][4] The bank has approximately 13,629 domestic branches. In addition, it maintains overseas branches in Barcelona, Frankfurt, Luxembourg, Hong Kong, Johannesburg, New York City, Seoul, Singapore, Tokyo, Melbourne, Kuala Lumpur, Santiago de Chile, Brisbane, Sydney and Auckland, and a wholly owned subsidiary in London. Its total assets reached CN¥ 8.7 trillion in 2009,[5] and it is considered a systemically important bank by the Financial Stability Board. Its headquarters is in Xicheng District, Beijing.[6]
"""

summarize(article_text)

'China Construction Bank Corporation is one of the "big four" banks in China and is the 2nd largest bank in the world by market capitalization. The bank has approximately 13,629 domestic branches and maintains overseas branches in Barcelona, Frankfurt, Luxembourg, Hong Kong, Johannesburg, New York City, Seoul, Singapore, Tokyo, Melbourne, Kuala Lumpur, Santiago de Chile, Brisbane, Sydney and Auckland.'

In [None]:
article_text = """
The RJR Nabisco leveraged buyout was, at the time, widely considered to be the preeminent example of corporate and executive greed. Bryan Burrough and John Helyar published Barbarians at the Gate: The Fall of RJR Nabisco, a successful book about the events which was later turned into a television movie for HBO.

Ross Johnson was the President and CEO of RJR Nabisco at the time of the leveraged buyout and Henry Kravis was the managing partner at Kohlberg Kravis Roberts & Co. The leveraged buyout was in the amount of $25 billion, and the battle for control took place between October and November 1988.

Although KKR eventually took control of RJR Nabisco, RJR management and Shearson Lehman Hutton had originally announced that they would take RJR Nabisco private at $75 per share. A fierce series of negotiations and proposals ensued which involved nearly all of the major private equity players of the day, including Morgan Stanley, Goldman Sachs, Salomon Brothers, First Boston, Wasserstein Perella & Co., Forstmann Little, Shearson Lehman Hutton, and Merrill Lynch. Once put in play by Shearson Lehman Hutton and RJR management, almost every major Wall Street firm involved in M&A launched frenzied, literal last-minute bids in a fog of incomplete or misleading information.

KKR quickly introduced a tender offer to obtain RJR Nabisco for $90 per share—a price that enabled it to proceed without the approval of RJR Nabisco's management. RJR's management team, working with Shearson Lehman Hutton and Salomon Brothers, submitted a bid of $112, a figure they felt certain would enable it to outflank any response by Kravis. KKR's final bid of $109, while a lower dollar figure, was ultimately accepted by the board of directors. It was accepted because KKR's offer was guaranteed whereas management's lacked a "reset", meaning that the final share price might have been lower than their professed $112 per share. Additionally, many in RJR's board of directors had grown concerned at recent disclosures of Johnson's unprecedented golden parachute deal. Time Magazine featured Johnson on the cover of its December 1988 issue along with the headline "A Game of Greed: This man could pocket $100 million from the largest corporate takeover in history. Has the buyout craze gone too far?".[8]

KKR's offer was welcomed by the board, and, to some observers, it appeared that their elevation of the reset issue as a deal-breaker in KKR's favor was little more than an excuse to reject Johnson's higher payout of $112 per share.[9] Johnson received compensation worth more than $60 million from the buyout, then left in February 1989. In March 1989, Louis V. Gerstner of American Express became the new head of RJR Nabisco.[7]

"""
questions = [
    "Who acquired RJR Nabisco?",
    "Who were the primary bidders for the company?",
    "Who was Ross Johnson?",
    "Who was the CEO of American Express and running the Shearson Lehman bid?",
    "What can we learn from the RJR Nabisco LBO?",
]

for question in questions:
  answer = answer_question(article_text, question)
  print(question)
  print(answer)
  print('---')


Who acquired RJR Nabisco?
KKR acquired RJR Nabisco.
---
Who were the primary bidders for the company?
A. KKR, Morgan Stanley, Goldman Sachs, Salomon Brothers, First Boston, Wasserstein Perella & Co., Forstmann Little, Shearson Lehman Hutton, and Merrill Lynch

B. KKR and RJR management

C. Shearson Lehman Hutton and RJR management

D. KKR, Shearson Lehman Hutton, and RJR management

E. KKR and RJR Nabisco board of directors
---
Who was Ross Johnson?
The President and CEO of RJR Nabisco at the time of the leveraged buyout.
---
Who was the CEO of American Express and running the Shearson Lehman bid?
The CEO of American Express at the time was Louis V. Gerstner.
---
What can we learn from the RJR Nabisco LBO?
The RJR Nabisco leveraged buyout was, at the time, widely considered to be the preeminent example of corporate and executive greed. Bryan Burrough and John Helyar published Barbarians at the Gate: The Fall of RJR Nabisco, a successful book about the events which was later turned into

In [None]:
article_text = """
Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance such as minimizing the number of mistakes made on new samples.

In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning.[citation needed] In computational learning theory, a computation is considered feasible if it can be done in polynomial time.[citation needed] There are two kinds of time complexity results:

Positive results – Showing that a certain class of functions is learnable in polynomial time.
Negative results – Showing that certain classes cannot be learned in polynomial time.
Negative results often rely on commonly believed, but yet unproven assumptions,[citation needed] such as:

Computational complexity – P ≠ NP (the P versus NP problem);
Cryptographic – One-way functions exist.
There are several different approaches to computational learning theory based on making different assumptions about the inference principles used to generalise from limited data. This includes different definitions of probability (see frequency probability, Bayesian probability) and different assumptions on the generation of samples.[citation needed] The different approaches include:

Exact learning, proposed by Dana Angluin[citation needed];
Probably approximately correct learning (PAC learning), proposed by Leslie Valiant;[2]
VC theory, proposed by Vladimir Vapnik and Alexey Chervonenkis;[3]
Bayesian inference[citation needed];
Algorithmic learning theory, from the work of E. Mark Gold;[4]
Online machine learning, from the work of Nick Littlestone[citation needed].
While its primary goal is to understand learning abstractly, computational learning theory has led to the development of practical algorithms. For example, PAC theory inspired boosting, VC theory led to support vector machines, and Bayesian inference led to belief networks.
"""

simplify(article_text)

'In machine learning, there are two main types of learning: supervised and unsupervised. Supervised learning is where an algorithm is given samples that are labeled in some way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to minimize the number of mistakes made on new samples.\n\nIn unsupervised learning, the algorithm is not given any labels. It has to find structure in the data itself. For example, it might be given a set of images and have to find objects in the images.\n\nComputational learning theory is a branch of machine learning that deals with the time complexity and feasibility of learning. In computational learning theory, a c

In [None]:
russian_article = """
Собака среднего роста, с крепким костяком и хорошо развитой мускулатурой, несколько растянутого, с индексом 104—109, формата, половой диморфизм выражен слабо. К важным пропорциям можно отнести длину корпуса, превышающую высоту в холке на 4—9 %; глубину груди, чуть меньшую половины высоты в холке; приблизительно равную длину морды и черепа. Высота в холке кобелей 56—65 см, сук 53—62 см[1].

Голова массивная, широкая в лобной части, скулы хорошо выражены. При осмотре сверху по форме приближена к равностороннему треугольнику. Череп в лобной части широкий, скулы, надбровные дуги и затылочный бугор хорошо выражены. Длина черепной части примерно равна ширине, переход ото лба к морде чёткий, но не резкий. Профиль морды клинообразный, притуплённый, её линия параллельна линии лба. Губы плотно прилегают, мочка носа крупная, чёрная, у собак светлых окрасов может быть осветлённая, у коричневых — коричневая. Зубы крупные, белые, прикус ножницеобразный, иногда прямой. Глаза овальные, косо посаженные, от тёмно-коричневого до светло-коричневого цвета. Уши стоячие, широко поставлены, относительно небольшие, очень подвижные, часто развешенные, по форме приближающиеся к равностороннему треугольнику. Возможно лёгкое закругление кончиков. Ушные раковины чуть направлены вперёд, объёмистые, покрыты шерстью[1].

Шея массивная, средней длины, поставлена под углом 40—45° к линии спины. Грудь широкая, длинная, овальная в сечении, её нижняя линия не ниже локтя. Холка средней длины, незначительно выступающая над линией прямой, широкой, крепкой, мускулистой спины. Поясница широкая, крепкая, мускулистая, слегка выпуклая; круп длинный, широкий, мускулистый, слегка покатый; живот умеренно подтянут[1].
"""

source_language = "Russian"
target_language = "English"
translate(russian_article, source_language, target_language)

':\n\n\n\nThe Anatolian Shepherd is a medium sized dog with a strong skeleton and well developed muscles, slightly stretched, with an index of 104-109, format, sexual dimorphism expressed weakly. Important proportions include the length of the body exceeding the height at withers by 4-9%; the depth of the chest, slightly less than half the height at withers; approximately equal to the length of the muzzle and the head. Height at withers of males 56-65 cm, females 53-62 cm [1].\n\nThe head is massive, wide in the frontal part, the zygomatic bones are well expressed. When viewed from above, it is close to an equilateral triangle in shape. The skull is wide in the frontal part, the zygomatic bones, the supraorbital arches and the occipital tubercle are well expressed. The length of the cranial part is approximately equal to the width, the transition from the forehead to the muzzle is clear but not sharp. The profile of the muzzle is wedge-shaped, blunt, its line is parallel to the line of

## TODO - Pick your own task

For this section you should pick some task that you'd like to have GPT-3 do. Add a description and code to your notebook here. You should:
1. Write a short description of the task you tried and why you were interested in it.
2. Give some code so that we can reproduce what you did via an OpenAI API call. Include the output of your code in the Python notebook that you turn in.
3. Write a short qualitative analysis of whether or not GPT-3 did the task well.

### 1. Description
Our goal is to see whether our new knowledge of GPT-3 would have let us get an A on our CIS521 homework. We write a function that takes a computer science theory or algorithms problem as input and returns code that implements a solution. The target programming language is a parameter, so the same function works for any language.

We were interested in this because it could speed up our homework considerably.
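The prompt that `solve_cs_problem` (below) sends is just the problem statement, a blank line, and an instruction naming the target language. A minimal sketch of that string construction, pulled out into a hypothetical helper `build_code_prompt` so it can be inspected without spending API credits:

```python
def build_code_prompt(problem: str, language: str) -> str:
    """Hypothetical helper: builds the same prompt string that solve_cs_problem sends."""
    # The instruction refers back to the problem text that precedes it.
    instruction = "Write code in the " + language + " programming language to solve the above CS problem"
    # Problem statement, a blank line, then the instruction (no trailing newline or punctuation).
    return problem + "\n\n" + instruction

print(build_code_prompt("Solve Sudoku", "python"))
```

Because the prompt ends mid-sentence with no final period or newline, the model often starts its completion by finishing that sentence, which is probably why several outputs below begin with a lone "." or with "using the breadth-first search algorithm."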

In [None]:
import time
import openai

def solve_cs_problem(problem, language):
  # Zero-shot prompt: the problem statement, a blank line, then an instruction naming the language.
  thecommand = "Write code in the " + language + " programming language to solve the above CS problem"
  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=problem + '\n\n' + thecommand,
      temperature=0.7,
      max_tokens=3500,  # prompt + completion must fit in the model's context window
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      # stop=["\n"]  # no stop sequence: generated code spans many lines
  )
  time.sleep(1)  # short pause between calls to avoid hitting rate limits
  code_solution = response['choices'][0]['text'].strip()
  return code_solution

In [None]:
problem = """implement the AC-3 constraint satisfaction algorithm for Sudoku, along with two extensions that will combine to form a complete and efficient solver.
"""
print(solve_cs_problem(
    problem=problem, language="python"
))

.

def ac3(csp, arcs=None):
    """Executes the AC-3 algorithm on the specified CSP.
    If the parameter 'arcs' is None, then all arcs in the CSP will be used.
    Otherwise, the algorithm will be executed on only the arcs present in 'arcs'.
    Returns True if the arc-consistency check succeeds, and False otherwise."""
    
    if arcs is None:
        arcs = [(Xi, Xj) for Xi in csp.variables for Xj in csp.neighbors[Xi]]
    
    # Your implementation here
    
    return True


In [None]:
print(solve_cs_problem(
    problem="Solve Sudoku", language="python"
))

.

There are many ways to solve a Sudoku puzzle, but this algorithm uses a simple technique of finding the next empty cell and trying every possible value until the puzzle is solved.

def solve_sudoku(board):
 
    find = find_empty(board)
    if not find:
        return True
    else:
        row, col = find
 
    for i in range(1,10):
        if valid(board, i, (row, col)):
            board[row][col] = i
 
            if solve_sudoku(board):
                return True
 
            board[row][col] = 0
 
    return False
 
 
def valid(board, num, pos):
    # Check row
    for i in range(len(board[0])):
        if board[pos[0]][i] == num and pos[1] != i:
            return False
 
    # Check column
    for i in range(len(board)):
        if board[i][pos[1]] == num and pos[0] != i:
            return False
 
    # Check box
    box_x = pos[1] // 3
    box_y = pos[0] // 3
 
    for i in range(box_y*3, box_y*3 + 3):
        for j in range(box_x * 3, box_x*3 + 3):
            if board[i

In [None]:
problem = """implement the AC-3 constraint satisfaction algorithm for Sudoku, along with two extensions that will combine to form a complete and efficient solver.
Add in the ARC consistency constraint to help identify other methods and use inference to guess when it gets stuck.
"""
print(solve_cs_problem(
    problem=problem, language="python"
))

.

import sys

def ac3(csp, arcs=None):
    """Executes the AC-3 algorithm on the constraint satisfaction problem.
    If the arcs argument is None, then all arcs in the CSP are considered.
    Otherwise, the arcs argument should be a list of tuples containing
    arcs to be considered.
    """

    if arcs is None:
        arcs = [(Xi, Xk) for Xi in csp.variables for Xk in csp.neighbors[Xi]]

    queue = arcs
    while queue:
        (Xi, Xj) = queue.pop(0)
        if revise(csp, Xi, Xj):
            if len(csp.curr_domains[Xi]) == 0:
                return False
            for Xk in csp.neighbors[Xi]:
                if (Xi, Xk) not in queue:
                    queue.append((Xi, Xk))
    return True

def revise(csp, Xi, Xj):
    """Returns true if we revise the domain of Xi"""

    revised = False
    for x in csp.curr_domains[Xi][:]:
        # If Xi = Xj then skip
        if x == Xj:
            continue

        # Check if there is a consistent assignment for x
        if all(not

In [None]:
print(solve_cs_problem(
    problem="Solve Breadth First Search", language="Assembly"
))

using the breadth-first search algorithm.

1. Initialize an empty queue.

2. Add the root node to the queue.

3. While the queue is not empty:

4. Remove the first node from the queue.

5. If the node is the goal, return the path to the goal.

6. Otherwise, add the node's children to the end of the queue.


In [None]:
print(solve_cs_problem(
    problem="Create Incrementer", language="Assembly"
))

.

mov eax, 1

add eax, 1

ret


In [None]:
print(solve_cs_problem(
    problem="Write a summation circuit", language="Verilog"
))

.

module summation ( input [3:0] a, input [3:0] b, output [3:0] sum ); assign sum = a + b; endmodule


In [None]:
print(solve_cs_problem(
    problem="Create a proof-of-work blockchain", language="Rust"
))

Submit a PR to the rust-proof-of-work repository

What is a blockchain?

A blockchain is a digital ledger of all cryptocurrency transactions. It is constantly growing as "completed" blocks are added to it with a new set of recordings. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data. Bitcoin nodes use the block chain to differentiate legitimate Bitcoin transactions from attempts to re-spend coins that have already been spent elsewhere.

What is a proof-of-work blockchain?

A proof-of-work blockchain is a type of blockchain that uses a proof-of-work algorithm to validate new blocks. This algorithm requires miners to solve a complex computational problem in order to add a new block to the chain. The first miner to solve the problem is rewarded with a certain number of cryptocurrency coins.


In [None]:
print(solve_cs_problem(
    problem="Create the ALU", language="Verilog"
))

.

module ALU(input [3:0] op, input [15:0] A, input [15:0] B, output [15:0] R); always_comb case(op) 4'b0000: R = A & B; 4'b0001: R = A | B; 4'b0010: R = A ^ B; 4'b0011: R = A + B; 4'b0100: R = A - B; 4'b0101: R = A * B; 4'b0110: R = A / B; 4'b0111: R = A % B; 4'b1000: R = ~A; 4'b1001: R = A << B; 4'b1010: R = A >> B; 4'b1011: R = A >>> B; 4'b1100: R = A == B; 4'b1101: R = A != B; 4'b1110: R = A >= B; 4'b1111: R = A <= B; default: R = 0; endcase endmodule


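### Summary
It worked well on many of the problems we tried. For short, well-specified prompts it produced plausible, commented code: a backtracking Sudoku solver, an AC-3 loop with docstrings, and even Verilog for a summation circuit and an ALU, so it goes all the way down to hardware description. The results were not uniformly accurate, though. The first AC-3 request returned only a stub (`# Your implementation here`), several of the outputs above stop partway through, the Assembly BFS request produced pseudocode rather than Assembly, and the Rust blockchain request produced a prose explanation instead of code. We think these failures come mostly from prompts that were not specific enough and from not having enough credits to iterate. Either way, the generated code has to be read and tested before it can be trusted.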
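A cheap first step from impressions toward a check is to ask whether a Python completion even parses. This sketch uses the standard-library `ast` module; `parses_as_python` is our own hypothetical helper, and passing it says nothing about whether the code is correct.

```python
import ast

def parses_as_python(code: str) -> bool:
    """Return True if `code` is syntactically valid Python (says nothing about correctness)."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes in the source on older Pythons
        return False
    return True

# A completion that starts with a stray "." (as several above do) fails the check.
print(parses_as_python("def f(x):\n    return x + 1\n"))
print(parses_as_python(".\n\ndef f(x):\n    return x + 1\n"))
```

For example, `parses_as_python(solve_cs_problem(problem, "python"))` filters out truncated or prose-prefixed completions before anyone spends time reading or running them.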

# Fine Tuning

In addition to zero-shot and few-shot learning, another way of getting large language models to do your tasks is a process called "fine-tuning". In fine-tuning, the model updates its parameters so that it performs well on a set of training examples. The training examples are input prompts paired with gold-standard completions.

Large language models are pre-trained to perform well on general tasks like text completion, but not necessarily on the specific task you are interested in. A model can be fine-tuned for your task by starting from the parameters that work well in the general setting and then updating them to work well on your task.

We'll walk through how to fine-tune GPT3 for a task.


For this example, we will show you how to fine-tune GPT-3 to write biographies from the data in Wikipedia infoboxes. For instance, given this input:

```
notable_type: scientist
name: Zulima Aban
gender: female
birth_date: 05 December 1905
birth_place: Valencia, Spain
death_date: 09 August 1983
death_place: Detroit, Michigan, U.S.
death_cause: Pulmonary embolism
occupation: Astronomer
fields: Astrophysics, Computer Science, Computer Graphics, Interface Design, Image Synthesis
known_for: The Search for Planet Nine
hometown: Detroit, Michigan, U.S.
nationality: Venezuelan
citizenship: Spanish, American
alma_mater: University of Valencia (B.Sc.), University of Madrid (Ph.D.)
thesis_title: The Formation of Planets by the Accretion of Small Particles
thesis_year: 1956
doctoral_advisor: Angela Carter
awards: Spanish Academy of Science, Spanish Academy of Engineering, German Aerospace Prize, IEEE Medal of Honor, IEEE John von Neumann Medal, IEEE Jack S. Kilby Signal Processing Medal, United Nations Space Pioneer Award, Wolf Prize in Physics
institutions: Oberlin College, University of Valencia, Instituto de Astrofísica de Andalucía (CSIC), University of Southern California, Space Telescope Science Institute (STScI)
notable_students: Ryan Walls
influences: Immanuel Kant, Albert Einstein, Kurt Gödel, Gottfried Leibniz, Richard Feynman, Werner Heisenberg, William Kingdon Clifford, Sir Arthur Eddington
influenced: Joseph Weinberg
mother: Ana Aban
father: Joaquín Aban
partner: Georgina Abbott
children: Robert, Peter, Sarah
```

The fine-tuned model will generate this output:

> Zulima Aban was a Venezuelan astronomer, who was born on 05 December 1905 in Valencia, Spain to Ana Aban and Joaquín Aban. Her career involved the fields of Astrophysics, Computer Science, Computer Graphics, Interface Design, Image Synthesis. Aban was known for The Search for Planet Nine. Aban went to University of Valencia (B.Sc.), University of Madrid (Ph.D.). Aban's thesis title was The Formation of Planets by the Accretion of Small Particles in 1956. Her doctoral advisor was Angela Carter. Aban received Spanish Academy of Science, Spanish Academy of Engineering, German Aerospace Prize, IEEE Medal of Honor, IEEE John von Neumann Medal, IEEE Jack S. Kilby Signal Processing Medal, United Nations Space Pioneer Award, Wolf Prize in Physics. Aban went to Oberlin College, University of Valencia, Instituto de Astrofísica de Andalucía (CSIC), University of Southern California, Space Telescope Science Institute (STScI). Her notable students were Ryan Walls. Aban was influenced by Immanuel Kant, Albert Einstein, Kurt Gödel, Gottfried Leibniz, Richard Feynman, Werner Heisenberg, William Kingdon Clifford, Sir Arthur Eddington and she infuenced Joseph Weinberg. Aban was married to Georgina Abbott and together had three children, Robert, Peter, Sarah. Aban died on 09 August 1983 in Detroit, Michigan, U.S due to Pulmonary embolism.

The dataset that we will use was created for the paper [SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets](https://www.cis.upenn.edu/~ccb/publications/synthbio.pdf) by Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, and Sebastian Gehrmann. It was published in NeurIPS 2021. The paper's goal was to create a curated synthetic dataset for training large language models that avoids the gender and geographic biases naturally present in Wikipedia for cultural and historical reasons.


## Load the data

In [None]:
!wget https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_train.json

--2022-12-07 06:59:47--  https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_train.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5807118 (5.5M) [text/plain]
Saving to: ‘SynthBio_train.json.1’


2022-12-07 06:59:47 (105 MB/s) - ‘SynthBio_train.json.1’ saved [5807118/5807118]



In [None]:
# Load SynthBio_train.json, a list of JSON objects, and convert each one into an
# (attributes, biography) pair with one "key: value" attribute per line.

import json
import random

def load_wiki_bio_data(filename='SynthBio_train.json', num_bios=100, randomized=True):
  with open(filename, encoding='utf-8') as f:
    synth_bio_data = json.load(f)
  if randomized:
    random.shuffle(synth_bio_data)  # sample a random subset instead of the first num_bios entries
  bios = []
  for data in synth_bio_data:
    notable_type = data['notable_type']
    attributes = "notable_type: {notable_type} | {other_attributes}".format(
        notable_type = notable_type, 
        other_attributes = data['serialized_attrs']
    )
    biography = data['biographies'][0]  # use the first reference biography
    bios.append((attributes.replace(" | ", "\n"), biography))
  return bios[:num_bios]

wiki_bios = load_wiki_bio_data()


In [None]:
attributes, bio = wiki_bios[0]
print(attributes)
print('---')
bio


notable_type: theologian
name: Kip Williams
gender: non-binary
nationality: American
birth_date: 19 December 1917
birth_place: New York City
death_date: 12 January 1989
death_place: Paris, France
death_cause: heart attack
resting_place: Paris France
alma_mater: Columbia University
occupation: theologian, ecumenical theologian
tradition_movement: ecumenism
main_interests: theology of work, ecumenical theology
mother: Miriam Williams
father: Paul Hays Williams
children: 2
---


'Kip Williams was born on December 19, 1917 in New York City, an American ecumenical theologian and theologian.They was born to Paul Hays Williams and Miriam Williams. They attended Columbia University and main interests are theology of work and ecumenical theology. They had two children. They died of a heart attack on January 12, 1989 in Paris, France.'

## Format Data for Fine-Tuning 

Below, I show how to format data for fine-tuning an OpenAI model. The OpenAI API documentation has a [guide to fine-tuning models](https://beta.openai.com/docs/guides/fine-tuning) that you should read. Fine-tuning data is a JSONL file (one JSON object per line) with two keys: `prompt` and `completion`.

```
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
```
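Before uploading, it can be worth checking the file locally, since a malformed line costs a round trip to the API. A minimal sketch of such a check; `validate_finetuning_jsonl` is a hypothetical helper, and its default separator (`\n---\n`) and stop sequence (`###`) are the conventions this notebook's code uses, not requirements of the JSONL format:

```python
import json

def validate_finetuning_jsonl(path, separator="\n---\n", stop="###"):
    """Return (line number, problem) pairs found in a prompt/completion JSONL file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((line_number, f"invalid JSON: {err.msg}"))
                continue
            if not isinstance(record, dict) or set(record) != {"prompt", "completion"}:
                problems.append((line_number, "expected exactly the keys 'prompt' and 'completion'"))
                continue
            # Convention checks: prompts end with the separator, completions with the stop sequence.
            if not record["prompt"].endswith(separator):
                problems.append((line_number, "prompt does not end with the separator"))
            if not record["completion"].endswith(stop):
                problems.append((line_number, "completion does not end with the stop sequence"))
    return problems
```

After the file is written, `validate_finetuning_jsonl(fine_tuning_filename)` should return an empty list.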

In the code below, each prompt is the `attributes` string followed by a `\n---\n` separator, and each completion is the biography prefixed with `Biography: ` and ending with `###`, which we later use as the stop sequence.

In [None]:
import json
import random

def create_wikibio_finetuning_data(wikibios, fine_tuning_filename):
  fine_tuning_data = []

  for attributes, bio in wikibios:  # use the argument, not the global wiki_bios
    prompt = "{attributes}\n---\n".format(attributes=attributes)
    completion = "Biography: {bio}\n###".format(bio=bio)
    data = {}
    data['prompt'] = prompt
    data['completion'] = completion
    fine_tuning_data.append(data)

  random.shuffle(fine_tuning_data)
  with open(fine_tuning_filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))  # one JSON object per line (JSONL)
        out.write('\n')


fine_tuning_filename='wikibio_finetuning_data.jsonl'
create_wikibio_finetuning_data(wiki_bios, fine_tuning_filename)

Next, we'll perform fine-tuning with this data using OpenAI. 

In [None]:
%%capture
!pip install --upgrade openai
!pip install jsonlines
!pip install wandb

In [None]:
!pip install openai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Once you've got access to the OpenAI API, you can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).

In [None]:
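from getpass import getpass
import os
import openai

print('Enter OpenAI API key:')
openai_api_key = getpass()

os.environ['OPENAI_API_KEY']=openai_api_key  # read by the `openai` command-line tool
openai.api_key = openai_api_key  # used by the openai.Completion.create calls in this notebook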

Enter OpenAI API key:
··········


In [None]:
!head '{fine_tuning_filename}'

{"prompt": "Biography: Eun-Ah Lee is a South Korean special operations soldier of the South Korean secret service. Lee is one of the most experienced female undercover operators in South Korea since 1994. She attended the Military Science Institute, Seoul, South Korea. She worked for South Korea and she got a codename Black Tiger. She was the daughter of Nak-Sook Lee and Sun-Jae Lee.\n---\n", "completion": "notable_type: spy\nname: Eun-Ah Lee\ngender: female\nnationality: South Korean\nbirth_date: 03 June 1979\nbirth_place: Seoul, South Korea\nserviceyears: 1994-current\nknown_for: South Korea\u2019s leading female undercover operator\nalma_mater: Military Science Institute, Seoul, South Korea\noccupation: special operations soldier\ncodename: Black Tiger\nallegiance: South Korea\nagency: South Korean secret service\nmother: Nak-Sook Lee\nfather: Sun-Jae Lee\n###"}
{"prompt": "Biography: Azim Zhirizhimov was a Kyrgyzstani mountaineering leader. He was born in Alamedin, Kyrgyzstan on Au

## Run the fine-tuning API

Next, we'll make the fine-tuning API call from the command line. The `-m` argument specifies the base model. There are four sizes of GPT-3 models; in alphabetical order, they also go from smallest to largest:
* Ada
* Babbage
* Curie
* Davinci

As model size increases, so do quality and cost. Davinci is the highest-quality and most expensive model. I recommend fine-tuning a smaller model first to debug your code so that you don't rack up costs. Once you're sure your code works as expected, you can fine-tune a davinci model.
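To put the cost difference in numbers: the two fine-tuning runs logged below used the same 135,875-byte training file, and the CLI reported $0.06 for ada and $4.20 for davinci. The arithmetic, using only those logged figures (prices may have changed since):

```python
# Costs printed by the OpenAI CLI for the two fine-tuning runs logged below.
# Both runs used the same 135,875-byte training file.
logged_costs_usd = {"ada": 0.06, "davinci": 4.20}

ratio = logged_costs_usd["davinci"] / logged_costs_usd["ada"]
print(f"davinci cost about {ratio:.0f}x as much as ada on this file")
```

This prints a ratio of about 70, which is why debugging on ada first is worthwhile.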


In [None]:
!openai api fine_tunes.create -t '{fine_tuning_filename}' -m ada
#!openai api fine_tunes.create -t '{fine_tuning_filename}' -m davinci

Found potentially duplicated files with name 'wikibio_finetuning_data.jsonl', purpose 'fine-tune' and size 135875 bytes
file-nKbxPz5Sy626FkMqNyPjQDGM
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: ada
File id 'ada' is not among the IDs of the potentially duplicated files

Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: 
Upload progress: 100% 136k/136k [00:00<00:00, 195Mit/s]
Uploaded file from wikibio_finetuning_data.jsonl: file-L74fZaFs6CmcllnjX039YwvD
Created fine-tune: ft-kXvs8XFt05ytMGXCfH0gXyPz
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-12-07 04:43:14] Created fine-tune: ft-kXvs8XFt05ytMGXCfH0gXyPz
[2022-12-07 04:44:54] Fine-tune costs $0.06
[2022-12-07 04:44:54] Fine-tune enqueued. Queue number: 0
[2022-12-07 04:44:55] Fine-tune started
[2022-12-07 04:45:22] Completed epoch 1/4
[2022-12-07 04:45:37] Complet

In [None]:
!openai api fine_tunes.create -t '{fine_tuning_filename}' -m davinci

Found potentially duplicated files with name 'wikibio_finetuning_data.jsonl', purpose 'fine-tune' and size 135875 bytes
file-nKbxPz5Sy626FkMqNyPjQDGM
file-L74fZaFs6CmcllnjX039YwvD
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: 
Upload progress: 100% 136k/136k [00:00<00:00, 214Mit/s]
Uploaded file from wikibio_finetuning_data.jsonl: file-U4NJvUPqf8rAp2YvPiP5XrSM
Created fine-tune: ft-V1RKRfxWRcesfReXJbC7xqxP
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-12-07 04:50:46] Created fine-tune: ft-V1RKRfxWRcesfReXJbC7xqxP
[2022-12-07 04:51:18] Fine-tune costs $4.20
[2022-12-07 04:51:18] Fine-tune enqueued. Queue number: 4
[2022-12-07 04:56:06] Fine-tune is in the queue. Queue number: 3

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-V1RKRfxWRcesfReXJbC7xqxP



In [None]:
!openai api fine_tunes.follow -i ft-V1RKRfxWRcesfReXJbC7xqxP


[2022-12-07 04:50:46] Created fine-tune: ft-V1RKRfxWRcesfReXJbC7xqxP
[2022-12-07 04:51:18] Fine-tune costs $4.20
[2022-12-07 04:51:18] Fine-tune enqueued. Queue number: 4
[2022-12-07 04:56:06] Fine-tune is in the queue. Queue number: 3
[2022-12-07 04:58:03] Fine-tune is in the queue. Queue number: 2
[2022-12-07 05:03:58] Fine-tune is in the queue. Queue number: 1
[2022-12-07 05:14:18] Fine-tune is in the queue. Queue number: 0
[2022-12-07 05:19:49] Fine-tune started
[2022-12-07 05:22:04] Completed epoch 1/4
[2022-12-07 05:22:58] Completed epoch 2/4
[2022-12-07 05:23:52] Completed epoch 3/4
[2022-12-07 05:24:46] Completed epoch 4/4
[2022-12-07 05:25:35] Uploaded model: davinci:ft-personal-2022-12-07-05-25-35
[2022-12-07 05:25:36] Uploaded result file: file-fm7fwscwhFjXG25Kvrbij55C
[2022-12-07 05:25:36] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m davinci:ft-personal-2022-12-07-05-25-35 -p <YOUR_PROMPT>


Our Copied Numbers:

Created fine-tune: ft-V1RKRfxWRcesfReXJbC7xqxP

[2022-12-07 05:25:35] Uploaded model: davinci:ft-personal-2022-12-07-05-25-35



You should copy down the fine-tune numbers which look like this:

```
Created fine-tune: ft-kloUh0jjVc6Jv8p9MfeGHd3s

[2022-08-06 00:43:56] Uploaded model: davinci:ft-ccb-lab-members-2022-08-06-00-57-57
```

If you forget to write them down, you can list your fine-tuning jobs and models with the command below. The model names aren't mnemonic, so it is a good idea to note what each model's inputs and outputs are.
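The listing is JSON, and it gets long once you have several jobs. If you capture it (for example, by saving the CLI output and loading it with `json.loads`), a few lines of Python reduce each job to the fields that identify it. `summarize_fine_tunes` is a hypothetical helper; the field names (`id`, `model`, `fine_tuned_model`, `status`) come from the listing printed below.

```python
import json

def summarize_fine_tunes(listing: dict) -> list:
    """Return (job id, base model, fine-tuned model, status) for each job in a fine_tunes.list response."""
    rows = []
    for job in listing.get("data", []):
        # fine_tuned_model is null for jobs that have not produced a model yet, hence .get()
        rows.append((job["id"], job["model"], job.get("fine_tuned_model"), job["status"]))
    return rows

# A trimmed-down copy of one entry from the listing below.
sample_listing = json.loads("""
{"data": [{"id": "ft-fUKw4lYX2JRukVMI7XfRCyqG",
           "model": "curie",
           "fine_tuned_model": "curie:ft-personal-2022-12-07-04-44-48",
           "status": "succeeded"}]}
""")
for row in summarize_fine_tunes(sample_listing):
    print(row)
```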

In [None]:
!openai api fine_tunes.list

{
  "data": [
    {
      "created_at": 1670388083,
      "fine_tuned_model": "curie:ft-personal-2022-12-07-04-44-48",
      "hyperparams": {
        "batch_size": 1,
        "learning_rate_multiplier": 0.1,
        "n_epochs": 4,
        "prompt_loss_weight": 0.01
      },
      "id": "ft-fUKw4lYX2JRukVMI7XfRCyqG",
      "model": "curie",
      "object": "fine-tune",
      "organization_id": "org-9LZptg3M1KusyahlqmSqJmcT",
      "result_files": [
        {
          "bytes": 22314,
          "created_at": 1670388289,
          "filename": "compiled_results.csv",
          "id": "file-mWrIJazfbKGMPdsQXwuFTpl3",
          "object": "file",
          "purpose": "fine-tune-results",
          "status": "processed",
          "status_details": null
        }
      ],
      "status": "succeeded",
      "training_files": [
        {
          "bytes": 135875,
          "created_at": 1670388083,
          "filename": "wikibio_finetuning_data.jsonl",
          "id": "file-nKbxPz5Sy626FkMqNyPjQ

You can run your fine-tuned model in the OpenAI Playground. After the model finishes fine-tuning, you'll find it in the Engine dropdown menu (you might need to reload the page in your browser for your fine-tuned model to appear).

## Call your fine-tuned model from the OpenAI API

Alternatively, you can call your fine-tuned model through the API by passing its name as the `model` argument. Here's an example:

In [None]:
def generate_bio(attributes, finetuned_model):
  response = openai.Completion.create(
      model=finetuned_model,
      prompt="{attributes}\n---\n".format(attributes=attributes),
      temperature=0.7,
      max_tokens=500,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["###"]
      )
  return response['choices'][0]['text'].strip()

# Replace with your model's name
finetuned_model = "davinci:ft-personal-2022-12-07-05-25-35"
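Every training completion started with `Biography: ` (see `create_wikibio_finetuning_data`), so the fine-tuned model reproduces that label, as the outputs below show. If you only want the biography text, a small post-processing step is enough. `clean_bio` is a hypothetical helper, not part of the OpenAI library:

```python
def clean_bio(completion: str, prefix: str = "Biography:") -> str:
    """Strip surrounding whitespace and the 'Biography:' label learned from the training completions."""
    text = completion.strip()
    if text.startswith(prefix):
        text = text[len(prefix):].lstrip()
    return text
```

Usage: `clean_bio(generate_bio(attributes, finetuned_model))`.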

In [None]:
attributes = """
notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing 
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP 
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania
"""

biography = generate_bio(attributes, finetuned_model)
print(attributes)
print('---')
biography


notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing 
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP 
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania

---


'Biography: Chris Callison-Burch is an American professor of computer science at the University of Pennsylvania. He received his B.S. in Symbolic Systems from Stanford University and his Ph.D. in Informatics from the University of Edinburgh. Callison-Burch is interested in Artificial Intelligence, Natural Language Processing and has worked on the Moses: Open source toolkit for statistical machine translation and the The Paraphrase Database (PPDB). He taught courses like AI, Crowdsourcing and NLP. Callison-Burch enrolled 570 students in his most popular course. He is a member of the National Academy of Engineering.'

In [None]:
biography = generate_bio(attributes, finetuned_model)
print(attributes)
print('---')
biography


notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing 
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP 
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania

---


'Biography: Chris Callison-Burch is an American professor of computer science at the University of Pennsylvania. He is a researcher in natural language processing and artificial intelligence. Callison-Burch is the author of the book "Moses: A Statistical Language Modeling Toolkit". He is a recipient of the National Science Foundation CAREER award. Callison-Burch attended Stanford University, where he received a BS in Symbolic Systems. He then attended the University of Edinburgh, where he received a PhD in Informatics. Callison-Burch is the author of the book "The Paraphrase Database (PPDB)". He is a recipient of the National Science Foundation CAREER award. He is the author of the book "Moses: A Statistical Language Modeling Toolkit". He is a recipient of the National Science Foundation CAREER award. He is a researcher in natural language processing and artificial intelligence. Callison-Burch is a professor at the University of Pennsylvania. He is a recipient of the National Science F

In [None]:
biography = generate_bio(attributes, finetuned_model)
print(attributes)
print('---')
biography


notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing 
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP 
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania

---


'Biography: Chris Callison-Burch is a professor of Computer and Information Science at the University of Pennsylvania. He is also the director of the Penn Language Lab. Callison-Burch received his B.S. in Symbolic Systems and his PhD in Informatics from Stanford University and the University of Edinburgh respectively. He is interested in Artificial Intelligence, Natural Language Processing, and his notable works are Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB). He has taught courses in AI, Crowdsourcing and NLP at the University of Pennsylvania. He has done enrollment in most popular course with 570 students.'

## Analyze your model's output

Sometimes the model will add facts that are not present in the attributes.  For instance, one time it said 
> He was a member of the research staff at IBM Research in Yorktown Heights.

which is not correct. Another time it said
> His most popular course was on AI, which had 570 students.

which is correct, but not specified in the attributes.

Try running your own fine-tuned model until it produces something that wasn't licensed by the attributes. 

Save the good runs and the bad run below.
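Checking for unlicensed facts by hand gets tedious across many runs. A rough heuristic is to flag any sentence that mentions a capitalized word that never appears in the attributes, since invented institutions, awards, and titles tend to be capitalized. This is a sketch with a hypothetical function name, not a reliable fact checker: it misses invented facts written in lowercase and flags harmless words such as the "B" in "B.S." when the attributes say "BS".

```python
import re

# Words, allowing internal hyphens and apostrophes (e.g. "Callison-Burch").
TOKEN = re.compile(r"[A-Za-z][\w'-]*")

def flag_unlicensed_sentences(bio: str, attributes: str) -> list:
    """Return sentences of `bio` containing capitalized words absent from `attributes`."""
    known = {tok.lower() for tok in TOKEN.findall(attributes)}
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", bio.strip()):
        words = TOKEN.findall(sentence)[1:]  # skip the sentence-initial word, which is capitalized anyway
        unknown = [w for w in words if w[0].isupper() and w.lower() not in known]
        if unknown:
            flagged.append(sentence)
    return flagged
```

For example, `flag_unlicensed_sentences(biography, attributes)` on the outputs above would flag the National Academy of Engineering and NSF CAREER award sentences for a closer look.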

In [None]:
generations_with_correct_facts = [
    """Biography: Chris Callison-Burch is an American professor of computer science at the University of Pennsylvania. He received his B.S. in Symbolic Systems from Stanford University and his Ph.D. in Informatics from the University of Edinburgh. Callison-Burch is interested in Artificial Intelligence, Natural Language Processing and has worked on the Moses: Open source toolkit for statistical machine translation and the The Paraphrase Database (PPDB). He taught courses like AI, Crowdsourcing and NLP. Callison-Burch enrolled 570 students in his most popular course. He is a member of the National Academy of Engineering.""",
]

generation_with_incorrect_facts = """
Biography: Chris Callison-Burch is an American professor of computer science at the University of Pennsylvania. He is a researcher in natural language processing and artificial intelligence. Callison-Burch is the author of the book "Moses: A Statistical Language Modeling Toolkit". He is a recipient of the National Science Foundation CAREER award. Callison-Burch attended Stanford University, where he received a BS in Symbolic Systems. He then attended the University of Edinburgh, where he received a PhD in Informatics. Callison-Burch is the author of the book "The Paraphrase Database (PPDB)". He is a recipient of the National Science Foundation CAREER award. He is the author of the book "Moses: A Statistical Language Modeling Toolkit". He is a recipient of the National Science Foundation CAREER award. He is a researcher in natural language processing and artificial intelligence. Callison-Burch is a professor at the University of Pennsylvania. He is a recipient of the National Science Foundation CAREER award."""


incorrect_facts = [
    """ recipient of the National Science Foundation CAREER award""",
]
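Spotting unsupported claims by eye gets tedious across many runs. The sketch below is a crude lexical heuristic, not part of the assignment: the hypothetical helper `flag_unsupported_sentences` splits a generation into sentences and flags any sentence that mentions none of the attribute values. It cannot prove a sentence is wrong, it misses paraphrases, and abbreviations such as "Ph.D." will cause extra sentence splits, so treat its output only as a shortlist for manual review.

```python
import re

def flag_unsupported_sentences(generation, attribute_values):
    """Return sentences from `generation` that mention none of `attribute_values`.

    Matching is a case-insensitive substring test, so this is only a rough filter.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', generation.strip())
    lowered_values = [value.lower() for value in attribute_values if value]
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        # Keep sentences that share no attribute value with the input.
        if not any(value in lowered for value in lowered_values):
            flagged.append(sentence)
    return flagged

generation = ("Chris Callison-Burch teaches at the University of Pennsylvania. "
              "He is a recipient of the National Science Foundation CAREER award.")
print(flag_unsupported_sentences(generation, ["University of Pennsylvania"]))
```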

# Fine Tune a New Model

Now that you've seen an example of how to do fine-tuning with the OpenAI API, let's have you write code to fine-tune your own model.

For this model, I'd like you to do the reverse of what we just did.  Given a Wikipedia biography like this:

> Jill Tracy Jacobs Biden (born June 3, 1951) is an American educator and the current first lady of the United States as the wife of President Joe Biden. She was the second lady of the United States from 2009 to 2017. Since 2009, Biden has been a professor of English at Northern Virginia Community College. 

> She has a bachelor's degree in English and a doctoral degree in education from the University of Delaware, as well as master's degrees in education and English from West Chester University and Villanova University. She taught English and reading in high schools for thirteen years and instructed adolescents with emotional disabilities at a psychiatric hospital. From 1993 to 2008, Biden was an English and writing instructor at Delaware Technical & Community College. Biden is thought to be the first wife of a vice president or president to hold a paying job during her husband's tenure. 

> Born in Hammonton, New Jersey, she grew up in Willow Grove, Pennsylvania. She married Joe Biden in 1977, becoming stepmother to Beau and Hunter, his two sons from his first marriage. Biden and her husband also have a daughter together, Ashley Biden, born in 1981. She is the founder of the Biden Breast Health Initiative non-profit organization, co-founder of the Book Buddies program, co-founder of the Biden Foundation, is active in Delaware Boots on the Ground, and with Michelle Obama is co-founder of Joining Forces. She has published a memoir and two children's books.

Your model should output something like this:
```
notable_type: First Lady of the United States
name: Jill Biden
gender: female
nationality: American
birth_date: 03 June 1951
birth_place: Hammonton, New Jersey
alma_mater: University of Delaware
occupation: professor of English at Northern Virginia Community College
notable_works: children's books and memoir
main_interests: education, literacy, women's health
partner: Joe Biden
children: Ashley Biden, Beau Biden (stepson), Hunter Biden (stepson)
```
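If your training attributes are stored as dictionaries, they need to be rendered into this `key: value` text before they can serve as completions. Below is a minimal sketch of a hypothetical helper, `format_attributes`, assuming each value is either a string or a list of strings. List values are comma-joined, which matches how the evaluation code later splits multi-valued attributes on commas.

```python
def format_attributes(attributes):
    """Render an attribute dict as `key: value` lines; list values are comma-joined."""
    lines = []
    for key, value in attributes.items():
        # Multi-valued attributes (e.g. children) become a comma-separated string.
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        lines.append("{}: {}".format(key, value))
    return "\n".join(lines)

example = {
    "name": "Jill Biden",
    "gender": "female",
    "children": ["Ashley Biden", "Beau Biden (stepson)", "Hunter Biden (stepson)"],
}
print(format_attributes(example))
```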


In [None]:
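import json
import random

def create_wikibio_parser_finetuning_data(wikibios, fine_tuning_filename):
  """Write (attributes, bio) pairs as prompt/completion JSONL for fine-tuning a bio parser."""
  fine_tuning_data = []
  for attributes, bio in wikibios:
    # The prompt ends with a fixed separator so the model knows where the input stops.
    prompt = "Biography: {bio}\n---\n".format(bio=bio)
    # `attributes` is inserted as-is, so it should already be `key: value` text.
    # The completion ends with "###", which should be used as the stop sequence at inference time.
    completion = "{attributes}\n###".format(attributes=attributes)
    fine_tuning_data.append({'prompt': prompt, 'completion': completion})

  # Shuffle so the training file is not ordered like the source data.
  random.shuffle(fine_tuning_data)
  with open(fine_tuning_filename, 'w') as out:
    for data in fine_tuning_data:
      out.write(json.dumps(data))
      out.write('\n')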



fine_tuning_filename='wikibio_parser_finetuning_data.jsonl'
create_wikibio_parser_finetuning_data(wiki_bios, fine_tuning_filename)
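Before paying for a fine-tune, it can help to sanity-check the JSONL file. This is a sketch of a hypothetical checker, `check_finetuning_file`, assuming the same separator (`\n---\n`) and terminator (`\n###`) used in the function above; it reports each line that is not valid JSON, has unexpected keys, or lacks either marker.

```python
import json

def check_finetuning_file(filename, separator="\n---\n", terminator="\n###"):
    """Return (line_number, problem) tuples for a prompt/completion JSONL file."""
    problems = []
    with open(filename) as f:
        for line_number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((line_number, "not valid JSON"))
                continue
            # Each record should contain exactly a prompt and a completion.
            if not isinstance(record, dict) or set(record) != {"prompt", "completion"}:
                problems.append((line_number, "unexpected keys"))
                continue
            if not record["prompt"].endswith(separator):
                problems.append((line_number, "prompt missing separator"))
            if not record["completion"].endswith(terminator):
                problems.append((line_number, "completion missing terminator"))
    return problems

# Usage: print(check_finetuning_file(fine_tuning_filename))  # [] means no problems found
```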

In [None]:
!openai api fine_tunes.create -t '{fine_tuning_filename}' -m curie
# !openai api fine_tunes.create -t '{fine_tuning_filename}' -m davinci

Found potentially duplicated files with name 'wikibio_parser_finetuning_data.jsonl', purpose 'fine-tune' and size 135875 bytes
file-5Ef2KSYA3YENkk9gB3jn4bLU
file-gYUFT4L5O1wrN5rRE71rihSF
file-04NUFskm0IDEc0MvJx5FmuDa
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: 
Upload progress: 100% 136k/136k [00:00<00:00, 232Mit/s]
Uploaded file from wikibio_parser_finetuning_data.jsonl: file-ozU23fRZkvr2MXm7zDIXinTA
Created fine-tune: ft-UPlqKSnP8IY0DAoJuNzZdbDe
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-12-07 07:13:25] Created fine-tune: ft-UPlqKSnP8IY0DAoJuNzZdbDe
[2022-12-07 07:13:34] Fine-tune costs $0.42
[2022-12-07 07:13:34] Fine-tune enqueued. Queue number: 0
[2022-12-07 07:13:35] Fine-tune started
[2022-12-07 07:14:49] Completed epoch 1/4
[2022-12-07 07:15:21] Completed epoch 2/4
[2022-12-07 07:15:48] Completed epoch 3/4
[2022-12-07 07:16:15] Completed epoch 4/4

In [None]:
!openai api fine_tunes.follow -i ft-5hRLxGBU4TFGwrz0bFBm8vSx

[2022-12-07 06:45:40] Created fine-tune: ft-5hRLxGBU4TFGwrz0bFBm8vSx
[2022-12-07 06:45:45] Fine-tune costs $0.06
[2022-12-07 06:45:46] Fine-tune enqueued. Queue number: 0
[2022-12-07 06:45:47] Fine-tune started
[2022-12-07 06:46:15] Completed epoch 1/4
[2022-12-07 06:46:30] Completed epoch 2/4
[2022-12-07 06:46:44] Completed epoch 3/4
[2022-12-07 06:46:58] Completed epoch 4/4
[2022-12-07 06:47:29] Uploaded model: ada:ft-personal-2022-12-07-06-47-19
[2022-12-07 06:47:37] Uploaded result file: file-PgqqUEfJTmF3offAez7iGC21
[2022-12-07 06:47:38] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m ada:ft-personal-2022-12-07-06-47-19 -p <YOUR_PROMPT>


In [None]:
import openai

def parse_bio(biography, finetuned_bio_parser_model):
  """Call the fine-tuned model and return its predicted attributes as `key: value` text."""
  response = openai.Completion.create(
      model=finetuned_bio_parser_model,
      # Use exactly the same prompt format as the fine-tuning data.
      prompt="Biography: {biography}\n---\n".format(biography=biography),
      temperature=0.7,
      max_tokens=500,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      # Stop at the "###" terminator that ends every training completion.
      stop=["###"]
      )
  return response['choices'][0]['text'].strip()

  
finetuned_bio_parser_model="curie:ft-personal-2022-12-07-07-16-32"
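The Completions API stops generating as soon as one of the `stop` sequences appears and does not include that sequence in the returned text. The hypothetical helper `truncate_at_stop` below only mimics that behavior on a made-up raw continuation, to illustrate why the stop sequence should match the `###` terminator used in the training completions: stopping at `---` instead would let text generated after `###` leak into the parsed attributes.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut `text` at the earliest occurrence of any stop sequence (the stop itself is dropped)."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

# A made-up continuation in which the model keeps going after the terminator.
raw = "name: Jill Biden\ngender: female\n###\nBiography: Someone else\n---\n"
print(repr(truncate_at_stop(raw, ["###"])))
print(repr(truncate_at_stop(raw, ["---"])))
```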

## Test your parser

Next we will test your parser.  This calls your `parse_bio` function once per test item (up to 237 times on the full test set), so be sure that you've got it properly debugged and working before running this code.

In [None]:
!wget https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_test.json

--2022-12-07 06:48:56--  https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_test.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 665457 (650K) [text/plain]
Saving to: ‘SynthBio_test.json.2’


2022-12-07 06:48:57 (10.6 MB/s) - ‘SynthBio_test.json.2’ saved [665457/665457]



In [None]:
import json
import random

def load_wiki_bio_test_set(filename='SynthBio_test.json', max_test_items=10, randomized=True):
  """
  Loads our wikibio test set and returns a list of (biography, attributes) tuples,
  where biography is a string and attributes is a dictionary.
  If randomized is True, the items are shuffled before the first max_test_items are kept.
  """
  with open(filename) as f:
    synth_bio_data = json.load(f)
  bios = []
  for data in synth_bio_data:
    attributes = data['attrs']
    attributes['notable_type'] = data['notable_type']
    # Use the first of the biographies written for this item.
    biography = data['biographies'][0]
    bios.append((biography, attributes))
  if randomized:
    random.shuffle(bios)
  return bios[:max_test_items]


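def convert_to_dict(predicted_attributes_txt):
  """
  Converts predicted attributes from `key: value` text format into a dictionary.
  Lines without a colon (such as a trailing "###") are skipped.
  """
  predicted_attributes = {}
  for line in predicted_attributes_txt.split('\n'):
    # Split on the first colon only, so values that contain colons are kept whole.
    attribute, separator, value = line.partition(':')
    if not separator:
      continue
    predicted_attributes[attribute.strip()] = value.strip()
  return predicted_attributes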
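The loader assumes each SynthBio item is a JSON object with `notable_type`, `attrs`, and `biographies` keys. The sketch below runs the same conversion on a tiny made-up record (every name and value in `sample_json` is hypothetical) so the resulting `(biography, attributes)` shape is easy to see. Unlike the loader, the hypothetical `records_to_pairs` copies `attrs` before adding `notable_type`, leaving the parsed JSON unmodified.

```python
import json

# A made-up item containing only the keys that load_wiki_bio_test_set reads.
sample_json = """[
  {"notable_type": "musician",
   "attrs": {"name": "Ada Example", "gender": "female", "birth_place": "Springfield"},
   "biographies": ["Ada Example is a musician from Springfield.", "A second rendering."]}
]"""

def records_to_pairs(records):
    """Turn SynthBio-style records into (biography, attributes) tuples."""
    pairs = []
    for record in records:
        # Copy the attributes so the source record is not mutated.
        attributes = dict(record["attrs"])
        attributes["notable_type"] = record["notable_type"]
        pairs.append((record["biographies"][0], attributes))
    return pairs

for biography, attributes in records_to_pairs(json.loads(sample_json)):
    print(biography)
    print(attributes)
```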



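Helper function that accumulates the per-attribute counts (true positives, false positives, and false negatives) used to compute precision, recall, and F-score.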

In [None]:
from collections import Counter

def update_counts(gold_attributes, predicted_attributes, true_positives, false_positives, false_negatives, all_attributes):
  """Update per-attribute counters by comparing one prediction with the gold attributes."""
  # Compute true positives and false negatives.
  for attribute in gold_attributes:
    all_attributes[attribute] += 1
    if attribute in predicted_attributes:
      # Some attributes have multiple comma-separated values; score each one.
      gold_values = gold_attributes[attribute].split(',')
      for value in gold_values:
        if value.strip() in predicted_attributes[attribute]:
          true_positives[attribute] += 1
        else:
          false_negatives[attribute] += 1
    else:
      false_negatives[attribute] += 1
  # Compute false positives.
  for attribute in predicted_attributes:
    if attribute not in gold_attributes:
      # The model predicted an attribute that the gold data does not have.
      all_attributes[attribute] += 1
      false_positives[attribute] += 1
    else:
      # Each predicted value that does not appear in the gold value is a false positive.
      predicted_values = predicted_attributes[attribute].split(',')
      for value in predicted_values:
        if value.strip() not in gold_attributes[attribute]:
          false_positives[attribute] += 1
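The F-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), not their arithmetic mean. A small sketch of both computations from raw counts follows; the helper names are hypothetical. For example, with P = 7/15 and R = 0.7 (the NAME precision and recall printed below), the arithmetic mean is about 0.583 while F1 is 0.56: the harmonic mean penalizes imbalance between the two.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)

precision, recall, f1 = precision_recall_f1(tp=7, fp=8, fn=3)
print(precision, recall, f1)          # F1 is about 0.56
print((precision + recall) / 2)       # arithmetic mean is about 0.583
```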



In [None]:

def evaluate_on_test_set(finetuned_bio_parser_model, wiki_bio_test, threshold_count = 5):
  """
  Compute the precision, recall and F-score for each attribute that appears
  at least threshold_count times, then print their (macro) averages.
  """
  true_positives = Counter()
  false_positives = Counter()
  false_negatives = Counter()
  all_attributes = Counter()

  for bio, gold_attributes in wiki_bio_test:
    predicted_attributes = convert_to_dict(parse_bio(bio, finetuned_bio_parser_model))
    update_counts(gold_attributes, predicted_attributes, true_positives, false_positives, false_negatives, all_attributes)

  def f_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

  average_precision = 0
  average_recall = 0
  total = 0

  for attribute in all_attributes:
    if all_attributes[attribute] < threshold_count:
      continue
    print(attribute.upper())
    tp = true_positives[attribute]
    num_predicted = tp + false_positives[attribute]
    num_gold = tp + false_negatives[attribute]
    precision = tp / num_predicted if num_predicted else 0.0
    recall = tp / num_gold if num_gold else 0.0
    print("precision:", precision)
    print("recall:", recall)
    print("f-score:", f_score(precision, recall))
    print('---')
    average_precision += precision
    average_recall += recall
    total += 1

  if total == 0:
    print("No attribute appeared at least", threshold_count, "times.")
    return

  print("AVERAGE")
  average_precision = average_precision/total
  average_recall = average_recall/total
  print("precision:", average_precision)
  print("recall:", average_recall)
  print("f-score:", f_score(average_precision, average_recall))
  print('---')
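The averages printed above are macro averages: every attribute counts equally, however rare it is. A common alternative is micro-averaging, which pools the counts across attributes before dividing, so frequent attributes weigh more. The sketch below is that alternative (the function name `micro_average` is hypothetical), assuming it is given the same `Counter` objects that `update_counts` fills; note it does not apply the `threshold_count` filter.

```python
from collections import Counter

def micro_average(true_positives, false_positives, false_negatives):
    """Pool per-attribute counts, then compute a single precision, recall and F1."""
    tp = sum(true_positives.values())
    fp = sum(false_positives.values())
    fn = sum(false_negatives.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy counts for illustration only.
print(micro_average(Counter(name=3, gender=1), Counter(name=1), Counter(gender=1)))
```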


If you would like to evaluate on the full test set, which has 237 test items, set `max_test_items=237`.  Doing so will call your `parse_bio` function 237 times, so be sure that you've got it properly debugged and working before running this code.
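Because every call to `parse_bio` is a paid API request, re-running the evaluation repeatedly can add up. A minimal memoization sketch follows, using a hypothetical wrapper `make_cached_parser` that stores one result per `(biography, model)` pair; `functools.lru_cache(maxsize=None)` would work equally well. Keep in mind that with `temperature=0.7` the outputs are sampled, so caching freezes whichever sample came back first.

```python
def make_cached_parser(parse_fn):
    """Wrap parse_fn(biography, model) so each (biography, model) pair is requested only once."""
    cache = {}

    def cached_parse(biography, model):
        key = (biography, model)
        if key not in cache:
            cache[key] = parse_fn(biography, model)
        return cache[key]

    return cached_parse

# Usage (rebinds the global name that evaluate_on_test_set looks up):
# parse_bio = make_cached_parser(parse_bio)
```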

In [None]:
finetuned_bio_parser_model

'curie:ft-personal-2022-12-07-07-16-32'

In [None]:
testset_filename='SynthBio_test.json'
max_test_items=10
wiki_bio_test = load_wiki_bio_test_set(testset_filename, max_test_items)
evaluate_on_test_set(finetuned_bio_parser_model, wiki_bio_test, threshold_count = 5)

NAME
precision: 0.4666666666666667
recall: 0.7
f-score: 0.5833333333333333
---
GENDER
precision: 0.47368421052631576
recall: 0.9
f-score: 0.6868421052631579
---
NATIONALITY
precision: 0.47058823529411764
recall: 0.8
f-score: 0.6352941176470588
---
BIRTH_DATE
precision: 0.5
recall: 0.6
f-score: 0.55
---
BIRTH_PLACE
precision: 0.5833333333333334
recall: 0.5384615384615384
f-score: 0.5608974358974359
---
KNOWN_FOR
precision: 0.0
recall: 0.0
f-score: 0.0
---
ALMA_MATER
precision: 0.3
recall: 0.375
f-score: 0.3375
---
AWARDS
precision: 0.5
recall: 0.5555555555555556
f-score: 0.5277777777777778
---
MOTHER
precision: 0.4375
recall: 0.7
f-score: 0.56875
---
FATHER
precision: 0.4
recall: 0.6666666666666666
f-score: 0.5333333333333333
---
PARTNER
precision: 0.42857142857142855
recall: 0.75
f-score: 0.5892857142857143
---
CHILDREN
precision: 0.5263157894736842
recall: 0.5882352941176471
f-score: 0.5572755417956656
---
NOTABLE_TYPE
precision: 0.0
recall: 0.0
f-score: 0.0
---
CITIZENSHIP
precision:

How well did your model perform?

In [None]:
# TODO - fill in these values
average_precision = 0.3513567927170869
average_recall = 0.47548166702578465
average_fscore = 0.41341922987143576

# Which attributes had the highest F-score?
best_attributes = {
    "DEATH_PLACE" : 0.7083333333333333,
    "CHILDREN" : 0.5572755417956656,
}

# Which attributes had the lowest F-score?
worst_attributes = {
    "HOMETOWN" : 0.0,
    "DEATH_CAUSE" : 0.225,
}

# What could you do to improve the model's performance?
potential_improvements = """
Improve the dataset by increasing the size of what it sees for context.
"""

# Feedback questions

In [None]:
# How many hours did you spend on this assignment? Just an approximation is fine.
num_hours_spent = 10

# What did you think?  This was the first time we tried this assignment,
# so your feedback is valuable.
feedback = """
GPT-3 is slow to run, and I think the notebook was a little difficult to follow, but otherwise this was such a cool experience (and a little dystopian too). 
"""

