<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/tapi-logo-small.png" />

This notebook is free for educational reuse under a [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/).

Created by [William Mattingly](https://www.wjbmattingly.com) for the 2025 Text Analysis Pedagogy Institute, with support from [Constellate](https://constellate.org).
<br />
____

# Part 1: Introduction to LLMs, APIs, and Key Concepts

In this notebook, we will lay the foundations for the next two lessons on applying large language models (LLMs) within the context of the humanities.

Learning Objectives

- Understanding the basics of large language models
- Understanding the strengths and weaknesses of LLMs
- Understanding how to interact with LLMs via APIs
- Learn how to set up an OpenAI account and create an API key
- Learn how to make an API call with your OpenAI account in Python
- Understanding the importance of examples
    - Zero-shot Classification
    - Few-shot Classification
- Understanding the importance and effect of context
- Practical Applications of LLMs in the Humanities (if there's extra time)


## Installing Required Packages

In [3]:
!pip install openai

Collecting openai
  Downloading openai-1.73.0-py3-none-any.whl.metadata (25 kB)
Collecting anyio<5,>=3.5.0 (from openai)
  Using cached anyio-4.9.0-py3-none-any.whl.metadata (4.7 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.9.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (5.2 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Downloading pydantic-2.11.3-py3-none-any.whl.metadata (65 kB)
Collecting sniffio (from openai)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.8-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Using cached h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting annotated-types>=

## Quick Introduction to Large Language Models

Large Language Models (LLMs) represent the cutting edge of natural language processing. These sophisticated AI systems are designed to comprehend, generate, and manipulate human language with proficiency. Built on deep learning architectures, typically employing transformer models, LLMs are trained on vast corpora of text data (trillions of tokens). This extensive training allows them to capture the subtle nuances of language, including complex grammatical structures, contextual meanings, and even rudimentary reasoning capabilities.

LLMs are very versatile and can be used to solve (or partially solve) many NLP problems. From translation to summarization, from question-answering to creative writing, these models demonstrate a wide-ranging applicability across numerous language tasks. You might be familiar with some popular examples, such as the GPT (Generative Pre-trained Transformer) series, the basis for ChatGPT. These models have played a pivotal role in revolutionizing natural language processing, enabling more human-like text generation and understanding than ever before.


### Strengths and Weaknesses of LLMs

Like any technology, LLMs come with their own set of strengths and weaknesses. On the positive side, they have the ability to adapt to various language tasks without requiring task-specific training. This is very different from traditional task-specific machine learning models. Their contextual understanding allows them to grasp nuances in language that previous models struggled with. Moreover, their generation capabilities enable them to produce human-like text for diverse purposes. One of their most impressive features is few-shot learning – the ability to adapt to new tasks with minimal examples.

Yet, it's crucial to be aware of their limitations. Despite their impressive abilities, LLMs lack true real-world knowledge and may generate plausible-sounding but factually incorrect information. This is known as a hallucination. They can also inadvertently perpetuate societal biases present in their training data. From a practical standpoint, they demand significant computational resources to train and run. It's also important to note that while they can mimic reasoning, they don't truly "understand" in a human sense, and they may struggle with consistency, potentially providing different answers to the same question asked in different ways.

## What's an API?

An API, or Application Programming Interface, is essentially a messenger that allows different software applications to talk to each other. The role of the API is to effectively communicate requests from a user to an endpoint, for that endpoint to do something with that request, and then return the results to the user.

A common way to think about and understand APIs is to consider a restaurant. In this scenario, you are the customer (your program), and you want to order food (data or services). The food server (API) takes your order to the kitchen (computer server), the kitchen prepares your food (processes your request), and the food server brings back your order (returns the response). This is exactly how APIs work - they act as intermediaries that handle requests and responses between different systems.

![Restaurant API Example](../assets/images/api.jpg)

APIs have several important characteristics. First, they provide standardized communication - they offer consistent ways to request and receive data, with specific endpoints (like URLs) for different types of requests, and clear rules about what data can be sent and received. Second, they provide abstraction - they hide the complex inner workings of a service, so you don't need to know how everything works behind the scenes. You just need to know how to make requests and handle the responses. Third, they include security measures - most APIs require authentication (like API keys), control access to data and services, and often limit how many requests you can make.

If we think about this in the example of our restaurant above, we can make a slight modification to the restaurant. Instead of it being something like an Outback Steakhouse, let's imagine it is a modern-day speakeasy.

![Wikipedia Speakeasy Club 21](https://upload.wikimedia.org/wikipedia/commons/0/00/21Club.JPG)

In order to get into this location, you need to know a special password before you can ever place an order inside. To get inside, you provide that password to someone at the front door (an authenticator) whose sole job is to verify that you have permission to access the restaurant (endpoint). Once validated, you can make the request. But the restaurant only has a limited amount of resources (computing power). You cannot just keep ordering food after all the ingredients in the restaurant are depleted. To prevent this from happening, the restaurant might put realistic limitations on your access. You cannot, for example, purchase 1,000 plates of food. That would prevent other customers from getting food and may make the business look bad on Yelp.
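To make the speakeasy analogy concrete, here is a minimal sketch of a toy, in-memory "API" that checks a password and enforces a per-customer limit. Everything in it (the `ToySpeakeasyAPI` class, the `order` method, the password) is hypothetical and invented for illustration; the only real-world convention borrowed is the use of HTTP-style status codes, where 401 conventionally means "unauthorized" and 429 means "too many requests".

```python
class ToySpeakeasyAPI:
    """A toy, in-memory stand-in for a web API (not a real service)."""

    def __init__(self, valid_keys, max_requests_per_key):
        self.valid_keys = set(valid_keys)
        self.max_requests_per_key = max_requests_per_key
        self.request_counts = {}  # api_key -> number of orders served so far

    def order(self, api_key, dish):
        # Step 1: the person at the door checks the password (authentication).
        if api_key not in self.valid_keys:
            return {"status": 401, "error": "invalid API key"}

        # Step 2: the restaurant enforces a per-customer limit (rate limiting).
        used = self.request_counts.get(api_key, 0)
        if used >= self.max_requests_per_key:
            return {"status": 429, "error": "rate limit exceeded"}

        # Step 3: the kitchen prepares the order and the server returns it.
        self.request_counts[api_key] = used + 1
        return {"status": 200, "result": f"one plate of {dish}"}


api = ToySpeakeasyAPI(valid_keys=["swordfish"], max_requests_per_key=2)
print(api.order("wrong-password", "steak"))  # rejected at the door
print(api.order("swordfish", "steak"))       # served
print(api.order("swordfish", "pasta"))       # served
print(api.order("swordfish", "soup"))        # over the limit
```

Real services track limits over time windows and by account rather than with a simple counter, but the order of checks (authenticate first, then check limits, then do the work) is the idea the analogy is meant to convey.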

You encounter APIs in many different forms. Web APIs, like the Twitter API or Google Maps API, allow websites to communicate with servers. Library APIs provide functions and methods for programming languages, while Operating System APIs enable applications to interact with the operating system.

When it comes to working with Large Language Models (LLMs), APIs are particularly important. They provide a structured way to interact with these powerful models, allowing you to send text prompts and receive responses. They make it possible to integrate AI capabilities into your applications, and they help manage costs and usage through features like rate limiting and authentication. Without APIs, it would be much more difficult to harness the power of LLMs in your projects.

APIs are also necessary because many people who use LLMs do not have the hardware to run a model locally (on their own machine), and in many cases LLMs are proprietary (closed-source), meaning they can only be used via APIs.

## Setting up the OpenAI API

In order to set up your OpenAI API key, you first need to create an account and then follow the steps below (as seen in the following GIF).

![Gif of how to create OpenAI API Key](../assets/gifs/openai-api.gif)

### Getting an API Key

1) Go to [platform.openai.com](https://platform.openai.com)
2) Profile (top-right corner)
3) Your Profile
4) Admin Keys
5) Create new API Key
6) Type API key name and Select Project
7) Click "Create API Key"
8) Copy API Key
9) Paste it either into the notebook (**VERY INSECURE!!!**) or into a .env file (make sure to keep .env in .gitignore).

### Setting up Billing

Once you have your API key, you will need to visit Billing to set up your credit card.

## Making an API Call with OpenAI

When we make an API call, we are sending our request over the internet to a specific server. To use the OpenAI API, we need to do the exact same thing. Fortunately, we don't have to manually code this request out in complex JSON. Instead, we can leverage the Python `openai` package that we installed above.
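To give a sense of what the package saves us from, here is a rough sketch of the kind of raw HTTP request that sits underneath. It only builds and inspects the request; nothing is sent. It assumes the endpoint URL `https://api.openai.com/v1/responses` and the `model` and `input` JSON fields match OpenAI's public HTTP API at the time of writing, and `YOUR_API_KEY` is a placeholder, not a real key.

```python
import json
import urllib.request

# Placeholder only; a real key should come from your environment, not your code.
api_key = "YOUR_API_KEY"

# The request body is plain JSON: which model to use and what text to send.
payload = {
    "model": "gpt-4o",
    "input": "Hello!",
}

request = urllib.request.Request(
    url="https://api.openai.com/v1/responses",  # assumed endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # how the key is presented
    },
    method="POST",
)

# Inspect the request without sending it.
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```

Structuring the body, attaching the key, and parsing the JSON that comes back are the steps a client library takes care of on our behalf.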

The `openai` package handles a lot of the complex structuring of a request for us, including passing our API key to the server for verification and structuring the requests correctly. To do this, we first need to import the required libraries. Let's import the `OpenAI` class from the `openai` package and `os` (which we can use to read the API key from an environment variable, for example one loaded from a .env file).

In [2]:
from openai import OpenAI
import os

Now that we have imported the libraries, we can begin to use them. First, we need to connect to OpenAI via the main `OpenAI` client class. For basic use cases this only requires a single `api_key` argument. This will not work without an API key. If you haven't made one yet, please scroll up and follow the instructions above.

You can either paste your API key directly into this cell (THIS IS VERY INSECURE, as you may accidentally commit it with the notebook and upload it) or (the better option) read it via `os` from an environment variable, which can be populated from a `.env` file.

In [3]:
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    # api_key=""
)
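Note that `os.environ.get("OPENAI_API_KEY")` only reads a variable that is already in the environment; `os` does not parse a `.env` file by itself. Many people use the third-party `python-dotenv` package (`load_dotenv()`) for that step. As a dependency-free illustration, here is a minimal sketch of a hypothetical helper, `load_env_file`, that copies simple `KEY=VALUE` lines into `os.environ`. It is deliberately naive and assumes one assignment per line.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env-style file into os.environ.

    Hypothetical helper for illustration; it handles only simple lines.
    Returns True if the file was found, False otherwise.
    """
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and anything without an "=".
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Variables already set in the environment take precedence.
            os.environ.setdefault(key.strip(), value.strip().strip("\"'"))
    return True


found = load_env_file()  # looks for .env in the current working directory
print("Loaded .env:", found)
print("Key available:", os.environ.get("OPENAI_API_KEY") is not None)
```

Run something like this before creating the client so that the key is in the environment by the time `os.environ.get` is called. Printing only whether the key is available, never the key itself, keeps it out of the notebook's saved output.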

Now that we've connected, let's verify that the class is in memory.

In [9]:
client

<openai.OpenAI at 0x112bcaea0>

Excellent! Now, assuming our API key is valid, we can go ahead and structure our first request via the `responses.create` method. This takes two essential arguments: the model that we wish to use and the input that we wish to pass. This is a very simple example. We will expand on this below.

In [23]:
response = client.responses.create(
    model="gpt-4o",
    input="Hello. Who is William Mattingly?"
)

Now that we have our response, we can access the output as raw text via the `output_text` attribute.

In [24]:
print(response.output_text)

William Mattingly is a Digital Humanities researcher and professor. He is known for his work in the intersection of medieval studies and artificial intelligence. He has conducted research on using AI for historical and literary analysis, and he has taught courses related to these topics. If you have more specific questions about his work or contributions, feel free to ask!


All our outputs will be slightly different. I've chosen this particular prompt for a specific reason. I am represented on the internet just enough to be included in the model's knowledge. This makes what an LLM says about me highly unpredictable. In my output, it calls me a `professor`, but this is not true. We will learn about this mistake below as we talk more about hallucinations.

## **EXERCISE 1** (5 Minutes)

Spend the next few minutes familiarizing yourself with the syntax of the above code. Test out different prompts. Design a test for the LLM. Try to get it to generate a specific output. You can be creative here! Your goal is to get the model to generate the same response consistently five times. *Please do not be discouraged if you can't do this.*
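If you would rather count repeats than compare outputs by eye, the following is a minimal sketch of a tallying helper. The names `count_responses` and `fake_ask` are hypothetical; `fake_ask` is a stand-in that returns a fixed string so the example runs without an API key or network access.

```python
from collections import Counter


def count_responses(ask, prompt, runs=5):
    """Call ask(prompt) several times and tally the distinct outputs."""
    return Counter(ask(prompt).strip() for _ in range(runs))


def fake_ask(prompt):
    # Stand-in for a real model call so this sketch runs offline.
    return "Paris"


# With a real client, `ask` could instead be something like:
#   lambda p: client.responses.create(model="gpt-4o", input=p).output_text
tally = count_responses(fake_ask, "What is the capital of France? One word.")
print(tally)
print("Consistent:", len(tally) == 1)
```

A tally with a single entry means every run produced the same text. Keep in mind that each run with a real model is a separate, billed API call.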

## Components of a Prompt

A **prompt** is the input text that we provide to an LLM to generate a response. It can be a question, statement, instruction, or any other form of text that we want the model to process and respond to. The quality and structure of the prompt greatly influences the output we receive from the model.

While a basic prompt has a single input (like we saw above), a typical prompt will include a sequence of messages. We typically store these messages in a separate variable. A good way to think about messages is in terms of the interface we know as ChatGPT. When you first create a ChatGPT session, there is already a system prompt engaged behind the scenes.

The **system** prompt is responsible for defining how the model will behave when engaging with the user. The **user** message is what the human submits to the LLM. As we will see below, there is another role called the **assistant**, but we will address that later. For now, let's consider this basic arrangement.

In [39]:
basic_messages = [
    {
        "role": "system",
        "content": "You write creative short stories."
    },
    {
        "role": "user",
        "content": "Write three sentences about sharks."
    }
]

In [40]:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=basic_messages
)

Notice that instead of using `client.responses.create` we are using `client.chat.completions.create`. This is more typical of how you interact with LLMs via an API, as it offers a lot more control over how the LLM behaves. In order to access the response, we also need slightly different syntax. The example below shows you how to get the text.

In [41]:
print(response.choices[0].message.content)

Sharks glide through the ocean with a grace that belies their size, their sleek fins slicing through the water like knives through silk. Their eyes, always scanning the infinite blue, hold secrets of the deep that have remained untold for centuries. Despite their predatory reputation, they possess an ancient elegance, a reminder of the wild beauty that has swum the Earth's waters long before we walked its lands.


In [52]:
models = client.models.list()
for model in models:
    print(model)

Model(id='gpt-4o-audio-preview-2024-12-17', created=1734034239, object='model', owned_by='system')
Model(id='dall-e-3', created=1698785189, object='model', owned_by='system')
Model(id='dall-e-2', created=1698798177, object='model', owned_by='system')
Model(id='gpt-4o-audio-preview-2024-10-01', created=1727389042, object='model', owned_by='system')
Model(id='gpt-4o-realtime-preview-2024-10-01', created=1727131766, object='model', owned_by='system')
Model(id='gpt-4o-realtime-preview', created=1727659998, object='model', owned_by='system')
Model(id='babbage-002', created=1692634615, object='model', owned_by='system')
Model(id='o3-mini-2025-01-31', created=1738010200, object='model', owned_by='system')
Model(id='tts-1-hd-1106', created=1699053533, object='model', owned_by='system')
Model(id='o3-mini', created=1737146383, object='model', owned_by='system')
Model(id='text-embedding-3-large', created=1705953180, object='model', owned_by='system')
Model(id='gpt-4', created=1687882411, object='

Returning to the shark output above, we can see a very basic story. Your stories will vary significantly from this one. Let's take a moment, though, and see if we can control the style of the output by changing the `system` prompt. What if we wanted our narrator to be a pirate? We can control the model's behavior by changing the initial system prompt to include `in the style of a pirate`.

In [42]:
pirate_messages = [
    {
        "role": "system",
        "content": "You write creative short stories in the style of a pirate."
    },
    {
        "role": "user",
        "content": "Write three sentences about sharks."
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=pirate_messages
)

print(response.choices[0].message.content)

Arrr, the sea be home to many fearsome beasts, but none as jaw-droppin’ as the mighty shark with its glistening teeth like daggers ready to strike. These cunning creatures glide through the briny deep with a grace and power that even the most seasoned buccaneer can’t help but admire. Beware, for when the sun dips below the horizon and the waters grow dark, a shark's fin slicing through the waves be a sight to chill even the bravest sailor to the bone.


Why does this work? One of the things we will learn about LLMs throughout this class is that they are dramatically shaped by context. While this example is fun, let's take a sneak peek at something we will examine in more depth later on. Imagine I wanted to get a quick biography of Michael Jordan.

In [44]:
mj_basic_messages = [
    {
        "role": "system",
        "content": "You write factual short biographies."
    },
    {
        "role": "user",
        "content": "Write three sentences about Michael Jordan."
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=mj_basic_messages
)

print(response.choices[0].message.content)

Michael Jordan is a former professional basketball player widely regarded as one of the greatest athletes in the history of the sport. He played the majority of his career with the Chicago Bulls in the NBA, where he won six championships and earned the NBA Finals MVP award six times. Beyond his remarkable sports career, Jordan is also a successful businessman and the principal owner and chairman of the Charlotte Hornets.


Wait... What? That's not the Michael Jordan I wanted! I wanted a biography of this [Michael Jordan](https://en.wikipedia.org/wiki/Michael_Jordan_(mycologist)). Let's fix this by adding a bit of context to the user role in the conversation.

In [46]:
mj_context_messages = [
    {
        "role": "system",
        "content": "You write factual short biographies."
    },
    {
        "role": "user",
        "content": "Write three sentences about Michael Jordan in the context of mushroom science."
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=mj_context_messages
)

print(response.choices[0].message.content)

Michael Jordan, not to be confused with the basketball legend, is a British mycologist known for his contributions to the field of mushroom science. He founded the Association of British Fungal Groups, which aims to promote the study and conservation of fungi. Jordan is also an author of various books on mushrooms, providing valuable resources for both amateur and professional mycologists.


We will learn more about context later this week, but this is a quick demonstration of how radically context can influence an LLM's output. This is especially important in the humanities, as we often work with material that is on the periphery of these LLMs' knowledge.
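The two Michael Jordan prompts differ only in one clause of the user message. As a small sketch of that pattern, the hypothetical helper below (`biography_messages` is not part of the `openai` package) builds the same pair of messages with or without the extra context, so the two prompts can be compared side by side. It makes no API call.

```python
def biography_messages(name, context=None):
    """Build a system + user message pair, optionally adding context."""
    request = f"Write three sentences about {name}"
    if context:
        # The added clause is the only difference between the two prompts.
        request += f" in the context of {context}"
    return [
        {"role": "system", "content": "You write factual short biographies."},
        {"role": "user", "content": request + "."},
    ]


print(biography_messages("Michael Jordan")[1]["content"])
print(biography_messages("Michael Jordan", "mushroom science")[1]["content"])
```

Either list can be passed as the `messages` argument to `client.chat.completions.create`, as in the cells above.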

## Zero-Shot Classification

One of the most powerful aspects of LLMs is their ability to perform "zero-shot" learning - they can tackle new tasks without explicit training on specific examples. This makes them incredibly versatile tools for humanities research. For instance, they can analyze historical texts, identify patterns across large corpora, and even help with translation and transcription of historical documents.

However, it's crucial to understand their limitations. LLMs don't truly "understand" language in the way humans do - they operate by identifying statistical patterns in their training data. This can lead to "hallucinations" where they generate false or misleading information that sounds plausible. They can also perpetuate and amplify biases present in their training data, which is particularly concerning for humanities research where historical accuracy and cultural sensitivity are paramount.

In the humanities, LLMs are proving to be valuable tools for several applications. They can help scholars analyze large historical text corpora, identify named entities (people, places, events), discover patterns and themes across collections, and assist with the translation and transcription of historical documents. They can also serve as research assistants, helping scholars explore connections and generate research questions.

Let's consider this fun example. Imagine we wanted to classify a string of text that contains a person's name, degree, and maybe some extra information. We specifically want to classify people by the type of degree that they have, for example, PhD or MD. Let's try prompting an LLM to do this without any examples. This is known as zero-shot classification, where we just ask the model to classify something with no examples given. I'm going to run this three times to demonstrate how varied our responses are each time we prompt the model.

In [25]:
response = client.responses.create(
    model="gpt-4o",
    input="Classify the following person. Dr. Samantha Bower, Ph.D. in History"
)
print(response.output_text)

Dr. Samantha Bower can be classified as an academic or historian. With a Ph.D. in History, she is likely involved in education, research, or scholarly activities related to historical studies.


In [26]:
response = client.responses.create(
    model="gpt-4o",
    input="Classify the following person. Dr. Samantha Bower, Ph.D. in History"
)
print(response.output_text)

Dr. Samantha Bower can be classified as an academic or historian, given her Ph.D. in History. She likely engages in activities related to research, teaching, and scholarly writing in the field of history.


In [27]:
response = client.responses.create(
    model="gpt-4o",
    input="Classify the following person. Dr. Samantha Bower, Ph.D. in History"
)
print(response.output_text)

Dr. Samantha Bower can be classified as an academic or historian, given her Ph.D. in History.


While none of these responses are wrong, they aren't *exactly* what I wanted. I really just want to know the specific degree. How can I improve the results? With examples.

## **EXERCISE 2** (10 minutes)

Without using messages, try to get the model to give you just PhD for the example above. Once successful, try to do the same thing with the same prompt, changing only the `Dr. Samantha Bower, Ph.D. in History` part of the prompt. **IMPORTANT:** to pass this exercise, you must get the model to give you that same response three times. (Please do not scroll down.)

## Few-Shot Classification

Sometimes a particular classification task cannot be solved with zero-shot classification alone. While the above examples work in a broad sense, they lack consistency. We aren't getting classes in the traditional machine learning sense of the word. Instead, we are getting natural language descriptions of the person. While useful, this may not be precisely what we need.

In these circumstances, we can still leverage the LLM's broad knowledge of language by guiding it better through examples. This is known as single-shot and few-shot classification. We will learn about the code in more depth in a later notebook. For now, it's important to understand that we are leveraging examples for *how* we want to classify the texts, that is, as an output that is simply the person's degree, not a natural language response.

Few-shot prompting can not only improve the consistency of the response, it can even improve the response as a whole. In the example below, we are adding a new role to the sequence of messages: **assistant**. The **assistant** is the LLM. Here, we are simulating an ideal conversation. Just like before, we start with a system prompt, then we pose a prompt from the user with a guided response from the LLM. We use another example, but note how these two examples show two different types of input data (different degrees). When constructing few-shot examples, try to use diverse material. This allows a model to generalize (or make a prediction) better on unseen data. Finally, we conclude the sequence of messages with the actual text we need classified.

In [48]:
long_messages = [
    {
        "role": "system",
        "content": "You are an assistant that classifies people by their degrees."
    },
    {
        "role": "user",
        "content": "Classify the following person. Dr. Frank Alvez, M.D. with specialty in Cardiology"
    },
    {
        "role": "assistant",
        "content": "M.D."
    },
    {
        "role": "user",
        "content": "Classify the following person. Jeff Stanton, Ph.D in Philosophy"
    },
    {
        "role": "assistant",
        "content": "Ph.D."
    },
    {
        "role": "user",
        "content": "Classify the following person. Dr. Samantha Bower, Ph.D. in History"
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=long_messages
)

print(response.choices[0].message.content)

Ph.D.


You can rerun this cell as much as you like, but rarely will it diverge from this output.
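Writing out every example message by hand gets tedious as the number of examples grows. As a sketch of one way to generate the same structure, the hypothetical helper below (`build_few_shot_messages` is not part of the `openai` package) turns a list of `(text, label)` pairs into alternating **user** and **assistant** messages, followed by the text to classify. It makes no API call.

```python
def build_few_shot_messages(system_prompt, examples, query,
                            prefix="Classify the following person. "):
    """Build a few-shot message list from (text, label) example pairs."""
    messages = [{"role": "system", "content": system_prompt}]
    for text, label in examples:
        # Each example is a simulated exchange: user asks, assistant answers.
        messages.append({"role": "user", "content": prefix + text})
        messages.append({"role": "assistant", "content": label})
    # The real item to classify always comes last, as a user message.
    messages.append({"role": "user", "content": prefix + query})
    return messages


examples = [
    ("Dr. Frank Alvez, M.D. with specialty in Cardiology", "M.D."),
    ("Jeff Stanton, Ph.D in Philosophy", "Ph.D."),
]
messages = build_few_shot_messages(
    "You are an assistant that classifies people by their degrees.",
    examples,
    "Dr. Samantha Bower, Ph.D. in History",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list has the same shape as `long_messages` and can be passed as the `messages` argument to `client.chat.completions.create`. Keeping the examples in a plain list also makes it easy to swap in more diverse ones.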

## **EXERCISE 3** (5 Minutes)

Instead of getting the model to say Ph.D., try to get the model to say PhD.