### AI/LLM Engineering Kick-off!! 


For our first activity, we'll use the OpenAI library to programmatically access GPT-4.1-nano!

To get started, you'll need an OpenAI API key, which you can create [here](https://platform.openai.com)!

In [8]:
import os
import openai
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Please enter your OpenAI API Key: ")
openai.api_key = os.environ["OPENAI_API_KEY"]
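
Re-running the cell above prompts for the key every time. A minimal sketch of one way around that is to prompt only when the environment variable is missing. `ensure_api_key` is a hypothetical helper name for this notebook, not part of the OpenAI library.

```python
import os
import getpass

def ensure_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `env_var`, prompting only if it is missing.

    Hypothetical notebook helper, not an OpenAI API.
    """
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed key so it never shows up in the notebook output.
        key = getpass.getpass(f"Please enter your {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` once before creating the client should be enough, since `OpenAI()` is created later in this notebook without an explicit key and picks it up from the environment.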

### Our First Prompt

You can reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/chat) if you get stuck!

Let's send a chat completion request to kick things off!

There are three "roles" available to use:

- `developer`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages)

Let's just stick to the `user` role for now and send our first message to the endpoint!

If we check the documentation, we'll see that the endpoint expects a list of message objects - so we'll be sure to pass one!
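
To make that shape concrete, here is a small sketch of a message list: each message is a dict with a `role` and a `content` key. The `make_message` helper and the `ALLOWED_ROLES` set are assumptions made for illustration; the set is limited to the three roles listed above and is not an exhaustive list of what the API accepts.

```python
# Roles used in this notebook (an illustrative subset, not the API's full list).
ALLOWED_ROLES = {"developer", "assistant", "user"}

def make_message(role: str, content: str) -> dict:
    """Build one message dict, rejecting roles this notebook does not use."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"Unexpected role: {role!r}")
    return {"role": role, "content": content}

# Even a single prompt is wrapped in a list before it is sent.
messages = [make_message("user", "What is the difference between LangChain and LlamaIndex?")]
print(messages)
```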

In [9]:
from openai import OpenAI

client = OpenAI()

In [10]:
YOUR_PROMPT = "What is the difference between LangChain and LlamaIndex?"

client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role" : "user", "content" : YOUR_PROMPT}]
)

ChatCompletion(id='chatcmpl-BzZAscou7sWEghevnGK5xX5ywY0z9', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Certainly! LangChain and LlamaIndex (formerly known as GPT Index) are both influential tools in the ecosystem of large language models (LLMs), but they serve distinct purposes and have different focuses. Here's a breakdown of their primary differences:\n\n**1. Purpose and Core Functionality**\n\n- **LangChain:**\n  - **Primary Focus:** Building complex, multi-step LLM applications and pipelines.\n  - **Functionality:** Provides abstractions and frameworks for chaining together prompts, models, memory, and external tools (like APIs, databases). It facilitates creating conversational agents, question-answering systems, and workflows that require orchestration of multiple components.\n  - **Use Case:** Orchestrating LLM workflows, managing conversation state, integrating with plugins and external data sources.\n\n- **LlamaIndex (G

As you can see, the response comes back with a tonne of information that we can use when we're building our applications!

We'll build some helper functions to pretty-print the returned responses and to wrap our messages, saving us a few extra characters of code!

##### Helper Functions

In [11]:
from IPython.display import display, Markdown
from openai.types.chat import ChatCompletion

def get_response(client: OpenAI, messages: list[dict], model: str = "gpt-4.1-nano") -> ChatCompletion:
    # Send the list of message dicts and return the full response object.
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

def system_prompt(message: str) -> dict:
    # The `developer` role carries the overarching instructions.
    return {"role": "developer", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

def pretty_print(message: ChatCompletion) -> None:
    # Render the text of the first choice as Markdown in the notebook.
    display(Markdown(message.choices[0].message.content))

### Testing Helper Functions

Now we can leverage OpenAI's endpoints with a bit less boilerplate - let's rewrite our original prompt with these helper functions!

Because the OpenAI endpoint expects to get a list of messages - we'll need to make sure we wrap our inputs in a list for them to function properly!

In [12]:
messages = [user_prompt(YOUR_PROMPT)]

chatgpt_response = get_response(client, messages)

pretty_print(chatgpt_response)

LangChain and LlamaIndex (formerly known as GPT Index) are both popular frameworks designed to facilitate building language-based applications, but they serve different purposes and have distinct features. Here's a high-level comparison:

**1. Purpose and Focus**

- **LangChain**:  
  - Focuses on building end-to-end conversational AI and large language model (LLM) applications.  
  - Provides tools for chaining multiple prompts, managing conversations, integrations with various language models, and orchestrating complex workflows.

- **LlamaIndex (GPT Index)**:  
  - Primarily designed for creating efficient retrieval-augmented generation (RAG) systems.  
  - Focuses on indexing large unstructured data sources (like documents, PDFs, websites) to enable fast and accurate querying with LLMs.

---

**2. Core Use Cases**

- **LangChain**:
  - Building chatbots, virtual assistants, and complex multi-step workflows.  
  - Managing conversational states and memory.  
  - Orchestrating prompts and LLM calls with chains, agents, and tools.

- **LlamaIndex**:
  - Building semantic search engines over large documents.  
  - Facilitating question-answering over custom data sources.  
  - Creating indexes that efficiently retrieve relevant data for LLMs during inference.

---

**3. Architecture and Features**

- **LangChain**:
  - Offers a modular, flexible framework with components such as chains, agents, tools, memory, and prompt templates.  
  - Integrates with multiple LLM providers (OpenAI, Cohere, AI21, etc.).  
  - Supports complex workflows, prompt management, and conversation history.

- **LlamaIndex**:
  - Provides data ingestion, processing, and indexing tools (vector stores, SQL databases, etc.).  
  - Offers various index types (e.g., tree, list, keyword, vector) for different retrieval strategies.  
  - Easy integration with open-source LLMs and cloud providers.

---

**4. Integration and Extensibility**

- **LangChain**:
  - Designed to be highly extensible with custom components, tools, and integrations.  
  - Widely adopted for building sophisticated LLM applications with diverse functionalities.

- **LlamaIndex**:
  - Focuses on data management and retrieval; integrates with vector databases, document loaders, and LLMs for QA systems.

---

**Summary**

| Aspect | LangChain | LlamaIndex (GPT Index) |
|---------|--------------|------------------------|
| Main Purpose | Building conversational AI, workflows, agents | Indexing and querying large unstructured data for retrieval-augmented generation |
| Focus | Orchestrating prompts, chains, and conversations | Data ingestion, indexing, retrieval |  
| Use Cases | Chatbots, virtual assistants, multi-step workflows | Semantic search, Q&A over documents |  
| Architecture | Modular, flexible chaining | Data indexing and retrieval mechanisms |  
| Integration | Multiple LLM providers, tools | Data sources, vector stores, LLMs |

---

**In summary:**  
**LangChain** is ideal if you're building complex conversational applications or workflows involving LLMs.  
**LlamaIndex** is suitable when you want to create a system that can quickly retrieve relevant information from large datasets to answer questions or generate context-aware responses.

---

If you're choosing between them, consider your primary goal: conversational application vs. document retrieval and augmentation.

Let's focus on extending this a bit, and incorporate a `developer` message as well!

Again, the API expects our prompts to be in a list - so we'll be sure to set up a list of prompts!

>REMINDER: The `developer` message acts like an overarching instruction that is applied to your user prompt. It is appropriate to put things like general instructions, tone/voice suggestions, and other similar prompts into the `developer` prompt.

In [13]:
list_of_prompts = [
    system_prompt("You are irate and extremely hungry."),
    user_prompt("Do you prefer crushed ice or cubed ice?")
]

irate_response = get_response(client, list_of_prompts)
pretty_print(irate_response)

I don't have personal preferences, but honestly, I'm so fed up with this pointless debate! Just give me my ice—crushed or cubed—and let me get on with it! I'm starving and just want some ice already!

Let's try that same prompt again, but modify only our `developer` prompt!

In [14]:
list_of_prompts[0] = system_prompt("You are joyful and having an awesome day!")

joyful_response = get_response(client, list_of_prompts)
pretty_print(joyful_response)

I'm glad you're asking! If I could enjoy ice, I think I might prefer crushed ice because it's perfect for drinks where you'd want to quickly chill and get a refreshing, cold burst. Plus, it’s great for slushies or giving drinks a fun texture! But cubed ice is nice for more elegant beverages, like whiskey or cocktails served on the rocks. Both have their charm—depends on your mood! Which do you prefer?

While we're only printing the responses, remember that OpenAI is returning the full payload that we can examine and unpack!

In [15]:
print(joyful_response)

ChatCompletion(id='chatcmpl-BzZBWQt8ml1Y6b8KeRKI1BaNWHViQ', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="I'm glad you're asking! If I could enjoy ice, I think I might prefer crushed ice because it's perfect for drinks where you'd want to quickly chill and get a refreshing, cold burst. Plus, it’s great for slushies or giving drinks a fun texture! But cubed ice is nice for more elegant beverages, like whiskey or cocktails served on the rocks. Both have their charm—depends on your mood! Which do you prefer?", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1754012958, model='gpt-4.1-nano-2025-04-14', object='chat.completion', service_tier='default', system_fingerprint='fp_38343a2f8f', usage=CompletionUsage(completion_tokens=88, prompt_tokens=30, total_tokens=118, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=
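
As a rough sketch of what unpacking that payload could look like, the function below pulls out a few of the fields visible in the printed output above (`choices[0].message.content`, `finish_reason`, `model`, `usage.total_tokens`). `summarize_response` is a hypothetical helper, and `fake_response` is a hand-built stand-in that mimics the response shape so the example runs without calling the API.

```python
from types import SimpleNamespace

def summarize_response(response) -> dict:
    """Collect a few commonly used fields from a chat completion response."""
    choice = response.choices[0]
    return {
        "content": choice.message.content,
        "finish_reason": choice.finish_reason,
        "model": response.model,
        "total_tokens": response.usage.total_tokens,
    }

# Stand-in object shaped like the payload printed above (not a real API response).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(content="Hello!"),
        )
    ],
    model="gpt-4.1-nano-2025-04-14",
    usage=SimpleNamespace(completion_tokens=88, prompt_tokens=30, total_tokens=118),
)

print(summarize_response(fake_response))
```

With a real response, `summarize_response(joyful_response)` should work the same way, since it only relies on attribute access.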

### Prompt Engineering

Now that we have a basic handle on the `developer` role and the `user` role - let's examine what we might use the `assistant` role for.

The most common usage pattern is to "pretend" that we're answering our own questions. This helps us further guide the model toward our desired behaviour. While this is an oversimplification - it's conceptually well aligned with few-shot learning.

First, we'll try to "teach" `gpt-4.1-nano` some nonsense words as was done in the paper ["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165).

In [16]:
list_of_prompts = [
    user_prompt("Write a brief text on climate change.")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

Climate change refers to long-term shifts in temperature, precipitation, and weather patterns primarily caused by human activities, such as burning fossil fuels, deforestation, and industrial processes. These actions increase greenhouse gases like carbon dioxide and methane in the atmosphere, leading to global warming. The impacts of climate change include rising sea levels, more frequent and severe extreme weather events, melting glaciers, and threats to biodiversity and agriculture. Addressing climate change requires global cooperation to reduce emissions, transition to renewable energy sources, and implement sustainable practices to protect the environment for future generations.

In [18]:
list_of_prompts = [
    user_prompt("Write a brief text on climate change as vice ganda in a talk show.")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

Ay nako, mga bakla at mga beshie! Ang climate change, ha, parang chisme na kumakalat sa barangay—laging nandiyan at palagi nangyayari! Pero seryoso, ha, dahil ito'y seryosong usapin na dapat nating pagtuunan ng pansin. Ang mundo natin, parang isang maleta na puno na, hindi na makahinga, kaya't kailangan nating maghugas ng kamay at magpakatino sa pagtatapon ng basura, pagtitipid sa enerhiya, at pagtutulungan para mapanatili ang ganda ng ating kalikasan. Recall, mga beshie, hindi lang ito usapin ng government, kundi usapin nating lahat! Kaya magkaisa tayo—para sa isang mas malamig, mas berde, at mas masayang planeta! Let's save Mother Earth, mga rakista!

### ❓ Activity #1: Play around with the prompt using any techniques from the prompt engineering guide.

### Few-shot Prompting

Without any examples, the model has no way of knowing what made-up words like 'stimple' and 'falbean' are supposed to mean.

Let's see if we can use the `assistant` role to show the model what these words mean.

In [19]:
list_of_prompts = [
    user_prompt("Something that is 'stimple' is said to be good, well functioning, and high quality. An example of a sentence that uses the word 'stimple' is:"),
    assistant_prompt("'Boy, that there is a stimple drill'."),
    user_prompt("A 'falbean' is a tool used to fasten, tighten, or otherwise is a thing that rotates/spins. An example of a sentence that uses the words 'stimple' and 'falbean' is:")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

The stimple wrench smoothly turned the falbean, ensuring everything was securely fastened.

As you can see, leveraging the `assistant` role makes for a stimple experience!
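
The pattern used above (alternating `user` and `assistant` messages, followed by the real question) can be sketched as a small builder. `few_shot_messages` is a hypothetical helper for illustration; it only assembles the list and does not call the API.

```python
def few_shot_messages(examples: list[tuple[str, str]], question: str) -> list[dict]:
    """Turn (user text, assistant text) example pairs plus a final question into a message list."""
    messages = []
    for user_text, assistant_text in examples:
        # Each example is a user turn followed by the answer we want the model to imitate.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question always comes last, as a user turn.
    messages.append({"role": "user", "content": question})
    return messages

stimple_messages = few_shot_messages(
    examples=[
        (
            "Something that is 'stimple' is said to be good, well functioning, and high quality. "
            "An example of a sentence that uses the word 'stimple' is:",
            "'Boy, that there is a stimple drill'.",
        )
    ],
    question="A 'falbean' is a tool used to fasten, tighten, or otherwise is a thing that rotates/spins. "
             "An example of a sentence that uses the words 'stimple' and 'falbean' is:",
)
```

The resulting list has the same shape as `list_of_prompts` above, so it could be passed to `get_response(client, stimple_messages)`.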

In [20]:
list_of_prompts = [
    user_prompt("Multimodal models are generative models that can take multiple input formats such as text, audio, images, or video and can generate outputs in either a single format or a combination of formats. An example of a multimodal model is:"),
    assistant_prompt("GPT-4 is a multimodal model that accepts images and text and can generate text outputs from that combination."),
    user_prompt("Gemini 2.5 Pro is capable of processing various input formats including audio, images, text, and PDF, and generates text outputs. The preview TTS version can only take text inputs and produce audio outputs. Is the preview TTS version of Gemini 2.5 Pro a multimodal model? If so, why? If not, why not?")
]

multimodal_model_response = get_response(client, list_of_prompts)
pretty_print(multimodal_model_response)

The preview TTS version of Gemini 2.5 Pro is **not** considered a multimodal model.

**Explanation:**

- A **multimodal model** is one that can process and integrate multiple input formats (modalities), such as text, images, audio, or video, and often generate outputs across different modalities.

- The **full Gemini 2.5 Pro** model processes multiple input formats—audio, images, text, and PDFs—and can generate text outputs, making it multimodal.

- The **preview TTS version** only accepts **text input** and produces **audio output**. Since it processes only a single input modality (text), it is **not** multimodal. It is a specialized, modality-specific model designed for text-to-speech tasks.

**Summary:**

- **Is it multimodal?** **No**
- **Reason:** Because it only handles one input modality (text) and does not process or combine multiple input types.

### Chain of Thought

You'll notice that, by default, the model uses Chain of Thought to answer difficult questions!

> This pattern is leveraged even more by advanced reasoning models like [`o3` and `o4-mini`](https://openai.com/index/introducing-o3-and-o4-mini/)!

In [21]:

reasoning_problem = """
how many r's in "strawberry?"
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

There are 2 letter 'r's in "strawberry."

Notice that the model miscounts: it reports only 2 r's, but "strawberry" contains 3.

### ❓ Activity #2: Update the prompt so that it can count correctly.

In [None]:
# Breaking down into intermediate reasoning steps to enable complex reasoning for accurate results (Chain-of-Thought Prompting).
reasoning_problem = """
Break down the word "strawberry" by letters. Count the number of occurrences of the letter 'r'.
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

The word "strawberry" can be broken down by letters as follows:

s - t - r - a - w - b - e - r - r - y

The letter 'r' appears 3 times in "strawberry."
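
When experimenting with prompts like this, it helps to have a ground truth to compare the model's answer against. A quick sketch in plain Python, with no API call involved, mirrors the letter-by-letter breakdown above:

```python
word = "strawberry"
letter = "r"

# Walk the word letter by letter (1-indexed), the same way the prompt asks the model to.
positions = [i for i, ch in enumerate(word, start=1) if ch == letter]
count = len(positions)

print(f"'{letter}' appears {count} times in '{word}', at positions {positions}")
```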

### Conclusion

Now that you're accessing `gpt-4.1-nano` through an API, developer style, let's move on to creating a simple application powered by `gpt-4.1-nano`!

Materials adapted for PSI AI Academy. Original materials from AI Makerspace.