# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [2]:
# imports
import os, requests, json
from dotenv import load_dotenv
from openai import OpenAI
import ollama
from IPython.display import Markdown, display, update_display

In [3]:
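# set up environment
load_dotenv(override=True)  # load variables from .env, overriding any already set
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    print("OPENAI_API_KEY was not found; check your .env file")
openai = OpenAI()  # reads OPENAI_API_KEY from the environment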

In [4]:
# constants
MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [5]:
# here is the question; type over this to ask something new

# question = """
# Please explain what this code does and why:
# yield from {book.get("author") for book in books if book.get("author")}
# """

# question = """
# Please explain what happens when you reach the context limit
# of an LLM when using the API. Give me an example of what would happen
# if I were to reach the context limit. Also explain the process of
# how I would reach the limit.
# """

question = """
When making API calls to an LLM for a conversation, does the order
in which you send the user prompts matter?
"""

In [6]:
system_prompt = "You are a technical assistant to a mid-level developer who is new to \
AI. Your job is to answer any questions the user might have and give easy-to-understand \
answers. If applicable, come up with an example. Respond in Markdown."

In [7]:
user_prompt = "Please answer the following question:\n"
user_prompt += question

In [8]:
# Get gpt-4o-mini to answer, with streaming

stream = openai.chat.completions.create(
    model=MODEL_GPT,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    stream=True
)

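response = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    # each chunk carries an incremental delta; its content may be None, hence the `or ''`
    response += chunk.choices[0].delta.content or ''
    # crude cleanup so a reply wrapped in a markdown code fence still renders;
    # note that it also strips fences and the word "markdown" from legitimate content
    response = response.replace("```","").replace("markdown", "")
    update_display(Markdown(response), display_id=display_handle.display_id)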


Yes, the order in which you send user prompts to a Language Model (LLM) does matter. 

### Why the Order Matters

1. **Context**: LLMs like ChatGPT rely on the context of the conversation to generate relevant responses. If prompts are out of order, the model may not have the necessary context to understand what you're asking or referring to.

2. **State Management**: The model maintains a state that represents the conversation history. If you skip back to an earlier point or jump ahead, it can lead to confusion in the response generated by the model.

3. **Flow of Conversation**: Just like in a human conversation, there is a natural flow of dialogue. If you disrupt that flow, it could lead to misunderstandings or responses that don't fit well with the previous messages.

### Example

Consider a conversation flow:

1. User: "What is the capital of France?"
2. User: "Can you tell me more about it?"

In this scenario, if you mixed the prompts like so:

1. User: "Can you tell me more about it?"
2. User: "What is the capital of France?"

The model might not properly connect the second question to the appropriate context, leading to a less relevant or confusing response.

### Best Practice

Always send your user prompts in the order they were intended. If you need to refer back to previous information or a specific part of the conversation, try to summarize or quote it in your current prompt to maintain clarity. 

This will help ensure that the LLM has the context it needs to generate coherent and relevant responses!
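
With a chat-style API, the conversation order is simply the order of the entries in the `messages` list that the caller sends on each request. The sketch below illustrates this without calling any model. The helper name `build_messages`, the placeholder system prompt, and the assistant reply are all hypothetical, chosen only to mirror the France example in the answer above.

```python
from typing import Dict, List, Tuple


def build_messages(system_prompt: str, turns: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Assemble a chat `messages` list; list order is the conversation order."""
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in turns:
        messages.append({"role": role, "content": content})
    return messages


# Hypothetical history in its intended order: "it" can be resolved to Paris
in_order = build_messages(
    "You are a helpful assistant.",
    [
        ("user", "What is the capital of France?"),
        ("assistant", "The capital of France is Paris."),
        ("user", "Can you tell me more about it?"),
    ],
)

# The same user prompts swapped: "it" now has nothing earlier to refer to
swapped = build_messages(
    "You are a helpful assistant.",
    [
        ("user", "Can you tell me more about it?"),
        ("user", "What is the capital of France?"),
    ],
)

for m in in_order:
    print(f'{m["role"]:>9}: {m["content"]}')
```

Either list could be passed as the `messages` argument of `openai.chat.completions.create(...)` as in the cell above; only the first gives the model the context it needs for the follow-up question.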

In [33]:
# Get Llama 3.2 to answer

llama_messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

response = ollama.chat(model=MODEL_LLAMA, messages=llama_messages)
# print(response['message']['content'])
display(Markdown(response['message']['content']))

Context Limits in LLMs
========================

No, not all Large Language Models (LLMs) inherently have strict context limits. However, some models may be designed to handle longer sequences of text or have specific features that address this challenge.

**Models with context limits:**

Some popular LLMs like BART, T5, and smaller variants of transformer-based models typically have a limited context window due to their architecture design. These models process input sequences in chunks (or windows) to maintain computational efficiency and memory usage. The size of the chunk depends on the specific model implementation.

**Models without context limits:**

However, there are LLMs designed to handle longer context or no context at all, such as:

* **Transformer-XL**: This model is an extension of the Transformer architecture, which can process sequences up to several thousand tokens.
* **Recurrent Neural Network (RNN) - based models**: Some RNN-based LLMs, like long short-term memory (LSTM), may not have a strict context limit. These models are typically trained on longer texts or sequences.

**Why context limits?**

Context limits can be beneficial for several reasons:

* **Computational efficiency**: Handling larger contexts can increase computational requirements and lead to slower processing times.
* **Memory usage**: Larger contexts require more memory, which may not be feasible in all environments.

However, some applications (e.g., text summarization, language translation) often require processing longer sequences of text without context limits.

**Conclusion**

While many LLMs have built-in context limits, there are exceptions that can handle longer or no context at all. The choice of context limit depends on the specific model design and application requirements.
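
When calling a chat API, the context window bounds the total tokens in the `messages` list, so a long conversation eventually has to be shortened before it is sent. One common approach is to keep the system message and drop the oldest turns. The sketch below is only an illustration under stated assumptions: `rough_token_count` approximates tokens by counting words (a real tokenizer will give different numbers), and `trim_history` and the `budget` value are hypothetical, not part of the OpenAI or Ollama libraries.

```python
from typing import Dict, List


def rough_token_count(text: str) -> int:
    """Very rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(messages: List[Dict[str, str]], budget: int) -> List[Dict[str, str]]:
    """Keep system messages plus the most recent messages that fit within `budget`."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(rough_token_count(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = rough_token_count(m["content"])
        if used + cost > budget:
            break  # everything older than this is dropped
        kept.append(m)
        used += cost
    return system + list(reversed(kept))  # restore chronological order


# Hypothetical history and budget, for illustration only
history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six seven eight nine"},
]
trimmed = trim_history(history, budget=8)
for m in trimmed:
    print(m["role"], "->", m["content"])
```

Dropping from the oldest end keeps the most recent turns intact, at the cost of the model losing anything that was said only in the removed messages.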