## The Illusion of "Memory"

Many of you will know this already, but for those who don't, this might be an "aha" moment!

In [13]:
from google import genai
from google.genai import types
import os
from dotenv import load_dotenv

# Load .env file from parent directory
load_dotenv(dotenv_path='../.env')

API_KEY = os.environ["GEMINI_API_KEY"]
client = genai.Client(api_key=API_KEY)

### Let's introduce ourselves

In [14]:
# First message: Introduce yourself
response_1 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hi! I'm Yash!"
)

print("🤖 Assistant:", response_1.text)

🤖 Assistant: Hi Yash! It's great to meet you!

I'm a large language model, designed to assist you. How can I help you today?


### OK let's now ask a follow-up question

In [15]:
# Second message: Ask about your name (WITHOUT sending previous context)
response_2 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's my name?"
)

print("🤖 Assistant:", response_2.text)

🤖 Assistant: As an AI, I don't know your name. I don't have access to personal information about you unless you choose to tell me.


### Wait, what??

We just told you!

What's going on??

Here's the thing: every call to an LLM is completely STATELESS. It's a totally new call, every single time. As AI engineers, it's OUR JOB to devise techniques to give the impression that the LLM has a "memory".

In [16]:
from google import genai
from google.genai import types

# Build conversation history using types.Content
messages = [
    types.Content(
        role="user",
        parts=[types.Part(text="Hi! I'm Yash!")]
    ),
    types.Content(
        role="model",
        parts=[types.Part(text="Hi Yash! How can I assist you today?")]
    ),
    types.Content(
        role="user",
        parts=[types.Part(text="What's my name?")]
    )
]

# Get response
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=messages
)

print("🤖 Assistant:", response.text)

🤖 Assistant: Your name is Yash! You told me that a moment ago.
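
The same effect can be reproduced without any API call. The sketch below uses a hypothetical `fake_llm` function (not part of any SDK) as a stand-in for a stateless model: it can only "know" what is inside the `contents` argument of the current call. It is a toy under that assumption, not a description of how Gemini works internally.

```python
import re

def fake_llm(contents):
    """Hypothetical stand-in for a stateless model: it only sees `contents`."""
    # Accept a single string or a list of (role, text) turns
    turns = [("user", contents)] if isinstance(contents, str) else contents
    for role, text in turns:
        match = re.search(r"I'm (\w+)", text)
        if role == "user" and match:
            return f"Your name is {match.group(1)}!"
    return "I don't know your name."

# Two independent calls: the second call has no access to the first
fake_llm("Hi! I'm Yash!")
print(fake_llm("What's my name?"))

# One call that carries the whole history
history = [
    ("user", "Hi! I'm Yash!"),
    ("model", "Hi Yash! How can I assist you today?"),
    ("user", "What's my name?"),
]
print(fake_llm(history))
```

Nothing is stored between the two calls to `fake_llm`; the name is only found when the earlier turn travels along inside the request.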


### OpenAI Example

In [None]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Yash!"},
    {"role": "assistant", "content": "Hi Yash! How can I assist you today?"},
    {"role": "user", "content": "What's my name?"}
]

In [None]:
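from openai import OpenAI  # assumes the `openai` package is installed and OPENAI_API_KEY is set
openai_client = OpenAI()
response = openai_client.chat.completions.create(model="gpt-5-nano", messages=messages)
response.choices[0].message.content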
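
The two history formats differ mainly in role names and nesting: Gemini uses "model" with a list of parts, while OpenAI uses "assistant" with a single `content` string. As a rough sketch, the hypothetical helper `to_openai_messages` below converts one shape into the other. It assumes the Gemini-style turns are plain dicts that mirror the `types.Content` layout and contain text parts only.

```python
ROLE_TO_OPENAI = {"user": "user", "model": "assistant"}

def to_openai_messages(gemini_history, system_prompt=None):
    """Convert Gemini-shaped turns (plain dicts) into OpenAI-shaped messages."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    for turn in gemini_history:
        # Join all text parts of a turn into a single string
        text = "".join(part["text"] for part in turn["parts"])
        messages.append({"role": ROLE_TO_OPENAI[turn["role"]], "content": text})
    return messages

gemini_history = [
    {"role": "user", "parts": [{"text": "Hi! I'm Yash!"}]},
    {"role": "model", "parts": [{"text": "Hi Yash! How can I assist you today?"}]},
    {"role": "user", "parts": [{"text": "What's my name?"}]},
]
openai_messages = to_openai_messages(gemini_history, "You are a helpful assistant")
for message in openai_messages:
    print(message)
```

Either way, the principle is identical: the provider receives the whole conversation on every request.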

## 🎯 Complete Example: Multi-Turn Conversation

Let's have a longer conversation to really see how this works!

In [None]:
from google import genai
from google.genai import types

# Start fresh conversation
messages = []

def chat(user_message):
    """Helper function to send a message and get response"""
    # Add user message using types.Content and types.Part
    messages.append(
        types.Content(
            role="user",
            parts=[types.Part(text=user_message)]
        )
    )
    
    # Get response with full history
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=messages
    )
    
    # Add assistant response to history using types.Content and types.Part
    messages.append(
        types.Content(
            role="model",
            parts=[types.Part(text=response.text)]
        )
    )
    
    return response.text

# Have a conversation
print("=" * 70)
print("💬 MULTI-TURN CONVERSATION".center(70))
print("=" * 70)

print("\n👤 User: Hi! I'm Yash and I love coding in Python!")
response = chat("Hi! I'm Yash and I love coding in Python!")
print(f"🤖 Assistant: {response}")

print("\n👤 User: What's my name?")
response = chat("What's my name?")
print(f"🤖 Assistant: {response}")

print("\n👤 User: What programming language do I like?")
response = chat("What programming language do I like?")
print(f"🤖 Assistant: {response}")

print("\n👤 User: Can you write me a simple Python function?")
response = chat("Can you write me a simple Python function?")
print(f"🤖 Assistant: {response}")
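
The helper above keeps its history in one module-level `messages` list, so every caller shares the same conversation. One possible alternative, sketched here with hypothetical names (`Conversation`, `send_fn`, `fake_send`), is to keep the history on an object so that separate conversations stay separate. The backend is injected as a function; in a real notebook it could wrap `client.models.generate_content`, but a fake is used here so the sketch runs offline.

```python
from typing import Callable

class Conversation:
    """Keeps history per conversation instead of in a module-level list."""

    def __init__(self, send_fn: Callable[[list], str]):
        self.send_fn = send_fn  # receives the full history, returns reply text
        self.history = []       # list of {"role": ..., "text": ...} dicts

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "text": user_message})
        reply = self.send_fn(self.history)  # the whole history goes out each time
        self.history.append({"role": "model", "text": reply})
        return reply

def fake_send(history):
    """Offline stand-in for a model call: reports how many user turns it saw."""
    user_turns = sum(1 for turn in history if turn["role"] == "user")
    return f"(reply to user turn {user_turns})"

a = Conversation(fake_send)
b = Conversation(fake_send)
a.chat("Hi! I'm Yash!")
a.chat("What's my name?")
b.chat("Hello from a different session")
print(len(a.history), len(b.history))
```

Each `Conversation` grows its own history, which is the shape most multi-user applications end up needing.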


## 📜 Let's View the Full Conversation History

This is what we're actually sending to the API each time!

In [None]:
print("=" * 70)
print("📜 FULL CONVERSATION HISTORY".center(70))
print("=" * 70)
print(f"\nTotal messages in history: {len(messages)}\n")

for i, msg in enumerate(messages, 1):
    # Access properties directly from types.Content object
    role_emoji = "👤" if msg.role == "user" else "🤖"
    role_name = "User" if msg.role == "user" else "Assistant"
    
    # Access text from types.Part object and truncate long messages for display
    text = msg.parts[0].text
    content = text[:100] + "..." if len(text) > 100 else text
    
    print(f"{i}. {role_emoji} {role_name}: {content}\n")


### Key Takeaways

1. **LLMs are stateless** - They don't remember anything between API calls

2. **You create the illusion of memory** - By sending the full conversation history each time

3. **Message structure matters** - Use proper role names ("user" and "model" for Gemini, "user" and "assistant" for OpenAI)

4. **Costs grow with conversation length** - Each request includes ALL previous messages

5. **Manage your context** - For long conversations, you might need to:
   - Summarize old messages
   - Remove irrelevant history
   - Start fresh conversations
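
To put a number on takeaway 4, here is a back-of-the-envelope sketch. It assumes every message (user or model) is the same size, `tokens_per_message`, which is a simplification; the function name and the figure of 50 tokens are illustrative, not measurements from the API.

```python
def total_input_tokens(turns: int, tokens_per_message: int) -> int:
    """Input tokens summed over all requests, assuming equal-sized messages."""
    total = 0
    history_tokens = 0
    for _ in range(turns):
        history_tokens += tokens_per_message  # new user message
        total += history_tokens               # whole history is sent as input
        history_tokens += tokens_per_message  # model reply joins the history
    return total

for turns in (1, 5, 10, 20):
    print(turns, total_input_tokens(turns, tokens_per_message=50))
```

Under this assumption, request number k carries 2k - 1 messages, so n turns send n² × `tokens_per_message` input tokens in total. Doubling the length of a conversation roughly quadruples the input tokens you pay for.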

### What's Next?

Now you understand how chat memory works! In real applications, you'll need to:
- Store conversation history in a database
- Implement context window management
- Handle conversation summarization
- Optimize for cost and performance
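
As a starting point for context window management, the simplest approach is a sliding window that keeps only the most recent messages. The sketch below is one possible version, with a hypothetical `trim_history` helper operating on plain dicts; the `max_messages` limit is an arbitrary illustration, and a production version would more likely budget by token count.

```python
def trim_history(messages, max_messages=6):
    """Keep only the most recent messages, starting on a user turn."""
    # Guard against max_messages=0, since messages[-0:] would return everything
    recent = messages[-max_messages:] if max_messages > 0 else []
    # Drop leading model turns so the trimmed history still begins with the user
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent

history = [
    {"role": "user" if i % 2 == 0 else "model", "text": f"message {i}"}
    for i in range(10)
]
trimmed = trim_history(history, max_messages=5)
for message in trimmed:
    print(message["role"], message["text"])
```

The trade-off is exactly the one this notebook demonstrates: anything that falls outside the window is gone as far as the model is concerned. If the user's name was only in the first message, a trimmed conversation will "forget" it, which is why summarizing old messages is often combined with trimming.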