This file is an example that shows how `history` works in the context of LLMs.

---
For this use case, we need a smarter model, but it would be nice if it were still fast enough.

Here is a short list of small and fast models:
- [qwen](https://ollama.com/library/qwen): `qwen:0.5b`, `qwen:1.5b`, `qwen:4b`. [qwen2](https://ollama.com/library/qwen2): `qwen2:0.5b`, `qwen2:1.5b`
- [phi](https://ollama.com/library/phi): `phi:2.7b`. [phi3](https://ollama.com/library/phi3): `phi3:3.8b`
- [gemma](https://ollama.com/library/gemma): `gemma:2b`
- [orca-mini](https://ollama.com/library/orca-mini): `orca-mini:3b`
- [tinyllama](https://ollama.com/library/tinyllama): `tinyllama:1.1b`
- [tinydolphin](https://ollama.com/library/tinydolphin): `tinydolphin:1.1b`

In [None]:
USE_MODEL = "phi3:3.8b"

In [None]:
import ollama

# If we don't have this specific model, let's pull it.
ollama.pull(USE_MODEL)

In [None]:
client = ollama.Client(host='http://localhost:11434')

stream = client.chat(
    model=USE_MODEL,
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)


---
Now we can ask the LLM about our previous question, but the model may hallucinate an answer or say that there was no previous conversation.

From the model's point of view, there really was no previous conversation: each `chat` call only sees the `messages` passed to it, and this example does not include any history.

In [None]:
stream = client.chat(
    model=USE_MODEL,
    messages=[{'role': 'user', 'content': 'What was the previous question about?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

---
Let's add support for history to the chat with the LLM.

In [None]:
# History is just a list of all messages exchanged with the LLM (user and assistant)
history = [{'role': 'user', 'content': 'What is the difference between lemon and orange in a nutshell?'}]

client = ollama.Client(host='http://localhost:11434')


stream = client.chat(
    model=USE_MODEL,
    messages=history,
    stream=True,
)

# we need to collect all chunks of output somewhere
answer = ""

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

  # adding every chunk to the `answer` variable
  answer += chunk['message']['content']

# making an `assistant` message using the `answer` variable
message = {'role': 'assistant', 'content': answer}

# adding `message` to the end of history
history.append(message)

---
Let's verify whether the message history works.

In [None]:
history.append({'role': 'user', 'content': 'What was the previous question about?'})

stream = client.chat(
    model=USE_MODEL,
    messages=history,
    stream=True,
)

# we need to collect all chunks of output somewhere
answer = ""

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

  # adding every chunk to the `answer` variable
  answer += chunk['message']['content']

# making an `assistant` message using the `answer` variable
message = {'role': 'assistant', 'content': answer}

# adding `message` to the end of history
history.append(message)

In [None]:
history.append({'role': 'user', 'content': 'Could you make a very short recap?'})

stream = client.chat(
    model=USE_MODEL,
    messages=history,
    stream=True,
)

# we need to collect all chunks of output somewhere
answer = ""

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

  # adding every chunk to the `answer` variable
  answer += chunk['message']['content']

# making an `assistant` message using the `answer` variable
message = {'role': 'assistant', 'content': answer}

# adding `message` to the end of history
history.append(message)

In [None]:
history.append({'role': 'user', 'content': 'Thank you for that!'})

stream = client.chat(
    model=USE_MODEL,
    messages=history,
    stream=True,
)

# we need to collect all chunks of output somewhere
answer = ""

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

  # adding every chunk to the `answer` variable
  answer += chunk['message']['content']

# making an `assistant` message using the `answer` variable
message = {'role': 'assistant', 'content': answer}

# adding `message` to the end of history
history.append(message)
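
---
The last few cells repeat the same steps on every turn: append the user message, stream the reply, collect the chunks, and append the assistant message. These steps could be wrapped in one function. The sketch below is only an illustration: `chat_with_history` is a hypothetical helper name (it is not part of the `ollama` package), and it assumes `client` behaves like the `ollama.Client` used above, returning streamed chunks shaped like `chunk['message']['content']`.

```python
def chat_with_history(client, model, history, user_text):
    """Send `user_text` together with the whole history and record the reply.

    Hypothetical helper: `client` is assumed to expose
    chat(model=..., messages=..., stream=True) like the client used above.
    """
    # the new user message becomes part of the history before the request
    history.append({'role': 'user', 'content': user_text})

    stream = client.chat(model=model, messages=history, stream=True)

    # collect all chunks of output while printing them
    answer = ""
    for chunk in stream:
        piece = chunk['message']['content']
        print(piece, end='', flush=True)
        answer += piece

    # store the complete reply so the next turn can see it
    history.append({'role': 'assistant', 'content': answer})
    return answer
```

With this helper, a turn would look like `chat_with_history(client, USE_MODEL, history, 'Could you make a very short recap?')`. The function mutates `history` in place, which mirrors what the cells above do by hand.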

---
After a long conversation, we can inspect `history` to see exactly what the model receives as context.

In [None]:
for message in history:
    print(message)

So, history for LLMs is just a list of all previous messages that fit into its context window.
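
Because the history has to fit into the context, a long conversation eventually needs trimming. The sketch below shows one simple way to do that under stated assumptions: `trim_history` and `max_chars` are hypothetical names, and the character count is only a rough stand-in for tokens, since the real limit depends on the model and its tokenizer.

```python
def trim_history(history, max_chars=8000):
    """Return the most recent messages whose combined content fits in `max_chars`.

    Assumption: characters are used as a crude proxy for tokens.
    The newest message is always kept, even if it alone exceeds the budget.
    """
    kept = []
    total = 0
    # walk from the newest message back to the oldest
    for message in reversed(history):
        size = len(message['content'])
        if kept and total + size > max_chars:
            break
        kept.append(message)
        total += size
    # restore chronological order; the original list is left untouched
    kept.reverse()
    return kept
```

The trimmed list could then be passed as `messages=trim_history(history)` while the full `history` is still kept for inspection. Dropping the oldest messages first is only one possible policy; the model will no longer remember what was cut.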