# Mistral AI

In [1]:
from mistralai import Mistral
import os, tomli

In [2]:
with open('../.streamlit/secrets.toml','rb') as f:
    secrets = tomli.load(f)
    
api_key = secrets["MISTRAL_API_KEY"]
# api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-tiny"

client = Mistral(api_key=api_key)

In [3]:
messages = [
    {"role": "user", "content": "What is the best French cheese?"}
]
messages

[{'role': 'user', 'content': 'What is the best French cheese?'}]

In [4]:
# No streaming
chat_response = client.chat.complete(
    model=model,
    messages=messages,
)

print(chat_response.choices[0].message.content)

The "best" French cheese can be subjective and depends on personal taste preferences. However, some of the most popular and highly regarded French cheeses include:

1. Roquefort: A blue-veined cheese made from sheep's milk. It's known for its strong, pungent flavor and is often used in salads or as a topping.

2. Brie de Meaux: A soft, creamy cheese with a white rind. It's known for its mild, buttery flavor and is often served with fruit or bread.

3. Camembert: A soft, creamy cheese that's similar to Brie. It's known for its strong, mushroomy flavor and is often eaten at room temperature.

4. Comté: A semi-hard cheese made from cow's milk. It's known for its nutty, sweet flavor and is often used in cooking or as a table cheese.

5. Reblochon: A soft, creamy cheese with a rind. It's known for its strong, pungent flavor and is often used in fondue or as a topping.

6. Époisses de Bourgogne: A soft, smelly cheese that's often served with walnuts. It's known for its strong, pungent flavor

## Summary

In [None]:
!pip install webvtt-py

In [5]:
import os, webvtt

In [7]:
os.listdir('../data/vtt')

['captions.vtt',
 'sample.vtt',
 'subtitles-2xxziIWmaSA.vtt',
 'subtitles-f9_BWhCI4Zo.vtt',
 'subtitles-jkrNMKz9pWU.vtt',
 'subtitles-zjkBMFhNj_g.vtt']

In [9]:
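file = 'sample.vtt'
chat = webvtt.read('../data/vtt/' + file)
# Collect the text of every caption (naming the list `str` would shadow the builtin)
lines = [caption.text for caption in chat]
convo = '\n'.join(lines)
# Swap only the file extension, not every 'vtt' that appears in the name
txt_name = os.path.splitext(file)[0] + '.txt'
with open('../data/txt/' + txt_name, mode='w', encoding='utf-8') as f:
    f.write(convo)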

In [10]:
import math
words = convo.split() # split the string into words
num_tokens = math.floor(len(words) * 4/3) # rule of thumb: about 4 tokens per 3 English words
num_tokens

121

In [11]:
context = 'summarize the following conversation'
model = 'mistral-tiny'
messages = [
    {"role": "system", "content": context},
    {"role": "user", "content": convo}
]
completion = client.chat.complete(
    model=model,
    messages=messages
)
completion.choices[0].message.content

'The user is planning an experiment involving recording conversations and processing the resulting VTT (WebVTT) file using a Python app they developed. They aim to use the ChatGPT API for analysis, but are currently struggling to find time due to other commitments.'

## Question Answering

https://docs.mistral.ai/guides/basic-RAG/

### RAG from scratch

In [12]:
from mistralai import Mistral
import numpy as np
import os

In [16]:
import requests

response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
response.raise_for_status()  # fail early if the download did not succeed
text = response.text

In [17]:
folder = "book"
filename = "essay.txt"

# Create the folder if it doesn't exist
os.makedirs(folder, exist_ok=True)
    
with open(os.path.join(folder, filename), "wb") as file:
    file.write(response.text.encode('utf-8'))

In [18]:
chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
len(chunks)

37
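Fixed-size slicing can cut a sentence in half at a chunk boundary. A minimal sketch of a variant in which neighbouring chunks share some characters; `chunk_text` and its `overlap` parameter are hypothetical names, not part of the Mistral guide:

```python
def chunk_text(text, chunk_size=2048, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters (hypothetical helper).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so the tail is not repeated
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(demo_chunks)
```

With `overlap=0` the helper reduces to the plain slicing used above.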

In [22]:
def get_text_embedding(inputs):
    """Return the embedding vector for a single string."""
    embeddings_batch_response = client.embeddings.create(
        model="mistral-embed",
        inputs=inputs
    )
    return embeddings_batch_response.data[0].embedding

# One API call per chunk; the result is a (num_chunks, dim) array
text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])

In [23]:
import faiss

d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

In [24]:
question = "Are we going to save democracy?"
question_embeddings = np.array([get_text_embedding(question)])

In [25]:
question_embeddings

array([[-0.01547241,  0.00917053,  0.01520538, ..., -0.05145264,
         0.01053619, -0.01382446]])

In [26]:
D, I = index.search(question_embeddings, k=2) # distance, index
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
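`IndexFlatL2` performs an exhaustive search: it compares the query with every stored vector and returns the `k` smallest squared L2 distances. A dependency-free sketch of the same idea on toy vectors; `l2_search` is a hypothetical helper, not a FAISS function:

```python
def l2_search(vectors, query, k=2):
    """Return (distances, indices) of the k vectors closest to query.

    Distances are squared Euclidean distances, sorted ascending.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
toy_distances, toy_indices = l2_search(toy_vectors, [2.0, 2.0], k=2)
print(toy_distances, toy_indices)
```

The returned indices play the role of `I` in the cell above: they select which chunks go into the prompt.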

In [27]:
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

In [28]:
def run_mistral(user_message, model="mistral-medium-latest"):
    messages = [
        {"role": "user", "content": user_message}
    ]
    chat_response = client.chat.complete(
        model=model,
        messages=messages
    )
    return (chat_response.choices[0].message.content)

run_mistral(prompt)

'The context information provided does not mention or discuss anything related to democracy or saving it. Therefore, it is not possible to provide an accurate answer to this query based solely on the given context.'
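The prompt above interpolates `retrieved_chunk`, a Python list, directly into the f-string, so the model sees the list's `repr` with brackets, quotes, and escaped newlines. A minimal sketch that joins the chunks into plain text first; `build_prompt` is a hypothetical helper and its template simply mirrors the one above:

```python
def build_prompt(retrieved_chunks, question):
    """Assemble a RAG prompt from retrieved chunks and a question."""
    # Separate chunks with a blank line instead of relying on the list repr
    context = "\n\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:"
    )


demo_prompt = build_prompt(["First chunk.", "Second chunk."], "What is in the chunks?")
print(demo_prompt)
```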

### Impromptu

In [29]:
import requests
import os

url = "https://www.impromptubook.com/wp-content/uploads/2023/03/impromptu-rh.pdf"
folder = "book"
filename = "impromptu-rh.pdf"

# Create the folder if it doesn't exist
os.makedirs(folder, exist_ok=True)

# Download the file
response = requests.get(url)
response.raise_for_status()

# Save the file
with open(os.path.join(folder, filename), "wb") as file:
    file.write(response.content)

print("PDF file downloaded and saved successfully.")

PDF file downloaded and saved successfully.


In [30]:
# extract the text of each PDF page
import pypdf

def pdf_to_pages(file):
    """Extract the text of each page of a PDF file."""
    pdf = pypdf.PdfReader(file)
    return [page.extract_text() for page in pdf.pages]

file = "book/impromptu-rh.pdf"
pages = pdf_to_pages(file)

In [31]:
print(pages[31])

25
EDUCATION
If Hollywood central casting ever wants to portray a 
beloved instructor from an idealized past, they could do worse 
than University of Texas at Austin Professor Steven Mintz. Over 
four decades of teaching, Professor Mintz has published books 
and articles on topics as diverse as the psychology of prominent 
Anglo-American literary families and political good vs. evil.
In collared shirts, with graying hair, Mintz can’t suppress his 
smile as he teaches. Students adore him: among hundreds who 
have anonymously rated Mintz online, his average rating is a 
perfect five out of five, with posts such as “easily the best orator 
I’ve ever witnessed,” “his lectures feel more like storytelling 
than class,” and “passionate about what he teaches.”
Professor Mintz, frankly, excelled as a professor long before the 
development of LLMs. So you might have expected him to have 
reacted with indifference or hostility to the late 2022 public 
release of GPT-4’s cousin, ChatGPT.
Instead, 

In [32]:
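os.makedirs("book/chap1", exist_ok=True)
# pages[31:53] covers PDF pages 32-53: chapter 1 on Education
# enumerate keeps the numbering correct even if two pages have identical text
for page_number, page_text in enumerate(pages[31:53], start=32):
    with open(f"book/chap1/impromptu-rh_ch1_p{page_number}.txt", mode='w', encoding='utf-8') as f:
        f.write(page_text)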

In [33]:
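# Join the chapter's pages into one string so that slicing yields character chunks
# (slicing the list of pages directly produces a single "chunk" holding all 22 pages)
text = "\n".join(pages[31:53])
chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]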
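When an answer should cite its source page, each chunk can carry the page number it was cut from. A minimal sketch of that bookkeeping; `chunk_pages` and `first_page` are hypothetical names introduced for illustration:

```python
def chunk_pages(pages, first_page=1, chunk_size=2048):
    """Split each page into chunks, keeping the page number with each chunk."""
    records = []
    for page_number, page_text in enumerate(pages, start=first_page):
        for i in range(0, len(page_text), chunk_size):
            records.append({"page": page_number, "text": page_text[i:i + chunk_size]})
    return records


demo_records = chunk_pages(["aaaaa", "bb"], first_page=32, chunk_size=3)
print(demo_records)
```

Embedding `record["text"]` and keeping `record["page"]` alongside it would let the retrieval step report which page each retrieved chunk came from.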

In [35]:
def get_text_embedding(inputs):
    """Return the embedding vector for a single string."""
    embeddings_batch_response = client.embeddings.create(
        model="mistral-embed",
        inputs=inputs
    )
    return embeddings_batch_response.data[0].embedding

# One API call per chunk; the result is a (num_chunks, dim) array
text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])

In [36]:
import faiss
d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

In [37]:
question = "How can AI be used for Education?"
question_embeddings = np.array([get_text_embedding(question)])
D, I = index.search(question_embeddings, k=2) # distance, index
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""
def run_mistral(user_message, model="mistral-medium-latest"):
    messages = [
        {"role": "user", "content": user_message}
    ]
    chat_response = client.chat.complete(
        model=model,
        messages=messages
    )
    return (chat_response.choices[0].message.content)

run_mistral(prompt)

"AI can be used in various ways for education. For instance, teachers can use AI to create personalized learning paths for students, analyze their prior knowledge, skills, interests, and goals, and generate adaptive and engaging content, activities, and assessments that match their needs and preferences. Large language models can also help design and facilitate collaborative learning experiences for large classes of students, provide feedback and guidance, and help facilitate interactive discussions or debates among students on various topics or issues. Additionally, AI can be used to provide students with immediate personalized feedback on their work, freeing up teachers to focus on other aspects of instruction. AI can also help teachers create customized quizzes or tests for each student based on their learning goals, progress, and preferences, and provide immediate feedback for each answer. Furthermore, AI can be used to develop customized lesson plans and activity guides based on t