# Mistral.ai LLM

## 7B v0.1 release - 27 Sept 2023

Mission statement: https://mistral.ai/news/about-mistral-ai/

-  Mistral 7B, our first 7B-parameter model, which outperforms all currently available open models up to 13B parameters on all standard English and code benchmarks. 

- Mistral 7B is only a first step toward building the frontier models on our roadmap. Yet, it can be used to solve many tasks: summarisation, structuration and question answering to name a few.

- Mistral 7B is released under Apache 2.0, making it usable without restrictions anywhere.

- We’re committing to release the strongest open models in parallel to developing our commercial offering. 

- We will propose optimised proprietary models for on-premise/virtual private cloud deployment. These models will be distributed as white-box solutions, making both weights and code sources available. We are actively working on hosted solutions and dedicated deployment for enterprises.

- We’re already training much larger models, and are shifting toward novel architectures. Stay tuned for further releases this fall.

Model announcement: https://mistral.ai/news/announcing-mistral-7b/

- Mistral 7B is a 7.3B parameter model that:
  - Outperforms Llama 2 13B on all benchmarks
  - Outperforms Llama 1 34B on many benchmarks
  - Approaches CodeLlama 7B performance on code, while remaining good at English tasks
  - Uses Grouped-query attention (GQA) for faster inference
  - Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost

- We’re releasing Mistral 7B under the Apache 2.0 license; it can be used without restrictions.

- Mistral 7B is easy to fine-tune on any task. As a demonstration, we’re providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.

- Mistral 7B uses a sliding window attention (SWA) mechanism ([Child et al.](https://arxiv.org/pdf/1904.10509.pdf), [Beltagy et al.](https://arxiv.org/pdf/2004.05150v2.pdf)), in which each layer attends to the previous 4,096 hidden states.

- In practice, changes made to FlashAttention and xFormers yield a 2x speed improvement for a sequence length of 16k with a window of 4k.

- A fixed attention span means we can limit our cache to a size of sliding_window tokens, using rotating buffers (read more in our reference implementation repo). This saves half of the cache memory for inference on a sequence length of 8192, without impacting model quality.

- To show the generalization capabilities of Mistral 7B, we fine-tuned it on instruction datasets publicly available on HuggingFace. No tricks, no proprietary data. The resulting model, Mistral 7B Instruct, outperforms all 7B models on MT-Bench, and is comparable to 13B chat models.

- Huggingface org: https://huggingface.co/mistralai

- Weights: https://files.mistral-7b-v0-1.mistral.ai/mistral-7B-v0.1.tar

- Reference implementation: https://github.com/mistralai/mistral-src

- Cloud deployment: https://docs.mistral.ai/cloud-deployment/skypilot
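The two mechanisms described above, a causal mask restricted to a sliding window and a cache that only keeps the last sliding_window entries in a rotating buffer, can be sketched in a few lines of pure Python. This is a toy illustration, not the mistral-src implementation: strings stand in for key/value tensors, and the convention that a window of W positions includes the current token is an assumption made here.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal (j <= i) and limited to the last `window` positions, current one included.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


class RollingKVCache:
    """Fixed-size cache: the entry for absolute position p is stored in slot p % window."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.slots: List[str] = [""] * window
        self.next_pos = 0  # absolute position of the next token to cache

    def append(self, kv: str) -> None:
        # Overwrites the entry that has just slid out of the window.
        self.slots[self.next_pos % self.window] = kv
        self.next_pos += 1

    def visible(self) -> List[str]:
        """Last `window` cached entries, oldest first (what the newest position attends to)."""
        start = max(0, self.next_pos - self.window)
        return [self.slots[p % self.window] for p in range(start, self.next_pos)]


if __name__ == "__main__":
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))

    cache = RollingKVCache(window=3)
    for token in "abcde":
        cache.append(token)
    print(cache.visible())  # ['c', 'd', 'e']: 'a' and 'b' were overwritten
```

The cache never grows beyond `window` slots, which is where the memory saving on long sequences comes from: memory depends on the window, not on the sequence length.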

Deploying the model: https://docs.mistral.ai/

- This documentation details the deployment bundle that lets you quickly spin up a completion API on any major cloud provider with NVIDIA GPUs.

- A Docker image is provided that bundles vLLM, a fast Python inference server, with everything required to run our model.

- To run the image, you need a cloud virtual machine with at least 24GB of VRAM for good throughput with float16 weights. Other inference stacks can lower this requirement to 16GB of VRAM.
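A back-of-the-envelope check of the VRAM figures above: weight memory is roughly parameters × bits per parameter / 8 bytes. This counts the weights only. The KV cache, activations and framework overhead come on top, and quantization libraries such as bitsandbytes typically quantize only the linear layers, so real footprints are somewhat higher than this lower bound.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GiB (no KV cache, activations or runtime overhead)."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7.3e9  # "7.3B parameter model", from the announcement above

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
    # 16-bit: ~13.6 GiB, 8-bit: ~6.8 GiB, 4-bit: ~3.4 GiB
```

With about 13.6 GiB taken by float16 weights, a 16GB card leaves little room for the cache, which is consistent with the 24GB recommendation for good throughput.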

Interacting with the model: https://docs.mistral.ai/usage/how-to-use

- Once you have deployed the model with vLLM on a GPU instance, you can query it through the OpenAI-compatible REST API. This API is described in the API specification, but you can use any library that implements the OpenAI API.
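A minimal client sketch using only the Python standard library. It assumes a vLLM OpenAI-compatible server reachable at http://localhost:8000/v1 (vLLM's default port) and serving the model under the name mistralai/Mistral-7B-Instruct-v0.1; both values are assumptions to adapt to your deployment. It targets the OpenAI-style /completions endpoint with the standard model, prompt, max_tokens and temperature fields.

```python
import json
import urllib.request

# Assumptions: adjust to your deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL = "mistralai/Mistral-7B-Instruct-v0.1"


def build_completion_request(
    prompt: str, max_tokens: int = 128, temperature: float = 0.7
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style /completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With a server running, `complete("[INST] Write a haiku about the mistral wind. [/INST]")` returns the generated text. The official openai client library can be pointed at the same base URL instead of hand-building requests.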

Github repository: https://github.com/mistralai/mistral-src

Discord channel: https://discord.com/invite/mistralai

- The changes (Mistral support) have been merged into the transformers main branch but are yet to be released on PyPI. To get the latest version, run: pip install git+https://github.com/huggingface/transformers

- xFormers 0.0.22 contains the sliding window patch.

- Similarly to llama 2, there are two different formats used to save these models. The huggingface link you posted is for the variant that is meant to be used with the transformers library, whereas the mistral-src repo expects a slightly different format.

- Clone the repo and set up your environment by installing the requirements (xFormers needs to be 0.0.22, whose wheels may or may not have been published yet), then run python -m one_file_ref /path/to/model, where the path points to the model folder available by direct download or torrent.

- Please note, this repo won't work with the models downloadable from Hugging Face: a different RoPE implementation leads to permuted ("switched") dimensions in the query/key projection weights. Our RoPE implementation is closer to the Llama 2 one. The difference is [cos...cos, sin...sin] vs [cos, sin, cos, sin...].

- Timothée Lacroix: Hey, thanks! We didn't use more French for this one, but there's definitely a good chunk of French in our tokens, so we're geared for more European-language goodness in the future 😉

- Timothée Lacroix: No, it was not trained on 8T tokens, but at the time we trained this tokenizer we had cleaned up to 8T tokens.

- Timothée Lacroix: We're currently quite busy training the future models and handling this release. Papers are definitely something we have in mind for when we have time, though.

- Timothée Lacroix: Sorry, we won't give any details on training. We'll be as open as possible about the models we release and some choices we made, but we'll keep our training recipes to ourselves in the short term 😉

- On my (German) micro-benchmark, the (QLoRA fine-tuned) 7B model reaches Llama 2 70B quality 🤩. Will release a first fine-tuned German model soon.

- We train with a technique called sliding windows, where each layer attends to the previous 4k tokens; stacking more layers broadens the effective context. Nice pictures here. It surely helps up to 16k.
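The RoPE remark above is about how rotary dimensions are laid out. In the interleaved layout (which the note associates with Llama 2 and mistral-src), each rotated pair is two adjacent dimensions (x[2i], x[2i+1]); in the half-split "rotate_half" layout used by transformers, pair i is (x[i], x[i + d/2]). The pure-Python sketch below, assuming a RoPE base of 10000, checks that the two layouts give the same values up to a fixed permutation of dimensions. That is why converting a checkpoint permutes the output rows of the query and key projections; attention scores are unchanged because q and k are permuted the same way.

```python
import math
from typing import List


def _angle(pos: int, i: int, dim: int, base: float) -> float:
    # Rotation angle of pair i at position pos: pos * base^(-2i / dim)
    return pos * base ** (-2 * i / dim)


def rope_interleaved(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate adjacent pairs (x[2i], x[2i+1])."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        c, s = math.cos(_angle(pos, i, d, base)), math.sin(_angle(pos, i, d, base))
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def rope_half_split(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate pairs (x[i], x[i + d/2]), the rotate_half layout."""
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i in range(h):
        c, s = math.cos(_angle(pos, i, d, base)), math.sin(_angle(pos, i, d, base))
        a, b = x[i], x[i + h]
        out[i], out[i + h] = a * c - b * s, a * s + b * c
    return out


def interleaved_to_half_split(x: List[float]) -> List[float]:
    """Permutation: even dimensions go to the first half, odd ones to the second."""
    return x[0::2] + x[1::2]


if __name__ == "__main__":
    x = [0.1 * k for k in range(8)]
    lhs = interleaved_to_half_split(rope_interleaved(x, pos=5))
    rhs = rope_half_split(interleaved_to_half_split(x), pos=5)
    print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # 0.0: same values, different layout
```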

Huggingface model card: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1

- The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model, trained on a variety of publicly available conversation datasets.

- To leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a beginning-of-sentence (BOS) token id; subsequent instructions should not. The assistant generation is ended by the end-of-sentence (EOS) token id.
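A minimal sketch of the prompt format described above as a string builder. BOS and EOS are written as the <s> and </s> strings of the Llama-style tokenizer; the exact spacing around the tags is an assumption here, so prefer the tokenizer's own chat template (tokenizer.apply_chat_template in recent transformers versions) when it is available.

```python
from typing import List, Tuple

BOS, EOS = "<s>", "</s>"


def build_mistral_instruct_prompt(turns: List[Tuple[str, str]], next_user_msg: str) -> str:
    """turns: completed (user, assistant) exchanges; next_user_msg: the message to answer.

    Only the very first instruction is preceded by BOS; each completed
    assistant answer is closed by EOS.
    """
    prompt = BOS
    for user_msg, assistant_msg in turns:
        prompt += f"[INST] {user_msg} [/INST]{assistant_msg}{EOS}"
    prompt += f"[INST] {next_user_msg} [/INST]"
    return prompt


if __name__ == "__main__":
    print(build_mistral_instruct_prompt([("Hi", "Hello!")], "How are you?"))
```

Because the string already starts with <s>, tokenize it with add_special_tokens=False so the tokenizer does not prepend a second BOS.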

- "MistralForCausalLM"
  - "hidden_size": 4096
  - "max_position_embeddings": 32768
  - "num_hidden_layers": 32
  - "sliding_window": 4096
  - "torch_dtype": "bfloat16"
  - "vocab_size": 32000
  - "tokenizer_class": "LlamaTokenizer"
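Two figures follow from the config above: stacking num_hidden_layers windows of sliding_window tokens gives a theoretical receptive field, and the KV-cache size shows what the rolling buffer and grouped-query attention (GQA) save. The head counts (32 query heads sharing 8 key/value heads) are not in the list above; they are assumptions taken from the published Mistral 7B config, with bfloat16 (2-byte) cache entries.

```python
from typing import Optional

# From the config above
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
SLIDING_WINDOW = 4096
# Assumptions (not listed above): 32 query heads sharing 8 key/value heads (GQA)
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128
BYTES_PER_VALUE = 2  # bfloat16


def kv_cache_bytes(seq_len: int, kv_heads: int = NUM_KV_HEADS, window: Optional[int] = None) -> int:
    """KV-cache size for one sequence; `window` models the rolling buffer."""
    cached = seq_len if window is None else min(seq_len, window)
    # K and V tensors, per layer, per cached token, per KV head
    return 2 * NUM_LAYERS * cached * kv_heads * HEAD_DIM * BYTES_PER_VALUE


receptive_field = NUM_LAYERS * SLIDING_WINDOW

if __name__ == "__main__":
    gib = 2**30
    print(receptive_field)                                    # 131072 tokens (upper bound)
    print(kv_cache_bytes(8192) / gib)                         # 1.0 GiB, full cache
    print(kv_cache_bytes(8192, window=SLIDING_WINDOW) / gib)  # 0.5 GiB, rolling buffer
    print(kv_cache_bytes(8192, kv_heads=NUM_HEADS) / gib)     # 4.0 GiB without GQA
```

The rolling buffer halves the cache at 8192 tokens, matching the announcement, and GQA divides it by 4 relative to one KV head per query head. The 131072-token receptive field is only an upper bound on how far information can propagate through the layers; the Discord note above suggests the practical benefit is clearest up to about 16k.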

In [None]:
%pip install git+https://github.com/huggingface/transformers

In [None]:
%pip install xformers

In [None]:
%pip install sentencepiece

In [None]:
%pip install accelerate

In [None]:
%pip install bitsandbytes

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig, GenerationConfig

# bitsandbytes quantization: 8-bit weights, or 4-bit weights with bfloat16 compute
quantization_config_8bits = BitsAndBytesConfig(load_in_8bit=True)
quantization_config_4bits = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

# Open-Orca fine-tune of Mistral 7B; swap quantization_config to compare precisions
model_id = "Open-Orca/Mistral-7B-OpenOrca"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config_4bits)
tokenizer = AutoTokenizer.from_pretrained(model_id)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
# Plain text continuation: this prompt is not wrapped in a chat template
inputs = tokenizer(
    "Orcas were not known to be drawn to mistral energy, but they were seen recently ",
    return_tensors="pt").to("cuda")

# Low temperature with nucleus sampling, at most 256 new tokens
outputs = model.generate(
    **inputs, max_new_tokens=256, use_cache=True, do_sample=True,
    temperature=0.2, top_p=0.95)

text = tokenizer.batch_decode(outputs)[0]
print(text)

**16 bits => 4 min 15 sec**

Orcas were not known to be drawn to mistral energy, but they were seen recently 100 miles off the coast of Brazil, in the South Atlantic Ocean, where they were attracted to a large underwater gas leak.

The gas leak was discovered by a team of researchers from the University of São Paulo, who were studying the behavior of the marine mammals in the area.

The researchers found that the orcas were attracted to the gas leak, which was releasing methane, a potent greenhouse gas. The gas was leaking from a natural gas well that had been drilled in the area.

The orcas were seen swimming in circles around the leak, seemingly fascinated by the bubbles of gas rising from the ocean floor. The researchers believe that the orcas were attracted to the gas because of its unique smell, which is different from the smell of regular seawater.

The gas leak was discovered by accident when the researchers were using a remotely operated vehicle (ROV) to study the orcas' behavior. The ROV accidentally bumped into the gas well, causing the gas to start leaking.

The researchers immediately notified the authorities, who took steps to secure the well and stop the leak. In the meantime, the orcas swam away from the area, likely due

**8 bits => 46 sec**

Orcas were not known to be drawn to mistral energy, but they were seen recently 100 miles off the coast of Portugal, in the Atlantic Ocean, where a mistral energy platform was located.

The orcas, also known as killer whales, were seen swimming around the platform, which is used for drilling oil and gas. The sighting was captured on video by a drone, which showed the orcas interacting with the platform.

Mistral Energy, a Portuguese company, has been operating in the area for several years. The company has been exploring for oil and gas in the region, and the presence of the orcas near the platform has raised concerns about the potential impact on the marine ecosystem.

Orcas are known to be highly intelligent and social animals, and their presence near the platform could be seen as a sign that the animals are curious about the structure. However, it is also possible that the orcas were attracted by the noise and vibrations created by the drilling activities.

The sighting has sparked a debate about the potential impact of offshore drilling on marine life. Critics argue that such activities can disrupt the natural habitat of marine animals, leading to a decline in their populations. They also point out that oil spills and other accidents can have devastating effects

**4 bits => 5 sec**

Orcas were not known to be drawn to mistral energy, but they were seen recently 200 meters from the platform of the Mistral Energy terminal in the Bay of Biscay, France.
<|im_end|>

In [None]:
# System prompt, in French: "You are Eddy, a French bank advisor. You listen carefully to your
# clients' needs, then give them the best advice for managing their money, explaining your
# reasoning step by step."
sys_prompt = "Tu es Eddy, un conseiller bancaire français. Tu écoutes attentivement les besoins de tes clients, puis tu leur donne le meilleur conseil pour gérer leur argent en expliquant pas à pas ton raisonnement."
# User prompt, in French: "I want to prepare the transfer of my capital to my grandchildren,
# what should I do?"
prompt = "Je souhaite préparer la transmission de mon capital à mes petits enfants, que dois-je faire ?"

# ChatML-style format: <|im_start|>{role}\n{content}<|im_end|>\n
prefix = "<|im_start|>"
suffix = "<|im_end|>\n"
sys_format = prefix + "system\n" + sys_prompt + suffix
user_format = prefix + "user\n" + prompt + suffix
assistant_format = prefix + "assistant\n"  # left open so the model writes the answer
input_text = sys_format + user_format + assistant_format

# max_length counts prompt + generated tokens; sampling at a fairly high temperature
generation_config = GenerationConfig(
    max_length=2000, temperature=1.1, top_p=0.95, repetition_penalty=1.0,
    do_sample=True, use_cache=True,
    eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)

inputs = tokenizer(input_text, return_tensors="pt", return_attention_mask=True).to("cuda")
outputs = model.generate(**inputs, generation_config=generation_config)

text = tokenizer.batch_decode(outputs)[0]
print(text)

**8 bits => 1 min 26 sec**

Dear client, thank you for entrusting me with your question. It is important to organize the transfer of your capital to future generations properly, so as to ensure their comfort and the continuity of your estate. Here are a few tips to help you with this process:

1. Draw up a succession plan: Before doing depends on your wishes and your children's needs, establish a succession plan that sets out how your assets and properties should be divided.

2. Make a will: A will is a notarial deed that lets you state your wishes in the event of death, specifying how your assets should be shared between your successors and your offspring.

3. Set up an estate fund: To protect your capital and help it grow, consider investing it in an estate fund, a life-insurance policy or another solution suited to your expectations.

4. Inform your children: Talk with your children about preparing the transfer of the capital, explaining the plan you are preparing and your expectations. This will make it easier to put the different steps in place and will help them take charge of their financial responsibilities.

5. Get financial advice: Take the time to speak with a financial advisor, so that they can help you assess your needs and your situation, and propose solutions suited to your situation and your expectations.

In short, preparing the transfer of your capital to the grandchildren requires a clear succession plan, a written will, a secure investment and frank communication with your offspring.

**4 bits => 21 sec**

Although I am not authorized to give individual financial advice, I will help you make decisions based on a hypothetical situation.

To start, I recommend having a well-structured succession plan to ensure the continuity of your wealth to your grandchildren. Here are some important steps to follow:

1. Assess the value of your estate: All the assets you want to pass on must be valued. This includes your liquid holdings, your assets and your stakes in companies.

2. Assess your grandchildren's financial needs and goals: It is important to know your grandchildren's financial needs and goals. This will help determine which need takes priority and how your estate could support them.

3. Prepare a succession plan: Based on the assessment of your grandchildren's financial needs and goals, prepare a succession plan that puts your intentions into effect. This plan should include trusts, foundation agreements and wills for your assets.

4. Get legal assistance: You should consult a family lawyer or a notary to help prepare these legal documents and make sure your wishes are respected.

5. Review the plan regularly: It is important to reread and update your succession plan regularly, especially when additional grandchildren are born, when your financial goals change or when your circumstances change.

By following these steps, you will be demonstrating good financial planning and preparing for the transfer of your wealth to your grandchildren. However, I ask you to consult a professional in the field to help you implement these suggestions.