# AI Makerspace Event - AFM (Arcee Foundation Model)

AFM is Arcee AI's brand-new foundation model, released in both base and instruct variants.

The most interesting detail about this model comes from its model card:

> *The development of AFM-4.5B prioritized data quality as a fundamental requirement for achieving robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary algorithms—model-based quality filtering, embedding-based curation, target distribution-matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance.*

In effect, Arcee did the time-consuming but valuable work of tailor-making a new pre-training dataset.

Let's load the model and see whether this has any impact on its performance across a few fun "vibe checks".

## Loading the Model

This *is* a gated model, which means we'll need to provide our Hugging Face token in order to use this model in Colab.

In [2]:
from huggingface_hub import notebook_login

notebook_login()


Loading it after that is as simple as it's always been with Hugging Face!

> NOTE: This whole notebook uses ~17GB of VRAM, so it works best on at least an L40 instance. Arcee also offers a number of [lower-precision quants](https://huggingface.co/models?other=base_model:quantized:arcee-ai/AFM-4.5B) that you can use instead!
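As a rough sanity check on why the quants help, weight memory is approximately parameter count × bits per parameter ÷ 8. This minimal sketch assumes exactly 4.5 billion parameters (taken from the model name) and counts the weights only, ignoring activations, the KV cache, and framework overhead, so real usage will be higher. Quantized formats also store some extra scale metadata that this estimate leaves out.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes).

    Assumption: weights only; activations, KV cache and CUDA/framework overhead are ignored.
    """
    return num_params * bits_per_param / 8 / 1e9


# Assumed parameter count, read off the model name "AFM-4.5B".
for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"AFM-4.5B weights @ {label}: ~{weight_memory_gb(4.5e9, bits):.2f} GB")
```

At bf16 this comes to roughly 9 GB for one copy of the weights, and a 4-bit quant brings the weights down to roughly a quarter of that.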

In [30]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/AFM-4.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)


## Vibe Checking the Model

Now, we're going to do a series of vibe checks on this model.

You can see the model's reported evaluation results on its model card:

![image](https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/BdsWFc4pxiHlK2E0j9AfG.png)

The model is generally a very capable small language model (SLM), but let's see what the vibes are like!

### Simon Willison "Pelican on a Bicycle"

A classic prompt from Simon Willison; let's see how good that pelican looks!

In [20]:
messages = [
    {"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

In [22]:
outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128003 for open-end generation.


system
The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.
user
Generate an SVG of a pelican riding a bicycle
assistant
Certainly! Here's an SVG of a pelican riding a bicycle:

```xml
<svg w
```

![image](https://i.imgur.com/XxSqgjO.png)

While this is not a convincing pelican, and the bicycle is missing, the model *does* understand that the body of the pelican is violet and the face is yellow.

> NOTE: As is tradition, this is the first result for the classic prompt. With more effort, you can surely get a better pelican riding a bike.
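To actually look at the pelican, you need to pull the SVG markup out of the chat response, and a long SVG may not fit within `max_new_tokens`. The helper below is a hypothetical sketch (`extract_svg` is our own name, not part of `transformers`): it looks for a fenced block or a bare `<svg>...</svg>` span and only returns markup that parses as XML, so a cut-off response comes back as `None` instead of a broken image.

```python
import re
import xml.etree.ElementTree as ET

# A fenced block with any info string; `{3} avoids writing a literal fence in this cell.
FENCED_BLOCK = re.compile(r"`{3}[^\n]*\n(.*?)`{3}", re.DOTALL)
BARE_SVG = re.compile(r"<svg\b.*?</svg>", re.DOTALL)


def extract_svg(response: str):
    """Return the first well-formed <svg> document found in a model response, or None."""
    candidates = FENCED_BLOCK.findall(response)
    bare = BARE_SVG.search(response)
    if bare:
        candidates.append(bare.group(0))
    for text in candidates:
        try:
            root = ET.fromstring(text.strip())
        except ET.ParseError:
            continue  # truncated or otherwise malformed markup
        # With an xmlns attribute the tag looks like '{http://www.w3.org/2000/svg}svg'.
        if root.tag == "svg" or root.tag.endswith("}svg"):
            return text.strip()
    return None
```

In the notebook you could call `extract_svg(response)` and, if it returns a string, render it with `IPython.display.SVG(data=...)` or write it to a `.svg` file.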

### Rs in Strawberry

It wouldn't be vibe-checking a model if we didn't include the *classic* reasoning question.

> NOTE: Arcee does not claim their model is a reasoning model, nor has it been specifically trained for reasoning-style tasks; this is just for fun!

In [25]:
messages = [
    {"role": "user", "content": "How many 'r's in strawberry?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

In [26]:
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128003 for open-end generation.


system
The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.
user
How many 'r's in strawberry?
assistant
Let's count the 'r's in "strawberry":

1. s-t-r-a-w
2. r-a-w
3. b-r-i
4. g-e

There are

The astute among you will notice that the system prompt has the following line in it:

> AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would.

And while the logic is flawed (last I checked, strawberry wasn't spelled "strawrawbrige"), it *does* land on the correct answer!

Impressive for such a small model!
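For reference, the ground truth is easy to check in plain Python. Running the same check on the model's garbled spelling (its four segments joined together) shows that "strawrawbrige" also happens to contain three 'r's, which may be why the flawed walkthrough still lands on the right number.

```python
def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 0-based indices at which `letter` appears in `word`."""
    return [i for i, ch in enumerate(word) if ch == letter]


actual = "strawberry"
model_spelling = "strawrawbrige"  # the model's segments s-t-r-a-w, r-a-w, b-r-i, g-e joined

print(actual.count("r"), letter_positions(actual, "r"))                  # 3 [2, 7, 8]
print(model_spelling.count("r"), letter_positions(model_spelling, "r"))  # 3 [2, 5, 9]
```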

### Legal Acumen

Arcee claims that the model does not get tripped up by

> NOTE: Arcee does not claim their model is a reasoning model, nor has it been specifically trained for reasoning-style tasks; this is just for fun!

In [33]:
messages = [
    {"role": "user", "content": "Question: Under the Fourth Amendment’s protection against “unreasonable searches,” does the attachment of a GPS tracker to a suspect’s car, without a warrant, constitute a “search” requiring judicial authorization?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

In [35]:
outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128003 for open-end generation.


system
The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.
user
Question: Under the Fourth Amendment’s protection against “unreasonable searches,” does the attachment of a GPS tracker to a s

IANAL (I am not a lawyer), but o3-pro *does* agree with this result.

### Creative Writing

Arcee also claims the model should be a competent creative writer, so let's try it!

In [36]:
messages = [
    {"role": "user", "content": "Create an awesome 2 sentence horror story."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

In [37]:
outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128003 for open-end generation.


system
The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.
user
Create an awesome 2 sentence horror story.
assistant
The old house creaked as shadows danced across the walls, but it was the 

It's two sentences, and it's definitely a kind of horror story!
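Checking the "two sentences" constraint by eye is easy for one sample, but if you wanted to score many generations automatically, a naive sentence counter is enough for a first pass. This is a rough heuristic of our own, not an Arcee tool: it splits wherever `.`, `!` or `?` is followed by whitespace, so abbreviations such as "Mr." or "e.g." will inflate the count.

```python
import re


def count_sentences(text: str) -> int:
    """Naively count sentences by splitting wherever ., ! or ? is followed by whitespace.

    Assumption: no abbreviations or decimals with a following space, which would be miscounted.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])  # drop the empty fragment left by an empty input


def is_two_sentence_story(text: str) -> bool:
    """True if the naive counter finds exactly two sentences."""
    return count_sentences(text) == 2


print(is_two_sentence_story("The house creaked. Then the whisper came from inside my closet!"))
```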

### Favourite Colour

Most LLMs, for whatever reason, tend to say that their favourite colour is some shade of blue-green.

Given the change in pre-training data, and Arcee's approach to mid-training and post-training, it would be interesting to see whether any of that has a notable impact on the model's favourite colour.

> NOTE: The same prompt was used across all models.

In [31]:
messages = [
    {"role": "user", "content": "If forced to choose a favourite colour, please provide a hex value."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

In [32]:
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128003 for open-end generation.


system
The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.
user
If forced to choose a favourite colour, please provide a hex value.
assistant
As an AI assistant, I don't have personal prefer

![image](https://i.imgur.com/EsGjJJ2.png)

This is either an artefact of a limited sample size (likely) or a genuine effect of the unique pre-training blend pulling AFM-4.5B out of the "blue-green" preference trough!
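To make "blue-green" a little less hand-wavy when comparing many samples, you could convert each returned hex value to an HSV hue angle. The 150 to 210 degree band used below is an arbitrary cut-off chosen for illustration, not an established definition, and both helper names are hypothetical.

```python
import colorsys


def hex_to_hue_degrees(hex_value: str) -> float:
    """Convert a '#RRGGBB' (or 'RRGGBB') colour to its HSV hue in degrees, in [0, 360)."""
    digits = hex_value.strip().lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_value!r}")
    # Parse each channel and scale it to [0, 1], which is what colorsys expects.
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360


def is_blue_green(hex_value: str, low: float = 150.0, high: float = 210.0) -> bool:
    """Assumed 'blue-green' band: hues between `low` and `high` degrees (arbitrary cut-offs)."""
    return low <= hex_to_hue_degrees(hex_value) <= high


print(f"{hex_to_hue_degrees('#008080'):.0f}")  # 180 (teal)
print(is_blue_green("#008080"))                # True
```

Tallying hues over many sampled answers, rather than a single generation, would help tell the two explanations above apart.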

## Base Model

In a move that *used* to be standard, but has largely fallen by the wayside now that we're in a very "post-training" mindset, Arcee has also released the *base model*!

As a reminder, after pre-training, models are simply next-token prediction machines; they typically go through extensive mid-training and post-training to add instruction-following capabilities and more.
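To make "next-token prediction machine" concrete: at each step the model produces one logit per vocabulary entry, and `generate(do_sample=True, ...)` turns those into a single sampled token. The pure-Python sketch below is a toy with a made-up three-token vocabulary. It applies temperature, then top-k, then top-p (nucleus) filtering, in roughly the spirit of the `temperature`, `top_k` and `top_p` arguments used in this notebook, but it is not the `transformers` implementation and edge-case behaviour may differ.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Toy sampler: pick one token index from a list of raw logits.

    Applies temperature scaling, then top-k, then top-p (nucleus) filtering.
    top_k=0 disables top-k; top_p=1.0 keeps every remaining token.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the max logit for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose renormalised probability mass reaches top_p.
    mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    # Draw one token from the survivors; choices() normalises the weights itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
print(sample_next_token(toy_logits, temperature=0.7, top_k=50, top_p=0.95, rng=random.Random(0)))
```

A base model simply repeats this step, appending each sampled token to its input, which is why it continues text rather than answering it.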

We'll remind ourselves what using a base model is like below.

The reason this is so interesting is that it speaks to the *intent* behind AFM: it's meant to be built on top of, effectively serving as a new small base model, à la the original Llama models.

In [27]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/AFM-4.5B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)


In [16]:
prompt = "The sky is "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

In [17]:
# Generate text
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95
)

generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


The sky is 10% blue during a typical day. It is blue when the sun is up, but not at night. During a typical night, 20% of the sky is blue due to moonlight. Given that the probability of rain on a given day is 10%, what is the probability that it will be a blue night given it rains?

To solve this, we need to find P(B and R) and P(R). The probability of a blue day is P(B) = 0.10, and the probability of rain is P(R) = 0.10. The probability of a blue night given rain, P(B|R), can be calculated using the formula P(B|R) = P(B and R) / P(R).

First, calculate P(B and R). During a typical day, the probability of the sky being blue is 0.10, and at night, it is 0.20. Since the probability of rain on a given day is 10%, and the probability of a blue day is 10%, we can infer that the probability of a blue night given rain is directly related to the probability of rain and the fraction of the day that is blue.

The correct approach involves understanding that the sky's color distribution between d

A classic base model response: a fluent continuation of the prompt that quickly drifts into an invented probability word problem.