# tt-NN Embedding Layer Example

This notebook shows how you can create an embedding layer out of `ttnn` tensors. 

The techniques in this notebook are adapted from [Sebastian Raschka](https://github.com/rasbt)'s [LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch) repository. Please check it out. He is a huge inspiration for the work in this repo.

## What are Embeddings?

Embeddings are tensors that encode information about the tokens in a context window. There are two types:
1. **Token embeddings** - Encode what a token means, regardless of where it appears -- "What kind of word is this?"
2. **Positional embeddings** - Encode where a token sits within the context window, relative to the other tokens -- "Where does this word sit in the sentence?"

Both of these contain trainable weights that will be adjusted during training.

We won't train them separately. Instead, we'll create and use **input embeddings**, which are the sum of the token and positional embedding tensors.
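
To make the "look up, then sum" idea concrete, here is a minimal pure-Python sketch. The toy sizes, the `token_weights` and `position_weights` tables, and the `input_embeddings` helper are all made up for illustration; they are not part of `torch` or `ttnn`.

```python
# Toy sizes chosen for illustration only (not the notebook's real sizes).
vocab_size, context_length, output_dim = 6, 3, 2

# Hypothetical weight tables: one row per token ID and one row per position.
token_weights = [[float(10 * t + d) for d in range(output_dim)] for t in range(vocab_size)]
position_weights = [[float(p + 1)] * output_dim for p in range(context_length)]

def input_embeddings(token_ids):
    """Look up each token's row and each position's row, then add them element-wise."""
    rows = []
    for position, token_id in enumerate(token_ids):
        tok = token_weights[token_id]
        pos = position_weights[position]
        rows.append([t + p for t, p in zip(tok, pos)])
    return rows

# Token 4 appears twice, but at different positions, so its two vectors differ.
embedded = input_embeddings([4, 0, 4])
print(embedded)
```

The same token ID yields a different input embedding at each position, which is the information the positional table contributes.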

## Dependencies

Let's assume we are building a GPT-2 model. We will need to specify a `vocab_size` (50257 is the size of the GPT-2 tokenizer's vocabulary) and an `output_dim` (the length of each embedding vector).

In [1]:
vocab_size = 50257
output_dim = 256

Next, let's import some dependencies:

In [2]:
import torch
import ttnn
from scripts.prepare_data import create_dataloader_v1

2025-04-19 16:35:21.539 | DEBUG    | ttnn:<module>:83 - Initial ttnn.CONFIG:
Config{cache_path=/home/avgdev/.cache/ttnn,model_cache_path=/home/avgdev/.cache/ttnn/models,tmp_dir=/tmp/ttnn,enable_model_cache=false,enable_fast_runtime_mode=true,throw_exception_on_fallback=false,enable_logging=false,enable_graph_report=false,enable_detailed_buffer_report=false,enable_detailed_tensor_report=false,enable_comparison_mode=false,comparison_mode_should_raise_exception=false,comparison_mode_pcc=0.9999,root_report_path=generated/ttnn/reports,report_name=std::nullopt,std::nullopt}


## Data Preparation

Let's build a simple dataset by first acquiring some text. We will use a short story called `the-verdict.txt`. You can find it in the `data` folder.

In [3]:
with open("data/the-verdict.txt", "r", encoding="utf-8") as f:
    raw_text = f.read()

print(raw_text[:50])

I HAD always thought Jack Gisburn rather a cheap g


Next, let's create a dataloader so that we can obtain some batches. We'll assume a:

* context length of 4
* batch size of 8.

In [4]:
context_length = 4
batch_size = 8

Note that we only fetch a single input and target batch for this example.

In [5]:
dataloader = create_dataloader_v1(
    raw_text, batch_size=batch_size, max_length=context_length,
    stride=context_length, shuffle=False
)
data_iter = iter(dataloader)

inputs, targets = next(data_iter)

inputs, targets

(tensor([[   40,   367,  2885,  1464],
         [ 1807,  3619,   402,   271],
         [10899,  2138,   257,  7026],
         [15632,   438,  2016,   257],
         [  922,  5891,  1576,   438],
         [  568,   340,   373,   645],
         [ 1049,  5975,   284,   502],
         [  284,  3285,   326,    11]]),
 tensor([[  367,  2885,  1464,  1807],
         [ 3619,   402,   271, 10899],
         [ 2138,   257,  7026, 15632],
         [  438,  2016,   257,   922],
         [ 5891,  1576,   438,   568],
         [  340,   373,   645,  1049],
         [ 5975,   284,   502,   284],
         [ 3285,   326,    11,   287]]))
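
The targets above are the inputs shifted by one token. The internals of `create_dataloader_v1` are not shown in this notebook, so the following is only a sketch of the usual sliding-window approach; `sliding_windows` is a hypothetical helper, and the token IDs are the first nine IDs taken from the output above.

```python
def sliding_windows(token_ids, max_length, stride):
    """Split a token stream into (input, target) windows; each target is shifted by one."""
    inputs, targets = [], []
    for start in range(0, len(token_ids) - max_length, stride):
        inputs.append(token_ids[start:start + max_length])
        targets.append(token_ids[start + 1:start + max_length + 1])
    return inputs, targets

# First nine token IDs from the batch printed above.
token_ids = [40, 367, 2885, 1464, 1807, 3619, 402, 271, 10899]
inputs_demo, targets_demo = sliding_windows(token_ids, max_length=4, stride=4)

for x, y in zip(inputs_demo, targets_demo):
    print(x, "->", y)
```

With `stride` equal to `max_length`, consecutive input windows do not overlap, which matches the first two rows of the batch shown above.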

## Torch Example

First, in `torch`, we typically create input embeddings by creating a token embedding layer and a positional embedding layer, then adding their outputs together.

The token embedding layer receives the input batch, and the positional embedding layer receives the position indices 0, 1, ..., `context_length - 1`.
Once we create the embedding layers with `torch.nn.Embedding`, we just pass in those inputs to get the embeddings.

With `torch`, it is pretty simple.

In [6]:
token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
token_embeddings = token_embedding_layer(inputs)

positional_embedding_layer = torch.nn.Embedding(context_length, output_dim)
positional_embeddings = positional_embedding_layer(torch.arange(context_length))

input_embeddings = token_embeddings + positional_embeddings

print(input_embeddings[0:2])
print(input_embeddings.shape)

tensor([[[ 0.7318, -0.1439,  0.5758,  ..., -0.5013,  0.4811,  1.0332],
         [-0.3948, -0.2520,  0.0813,  ..., -2.9140, -0.7542, -2.5006],
         [-1.6651,  1.4861,  2.0051,  ..., -1.8440, -2.2303,  1.8921],
         [ 0.4186, -0.7514,  0.0198,  ..., -0.0732, -0.5632, -1.1741]],

        [[ 1.5556,  0.5127, -0.7803,  ..., -0.8864, -1.2456,  0.7119],
         [ 1.0383, -2.5513,  0.6281,  ..., -0.2517, -0.9386, -1.7914],
         [-0.5865, -0.8332, -0.6496,  ..., -1.4694, -1.5772,  2.8228],
         [-0.9811, -0.3645, -0.1702,  ..., -1.7683, -1.6742, -2.0518]]],
       grad_fn=<SliceBackward0>)
torch.Size([8, 4, 256])
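
The addition in the cell above works because `torch` broadcasts the (4, 256) positional tensor across the batch dimension of the (8, 4, 256) token tensor. As a rough sketch of the shape rule only, the hypothetical helper `broadcast_shape` below applies the standard trailing-dimension alignment in plain Python.

```python
def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    result = []
    # Align from the trailing dimension; missing leading dimensions count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and 1 not in (da, db):
            raise ValueError(f"cannot broadcast {a} with {b}")
        result.append(max(da, db))
    return tuple(reversed(result))

token_shape = (8, 4, 256)       # (batch_size, context_length, output_dim)
positional_shape = (4, 256)     # (context_length, output_dim)
print(broadcast_shape(token_shape, positional_shape))
```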


## tt-NN Example

Unfortunately life isn't as easy with `ttnn`, but we can get to the same place. 

Let's create the token embeddings and positional embeddings one by one, then combine them to create the input embeddings.

### Device Initialization

Various operations require the tensors to be on the device. So let's initialize it.

In [7]:
device_id = 0 
device = ttnn.open_device(device_id=device_id)

                 Device | INFO     | Opening user mode device driver
2025-04-19 16:35:27.562 | INFO     | SiliconDriver   - Opened PCI device 0; KMD version: 1.33.0, IOMMU: disabled

2025-04-19 16:35:27.574 | INFO     | SiliconDriver   - Opened PCI device 0; KMD version: 1.33.0, IOMMU: disabled
2025-04-19 16:35:27.575 | INFO     | SiliconDriver   - Harvesting mask for chip 0 is 0x200 (physical layout: 0x1, logical: 0x200, simulated harvesting mask: 0x0).
2025-04-19 16:35:27.576 | INFO     | SiliconDriver   - Opened PCI device 0; KMD version: 1.33.0, IOMMU: disabled
2025-04-19 16:35:27.577 | INFO     | SiliconDriver   - Detected PCI devices: [0]
2025-04-19 16:35:27.577 | INFO     | SiliconDriver   - Using local chip ids: 

New chip! We now have 1 chips
Chip initialization complete (found )
Chip initializing complete...
 ARC

 [4/4] DRAM

 [16/16] ETH

 CPU

Chip detection complete (found )


### Creating Token Embeddings

We can start by creating the token embeddings. Let's turn the `inputs` and `targets` batches into `ttnn` tensors. We'll also need to send them to the device, because `ttnn.embedding` operates on tensors that live in device storage.

In [8]:
inputs_ttnn = ttnn.from_torch(inputs, dtype=ttnn.uint32)
targets_ttnn = ttnn.from_torch(targets, dtype=ttnn.uint32)

inputs_ttnn = ttnn.to_device(inputs_ttnn, device)
targets_ttnn = ttnn.to_device(targets_ttnn, device)

inputs_ttnn, targets_ttnn

(ttnn.Tensor([[   40,   367,  ...,  2885,  1464],
              [ 1807,  3619,  ...,   402,   271],
              ...,
              [ 1049,  5975,  ...,   284,   502],
              [  284,  3285,  ...,   326,    11]], shape=Shape([8, 4]), dtype=DataType::UINT32, layout=Layout::ROW_MAJOR),
 ttnn.Tensor([[  367,  2885,  ...,  1464,  1807],
              [ 3619,   402,  ...,   271, 10899],
              ...,
              [ 5975,   284,  ...,   502,   284],
              [ 3285,   326,  ...,    11,   287]], shape=Shape([8, 4]), dtype=DataType::UINT32, layout=Layout::ROW_MAJOR))

Creating an embedding tensor is more involved. We will need to **initialize a weight tensor** whose dimensions are the vocabulary size and the output dimension.

It will just consist of random values.

The dimensions end up being (50257, 256).
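
For a sense of scale, the arithmetic below estimates how large a (50257, 256) weight table is when stored as `bfloat16`. It assumes 2 bytes per `bfloat16` value and ignores any padding or alignment the device may add, so treat it as a back-of-the-envelope figure.

```python
vocab_size = 50257
output_dim = 256
BYTES_PER_BFLOAT16 = 2  # bfloat16 is a 16-bit format

num_weights = vocab_size * output_dim
num_bytes = num_weights * BYTES_PER_BFLOAT16

print(f"{num_weights:,} weights")
print(f"{num_bytes:,} bytes (about {num_bytes / 2**20:.2f} MiB)")
```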

In [9]:
token_embedding_weights_ttnn = ttnn.from_torch(
    torch.randn(vocab_size, output_dim),
    dtype=ttnn.bfloat16
)
token_embedding_weights_ttnn = ttnn.to_device(token_embedding_weights_ttnn, device)

token_embedding_weights_ttnn

ttnn.Tensor([[ 0.34961, -1.04688,  ...,  0.87891, -1.46094],
             [-0.50781,  1.07031,  ..., -0.66406,  0.47266],
             ...,
             [ 1.10938,  0.14355,  ...,  0.62109, -0.98828],
             [-1.05469, -1.05469,  ...,  0.89844, -0.83594]], shape=Shape([50257, 256]), dtype=DataType::BFLOAT16, layout=Layout::ROW_MAJOR)

Now we can create the token_embeddings in one shot with `ttnn.embedding`. 

In [10]:
token_embeddings_ttnn = ttnn.embedding(inputs_ttnn, token_embedding_weights_ttnn)
token_embeddings_ttnn

ttnn.Tensor([[[-1.40625, -1.02344,  ..., -0.03613, -1.40625],
              [-0.37500,  0.01392,  ..., -0.21094,  0.23145],
              ...,
              [-0.83594,  0.66016,  ...,  0.87500, -0.70312],
              [-1.28125,  0.89062,  ...,  0.42969, -1.62500]],

             [[-0.16406,  0.17676,  ...,  2.42188,  1.03906],
              [-0.41992,  0.38281,  ..., -0.97266,  0.30859],
              ...,
              [-0.87500,  0.83594,  ...,  0.16602,  1.04688],
              [ 0.23535,  1.66406,  ...,  1.26562,  0.95312]],

             ...,

             [[-1.01562,  1.47656,  ...,  0.32617,  0.09131],
              [ 1.32031,  1.08594,  ...,  0.89844, -0.38086],
              ...,
              [ 0.15820, -0.81641,  ...,  3.28125,  0.35938],
              [ 1.33594, -0.01080,  ..., -0.02966,  0.33789]],

             [[ 0.15820, -0.81641,  ...,  3.28125,  0.35938],
              [-0.10352, -0.69141,  ..., -0.30664, -0.83594],
              ...,
              [-0.66016,  0.609

### Creating Positional Embeddings

We can repeat the same thing with positional embeddings.

We'll need to generate some positional inputs first. We'll create a simple tensor of the indices from 0 to `context_length - 1`.

In [11]:
positional_inputs_ttnn = ttnn.arange(end=context_length, dtype=ttnn.uint32)
positional_inputs_ttnn = ttnn.to_device(positional_inputs_ttnn, device)

positional_inputs_ttnn

ttnn.Tensor([    0,     1,  ...,     2,     3], shape=Shape([4]), dtype=DataType::UINT32, layout=Layout::ROW_MAJOR)

Now we can create positional embedding weights. These are random again.

In [12]:
positional_embeddings_weights = ttnn.from_torch(
    torch.randn(context_length, output_dim),
    dtype=ttnn.bfloat16
)
positional_embeddings_weights = ttnn.to_device(positional_embeddings_weights, device)

Create positional embeddings now using the positional inputs and the randomly initialized positional embeddings weights. This ends up being a tensor that is (4, 256).

In [13]:
positional_embeddings_ttnn = ttnn.embedding(positional_inputs_ttnn, positional_embeddings_weights)
positional_embeddings_ttnn

ttnn.Tensor([[-2.01562, -1.32812,  ..., -2.21875, -1.43750],
             [-1.62500,  1.10938,  ...,  0.36523, -1.01562],
             ...,
             [ 0.81250,  0.24512,  ..., -0.35742, -0.33789],
             [-0.72266, -0.08984,  ...,  0.77734,  0.66797]], shape=Shape([4, 256]), dtype=DataType::BFLOAT16, layout=Layout::ROW_MAJOR)
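
Because the positional inputs are just 0, 1, ..., `context_length - 1`, this lookup returns every row of the weight table in its original order. The sketch below shows the idea with plain lists; `embedding_lookup` is a hypothetical stand-in for a row gather, not the `ttnn.embedding` implementation.

```python
def embedding_lookup(indices, weights):
    """Gather rows of `weights` by index, which is what an embedding lookup does conceptually."""
    return [list(weights[i]) for i in indices]

# Toy positional weight table: 4 positions, 2 dimensions each.
weights = [[0.5, -1.0], [1.5, 2.0], [-0.25, 0.0], [3.0, 4.0]]

# Looking up indices 0..3 in order reproduces the table itself.
looked_up = embedding_lookup(range(len(weights)), weights)
print(looked_up == weights)
```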

We're not quite done with `positional_embeddings_ttnn` yet. We now have to reshape it for the addition operation coming up. This involves:
1. Reshaping the `positional_embeddings_ttnn` tensor so it has the same number of dimensions as `token_embeddings_ttnn`. In this case we go from (4, 256) to (1, 4, 256).
2. Step 1 only gives us a batch size of 1, so we "broadcast" manually: `repeat_interleave` copies the tensor along the batch dimension so that it lines up element for element with `token_embeddings_ttnn` during the addition.

We expect to turn the (4, 256) shape into an (8, 4, 256) shape tensor.
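
The two steps above can be sketched with nested Python lists. The helpers `add_batch_dim`, `repeat_along_dim0`, and `shape_of` are hypothetical and only mimic the shape behaviour; the toy `output_dim` of 3 is chosen to keep the example small.

```python
def add_batch_dim(matrix):
    """(rows, cols) -> (1, rows, cols), like reshaping to add a leading dimension."""
    return [matrix]

def repeat_along_dim0(tensor, repeats):
    """Repeat each entry along dim 0 `repeats` times, copying the rows."""
    return [[list(row) for row in item] for item in tensor for _ in range(repeats)]

def shape_of(nested):
    """Shape of a regular (non-empty) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

context_length, output_dim, batch_size = 4, 3, 8
positional = [[float(p)] * output_dim for p in range(context_length)]

batched = add_batch_dim(positional)                # (1, 4, 3)
repeated = repeat_along_dim0(batched, batch_size)  # (8, 4, 3)
print(shape_of(positional), shape_of(batched), shape_of(repeated))
```

Every batch entry ends up holding an identical copy of the positional table, so an element-wise add gives the same result as broadcasting would.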

In [14]:
positional_embeddings_ttnn = ttnn.reshape(positional_embeddings_ttnn, (1, context_length, output_dim))
positional_embeddings_ttnn = ttnn.repeat_interleave(positional_embeddings_ttnn, repeats=batch_size, dim=0)
positional_embeddings_ttnn

ttnn.Tensor([[[-2.01562, -1.32812,  ..., -2.21875, -1.43750],
              [-1.62500,  1.10938,  ...,  0.36523, -1.01562],
              ...,
              [ 0.81250,  0.24512,  ..., -0.35742, -0.33789],
              [-0.72266, -0.08984,  ...,  0.77734,  0.66797]],

             [[-2.01562, -1.32812,  ..., -2.21875, -1.43750],
              [-1.62500,  1.10938,  ...,  0.36523, -1.01562],
              ...,
              [ 0.81250,  0.24512,  ..., -0.35742, -0.33789],
              [-0.72266, -0.08984,  ...,  0.77734,  0.66797]],

             ...,

             [[-2.01562, -1.32812,  ..., -2.21875, -1.43750],
              [-1.62500,  1.10938,  ...,  0.36523, -1.01562],
              ...,
              [ 0.81250,  0.24512,  ..., -0.35742, -0.33789],
              [-0.72266, -0.08984,  ...,  0.77734,  0.66797]],

             [[-2.01562, -1.32812,  ..., -2.21875, -1.43750],
              [-1.62500,  1.10938,  ...,  0.36523, -1.01562],
              ...,
              [ 0.81250,  0.245

### Create the Input Embeddings

We can now compute the input embeddings from `token_embeddings_ttnn` and `positional_embeddings_ttnn`.
The on-device addition requires us to convert the tensors to tile layout (32x32 tiles).

Since we have a small context length and batch size, notice that there will be lots of padding as a result.
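
To estimate how much padding that means, the sketch below rounds the last two dimensions up to multiples of 32. This assumes tiling pads only the last two dimensions; the actual on-device layout may differ, and `round_up` and `tiled_shape` are hypothetical helpers.

```python
TILE = 32  # assumed tile edge length (32x32 tiles)

def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple

def tiled_shape(shape):
    """Pad the last two dimensions up to whole tiles; leading dimensions are unchanged."""
    *batch, rows, cols = shape
    return (*batch, round_up(rows, TILE), round_up(cols, TILE))

logical = (8, 4, 256)  # (batch_size, context_length, output_dim)
padded = tiled_shape(logical)

# Fraction of each padded (rows x cols) slab that holds real data.
useful_fraction = (logical[-2] * logical[-1]) / (padded[-2] * padded[-1])
print(padded, useful_fraction)
```

Under these assumptions, only 4 of every 32 rows carry real data, so a longer context length would use the tiles far more efficiently.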

In [15]:
input_embeddings_ttnn = ttnn.add(
    ttnn.tilize(token_embeddings_ttnn),
    ttnn.tilize(positional_embeddings_ttnn)
)
input_embeddings_ttnn



ttnn.Tensor([[[-3.42188, -2.35938,  ..., -2.25000, -2.84375],
              [-2.00000,  1.12500,  ...,  0.15430, -0.78516],
              ...,
              [-0.02344,  0.90625,  ...,  0.51953, -1.03906],
              [-2.00000,  0.80078,  ...,  1.21094, -0.95703]],

             [[-1.26562, -3.09375,  ...,  1.38281, -0.95703],
              [-1.02344, -0.21094,  ...,  2.89062, -0.38281],
              ...,
              [ 0.18359, -0.84766,  ..., -2.31250,  0.03906],
              [ 1.93750, -0.82031,  ..., -2.03125, -0.38281]],

             ...,

             [[ 0.65625, -0.04297,  ..., -0.37500,  2.81250],
              [-1.14062,  0.32812,  ..., -1.21875, -1.89062],
              ...,
              [-0.91406, -0.10400,  ..., -0.59766,  1.80469],
              [-0.22168, -0.62500,  ...,  2.34375, -1.57031]],

             [[ 1.13281,  0.16016,  ...,  0.67578, -0.67969],
              [ 1.95312,  0.13672,  ..., -0.08105,  0.01709],
              ...,
              [ 0.16602, -1.054

There's a lot of padding inserted, which is why you may see extreme values at the end of the tensors. We can untilize to get the data back in a row-major view that is easier to read.

In [16]:
input_embeddings_ttnn = ttnn.untilize(input_embeddings_ttnn)
input_embeddings_ttnn



ttnn.Tensor([[[-3.42188, -2.35938,  ..., -2.25000, -2.84375],
              [-2.00000,  1.12500,  ...,  0.15430, -0.78516],
              ...,
              [-0.02344,  0.90625,  ...,  0.51953, -1.03906],
              [-2.00000,  0.80078,  ...,  1.21094, -0.95703]],

             [[-2.18750, -1.14844,  ...,  0.20312, -0.39844],
              [-2.04688,  1.49219,  ..., -0.60938, -0.70703],
              ...,
              [-0.06250,  1.07812,  ..., -0.19141,  0.71094],
              [-0.48828,  1.57812,  ...,  2.04688,  1.62500]],

             ...,

             [[-3.03125,  0.14844,  ..., -1.89062, -1.34375],
              [-0.30469,  2.20312,  ...,  1.26562, -1.39844],
              ...,
              [ 0.97266, -0.57031,  ...,  2.92188,  0.02148],
              [ 0.61328, -0.10059,  ...,  0.74609,  1.00781]],

             [[-1.85938, -2.14062,  ...,  1.06250, -1.07812],
              [-1.72656,  0.41797,  ...,  0.05859, -1.85156],
              ...,
              [ 0.15234,  0.855



Let's do a sanity check. We're expecting the same (8, 4, 256) shape as in the `torch` example.

This means a batch size of 8, with 4 tokens in context, each represented by a 256-dimensional vector. The more dimensions we use, the more "detail" each token's embedding can record.

In [17]:
print(input_embeddings_ttnn[0:2])
print(input_embeddings_ttnn.shape)

ttnn.Tensor([[[-3.42188, -2.35938,  ..., -2.25000, -2.84375],
              [-2.00000,  1.12500,  ...,  0.15430, -0.78516],
              ...,
              [-0.02344,  0.90625,  ...,  0.51953, -1.03906],
              [-2.00000,  0.80078,  ...,  1.21094, -0.95703]],

             [[-1.26562, -3.09375,  ...,  1.38281, -0.95703],
              [-1.02344, -0.21094,  ...,  2.89062, -0.38281],
              ...,
              [ 0.18359, -0.84766,  ..., -2.31250,  0.03906],
              [ 1.93750, -0.82031,  ..., -2.03125, -0.38281]]], shape=Shape([2, 4, 256]), dtype=DataType::BFLOAT16, layout=Layout::ROW_MAJOR)
Shape([8, 4, 256])


## Cleanup

Finally, don't forget to clean up.

In [18]:
ttnn.close_device(device)

                  Metal | INFO     | Closing device 0
                  Metal | INFO     | Disabling and clearing program cache on device 0


## 🚀 DONE!