# Basic Operations with TT-NN

This tutorial walks through a simple example that creates several
tensors and performs basic arithmetic on them using TT-NN, a
high-level Python API. The operations covered are element-wise addition,
element-wise multiplication, and matrix multiplication, plus a simulated
broadcast of a row vector across a tile.

Let's create the example file, `ttnn_basic_operations.py`.

## Import the necessary libraries

In [None]:
import torch
import numpy as np
import ttnn
from loguru import logger

## Open Tenstorrent device

Create the device on which we will run our program.

In [None]:
# Open Tenstorrent device
device = ttnn.open_device(device_id=0)

## Helper Function for Tensor Preparation

Let's create a helper function to convert PyTorch tensors to
TT-NN tiled tensors.

In [None]:
# Helper to create a TT-NN tensor from torch with TILE_LAYOUT and bfloat16
def to_tt_tile(torch_tensor):
   return ttnn.from_torch(torch_tensor, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
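
The helper above casts to `bfloat16`, which keeps the 8 exponent bits of `float32` but only 7 mantissa bits, so values lose precision on the way to the device. The following is a minimal host-side sketch of that precision loss in pure Python; it does not call TT-NN. The function name `to_bfloat16` is hypothetical, and the sketch assumes round-to-nearest-even (the rounding mode TT-NN uses is not stated here) and finite inputs.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (hypothetical helper).

    Assumes round-to-nearest-even; NaN and infinity are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of float32; round away the 16 dropped bits,
    # breaking ties toward an even result.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

for value in (1.0, 0.1, 3.14159):
    print(value, "->", to_bfloat16(value))
```

Only about two to three significant decimal digits survive the cast, so results read back from the device can differ slightly from a `float32` reference.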

## Host Tensor Creation

Create a 32x32 host tensor filled with random values. We will
use this and other tensors to demonstrate various operations.

In [None]:
logger.info("\n--- TT-NN Tensor Creation with Tiles (32x32) ---")
host_rand = torch.rand((32, 32), dtype=torch.float32)

## Convert Host Tensors to TT-NN Tiled Tensors or Create Natively on Device

Tensix cores operate most efficiently on tiled data, allowing them to
perform a large amount of compute in parallel. 
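
To make the tiling concrete, here is a small sketch of how many 32x32 tiles a 2D tensor occupies. It assumes that each dimension is padded up to the next multiple of 32 when a tensor is tiled; the names `TILE` and `tiles_needed` are illustrative and not part of TT-NN.

```python
TILE = 32  # tile edge length used throughout this tutorial (32x32 tiles)

def tiles_needed(rows: int, cols: int) -> tuple[int, int, int]:
    """Return (tile_rows, tile_cols, total_tiles) for a rows x cols tensor.

    Assumes each dimension is padded up to a multiple of TILE.
    """
    tile_rows = -(-rows // TILE)  # ceiling division
    tile_cols = -(-cols // TILE)
    return tile_rows, tile_cols, tile_rows * tile_cols

print(tiles_needed(32, 32))  # exactly one tile, as in the examples here
print(tiles_needed(33, 64))  # one extra row spills into a second tile row
```

Every tensor in this tutorial is exactly 32x32, so each one fits in a single tile.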

Where necessary, we convert host tensors to TT-NN tiled tensors with the
`to_tt_tile()` helper defined earlier, which also transfers them to the
TT device. Alternatively, we can create tensors with TT-NN's tensor
creation functions, which initialize them directly on the TT device.
Creating tensors natively on the device is more efficient because it
avoids the overhead of transferring data from the host to the device.

In [None]:
tt_t1 = ttnn.full(
   shape=(32, 32),
   fill_value=1.0,
   dtype=ttnn.float32,
   layout=ttnn.TILE_LAYOUT,
   device=device,
)

tt_t2 = ttnn.zeros(
   shape=(32, 32),
   dtype=ttnn.bfloat16,
   layout=ttnn.TILE_LAYOUT,
   device=device,
)
tt_t3 = ttnn.ones(
   shape=(32, 32),
   dtype=ttnn.bfloat16,
   layout=ttnn.TILE_LAYOUT,
   device=device,
)
tt_t4 = to_tt_tile(host_rand)

t5 = np.array([[5, 6], [7, 8]], dtype=np.float32).repeat(16, axis=0).repeat(16, axis=1)
tt_t5 = ttnn.Tensor(t5, device=device, layout=ttnn.TILE_LAYOUT)
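
The `repeat` calls that build `t5` can be checked on the host with NumPy alone. This sketch shows that repeating each element of the 2x2 array 16 times along both axes gives a 32x32 array made of four constant 16x16 blocks:

```python
import numpy as np

# Same construction as t5 above, without the TT-NN conversion.
t5 = np.array([[5, 6], [7, 8]], dtype=np.float32).repeat(16, axis=0).repeat(16, axis=1)

print(t5.shape)  # (32, 32)
# Each quadrant holds one value from the original 2x2 array.
print(t5[0, 0], t5[0, 16], t5[16, 0], t5[16, 16])
```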

## Tile-Based Arithmetic Operations

Let's use some of the tensors we created and perform different operations
on them.

In [None]:
logger.info("\n--- TT-NN Tensor Operations on (32x32) Tiles ---")
add_result = ttnn.add(tt_t3, tt_t4)
logger.info(f"Addition:\n{add_result}")

mul_result = ttnn.mul(tt_t4, tt_t5)
logger.info(f"Element-wise Multiplication:\n{mul_result}")

matmul_result = ttnn.matmul(tt_t3, tt_t4, memory_config=ttnn.DRAM_MEMORY_CONFIG)
logger.info(f"Matrix Multiplication:\n{matmul_result}")
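
A host-side reference helps when reading the logged results. The sketch below uses NumPy only (no TT-NN calls) with stand-in arrays for the device tensors; the names `ones` and `rand` are illustrative. It shows that adding a matrix of ones shifts every element by 1, and that multiplying a matrix of ones by a matrix `X` places the column sums of `X` in every row. Device results in `bfloat16` should match these only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
ones = np.ones((32, 32), dtype=np.float32)      # stand-in for tt_t3
rand = rng.random((32, 32), dtype=np.float32)   # stand-in for tt_t4

add_ref = ones + rand      # element-wise addition
matmul_ref = ones @ rand   # matrix multiplication

# Every row of ones @ rand equals the column sums of rand.
col_sums = rand.sum(axis=0)
print(np.allclose(matmul_ref, np.tile(col_sums, (32, 1)), rtol=1e-4))
```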

## Simulated Broadcasting (Row Vector Expansion)

Let's simulate broadcasting a row vector across a tile. Every element of a given column will contain the same value.

This is useful for operations that require expanding a smaller tensor to match the dimensions of a larger one.

$
\begin{bmatrix}
0 & 1 & \cdots & 30 & 31 \\
\end{bmatrix}
\rightarrow
\begin{bmatrix}
0 & 1 & \cdots & 30 & 31 \\
0 & 1 & \cdots & 30 & 31 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 1 & \cdots & 30 & 31 \\
0 & 1 & \cdots & 30 & 31 \\
\end{bmatrix}
$

In [None]:
logger.info("\n--- Simulated Broadcasting (32x32 + Broadcasted Row Vector) ---")
broadcast_vector = torch.arange(0, 32, dtype=torch.float32).repeat(32, 1)
logger.info(f"Broadcast Vector:\n{broadcast_vector}")

broadcast_tt = to_tt_tile(broadcast_vector)
broadcast_add_result = ttnn.add(tt_t4, broadcast_tt)
logger.info(f"Broadcast Add Result (TT-NN):\n{ttnn.to_torch(broadcast_add_result)}")
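
For comparison, NumPy broadcasts a row vector implicitly, whereas the cell above materializes all 32 rows before the addition. The following host-side sketch (NumPy only; `tile`, `row`, and the result names are illustrative) checks that both approaches give the same result:

```python
import numpy as np

rng = np.random.default_rng(0)
tile = rng.random((32, 32), dtype=np.float32)   # stand-in for host_rand
row = np.arange(0, 32, dtype=np.float32)        # row vector 0..31

# Explicit expansion, mirroring the .repeat(32, 1) call above.
expanded = np.tile(row, (32, 1))
explicit_sum = tile + expanded

# Implicit broadcasting: the (32,) row is stretched across all 32 rows.
implicit_sum = tile + row

print(np.array_equal(explicit_sum, implicit_sum))
```

The explicit form costs a full 32x32 buffer for the expanded vector, which is the trade-off of simulating the broadcast rather than relying on built-in broadcasting.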

## Closing Device

In [None]:
ttnn.close_device(device)