# Matrix Multiplication

## Import Libraries

In [None]:
import ttnn

## Open the Device

In [None]:
device_id = 0
device = ttnn.open_device(device_id=device_id)

## Tensor Configuration

In [None]:
m = 1024
k = 1024
n = 1024
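
The three sizes above fix the operand shapes: `a` is `(m, k)`, `b` is `(k, n)`, and the product is `(m, n)`. The following plain-Python sketch (no device required) checks that rule and estimates the amount of work and memory involved. The helper name `matmul_output_shape` is hypothetical, and the FLOP count uses the usual approximation of one multiply and one add per inner-product term.

```python
# Sizes from the tutorial: a is (m, k), b is (k, n).
m, k, n = 1024, 1024, 1024


def matmul_output_shape(a_shape, b_shape):
    """Return the shape of a @ b for 2D operands, or raise if they do not conform."""
    if a_shape[1] != b_shape[0]:
        raise ValueError(f"inner dimensions differ: {a_shape[1]} vs {b_shape[0]}")
    return (a_shape[0], b_shape[1])


out_shape = matmul_output_shape((m, k), (k, n))

# One multiply and one add per inner-product term.
flops = 2 * m * k * n

# bfloat16 stores each element in 2 bytes.
BYTES_PER_BFLOAT16 = 2
a_bytes = m * k * BYTES_PER_BFLOAT16

print(out_shape)  # (1024, 1024)
print(flops)      # 2147483648
print(a_bytes)    # 2097152
```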

## Initialize tensors a and b with random values

In [None]:
a = ttnn.rand((m, k), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT)
b = ttnn.rand((k, n), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT)

## Matrix multiply tensor a and b
The operation takes longer the first time it runs because the kernels must be compiled.

In [None]:
output = a @ b

Re-running the operation is significantly faster because the compiled programs are reused from the program cache.

In [None]:
output = a @ b
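
The compile-once, reuse-afterwards behavior can be pictured with a toy model. The sketch below is not how ttnn implements its program cache: the class `ToyProgramCache`, its cache key (operation name plus operand shapes), and the simulated compile time are all assumptions made for illustration. It only shows why the first call is slow and the second is fast.

```python
import time


class ToyProgramCache:
    """Toy model of a program cache (hypothetical, not ttnn's implementation)."""

    def __init__(self, compile_seconds=0.05):
        self._programs = {}
        self._compile_seconds = compile_seconds
        self.compile_count = 0

    def run(self, op_name, a_shape, b_shape):
        # Assumed cache key: the operation and its operand shapes.
        key = (op_name, a_shape, b_shape)
        if key not in self._programs:
            time.sleep(self._compile_seconds)  # stand-in for kernel compilation
            self._programs[key] = f"program<{op_name}:{a_shape}x{b_shape}>"
            self.compile_count += 1
        return self._programs[key]


cache = ToyProgramCache()
timings = []
for _ in range(2):
    start = time.perf_counter()
    cache.run("matmul", (1024, 1024), (1024, 1024))
    timings.append(time.perf_counter() - start)

print(f"first run: {timings[0]:.4f}s, second run: {timings[1]:.6f}s")
print(f"compilations: {cache.compile_count}")
```

The same `time.perf_counter()` pattern can be wrapped around `a @ b` on a real device to compare the two runs, bearing in mind that device work may need to finish before the timer is stopped.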

## Inspect the layout of matrix multiplication output

In [None]:
print(output.layout)

As can be seen, matrix multiplication produces outputs in a tile layout. That is because it's much more efficient to use this layout for computing matrix multiplications on Tenstorrent accelerators compared to a row-major layout.

This is also why the logs show 2 tilize operations: inputs that are in a row-major layout are automatically converted to the tile layout.

Learn more about tile layout [here](https://github.com/tenstorrent/tt-metal/blob/main/tech_reports/tensor_layouts/tensor_layouts.md#32-tiled-layout)
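
To make the difference between the two layouts concrete, here is a minimal plain-Python sketch of reordering a row-major matrix into tile order. It assumes 32x32 tiles, as described in the linked report, and ignores any finer structure inside a tile. The function name `tilize` is borrowed from the log message for illustration only; this is not the ttnn operation.

```python
def tilize(matrix, tile_h=32, tile_w=32):
    """Flatten a 2D list tile by tile instead of row by row."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile_h or cols % tile_w:
        raise ValueError("dimensions must be multiples of the tile size")
    flat = []
    for tile_row in range(0, rows, tile_h):        # walk tiles top to bottom
        for tile_col in range(0, cols, tile_w):    # then left to right
            for r in range(tile_row, tile_row + tile_h):
                # Rows inside one tile are stored contiguously.
                flat.extend(matrix[r][tile_col:tile_col + tile_w])
    return flat


# A 4x4 matrix with 2x2 tiles keeps the example small enough to read.
small = [[r * 4 + c for c in range(4)] for r in range(4)]
print(tilize(small, tile_h=2, tile_w=2))
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

In row-major order the same matrix would be stored as `0, 1, 2, ..., 15`. In tile order, every element a tile-based matmul needs for one block sits in one contiguous run.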

## Inspect the result of the matrix multiplication

To inspect the results we will first convert to row-major layout.

In [None]:
output = ttnn.to_layout(output, ttnn.ROW_MAJOR_LAYOUT)

print("Printing ttnn tensor")
print(f"shape: {output.shape}")
print(f"chunk of a tensor:\n{output[:1, :32]}")
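
A quick sanity check on the printed values is possible without a device. Assuming `ttnn.rand` samples uniformly from [0, 1) (an assumption here, not something this tutorial states), each product `a[i, j] * b[j, l]` has an expected value of 0.25, so every output element should be close to `k * 0.25 = 256`. In addition, bfloat16 has 8 bits of significand precision, so representable values between 256 and 512 are 2 apart. The sketch below illustrates both points; `to_bfloat16` is a hypothetical helper that rounds a finite Python float to the nearest bfloat16 value.

```python
import random
import struct


def to_bfloat16(x):
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


k = 1024
expected = k * 0.25  # E[u * v] = 0.5 * 0.5 for independent uniform u, v

# Simulate one output element as a dot product of k random pairs.
random.seed(0)
dot = sum(random.random() * random.random() for _ in range(k))

print(expected)  # 256.0
print(f"simulated element: {dot:.2f}, as bfloat16: {to_bfloat16(dot)}")
print(to_bfloat16(257.0), to_bfloat16(259.0))  # 256.0 260.0
```

This matches the sample output at the end of this tutorial, where the values are near 256 and all even.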

## Matrix multiply tensor a and b by using more performant config
By default, matrix multiplication might not be as efficient as it could be. Specifying the core grid that the operation should use, and keeping the tensors in L1 memory, can speed it up significantly.

In [None]:
a = ttnn.rand((m, k), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT, memory_config=ttnn.L1_MEMORY_CONFIG)
b = ttnn.rand((k, n), dtype=ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT, memory_config=ttnn.L1_MEMORY_CONFIG)

Run once to compile the kernels

In [None]:
output = ttnn.matmul(a, b, memory_config=ttnn.L1_MEMORY_CONFIG, core_grid=ttnn.CoreGrid(y=8, x=8))

Enjoy a massive speedup on subsequent runs.

In [None]:
output = ttnn.matmul(a, b, memory_config=ttnn.L1_MEMORY_CONFIG, core_grid=ttnn.CoreGrid(y=8, x=8))
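
As a rough picture of what the 8x8 core grid means for this problem size, the sketch below counts output tiles and divides them across the grid. It assumes 32x32 tiles and a perfectly even 2D split of the output, which is a simplification: the actual work distribution is decided by ttnn and may differ.

```python
TILE = 32                 # assumed tile edge length
m, n = 1024, 1024         # output shape from the tutorial
grid_y, grid_x = 8, 8     # matches ttnn.CoreGrid(y=8, x=8)


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


tiles_y = ceil_div(m, TILE)
tiles_x = ceil_div(n, TILE)
total_tiles = tiles_y * tiles_x
num_cores = grid_y * grid_x

# Output tiles each core would own under an even 2D split.
per_core = (ceil_div(tiles_y, grid_y), ceil_div(tiles_x, grid_x))

print(total_tiles)  # 1024
print(num_cores)    # 64
print(per_core)     # (4, 4)
```

Under these assumptions each of the 64 cores produces a 4x4 block of output tiles, that is, a 128x128 patch of the result.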

## Close the device

In [None]:
ttnn.close_device(device)
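
If a cell fails before this point, the device stays open. One way to guard against that is a small context manager that always closes the device. The sketch below is an illustration, not part of ttnn: `device_session` is a hypothetical helper, and it takes the API object as a parameter so that it can be demonstrated here with a stub instead of real hardware. It relies only on the `open_device(device_id=...)` and `close_device(device)` calls used in this tutorial.

```python
from contextlib import contextmanager


@contextmanager
def device_session(api, device_id=0):
    """Open a device through `api` and close it even if the body raises."""
    device = api.open_device(device_id=device_id)
    try:
        yield device
    finally:
        api.close_device(device)


class StubApi:
    """Stand-in for the ttnn module so the sketch runs without hardware."""

    def __init__(self):
        self.log = []

    def open_device(self, device_id):
        self.log.append(("open", device_id))
        return f"device-{device_id}"

    def close_device(self, device):
        self.log.append(("close", device))


stub = StubApi()
try:
    with device_session(stub, device_id=0) as device:
        raise RuntimeError("simulated failure in the middle of the tutorial")
except RuntimeError:
    pass

print(stub.log)  # [('open', 0), ('close', 'device-0')]
```

With real hardware, the call would be `with device_session(ttnn, device_id=0) as device:`, and the tutorial steps would go inside the `with` block.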

## Full Example and Output

Let's put everything together in a complete example that can be run directly.

[ttnn_basic_matrix_multiplication.py](https://github.com/tenstorrent/tt-metal/blob/main/ttnn/tutorials/basic_python/ttnn_basic_matrix_multiplication.py)

Running this script will generate the following output:

```console
$ python3 $TT_METAL_HOME/ttnn/tutorials/basic_python/ttnn_basic_matrix_multiplication.py
2025-10-23 09:03:21.386 | info     |          Device | Opening user mode device driver (tt_cluster.cpp:209)
2025-10-23 09:03:21.512 | info     |             UMD | Harvesting mask for chip 0 is 0x20 (NOC0: 0x20, simulated harvesting mask: 0x0). (cluster.cpp:394)
2025-10-23 09:03:21.751 | info     |             UMD | Opening local chip ids/PCIe ids: {0}/[2] and remote chip ids {} (cluster.cpp:252)
2025-10-23 09:03:21.751 | info     |             UMD | All devices in cluster running firmware version: 18.10.0 (cluster.cpp:232)
2025-10-23 09:03:21.751 | info     |             UMD | IOMMU: disabled (cluster.cpp:174)
2025-10-23 09:03:21.751 | info     |             UMD | KMD version: 2.4.0 (cluster.cpp:177)
2025-10-23 09:03:21.752 | info     |             UMD | Software version 6.0.0, Ethernet FW version 7.0.0 (Device 0) (cluster.cpp:1085)
2025-10-23 09:03:21.765 | info     |             UMD | Pinning pages for Hugepage: virtual address 0x7f5480000000 and size 0x40000000 pinned to physical address 0x4c0000000 (pci_device.cpp:536)
Layout.TILE
Printing ttnn tensor
shape: Shape([1024, 1024])
chunk of a tensor:
ttnn.Tensor([[258.0000, 260.0000,  ..., 266.0000, 272.0000]], shape=Shape([1, 32]), dtype=DataType::BFLOAT16, layout=Layout::ROW_MAJOR)
2025-10-23 09:03:46.028 | info     |          Device | Closing user mode device drivers (tt_cluster.cpp:426)
```