# Matrix Multiplication Demo

A quick demo showing how to multiply two matrices on a Tenstorrent device.

## Import Libraries

Import `torch` and `ttnn`. The workflow mirrors what we would do on a CUDA card: there, we use `torch` to build our data and then send it to the CUDA device when we're ready. `torch` has built-in support for CUDA devices, but not for Tenstorrent devices. That's why we use the `ttnn` library, which sends our tensors to the device.

In [1]:
import torch
import ttnn

2025-04-19 07:28:15.040 | DEBUG    | ttnn:<module>:83 - Initial ttnn.CONFIG:
Config{cache_path=/home/avgdev/.cache/ttnn,model_cache_path=/home/avgdev/.cache/ttnn/models,tmp_dir=/tmp/ttnn,enable_model_cache=false,enable_fast_runtime_mode=true,throw_exception_on_fallback=false,enable_logging=false,enable_graph_report=false,enable_detailed_buffer_report=false,enable_detailed_tensor_report=false,enable_comparison_mode=false,comparison_mode_should_raise_exception=false,comparison_mode_pcc=0.9999,root_report_path=generated/ttnn/reports,report_name=std::nullopt,std::nullopt}


## Create Tensors

Notice that creating tensors works the same as always with `torch`. We just need to convert them to `ttnn` tensors with `from_torch`. More details on `TILE_LAYOUT` later; for now, just know that the Wormhole n150d uses a tile-based memory access pattern, which differs from how `torch` lays out tensors.

In [2]:
a = torch.tensor([[3, 3]])
b = torch.tensor([[2], [5]])

In [3]:
a = ttnn.from_torch(a, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
b = ttnn.from_torch(b, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
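To get a feel for what a tile-based layout implies for tensors as small as ours, here is a minimal sketch in plain Python. It assumes tiles of 32x32 elements, which this demo does not state, and `padded_shape` is a hypothetical helper written for illustration, not part of the `ttnn` API. It only computes what a shape would look like if its last two dimensions were rounded up to whole tiles.

```python
TILE = 32  # assumed tile edge length; not stated in this demo


def padded_shape(shape, tile=TILE):
    """Round the last two dimensions of `shape` up to a multiple of `tile`."""
    *batch, rows, cols = shape
    padded_rows = (rows + tile - 1) // tile * tile
    padded_cols = (cols + tile - 1) // tile * tile
    return (*batch, padded_rows, padded_cols)


# Shapes of `a` and `b` from the cells above
print(padded_shape((1, 2)))    # (32, 32)
print(padded_shape((2, 1)))    # (32, 32)
print(padded_shape((33, 64)))  # (64, 64)
```

Under that assumption, even our 1x2 and 2x1 inputs would each occupy one full tile, while the logical shape we see from Python stays the same.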

## Opening the Device

This is where we start communicating with the device. It's as simple as finding the device id (most of the time, `0`) and calling `open_device` with that `device_id`.

In [4]:
device_id = 0
device = ttnn.open_device(device_id=device_id)

                 Device | INFO     | Opening user mode device driver
2025-04-19 07:28:18.654 | INFO     | SiliconDriver   - Opened PCI device 0; KMD version: 1.33.0, IOMMU: disabled

2025-04-19 07:28:18.667 | INFO     | SiliconDriver   - Opened PCI device 0; KMD version: 1.33.0, IOMMU: disabled
2025-04-19 07:28:18.669 | INFO     | SiliconDriver   - Harvesting mask for chip 0 is 0x200 (physical layout: 0x1, logical: 0x200, simulated harvesting mask: 0x0).
2025-04-19 07:28:18.670 | INFO     | SiliconDriver   - Opened PCI device 0; KMD version: 1.33.0, IOMMU: disabled
2025-04-19 07:28:18.671 | INFO     | SiliconDriver   - Detected PCI devices: [0]
2025-04-19 07:28:18.671 | INFO     | SiliconDriver   - Using local chip ids: 

New chip! We now have 1 chips
Chip initialization complete (found )
Chip initializing complete...
 ARC

 [4/4] DRAM

 [16/16] ETH

 CPU

Chip detection complete (found )


## Send to the Device and Operate

Just as `torch` can send tensors to a CUDA device, `ttnn` can send tensors to the Tenstorrent device.

Once the tensors are on the device, use `matmul` to compute the result. `output` lives on the device.

In [5]:
a = ttnn.to_device(a, device)
b = ttnn.to_device(b, device)

output = ttnn.matmul(a, b)
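Before looking at what the device returns, it helps to know what to expect. The following is a minimal host-side sketch in plain Python, using no `ttnn` calls at all; `reference_matmul`, `a_host`, and `b_host` are names made up for this illustration. It multiplies the same 1x2 and 2x1 matrices by hand: 3·2 + 3·5 = 21.

```python
def reference_matmul(x, y):
    """Naive matrix product of two lists of rows (host-side reference only)."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "inner dimensions must match"
    cols = len(y[0])
    return [
        [sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
        for row in x
    ]


# Same values as the tensors built earlier in this demo
a_host = [[3, 3]]
b_host = [[2], [5]]

expected = reference_matmul(a_host, b_host)
print(expected)  # [[21]]
```

A 1x2 matrix times a 2x1 matrix gives a 1x1 result, which is the shape we should see from the device as well.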



## Check the Result

Print the result and assert that its value, shape, and dtype are what we expect.

In [6]:
print(f"{output.shape}, {output.dtype}, {output}")

assert output[0] == 21.0000
assert output.shape == [1, 1]
assert output.dtype == ttnn.bfloat16

Shape([1, 1]), DataType.BFLOAT16, ttnn.Tensor([[21.00000]], shape=Shape([1, 1]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
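The exact comparison against `21.0` holds here because 21 fits in bfloat16's 8 significant bits. The sketch below illustrates this in plain Python by rounding a float32 bit pattern to its upper 16 bits. It assumes round-to-nearest-even, which is a common convention but not something this demo confirms for `ttnn`; `to_bfloat16` is a hypothetical helper, not a `ttnn` function, and it ignores NaN and infinity.

```python
import struct


def to_bfloat16(x):
    """Round a float to the nearest bfloat16 value (assumes round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias so that dropping the low 16 bits rounds to nearest, ties to even
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bfloat16(21.0))  # 21.0
print(to_bfloat16(21.1))  # 21.125
```

Values that are not exactly representable, such as 21.1, land on the nearest bfloat16 step, so results with fractional parts are better compared with a tolerance than with `==`.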


## Sending the Result back to Host

Here, we demonstrate how to move the `output` tensor back to the host computer (CPU memory).

In [7]:
output_cpu = ttnn.from_device(output)

As before, we check the result, this time by printing it.

In [8]:
print(output_cpu)

ttnn.Tensor([[21.00000]], shape=Shape([1, 1]), dtype=DataType::BFLOAT16, layout=Layout::TILE)


## Close the Device

In [9]:
ttnn.close_device(device)

                  Metal | INFO     | Closing device 0
                  Metal | INFO     | Disabling and clearing program cache on device 0
