# 2 - View and ShapeTracker



So far we have been making scalar UOps that don't have a shape associated with them.

We have been getting away with it, but the UOp trees we made are not really valid without a shape.

In [36]:
#| hide
from nbdev.showdoc import *
import nbdev; nbdev.nbdev_export()
import os

In [37]:
os.environ["CPU"] = "1"
os.environ["DEBUG"]="4"

import tinygrad as tg
from tinygrad import Tensor, dtypes
from tinygrad.ops import UOp, Ops
from tinygrad.spec import type_verify

In [38]:
a = UOp.const(dtypes.float, 1)
a

UOp(Ops.CONST, dtypes.float, arg=1.0, src=())

In [43]:
#| hide

import sys
import traceback
import linecache

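def print_last_frame_context(exception, num_lines=2):
    # Get the last frame from the exception's own traceback,
    # so the helper does not depend on being called inside an `except` block
    tb = traceback.extract_tb(exception.__traceback__)
    last_frame = tb[-1]

    # Unpack frame info (the source line itself is re-read below)
    filename, lineno, funcname, _ = last_frame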

    # Print location info
    print(f"{type(exception).__name__} in {filename}:{lineno} in {funcname}()")

    # Get surrounding lines
    start = max(lineno - num_lines, 1)
    end = lineno + num_lines

    # Print code context
    print("\nCode context:")
    for i in range(start, end + 1):
        line = linecache.getline(filename, i).rstrip()
        prefix = "--->" if i == lineno else "    "
        print(f"{prefix} {i:4d} {line}")

In [44]:
try:
    print(a.shape)
except Exception as e:
    print_last_frame_context(e)

AssertionError in /home/xl0/work/projects/grads/tinygrad/tinygrad/helpers.py:61 in unwrap()

Code context:
       59   return ret
       60 def unwrap(x:Optional[T]) -> T:
--->   61   assert x is not None
       62   return x
       63 def get_single_element(x:list[T]) -> T:


Another thing we were missing is the device:

In [45]:
try:
    print(a.device)
except Exception as e:
    print_last_frame_context(e)

AssertionError in /home/xl0/work/projects/grads/tinygrad/tinygrad/helpers.py:61 in unwrap()

Code context:
       59   return ret
       60 def unwrap(x:Optional[T]) -> T:
--->   61   assert x is not None
       62   return x
       63 def get_single_element(x:list[T]) -> T:


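Let's fix this real quick by giving the constant a `VIEW` source (which carries the shape) that in turn has a `DEVICE` source: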

In [7]:
from tinygrad.shape.shapetracker import ShapeTracker, View

a = UOp.const(dtypes.float, 1).replace(src=(
        UOp(Ops.VIEW, dtypes.void, arg=ShapeTracker.from_shape( (0,) ), src=(
            UOp(Ops.DEVICE, dtypes.void, arg="CPU", src=()),)),))
a


UOp(Ops.CONST, dtypes.float, arg=1.0, src=(
  UOp(Ops.VIEW, dtypes.void, arg=ShapeTracker(views=(View(shape=(0,), strides=(0,), offset=0, mask=None, contiguous=True),)), src=(
    UOp(Ops.DEVICE, dtypes.void, arg='CPU', src=()),)),))

In [8]:
a.shape

(0,)

In [9]:
a.device

'CPU'

Looks better.

Now, what's up with that `ShapeTracker` and `View`? Let's start with the latter.

## View

You are probably familiar with how shape and strides work in PyTorch or NumPy:

In [10]:
import torch

In [11]:
a = torch.linspace(0, 31, 32, dtype=torch.int32).view(4, 8)
a

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31]], dtype=torch.int32)

In [12]:
a.shape

torch.Size([4, 8])

The shape defines, well, the shape of the array. It can have any number of dimensions (2 in this case), and each dimension has its size.

Underneath, a tensor's data is just a linear array, and the shape is there for convenience, because we usually want to work with multi-dimensional data.

We can change the shape, as long as the number of elements in the new shape stays the same.

In [13]:
b = a.view(2,4,4) # This creates a view that refers to the same data, but now it's seen as a 3-d array.
b

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]],

        [[16, 17, 18, 19],
         [20, 21, 22, 23],
         [24, 25, 26, 27],
         [28, 29, 30, 31]]], dtype=torch.int32)

The stride tells us how many elements we need to move in the underlying 1-d array (the base) to get to the next element in a given dimension.

For our 2x4x4 array, to move by 1 element along a row (the last dimension), we need to move ... 1 element in the base.

And to move by one element along a column (the middle dimension), we need to move by 4 elements in the base, because each row is 4 elements long.

> This is the standard `C`, or `row-major` order format for multidimensional data.

> ![Row-major and Column-major order](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Row_and_column_major_order.svg/255px-Row_and_column_major_order.svg.png)

> You might have seen references to the `F`, or `column-major` order at some point. Historically this is how data was stored in Fortran, and I'm sure they had their reasons for it, but it's definitely less intuitive.

To move by one element in the first dimension, we have to skip a whole 4x4 block: 4 rows of 4 elements each, so 16 in total:

In [14]:
b.stride()

(16, 4, 1)
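As a rough sketch of the rule above, the strides of a densely packed row-major array can be derived from its shape alone, and a multi-dimensional index turns into a position in the base by a dot product with the strides. The helper names `contiguous_strides` and `flat_index` are made up for this illustration; they are not torch or tinygrad functions.

```python
def contiguous_strides(shape):
    """Row-major strides (counted in elements) for a densely packed array."""
    strides = []
    step = 1
    # Walk from the last dimension to the first: the last one moves by 1,
    # each earlier one moves by the product of all the sizes after it.
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def flat_index(index, strides, offset=0):
    """Position in the 1-d base of the element at a multi-dimensional index."""
    return offset + sum(i * s for i, s in zip(index, strides))


print(contiguous_strides((4, 8)))         # (8, 1)
print(contiguous_strides((2, 4, 4)))      # (16, 4, 1)
print(flat_index((1, 2, 3), (16, 4, 1)))  # 27, i.e. b[1][2][3]
```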

Now, if the stride always matched the shape, things would be boring. We can set the stride independently.

Let's go back to our 4x8 view to make things easier. In this case we need to skip 8 elements to move by one row:

In [15]:
print(a)
print("Shape: ",a.shape)
print("Stride:",a.stride())

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31]], dtype=torch.int32)
Shape:  torch.Size([4, 8])
Stride: (8, 1)


What if we want to create a view that skips every other element in the rows? We can do this by creating a view with shape (torch refers to it as `size`) 4x4 and stride (8, 2)!

In [16]:
c = a.as_strided(size=(4,4), stride=(8, 2))
c

tensor([[ 0,  2,  4,  6],
        [ 8, 10, 12, 14],
        [16, 18, 20, 22],
        [24, 26, 28, 30]], dtype=torch.int32)

We can also specify an offset from the start of the base array. This will give us the odd elements in each row:

In [17]:
a.as_strided(size=(4,4), stride=(8, 2), storage_offset=1)

tensor([[ 1,  3,  5,  7],
        [ 9, 11, 13, 15],
        [17, 19, 21, 23],
        [25, 27, 29, 31]], dtype=torch.int32)

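Let's create a view that has the diagonal elements of `a` (0, 9, 18, 27). Each step along the diagonal moves one row down (8 elements) and one column right (1 element), so the stride is 9: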

In [18]:
a.as_strided(size=(4,), stride=(9,))

tensor([ 0,  9, 18, 27], dtype=torch.int32)
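To make the arithmetic behind `as_strided` explicit, here is a minimal pure-Python emulation over a flat list. It is only a sketch: `strided_view` is a hypothetical helper written for this note, it copies the data into nested lists instead of creating a real view, and it does no bounds checking.

```python
def strided_view(base, size, stride, offset=0):
    """Read a strided view of a flat list into nested lists."""
    if not size:
        # No dimensions left: `offset` now points at a single element.
        return base[offset]
    # Each step along the first dimension moves `stride[0]` elements in the base.
    return [strided_view(base, size[1:], stride[1:], offset + i * stride[0])
            for i in range(size[0])]


base = list(range(32))  # same contents as the 4x8 tensor `a`

print(strided_view(base, (4, 4), (8, 2))[1])            # [8, 10, 12, 14]
print(strided_view(base, (4, 4), (8, 2), offset=1)[0])  # [1, 3, 5, 7]
print(strided_view(base, (4,), (9,)))                   # [0, 9, 18, 27]
```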

Another fun thing we can do is set one or more of the strides to 0, to duplicate (broadcast) dimensions:

In [19]:
d = torch.linspace(1, 4, 4, dtype=torch.int32)
d

tensor([1, 2, 3, 4], dtype=torch.int32)

In [20]:
d.as_strided(size=(4,4), stride=(1, 0))

tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3],
        [4, 4, 4, 4]], dtype=torch.int32)

For each step down a column of the output (the first dimension), we take 1 step in the base, and for each step along a row (the second dimension), we don't take any steps at all!

That's how `Tensor.full()` works in tinygrad - it creates a single element, and makes all elements in the Tensor refer to it by setting the strides to 0.

In [21]:
e = torch.tensor([1], dtype=torch.int32)
e

tensor([1], dtype=torch.int32)

In [22]:
e.as_strided(size=(4,4), stride=(0,0))

tensor([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]], dtype=torch.int32)
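One way to see why zero strides are useful is to count how many distinct base elements a view actually reads. The sketch below does that in plain Python; `touched_offsets` is an illustrative helper invented here, not part of torch or tinygrad.

```python
from itertools import product


def touched_offsets(size, stride, offset=0):
    """Set of base positions that a strided view reads."""
    return {offset + sum(i * s for i, s in zip(index, stride))
            for index in product(*(range(n) for n in size))}


print(sorted(touched_offsets((4, 4), (1, 0))))  # [0, 1, 2, 3]
print(sorted(touched_offsets((4, 4), (0, 0))))  # [0]
print(len(touched_offsets((4, 4), (4, 1))))     # 16
```

The broadcast 4x4 views need only 4 elements and 1 element of storage respectively, while an ordinary contiguous 4x4 array needs all 16.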

The `View` class is intended to keep track of the shape and stride of the data. Let's play with it a bit.

In [23]:
v = View(shape=(4,8), strides=(8,1), offset=0, mask=None, contiguous=True)
v # A normal array 4x8

View(shape=(4, 8), strides=(8, 1), offset=0, mask=None, contiguous=True)

In [24]:
a

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31]], dtype=torch.int32)

In [25]:
a.as_strided(size=v.shape, stride=v.strides)

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31]], dtype=torch.int32)

In [26]:
v32 = v.reshape( (32,) ) # 1-d array of 32 elements
v32

View(shape=(32,), strides=(1,), offset=0, mask=None, contiguous=True)

In [27]:
a.as_strided(size=v32.shape, stride=v32.strides)

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
       dtype=torch.int32)

In [28]:
v_flip = v.flip( (False, True) ) # Flip the last dimension
v_flip

View(shape=(4, 8), strides=(8, -1), offset=7, mask=None, contiguous=False)

In [46]:
try:
    a.as_strided(size=v_flip.shape, stride=v_flip.strides)
except Exception as e:
    print_last_frame_context(e, 0)

AttributeError in /tmp/ipykernel_606682/772861321.py:2 in <module>()

Code context:
--->    2     a.as_strided(size=v_flip.shape, stride=v_flip.strides)


Oops, torch does not actually support negative strides. The result should have looked like this:

In [30]:
a.flip((1,))

tensor([[ 7,  6,  5,  4,  3,  2,  1,  0],
        [15, 14, 13, 12, 11, 10,  9,  8],
        [23, 22, 21, 20, 19, 18, 17, 16],
        [31, 30, 29, 28, 27, 26, 25, 24]], dtype=torch.int32)
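Since torch will not do it for us, the numbers from `v_flip` (strides `(8, -1)`, offset `7`) can be checked by hand. This is a small sketch in plain Python that applies the usual `offset + sum(index * stride)` formula to a flat list standing in for the base.

```python
base = list(range(32))  # same contents as the 4x8 tensor `a`

# Values copied from the `v_flip` View printed earlier.
shape, strides, offset = (4, 8), (8, -1), 7

# The offset of 7 starts us at the last element of the first row,
# and the stride of -1 walks backwards through each row.
flipped = [[base[offset + i * strides[0] + j * strides[1]]
            for j in range(shape[1])]
           for i in range(shape[0])]

print(flipped[0])  # [7, 6, 5, 4, 3, 2, 1, 0]
print(flipped[3])  # [31, 30, 29, 28, 27, 26, 25, 24]
```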

Now, what's up with the mask? It allows us to create arrays with elements that are outside of the underlying storage!

For example, if we want to pad a 2-d array, we don't want to allocate a new array - just mark the padded elements as being not valid!

In [31]:
v

View(shape=(4, 8), strides=(8, 1), offset=0, mask=None, contiguous=True)

In [32]:
v.pad(((1, 1), (1, 1)))

View(shape=(6, 10), strides=(8, 1), offset=-9, mask=((1, 5), (1, 9)), contiguous=False)

Torch does not allow negative offsets either, but I think the idea is clear:

![](pad-view.png){width=70% fig-align="left"}
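The padded view can also be read by hand. The sketch below assumes that each mask entry is a half-open `(start, end)` range of valid indices for its dimension, and that elements outside the mask read as zero; `read_padded` is a hypothetical helper written for this note, not a tinygrad function.

```python
def read_padded(base, shape, strides, offset, mask, fill=0):
    """Read a 2-d masked view of a flat list; masked-out elements become `fill`."""
    rows = []
    for i in range(shape[0]):
        row = []
        for j in range(shape[1]):
            # Assumption: mask[d] = (start, end) marks the valid range in dimension d.
            valid = mask[0][0] <= i < mask[0][1] and mask[1][0] <= j < mask[1][1]
            # Only touch the base for valid elements; the rest lie outside the storage.
            row.append(base[offset + i * strides[0] + j * strides[1]] if valid else fill)
        rows.append(row)
    return rows


base = list(range(32))  # same contents as the 4x8 tensor `a`

# Values copied from the padded View printed above.
padded = read_padded(base, shape=(6, 10), strides=(8, 1), offset=-9,
                     mask=((1, 5), (1, 9)))

print(padded[0])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(padded[1])  # [0, 0, 1, 2, 3, 4, 5, 6, 7, 0]
print(padded[4])  # [0, 24, 25, 26, 27, 28, 29, 30, 31, 0]
```

The offset of -9 makes the element at index (1, 1) of the padded view land on position 0 of the base, since -9 + 8 + 1 = 0.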