# Common PyTorch Functions and Methods

## Tensor Creation

### torch.tensor(): Creates a tensor from data (like a list or array).

In [2]:
import torch

data = [[1, 2], [3, 4]]
x = torch.tensor(data)
x

tensor([[1, 2],
        [3, 4]])

### torch.zeros(): Creates a tensor filled with 0s.

In [3]:
x = torch.zeros(2, 3) # 2 rows, 3 columns of 0.0
x

tensor([[0., 0., 0.],
        [0., 0., 0.]])

### torch.ones(): Creates a tensor filled with 1s.

In [5]:
x = torch.ones(2,3,4) # 3D tensor of shape [2, 3, 4] filled with 1.0
x

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

### torch.rand(): Random values from a uniform distribution $[0, 1)$.



In [7]:
x = torch.rand(3, 4) # Useful for initializing weights randomly
x

tensor([[0.6515, 0.4992, 0.3089, 0.5694],
        [0.9685, 0.2066, 0.6741, 0.1608],
        [0.8231, 0.6310, 0.9926, 0.0780]])

### torch.randn(): Random values from the standard normal distribution (mean 0, variance 1).

In [8]:
x = torch.randn(3, 3) # A common starting point for random weight initialization
x

tensor([[ 1.9755, -2.8533, -0.0843],
        [ 1.2666, -0.2682, -0.5117],
        [-0.4182,  1.0114, -1.2105]])

### torch.arange(): Returns a 1D tensor with a range of values.

In [9]:
x = torch.arange(start=0, end=10, step=2) # tensor([0, 2, 4, 6, 8])
x

tensor([0, 2, 4, 6, 8])

### torch.linspace(): Creates evenly spaced values from start to end, with both endpoints included (steps sets how many values).

In [10]:
x = torch.linspace(start=0, end=1, steps=5) # tensor([0.0, 0.25, 0.5, 0.75, 1.0])
x

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])

### torch.eye(): Creates an identity matrix.

In [11]:
x = torch.eye(3) # 3x3 Identity matrix
x

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

In [12]:
x = torch.eye(4, 2) # 4x2 matrix with ones on the main diagonal
x

tensor([[1., 0.],
        [0., 1.],
        [0., 0.],
        [0., 0.]])

In [14]:
x = torch.eye(5, 3) # 5x3 matrix with ones on the main diagonal
x

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 0.],
        [0., 0., 0.]])

### torch.full(): Creates a tensor filled with a specific value.

In [15]:
x = torch.full((2, 2), 3.14159) # A 2x2 tensor of Pi
x

tensor([[3.1416, 3.1416],
        [3.1416, 3.1416]])

In [16]:
x = torch.full((4, 3), 3.14159) # A 4x3 tensor of Pi
x

tensor([[3.1416, 3.1416, 3.1416],
        [3.1416, 3.1416, 3.1416],
        [3.1416, 3.1416, 3.1416],
        [3.1416, 3.1416, 3.1416]])

### torch.from_numpy(): Converts a NumPy array into a PyTorch tensor.

In [17]:
import numpy as np
n = np.array([1, 2, 3])
t = torch.from_numpy(n)
t

tensor([1, 2, 3])
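torch.from_numpy() does not copy the data: the tensor and the CPU NumPy array share one block of memory, so an in-place change to one is visible in the other. The sketch below contrasts it with torch.tensor(), which always copies; the variable names are illustrative.

```python
import numpy as np
import torch

n = np.array([1, 2, 3])
t = torch.from_numpy(n)    # t shares n's memory; no copy is made
t_copy = torch.tensor(n)   # torch.tensor() always copies the data

n[0] = 100                 # modify the NumPy array in place
print(t)                   # the shared tensor sees the change
print(t_copy)              # the copy does not
```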

### torch.zeros_like(): Creates a tensor of 0s with the same shape as another.

In [18]:
y = torch.zeros_like(x) # x is any existing tensor
y

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

### torch.ones_like(): Creates a tensor of 1s with the same shape as another.

In [19]:
y = torch.ones_like(x)
y

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

### dtype: Concept of data types (e.g., torch.float32, torch.int64).

In [20]:
x = torch.ones(2, 2, dtype=torch.float16) # Saves memory compared to float32
x

tensor([[1., 1.],
        [1., 1.]], dtype=torch.float16)
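When no dtype is given, PyTorch infers one from the data. A minimal sketch, assuming the default floating-point dtype has not been changed with torch.set_default_dtype():

```python
import torch

ints = torch.tensor([1, 2, 3])     # Python ints   -> torch.int64
floats = torch.tensor([1.0, 2.0])  # Python floats -> torch.float32 (the default)
print(ints.dtype, floats.dtype)

# Convert with .to(dtype) or shorthand methods such as .float() / .half()
as_float = ints.to(torch.float32)
as_half = floats.half()            # same as floats.to(torch.float16)
print(as_float.dtype, as_half.dtype)

# float16 stores 2 bytes per element versus 4 for float32
print(as_half.element_size(), floats.element_size())
```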

### device: Concept of where the tensor lives (cpu, cuda, or mps).

In [22]:
# Moving a tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.rand(3, 3).to(device)
x

tensor([[0.9810, 0.5945, 0.9302],
        [0.4438, 0.0609, 0.8142],
        [0.7583, 0.6873, 0.9753]])

### requires_grad: A flag that enables gradient tracking for a tensor.

- If set to True, PyTorch records the operations performed on this tensor so that it can compute gradients with respect to it via automatic differentiation.

In [25]:
# Weights in a model need this to be True to learn!
weights = torch.randn(10, 5, requires_grad=True) # Default is False
weights

tensor([[ 0.6116,  0.8640,  0.6185, -0.6311,  0.3483],
        [ 1.5945,  0.4967,  0.2957, -1.7618,  0.9806],
        [ 0.3773,  0.7827, -0.9259,  0.1817,  0.5624],
        [-0.9019,  0.5726,  0.3483, -1.5594, -0.7715],
        [ 1.6089,  1.4334, -0.0570,  0.2737, -0.0631],
        [-0.1730, -1.1494,  0.1716,  0.6753,  0.7992],
        [ 0.8921, -0.8448,  0.9771,  0.8726,  1.2721],
        [ 0.1113, -1.3985, -0.1056, -1.3583, -1.9216],
        [-0.7093, -0.1859,  1.5060, -0.5990,  1.0691],
        [-0.5215, -1.1779,  0.5231,  0.0641, -0.7135]], requires_grad=True)

## Tensor Shaping & Manipulation

### view(): 
- Returns a new tensor with the same data but a different shape. The new shape must have the same number of elements as the original.

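- Constraint: The new shape must be compatible with the tensor's existing memory layout. In practice this usually means the tensor must be "contiguous" (data stored in a single block of memory in row-major order); otherwise view() raises a RuntimeError.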

In [27]:
x = torch.randn(4, 4) # 16 elements
print("Original shape:", x.shape)
y = x.view(2, 8)      # Reshaped to 2x8
print("Reshaped to 2x8:", y.shape)
z = x.view(-1, 2)     # -1 tells PyTorch to "figure out" the dimension (results in 8x2)
print("Reshaped to 8x2:", z.shape)

Original shape: torch.Size([4, 4])
Reshaped to 2x8: torch.Size([2, 8])
Reshaped to 8x2: torch.Size([8, 2])
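To see the contiguity constraint in action: a transposed tensor shares storage with the original but has rearranged strides, so view() cannot flatten it. The sketch below, using illustrative names, shows the error and two ways around it.

```python
import torch

x = torch.arange(6).view(2, 3)   # contiguous 2x3 tensor: [[0, 1, 2], [3, 4, 5]]
xt = x.t()                       # transpose: same storage, swapped strides
print(xt.is_contiguous())        # False

view_failed = False
try:
    xt.view(6)                   # the strides cannot be merged into one dimension
except RuntimeError:
    view_failed = True
    print("view() raised a RuntimeError")

flat = xt.contiguous().view(6)   # .contiguous() first copies into row-major order
print(flat)                      # tensor([0, 3, 1, 4, 2, 5])
flat2 = xt.reshape(6)            # reshape() performs that copy automatically
print(torch.equal(flat, flat2))  # True
```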


### **reshape():** 
- Similar to view, but more robust: it returns a view when the memory layout allows it and otherwise copies the data into a new block of memory automatically.

In [29]:
x = torch.randn(3, 4)
print("Original shape:", x.shape)
y = x.reshape(2, 6)  # Reshaped to 2x6
print("Reshaped to 2x6:", y.shape)

Original shape: torch.Size([3, 4])
Reshaped to 2x6: torch.Size([2, 6])

### **flatten():** 
- Collapses a range of dimensions into one. Often used to turn a 2D image into a 1D vector before a fully connected layer.

In [28]:
x = torch.randn(1, 28, 28)
print("Original shape:", x.shape)
y = x.flatten() # Shape: [784]
print("Flattened shape:", y.shape)
z = x.view(-1)  # Shape: [784]
print("Reshaped to 1D shape:", z.shape)

Original shape: torch.Size([1, 28, 28])
Flattened shape: torch.Size([784])
Reshaped to 1D shape: torch.Size([784])
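In a model the input usually carries a batch dimension, so flattening everything would mix samples together. A sketch assuming an MNIST-style batch of shape (16, 1, 28, 28):

```python
import torch

batch = torch.randn(16, 1, 28, 28)              # (batch, channels, height, width)
flat_all = batch.flatten()                      # collapses everything: shape [12544]
flat_per_sample = batch.flatten(start_dim=1)    # keeps the batch dim: shape [16, 784]
print(flat_all.shape, flat_per_sample.shape)

# torch.nn.Flatten uses start_dim=1 by default, which is why it is common in models
layer = torch.nn.Flatten()
print(layer(batch).shape)                       # torch.Size([16, 784])
```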


### **transpose():**
- Swaps exactly two dimensions.

In [30]:
x = torch.randn(3, 5)
y = x.transpose(0, 1) # Shape: [5, 3]
y

tensor([[ 0.3671,  1.2576, -0.6083],
        [ 1.2173, -0.3585,  0.1795],
        [ 1.2256,  0.4310,  0.6151],
        [-0.9671,  1.2287, -1.2524],
        [ 0.1616, -1.1140,  0.2838]])

In [33]:
x = torch.tensor([[1,2,3,4], [5,6,7,8], [9,10,11,12]])  # 3x4 tensor
print("Original tensor:\n", x)
y = x.t()  # .t() is shorthand for transpose(0, 1) on 2D tensors -> 4x3
y

Original tensor:
 tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])


tensor([[ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11],
        [ 4,  8, 12]])

### **permute():** 
- A more powerful version of transpose that can reorder any number of dimensions at once.

In [34]:
# Useful for converting (Batch, Channels, H, W) -> (Batch, H, W, Channels)
x = torch.randn(16, 3, 64, 64)
print("Original shape:", x.shape)
y = x.permute(0, 2, 3, 1) # Shape: [16, 64, 64, 3]
print("Permuted shape:", y.shape)

Original shape: torch.Size([16, 3, 64, 64])
Permuted shape: torch.Size([16, 64, 64, 3])


### **squeeze():** 
- Removes all dimensions of size 1.

In [42]:
x = torch.zeros(1, 5, 1, 3)
print("Original shape:", x.shape)
y = x.squeeze() # Shape: [5, 3]
print("Squeezed shape:", y.shape)

Original shape: torch.Size([1, 5, 1, 3])
Squeezed shape: torch.Size([5, 3])


In [43]:
x = torch.zeros(1, 2, 3, 4)
print("Original shape:", x.shape)
y = x.squeeze() # Shape: [2, 3, 4]; only dim 0 has size 1, so only it is removed
print("Squeezed shape:", y.shape)

Original shape: torch.Size([1, 2, 3, 4])
Squeezed shape: torch.Size([2, 3, 4])
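squeeze() also accepts a dim argument, in which case only that dimension is removed, and only if its size is 1. This is a safer choice when a batch of size 1 could otherwise be squeezed away by accident:

```python
import torch

x = torch.zeros(1, 5, 1, 3)
s0 = x.squeeze(0)   # only dim 0 removed      -> torch.Size([5, 1, 3])
s2 = x.squeeze(2)   # only dim 2 removed      -> torch.Size([1, 5, 3])
s1 = x.squeeze(1)   # dim 1 has size 5: no-op -> torch.Size([1, 5, 1, 3])
print(s0.shape, s2.shape, s1.shape)
```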


### **unsqueeze():** 
- Adds a dimension of size 1 at the specified index. This is essential for adding a "batch" dimension to a single input.

In [36]:
x = torch.randn(3, 224, 224) # A single RGB image
print("Original shape:", x.shape)
y = x.unsqueeze(0)           # Shape: [1, 3, 224, 224] (Ready for a model)
print("Unsqueezed shape:", y.shape)

Original shape: torch.Size([3, 224, 224])
Unsqueezed shape: torch.Size([1, 3, 224, 224])


### **torch.cat():** 
- Concatenates tensors along an existing dimension.

In [45]:
a = torch.randn(2, 3)
b = torch.randn(2, 3)
c = torch.cat((a, b), dim=0) # Shape: [4, 3]
print("Concatenated shape:", c.shape)

Concatenated shape: torch.Size([4, 3])


### **torch.stack():** 
- Joins tensors along a new dimension.

In [46]:
c = torch.stack((a, b), dim=0) # Shape: [2, 2, 3]
print("Stacked shape:", c.shape)

Stacked shape: torch.Size([2, 2, 3])
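The difference between the two is clearest along dim=1: cat() grows an existing dimension, while stack() inserts a new one. A short comparison:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)
catted = torch.cat((a, b), dim=1)     # existing dim 1 grows: torch.Size([2, 6])
stacked = torch.stack((a, b), dim=1)  # new dim at index 1:   torch.Size([2, 2, 3])
print(catted.shape, stacked.shape)

# stack() behaves like unsqueezing each input at dim, then concatenating along it
manual = torch.cat((a.unsqueeze(1), b.unsqueeze(1)), dim=1)
print(torch.equal(manual, stacked))   # True
```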


### **torch.split():** 
- Breaks a tensor into chunks of a specific size.

In [49]:
# Example: split tensor into chunks of size 2 along dim=1
x = torch.randn(4, 4)
c = torch.split(x, 2, dim=1)  # returns a tuple of tensors
print("Split tensors (size=2 along dim=1):", c)

# Example: split with a list of sizes
d = torch.split(x, [1, 3], dim=1)
print("Split tensors (sizes [1,3]):", d)

Split tensors (size=2 along dim=1): (tensor([[-1.2095,  0.0408],
        [-0.6972,  1.3195],
        [ 0.7223, -0.2128],
        [ 1.5339,  0.1758]]), tensor([[-1.0989, -0.3553],
        [-0.7979,  0.3025],
        [-0.0906,  0.8457],
        [-0.3977,  0.3002]]))
Split tensors (sizes [1,3]): (tensor([[-1.2095],
        [-0.6972],
        [ 0.7223],
        [ 1.5339]]), tensor([[ 0.0408, -1.0989, -0.3553],
        [ 1.3195, -0.7979,  0.3025],
        [-0.2128, -0.0906,  0.8457],
        [ 0.1758, -0.3977,  0.3002]]))


In [50]:
# torch.chunk(): Splits a tensor into a specific number of chunks along a dimension
x = torch.randn(4, 6)
print("Original shape:", x.shape)
chunks = torch.chunk(x, chunks=2, dim=0)  # Split into 2 chunks along dimension 0
print("Number of chunks:", len(chunks))
print("Shape of each chunk:", [chunk.shape for chunk in chunks])

Original shape: torch.Size([4, 6])
Number of chunks: 2
Shape of each chunk: [torch.Size([2, 6]), torch.Size([2, 6])]


### **gather():** 
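- Collects values from a tensor along an axis using an index tensor: for dim=1, out[i][j] = input[i][index[i][j]], and the output has the same shape as index. (Common in reinforcement learning to select the values of the actions that were actually taken.)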

In [51]:
# Examples of torch.gather using tensors already defined in the notebook.

# 1) 2D gather along dim=1 (select columns per row) using 'a' (2x3)
idx1 = torch.tensor([[0, 2, 1],
                     [1, 0, 2]], dtype=torch.long)
res1 = torch.gather(a, dim=1, index=idx1)
print("a:\n", a)
print("idx1:\n", idx1)
print("gather(a, dim=1, idx1):\n", res1)

# 2) gather along dim=0 (select rows per column) using 'b' (2x3)
idx2 = torch.tensor([[1, 0, 1],
                     [0, 1, 0]], dtype=torch.long)
res2 = torch.gather(b, dim=0, index=idx2)
print("\nb:\n", b)
print("idx2:\n", idx2)
print("gather(b, dim=0, idx2):\n", res2)

# 3) Select one column per row from x (4x6)
idx3 = torch.randint(0, x.size(1), (x.size(0), 1), dtype=torch.long)  # shape [4,1]
res3 = torch.gather(x, dim=1, index=idx3)  # shape [4,1]
print("\nx:\n", x)
print("idx3 (one index per row):\n", idx3)
print("gather(x, dim=1, idx3):\n", res3)

# 4) RL-style: select action-values from 'weights' (10x5) given action indices
actions = torch.tensor([0, 1, 2, 3, 4, 0, 1, 2, 3, 4], dtype=torch.long).unsqueeze(1)  # [10,1]
selected_values = torch.gather(weights, dim=1, index=actions)  # [10,1]
print("\nSelected action values from weights:\n", selected_values)

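# 5) 4D gather: pick one element from the last dim of a 1x2x3x4 tensor 'y' at each position
y = torch.randn(1, 2, 3, 4)  # defined here so this example does not depend on earlier cells
idx_y = torch.randint(0, y.size(-1), (y.size(0), y.size(1), y.size(2), 1), dtype=torch.long)  # [...,1]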
res_y = torch.gather(y, dim=3, index=idx_y)
print("\ny.shape:", y.shape)
print("idx_y.shape:", idx_y.shape)
print("gather(y, dim=3, idx_y).shape:", res_y.shape)

a:
 tensor([[-1.8226, -1.2664, -1.1982],
        [-0.6893,  1.5571, -0.2017]])
idx1:
 tensor([[0, 2, 1],
        [1, 0, 2]])
gather(a, dim=1, idx1):
 tensor([[-1.8226, -1.1982, -1.2664],
        [ 1.5571, -0.6893, -0.2017]])

b:
 tensor([[-1.4220,  0.7331, -1.4626],
        [ 0.2823, -1.3231, -1.4902]])
idx2:
 tensor([[1, 0, 1],
        [0, 1, 0]])
gather(b, dim=0, idx2):
 tensor([[ 0.2823,  0.7331, -1.4902],
        [-1.4220, -1.3231, -1.4626]])

x:
 tensor([[ 0.5097, -0.5562,  0.1617, -0.4310, -0.4496, -0.2203],
        [-0.0255,  0.5188, -0.6625, -0.6019,  0.1647,  0.8892],
        [ 1.1193, -0.9448,  0.5424, -0.4241, -0.0626, -1.5036],
        [ 0.2396, -0.5232, -0.5933, -0.1551, -3.1055, -0.5005]])
idx3 (one index per row):
 tensor([[3],
        [2],
        [0],
        [4]])
gather(x, dim=1, idx3):
 tensor([[-0.4310],
        [-0.6625],
        [ 1.1193],
        [-3.1055]])

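Selected action values from weights:
 tensor([[ 0.6116],
        [ 0.4967],
        [-0.9259],
        [-1.5594],
        [-0.0631],
        [-0.1730],
        [-0.8448],
        [-0.1056],
        [-0.5990],
        [-0.7135]], grad_fn=<GatherBackward0>)

y.shape: torch.Size([1, 2, 3, 4])
idx_y.shape: torch.Size([1, 2, 3, 1])
gather(y, dim=3, idx_y).shape: torch.Size([1, 2, 3, 1])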

### **scatter_():** 
- The inverse of gather: it writes values from src into the tensor at the specified indices (for dim=1, self[i][index[i][j]] = src[i][j]). The trailing _ means the operation happens in place.

In [52]:
# 1) Basic scatter_: place src values into target according to idx1 (shape [2,3])
target = torch.zeros_like(a)
src = torch.tensor([[10., 20., 30.],
                    [40., 50., 60.]], dtype=target.dtype)
target.scatter_(dim=1, index=idx1, src=src)
print("scatter_ basic ->\n", target)

# 2) scatter_add_: accumulate values at indices (useful when indices repeat)
t2 = torch.zeros(2, 3, dtype=a.dtype)
src2 = torch.tensor([[1., 1., 1.],
                     [1., 1., 1.]], dtype=t2.dtype)
t2.scatter_add_(dim=1, index=idx1, src=src2)
print("scatter_add_ ->\n", t2)

# 3) One-hot encoding from actions (common RL pattern)
one_hot = torch.zeros(actions.size(0), weights.size(1), dtype=weights.dtype, device=weights.device)
one_hot.scatter_(1, actions, 1.0)
print("one_hot shape:", one_hot.shape)
print(one_hot[:5])

# 4) Scatter into 4D tensor using idx_y (inverse-style of the gather example)
target_y = torch.zeros_like(y)
src_y = torch.ones_like(idx_y, dtype=y.dtype)
target_y.scatter_(3, idx_y, src_y)  # write 1.0 at positions specified in idx_y along last dim
print("target_y.sum() (number of scattered ones):", target_y.sum().item())

scatter_ basic ->
 tensor([[10., 30., 20.],
        [50., 40., 60.]])
scatter_add_ ->
 tensor([[1., 1., 1.],
        [1., 1., 1.]])
one_hot shape: torch.Size([10, 5])
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
target_y.sum() (number of scattered ones): 6.0
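Because idx1 contains no repeated indices, the scatter_add_ output above is just ones. The sketch below uses repeated indices (hypothetical class labels) to show where scatter_add_ differs from scatter_. All source values are 1 here, so the scatter_ result does not depend on which duplicate write wins.

```python
import torch

labels = torch.tensor([0, 2, 2, 4, 2, 0])  # hypothetical labels with repeats

# scatter_add_ accumulates: +1 at each label position, i.e. a per-class count
counts = torch.zeros(5).scatter_add_(0, labels, torch.ones(6))
print(counts)     # tensor([2., 0., 3., 0., 1.])

# scatter_ overwrites: repeated indices just write 1.0 again
overwrite = torch.zeros(5).scatter_(0, labels, torch.ones(6))
print(overwrite)  # tensor([1., 0., 1., 0., 1.])
```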


### **expand():** 
- Efficiently makes a tensor look larger by "repeating" it without actually copying the data in memory.

In [55]:
x = torch.tensor([[1], [2]]) # Shape: [2, 1]
y = x.expand(2, 4)           # Shape: [2, 4], but uses no extra memory!
y

tensor([[1, 1, 1, 1],
        [2, 2, 2, 2]])

### **repeat():** 
- Unlike expand, this physically copies the data to fill the new shape.

In [53]:
x = torch.tensor([[1], [2]]) # Shape: [2, 1]
y = x.repeat(1, 4) # Shape: [2, 4], but creates a new, larger memory block.
y

tensor([[1, 1, 1, 1],
        [2, 2, 2, 2]])
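One way to confirm that expand() shares memory while repeat() copies is to inspect strides and data pointers. A minimal sketch:

```python
import torch

x = torch.tensor([[1], [2]])  # shape [2, 1]
e = x.expand(2, 4)
r = x.repeat(1, 4)

print(e.stride())                    # (1, 0): stride 0 means each row's value is reused, not copied
print(r.stride())                    # (4, 1): an ordinary contiguous layout
print(e.data_ptr() == x.data_ptr())  # True: expand shares x's memory
print(r.data_ptr() == x.data_ptr())  # False: repeat allocated new memory
print(torch.equal(e, r))             # True: same values either way
```

Because an expanded tensor maps many positions onto the same memory, writing into it in place is unsafe; call .clone() (or use repeat()) first if you need to modify the result.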

## Mathematical Operations

### add() / +: Adds two tensors or a tensor and a scalar.

### sub() / -: Subtracts one tensor from another.

### mul() / *: Performs element-wise multiplication (Hadamard product).

### div() / /: Performs element-wise division.

In [57]:
a = torch.tensor([[1,2,3,4], [5,6,7,8], [9,10,11,12]])  # 3x4 tensor
b = torch.tensor([[13,14,15,16], [17,18,19,20], [21,22,23,24]])  # 3x4 tensor
c = a + b # Element-wise addition
print(c)

tensor([[14, 16, 18, 20],
        [22, 24, 26, 28],
        [30, 32, 34, 36]])


In [58]:
c = torch.add(a, b) # Element-wise addition
print(c)

tensor([[14, 16, 18, 20],
        [22, 24, 26, 28],
        [30, 32, 34, 36]])


In [59]:
c = a - b # Element-wise subtraction
print(c)

tensor([[-12, -12, -12, -12],
        [-12, -12, -12, -12],
        [-12, -12, -12, -12]])


In [60]:
c = torch.sub(b, a) # Element-wise subtraction
print(c)

tensor([[12, 12, 12, 12],
        [12, 12, 12, 12],
        [12, 12, 12, 12]])


In [56]:
a = torch.tensor([1, 2])
b = torch.tensor([3, 4])
print(a * b) # tensor([3, 8])

tensor([3, 8])


In [61]:
a = torch.tensor([1, 2])
b = torch.tensor([3, 4])
print(torch.div(a, b)) # True division: integer inputs give a float result

tensor([0.3333, 0.5000])
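All four operations also broadcast: shapes are compared from the trailing dimension, and a dimension of size 1 (or a missing leading dimension) is stretched to match. A short sketch:

```python
import torch

a = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])  # shape [3, 4]
plus_ten = a + 10                          # scalar applied to every element

row = torch.tensor([1, 0, 1, 0])           # shape [4]: stretched across the 3 rows
masked = a * row                           # zeroes out columns 1 and 3

col = torch.tensor([[100], [200], [300]])  # shape [3, 1]: stretched across 4 columns
shifted = a + col                          # adds 100 to row 0, 200 to row 1, 300 to row 2
print(plus_ten, masked, shifted, sep="\n")
```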


### matmul(): 
- The generic matrix multiplication function. It handles 1D, 2D, and high-dimensional tensors with broadcasting.



In [63]:
# Matrix (2x3) @ Matrix (3x2) -> (2x2)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 2)
res = torch.matmul(mat1, mat2) # Or mat1 @ mat2
print("Matrix multiplication result shape:", res.shape)

Matrix multiplication result shape: torch.Size([2, 2])


In [None]:
mat1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
mat2 = torch.tensor([[7, 8], [9, 10], [11, 12]])
res = torch.matmul(mat1, mat2) # Shape: [2, 2]
res

tensor([[ 58,  64],
        [139, 154]])

In [69]:
mat1 = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]) # Shape: [4, 3]
mat2 = torch.tensor([[9, 8, 7], [6, 5, 4], [3, 2, 1]]) # Shape: [3, 3]
mat3 = torch.matmul(mat1, mat2) # Shape: [4, 3]
mat3

tensor([[ 30,  24,  18],
        [ 84,  69,  54],
        [138, 114,  90],
        [192, 159, 126]])

In [71]:
mat1 = torch.randn(4, 3)
mat2 = torch.randn(3, 5)
mat3 = torch.matmul(mat1, mat2) # Shape: [4, 5]
mat3

tensor([[ 3.8558,  1.7894, -2.6648, -3.5824,  5.1454],
        [-2.0461, -1.6274,  0.8376,  1.0412, -1.2940],
        [-3.8472, -0.6393,  0.0744,  3.2738, -0.8600],
        [ 1.1481,  0.1155, -0.9582, -1.5022,  2.0582]])
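The examples above are all 2D. matmul() also covers the 1D and batched cases mentioned earlier; a sketch of the resulting shapes:

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([4.0, 5.0, 6.0])
dot = torch.matmul(v, w)           # 1D @ 1D -> dot product (a 0-dim tensor)
print(dot)                         # tensor(32.)

M = torch.randn(2, 3)
mv = torch.matmul(M, v)            # 2D @ 1D -> matrix-vector product, shape [2]

batch = torch.randn(8, 4, 3)       # a batch of 8 matrices, each 4x3
shared = torch.randn(3, 5)         # a single 3x5 matrix, broadcast over the batch
bm = torch.matmul(batch, shared)   # shape [8, 4, 5]
print(mv.shape, bm.shape)
```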

### mm(): 
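- A stricter version of matrix multiplication that only accepts two 2D tensors (matrices) and does not broadcast. Use it when you want an error if your inputs are not 2D; for 2D inputs, matmul() computes the same product, so any speed difference is typically negligible.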

In [73]:
mat1 = torch.randn(2, 3) # 2D tensor
mat2 = torch.randn(3, 2) # 2D tensor
res = torch.mm(mat1, mat2) # 2D Matrix multiplication
res

tensor([[-1.9101, -0.2616],
        [ 0.7204,  0.8979]])

### bmm(): 
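- Batch Matrix Multiplication. Used when you have a batch of matrices and want to multiply them pairwise without a loop: inputs of shape $[B, N, M]$ and $[B, M, P]$ produce an output of shape $[B, N, P]$. Both inputs must be 3D with the same batch size $B$; unlike matmul(), bmm() does not broadcast.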

In [74]:
mat1 = torch.randn(2, 4, 3)
mat2 = torch.randn(2, 3, 5)
res = torch.bmm(mat1, mat2) # Shape: [2, 4, 5]
res

tensor([[[-2.4323, -1.5249, -0.9572, -1.9377,  0.2226],
         [-1.5481, -0.7953, -0.8928, -1.3014,  0.0399],
         [-0.4902,  0.5933,  0.1729,  0.4578, -0.4212],
         [-0.4300, -0.9472, -0.3385, -0.9112,  0.3932]],

        [[-2.7766,  2.9747, -1.2182,  4.4545, -2.1316],
         [-0.5441,  1.1734,  0.2189,  1.1327, -0.2883],
         [-0.3703,  0.5993,  0.7820,  0.2181, -0.4616],
         [ 0.2117, -0.0614, -0.4298,  0.1176,  0.3820]]])
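bmm() computes the same thing as multiplying each pair of matrices in a Python loop, just in one call. A quick check (the atol value is a choice made to absorb floating-point rounding differences between kernels):

```python
import torch

A = torch.randn(2, 4, 3)
B = torch.randn(2, 3, 5)
batched = torch.bmm(A, B)                                     # shape [2, 4, 5]
looped = torch.stack([torch.mm(A[i], B[i]) for i in range(A.size(0))])
print(torch.allclose(batched, looped, atol=1e-6))             # True
```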

### sum(), mean(), std(), var(): 
- Computes the sum, average, standard deviation, and variance. You can compute these across the whole tensor or along a specific dim.



In [77]:
a = torch.randn(4, 5)
a

tensor([[ 0.0703,  0.6337, -0.9541, -0.9055,  1.9880],
        [-0.9852, -1.6449, -0.5266, -0.1445, -1.8555],
        [ 1.4578, -0.6934,  0.8816, -0.2342,  1.0401],
        [ 0.0774,  1.9772, -0.8595,  0.5405,  0.8523]])

In [78]:
# SUM
print("a.sum():", a.sum())                          # sum all elements of 2D tensor 'a'
print("torch.sum(a, dtype=torch.float32):", torch.sum(a, dtype=torch.float32))  # specify dtype
print("a.sum(dim=1):", a.sum(dim=1))                # sum along columns -> per-row sums
print("res.sum(dim=(0,1)):", res.sum(dim=(0, 1)))   # sum over first two dims of 3D tensor -> shape [5]

# MEAN
print("mat3.mean():", mat3.mean())                  # global mean
print("mat3.mean(dim=1, keepdim=True):", mat3.mean(dim=1, keepdim=True))  # mean per-row, keepdim

# STD / VAR
print("z.std():", z.std())                          # standard deviation (unbiased=True by default)
print("z.std(unbiased=False):", z.std(unbiased=False))  # population std
print("z.var():", z.var())                          # variance (unbiased=True default)
print("z.var(unbiased=False):", z.var(unbiased=False))  # population variance

# Useful aggregated examples
print("weights.mean(dim=1):", weights.mean(dim=1))  # per-row mean for a 2D tensor with requires_grad
print("torch.from_numpy(n).sum():", torch.from_numpy(n).sum())  # convert numpy array 'n' and sum

a.sum(): tensor(0.7154)
torch.sum(a, dtype=torch.float32): tensor(0.7154)
a.sum(dim=1): tensor([ 0.8324, -5.1568,  2.4518,  2.5879])
res.sum(dim=(0,1)): tensor([-8.3798,  2.0119, -2.6627,  2.2303, -2.2650])
mat3.mean(): tensor(0.0159)
mat3.mean(dim=1, keepdim=True): tensor([[ 0.9087],
        [-0.6177],
        [-0.3997],
        [ 0.1723]])
z.std(): tensor(1.0034)
z.std(unbiased=False): tensor(1.0027)
z.var(): tensor(1.0067)
z.var(unbiased=False): tensor(1.0054)
weights.mean(dim=1): tensor([ 0.3623,  0.3211,  0.1957, -0.4624,  0.6392,  0.0647,  0.6338, -0.9345,
         0.2162, -0.3652], grad_fn=<MeanBackward1>)
torch.from_numpy(n).sum(): tensor(6)


### max() / min(): 
- Returns the maximum or minimum values.


In [79]:
print("Maximum and Minimum", a.max(), a.min())

Maximum and Minimum tensor(1.9880) tensor(-1.8555)
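Called with a dim argument, max() and min() return both the values and their indices as a named tuple, and keepdim=True keeps the reduced dimension with size 1. A sketch with illustrative scores:

```python
import torch

scores = torch.tensor([[0.1, 0.9, 0.3],
                       [0.8, 0.2, 0.5]])
values, indices = scores.max(dim=1)   # named tuple: (values, indices)
print(values)    # tensor([0.9000, 0.8000])
print(indices)   # tensor([1, 0])

kept = scores.max(dim=1, keepdim=True).values  # shape [2, 1] instead of [2]
print(kept.shape)                              # torch.Size([2, 1])
print(scores - kept)  # broadcasts row-wise because the reduced dim was kept
```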


### argmax() / argmin(): 
- Returns the indices of the maximum or minimum values. This is how you determine the "predicted class" in classification.

- Note on keepdim: When reducing, the dimension usually disappears. Setting keepdim=True maintains the original number of dimensions, which is helpful for broadcasting later.

In [81]:
probs = torch.tensor([0.1, 0.7, 0.2])
prediction = torch.argmax(probs) # tensor(1)
prediction

tensor(1)

In [84]:
# argmax() example with a 2D tensor
print("a:\n", a)
predictions = torch.argmax(a, dim=1)  # Get index of max value per row
print("argmax along dim=1:", predictions)

a:
 tensor([[ 0.0703,  0.6337, -0.9541, -0.9055,  1.9880],
        [-0.9852, -1.6449, -0.5266, -0.1445, -1.8555],
        [ 1.4578, -0.6934,  0.8816, -0.2342,  1.0401],
        [ 0.0774,  1.9772, -0.8595,  0.5405,  0.8523]])
argmax along dim=1: tensor([4, 3, 0, 1])


In [None]:
probs = torch.tensor([0.1, 0.7, 0.2])
prediction = torch.argmin(probs) # tensor(0)
prediction

tensor(0)

### abs(): Computes the absolute value element-wise.

In [88]:
x = torch.tensor([[-1, -2, -3, -4, -5], [6, 7, 8, 9, 10]])
print("Absolute value element-wise:", torch.abs(x))

Absolute value element-wise: tensor([[ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10]])


In [87]:
x = torch.randn(2,3)
print("Original x:\n", x)
print("Absolute value element-wise:", torch.abs(x))

Original x:
 tensor([[ 1.2701, -1.1278,  0.4384],
        [ 0.3268,  1.7379,  0.7190]])
Absolute value element-wise: tensor([[1.2701, 1.1278, 0.4384],
        [0.3268, 1.7379, 0.7190]])


### exp(): Computes the exponential $e^x$.

In [89]:
x = torch.randn(2, 3) # Random values from a standard normal distribution
print("Original x:\n", x)
print("Exponential value: ", torch.exp(x))

Original x:
 tensor([[-1.0309,  0.6312,  1.8066],
        [ 0.6729, -0.4591,  1.7287]])
Exponential value:  tensor([[0.3567, 1.8799, 6.0897],
        [1.9599, 0.6319, 5.6331]])


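### log(): Computes the natural logarithm $\ln(x)$.

- Defined only for positive inputs: negative values produce nan (as in the output below) and 0 produces -inf.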

In [90]:
x = torch.randn(2,3)
print(x)
print("Natural logarithm: ", torch.log(x))

tensor([[ 0.5247, -0.7139, -0.5295],
        [-1.5583,  0.4529,  0.7406]])
Natural logarithm:  tensor([[-0.6450,     nan,     nan],
        [    nan, -0.7920, -0.3003]])


### pow(): Computes the power $x^y$.

In [91]:
x = torch.tensor([1, 2, 3])
print(torch.pow(x, 2)) # tensor([1, 4, 9])

tensor([1, 4, 9])


In [92]:
x = torch.tensor([1, 2, 3])
print(torch.pow(x, 3)) # tensor([1, 8, 27])

tensor([ 1,  8, 27])


### norm(): 
- Computes the vector or matrix norm.
    - L2 Norm (Euclidean): $\sqrt{\sum |x_i|^2}$ — the default.
    - L1 Norm (Manhattan): $\sum |x_i|$ — often used for regularization to encourage sparsity.

In [93]:
v = torch.tensor([3.0, 4.0])
print(torch.norm(v, p=2)) # tensor(5.0)

tensor(5.)


In [94]:
v = torch.tensor([1.0, -1.0, 2.0, -2.0])
print(torch.norm(v, p=1)) # tensor(6.0)

tensor(6.)


In [96]:
v = torch.randn(3, 4)
print(v)
print(torch.norm(v, p='fro')) # Frobenius norm for 2D tensor

tensor([[-0.3894, -1.4843,  1.1277, -0.3767],
        [ 0.6728,  0.2587, -1.5491, -1.2233],
        [ 0.6190,  0.8564, -0.1544, -1.9031]])
tensor(3.5981)
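The PyTorch documentation marks torch.norm as deprecated in favor of torch.linalg.vector_norm and torch.linalg.matrix_norm. A sketch of the equivalent calls:

```python
import torch

v = torch.tensor([3.0, 4.0])
l2 = torch.linalg.vector_norm(v)          # ord=2 by default: tensor(5.)
l1 = torch.linalg.vector_norm(v, ord=1)   # |3| + |4| = tensor(7.)
print(l2, l1)

m = torch.randn(3, 4)
fro = torch.linalg.matrix_norm(m)         # Frobenius norm by default
print(torch.allclose(fro, torch.norm(m, p='fro')))  # True
print(torch.allclose(fro, m.pow(2).sum().sqrt()))   # matches the definition
```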


## Autograd
- It is the engine that automatically calculates derivatives (gradients) for your neural network.

### backward(): 
- This is the core method of autograd. When you call loss.backward(), PyTorch travels backward through the computational graph and accumulates, into .grad, the gradient of the loss with respect to every leaf tensor that has requires_grad=True. The loss must be a scalar unless you pass an explicit gradient argument.

### grad: 
- After calling .backward(), the calculated gradients are stored in this attribute of the tensor.

In [97]:
# If x is a weight, x.grad tells us: 
# "How much will the loss change if I move x slightly?"
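A minimal end-to-end example of backward() and .grad, using a single scalar in place of a model weight so the derivative can be checked by hand:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # a leaf tensor, standing in for a weight
y = x ** 2 + 3 * x                         # y = x^2 + 3x, so dy/dx = 2x + 3
y.backward()                               # fills in x.grad
print(x.grad)                              # tensor(7.) because 2*2 + 3 = 7
print(y.grad_fn)                           # the autograd node that produced y
```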

### grad_fn: 
- Every tensor created by an operation (like y = x + 2) has a grad_fn. This is a reference to the function that created it, allowing PyTorch to trace the path back to the inputs.

### zero_grad(): 
- Gradients in PyTorch accumulate (are added to .grad) by default. Before each new backward pass, clear them with optimizer.zero_grad() (or model.zero_grad()); otherwise the new gradients are added to those left over from previous training steps, and the parameter updates will be wrong.
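The accumulation is easy to see with a single scalar. The sketch below shows gradients adding up across two backward() calls, then the usual zero_grad(), backward(), step() order; the SGD optimizer and learning rate of 0.1 are arbitrary choices for illustration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
seen = []

for step in range(2):
    loss = 3 * w              # d(loss)/dw = 3
    loss.backward()
    seen.append(w.grad.item())
print(seen)                   # [3.0, 6.0]: the second call added to the first

w.grad.zero_()                # manual reset of one tensor's gradient
print(w.grad)                 # tensor(0.)

# Typical training-loop order with an optimizer
opt = torch.optim.SGD([w], lr=0.1)
opt.zero_grad()               # clear old gradients (recent versions set .grad to None)
loss = 3 * w
loss.backward()               # fresh gradient: 3.0
opt.step()                    # update: w = 1.0 - 0.1 * 3.0
print(w.item())               # approximately 0.7
```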

### Efficiency & Inference 
- When you are only using a model to make predictions (inference), you don't need gradients. Disabling gradient tracking saves memory and computation.

#### torch.no_grad(): 
- A context manager used during validation or testing. It tells PyTorch: "Don't track anything here; I'm not training."

In [99]:
with torch.no_grad():
    # prediction = model(input_data)
    print("Prediction from model, use for inference")

Prediction from model, use for inference


#### torch.enable_grad(): 
- The opposite of no_grad, used if you need to force-enable tracking within a block that was otherwise disabled.

#### torch.is_grad_enabled(): 
- A simple check to see if PyTorch is currently recording history for gradients.
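A small sketch of how the three interact: no_grad() switches tracking off, enable_grad() switches it back on inside that block, and torch.is_grad_enabled() reports the current state.

```python
import torch

w = torch.randn(3, requires_grad=True)

with torch.no_grad():
    out = w * 2
    inside_no_grad = (torch.is_grad_enabled(), out.requires_grad)
    print(inside_no_grad)              # (False, False)
    with torch.enable_grad():
        tracked = w * 2
        inside_enable = (torch.is_grad_enabled(), tracked.requires_grad)
        print(inside_enable)           # (True, True)

print(torch.is_grad_enabled())         # True again outside the blocks
```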

### Graph Surgery
- Sometimes you need to "break" the graph or save specific parts of it.

#### detach(): 
- Returns a new tensor that is "cut off" from the computational graph. It shares the same data but will never track gradients. This is common when you want to use a value for a side-calculation without affecting the model's training.

#### clone(): 
- Creates a copy of the tensor in new memory that stays in the computational graph: gradients that flow into the clone propagate back to the original tensor.

#### retain_grad(): 
- By default, PyTorch only saves gradients for "leaf" tensors (your weights). If you want to see the gradient for an intermediate activation (like the output of a hidden layer), you must call this on that tensor.
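The three tools side by side, on a hypothetical three-element tensor. A common idiom that follows from them is x.detach().clone(), which gives an independent copy with no gradient history.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

d = x.detach()                 # same data, cut from the graph, requires_grad=False
print(d.requires_grad, d.data_ptr() == x.data_ptr())        # False True

c = x.clone()                  # new memory, but still connected to x in the graph
print(c.grad_fn is not None, c.data_ptr() == x.data_ptr())  # True False

h = c * 3                      # intermediate (non-leaf) tensor
h.retain_grad()                # ask autograd to keep h.grad
loss = (h ** 2).sum()
loss.backward()
print(h.grad)                  # d(loss)/dh = 2h -> tensor([ 6., 12., 18.])
print(x.grad)                  # flows back through the clone: 2h * 3 -> tensor([18., 36., 54.])
```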