# Learning PyTorch
Here I document my process of learning PyTorch, a deep learning framework. The notes cover:
- **Tensors** – the core data structure in PyTorch
- **Autograd** – automatic differentiation for training
- **Building Models with nn.Module** – how to define neural network architectures
- **Common Layers & Activation Functions** – essential building blocks for networks
- **Loss Functions** – how to quantify model performance
- **Optimizers** – algorithms to update model parameters
- **Training Loop Structure** – putting it all together to train a model
- **Datasets & DataLoaders** – loading and batching data for training
- **Using GPUs** – leveraging CUDA for faster computation
- **Debugging & Common Pitfalls** – tips to avoid or fix common errors
## Tensors
**Tensors** are the fundamental data structure for storing and manipulating data in PyTorch. Every tensor has three key attributes: a **shape** (its dimensions), a **dtype** (data type, e.g. float32, int64), and a **device** (CPU or GPU) where it is stored.

In [1]:
import torch
import numpy as np

# 1. Directly from data (e.g. list or nested lists)
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data) # infers dtype automatically

# 2. From a NumPy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array) # shares memory with the NumPy array (no copy is made)

# 3. Using built-in initializers
x_ones = torch.ones_like(x_data) # tensor of ones with same shape as x_data
x_rand = torch.rand_like(x_data, dtype=torch.float32) # random values; dtype overridden because x_data is an integer tensor

# 4. With specific shapes and values
shape = (2, 3)
rand_tensor = torch.rand(shape) # random values in [0,1)
ones_tensor = torch.ones(shape) # all ones
zeros_tensor = torch.zeros(shape) # all zeros

tensor = torch.rand(3, 4)
print("Shape:", tensor.shape)
print("Datatype:", tensor.dtype)
print("Device:", tensor.device)

Shape: torch.Size([3, 4])
Datatype: torch.float32
Device: cpu
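
The difference between `torch.from_numpy` and `torch.tensor` is easy to miss, so here is a minimal sketch of it. The names `shared` and `copied` are made up for illustration, and the array dtype is fixed to `int64` so the result does not depend on the platform default.

```python
import numpy as np
import torch

# Fix the dtype explicitly so the example behaves the same on every platform.
np_array = np.array([[1, 2], [3, 4]], dtype=np.int64)

shared = torch.from_numpy(np_array)  # uses the same memory buffer as np_array
copied = torch.tensor(np_array)      # makes an independent copy of the data

# Modify the NumPy array in place.
np_array[0, 0] = 100

print(shared[0, 0].item())  # 100: the change is visible through the shared tensor
print(copied[0, 0].item())  # 1: the copy is unaffected
print(shared.dtype)         # torch.int64, carried over from the NumPy dtype
```

Sharing memory avoids a copy, but it also means an in-place change on either side affects the other, so `torch.tensor` is the safer choice when the two should stay independent.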


In [3]:
# Basic Operations

# Indexing and slicing
tensor = torch.ones(4, 4)
print(tensor[0]) # First row
print(tensor[:, 0]) # First column

# Elementwise operations
tensor = torch.tensor([[1.0, 2.0],[3.0, 4.0]])
tensor = tensor * 2 + 1
print(tensor)

# Matrix multiplication
A = torch.rand(2, 3)
B = torch.rand(3, 4)
C = A @ B # matrix product resulting in shape (2,4)
print(C)

tensor([1., 1., 1., 1.])
tensor([1., 1., 1., 1.])
tensor([[3., 5.],
        [7., 9.]])
tensor([[1.7984, 1.0619, 1.2744, 0.6247],
        [0.5072, 0.3412, 0.4018, 0.2094]])
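
Since the random matrices above make the matrix product hard to check by eye, the following sketch contrasts elementwise multiplication (`*`) with matrix multiplication (`@`) on small fixed values. The tensors `P` and `Q` are chosen here purely for illustration.

```python
import torch

P = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
Q = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

# Elementwise product: multiplies matching entries, shapes must be compatible.
elementwise = P * Q
print(elementwise)  # tensor([[ 5., 12.], [21., 32.]])

# Matrix product: rows of P dotted with columns of Q.
matmul = P @ Q
print(matmul)  # tensor([[19., 22.], [43., 50.]])

# `@` is shorthand for torch.matmul.
print(torch.equal(matmul, torch.matmul(P, Q)))  # True

# For a matrix product, the inner dimensions must match: (2, 3) @ (3, 4) -> (2, 4).
result_shape = (torch.rand(2, 3) @ torch.rand(3, 4)).shape
print(result_shape)  # torch.Size([2, 4])
```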


## Automatic Differentiation (Autograd)
Autograd frees you from computing gradients by hand. It records the operations performed on tensors to build a computational graph, then backpropagates gradients through that graph for you.

By default, tensors do not track gradients; you must opt in by creating a tensor with `requires_grad=True`, after which PyTorch tracks every operation on it. Calling `.backward()` on a scalar output computes the gradient of that output with respect to every tensor that has `requires_grad=True` and contributed to it. For tensors you created directly (leaf tensors), the gradient is then stored in the tensor's `.grad` attribute; intermediate results do not keep their gradients by default.


In [5]:
# Create a tensor and enable gradient tracking
x = torch.tensor([3.0], requires_grad=True) # a tensor with value 3.0
print(x.requires_grad) # True
# Define a simple function of x
y = x**2 + 2*x + 1 # y = x^2 + 2x + 1
# Compute gradient dy/dx by backpropagation
y.backward() # y is a scalar (1-element tensor), so we can call backward directly
print(x.grad) # dy/dx = 2*x + 2, which is 8 at x = 3.0

True
tensor([8.])
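
To see how gradients are computed only for tensors that opt in, here is a small sketch with several inputs. The names `w`, `b`, `x`, and `loss` are hypothetical and chosen to resemble a one-parameter linear model; only `w` and `b` are created with `requires_grad=True`.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)  # tracked
b = torch.tensor([1.0], requires_grad=True)  # tracked
x = torch.tensor([3.0])                      # not tracked (the default)

y = w * x + b            # y = 2*3 + 1 = 7
loss = (y ** 2).sum()    # loss = 49, reduced to a scalar

loss.backward()

# d(loss)/dw = 2*y*x = 2*7*3 = 42
print(w.grad)  # tensor([42.])
# d(loss)/db = 2*y = 14
print(b.grad)  # tensor([14.])
# x never asked for gradients, so nothing is stored for it.
print(x.grad)  # None
```

Only `w` and `b` receive a `.grad` value, which matches the usual setup where model parameters are tracked and input data is not.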
