# Tensor Basics Tutorial

## Introduction

This tutorial provides a comprehensive introduction to tensors, the fundamental data structure used in deep learning frameworks like PyTorch. Understanding tensors is crucial for working with neural networks and large language models.

### What You'll Learn
- What tensors are and how they differ from arrays
- Creating and manipulating tensors in PyTorch
- Tensor operations and broadcasting
- Memory management and performance considerations
- Practical applications in deep learning

In [None]:
# Import required libraries
import torch
import numpy as np
import matplotlib.pyplot as plt
import time

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name()}")

## 1. What Are Tensors?

Tensors are multi-dimensional arrays that generalize scalars, vectors, and matrices to higher dimensions. In the context of deep learning:

- **0D Tensor**: Scalar (single number)
- **1D Tensor**: Vector (list of numbers)
- **2D Tensor**: Matrix (table of numbers)
- **3D+ Tensor**: Higher-dimensional arrays

Tensors in PyTorch are similar to NumPy arrays but with additional capabilities for GPU acceleration and automatic differentiation.

In [None]:
# Creating tensors of different dimensions

# 0D Tensor (Scalar)
scalar = torch.tensor(5.0)
print(f"0D Tensor (Scalar): {scalar}")
print(f"Shape: {scalar.shape}, Dimensions: {scalar.dim()}")

# 1D Tensor (Vector)
vector = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(f"\n1D Tensor (Vector): {vector}")
print(f"Shape: {vector.shape}, Dimensions: {vector.dim()}")

# 2D Tensor (Matrix)
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(f"\n2D Tensor (Matrix):\n{matrix}")
print(f"Shape: {matrix.shape}, Dimensions: {matrix.dim()}")

# 3D Tensor
tensor_3d = torch.tensor([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])
print(f"\n3D Tensor:\n{tensor_3d}")
print(f"Shape: {tensor_3d.shape}, Dimensions: {tensor_3d.dim()}")
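
The two capabilities mentioned above, NumPy interoperability and automatic differentiation, can be illustrated with a short sketch. The variable names (`arr`, `shared`, `x`, `grad`) are chosen here for illustration only, and the example assumes a CPU tensor.

```python
import numpy as np
import torch

# torch.from_numpy shares memory with the source array (CPU only)
arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)
arr[0] = 10.0                      # the change is visible through the tensor
print(shared)

# Autograd: record operations on x, then compute d(sum(x^2))/dx = 2x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()
loss.backward()
grad = x.grad
print(grad)
```

Because `from_numpy` shares memory, a write to either object shows up in the other; `torch.tensor(arr)` would copy instead.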

## 2. Creating Tensors

PyTorch provides several ways to create tensors with different properties and initializations.

In [None]:
# Creating tensors with different initialization methods

# From Python lists
tensor_from_list = torch.tensor([1, 2, 3, 4])
print(f"From list: {tensor_from_list}")

# Zeros tensor
zeros_tensor = torch.zeros(3, 4)
print(f"\nZeros tensor:\n{zeros_tensor}")

# Ones tensor
ones_tensor = torch.ones(2, 3)
print(f"\nOnes tensor:\n{ones_tensor}")

# Random tensor
random_tensor = torch.rand(2, 3)
print(f"\nRandom tensor:\n{random_tensor}")

# Normal distribution tensor
normal_tensor = torch.randn(2, 3)
print(f"\nNormal distribution tensor:\n{normal_tensor}")

# Range tensor
range_tensor = torch.arange(0, 10, 2)
print(f"\nRange tensor: {range_tensor}")

# Linearly spaced tensor
linspace_tensor = torch.linspace(0, 1, 5)
print(f"\nLinearly spaced tensor: {linspace_tensor}")
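
A few details of these constructors are easy to miss, so the sketch below makes them explicit. It assumes the default floating-point dtype has not been changed from `float32`; the variable names are illustrative.

```python
import torch

# dtype is inferred from the Python values
ints = torch.tensor([1, 2, 3, 4])        # integers -> int64
floats = torch.tensor([1.0, 2.0])        # floats -> default dtype (float32)
print(ints.dtype, floats.dtype)

# arange excludes the end value; linspace includes it
evens = torch.arange(0, 10, 2)           # 0, 2, 4, 6, 8
steps = torch.linspace(0, 1, 5)          # 0, 0.25, 0.5, 0.75, 1
print(evens, steps)

# *_like constructors take shape, dtype and device from an existing tensor
template = torch.zeros(2, 3, dtype=torch.float64)
ones_like_template = torch.ones_like(template)
print(ones_like_template.dtype, ones_like_template.shape)
```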

## 3. Tensor Properties and Data Types

Tensors have various properties that define their behavior, including data types, devices, and memory layout.

In [None]:
# Examining tensor properties

tensor = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)

print(f"Tensor:\n{tensor}")
print(f"Data type: {tensor.dtype}")
print(f"Shape: {tensor.shape}")
print(f"Device: {tensor.device}")
print(f"Number of elements: {tensor.numel()}")
print(f"Memory layout: {tensor.layout}")

# Changing data types
int_tensor = tensor.to(torch.int64)
print(f"\nInteger tensor:\n{int_tensor}")
print(f"Data type: {int_tensor.dtype}")

# Moving to GPU if available
if torch.cuda.is_available():
    gpu_tensor = tensor.cuda()
    print(f"\nGPU tensor device: {gpu_tensor.device}")
else:
    print("\nCUDA not available, skipping GPU example")
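
As a complement to the cell above, here is a minimal sketch of a device-agnostic pattern together with two properties of dtype conversion that often surprise newcomers. The names `device`, `values`, and `as_int` are illustrative.

```python
import torch

# Pick a device once and reuse it (falls back to CPU without CUDA)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
values = torch.tensor([1.7, -1.7, 2.5], device=device)

# Casting float -> int truncates toward zero; it does not round
as_int = values.to(torch.int64)
print(as_int)

# Memory footprint in bytes = number of elements * bytes per element
half = values.to(torch.float16)
bytes_f32 = values.numel() * values.element_size()
bytes_f16 = half.numel() * half.element_size()
print(bytes_f32, bytes_f16)
```

Writing `tensor.to(device)` rather than `tensor.cuda()` lets the same code run unchanged on machines without a GPU.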

## 4. Tensor Operations

PyTorch provides a rich set of operations for manipulating tensors, from basic arithmetic to advanced linear algebra.

In [None]:
# Basic arithmetic operations

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(f"Tensor a: {a}")
print(f"Tensor b: {b}")

# Element-wise operations
print(f"\nAddition (a + b): {a + b}")
print(f"Subtraction (a - b): {a - b}")
print(f"Multiplication (a * b): {a * b}")
print(f"Division (a / b): {a / b}")

# Using PyTorch functions
print(f"\nUsing torch.add(): {torch.add(a, b)}")
print(f"Using torch.mul(): {torch.mul(a, b)}")

# In-place operations (modify tensor directly)
c = a.clone()
c.add_(b)  # In-place addition
print(f"\nIn-place addition result: {c}")

In [None]:
# Matrix operations

matrix_a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
matrix_b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

print(f"Matrix A:\n{matrix_a}")
print(f"Matrix B:\n{matrix_b}")

# Matrix multiplication
matmul_result = torch.matmul(matrix_a, matrix_b)
print(f"\nMatrix multiplication (A @ B):\n{matmul_result}")

# Element-wise multiplication
elementwise_result = matrix_a * matrix_b
print(f"\nElement-wise multiplication:\n{elementwise_result}")

# Transpose
transposed = matrix_a.T
print(f"\nTransposed matrix A:\n{transposed}")

# Determinant
det = torch.det(matrix_a)
print(f"\nDeterminant of A: {det}")
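
To make the difference between matrix and element-wise multiplication concrete, the sketch below works the 2x2 case by hand in the comments and checks it against PyTorch. The batched example at the end is an addition for illustration; its shapes are arbitrary.

```python
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

# Matrix product: each entry is a row of A dotted with a column of B
# row 0: [1*5 + 2*7, 1*6 + 2*8] = [19, 22]
# row 1: [3*5 + 4*7, 3*6 + 4*8] = [43, 50]
product = A @ B
same = torch.equal(product, torch.matmul(A, B))   # @ is shorthand for matmul

# 2x2 determinant: 1*4 - 2*3 = -2
det_A = torch.det(A)

# matmul treats leading dimensions as a batch: (10, 3, 4) @ (10, 4, 5) -> (10, 3, 5)
batch = torch.randn(10, 3, 4) @ torch.randn(10, 4, 5)
print(product, det_A, batch.shape)
```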

## 5. Broadcasting

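Broadcasting lets PyTorch apply element-wise operations to tensors of different shapes without copying data. Shapes are compared dimension by dimension, starting from the last one: two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1.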

In [None]:
# Broadcasting examples

# Adding a scalar to a matrix
matrix = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
scalar = 10.0

print(f"Matrix:\n{matrix}")
print(f"Scalar: {scalar}")
print(f"\nMatrix + scalar:\n{matrix + scalar}")

# Adding a vector to a matrix
vector = torch.tensor([1.0, 2.0, 3.0])
print(f"\nVector: {vector}")
print(f"\nMatrix + vector:\n{matrix + vector}")

# Broadcasting rules demonstration
a = torch.tensor([1.0, 2.0, 3.0])  # Shape: [3]
b = torch.tensor([[1.0], [2.0], [3.0]])  # Shape: [3, 1]

print(f"\nVector a shape: {a.shape}")
print(f"Column tensor b shape: {b.shape}")
print(f"\nBroadcasted addition result shape: {(a + b).shape}")
print(f"Broadcasted addition result:\n{a + b}")
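
The shape of a broadcast result can be predicted by aligning the two shapes from the right. The helper below, `broadcast_shape`, is a hypothetical function written for this explanation and is not part of PyTorch; the sketch compares it with `torch.broadcast_shapes`, which is assumed to be available (it exists in recent PyTorch releases).

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: shape of an element-wise op between two shapes."""
    result = []
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        # Walk from the last dimension; a missing dimension counts as 1
        da = shape_a[-i] if i <= len(shape_a) else 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))

print(broadcast_shape((3,), (3, 1)))                 # shapes of a and b above
print(tuple(torch.broadcast_shapes((3,), (3, 1))))   # PyTorch's own answer
print(broadcast_shape((2, 3), (3,)))                 # matrix + vector case
```

A pair such as `(2, 3)` and `(2,)` fails because the last dimensions, 3 and 2, are neither equal nor 1.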

## 6. Indexing and Slicing

Accessing and modifying specific elements or subsets of tensors is a common operation in deep learning.

In [None]:
# Indexing and slicing examples

tensor = torch.tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(f"Original tensor:\n{tensor}")
print(f"Shape: {tensor.shape}")

# Accessing elements
print(f"\nFirst element [0,0,0]: {tensor[0, 0, 0]}")
print(f"Element at [1,0,2]: {tensor[1, 0, 2]}")

# Slicing
print(f"\nFirst 'page':\n{tensor[0]}")
print(f"\nFirst row of each page:\n{tensor[:, 0]}")
print(f"\nFirst column of first page:\n{tensor[0, :, 0]}")

# Advanced indexing
indices = torch.tensor([0, 1])
print(f"\nUsing indices {indices}:\n{tensor[indices]}")

# Boolean indexing
mask = tensor > 5
print(f"\nBoolean mask:\n{mask}")
print(f"Elements > 5: {tensor[mask]}")
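
One point worth checking when modifying tensors through an index: basic indexing and slicing return a view of the original data, while indexing with an index tensor or a boolean mask returns a copy. A minimal sketch, with illustrative variable names:

```python
import torch

t = torch.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]

# Basic indexing returns a view: writing to it changes t
row = t[0]
row[0] = 100
view_changed_source = t[0, 0].item() == 100

# Indexing with a tensor returns a copy: writing to it leaves t alone
picked = t[torch.tensor([0])]
picked[0, 0] = -1
copy_left_source_alone = t[0, 0].item() == 100

# Assigning through a boolean mask does modify t in place
t[t > 3] = 0
print(t)
```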

## 7. Reshaping and View Operations

Reshaping tensors is essential for preparing data for neural networks and manipulating tensor dimensions.

In [None]:
# Reshaping and view operations

tensor = torch.arange(12)
print(f"Original tensor: {tensor}")
print(f"Shape: {tensor.shape}")

# Reshape to 3x4 matrix
reshaped = tensor.reshape(3, 4)
print(f"\nReshaped to 3x4:\n{reshaped}")

# Reshape to 2x2x3 tensor
reshaped_3d = tensor.reshape(2, 2, 3)
print(f"\nReshaped to 2x2x3:\n{reshaped_3d}")

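# Using view (never copies: shares the same data, and raises an error if the
# memory layout does not allow it; reshape falls back to a copy in that case)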
viewed = tensor.view(4, 3)
print(f"\nViewed as 4x3:\n{viewed}")

# Flatten
flattened = reshaped_3d.flatten()
print(f"\nFlattened: {flattened}")

# Squeeze and unsqueeze
single_dim = torch.tensor([[[1, 2, 3]]])
print(f"\nSingle dim tensor shape: {single_dim.shape}")
squeezed = single_dim.squeeze()
print(f"Squeezed shape: {squeezed.shape}")
unsqueezed = squeezed.unsqueeze(0).unsqueeze(0)
print(f"Unsqueezed shape: {unsqueezed.shape}")
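
The difference between `view` and `reshape` only shows up on non-contiguous tensors. The sketch below uses a transposed tensor to trigger it; the variable names are illustrative.

```python
import torch

t = torch.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]

# view shares storage with the original tensor
shares = t.view(3, 2).data_ptr() == t.data_ptr()

# A transpose is not contiguous, so it cannot be viewed as a flat tensor
tt = t.T
try:
    tt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape handles the same case by copying the data into a new layout
flat = tt.reshape(6)

# -1 asks PyTorch to infer one dimension from the number of elements
inferred = t.reshape(-1, 2)
print(shares, view_failed, flat, inferred.shape)
```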

## 8. Performance Considerations

Understanding memory management and performance optimization is crucial for working with large models.

In [None]:
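# Performance comparison: in-place vs out-of-place operations

def time_operation(func, *args, iterations=1000):
    """Return the average wall-clock time per call, in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(*args)
    end = time.perf_counter()
    return (end - start) / iterations

# Create large tensors for testing
large_a = torch.randn(1000, 1000)
large_b = torch.randn(1000, 1000)

# Out-of-place addition: allocates a new result tensor on every call
def out_of_place_add(a, b):
    return a + b

# In-place addition: writes the result into a's existing memory
def in_place_add(a, b):
    return a.add_(b)

# Clone once, outside the timed loop, so that large_a is left unmodified
# and the cost of the copy is not counted as part of the in-place operation
work_a = large_a.clone()

# Time both operations
out_of_place_time = time_operation(out_of_place_add, large_a, large_b)
in_place_time = time_operation(in_place_add, work_a, large_b)

print(f"Out-of-place addition average time: {out_of_place_time:.6f} seconds")
print(f"In-place addition average time: {in_place_time:.6f} seconds")
# The ratio depends on hardware and tensor size; the main benefit of
# in-place operations is avoiding an extra allocation, not raw speed
print(f"Out-of-place / in-place time ratio: {out_of_place_time / in_place_time:.2f}x")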

# Memory contiguous example
transposed = large_a.T
print(f"\nTransposed tensor is contiguous: {transposed.is_contiguous()}")

contiguous = transposed.contiguous()
print(f"Contiguous version is contiguous: {contiguous.is_contiguous()}")
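
Contiguity is easier to reason about through strides, the number of elements to skip in memory to advance one step along each dimension. A small sketch on a 3x4 tensor (chosen here for readability rather than taken from the cell above):

```python
import torch

a = torch.zeros(3, 4)
at = a.T                 # transpose: same storage, swapped strides
ac = at.contiguous()     # copy into a fresh row-major layout

# Row-major (3, 4): moving one row skips 4 elements, one column skips 1
print(a.stride(), at.stride(), ac.stride())

shares_storage = at.data_ptr() == a.data_ptr()   # transpose copies nothing
copied = ac.data_ptr() != a.data_ptr()           # contiguous() allocated new memory
print(shares_storage, copied)
```

Calling `.contiguous()` on a tensor that is already contiguous returns the tensor itself without copying.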

## 9. Practical Applications in Deep Learning

Let's see how tensors are used in practical deep learning scenarios.

In [None]:
# Example: Simple neural network forward pass

# Create sample data (batch_size=32, features=784 - like MNIST)
batch_size, input_features, hidden_features, output_features = 32, 784, 128, 10
x = torch.randn(batch_size, input_features)

# Initialize weights and biases
w1 = torch.randn(input_features, hidden_features)
b1 = torch.randn(hidden_features)
w2 = torch.randn(hidden_features, output_features)
b2 = torch.randn(output_features)

print(f"Input shape: {x.shape}")
print(f"First layer weights shape: {w1.shape}")
print(f"Second layer weights shape: {w2.shape}")

# Forward pass
h = torch.matmul(x, w1) + b1  # Linear transformation
h = torch.relu(h)  # Activation function
y = torch.matmul(h, w2) + b2  # Output layer

print(f"\nHidden layer shape: {h.shape}")
print(f"Output shape: {y.shape}")

# Softmax for probability distribution
probabilities = torch.softmax(y, dim=1)
print(f"\nProbabilities shape: {probabilities.shape}")
print(f"Probabilities sum to 1: {torch.allclose(probabilities.sum(dim=1), torch.ones(batch_size))}")
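
For comparison, the same two-layer forward pass can be sketched with `torch.nn` modules, which create and initialize the weights and biases themselves. This is an alternative formulation offered for illustration, not the approach used in the cell above; the seed and variable names are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Linear -> ReLU -> Linear, with the same sizes as the manual example
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(32, 784)
logits = model(x)
probs = torch.softmax(logits, dim=1)

# nn.Linear stores its weight as (out_features, in_features)
print(model[0].weight.shape, logits.shape)
```

Note the weight layout: `nn.Linear` computes `x @ weight.T + bias`, so its weight matrix is the transpose of `w1` in the manual version.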

## 10. Best Practices

Here are some best practices for working with tensors in deep learning:

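1. **Use appropriate data types**: Choose the right dtype for your application (float32 for most cases; float16 or bfloat16 to halve memory use at the cost of precision)
2. **Leverage GPU acceleration**: Move tensors to GPU when available for faster computation, and keep all tensors involved in an operation on the same device
3. **Use in-place operations with care**: In-place operations avoid allocating a new tensor and so save memory, but they can raise errors when they overwrite values that autograd needs for the backward pass
4. **Ensure contiguous memory**: Call `.contiguous()` when an operation such as `.view()` requires it; the call copies data, so avoid it when the tensor is used only once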
5. **Batch operations**: Process data in batches to leverage vectorization
6. **Memory management**: Be mindful of memory usage, especially with large tensors
7. **Profiling**: Profile your code to identify performance bottlenecks
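
Two of these points can be checked directly. The sketch below measures the per-element size of two dtypes and shows a case where an in-place operation conflicts with automatic differentiation; the variable names are illustrative.

```python
import torch

# Data types: half precision uses half the bytes per element
full = torch.zeros(1000, dtype=torch.float32)
half = full.to(torch.float16)
print(full.element_size(), half.element_size())

# In-place operations: overwriting a tensor that autograd saved for the
# backward pass raises an error when backward() runs
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.exp()          # exp keeps its output to compute the gradient
y.add_(1)            # overwrites that saved output in place
try:
    y.sum().backward()
    inplace_failed = False
except RuntimeError:
    inplace_failed = True
print(inplace_failed)
```

Replacing `y.add_(1)` with `y = y + 1` creates a new tensor and lets the backward pass run.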

## Summary

In this tutorial, we've covered the fundamentals of tensors in PyTorch:

- What tensors are and how they generalize scalars, vectors, and matrices
- How to create tensors with various initialization methods
- Tensor properties including data types, shapes, and devices
- Essential tensor operations from basic arithmetic to advanced linear algebra
- Broadcasting rules for operations between tensors of different shapes
- Indexing and slicing techniques for accessing tensor elements
- Reshaping operations for manipulating tensor dimensions
- Performance considerations including in-place operations and memory management
- Practical applications in neural network forward passes

These concepts are the foundation for working with deep learning frameworks and for building complex models such as the large language models in this repository.