**This code prints the installed version of PyTorch to verify the environment is correctly set up and to ensure compatibility with subsequent operations that rely on specific tensor behaviors or GPU support.**

In [1]:
import torch
print(torch.__version__)

2.8.0+cu126


**This code checks whether a CUDA-compatible GPU is available to PyTorch, and if so, confirms GPU usage by displaying the device name; otherwise, it notifies that computations will default to CPU — enabling conditional execution based on hardware capabilities for optimized performance.**

In [2]:
if torch.cuda.is_available():
  print("GPU is available")
  print(f"using GPU:{torch.cuda.get_device_name(0)}")
else:
  print("GPU is not available")

GPU is available
using GPU:Tesla T4
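
A common follow-up to this check is to store the chosen device once and pass it to tensor constructors. The sketch below is a minimal illustration; the names `device` and `t` are assumptions introduced here, not part of the notebook.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate a tensor directly on the chosen device.
t = torch.ones(2, 3, device=device)
print(t.device)
```

Code written this way runs unchanged on machines with or without a GPU.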


# Creating an Uninitialized Tensor with `torch.empty`

This code creates a 2×3 tensor (2 rows, 3 columns) using `torch.empty`, which allocates memory for the tensor but does *not* initialize its values. The tensor contains whatever bytes were already in that memory, so the values are arbitrary rather than random samples from any distribution. This is useful in performance-critical code where every element will be overwritten immediately, but reading the tensor before assigning to it is unsafe.

In [3]:
a=torch.empty(2,3)

In [4]:
type(a)

torch.Tensor

# Creating a Tensor Filled with Zeros Using `torch.zeros`

This code creates a 2×3 tensor filled entirely with zeros, initializing all elements to a value of 0.0 — commonly used to set up matrices or vectors that will be updated during training, such as bias terms, masks, or placeholder outputs, ensuring a clean, predictable starting state for computations.

In [5]:
torch.zeros(2,3)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

# Creating a Tensor Filled with Ones Using `torch.ones`

This code creates a 2×3 tensor where every element is initialized to 1.0. Typical uses include scaling factors, all-ones masks, and the initial weights of normalization layers. An all-ones tensor is not an identity matrix; `torch.eye` is the function for that.

In [6]:
torch.ones(2,3)

tensor([[1., 1., 1.],
        [1., 1., 1.]])

# Creating a Random Tensor Using `torch.rand`

This code creates a 2×3 tensor populated with random values drawn from a uniform distribution between 0 and 1 — commonly used to initialize weights in neural networks when a simple, unbiased random starting point is desired, ensuring that neurons begin with diverse, non-zero activations to break symmetry during training.

In [7]:
torch.rand(2,3)

tensor([[0.5151, 0.1334, 0.1054],
        [0.6141, 0.7690, 0.6715]])

# Reproducible Random Tensor Initialization with `torch.manual_seed`

This code sets a fixed random seed using `torch.manual_seed(100)`, which resets PyTorch's random number generator to a known state. The first `torch.rand(2,3)` call after seeding therefore returns the same 2×3 tensor on every run. Later calls keep drawing from the same stream, so they return different values from the first call, but the whole sequence is reproducible; to get the identical tensor again, set the seed again. This is essential for debugging, experimentation, and sharing training results.

In [8]:
torch.manual_seed(100)
torch.rand(2,3)

tensor([[0.1117, 0.8158, 0.2626],
        [0.4839, 0.6765, 0.7539]])
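
To make the seeding behavior concrete, the following sketch draws twice after seeding, then reseeds and draws again. The variable names `first`, `second`, and `again` are illustrative only.

```python
import torch

torch.manual_seed(100)
first = torch.rand(2, 3)
second = torch.rand(2, 3)   # continues the same random stream, so it differs from `first`

torch.manual_seed(100)      # reset the generator to the same state
again = torch.rand(2, 3)    # reproduces `first` exactly

print(torch.equal(first, again))   # True
print(torch.equal(first, second))  # False
```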

# Creating a Tensor from Python List Data Using `torch.tensor`

This code constructs a 2×3 tensor by directly converting a nested Python list into a PyTorch tensor, preserving the structure and values exactly as provided — commonly used to define small, hardcoded data such as labels, sample inputs, or fixed weights during prototyping or testing, ensuring precise control over initial tensor content without randomization or initialization.

In [9]:
torch.tensor([[1,2,3],[4,5,6]])

tensor([[1, 2, 3],
        [4, 5, 6]])
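
The dtype of the result is inferred from the Python values. The sketch below shows the three common cases, assuming PyTorch's default floating-point dtype has not been changed from `torch.float32`; the variable names are illustrative.

```python
import torch

ints = torch.tensor([[1, 2, 3], [4, 5, 6]])                   # all Python ints
floats = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])     # all Python floats
mixed = torch.tensor([1, 2.5])                                # ints and floats together

print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32
print(mixed.dtype)   # torch.float32 (the int is promoted to float)
```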

# Creating a Sequence Tensor with `torch.arange`

This code generates a 1D tensor containing evenly spaced values starting from 0 up to (but not including) 10, with a step size of 2 — producing `[0, 2, 4, 6, 8]` — commonly used to create index arrays, time steps, or uniform sampling grids for data manipulation, loss computation, or defining ranges in neural network operations without explicit listing.

In [10]:
print("using arange->",torch.arange(0,10,2))

using arange-> tensor([0, 2, 4, 6, 8])


# Creating a Linearly Spaced Tensor with `torch.linspace`

This code generates a 1D tensor with 10 evenly spaced values between 0 and 10 (inclusive), creating a smooth sequence that starts at 0 and ends at 10 — commonly used to define continuous ranges for plotting, sampling activation functions, or initializing parameters that require uniform coverage over an interval in mathematical or physical modeling within neural networks.

In [11]:
print("using linspace ->",torch.linspace(0,10,10))

using linspace -> tensor([ 0.0000,  1.1111,  2.2222,  3.3333,  4.4444,  5.5556,  6.6667,  7.7778,
         8.8889, 10.0000])
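
The difference between the two functions is which quantity is fixed: `torch.arange` fixes the step and excludes the end point, while `torch.linspace` fixes the number of points and includes it. A small sketch, with illustrative variable names:

```python
import torch

by_step = torch.arange(0, 10, 2)       # step is fixed at 2, the end value 10 is excluded
by_count = torch.linspace(0, 10, 10)   # 10 points, the end value 10 is included

# With 10 points there are 9 gaps, so the spacing is (10 - 0) / 9.
spacing = by_count[1] - by_count[0]
print(by_step.tolist())
print(spacing.item())
```

This is why the `linspace` output above steps by about 1.1111 rather than by 1.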


# Creating an Identity Matrix with `torch.eye`

This code generates a 5×5 identity matrix — a square tensor with ones on the main diagonal and zeros everywhere else — commonly used in linear algebra operations such as initializing weight matrices to preserve input magnitude, regularizing optimization, or serving as a baseline for matrix inversion and transformation tasks in neural network layers.

In [12]:
print("using eye->", torch.eye(5))

using eye-> tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])


# Creating a Tensor Filled with a Constant Value Using `torch.full`

This code creates a 3×3 tensor where every element is set to the specified value of 5 — useful for initializing tensors with a uniform constant, such as scaling factors, bias offsets, or padding values, ensuring all elements start with the same predefined value for consistent behavior in mathematical operations or model initialization.

In [13]:
print("using full->",torch.full((3,3),5))

using full-> tensor([[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]])
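
The output above contains integers because `torch.full` infers the dtype from the fill value. The sketch below illustrates this, assuming the default floating-point dtype is `torch.float32`; the variable names are illustrative.

```python
import torch

int_fill = torch.full((3, 3), 5)                       # integer fill value gives int64
float_fill = torch.full((3, 3), 5.0)                   # float fill value gives the default float dtype
forced = torch.full((3, 3), 5, dtype=torch.float32)    # an explicit dtype takes precedence

print(int_fill.dtype, float_fill.dtype, forced.dtype)
```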


In [14]:
x = torch.tensor([[1,2,3],[4,5,6]])
x

tensor([[1, 2, 3],
        [4, 5, 6]])

In [15]:
x.shape

torch.Size([2, 3])

# Creating a Tensor with the Same Shape as Another Using `torch.empty_like`

This code creates a new tensor with the same shape and data type as `x` but leaves its memory uninitialized, so it holds whatever values were already there. The large integers in the output below are leftover bytes interpreted as `int64`, not meaningful data. This is useful for allocating a buffer of matching dimensions that will be fully overwritten, such as storage for intermediate outputs, without paying the cost of initialization.

In [16]:
torch.empty_like(x)

tensor([[8675445202132104482, 7957695011165139568, 2318365875964093043],
        [7233184988217307170,     137839914269218,                  65]])

# Creating a Zero-initialized Tensor with the Same Shape as Another Using `torch.zeros_like`

This code creates a new tensor with the same shape and data type as the input tensor `x`, but initializes all its elements to zero — commonly used to initialize gradients, residuals, or target outputs during backpropagation or loss computation, ensuring that operations start from a clean, known state while preserving the structure of the original tensor for safe and compatible computations.

In [17]:
torch.zeros_like(x)

tensor([[0, 0, 0],
        [0, 0, 0]])

# Creating a One-initialized Tensor with the Same Shape as Another Using `torch.ones_like`

This code creates a new tensor with the same shape and data type as the input tensor `x`, but initializes all its elements to one — useful for initializing scaling factors, unit masks, or activation multipliers during operations like normalization, attention mechanisms, or residual connections, ensuring consistent element-wise behavior while maintaining compatibility with the original tensor’s structure.

In [18]:
torch.ones_like(x)

tensor([[1, 1, 1],
        [1, 1, 1]])

# Understanding `torch.rand_like(x)` and Why It Differs from `torch.rand()`

Unlike `torch.rand(2,3)` or `torch.zeros(3,3)`, which take explicit shape arguments, `torch.rand_like(x)` takes an existing tensor `x` and copies its shape, dtype, and device; it does *not* accept a shape tuple such as `(2,3)`. That inheritance is what makes the call below fail: `x` was built from Python integers, so its dtype is `torch.int64` (`Long`), and uniform sampling on [0, 1) is only implemented for floating-point dtypes. PyTorch therefore raises `NotImplementedError` instead of returning a tensor. To use `rand_like` with an integer reference tensor, either pass a floating-point `dtype` explicitly or convert the reference tensor to a float type first.

In [19]:
torch.rand_like(x)

NotImplementedError: "check_uniform_bounds" not implemented for 'Long'
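
The sketch below reproduces the failure in a controlled way and shows two ways around it. It assumes the same integer tensor as above; `float_noise` and `int_noise` are illustrative names. `torch.randint_like` is the counterpart for integer-valued random tensors.

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])    # int64, as in the cells above

try:
    torch.rand_like(x)                       # inherits int64 from x
except RuntimeError as err:                  # NotImplementedError is a subclass of RuntimeError
    print(type(err).__name__, err)

float_noise = torch.rand_like(x.float())     # convert the reference first: floats in [0, 1)
int_noise = torch.randint_like(x, 0, 10)     # same shape and dtype as x: integers in [0, 10)

print(float_noise.dtype, int_noise.dtype)
```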

# Querying the Data Type of a Tensor Using `x.dtype`

This code retrieves the data type (dtype) of tensor `x`, such as `torch.float32`, `torch.int64`, or `torch.bool` — a critical property that determines how values are stored and computed, ensuring numerical precision, memory efficiency, and compatibility during operations like matrix multiplication or gradient updates in neural network training.

In [20]:
x.dtype

torch.int64

# Creating a Tensor with Explicit Data Type Using `dtype=torch.int32`

This code creates a 1D tensor from a list of floating-point values but explicitly casts them to 32-bit integers (`torch.int32`), truncating decimal parts and converting `1.0`, `2.0`, `3.0` into the integers `1`, `2`, `3` — used when memory efficiency or integer-based operations (e.g., indexing, classification labels) are required, ensuring the tensor’s internal representation matches the computational needs of the downstream task rather than defaulting to floating-point precision.

In [21]:
torch.tensor([1.0,2.0,3.0],dtype=torch.int32)

tensor([1, 2, 3], dtype=torch.int32)

# Creating a Tensor with Explicit Double-Precision Floating Point Using `dtype=torch.float64`

This code creates a 1D tensor from integer values but explicitly casts them to 64-bit floating-point numbers (`torch.float64`), preserving full numerical precision for applications requiring high accuracy—such as scientific computing, numerical stability in optimization, or interoperability with libraries that expect double-precision inputs—while maintaining the original values as `1.0`, `2.0`, and `3.0` in memory.

In [22]:
torch.tensor([1,2,3],dtype=torch.float64)

tensor([1., 2., 3.], dtype=torch.float64)

# Converting Tensor Data Type with `x.to(torch.float32)`

This code returns a copy of tensor `x` converted to 32-bit floating point (`torch.float32`), with the same shape and device. The conversion is not in-place: `x` itself keeps its original dtype (`torch.int64` here) unless the result is assigned back, as in `x = x.to(torch.float32)`. Single-precision floats are the usual choice for neural network training because they balance memory use and speed on GPUs.

In [23]:
x.to(torch.float32)

tensor([[1., 2., 3.],
        [4., 5., 6.]])
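
The following sketch confirms that `.to` leaves the original tensor untouched. The names `x_int` and `y` are illustrative, chosen so the notebook's own `x` is not overwritten.

```python
import torch

x_int = torch.tensor([[1, 2, 3], [4, 5, 6]])

y = x_int.to(torch.float32)      # new tensor; x_int is untouched
print(x_int.dtype, y.dtype)      # torch.int64 torch.float32

x_int = x_int.to(torch.float32)  # reassign to keep the converted tensor
print(x_int.dtype)               # torch.float32
```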

# Using `torch.rand_like()` with Explicit `dtype` Override

This code generates a random tensor with the same shape and device as `x`, but explicitly overrides its data type to `torch.float32` — allowing precise control over numerical precision during initialization, even when the source tensor `x` has a different dtype (e.g., `float64` or `int64`), ensuring compatibility with standard deep learning operations that assume single-precision floating-point inputs while maintaining structural consistency with existing tensors.

In [24]:
torch.rand_like(x,dtype=torch.float32)

tensor([[0.2627, 0.0428, 0.2080],
        [0.1180, 0.1217, 0.7356]])

In [25]:
x=torch.rand(2,2)
x

tensor([[0.7118, 0.7876],
        [0.4183, 0.9014]])

# Element-wise Arithmetic Operations on Tensors

This cell demonstrates element-wise arithmetic between tensor `x` and a scalar: adding 2, subtracting 2, multiplying by 3, dividing by 3, floor-dividing `x * 100` by 3, and taking that result modulo 2. PyTorch broadcasts the scalar across every element, so no explicit loop is needed. Because a notebook cell displays only the value of its last expression, the output below is the result of the final modulo operation.

In [26]:
# Add 2 to every element in tensor x (broadcasting scalar)
x + 2

# Subtract 2 from every element in tensor x (broadcasting scalar)
x - 2

# Multiply every element in tensor x by 3 (element-wise scalar multiplication)
x * 3

# Divide every element in tensor x by 3 (element-wise scalar division)
x / 3

# Scale all elements by 100, then floor-divide by 3 (rounds toward negative infinity)
(x * 100) // 3

# Take the result of the integer division, then compute modulo 2 to get parity (even/odd)
((x * 100) // 3) % 2

tensor([[1., 0.],
        [1., 0.]])
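
For the non-negative values in `x`, floor division and truncation give the same result, but they differ for negative inputs. The sketch below makes the rounding rule explicit with `torch.div` and its `rounding_mode` argument; the tensor `t` is a made-up example, not data from the notebook.

```python
import torch

t = torch.tensor([-7.0, 7.0])

floored = torch.div(t, 3, rounding_mode="floor")    # rounds toward negative infinity
truncated = torch.div(t, 3, rounding_mode="trunc")  # rounds toward zero

print(floored)    # -7/3 = -2.33 becomes -3, 7/3 = 2.33 becomes 2
print(truncated)  # -7/3 = -2.33 becomes -2, 7/3 = 2.33 becomes 2
```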

# Element-wise Tensor Operations with Random Matrices

This code generates two random 2×3 tensors, `a` and `b`, using `torch.rand`, and prints their values to visualize randomly initialized data — commonly used for testing tensor operations, debugging broadcasting behavior, or demonstrating element-wise computations in neural network layers where inputs are sampled from a uniform distribution.

In [27]:
a=torch.rand(2,3)
b=torch.rand(2,3)
print(a)
print(b)

tensor([[0.9969, 0.7565, 0.2239],
        [0.3023, 0.1784, 0.8238]])
tensor([[0.5557, 0.9770, 0.4440],
        [0.9478, 0.7445, 0.4892]])


# Element-wise Tensor Arithmetic Between Two Tensors

This cell applies element-wise addition, subtraction, multiplication, division, and modulo to two tensors of identical shape, pairing each element of `a` with the element of `b` at the same position. Because the shapes already match, no broadcasting is involved. Only the last expression, `a%b`, is displayed in the output below. These operations underlie residual connections, scaling, normalization, and gradient updates in neural networks.

In [28]:
a+b
a-b
a*b
a/b
a%b

tensor([[0.4411, 0.7565, 0.2239],
        [0.3023, 0.1784, 0.3346]])
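
When the shapes differ but are compatible, PyTorch broadcasts the smaller tensor. The sketch below uses small hand-written tensors (not the random `a` and `b` above) so the results are easy to check; `row` and `col` are illustrative names.

```python
import torch

a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])       # shape (2, 3)
row = torch.tensor([10.0, 20.0, 30.0])    # shape (3,)
col = torch.tensor([[100.0], [200.0]])    # shape (2, 1)

print(a + row)   # row is added to every row of a
print(a + col)   # col is added to every column of a
```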

In [29]:
c=torch.tensor([1,-2,3,-4])

# Computing Absolute Values of a Tensor with `torch.abs`

This code computes the element-wise absolute value of tensor `c`, converting all negative values to their positive counterparts while leaving non-negative values unchanged — commonly used to measure magnitude, compute loss functions like L1 norm, or enforce non-negativity in gradient-based optimizations where sign-invariant magnitudes are required.

In [32]:
torch.abs(c)

tensor([1, 2, 3, 4])

# Negating Tensor Elements with `torch.neg`

This code computes the element-wise negation of tensor `c`, flipping the sign of every value (positive becomes negative, negative becomes positive) — commonly used to reverse gradients, invert transformations, or prepare target signals in loss functions where signed opposition is required, such as in adversarial training or contrastive learning objectives.

In [30]:
torch.neg(c)

tensor([-1,  2, -3,  4])

In [31]:
d=torch.tensor([1.9,2.3,3.7,4.4])

# Rounding Tensor Elements to Nearest Integer with `torch.round`

This code rounds each element of tensor `d` to the nearest integer while keeping the tensor's shape and dtype, so a float tensor stays float (the result is `2.`, not `2`). Values exactly halfway between two integers are rounded to the nearest even integer. Rounding is commonly used to turn continuous predictions, such as regression outputs, into discrete values for post-processing or evaluation.

In [33]:
torch.round(d)

tensor([2., 2., 4., 4.])

# Rounding Tensor Elements Up to the Nearest Integer with `torch.ceil`

This code computes the ceiling of each element in tensor `d`, rounding every value up to the smallest integer greater than or equal to it — commonly used to enforce minimum thresholds, discretize continuous values upward (e.g., in resource allocation or batch sizing), or prepare data for integer-based indexing and quantization where upward precision is required.

In [34]:
torch.ceil(d)

tensor([2., 3., 4., 5.])

# Rounding Tensor Elements Down to the Nearest Integer with `torch.floor`

This code computes the floor of each element in tensor `d`, rounding every value down to the largest integer less than or equal to it. This is rounding toward negative infinity, not truncation toward zero: the floor of -1.5 is -2, whereas `torch.trunc` would give -1. For the positive values in `d` the two coincide. Flooring is commonly used for binning, quantization, and computing integer indices from continuous values.

In [35]:
torch.floor(d)

tensor([1., 2., 3., 4.])
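
The four rounding functions only disagree on negative values and exact halves, which `d` does not contain. The sketch below compares them on a made-up tensor `vals` chosen to expose those cases.

```python
import torch

vals = torch.tensor([-1.5, -0.5, 0.5, 1.5, 2.5])

print(torch.floor(vals))  # toward negative infinity: -1.5 becomes -2
print(torch.ceil(vals))   # toward positive infinity: -1.5 becomes -1
print(torch.trunc(vals))  # toward zero: -1.5 becomes -1, 1.5 becomes 1
print(torch.round(vals))  # half to even: 0.5 becomes 0, 1.5 becomes 2, 2.5 becomes 2
```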

# Clamping Tensor Values to a Specified Range with `torch.clamp`

This code restricts all elements of tensor `d` to lie within the range [2, 3], setting any value below 2 to 2 and any value above 3 to 3 — commonly used to prevent extreme activations, stabilize gradients, or enforce physical or logical constraints in neural network outputs, such as pixel intensities, probabilities, or control signals, ensuring robustness and adherence to bounded domains during inference or training.

In [36]:
torch.clamp(d,min=2,max=3)

tensor([2.0000, 2.3000, 3.0000, 3.0000])

# Generating a Random Integer Tensor with `torch.randint`

This code creates a 2×3 tensor of random integers sampled uniformly from the range [0, 10), meaning each value is between 0 (inclusive) and 10 (exclusive). Because the call passes `dtype=torch.float32`, the sampled integers are stored as floats (`6.`, `1.`, and so on), which lets the floating-point reductions used below, such as `torch.mean` and `torch.std`, run on `e`. Without that argument the result would have dtype `torch.int64`, the usual choice for labels and indices.

In [51]:
e=torch.randint(size=(2,3),low=0,high=10,dtype=torch.float32)


e

tensor([[6., 1., 5.],
        [5., 0., 4.]])

# Computing Summations Along Tensor Dimensions with `torch.sum`

This sequence computes the total sum of all elements in tensor `e`, then breaks it down along specific dimensions: summing across columns (`dim=0`) produces a 1D tensor with one value per column (reducing rows), while summing across rows (`dim=1`) produces a 1D tensor with one value per row (reducing columns) — essential for aggregating features, computing loss over batches, or reducing activations in neural networks to obtain summary statistics while preserving structural intent through explicit dimension control.

In [52]:
torch.sum(e)

# dim=0 collapses the rows: one sum per column
torch.sum(e,dim=0)
# dim=1 collapses the columns: one sum per row (this is the value displayed)
torch.sum(e,dim=1)

tensor([12.,  9.])
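
Since the notebook shows only the last result, the sketch below prints all three sums for the same values of `e` (copied by hand from the output above) and adds `keepdim=True`, which keeps the reduced dimension with size 1.

```python
import torch

e = torch.tensor([[6.0, 1.0, 5.0],
                  [5.0, 0.0, 4.0]])    # values copied from the output above

total = torch.sum(e)                        # 21: all six elements
per_column = torch.sum(e, dim=0)            # [11, 1, 9]: one sum per column
per_row = torch.sum(e, dim=1)               # [12, 9]: one sum per row
kept = torch.sum(e, dim=1, keepdim=True)    # same sums, shape (2, 1) instead of (2,)

print(total, per_column, per_row, kept.shape)
```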

# Computing Mean Values Along Tensor Dimensions with `torch.mean`

This sequence computes the global mean of all elements in tensor `e`, then calculates the mean along specific dimensions: averaging along `dim=0` (columns) produces a row vector where each element is the mean of its corresponding column, effectively reducing the number of rows while preserving column structure — commonly used to compute feature-wise averages across a batch, enabling normalization, statistical analysis, or activation pooling in neural network layers.

In [53]:
#mean
torch.mean(e)
#mean along col
torch.mean(e,dim=0)


tensor([5.5000, 0.5000, 4.5000])

# Computing the Median of All Elements in a Tensor with `torch.median`

This code computes the median of all elements in tensor `e`. With an odd number of elements this is the middle value of the sorted data; with an even number, `torch.median` returns the lower of the two middle values rather than their average. Here the sorted values are 0, 1, 4, 5, 5, 6, so the result is 4, not 4.5. The median is less sensitive to outliers than the mean, which makes it useful for robust statistics on noisy or skewed data.

In [54]:
#median
torch.median(e)

tensor(4.)
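
The sketch below contrasts the lower-median behavior with `torch.quantile`, which interpolates between the two middle values by default, and shows that passing `dim` makes `torch.median` return both values and indices. The tensor `e` is copied by hand from the output above.

```python
import torch

e = torch.tensor([[6.0, 1.0, 5.0],
                  [5.0, 0.0, 4.0]])

lower_median = torch.median(e)          # 4.0: lower of the two middle values (4 and 5)
interpolated = torch.quantile(e, 0.5)   # 4.5: midpoint between 4 and 5

values, indices = torch.median(e, dim=1)   # per-row medians and their column positions
print(lower_median.item(), interpolated.item())
print(values, indices)
```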

# Finding Global Maximum and Minimum Values in a Tensor with `torch.max` and `torch.min`

This sequence retrieves the single largest and smallest values across all elements in tensor `e`, providing the extreme bounds of the data — commonly used to monitor activation ranges, detect saturation or vanishing signals in neural networks, normalize inputs, or set clipping thresholds for stable training and inference.

In [55]:
#max and min
torch.max(e)
torch.min(e)

tensor(0.)

# Computing the Product of All Elements in a Tensor with `torch.prod`

This code calculates the product of all elements in tensor `e`, multiplying every value together to produce a single scalar result — used in probabilistic models for joint likelihood computation, attention scaling, or geometric mean approximations, where cumulative multiplication across features or time steps is required to capture multiplicative interactions in neural network outputs or loss functions.

In [56]:
#product
torch.prod(e)

tensor(0.)

# Computing Standard Deviation of a Tensor with `torch.std`

This code computes the standard deviation of all elements in tensor `e`, measuring how far the values spread around their mean. By default `torch.std` applies Bessel's correction (it divides by N − 1 rather than N), so the result is the sample standard deviation. Typical uses include analyzing activation distributions, detecting training instability, and standardizing inputs.

In [57]:
#standard deviation
torch.std(e)

tensor(2.4290)

# Computing Variance of a Tensor with `torch.var`

This code computes the variance of all elements in tensor `e`: the sum of squared deviations from the mean, divided by N − 1 by default (Bessel's correction), which gives the sample variance rather than the population variance. Variance is used to quantify the spread of activations or weights, monitor training dynamics, and inform initialization strategies such as Xavier/Glorot that rely on controlled variance to prevent vanishing or exploding gradients.

In [58]:
#variance
torch.var(e)

tensor(5.9000)
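
To see the effect of the divisor, the sketch below computes both variants for the same values of `e` (copied by hand from the earlier output). It uses the `unbiased=False` argument to divide by N; the variable names are illustrative.

```python
import torch

e = torch.tensor([[6.0, 1.0, 5.0],
                  [5.0, 0.0, 4.0]])

# Sum of squared deviations from the mean (3.5) is 29.5 over N = 6 elements.
sample_var = torch.var(e)                       # 29.5 / 5 = 5.9 (default)
population_var = torch.var(e, unbiased=False)   # 29.5 / 6, about 4.9167
sample_std = torch.std(e)                       # square root of 5.9, about 2.4290

print(sample_var.item(), population_var.item(), sample_std.item())
```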

# Finding the Index of the Maximum Value in a Flattened Tensor with `torch.argmax`

This code returns the single flat index corresponding to the largest element in tensor `e`, treating the tensor as a flattened 1D array — commonly used to identify the position of the dominant activation, such as in classification tasks where the highest score corresponds to the predicted class, or for locating peak values in attention maps, feature responses, or optimization signals during model analysis.

In [59]:
#argmax
torch.argmax(e)

tensor(0)

# Finding the Index of the Minimum Value in a Flattened Tensor with `torch.argmin`

This code returns the single flat index corresponding to the smallest element in tensor `e`, treating the tensor as a flattened 1D array — commonly used to locate the weakest activation, identify outliers, or find minimal error regions in optimization and attention mechanisms, providing a way to pinpoint critical locations in data without returning the value itself.

In [60]:
#argmin
torch.argmin(e)

tensor(4)
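
A flat index can be converted back to a (row, column) pair with integer division by the number of columns. The sketch below does this for the same values of `e` (copied by hand from the earlier output) and also shows the per-row form with `dim=1`.

```python
import torch

e = torch.tensor([[6.0, 1.0, 5.0],
                  [5.0, 0.0, 4.0]])

flat_max = torch.argmax(e).item()   # 0: the value 6 is the first element
flat_min = torch.argmin(e).item()   # 4: the value 0 is the fifth element

n_cols = e.shape[1]
max_pos = divmod(flat_max, n_cols)  # (row, column) of the maximum: (0, 0)
min_pos = divmod(flat_min, n_cols)  # (row, column) of the minimum: (1, 1)

per_row = torch.argmax(e, dim=1)    # column index of the largest value in each row
print(max_pos, min_pos, per_row)
```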

In [63]:
f=torch.randint(size=(2,3),low=0,high=10)
g=torch.randint(size=(3,2),low=0,high=10)

print(f)
print(g)

tensor([[7, 1, 1],
        [5, 4, 4]])
tensor([[1, 1],
        [2, 4],
        [7, 2]])


# Matrix Multiplication with `torch.matmul`

This code performs matrix multiplication between `f` (2×3) and `g` (3×2), taking the dot product of each row of `f` with each column of `g` to produce a 2×2 result. The inner dimensions must match (here, 3). Matrix multiplication is the core operation of dense neural network layers, where a weight matrix is applied to inputs; `torch.matmul` also supports batched inputs by broadcasting over leading dimensions.

In [64]:
torch.matmul(f,g)

tensor([[16, 13],
        [41, 29]])
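
The sketch below repeats the multiplication with the `@` operator, which is equivalent to `torch.matmul`, and shows what happens when the inner dimensions do not match. The tensors are copied by hand from the output above.

```python
import torch

f = torch.tensor([[7, 1, 1], [5, 4, 4]])      # shape (2, 3)
g = torch.tensor([[1, 1], [2, 4], [7, 2]])    # shape (3, 2)

product = f @ g              # same as torch.matmul(f, g), shape (2, 2)
reverse_shape = (g @ f).shape   # (3, 3): the order of the operands matters

try:
    f @ f                    # (2, 3) @ (2, 3): inner dimensions 3 and 2 do not match
except RuntimeError as err:
    print("shape mismatch:", err)

print(product, reverse_shape)
```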

# Computing the Dot Product of Two Vectors with `torch.dot`

This code calculates the dot product (scalar product) of two 1D tensors, `vector1` and `vector2`, by multiplying corresponding elements and summing the results — a fundamental operation in linear algebra used to measure similarity between vectors, compute projections, apply attention weights, or evaluate inner products in neural networks, where the output represents the aligned magnitude of two vectors in shared space.

In [65]:
vector1=torch.tensor([1,2])
vector2=torch.tensor([3,4])

torch.dot(vector1, vector2)

tensor(11)

# Transposing a Tensor with `torch.transpose`

This code swaps the dimensions of tensor `f` along axes 0 and 1 — converting rows into columns and columns into rows — commonly used to align tensor shapes for matrix multiplication, prepare data for layer inputs (e.g., switching batch and feature dimensions), or match expected formats in operations like attention mechanisms or convolutional layers that require specific axis ordering.

In [67]:
torch.transpose(f, 0,1)

tensor([[7, 5],
        [1, 4],
        [1, 4]])

# Generating a Random 3×3 Float Tensor with `torch.randint`

This code creates a 3×3 tensor filled with random integers between 0 (inclusive) and 10 (exclusive), then explicitly casts them to `torch.float32` — commonly used to simulate real-valued data with discrete sampling, such as noisy measurements or discretized features, while ensuring compatibility with floating-point operations in neural networks that require single-precision arithmetic for efficiency and numerical stability.

In [71]:
h=torch.randint(size=(3,3),low=0,high=10,dtype=torch.float32)
h

tensor([[5., 6., 1.],
        [4., 5., 6.],
        [9., 2., 3.]])

# Computing the Determinant of a Square Tensor with `torch.det`

This code calculates the determinant of the 3×3 floating-point tensor `h` — a scalar value that represents the scaling factor of the linear transformation described by the matrix, used to assess invertibility (non-zero determinant = invertible), measure volume change in coordinate transformations, or detect degenerate matrices in optimization and geometric deep learning applications.

In [72]:
torch.det(h)

tensor(230.0000)

# Computing the Matrix Inverse with `torch.inverse`

This code calculates the inverse of the 3×3 invertible tensor `h`, producing a new matrix that, when multiplied by `h`, yields the identity matrix — essential for solving linear systems, undoing transformations, or computing precision matrices in probabilistic models; it requires that `h` be square and non-singular (determinant ≠ 0), making it critical for stability in neural network operations involving normalization, whitening, or geometric corrections.

In [73]:
torch.inverse(h)

tensor([[ 0.0130, -0.0696,  0.1348],
        [ 0.1826,  0.0261, -0.1130],
        [-0.1609,  0.1913,  0.0043]])
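
One way to sanity-check an inverse is to multiply it back and compare with the identity, allowing for floating-point error. The sketch below does that for the same `h` (copied by hand from the output above) using `torch.linalg.inv`, and also solves a linear system directly with `torch.linalg.solve`, which avoids forming the inverse. The right-hand side `b` is a made-up example.

```python
import torch

h = torch.tensor([[5.0, 6.0, 1.0],
                  [4.0, 5.0, 6.0],
                  [9.0, 2.0, 3.0]])

h_inv = torch.linalg.inv(h)
is_identity = torch.allclose(h @ h_inv, torch.eye(3), atol=1e-4)
print(is_identity)   # True

b = torch.tensor([1.0, 2.0, 3.0])     # hypothetical right-hand side
solution = torch.linalg.solve(h, b)   # solves h @ solution = b without computing h_inv
print(torch.allclose(h @ solution, b, atol=1e-4))
```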
