# PyTorch Tensor Basics & GPU Fundamentals

This notebook is a **practical walkthrough of PyTorch tensor fundamentals**, focusing on how tensors are created, manipulated, and computed on both **CPU (RAM)** and **GPU (VRAM)**.

The emphasis is on **hands-on understanding**, not abstraction. Each concept is demonstrated using direct code examples to make tensor behavior, data types, memory usage, and device placement explicit and observable.

---

## Scope of This Notebook

This notebook intentionally focuses on:
- Tensor creation and initialization
- Data types and precision trade-offs
- Element-wise, reduction, and matrix operations
- In-place vs out-of-place operations
- Memory behavior when copying tensors
- Moving tensors between CPU and GPU
- Performance comparison between CPU and GPU execution
- Basic interoperability with NumPy

---

## Important Concepts Not Covered (By Design)

To keep the focus clear and beginner-friendly, the following topics are **intentionally not included**:

- Autograd and gradient computation
- Neural network modules (`torch.nn`)
- Optimizers and training loops
- Dataset and DataLoader pipelines
- Model serialization and checkpointing
- Distributed or multi-GPU training
- Advanced CUDA optimization and custom kernels

These topics build on tensor fundamentals and are better addressed **after** mastering the concepts shown here.

---

## Why This Notebook Matters

Understanding tensors at this level is critical because:
- All PyTorch models are built on tensor operations
- Incorrect dtype or device placement leads to silent bugs or performance issues
- GPU acceleration only works when all tensors in an operation are on the same device and have compatible dtypes
- Memory behavior (views, clones, in-place ops) directly affects correctness and efficiency

This notebook establishes the **mental model required** to write correct and performant PyTorch code before moving on to model building.

---

## Intended Audience

- Beginners starting with PyTorch
- Students learning deep learning foundations
- Practitioners who want a **clear reference** for tensor behavior
- Anyone transitioning from NumPy to PyTorch


In [100]:
import torch
print(torch.__version__)


2.9.0+cu126


In [101]:
if torch.cuda.is_available():
  print("GPU is available")
  print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
  print("No GPU available, using CPU only")

GPU is available
Using GPU: Tesla T4


# Creating a Tensor

In [102]:
# using empty
a=torch.empty(2,3) # allocates a 2x3 tensor without initializing it; the values are whatever already sits at that memory location
a

tensor([[-1.5033e+00,  4.4226e-41,  2.8458e+03],
        [ 0.0000e+00,  1.2671e-14,  8.7201e+02]])

In [103]:
type(a)

torch.Tensor

In [104]:
# using zeros
torch.zeros(5,5) # creates an m x n tensor (here 5x5) with every value initialized to 0

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [105]:
#using ones
torch.ones(3,4)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [106]:
#using rand
torch.rand(4,5)

tensor([[0.1884, 0.4755, 0.2448, 0.0025, 0.6842],
        [0.1131, 0.8636, 0.7975, 0.0506, 0.6266],
        [0.5625, 0.2531, 0.0442, 0.1462, 0.8433],
        [0.2583, 0.6419, 0.1036, 0.9077, 0.3090]])

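### Reproducibility: getting the same random values on every run
Seeding the random number generator with `torch.manual_seed` makes `torch.rand` return the same values each time the cell runs.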

In [107]:
# manual seed
torch.manual_seed(100)
torch.rand(2,3)

tensor([[0.1117, 0.8158, 0.2626],
        [0.4839, 0.6765, 0.7539]])
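
As a minimal sketch of what the seed buys you, the snippet below reseeds the generator before two draws and checks that they match. The variable names `first`, `second`, and `third` are illustrative and not part of the original notebook.

```python
import torch

# Reseeding with the same value restarts the random sequence from the same point.
torch.manual_seed(100)
first = torch.rand(2, 3)

torch.manual_seed(100)
second = torch.rand(2, 3)

# A third draw without reseeding continues the sequence, so it differs.
third = torch.rand(2, 3)

print(torch.equal(first, second))  # True
print(torch.equal(second, third))
```

Seeding once at the top of a notebook is usually enough; reseeding before every call, as done here, is only for demonstration.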

In [108]:
# custom tensors using torch.tensor
torch.tensor([[1,2,3],[4,5,6]]) # accepts (nested) lists, tuples, NumPy arrays, and scalars


tensor([[1, 2, 3],
        [4, 5, 6]])

In [109]:
# other ways

# arange
print("using arange ->", torch.arange(0,10,2))

# using linspace (linearly spaced values, both endpoints included)
print("using linspace ->", torch.linspace(0,10,10))

# using eye (creates an identity matrix: ones on the diagonal, zeros elsewhere)
print("using eye ->", torch.eye(5))

# using full
print("using full ->", torch.full((3, 3), 5))

using arange -> tensor([0, 2, 4, 6, 8])
using linspace -> tensor([ 0.0000,  1.1111,  2.2222,  3.3333,  4.4444,  5.5556,  6.6667,  7.7778,
         8.8889, 10.0000])
using eye -> tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
using full -> tensor([[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]])


# Tensor Shape

In [110]:
x=torch.rand(2,6,4)
x.shape

torch.Size([2, 6, 4])

# Tensor Data Type

In [111]:
# find the data type of the tensor's values
x.dtype

torch.float32

In [112]:
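# assign a data type at creation (the float values are truncated to integers)
torch.tensor([1.0,2.0,3.0], dtype=torch.int32)
# convert an existing tensor with to(); x is already float32, so the values are unchanged
x.to(torch.float32)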

tensor([[[0.2627, 0.0428, 0.2080, 0.1180],
         [0.1217, 0.7356, 0.7118, 0.7876],
         [0.4183, 0.9014, 0.9969, 0.7565],
         [0.2239, 0.3023, 0.1784, 0.8238],
         [0.5557, 0.9770, 0.4440, 0.9478],
         [0.7445, 0.4892, 0.2426, 0.7003]],

        [[0.5277, 0.2472, 0.7909, 0.4235],
         [0.0169, 0.2209, 0.9535, 0.7064],
         [0.1629, 0.8902, 0.5163, 0.0359],
         [0.6476, 0.3430, 0.3182, 0.5261],
         [0.0447, 0.5123, 0.9051, 0.5989],
         [0.4450, 0.7278, 0.4563, 0.3389]]])

In [113]:
# create a tensor with a specific data type
print(torch.tensor([[1,2,3],[4,5,6]],dtype=torch.float64))

torch.tensor([[1,2,3],[4,5,6]],dtype=torch.float32)


tensor([[1., 2., 3.],
        [4., 5., 6.]], dtype=torch.float64)


tensor([[1., 2., 3.],
        [4., 5., 6.]])

| **Data Type**             | **Dtype**         | **Description**                                                                                                                                                                |
|---------------------------|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **32-bit Floating Point** | `torch.float32`   | Standard floating-point type used for most deep learning tasks. Provides a balance between precision and memory usage.                                                         |
| **64-bit Floating Point** | `torch.float64`   | Double-precision floating point. Useful for high-precision numerical tasks but uses more memory.                                                                               |
| **16-bit Floating Point** | `torch.float16`   | Half-precision floating point. Commonly used in mixed-precision training to reduce memory and computational overhead on modern GPUs.                                            |
| **BFloat16**              | `torch.bfloat16`  | Brain floating-point format. Keeps the exponent range of `float32` but has fewer mantissa bits, so it is less precise than `float16`. Used in mixed-precision training on TPUs and recent GPUs. |
| **8-bit Floating Point**  | `torch.float8_e4m3fn`, `torch.float8_e5m2` | Ultra-low-precision floating point; PyTorch has no plain `torch.float8` dtype. Used for experimental applications and extreme memory-constrained environments (less common). |
| **8-bit Integer**         | `torch.int8`      | 8-bit signed integer. Used for quantized models to save memory and computation in inference.                                                                                   |
| **16-bit Integer**        | `torch.int16`     | 16-bit signed integer. Useful for special numerical tasks requiring intermediate precision.                                                                                    |
| **32-bit Integer**        | `torch.int32`     | 32-bit signed integer. Used for general-purpose integer data when `int64` is larger than needed.                                                                               |
| **64-bit Integer**        | `torch.int64`     | Long integer type and PyTorch's default integer dtype. Used for index tensors and for tasks involving large numbers.                                                           |
| **8-bit Unsigned Integer**| `torch.uint8`     | 8-bit unsigned integer. Commonly used for image data (e.g., pixel values between 0 and 255).                                                                                    |
| **Boolean**               | `torch.bool`      | Boolean type, stores `True` or `False` values. Often used for masks in logical operations.                                                                                      |
| **Complex 64**            | `torch.complex64` | Complex number type with 32-bit real and 32-bit imaginary parts. Used for scientific and signal processing tasks.                                                               |
| **Complex 128**           | `torch.complex128`| Complex number type with 64-bit real and 64-bit imaginary parts. Offers higher precision but uses more memory.                                                                 |
| **Quantized Integer**     | `torch.qint8`     | Quantized signed 8-bit integer. Used in quantized models for efficient inference.                                                                                              |
| **Quantized Unsigned Integer** | `torch.quint8` | Quantized unsigned 8-bit integer. Often used for quantized tensors in image-related tasks.                                                                                     |
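
To make the precision and memory trade-off in the table concrete, here is a small sketch that stores the same values at several precisions. The list `values` and the names `as_int`, `err16`, and `err32` are made up for illustration.

```python
import torch

values = [1.0, 2.5, 3.75]

# Same values stored at three floating-point precisions.
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(values, dtype=dtype)
    # element_size() is the number of bytes used by one element.
    print(dtype, t.element_size(), "bytes per element")

# Converting to an integer dtype drops the fractional part.
as_int = torch.tensor(values).to(torch.int32)
print(as_int)  # tensor([1, 2, 3], dtype=torch.int32)

# float16 cannot represent 0.1 as closely as float32 can.
err16 = abs(torch.tensor(0.1, dtype=torch.float16).item() - 0.1)
err32 = abs(torch.tensor(0.1, dtype=torch.float32).item() - 0.1)
print(err16 > err32)  # True
```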


# Mathematical operations

## Scalar operations

In [114]:
x=torch.rand(3,4,dtype=torch.float64)
print(x)
# addition
x + 2
# subtraction
x - 2
# multiplication
x * 3
# division
x / 3
# floor division
(x * 100)//3
# mod
((x * 100)//3)%2
# power
x**2

tensor([[0.9692, 0.5168, 0.2422, 0.4312],
        [0.5917, 0.3425, 0.2202, 0.7030],
        [0.5629, 0.9259, 0.7612, 0.8381]], dtype=torch.float64)


tensor([[0.9393, 0.2671, 0.0586, 0.1860],
        [0.3501, 0.1173, 0.0485, 0.4942],
        [0.3168, 0.8573, 0.5794, 0.7024]], dtype=torch.float64)

# Element-wise operations on two tensors

In [115]:
a=torch.rand(3,4,dtype=torch.float64)
b=torch.rand(3,4,dtype=torch.float64)

In [116]:
# add
a + b
# sub
a - b
# multiply
a * b
# division
print(a / b)
# power
a ** b
# mod
a % b

tensor([[ 2.7529,  2.1994,  1.3316,  0.9862],
        [ 1.2111,  0.1894,  0.6451,  0.1684],
        [ 3.4296,  0.9104, 10.7696,  6.7235]], dtype=torch.float64)


tensor([[0.1352, 0.0542, 0.1536, 0.6128],
        [0.1498, 0.1344, 0.5146, 0.1479],
        [0.0531, 0.5144, 0.0504, 0.0875]], dtype=torch.float64)

In [117]:
# absolute value
c=torch.tensor([1,-2,3,-4])
torch.abs(c)

tensor([1, 2, 3, 4])

In [118]:
# negative
torch.neg(c)

tensor([-1,  2, -3,  4])

In [119]:
d = torch.tensor([1.9, 2.3, 3.7, 4.4])
# round
torch.round(d)

tensor([2., 2., 4., 4.])

In [120]:
# ceil
torch.ceil(d)

tensor([2., 3., 4., 5.])

In [121]:
# floor
torch.floor(d)

tensor([1., 2., 3., 4.])

In [122]:
# clamp
torch.clamp(d, min=2, max=3)

tensor([2.0000, 2.3000, 3.0000, 3.0000])

In [123]:

### 3. Reduction operation

In [124]:
e = torch.randint(size=(2,3), low=0, high=10, dtype=torch.float32)
e

tensor([[2., 7., 0.],
        [2., 4., 4.]])

In [125]:
# sum
torch.sum(e)
# sum of each column (reduces over dim 0, the rows)
torch.sum(e, dim=0)
# sum of each row (reduces over dim 1, the columns)
torch.sum(e, dim=1)

tensor([ 9., 10.])
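
The `dim` argument names the dimension that is collapsed, which is easy to get backwards. The sketch below rebuilds the tensor `e` shown above by hand and prints each reduction next to its shape; `keepdim=True` is an extra option not used elsewhere in this notebook.

```python
import torch

e = torch.tensor([[2., 7., 0.],
                  [2., 4., 4.]])

col_sums = torch.sum(e, dim=0)   # collapses the 2 rows -> shape (3,)
row_sums = torch.sum(e, dim=1)   # collapses the 3 columns -> shape (2,)
row_sums_kept = torch.sum(e, dim=1, keepdim=True)  # keeps the reduced dim -> shape (2, 1)

print(col_sums)             # tensor([ 4., 11.,  4.])
print(row_sums)             # tensor([ 9., 10.])
print(row_sums_kept.shape)  # torch.Size([2, 1])
```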

In [126]:
# mean
torch.mean(e)
# mean of each column
torch.mean(e, dim=0)

tensor([2.0000, 5.5000, 2.0000])

In [127]:
# median
torch.median(e)

tensor(2.)

In [128]:
# max and min
torch.max(e)
torch.min(e)

tensor(0.)

In [129]:
# product
torch.prod(e,dim=0)

tensor([ 4., 28.,  0.])

In [130]:
# standard deviation
torch.std(e)

tensor(2.4014)

In [131]:
# variance
torch.var(e)

tensor(5.7667)

In [132]:
# argmax
torch.argmax(e)

tensor(1)

In [133]:
# argmin
torch.argmin(e)

tensor(2)

### 4. Matrix operations

In [134]:
f = torch.randint(size=(2,3), low=0, high=10)
g = torch.randint(size=(3,2), low=0, high=10)

print(f)
print(g)

tensor([[6, 2, 1],
        [0, 5, 1]])
tensor([[0, 0],
        [2, 2],
        [3, 7]])


In [135]:
# matrix multiplication
torch.matmul(f, g)

tensor([[ 7, 11],
        [13, 17]])
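
For matrix multiplication the inner dimensions must agree. The following sketch reuses the values of `f` and `g` printed above (hard-coded here so the result is reproducible) and shows the `@` operator as an equivalent spelling of `torch.matmul`.

```python
import torch

f = torch.tensor([[6, 2, 1],
                  [0, 5, 1]])      # shape (2, 3)
g = torch.tensor([[0, 0],
                  [2, 2],
                  [3, 7]])         # shape (3, 2)

# (2, 3) @ (3, 2) -> (2, 2): the inner dimensions (3 and 3) must match.
product = f @ g                    # the @ operator is equivalent to torch.matmul
print(product)

# Element-wise multiplication is a different operation and needs matching shapes.
elementwise = f * f
print(elementwise.shape)           # torch.Size([2, 3])
```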

In [136]:
vector1 = torch.tensor([1, 2])
vector2 = torch.tensor([3, 4])

# dot product
torch.dot(vector1, vector2)

tensor(11)

In [137]:
# transpose
torch.transpose(f, 0, 1)

tensor([[6, 0],
        [2, 5],
        [1, 1]])

In [138]:
h = torch.randint(size=(3,3), low=0, high=10, dtype=torch.float32)
h

tensor([[1., 5., 0.],
        [2., 3., 7.],
        [0., 2., 5.]])

In [139]:
# determinant
torch.det(h)

tensor(-49.)

In [140]:
# inverse
torch.inverse(h)

tensor([[-0.0204,  0.5102, -0.7143],
        [ 0.2041, -0.1020,  0.1429],
        [-0.0816,  0.0408,  0.1429]])
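
One way to sanity-check an inverse is to multiply it by the original matrix and compare with the identity. This sketch hard-codes the matrix `h` from above and uses `torch.linalg.inv`, which computes the same inverse as `torch.inverse`; the tolerance `atol=1e-4` is an arbitrary choice for float32.

```python
import torch

h = torch.tensor([[1., 5., 0.],
                  [2., 3., 7.],
                  [0., 2., 5.]])

h_inv = torch.linalg.inv(h)

# Multiplying a matrix by its inverse should give the identity,
# up to floating-point rounding error.
identity_check = h @ h_inv
print(torch.allclose(identity_check, torch.eye(3), atol=1e-4))
```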

### 5. Comparison operations

In [141]:
i = torch.randint(size=(2,3), low=0, high=10)
j = torch.randint(size=(2,3), low=0, high=10)

print(i)
print(j)

tensor([[2, 8, 0],
        [2, 5, 3]])
tensor([[6, 7, 8],
        [1, 2, 3]])


In [142]:
# greater than
i > j
# less than
i < j
# greater than or equal to
i >= j
# less than or equal to
i <= j
# equal to
i == j
# not equal to (the last expression, so this is the result displayed below)
i != j
tensor([[ True,  True,  True],
        [ True,  True, False]])
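
Comparison results are boolean tensors, so they can be used as masks. A short sketch, using the values of `i` and `j` printed above (hard-coded here); the names `mask`, `selected`, and `count` are illustrative.

```python
import torch

i = torch.tensor([[2, 8, 0],
                  [2, 5, 3]])
j = torch.tensor([[6, 7, 8],
                  [1, 2, 3]])

mask = i > j                 # boolean tensor, same shape as i
print(mask.dtype)            # torch.bool

selected = i[mask]           # 1-D tensor of the elements of i where mask is True
print(selected)              # tensor([8, 2, 5])

count = mask.sum()           # True counts as 1 in a sum
print(count)                 # tensor(3)
```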

### 6. Special functions

In [143]:
k = torch.randint(size=(2,3), low=0, high=10, dtype=torch.float32)
k

tensor([[2., 7., 9.],
        [3., 9., 7.]])

In [144]:
# log
torch.log(k)

tensor([[0.6931, 1.9459, 2.1972],
        [1.0986, 2.1972, 1.9459]])

In [145]:
# exp
torch.exp(k)

tensor([[7.3891e+00, 1.0966e+03, 8.1031e+03],
        [2.0086e+01, 8.1031e+03, 1.0966e+03]])

In [146]:
# sqrt
torch.sqrt(k)

tensor([[1.4142, 2.6458, 3.0000],
        [1.7321, 3.0000, 2.6458]])

In [147]:
# sigmoid
torch.sigmoid(k)

tensor([[0.8808, 0.9991, 0.9999],
        [0.9526, 0.9999, 0.9991]])

In [148]:
# softmax
torch.softmax(k, dim=0)

tensor([[0.2689, 0.1192, 0.8808],
        [0.7311, 0.8808, 0.1192]])
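
Softmax along `dim=0` normalizes each column so that it sums to 1. The sketch below recomputes it by hand for the tensor `k` shown above (hard-coded here) as a rough check; `manual` is an illustrative name, and this direct formula is not how the library computes it internally.

```python
import torch

k = torch.tensor([[2., 7., 9.],
                  [3., 9., 7.]])

probs = torch.softmax(k, dim=0)

# Softmax by hand: exponentiate, then divide by the sum along the same dim.
manual = torch.exp(k) / torch.exp(k).sum(dim=0, keepdim=True)

print(torch.allclose(probs, manual))   # True
print(probs.sum(dim=0))                # each column sums to 1
```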

In [149]:
# relu
torch.relu(k)

tensor([[2., 7., 9.],
        [3., 9., 7.]])

## In-place Operations

In [150]:
m = torch.rand(2,3)
n = torch.rand(2,3)

print(m)
print(n)

tensor([[0.6818, 0.1953, 0.9991],
        [0.1133, 0.0135, 0.1450]])
tensor([[0.7819, 0.3134, 0.2983],
        [0.3436, 0.2028, 0.9792]])


In [151]:
m.add_(n)   # the trailing underscore makes the operation in-place: the result is written back into m instead of a new tensor

tensor([[1.4637, 0.5087, 1.2974],
        [0.4569, 0.2163, 1.1242]])

In [152]:
m

tensor([[1.4637, 0.5087, 1.2974],
        [0.4569, 0.2163, 1.1242]])

In [153]:
n

tensor([[0.7819, 0.3134, 0.2983],
        [0.3436, 0.2028, 0.9792]])

In [154]:
torch.relu(m)

tensor([[1.4637, 0.5087, 1.2974],
        [0.4569, 0.2163, 1.1242]])

In [155]:
m.relu_()

tensor([[1.4637, 0.5087, 1.2974],
        [0.4569, 0.2163, 1.1242]])

In [156]:
m

tensor([[1.4637, 0.5087, 1.2974],
        [0.4569, 0.2163, 1.1242]])
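
Because the values of `m` above are all positive, `relu` leaves them unchanged and the difference between `torch.relu(m)` and `m.relu_()` is not visible. The sketch below uses a small tensor with a negative entry and compares memory addresses via `data_ptr()`; the names `address_before` and `out_of_place` are illustrative.

```python
import torch

m = torch.tensor([1.0, -2.0, 3.0])
address_before = m.data_ptr()   # memory address of the tensor's first element

out_of_place = torch.relu(m)    # returns a new tensor, m is untouched
print(m)                        # tensor([ 1., -2.,  3.])

m.relu_()                       # overwrites m itself
print(m)                        # tensor([1., 0., 3.])

print(m.data_ptr() == address_before)             # True: same memory reused
print(out_of_place.data_ptr() == address_before)  # False: separate memory
```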

## Copying a Tensor

In [157]:
a = torch.rand(2,3)
a

tensor([[0.4947, 0.3617, 0.9687],
        [0.0359, 0.3041, 0.9867]])

In [158]:
b = a  # plain assignment: b is a second name for the same tensor object, not a copy

In [159]:
b

tensor([[0.4947, 0.3617, 0.9687],
        [0.0359, 0.3041, 0.9867]])

In [160]:
a[0][0] = 0

In [161]:
a

tensor([[0.0000, 0.3617, 0.9687],
        [0.0359, 0.3041, 0.9867]])

In [162]:
b

tensor([[0.0000, 0.3617, 0.9687],
        [0.0359, 0.3041, 0.9867]])

In [163]:
id(a)

135549255661312

In [164]:
id(b)

135549255661312

In [165]:
b = a.clone()  # clone() copies the data into newly allocated memory

In [166]:
a

tensor([[0.0000, 0.3617, 0.9687],
        [0.0359, 0.3041, 0.9867]])

In [167]:
b

tensor([[0.0000, 0.3617, 0.9687],
        [0.0359, 0.3041, 0.9867]])

In [168]:
a[0][0] = 10

In [169]:
a

tensor([[10.0000,  0.3617,  0.9687],
        [ 0.0359,  0.3041,  0.9867]])
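
The cells above can be condensed into one comparison of assignment and `clone()`. This is a sketch with made-up values; `alias` and `cloned` are illustrative names.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

alias = a            # second name for the same tensor object
cloned = a.clone()   # new tensor with its own memory

a[0][0] = 10         # modify the original

print(alias is a)            # True
print(alias[0][0].item())    # 10.0, the alias sees the change
print(cloned[0][0].item())   # 1.0, the clone does not
```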

# Tensor Operations on GPU

In [170]:
# first check whether a GPU is available
torch.cuda.is_available()

True

In [171]:
device=torch.device('cuda')

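## Tensors created with `device=device` now live in GPU memory (VRAM)
Previously, all tensors were created in CPU memory (RAM). Defining `device` alone does not move anything: a tensor is placed on the GPU only when it is created with `device=device` or moved with `.to(device)`.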

In [172]:
# creating new tensors on GPU
torch.rand(2,3,4,device=device)


tensor([[[0.3563, 0.0303, 0.7088, 0.2009],
         [0.0224, 0.9896, 0.3737, 0.0823],
         [0.2851, 0.4433, 0.2989, 0.4175]],

        [[0.2687, 0.0927, 0.2317, 0.9207],
         [0.1794, 0.8246, 0.7332, 0.8699],
         [0.5887, 0.0160, 0.9177, 0.5260]]], device='cuda:0')

In [173]:
# moving an existing tensor from CPU memory (RAM) to GPU memory (VRAM)
a=torch.rand(2,3)  # this tensor lives on the CPU (RAM)
# now move it to VRAM; to() returns a copy on the GPU and leaves a on the CPU
b=a.to(device)
# operations on b now run on the GPU
b+5  # this operation runs on the GPU

tensor([[5.1290, 5.6887, 5.1637],
        [5.0899, 5.3139, 5.1219]], device='cuda:0')
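
The cells above assume a CUDA device is present. A common pattern, sketched here with illustrative names `result` and `result_cpu`, is to pick the device once with a CPU fallback so the same code runs on machines without a GPU.

```python
import torch

# Fall back to the CPU when no CUDA device is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(2, 3)          # created on the CPU
b = a.to(device)              # on the selected device (a no-op if device is the CPU)

result = b + 5                # runs on whichever device b lives on
print(result.device)

# Bring the result back to the CPU, e.g. before converting to NumPy.
result_cpu = result.cpu()
print(result_cpu.device)      # cpu
```

Operands of one operation must be on the same device; adding a CPU tensor to a CUDA tensor raises a `RuntimeError`.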

## Compare the speed of operations on GPU and CPU

In [174]:
import time
start=time.time()
a=torch.randint(size=(100,200,100,100),low=1,high=100)  # randint defaults to int64 (8 bytes per element)
# 100 x 200 x 100 x 100 = 2 x 10^8 elements x 8 bytes = 1.6 x 10^9 bytes, about 1.6 GB
# so each of these tensors occupies that much RAM

b=torch.randint(size=(100,200,100,100),low=1,high=100)
c=torch.matmul(a,b)
end=time.time()
cpu_time=end-start
print(cpu_time)

13.382670402526855


In [175]:
import time
start=time.time()
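# note: this run uses float64 while the CPU run above used int64, and CUDA kernels run
# asynchronously, so without torch.cuda.synchronize() the measured GPU time can be understated
a=torch.randint(size=(100,200,100,100),low=1,high=100,device=device,dtype=torch.float64)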
b=torch.randint(size=(100,200,100,100),low=1,high=100,device=device,dtype=torch.float64)
c=torch.matmul(a,b)
end=time.time()
gpu_time=end-start
print(gpu_time)

print(f"\n\nGPU is {cpu_time//gpu_time} times faster than cpu")

0.2217388153076172


GPU is 60.0 times faster than cpu
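
The measurement above mixes tensor creation into the timing and uses different dtypes on the two devices. As a rough sketch of a more like-for-like comparison, the hypothetical helper `time_matmul` below times only the multiplication, uses float32 on both devices, and calls `torch.cuda.synchronize()` so that asynchronous GPU work is finished before the clock is read. The matrix size and repeat count are arbitrary assumptions, and the numbers it prints will differ from those shown above.

```python
import time
import torch

def time_matmul(device, size=256, repeats=5):
    """Average seconds per matmul of two (size x size) float32 matrices."""
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    torch.matmul(a, b)                     # warm-up run, not timed
    if device.type == "cuda":
        torch.cuda.synchronize()           # wait for queued GPU work to finish
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()           # make sure the timed work is done
    return (time.perf_counter() - start) / repeats

cpu_seconds = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_seconds:.6f} s per matmul")

if torch.cuda.is_available():
    gpu_seconds = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_seconds:.6f} s per matmul")
    print(f"Speed-up: {cpu_seconds / gpu_seconds:.1f}x")
```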


# Reshaping tensors

In [176]:
a=torch.ones(4,4)
#reshaping
print(a.reshape(2,2,2,2))
#flatten
print(a.flatten())

tensor([[[[1., 1.],
          [1., 1.]],

         [[1., 1.],
          [1., 1.]]],


        [[[1., 1.],
          [1., 1.]],

         [[1., 1.],
          [1., 1.]]]])
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
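
Two details of `reshape` are worth seeing in code: a `-1` entry lets PyTorch infer one dimension, and for a contiguous tensor the result shares memory with the original. The sketch below uses `torch.arange(16)` instead of `torch.ones(4,4)` so that individual elements are distinguishable.

```python
import torch

a = torch.arange(16)              # the integers 0 through 15

b = a.reshape(2, -1)              # -1 lets PyTorch infer the size: 16 / 2 = 8
print(b.shape)                    # torch.Size([2, 8])

# For a contiguous tensor, reshape returns a view that shares memory with a.
b[0, 0] = 100
print(a[0].item())                # 100

# The element count must be preserved; 16 elements cannot fill a 3 x 5 tensor.
try:
    a.reshape(3, 5)
except RuntimeError as err:
    print("reshape failed:", err)
```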


# NumPy and PyTorch
We can easily convert a tensor to a NumPy array and vice versa.

In [177]:
import numpy as np
a=torch.rand(3,4)
b=a.numpy()  # converting pytorch tensor to numpy array
print(type(a),type(b))


<class 'torch.Tensor'> <class 'numpy.ndarray'>


In [178]:
# converting a NumPy array to a torch.Tensor
c=torch.from_numpy(b)
print(type(c))


<class 'torch.Tensor'>
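
One behavior the conversions above do not show is that `numpy()` and `torch.from_numpy()` share memory with their source for CPU tensors, so a change on one side appears on the other. A minimal sketch, with illustrative names `t`, `arr`, `back`, and `independent`:

```python
import numpy as np
import torch

t = torch.ones(3)
arr = t.numpy()                  # shares memory with t (CPU tensors only)

t.add_(1)                        # in-place change to the tensor
print(arr)                       # [2. 2. 2.]

back = torch.from_numpy(arr)     # also shares memory with arr
arr[0] = 10
print(back)                      # tensor([10.,  2.,  2.])

independent = torch.tensor(arr)  # torch.tensor() copies the data instead
arr[1] = -1
print(independent[1].item())     # 2.0
```

When an independent copy is needed, `torch.tensor(arr)` or `t.clone().numpy()` avoids the shared buffer.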
