## Complete Guide to Numpy

Hi All ✌️, continuing to write articles on Kaggle after a very long time.

In my previous kernel, I summarized basic [Pandas](https://www.kaggle.com/code/rajmehra03/a-complete-pandas-tutorial) operations.

In this one we cover all the basic **Numpy** operations.

For this, we will use the Titanic dataset where needed. I have also added a few more data sources so that the kernel reaches a wider audience and so that you can try the same operations on them without much change.

As usual, hope you find it useful, and if you do, make sure to drop a 👍.

### Let's get started!

## Table of Contents(ToC):


#### 1. [Creating a Numpy array.](#content1)
 
#### 2. [The Basics.](#content2)
 
#### 3. [Some Standard Matrices.](#content3)
 
#### 4. [Indexing and Slicing.](#content4)
 
#### 5. [Iterating over a Numpy array.](#content5)
 
#### 6. [Linspace and arange.](#content6)
 
#### 7. [Generating Random Numbers/Arrays.](#content7)
 
#### 8. [Common Matrix Operations.](#content8)

#### 9. [Arithmetic operations between matrices.](#content9)
 
#### 10. [Arithmetic operations of matrix with a scalar (element-wise).](#content10)
 
#### 11. [Mathematical Operation(s).](#content11)
 
#### 12. [Statistical Operation(s).](#content12)

#### 13. [Concatenating numpy arrays.](#content13)

#### Importing modules

In [2]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
# for dirname, _, filenames in os.walk('/kaggle/input'):
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session


In [4]:
train=pd.read_csv(r'./data/titanic/train.csv')
df=train.copy()
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [5]:
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


##### We keep this DataFrame for future reference and will use it if a data source is required to demonstrate any operation(s).

<a id="content1"></a>
## 1. Creating a Numpy array

In [6]:
# Empty array. Note the values are uninitialized: they are whatever was already in memory, not random numbers.
emp=np.empty((4,2))
emp

array([[0.0000000e+000, 0.0000000e+000],
       [0.0000000e+000, 0.0000000e+000],
       [0.0000000e+000, 9.7034493e-321],
       [1.2116596e-311, 1.2116596e-311]])
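
The values printed above are simply leftovers from memory. As a minimal sketch (the variable names `buf` and `safe` are only illustrative), this is the usual pattern: use `np.empty` only when every element will be overwritten, and `np.zeros` when a known starting value is needed.

```python
import numpy as np

# np.empty only allocates memory; the contents are arbitrary leftovers.
buf = np.empty((4, 2))

# Fill every element before reading from the array.
buf[:] = 7.0

# np.zeros is the safer choice when a known starting value is needed.
safe = np.zeros((4, 2))

print(buf.shape, buf.dtype, safe.sum())
```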

In [7]:
# Creating a numpy array from a Python list.
arr=np.array([1,2,3])
print(arr.shape)
arr

(3,)


array([1, 2, 3])

In [8]:
#Creating 2-D array
arr2=np.array([[1,4,3],[2,4,6]])
print(arr2.shape)
arr2

(2, 3)


array([[1, 4, 3],
       [2, 4, 6]])

##### To check the number of dimensions in a numpy array, use the '.ndim' attribute:

In [9]:
arr2.ndim

2

<a id="content2"></a>
## 2. The Basics

Let's see some basic attributes and operations of a numpy array.

In [10]:
arr=np.array([[2,3,4],[7,3,1],[2,5,6],[5,1,2]])
arr

array([[2, 3, 4],
       [7, 3, 1],
       [2, 5, 6],
       [5, 1, 2]])

In [11]:
# Shape of a numpy array
arr.shape

(4, 3)

In [12]:
# Reshape a numpy array (the total number of elements must stay the same)
arr2=arr.reshape(3,4)
arr2

array([[2, 3, 4, 7],
       [3, 1, 2, 5],
       [6, 5, 1, 2]])

Note that while reshaping, elements are placed in row-major order, i.e. **all elements of row 'i' occur before those of row 'i+1'.**
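
To make the ordering concrete, here is a small sketch comparing the default row-major behaviour with `order='F'` (column-major), which `reshape` also accepts. The array is the same one used in this section; the names `c_order` and `f_order` are only illustrative.

```python
import numpy as np

arr = np.array([[2, 3, 4], [7, 3, 1], [2, 5, 6], [5, 1, 2]])

# Default (C order): elements are read row by row and filled row by row.
c_order = arr.reshape(3, 4)

# order='F': elements are read and filled column by column instead.
f_order = arr.reshape(3, 4, order='F')

print(c_order)
print(f_order)
```

With the default order, flattening the original and the reshaped array gives the same sequence of values.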

In [13]:
# Reshaping with only one dimension known.
arr3=arr.reshape(-1,2)
arr3.shape


(6, 2)

Note that for this to run, the number of elements in the array **arr** must be divisible by the specified dimension; otherwise a `ValueError` is raised.
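
A short sketch of both cases, using a throwaway 12-element array (the `failed` flag is only there to show that the error was caught):

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to work out the missing dimension.
print(arr.reshape(-1, 2).shape)  # 12 / 2 = 6 rows
print(arr.reshape(4, -1).shape)  # 12 / 4 = 3 columns

failed = False
try:
    arr.reshape(-1, 5)  # 12 is not divisible by 5
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```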

In [14]:
# Dimensions of an array
print(arr)
arr.ndim

[[2 3 4]
 [7 3 1]
 [2 5 6]
 [5 1 2]]


2

In [15]:
# Size of each element in bytes
print(arr.dtype)
arr.itemsize # int32 here, so 32/8 = 4 bytes (the default integer dtype can differ by platform and NumPy version)

int32


4

In [16]:
# Let's try with int16
arr=arr.astype('int16') # now this will be 2 bytes. (16/8=2bytes)
print(arr.dtype)
arr.itemsize

int16


2
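
Since the default integer dtype is not the same everywhere, it can help to request the dtype explicitly. A minimal sketch (array contents are arbitrary) that also shows how `itemsize` relates to the total memory reported by `nbytes`:

```python
import numpy as np

# Request the dtype explicitly so the result is the same on every platform.
a32 = np.array([[2, 3, 4], [7, 3, 1]], dtype=np.int32)
a64 = a32.astype(np.int64)

print(a32.itemsize, a64.itemsize)  # bytes per element
print(a32.nbytes, a64.nbytes)      # itemsize * number of elements
```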

<a id="content3"></a>
##  3. Some Standard Matrices

##### Matrix of all zeros

In [17]:
zeros=np.zeros((2,3))
zeros

array([[0., 0., 0.],
       [0., 0., 0.]])

##### Matrix of all ones

In [18]:
#ones
ones=np.ones((3,2))
ones

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

##### Matrix initialised with some constant value.

In [19]:
# From some constant value
full=np.full((2,3),9)
full

array([[9, 9, 9],
       [9, 9, 9]])

##### Identity Matrix

In [20]:
order=3
I=np.eye(order)
I

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

<a id="content4"></a>
##  4. Indexing and Slicing

In [21]:
arr=np.array([[1,2,3],[5,7,6],[2,4,9]])
arr

array([[1, 2, 3],
       [5, 7, 6],
       [2, 4, 9]])

In [22]:
# First row (a single index on a 2-D array selects a whole row)
arr[0]

array([1, 2, 3])

In [23]:
# third element in second row
arr[1,2]

6

#### Slicing

In [24]:
# All but the first row (0-based indexing)
arr[1:]

array([[5, 7, 6],
       [2, 4, 9]])

In [25]:
# All but first element of every row
arr[:,1:]

array([[2, 3],
       [7, 6],
       [4, 9]])

In [26]:
# Negative indices: this one gives the last row.
arr[-1]

array([2, 4, 9])

In [27]:
# Last two columns of every row (same result as arr[:,1:] here because there are three columns)
arr[:,-2:]

array([[2, 3],
       [7, 6],
       [4, 9]])
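
One detail worth knowing about slicing: basic slices return views onto the original data, not copies. The sketch below (names such as `view` and `independent` are illustrative) shows that writing through a slice changes the original array, and that `.copy()` avoids this.

```python
import numpy as np

arr = np.array([[1, 2, 3], [5, 7, 6], [2, 4, 9]])

# With three columns, "skip the first" and "take the last two" coincide.
same = np.array_equal(arr[:, 1:], arr[:, -2:])

# Basic slices are views: writing through the slice changes the original.
view = arr[:, 1:]
view[0, 0] = 100

# Use .copy() when an independent array is needed.
independent = arr[:, 1:].copy()
independent[0, 0] = -1

print(same, arr[0, 1])
```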

<a id="content5"></a>
##  5. Iterating over a Numpy array

In [28]:
arr

array([[1, 2, 3],
       [5, 7, 6],
       [2, 4, 9]])

In [29]:
for x in np.nditer(arr):
    print(x)

1
2
3
5
7
6
2
4
9


The order of iteration is chosen to match the memory layout of the array rather than its logical (row/column) order. This can be seen by iterating over the transpose of the above array, which produces the same sequence.

In [30]:
for x in np.nditer(arr.T):
    print(x)

1
2
3
5
7
6
2
4
9


#### 'nditer()' is a tricky function to explain. Apart from the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.nditer.html), here are some links you can follow to understand it in more depth: [link1](https://www.tutorialspoint.com/numpy/numpy_iterating_over_array.htm) & [link2](https://www.geeksforgeeks.org/numpy-iterating-over-array/#:~:text=nditer%20.,using%20Python's%20standard%20Iterator%20interface.&text=The%20order%20of%20iteration%20is,without%20considering%20a%20particular%20ordering.).
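
If a specific traversal order is needed, `np.nditer` accepts an `order` argument. The sketch below compares the default (`'K'`, which follows the memory layout) with `order='C'` on the transposed array; the list names are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [5, 7, 6], [2, 4, 9]])

# Default order='K' follows memory layout, so arr.T yields the same sequence as arr.
default_order = [int(x) for x in np.nditer(arr.T)]

# order='C' forces row-by-row traversal of the transposed array.
c_order = [int(x) for x in np.nditer(arr.T, order='C')]

print(default_order)
print(c_order)
```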

In [31]:
# Iterating in normal way
for row in arr:
    for x in row:
        print(x)

1
2
3
5
7
6
2
4
9


<a id="content6"></a>
##  6. Linspace and arange

#### Linspace
`linspace(low, high, count)` generates **count** **'evenly spaced'** elements **between low and high (BOTH INCLUSIVE).**

In [32]:
arr_lin=np.linspace(1,250,5)
print(arr_lin.shape)
arr_lin

(5,)


array([  1.  ,  63.25, 125.5 , 187.75, 250.  ])

In [33]:
arr_lin=np.linspace(10,90,8)
print(arr_lin.shape)
arr_lin

(8,)


array([10.        , 21.42857143, 32.85714286, 44.28571429, 55.71428571,
       67.14285714, 78.57142857, 90.        ])
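
The spacing in the outputs above is `(high - low) / (count - 1)`. A small sketch that confirms this with `retstep=True`, and shows how `endpoint=False` drops the upper bound (variable names are illustrative):

```python
import numpy as np

low, high, count = 10, 90, 8

# retstep=True also returns the spacing between consecutive samples.
samples, step = np.linspace(low, high, count, retstep=True)
print(step, (high - low) / (count - 1))

# endpoint=False excludes `high`, so the spacing becomes (high - low) / count.
open_samples = np.linspace(low, high, count, endpoint=False)
print(open_samples)
```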

#### arange
Generates elements from low to high with the given step size. Starts at low and keeps adding the step while the value stays short of high (EXCLUSIVE).

In [34]:
ar1=np.arange(1,100,1) # here third arg is the step size
ar1

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
       69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
       86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [35]:
ar2=np.arange(10,1,-2) #  in reverse order
ar2

array([10,  8,  6,  4,  2])
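
As a rough check of the "exclusive" rule, the number of elements `arange` returns is `ceil((stop - start) / step)`. The sketch below uses the same reverse range as above; `expected_len` is just an illustrative name.

```python
import numpy as np

# The stop value is never included.
print(np.arange(1, 10, 3))

# The length is ceil((stop - start) / step).
start, stop, step = 10, 1, -2
expected_len = int(np.ceil((stop - start) / step))
print(len(np.arange(start, stop, step)), expected_len)
```

For non-integer steps, `np.linspace` is usually the more predictable choice, because floating-point round-off can change how many elements `arange` produces.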

<a id="content7"></a>
## 7. Generating Random Numbers/Arrays

#### Random integers

In [36]:
x=np.random.randint(11)
x

3

If we give only one number as input, it is treated as the upper limit and the lower limit defaults to 0. So if you run the cell above multiple times, the value will always be between 0 and 10, because the upper limit (11) is exclusive.
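
A quick way to see the bounds is to draw many samples at once. In this sketch the seed is fixed only to make the run repeatable, and the sample count is arbitrary:

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the sketch is repeatable

samples = np.random.randint(11, size=10000)

# The upper limit is exclusive: values fall in 0..10, and 11 never appears.
print(samples.min(), samples.max())
```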

In [37]:
x=np.random.randint(low=11,high=21)
x

14

In [38]:
# Array of given size
rand_arr=np.random.randint(11,121,(2,3))
rand_arr

array([[92, 44, 50],
       [31, 24, 66]])

#### Random number between 0 and 1

In [39]:
# Random float b/w 0 and 1 [0,1)
rand_float=np.random.random((3,3))
rand_float

array([[0.71272062, 0.18435383, 0.02944915],
       [0.17819509, 0.38822963, 0.10437157],
       [0.19767378, 0.58085752, 0.63580187]])

#### Random floats in a range


In [40]:
low=10
high=20
rand_float=np.random.random((3,2))
rand_float=low+rand_float*(high-low)
rand_float

array([[12.5912289 , 19.69326183],
       [18.72179841, 18.48128421],
       [10.70618273, 16.58592818]])

Or, we can use the random.uniform **function**.

In [41]:
rand_arr=np.random.uniform(low,high,(2,3))
print(rand_arr)

[[10.86289081 19.95590169 14.10973163]
 [14.99733632 12.11410546 16.82045584]]


#### Random numbers belonging to Normal(Gaussian) distribution.

In [42]:
# Random values from standard normal dist
rnd=np.random.randn(3,2)

mu=np.mean(rnd)
var=np.var(rnd)
print(mu,var)
rnd

-0.26666179318147293 0.3074243432741574


array([[-1.03925848,  0.25706247],
       [-0.23295627, -0.78320938],
       [-0.36861477,  0.56700567]])

#### Note one thing here: 'randn' does not take the shape as a tuple but as a sequence of integer arguments, one per dimension.
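
If a tuple is more convenient, `np.random.standard_normal` draws from the same distribution and takes the shape as a single argument. A minimal sketch of the two call styles:

```python
import numpy as np

a = np.random.randn(3, 2)              # dimensions as separate arguments
b = np.random.standard_normal((3, 2))  # same distribution, shape as a tuple

print(a.shape, b.shape)
```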

In [43]:
# A specified mean (loc) and std dev (scale).
rnd=np.random.normal(2,.9,(1000,3)) # first is the mean, second is the standard deviation (not the variance), last is the shape

mu=np.mean(rnd)
var=np.var(rnd)
print(mu,var)
rnd

1.9783486874144203 0.7900281903538113


array([[2.25354228, 1.39347419, 0.7488919 ],
       [0.94637772, 1.06958194, 1.66202798],
       [0.75555728, 0.89237055, 2.36393756],
       ...,
       [2.54081297, 1.81549407, 2.61238018],
       [3.02328641, 1.40222207, 2.07475486],
       [2.33564878, 0.760004  , 4.09975172]])
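
The second argument of `np.random.normal` is the standard deviation, so the variance of the samples should come out near `0.9**2 = 0.81`, which matches the value printed above. A sketch with a larger sample (the seed and sample size are arbitrary choices):

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the sketch is repeatable

loc, scale = 2, 0.9
rnd = np.random.normal(loc, scale, (100000, 3))

# The sample std approaches `scale`; the sample variance approaches scale**2 = 0.81.
print(rnd.mean(), rnd.std(), rnd.var())
```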

<a id="content8"></a>
## 8. Common Matrix Operations

In [44]:
mat=np.array([[2,3,1],[4,5,6],[9,7,3]])
mat

array([[2, 3, 1],
       [4, 5, 6],
       [9, 7, 3]])

##### Transpose of a matrix

In [45]:
#Transpose
mat.T

array([[2, 4, 9],
       [3, 5, 7],
       [1, 6, 3]])

##### Multiplicative inverse (A^(-1))

In [46]:
mat_inv=np.linalg.inv(mat)
print(mat_inv)

[[-0.49090909 -0.03636364  0.23636364]
 [ 0.76363636 -0.05454545 -0.14545455]
 [-0.30909091  0.23636364 -0.03636364]]


##### Matrix multiplication.

In [47]:
# To check that the inverse is right, multiply it with mat. The result should be the identity matrix (up to floating-point round-off)
np.matmul(mat,mat_inv)

array([[ 1.00000000e+00,  0.00000000e+00,  4.85722573e-17],
       [-2.22044605e-16,  1.00000000e+00,  9.71445147e-17],
       [ 6.66133815e-16,  0.00000000e+00,  1.00000000e+00]])
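
The off-diagonal entries above are tiny but not exactly zero, which is normal floating-point round-off. Rather than reading the output by eye, a tolerance-based comparison can do the check; this sketch uses `np.allclose` and the `@` operator.

```python
import numpy as np

mat = np.array([[2, 3, 1], [4, 5, 6], [9, 7, 3]])
mat_inv = np.linalg.inv(mat)

product = mat @ mat_inv  # `@` is the operator form of np.matmul for 2-D arrays

# Compare with a tolerance instead of expecting exact zeros and ones.
print(np.allclose(product, np.eye(3)))
```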

<a id="content9"></a>
## 9. Arithmetic operations between matrices

In [48]:
mat

array([[2, 3, 1],
       [4, 5, 6],
       [9, 7, 3]])

In [49]:
mat2=mat.T
mat2

array([[2, 4, 9],
       [3, 5, 7],
       [1, 6, 3]])

In [50]:
# Addition
mat+mat2

array([[ 4,  7, 10],
       [ 7, 10, 13],
       [10, 13,  6]])

In [51]:
# Subtraction
mat-mat2

array([[ 0, -1, -8],
       [ 1,  0, -1],
       [ 8,  1,  0]])

#### Note that `*` is element-wise multiplication, NOT MATRIX MULTIPLICATION. For that, use np.matmul().

In [52]:
mat*mat2 # element-wise product, not the matrix product

array([[ 4, 12,  9],
       [12, 25, 42],
       [ 9, 42,  9]])

#### This is correct matrix multiplication:


In [53]:
np.matmul(mat,mat2)

array([[ 14,  29,  42],
       [ 29,  77,  89],
       [ 42,  89, 139]])
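
To see the difference side by side, the sketch below computes both products for the same matrices and checks one entry of the matrix product by hand (names such as `elementwise` and `matrix_prod` are illustrative).

```python
import numpy as np

mat = np.array([[2, 3, 1], [4, 5, 6], [9, 7, 3]])
mat2 = mat.T

elementwise = mat * mat2  # same as np.multiply(mat, mat2)
matrix_prod = mat @ mat2  # same as np.matmul(mat, mat2)

# Entry (0, 0) of the matrix product is the dot product of row 0 with column 0.
print(elementwise[0, 0], matrix_prod[0, 0], mat[0] @ mat2[:, 0])
```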

In [54]:
# Division (defined here as multiplication by the inverse of mat2)
div=np.matmul(mat,np.linalg.inv(mat2))
div

array([[-0.85454545,  1.21818182,  0.05454545],
       [-0.72727273,  1.90909091, -0.27272727],
       [-3.96363636,  6.05454545, -1.23636364]])
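
"Division" here means multiplying by the inverse, which is a different operation from element-wise division with `/`. A small sketch that verifies the first and shows the second:

```python
import numpy as np

mat = np.array([[2, 3, 1], [4, 5, 6], [9, 7, 3]])
mat2 = mat.T

# Matrix "division": div = mat @ inv(mat2), i.e. the matrix that satisfies div @ mat2 = mat.
div = mat @ np.linalg.inv(mat2)
print(np.allclose(div @ mat2, mat))

# Element-wise division is a different operation: each entry divided by its counterpart.
elementwise = mat / mat2
print(elementwise)
```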

<a id="content10"></a>
## 10. Arithmetic operations of matrix with a scalar (element-wise)

In [55]:
arr=np.array([10,12,14,15,17,21])
arr

array([10, 12, 14, 15, 17, 21])

In [56]:
# Addition
arr2=10+arr 
# works the other way round also 
arr2

array([20, 22, 24, 25, 27, 31])

In [57]:
#sub
arr_sub=arr-10 # 10-arr is also valid, but it gives the negated result
arr_sub

array([ 0,  2,  4,  5,  7, 11])

In [58]:
#mul
arr_mul=arr*2
arr_mul

array([20, 24, 28, 30, 34, 42])

In [59]:
#division
arr_div=arr/2
arr_div

array([ 5. ,  6. ,  7. ,  7.5,  8.5, 10.5])

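#### Note that in all the operations so far, the numpy array and the scalar can occur on either side of the operator. For addition and multiplication the result is the same; for subtraction and division, swapping the operands changes the result.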
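
A short sketch of what operand order does for each operator, using the same array as above:

```python
import numpy as np

arr = np.array([10, 12, 14, 15, 17, 21])

# Commutative operators: operand order does not matter.
print(np.array_equal(10 + arr, arr + 10), np.array_equal(2 * arr, arr * 2))

# Non-commutative operators: both orders run, but the results differ.
print(arr - 10)  # subtract 10 from each element
print(10 - arr)  # subtract each element from 10
print(2 / arr)   # 2 divided by each element
```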

#### Miscellaneous

In [60]:
arr*arr # element wise square

array([100, 144, 196, 225, 289, 441])

In [61]:
arr+arr # Note this is not appending; it is equivalent to 2*arr.

array([20, 24, 28, 30, 34, 42])

<a id="content11"></a>
## 11. Mathematical Operation(s)

Let's see some common mathematical operations in numpy

In [62]:
# Log
x=2.73
ans=np.log(x)
ans


1.0043016091968684

In [63]:
# On a list (element-wise)
l=[2,3,4,5,6,1,2.73]
np.log(l)

array([0.69314718, 1.09861229, 1.38629436, 1.60943791, 1.79175947,
       0.        , 1.00430161])

In [64]:
# Expo
x=np.log(2)
ans=np.exp(x)
print(ans)

print('\n\n')

#On a list
l=[np.log(x) for x in range(1,10)]
ans=[np.exp(i) for i in l]
ans

2.0





[1.0,
 2.0,
 3.0000000000000004,
 4.0,
 4.999999999999999,
 6.0,
 6.999999999999999,
 7.999999999999998,
 9.000000000000002]
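
The list comprehensions above work, but `np.log` and `np.exp` already apply element-wise, so the same round trip can be written on whole arrays. The small differences from the exact integers are floating-point round-off, which `np.allclose` tolerates. A minimal sketch:

```python
import numpy as np

values = np.arange(1, 10)

# NumPy functions apply element-wise, so no list comprehension is needed.
round_trip = np.exp(np.log(values))

# Floating-point round-off means the result is close to, not exactly, the input.
print(np.allclose(round_trip, values))
```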

#### Sin , Cos and Tan

In [65]:
factor=[1,2,3,4]
rads=[np.pi/i for i in factor]
sins=[np.sin(rads)]
cos=[np.cos(rads)]
tans=[np.tan(rads)]

print("sins: ",sins)
print("cos: ",cos)
print("tans: ",tans)

sins:  [array([1.22464680e-16, 1.00000000e+00, 8.66025404e-01, 7.07106781e-01])]
cos:  [array([-1.00000000e+00,  6.12323400e-17,  5.00000000e-01,  7.07106781e-01])]
tans:  [array([-1.22464680e-16,  1.63312394e+16,  1.73205081e+00,  1.00000000e+00])]
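
Some of the values above look odd at first: `sin(pi)` is about `1.2e-16` instead of 0, and `tan(pi/2)` is about `1.6e16` instead of infinity. This happens because pi cannot be represented exactly as a float. The sketch below works on an array directly (no wrapping list needed) and checks these cases with a tolerance; `coss` is just an illustrative name.

```python
import numpy as np

rads = np.pi / np.array([1, 2, 3, 4])  # pi, pi/2, pi/3, pi/4

sins = np.sin(rads)
coss = np.cos(rads)
tans = np.tan(rads)

# sin(pi) and cos(pi/2) are tiny but non-zero because pi is not exactly representable.
print(np.isclose(sins[0], 0.0), np.isclose(coss[1], 0.0))

# tan(pi/2) comes out as a very large finite number rather than infinity.
print(np.isfinite(tans[1]), tans[1] > 1e15)
```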


<a id="content12"></a>
## 12. Statistical Operation(s)

Let's see some common statistical operations in numpy

In [66]:
mat

array([[2, 3, 1],
       [4, 5, 6],
       [9, 7, 3]])

In [67]:
# minimum
print(mat.min())
print(mat.min(axis=0))
print(mat.min(axis=1))

1
[2 3 1]
[1 4 3]
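
The `axis` argument names the axis that gets collapsed: `axis=0` reduces down the rows and gives one value per column, while `axis=1` reduces across the columns and gives one value per row. A small sketch with the same matrix; `keepdims=True` is an optional extra that keeps the reduced axis with length 1.

```python
import numpy as np

mat = np.array([[2, 3, 1], [4, 5, 6], [9, 7, 3]])

# axis=0 collapses the rows: one result per column.
col_min = mat.min(axis=0)

# axis=1 collapses the columns: one result per row.
row_min = mat.min(axis=1)

# keepdims=True keeps the reduced axis with length 1, handy for broadcasting.
row_min_2d = mat.min(axis=1, keepdims=True)

print(col_min, row_min, row_min_2d.shape)
```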


In [68]:
# maximum
print(mat.max())
print(mat.max(axis=0))
print(mat.max(axis=1))

9
[9 7 6]
[3 6 9]


In [69]:
# Mean
print(mat.mean()) # of whole array.
print(mat.mean(axis=0))
print(mat.mean(axis=1))

4.444444444444445
[5.         5.         3.33333333]
[2.         5.         6.33333333]


In [70]:
# Std Dev
print(mat.std()) # of whole array.
print(mat.std(axis=0))
print(mat.std(axis=1))

2.4088314876309775
[2.94392029 1.63299316 2.05480467]
[0.81649658 0.81649658 2.49443826]


In [71]:
# Variance
print(mat.var()) # of whole array.
print(mat.var(axis=0))
print(mat.var(axis=1))

5.802469135802469
[8.66666667 2.66666667 4.22222222]
[0.66666667 0.66666667 6.22222222]
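
By default, `var` and `std` divide by N (the population formulas). For the sample versions that divide by N - 1, pass `ddof=1`. A minimal sketch on the same matrix (variable names are illustrative):

```python
import numpy as np

mat = np.array([[2, 3, 1], [4, 5, 6], [9, 7, 3]])

# Default ddof=0 divides by N (population variance).
pop_var = mat.var()

# ddof=1 divides by N - 1 (sample variance).
sample_var = mat.var(ddof=1)

# std is the square root of var for the same ddof.
print(pop_var, sample_var, np.isclose(mat.std() ** 2, pop_var))
```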


<a id="content13"></a>
## 13. Concatenating numpy arrays

In [72]:
mat

array([[2, 3, 1],
       [4, 5, 6],
       [9, 7, 3]])

In [73]:
mat2

array([[2, 4, 9],
       [3, 5, 7],
       [1, 6, 3]])

In [74]:
result=np.concatenate((mat2,mat),axis=1)
result

array([[2, 4, 9, 2, 3, 1],
       [3, 5, 7, 4, 5, 6],
       [1, 6, 3, 9, 7, 3]])

In [75]:
result=np.concatenate((mat2,mat),axis=0)
result

array([[2, 4, 9],
       [3, 5, 7],
       [1, 6, 3],
       [2, 3, 1],
       [4, 5, 6],
       [9, 7, 3]])

#### Note that this can be achieved using 'hstack' and 'vstack' functions as well.

In [76]:
np.hstack([mat2,mat])

array([[2, 4, 9, 2, 3, 1],
       [3, 5, 7, 4, 5, 6],
       [1, 6, 3, 9, 7, 3]])

In [77]:
np.vstack([mat2,mat])

array([[2, 4, 9],
       [3, 5, 7],
       [1, 6, 3],
       [2, 3, 1],
       [4, 5, 6],
       [9, 7, 3]])
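
For 2-D arrays, the equivalence can be checked directly. This sketch compares each stacking function with the matching `np.concatenate` call (`same_h` and `same_v` are illustrative names):

```python
import numpy as np

mat = np.array([[2, 3, 1], [4, 5, 6], [9, 7, 3]])
mat2 = mat.T

# For 2-D arrays, hstack joins along axis=1 and vstack along axis=0.
same_h = np.array_equal(np.hstack([mat2, mat]), np.concatenate((mat2, mat), axis=1))
same_v = np.array_equal(np.vstack([mat2, mat]), np.concatenate((mat2, mat), axis=0))
print(same_h, same_v)

# Shapes must match on every axis except the one being joined.
print(np.hstack([mat2, mat]).shape, np.vstack([mat2, mat]).shape)
```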

#### Hope you liked it, and as always if you did, make sure to drop a 👍.
Thanks, Bye!

## Link to previous kernels:

i. [Pandas: A Complete Pandas Tutorial](https://www.kaggle.com/code/rajmehra03/a-complete-pandas-tutorial) 

ii. [Python for DSA interviews.](https://www.kaggle.com/rajmehra03/python-for-dsa-interviews/)