# NumPy basics
In this notebook, I cover the basics of NumPy. The topics are:
- Array creation
- Array properties
- Array reshaping
- Indexing and slicing
- Some useful functions
Let's get started.

First, we import the library.

In [2]:
import numpy as np

# Array creation
## From python lists
The easiest way to create a NumPy array is to pass a Python list to `np.array()`.

In [3]:
arr = np.array([1, 2, 3])
arr

array([1, 2, 3])

An existing Python list can be converted in the same way.

In [4]:
a = [3, 5, 7]
arr2 = np.array(a)
arr2

array([3, 5, 7])

## Using numpy functions to generate arrays
### Creating an array of ones or zeros

In [5]:
o_arr = np.ones(3)  #1D ones array with 3 elements
o_arr

array([1., 1., 1.])

In [6]:
z_arr = np.zeros([2, 2])    #2D zeroes array with 2*2=4 elements
z_arr

array([[0., 0.],
       [0., 0.]])

### Creating arrays from a range
We can use `np.arange()` and `np.linspace()` to generate an array from a range.

In [7]:
arr3 = np.arange(1, 10, 3)    #Numbers from 1 up to, but not including, 10 with step 3
arr3

array([1, 4, 7])

In [8]:
arr4 = np.linspace(0, 10, 10)   #Start, stop (included) and number of points
arr4

array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])
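
The two functions treat the stop value differently, which is easy to miss. The following is a minimal sketch of that difference; the variable names `a`, `b` and `c` are only illustrative.

```python
import numpy as np

# np.arange excludes the stop value, like Python's range().
a = np.arange(0, 10, 2)

# np.linspace includes the stop value by default.
b = np.linspace(0, 10, 5)

# endpoint=False makes linspace exclude the stop value as well.
c = np.linspace(0, 10, 5, endpoint=False)

print(a)  # values 0, 2, 4, 6, 8
print(b)  # values 0, 2.5, 5, 7.5, 10
print(c)  # values 0, 2, 4, 6, 8
```

As a rule of thumb, `np.arange()` is convenient when the step matters and `np.linspace()` when the number of points matters.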

### Creating arrays of random numbers

In [9]:
arr5 = np.random.rand(2,4)   #2x4 array with random floats in [0, 1)
arr5

array([[0.92475342, 0.42902163, 0.08963816, 0.5137294 ],
       [0.66814528, 0.45691217, 0.79266873, 0.65555954]])

In [10]:
arr6 = np.random.randint(1, 10, (3,3))  #3x3 array with random integers from
arr6                                    #1 to 9 (the upper bound 10 is excluded)

array([[4, 8, 8],
       [5, 3, 2],
       [2, 1, 3]])
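
The random outputs above change on every run. If reproducible numbers are needed, one option is NumPy's `Generator` interface with a fixed seed. This is a sketch, not the method used in this notebook; the seed `42` and the names `rng`, `floats` and `ints` are arbitrary choices.

```python
import numpy as np

# A seeded generator produces the same numbers on every run.
# The seed value 42 is an arbitrary choice for this sketch.
rng = np.random.default_rng(42)

floats = rng.random((2, 4))               # uniform floats in [0, 1)
ints = rng.integers(1, 10, size=(3, 3))   # integers from 1 to 9 (10 is excluded)

print(floats.shape, ints.shape)
print(ints.min() >= 1, ints.max() <= 9)
```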

# Array properties
A NumPy array exposes its size and data type through built-in attributes: the data type, the total number of elements, the shape, the number of dimensions, the number of bytes per element and the total number of bytes used by the array.

In [11]:
arr7 = np.array([[1, 3], [2, -1]])
arr7.dtype  #Gives the data type of the array

dtype('int64')

In [12]:
arr7.size   #Total number of elements

4

In [13]:
arr7.shape  #Shape of the array, i.e. 2x2

(2, 2)

In [14]:
arr7.ndim   #Number of dimensions, here 2 since the array has rows and columns

2

In [15]:
arr7.itemsize   #Memory used by each element, in bytes

8

In [16]:
arr7.nbytes #Total memory used by the array, in bytes

32

In [17]:
# Alternative method to get the total number of bytes the array is taking
arr7.size * arr7.itemsize

32

Let's define a 3D array and see how these properties change.

In [18]:
arr8 = np.array(
    [[[1, 2, 4.5],[2, 3, 'cat']],[[-3, True, False],['red', 'name', 4]],[[1, 2, 3],[4, 5, 6]]]
)
arr8

array([[['1', '2', '4.5'],
        ['2', '3', 'cat']],

       [['-3', 'True', 'False'],
        ['red', 'name', '4']],

       [['1', '2', '3'],
        ['4', '5', '6']]], dtype='<U32')

In [19]:
arr8.shape

(3, 2, 3)

The shape in this case is a 3-tuple. The first number is the number of groups (the outermost lists). The second number is the number of rows in each group, which must be the same for every group. The last number is the number of items in each row.

In [20]:
arr8.dtype

dtype('<U32')

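The `U` means that every element is stored as a Unicode string, and `32` is the maximum number of characters per string. Because the list mixes numbers, booleans and strings, NumPy converts everything to a single common type, here a string. The `<` marks little-endian byte order.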

In [21]:
arr8.itemsize

128

In [22]:
arr8.nbytes

2304
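
The numbers above follow from how NumPy stores strings: each Unicode character takes 4 bytes, so a 32-character slot takes 128 bytes, and 18 elements take 2304 bytes. The sketch below checks that arithmetic and shows `dtype=object` as one possible way to keep the original Python types; the names `mixed`, `chars` and `as_objects` are illustrative.

```python
import numpy as np

# Mixing numbers and strings coerces every element to a string.
mixed = np.array([1, 2.5, 'cat'])
print(mixed.dtype.kind)     # 'U' stands for Unicode string

# Each Unicode character is stored in 4 bytes.
chars = np.dtype('<U32').itemsize // 4
print(chars)                # 32

# dtype=object keeps the original Python objects instead of converting them.
as_objects = np.array([1, 2.5, 'cat'], dtype=object)
print(type(as_objects[0]), type(as_objects[2]))
```

Object arrays give up most of NumPy's speed advantages, so they are usually a last resort.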

# Reshaping arrays

We can reshape an array into any desired shape, provided that the new shape holds the same total number of elements as the old one.

## Using `reshape()` method

In [23]:
arr9 = np.arange(100)   #Note: 100 elements
arr9

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [24]:
arr9.reshape((4,25))    #Works because 4*25=100

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
        66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
        91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [25]:
arr9.reshape((10,10))   #Works because 10*10=100

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
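
Two related behaviours are worth knowing: one dimension can be given as `-1` so that NumPy infers it, and a shape with the wrong number of elements raises an error. A small sketch, with illustrative names `arr` and `auto`:

```python
import numpy as np

arr = np.arange(100)

# -1 asks NumPy to work out that dimension from the total size.
auto = arr.reshape((20, -1))
print(auto.shape)   # (20, 5)

# A shape whose product is not 100 raises a ValueError.
try:
    arr.reshape((3, 30))
except ValueError as err:
    print("reshape failed:", err)
```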

## Flattening an array
One could also convert a multi-dimensional array to a one dimensional array, which is called flattening.

In [26]:
arr10 = np.array([[1, 2], [3, 4]])
arr10

array([[1, 2],
       [3, 4]])

In [27]:
arr10.flatten()

array([1, 2, 3, 4])

In [28]:
arr10.ravel()   #Alternative way to flatten an array

array([1, 2, 3, 4])

What's the difference between `flatten` and `ravel`? `flatten` always returns a new copy of the data, while `ravel` returns a view of the original array whenever possible and copies only when it has to. When `ravel` returns a view, any change made to it also affects the original array.

In [29]:
ra = np.arange(10).reshape((2, 5))
ra

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [30]:
rb = ra.ravel()
rb

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [31]:
rb[1] = -1
rb

array([ 0, -1,  2,  3,  4,  5,  6,  7,  8,  9])

In [32]:
ra

array([[ 0, -1,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

Note that the original array has $-1$ instead of $1$, even though we changed the element in the new array.
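
Whether two arrays share data can be checked with `np.shares_memory()`. The sketch below also shows a case where `ravel` has to copy, namely a transposed array whose elements are no longer laid out in row order. The names `base`, `flat_copy`, `flat_view` and `t_flat` are illustrative.

```python
import numpy as np

base = np.arange(6).reshape((2, 3))

flat_copy = base.flatten()  # always a copy
flat_view = base.ravel()    # a view here, because base is contiguous in memory

print(np.shares_memory(base, flat_copy))  # False
print(np.shares_memory(base, flat_view))  # True

# The transpose is not contiguous in row-major order, so ravel has to copy.
t_flat = base.T.ravel()
print(np.shares_memory(base, t_flat))     # False
```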

## Transposing
The `.T` attribute returns the transpose of an array.

In [33]:
ra.T

array([[ 0,  5],
       [-1,  6],
       [ 2,  7],
       [ 3,  8],
       [ 4,  9]])

# Indexing and slicing
Now, we move on to indexing and slicing. Let's take a look at indexing first.
## Indexing
Passing a list of indices lets us pick values from several positions of an array at once. For example, let's define an array of length $10$ and get its first, fifth and last elements.

In [34]:
arr11 = np.arange(10)
arr11

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [35]:
arr11[[0, 4, -1]]   #First, fifth and last respectively

array([0, 4, 9])

## Boolean masking
When we want to select the elements that satisfy one or more conditions, we can use boolean masking. The simplest mask is a boolean array of the same length as the data.

In [36]:
barr = np.arange(5)
barr

array([0, 1, 2, 3, 4])

In [37]:
mask = np.array([True, True, False, True, False])   #Boolean array
barr[mask]
#Returns only the entries of barr at the positions where mask is True

array([0, 1, 3])

We can skip creating the mask array by writing the condition directly inside the index.

In [38]:
barr2 = np.arange(10)
barr2

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Let's get all even numbers from this.

In [39]:
barr2[barr2 % 2 == 0]

array([0, 2, 4, 6, 8])

Let's try finding all the elements that are not multiples of $3$.

In [40]:
barr2[~(barr2 % 3 == 0)]

array([1, 2, 4, 5, 7, 8])

We can also combine several conditions in the same index. For example, the following finds all even multiples of $3$ in the array.

In [41]:
barr2[(barr2 % 2 == 0) & (barr2 % 3 == 0)]

array([0, 6])

It is important to put parentheses around each condition when combining them with the element-wise and `&` or the element-wise or `|`, because these operators bind more tightly than comparisons such as `==`.
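
To illustrate the point about parentheses, here is a short sketch using `|`, the function form `np.logical_and()`, and the error that appears when the parentheses are left out. The names `nums`, `even_or_mult3` and `both` are illustrative.

```python
import numpy as np

nums = np.arange(10)

# With parentheses, each comparison is evaluated first and then combined.
even_or_mult3 = nums[(nums % 2 == 0) | (nums % 3 == 0)]
print(even_or_mult3)    # [0 2 3 4 6 8 9]

# np.logical_and takes the conditions as separate arguments.
both = nums[np.logical_and(nums % 2 == 0, nums % 3 == 0)]
print(both)             # [0 6]

# Without parentheses, & is applied before ==, which turns the expression
# into a chained comparison of arrays and raises a ValueError.
try:
    nums[nums % 2 == 0 & nums % 3 == 0]
except ValueError as err:
    print("ambiguous:", err)
```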

## Slicing

What if we want the first 4 elements? We will use the `:` operator.

In [42]:
arr11[:4]

array([0, 1, 2, 3])

Similarly, we can get the values from index 3 to index 6, that is, the 4th to the 7th elements.

In [43]:
arr11[3:7]  #The slice returns an array up to, but not including,
            #the right limit

array([3, 4, 5, 6])

To get every other element, we can add a step after a second `:`.

In [44]:
arr11[0::2] #First value: starting index
            #Second value: ending index; if omitted, the slice runs to the end of the array
            #Third value: increment

array([0, 2, 4, 6, 8])
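
Slices also accept negative values, and a basic slice is a view of the original data rather than a copy. The following sketch shows both; the names `seq`, `window` and `safe` are illustrative.

```python
import numpy as np

seq = np.arange(10)

print(seq[::-1])    # reversed order, from 9 down to 0
print(seq[-3:])     # last three elements: [7 8 9]

# A basic slice is a view, so writing to it changes the original.
window = seq[2:5]
window[0] = 100
print(seq[2])       # 100

# .copy() gives an independent array.
safe = seq[2:5].copy()
safe[0] = -1
print(seq[2])       # still 100
```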

We will now try slicing on a 2D matrix.

In [45]:
arr12 = np.arange(25).reshape((5,5))
arr12

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

Say we want the smaller 2D matrix right in the middle of the array.

In [46]:
arr12[1:4, 1:4] #First slice chooses the rows: indices 1, 2 and 3 (the 2nd, 3rd and 4th rows)
                #Second slice chooses the columns in the same way

array([[ 6,  7,  8],
       [11, 12, 13],
       [16, 17, 18]])

We can also get the diagonal as follows.

In [47]:
arr12[np.arange(5), np.arange(5)]
#First index is the row numbers, so from 0 to 4
#Second index is the column numbers, similar manner

array([ 0,  6, 12, 18, 24])
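
Pairing a row index list with a column index list works for any set of positions, not only the main diagonal. As a sketch, the block below compares the index-pair approach with NumPy's `np.diag()` and `diagonal()` helpers and then picks the anti-diagonal; the names `grid`, `by_index`, `by_helper`, `by_method` and `anti` are illustrative.

```python
import numpy as np

grid = np.arange(25).reshape((5, 5))

by_index = grid[np.arange(5), np.arange(5)]   # pairs (0,0), (1,1), ...
by_helper = np.diag(grid)                     # main diagonal of a 2D array
by_method = grid.diagonal()                   # same values via the method

print(np.array_equal(by_index, by_helper))    # True
print(np.array_equal(by_index, by_method))    # True

# Anti-diagonal: rows go 0..4 while columns go 4..0.
anti = grid[np.arange(5), np.arange(5)[::-1]]
print(anti)     # [ 4  8 12 16 20]
```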

# Useful functions
Now we will take a look at numpy functions that could be used on all arrays.


## Aggregation functions
Considering an array with numerical data, we can use aggregate functions on it to get useful information.

In [None]:
arr13 = np.random.random(50)*10 #Generating random array
arr13

array([1.36591242, 9.09598805, 0.44381671, 6.62074608, 6.52979401,
       6.27282218, 1.45699211, 4.50020788, 0.83478324, 0.74320824,
       1.75727351, 5.96233335, 2.0570793 , 9.35161229, 1.73367458,
       1.32788278, 8.86852786, 0.39464955, 0.06570473, 3.05985114,
       0.33999743, 3.50656392, 7.51745128, 4.6593452 , 6.77545483,
       4.78398453, 9.38625987, 1.67038483, 5.46733438, 3.33199027,
       4.19360378, 9.57125291, 9.15313632, 4.41705896, 6.59683087,
       2.67648355, 3.42397326, 1.18063586, 4.65445548, 0.57948341,
       5.22791529, 1.89667252, 3.71520449, 7.54859426, 7.3439027 ,
       0.91185951, 7.36229279, 0.95727742, 6.16244166, 1.92376646])

In [None]:
arr13.sum() #Sum of all the elements

np.float64(209.37846804761895)

In [None]:
arr13.mean()    #Arithmetic mean

np.float64(4.187569360952379)

In [None]:
arr13.min() #Minimum value

np.float64(0.06570472824767637)

In [None]:
arr13.max() #Maximum value

np.float64(9.571252906563902)

In [None]:
arr13.std() #Standard deviation of the data

np.float64(2.9106416309232426)

In [None]:
arr13.var() #Variance, the square of the standard deviation

np.float64(8.471834703663513)
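
On multi-dimensional arrays, the same aggregation methods accept an `axis` argument to aggregate along rows or columns instead of over the whole array. A minimal sketch on a small hand-written array (`table` is an illustrative name):

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(table.sum())          # 21, over all elements
print(table.sum(axis=0))    # [5 7 9], one sum per column
print(table.sum(axis=1))    # [ 6 15], one sum per row
print(table.mean(axis=1))   # [2. 5.], one mean per row

# The variance equals the square of the standard deviation.
print(np.isclose(table.std() ** 2, table.var()))    # True
```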

## Element-wise operations
NumPy's mathematical functions are applied to each element of an array separately and return an array of the same shape.

In [58]:
arr14 = np.arange(1, 20)
arr14

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [None]:
np.sqrt(arr14)  #Square root

array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
       2.44948974, 2.64575131, 2.82842712, 3.        , 3.16227766,
       3.31662479, 3.46410162, 3.60555128, 3.74165739, 3.87298335,
       4.        , 4.12310563, 4.24264069, 4.35889894])

In [61]:
np.exp(arr14)   #Exponential, e^x

array([2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
       1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
       8.10308393e+03, 2.20264658e+04, 5.98741417e+04, 1.62754791e+05,
       4.42413392e+05, 1.20260428e+06, 3.26901737e+06, 8.88611052e+06,
       2.41549528e+07, 6.56599691e+07, 1.78482301e+08])

In [62]:
np.log(arr14)   #Natural log

array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
       1.79175947, 1.94591015, 2.07944154, 2.19722458, 2.30258509,
       2.39789527, 2.48490665, 2.56494936, 2.63905733, 2.7080502 ,
       2.77258872, 2.83321334, 2.89037176, 2.94443898])

In [63]:
np.log10(arr14) #Base-10 logarithm

array([0.        , 0.30103   , 0.47712125, 0.60205999, 0.69897   ,
       0.77815125, 0.84509804, 0.90308999, 0.95424251, 1.        ,
       1.04139269, 1.07918125, 1.11394335, 1.14612804, 1.17609126,
       1.20411998, 1.23044892, 1.25527251, 1.2787536 ])
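
The ordinary arithmetic operators work element by element as well. A short sketch with two small arrays (`x` and `y` are illustrative names):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

print(x + y)        # [11 22 33]
print(x * y)        # [10 40 90]
print(x ** 2)       # [1 4 9]
print(y / x)        # [10. 10. 10.]

# A scalar is applied to every element.
print(x * 2 + 1)    # [3 5 7]
```

Note that `*` here is element-wise multiplication, not a matrix product.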

## Linear algebra
We can do dot products, vector norm, matrix inversion and much more using numpy!

In [67]:
arr15 = np.random.randint(-5, 5, (1, 3))
arr15

array([[-4, -2,  3]])

In [68]:
arr16 = np.random.randint(-5, 5, (1, 3))
arr16

array([[ 2, -3, -5]])

In [None]:
arr15.dot(arr16.T)  #Remember that for a matrix product, the number of columns
                    #of the first matrix must match the number of rows of the second

array([[-17]])
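
The same product can be written with the `@` operator, and with 1D vectors no transpose is needed. The sketch below reuses the values printed above as fixed arrays (the names `u`, `v` and `row` are illustrative) and shows what happens when the shapes do not line up.

```python
import numpy as np

u = np.array([-4, -2, 3])
v = np.array([2, -3, -5])

# For 1D arrays, both forms return a plain scalar.
print(u @ v)            # -17
print(np.dot(u, v))     # -17

# A (1, 3) matrix times a (3, 1) matrix gives a (1, 1) matrix.
row = u.reshape((1, 3))
print((row @ v.reshape((3, 1))).shape)  # (1, 1)

# (1, 3) times (1, 3) fails: 3 columns do not match 1 row.
try:
    row @ v.reshape((1, 3))
except ValueError as err:
    print("shape mismatch:", err)
```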

In [71]:
np.linalg.norm(arr15)

np.float64(5.385164807134504)

In [76]:
arr17 = np.random.randint(-7, 7, (3, 3))
arr17

array([[ 6, -6, -7],
       [ 4,  5, -3],
       [-4,  2,  4]])

In [77]:
np.linalg.det(arr17)   #Determinant; the matrix is invertible if it is nonzero (exact value here is -16)

np.float64(-15.999999999999991)

In [78]:
np.linalg.inv(arr17)

array([[-1.625 , -0.625 , -3.3125],
       [ 0.25  ,  0.25  ,  0.625 ],
       [-1.75  , -0.75  , -3.375 ]])
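
A quick way to check an inverse is to multiply it by the original matrix and compare against the identity, allowing for floating-point rounding. The sketch below uses the matrix printed above as a fixed array; the right-hand side `b` is a made-up example for `np.linalg.solve()`.

```python
import numpy as np

A = np.array([[6, -6, -7],
              [4, 5, -3],
              [-4, 2, 4]])
A_inv = np.linalg.inv(A)

# A @ A_inv should be the identity up to floating-point rounding.
print(np.allclose(A @ A_inv, np.eye(3)))    # True

# To solve A x = b, np.linalg.solve avoids forming the inverse explicitly.
b = np.array([1, 2, 3])     # hypothetical right-hand side
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))                # True
```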

# Summary  
- **Array Creation**: `np.array()`, `np.arange()`, `np.random.rand()`.  
- **Properties**: `shape`, `dtype`, `size`, `nbytes`.  
- **Reshaping**: `reshape()`, `flatten()` (copy) vs. `ravel()` (view when possible).  
- **Indexing/Slicing**: Boolean masks, multi-axis slicing.  
- **Functions**: Aggregates (`sum()`, `mean()`), math ops (`np.sqrt()`), linear algebra (`np.linalg.inv()`).  

# Good Job!
You have made great progress! Now you can try working with a real dataset and see how you could use numpy methods to play around with it. Tinker around and see what you can find!