# Numpy
Numpy is one of the most important foundational packages for numerical computing in Python.<br>
Some of the important features of Numpy:<br>
1. ndarray - a multidimensional array for fast array-oriented and broadcasting operations
2. Fast mathematical operations without for loops
3. Reading/writing array data to the disk
4. Memory-mapped files
5. Linear algebra, random number generation

### Usage for Data Analysis operations
1. Fast vectorized multidimensional array operations for data cleanup, subsetting, filtering and other operations
2. Sorting, unique and set operations
3. Descriptive statistics, aggregating and summarizing data
4. Merging and joining heterogeneous datasets
5. Conditional logic as array expressions
6. Group-wise data manipulations

### What is an array?
1. A numpy array looks similar to a list
1. An array is a grid of values, indexed by a tuple of non-negative integers
1. It usually contains numeric values. However, it can also contain string values
1. Element-wise operations on arrays are much faster than the equivalent loops over lists
1. An array can be n-dimensional

In [1]:
import numpy as np

## Create NumPy array using list

In [18]:
# create 2 new lists height and weight of 6 people
person_height = [5.2,  5.4, 4.4, 4.5, 5.6, 6]
person_weight = [81, 55, 65, 70, 45, 44]

# create 2 numpy arrays from height and weight from the above lists
person_height = np.array(person_height)
person_weight = np.array(person_weight)

In [3]:
person_height

array([5.2, 5.4, 4.4, 4.5, 5.6, 6. ])

In [19]:
person_height, person_weight

(array([5.2, 5.4, 4.4, 4.5, 5.6, 6. ]), array([81, 55, 65, 70, 45, 44]))

In [4]:
person_weight

array([81, 55, 65, 70, 45, 44])

Print type of 'person_weight'

In [20]:
type(person_weight)

numpy.ndarray

## Advantages of numpy arrays over lists

Numpy arrays are more compact than Python lists, and arithmetic is applied to every element at once without an explicit loop.

In [21]:
# consider a list of numbers
num_list = [4, 5, 6, 1, 2, 3]

# add 1 to each element
num_modified_list = [i + 1 for i in num_list]

# print the list
num_modified_list

[5, 6, 7, 2, 3, 4]

In [22]:
# create an array from a list
num_array = np.array([4, 5, 6, 1, 2, 3])

# add 1 to each element
num_modified_array = num_array + 1

# print the array
num_modified_array

array([5, 6, 7, 2, 3, 4])

In [8]:
num_list + 1

TypeError: can only concatenate list (not "int") to list

In [2]:
# create a Numpy array using arange of size 1000000
my_array = np.arange(1000000)

# create a python list using range of size 1000000
my_list = list(range(1000000))

In [3]:
# print out execution time for the following

# time required for computations in an array
%time for _ in range(10): my_modified_array = my_array ** 2

# time required for computation in a list
%time for _ in range(10): my_modified_list = [x ** 2 for x in my_list]

Wall time: 21 ms
Wall time: 2.28 s
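
The `%time` magic only works inside IPython/Jupyter. As a rough sketch, the same comparison can be made in a plain Python script with the standard `timeit` module. The array size, the repeat count and the explicit `dtype=np.int64` (chosen so the squares cannot overflow where the default integer is 32-bit) are assumptions for illustration; the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
my_array = np.arange(n, dtype=np.int64)  # explicit 64-bit integers (assumption)
my_list = list(range(n))

# total time for 10 repetitions of each computation
array_time = timeit.timeit(lambda: my_array ** 2, number=10)
list_time = timeit.timeit(lambda: [x ** 2 for x in my_list], number=10)

# both approaches should produce the same values
squares_match = (my_array ** 2).tolist() == [x ** 2 for x in my_list]

print(f"array: {array_time:.4f} s, list: {list_time:.4f} s")
print("same result:", squares_match)
```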


## Functions to Create Array

#### An array of Zeros
The zeros() function creates a new array of the specified shape, filled with zeros.

In [25]:
# create a numpy array with 3 rows and 2 columns
my_array = np.zeros([3, 2])

# print the array
print(my_array)

[[0. 0.]
 [0. 0.]
 [0. 0.]]


#### An array of Ones
The ones() function creates a new array of the specified shape, filled with ones.

In [11]:
# create a numpy array with 5 rows and 2 columns
my_array = np.ones([5,2])

# print the array
print(my_array)

[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]


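#### An Empty array

The empty() function creates an array of the given shape without initializing its entries, so the contents are arbitrary (whatever happened to be in memory).

In [12]:
# create a numpy array with 3 rows and 2 columns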
my_array = np.empty([3, 2])

# print the array
print(my_array)

[[0. 0.]
 [0. 0.]
 [0. 0.]]


In [13]:
# create a numpy array with 2 rows and 2 columns
my_array = np.empty((2, 2), dtype = float)

# print the array
print(my_array)

[[1.00930138e-311 4.67296746e-307]
 [6.06712613e-321 3.79442416e-321]]
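
Because the values returned by empty() are arbitrary, the array is normally used as a buffer that is filled before it is read. A minimal sketch, where the fill value 7.0 is only an illustration:

```python
import numpy as np

# allocate without initializing; the initial contents are arbitrary
buffer = np.empty((3, 2))

# overwrite every entry before using the array
buffer.fill(7.0)

print(buffer)
```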


#### An array of random numbers
The random() function returns random numbers in the half-open interval [0.0, 1.0). The half-open interval includes 0 but excludes 1. The required number of random numbers is passed through the ‘size’ parameter.

In [27]:
# create 2D array of 3 rows and 2 columns of random numbers
my_array = np.random.random(size = (3, 2))

# print the array
my_array

array([[0.38989327, 0.95497496],
       [0.2137153 , 0.87176376],
       [0.66338031, 0.61846389]])

The randn() function creates an array of the given shape with random values drawn from the standard normal distribution (mean 0, variance 1).

In [15]:
# create 2D array of 4 rows and 2 columns of random numbers
my_array = np.random.randn(4, 2)

# print the array
my_array

array([[ 0.96028445, -0.84442195],
       [ 1.19298895, -0.47365227],
       [ 0.69337921, -1.98909522],
       [ 0.54153558, -1.05088839]])
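
The values from randn() can be negative and larger than 1 because they are drawn from the standard normal distribution. A quick sanity check on a large sample is sketched below; the seed and the sample size are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)
samples = np.random.randn(100_000)

sample_mean = samples.mean()              # close to 0
sample_std = samples.std()                # close to 1
fraction_negative = (samples < 0).mean()  # close to 0.5

print(sample_mean, sample_std, fraction_negative)
```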

The randint() returns random integers from low (inclusive) to high (exclusive)

In [16]:
# create a 3 x 6 array of random integers between 2 and 9
# randint() excludes the upper bound 10, so the numbers are generated between 2 and 9 (both inclusive)
my_array = np.random.randint(2, 10, (3, 6))

# print the array
my_array

array([[7, 9, 3, 3, 3, 3],
       [2, 6, 6, 3, 6, 9],
       [7, 3, 8, 9, 5, 2]])
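
The inclusive lower bound and exclusive upper bound can be checked on a larger sample. This is a small sketch; the seed and the sample size are arbitrary.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for repeatability
draws = np.random.randint(2, 10, size=10_000)

# every value lies in 2..9; 10 never appears
lowest, highest = draws.min(), draws.max()
distinct_values = sorted(set(draws.tolist()))

print(lowest, highest, distinct_values)
```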

#### Create an array using arange()
The arange() function creates an array of evenly spaced numbers within the given range.

In [28]:
# create an array of integers from 10 to 39
# start: start value of interval (inclusive). 0 is the default value
# stop: stop value of interval (exclusive)
# step: spacing between consecutive values. 1 is the default value
sequence = np.arange(start = 10, stop = 40, step = 1)

# print the sequence
sequence

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39])

In [30]:
sequence = np.arange(start = 10, stop = 40, step = 3)

# print the sequence
sequence

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37])

In [31]:
sequence = np.arange(10, 40)

# print the sequence
sequence

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39])
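
For integer arguments, the number of elements arange() returns is the ceiling of (stop - start) / step, and the stop value itself is never included. A short sketch of that arithmetic, reusing the values from the cells above:

```python
import math

import numpy as np

start, stop, step = 10, 40, 3
sequence = np.arange(start, stop, step)

# ceil((40 - 10) / 3) = 10 elements, the last one being 37
expected_length = math.ceil((stop - start) / step)

# with a single argument, start defaults to 0 and step to 1
from_zero = np.arange(5)

print(len(sequence), expected_length, from_zero)
```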

## Attributes of Array
An attribute is a property of an object. Attributes of a numpy array are accessed without parentheses.

**ndim:**<br>
Gives the number of dimensions (axes) of the numpy array

In [23]:
quantity = np.array([(11, 32, 23), (34, 55, 46)])
print(quantity.ndim)

2


In [22]:
quantity

array([[11, 32, 23],
       [34, 55, 46]])

**itemsize:**<br>
Gives the memory occupied by each element, in bytes.

In [34]:
quantity = np.array([(4, 5, 6, 2, 75, 3)])
print(quantity.itemsize)

4


In [35]:
x = np.array([126556562516518135121515124654,154645231352,745120456231,76431])
print(x.itemsize)

8


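In the first example, each element occupies 4 bytes (int32). In the second example, the first value does not fit in a 64-bit integer, so Numpy stores the elements as Python objects (dtype object) and itemsize is the 8-byte size of each object reference. The default integer size depends on the platform and the Numpy version.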
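
Since the default integer type varies between platforms, the element size can be fixed by passing dtype explicitly. The sketch below also shows that the total memory of the data, nbytes, is size multiplied by itemsize; the explicit dtype arguments are an addition for illustration and do not appear in the cells above.

```python
import numpy as np

values = [4, 5, 6, 2, 75, 3]

small = np.array(values, dtype=np.int32)  # 4 bytes per element
large = np.array(values, dtype=np.int64)  # 8 bytes per element

# total bytes used by the data = number of elements * bytes per element
small_total = small.size * small.itemsize  # 6 * 4
large_total = large.size * large.itemsize  # 6 * 8

print(small.itemsize, small.nbytes, large.itemsize, large.nbytes)
```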

**dtype:**<br>
Gives the data type of the elements. The number in the name is the size in bits (int32 is a 32-bit, i.e. 4-byte, integer).

In [33]:
quantity = np.array([(1, 2, 3)])
print(quantity.dtype)

int32


**size:**<br>
Gives the total number of elements in the array.

**shape:**<br>
Gives the length of the array along each dimension; for a 2D array this is (rows, columns).

In [39]:
quantity = np.array([(1, 2, 3, 4, 5, 6), (1,2,3,4,5,6,)])
print(quantity.size)
print(len(quantity))
print(quantity.shape)

12
2
(2, 6)
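
The three numbers printed above are related: size is the product of the entries of shape, and len() gives only the length of the first axis. A small sketch, where the 3D array is a hypothetical extra example:

```python
import numpy as np

quantity = np.array([(1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6)])

rows, columns = quantity.shape
size_from_shape = rows * columns    # equals quantity.size
first_axis_length = len(quantity)   # equals quantity.shape[0]

# the same relations hold in higher dimensions
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.size, len(cube))
```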


## Functions/method:
A method is a function that belongs to an object. It is called with its arguments in parentheses and returns a result, for example a reshaped array.

**reshape:**<br>
The reshape() is used to change the number of rows and columns of an array without changing the data.<br>

In [41]:
# create an array
quantity_1 = np.array([(8, 9, 10), (11, 12, 13)])

# print the array
print("The original array:\n", quantity_1)

The original array:
 [[ 8  9 10]
 [11 12 13]]


In [42]:
# change the shape
quantity_2 = quantity_1.reshape(3, 2)

# print the array
print("The reshaped array:\n", quantity_2)

The reshaped array:
 [[ 8  9]
 [10 11]
 [12 13]]


In [3]:
my_array = np.arange(1,21)
my_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [5]:
my_array.reshape(2,10)

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

In [8]:
my_array.reshape(4,-1)

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
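
In the last call, -1 asks reshape() to infer that dimension from the total number of elements (20 / 4 = 5). If the requested shape does not match the number of elements, reshape() raises a ValueError. A short sketch; the failing shape (3, 7) is just an illustration:

```python
import numpy as np

my_array = np.arange(1, 21)  # 20 elements

# -1 is replaced by 20 / 4 = 5
inferred = my_array.reshape(4, -1)

# 3 * 7 = 21 does not equal 20, so this request is rejected
try:
    my_array.reshape(3, 7)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(inferred.shape, reshape_failed)
```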

## Indexing Array

**Indexing in 1 dimension**

In [9]:
# declare an array
quantity = np.array([11, 22, 33, 24, 57, 473])

# print the array
print(quantity)

[ 11  22  33  24  57 473]


Each element in the array can be accessed by passing the positional index of the element.

In [10]:
# get the 1st element
print(quantity[0]) 

11


In [None]:
# get the third element
print(quantity[2]) 

**Indexing in 2 dimensions**

In [11]:
# declare a 2D array
quantity = np.array([[101, 231, 321],
                     [412, 512, 622],
                     [712, 821, 912]])

# print the array
quantity

array([[101, 231, 321],
       [412, 512, 622],
       [712, 821, 912]])

We can retrieve an element of the 2D array using two indices i and j - i selects the row, and j selects the column:

In [12]:
# get the element in 3rd row and 2nd column
print(quantity[2, 1])

821


In [13]:
# we can pass ith row and jth column in separate brackets ([])
print(quantity[2][1])

821


**Picking a row or column**

We can also select a single row or column 

In [46]:
# pick the third row from the array
print(quantity[2])

[712 821 912]


In [47]:
# pick the third column from the array
print(quantity[:, 2])

[321 622 912]


In [16]:
print(quantity[0:3, 1])

[231 512 821]


In [14]:
print(quantity[0,1])

231


## Slicing Array

**Slicing a 1D array**

In [17]:
# declare an array
price = np.array([101, 121, 112, 123, 114])

# pick the second, third, and fourth element from the array 
new_price = price[1:4]

# print the array
new_price

array([121, 112, 123])

The slice notation specifies a start and end value [start:end], where 'start' is inclusive but 'end' is exclusive.

In [53]:
# first three elements
price[:3]

array([101, 121, 112])

In [54]:
# all the elements from 112
price[2:]

array([112, 123, 114])

In [55]:
# get the complete array
price[:]

array([101, 121, 112, 123, 114])

**Slicing a 2D array**

In [56]:
# declare a 2D array
prices = np.array([[101, 131, 122, 113, 143],
                   [145, 165, 137, 318, 193],
                   [240, 241, 252, 253, 324],
                   [225, 126, 727, 928, 129]])

# print the array
prices

array([[101, 131, 122, 113, 143],
       [145, 165, 137, 318, 193],
       [240, 241, 252, 253, 324],
       [225, 126, 727, 928, 129]])

In [57]:
# select all rows except 1st
# select 3rd and 4th column
prices[1:, 2:4]

array([[137, 318],
       [252, 253],
       [727, 928]])

**Note:** An index returns a single element of the array, whereas a slice returns an array of elements.
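
One consequence worth knowing: a basic slice of a numpy array is a view on the same data, not a copy, so writing through the slice changes the original array. The sketch below illustrates this with the 1D `price` array from earlier; use copy() when an independent array is needed.

```python
import numpy as np

price = np.array([101, 121, 112, 123, 114])

# a slice shares memory with the original array
view = price[1:4]
view[0] = 999        # also changes price[1]

# copy() creates independent data
independent = price[1:4].copy()
independent[1] = -1  # price[2] is not affected

print(price)
```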

## 6. Operations on a 1D Array

In [58]:
# declare two arrays
# first array
prices = np.array([120, 230, 310, 410, 150])

# print the array
print("The first array is", prices, "has length", len(prices))

The first array is [120 230 310 410 150] has length 5


In [59]:
# second array
quantity = np.arange(5)

# print the array
print("The second array is", quantity, "has length", len(quantity))

The second array is [0 1 2 3 4] has length 5


In [60]:
#Add the two arrays

num_array = prices + quantity
num_array

array([120, 231, 312, 413, 154])

In [61]:
# declare another array
num_array = np.array([1, 2, 3, 5])

# print the array
print("The new array is", num_array, "has length", len(num_array))

The new array is [1 2 3 5] has length 4


In [62]:
num_array + quantity

ValueError: operands could not be broadcast together with shapes (4,) (5,) 

**Remark:** If you try to add arrays whose shapes are not compatible (here, lengths 4 and 5), you get an error.
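
Shapes do not have to be identical to be compatible. Under Numpy's broadcasting rules, a dimension of length 1 (or a scalar) is stretched to match the other operand. A minimal sketch, where the `column` array is a hypothetical example:

```python
import numpy as np

quantity = np.arange(5)            # shape (5,)

# a scalar is broadcast to every element
scalar_sum = quantity + 10

# shape (2, 1) combined with shape (5,) gives shape (2, 5)
column = np.array([[100], [200]])
grid = column + quantity

print(scalar_sum)
print(grid)
```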

In [63]:
# multiply each element in the array by 4 
quantity * 4

array([ 0,  4,  8, 12, 16])

In [64]:
# get square of each element
quantity ** 2

array([ 0,  1,  4,  9, 16], dtype=int32)

**Using Numpy with Comparison Expressions**

In [65]:
price_pens = np.array([34, 45, 67, 45, 23])

# check which elements are greater than or equal to 40
# the comparison condition gives boolean output
output_array = price_pens >= 40
output_array

array([False,  True,  True,  True, False])

Pass the above boolean array to the main array to fetch the values that satisfy the comparison condition.

In [66]:
# elements greater than or equal to 40
price_pens[output_array]

array([45, 67, 45])

Rather than creating a separate array of booleans, you can specify the comparison operation directly on the main array.

In [67]:
# print price_pens
price_pens

# print the elements greater than or equal to 40
price_pens[price_pens >= 40]

array([45, 67, 45])
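
Several conditions can be combined element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than the comparisons. A short sketch on the same array; the thresholds are arbitrary:

```python
import numpy as np

price_pens = np.array([34, 45, 67, 45, 23])

# at least 40 AND below 60
mid_range = price_pens[(price_pens >= 40) & (price_pens < 60)]

# below 30 OR above 60
extremes = price_pens[(price_pens < 30) | (price_pens > 60)]

# NOT at least 40
below_40 = price_pens[~(price_pens >= 40)]

print(mid_range, extremes, below_40)
```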

## 7. Operations on a 2D Array

Let’s create two two-dimensional arrays, quanitity_A and quanitity_B.

In [68]:
# declare two arrays
# first array
quanitity_A = np.array([[33, 12], [70, 61]])

# print the array
print("The first array is \n", quanitity_A)

The first array is 
 [[33 12]
 [70 61]]


In [69]:
# second array
quanitity_B  = np.array([[43, 51], [72, 81]])

# print the array
print("The second array is \n", quanitity_B)

The second array is 
 [[43 51]
 [72 81]]


Add the two arrays.

In [70]:
# add the 2 arrays
quanitity_A + quanitity_B

array([[ 76,  63],
       [142, 142]])

Perform element-wise multiplication on quanitity_A and quanitity_B.

In [71]:
# multiply the 2 arrays
quanitity_A * quanitity_B

array([[1419,  612],
       [5040, 4941]])

The '+=' operation on `quanitity_A` adds the specified value to each element of the array in place, so the original array is modified.


In [None]:
# add 2 to each element
quanitity_A += 2
quanitity_A 

Similarly, you can use other arithmetic operations like -= and *=

In [None]:
# subtract 2 from each element 
quanitity_A -= 2

# print the array
quanitity_A 

In [None]:
# multiply each element by 2
quanitity_A *= 2

# print the array
quanitity_A 

**Matrix Multiplication**<br>

Use '@' operator to perform matrix multiplication (or matrix product).

In [72]:
quanitity_A @ quanitity_B

array([[2283, 2655],
       [7402, 8511]])

The same output can also be obtained by the dot().

In [73]:
quanitity_A.dot(quanitity_B)

array([[2283, 2655],
       [7402, 8511]])
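
To see how the matrix product differs from the element-wise product, each entry of `A @ B` is the sum of products of a row of A with a column of B. The sketch below recomputes the top-left entry by hand; the short names `A` and `B` are used only for this illustration.

```python
import numpy as np

A = np.array([[33, 12], [70, 61]])
B = np.array([[43, 51], [72, 81]])

# first row of A times first column of B: 33*43 + 12*72 = 2283
top_left = (A[0, :] * B[:, 0]).sum()

matrix_product = A @ B
elementwise_product = A * B

print(top_left, matrix_product[0, 0])
```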

## 8. Arithmetic Functions in Numpy

In [74]:
# given array
performance_scores = np.array([5, 7, 8, 2, 4])

# print the array
performance_scores

array([5, 7, 8, 2, 4])

**sum():**<br>
The sum() function adds all the values in the array and gives a scalar output.

In [75]:
# add all the elements of 'performance_scores'
performance_scores.sum()

26

**min():**<br>
The min function finds the lowest value in the array.

In [76]:
# find minimum of 'performance_scores'
performance_scores.min()

2

**power():**<br>
The power function raises the numbers in the array to the given value.

In [77]:
# get cube of elements of 'performance_scores'
np.power(performance_scores, 3)

array([125, 343, 512,   8,  64], dtype=int32)

**Using the axis parameter**<br>
Use the axis parameter with value 0 to find the sum of the values in each column.

In [78]:
# create a 2D array of random integers
num_array = np.random.randint(1, 10, (2, 3))
num_array

array([[4, 8, 3],
       [8, 7, 4]])

In [79]:
# obtain the sum over columns
num_array.sum(axis = 0)

array([12, 15,  7])

Similarly, to aggregate across each row (for example, to find the sum or the lowest value of every row), use the axis parameter with value 1.

In [80]:
# obtain the sum over rows
num_array.sum(axis = 1)

array([15, 19])

In [None]:
# obtain the min over rows
num_array.min(axis = 1)

In [None]:
# obtain the max over columns
num_array.max(axis = 0)
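
The axis that is passed is the one that gets collapsed: axis = 0 leaves one value per column and axis = 1 leaves one value per row. The sketch below uses the fixed values shown in the output above instead of random numbers, so the results can be checked by hand.

```python
import numpy as np

num_array = np.array([[4, 8, 3],
                      [8, 7, 4]])

column_sums = num_array.sum(axis=0)   # one value per column
row_sums = num_array.sum(axis=1)      # one value per row
row_mins = num_array.min(axis=1)      # smallest value in each row
column_maxes = num_array.max(axis=0)  # largest value in each column
total = num_array.sum()               # no axis: aggregate over everything

print(column_sums, row_sums, row_mins, column_maxes, total)
```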

## 9. Concatenation of Array

The arrays can be concatenated only if they have the same shape, except in the dimension corresponding to the axis of concatenation.

**Concatenate 1D array**

In [3]:
# concatenate two 1D arrays
# production on day 1 
production_day_1 = np.array([11, 22, 13])

# production on day 2
production_day_2 = np.array([23, 22, 12])

# concatenate the two arrays giving the production for 2 days
np.concatenate([production_day_1, production_day_2])

array([11, 22, 13, 23, 22, 12])

You can also concatenate more than two arrays at once.

In [82]:
# production on day 3
production_day_3 = np.array([23, 45])

# concatenate the three arrays giving the production for 3 days
total_produce = np.concatenate([production_day_1, production_day_2, production_day_3])

# print total_produce
total_produce

array([11, 22, 13, 23, 22, 12, 23, 45])

**Concatenate 2D array**

In [3]:
# create a 2D array
scores_player_1 = np.array([[1, 2, 3],
                 [4, 5, 6]])
scores_player_1

array([[1, 2, 3],
       [4, 5, 6]])

In [4]:
# create a 2D array
scores_player_2 = np.array([[5, 6, 3],
                 [4, 9, 6]])
scores_player_2

array([[5, 6, 3],
       [4, 9, 6]])

**concatenate along the first axis**

In [85]:
# by default concatenate() is along 'axis = 0'
np.concatenate([scores_player_1, scores_player_2])

array([[1, 2, 3],
       [4, 5, 6],
       [5, 6, 3],
       [4, 9, 6]])

**concatenate along the second axis**

In [5]:
np.concatenate([scores_player_1, scores_player_2], axis=1)

array([[1, 2, 3, 5, 6, 3],
       [4, 5, 6, 4, 9, 6]])

**Concatenate 1D and 2D array**

In [87]:
# concatenate the 1D and 2D arrays
# consider a 1D array -- 'num_array_1D'
num_array_1D = np.array([23, 45])

# consider a 2D array -- 'num_array_2D'
num_array_2D = np.array([[5, 6, 3],
                 [4, 9, 6]])

np.concatenate((num_array_1D, num_array_2D), axis = 0)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)

**Note:** Arrays with a different number of dimensions cannot be concatenated.
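
One possible workaround, sketched below, is to reshape the 1D array into a 2D array whose shape is compatible along the other axis. Here the two values become a 2 x 1 column that is appended to the 2 x 3 array; whether a column or a row is appropriate depends on what the data means, so treat this as an illustration only.

```python
import numpy as np

num_array_1D = np.array([23, 45])
num_array_2D = np.array([[5, 6, 3],
                         [4, 9, 6]])

# turn the 1D array into a 2 x 1 column
as_column = num_array_1D.reshape(2, 1)

# both arrays now have 2 rows, so they can be joined along axis 1
combined = np.concatenate((num_array_2D, as_column), axis=1)

print(combined)
```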

## 10. Stacking of Array

Stacking can be used to join 2 or more arrays along the different axes. This method can also be used to create higher-dimensional arrays using lower-dimensional arrays. 
Numpy vstack, Numpy hstack, and Numpy concatenate are all similar functions to join the arrays.

In [6]:
# create two arrays
num_array_1 = np.array([[1, 2, 3], [4, 5, 6]])
num_array_2 = np.array([[7, 8, 9], [10, 11, 12]])
print(num_array_1) 
print(num_array_2)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]


**NumPy stack** enables you to combine arrays along the specified axis. We can create a higher-dimensional array using stack().

In [7]:
# stack arrays along the second axis (axis=1)
array_stack = np.stack((num_array_1, num_array_2), axis=1)
array_stack

array([[[ 1,  2,  3],
        [ 7,  8,  9]],

       [[ 4,  5,  6],
        [10, 11, 12]]])

**NumPy hstack** enables you to combine arrays horizontally.

In [8]:
# stack arrays in sequence horizontally (column wise)
array_hstack = np.hstack((num_array_1, num_array_2))
array_hstack

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

**Numpy vstack** enables you to combine arrays vertically.

In [8]:
# stack arrays in sequence vertically (row wise)
array_vstack = np.vstack([num_array_1, num_array_2])
array_vstack

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

**Numpy dstack** stacks arrays along the third axis (depth-wise). For 2D inputs, the k-th input array supplies the k-th entry along the last axis of the result.

In [9]:
# stack arrays in sequence depth wise (along the third axis)
array_dstack = np.dstack((num_array_1, num_array_2))
array_dstack

array([[[ 1,  7],
        [ 2,  8],
        [ 3,  9]],

       [[ 4, 10],
        [ 5, 11],
        [ 6, 12]]])
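
For 2D inputs like these, the stacking helpers can be related to concatenate() and stack(). The sketch below checks the equivalences and prints the shape produced by stack() for each possible position of the new axis.

```python
import numpy as np

num_array_1 = np.array([[1, 2, 3], [4, 5, 6]])
num_array_2 = np.array([[7, 8, 9], [10, 11, 12]])
pair = (num_array_1, num_array_2)

# vstack / hstack join along an existing axis
same_as_vstack = np.array_equal(np.vstack(pair), np.concatenate(pair, axis=0))
same_as_hstack = np.array_equal(np.hstack(pair), np.concatenate(pair, axis=1))

# dstack of 2D arrays matches stack() with the new axis placed last
same_as_dstack = np.array_equal(np.dstack(pair), np.stack(pair, axis=2))

# stack() always adds a new axis; its position changes the result shape
new_axis_shapes = [np.stack(pair, axis=k).shape for k in range(3)]

print(same_as_vstack, same_as_hstack, same_as_dstack, new_axis_shapes)
```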

**Numpy column_stack** enables you to stack 1D arrays as the columns of a 2D array.

In [10]:
# create two 1D arrays
num_array_1 = np.array([1, 4, 8])
num_array_2 = np.array([2, 5, 7])

In [11]:
# stack arrays as 2D array
array_vstack = np.column_stack([num_array_1, num_array_2])
array_vstack

array([[1, 2],
       [4, 5],
       [8, 7]])

In [12]:
# check the dimension of stacked array
np.ndim(array_vstack)

2

**block()** assembles an array from a nested list of blocks.

In [4]:
# create a block matrix using 1D arrays
# note: m1 holds strings, so every element of the result is converted to a string

m1 = np.array(['4','1','2','3'])
m2 = np.array([7,4,5,6])
m3 = np.array([6,2,6,4])
m4 = np.array([3,2,6,9])

# 2D matrix using block()
np.block([[m1, m2],
          [m3, m4]])

array([['4', '1', '2', '3', '7', '4', '5', '6'],
       ['6', '2', '6', '4', '3', '2', '6', '9']], dtype='<U11')

## 11. Splitting of Array

Splitting divides an array into multiple sub-arrays. It is the opposite of concatenation and is implemented by functions like split(), hsplit(), and so on.


**split():**

In [18]:
# declare an array
array_x = np.arange(8) 

# split the array into sub-arrays
np.split(array_x, 2)

[array([0, 1, 2, 3]), array([4, 5, 6, 7])]

In [14]:
# declare an array
array_x = np.arange(7) 

# split the array into sub-arrays
np.split(array_x, 2)

ValueError: array split does not result in an equal division

**Split the 1D array at positions indicated**

In [19]:
# use an array of 8 elements again
array_x = np.arange(8)

# the split occurs at indices 5 and 7
array_y = np.split(array_x, [5, 7])

# print array
print(array_y)

[array([0, 1, 2, 3, 4]), array([5, 6]), array([7])]


In [16]:
# split 'array_x' into 3 sub-arrays
np.split(array_x, 3)

ValueError: array split does not result in an equal division

**Remark:** The split() function accepts an integer N as the number of splits only if N divides the array into sub-arrays of equal length.

**array_split():** It is used to split the array into sub-arrays. It takes the integer 'N' as the input for the number of splits, even if 'N' does not divide the array into sub-arrays of equal length.<br>

In [20]:
# split 'array_x' into 3 sub-arrays using 'array_split'
np.array_split(array_x, 3)

[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

We split the array of length 8 into 3 sub-arrays; the function 'array_split()' returns <i>8 % 3 (=2)</i> sub-arrays of size <i>8//3 + 1 (=3)</i> and the rest (i.e. one sub-array) of size <i>8//3 (=2)</i>.
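
The size rule described above can be checked directly. In this sketch, `n_splits`, `n_large` and `large_size` are names introduced only for the illustration:

```python
import numpy as np

array_x = np.arange(8)
n_splits = 3

parts = np.array_split(array_x, n_splits)
sizes = [len(part) for part in parts]

# 8 % 3 = 2 sub-arrays get 8 // 3 + 1 = 3 elements; the rest get 8 // 3 = 2
n_large = len(array_x) % n_splits
large_size = len(array_x) // n_splits + 1
small_size = len(array_x) // n_splits

print(sizes, n_large, large_size, small_size)
```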

**vsplit():**<br>
The vsplit() function is used to split an array into multiple sub-arrays vertically (row-wise).

In [None]:
# declare an array
my_array = np.arange(20.0)

print("The array is:\n", my_array)

In [None]:
# reshape the array in 4 rows and 5 columns
my_reshaped_array = my_array.reshape(4, 5)

# print the array
print("The reshaped array is:\n", my_reshaped_array)

In [None]:
# split the array vertically
np.vsplit(my_reshaped_array, 2)

**hsplit():**<br>
The hsplit() function is used to split an array into multiple sub-arrays horizontally (column-wise).

In [None]:
# declare an array
my_array = np.arange(16.0).reshape(4, 4)

# print an array
my_array

In [None]:
# split the array horizontally
np.hsplit(my_array, 2)

**dsplit():**<br>
The dsplit() function is used to split an array into multiple sub-arrays along the third axis (depth-wise).

In [None]:
# declare an array
my_array = np.arange(12.0).reshape(2, 2, 3)

# print it
my_array

In [None]:
# split the array along the depth in 3 subgroups
np.dsplit(my_array, 3)
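
Since the cells above are shown without their output, the sketch below summarizes what each split returns by looking at the shapes of the pieces. The variable names are chosen for this illustration only.

```python
import numpy as np

# vsplit: 4 x 5 array split into 2 pieces of 2 rows each
grid = np.arange(20.0).reshape(4, 5)
top, bottom = np.vsplit(grid, 2)

# hsplit: 4 x 4 array split into 2 pieces of 2 columns each
square = np.arange(16.0).reshape(4, 4)
left, right = np.hsplit(square, 2)

# dsplit: 2 x 2 x 3 array split into 3 pieces along the third axis
cube = np.arange(12.0).reshape(2, 2, 3)
layers = np.dsplit(cube, 3)

# stacking the pieces back together restores the original array
restored = np.vstack([top, bottom])

print(top.shape, left.shape, [layer.shape for layer in layers])
```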