# Data Analytics

## Python NumPy, and Broadcasting

We are going to learn about ...

- What we learned so far
- Concatenation
- Sorting
- Unpacking Array Tuples
- Indexing and slicing
- NumPy and Broadcasting
- Arithmetic, Logic, Comparison, and statistical operations
- Other NumPy Functions

<br>

---


### What we learned so far ...

**NumPy** (_Numerical Python_) is an open-source library for the Python programming language. It is used for scientific computing and working with arrays.

About NumPy Arrays ...
- Usually a fixed-size container of items of the same type and size
- An array is the central data structure of the NumPy library
- The elements all share the same data type, referred to as the array `dtype`
- The number of `dimensions` and the number of items along each dimension define the array's `shape`
- The `shape` is a _**tuple** of non-negative integers_ that specifies the size of the array along each dimension
- The rank of the array is the number of its `dimensions`
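
The points above can be checked directly on a couple of small arrays. This is only a minimal sketch; the variable names (`ints`, `mixed`, `grid`) are made up for illustration.

```python
import numpy as np

# All elements share one dtype; mixing ints and floats upcasts to float
ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])
print("ints dtype ...", ints.dtype)
print("mixed dtype ...", mixed.dtype)

# The shape is a tuple with one entry per dimension
grid = np.zeros((2, 3))
print("grid shape ...", grid.shape)
print("grid rank (ndim) ...", grid.ndim)
```

The exact integer dtype depends on the platform, which is why it is printed rather than assumed here.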

### Useful NumPy keywords

- **Ndarray** == N-dimensional array
- **1-D** == one dimensional array
- **2-D** == two-dimensional array
- **3-D** == three-dimensional array
- **Vector** == an array with a single dimension
- **Matrix** == an array with two dimensions
- **Tensor** == an array with 3+ dimensions
- **Dimensions** == the number of axes of an array

<br>

![NumPy 3D array creation](images/NumPy_Generic_3D_Creation.png)

<br>

---

### Finding the shape and size of NumPy arrays ...

- Find the number of dimensions of an array with -- `.ndim`,
- Find the shape (number of elements in each dimension) with -- `.shape`,
- Find the size (total number of elements) of a NumPy array with -- `.size`

These are attributes, not methods, so they are written without parentheses.

The built-in function `len()` returns the size of the **_first dimension_**; not to be confused with the number of dimensions in the NumPy array.

In [None]:
# Another Example: - Discovering the shape and size of a 3-D NumPy Array
import numpy as np

# this is a very stylized layout of an array
# note the nested structure: a list of lists of lists
array_example = np.array(
                            [  
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0 ,1 ,2, 3, 4], [5, 6, 7, 8, 9] ] 
                            ]
                        )

# 3 dimensions/axes
print("Array dimensions / axes ...", array_example.ndim)

# 50 total size/number of elements of array
print("Array number of elements ...", array_example.size)  

# shape is the number of elements in each dimension = 5, 2, 5
print("Array shape ...", array_example.shape)

print("\narray_example ...\n", array_example)

# len() returns the size of the first dimension
print("\narray_example length ...", len(array_example))

In [None]:
# Creating an array with 6 dimensions using ndmin
# with a vector containing values [2,4,6,8,10]
# and verifying the shape
import numpy as np

# creating an array with 6 dimensions using ndmin
arrTest = np.array([2, 4, 6, 8, 10], ndmin=6)
print(arrTest)

# verifying the shape
print('\nshape of arrTest ...\n', arrTest.shape)

# len() returns the size of the first dimension
print("\nlength of arrTest ...", len(arrTest))

#### **Reshaping NumPy Arrays**

As the name suggests, reshaping changes the shape of an array. The `numpy.reshape()` function gives an array a new shape without changing its data.

Sometimes we need to reshape data, for example from wide to long. In that situation we can use the `.reshape()` function, as long as the new shape holds the same total number of elements.

This function returns a ndarray. It is a new view object if possible; otherwise, it will be a copy. There is no guarantee of the memory layout of the returned array.

        Syntax: -   numpy.reshape(arr, new_shape)
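
To see the "view if possible, otherwise a copy" behaviour in practice, here is a small sketch. It uses `np.shares_memory()` to check whether two arrays look at the same data; the variable names are illustrative only.

```python
import numpy as np

base = np.arange(6)                  # contiguous 1-D array
view = np.reshape(base, (2, 3))      # same data, new shape
print("shares memory ...", np.shares_memory(base, view))

# Writing through the view also changes the original array
view[0, 0] = 99
print("base after edit ...", base)

# Reshaping a transposed (non-contiguous) array forces a copy
copied = view.T.reshape(6)
print("shares memory after transpose ...", np.shares_memory(view, copied))
```

Because a reshaped array may share data with its source, make an explicit `.copy()` when you need an independent array.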


In [None]:
# reshaping a NumPy array
aArray = np.array([[1,2,3], [4,5,6]])
bArray = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print("aArray ...\n", aArray)

newShape1 = np.reshape(aArray, 6)
print("newShape1 ...", newShape1)

newShape2 = np.reshape(aArray, (3, 2))
print("\nnewShape2 ...\n", newShape2)

newShape3 = bArray.reshape(3, 2, 2)
print("\nnewShape3 ...\n", newShape3)


![NumPy Array Reshaping](images/NumPy_Array_Reshaping.png)

#### Reshaping using an "Unknown" dimension / size

The `numpy.ndarray.reshape()` method can be called with an “unknown dimension”. This is done by specifying `-1` for that dimension.

In NumPy, the `-1` in `.reshape(3, -1)` refers to an unknown dimension that `.reshape()` calculates for you from the total number of elements and the other dimensions given. Only one dimension can be `-1`.

A common use case is to flatten a nested array of an unknown number of elements to a 1D array.
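
As a quick sketch of that flattening use case (the array `nested` is just an example):

```python
import numpy as np

nested = np.array([[1, 2, 3], [4, 5, 6]])

# -1 on its own flattens to 1-D; NumPy works out the length (6)
flat = nested.reshape(-1)
print("flat ...", flat)

# -1 can stand in for any single dimension
cols = nested.reshape(-1, 2)         # 6 elements / 2 columns = 3 rows
print("cols shape ...", cols.shape)
```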

In [None]:
# reshaping using an "Unknown" dimension
import numpy as np

aArray = np.array([[1,2,3,4,5,6,7,8], [11,12,23,34,45,56,67,78]])
print("aArray ...\n", aArray)

newShape3 = aArray.reshape(4,-1)
print("\nnewShape3 ...\n", newShape3)

#### NumPy methods to help reshape Arrays ...

There are several functions that help with reshaping and combining NumPy arrays, like ...

- `np.vstack()` = used to stack arrays in sequence vertically (row wise) [NumPy: vstack() function](https://www.w3resource.com/numpy/manipulation/vstack.php)

- `np.hstack()` = used to stack arrays in sequence horizontally (column wise) [NumPy: hstack() function](https://www.w3resource.com/numpy/manipulation/hstack.php)

- `np.hsplit()` = used to split an array into multiple sub-arrays horizontally (column-wise) [NumPy: hsplit() function](https://www.w3resource.com/numpy/manipulation/hsplit.php)
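
A short sketch of the three helpers listed above, using two small made-up arrays:

```python
import numpy as np

top = np.array([1, 2, 3])
bottom = np.array([4, 5, 6])

# vstack: stack as rows -> shape (2, 3)
stacked_rows = np.vstack((top, bottom))
print("vstack ...\n", stacked_rows)

# hstack: join end to end -> shape (6,)
stacked_cols = np.hstack((top, bottom))
print("hstack ...", stacked_cols)

# hsplit: split column-wise, here just before column index 1
left, right = np.hsplit(stacked_rows, [1])
print("left ...\n", left)
print("right ...\n", right)
```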


---

### **NumPy Concatenation**

Concatenation refers to joining. The `concatenate()` function joins two or more arrays along a specified axis. The arrays must have the same shape, _except along the axis being joined_.

The `concatenate()` function takes the following parameters.

        Syntax: -- numpy.concatenate((a1, a2, ...), axis)

- **a1, a2, ...**   represents a sequence of arrays whose shapes match, except along `axis`
- **axis**    represents the axis along which arrays have to be joined. _Default is axis 0_
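
Here is a small sketch of how the `axis` parameter and the shape requirement interact. The arrays `a` and `b` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])       # shape (2, 2)
b = np.array([[5, 6]])               # shape (1, 2)

# axis=0: column counts match, so b is added as a new row
rows = np.concatenate((a, b), axis=0)
print("axis=0 ...\n", rows)

# axis=1: row counts must match, so use b.T with shape (2, 1)
cols = np.concatenate((a, b.T), axis=1)
print("axis=1 ...\n", cols)

# Mismatched shapes raise a ValueError
try:
    np.concatenate((a, b), axis=1)
    mismatch_error = False
except ValueError:
    mismatch_error = True
print("mismatch raised an error ...", mismatch_error)
```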

In [None]:
# NumPy concatenation
import numpy as np

arrayA= np.array([1, 2, 3, 4])
arrayB = np.array([5, 6, 7, 8])

arrCombo = np.concatenate((arrayA, arrayB))
print("Print arrCombo ...", arrCombo)


# More Concatenation
arrayX = np.array([[2, 1], [3, 4]])
arrayY = np.array([[5, 6]])
arrayZ = np.array([[8, 7], [9, 10], [12, 11]])

arrayC = np.concatenate((arrayZ, arrayX, arrayY), axis=0)
print("\nAnd arrayC ...\n", arrayC)

arrayCSsort = np.sort(arrayC)
# print ("\narrayC sorted ...\n", arrayCSsort)

newShape = arrayCSsort.reshape(2, 2, 3)
# print ("\nNew shape ...\n", newShape)



---

### **Sorting NumPy arrays**

NumPy offers two ways to sort an array: the `np.sort()` function and the `ndarray.sort()` method.

Because NumPy sorts run in compiled code over typed arrays, they are generally much faster on large numeric data than the built-in Python sort, which makes them well suited to data analytics.

**Note**: The `np.sort()` function _returns a sorted **copy** of the array_, leaving the original array unchanged, so assign the result to a variable to keep it. The `ndarray.sort()` method instead sorts the array **in place** and returns `None`.
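
The difference between the `np.sort()` function and the `ndarray.sort()` method is easy to miss, so here is a minimal sketch (variable names are illustrative):

```python
import numpy as np

values = np.array([3, 1, 2])

# Function form: returns a new sorted array
sorted_copy = np.sort(values)
after_function = values.tolist()     # original order is untouched
print("copy ...", sorted_copy, " original ...", values)

# Method form: sorts in place and returns None
result = values.sort()
print("after method ...", values, " returned ...", result)
```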

One can sort all sorts of NumPy arrays -- arrays of numbers, strings, booleans, and many other data types.

Syntax: -

    --  ndarray.sort(axis= -1, kind=None, order=None)

To sort a 2D NumPy array, set the **axis** parameter of the sort function: `axis=0` sorts the values within each column (down the rows), and `axis=1` sorts the values within each row (across the columns). The axis parameter matters when working with multidimensional arrays, and it takes an integer as an argument. If no argument is passed, it uses the default value of -1, which means the last axis.

Sorting a 3D array is quite similar to sorting a 2D array. Here we have 3 axes: - 0, 1, 2.
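
To make the effect of `axis` concrete, here is a small sketch on a made-up 2 x 3 array:

```python
import numpy as np

grid = np.array([[3, 1, 2],
                 [0, 5, 4]])

by_col = np.sort(grid, axis=0)       # each column sorted top to bottom
by_row = np.sort(grid, axis=1)       # each row sorted left to right
flat = np.sort(grid, axis=None)      # flatten first, then sort

print("axis=0 ...\n", by_col)
print("axis=1 ...\n", by_row)
print("axis=None ...", flat)
```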


In [None]:
# Sorting NumPy arrays ...
import numpy as np

# sorting a 1-D array
arr1 = np.array([2, 1, 5, 3, 7, 4, 6, 8])
arr1sort = np.sort(arr1)
print("arr1 ...", arr1sort)

# Sorting a 2-D array:
# on a 2-D array each row is sorted independently (default axis=-1)
arr2 = np.array([[3, 2, 4], [5, 0, 1]])
print("arr2 ...\n", np.sort(arr2))

# Sorting an array of strings alphabetically:
arr3 = np.array(['banana', 'cherry', 'apple'])
print("arr3 ...", np.sort(arr3))

# Sorting a boolean array:
arr4 = np.array([True, False, True])
print("arr4 ...", np.sort(arr4))

# sorting a 2D array
arr5 = np.array([[101, 112, 133, 224],
                [235, 76, 207, 148],
                [319, 110, 338, 174]])
print("\narr5 ...\n", np.sort(arr5, axis= 1))

# sorting a 3D array
arr6 = np.array([[[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]],
                [[12, 11, 13, 23], [23, 7, 12, 14], [31, 34, 33, 17]],
                [[10, 6, 13, 22], [34, 7, 20, 14], [31, 34, 33, 7]]])
print("\narr6 ...\n", np.sort(arr6, axis= 2))


#### **`argsort()`**
A related sort function is `argsort()`, which instead of sorting the array elements, returns another array with a list of the **INDEXES** of the sorted elements.

The first element of this result gives the index of the smallest element, the second value gives the index of the second smallest, and so on. 

These indices can then be used (via fancy indexing) to construct the sorted data array if desired.
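
As a sketch of that idea, the indices from `argsort()` can reorder the original array, and also a second array that lines up with it. The `names` array here is a made-up companion column.

```python
import numpy as np

scores = np.array([2, 1, 4, 3, 5])
names = np.array(['b', 'a', 'd', 'c', 'e'])

order = np.argsort(scores)           # indices that would sort `scores`
print("order ...", order)

# Fancy indexing with `order` applies the same ordering to both arrays
print("sorted scores ...", scores[order])
print("names in score order ...", names[order])
```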

In [None]:
# `argsort` returns the Index Numbers of elements in order
import numpy as np

xx = np.array([2, 1, 4, 3, 5])
xx1sorted = np.argsort(xx)
print(xx1sorted)


#### **`lexsort()`**

NumPy `.lexsort()` performs an indirect stable sort using a sequence of keys.

        Syntax: - numpy.lexsort(keys, axis=-1)

This function is used for sorting with multiple sort keys, involving more than one array. 

For example, we can sort data by column A first and then break ties using the values in column B. Note that `lexsort()` treats the **last** key in the sequence as the primary sort key.

In the example below we take arrays representing columns A, B and C.

Calling `np.lexsort((colB, colC, colA))` sorts first by column A, then by column C, then by column B. The result is an array of indices (like `argsort()`) that gives the sorted order.

This sort order can then be applied to every data column.
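
Here is a minimal two-key sketch of the same idea, where the resulting order is applied back to both columns. The names `col_a`, `col_b` and `order` are illustrative.

```python
import numpy as np

col_a = np.array([2, 1, 1, 8, 1])    # primary key
col_b = np.array([9, 0, 3, 2, 0])    # tie-breaker

# The LAST key passed is the primary key, so col_a goes last
order = np.lexsort((col_b, col_a))
print("order ...", order)

# Apply the same order to both columns
print("col_a sorted ...", col_a[order])
print("col_b aligned ...", col_b[order])
```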

In [None]:
# using NumPy `lexsort()`
import numpy as np

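colA = [2,1,1,8,1] # First column (primary key, passed last)
colB = [9,0,3,2,0] # Second column (final tie-breaker, passed first)
colC = [19,11,11,12,30] # Third column (breaks ties in colA)
# Sort by colA, then by colC, then by colB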

sorted_index = np.lexsort((colB, colC, colA))
print(sorted_index)

#print the result showing the column values as pairs
# listAB = [(colA[i],colB[i]) for i in sorted_index]
# print("\nCombined column values by index list ...\n", listAB)


#### Other sort Methods

NumPy provides a variety of functions for sorting and searching. Sorting algorithms such as `quicksort`, `mergesort` and `heapsort` can be selected in `numpy.sort()` by specifying the `kind` parameter.

Then there are other sort methods like the `searchsorted()` method and the `partition()` methods. 
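
As a quick, hedged sketch of these options (the data is made up):

```python
import numpy as np

# searchsorted: where would a value go in an already sorted array?
sorted_vals = np.array([1, 3, 5, 7])
pos = np.searchsorted(sorted_vals, 4)
print("insert position for 4 ...", pos)

# partition: only the element at index k is guaranteed to be in its
# final sorted position; smaller values come before it, larger after
data = np.array([9, 1, 7, 3, 5])
part = np.partition(data, 2)
print("partitioned ...", part)

# kind: choose the sorting algorithm
stable = np.sort(data, kind='stable')
print("stable sort ...", stable)
```

The order of the elements on either side of the partition point is not guaranteed, so only `part[2]` should be relied on here.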

You can read about these in the NumPy documentation at ... [NumPy - Sorting, searching, and counting](https://numpy.org/doc/stable/reference/routines.sort.html)



---

### **Unpacking array Tuples**

One can unpack arrays, tuples and lists, and assign the elements to different variables.

In Python, you can assign elements of a tuple or list to multiple variables. This is called **_sequence unpacking_**.

If you write variables on the left side separated by commas, then elements of a tuple (or list) on the right side are assigned to each variable. The number of variables must match the number of elements.

The following examples use nested lists, but the same is true for tuples:

In [None]:
# Assigning variables using multi assignment
tt = ([ [ [0, 1, 2, 3], [4, 5, 6, 7] ],
        [ [10, 11, 12, 13], [14, 15, 16, 17] ],
        [[20, 21, 22, 23], [24, 25, 26, 27] ] ])

a, b, c = tt

print("a = ", a)
print("b = ", b)
print("c = ", c)
print("The data type of c ...", type(c))

# further unpack the "a" variable
a0, a1 = a
print("\nFurther unpacking a ...")
print("a0 = ", a0)
print("a1 = ", a1)
print("The data type of a1 ...", type(a1))
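
The same unpacking also works on NumPy arrays, where it splits along the first axis. A small sketch, with a made-up `matrix`:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Unpack the shape tuple into separate variables
n_rows, n_cols = matrix.shape
print("rows, cols ...", n_rows, n_cols)

# Unpack the array itself: one variable per row
first, second = matrix
print("first ...", first)
print("second ...", second)
print("The data type of second ...", type(second))
```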


#### **Unpack a nested tuple or list**

You can also unpack nested tuples or lists. 

If you want to expand the inner elements, enclose the variables with `()` or `[]` to match the shape of the array.

In [None]:
# Unpacking a series of nested elements
example = ([ [ [0, 1, 2, 3], [14, 15, 16, 17] ],
        [ [20, 21, 22, 23], [34, 35, 36, 37] ],
        [[40 ,41 ,42, 43], [54, 55, 56, 57] ] ])

[[[a,b,c,d], [e,f,g,h]],
    [[i,j,k,l],[m,n,o,p]],
    [[q,r,s,t],[u,v,w,x]]] = example

print("1st group ...", a, b, c, d)
print("2nd group ...", m, n, o, p)
print("last bunch ...", q, r, s, t, u, v, w, x)

# or use the starred (asterisk) variable to collect the remaining elements
[[[aa,bb,cc,dd], [ee,ff,gg,hh]], *zz] = example

print("\nThe variables aa, bb, to hh ...")
print(aa,bb,cc,dd,ee,ff,gg, hh)

print("\nThe collective variable zz ...")
print(zz)



---

### **NumPy Array Indexing and Slicing**

Elements in the NumPy ndarray object follow zero-based indexing.

NumPy array slicing is used to extract a portion of the data from an array. Slicing in Python means extracting data from one given index up to (but not including) another given index. NumPy slicing is slightly different: it extends to multiple dimensions, and a basic slice returns a view of the original data rather than a copy.

The syntax of NumPy slicing is `[start : stop : step]`

Using indexing rules, three types of indexing methods are available − field access, basic slicing, and advanced indexing.

Basic slicing is an extension of Python's basic concept of slicing to n dimensions
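
A short sketch of the `step` part of the syntax, and of the fact that a basic slice is a view onto the original array:

```python
import numpy as np

arr = np.arange(10)

every_other = arr[1:8:2]             # start 1, stop before 8, step 2
print("arr[1:8:2] ...", every_other)

reversed_arr = arr[::-1]             # a negative step walks backwards
print("arr[::-1] ...", reversed_arr)

# A basic slice is a view: editing it edits the original array
window = arr[2:5]
window[0] = 100
print("arr after editing the slice ...", arr)
```

Use `arr[2:5].copy()` when an independent piece of the data is needed.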




In [None]:
#   Indexing and Slicing Arrays ...
import numpy as np 

# index a single item 
arr1 = np.arange(10) 
bee = arr1[5]
print("\nOriginal data arr1 ...", arr1)
print("bee is element at index 5 ...", bee)

# more slicing options
data = np.array([1, 2, 3, 4, 5])
aa2 = data[1]
bb2 = data[2:4]
cc2 = data[3:]
dd2 = data[-2:]

print("\nThe data array ...", data)
print("aa2 element [1] ...", aa2)
print("bb2 slice 2:4 ...", bb2)
print("cc2 slice 3: ...", cc2)
print("dd2 slice -2: ...", dd2)


In [None]:

# slicing a more complex array
arr2 = np.array([[1,2,3],[3,4,5],[4,5,6]]) 

print("\nThe arr2 array ...", arr2)
print("\nSlicing the array from the index arr2[1:]") 
print(arr2[1:])


#### Slicing 2D and 3D arrays

Slicing multi-dimensional arrays is more complex, as slicing can be done on multiple axes at the same time.

When slicing a 2D array, use both axes to obtain a rectangular subset of the original array.

Example:- You can use `arr[1:, 1:3]` to select rows `1:` (row 1 through the last row of the array), and columns `1:3` (columns 1 and 2).
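
Here is that rectangular slice as a small sketch, on a made-up 3 x 4 array:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
print("arr ...\n", arr)

# rows 1 to the end, columns 1 and 2
block = arr[1:, 1:3]
print("arr[1:, 1:3] ...\n", block)
```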

To extract elements of a 3D NumPy array using a slice operation, there are 3 axes to consider when specifying the block or cube we want to cut.

Example:- You could specify `arr[:2, 1:, :2]` to select `:2` along the first axis (the first 2 two-dimensional blocks); rows `1:` (every row from index 1 onward in each block); and columns `:2` (the first 2 columns).


In [None]:
# slicing 2D Array
import numpy as np

arr = np.array([[1, 2, 3, 4], 
                [5, 6, 7, 8]])

# from the second list, slice elements from index 1 to 3 (not included)
print(arr[1, 1:3])

In [None]:
# slicing a 3D array
import numpy as np

arr = np.array([[[21, 56, 12], [13, 46, 15], [16, 18, 18]],
                [[20, 19, 22], [23, 29, 25], [26, 18, 28]],
                [[30, 26, 32], [33, 6, 35], [36, 10, 38]]])

print(arr[:2, 1:, :2])


Here is an in-depth article on Slicing Multi-Dimensional NumPy Arrays ... [Indexing and slicing NumPy arrays](https://www.pythoninformer.com/python-libraries/numpy/index-and-slice/)


---

### **Broadcasting NumPy Arrays**

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

NumPy operations are usually done element-by-element which requires two arrays to have exactly the same shape. NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain requirements. 

Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

To determine if two arrays are broadcast-compatible, align the entries of their shapes such that their **_trailing dimensions_** are aligned (any missing leading dimensions are treated as size 1), and then check that each pair of aligned dimensions satisfies either of the following 2 conditions:

- the aligned dimensions have the same size
- one of the dimensions has a size of 1

The two arrays are broadcast-compatible if one of these conditions is satisfied for every pair of aligned dimensions.

> **The Broadcasting Rule: -** In order to broadcast, the sizes of the trailing axes of both arrays in an operation must either be the same, or one of them must be one.
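
One way to try the rule out without building any arrays is `np.broadcast_shapes()`, which is available in NumPy 1.20 and later. The shapes below are made-up examples.

```python
import numpy as np

# (4, 3) with (3,): trailing sizes match (3 and 3)
ok_same = np.broadcast_shapes((4, 3), (3,))
print("(4, 3) with (3,) ...", ok_same)

# (4, 1) with (3,): one of the trailing sizes is 1
ok_one = np.broadcast_shapes((4, 1), (3,))
print("(4, 1) with (3,) ...", ok_one)

# (4, 3) with (4,): trailing sizes 3 and 4 differ, and neither is 1
try:
    np.broadcast_shapes((4, 3), (4,))
    incompatible = False
except ValueError as err:
    incompatible = True
    print("(4, 3) with (4,) fails ...", err)
```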


In [None]:
# simplest broadcasting  - an array and a scalar value are combined
from numpy import array

a = array([1.0,2.0,3.0])
b = 2.0     # size of 1

print(a * b)


In [None]:
# add a 1-D array broadcast to a 2-D array
from numpy import array

arr2D = array([[ 0.0,  0.0,  0.0],
            [10.0, 10.0, 10.0],
            [20.0, 20.0, 20.0],
            [30.0, 30.0, 30.0]])

arr1D = array([1.0, 2.0, 3.0])

print(arr2D + arr1D)


Broadcasting provides a convenient way of taking the outer product (or any other outer operation) of two arrays. 

The following example shows an outer addition operation of two 1-D arrays


In [None]:
# transforming arrays for Broadcasting with `newaxis`
from numpy import array, newaxis

aa = array([0.0, 10.0, 20.0, 30.0])
bb = array([1.0, 2.0, 3.0])

print("The sum of aa and bb ...\n")
print(aa[:,newaxis] + bb)


> Here the `newaxis` index operator inserts a new axis into `aa`, making it a two-dimensional 4x1 array. Then `aa` and `bb` are broadcast / stretched against each other.

This image illustrates the stretching of both arrays to produce the desired 4x3 output array.

![NumPy Broadcasting Example](images/NumPy_Broadcasting.png)


In [None]:
## another example of Broadcasting arrays
import numpy as np

# a (3, 4)-shaped array
x = np.array(
        [[-0.0, -0.1, -0.2, -0.3], 
        [-0.4, -0.5, -0.6, -0.7], 
        [-0.8, -0.9, -1.0, -1.1]]
)

# a (4,)-shaped array
y = np.array([1, 2, 3, 4])

# multiplying a shape-(4,) array --> y
# with a shape-(3, 4) array --> x
# `y` is multiplied by each row of `x`

print("arrays x * y ...\n", x * y)


> NOTE: Broadcasting only works when the array shapes are compatible; otherwise NumPy raises a `ValueError`. Of note, subtraction `(-)` can be tricky, because the order of the operands matters. Refer to the NumPy Arithmetic Functions below for more options.
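
One situation where subtraction and broadcasting commonly trip people up is subtracting a per-row value from a 2-D array. The sketch below is only one illustration of this, with made-up data; `keepdims=True` is a standard option of `mean()` that keeps the reduced axis with size 1.

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)

# One mean per row gives shape (3,), which does not line up with (3, 4)
row_means = x.mean(axis=1)
try:
    x - row_means
    failed = False
except ValueError:
    failed = True
print("x - row_means failed ...", failed)

# keepdims=True keeps shape (3, 1), which broadcasts across the columns
centered = x - x.mean(axis=1, keepdims=True)
print("centered ...\n", centered)
```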


---

### NumPy - Arithmetic Operations

Arrays used in arithmetic operations, such as `add()`, `subtract()`, `multiply()`, and `divide()`, must either have the same shape, or conform to the array broadcasting rules. For each pair of aligned trailing dimensions:
- the dimensions have the same size, or
- one of the dimensions has a size of 1

Read full details on NumPy Mathematical operations in this article -- [NumPy Arithmetic Operations and Functions](https://data-flair.training/blogs/numpy-arithmetic-operations/) 




In [None]:
# basic array Arithmetic operations 
import numpy as np

# Initializing the arrays
# (np.float64 is used because the alias np.float_ was removed in NumPy 2.0)
arr = np.arange(12, dtype = np.float64).reshape(3, 4) 
arr1 = np.array([6, 12, 15, 18]) 
print("arr ...\n", arr)
print("\narr1 ...\n", arr1)

# Example 1: Adding the two arrays
arr2 = np.add(arr, arr1)
print("\nAddition ...\n", arr2)

# Example 2: Subtracting the two arrays
arr2 = np.subtract(arr, arr1)
print("\nSubtraction ...\n", arr2)

# Example 3: Multiplying the two arrays
arr2 = np.multiply(arr, arr1)
print("\nMultiplication ...\n", arr2)

# Example 4: Dividing the two arrays
arr2 = np.divide(arr, arr1)
print("\nDivision ...\n", arr2)

# Example 5: Use numpy.power() 
# to calculate exponents of an array of numbers
arr = np.array([5, 3, 6, 9, 2, 4]) 
arr2 = np.power(arr,3)
print("\nPower 3 ...\n", arr2)

# Example 6: Use numpy.power() with two arrays
arr = [2, 4, 6, 5, 3]
arr1 = [2, 3, 5, 4, 1]
arr2 = np.power(arr,arr1)
print("\nPower of arrays ...\n", arr2)

# Example 7: Use numpy.reciprocal() function
arr = np.array([50, 1.34, 3, 1, 25]) 
arr2 = np.reciprocal(arr)
print("\nReciprocal1 ...\n", arr2)

# Example 8: Use reciprocal function on an integer array
# (integer arithmetic truncates, so the reciprocal of 75 is 0)
arr = np.array([75], dtype = int) 
arr2 = np.reciprocal(arr) 
print("\nReciprocal2 ...\n", arr2)

# Example 9: Use numpy.mod() function
arr = np.array([7,16, 25]) 
arr1 = np.array([4,8,6]) 
arr2 = np.mod(arr,arr1) 
print("\nMod function ...\n", arr2)

# Example 10: Use numpy.remainder() function
arr = np.array([7,16, 25]) 
arr1 = np.array([4,8,6]) 
arr2 = np.remainder(arr,arr1) 
print("\nRemainder ...\n", arr2)



---

### NumPy Comparison Operators

The NumPy comparison operators and functions compare array items element by element, and return Boolean `True` or `False` results.

The NumPy **_comparison functions_** are `greater`, `greater_equal`, `less`, `less_equal`, `equal`, and `not_equal`.

As in plain Python, the NumPy **_comparison operators_** are `<`, `<=`, `>`, `>=`, `==` and `!=`.
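
The functions and the operators give the same element-wise results, as this small sketch with two made-up arrays suggests:

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([2, 5, 1])

gt_func = np.greater(a, b)           # function form
gt_op = a > b                        # operator form
print("np.greater(a, b) ...", gt_func)
print("a > b ...", gt_op)

eq = np.equal(a, b)
print("np.equal(a, b) ...", eq)
```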

To learn more about NumPy comparisons, read this article -- [Python NumPy Comparison Operations](https://www.tutorialgateway.org/python-numpy-comparison-operators/)


In [None]:
# comparison operations
import numpy as np

data2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("\ndata2 original array ...", data2)
# find all elements < 5
print("data2 elements < 5 ...", data2[data2 < 5])

# find all elements >= 5
five_up = data2 >= 5
print("data2 elements >= 5 ...", data2[five_up])

# using modulo, find all elements divisible by 2
divisible_by_2 = data2[data2 % 2 == 0]
print("data2 elements divisible_by_2 ...", divisible_by_2)



---

#### NumPy Logic functions routines

Logical operations are used to find the logical relation between two arrays or lists or variables. We can perform logical operations on NumPy array to explore data relationships.
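
A few of the logic functions in action, as a minimal sketch on made-up data:

```python
import numpy as np

data = np.array([1, 4, 6, 9])

# Combine two element-wise conditions
in_range = np.logical_and(data > 2, data < 8)
print("between 2 and 8 ...", in_range)

# Invert a Boolean array
outside = np.logical_not(in_range)
print("outside that range ...", outside)

# Reduce a whole array to a single True / False
any_large = np.any(data > 8)
all_positive = np.all(data > 0)
print("any > 8 ...", any_large, " all > 0 ...", all_positive)
```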

You can explore NumPy Logic function routines in this article -- [NumPy: Logic functions routines](https://www.w3resource.com/numpy/logic-functions/index.php)




---

### NumPy Aggregate or Statistical functions

The NumPy library has many aggregate or statistical functions for doing different types of tasks with one-dimensional or multi-dimensional arrays. 

Some of the useful aggregate functions are `mean()`, `min()`, `max()`, `average()`, `sum()`, `median()`, `percentile()`, etc.
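
Here is a small sketch of some of these functions, including the `axis` argument that most of them accept. The `data` array is made up.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

overall_mean = np.mean(data)         # over all elements
col_sums = np.sum(data, axis=0)      # one sum per column
row_max = np.max(data, axis=1)       # one maximum per row
middle = np.median(data)
p50 = np.percentile(data, 50)        # 50th percentile

print("mean ...", overall_mean)
print("column sums ...", col_sums)
print("row maximums ...", row_max)
print("median ...", middle, " 50th percentile ...", p50)
```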

Information on NumPy statistical functions and operations can be found in this article -- [Python numpy Aggregate Functions](https://www.tutorialgateway.org/python-numpy-aggregate-functions/)




---

### **Other NumPy Information**

It is very strongly recommended that, at some time, you work through the following tutorial at https://numpy.org/ as it will give you a different perspective on the materials we covered -- [NumPy: the absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)

Here are some other interesting NumPy methods to look at: -

- To be able to import and save different data files in NumPy, please review -- [Reading and writing files](https://numpy.org/devdocs/user/how-to-io.html)

- `np.count_nonzero()` = used to count the number of nonzero elements present in an array; the related `np.nonzero()` returns the indices of those elements [NumPy Count Nonzero Values in Python arrays](https://sparkbyexamples.com/numpy/numpy-count-nonzero-values-in-python/)

- `ndarray.view()` = creates a new array object that looks at the same data as the original array [What is the array.view() method from NumPy in Python?](https://www.educative.io/answers/what-is-the-arrayview-method-from-numpy-in-python)
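
A brief sketch of counting nonzero values, finding their positions, and creating a view (the array is made up):

```python
import numpy as np

arr = np.array([0, 3, 0, 5])

count = np.count_nonzero(arr)        # how many nonzero values
positions = np.nonzero(arr)[0]       # where they are (first axis)
print("count ...", count, " positions ...", positions)

# A view is a new array object over the same data
v = arr.view()
v[0] = 9
print("arr after editing the view ...", arr)
```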



---

### Closing note ...

NumPy is huge, and its application is wide.

NumPy is fundamental for working with data, and we'll be using it as we learn about Matplotlib and Pandas.

Never be afraid to Google for more info, or to find an answer to something. The only thing to remember is to compare a few resources before settling on any one answer.

It is through working with data, and continually exploring new options, that we will grow in our understanding of the uses of NumPy.

Here is a website that claims to offer [101 NumPy Exercises for Data Analysis (Python)](https://www.machinelearningplus.com/python/101-numpy-exercises-python/)
