# Data Analytics

## Python NumPy and Broadcasting

We are going to learn about ...

- Working with the shape and size of an array
- Concatenation
- Sorting
- Unpacking Array Tuples
- Indexing and slicing
- NumPy and Broadcasting
- Arithmetic, Logic, Comparison, and statistical operations
- Other NumPy Functions

<br>

---


### What we learned so far ...

**NumPy** (_Numerical Python_) is an open-source library for the Python programming language. It is used for scientific computing and working with arrays.

About NumPy Arrays ...
- Are usually fixed-size containers of items of the same type and size
- A grid of values that contains information about raw data, how to locate an element, and how to interpret it
- The elements all have the same type, referred to as the array `dtype`
- The number of dimensions and items in an array is defined by its shape. 
- The shape of an array is a **TUPLE** of non-negative integers that specify the sizes of each dimension (Remember Tuples are formed with parentheses -- round brackets).
- The rank of the array is the number of its `dimensions`

### Useful NumPy keywords

- **Ndarray** == N-dimensional array
- **1-D** == one dimensional array
- **2-D** == two-dimensional array
- **3-D** == three-dimensional array
- **Vector** == an array with a single dimension
- **Matrix** == an array with two dimensions
- **Tensor** == an array with 3+ dimensions
- **Dimensions** == the number of axes of an array
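
The keywords above can be checked directly on small arrays. This is only an illustrative sketch; the variable names `vector`, `matrix`, and `tensor` are made up for the example.

```python
import numpy as np

vector = np.array([1, 2, 3])            # 1-D: a single axis
matrix = np.array([[1, 2], [3, 4]])     # 2-D: rows and columns
tensor = np.zeros((2, 3, 4))            # 3-D: three axes

# .ndim reports the number of axes (the rank) of each array
print(vector.ndim, matrix.ndim, tensor.ndim)

# .dtype reports the single element type shared by every item
print(vector.dtype, tensor.dtype)
```

Every element of `tensor` is a float because `np.zeros` creates `float64` values unless told otherwise.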

<br>

<br>

---

### **Working with the shape and size of arrays**

Remember we learned that in NumPy arrays ...
- The elements all have the same type, referred to as the array `dtype`
- The number of `dimensions` and items in an array is defined by its `SHAPE`
- The shape is a _**Tuple** of non-negative integers_ that specify the `sizes` of each `dimension`
- The rank of the array is the number of its `dimensions`

You can get the number of dimensions, shape (length of each dimension), and size (number of all elements) of the NumPy array with `.ndim`, `.shape`, and `.size` attributes of numpy.ndarray. 

The built-in function `len()` returns the size of the first dimension; not to be confused with the number of dimensions in the NumPy array.

In [1]:
# Example of discovering the shape and size of a Numpy Array
import numpy as np

array_example = np.array([[[0, 1, 2, 3], [4, 5, 6, 7]],
                          [[0, 1, 2, 3], [4, 5, 6, 7]],
                          [[0, 1, 2, 3], [4, 5, 6, 7]]])

exDimension = array_example.ndim
print("Array dimensions ...", exDimension)  # 3 (dimensions)

exSize = array_example.size
print("Array size and length ...", exSize)  # 24 (3 * 2 * 4 elements in total)

exShape = array_example.shape
print("Array shape ...", exShape)   # 3, 2, 4

exLength = len(array_example)
print("Array length ...", exLength)   # 3


Array dimensions ... 3
Array size and length ... 24
Array shape ... (3, 2, 4)
Array length ... 3


#### **Reshaping NumPy Arrays**

As the name suggests, reshaping changes the shape of an array. The `numpy.reshape()` function gives an array a new shape without changing its data.

Sometimes we need to reshape data, for example from wide to long. In that situation we can use the `numpy.reshape()` function or the equivalent `.reshape()` array method. The new shape must hold the same number of elements as the original.

This function returns an ndarray. It is a new view object if possible; otherwise, it will be a copy. There is no guarantee of the memory layout of the returned array.

        Syntax: -   numpy.reshape(arr, new_shape)


In [None]:
# reshaping a NumPy array
aArray = np.array([[1,2,3], [4,5,6]])
newShape1 = np.reshape(aArray, 6)
print("newShape1 ...", newShape1)

newShape2 = np.reshape(aArray, (3,-1))
# the unspecified value is inferred to be 2
print("\nnewShape2 ...\n", newShape2)

bArray = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newShape3 = bArray.reshape(2, 3, 2)
print("\nnewShape3 ...\n", newShape3)
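
To make the view-versus-copy remark concrete, the sketch below checks whether a reshaped array shares memory with its source. The variable names are illustrative, and `np.shares_memory` is used here only as a convenient check.

```python
import numpy as np

base = np.arange(6)               # [0 1 2 3 4 5]
reshaped = base.reshape(2, 3)     # contiguous data, so NumPy can return a view

# A view shares memory with the original array
print(np.shares_memory(base, reshaped))

# Writing through the view changes the original as well
reshaped[0, 0] = 99
print(base[0])

# Reshaping a transposed (non-contiguous) array needs a copy here
transposed = np.arange(6).reshape(2, 3).T
flat = transposed.reshape(6)
print(np.shares_memory(transposed, flat))
```

If you need to be sure that changes do not leak back into the original, call `.copy()` on the result.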



---

### **NumPy Concatenation**

Concatenation refers to joining. The `numpy.concatenate()` function joins two or more arrays along a specified axis. The arrays must have the same shape, except along the axis being joined.

The `concatenate()` function takes the following parameters.

        Syntax: -- numpy.concatenate((a1, a2, ...), axis)

- **a1, a2, ...**   represents a sequence of arrays that have the _same shape_, except in the dimension corresponding to `axis`
- **axis**    represents the axis along which arrays have to be joined. _Default is axis 0_

In [None]:
# NumPy concatenation
import numpy as np

arrayA = np.array([1, 2, 3, 4])
arrayB = np.array([5, 6, 7, 8])

arrCombo = np.concatenate((arrayA, arrayB))
print("Print arrCombo ...", arrCombo)


# More Concatenation
arrayX = np.array([[1, 2], [3, 4]])
arrayY = np.array([[5, 6]])
arrayZ = np.array([[7, 8], [9, 10], [11, 12]])

arrayC = np.concatenate((arrayX, arrayY, arrayZ), axis=0)
print("\nAnd arrayC ...\n", arrayC)
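
The examples above join along axis 0. As a small sketch of the `axis` parameter, the block below joins along axis 1 instead and shows what happens when the shapes do not line up. The array names are illustrative.

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])    # shape (2, 2)
right = np.array([[5], [6]])         # shape (2, 1)

# Join along axis 1 (columns): the row counts must match
side_by_side = np.concatenate((left, right), axis=1)
print(side_by_side)
print(side_by_side.shape)

# Along axis 0 the column counts differ (2 vs 1), so NumPy raises ValueError
try:
    np.concatenate((left, right), axis=0)
except ValueError as err:
    print("Cannot join:", err)
```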



---

### **Sorting NumPy arrays**

NumPy provides the `np.sort()` function, and every ndarray object also has a `.sort()` method, for sorting a specified array.

NumPy's sorting routines are typically faster and more memory-efficient than the built-in Python sort on large numeric arrays, which makes them useful for data analytics.

**Note**: The `np.sort()` function _returns a sorted **copy** of the array_, leaving the original array unchanged. The `ndarray.sort()` method, by contrast, sorts the array in place and returns `None`.

You can sort many kinds of NumPy arrays: numbers, strings, booleans, and other data types.


In [None]:
# Sorting NumPy arrays ...
import numpy as np

# sorting a 1-D array
arr1 = np.array([2, 1, 5, 3, 7, 4, 6, 8])
i1 = np.sort(arr1)
print(i1)

# Sort a 2-D array:
# by default each row is sorted independently (along the last axis)
arr2 = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(arr2))

# Sort an array of strings alphabetically:
arr3 = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr3))

# Sort a boolean array:
arr4 = np.array([True, False, True])
print(np.sort(arr4))
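
The difference between the copying function and the in-place method, and the effect of the `axis` argument, can be sketched as follows. The variable names are illustrative.

```python
import numpy as np

values = np.array([[3, 2, 4], [5, 0, 1]])

# np.sort returns a sorted copy; `values` itself is untouched
by_row = np.sort(values)             # default axis=-1 sorts within each row
by_col = np.sort(values, axis=0)     # axis=0 sorts within each column
print(by_row)
print(by_col)

# ndarray.sort sorts in place and returns None
in_place = values.copy()
result = in_place.sort()
print(result)
print(in_place)
```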


#### **`argsort()`**
A related function is `argsort()`, which instead of sorting the array elements, returns another array holding the **INDEXES** of the elements in sorted order.

The first element of this result gives the index of the smallest element, the second value gives the index of the second smallest, and so on. 

These indices can then be used (via fancy indexing) to construct the sorted data array if desired.

In [None]:
# `argsort` returns the Index Numbers of elements in order
import numpy as np
xx = np.array([2, 1, 4, 3, 5])
ii12 = np.argsort(xx)
print(ii12)
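
The fancy-indexing step mentioned above can be sketched like this. The `scores` and `names` arrays are made-up data for illustration; the point is that one set of indices can reorder several parallel arrays.

```python
import numpy as np

scores = np.array([2, 1, 4, 3, 5])
names = np.array(["bo", "al", "di", "cy", "ed"])

order = np.argsort(scores)       # indices that would sort `scores`
sorted_scores = scores[order]    # fancy indexing builds the sorted array
sorted_names = names[order]      # the same order applied to a parallel array

print(order)
print(sorted_scores)
print(sorted_names)
```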


#### **`lexsort()`**

NumPy `lexsort()` performs an indirect stable sort using a sequence of keys.

        Syntax: -- numpy.lexsort(keys, axis=-1)

This function is used for sorting with multiple sort keys, involving more than one array.

For example, we first sort data by column A and then, where values in column A are tied, by column B.

In the example below we take two arrays representing column A and column B.

Note that `lexsort()` treats the **last** key in the sequence as the primary sort key, so to sort first by column A and then by column B, column A must be passed last. The result is returned as an array containing the indices of the elements in sorted order (like `argsort()`).

This sort order is then applied to both data columns.

In [None]:
# using NumPy `lexsort()`
import numpy as np

# First column -- the primary sort key
colA = [2,5,1,8,1]
# Second column -- the secondary sort key, used to break ties in colA
colB = [9,0,3,2,0]

# Sort by colA and then by colB
# (lexsort treats the LAST key as the primary key, so colA goes last)
sorted_index = np.lexsort((colB, colA))
print("Here is the result of sorting by colA, then colB")
print("Sorted Index ...", sorted_index)

#print the result showing the column values as pairs
listAB = [(colA[i],colB[i]) for i in sorted_index]
print("\nCombined column values by index list ...\n", listAB)


#### Other sort Methods

NumPy provides a variety of functions for sorting and searching. The `numpy.sort()` function accepts a `kind` parameter that selects the sorting algorithm: `quicksort` (the default), `mergesort`, `heapsort`, or `stable`.

There are also related functions such as `searchsorted()`, which finds insertion points in a sorted array, and `partition()`, which partially sorts an array.

You can read about these in the NumPy documentation at ... [NumPy - Sorting, searching, and counting](https://numpy.org/doc/stable/reference/routines.sort.html)
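
As a rough sketch of these other routines, the block below tries `searchsorted()`, the `kind` argument of `np.sort()`, and `partition()` on small made-up arrays.

```python
import numpy as np

sorted_vals = np.array([10, 20, 30, 40])

# searchsorted: index at which a value would be inserted to keep the order
insert_at = np.searchsorted(sorted_vals, 25)
print(insert_at)

# kind= selects the sorting algorithm used by np.sort
data = np.array([7, 2, 9, 4, 1])
stable_sorted = np.sort(data, kind="stable")
print(stable_sorted)

# partition: the element at index k lands in its final sorted position,
# with smaller values before it and larger values after it (in any order)
k = 2
parts = np.partition(data, k)
print(parts[k])
```

`partition()` is handy when you only need the smallest few values and do not care about their exact order.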



---

### **Unpacking array Tuples**

You can also unpack arrays, tuples, and lists, assigning their elements to different variables.

In Python, you can assign the elements of a tuple or list to multiple variables. This is called **_sequence unpacking_**.

If you write variables on the left side separated by commas, the elements of the tuple (or list) on the right side are assigned to each variable in order. The number of variables must match the number of elements.

The following example uses a nested list, but the same is true for tuples and NumPy arrays:

In [None]:
# Assigning variables using multi assignment
tt = ([ [ [0, 1, 2, 3], [4, 5, 6, 7] ],
        [ [0, 1, 2, 3], [4, 5, 6, 7] ],
        [[0 ,1 ,2, 3], [4, 5, 6, 7] ] ])

a, b, c = tt

print("a = ", a)
print("b = ", b)
print("c = ", c)
print("The data type of c ...", type(c))

# further unpack the "a" variable
a0, a1 = a
print("\nFurther unpacking a ...")
print("a0 = ", a0)
print("a1 = ", a1)
print("The data type of a1 ...", type(a1))


#### **Unpack a nested tuple or list**

You can also unpack nested tuples or lists. If you want to expand the inner elements, enclose the variables with `()` or `[]` to match the shape of the array.

In [None]:
# Unpacking a series of nested elements
example = ([ [ [0, 1, 2, 3], [14, 15, 16, 17] ],
        [ [20, 21, 22, 23], [34, 35, 36, 37] ],
        [[40 ,41 ,42, 43], [54, 55, 56, 57] ] ])

[[[a,b,c,d], [e,f,g,h]],[[i,j,k,l],[m,n,o,p]],[[q,r,s,t],[u,v,w,x]]] = example

print(a, b, c, d)
print(i, j, k, l)
print(q, r, s, t, u, v, w, x)

# or collect the remaining elements with the asterisk (*) operator
[[[aa,bb,cc,dd], [ee,ff,gg,hh]], *zz] = example
print("\nThe variables aa, bb, cc, etc...")
print(aa,bb,cc,dd,ee,ff,gg, hh)
print("\nThe collective variable zz ...")
print(zz)
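
The same unpacking works on a NumPy array, where it walks along the first axis. The sketch below uses a small illustrative array named `grid`.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Unpacking iterates over the first axis, so each variable is one row
row0, row1 = grid
print(row0, row1)

# .shape is a tuple, so it unpacks the same way
n_rows, n_cols = grid.shape
print(n_rows, n_cols)

# Transpose first to unpack columns instead of rows
col0, col1, col2 = grid.T
print(col1)
```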



---

### **NumPy Array Indexing and Slicing**

As mentioned earlier, items in an ndarray object follow zero-based indexing. Three types of indexing methods are available: field access, basic slicing, and advanced indexing.

Basic slicing is an extension of Python's basic concept of slicing to n dimensions.



In [None]:
#   Indexing and Slicing Arrays ...
import numpy as np 

# index a single item
arr1 = np.arange(10) 
bee = arr1[5]
print("\nOriginal data arr1 ...", arr1)
print("bee is element at index 5 ...", bee)

# more slicing options
data = np.array([1, 2, 3, 4, 5])
aa2 = data[1]
bb2 = data[2:4]
cc2 = data[3:]
dd2 = data[-2:]

print("\nThe data array ...", data)
print("aa2 element [1] ...", aa2)
print("bb2 slice 2:4 ...", bb2)
print("cc2 slice 3: ...", cc2)
print("dd2 slice -2: ...", dd2)

# slicing a more complex array
arr2 = np.array([[1,2,3],[3,4,5],[4,5,6]]) 

print("\nThe arr2 array ...", arr2)
print("\nNow we'll slice the array from the index arr2[1:]") 
print(arr2[1:])
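
For arrays with more than one dimension, you can give one index or slice per axis, separated by commas. The sketch below also shows two forms of advanced indexing (an integer list and a boolean mask). The variable names are illustrative.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [3, 4, 5],
                  [4, 5, 6]])

single = table[1, 2]        # row 1, column 2
first_col = table[:, 0]     # every row, column 0
block = table[:2, 1:]       # rows 0-1, columns 1-2

# Advanced (integer-list) indexing picks arbitrary rows
picked_rows = table[[0, 2]]

# Boolean indexing keeps the elements where the mask is True
large = table[table > 4]

print(single)
print(first_col)
print(block)
print(picked_rows)
print(large)
```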



---

### **Broadcasting NumPy Arrays**

Broadcasting is a mechanism for performing math operations on arrays of unequal shapes.

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

NumPy operations are usually done element-by-element, which requires two arrays to have exactly the same shape. NumPy’s broadcasting rule relaxes this requirement when the arrays’ shapes meet certain constraints: the smaller array is “broadcast” across the larger array so that they have compatible shapes.

To determine if two arrays are broadcast-compatible, align the entries of their shapes such that their trailing dimensions are aligned, and then check that each pair of aligned dimensions satisfies either of the following conditions:

- the aligned dimensions have the same size
- one of the dimensions has a size of 1

The two arrays are broadcast-compatible if either of these conditions is satisfied for each pair of aligned dimensions. If one array has fewer dimensions than the other, its shape is treated as if it were padded with 1s on the left.

> **The Broadcasting Rule: -** In order to broadcast, the sizes of the trailing axes of both arrays in an operation must either be equal, or one of them must be 1.


In [None]:
# simplest broadcasting  - an array and a scalar value are combined
from numpy import array
a = array([1.0,2.0,3.0])
b = 2.0
print(a * b)


In [None]:
# add a 1-D array broadcast to a 2-D array
from numpy import array
arr2D = array([[ 0.0,  0.0,  0.0],
            [10.0, 10.0, 10.0],
            [20.0, 20.0, 20.0],
            [30.0, 30.0, 30.0]])
arr1D = array([1.0, 2.0, 3.0])
print(arr2D + arr1D)


Broadcasting provides a convenient way of taking the outer product (or any other outer operation) of two arrays. The following example shows an outer addition operation on two 1-D arrays.


In [None]:
# transforming arrays for Broadcasting with `newaxis`
from numpy import array, newaxis
aa = array([0.0, 10.0, 20.0, 30.0])
bb = array([1.0, 2.0, 3.0])

print("The sum of aa and bb ...\n")
print(aa[:,newaxis] + bb)


Here the `newaxis` index operator inserts a new axis into `aa`, making it a two-dimensional 4x1 array. This 4x1 array and `bb`, which has shape (3,), are then broadcast (stretched) against each other.

This image illustrates the stretching of both arrays to produce the desired 4x3 output array.

![NumPy Broadcasting Example](./images/NumPy_Broadcasting.png)


In [None]:
# another example of Broadcasting arrays

# a shape-(3, 4) array
x = np.array(
        [[-0.0, -0.1, -0.2, -0.3], 
        [-0.4, -0.5, -0.6, -0.7], 
        [-0.8, -0.9, -1.0, -1.1]]
)

# a shape-(4,) array
y = np.array([1, 2, 3, 4])

# multiplying a shape-(4,) array --> y
# with a shape-(3, 4) array --> x
# `y` is multiplied by each row of `x`

print("arrays x * y ...\n", x * y)
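
The broadcasting rule can also be tested on shapes alone, without building any arrays. The sketch below uses `np.broadcast_shapes`, assuming NumPy 1.20 or newer where that function is available; the variable names are illustrative.

```python
import numpy as np

# np.broadcast_shapes applies the broadcasting rule to shapes alone
# (assumes NumPy 1.20 or newer)
shape_rows = np.broadcast_shapes((4, 3), (3,))    # trailing sizes 3 and 3 match
shape_outer = np.broadcast_shapes((4, 1), (3,))   # the size-1 axis stretches to 3

print(shape_rows)
print(shape_outer)

# Trailing sizes 3 and 4 differ and neither is 1, so the rule fails
try:
    np.broadcast_shapes((4, 3), (4,))
    compatible = True
except ValueError as err:
    compatible = False
    print("Not broadcast-compatible:", err)
```

The same `ValueError` is raised if you try the arithmetic itself on arrays with those incompatible shapes.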



---

### NumPy - Arithmetic Operations

The input arrays for arithmetic functions such as `add()`, `subtract()`, `multiply()`, and `divide()` must either have the same shape or conform to the array broadcasting rules.



In [None]:
# basic array Arithmetic operations 
import numpy as np

a = np.arange(9, dtype=np.float64).reshape(3,3)  # np.float_ was removed in NumPy 2.0
b = np.array([10,10,10]) 

print('Add the two arrays:')
print(np.add(a,b))

print('\nSubtract the two arrays:')
print(np.subtract(a,b))

print('\nMultiply the two arrays:')
print(np.multiply(a,b))

print('\nDivide the two arrays:')
print(np.divide(a,b))
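
The arithmetic operators `+`, `-`, `*`, and `/` give the same element-wise results as the named functions. The short check below is only a sketch that reuses the same `a` and `b` values as the cell above.

```python
import numpy as np

a = np.arange(9, dtype=np.float64).reshape(3, 3)
b = np.array([10, 10, 10])

# Each operator produces the same result as its named function
same_add = np.array_equal(a + b, np.add(a, b))
same_sub = np.array_equal(a - b, np.subtract(a, b))
same_mul = np.array_equal(a * b, np.multiply(a, b))
same_div = np.array_equal(a / b, np.divide(a, b))

print(same_add, same_sub, same_mul, same_div)
```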



---

### NumPy Comparison Operators

The NumPy comparison operators and functions are used to compare array items, and they return Boolean `True` or `False` results element by element.

The NumPy **_comparison functions_** are `greater`, `greater_equal`, `less`, `less_equal`, `equal`, and `not_equal`.

The equivalent NumPy **_comparison operators_** are `<`, `<=`, `>`, `>=`, `==`, and `!=`.

To learn more about the NumPy comparison operators, read this article -- [Python numpy Comparison Operators](https://www.tutorialgateway.org/python-numpy-comparison-operators/)


In [None]:
# comparison operations
import numpy as np

data2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("\ndata2 original array ...", data2)
print("data2 elements < 5 ...", data2[data2 < 5])

five_up = data2 >= 5
print("data2 elements >= 5 ...", data2[five_up])

divisible_by_2 = data2[data2 % 2 == 0]
print("data2 elements divisible_by_2 ...", divisible_by_2)
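
The named comparison functions behave like the operators used above. A small sketch with two made-up arrays, `x` and `y`:

```python
import numpy as np

x = np.array([1, 5, 3])
y = np.array([2, 5, 1])

greater_mask = np.greater(x, y)       # same as x > y
equal_mask = np.equal(x, y)           # same as x == y
small_mask = np.less_equal(x, 3)      # the scalar 3 is broadcast over x

print(greater_mask)
print(equal_mask)
print(small_mask)
```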



---

#### NumPy Logic Functions

Logical operations are used to find the logical relationship between two arrays, lists, or variables. We can perform logical operations on NumPy arrays to explore relationships in the data.

You can explore NumPy Logic functions in this article -- [NumPy: Logic functions routines](https://www.w3resource.com/numpy/logic-functions/index.php)
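
As a brief sketch of a few logic functions, the block below combines Boolean masks on a made-up `ages` array.

```python
import numpy as np

ages = np.array([15, 22, 37, 64, 70])

# Combine element-wise conditions
working_age = np.logical_and(ages >= 18, ages < 65)
outside = np.logical_not(working_age)
either_end = np.logical_or(ages < 18, ages >= 65)

# Reduce a Boolean array to a single True / False answer
any_minor = np.any(ages < 18)
all_adults = np.all(ages >= 18)

print(working_age)
print(outside)
print(any_minor, all_adults)
```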




---

### NumPy Aggregate or Statistical functions
- Max / Min / Sum / Mean / etc.

The Python NumPy library has many aggregate or statistical functions for working with one-dimensional or multi-dimensional arrays. Some of the useful aggregate functions are `mean()`, `min()`, `max()`, `average()`, `sum()`, `median()`, `percentile()`, etc.

Information on NumPy statistical functions and operations can be found in this article -- [Python numpy Aggregate Functions](https://www.tutorialgateway.org/python-numpy-aggregate-functions/)
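
A few of these functions are sketched below on a small illustrative array. The `axis` argument chooses whether to aggregate over the whole array, down the columns (`axis=0`), or across the rows (`axis=1`).

```python
import numpy as np

data = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

total = data.sum()              # over every element
col_max = data.max(axis=0)      # one value per column
row_mean = data.mean(axis=1)    # one value per row
middle = np.median(data)
p90 = np.percentile(data, 90)   # value below which 90% of the data falls

print(total)
print(col_max)
print(row_mean)
print(middle, p90)
```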




---

### **Other NumPy functions**

- `np.count_nonzero()` = used to count the number of nonzero elements present in an array; the related `np.nonzero()` returns the indices of those elements [NumPy Count Nonzero Values in Python arrays](https://sparkbyexamples.com/numpy/numpy-count-nonzero-values-in-python/)
- `np.vstack()` = used to stack arrays in sequence vertically (row wise) [NumPy: vstack() function](https://www.w3resource.com/numpy/manipulation/vstack.php)
- `np.hstack()` = used to stack arrays in sequence horizontally (column wise) [NumPy: hstack() function](https://www.w3resource.com/numpy/manipulation/hstack.php)
- `np.hsplit()` = used to split an array into multiple sub-arrays horizontally (column-wise) [NumPy: hsplit() function](https://www.w3resource.com/numpy/manipulation/hsplit.php)
- `ndarray.view()` = creates a new array object that looks at the same data as the original array [What is the array.view() method from NumPy in Python?](https://www.educative.io/answers/what-is-the-arrayview-method-from-numpy-in-python)
- `ndarray.copy()` = creates a new array with its own copy of the data; note that there are different 'degrees' at which Python objects can be copied [Shallow vs Deep Copying of Python Objects](https://realpython.com/copying-python-objects/)
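
The functions in this list can be tried together in one short sketch. The arrays `top` and `bottom` are made up for the example.

```python
import numpy as np

top = np.array([[1, 0], [0, 4]])
bottom = np.array([[5, 6]])

stacked = np.vstack((top, bottom))    # rows added underneath: shape (3, 2)
wide = np.hstack((top, top))          # columns added alongside: shape (2, 4)
left, right = np.hsplit(wide, 2)      # split back into two (2, 2) arrays

n_nonzero = np.count_nonzero(top)     # how many elements are nonzero
rows, cols = np.nonzero(top)          # where the nonzero elements are

alias = top.view()                    # new array object, same underlying data
alias[0, 0] = 100                     # this also changes `top`

independent = top.copy()              # new array with its own data
independent[1, 1] = -1                # `top` is not affected

print(stacked.shape, wide.shape)
print(n_nonzero, rows, cols)
print(top)
```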



---

### Closing note ...

NumPy is huge, and its application is wide.

NumPy is fundamental for working with data, and we'll be using it as we learn about Matplotlib and Pandas.

It is through working with data, and continually exploring new options, that we will grow in our understanding of the uses of NumPy.
