# Data Analytics

## Python NumPy and Broadcasting

We are going to learn about ...

- More NumPy Arrays
- NumPy and Broadcasting

<br>

---


<br>

### What we learned ...

**NumPy** (_Numerical Python_) is an open-source library for the Python programming language. It is used for scientific computing and working with arrays.

About NumPy Arrays ...
- Are usually fixed-size containers of items of the same type and size
- Are a grid of values that carries information about the raw data, how to locate an element, and how to interpret it
- Hold elements that are all of the same type, referred to as the array `dtype`
- Have a shape that defines the number of dimensions and the number of items along each dimension
- The shape of an array is a tuple of non-negative integers that specify the size of each dimension
- The rank of the array is the number of its dimensions

### Useful NumPy keywords

- **Ndarray** == N-dimensional array
- **1-D** == one dimensional array
- **2-D** == two-dimensional array
- **3-D** == three-dimensional array
- **Vector** == an array with a single dimension
- **Matrix** == an array with two dimensions
- **Tensor** == an array with 3+ dimensions
- **Dimensions** == the number of axes of an array

<br>

![NumPy Multi-Dimensional Arrays](./images/numpy_array_t.png)

<br>


#### **Install NumPy**

With pip set up on your system, you can install NumPy from the command line.

Install NumPy with Python by typing:

```bash
    pip install numpy
```

#### **Importing NumPy into your projects**

To use NumPy, import it like any other module. There are two common ways to do this:

```python
    import numpy
    # OR ...
    import numpy as np

```

#### **Creating NumPy Arrays**


In [None]:
# creating Numpy Arrays
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6])
print(a1)

a2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a2)


In [None]:
# creating arrays filled with zeros or ones
import numpy as np

# build an array with 4x zeros
b = np.zeros(4)
print("b ...", b)

# build an array with 3x ones
c = np.ones(3)
print("c ...", c)

# create an uninitialized array (its values are arbitrary, not guaranteed zeros)
d = np.empty(3)
print("d ...", d)


#### **We can create arrays with a range of elements**:

In [None]:
import numpy as np

# create an array with numbers from 0 up to, but not including, 5
e = np.arange(5)
print("e =", e)

# create a range of evenly spaced values
# from the 1st number up to (not including) the 2nd, in steps of the 3rd
f = np.arange(2, 9, 2)
print("f ...",  f)

# create an array of linearly spaced values
# 5 evenly spaced values from 0 to 14, both endpoints included
g = np.linspace(0, 14, num=5)
print("g ...", g)
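
As a small sketch of the difference between the two functions: `np.arange` works from a step and excludes the stop value, while `np.linspace` works from a count and includes the stop value by default. The variable names below are illustrative only.

```python
import numpy as np

# arange: step-based, the stop value (9) is excluded
stepped = np.arange(2, 9, 2)

# linspace: count-based, the stop value (14) is included by default
spaced = np.linspace(0, 14, num=5)

# retstep=True also returns the spacing between neighbouring values
values, step = np.linspace(0, 14, num=5, retstep=True)

print("stepped ...", stepped)   # 2, 4, 6, 8
print("spaced ...", spaced)     # 0, 3.5, 7, 10.5, 14
print("step ...", step)         # 3.5
```

Five values span four gaps, which is why the step is 14 / 4 = 3.5 rather than 14 / 5.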


#### **Specifying the data type of an array**

In [None]:
# explicitly specify which data type -- dtype --  to use
import numpy as np

h = np.ones(2, dtype=np.int64)
print(h)
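
To see what the `dtype` argument changes, the sketch below compares the default with an explicit choice. It assumes nothing beyond NumPy itself; the variable names are illustrative.

```python
import numpy as np

default_ones = np.ones(2)                # no dtype given: float64
int_ones = np.ones(2, dtype=np.int64)    # explicit 64-bit integers

print(default_ones.dtype)   # float64
print(int_ones.dtype)       # int64

# astype() returns a converted copy; the original keeps its dtype
as_float = int_ones.astype(np.float32)
print(as_float.dtype)       # float32
print(int_ones.dtype)       # int64
```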


#### **Sorting NumPy arrays**

NumPy provides the `np.sort()` function, which sorts a specified array. The `ndarray` object also has its own `.sort()` method.

For NumPy arrays, these are generally faster and more memory-efficient than Python's built-in `sorted()` function and `list.sort()` method, which makes them better suited to data analytics.

**Note**: The `np.sort()` function returns a sorted copy of the array, leaving the original array unchanged. The `ndarray.sort()` method sorts the array in place instead.

You can sort all sorts of NumPy arrays, including arrays of strings or of any other data type:


In [None]:
import numpy as np

# sorting a 1-D array
arr1 = np.array([2, 1, 5, 3, 7, 4, 6, 8])
i1 = np.sort(arr1)
print(i1)

# Sort a 2-D array:
# on a 2-D array, each row is sorted independently (default axis=-1)
arr2 = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(arr2))

# Sort the array alphabetically:
arr3 = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr3))

# Sort a boolean array:
arr4 = np.array([True, False, True])
print(np.sort(arr4))
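
The sketch below contrasts the `np.sort()` function, which returns a copy, with the `ndarray.sort()` method, which sorts in place, and shows how the `axis` argument controls the direction of a 2-D sort. Variable names are illustrative.

```python
import numpy as np

arr = np.array([2, 1, 5, 3])

sorted_copy = np.sort(arr)    # returns a new, sorted array
print(arr)                    # original unchanged: [2 1 5 3]
print(sorted_copy)            # [1 2 3 5]

arr.sort()                    # sorts in place and returns None
print(arr)                    # [1 2 3 5]

# on a 2-D array, axis selects the direction of the sort
grid = np.array([[3, 2, 4], [5, 0, 1]])
by_row = np.sort(grid, axis=1)   # within each row
by_col = np.sort(grid, axis=0)   # within each column
print(by_row)
print(by_col)
```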


#### **`argsort()`**
A related function is `argsort()`, which instead returns an array of the **INDEXES** of the sorted elements.

The first element of this result gives the index of the smallest element, the second value gives the index of the second smallest, and so on. 

These indices can then be used (via fancy indexing) to construct the sorted data array if desired.

In [None]:
# argsort returns the index numbers that would put the elements in sorted order
import numpy as np
xx = np.array([2, 1, 4, 3, 5])
ii12 = np.argsort(xx)
print(ii12)
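
The fancy-indexing step mentioned above can be sketched as follows. The `labels` array is a hypothetical second column, added only to show that one index array can reorder several parallel arrays.

```python
import numpy as np

xx = np.array([2, 1, 4, 3, 5])
order = np.argsort(xx)        # [1 0 3 2 4]

# fancy indexing with the index array rebuilds the sorted data
sorted_xx = xx[order]         # [1 2 3 4 5]
print(sorted_xx)

# the same order can reorder a second, parallel array
labels = np.array(['b', 'a', 'd', 'c', 'e'])   # hypothetical labels
print(labels[order])          # ['a' 'b' 'c' 'd' 'e']
```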


#### **`lexsort()`**

NumPy's `lexsort()` performs an indirect stable sort using a sequence of keys.

        Syntax: -- numpy.lexsort(keys, axis=-1)

This function sorts on multiple keys, involving more than one array. The last key in the sequence is the primary sort key.

For example, we can sort data by column A first and then break ties using the values in column B.

In the example below, two arrays represent column A and column B.

Applying `lexsort()` to sort first by column A and then by column B returns an array of indices into the columns (like `argsort()`).

This sort order is then applied to both data columns.

In [None]:
import numpy as np

# First column -- the primary sort key
colA = [2,5,1,8,1]
# Second column -- used to break ties in the first column
colB = [9,0,3,2,0]

# Sort by colA and then by colB (the last key passed is the primary key)
sorted_index = np.lexsort((colB,colA))
print("Here is the result of the col A sort")
print("Sorted Index ...", sorted_index)

# print the result showing the column values as pairs
listAB = [(colA[i],colB[i]) for i in sorted_index]
print("\nCombined column values by index list ...\n", listAB)
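
Because the order of the keys is easy to get backwards, here is a small sketch using the same column values as NumPy arrays. It shows that the last key passed to `lexsort()` is the primary one, and that the resulting index array can reorder both columns.

```python
import numpy as np

colA = np.array([2, 5, 1, 8, 1])
colB = np.array([9, 0, 3, 2, 0])

# the LAST key in the tuple is the primary sort key
by_a_then_b = np.lexsort((colB, colA))   # primary: colA, tie-break: colB
by_b_then_a = np.lexsort((colA, colB))   # primary: colB, tie-break: colA

print(by_a_then_b)   # [4 2 0 1 3]
print(by_b_then_a)   # [4 1 3 2 0]

# apply one sort order to both columns with fancy indexing
print(colA[by_a_then_b])   # [1 1 2 5 8]
print(colB[by_a_then_b])   # [0 3 9 0 2]
```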


#### Other sort Methods

There are other sorting functions, such as `searchsorted()` and `partition()`. You can read about these in the NumPy documentation: [NumPy - Sorting, searching, and counting](https://numpy.org/doc/stable/reference/routines.sort.html)
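
As a brief, non-exhaustive sketch of those two functions (the sample values are made up for illustration):

```python
import numpy as np

sorted_vals = np.array([1, 3, 5, 7, 9])

# searchsorted: index where a value would be inserted to keep the order
pos = np.searchsorted(sorted_vals, 6)
print(pos)   # 3

# partition: the k-th element lands in its sorted position; smaller
# values come before it, larger ones after, in no guaranteed order
data = np.array([7, 2, 9, 4, 1, 8])
part = np.partition(data, 2)
print(part[2])   # 4 (the third-smallest value)
```

`partition()` is useful when only the smallest or largest few values are needed, since it avoids a full sort.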

#### **NumPy Concatenation**

Concatenation means joining. The `concatenate()` function joins two or more arrays along a specified axis. The arrays must have the same shape, except in the dimension corresponding to that axis.

The `concatenate()` function takes the following parameters.

        Syntax: -- numpy.concatenate((a1, a2, ...), axis)

- **a1, a2, ...**   represents a sequence of arrays whose shapes match except along `axis`
- **axis**    represents the axis along which the arrays are joined. _Default is axis 0_

In [None]:
# NumPy concatenation
import numpy as np

arrayA = np.array([1, 2, 3, 4])
arrayB = np.array([5, 6, 7, 8])

arrCombo = np.concatenate((arrayA, arrayB))
print("Print arrCombo ...", arrCombo)


# More Concatenation
arrayX = np.array([[1, 2], [3, 4]])
arrayY = np.array([[5, 6]])
arrayZ = np.array([[7, 8], [9, 10], [11, 12]])

arrayC = np.concatenate((arrayX, arrayY, arrayZ), axis=0)
print("\nAnd arrayC ...\n", arrayC)
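
The examples above join along axis 0. The sketch below joins along axis 1 instead and shows what happens when the shapes do not line up; the `left` and `right` arrays are illustrative.

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])           # shape (2, 2)
right = np.array([[5, 6, 7], [8, 9, 10]])   # shape (2, 3)

# axis=1 joins column-wise: the row counts (2) must match
wide = np.concatenate((left, right), axis=1)
print(wide.shape)   # (2, 5)
print(wide)

# axis=0 fails here because the column counts (2 vs 3) differ
try:
    np.concatenate((left, right), axis=0)
except ValueError as err:
    print("cannot join along axis 0:", err)
```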


#### **Knowing shape and size of arrays**

Remember we learned that in NumPy arrays ...
- The elements are all of the same type, referred to as the array `dtype`
- The number of `dimensions` and the number of items along each one define its `SHAPE`
- The shape is a _**tuple** of non-negative integers_ that specify the `size` of each `dimension`
- The rank of the array is the number of its `dimensions`

You can get the number of dimensions, the shape (length of each dimension), and the size (total number of elements) of a NumPy array with the `.ndim`, `.shape`, and `.size` attributes of `numpy.ndarray`.

The built-in function `len()` returns the size of the first dimension.

In [None]:
# Example of discovering the shape and size of a Numpy Array
import numpy as np

array_example = np.array([ [ [0, 1, 2, 3], [4, 5, 6, 7] ],
                            [ [0, 1, 2, 3], [4, 5, 6, 7] ],
                            [[0 ,1 ,2, 3], [4, 5, 6, 7] ] ])

exDimension = array_example.ndim
print("Array dimensions ...", exDimension)  # 3 (dimensions)

exSize = array_example.size
print("Array size ...", exSize)  # 24 (3 * 2 * 4 elements in total)

exShape = array_example.shape
print("Array shape ...", exShape)   # (3, 2, 4)
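
The behaviour of `len()` mentioned above can be checked with a short sketch. The `cube` array is a hypothetical stand-in with the same shape as `array_example`.

```python
import numpy as np

cube = np.zeros((3, 2, 4))   # hypothetical 3-D array

print(len(cube))        # 3, the size of the first dimension
print(cube.shape[0])    # 3, the same value
print(cube.size)        # 24 = 3 * 2 * 4
```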


**NOTE: -** In the case of a two-dimensional array, shape is (number of rows, number of columns). 

If you only want the number of rows or the number of columns, index into the shape tuple.

In [None]:
# Example of reading the number of rows and columns from the shape
import numpy as np

# build an array
a_2d = np.arange(12).reshape((3, 4))
print("The a_2d array ...\n", a_2d)

print("\nNumber of dimension ...", a_2d.ndim)
print("number of rows ...", a_2d.shape[0])
print("number of columns ...", a_2d.shape[1])


#### **Unpacking array Tuples**

You can also unpack a sequence and assign its elements to different variables.

In Python, you can assign the elements of a tuple or list to multiple variables. This is called **_sequence unpacking_**.

If you write variables on the left side separated by commas, the elements of the tuple (or list) on the right side are assigned to each variable in order.

The following example uses a nested list, but the same is true for tuples:

In [None]:
# Assigning variables using multi assignment
tt = ([ [ [0, 1, 2, 3], [4, 5, 6, 7] ],
        [ [0, 1, 2, 3], [4, 5, 6, 7] ],
        [[0 ,1 ,2, 3], [4, 5, 6, 7] ] ])

a, b, c = tt

print("a = ", a)
print("b = ", b)
print("c = ", c)
print("The data type of c ...", type(c))

# further unpack the a variable
a0, a1 = a
print("\nFurther unpacking a ...")
print("a0 = ", a0)
print("a1 = ", a1)
print("The data type of a1 ...", type(a1))
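
Sequence unpacking also applies to NumPy objects. As a minimal sketch, the shape tuple can be unpacked into named variables, and an array can be unpacked along its first axis:

```python
import numpy as np

a_2d = np.arange(12).reshape((3, 4))

# .shape is a tuple, so it unpacks like any other sequence
rows, cols = a_2d.shape
print(rows, cols)   # 3 4

# unpacking an array splits it along its first axis (one row each)
first, second, third = a_2d
print(second)         # [4 5 6 7]
print(type(second))   # <class 'numpy.ndarray'>
```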


#### **Unpack a nested tuple and list**

You can also unpack a nested tuple or list. If you want to expand the inner elements, enclose the target variables in `()` or `[]` so that they mirror the nesting of the data.

In [None]:
# Unpacking a series of nested elements
example = ([ [ [0, 1, 2, 3], [14, 15, 16, 17] ],
        [ [20, 21, 22, 23], [34, 35, 36, 37] ],
        [[40 ,41 ,42, 43], [54, 55, 56, 57] ] ])

[[[a,b,c,d], [e,f,g,h]],[[i,j,k,l],[m,n,o,p]],[[q,r,s,t],[u,v,w,x]]] = example

print(a, b, c, d)
print(i, j, k, l)
print(q, r, s, t, u, v, w, x)

# using a starred variable (*zz) to collect the remaining elements
[[[aa,bb,cc,dd], [ee,ff,gg,hh]], *zz] = example
print("\nThe variables aa, bb, cc, etc...")
print(aa,bb,cc,dd,ee,ff,gg, hh)
print("\nThe collective variable zz ...")
print(zz)


#### **Reshaping NumPy Arrays**
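
As a minimal sketch of the idea: `reshape()` gives an array a new shape without changing its data, provided the total number of elements stays the same. The array names below are illustrative.

```python
import numpy as np

flat = np.arange(6)              # [0 1 2 3 4 5], shape (6,)

grid = flat.reshape((2, 3))      # 2 rows, 3 columns
print(grid.shape)                # (2, 3)

# -1 asks NumPy to infer that dimension from the total size
tall = flat.reshape((-1, 2))
print(tall.shape)                # (3, 2)

# a shape with a different element count raises ValueError
try:
    flat.reshape((4, 2))
except ValueError as err:
    print("cannot reshape:", err)
```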



#### **Indexing and Slicing**



In [None]:
#   Indexing and Slicing Arrays ...
import numpy as np

data = np.array([1, 2, 3])
aa2 = data[1]      # a single element
bb2 = data[0:2]    # elements at index 0 and 1
cc2 = data[1:]     # from index 1 to the end
dd2 = data[-2:]    # the last two elements

print("\nOriginal data ...", data)
print("aa2 element ...", aa2)
print("bb2 slice 0:2 ...", bb2)
print("cc2 1: ...", cc2)
print("dd2 -2: ...", dd2)
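
The same indexing and slicing syntax extends to more dimensions, with one index or slice per axis separated by commas. A small sketch, using an illustrative 3x4 array:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(grid[1, 2])      # 7: row 1, column 2
print(grid[0])         # the whole first row
print(grid[:, 1])      # the second column: 2, 6, 10
corner = grid[:2, -2:] # first two rows, last two columns
print(corner)

# a slice is a view: writing through it changes the original
block = grid[:2, :2]
block[0, 0] = 100
print(grid[0, 0])      # 100
```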


#### **Other functions**

- Use `&` and `|` to combine Boolean conditions element-wise
- `np.nonzero()` = return the indices of the non-zero (or `True`) elements
- `np.vstack()` = stack arrays vertically
- `np.hstack()` = stack arrays horizontally
- `np.hsplit()` = split an array horizontally into sub-arrays
- `ndarray.view()` = create a new array object that looks at the same data as the original array
- `ndarray.copy()` = make a complete copy of the array and its data
- Addition/Subtraction
- Multiplication/Division
- Max/min
- Sum/mean
- Product
- Standard deviation
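
A compact sketch of several items from the list above, using small made-up arrays:

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6], [7, 8]])

stacked_v = np.vstack((top, bottom))    # shape (4, 2)
stacked_h = np.hstack((top, bottom))    # shape (2, 4)
left, right = np.hsplit(stacked_h, 2)   # two shape-(2, 2) halves

# view() shares data with the original; copy() does not
shared = top.view()
independent = top.copy()
top[0, 0] = 99
print(shared[0, 0])        # 99
print(independent[0, 0])   # 1

# element-wise arithmetic and common reductions
data = np.array([1.0, 2.0, 3.0, 4.0])
print(data + 1, data * 2)
print(data.max(), data.min(), data.sum(), data.mean())
print(data.prod(), data.std())
```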


In [None]:
# comparison operations
import numpy as np

data2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("\ndata2 original array ...", data2)
print("data2 elements < 5 ...", data2[data2 < 5])

five_up = data2 >= 5
print("data2 elements >= 5 ...", data2[five_up])

divisible_by_2 = data2[data2 % 2 == 0]
print("data2 elements divisible_by_2 ...", divisible_by_2)
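
Conditions can be combined with `&` (and) and `|` (or), and `np.nonzero()` reports where a condition holds. The sketch below reuses the same `data2` values; the other names are illustrative.

```python
import numpy as np

data2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# parentheses are required: & and | bind tighter than comparisons
between = data2[(data2 > 2) & (data2 < 11)]
edges = data2[(data2 < 3) | (data2 > 10)]
print(between)   # values 3 through 10
print(edges)     # 1, 2, 11, 12

# np.nonzero returns one index array per dimension
rows, cols = np.nonzero(data2 > 10)
print(rows, cols)   # rows [2 2], columns [2 3]
```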


#### **Broadcasting NumPy Array**

Broadcasting is a mechanism for performing math operations on arrays of unequal shapes.


To determine if two arrays are broadcast-compatible, align the entries of their shapes so that their trailing dimensions line up, and then check that each pair of aligned dimensions satisfies one of the following conditions:

- the aligned dimensions have the same size
- one of the dimensions has a size of 1

The two arrays are broadcast-compatible if one of these conditions is satisfied for every pair of aligned dimensions.
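
The rule can be tried out on shapes alone. This sketch assumes NumPy 1.20 or later, where `np.broadcast_shapes()` is available; the shapes are arbitrary examples.

```python
import numpy as np

# compatible: trailing dimensions are 4 and 4
shape_rows = np.broadcast_shapes((3, 4), (4,))
print(shape_rows)   # (3, 4)

# compatible: each aligned pair is equal or contains a 1
shape_grid = np.broadcast_shapes((3, 1), (1, 4))
print(shape_grid)   # (3, 4)

# incompatible: trailing dimensions are 4 and 3
compatible = True
try:
    np.broadcast_shapes((3, 4), (3,))
except ValueError as err:
    compatible = False
    print("not broadcast-compatible:", err)
```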



In [None]:
## BROADCASTING ARRAYS
import numpy as np

# a shape-(3, 4) array
x = np.array(
        [[-0.0, -0.1, -0.2, -0.3], 
        [-0.4, -0.5, -0.6, -0.7], 
        [-0.8, -0.9, -1.0, -1.1]]
)

# a shape-(4,) array
y = np.array([1, 2, 3, 4])

# multiplying a shape-(4,) array -- y
# with a shape-(3, 4) array -- x
# `y` is multiplied by each row of `x`

print("arrays x * y ...\n", x * y)
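
To apply one value per row instead of one per column, the smaller array needs a shape of (3, 1). A minimal sketch, with `per_row` as a hypothetical set of row offsets:

```python
import numpy as np

x = np.zeros((3, 4))
per_row = np.array([10, 20, 30])   # shape (3,)

# add a trailing axis so the shape becomes (3, 1)
col = per_row[:, np.newaxis]

# `col` is stretched across the 4 columns: one value per row of x
shifted = x + col
print(shifted.shape)   # (3, 4)
print(shifted)
```

Without the extra axis, the shape-(3,) array would be aligned against the trailing dimension of size 4 and the addition would fail.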
