# Data Analytics

## Python NumPy and Broadcasting

We are going to learn about ...

- More NumPy Arrays
- NumPy and Broadcasting

<br>

---


<br>

### What we learned ...

**NumPy** (_Numerical Python_) is an open-source library for the Python programming language. It is used for scientific computing and working with arrays.

About NumPy arrays ...
- They are usually fixed-size containers of items of the same type and size
- An array is a grid of values that carries information about the raw data, how to locate an element, and how to interpret it
- All elements share the same type, referred to as the array's `dtype`
- The number of dimensions, and the number of items along each one, is defined by the array's shape
- The shape of an array is a tuple of non-negative integers that specifies the size of each dimension
- The rank of an array is its number of dimensions
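
As a minimal sketch of the attributes described above, the following inspects a small 2-D array. The variable name `grid` is only illustrative, and the exact integer `dtype` depends on the platform.

```python
import numpy as np

# A 2-D array with 3 rows and 4 columns
grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(grid.shape)  # (3, 4): the size of each dimension
print(grid.ndim)   # 2: the number of dimensions (the rank)
print(grid.size)   # 12: the total number of elements
print(grid.dtype)  # an integer type; the exact width is platform dependent
```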

### Useful NumPy keywords

- **Ndarray** == N-dimensional array
- **1-D** == one-dimensional array
- **2-D** == two-dimensional array
- **3-D** == three-dimensional array
- **Vector** == an array with a single dimension
- **Matrix** == an array with two dimensions
- **Tensor** == an array with 3+ dimensions
- **Dimensions** == the number of axes of an array

<br>

![NumPy Multi-Dimensional Arrays](./images/numpy_array_t.png)

<br>


#### **Install NumPy**

With pip set up on your system, you can install NumPy from the command line by typing:

```bash
    pip install numpy
```

#### **Importing NumPy into your projects**

To use NumPy, import it like any other module. There are two common ways to do so; the `np` alias is the widely used convention:

```python
    import numpy

    # OR ...

    import numpy as np

```

#### **Creating NumPy Arrays**


In [None]:
# creating NumPy arrays
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6])
print(a1)

a2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a2)


In [None]:
# creating arrays filled with zeros or ones
import numpy as np

# build an array with 4x zeros
b = np.zeros(4)
print("b ...", b)

# build an array with 3x ones
c = np.ones(3)
print("c ...", c)

# create an uninitialized array: np.empty does not set the values,
# so the contents are arbitrary until you assign them
d = np.empty(3)
print("d ...", d)
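
The same functions also accept a shape tuple, which is one way to build arrays with more than one dimension. The sketch below uses illustrative names (`zeros_2d`, `ones_2d`, `buf`) and shows filling an `np.empty` buffer before reading from it.

```python
import numpy as np

# Pass a tuple to build a 2-D array: 2 rows, 3 columns
zeros_2d = np.zeros((2, 3))
print(zeros_2d.shape)  # (2, 3)

ones_2d = np.ones((3, 2))
print(ones_2d.shape)   # (3, 2)

# np.empty only allocates memory; assign values before reading them
buf = np.empty(3)
buf[:] = 7.0
print(buf)  # [7. 7. 7.]
```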


#### **We can create arrays with a range of elements**:

In [None]:
import numpy as np

# arange returns evenly spaced values from the 1st number (inclusive)
# up to the 2nd number (exclusive), in steps of the 3rd number
f = np.arange(2, 9, 2)
print("f ...",  f)

# create an array of linearly spaced values:
# 5 values from 0 to 14, with both endpoints included
g = np.linspace(0, 14, num=5)
print("g ...", g)
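
To make the endpoint behaviour of the two functions concrete, here is a small sketch. It uses the optional `retstep=True` argument of `np.linspace`, which is not used in the cell above, to also return the spacing between values.

```python
import numpy as np

# arange: the stop value (9) is excluded
evens = np.arange(2, 9, 2)
print(evens.tolist())  # [2, 4, 6, 8]

# linspace: both endpoints are included by default;
# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 14, num=5, retstep=True)
print(values.tolist())  # [0.0, 3.5, 7.0, 10.5, 14.0]
print(step)             # 3.5
```

Note that `num` counts values, not gaps: 5 values give 4 intervals of width 3.5.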


#### **Specifying the data type of an array**

In [None]:
# explicitly specify which data type -- dtype --  to use
import numpy as np

h = np.ones(2, dtype=np.int64)
print(h)
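
For comparison, the sketch below shows the default `dtype` next to an explicit one, and uses `astype` (not covered above) as one possible way to convert an existing array. The variable names are illustrative only.

```python
import numpy as np

default_ones = np.ones(2)              # float64 by default
int_ones = np.ones(2, dtype=np.int64)  # explicit 64-bit integers

print(default_ones.dtype)  # float64
print(int_ones.dtype)      # int64

# astype returns a converted copy; the original array is unchanged
as_float32 = int_ones.astype(np.float32)
print(as_float32.dtype)    # float32
print(int_ones.dtype)      # int64
```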


#### **Sorting NumPy arrays**

NumPy provides the function `np.sort()`, which sorts a specified array. The ndarray object also has a `.sort()` method.

For large numeric arrays, `np.sort()` is typically faster and more memory-efficient than Python's built-in `sorted()` or `list.sort()`, because it works on typed, contiguous data in compiled code.

**Note**: `np.sort()` returns a sorted copy of the array, leaving the original array unchanged. The array method `ndarray.sort()`, by contrast, sorts the array in place.

One can also sort arrays of strings, or any other data type:


In [None]:
import numpy as np

# sorting a 1-D array
arr1 = np.array([2, 1, 5, 3, 7, 4, 6, 8])
i1 = np.sort(arr1)
print(i1)

# Sort a 2-D array:
# by default each row is sorted independently (along the last axis)
arr2 = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(arr2))

# Sort the array alphabetically:
arr3 = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr3))

# Sort a boolean array:
arr4 = np.array([True, False, True])
print(np.sort(arr4))
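
The difference between the copying function and the in-place method, and the effect of the `axis` argument on a 2-D array, can be sketched as follows. The names `arr` and `table` are illustrative.

```python
import numpy as np

arr = np.array([2, 1, 5, 3])

sorted_copy = np.sort(arr)   # returns a new, sorted array
print(arr.tolist())          # [2, 1, 5, 3]  (original unchanged)
print(sorted_copy.tolist())  # [1, 2, 3, 5]

arr.sort()                   # sorts in place and returns None
print(arr.tolist())          # [1, 2, 3, 5]

# For 2-D arrays, axis selects the direction of the sort
table = np.array([[3, 2, 4], [5, 0, 1]])
by_column = np.sort(table, axis=0)  # sort down each column
by_row = np.sort(table, axis=1)     # sort along each row (same as the default)
print(by_column.tolist())  # [[3, 0, 1], [5, 2, 4]]
print(by_row.tolist())     # [[2, 3, 4], [0, 1, 5]]
```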


#### **`argsort()`**
A related function is `argsort()`, which instead returns an array of the indices of the sorted elements.

The first element of this result gives the index of the smallest element, the second value gives the index of the second smallest, and so on. 

These indices can then be used (via fancy indexing) to construct the sorted array if desired.

In [None]:
# argsort returns the Index Numbers of elements in order
import numpy as np
xx = np.array([2, 1, 4, 3, 5])
i12 = np.argsort(xx)
print(i12)
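
As a short sketch of the fancy-indexing step mentioned above, the indices from `argsort()` can reorder the original array, or any parallel array of the same length. The `labels` array is a hypothetical example added for illustration.

```python
import numpy as np

xx = np.array([2, 1, 4, 3, 5])
order = np.argsort(xx)
print(order.tolist())      # [1, 0, 3, 2, 4]

# Fancy indexing with the sorted indices rebuilds the sorted array
print(xx[order].tolist())  # [1, 2, 3, 4, 5]

# The same indices reorder a parallel array (hypothetical labels)
labels = np.array(['b', 'a', 'd', 'c', 'e'])
print(labels[order].tolist())  # ['a', 'b', 'c', 'd', 'e']
```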


#### **`lexsort()`**

NumPy's `lexsort()` performs an indirect stable sort using a sequence of keys.

Syntax: `numpy.lexsort(keys, axis=-1)`

This function sorts on multiple keys, where each key is a separate array of the same length.

For example, we can sort data by column A and then, where values in column A are tied, by column B.

In the example below, two arrays represent column A and column B.

Note that the last key in the sequence is the primary sort key, so we pass `(colB, colA)` to sort first by column A and then by column B. The result is an array of indices that puts the rows in sorted order.

In [None]:
import numpy as np

# First column -- Indexes of our data
colA = [2,5,1,8,1]
# Second column -- the values of our data
colB = [9,0,3,2,0]

# Sort by colA and then by colB (the last key passed is the primary key)
sorted_index = np.lexsort((colB, colA))
print("Sorted Index ...", sorted_index)

# Print the result showing the column values as pairs
print([(colA[i], colB[i]) for i in sorted_index])
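
The key order is easy to get backwards, so the sketch below runs `lexsort()` with the keys in both orders on the same data and cross-checks one result against plain Python sorting of `(colA, colB)` tuples. The names `by_a_then_b` and `by_b_then_a` are illustrative.

```python
import numpy as np

colA = np.array([2, 5, 1, 8, 1])
colB = np.array([9, 0, 3, 2, 0])

# The LAST key in the tuple is the primary sort key
by_a_then_b = np.lexsort((colB, colA))
print(by_a_then_b.tolist())  # [4, 2, 0, 1, 3]

by_b_then_a = np.lexsort((colA, colB))
print(by_b_then_a.tolist())  # [4, 1, 3, 2, 0]

# Cross-check: sorting (colA, colB) tuples in plain Python gives the same order
pairs = [(int(colA[i]), int(colB[i])) for i in by_a_then_b]
print(pairs == sorted(zip(colA.tolist(), colB.tolist())))  # True
```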


#### Other sort Methods

Other sorting-related functions include `searchsorted()` and `partition()`. You can read about these in the NumPy documentation: [NumPy - Sorting, searching, and counting](https://numpy.org/doc/stable/reference/routines.sort.html)
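
As a brief, non-exhaustive sketch of these two functions: `searchsorted()` finds where a value would be inserted into an already sorted array, and `partition()` places the k-th smallest element at index k without fully sorting the rest. The sample data is made up for illustration.

```python
import numpy as np

sorted_vals = np.array([1, 3, 5, 7])
# Index at which 4 would be inserted to keep the array sorted
insert_at = np.searchsorted(sorted_vals, 4)
print(insert_at)  # 2

data = np.array([7, 2, 9, 1, 5])
# After partitioning at k=2, index 2 holds the value it would have
# in a fully sorted array; smaller values sit before it, larger after
part = np.partition(data, 2)
print(part[2])  # 5
```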

#### **NumPy Concatenation**

