# Data Analytics

## Python Arrays, and Intro to NumPy

We are going to learn about ...

- Python and Arrays
- An introduction to NumPy

<br>

---

<br>

### What is an Array?

An array is basically a data structure which can hold more than one value at a time. It is a collection or ordered series of elements of the same type.

An array is a collection of items stored at contiguous memory locations. The idea is to store multiple items of the same type together. 

This makes it easier to calculate the position of each element in memory, by simply adding an offset to a base value, i.e., the memory location of the first element of the array (generally denoted by the name of the array).
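The offset calculation described above can be sketched with plain arithmetic. The base address and item size below are made-up values chosen only for illustration; real addresses are managed by Python and the operating system.

```python
# Hypothetical values, for illustration only
base_address = 1000   # assumed memory location of the first element
item_size = 4         # assumed size of each element in bytes


def element_address(index):
    """Return the address of the element at `index` in a contiguous array."""
    return base_address + index * item_size


print("Address of element 0 ...", element_address(0))
print("Address of element 3 ...", element_address(3))
```

Because every element has the same size, finding any element takes one multiplication and one addition, no matter how long the array is.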

If you have a list of items (a list of car names, for example), storing the cars in single variables could look like this:

```python
    car1 = "Ford"
    car2 = "Volvo"
    car3 = "BMW"
```

However, what if you want to loop through the cars and find a specific one? And what if you had not 3 cars, but 300?

The solution is an array!

An array can hold many values -- in square brackets -- under a single name, and you can access the values by referring to an index number.

```python
    # Create an array containing car names
    cars = ["Ford", "Volvo", "BMW"]
```
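As a small sketch of the index access and looping described above, reusing the same `cars` values:

```python
cars = ["Ford", "Volvo", "BMW"]

# index numbers start at 0, so index 0 is the first car
first_car = cars[0]
print("First car ...", first_car)

# loop through the cars and report where a specific one is found
for index, car in enumerate(cars):
    if car == "Volvo":
        print("Found Volvo at index ...", index)
```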

_Arrays are most useful when we have to manipulate values of only one specific data type_. 

> Python does not have built-in support for Arrays by default, but there are other ways to implement the benefits of Arrays.

Arrays can be handled in Python by a module named `array`. However, we will not examine the `array` module any further, as its complexities are beyond the scope of this curriculum.

We could also treat `list`s as arrays. _However, lists and arrays are not the same thing._

#### What's the Difference between Python Lists and Python Arrays?

Lists and arrays behave similarly. Just like lists, arrays are an ordered sequence of elements. Both are also mutable and not fixed in size, which means items can be added and removed, making them very flexible to work with.

The difference lies in the data types that can be stored:

- Lists can store items of various data types,
- while arrays store only elements of the same data type.
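To make the difference concrete, here is a short sketch that uses the standard-library `array` module purely as an illustration. The type code `"i"` (signed integer) and the variable names are just choices made for this example.

```python
from array import array

# a list happily mixes data types
mixed_list = [1, "two", 3.0]
print("mixed_list ...", mixed_list)

# an array is created with a type code; "i" means signed integers
int_array = array("i", [1, 2, 3])
int_array.append(4)            # same type: allowed

try:
    int_array.append("five")   # different type: rejected
    rejected = False
except TypeError as error:
    rejected = True
    print("Could not append a string ...", error)

print("int_array ...", int_array)
```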

<br>

> **For our purposes of Data Analysis, we'll learn about and use the `NumPy` module, built for the manipulation of multi-dimensional array data.**




---

### Python NumPy Array vs. List

We use a NumPy array instead of a Python list for three reasons:

- Less Memory - NumPy arrays occupy less memory than lists
- Fast - NumPy arrays are faster than lists for numerical operations
- Convenient - it is convenient to work with NumPy

> The biggest difference between Python lists and NumPy arrays: NumPy array elements are homogeneous -- they all share the same data type.
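One way to see this homogeneity in practice: when `np.array()` is given mixed values, it converts them all to a single common type. The variable names below are only for illustration.

```python
import numpy as np

# a list keeps each item's own type
py_list = [1, 2.5, 3]
print("List item types ...", [type(item).__name__ for item in py_list])

# NumPy converts every element to one common dtype
mixed_numbers = np.array([1, 2.5, 3])
print("dtype for ints + floats ...", mixed_numbers.dtype)

# mixing numbers and text turns every element into a string
numbers_and_text = np.array([1, "two", 3])
print("dtype for ints + text ...", numbers_and_text.dtype)
```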


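In [None]:
#### JUST AN INTERESTING  TEST ####
# how much memory is consumed by NumPy vs Lists
import sys

import numpy as np

pyList = list(range(1000))    # allocate a real list (range() alone is lazy in Python 3)
# a list stores references, so add the list object's size to the size of each int object
listBytes = sys.getsizeof(pyList) + sum(sys.getsizeof(item) for item in pyList)
print("Memory size of pyList ...", listBytes)

pyArray = np.arange(1000)   # allocate a NumPy array
# number of elements x bytes per element (the same value as pyArray.nbytes)
print("Memory size of NumPy Array ...", pyArray.size*pyArray.itemsize)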

In [None]:
#### ANOTHER INTERESTING  TEST ####
# how fast is NumPy vs Lists
import time

import numpy as np

SIZE = 1000000

# create 2 lists and 2 arrays of equal size 
L1 = list(range(SIZE))
L2 = list(range(SIZE))

A1 = np.arange(SIZE)
A2 = np.arange(SIZE)

# time to find the element-wise sum of the lists
start = time.perf_counter()
result = [x + y for x, y in zip(L1, L2)]
print("List time in ms ...", (time.perf_counter()-start)*1000)

# time to find the element-wise sum of the numpy arrays
start = time.perf_counter()
result = A1+A2
print("NumPy time in ms ...", (time.perf_counter()-start)*1000)



If you would like to read more about Python arrays, here is a FreeCodeCamp.com article on the topic ... [Python Array Tutorial – Define, Index, Methods](https://www.freecodecamp.org/news/python-array-tutorial-define-index-methods/)



---

<br>

## Python NumPy Module

We'll look at aspects such as ...

- What are NumPy Arrays?
- Where is NumPy used?
- NumPy Arrays vs. Lists
- Why is NumPy used in Python?
- NumPy Operations

<br>

![What Is Numpy](images/What_Is_NumPy.png)

### Introduction

**NumPy** (_Numerical Python_) is an open-source library for the Python programming language. It is used for scientific computing and working with arrays.

Apart from its multidimensional array object, it also provides high-level functions and tools for working with such arrays.


### What NumPy Does

- It contains a _multidimensional array_ and matrix data structure (in the form of rows and columns)
- Provides a powerful homogeneous _N-dimensional array object_
- Supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices
- Provides sophisticated _broadcasting_ functions
- Adds powerful data structures to Python that guarantee _efficient calculations_ with arrays and matrices

<br>

![The uses of NumPy](images/Uses-of-NumPy.png)

<br>

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. 

Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

<br>

### NumPy works along with other Packages

NumPy combines well with other Python packages like SciPy, Pandas, Matplotlib, Tkinter, etc.

The combination of NumPy with other libraries implements a variety of Data Science and Analytics utilities for data wrangling and visualizations etc.

![Applications of NumPy](images/Applications-of-NumPy.png)


### What is a Multi-Dimensional Array

We could graphically display the different elements of an array stored in their respective memory locations, in an image like this. 

This image represents a two-dimensional (2-D) array, because it has 2 dimensions (rows as well as columns) -- 4 rows and 3 columns of address locations available; or an X and Y axis if one thinks of the data being arranged on axes.

![A Multi Dimensional Array](images/MultiDimensionalArray.png)

NumPy can handle more than 2 dimensions of data. Think 3-D, like a cube, or something with X, Y, and Z axes.

NumPy can even handle more than 3 dimensions, such as 4-D and beyond.

> **NOTE: -** NumPy array axes are numbered starting with `0`.
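As a brief sketch of what the axis numbers mean in practice, the example below sums a small 2-D array along each axis. The array and variable names are only for illustration.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
print("grid ...\n", grid)

# axis 0 runs down the rows, so summing along it gives one total per column
column_totals = grid.sum(axis=0)
print("Sum along axis 0 ...", column_totals)

# axis 1 runs across the columns, so summing along it gives one total per row
row_totals = grid.sum(axis=1)
print("Sum along axis 1 ...", row_totals)
```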


### Useful NumPy keywords

- **ndarray** == N-dimensional array
- **dtype** == array data type
- **1-D** == one dimensional array
- **2-D** == two-dimensional array
- **3-D** == three-dimensional array
- **Vector** == an array with a single dimension
- **Matrix** == an array with two dimensions
- **Tensor** == an array with 3+ dimensions
- **Dimensions** == the number of axes of an array



### More about NumPy Arrays

- Usually a fixed-size container of items of the same type and size
- An array is the central data structure of the NumPy library
- The number of dimensions and the number of items along each dimension define the array's `shape`
- The `shape` is a _**tuple** of non-negative integers_ that gives the length of each dimension, for example (3,2,4)
- The rank of the array is the number of its `dimensions`

> You can read all about NumPy at the NumPy project website -- [https://numpy.org/](https://numpy.org/)



### Finding the shape and size of NumPy arrays ...

- Find the number of dimensions of an array with -- `.ndim`
- Find the shape (number of elements in each dimension) with -- `.shape`
- Find the size (total number of elements) of a NumPy array with -- `.size`

These are attributes, not methods, so they are written without parentheses.

The built-in function `len()` returns the size of the **_first dimension_**; not to be confused with the number of dimensions in the NumPy array.

**About array `size`:-**

`size` reflects the number of elements in the array, and is equal to the product of the values in the array's `shape`. 
Examples:-  

- 3D shape: (4,3,2) -- 4 x 3 x 2 = 24 elements
- 2D shape: (2,3) -- 2 x 3 = 6 elements
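The two examples above can be checked with a short sketch. It uses `np.zeros()` only as a convenient way to build arrays of a given shape, and `math.prod()` from the standard library to multiply the shape values.

```python
import math

import numpy as np

cube = np.zeros((4, 3, 2))    # 3D shape
table = np.zeros((2, 3))      # 2D shape

# size is the product of the values in shape
print("cube size ...", cube.size, "=", math.prod(cube.shape))
print("table size ...", table.size, "=", math.prod(table.shape))
```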

<br>


![NumPy Multi-Dimensional Arrays](images/numpy_array_t.png)


#### **2D Arrays**

> **NOTE: -** In the case of a two-dimensional (2-D) array, `shape` is the number of rows - 1st axis (0), and  number of columns - 2nd axis (1). 

If you only want either the number of rows or the number of columns of a 2-D array, you can index into the `shape` tuple to get each value individually.


In [None]:
# Example of discovering the shape and size of a Numpy Array
import numpy as np

# build an array
# `arange` creates a 1-D array of elements
# `reshape` turns that 1-D array into a matrix / 2-D array
a_2d = np.arange(12).reshape((3, 4))
print("The a_2d array ...\n", a_2d)

print("\nNumber of dimensions ...", a_2d.ndim)
# shape is the number of rows (1st axis), and number of columns (2nd axis). 
print("Shape of array ...", a_2d.shape)

# get either the number of rows, or number of columns
print("\nnumber of rows ...", a_2d.shape[0])
print("number of columns ...", a_2d.shape[1])

<br>

#### **3D Arrays**

![Numpy 3-D Array / Tensor visualized...](images/NumPy_3D_Tensor.png)

> **NOTE: -** For three-dimensional (3-D) arrays, `shape` is a tuple of three numbers ...

- the length of the 1st axis (0) -- the number of 2-D blocks;
- the length of the 2nd axis (1) -- the number of rows in each block;
- and the length of the 3rd axis (2) -- the number of columns in each row.
    

In [None]:
# Another Example: - Discovering the shape and size of a 3D NumPy Array
import numpy as np

# this is a very stylized layout of an array
# note the outer list, containing lists of lists
array_example = np.array(
                            [  
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ],
                                [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ] 
                            ]
                        )

# 3 dimensions/axes
print("Array dimensions / axes ...", array_example.ndim)

# 50 total size/number of elements of array
print("Total number of elements ...", array_example.size)  

# shape is the number of elements in each dimension = 5, 2, 5
print("Array shape ...", array_example.shape)

# len() returns the size of the first dimension
print("array_example length ...", len(array_example))

print("\narray_example ...\n", array_example)

In [None]:
# Creating an array of 6 dimensions using ndmin
# with a vector containing values [2,4,6,8,10]
# and verifying the shape
import numpy as np

# creating an array of 6 dimensions using ndmin
arrTest = np.array([2, 4, 6, 8, 10], ndmin=6)
print("Array of 6 dimensions ...", arrTest)

# verifying the shape
print('\nshape of arrTest ...\n', arrTest.shape)

# len() returns the size of the first dimension
print("\nlength of arrTest ...", len(arrTest))

#### Note About NumPy Axes:-

The most confusing part of understanding NumPy arrays is how to work with axes, especially when it gets to 3 or more ...

Here is a different look at how the Axes in NumPy work: [NumPy Axes, Explained](https://www.sharpsightlabs.com/blog/numpy-axes-explained/)

<br>



---

<br>

### Let's Get Physical and Create Some Arrays


#### **Install NumPy**

With pip set up on your system, you can install NumPy from the command line.

Install NumPy with Python by typing:

```bash
    pip install numpy
```

#### **Importing NumPy into your projects**

To use NumPy, we need to import it like any other module.


In [None]:
# Import the module
import numpy

# OR import and alias the module for easy inline use
import numpy as np


#### **Creating NumPy Arrays**

All you need to do to create a simple array is pass a list to the function `np.array()`. If you choose to, you can also specify the type of data in your list.

In [None]:
# creating Numpy Arrays -- from a list with SAME DATA TYPES
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6])
print("a1 ...\n", a1)

a2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("a2 ...\n", a2)



Besides creating an array from a sequence of elements, you can easily create an array filled with 0's or 1's:

In [None]:
# creating arrays filled with ONEs or ZEROs
import numpy as np

# build an array with 4x zeros
b = np.zeros(4)
print("b ...", b)

# build an array with 3x ones
c = np.ones(3)
print("c ...", c)


**Or even create an empty array!**

The function `empty` creates an array whose initial content is arbitrary and depends on the state of the memory. The reason to use `empty()` over `zeros()` (or something similar) is speed - just make sure to fill every element afterwards!

In [None]:
# create an empty array
import numpy as np

d = np.empty(3)
print("d ...", d)

**You can create an array with a range of elements**:


In [None]:
# create an array from `a range`
import numpy as np

# create an array with `a range` of numbers from 0 up to, but not including, 5
e = np.arange(5)
print("e =", e)

**And an array that contains a range of evenly spaced intervals!**

To do this, you will specify the first number, last number, and the step size.

Note: This creates a range from the 1st num, up to but not including the 2nd num.

In [None]:
# Contains `a range` of evenly spaced intervals
# between 1st num, and 2nd num, in steps of 3rd num
import numpy as np

# start at 2, stop before 9, step by 2
f = np.arange(2, 9, 2)
print("f ...",  f)

**You can also use `np.linspace()` to create an array with a specified number of linearly spaced values**:

In [None]:
# create an array with a specified number of linearly spaced values
import numpy as np

# 5 evenly spaced values from 0 up to 14 -- including both 0 and 14
g = np.linspace(0, 14, num=5)
print("Array g ...", g)
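If it helps to see the spacing that `np.linspace()` works out, the sketch below asks for it with the `retstep` keyword, and also shows the `endpoint` keyword, which leaves the last number out. Both keywords are part of `np.linspace()`; the variable names are only for illustration.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring values
values, step = np.linspace(0, 14, num=5, retstep=True)
print("values ...", values)
print("step between values ...", step)

# endpoint=False stops short of the last number, much like np.arange()
open_values = np.linspace(0, 14, num=5, endpoint=False)
print("values without the endpoint ...", open_values)
```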

**Specifying the data type of an array**

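While creation functions such as `np.zeros()`, `np.ones()` and `np.empty()` default to floating point (`np.float64`), and `np.array()` infers the type from the data it is given, you can explicitly specify which data type you want using the `dtype` keyword.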

NumPy supports a much greater variety of numerical data types than Python does.

Along with these NumPy dtypes there are a bunch of methods to change and manipulate data types. So, even though your data might be of one type, you could possibly change that to better work with your project.

> Read much, much more on NumPy `dtypes` in this article -- [NumPy Datatypes](https://www.javatpoint.com/numpy-datatypes)

In [None]:
# explicitly specify which data type you want
import numpy as np

h = np.ones(2, dtype=np.int64)
print(h)
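As one example of changing the data type of an existing array, the sketch below uses the `.astype()` method, which returns a new array of the requested type. The values and names are made up for illustration.

```python
import numpy as np

prices = np.array([1.7, 2.2, 3.9])
print("prices ...", prices, prices.dtype)

# converting floats to integers drops the fractional part
whole = prices.astype(np.int64)
print("whole ...", whole, whole.dtype)

# the original array is left unchanged
print("prices after astype ...", prices)
```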


You can learn more about creating arrays here -- [NumPy.org - QuickStart Array Creation](https://numpy.org/doc/stable/user/quickstart.html#quickstart-array-creation)


---

#### **Reshaping NumPy Arrays**

Sometimes, we need to reshape the data in an array, from wide to long for example. As the name suggests, the `numpy.reshape()` function gives a new shape to an array without changing its data.

The NumPy reshape operation changes the shape of an array so that it has a new (but compatible) shape. The rules are:

- The number of elements stays the same.
- The order of the elements stays the same.

This function returns an `ndarray`. It is a new view object if possible; otherwise, it will be a copy. There is no guarantee of the memory layout of the returned array.

        Syntax: -   numpy.reshape(arr, new_shape)

![NumPy Array Reshaping](images/NumPy_Array_Reshaping.png)

In [None]:
# reshaping a NumPy array
import numpy as np

aArray = np.array([[1,2,3], [4,5,6]])
bArray = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print("aArray ...\n", aArray)

newShape1 = np.reshape(aArray, 6)
print("newShape1 ...", newShape1)

newShape2 = bArray.reshape(3, 2, 2)
print("\nnewShape2 ...\n", newShape2)
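Whether a reshape produced a view or a copy can be checked with `np.shares_memory()`. The sketch below is only an illustration with made-up names: it shows one case where reshaping returns a view, and one case (a transposed array) where NumPy has to make a copy.

```python
import numpy as np

original = np.arange(6)
reshaped = original.reshape(2, 3)

# a view shares its data with the original array
shares_view = np.shares_memory(original, reshaped)
print("reshaped shares memory with original ...", shares_view)

# so a change made through the view also shows up in the original
reshaped[0, 0] = 99
print("original after changing reshaped ...", original)

# a transposed array is not stored in row order, so flattening it needs a copy
transposed = reshaped.T
flattened = transposed.reshape(-1)
shares_copy = np.shares_memory(transposed, flattened)
print("flattened shares memory with transposed ...", shares_copy)
```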

#### Reshaping using an "Unknown" dimension / size

The `numpy.ndarray.reshape()` method can be called with an “unknown dimension”. This is possible by specifying `-1` as the unspecified dimension.

In NumPy, -1 in `.reshape(3,-1)` refers to an unknown dimension that the `.reshape()` function calculates for you, based on the total number of elements in the array and the other dimensions you specify.

A common use case is to flatten a nested array of an unknown number of elements to a 1D array.

In [None]:
# reshaping using an "Unknown" dimension
import numpy as np

aArray = np.array([[1,2,3,4,5,6,7,8], [11,12,23,34,45,56,67,78]])
print("aArray ...\n", aArray)

newShape3 = aArray.reshape(4,-1)
print("\nnewShape3 ...\n", newShape3)
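The flattening use case mentioned above can be sketched by passing `-1` as the only dimension. The array here is a made-up example.

```python
import numpy as np

nested = np.array([[1, 2, 3], [4, 5, 6]])

# -1 on its own means "one dimension, as long as it needs to be"
flat = nested.reshape(-1)
print("flat ...", flat)
print("flat shape ...", flat.shape)
```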

#### Other NumPy methods to help reshape Arrays ...

There are several functions to help with reshaping / resizing NumPy arrays, like ...

- `np.vstack()` = used to stack arrays in sequence vertically (row wise) [NumPy: vstack() function](https://www.w3resource.com/numpy/manipulation/vstack.php)

- `np.hstack()` = used to stack arrays in sequence horizontally (column wise) [NumPy: hstack() function](https://www.w3resource.com/numpy/manipulation/hstack.php)

- `np.hsplit()` = used to split an array into multiple sub-arrays horizontally (column-wise) [NumPy: hsplit() function](https://www.w3resource.com/numpy/manipulation/hsplit.php)
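Here is a short sketch of the three functions listed above, using small made-up arrays so the results are easy to follow.

```python
import numpy as np

top = np.array([1, 2, 3])
bottom = np.array([4, 5, 6])

# vstack places the arrays on top of each other as rows
stacked_rows = np.vstack((top, bottom))
print("vstack ...\n", stacked_rows)

# hstack joins the arrays end to end
side_by_side = np.hstack((top, bottom))
print("hstack ...", side_by_side)

# hsplit cuts an array into equal pieces, column-wise
wide = np.arange(12).reshape(3, 4)
left, right = np.hsplit(wide, 2)
print("left half ...\n", left)
print("right half ...\n", right)
```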


---

### **Copy NumPy Arrays**

When working with an array, one might **NOT** want to change the original data. So, often it is useful to make a copy of the original data array.

Essentially, you’ll sometimes want copies that you can modify without automatically modifying the original data at the same time. 

**_When making copies, there’s an important difference between shallow and deep copying:_**

- **A shallow copy** means constructing a new collection object and then populating it with references to the child objects found in the original. In essence, a shallow copy is only one level deep. The copying process does not recurse and therefore won’t create copies of the child objects themselves. **_Both arrays share the same child objects_**.

- **A deep copy** makes the copying process recursive. It means first constructing a new collection object and then recursively populating it with copies of the child objects found in the original. Copying an object this way walks the whole object tree to _create a fully independent clone_ of the original object and all of its children.

The `deepcopy()` function inside the copy module is used to deep copy lists, but it also works just fine with arrays in Python. 

The `copy.deepcopy()` function takes the array as an input argument and returns a deep copy of the array.

In [None]:
# one way to DEEP copy a NumPy Array:
import copy

import numpy as np

xArray = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
zArray = copy.deepcopy(xArray)

print("The xArray ...\n", xArray)
print("\nThe zArray ...\n", zArray)
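For NumPy arrays specifically, the same idea can be sketched with two methods that NumPy arrays provide: `.view()`, which creates a new array object that shares the original data, and `.copy()`, which creates an independent copy. The names below are only for illustration.

```python
import numpy as np

original = np.array([1, 2, 3])

shallow = original.view()   # new array object, same underlying data
deep = original.copy()      # new array object with its own data

# change the original and see which copy follows along
original[0] = 99

print("original ...", original)
print("shallow (view) ...", shallow)
print("deep (copy) ...", deep)
```

The view reflects the change because it shares data with the original, while the copy keeps the old value.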


Read more about the different 'degrees' at which Python objects can be copied here in this article, [Shallow vs Deep Copying of Python Objects](https://realpython.com/copying-python-objects/)