# Data Analytics

## Python Arrays, and Intro to NumPy

We are going to learn about ...

- Python and Arrays
- An introduction to NumPy

<br>

---

<br>

### What is an Array?

An array is a data structure that can hold more than one value at a time. It is an ordered collection of elements of the same type.

The items of an array are stored at contiguous memory locations. The idea is to store multiple items of the same type together.

This makes it easier to calculate the position of each element in memory, by simply adding an offset to a base value, i.e., the memory location of the first element of the array (generally denoted by the name of the array).
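
As a rough illustration of that offset arithmetic, the sketch below uses Python's standard-library `array` module. The variable names (`numbers`, `offsets`) are made up for this example, and the printed addresses will differ on every run and machine.

```python
from array import array

# A typed array of C ints ("i"); every element occupies the same number of bytes
numbers = array("i", [10, 20, 30, 40])

# buffer_info() returns (memory address of the first element, number of elements)
base_address, length = numbers.buffer_info()
item_size = numbers.itemsize  # bytes per element

# Address of element i = base address + i * item size
offsets = [index * item_size for index in range(length)]
for index, offset in enumerate(offsets):
    print(f"element {index}: offset {offset} bytes, address {base_address + offset}")
```

Because every element has the same size, locating any element needs only one multiplication and one addition.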

If you have a list of items (a list of car names, for example), storing the cars in single variables could look like this:

```python
    car1 = "Ford"
    car2 = "Volvo"
    car3 = "BMW"
```

However, what if you want to loop through the cars and find a specific one? And what if you had not 3 cars, but 300?

The solution is an array!

An array can hold many values -- in square brackets -- under a single name, and you can access the values by referring to an index number.

```python
    # Create an array containing car names
    cars = ["Ford", "Volvo", "BMW"]
```
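
To make the index idea concrete, here is a small sketch that reads one value, loops through all of them, and searches for a specific car. Note that `cars` here is a plain Python list, and the names `first_car`, `target`, and `position` are illustrative only.

```python
# Create an array (a Python list) containing car names
cars = ["Ford", "Volvo", "BMW"]

# Index numbers start at 0, so cars[0] is the first car
first_car = cars[0]
print("First car ...", first_car)

# Loop through every car, no matter how many there are
for car in cars:
    print(car)

# Find the position of a specific car (-1 if it is not present)
target = "BMW"
position = cars.index(target) if target in cars else -1
print("Position of", target, "...", position)
```

The same loop works unchanged whether the collection holds 3 cars or 300.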

_Arrays are most useful when we have to manipulate values of only one specific data type_. 

> Python has no built-in array type in the core language, but there are other ways to get the benefits of arrays.

Arrays can be handled in Python with a standard-library module named `array`. If we create arrays using the `array` module, all elements of the array must be of the same data type.

We will not examine the `array` module any further, as its complexities are beyond the scope of this curriculum.

We could also treat `list`s as arrays. _However, lists and arrays are not the same thing._

#### What's the Difference between Python Lists and Python Arrays?

Lists and arrays behave similarly. Just like lists, arrays are an ordered sequence of elements. Both are also mutable and not fixed in size, which means items can be added and removed, making them very flexible to work with.

The difference lies in the data types that can be stored: Lists store items that are of various data types, while arrays store only elements of the same data type.
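
A minimal sketch of that difference, assuming the standard-library `array` module with the `"i"` (signed integer) type code. The variable names are illustrative.

```python
from array import array

# A list happily mixes data types
mixed_list = [1, "two", 3.0]
print(mixed_list)

# An array is created with a type code; "i" means signed integers only
int_array = array("i", [1, 2, 3])
int_array.append(4)  # fine: 4 is an integer

# Adding a value of a different type is rejected
try:
    int_array.append("five")
    rejected = False
except TypeError as error:
    rejected = True
    print("Rejected ...", error)

print(int_array)
```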

If you would like to read more about Python's `array` module, here is a FreeCodeCamp.com article on the topic: [Python Array Tutorial – Define, Index, Methods](https://www.freecodecamp.org/news/python-array-tutorial-define-index-methods/)

<br>

### Review: Python Lists

Python Lists, like arrays in other programming languages, can be used to store multiple items in a single variable.

**A `List` is an ordered sequence of values separated by commas, and enclosed in square brackets `[ ]`.**

```python
    fruitList = ["apple", "banana", "cherry"]
```

#### List Items 

- List items are ordered, changeable, and allow duplicate values.
- When we say that lists are ordered, it means that the items have a defined order, and that order will not change unless the list is modified.
- The list is changeable, meaning that we can change, add, and remove items in a list after it has been created.
- List items are indexed: the first item has index `[0]`, the second item has index `[1]`, and so on.
- Since lists are indexed, lists can have duplicate items with the same value.

#### Operations on List

We learned that there are numerous operations that can be performed on lists, such as:
- getting the length
- accessing elements by index
- extending and changing lists
- adding and repeating lists
- sorting lists
- and we can copy, clear and delete lists
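
As a quick refresher, the sketch below runs through one example of each operation listed above. The list contents and variable names are made up for illustration.

```python
fruits = ["cherry", "apple", "banana"]

# getting the length and accessing elements by index
length = len(fruits)
first = fruits[0]

# extending and changing lists
fruits.append("date")
fruits[0] = "cranberry"

# adding and repeating lists (both create new lists)
combined = fruits + ["elderberry"]
repeated = ["apple"] * 2

# sorting a list in place
fruits.sort()

# copying, clearing, and deleting
fruits_copy = fruits.copy()  # an independent copy
fruits.clear()               # fruits is now empty
del fruits_copy[0]           # remove the first item of the copy

print(length, first, combined, repeated, fruits, fruits_copy)
```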



> **For our purposes of Data Analysis, we'll learn about and use the `NumPy` module, which is built for the manipulation of multi-dimensional array data.**




---

<br>

## Python NumPy Module

We'll look at aspects such as ...

- What are NumPy Arrays?
- Where is NumPy used?
- NumPy Arrays vs. Lists
- Why is NumPy used in Python?
- NumPy Operations

<br>

![What Is Numpy](./images/What_Is_NumPy.png)

### Introduction

**NumPy** (_Numerical Python_) is an open-source library for the Python programming language. It is used for scientific computing and working with arrays.

Apart from its multidimensional array object, it also provides high-level functions and tools for working with such arrays.


### What NumPy Does

- It contains a _multidimensional array_ and matrix data structure (in the form of rows and columns)
- Provides a powerful homogeneous _N-dimensional array object_
- Supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices
- Provides sophisticated _broadcasting_ functions
- Adds powerful data structures to Python that guarantee _efficient calculations_ with arrays and matrices
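
To give a feel for the mathematical functions and broadcasting mentioned above, here is a small sketch. The `prices` and `discounts` data are invented for illustration.

```python
import numpy as np

# A 2-D array: 2 rows, 3 columns
prices = np.array([[10.0, 20.0, 30.0],
                   [40.0, 50.0, 60.0]])

# A single number is "broadcast" to every element
doubled = prices * 2

# A 1-D array of 3 values is broadcast across each row of the 2x3 array
discounts = np.array([1.0, 2.0, 3.0])
discounted = prices - discounts

# Mathematical functions operate on the whole array at once
roots = np.sqrt(prices)
column_totals = prices.sum(axis=0)  # add up each column

print(doubled)
print(discounted)
print(column_totals)
```

No explicit loops are needed; NumPy applies each operation element by element.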

<br>

![The uses of NumPy](./images/Uses-of-NumPy.png)

<br>

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. 

Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

<br>

### NumPy works along with other Packages

NumPy combines well with other Python packages like SciPy, Pandas, Matplotlib, Tkinter, etc.

The combination of NumPy with other libraries implements a variety of Data Science and Analytics utilities for data wrangling and visualizations etc.

![Applications of NumPy](./images/Applications-of-NumPy.png)

<br>

#### **What is a Multi-Dimensional Array?**

We could graphically display the different elements of an array stored in their respective memory locations, in an image like this. 

This image represents a two-dimensional (2-D) array, because it has 2 axes -- rows as well as columns: 3 columns and 4 rows of address locations are available. You can also think of the data as being arranged along an X and a Y axis.

![A Multi Dimensional Array](./images/MultiDimensionalArray.png)

NumPy has capabilities to handle more than 2 dimensions of data. 

Think 3-D, like a cube, or anything with an X, Y, and Z axis.

NumPy can also go beyond 3 dimensions, to 4-D and higher.
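
A short sketch of arrays with 2, 3, and 4 dimensions, using `np.zeros()` only as a convenient way to build arrays of a given shape. The variable names and sizes are illustrative.

```python
import numpy as np

# 2-D: 4 rows and 3 columns
grid_2d = np.zeros((4, 3))

# 3-D: 2 "layers", each with 4 rows and 3 columns
cube_3d = np.zeros((2, 4, 3))

# 4-D: 5 groups of those 3-D blocks
block_4d = np.zeros((5, 2, 4, 3))

for name, arr in [("grid_2d", grid_2d), ("cube_3d", cube_3d), ("block_4d", block_4d)]:
    print(name, "... dimensions:", arr.ndim, "shape:", arr.shape)
```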

<br>

### More about NumPy Arrays

- Usually a fixed-size container of items of the same type and size
- An array is the central data structure of the NumPy library
- It's a grid of values that contains information about raw data, how to locate an element, and how to interpret it
- The elements are all of the same type, referred to as the array `dtype`
- The number of `dimensions`, and the number of items along each one, defines the array's `shape`
- The shape is a _**tuple** of non-negative integers_ that specifies the `size` of each `dimension`
- The rank of the array is the number of its `dimensions`

> You can read all about NumPy at the NumPy project website -- [https://numpy.org/](https://numpy.org/)


### Useful NumPy keywords

- **ndarray** == N-dimensional array
- **1-D** == one-dimensional array
- **2-D** == two-dimensional array
- **3-D** == three-dimensional array
- **Vector** == an array with a single dimension
- **Matrix** == an array with two dimensions
- **Tensor** == an array with 3+ dimensions
- **Dimensions** == the number of axes of an array


In [None]:
# Example of discovering the dimensions, size, and shape of a NumPy array

import numpy as np

# 3 blocks, each holding 2 rows of 4 values
array_example = np.array([[[0, 1, 2, 3], [4, 5, 6, 7]],
                          [[0, 1, 2, 3], [4, 5, 6, 7]],
                          [[0, 1, 2, 3], [4, 5, 6, 7]]])

print("Array dimensions ...", array_example.ndim)
print("Array size (total number of elements) ...", array_example.size)
print("Array shape ...", array_example.shape)

#### We'll look at this again ####

<br>

#### **Install NumPy**

With pip set up on your system, you can use the command line to install NumPy.

Install NumPy with Python by typing:

```bash
    pip install numpy
```

#### **Importing NumPy into your projects**

To use NumPy, we need to import it like any other module.


In [None]:
# Import the module
import numpy

# OR import and alias the module for easy inline use
import numpy as np

<br>

### Python NumPy Array vs. List

We use a NumPy array instead of a Python list for the following three reasons:

- Less Memory - NumPy arrays occupy less memory than lists holding the same numeric data
- Fast - element-wise operations on NumPy arrays are fast compared to looping over lists
- Convenient - it is convenient to work with NumPy

> Biggest difference between Python lists and NumPy arrays: NumPy arrays are homogeneous -- of the same or a similar kind or nature. This refers to the fact that a NumPy array holds just one single data type.
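
A small sketch of what "homogeneous" means in practice: when a list with mixed types is passed to `np.array()`, NumPy converts every element to one common type. The example data and variable names are invented for illustration.

```python
import numpy as np

# Integers mixed with a float: everything becomes a float
numbers = np.array([1, 2.5, 3])
print(numbers, numbers.dtype)

# Numbers mixed with text: everything becomes a string
labels = np.array([1, "two", 3])
print(labels, labels.dtype)
```

This silent conversion is worth keeping in mind when loading real data, since a single stray text value changes the type of the whole array.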


In [None]:
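# how much memory is consumed by NumPy vs Lists

import numpy as np
import sys

# allocate a list (range() alone is lazy in Python 3, so convert it to a real list)
pyList = list(range(1000))
# total = the list object itself (its pointers) plus every int object it refers to
listBytes = sys.getsizeof(pyList) + sum(sys.getsizeof(item) for item in pyList)
print("Memory size of pyList ...", listBytes)

# allocate a NumPy array
pyArray = np.arange(1000)
# total data bytes = number of elements * bytes per element
print("Memory size of pyArray ...", pyArray.size*pyArray.itemsize)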


In [None]:
# how fast is NumPy vs Lists

import numpy as np
import time
SIZE = 1000000

# create 2 lists and 2 arrays of equal size
L1 = list(range(SIZE))
L2 = list(range(SIZE))
A1 = np.arange(SIZE)
A2 = np.arange(SIZE)

# time to find the element-wise sum of the lists
start = time.perf_counter()
result = [x + y for x, y in zip(L1, L2)]
print("List time in ms ...", (time.perf_counter()-start)*1000)

# time to find the element-wise sum of the numpy arrays
start = time.perf_counter()
result = A1 + A2
print("NumPy time in ms ...", (time.perf_counter()-start)*1000)



---

<br>

### Let's Get Physical and Create Some Arrays

To create a NumPy array, you can use the function `np.array()`. All you need to do to create a simple array is pass a list to it. If you choose to, you can also specify the type of data in your list.


In [None]:
# creating NumPy Arrays
import numpy as np

# a 1-D array built from a flat list
a1 = np.array([1, 2, 3, 4, 5, 6])
print(a1)

# a 2-D array (3 rows, 4 columns) built from a list of lists
a2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a2)



Besides creating an array from a sequence of elements, you can easily create an array filled with 0's or 1's:

In [None]:
import numpy as np

# build an array with 4x zeros
b = np.zeros(4)
print("b ...", b)

# build an array with 3x ones
c = np.ones(3)
print("c ...", c)

**Or even create an empty array!**

The function `empty` creates an array whose initial content is arbitrary and depends on the state of the memory (the values are not truly random, just whatever was left there). The reason to use `empty()` over `zeros()` (or something similar) is speed - just make sure to fill every element afterwards!

In [None]:
# create an empty array
import numpy as np

d = np.empty(3)
print(d)
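
As a sketch of the "fill every element afterwards" advice, the example below allocates an array with `np.empty()` and then overwrites each slot. The name `squares` and the values stored are illustrative only.

```python
import numpy as np

# Allocate space for 5 values; the contents are arbitrary at this point
squares = np.empty(5)

# Overwrite every element so no leftover memory values remain
for i in range(squares.size):
    squares[i] = i ** 2

print(squares)
```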

**You can create an array with a range of elements**:


In [None]:
# create an array from `a range`
import numpy as np

e = np.arange(5)
print(e)

**And even an array that contains a range of evenly spaced intervals!**

To do this, you will specify the first number, last number, and the step size.

Note: This creates a range from the 1st num, up to but not including the 2nd num.

In [None]:
# Contains a range of evenly spaced values
# between 1st num and 2nd num (not included), in steps of 3rd num
import numpy as np

f = np.arange(2, 9, 2)
print(f)

**You can also use `np.linspace()` to create an array with a specified number of linearly spaced values**:

In [None]:
# create an array with a specified number of linearly spaced values
import numpy as np

# 5 evenly spaced values from 0 to 14 (both endpoints included)
g = np.linspace(0, 14, num=5)
print(g)
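
To highlight how `np.linspace()` treats its last number, here is a brief sketch using its `endpoint` and `retstep` keyword arguments. The variable names are illustrative.

```python
import numpy as np

# By default the last number is included
with_endpoint = np.linspace(0, 14, num=5)
print(with_endpoint)

# endpoint=False leaves the last number out, like np.arange() does
without_endpoint = np.linspace(0, 14, num=5, endpoint=False)
print(without_endpoint)

# retstep=True also returns the spacing between the values
values, step = np.linspace(0, 14, num=5, retstep=True)
print("Step size ...", step)
```

With `np.arange()` you choose the step size; with `np.linspace()` you choose how many values you want and NumPy works out the step.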


#### Specifying your data type

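Array-creation functions such as `np.zeros()`, `np.ones()`, and `np.empty()` default to floating point (`np.float64`), while `np.array()` infers the type from the data you pass in. You can explicitly specify which data type you want using the `dtype` keyword: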

In [None]:
# explicitly specify which data type you want
import numpy as np

h = np.ones(2, dtype=np.int64)
print(h)
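
The following sketch shows a few more ways a data type gets chosen or changed: inferred from the data, set with `dtype`, or converted with `astype()`. The variable names are illustrative, and the exact integer type NumPy infers can vary by platform.

```python
import numpy as np

# np.array() infers the type from the values it is given
ints = np.array([1, 2, 3])        # an integer type
floats = np.array([1.0, 2, 3])    # one float makes the whole array float64

# dtype can be set explicitly when the array is created
as_float32 = np.array([1, 2, 3], dtype=np.float32)

# astype() returns a converted copy of an existing array
converted = ints.astype(np.float64)

print(ints.dtype, floats.dtype, as_float32.dtype, converted.dtype)
```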


You can learn more about creating arrays here -- [NumPy.org - QuickStart Array Creation](https://numpy.org/doc/stable/user/quickstart.html#quickstart-array-creation)