# Creating Arrays

We will start with the most fundamental task in NumPy: creating and working with an array. An **array** is a vector, matrix, or tensor of numbers. In plain English, it is a grid of numeric values that we can efficiently perform numeric operations on, for tasks such as statistics or machine learning.



## Why NumPy and Vectorization? 

In Python, you may be familiar with collections like lists. We could take numbers, such as integers or floating point values, and put them inside lists like this:

In [1]:
x = [1.2, 5.4, 7.3]
y = [3.2, 5.1, 2.8]

print(f"x = {x}")
print(f"y = {y}")

x = [1.2, 5.4, 7.3]
y = [3.2, 5.1, 2.8]


Now let's say we wanted to add each of the respective elements of $ x $ and $ y $ together. If we were to use the Python `+` operator, it would concatenate the two lists into a single six-element list rather than adding them element by element. In a linear algebra or numeric computing context, this is not what we want.

In [None]:
x = [1.2, 5.4, 7.3]
y = [3.2, 5.1, 2.8]

x + y 

To achieve this in plain Python, we would have to use a `for` loop or list comprehension with a `zip()` like below:  

In [None]:
x = [1.2, 5.4, 7.3]
y = [3.2, 5.1, 2.8]

[_x + _y for _x, _y in zip(x, y)]

Not to mention, **plain Python is REALLY slow** at numeric computing done in this manner. Python's computational efficiency comes from low-level libraries like NumPy, whose core is written in C.

So what would this look like with NumPy using its `ndarray` type? Let's take a look. 

In [None]:
import numpy as np 

x = np.array([1.2, 5.4, 7.3])
y = np.array([3.2, 5.1, 2.8])

x + y 

Note how we bring in `numpy` and alias it as `np`, a common best practice. We then declare two arrays using the `array()` function, passing a list of numeric values to each call. We can then add the two arrays together using the `+` operator, which adds them element by element.

This **vectorized** approach to doing mathematical operations is a lot more efficient and convenient, avoiding a lot of `for` loops and leveraging efficiencies in NumPy. More specifically, NumPy is optimized to handle lists or grids of numbers and to perform mathematical operations between them. With data full of numeric values, we can take advantage of the fact that CPUs and GPUs can do math more efficiently on multiple values at once. Therefore, vectorization is effectively a requirement when you do tasks like machine learning.
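
As a rough illustration of the efficiency claim, the sketch below times element-wise addition done with a list comprehension against the same addition on NumPy arrays. The size `n`, the repeat count, and the variable names are arbitrary choices made for this example, and the measured times will vary by machine, so treat the numbers as indicative only.

```python
import timeit

import numpy as np

# Illustrative size, chosen arbitrarily for this sketch
n = 100_000

# Plain Python lists of floats
x_list = [float(i) for i in range(n)]
y_list = [float(2 * i) for i in range(n)]

# The same values as NumPy arrays
x_arr = np.array(x_list)
y_arr = np.array(y_list)

# Element-wise addition, done both ways
list_sum = [_x + _y for _x, _y in zip(x_list, y_list)]
array_sum = x_arr + y_arr

# Check that both approaches produce the same values
print(np.allclose(list_sum, array_sum))

# Time 20 repetitions of each approach
list_time = timeit.timeit(
    lambda: [_x + _y for _x, _y in zip(x_list, y_list)], number=20
)
array_time = timeit.timeit(lambda: x_arr + y_arr, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy arrays:       {array_time:.4f} s")
```

On most machines the NumPy version should finish noticeably faster, but the exact ratio depends on your hardware and NumPy build.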

You may hear a list of numbers referred to as a **vector**, a grid of numbers in two dimensions referred to as a **matrix**, and a grid of numbers in three or more dimensions referred to as a **tensor**. Below is an example of a vector $ \vec{v} $ and a matrix $ A $.

$$
\Large \vec{v} = \begin{bmatrix} \Large 3 \\ \Large 2 \\ \Large 7 \end{bmatrix}
$$

$$
\Large A = \begin{bmatrix} \Large -1 & \Large 1  \\ \Large 0.5 & \Large -2 \end{bmatrix}
$$


> Linear algebra is a topic in itself, and you should [consider taking the Anaconda course on Linear Algebra](https://learning.anaconda.cloud/linear-algebra). 

## Declaring an Array 

Let's dive into the array more, or more specifically the `ndarray`. This is probably the most fundamental data type in NumPy. As we saw earlier, we can declare it using a simple numeric list passed to the `array()` function. 

In [None]:
x = np.array([6, 1, 17, 3, 0, 3]) 
x

In [None]:
type(x)

That is a 1-dimensional array. An array can have 0 dimensions, meaning it is just a single scalar value. 

In [None]:
np.array(5)

We can also make a 2-dimensional array, meaning we have an array consisting of rows and columns. You can do this by nesting lists `[]` inside a list `[[]]`. 

In [None]:
x = np.array([[6, 1, 17], 
              [3, 0, 3]]) 
x

A common way to inspect an array is to check its `shape`. We can see below that this one has 2 rows and 3 columns.

In [None]:
x.shape
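
To make `shape` more concrete, here is a small sketch that prints it for a 0-, 1-, and 2-dimensional array. The variable names are only for illustration.

```python
import numpy as np

scalar = np.array(5)                      # 0 dimensions
vector = np.array([6, 1, 17, 3, 0, 3])    # 1 dimension
matrix = np.array([[6, 1, 17],
                   [3, 0, 3]])            # 2 dimensions

print(scalar.shape)  # ()
print(vector.shape)  # (6,)
print(matrix.shape)  # (2, 3)
```

The shape is a tuple with one entry per dimension, so a scalar array has an empty tuple and a 2-dimensional array reports rows first, then columns.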

We can also check the `dtype` of the array, which is the type of numeric values it is holding.

In [None]:
x.dtype

The datatype of the array is inferred from the list of numbers you provide, but you can also be explicit. If we wanted to declare an array of integers and force them to be `float32`, we can do that. 

In [None]:
x = np.array([[6, 1, 17], 
              [3, 0, 3]], dtype='float32') 
x
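
Here is a brief sketch of how inference and the explicit `dtype` argument behave. The exact default integer type can depend on your platform and NumPy version, so the comments only describe the general kind of type to expect. The variable names are made up for this example.

```python
import numpy as np

# All integers: an integer dtype is inferred
ints = np.array([6, 1, 17])

# One float in the list: the whole array is inferred as float64
mixed = np.array([6, 1.5, 17])

# An explicit dtype overrides what would otherwise be inferred
forced = np.array([6, 1, 17], dtype='float32')

print(ints.dtype)
print(mixed.dtype)
print(forced.dtype)
print(forced)
```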

NumPy [supports many datatypes](https://numpy.org/doc/stable/reference/arrays.dtypes.html) including dates, times, and arbitrary data. Generally when starting out, you will work with floats and integers. The `8`, `16`, `32`, or `64` next to the type is the number of bits used per value, which determines how large a range of numbers it can hold (at the cost of more memory). An `int16` can hold any integer from $ -32768 $ through $ 32767 $, while `int32` can hold $ -2,147,483,648 $ to $ 2,147,483,647 $, and `int64` extends that to roughly $ \pm 9.2 \times 10^{18} $.
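
Rather than memorizing these ranges, you can ask NumPy for them. The sketch below uses `np.iinfo()` to print the minimum and maximum of each signed integer type, along with the number of bytes each value occupies. The loop and variable names are just one way to lay this out.

```python
import numpy as np

for int_type in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(int_type)
    # itemsize is the number of bytes each value occupies
    size = np.dtype(int_type).itemsize
    print(f"{info.dtype}: {info.min} to {info.max} ({size} bytes per value)")
```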

Integers also have unsigned counterparts (meaning they can only be 0 or more, no negatives) with `uint8`, `uint16`, `uint32`, etc. 
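
As a small sketch of the difference, the example below compares `int8` with `uint8`. Both use the same 8 bits, but the unsigned type starts its range at 0 and therefore reaches a higher maximum. The variable names are chosen for this example.

```python
import numpy as np

signed = np.iinfo(np.int8)
unsigned = np.iinfo(np.uint8)

print(signed.min, signed.max)      # -128 127
print(unsigned.min, unsigned.max)  # 0 255
```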

## Declaring Higher-Dimensional Arrays

You can get really crazy, declaring higher-dimensional tensors where you have stacks and stacks of numeric grids representing image and video data. Below we have a 3x3 pixel image stored as a tensor, where the red, green, and blue channels are each stored as one of three sub-layers. The point is... you can get crazy with how you ingest data and store it in higher-dimensional numeric patterns.

In [None]:
my_image = np.array([
    [[0, 1, 3],
     [6, 2, 6], 
     [1, 5, 4]], 
    [[8, 3, 19],
     [33, 34, 11], 
     [13, 14, 89]], 
    [[14, 68, 17],
     [66, 84, 92], 
     [4, 2, 58]]
])

my_image

We can get the number of dimensions for a given array using the `ndim` attribute.

In [None]:
my_image.ndim
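
To see how `ndim` relates to `shape`, the sketch below prints both for arrays of 0 through 3 dimensions. The 2x2x2 `tensor` is a smaller stand-in for `my_image`, made up for this example.

```python
import numpy as np

scalar = np.array(5)
vector = np.array([6, 1, 17])
matrix = np.array([[6, 1, 17],
                   [3, 0, 3]])
# A 2x2x2 stack, smaller than my_image but with the same number of dimensions
tensor = np.array([[[0, 1], [6, 2]],
                   [[8, 3], [33, 34]]])

for arr in (scalar, vector, matrix, tensor):
    # ndim always equals the length of the shape tuple
    print(arr.ndim, arr.shape)
```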

## Exercise

Create this matrix of numbers using a NumPy array, and put it in the cell below. 

$$
\Large A = \begin{bmatrix} \Large 7 & \Large 1 & \Large -3 \\ \Large 2 & \Large -2 & \Large 31 \end{bmatrix}
$$

In [None]:
# Put your code here 





### SCROLL DOWN FOR ANSWER
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
|<br>
v 

In [None]:
import numpy as np 

A = np.array([
    [7, 1, -3], 
    [2, -2, 31]
])
A 
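
As an optional sanity check, which goes beyond what the exercise asks for, you could confirm that the array has the expected 2 rows and 3 columns using the `shape` and `ndim` attributes covered earlier.

```python
import numpy as np

A = np.array([
    [7, 1, -3],
    [2, -2, 31]
])

print(A.shape)  # (2, 3)
print(A.ndim)   # 2
```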