---
title: "NumPy: Quick Start Guide"
author: "Vahram Poghosyan"
date: "2023-10-27"
categories: ["Python", "NumPy", "Machine Learning"]
image: "numpy_quick_start_guide.png"
repo-url: https://www.example.com
format:
  html:
    code-fold: false
    code-line-numbers: true
    code-tools:
      source: repo
jupyter: python3
---

# Quick Introduction to NumPy

Perhaps the most important package for scientific computing included with Conda is NumPy. Let's get a feel for what NumPy offers.

## Python's Built-In Data Types

### Arrays
A 1D array, or a **vector**, is a collection of scalars (usually, but not necessarily, of similar data type) in a contiguous chunk of computer memory. A 2D array, or a **matrix**, is a collection of vectors. A 3D array (or a higher dimensional array), also referred to as a **tensor**, is a collection of matrices.
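
As a rough illustration of these definitions, the sketch below assumes that nested Python lists stand in for vectors, matrices, and tensors. The variable names are hypothetical and chosen only for this example.

```python
# A vector: a flat collection of scalars
vector = [1, 2, 3]

# A matrix: a collection of vectors (here, two rows of three scalars)
matrix = [[1, 2, 3],
          [4, 5, 6]]

# A tensor: a collection of matrices
tensor = [matrix,
          [[7, 8, 9],
           [10, 11, 12]]]

# Built-in lists do not require a single element type
mixed = [1, 2.5, "three"]

print(len(matrix), len(matrix[0]))  # rows and columns: 2 3
print(tensor[1][0][2])              # second matrix, first row, third item: 9
```

Each extra level of nesting adds one index when reading an element back out.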

## NumPy Data Types
NumPy exposes the `ndarray` type. This is a multidimensional, homogeneous array type (i.e. its elements are of the same data type) optimized for computing and indexed by a **tuple** of integers. It offers Boolean indexing (selecting elements with a Boolean expression) so that we don't have to write inefficient loops. The terms vector, matrix, and tensor apply equally to `ndarray`s.

To import NumPy, we can type: 

In [1]:
import numpy as np
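
With NumPy imported, tuple indexing and Boolean indexing can be sketched as follows. The sample values and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = np.array([3, -1, 7, 0, -5, 2])

# Loop-based approach: collect the positive entries one at a time
positives_loop = [v for v in values if v > 0]

# Boolean indexing: the comparison yields a Boolean mask of the same shape,
# and indexing with that mask keeps only the entries where it is True
mask = values > 0
positives = values[mask]

print(mask)       # [ True False  True False False  True]
print(positives)  # [3 7 2]

# Indexing by a tuple: (row, column) picks a single element of a 2D array
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid[1, 2])  # 6
```

Both approaches select the same elements. The mask version leaves the element-wise work to NumPy rather than to a Python-level loop.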

### Working with `ndarray`s

#### Creating `ndarray`s (arange, zeros, ones)

In [3]:
sequence_array = np.arange(10)
print(sequence_array)

[0 1 2 3 4 5 6 7 8 9]


In [4]:
zeros_array = np.zeros((3,4),dtype='int32')
print(zeros_array)
print(zeros_array.dtype)

[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
int32


In [5]:
ones_array = np.ones((3,2))
print(ones_array)
print(ones_array.dtype)

[[1. 1.]
 [1. 1.]
 [1. 1.]]
float64
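
The same creation functions accept a few more arguments. The sketch below uses arbitrary shapes and steps, and shows `np.arange` with a start, stop, and step alongside an explicit `dtype` for `np.ones`.

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded, as with range()
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# The shape is passed as a tuple; dtype overrides the float64 default
ones_int = np.ones((2, 3), dtype='int32')
print(ones_int.dtype)  # int32

# A bare integer creates a 1D array of that length
zeros_default = np.zeros(4)
print(zeros_default.shape, zeros_default.dtype)  # (4,) float64
```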


#### Verifying Type

We can verify that the object we're working with is, indeed, an `ndarray` by using the built-in Python `type` function.

In [6]:
array1 = np.array([1,2,3])
print('array1 type: ', type(array1))

array1 type:  <class 'numpy.ndarray'>


#### Getting the Shape

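The **shape** of an `ndarray` is a tuple of the form `(x,y,...)` with one entry per axis. For a 2D array, `x` corresponds to the number of **rows** and `y` corresponds to the number of **columns**; for a 1D array, the single entry is the number of elements.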

In [7]:
print('array1 shape: ', array1.shape)

array1 shape:  (3,)


Higher-dimensional `ndarray`s are built from nested sequences, such as a list of lists:

##### Contrived Example of a Multidimensional `ndarray`

There is a subtle difference between a 1D array and a 2D array with a single column which is worth exploring. 

As we saw above, `array1` was of shape `(3,)`. Now let's examine the shape of a similar `ndarray` instance.

In [8]:
array2 = np.array([[1],[2],[3]])
print('array2 shape: ', array2.shape)

array2 shape:  (3, 1)


As we can see, this one's shape is `(3,1)`.

::: {.callout-tip title="📖 Note" appearance="minimal" collapse="false"}
The shape `(3,)` means a 1D array with 3 elements, whereas the shape `(3,1)` means a 2D array with 3 rows and a single column.
:::
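
One place where the difference between `(3,)` and `(3,1)` shows up is in transposition and element access. The following sketch recreates `array1` and `array2` so that it runs on its own.

```python
import numpy as np

array1 = np.array([1, 2, 3])        # shape (3,)
array2 = np.array([[1], [2], [3]])  # shape (3, 1)

# Transposing a 1D array leaves its shape unchanged,
# while a column becomes a row
print(array1.T.shape)  # (3,)
print(array2.T.shape)  # (1, 3)

# A single index returns a scalar from the 1D array,
# but a length-1 row from the 2D array
print(array1[0])  # 1
print(array2[0])  # [1]
```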

Sometimes these differences are just superficial, or the result of data impurities. NumPy provides a function called `np.squeeze`, which removes axes of length 1 from an array's shape.

In [15]:
print(np.squeeze(array2).shape == array1.shape)

True
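
To see more precisely what `np.squeeze` does, here is a small sketch with an arbitrary `(1, 3, 1)` array. It also shows the optional `axis` argument, and `np.newaxis` as one way to add a length-1 axis back.

```python
import numpy as np

# Arbitrary example array with two length-1 axes
padded = np.zeros((1, 3, 1))

# By default, every axis of length 1 is removed
print(np.squeeze(padded).shape)          # (3,)

# With axis=0, only the first axis is removed
print(np.squeeze(padded, axis=0).shape)  # (3, 1)

# np.newaxis goes the other way and inserts a length-1 axis
column = np.array([1, 2, 3])[:, np.newaxis]
print(column.shape)                      # (3, 1)
```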


##### A More Natural Example of a Multidimensional `ndarray`

In [16]:
array3 = np.array([[1,2,3], 
                  [4,5,6]])
print('array3 shape: ', array3.shape)

array3 shape:  (2, 3)


#### Getting the Dimension

To get the number of dimensions (axes), we use `ndarray.ndim`.

In [17]:
print(array1.ndim, array2.ndim, array3.ndim)

1 2 2
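
As a quick check of how `ndim` relates to `shape`, the sketch below uses an arbitrary 3D array. The number of dimensions is the length of the shape tuple.

```python
import numpy as np

# Arbitrary 3D array: 2 matrices, each with 3 rows and 4 columns
tensor = np.zeros((2, 3, 4))

print(tensor.ndim)                       # 3
print(tensor.ndim == len(tensor.shape))  # True

# size is the total number of elements (2 * 3 * 4)
print(tensor.size)                       # 24
```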


#### Getting the Data Types of The Elements

`ndarray`s can hold numeric types (int, unsigned int, float, complex), Booleans, and text types (string). NumPy has no dedicated null type; missing values are typically represented with `np.nan` in floating-point arrays. As mentioned above, an `ndarray` can't hold more than one data type. To get the data type of the elements, we use `ndarray.dtype`.
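
Because an `ndarray` holds a single data type, NumPy converts mixed input to one common type. The sketch below uses made-up values to show that conversion, along with `astype` for an explicit one.

```python
import numpy as np

# Mixing ints and floats produces a float array
floats = np.array([1, 2.5, 3])
print(floats.dtype)     # float64

# Mixing numbers and strings produces a Unicode string array
text = np.array([1, 'two'])
print(text.dtype.kind)  # U

# astype returns a converted copy; float-to-int conversion truncates
converted = floats.astype('int32')
print(converted)        # [1 2 3]
print(converted.dtype)  # int32
```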

#### Reshape

We can reshape `ndarray`s where it makes sense, that is, when the total number of elements stays the same. For example, we can reshape `array3`, of shape `(2,3)`, into an array of shape `(3,2)`, `(6,1)`, or `(1,6)`.

In [18]:
print(array3)
print(array3.shape)
array4 = array3.reshape(3,2)
print(array4)
print(array4.shape)

[[1 2 3]
 [4 5 6]]
(2, 3)
[[1 2]
 [3 4]
 [5 6]]
(3, 2)


Providing the value `-1` for one dimension tells NumPy to infer that dimension's length from the array's size and the remaining dimensions. For instance, instead of `array3.reshape(3,2)` we could write `array3.reshape(-1,2)` or `array3.reshape(3,-1)`, with the same effect. At most one dimension can be `-1`.
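
The `-1` shorthand can be checked with a short sketch. It recreates `array3` so that it runs on its own, and includes a target shape that does not fit the 6 elements.

```python
import numpy as np

array3 = np.array([[1, 2, 3],
                   [4, 5, 6]])

# NumPy works out the missing dimension from the 6 elements
print(array3.reshape(-1, 2).shape)  # (3, 2)
print(array3.reshape(3, -1).shape)  # (3, 2)

# A lone -1 flattens the array to 1D
print(array3.reshape(-1).shape)     # (6,)

# 6 elements cannot be split into 4 equal rows, so this raises ValueError
try:
    array3.reshape(4, -1)
except ValueError as error:
    print('reshape failed:', error)
```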

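Separately from reshaping, `np.stack` joins arrays of the same shape along a new axis. Stacking two arrays of shape `(3,)` with `axis=1` produces an array of shape `(3,2)`.

In [19]:
a = np.array([1,2,3])
b = np.array([4,5,6])
c = np.stack((a,b), axis=1)
print(c.shape)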

(3, 2)
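
To make the role of `axis` in the `np.stack` call above more concrete, the sketch below compares `axis=0` and `axis=1` for the same two input arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# axis=0: each input becomes a row
rows = np.stack((a, b), axis=0)
print(rows.shape)  # (2, 3)
print(rows)
# [[1 2 3]
#  [4 5 6]]

# axis=1: each input becomes a column
cols = np.stack((a, b), axis=1)
print(cols.shape)  # (3, 2)
print(cols)
# [[1 4]
#  [2 5]
#  [3 6]]
```

In this case the two results are transposes of each other.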
