# Introduction to NumPy

NumPy, short for Numerical Python, is the foundation for Python's data science, machine learning, and scientific libraries like **pandas**, **scikit-learn**, and **TensorFlow**. Almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks, so learning NumPy helps you understand how these other tools work. At its core, NumPy is an array and linear algebra library for Python.

NumPy provides built-in tools for math, statistics, and linear algebra. NumPy makes numerical computing in Python **fast, efficient, and powerful**.

## NumPy Data Structure

The primary data structure in NumPy is the N-dimensional array, or `ndarray`. A NumPy array can look like a Python list (or a list of lists), but it is stored more compactly. In essence, a Python list is an array of pointers to heterogeneous Python objects, while a NumPy array is an array of uniform values of the same type ({numref}`c-vs-python-list`). Python lists are therefore more flexible, but NumPy arrays use less memory, and reading and writing their items is much faster {cite}`Martelli_2009`.

<!-- , at least 4 bytes per pointer plus 16 bytes for even the smallest Python object (4 for type pointer, 4 for reference count, 4 for value -- and the memory allocators rounds up to 16).  -->

<!-- (4 bytes each for single-precision numbers and 8 bytes double-precision).  -->


```{figure} ../images/c-vs-python-list.png
---
width: 350px
name: c-vs-python-list
alt: c-vs-python-list
align: center
---
Difference between C and Python lists {cite}`Vanderplas_2022`
```
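
The memory difference described above can be checked directly. The following is a minimal sketch, assuming CPython on a 64-bit platform; the names `py_list`, `np_arr`, and `list_bytes` are illustrative, and the exact byte counts for the list vary by Python version.

```python
import sys
import numpy as np

py_list = list(range(1000))
np_arr = np.arange(1000, dtype=np.int64)

# Every element of the array has the same type and the same size in bytes.
print(np_arr.dtype, np_arr.itemsize)   # int64 8
print(np_arr.nbytes)                   # 8000 (raw data: 1000 elements * 8 bytes)

# A list stores pointers; each int object it points to takes additional memory.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_bytes)                      # noticeably larger than 8000

# A list may mix types, but an array converts its values to one common type.
print(np.array([1, 2.5, 3]).dtype)     # float64
```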


## Installing NumPy

To install NumPy from the command line, navigate to your project directory (`dsm`), activate the virtual environment, and then use the `pip install [package]` syntax:
```
pip install numpy
```

You should see output similar to the following:

```text
(.venv) tychen✪macː~/workspace/dsm$ pip install numpy
Collecting numpy
Downloading numpy-2.3.3-cp312-cp312-macosx_14_0_arm64.whl.metadata (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.1/62.1 kB 1.3 MB/s eta 0:00:00
Downloading numpy-2.3.3-cp312-cp312-macosx_14_0_arm64.whl (5.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 MB 4.1 MB/s eta 0:00:00
Installing collected packages: numpy
Successfully installed numpy-2.3.3
```


```{note}
Alternatively, in a Jupyter notebook, you can run `%pip install [package]` in a code cell to install packages, just like `pip install [package]` on the command line. You will see people use `!pip install` as well. `!pip` runs pip as a shell command, which may belong to a different Python environment than the notebook, while `%pip` is a Jupyter magic command that installs into the environment of the currently running notebook kernel.

Don't forget to comment out your `pip` commands in the cells, or they will run again every time you run the cells.
```

## Using NumPy

Once you've installed NumPy you can import it as a library:

In [None]:
import numpy as np
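
To confirm that the import worked, you could print the installed version. This is only a small sketch; the version string depends on your environment.

```python
import numpy as np

# __version__ is a string such as "2.3.3"; yours may differ.
print(np.__version__)
```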

NumPy has many built-in functions and capabilities. For example:
```python
arr = np.array([1, 2, 3, 4, 5])     # create an array
print(np.mean(arr))                 # mean
print(np.std(arr))                  # standard deviation
```

Here we will focus on some of the most important aspects of NumPy:
- vectors
- arrays
- matrices
- number generation

## NumPy Arrays

A NumPy array (n-dimensional array, or `ndarray`) is like a super-powered list that can store numbers in rows and columns. NumPy arrays essentially come in two flavors: vectors and matrices. Vectors are strictly 1-D arrays with only one axis, and matrices are 2-D arrays with two axes: (rows, columns).

Mastering arrays, vectorization, and broadcasting will give you a strong foundation for data science and beyond.
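
The difference between a vector and a matrix shows up in the `ndim` and `shape` attributes. A minimal sketch, using the illustrative names `vector` and `matrix`:

```python
import numpy as np

vector = np.array([1, 2, 3])            # one axis
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # two axes: (rows, columns)

print(vector.ndim, vector.shape)        # 1 (3,)
print(matrix.ndim, matrix.shape)        # 2 (2, 3)
```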

### Creating Arrays from Lists
You can create an array by directly casting a list to an array. Let us create some lists first:

In [None]:
nums_list = [1,2,3]
nums_list

In [None]:
type(nums_list)

Now let's cast the list into a NumPy array using NumPy's `array()` function.

In [None]:
arr = np.array(nums_list)       ### casting a list to a numpy array
arr

In [None]:
type(arr)                   ### data type: numpy.ndarray 

Casting a Python list of lists into a NumPy array:

In [None]:
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(arr)

In [None]:
list_of_list = [[1,2,3],[4,5,6],[7,8,9]]
list_of_list

Compare the result of evaluation of the Python list above with the NumPy array:

In [None]:
np.array(list_of_list)
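
Beyond how they print, the two objects also behave differently. The sketch below is one way to see this; `arr2d` is an illustrative name, and the indexing syntax is shown only for comparison.

```python
import numpy as np

list_of_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr2d = np.array(list_of_list)

# The array knows its own dimensions; the nested list does not.
print(arr2d.shape)          # (3, 3)

# Lists need chained indexing; arrays accept a (row, column) pair.
print(list_of_list[1][2])   # 6
print(arr2d[1, 2])          # 6

# A whole column is a single slice on the array.
print(arr2d[:, 0])          # [1 4 7]
```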

### Vectorized Operations 

NumPy does math on whole arrays at once, which is much faster than using Python loops.

In [None]:
arr = np.array([1, 2, 3, 4])
print(arr * 2)              # multiply each element by 2
print(arr + 5)              # add 5 to each element
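
To make the contrast with loops concrete, here is a small sketch that computes the same result both ways. The names `doubled_loop` and `doubled_vec` are illustrative; on an array this small the speed difference is not noticeable, so the example only shows that the results match.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Loop version: visit each element in Python, one at a time.
doubled_loop = []
for value in arr.tolist():
    doubled_loop.append(value * 2)

# Vectorized version: one expression applied to the whole array.
doubled_vec = arr * 2

print(doubled_loop)                               # [2, 4, 6, 8]
print(doubled_vec)                                # [2 4 6 8]
print(np.array_equal(doubled_loop, doubled_vec))  # True
```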

### Broadcasting

NumPy can combine arrays of different shapes by "stretching" one to match the other, provided the shapes are compatible.

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([10, 20, 30])

print(A + B)
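
The sketch below extends the example above to show which shapes can be stretched. The names `row` and `col` are illustrative. Shapes are compared from the last axis backwards, and two axes are compatible when they are equal or one of them is 1.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,): stretched down the rows
col = np.array([[100], [200]])     # shape (2, 1): stretched across the columns

print(A + row)    # [[11 22 33], [14 25 36]]
print(A + col)    # [[101 102 103], [204 205 206]]

# Shapes (2, 3) and (2,) do not line up on the last axis, so this fails.
try:
    A + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```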

### NumPy Array Methods

NumPy provides a rich set of functions and `ndarray` methods for various operations, including creation, manipulation, and mathematical computations.

#### Array Creation Methods:

- `np.array()`: Creates an array from a Python list or tuple.
- `np.arange()`: Generates an array with evenly spaced values within a given interval.
- `np.zeros()` / `np.ones()` / `np.full()`: Creates arrays filled with zeros, ones, or a specified value, respectively.
- `np.zeros_like()` / `np.ones_like()` / `np.full_like()`: Creates arrays with the same shape and data type as another array, filled with zeros, ones, or a specified value.
- `np.empty()`: Creates an uninitialized array of a given shape and data type.
- `np.identity()`: Returns an identity matrix (a square array with ones on the main diagonal and zeros elsewhere).
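
A few of the creation functions listed above can be sketched as follows; `template` is an illustrative name.

```python
import numpy as np

template = np.array([[1, 2], [3, 4]])

print(np.arange(2, 10, 2))          # [2 4 6 8]
print(np.full((2, 3), 7))           # 2x3 array filled with 7
print(np.zeros_like(template))      # same shape and dtype as template, all zeros
print(np.full_like(template, 9))    # same shape and dtype as template, all 9
print(np.identity(3))               # 3x3 identity matrix

# np.empty only allocates memory, so its initial values are arbitrary.
scratch = np.empty((2, 2))
print(scratch.shape)                # (2, 2)
```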

#### Array Manipulation Methods:

- `reshape()`: Returns an array with a new shape without changing its data.
- `transpose()`: Permutes the axes of an array (by default, reverses their order).
- `flatten()`: Returns a copy of the array collapsed into one dimension.
- `resize()`: Changes the shape and size of an array in-place.
- `np.concatenate()` / `np.stack()`: Joins arrays along an existing axis or along a new axis, respectively. These are NumPy functions rather than array methods.
- `np.split()`: Divides an array into multiple sub-arrays. This is also a NumPy function.
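
The manipulation tools listed above can be sketched on a small array. The names `a`, `m`, `flat`, and `parts` are illustrative.

```python
import numpy as np

a = np.arange(6)                  # [0 1 2 3 4 5]
m = a.reshape(2, 3)               # [[0 1 2], [3 4 5]]

print(m.transpose().shape)        # (3, 2); m.T is a shorthand

# flatten returns a copy, so changing it leaves m untouched.
flat = m.flatten()
flat[0] = 99
print(m[0, 0])                    # 0

# concatenate joins along an existing axis; stack creates a new one.
print(np.concatenate([m, m], axis=0).shape)   # (4, 3)
print(np.stack([a, a]).shape)                 # (2, 6)

# split returns a list of equally sized sub-arrays.
parts = np.split(a, 3)
print([p.tolist() for p in parts])            # [[0, 1], [2, 3], [4, 5]]
```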

#### Mathematical and Statistical Methods:

- `sum()` / `min()` / `max()` / `mean()` / `std()`: Computes the sum, minimum, maximum, mean, and standard deviation of array elements, optionally along a specified axis.
- `cumsum()` / `cumprod()`: Computes the cumulative sum or product of elements along an axis.
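- `np.sqrt()` / `np.exp()` / `np.log()`: Applies element-wise mathematical functions like square root, exponential, and logarithm. These are NumPy functions rather than array methods.
- `dot()`: Computes the dot product; for 2-D arrays, this is matrix multiplication.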
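
The `axis` argument is the part of these methods that most often causes confusion, so here is a minimal sketch on a 2x2 array named `m` (an illustrative name).

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

print(m.sum())                    # 10: all elements
print(m.sum(axis=0))              # [4 6]: down each column
print(m.sum(axis=1))              # [3 7]: across each row
print(m.min(), m.max(), m.mean()) # 1 4 2.5

print(m.cumsum().tolist())        # [1, 3, 6, 10]: running total over the flattened array

# Element-wise functions live in the np namespace.
print(np.sqrt(np.array([1, 4, 9])).tolist())   # [1.0, 2.0, 3.0]

# For 2-D arrays, dot is matrix multiplication.
print(m.dot(m).tolist())          # [[7, 10], [15, 22]]
```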

#### Other Useful Methods/Attributes:

- `dtype`: Returns the data type of the array elements.
- `ndim`: Returns the number of dimensions of the array.
- `itemsize`: Returns the size in bytes of each element in the array.
- `copy()`: Returns a copy of the array.
- `fill()`: Fills the array with a scalar value.
- `clip()`: Limits the values in an array to a specified range.
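
The attributes and methods in this last group can be sketched as follows; `a` and `b` are illustrative names.

```python
import numpy as np

a = np.array([1.5, -2.0, 8.0])
print(a.dtype, a.ndim, a.itemsize)   # float64 1 8

# copy gives an independent array, so filling b does not change a.
b = a.copy()
b.fill(0.0)
print(a.tolist())                    # [1.5, -2.0, 8.0]
print(b.tolist())                    # [0.0, 0.0, 0.0]

# clip limits values to the range [0, 5] and returns a new array.
print(a.clip(0, 5).tolist())         # [1.5, 0.0, 5.0]
```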


## Built-in Methods

NumPy has many built-in functions for generating arrays.

### arange

Return evenly spaced values within a given interval. The interval includes the start value but excludes the stop value.

In [None]:
np.arange(0,10)

In [None]:
np.arange(0,11,2)

### zeros and ones

Generate arrays of zeros or ones.

In [None]:
np.zeros(3)

In [None]:
np.zeros((5,5))

In [None]:
np.ones(3)

In [None]:
np.ones((3,3))

### linspace
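Return evenly spaced numbers over a specified interval. Unlike `arange`, the third argument is the number of points to generate, and the endpoint is included by default.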

In [None]:
np.linspace(0,10,3)

In [None]:
np.linspace(0,10,50)
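
Because `arange` and `linspace` are easy to mix up, the following sketch puts them side by side: `arange` takes a step and stops before the end value, while `linspace` takes a number of points and includes the end value.

```python
import numpy as np

# arange: the third argument is the step; the stop value is excluded.
print(np.arange(0, 1, 0.25).tolist())    # [0.0, 0.25, 0.5, 0.75]

# linspace: the third argument is the number of points; the endpoint is included.
print(np.linspace(0, 1, 5).tolist())     # [0.0, 0.25, 0.5, 0.75, 1.0]
print(np.linspace(0, 10, 3).tolist())    # [0.0, 5.0, 10.0]
```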

### eye
Creates an identity matrix.

In [None]:
np.eye(4)

## Random 

NumPy also has lots of ways to create arrays of random numbers:

### rand
Create an array of the given shape and populate it with random samples from a uniform distribution over `[0, 1)`.

In [None]:
np.random.rand(5)

In [None]:
np.random.rand(5,5)

### randn

Return a sample (or samples) from the "standard normal" distribution (mean 0, standard deviation 1), unlike `rand`, which samples from a uniform distribution:

In [None]:
np.random.randn(5)

In [None]:
np.random.randn(5,5)
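
The difference between the two distributions becomes visible with a large sample. This is a rough sketch; the seed value and the sample size of 100,000 are arbitrary choices made here so that the run is repeatable.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only to make this sketch repeatable

uniform = np.random.rand(100_000)
normal = np.random.randn(100_000)

# Uniform samples stay inside [0, 1), and their mean is close to 0.5.
print(uniform.min() >= 0.0, uniform.max() < 1.0)   # True True
print(uniform.mean())

# Standard normal samples can be negative; mean is close to 0 and std close to 1.
print(normal.min() < 0.0)                          # True
print(normal.mean(), normal.std())
```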

### randint
Return random integers from `low` (inclusive) to `high` (exclusive).

In [None]:
np.random.randint(1,100)

In [None]:
np.random.randint(1,100,10)
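
Random results change on every run, which can make examples hard to reproduce. One option, sketched below, is NumPy's `Generator` interface created with `np.random.default_rng`, which takes a seed. The seed 42 and the names `rng`, `first`, and `second` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)          # 42 is an arbitrary seed

print(rng.random(5))                     # uniform over [0, 1), like rand
print(rng.standard_normal(5))            # standard normal, like randn
print(rng.integers(1, 100, size=10))     # integers in [1, 100), like randint

# Two generators created with the same seed produce the same sequence.
first = np.random.default_rng(42).integers(1, 100, size=10)
second = np.random.default_rng(42).integers(1, 100, size=10)
print(np.array_equal(first, second))     # True
```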

## Array Attributes and Methods

Let's discuss some useful attributes and methods of an array:

In [None]:
arr = np.arange(25)
ranarr = np.random.randint(0,50,10)

In [None]:
arr

In [None]:
ranarr

### Reshape
Returns an array containing the same data with a new shape.

In [None]:
arr.reshape(5,5)
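
Two details of `reshape` are worth a quick sketch: a dimension given as `-1` is inferred from the array's size, and the new shape must hold exactly the same number of elements.

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to infer that dimension from the array's size.
print(arr.reshape(5, -1).shape)   # (5, 5)

# A 4x5 shape has 20 slots, which cannot hold 25 values.
try:
    arr.reshape(4, 5)
except ValueError as err:
    print("reshape failed:", err)

# reshape returns a new array object; arr itself keeps its shape.
print(arr.shape)                  # (25,)
```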

### max, min, argmax, argmin

These are useful methods for finding the max or min values, or for finding their index locations using `argmax` or `argmin`.

In [None]:
ranarr

In [None]:
ranarr.max()

In [None]:
ranarr.argmax()

In [None]:
ranarr.min()

In [None]:
ranarr.argmin()
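
Since `ranarr` is random, its results differ on every run. The sketch below uses fixed, illustrative arrays (`scores` and `grid`) so the behavior of these methods can be followed by hand, including what happens with ties and with 2-D arrays.

```python
import numpy as np

scores = np.array([12, 47, 3, 47, 30])

print(scores.max(), scores.argmax())   # 47 1 (index of the first occurrence)
print(scores.min(), scores.argmin())   # 3 2

grid = np.array([[1, 9, 4],
                 [7, 2, 8]])

# Without an axis, argmax indexes into the flattened array.
print(grid.argmax())                   # 1
print(grid.argmax(axis=1).tolist())    # [1, 2]: position of the max in each row
```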

### Shape

Shape is an attribute that arrays have (not a method):

In [None]:
# Vector
arr.shape

In [None]:
# Notice the two sets of brackets
arr.reshape(1,25)

In [None]:
arr.reshape(1,25).shape

In [None]:
arr.reshape(25,1)

In [None]:
arr.reshape(25,1).shape
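
The three shapes above hold the same 25 values but differ in the number of axes. A compact sketch that places them side by side, together with the related `ndim` and `size` attributes:

```python
import numpy as np

arr = np.arange(25)

print(arr.shape)                  # (25,)   1-D vector: one axis
print(arr.reshape(1, 25).shape)   # (1, 25) 2-D: one row, 25 columns
print(arr.reshape(25, 1).shape)   # (25, 1) 2-D: 25 rows, one column

# ndim counts the axes, which is the length of the shape tuple.
print(arr.ndim, arr.reshape(1, 25).ndim)   # 1 2

# size is the total number of elements, the product of the shape entries.
print(arr.reshape(5, 5).size)     # 25
```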

### dtype

You can also grab the data type of the object in the array:

In [None]:
arr.dtype
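
The data type can also be chosen when an array is created, or changed afterwards with `astype`. A minimal sketch with illustrative names; note that the default integer type depends on the platform and NumPy version, so it is not asserted here.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])
print(ints.dtype)     # platform default integer, commonly int64
print(floats.dtype)   # float64

# Request a specific dtype at creation time.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)   # int8 1

# astype returns a converted copy; float to int truncates toward zero.
print(np.array([1.7, -2.2]).astype(np.int64).tolist())   # [1, -2]
```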