# Introduction to NumPy
NumPy is the most fundamental library for scientific computing in Python. It forms the basis for most of the important data science libraries, such as pandas, SciPy and TensorFlow.

The main data structure that NumPy provides is the n-dimensional array object, or **`ndarray`**. An ndarray may have any number of dimensions. In data science we typically deal with two-dimensional tabular data of rows and columns. Here we will begin by creating a 2-D array of random values from a normal distribution and do some basic analysis on it.

In [2]:
import numpy as np

## Create first array

The simplest way to create a small ndarray is to specify the values in a list.

In [7]:
a = np.array([1,2,3,4])  # 1-d array
print(a)
b = np.array([[1,2],[3,4]])  # 2-d array
print(b)
c = np.array([[1,2],[3,4]], dtype=np.float64)  # 2-d array of a given 'data-type'
print(c)

[1 2 3 4]
[[1 2]
 [3 4]]
[[ 1.  2.]
 [ 3.  4.]]
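
When no `dtype` is given, NumPy infers one from the values in the list. The short sketch below (variable names are illustrative only) shows how a single float in the input upcasts the whole array, and how equal-length nested lists become a 2-D array.

```python
import numpy as np

# Mixing ints and floats in the input list upcasts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# Nested lists of equal length become a 2-d array
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)    # (2, 3)
```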


# Common ways to initialize ndarrays

We can initialise an ndarray of a given `shape` in several ways:

In [4]:
# Specify the values explicitly
a = np.array([[1, 2], [3, 4]])

# Create an uninitialized array (contents are whatever was in memory, not random samples)
b = np.empty((2, 2))

# Create an array of 0s
c = np.zeros((3, 2))

# Create an array of 1s
d = np.ones((2, 3))

# Create an identity matrix
I = np.eye(4)  # Square, so we only need to specify the no. of rows

# Create an array of random values from a distribution with mean=0 and std=1
R = np.random.randn(2, 3)
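
As a small illustration of the initializers above, the sketch below fills an `np.empty` buffer before reading it and checks the shape and default dtype of the others. The variable names are made up for this example.

```python
import numpy as np

# np.empty only allocates memory; its contents are arbitrary, so fill before reading
buf = np.empty((2, 2))
buf[:] = 7.0

# zeros/ones/eye return float64 arrays by default
zeros = np.zeros((3, 2))
identity = np.eye(3)

print(buf)
print(zeros.shape, zeros.dtype)
print(identity.sum())  # an n x n identity matrix sums to n
```

`np.empty` is mainly useful when every element is about to be overwritten anyway; otherwise `np.zeros` is the safer default.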

# Useful attributes for inspecting an ndarray

In [10]:
# VERY helpful to see the 'shape' of an ndarray to make sure you have what you expect
a = np.array([[1, 0, 1], [0, 1, 0]])
print(a.shape)
# We will keep inspecting the shape of arrays regularly to maintain sanity!

# No. of dimensions of an ndarray
# This is also called 'rank' of the ndarray in the numpy world
# (Not to be confused with the 'rank' of a matrix which is a concept in linear algebra)
print(a.ndim)

# Total no. of elements in an ndarray - product of shape values
print(a.size)

# The data-type of an ndarray
print(a.dtype)

(2, 3)
2
6
int32


Note that most common data types are supported by NumPy. For a complete list, see [NumPy Data Types](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html).
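
An existing array can be converted to another data type with `astype`. A minimal sketch, with illustrative variable names:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)   # returns a new array; ints is unchanged

# Converting floats to ints truncates toward zero rather than rounding
truncated = np.array([1.9, -1.9]).astype(np.int64)

print(floats.dtype, truncated)
```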

To get things started we will create an array with numbers generated from a random normal distribution with mean 0 and standard deviation 1.

In [None]:
np.random.seed(123)
array = np.random.randn(10, 5)  # 10x5 is the 'shape' of the array
array = array.round(2)
array

# Accessing elements
In native Python, the indexing operator, the brackets **[]**, selects items from a container. This is most commonly done with tuples, lists and dictionaries. ndarrays use the same operator for selection.

To select a single element simply place the index of the row and column inside the brackets separated by a comma.

In [None]:
# select the element at row index 4, column index 3
# Get in the habit of counting from 0 - 0th row, 1st row .. etc
array[4, 3]

In [None]:
# select all the rows of the 4th column
array[:, 4]

In [None]:
# Use slice notation to select a block of data
# Make sure we're getting what we would expect to get
array[5:10, 2:5]

In [None]:
# start:stop:step notation
# Make sure we're getting what we would expect to get
array[3:10:5, ::2]
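
Slices are easier to check on an array whose values are predictable. The sketch below uses a hypothetical 4x5 grid built with `np.arange` so each selection can be verified by eye.

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)   # values 0..19 laid out in 4 rows, 5 columns

print(grid[1, 2])        # single element at row 1, column 2 -> 7
print(grid[:, 4])        # last column -> [ 4  9 14 19]
print(grid[2:4, 1:3])    # rows 2-3, columns 1-2 -> [[11 12] [16 17]]
print(grid[::2, ::2])    # every other row and every other column
```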

# Operations on the entire array
Applying an operation to an entire array is easy and looks exactly as it would in normal mathematical notation. These operations are not so trivial with Python lists.

In [None]:
# multiply each element by 5
array * 5

In [None]:
# subtract 3 from each element
array - 3
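
To see why this is less trivial with plain lists, compare what `*` does in each case. A short sketch with an illustrative list:

```python
import numpy as np

values = [1, 2, 3]

print(values * 2)               # list repetition: [1, 2, 3, 1, 2, 3]
print([v * 2 for v in values])  # element-wise needs an explicit loop: [2, 4, 6]
print(np.array(values) * 2)     # ndarray arithmetic is element-wise: [2 4 6]
```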

# Vectorized Operations
NumPy is blazingly fast by Python standards. It is fast because it executes its code in pre-compiled C and Fortran that is highly optimized for scientific computing.

In [None]:
# grab the first row
row = array[0, :]
some_list = list(row)
print(type(some_list))  # Note that we're dealing with a regular Python list of numbers here

In [None]:
print([x + 1 for x in some_list])

In [None]:
%timeit [x + 1 for x in some_list]

In [None]:
%timeit row + 1
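
`%timeit` is an IPython magic and only works inside a notebook or IPython shell. Outside of one, a similar comparison can be sketched with the standard-library `timeit` module. The array size and repetition count below are arbitrary choices, and actual timings will vary by machine.

```python
import timeit
import numpy as np

data = np.random.randn(100_000)
as_list = list(data)

# Time 10 runs of each approach
loop_time = timeit.timeit(lambda: [x + 1 for x in as_list], number=10)
vector_time = timeit.timeit(lambda: data + 1, number=10)
print(f"list comprehension: {loop_time:.4f}s, vectorized: {vector_time:.4f}s")

# Both approaches compute the same values
assert np.allclose([x + 1 for x in as_list], data + 1)
```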

# Applying functions

It's easy to apply NumPy functions to all the values. These are referred to as *universal functions*: they act on each element of an array and return an array, without the need for an explicit loop.

In [None]:
# absolute value
np.abs(array)

In [None]:
np.sqrt(np.abs(array)).round(2)

In [None]:
# sum all elements in the array
array.sum()

In [None]:
# Same as calling the numpy function on the array
np.sum(array)
# Note that some operations are available as numpy functions, others as methods on the array.
# In general, the syntax np.<function>(<array>) should cover us in most situations.

In [None]:
# sum along rows with axis parameter
# Note - summing 'along' rows gives us the same no. of results as the no. of rows
array.sum(axis=1)

In [None]:
# sum along columns
# Note - summing 'along' columns gives us the same no. of results as the no. of columns
array.sum(axis=0)

In [None]:
# find max of each column
array.max(axis=0)
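
The `axis` argument is easiest to follow on a tiny array where the sums can be done by hand. A minimal sketch:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())         # all elements: 21
print(m.sum(axis=1))   # one value per row: [ 6 15]
print(m.sum(axis=0))   # one value per column: [5 7 9]
print(m.max(axis=0))   # column-wise maximum: [4 5 6]
```

One way to remember it: the axis you pass is the one that gets collapsed, so `axis=0` collapses the rows and leaves one result per column.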

# Comparison operators
The 6 comparison operators <, >, <=, >=, ==, != work on all elements of the array.

In [None]:
array > 0

In [None]:
# Counting with a boolean array (True counts as 1, False as 0)
# find out how many values are greater than 0
np.sum(array > 0)

In [None]:
# find the fraction of values greater than 0 (multiply by 100 for a percentage)
np.mean(array > 0)

In [None]:
# build a boolean mask of the values that are between -2 and 2
(array > -2) & (array < 2)

In [None]:
# for a standard normal distribution this fraction should be about 0.95
((array > -2) & (array < 2)).mean()
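
A boolean array can also be placed inside the brackets to select only the elements where it is `True`. The sketch below uses a small hand-written array (not the random one above) so the result is easy to verify.

```python
import numpy as np

values = np.array([-1.5, 0.2, 3.1, -0.4, 2.5])

# Parentheses are required around each comparison when combining with &
mask = (values > -2) & (values < 2)

print(mask)                      # [ True  True False  True False]
print(values[mask])              # keeps only the elements where mask is True
print(mask.sum(), mask.mean())   # count and fraction of True values
```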

# Common matrix Operations

In [9]:
import numpy as np

# A 2x3 matrix
a = np.random.randn(2, 3)

# A 2x3 matrix
b = np.random.randn(2, 3)

# Multiply two ndarrays element-wise
print(a * b)

# Get the transpose of a matrix
print(a.T)

# Multiply two 2-d matrices using matrix multiplication
print(a.T @ b)

# Or use A.dot(B) for matrix multiplication
print(a.T.dot(b))

# Other Linear Algebra Operations

# Matrix inverse
from numpy.linalg import inv
c = np.random.rand(3, 3)
print(inv(c))

[[-0.72033033  1.42497934 -0.3211407 ]
 [-0.2804813   0.03822686  0.09973959]]
[[-0.64890158  0.66039784]
 [-2.09679426 -1.83237088]
 [ 0.50177461  0.43174585]]
[[-1.00081163  0.42721568  0.56786496]
 [-1.54936527  1.4632062   0.91866462]
 [ 0.37363894 -0.3500126  -0.22140111]]
[[-1.00081163  0.42721568  0.56786496]
 [-1.54936527  1.4632062   0.91866462]
 [ 0.37363894 -0.3500126  -0.22140111]]
[[ 0.1722472   1.34174447 -1.2556513 ]
 [ 1.14456092 -1.59456881  0.83428155]
 [-1.15052005  1.58654277  1.15089258]]


[Link](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html) to other Linear Algebra Operations
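
A quick sanity check for an inverse is to multiply it by the original matrix and compare against the identity. The sketch below uses a small fixed matrix, chosen here only because it is clearly invertible, and `np.allclose` to allow for floating-point error.

```python
import numpy as np
from numpy.linalg import inv

m = np.array([[2.0, 1.0],
              [1.0, 3.0]])
m_inv = inv(m)

# A matrix times its inverse should be the identity, up to floating-point error
print(np.allclose(m @ m_inv, np.eye(2)))   # True

# Shapes must line up for matrix multiplication: (3, 2) @ (2, 3) -> (3, 3)
a = np.ones((2, 3))
print((a.T @ a).shape)   # (3, 3)
```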

# Images as Numpy Arrays

Images, being a 2D structure of pixel values, are especially suited for getting familiar with NumPy syntax and operations in a fun and interactive way.

In [58]:
# The following is a Jupyter 'magic' command that tells it to insert plots right within the notebook.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread

# Set the default size of any plots we generate, in inches
plt.rcParams['figure.figsize'] = (10, 6)

# Read a PNG image into a Numpy Array
img = imread('images/map.png')

In [None]:
# Basic inspection of what we got back by reading the PNG file

print(type(img))
print(img)
print(img.ndim)
print(img.shape)
print(img.max())
print(img.min())

In [None]:
# Displaying an image using Matplotlib
plt.imshow(img, cmap='gray', vmin=0, vmax=1)

In [None]:
# Notice that the image is especially poor in contrast. Can we do something about it?

# Increase Contrast Range
pixel_min_value = img.min()
pixel_range = img.max() - img.min()
img_normalized = (img - pixel_min_value) / pixel_range
plt.imshow(img_normalized, cmap='gray', vmin=0, vmax=1)

In [None]:
# Basic inspection of normalized array

print(img_normalized.ndim)
print(img_normalized.shape)
print(img_normalized.max())
print(img_normalized.min())
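
The contrast stretch above is a min-max normalization: subtract the minimum, then divide by the range, so the values span the full 0 to 1 interval. The sketch below applies the same formula to a tiny hypothetical array instead of the image file so the numbers can be checked directly.

```python
import numpy as np

# Hypothetical low-contrast "image": all values squeezed into [0.4, 0.6]
low_contrast = np.array([[0.4, 0.5],
                         [0.5, 0.6]])

lo, hi = low_contrast.min(), low_contrast.max()
stretched = (low_contrast - lo) / (hi - lo)

print(stretched.min(), stretched.max())
print(stretched)
```

Note that this formula divides by zero when every pixel has the same value, so a real pipeline would likely need to guard against a zero range.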

In [None]:
# Plot a histogram of the original and normalized images
img_values = img.flatten()
plt.hist(img_values)
img_normalized_values = img_normalized.flatten()
plt.hist(img_normalized_values)

# Can you make the plots 'see-through', and add a legend?

In [None]:
# All pixel values are between 0 and 1
# Is 0 Black and 1 white, or vice versa?
# Let's find out!

blank_image = np.ones(img.shape) * 1
plt.imshow(blank_image, cmap='gray', vmin=0, vmax=1)

In [None]:
# Color images - notice the shape of the resulting matrix, which is 3d instead of 2d
cezanne = imread('images/cezanne.png')
print(cezanne.ndim)
print(cezanne.shape)
print(cezanne.max())
print(cezanne.min())
print(cezanne[:3])

In [None]:
# Display the image
plt.imshow(cezanne)

In [None]:
# Let's extract the Red/Green/Blue channel values of the image into separate numpy arrays
# The last axis is ordered R, G, B
reds = cezanne[:, :, 0]
greens = cezanne[:, :, 1]
blues = cezanne[:, :, 2]

# We'll add 3 subplots to our figure
# With 1 row and 3 columns of subplots, and with a shared Y-axis
# We get back two things:
#     The figure object
#     An array of 'Axes' objects, each corresponding to an individual subplot
# So the axes array here has 3 Axes inside it
f, axes = plt.subplots(1, 3, sharey=True)

# The 'Super Title' of the figure, common to all subplots
f.suptitle('RGB Histograms')

# For each of the axes, plot a histogram of the R/G/B channel values
axes[0].hist(reds.flatten())
axes[1].hist(greens.flatten())
axes[2].hist(blues.flatten())

# Can we change the color of the plots to correspond to the channel they represent?

In [None]:
# Let's mess with the Red channel, and set all values in the red channel to 1
reds = np.ones((cezanne.shape[0], cezanne.shape[1]))

# We use the 'dstack' function here to stack the 3 numpy arrays 'depth-wise'
# Notice that we also have 'hstack', and 'vstack' (horizontal-stacking and vertical-stacking respectively)
red_cezanne = np.dstack((reds, greens, blues))

# It's invaluable to look at the shape of the array at every point to ensure there are no surprises
print(red_cezanne.shape)
plt.imshow(red_cezanne)
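
The same split-and-restack pattern can be tried without an image file. The sketch below builds a tiny hypothetical 2x2 RGB array, separates its channels, replaces the red one and stacks them back with `np.dstack`.

```python
import numpy as np

# Hypothetical 2x2 RGB image with values in [0, 1]
rgb = np.zeros((2, 2, 3))
rgb[:, :, 2] = 0.5                      # some blue everywhere

r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
print(r.shape)                          # (2, 2): each channel is 2-D

full_red = np.ones_like(r)
rebuilt = np.dstack((full_red, g, b))   # stack back along a new third axis
print(rebuilt.shape)                    # (2, 2, 3)
print(rebuilt[0, 0])                    # [1.  0.  0.5]
```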