# Numpy

NumPy (or Numpy) is a linear algebra library for Python. It is so important for data science with Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.

NumPy is also incredibly fast, as it has bindings to C libraries. This [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists) explains why it is often better to use arrays instead of lists.
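
One practical difference between lists and arrays shows up as soon as you do arithmetic. The sketch below is only an illustration (the variable names are made up for this example): the same `*` operator repeats a list but multiplies an array element by element.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# The * operator repeats a list, but multiplies an array element by element.
doubled_list = values * 2
doubled_arr = arr * 2

# Getting the elementwise result from a plain list needs an explicit loop.
doubled_loop = [v * 2 for v in values]

print(doubled_list)
print(doubled_arr)
print(doubled_loop)
```

With arrays, the loop happens inside NumPy's compiled code rather than in Python, which is where most of the speed comes from.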

## Installation Instructions

**It is highly recommended to install Python using the Anaconda distribution, so that all underlying dependencies (such as linear algebra libraries) stay in sync through conda. If you have Anaconda, NumPy can be installed by going to the terminal or command prompt and typing:**
    
    conda install numpy
    
**If you are not using Anaconda and cannot install it, please refer to [NumPy's official installation instructions](http://docs.scipy.org/doc/numpy-1.10.1/user/install.html).**

## Using NumPy

Importing it as a library:

In [1]:
import numpy as np

NumPy has many built-in functions and capabilities. Some of the most important aspects of NumPy are vectors, arrays, matrices, and number generation. Let's start by discussing arrays.

# NumPy Arrays

NumPy arrays essentially come in two flavors: vectors and matrices. Vectors are strictly 1-d arrays and matrices are 2-d (but a matrix can still have only one row or one column).
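
To make the vector/matrix distinction concrete, here is a small sketch (the array names are made up for this example) that compares the `ndim` and `shape` attributes of a 1-d array with those of a one-row and a one-column matrix:

```python
import numpy as np

vector = np.array([1, 2, 3])            # 1-d: shape (3,)
row_matrix = np.array([[1, 2, 3]])      # 2-d with a single row: shape (1, 3)
col_matrix = np.array([[1], [2], [3]])  # 2-d with a single column: shape (3, 1)

for name, a in [("vector", vector),
                ("row_matrix", row_matrix),
                ("col_matrix", col_matrix)]:
    print(name, a.ndim, a.shape)
```

Note the extra pair of brackets: that is what makes an array 2-d even when it holds only one row.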

## NumPy Arrays Creation

### Using Python lists

#### Creating an array by directly converting a list or list of lists:

In [2]:
my_list = [1, 2, 3]
my_list

[1, 2, 3]

#### Array

In [3]:
my_arr = np.array(my_list)
my_arr

array([1, 2, 3])

#### Matrix

In [4]:
my_mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_mat

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [5]:
my_matrix = np.array(my_mat)
my_matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

### Built-in Methods

There are lots of built-in ways to generate arrays.

#### np.arange

np.arange() is similar to Python's range() function: it returns evenly spaced values within a given interval, excluding the stop value.

Vector creation with np.arange():

In [6]:
my_range = np.arange(0, 11, 2)  # start, stop, step

In [7]:
my_range

array([ 0,  2,  4,  6,  8, 10])

In [8]:
# Default
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
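
A quick side-by-side with range(), as a sketch (variable names are made up for this example). Both exclude the stop value, but np.arange() also accepts a non-integer step:

```python
import numpy as np

# Like range(), np.arange() excludes the stop value.
ints = np.arange(0, 10)
same_as_range = list(range(0, 10))

# Unlike range(), np.arange() also accepts a non-integer step.
halves = np.arange(0, 2, 0.5)

print(ints.tolist() == same_as_range)
print(halves)
```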

#### np.zeros

Making an array of zeros

Vector creation with np.zeros():

In [9]:
my_zeros = np.zeros(3)

In [10]:
my_zeros

array([0., 0., 0.])

Matrix creation with np.zeros():

In [11]:
my_zeros_matrix = np.zeros((2, 3))

In [12]:
my_zeros_matrix

array([[0., 0., 0.],
       [0., 0., 0.]])

#### np.ones

Making an array of ones

Vector creation with np.ones():

In [13]:
my_ones = np.ones(5)

In [14]:
my_ones

array([1., 1., 1., 1., 1.])

Matrix creation with np.ones():

In [15]:
my_ones_matrix = np.ones((3, 2))

In [16]:
my_ones_matrix

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

#### np.linspace

Returns evenly spaced numbers over a specified interval.

Vector creation with np.linspace():

In [17]:
my_linspace = np.linspace(0, 5, 10)

In [18]:
my_linspace

array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
       2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])
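
It is easy to mix up np.linspace() and np.arange(): the third argument of linspace is the number of points, not the step, and the stop value is included by default. A small sketch (variable names are made up for this example; `retstep` and `endpoint` are optional keyword arguments of np.linspace):

```python
import numpy as np

# 10 points from 0 to 5, both ends included; retstep=True also returns the spacing.
points, step = np.linspace(0, 5, 10, retstep=True)

# endpoint=False leaves the stop value out, matching np.arange(0, 5, 0.5).
half_open = np.linspace(0, 5, 10, endpoint=False)

print(len(points), points[0], points[-1], step)
print(half_open)
```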

#### np.eye

Returns a 2-D array with ones on the diagonal and zeros elsewhere (an identity matrix when the array is square).

*Identity Matrix: a square matrix in which all the elements of the principal diagonal are ones and all other elements are zeros. The effect of multiplying a given matrix by an identity matrix is to leave the given matrix unchanged.*

2D array creation with np.eye():

In [19]:
my_eye = np.eye(4, 4)

In [20]:
my_eye

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
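
The "leaves the given matrix unchanged" property can be checked directly. A minimal sketch, with a made-up 2x2 matrix `a`:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
identity = np.eye(2)

# The @ operator performs matrix multiplication (not elementwise multiplication).
product = a @ identity

print(np.array_equal(product, a))
```

The result comes back as floats because np.eye() produces a float array by default, but the values are the same.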

#### np.random.rand

Creates an array or matrix of random samples from a uniform distribution over [0, 1).

*In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable.*

1D and 2D array creation with np.random.rand():

In [21]:
my_rand = np.random.rand(5)

In [22]:
my_rand

array([0.24969859, 0.61100078, 0.68331825, 0.01759403, 0.89931972])

In [23]:
my_rand_matrix = np.random.rand(5, 5)

In [24]:
my_rand_matrix

array([[0.23087076, 0.71259275, 0.69387458, 0.5528331 , 0.06805094],
       [0.17542162, 0.27889693, 0.88482038, 0.68036935, 0.44179774],
       [0.52006744, 0.82884558, 0.76646396, 0.15755804, 0.88269146],
       [0.71746369, 0.2348842 , 0.40158291, 0.78993337, 0.74740251],
       [0.16601492, 0.23277725, 0.86874901, 0.85939054, 0.01729349]])

#### np.random.randn

Creates an array or matrix of random samples from the standard normal distribution (centered around 0).

*In probability theory, the normal (or Gaussian or Gauss or Laplace–Gauss) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.*

1D and 2D array creation with np.random.randn():

In [25]:
my_randn = np.random.randn(4)

In [26]:
my_randn

array([-1.30306774,  0.36608615, -0.0398938 ,  1.10400546])

In [27]:
my_randn_matrix = np.random.randn(4, 4)

In [28]:
my_randn_matrix

array([[-0.26377324, -0.26096965,  0.20527265,  1.91627655],
       [-0.87259324, -2.16067284, -0.36613936,  2.07282336],
       [-0.59023146,  0.34132009, -0.23192711,  2.10509747],
       [ 1.80549228, -1.67465491,  0.48613453,  0.54568488]])

#### np.random.randint

Creates an array of random integers from a low (inclusive) to a high (exclusive) number

Random number creation with np.random.randint():

In [29]:
my_rand_int = np.random.randint(1, 100, 10)  # low (inclusive), high (exclusive), optional number of integers to return as an array

In [30]:
my_rand_int

array([13, 68,  8, 31, 89, 36, 12, 43, 26, 71])
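
The random outputs above will differ every time the cells are run. If you need repeatable numbers, one option is to set a seed first. The sketch below assumes the legacy np.random.seed() function; the seed value 42 and the variable names are arbitrary choices for this example.

```python
import numpy as np

np.random.seed(42)        # 42 is an arbitrary seed chosen for illustration
first = np.random.rand(3)

np.random.seed(42)        # re-seeding restarts the same sequence
second = np.random.rand(3)

# randint's upper bound is exclusive: values fall in [1, 100).
ints = np.random.randint(1, 100, 10)

print(np.array_equal(first, second))
print(ints.min() >= 1, ints.max() < 100)
```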

## USEFUL METHODS

#### creating 2 arrays

In [31]:
arr = np.arange(25)

In [32]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [33]:
ranarr = np.random.randint(0, 50, 10)

In [34]:
ranarr

array([ 1,  8, 41,  0, 27, 16,  5, 16, 27,  2])

#### Method --> reshape()

reshape(): Gives a new shape to an array without changing its data.

In [35]:
arr.reshape(5, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
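
Two details about reshape() are worth trying out. In this sketch (variable names are made up for this example), -1 lets NumPy work out one dimension for you, and a shape that does not match the number of elements raises a ValueError:

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to infer that dimension from the array's size.
inferred = arr.reshape(5, -1)

# reshape() returns a new array object; arr itself keeps its shape.
original_shape = arr.shape

# The new shape must hold exactly the same number of elements.
try:
    arr.reshape(4, 6)   # 24 slots for 25 elements
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(inferred.shape, original_shape, reshape_failed)
```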

#### Method --> max()

max(): Returns the maximum along a given axis.

In [36]:
ranarr.max()

41

#### Method --> min()

min(): Returns the minimum along a given axis.

In [37]:
ranarr.min()

0

#### Method --> argmax():

argmax(): Returns the indices of the maximum values along an axis.

In [38]:
ranarr.argmax()

2

#### Method --> argmin()

argmin(): Returns the indices of the minimum values along an axis.

In [39]:
ranarr.argmin()

3
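
The descriptions above mention "along a given axis", which only matters once the array has more than one dimension. A small sketch with a made-up 2x3 array `grid`:

```python
import numpy as np

grid = np.array([[3, 7, 1],
                 [9, 2, 8]])

overall_max = grid.max()          # largest value in the whole array
col_max = grid.max(axis=0)        # one maximum per column
row_max = grid.max(axis=1)        # one maximum per row
row_argmax = grid.argmax(axis=1)  # column index of each row's maximum
flat_argmax = grid.argmax()       # without axis: index into the flattened array

print(overall_max, col_max, row_max, row_argmax, flat_argmax)
```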

#### Attribute --> shape

shape: The shape property is usually used to get the current shape of an array, but may also be used to reshape the array in-place by assigning a tuple of array dimensions to it.

In [40]:
arr.shape

(25,)

#### Attribute --> dtype

dtype: Gives the type of the elements of an array.

A NumPy array is homogeneous, and contains elements described by a dtype object. A dtype object can be constructed from different combinations of fundamental numeric types.

In [41]:
arr.dtype

dtype('int64')
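
The default integer dtype can depend on your platform and NumPy version, so you may see something other than int64. You can also choose the dtype yourself. A small sketch (variable names are made up for this example):

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3], dtype=np.float64)  # request a dtype explicitly
converted = ints.astype(np.float64)             # astype() returns a converted copy

# Because arrays are homogeneous, mixing ints and floats upcasts everything to float.
mixed = np.array([1, 2.5, 3])

print(ints.dtype, floats.dtype, converted.dtype, mixed.dtype)
```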

Hint: Instead of typing np.random.randint every time, you can import the function directly:

    from numpy.random import randint
    arr = randint(2, 10)  # a single random integer from 2 to 9

## NumPy Indexing and Selection

### Bracket Indexing and Selection

The simplest way to pick one or some elements of an array looks very similar to Python lists.

#### Creating sample array

In [42]:
arr = np.arange(0, 11)

In [43]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

#### Getting a value at an index

In [44]:
arr[8]

8

#### Getting values in a range

In [45]:
arr[1:5]

array([1, 2, 3, 4])

In [46]:
arr[0:5]

array([0, 1, 2, 3, 4])

### Broadcasting

NumPy arrays differ from normal Python lists in their ability to broadcast.

#### Setting a value with index range (Broadcasting):

In [47]:
arr[0:5]=100

In [48]:
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])
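
To see the contrast with lists, the sketch below (variable names are made up for this example) tries the same scalar assignment on a plain list, which raises a TypeError. It also shows that broadcasting works between arrays of compatible shapes, not just with single numbers.

```python
import numpy as np

# A plain list rejects scalar assignment to a slice.
numbers = list(range(11))
try:
    numbers[0:5] = 100
    list_accepts_scalar = True
except TypeError:
    list_accepts_scalar = False

# Broadcasting between arrays: a (3, 1) column combined with a (3,) row
# is stretched into a (3, 3) grid.
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = column + row

print(list_accepts_scalar)
print(grid)
```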

Resetting the array; we'll see why I had to reset it later on.

In [49]:
arr = np.arange(0, 11)

#### Important notes on Slices - Slicing the array

In [50]:
slice_of_arr = arr[0:6]

In [51]:
slice_of_arr

array([0, 1, 2, 3, 4, 5])

#### Changing Slice

In [52]:
slice_of_arr[:] = 99

In [53]:
slice_of_arr

array([99, 99, 99, 99, 99, 99])

#### Now note the changes also occur in our original array

In [54]:
arr

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

Note: The data is not copied; the slice is a view of the original array! NumPy does not automatically make copies when you slice, so changes made through the slice show up in the original. This avoids memory problems with large arrays.

#### To get a copy, it needs to be explicit: use the copy() method

#### Method --> copy()

In [55]:
arr_copy = arr.copy()

In [56]:
arr_copy

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])
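
If you are ever unsure whether you are holding a view or a copy, np.shares_memory() can tell you. A minimal sketch (variable names are made up for this example):

```python
import numpy as np

arr = np.arange(0, 11)
view = arr[0:6]             # slicing returns a view
copied = arr[0:6].copy()    # copy() allocates independent storage

copied[:] = -1              # leaves arr untouched
view[:] = 99                # writes through to arr

print(np.shares_memory(arr, view))    # the view uses arr's data
print(np.shares_memory(arr, copied))  # the copy does not
print(arr)
```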

### Indexing a 2D array (matrices)

The general format is arr_2d[row][col] or arr_2d[row,col]. I recommend usually using the comma notation for clarity.

#### Creating sample 2d array

In [57]:
arr_2d = np.array(([5, 10, 15], [20, 25, 30], [35, 40, 45]))

In [58]:
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

#### Indexing row

In [59]:
arr_2d[1]

array([20, 25, 30])

Format is arr_2d[row][col] or arr_2d[row,col]

#### Getting individual element value

In [60]:
arr_2d[1][0]

20

##### Second way, with a comma separator

In [61]:
arr_2d[1, 0]

20

#### 2D array slicing

Shape (2,2) from top right corner:

In [62]:
arr_2d[:2, 1:]

array([[10, 15],
       [25, 30]])

#### Selecting the bottom row

In [63]:
arr_2d[2]

array([35, 40, 45])

#### Selecting the bottom row - part 2

In [64]:
arr_2d[2, :]

array([35, 40, 45])

### Fancy Indexing

Fancy indexing allows you to select entire rows or columns out of order. To show this, let's quickly build a NumPy array.

#### Set up matrix

In [65]:
arr2d = np.zeros((10, 10))

#### Length of array

In [66]:
arr_length = arr2d.shape[0]  # number of rows, since the loop below fills the array row by row

In [67]:
arr_length

10

#### Setting up array

In [68]:
for i in range(arr_length):
    arr2d[i] = i

In [69]:
arr2d

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

#### Fancy indexing allows the following

In [70]:
arr2d[[2, 4, 6, 8]]

array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])

#### It allows selecting rows in any order

In [71]:
arr2d[[6, 4, 2, 7]]

array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
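
The examples above pick rows; the same idea works for columns and for individual elements. The sketch below uses a small made-up array `grid`, and also shows that, unlike a slice, a fancy-indexed result is a copy.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

selected = grid[:, [3, 0]]     # columns 3 and 0, in that order
pairs = grid[[0, 2], [1, 3]]   # the elements at (0, 1) and (2, 3)

# Unlike a slice, a fancy-indexed result is a copy: changing it leaves grid alone.
scratch = grid[[0, 2]]
scratch[:] = -1

print(selected)
print(pairs)
print(grid)
```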

### Conditional Selection

Let's briefly go over how to use brackets for selection based on comparison operators.

#### Setting up an array

In [72]:
arr = np.arange(1, 11)

In [73]:
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

#### Boolean selection --> boolean array

In [74]:
arr > 4

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

#### Showing the values of boolean array

In [75]:
bool_arr = arr > 4

In [76]:
arr[bool_arr]

array([ 5,  6,  7,  8,  9, 10])

IMPORTANT: Combining a comparison with bracket selection like this is the most useful way of choosing elements. We are going to use this pattern a lot in Pandas.

#### Putting it all together in one step

In [77]:
arr[arr > 5]

array([ 6,  7,  8,  9, 10])
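
Conditions can also be combined. Python's `and` / `or` keywords do not work on boolean arrays, so the sketch below uses the `&`, `|` and `~` operators instead, with each comparison wrapped in parentheses (variable names are made up for this example):

```python
import numpy as np

arr = np.arange(1, 11)

# & is "and", | is "or", ~ is "not"; each comparison needs its own parentheses.
between = arr[(arr > 3) & (arr < 8)]
outside = arr[(arr <= 3) | (arr >= 8)]
odd = arr[~(arr % 2 == 0)]

print(between)
print(outside)
print(odd)
```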

#### Quick creation of 2D array

In [78]:
arr_2D = np.arange(50).reshape(5, 10)

In [79]:
arr_2D

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

In [80]:
arr_2D > 20

array([[False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]])

In [81]:
arr_2D[arr_2D > 20]

array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
       38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

## NumPy Operations

### Arithmetic

#### creating simple array

In [82]:
arr = np.arange(0, 10)

In [83]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#### Add arrays

In [84]:
arrAdd = arr + arr

In [85]:
arrAdd

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

#### Multiply arrays

In [86]:
mulArray = arr * arr

In [87]:
mulArray

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

#### Subtract arrays

In [88]:
subArray = arr - arr

In [89]:
subArray

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

### Scalar Operations (i.e. Operations with Numbers)

#### Universal Array Functions

NumPy comes with many [universal array functions](http://docs.scipy.org/doc/numpy/reference/ufuncs.html), which are essentially just mathematical operations applied element by element across the array. Let's show some common ones:

#### Add with Scalar (Similar: Multiply, Subtract)

In [90]:
arrAddScalar = arr + 100

In [91]:
arrAddScalar

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

#### Divide arrays where an element is 0:

*Dividing by zero produces a warning, not an error! Here the first element of arr is 0, so 0 / 0 is replaced with nan.*

In [92]:
arrDiv = arr/arr

In [93]:
arrDiv

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

#### Divide a number by an array where an element is 0:

*Again a warning, not an error: here the first element of the array is 0, so 1 / 0 gives infinity (inf).*

In [94]:
arrDiv2 = 1 / arr

In [95]:
arrDiv2

array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])
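
If the warnings get in the way, there are a couple of ways to handle them. The sketch below is one possible approach, not the only one (variable names are made up for this example): np.errstate() silences the warnings locally, and np.divide() with the `where` and `out` arguments skips the zero entries altogether.

```python
import numpy as np

arr = np.arange(0, 10)

# Silence the warnings inside this block only; nan and inf are still produced.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr    # 0 / 0 -> nan
    inverse = 1 / arr    # 1 / 0 -> inf

# Or skip the zero entries: positions where `where` is False keep
# the value already stored in `out` (0.0 here).
safe_inverse = np.divide(1.0, arr, out=np.zeros(arr.shape), where=arr != 0)

print(ratio)
print(inverse)
print(safe_inverse)
```

Which of the two fits better depends on whether nan/inf in the result is acceptable for what you do next.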

#### Exponents

In [96]:
arrExp = arr ** 3

In [97]:
arrExp

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])

#### Square root

In [98]:
arrSqr = np.sqrt(arr)

In [99]:
arrSqr

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

#### Calculating Exponential (e^x)

In [100]:
arrExp = np.exp(arr)

In [101]:
arrExp

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

#### Another way of calculating the max (similar min)

In [102]:
arrMax = np.max(arr)

In [103]:
arrMax

9

#### Sin Calculation

In [104]:
arrSin = np.sin(arr)

In [105]:
arrSin

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

#### Log

In [106]:
arrLog = np.log(arr)

In [107]:
arrLog

array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])

#### Exponential to a number

In [108]:
arr**3

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])