# Intro to NumPy
NumPy is the fundamental library for scientific computing in Python. It forms the basis for most of the important data science libraries, such as pandas and scikit-learn.

The main data structure that NumPy provides is the n-dimensional array object, or **`ndarray`**. An ndarray may have any number of dimensions. In data science we typically work with two-dimensional tabular data of rows and columns, so we will begin by creating a two-dimensional array of random values drawn from a normal distribution and doing some basic analysis on it.
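
As a quick illustration of what "n-dimensional" means in practice, the sketch below builds a small two-dimensional array by hand and inspects it. The name `table` and its values are made up for this example; `ndim`, `shape`, and `dtype` are standard ndarray attributes.

```python
import numpy as np

# A small 2-D array: 2 rows, 3 columns (values chosen only for illustration)
table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

print(table.ndim)   # number of dimensions: 2
print(table.shape)  # (rows, columns): (2, 3)
print(table.dtype)  # element type: float64
```

Every element of an ndarray shares one `dtype`, which is part of what lets NumPy store the data compactly.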

In [2]:
import numpy as np

## Create first array
To get started, we create a 30 x 7 array of numbers drawn from a standard normal distribution (mean 0, standard deviation 1) and round the values to two decimal places. Seeding the random number generator makes the result reproducible.

In [16]:
np.random.seed(123)
array = np.random.randn(30, 7)
array = array.round(2)
array

array([[-1.09,  1.  ,  0.28, -1.51, -0.58,  1.65, -2.43],
       [-0.43,  1.27, -0.87, -0.68, -0.09,  1.49, -0.64],
       [-0.44, -0.43,  2.21,  2.19,  1.  ,  0.39,  0.74],
       [ 1.49, -0.94,  1.18, -1.25, -0.64,  0.91, -1.43],
       [-0.14, -0.86, -0.26, -2.8 , -1.77, -0.7 ,  0.93],
       [-0.17,  0.  ,  0.69, -0.88,  0.28, -0.81, -1.73],
       [-0.39,  0.57,  0.34, -0.01,  2.39,  0.41,  0.98],
       [ 2.24, -1.29, -1.04,  1.74, -0.8 ,  0.03,  1.07],
       [ 0.89,  1.75,  1.5 ,  1.07, -0.77,  0.79,  0.31],
       [-1.33,  1.42,  0.81,  0.05, -0.23, -1.2 ,  0.2 ],
       [ 0.47, -0.83,  1.16, -1.1 , -2.12,  1.04, -0.4 ],
       [-0.13, -0.84, -1.61,  1.26, -0.69,  1.66,  0.81],
       [-0.31, -1.09, -0.73, -1.21,  2.09,  0.16,  1.15],
       [-1.27,  0.18,  1.18, -0.34,  1.03, -1.08, -1.36],
       [ 0.38, -0.38,  0.64, -1.98,  0.71,  2.6 , -0.02],
       [ 0.03,  0.18, -1.86,  0.43, -1.61, -0.43,  1.24],
       [-0.74,  0.5 ,  1.01,  0.28, -1.37, -0.33,  1.96],
       [-2.03,
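
The cell above uses the legacy `np.random.seed` / `np.random.randn` interface. Newer NumPy versions also offer a `Generator` object, created with `np.random.default_rng`. The sketch below shows that alternative; the names `rng` and `sample` are chosen for this example, and the values it produces will not match the array printed above because the two interfaces use different underlying generators.

```python
import numpy as np

# Seeded generator object; independent of the global np.random.seed state
rng = np.random.default_rng(123)

# 30 rows and 7 columns drawn from a standard normal distribution
sample = rng.standard_normal((30, 7))
sample = sample.round(2)

print(sample.shape)  # (30, 7)
```

Passing the generator around explicitly avoids relying on hidden global state, which can make larger notebooks easier to reproduce.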

### Accessing elements
In native Python, the indexing operator (the brackets **[]**) selects items from a container. It is most commonly used with tuples, lists and dictionaries. ndarrays use the same operator for selection.

To select a single element, place the row index and the column index inside the brackets, separated by a comma.

In [17]:
array[10, 3]

-1.1000000000000001

In [18]:
# indexing is zero-based: row 0, column 0 is the first element
array[0, 0]

-1.0900000000000001

In [20]:
# select all the rows of the column at index 5 (the 6th column)
array[:, 5]

array([ 1.65,  1.49,  0.39,  0.91, -0.7 , -0.81,  0.41,  0.03,  0.79,
       -1.2 ,  1.04,  1.66,  0.16, -1.08,  2.6 , -0.43, -0.33,  1.61,
       -1.1 ,  1.52, -0.21, -1.41, -0.14,  1.52,  0.07,  0.09, -0.98,
        0.2 , -0.6 , -0.1 ])

In [22]:
# Use slice notation to select a block of data
array[5:10, 2:5]

array([[ 0.69, -0.88,  0.28],
       [ 0.34, -0.01,  2.39],
       [-1.04,  1.74, -0.8 ],
       [ 1.5 ,  1.07, -0.77],
       [ 0.81,  0.05, -0.23]])

In [24]:
# start:stop:step notation
array[3:18:5, ::2]

array([[ 1.49,  1.18, -0.64, -1.43],
       [ 0.89,  1.5 , -0.77,  0.31],
       [-1.27,  1.18,  1.03, -1.36]])
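
One detail worth knowing about slice notation: a basic slice of an ndarray is a view onto the same data, not a copy, so writing to the slice changes the original. The sketch below demonstrates this on a small made-up array (`grid`, `block`, and `safe` are names chosen for this example).

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

block = grid[0:2, 1:3]               # basic slice: a view onto grid
block[0, 0] = 99                     # writes through to grid[0, 1]
print(grid[0, 1])                    # 99

safe = grid[0:2, 1:3].copy()         # an explicit copy owns its own data
safe[0, 0] = -1                      # does not touch grid
print(grid[0, 1])                    # still 99

print(grid[-1, -1])                  # negative indices count from the end: 11
```

Call `.copy()` whenever a selected block needs to be modified without affecting the source array.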

## Operations on the entire array
Applying an operation to an entire array is easy and looks exactly as it would in normal mathematical notation. With Python lists, the same operations require an explicit loop or a list comprehension.
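
To make the contrast with lists concrete, here is a minimal sketch using a short made-up list (`values`) and the equivalent array (`arr`). Multiplying a list repeats it, and subtracting a number from a list is an error, whereas the array versions act on each element.

```python
import numpy as np

values = [1.5, -2.0, 0.5]

# On a list, * means repetition, not element-wise multiplication
print(values * 2)                 # [1.5, -2.0, 0.5, 1.5, -2.0, 0.5]

# Element-wise arithmetic on a list needs an explicit loop
print([v * 2 for v in values])    # [3.0, -4.0, 1.0]

# Subtraction is not defined for lists at all
try:
    values - 3
except TypeError as exc:
    print("lists do not support subtraction:", exc)

# The same operations on an ndarray apply to every element
arr = np.array(values)
doubled = arr * 2
shifted = arr - 3
print(doubled)
print(shifted)
```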

In [25]:
# multiply each element by 5
array * 5

array([[ -5.45,   5.  ,   1.4 ,  -7.55,  -2.9 ,   8.25, -12.15],
       [ -2.15,   6.35,  -4.35,  -3.4 ,  -0.45,   7.45,  -3.2 ],
       [ -2.2 ,  -2.15,  11.05,  10.95,   5.  ,   1.95,   3.7 ],
       [  7.45,  -4.7 ,   5.9 ,  -6.25,  -3.2 ,   4.55,  -7.15],
       [ -0.7 ,  -4.3 ,  -1.3 , -14.  ,  -8.85,  -3.5 ,   4.65],
       [ -0.85,   0.  ,   3.45,  -4.4 ,   1.4 ,  -4.05,  -8.65],
       [ -1.95,   2.85,   1.7 ,  -0.05,  11.95,   2.05,   4.9 ],
       [ 11.2 ,  -6.45,  -5.2 ,   8.7 ,  -4.  ,   0.15,   5.35],
       [  4.45,   8.75,   7.5 ,   5.35,  -3.85,   3.95,   1.55],
       [ -6.65,   7.1 ,   4.05,   0.25,  -1.15,  -6.  ,   1.  ],
       [  2.35,  -4.15,   5.8 ,  -5.5 , -10.6 ,   5.2 ,  -2.  ],
       [ -0.65,  -4.2 ,  -8.05,   6.3 ,  -3.45,   8.3 ,   4.05],
       [ -1.55,  -5.45,  -3.65,  -6.05,  10.45,   0.8 ,   5.75],
       [ -6.35,   0.9 ,   5.9 ,  -1.7 ,   5.15,  -5.4 ,  -6.8 ],
       [  1.9 ,  -1.9 ,   3.2 ,  -9.9 ,   3.55,  13.  ,  -0.1 ],
       [  0.15,   0.9 ,  

In [31]:
# subtract 3 from each element
array - 3

array([[-4.09, -2.  , -2.72, -4.51, -3.58, -1.35, -5.43],
       [-3.43, -1.73, -3.87, -3.68, -3.09, -1.51, -3.64],
       [-3.44, -3.43, -0.79, -0.81, -2.  , -2.61, -2.26],
       [-1.51, -3.94, -1.82, -4.25, -3.64, -2.09, -4.43],
       [-3.14, -3.86, -3.26, -5.8 , -4.77, -3.7 , -2.07],
       [-3.17, -3.  , -2.31, -3.88, -2.72, -3.81, -4.73],
       [-3.39, -2.43, -2.66, -3.01, -0.61, -2.59, -2.02],
       [-0.76, -4.29, -4.04, -1.26, -3.8 , -2.97, -1.93],
       [-2.11, -1.25, -1.5 , -1.93, -3.77, -2.21, -2.69],
       [-4.33, -1.58, -2.19, -2.95, -3.23, -4.2 , -2.8 ],
       [-2.53, -3.83, -1.84, -4.1 , -5.12, -1.96, -3.4 ],
       [-3.13, -3.84, -4.61, -1.74, -3.69, -1.34, -2.19],
       [-3.31, -4.09, -3.73, -4.21, -0.91, -2.84, -1.85],
       [-4.27, -2.82, -1.82, -3.34, -1.97, -4.08, -4.36],
       [-2.62, -3.38, -2.36, -4.98, -2.29, -0.4 , -3.02],
       [-2.97, -2.82, -4.86, -2.57, -4.61, -3.43, -1.76],
       [-3.74, -2.5 , -1.99, -2.72, -4.37, -3.33, -1.04],
       [-5.03,

## Vectorized Operations
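NumPy is very fast by Python standards. It is fast because array operations execute in pre-compiled C and Fortran code that is highly optimized for scientific computing, rather than in a Python-level loop. Applying an operation to a whole array at once in this way is called vectorization. The cells below add 1 to every element, first with a list comprehension and then with a vectorized expression, and time both.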

In [43]:
# grab the first column (array[:, 0] selects every row of column 0)
row = array[:, 0]
some_list = list(row)

In [44]:
print([x + 1 for x in some_list])

[-0.09000000000000008, 0.57000000000000006, 0.56000000000000005, 2.4900000000000002, 0.85999999999999999, 0.82999999999999996, 0.60999999999999999, 3.2400000000000002, 1.8900000000000001, -0.33000000000000007, 1.47, 0.87, 0.68999999999999995, -0.27000000000000002, 1.3799999999999999, 1.03, 0.26000000000000001, -1.0299999999999998, 1.8100000000000001, 1.3200000000000001, 0.18000000000000005, 2.54, -0.020000000000000018, 0.81000000000000005, 1.0800000000000001, 2.4699999999999998, 2.3999999999999999, 1.3900000000000001, 0.72999999999999998, 1.6899999999999999]


In [45]:
row + 1

array([-0.09,  0.57,  0.56,  2.49,  0.86,  0.83,  0.61,  3.24,  1.89,
       -0.33,  1.47,  0.87,  0.69, -0.27,  1.38,  1.03,  0.26, -1.03,
        1.81,  1.32,  0.18,  2.54, -0.02,  0.81,  1.08,  2.47,  2.4 ,
        1.39,  0.73,  1.69])

In [46]:
%timeit [x + 1 for x in some_list]

8.53 µs ± 296 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [47]:
%timeit row + 1

1.17 µs ± 71.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
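
The `%timeit` magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The array size of 100,000 and the names `big`, `big_list`, `loop_time`, and `vec_time` are assumptions made for this example; actual timings depend on the machine, so none are shown.

```python
import timeit
import numpy as np

# A larger 1-D array and the equivalent Python list
big = np.random.default_rng(0).standard_normal(100_000)
big_list = big.tolist()

# Total time for 20 repetitions of each approach
loop_time = timeit.timeit(lambda: [x + 1 for x in big_list], number=20)
vec_time = timeit.timeit(lambda: big + 1, number=20)

print(f"list comprehension: {loop_time:.4f} s")
print(f"vectorized:         {vec_time:.4f} s")
```

The gap between the two approaches generally widens as the array grows, because the fixed overhead of the NumPy call is spread over more elements.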


### Applying functions
It's easy to apply NumPy functions to every value in an array.

In [50]:
# absolute value
np.abs(array)

array([[ 1.09,  1.  ,  0.28,  1.51,  0.58,  1.65,  2.43],
       [ 0.43,  1.27,  0.87,  0.68,  0.09,  1.49,  0.64],
       [ 0.44,  0.43,  2.21,  2.19,  1.  ,  0.39,  0.74],
       [ 1.49,  0.94,  1.18,  1.25,  0.64,  0.91,  1.43],
       [ 0.14,  0.86,  0.26,  2.8 ,  1.77,  0.7 ,  0.93],
       [ 0.17,  0.  ,  0.69,  0.88,  0.28,  0.81,  1.73],
       [ 0.39,  0.57,  0.34,  0.01,  2.39,  0.41,  0.98],
       [ 2.24,  1.29,  1.04,  1.74,  0.8 ,  0.03,  1.07],
       [ 0.89,  1.75,  1.5 ,  1.07,  0.77,  0.79,  0.31],
       [ 1.33,  1.42,  0.81,  0.05,  0.23,  1.2 ,  0.2 ],
       [ 0.47,  0.83,  1.16,  1.1 ,  2.12,  1.04,  0.4 ],
       [ 0.13,  0.84,  1.61,  1.26,  0.69,  1.66,  0.81],
       [ 0.31,  1.09,  0.73,  1.21,  2.09,  0.16,  1.15],
       [ 1.27,  0.18,  1.18,  0.34,  1.03,  1.08,  1.36],
       [ 0.38,  0.38,  0.64,  1.98,  0.71,  2.6 ,  0.02],
       [ 0.03,  0.18,  1.86,  0.43,  1.61,  0.43,  1.24],
       [ 0.74,  0.5 ,  1.01,  0.28,  1.37,  0.33,  1.96],
       [ 2.03,

In [52]:
np.sqrt(np.abs(array)).round(2)

array([[ 1.04,  1.  ,  0.53,  1.23,  0.76,  1.28,  1.56],
       [ 0.66,  1.13,  0.93,  0.82,  0.3 ,  1.22,  0.8 ],
       [ 0.66,  0.66,  1.49,  1.48,  1.  ,  0.62,  0.86],
       [ 1.22,  0.97,  1.09,  1.12,  0.8 ,  0.95,  1.2 ],
       [ 0.37,  0.93,  0.51,  1.67,  1.33,  0.84,  0.96],
       [ 0.41,  0.  ,  0.83,  0.94,  0.53,  0.9 ,  1.32],
       [ 0.62,  0.75,  0.58,  0.1 ,  1.55,  0.64,  0.99],
       [ 1.5 ,  1.14,  1.02,  1.32,  0.89,  0.17,  1.03],
       [ 0.94,  1.32,  1.22,  1.03,  0.88,  0.89,  0.56],
       [ 1.15,  1.19,  0.9 ,  0.22,  0.48,  1.1 ,  0.45],
       [ 0.69,  0.91,  1.08,  1.05,  1.46,  1.02,  0.63],
       [ 0.36,  0.92,  1.27,  1.12,  0.83,  1.29,  0.9 ],
       [ 0.56,  1.04,  0.85,  1.1 ,  1.45,  0.4 ,  1.07],
       [ 1.13,  0.42,  1.09,  0.58,  1.01,  1.04,  1.17],
       [ 0.62,  0.62,  0.8 ,  1.41,  0.84,  1.61,  0.14],
       [ 0.17,  0.42,  1.36,  0.66,  1.27,  0.66,  1.11],
       [ 0.86,  0.71,  1.  ,  0.53,  1.17,  0.57,  1.4 ],
       [ 1.42,

In [55]:
# sum all elements in the array
array.sum()

2.6600000000000001

In [56]:
# the same sum, called as a NumPy function instead of a method
np.sum(array)

2.6600000000000001

In [58]:
# sum each row (one total per row) with the axis parameter
array.sum(axis=1)

array([-2.68,  0.05,  5.66, -0.68, -5.6 , -2.62,  4.29,  1.95,  5.54,
       -0.28, -1.78,  0.46,  0.06, -1.66,  1.95, -2.02,  1.31, -0.65,
       -1.64,  3.83,  1.72, -1.32,  0.43,  0.83, -1.59,  2.47, -3.95,
       -2.42,  1.36, -0.36])

In [60]:
# sum each column (one total per column)
array.sum(axis=0)

array([ 1.43,  1.12,  4.74, -8.15, -1.79,  7.05, -1.74])

In [61]:
# find max of each column
array.max(axis=0)

array([ 2.24,  1.75,  2.21,  2.19,  2.39,  2.6 ,  2.2 ])
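
The `axis` parameter is easier to follow on an array small enough to check by hand. In the sketch below, `small` is a made-up 2 x 3 array; `axis=0` collapses the rows and leaves one value per column, while `axis=1` collapses the columns and leaves one value per row.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_sums = small.sum(axis=0)   # one value per column: 1+4, 2+5, 3+6
row_sums = small.sum(axis=1)   # one value per row: 1+2+3, 4+5+6
col_max = small.max(axis=0)    # largest value in each column

print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
print(col_max)    # [4 5 6]

# keepdims=True keeps the collapsed axis with length 1
print(small.sum(axis=1, keepdims=True).shape)   # (2, 1)
```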

## Comparison operators
The six comparison operators (`<`, `>`, `<=`, `>=`, `==`, `!=`) are applied to each element of the array and return a boolean array of the same shape.

In [62]:
array > 0

array([[False,  True,  True, False, False,  True, False],
       [False,  True, False, False, False,  True, False],
       [False, False,  True,  True,  True,  True,  True],
       [ True, False,  True, False, False,  True, False],
       [False, False, False, False, False, False,  True],
       [False, False,  True, False,  True, False, False],
       [False,  True,  True, False,  True,  True,  True],
       [ True, False, False,  True, False,  True,  True],
       [ True,  True,  True,  True, False,  True,  True],
       [False,  True,  True,  True, False, False,  True],
       [ True, False,  True, False, False,  True, False],
       [False, False, False,  True, False,  True,  True],
       [False, False, False, False,  True,  True,  True],
       [False,  True,  True, False,  True, False, False],
       [ True, False,  True, False,  True,  True, False],
       [ True,  True, False,  True, False, False,  True],
       [False,  True,  True,  True, False, False,  True],
       [False,

In [63]:
# find out how many values are greater than 0
np.sum(array > 0)

104

In [65]:
# find the proportion of values greater than 0 (True counts as 1, False as 0)
np.mean(array > 0)

0.49523809523809526

In [66]:
# flag which values are strictly between -2 and 2
(array > -2) & (array < 2)

array([[ True,  True,  True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True, False, False,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True, False,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True, False,  True,  True],
       [False,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True, False,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True, False,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True, False,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [False,

In [69]:
# this should be about 95%: the share of a normal distribution within 2 standard deviations of the mean
((array > -2) & (array < 2)).mean()

0.93809523809523809
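
A boolean array like the one above can also be used as a mask to pull out the matching values. The sketch below uses a short made-up array (`data`) and then checks where the "about 95%" figure comes from, using `math.erf` from the standard library to compute the theoretical share of a standard normal distribution between -2 and 2.

```python
import math
import numpy as np

data = np.array([-2.5, -1.0, 0.0, 0.5, 1.5, 3.0])

# Parentheses are required: & binds more tightly than the comparisons
inside = (data > -2) & (data < 2)

print(inside.sum())    # number of values inside the interval: 4
print(data[inside])    # boolean mask selects only those values

# Theoretical probability that a standard normal value lies in (-2, 2)
expected = math.erf(2 / math.sqrt(2))
print(round(expected, 4))   # 0.9545
```

With only 210 values in the notebook's array, an observed share a point or two away from the theoretical one is ordinary sampling variation.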

### Resources
+ [NumPy's own tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
+ [Datacamp NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)