# Introduction to Numpy

# Lesson Goals

- Learn about Numpy data structures.
- Extract data from Numpy arrays.
- Convert other Python data structures to Numpy arrays.
- Apply basic mathematical functions to arrays and their elements.

# Introduction

Many of the libraries you will use to perform data analysis in Python, as well as many of the mathematical functions you'll use, will involve working with Numpy. Numpy (short for Numerical Python) is used for numeric computing and includes support for multi-dimensional arrays and matrices along with a variety of mathematical functions to apply to them. In this lesson, we will learn about Numpy's primary data structures and how to apply some basic math functions to them.

# Importing Numpy

In order to use Numpy, you must first import it. It is common to alias it to np using the as keyword, so that you don't have to spell out "numpy" every time you call one of its functions.

In [2]:
import numpy as np

Once the library has been imported, it is ready to use.

# Numpy Arrays

The basic data structure in Numpy is the array, which can hold data with any number of dimensions, including tabular data. You can think of an array as a list of lists in which every element has the same type (typically numeric, since the reason you use Numpy is to do numeric computing). A matrix is just a two-dimensional array.
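Because every element shares a single type, Numpy stores that type once, in the array's dtype attribute. The short sketch below (the variable names are only for illustration) shows that mixing integers and floats in the input produces an all-float array:

```python
import numpy as np

# Every element of an array shares one type, recorded in the dtype attribute
ints = np.array([1, 2, 3])
print(ints.dtype.kind)  # i (signed integer; the bit width depends on the platform)

# Mixing integers and floats converts every element to a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```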

The size of an array is the total number of elements it contains. The shape of an array is its length along each dimension (e.g. the number of rows and the number of columns of a two-dimensional array). Let's create a two-dimensional 10 x 4 array of random numbers between 0 and 1 and check its shape and size using the array's shape and size attributes.

In [3]:
a = np.random.random((10,4))
print(a)

[[0.35605017 0.38524524 0.47072    0.49673435]
 [0.81828885 0.13301275 0.26637166 0.56445544]
 [0.02512704 0.64748827 0.29202    0.49860129]
 [0.09972849 0.43157566 0.76987646 0.7441905 ]
 [0.73459093 0.56794237 0.79688258 0.75622804]
 [0.7857226  0.6559837  0.49756514 0.54911577]
 [0.68263944 0.04843655 0.32599514 0.31388399]
 [0.46836477 0.03065619 0.1130949  0.28266974]
 [0.39260549 0.86730788 0.82815227 0.70594094]
 [0.03339563 0.55960254 0.0830973  0.5396749 ]]
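Because these values are random, your output will differ from the one shown here. If you need the same numbers on every run (for example, to follow along exactly), one option is to seed the generator with np.random.seed before calling np.random.random. A minimal sketch; the seed value 42 is arbitrary:

```python
import numpy as np

# Seeding the global generator makes np.random.random repeatable
np.random.seed(42)
first = np.random.random((10, 4))

np.random.seed(42)
second = np.random.random((10, 4))

print(np.array_equal(first, second))  # True
# The values come from the half-open interval [0.0, 1.0)
print(first.min() >= 0.0, first.max() < 1.0)  # True True
```

Numpy 1.17 and later also provide np.random.default_rng(42), whose random() method draws values the same way without changing global state.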


In [4]:
print(a.shape)
print(a.size)

(10, 4)
40


As you can see, the array has a shape of 10 x 4 (just as we specified) and the total number of elements in the array is 40.
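The size is always the product of the entries in the shape. A quick check, which also uses the ndim attribute (the number of dimensions, equal to len(a_demo.shape)); a_demo is an illustrative name:

```python
import numpy as np

a_demo = np.random.random((10, 4))

print(a_demo.shape)  # (10, 4): one entry per dimension
print(a_demo.ndim)   # 2: the number of entries in shape
print(a_demo.size)   # 40: the product of the entries in shape
print(a_demo.size == np.prod(a_demo.shape))  # True
```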

Now that we have seen a basic two-dimensional array (a matrix), let's look at how Numpy handles arrays with more than two dimensions. Let's build a three-dimensional array of random numbers and see what it looks like.

In [5]:
b = np.random.random((5,2,3))
print(b)

[[[0.73971971 0.01026711 0.49900375]
  [0.03947863 0.03218962 0.51408865]]

 [[0.20802716 0.28796361 0.84686546]
  [0.19614132 0.92077357 0.76703768]]

 [[0.36593393 0.30031274 0.98408585]
  [0.02032803 0.19198159 0.46341594]]

 [[0.12761237 0.08935497 0.64080342]
  [0.53055958 0.69374832 0.74316982]]

 [[0.47244215 0.41407323 0.07888855]
  [0.0900746  0.02147194 0.74926884]]]


This created an array of five 2 x 3 matrices. Let's see what happens if we pass four dimensions.

In [6]:
c = np.random.random((2,3,4,5))
print(c)

[[[[0.68154854 0.18042147 0.40349528 0.76344208 0.6698664 ]
   [0.18715723 0.61454416 0.95597369 0.47432605 0.49271977]
   [0.08269307 0.54500014 0.35657123 0.91959349 0.63520896]
   [0.11937398 0.85895277 0.24254396 0.59102125 0.74851417]]

  [[0.7984734  0.27308116 0.81641581 0.5989713  0.99443571]
   [0.36376529 0.60895682 0.60959037 0.01914435 0.64946944]
   [0.20080742 0.16320944 0.51216232 0.26830048 0.52912079]
   [0.50165579 0.38505582 0.57508256 0.14364222 0.86433787]]

  [[0.4031412  0.48473825 0.18072352 0.02803713 0.27292662]
   [0.04123018 0.02702388 0.7666448  0.47056765 0.14296077]
   [0.19457347 0.63396627 0.36959045 0.98027551 0.41443794]
   [0.84084184 0.34260117 0.17393249 0.86259695 0.5532183 ]]]


 [[[0.95624444 0.07218699 0.40424841 0.33295088 0.44720556]
   [0.91008062 0.22556408 0.38317468 0.94615705 0.1711044 ]
   [0.88031367 0.80616281 0.17462107 0.87927019 0.68453916]
   [0.82868062 0.25749547 0.66842532 0.18192569 0.2299615 ]]

  [[0.85875292 0.57819435 ...

(output truncated)

This time, we got two groups of three 4 x 5 matrices.
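Random values make the grouping hard to see. The sketch below (b_demo and c_demo are illustrative names) uses np.arange, which fills an array with consecutive integers, together with reshape, to build arrays whose layout is easy to follow:

```python
import numpy as np

# arange fills the array with 0, 1, 2, ... so the grouping is easy to follow
b_demo = np.arange(30).reshape((5, 2, 3))
print(b_demo[0])  # the first of five 2 x 3 matrices
# [[0 1 2]
#  [3 4 5]]

# Read a shape from the outermost grouping to the innermost
c_demo = np.random.random((2, 3, 4, 5))
print(c_demo.ndim)      # 4
print(c_demo[0].shape)  # (3, 4, 5): each group holds three 4 x 5 matrices
print(c_demo.size)      # 120 = 2 * 3 * 4 * 5
```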


# Extracting Data from Arrays

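Indexing arrays works much like indexing other Python sequences: indexes start at 0, and you reference the positions of the values you want to extract. Unlike nested lists, Numpy arrays accept one comma-separated index per dimension (a[row, column]), and a colon (:) selects everything along a dimension. Below are some examples of how to reference specific rows, columns, and values in a two-dimensional array.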

In [7]:
# Print matrix a for reference, then its first row
print(a)
print(a[0])

[[0.35605017 0.38524524 0.47072    0.49673435]
 [0.81828885 0.13301275 0.26637166 0.56445544]
 [0.02512704 0.64748827 0.29202    0.49860129]
 [0.09972849 0.43157566 0.76987646 0.7441905 ]
 [0.73459093 0.56794237 0.79688258 0.75622804]
 [0.7857226  0.6559837  0.49756514 0.54911577]
 [0.68263944 0.04843655 0.32599514 0.31388399]
 [0.46836477 0.03065619 0.1130949  0.28266974]
 [0.39260549 0.86730788 0.82815227 0.70594094]
 [0.03339563 0.55960254 0.0830973  0.5396749 ]]
[0.35605017 0.38524524 0.47072    0.49673435]


In [8]:
# First column of matrix a
print(a[:,0])

[0.35605017 0.81828885 0.02512704 0.09972849 0.73459093 0.7857226
 0.68263944 0.46836477 0.39260549 0.03339563]


In [10]:
# Value in the fifth row and third column of matrix a
print(a[4,2])

0.7968825805292168
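A few more indexing patterns carry over from Python lists: negative indexes count from the end, and start:stop slices select a range (including start, excluding stop). The small matrix m below is only for illustration:

```python
import numpy as np

m = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(m[-1])       # last row: [ 8  9 10 11]
print(m[1, -1])    # last value in the second row: 7
print(m[:2, 1:3])  # first two rows, second and third columns
# [[1 2]
#  [5 6]]
```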


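What about arrays that have more than two dimensions? You pass one index per dimension, separated by commas, starting from the outermost dimension. If you supply fewer indexes than the array has dimensions, Numpy returns the full sub-array for the remaining dimensions.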

In [11]:
# First group of array c
print(c[0])

[[[0.68154854 0.18042147 0.40349528 0.76344208 0.6698664 ]
  [0.18715723 0.61454416 0.95597369 0.47432605 0.49271977]
  [0.08269307 0.54500014 0.35657123 0.91959349 0.63520896]
  [0.11937398 0.85895277 0.24254396 0.59102125 0.74851417]]

 [[0.7984734  0.27308116 0.81641581 0.5989713  0.99443571]
  [0.36376529 0.60895682 0.60959037 0.01914435 0.64946944]
  [0.20080742 0.16320944 0.51216232 0.26830048 0.52912079]
  [0.50165579 0.38505582 0.57508256 0.14364222 0.86433787]]

 [[0.4031412  0.48473825 0.18072352 0.02803713 0.27292662]
  [0.04123018 0.02702388 0.7666448  0.47056765 0.14296077]
  [0.19457347 0.63396627 0.36959045 0.98027551 0.41443794]
  [0.84084184 0.34260117 0.17393249 0.86259695 0.5532183 ]]]


In [16]:
# Second subgroup of the first group
print(c[0,1])

[[0.7984734  0.27308116 0.81641581 0.5989713  0.99443571]
 [0.36376529 0.60895682 0.60959037 0.01914435 0.64946944]
 [0.20080742 0.16320944 0.51216232 0.26830048 0.52912079]
 [0.50165579 0.38505582 0.57508256 0.14364222 0.86433787]]


In [18]:
# Third row of the second subgroup
print(c[0,1,2])

[0.20080742 0.16320944 0.51216232 0.26830048 0.52912079]


In [19]:
# Fourth column of the second subgroup
print(c[0,1,:,3])

[0.5989713  0.01914435 0.26830048 0.14364222]


In [21]:
# Value in the third row and fourth column of the second subgroup
print(c[0,1,2,3])

0.26830047882736874
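Comma-separated indexing selects the same values as applying one index at a time, and each index you supply removes one dimension from the result. A sketch using a fresh random array with the same shape as c (c_demo is an illustrative name):

```python
import numpy as np

c_demo = np.random.random((2, 3, 4, 5))

# One comma-separated index per dimension...
v1 = c_demo[0, 1, 2, 3]
# ...selects the same value as chaining single indexes
v2 = c_demo[0][1][2][3]
print(v1 == v2)  # True

# Each index removes one dimension from the result
print(c_demo[0].shape)           # (3, 4, 5)
print(c_demo[0, 1].shape)        # (4, 5)
print(c_demo[0, 1, :, 3].shape)  # (4,)
```

The comma form is the usual style, and it avoids creating an intermediate array at each step of the chain.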


# Converting Other Data Structures to Arrays

If your data is stored in another data structure, such as a list of lists, and you would like to use Numpy's mathematical functions on it, you can convert it to an array with the np.array() function as follows.

In [22]:
lst_lst = [[1,2,3],[4,5,6],[7,8,9]]
d = np.array(lst_lst)
print(d)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


This works the same way whether you have a list of lists, a list of tuples, a tuple of lists, or a tuple of tuples.
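A quick check that the outer and inner containers can be lists or tuples in any combination. To go in the other direction, the tolist() method returns nested Python lists:

```python
import numpy as np

from_lists = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
from_tuples = np.array(((1, 2, 3), (4, 5, 6), (7, 8, 9)))
from_mixed = np.array([(1, 2, 3), [4, 5, 6], (7, 8, 9)])

print(np.array_equal(from_lists, from_tuples))  # True
print(np.array_equal(from_lists, from_mixed))   # True
print(from_tuples.shape)                        # (3, 3)

# tolist() converts an array back into nested Python lists
print(from_tuples.tolist())  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The inner sequences should all have the same length; ragged input does not produce a regular numeric array.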




# Numpy Math Functions

Now that we know how to create and navigate arrays, let's take a look at how to perform mathematical calculations on them.

One of the most common (and useful) functions is np.sum, which adds up the elements of an array. By default it sums every element; passing axis=0 sums down each column, and axis=1 sums across each row.

In [23]:
# Sum of all elements in matrix a
print(np.sum(a))

18.619034960036906


In [25]:
# Sum of each column in matrix a
print(np.sum(a, axis=0))

[4.3965134  4.32725115 4.44377544 5.45149497]


In [26]:
# Sum of each row in matrix a
print(np.sum(a, axis=1))

[1.70874976 1.7821287  1.4632366  2.04537112 2.85564392 2.48838721
 1.37095511 0.89478559 2.79400657 1.21577037]
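With random values it can be hard to tell which axis does what. On a small matrix of known values (m is illustrative), axis=0 collapses the rows and leaves one total per column, while axis=1 leaves one total per row. Arrays also have an equivalent sum() method:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(m))          # 21
print(np.sum(m, axis=0))  # [5 7 9]: one total per column
print(np.sum(m, axis=1))  # [ 6 15]: one total per row
print(m.sum(axis=0))      # [5 7 9]: method form of np.sum(m, axis=0)
```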


In [27]:
# Sum of all the elements in the first two groups of array b
print(np.sum(b[:2]))

5.061556275675805
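np.sum also accepts an axis on arrays with more than two dimensions. In the sketch below (b_demo is illustrative), axis=0 adds the matrices in the stack together, element by element:

```python
import numpy as np

b_demo = np.arange(12).reshape((2, 2, 3))
# [[[ 0  1  2]
#   [ 3  4  5]]
#
#  [[ 6  7  8]
#   [ 9 10 11]]]

print(np.sum(b_demo))          # 66
print(np.sum(b_demo, axis=0))  # the two 2 x 3 matrices added together
# [[ 6  8 10]
#  [12 14 16]]
```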

The np.mean function works the same way, including the axis argument, but returns averages instead of totals.

In [28]:
# Mean of all elements in matrix a
print(np.mean(a))

0.46547587400092266


In [29]:
# Mean of each column in matrix a
print(np.mean(a, axis=0))

[0.43965134 0.43272511 0.44437754 0.5451495 ]


In [30]:
# Mean of each row in matrix a
print(np.mean(a, axis=1))

[0.42718744 0.44553218 0.36580915 0.51134278 0.71391098 0.6220968
 0.34273878 0.2236964  0.69850164 0.30394259]


In [31]:
# Mean of all the elements in the first two groups of array b
print(np.mean(b[:2]))

0.4217963563063171
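As a sanity check, the mean is the sum divided by the number of elements that were added up. The sketch below uses np.isclose and np.allclose, which compare floating-point values with a small tolerance, since the two calculations can differ in the last few digits (a_demo is an illustrative name):

```python
import numpy as np

a_demo = np.random.random((10, 4))

# Overall mean = total / number of elements
print(np.isclose(np.mean(a_demo), np.sum(a_demo) / a_demo.size))  # True
# Column means = column totals / number of rows
print(np.allclose(np.mean(a_demo, axis=0), np.sum(a_demo, axis=0) / a_demo.shape[0]))  # True
# Row means = row totals / number of columns
print(np.allclose(np.mean(a_demo, axis=1), np.sum(a_demo, axis=1) / a_demo.shape[1]))  # True
```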

In addition to letting you perform calculations on individual arrays, Numpy lets you perform calculations between arrays, pairing up the elements in matching positions. To illustrate, let's select two of the 4 x 5 subarrays from array c.

In [32]:
x = c[0,0]
print(x)

[[0.68154854 0.18042147 0.40349528 0.76344208 0.6698664 ]
 [0.18715723 0.61454416 0.95597369 0.47432605 0.49271977]
 [0.08269307 0.54500014 0.35657123 0.91959349 0.63520896]
 [0.11937398 0.85895277 0.24254396 0.59102125 0.74851417]]


In [33]:
y = c[0,1]
print(y)

[[0.7984734  0.27308116 0.81641581 0.5989713  0.99443571]
 [0.36376529 0.60895682 0.60959037 0.01914435 0.64946944]
 [0.20080742 0.16320944 0.51216232 0.26830048 0.52912079]
 [0.50165579 0.38505582 0.57508256 0.14364222 0.86433787]]


We can now add, subtract, multiply, and divide the two arrays element by element.

In [34]:
# Add elements of x and y together
print(np.add(x, y))

[[1.48002194 0.45350263 1.21991108 1.36241337 1.6643021 ]
 [0.55092253 1.22350098 1.56556406 0.49347041 1.14218921]
 [0.28350048 0.70820958 0.86873355 1.18789397 1.16432976]
 [0.62102977 1.24400859 0.81762651 0.73466347 1.61285204]]


In [35]:
# Subtract elements of x from elements of y
print(np.subtract(y, x))

[[ 0.11692485  0.09265968  0.41292053 -0.16447078  0.32456931]
 [ 0.17660806 -0.00558734 -0.34638332 -0.4551817   0.15674967]
 [ 0.11811435 -0.38179071  0.15559109 -0.65129301 -0.10608817]
 [ 0.38228181 -0.47389695  0.3325386  -0.44737903  0.1158237 ]]


In [36]:
# Multiply elements of x and y together
print(np.multiply(x, y))

[[0.54419838 0.0492697  0.32941992 0.45727989 0.66613906]
 [0.06808131 0.37423085 0.58275235 0.00908067 0.32000643]
 [0.01660538 0.08894917 0.18262235 0.24672737 0.33610227]
 [0.05988465 0.33074476 0.1394828  0.0848956  0.64696914]]


In [37]:
# Divide elements of y by elements of x
print(np.divide(y, x))

[[1.17155763 1.51357348 2.02335902 0.78456679 1.48452843]
 [1.94363468 0.99090815 0.63766438 0.04036117 1.31813147]
 [2.42834655 0.29946678 1.43635346 0.29175987 0.83298697]
 [4.20238812 0.4482852  2.37104469 0.2430407  1.15473815]]
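The standard arithmetic operators perform the same element-wise operations, so p + q is shorthand for np.add(p, q), and likewise for -, * and /. Note that * (np.multiply) is element-wise, not matrix multiplication; for the matrix product, Numpy provides the @ operator (np.matmul). A sketch with small illustrative matrices p and q:

```python
import numpy as np

p = np.array([[1.0, 2.0], [3.0, 4.0]])
q = np.array([[5.0, 6.0], [7.0, 8.0]])

# Each operator matches its named function
print(np.array_equal(p + q, np.add(p, q)))       # True
print(np.array_equal(q - p, np.subtract(q, p)))  # True
print(np.array_equal(p * q, np.multiply(p, q)))  # True
print(np.array_equal(q / p, np.divide(q, p)))    # True

print(p * q)  # element-wise product
# [[ 5. 12.]
#  [21. 32.]]
print(p @ q)  # matrix product
# [[19. 22.]
#  [43. 50.]]
```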


This is only the tip of the iceberg. Numpy has many more functions, which you can and should explore. You can read more about them in the [Numpy documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html). 