# NumPy

NumPy is a foundational package for data analysis in Python. It provides N-dimensional arrays and very efficient mathematical operations on large data sets. It can also be used to store and manipulate tabular data.

## Import Modules

By convention, NumPy is imported as `np`.

In [1]:
import numpy as np

## Creating N-dimensional arrays using NumPy

There are many ways to create N-dimensional arrays.

Create a 2x3 array initialized to all zeros

In [2]:
a = np.zeros((2,3))
a

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Create array initialized by list of lists

In [6]:
a = np.array([[0,1,2],[3,4,5]])
a

array([[0, 1, 2],
       [3, 4, 5]])

Create a 1D array using the `arange` function

In [4]:
np.arange(6)

array([0, 1, 2, 3, 4, 5])
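A few other common constructors, shown as a short sketch. These are standard NumPy functions; the particular shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

ones = np.ones((2, 3))            # 2x3 array of 1.0
sevens = np.full((2, 2), 7)       # 2x2 array filled with 7
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values from 0 to 1, inclusive
eye = np.eye(3)                   # 3x3 identity matrix
m = np.arange(6).reshape(2, 3)    # arange + reshape gives a 2D array directly

print(grid)
print(m)
```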

## Get values from N-dimensional array


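NumPy provides many ways to extract data from arrays. Indexing takes one index or slice per axis, written `a[row, column]` for a 2D array, and `:` selects an entire axis.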

Get single element of 2D array

In [7]:
a[0,0]      # a scalar, not an array

0

Get first row of 2D array

In [8]:
a[0,:]      # 1D array

array([0, 1, 2])

Get last column of array

In [9]:
a[:,-1]     # 1D array

array([2, 5])

Get sub-matrix of 2D array

In [10]:
a[0:2,1:3]  # 2D array

array([[1, 2],
       [4, 5]])
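Besides integers and slices, NumPy also accepts boolean masks and integer arrays as indices. A small sketch using the same 2x3 array:

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])

mask = a > 2                 # boolean array with the same shape as a
big = a[mask]                # 1D array of the elements where mask is True
rows = a[[1, 0], :]          # integer-array ("fancy") indexing reorders rows
corners = a[[0, 1], [0, 2]]  # picks elements (0, 0) and (1, 2)
```

Unlike slices, boolean and integer-array indexing return copies, so modifying `big` does not change `a`.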

## Modifying N-dimensional arrays

The same indexing syntax is used on the left-hand side of an assignment to modify arrays in place.

In [11]:
a

array([[0, 1, 2],
       [3, 4, 5]])

Assign single value to single element of 2D array

In [12]:
a[0,0] = 25.0   # stored as 25: values are cast to the array's integer dtype
a

array([[25,  1,  2],
       [ 3,  4,  5]])
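Because `a` was created from a list of integers, it has an integer dtype, and assigned floats are cast to it, which drops any fractional part. A sketch of the effect, and of converting with `astype` when fractions must be kept:

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])   # integer dtype inferred from the list
a[0, 0] = 25.7                          # cast to the integer dtype; the fraction is dropped
print(a[0, 0])

f = a.astype(np.float64)                # astype returns a new float copy
f[0, 0] = 25.7                          # the fractional part is kept here
```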

Assign 1D array to first row of 2D array

In [13]:
a[0,:] = np.array([10,11,12])
a

array([[10, 11, 12],
       [ 3,  4,  5]])

Assign 1D array to last column of 2D array

In [14]:
a[:,-1] = np.array([20,21])
a

array([[10, 11, 20],
       [ 3,  4, 21]])

Assign 2D array to sub-matrix of 2D array

In [15]:
a[0:2,1:3] = np.array([[10,11],[20,21]])
a

array([[10, 10, 11],
       [ 3, 20, 21]])

## Modifying arrays using broadcasting
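
When the value on the right-hand side has a different shape from the target, NumPy broadcasts it: missing leading dimensions and dimensions of length 1 are stretched to match the target's shape.
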
Assign scalar to first row of 2D array

In [16]:
a[0,:] = 10.0
a

array([[10, 10, 10],
       [ 3, 20, 21]])

Assign 1D array to all rows of 2D array

In [17]:
a[:,:] = np.array([30,31,32])
a

array([[30, 31, 32],
       [30, 31, 32]])

Assign a column vector (shape `(2, 1)`) to all columns of 2D array

In [18]:
tmp = np.array([40,41]).reshape(2,1)
tmp

array([[40],
       [41]])

In [19]:
a[:,:] = tmp
a

array([[40, 40, 40],
       [41, 41, 41]])

Assign scalar to sub-matrix of 2D array

In [20]:
a[0:2,1:3] = 100.0
a

array([[ 40, 100, 100],
       [ 41, 100, 100]])
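Arithmetic follows the same broadcasting rules as the assignments above: shapes are compared from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. A minimal sketch:

```python
import numpy as np

col = np.array([40, 41]).reshape(2, 1)   # shape (2, 1)
row = np.array([0, 1, 2])                # shape (3,)

table = col + row                        # both broadcast to shape (2, 3)

try:
    np.zeros((2, 3)) + np.array([1, 2])  # trailing dims 3 and 2 are incompatible
except ValueError as err:
    print("cannot broadcast:", err)
```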

## Math on Arrays

Operate on arrays using binary operators and NumPy functions

Create 1D array

In [21]:
a = np.arange(4)
a

array([0, 1, 2, 3])

Add 1D arrays elementwise

In [22]:
a + a

array([0, 2, 4, 6])

Multiply 1D arrays elementwise

In [23]:
a * a

array([0, 1, 4, 9])
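NumPy's mathematical functions (its "universal functions") also work elementwise on whole arrays. A short sketch with a few of them:

```python
import numpy as np

a = np.arange(4)
roots = np.sqrt(a)          # elementwise square root, returns a float array
squares = np.power(a, 2)    # same result as a ** 2
clipped = np.minimum(a, 2)  # elementwise minimum against a scalar
```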

Multiply by a scalar

In [24]:
3 * a

array([0, 3, 6, 9])

Compare with a plain Python list, where scaling each element requires a loop or comprehension:

In [25]:
my_list = [0., 1., 2.]

[3*x for x in my_list]

[0.0, 3.0, 6.0]
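Note that `*` means something different for lists: multiplying a list by an integer repeats it rather than scaling its elements. A quick sketch of the contrast:

```python
import numpy as np

my_list = [0., 1., 2.]
repeated = 3 * my_list          # list repetition: 9 elements, no arithmetic
scaled = 3 * np.array(my_list)  # elementwise multiplication
```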

Sum elements of 1D array

In [29]:
a.sum()

6
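Reductions such as `sum`, `max`, and `mean` also accept an `axis` argument on multidimensional arrays. A sketch on a 2x3 array:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
total = m.sum()                  # 15, sum of every element
col_sums = m.sum(axis=0)         # collapse rows, one value per column: [3, 5, 7]
row_max = m.max(axis=1)          # collapse columns, one value per row: [2, 5]
means = m.mean(axis=0)           # [1.5, 2.5, 3.5]
```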

Compute dot product

In [30]:
np.dot(a, a)

14
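In Python 3.5 and later, the `@` operator performs the same product as `np.dot` for 1D and 2D arrays: an inner product for vectors and matrix multiplication for matrices. A sketch with a small 2x2 matrix chosen for illustration:

```python
import numpy as np

a = np.arange(4)
m = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])

inner = a @ a      # same dot product as np.dot(a, a)
mv = m @ v         # matrix-vector product: [3, 7]
mm = m @ m         # matrix-matrix product: [[7, 10], [15, 22]]
```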

Compute outer product (column vector times row vector)

In [31]:
np.dot(a.reshape(4,1), a.reshape(1,4))

array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6],
       [0, 3, 6, 9]])
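Multiplying a column vector by a row vector gives the outer product, which `np.outer` computes directly. The vector cross product is a different operation, provided by `np.cross` for 2- and 3-element vectors. A sketch of both:

```python
import numpy as np

a = np.arange(4)
outer = np.outer(a, a)   # same 4x4 result as reshaping and using np.dot

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
z = np.cross(x, y)       # cross product of two 3-vectors: [0, 0, 1]
```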

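## NumPy with Tabular Data

`np.genfromtxt` reads a delimited text file into an array. Passing one data type per column produces a structured array, where each column (field) can have its own type.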

In [32]:
solar_system_file = 'solar_system_abbr.csv' 
solar_system_data = np.genfromtxt(solar_system_file, delimiter=',', skip_header=1,
                                  dtype=['S10', 'S20', 'i4', 'f4' , 'f4'])
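Since `solar_system_abbr.csv` is not reproduced here, the sketch below feeds `np.genfromtxt` a small inline CSV built from three rows of the output that follows. The header line is a hypothetical stand-in (the real file's header is not shown), and the field names given in `dtype` are chosen for illustration. Using `'U'` (Unicode) instead of `'S'` (bytes) makes the text columns ordinary Python strings under Python 3.

```python
import io
import numpy as np

# Hypothetical stand-in for solar_system_abbr.csv; the header text is assumed.
csv_text = """name,type,order,diameter,mass
Sun,Star,0,1392000,333000
Earth,Terrestrial planet,3,12756,1
Jupiter,Gas giant,6,142800,318
"""

# One (name, type) pair per column gives a structured array with named fields.
dtype = [("planet", "U10"), ("type", "U20"), ("order", "i4"),
         ("diameter", "f4"), ("mass", "f4")]

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     skip_header=1, dtype=dtype, encoding="utf-8")
print(data["planet"])   # access a whole column by field name
```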

In [34]:
solar_system_data

array([('Sun', 'Star',  0,   1.39200000e+06,   3.33000000e+05),
       ('Mercury', 'Terrestrial planet',  1,   4.87800000e+03,   5.49999997e-02),
       ('Venus', 'Terrestrial planet',  2,   1.21040000e+04,   8.14999998e-01),
       ('Earth', 'Terrestrial planet',  3,   1.27560000e+04,   1.00000000e+00),
       ('Mars', 'Terrestrial planet',  4,   6.78700000e+03,   1.07000001e-01),
       ('Jupiter', 'Gas giant',  6,   1.42800000e+05,   3.18000000e+02),
       ('Saturn', 'Gas giant',  7,   1.20000000e+05,   9.50000000e+01),
       ('Uranus', 'Gas giant',  8,   5.11180000e+04,   1.50000000e+01),
       ('Neptune', 'Gas giant',  9,   4.95280000e+04,   1.70000000e+01),
       ('Ceres', 'Dwarf planet',  5,   9.74599976e+02,   1.59999996e-04),
       ('Pluto', 'Dwarf planet', 10,   2.30000000e+03,   2.00000009e-03),
       ('Haumea', 'Dwarf planet', 11,   1.30000000e+03,   6.99999975e-04),
       ('Makemake', 'Dwarf planet', 12,   1.42000000e+03,   6.69999979e-04),
       ('Eris', 'Dwarf pl

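Because a list of column types was passed to `genfromtxt`, this is a structured array whose fields get default names (`f0` to `f4`). Assigning to `dtype.names` gives the columns meaningful names so they can be accessed by name.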

In [35]:
solar_system_data.dtype.names = ('planet', 'type', 'order', 'diameter', 'mass')
solar_system_data

array([('Sun', 'Star',  0,   1.39200000e+06,   3.33000000e+05),
       ('Mercury', 'Terrestrial planet',  1,   4.87800000e+03,   5.49999997e-02),
       ('Venus', 'Terrestrial planet',  2,   1.21040000e+04,   8.14999998e-01),
       ('Earth', 'Terrestrial planet',  3,   1.27560000e+04,   1.00000000e+00),
       ('Mars', 'Terrestrial planet',  4,   6.78700000e+03,   1.07000001e-01),
       ('Jupiter', 'Gas giant',  6,   1.42800000e+05,   3.18000000e+02),
       ('Saturn', 'Gas giant',  7,   1.20000000e+05,   9.50000000e+01),
       ('Uranus', 'Gas giant',  8,   5.11180000e+04,   1.50000000e+01),
       ('Neptune', 'Gas giant',  9,   4.95280000e+04,   1.70000000e+01),
       ('Ceres', 'Dwarf planet',  5,   9.74599976e+02,   1.59999996e-04),
       ('Pluto', 'Dwarf planet', 10,   2.30000000e+03,   2.00000009e-03),
       ('Haumea', 'Dwarf planet', 11,   1.30000000e+03,   6.99999975e-04),
       ('Makemake', 'Dwarf planet', 12,   1.42000000e+03,   6.69999979e-04),
       ('Eris', 'Dwarf pl

In [36]:
solar_system_data['mass'].mean()

23817.641
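Named fields combine naturally with boolean masks to filter rows. The sketch below builds a small structured array by hand from diameters (km) listed in the table above and averages the gas giants. With the `'S'` dtypes used in the original cell, the text fields hold bytes under Python 3, so the comparison there would need `b'Gas giant'` instead of a `str`.

```python
import numpy as np

# Small hand-built subset of the table above (diameters in km).
dtype = [("planet", "U10"), ("type", "U20"), ("diameter", "f8")]
planets = np.array([("Earth", "Terrestrial planet", 12756.0),
                    ("Jupiter", "Gas giant", 142800.0),
                    ("Saturn", "Gas giant", 120000.0),
                    ("Uranus", "Gas giant", 51118.0),
                    ("Neptune", "Gas giant", 49528.0)], dtype=dtype)

is_giant = planets["type"] == "Gas giant"           # boolean mask over rows
giant_mean = planets["diameter"][is_giant].mean()   # (142800 + 120000 + 51118 + 49528) / 4
```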

In [37]:
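import math
radius_of_earth = float(solar_system_data['diameter'][3]) / 2.   # row 3 is Earth; diameter in km
volume_of_earth = 4./3. * math.pi * radius_of_earth**3           # 4./3. avoids integer division in Python 2
volume_of_earth   # roughly 1.087e12 cubic km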
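The same formula, V = 4/3 * pi * r^3, can be applied to a whole column at once instead of one element at a time. A sketch using the terrestrial-planet diameters (km) from the table above:

```python
import numpy as np

# Diameters in km for Mercury, Venus, Earth, Mars (values from the table above).
diameters = np.array([4878.0, 12104.0, 12756.0, 6787.0])
volumes = 4.0 / 3.0 * np.pi * (diameters / 2.0) ** 3   # one volume per planet, in km^3
earth_volume = volumes[2]
```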

## Linear Algebra

In [38]:
from numpy import linalg
a = np.array([[1, 2], [3, 4]], dtype=np.float64)

Compute the inverse matrix

In [39]:
linalg.inv(a)

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
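A quick sanity check is that the matrix times its inverse gives the identity, up to rounding. For solving a linear system, `linalg.solve` is generally preferred over forming the inverse explicitly. A sketch with an arbitrary right-hand side `b`:

```python
import numpy as np
from numpy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
a_inv = linalg.inv(a)

identity_check = np.allclose(a @ a_inv, np.eye(2))  # True up to floating-point rounding

b = np.array([5.0, 6.0])   # arbitrary right-hand side chosen for illustration
x = linalg.solve(a, b)     # solves a @ x = b without computing inv(a)
```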

Compute singular value decomposition

In [40]:
linalg.svd(a)

(array([[-0.40455358, -0.9145143 ],
        [-0.9145143 ,  0.40455358]]),
 array([ 5.4649857 ,  0.36596619]),
 array([[-0.57604844, -0.81741556],
        [ 0.81741556, -0.57604844]]))
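`linalg.svd` returns the factors `U`, the singular values `s` (as a 1D array, in descending order), and `V^T`. Multiplying them back together reproduces the original matrix, as this sketch checks:

```python
import numpy as np
from numpy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
u, s, vt = linalg.svd(a)        # s is a 1D array of singular values
rebuilt = u @ np.diag(s) @ vt   # U @ diag(s) @ V^T reproduces a
```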

Compute eigenvalues

In [41]:
linalg.eigvals(a)

array([-0.37228132,  5.37228132])
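When the eigenvectors are also needed, `linalg.eig` returns both the eigenvalues and a matrix whose columns are the eigenvectors. This sketch checks the defining relation `a @ v = w * v` for each pair:

```python
import numpy as np
from numpy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
w, v = linalg.eig(a)   # eigenvalues w; eigenvectors are the columns of v

for i in range(len(w)):
    # each column satisfies a @ v[:, i] == w[i] * v[:, i] up to rounding
    assert np.allclose(a @ v[:, i], w[i] * v[:, i])
```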

## NumPy Views

* Views are arrays that share memory with another array.
* Views can make your program more memory- and CPU-efficient, because no data is copied.
* Views can be created explicitly with the `view` method.
* `reshape` and `transpose` implicitly return views of the original array (`reshape` falls back to a copy when a view is not possible).
* Arrays produced by basic slicing are views of the original; indexing with integer arrays or boolean masks returns a copy.
* Use the `copy` method to avoid sharing memory.
* Set the writeable flag (`a.flags.writeable = False`) to make a view read-only.
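The rules in the list above can be checked with `np.shares_memory`, which reports whether two arrays overlap in memory. A sketch covering reshape, transpose, slicing, integer-array indexing, and a read-only view:

```python
import numpy as np

a = np.arange(6)

r = a.reshape(2, 3)   # reshape of a contiguous array returns a view
t = r.T               # transpose returns a view
s = a[::2]            # basic slicing returns a view
f = a[[0, 2, 4]]      # integer-array indexing returns a copy

print(np.shares_memory(a, r), np.shares_memory(a, t),
      np.shares_memory(a, s), np.shares_memory(a, f))

ro = a.view()
ro.flags.writeable = False   # this view is read-only; a itself stays writeable
try:
    ro[0] = 99
except ValueError as err:
    print("read-only:", err)
```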

In [42]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [44]:
b = a[2::2]
b

array([2, 4, 6, 8])

You can check whether an array owns its memory with `.flags.owndata`; a view does not.

In [46]:
a.flags.owndata, b.flags.owndata

(True, False)

Updating an element of a view also updates the corresponding element of the original array.



In [47]:
b[0] = 100
a

array([  0,   1, 100,   3,   4,   5,   6,   7,   8,   9])

Copying a view (or any array) creates an independent array that owns its own data.

In [48]:
d = a.copy()
d.flags.owndata, np.shares_memory(a, d)

(True, False)
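The practical difference shows up when writing: changes made through a view reach the original array, while changes to a copy do not. A sketch mirroring the cells above:

```python
import numpy as np

a = np.arange(10)
view = a[2::2]   # view of every second element starting at index 2
d = a.copy()     # independent copy, taken before any writes

view[0] = 100    # writes through to a[2]
d[1] = -1        # changes only the copy
```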