# NumPy

NumPy is a foundational package for data analysis in Python. It enables efficient mathematical operations on large data sets, and it can also store and manipulate tabular data.

Refer to the excellent docs (with examples!): https://docs.scipy.org/doc/

## We'll cover

- creating and accessing arrays
- broadcasting
- some math examples
- tabular data
- views


## Importing

By convention, NumPy is imported as `np`

In [1]:
import numpy as np

## Creating N-dimensional arrays using NumPy

Create a 2x3 array initialized to all zeros

In [2]:
a = np.zeros((2,3))
a

array([[0., 0., 0.],
       [0., 0., 0.]])

Create an array initialized from a list of lists

In [3]:
a = np.array([[0,1,2],[3,4,5]])
a

array([[0, 1, 2],
       [3, 4, 5]])

Create array using the `arange` and `reshape` functions

In [4]:
a = np.arange(6)
print(a)
a = np.reshape(a, (2,3))
print(a)

[0 1 2 3 4 5]
[[0 1 2]
 [3 4 5]]
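As a small sketch of the same idea, `reshape` is also available as an array method, and passing `-1` for one dimension asks NumPy to infer it from the array's size. The variable names `b` and `c` are only illustrative.

```python
import numpy as np

a = np.arange(6)        # 1D array holding 0..5
b = a.reshape(2, 3)     # method form of np.reshape(a, (2, 3))
c = a.reshape(-1, 2)    # -1 lets NumPy infer the first dimension (3 here)

print(b.shape, b.ndim)  # shape tuple and number of dimensions
print(c.shape)
```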


## Get values from N-dimensional array


NumPy provides many ways to extract data from arrays

Get single element of 2D array

In [5]:
a[0,0]      # a scalar, not an array

0

Get first row of 2D array

In [6]:
a[0,:]      # 1D array

array([0, 1, 2])

Get last column of array

In [7]:
a[:,-1]     # 1D array

array([2, 5])

Get sub-matrix of 2D array

In [8]:
a[0:2,1:3]  # 2D array

array([[1, 2],
       [4, 5]])
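Beyond slices, two other common ways to extract data are boolean masks and lists of indices. The following is a minimal sketch on the same 2x3 array; the names `mask`, `selected`, and `cols` are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

mask = a > 2          # boolean array with the same shape as a
selected = a[mask]    # 1D array of the elements where mask is True
cols = a[:, [0, 2]]   # first and last columns, chosen by an index list

print(selected)
print(cols)
```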

## Modifying N-dimensional arrays

NumPy uses the same basic syntax for modifying arrays

Assign single value to single element of 2D array

In [9]:
a[0,0] = 25.0   # a holds integers, so the float is cast and 25 is stored
a

array([[25,  1,  2],
       [ 3,  4,  5]])
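Note that the output above shows `25`, not `25.`: assignment never changes an array's dtype. The sketch below makes the consequence visible with a non-integral value; converting with `astype` first is one way to keep the fraction.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])  # integer dtype inferred from the values
a[0, 0] = 25.7                        # cast to integer on assignment: fraction is dropped

b = a.astype(np.float64)              # a new float array (a copy)
b[0, 0] = 25.7                        # stored as given

print(a[0, 0], b[0, 0])
```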

Assign 1D array to first row of 2D array

In [10]:
a[0,:] = np.array([10,11,12])
a

array([[10, 11, 12],
       [ 3,  4,  5]])

Assign 1D array to last column of 2D array

In [11]:
a[:,-1] = np.array([20,21])
a

array([[10, 11, 20],
       [ 3,  4, 21]])

Assign 2D array to sub-matrix of 2D array

In [12]:
a[0:2,1:3] = np.array([[10,11],[20,21]])
a

array([[10, 10, 11],
       [ 3, 20, 21]])

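## Modifying arrays using broadcasting

Broadcasting stretches a scalar or a lower-dimensional array to match the shape of the assignment target, so one value or one row can fill many elements

Assign scalar to first row of 2D array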

In [13]:
a[0,:] = 10.0
a

array([[10, 10, 10],
       [ 3, 20, 21]])

Assign 1D array to all rows of 2D array

In [14]:
a[:,:] = np.array([30,31,32])
a

array([[30, 31, 32],
       [30, 31, 32]])

Assign 1D array to all columns of 2D array

In [15]:
tmp = np.array([40,41]).reshape(2,1)
tmp

array([[40],
       [41]])

In [16]:
a[:,:] = tmp
a

array([[40, 40, 40],
       [41, 41, 41]])

Assign scalar to sub-matrix of 2D array

In [17]:
a[0:2,1:3] = 100.0
a

array([[ 40, 100, 100],
       [ 41, 100, 100]])
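The assignments above work because the right-hand side can be broadcast to the target's shape. As a hedged sketch, `np.broadcast_to` makes that expansion explicit and shows why the two-element array had to be reshaped to `(2, 1)` before it could fill the columns. The names `target_shape` and `compatible` are illustrative.

```python
import numpy as np

target_shape = (2, 3)
row = np.array([30, 31, 32])            # shape (3,): matches the last axis
col = np.array([40, 41]).reshape(2, 1)  # shape (2, 1): matches the first axis

print(np.broadcast_to(row, target_shape))  # row repeated down the rows
print(np.broadcast_to(col, target_shape))  # column repeated across the columns

# A plain shape-(2,) array does not line up with the last axis (length 3)
try:
    np.broadcast_to(np.array([40, 41]), target_shape)
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```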

## Math on Arrays

Operate on arrays using binary operators and NumPy functions

Create 1D array

In [18]:
a = np.arange(4)
a

array([0, 1, 2, 3])

Add 1D arrays elementwise

In [19]:
a + a

array([0, 2, 4, 6])

Multiply 1D arrays elementwise

In [20]:
a * a

array([0, 1, 4, 9])

Multiply by a scalar

In [21]:
3 * a

array([0, 3, 6, 9])

Compare to lists, which need an explicit loop or comprehension:

In [22]:
my_list = [0., 1., 2.]

[3*x for x in my_list]

[0.0, 3.0, 6.0]
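The comprehension is needed because `*` means something different for lists. A short sketch of the contrast, with illustrative names `repeated` and `scaled`:

```python
import numpy as np

my_list = [0., 1., 2.]

repeated = 3 * my_list         # list repetition: the list concatenated 3 times
scaled = 3 * np.array(my_list) # elementwise multiplication

print(repeated)
print(scaled)
```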

Sum elements of 1D array

In [23]:
a.sum()

6

Compute dot product

In [24]:
np.dot(a, a)

14
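For 1D arrays the dot product is the sum of the elementwise products, and the `@` operator is an equivalent spelling. For 2D arrays `@` performs matrix multiplication, which differs from the elementwise `*`. The matrix `m` below is an illustrative example, not part of the notebook.

```python
import numpy as np

a = np.arange(4)

print((a * a).sum())  # same value as np.dot(a, a)
print(a @ a)          # operator form of the dot product

m = np.arange(4).reshape(2, 2)
print(m * m)          # elementwise product
print(m @ m)          # matrix product
```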

## Linear Algebra

In [25]:
from numpy import linalg
a = np.array([[1, 2], [3, 4]], dtype=np.float64)

Compute the inverse matrix

In [26]:
linalg.inv(a)

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
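One way to sanity-check the result is to multiply the matrix by its inverse and compare against the identity, using `np.allclose` to tolerate floating-point rounding. The sketch also shows `linalg.solve`, which solves a linear system without forming the inverse; the right-hand side `rhs` is a made-up example.

```python
import numpy as np
from numpy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
a_inv = linalg.inv(a)

# a times its inverse should be the 2x2 identity, up to rounding
print(np.allclose(a @ a_inv, np.eye(2)))

rhs = np.array([5.0, 11.0])  # hypothetical right-hand side
x = linalg.solve(a, rhs)     # solves a @ x = rhs
print(x)
```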

Compute singular value decomposition

In [27]:
u, s, vh = linalg.svd(a)
print('u: ', u, '\ns: ', s, '\nvh: ', vh)

u:  [[-0.40455358 -0.9145143 ]
 [-0.9145143   0.40455358]] 
s:  [5.4649857  0.36596619] 
vh:  [[-0.57604844 -0.81741556]
 [ 0.81741556 -0.57604844]]
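The three factors can be checked by rebuilding the matrix. `s` is returned as a 1D array of singular values, so it is placed on a diagonal with `np.diag` before multiplying. This is a minimal sketch for the square case shown above.

```python
import numpy as np
from numpy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
u, s, vh = linalg.svd(a)

reconstructed = u @ np.diag(s) @ vh   # u * diag(s) * vh
print(np.allclose(reconstructed, a))  # equal up to rounding
```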


Compute eigenvalues

In [28]:
linalg.eigvals(a)

array([-0.37228132,  5.37228132])
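Two standard identities give a quick check on these values: the eigenvalues sum to the trace and multiply to the determinant. The sketch below also uses `linalg.eig`, which returns the eigenvectors as the columns of its second result.

```python
import numpy as np
from numpy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
w, v = linalg.eig(a)   # w: eigenvalues, v: eigenvectors as columns

print(np.isclose(w.sum(), np.trace(a)))     # sum of eigenvalues vs. trace
print(np.isclose(w.prod(), linalg.det(a)))  # product of eigenvalues vs. determinant
print(np.allclose(a @ v, v * w))            # a @ v[:, i] equals w[i] * v[:, i]
```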

## NumPy with Tabular Data

In [29]:
solar_system_file = 'solar_system_abbr.csv'
solar_system_data = np.genfromtxt(solar_system_file, delimiter=',',
                                  skip_header=1,
                                  dtype=['S10', 'S20', 'i4', 'f4', 'f4'])

In [30]:
solar_system_data

array([(b'Sun', b'Star',  0, 1.3920e+06, 3.33e+05),
       (b'Mercury', b'Terrestrial planet',  1, 4.8780e+03, 5.50e-02),
       (b'Venus', b'Terrestrial planet',  2, 1.2104e+04, 8.15e-01),
       (b'Earth', b'Terrestrial planet',  3, 1.2756e+04, 1.00e+00),
       (b'Mars', b'Terrestrial planet',  4, 6.7870e+03, 1.07e-01),
       (b'Jupiter', b'Gas giant',  6, 1.4280e+05, 3.18e+02),
       (b'Saturn', b'Gas giant',  7, 1.2000e+05, 9.50e+01),
       (b'Uranus', b'Gas giant',  8, 5.1118e+04, 1.50e+01),
       (b'Neptune', b'Gas giant',  9, 4.9528e+04, 1.70e+01),
       (b'Ceres', b'Dwarf planet',  5, 9.7460e+02, 1.60e-04),
       (b'Pluto', b'Dwarf planet', 10, 2.3000e+03, 2.00e-03),
       (b'Haumea', b'Dwarf planet', 11, 1.3000e+03, 7.00e-04),
       (b'Makemake', b'Dwarf planet', 12, 1.4200e+03, 6.70e-04),
       (b'Eris', b'Dwarf planet', 13, 2.3260e+03, 2.80e-03)],
      dtype=[('f0', 'S10'), ('f1', 'S20'), ('f2', '<i4'), ('f3', '<f4'), ('f4', '<f4')])

The result is a structured array with default field names `f0` to `f4`. You can rename the fields (columns) to something more descriptive

In [31]:
solar_system_data.dtype.names = ('planet', 'type', 'order', 'diameter', 'mass')
solar_system_data

array([(b'Sun', b'Star',  0, 1.3920e+06, 3.33e+05),
       (b'Mercury', b'Terrestrial planet',  1, 4.8780e+03, 5.50e-02),
       (b'Venus', b'Terrestrial planet',  2, 1.2104e+04, 8.15e-01),
       (b'Earth', b'Terrestrial planet',  3, 1.2756e+04, 1.00e+00),
       (b'Mars', b'Terrestrial planet',  4, 6.7870e+03, 1.07e-01),
       (b'Jupiter', b'Gas giant',  6, 1.4280e+05, 3.18e+02),
       (b'Saturn', b'Gas giant',  7, 1.2000e+05, 9.50e+01),
       (b'Uranus', b'Gas giant',  8, 5.1118e+04, 1.50e+01),
       (b'Neptune', b'Gas giant',  9, 4.9528e+04, 1.70e+01),
       (b'Ceres', b'Dwarf planet',  5, 9.7460e+02, 1.60e-04),
       (b'Pluto', b'Dwarf planet', 10, 2.3000e+03, 2.00e-03),
       (b'Haumea', b'Dwarf planet', 11, 1.3000e+03, 7.00e-04),
       (b'Makemake', b'Dwarf planet', 12, 1.4200e+03, 6.70e-04),
       (b'Eris', b'Dwarf planet', 13, 2.3260e+03, 2.80e-03)],
      dtype=[('planet', 'S10'), ('type', 'S20'), ('order', '<i4'), ('diameter', '<f4'), ('mass', '<f4')])
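Field names can also be supplied when the file is read, through the `names` argument of `genfromtxt`. The sketch below is self-contained, so it reads from an in-memory string rather than `solar_system_abbr.csv`; the three-column CSV text is a made-up stand-in, and `dtype=None` with `encoding='utf-8'` asks NumPy to infer column types and return text as `str` instead of bytes.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file used in the notebook
csv_text = (
    "planet,type,order\n"
    "Sun,Star,0\n"
    "Earth,Terrestrial planet,3\n"
    "Mars,Terrestrial planet,4\n"
)

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     skip_header=1,                      # skip the header row
                     names=('planet', 'type', 'order'),  # field names set at load time
                     dtype=None, encoding='utf-8')       # infer types, decode text

print(data.dtype.names)
print(data['order'])
```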

In [32]:
solar_system_data['mass'].mean()

23817.64
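Indexing by field name returns an ordinary array, so it combines with boolean masks to summarize a subset of rows. The sketch below builds a small structured array by hand using a few rows from the table above. It assumes `U` (text) fields for brevity; with the `S` (bytes) fields used in the notebook, the comparison value would need to be `b'Terrestrial planet'`.

```python
import numpy as np

bodies = np.array(
    [('Earth', 'Terrestrial planet', 1.0),
     ('Mars', 'Terrestrial planet', 0.107),
     ('Jupiter', 'Gas giant', 318.0)],
    dtype=[('planet', 'U10'), ('type', 'U20'), ('mass', 'f4')])

is_terrestrial = bodies['type'] == 'Terrestrial planet'  # boolean mask over rows
terrestrial_mean = bodies['mass'][is_terrestrial].mean() # mean mass of the selected rows

print(bodies['planet'][is_terrestrial])
print(terrestrial_mean)
```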

In [33]:
import math
volume_of_earth = 4/3*math.pi*(solar_system_data['diameter'][3]/2.)**3.
volume_of_earth

1086781292542.8892
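The same formula applies to a whole column at once, since arithmetic on arrays is elementwise. This sketch uses two diameters copied from the table (Earth and Mars) in a plain float array, and `np.pi` in place of `math.pi`.

```python
import numpy as np

diameters = np.array([12756.0, 6787.0])  # Earth and Mars, from the table above

# Volume of a sphere, 4/3 * pi * r**3, computed for every entry at once
volumes = 4.0 / 3.0 * np.pi * (diameters / 2.0) ** 3

print(volumes)
```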

## NumPy Views

- arrays that share memory with another array
- can make your program more memory and CPU efficient
- explicitly generated via the `view` method
- basic slicing and `transpose` return views of the original; `reshape` returns a view when the memory layout allows it and a copy otherwise
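The `reshape` case deserves a caveat: it returns a view only when the requested shape can be expressed over the existing memory layout, and silently copies otherwise. A minimal sketch using `np.shares_memory` to tell the two apart:

```python
import numpy as np

a = np.arange(6)
r = a.reshape(2, 3)   # contiguous data: reshape can return a view
t = r.T               # transpose is a view with swapped strides
flat = t.reshape(6)   # transposed layout cannot be flattened in place: a copy

print(np.shares_memory(a, r))     # view shares memory with a
print(np.shares_memory(a, t))     # view shares memory with a
print(np.shares_memory(a, flat))  # copy does not
```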

Tips:

* use the `copy` method to avoid sharing memory
* set the writeable flag to `False` to make a view read-only (`a.flags.writeable = False`)
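As a sketch of the second tip, clearing the flag on a view blocks writes through that view, while the original array stays writeable and the view still reflects changes made to it. The name `write_blocked` is illustrative.

```python
import numpy as np

a = np.arange(10)
b = a[2::2]                 # view of every second element from index 2
b.flags.writeable = False   # make only the view read-only

try:
    b[0] = 100              # writing through the read-only view fails
    write_blocked = False
except ValueError:
    write_blocked = True

a[2] = 100                  # the original can still be modified
print(write_blocked, b[0])  # the view sees the new value
```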

In [34]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [35]:
b = a[2::2]
b

array([2, 4, 6, 8])

You can check whether an array owns its data with `.flags.owndata`; a view reports `False`

In [36]:
a.flags.owndata, b.flags.owndata

(True, False)
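A complementary check is the `base` attribute, which points at the array a view was derived from and is `None` for an array that owns its data. A short sketch with the same two arrays:

```python
import numpy as np

a = np.arange(10)
b = a[2::2]

print(a.base is None)  # a owns its data, so it has no base
print(b.base is a)     # b is a view whose base is a
```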

If you update an element in a view, the same element is updated in the original array


In [37]:
b[0] = 100
a
# array([  0,   1, 100,   3,   4,   5,   6,   7,   8,   9])

array([  0,   1, 100,   3,   4,   5,   6,   7,   8,   9])

If you copy a view (or array), it will create an independent array that owns its own data.

In [38]:
d = a.copy()
d is a

False
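`d is a` only shows that `d` and `a` are different Python objects, which is also true of a view. To confirm that a copy is independent, `np.shares_memory` is a more direct test, as in this sketch:

```python
import numpy as np

a = np.arange(10)
b = a[2::2]    # view
d = a.copy()   # independent copy

print(b is a, np.shares_memory(a, b))  # different object, same memory
print(d is a, np.shares_memory(a, d))  # different object, separate memory

d[0] = -1      # modifying the copy leaves a untouched
print(a[0])
```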