# Python for Data Analysis

Python is more of a general-purpose programming language than R or Matlab. It has become increasingly popular for data analysis and scientific computing, but these tasks rely on third-party modules rather than the core language. Some of the more popular modules are:

* **NumPy** - N-dimensional array
* **SciPy** - Scientific computing (linear algebra, numerical integration, optimization, etc.)
* **Matplotlib** - 2D plotting (similar to Matlab)
* **IPython** - Enhanced interactive console
* **SymPy** - Symbolic mathematics
* **pandas** - Data analysis (provides a data frame structure similar to R's)

NumPy, SciPy, and Matplotlib are used throughout this presentation, with a short pandas example at the end.

## Import the NumPy module

In [None]:
import numpy as np

## Creating N-dimensional arrays using NumPy

There are many ways to create N-dimensional arrays.

Create a 2x3 double-precision (`float64`) array initialized to all zeros

In [None]:
a = np.zeros((2,3), dtype=np.float64)
a

Create an array from a list of lists

In [None]:
a = np.array([[0,1,2],[3,4,5]], dtype=np.float64)
a

Create a 1D array with the `arange` function, then reshape it to 2x3

In [None]:
a = np.arange(6, dtype=np.float64).reshape(2,3)
a
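Besides `zeros`, `array`, and `arange`, NumPy has several other array constructors. The sketch below shows a small sample, not an exhaustive list; the shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

# 2x3 array of ones
ones = np.ones((2, 3), dtype=np.float64)

# 2x3 array filled with a constant value
sevens = np.full((2, 3), 7.0)

# 5 evenly spaced points from 0.0 to 1.0, both endpoints included
grid = np.linspace(0.0, 1.0, 5)

# 3x3 identity matrix
eye = np.eye(3)

print(ones.shape, sevens[1, 2], grid, eye.trace())
```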

## Get values from N-dimensional array

NumPy provides many ways to extract data from arrays.

Get a single element of a 2D array

In [None]:
a[0,0]      # a scalar, not an array

Get the first row of a 2D array

In [None]:
a[0,:]      # 1D array

Get the last column of a 2D array

In [None]:
a[:,-1]     # 1D array

Get a sub-matrix of a 2D array (rows 0-1, columns 1-2)

In [None]:
a[0:2,1:3]  # 2D array
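Whether an index is an integer or a slice determines the dimensionality of the result: an integer drops that axis, while a slice keeps it, even a slice of length one. A short check, assuming the same 2x3 array built with `arange` above:

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

row_1d = a[0, :]     # integer row index: the row axis is dropped
row_2d = a[0:1, :]   # length-1 slice: the row axis is kept
last_col = a[:, -1]  # negative indices count from the end

print(row_1d.shape, row_2d.shape)   # (3,) (1, 3)
print(last_col)
```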

## Modifying N-dimensional arrays

NumPy uses the same indexing syntax to assign values into arrays.

In [None]:
a

Assign a single value to a single element of a 2D array

In [None]:
a[0,0] = 25.0
a

Assign a 1D array to the first row of a 2D array

In [None]:
a[0,:] = np.array([10,11,12], dtype=np.float64)
a

Assign a 1D array to the last column of a 2D array

In [None]:
a[:,-1] = np.array([20,21], dtype=np.float64)
a

Assign a 2D array to a sub-matrix of a 2D array

In [None]:
a[0:2,1:3] = np.array([[10,11],[20,21]], dtype=np.float64)
a
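The right-hand side of an assignment must have a shape compatible with the selected region. As far as this sketch shows, a mismatched shape raises `ValueError` and leaves the target unchanged, rather than truncating or padding; values assigned into a `float64` array are converted to `float64`. The flag name `raised` is just for illustration.

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.float64)

raised = False
try:
    # the first row has 3 elements, but the right-hand side has only 2
    a[0, :] = np.array([1.0, 2.0])
except ValueError as err:
    raised = True
    print("shape mismatch:", err)

# integers assigned into a float64 array are converted to float64
a[1, :] = [1, 2, 3]
print(a.dtype, a[1])
```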

## Modifying arrays using broadcasting
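
When the right-hand side of an assignment has fewer dimensions or smaller axes than the target, NumPy broadcasts (stretches) it to fill the target, provided the shapes are compatible. A scalar broadcasts to any shape.
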
Assign a scalar to the first row of a 2D array

In [None]:
a[0,:] = 10.0
a

Assign a 1D array to every row of a 2D array

In [None]:
a[:,:] = np.array([30,31,32], dtype=np.float64)
a

Assign a column vector of shape (2, 1) to every column of a 2D array

In [None]:
tmp = np.array([40,41], dtype=np.float64).reshape(2,1)
tmp

In [None]:
a[:,:] = tmp
a

Assign a scalar to a sub-matrix of a 2D array

In [None]:
a[0:2,1:3] = 100.0
a
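The same broadcasting rules apply to arithmetic, not just assignment. Shapes are compared from the last axis backwards, and two axes are compatible when they are equal or one of them is 1. A minimal sketch combining a column vector of shape (2, 1) with a row of shape (3,); the values are arbitrary.

```python
import numpy as np

col = np.array([40.0, 41.0]).reshape(2, 1)   # shape (2, 1)
row = np.array([0.0, 1.0, 2.0])              # shape (3,)

# (2, 1) and (3,) broadcast to (2, 3): element [i, j] is col[i, 0] + row[j]
table = col + row
print(table)

# np.broadcast_shapes (NumPy >= 1.20) reports the result shape without computing anything
print(np.broadcast_shapes(col.shape, row.shape))   # (2, 3)
```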

## Arithmetic on Arrays

Operate on arrays using binary operators and NumPy functions.

Create a 1D array

In [None]:
a = np.arange(4, dtype=np.float64)
a

Add 1D arrays elementwise

In [None]:
a + a

Multiply 1D arrays elementwise

In [None]:
a * a

Sum the elements of a 1D array

In [None]:
a.sum()
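On multidimensional arrays, `sum` also accepts an `axis` argument: `axis=0` collapses the rows (one total per column) and `axis=1` collapses the columns (one total per row). A sketch on a 2x3 array chosen for illustration:

```python
import numpy as np

m = np.arange(6, dtype=np.float64).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

total = m.sum()            # all elements: 15.0
col_sums = m.sum(axis=0)   # one total per column: [3., 5., 7.]
row_sums = m.sum(axis=1)   # one total per row: [3., 12.]

print(total, col_sums, row_sums)
```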

Compute the dot product

In [None]:
np.dot(a, a)

Compute the dot product another way: multiply elementwise, then sum

In [None]:
(a * a).sum()

Compute the outer product (a 4x1 column vector times a 1x4 row vector gives a 4x4 matrix)

In [None]:
np.dot(a.reshape(4,1), a.reshape(1,4))
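The reshape-and-`dot` approach computes the outer product, which NumPy also provides directly as `np.outer`. For the dot product, Python's `@` operator (matrix multiplication) is an equivalent spelling for 1D arrays. A true vector cross product is `np.cross`, shown here on 3-element vectors. A sketch comparing these:

```python
import numpy as np

a = np.arange(4, dtype=np.float64)   # [0., 1., 2., 3.]

# outer product: result[i, j] = a[i] * a[j]
outer_reshape = np.dot(a.reshape(4, 1), a.reshape(1, 4))
outer_direct = np.outer(a, a)
print(np.array_equal(outer_reshape, outer_direct))   # True

# the dot product three ways
print(np.dot(a, a), a @ a, (a * a).sum())            # 14.0 14.0 14.0

# cross product of two 3-element vectors
z = np.cross([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(z)                                             # [0. 0. 1.]
```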

## NumPy Views

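* Views are arrays that share memory with another array.
* Views can make a program more memory- and CPU-efficient, because no data is copied.
* Views are created explicitly with the `view` method.
* `reshape` and `transpose` implicitly return views of the original array (`reshape` falls back to a copy only when a view is impossible).
* Arrays produced by basic slicing, such as `a[2::2]`, are views of the original; indexing with an integer array or boolean mask returns a copy instead.
* Use the `copy` method to avoid sharing memory.
* Set the `writeable` flag to `False` to make a view read-only (`a.flags.writeable`).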
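The points above can be checked with `np.shares_memory`, which reports whether two arrays overlap in memory. A sketch with arbitrary contents; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(6)

v = a.view()          # explicit view
r = a.reshape(2, 3)   # reshape of a contiguous array: a view
t = r.T               # transpose: a view
s = a[::2]            # basic slice: a view
f = a[[0, 2, 4]]      # integer-array ("fancy") indexing: a copy
c = a.copy()          # explicit copy: independent memory

for name, arr in [("view", v), ("reshape", r), ("transpose", t),
                  ("slice", s), ("fancy index", f), ("copy", c)]:
    print(name, np.shares_memory(a, arr))

# a read-only view: writing through it raises ValueError,
# while the original array stays writeable
ro = a.view()
ro.flags.writeable = False
try:
    ro[0] = 99
except ValueError as err:
    print("read-only:", err)

a[0] = 99      # allowed; ro shares memory with a, so it sees the new value
print(ro[0])   # 99
```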

In [None]:
a = np.arange(10)
a

In [None]:
b = a[2::2]
b

Check whether an array owns its data with `.flags.owndata`: the original array `a` owns its memory, while the view `b` does not.

In [None]:
a.flags.owndata, b.flags.owndata

Updating an element of a view also updates the corresponding element of the original array, because both refer to the same memory. Here `b[0]` is the same element as `a[2]`.

In [None]:
b[0] = 100
a
# array([  0,   1, 100,   3,   4,   5,   6,   7,   8,   9])

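The `__array_interface__` dictionary exposes the address of each array's data buffer. The exact values vary from run to run, but `b` points into the middle of `a`'s buffer rather than to a separate allocation.

In [None]:
a.__array_interface__['data'][0]
# address of a[0]; varies from run to run

In [None]:
b.__array_interface__['data'][0]
# address of a[2]: 2 * a.itemsize bytes past the address above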

Copying a view (or any array) creates an independent array that owns its own data.

In [None]:
c = b.copy()
c.flags.owndata, c.__array_interface__['data'][0]
# (True, <an address outside a's buffer; varies from run to run>)

Plain assignment copies nothing: `d` is just a second name for the same array object.

In [None]:
d = a
d is a
# True
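Raw addresses only show where each array's data starts, which makes them an indirect test for sharing. Two more direct checks are the `base` attribute, which refers to the array a view was created from (and is `None` for an array that owns its data), and `np.shares_memory`. A sketch repeating the example above:

```python
import numpy as np

a = np.arange(10)
b = a[2::2]      # view starting at a[2]
c = b.copy()     # independent copy
d = a            # another name for the same object, not a new array

print(b.base is a, c.base is None)                      # True True
print(np.shares_memory(a, b), np.shares_memory(a, c))   # True False
print(d is a)                                           # True
```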

## SciPy Linear Algebra

In [None]:
from scipy import linalg
a = np.array([[1, 2], [3, 4]], dtype=np.float64)

Compute the inverse of `a`

In [None]:
linalg.inv(a)
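A quick sanity check is to multiply the matrix by its computed inverse and compare with the identity, using `np.allclose` to allow for floating-point rounding. When the actual goal is to solve a linear system, `linalg.solve` is generally preferred to forming the inverse explicitly; the right-hand side `rhs` below is an arbitrary example.

```python
import numpy as np
from scipy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
a_inv = linalg.inv(a)

# a @ a_inv should be the 2x2 identity up to rounding error
print(np.allclose(a @ a_inv, np.eye(2)))   # True

# solve a @ x = rhs directly instead of computing inv(a) @ rhs
rhs = np.array([5.0, 6.0])
x = linalg.solve(a, rhs)
print(np.allclose(a @ x, rhs))             # True
```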

Compute the singular value decomposition

In [None]:
linalg.svd(a)
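`linalg.svd` returns three arrays, `U`, `s`, and `Vh`, where `s` holds the singular values as a 1D array in descending order. Rebuilding the matrix from the factors confirms what they mean:

```python
import numpy as np
from scipy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
U, s, Vh = linalg.svd(a)

# a = U @ diag(s) @ Vh, up to rounding error
reconstructed = U @ np.diag(s) @ Vh
print(np.allclose(reconstructed, a))   # True
print(s)                               # singular values, largest first
```

For a square matrix, the product of the singular values equals the absolute value of the determinant (here |1*4 - 2*3| = 2).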

Compute eigenvalues

In [None]:
linalg.eigvals(a)
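`linalg.eigvals` returns only the eigenvalues, as a complex array even when they happen to be real. `linalg.eig` also returns the eigenvectors as the columns of a second array, so the defining relation A v = λ v can be checked directly:

```python
import numpy as np
from scipy import linalg

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
w, v = linalg.eig(a)   # w[i]: eigenvalue, v[:, i]: its eigenvector

for i in range(len(w)):
    # A v_i should equal lambda_i v_i up to rounding error
    print(np.allclose(a @ v[:, i], w[i] * v[:, i]))   # True

# the eigenvalues sum to the trace and multiply to the determinant
print(np.isclose(w.sum(), np.trace(a)), np.isclose(w.prod(), linalg.det(a)))
```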

## 2D plotting using Matplotlib

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
x = np.linspace(0.0, 2.0, 20)

Example: red circles

In [None]:
plt.plot(x, np.sqrt(x), 'ro')
plt.show()

Example: blue line

In [None]:
plt.plot(x, np.sqrt(x), 'b-')
plt.show()

Three lines on the same figure

In [None]:
plt.plot(x, x, 'g--') # green dashed line
plt.plot(x, np.sqrt(x), 'ro') # red circles
plt.plot(x, np.sqrt(x), 'b-') # blue line
plt.show()
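The calls above use the `pyplot` interface, which draws on the current figure implicitly. Matplotlib's object-oriented interface makes the target axes explicit and is convenient for adding axis labels and a legend. A sketch of the same three curves; the label strings are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.0, 2.0, 20)

fig, ax = plt.subplots()
ax.plot(x, x, 'g--', label='y = x')                     # green dashed line
ax.plot(x, np.sqrt(x), 'ro', label='y = sqrt(x) points')  # red circles
ax.plot(x, np.sqrt(x), 'b-', label='y = sqrt(x) line')    # blue line
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
plt.show()
```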

## Data frames with pandas

Build a list of (name, value) tuples

In [None]:
import pandas as pd
names = ['Frank', 'Joe', 'Chet', 'Biff']
values = [152, 137, 140, 162]
d = list(zip(names, values))
d

In [None]:
# Convert the list of tuples into a DataFrame
df = pd.DataFrame(data=d, columns=['Names', 'Values'])
df
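A DataFrame can also be built from a dictionary of columns, which skips the intermediate list of tuples. Columns are then available by name and support vectorized methods such as `mean`, boolean filtering, and sorting. A sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({'Names': ['Frank', 'Joe', 'Chet', 'Biff'],
                   'Values': [152, 137, 140, 162]})

print(df['Values'].mean())      # 147.75

# boolean filtering: keep rows whose value exceeds 145 (Frank and Biff)
print(df[df['Values'] > 145])

# name with the largest value
top = df.sort_values('Values', ascending=False).iloc[0]['Names']
print(top)                      # Biff
```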

In [None]:
# Plot the "Values" column
df['Values'].plot()
plt.show()