# Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. Numpy arrays are implemented in C and are central to storing and manipulating data in machine learning applications. Because that data is often large, processing it element by element with Python lists becomes very slow, so Numpy arrays are used instead.
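
As a rough illustration of the difference, the sketch below computes the same squares once with a Python list comprehension and once with a single vectorized numpy expression. The names `values`, `squares_list` and `squares_array` and the data size are illustrative assumptions, not part of the original notebook; on large inputs the vectorized form is usually the faster one, but no timing is claimed here.

```python
import numpy as np

# Illustrative data; the size is arbitrary.
values = list(range(1000))

# Pure-Python approach: square each element one at a time.
squares_list = [v * v for v in values]

# Numpy approach: one vectorized operation over the whole array.
arr = np.array(values)
squares_array = arr * arr

# Both approaches produce the same numbers.
print(np.array_equal(squares_array, np.array(squares_list)))
```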

In [1]:
import numpy as np

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
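
To make the terms rank and shape concrete, here is a minimal sketch with a rank 3 array. The variable name `t` and the chosen sizes are assumptions made for illustration only.

```python
import numpy as np

# A rank 3 array: 2 blocks, each with 3 rows and 4 columns.
t = np.zeros((2, 3, 4))

print(t.ndim)       # number of dimensions (the rank)
print(t.shape)      # size of the array along each dimension
print(t.size)       # total number of elements
print(t[1, 2, 3])   # indexed by a tuple of three nonnegative integers
```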

In [2]:
a = np.array([1, 2, 3])  # Create a rank 1 array
print("Array A:", type(a), a.shape, a[0], a[1], a[2])
a[0] = 5                 # Change an element of the array
print(a)

Array A: <class 'numpy.ndarray'> (3,) 1 2 3
[5 2 3]


In [5]:
a = np.array([1,2,3])
print(a.shape)

(3,)


In [10]:
b = np.array([[1,2,3],[4,5,6]])   # Create a rank 2 array
print (b)
print("Dimension", b.ndim)

[[1 2 3]
 [4 5 6]]
Dimension 2


In [4]:
print(b.shape)
print(b[0, 0], b[0, 1], b[1, 0])

(2, 3)
1 2 4


In [9]:
a = np.zeros((4,5))  # Create an array of all zeros
print (a)

[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]


In [12]:
b = np.ones((1,2))   # Create an array of all ones
print (b)

[[ 1.  1.]]


In [14]:
c = np.full((5,2), -2.0) # Create a constant array (a float fill value gives a float array)
print (c)

[[-2. -2.]
 [-2. -2.]
 [-2. -2.]
 [-2. -2.]
 [-2. -2.]]




In [15]:
d = np.eye(3)        # Create a 3x3 identity matrix
print (d)

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]


In [18]:
e = np.random.random((3,4)) # Create an array filled with random values
print (e)

[[ 0.17947967  0.5628262   0.88751221  0.13090677]
 [ 0.47613824  0.04504748  0.97379502  0.25251025]
 [ 0.42275324  0.18333628  0.08229757  0.65811226]]
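
Random values differ on every run, which makes results hard to reproduce. One possible approach, assuming a numpy version that provides `np.random.default_rng` (1.17 or later), is to draw from a seeded generator. The seed value 0 and the names `rng_a`, `rng_b`, `e1` and `e2` are arbitrary choices for this sketch.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)

# Each draws a (3, 4) array of floats in the half-open interval [0, 1).
e1 = rng_a.random((3, 4))
e2 = rng_b.random((3, 4))

# The same seed gives the same sequence of values.
print(np.array_equal(e1, e2))
```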


## Array Indexing

Numpy offers several ways to index into arrays.
Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [28]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print (b)

[[2 3]
 [6 7]]


A slice of an array is a view into the same data, so modifying it will modify the original array.


In [11]:
print (a[0, 1])
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print (a[0, 1])

2
77
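
When the original array must stay unchanged, one option is to take an explicit copy of the slice. The sketch below contrasts a view with a copy; the variable names `view_b` and `copy_b` are illustrative, and `np.shares_memory` is used only to confirm which of the two is tied to `a`.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view_b = a[:2, 1:3]          # a view: shares its data with a
copy_b = a[:2, 1:3].copy()   # an independent copy of the same values

copy_b[0, 0] = 77            # modifies only the copy
print(a[0, 1])               # a is unchanged

view_b[0, 0] = 99            # modifies a as well
print(a[0, 1])

print(np.shares_memory(a, view_b), np.shares_memory(a, copy_b))
```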


In [12]:
# Create the following rank 2 array with shape (3, 4)
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print (a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [31]:
x = np.random.random((5,3))
print('Original Array:')
print(x)
print()
print('Sliced One:')
print(x[:, :2])   # All rows, first two columns

Original Array:
[[ 0.9496292   0.02959224  0.32854968]
 [ 0.2479264   0.36454872  0.27739179]
 [ 0.00734668  0.44156074  0.14687968]
 [ 0.00262466  0.90432667  0.9085696 ]
 [ 0.83723671  0.90788389  0.95142256]] 

Sliced One:
[[ 0.9496292   0.02959224]
 [ 0.2479264   0.36454872]
 [ 0.00734668  0.44156074]
 [ 0.00262466  0.90432667]
 [ 0.83723671  0.90788389]]


In [13]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
row_r3 = a[[1], :]  # Rank 2 copy of the second row of a (integer array indexing copies)
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)
print(row_r3, row_r3.shape)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[[5 6 7 8]] (1, 4)


In [14]:
col_r1 = a[:, 1]    # Rank 1 view of the second column of a
col_r2 = a[:, 1:2]  # Rank 2 view of the second column of a
print(col_r1, col_r1.shape)
print()
print(col_r2, col_r2.shape)

[ 2  6 10] (3,)

[[ 2]
 [ 6]
 [10]] (3, 1)


Integer array indexing: When you index into numpy arrays using slicing, the result is always a subarray view of the original array. In contrast, integer array indexing lets you construct arbitrary arrays from the data of another array. A related technique, boolean array indexing, selects the elements that satisfy a condition. The next two cells show boolean array indexing first, then integer array indexing:

In [15]:
a = np.array([[1,2], [3, 4], [5, 6]])

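print(a[a > 2])  # Boolean array indexing: the elements of a greater than 2
print(a > 2)     # The boolean mask itself, with the same shape as a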


[3 4 5 6]
[[False False]
 [ True  True]
 [ True  True]]


In [16]:

a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,) and contain
# a[0, 0], a[1, 1] and a[2, 0].
print (a[[0, 1, 2], [0, 1, 0]])

# The above example of integer array indexing is equivalent to this:
print (np.array([a[0, 0], a[1, 1], a[2, 0]]))

[1 4 5]
[1 4 5]
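
A common use of integer array indexing is to pick, or modify, one element from each row of a matrix. The sketch below assumes a hypothetical index array `cols` that holds one column choice per row; neither it nor the name `picked` appears in the original notebook.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
cols = np.array([0, 2, 0, 1])    # hypothetical: one column index per row

# Select a[0, 0], a[1, 2], a[2, 0] and a[3, 1].
picked = a[np.arange(4), cols]
print(picked)                    # [ 1  6  7 11]

# The same index pair can be used to modify those elements in place.
a[np.arange(4), cols] += 10
print(a)
```

Note that `picked` is a copy, so it keeps its original values after `a` is modified.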


## Data Types

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

In [17]:
x = np.array([1, 2])  # Let numpy choose the datatype
y = np.array([1.0, 2.0])  # Let numpy choose the datatype
z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype

print(x.dtype, y.dtype, z.dtype)

int64 float64 int64
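
An existing array can also be converted to another datatype with `astype`, which returns a new array. The following is a minimal sketch with made-up values; note that converting floats to integers truncates toward zero rather than rounding.

```python
import numpy as np

x = np.array([1.7, -2.3, 3.9])
as_int = x.astype(np.int64)         # truncates toward zero
print(as_int, as_int.dtype)         # [ 1 -2  3] int64

ints = np.array([1, 2, 3])
as_float = ints.astype(np.float64)  # same values stored as floats
print(as_float.dtype)               # float64

print(x.dtype)                      # the original array keeps its dtype
```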


In [32]:
a = [1,2,3]
b = [1,4]
a+b   # With Python lists, + concatenates instead of adding elementwise

[1, 2, 3, 1, 4]

In [33]:
a = np.random.random(3)
b = np.random.random(3)
print(a)
print(b)
a+b   # With numpy arrays, + adds elementwise

array([ 1.2650741 ,  1.81468395,  1.39275494])

## Array Math

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [18]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the same array
print (x + y)
print (np.add(x, y))

[[  6.   8.]
 [ 10.  12.]]
[[  6.   8.]
 [ 10.  12.]]


In [19]:
print (x - y)
print ("--------------------------------------")
print (x*y)
print ("--------------------------------------")
print (x/y)
print ("--------------------------------------")
print (np.sqrt(x))


[[-4. -4.]
 [-4. -4.]]
--------------------------------------
[[  5.  12.]
 [ 21.  32.]]
--------------------------------------
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
--------------------------------------
[[ 1.          1.41421356]
 [ 1.73205081  2.        ]]


In [20]:

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])
w = np.array([11, 12])

print (np.dot(v,w))
print("")
print (np.dot(x,v))
print("")
print (np.dot(x,y))
print("")
print (np.dot(y,x))

219

[29 67]

[[19 22]
 [43 50]]

[[23 34]
 [31 46]]
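
In Python 3.5 and later, the `@` operator offers a shorter spelling for these products. The sketch below reuses the values from the cell above and contrasts `@` with `*`, which remains elementwise; for the rank 1 and rank 2 arrays used here, `@` gives the same results as `np.dot`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(v @ w)   # inner product of two vectors: 219
print(x @ v)   # matrix-vector product: [29 67]
print(x @ y)   # matrix-matrix product
print(x * y)   # elementwise product, NOT a matrix product
```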


In [35]:
x = np.array([[1,2, 5],[3,4, 9]])

print (np.sum(x))  # Compute sum of all elements; prints "24"
print (np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6 14]"
print (np.sum(x, axis=1))  # Compute sum of each row; prints "[8 16]"

24
[ 4  6 14]
[ 8 16]


In [36]:
print (x.T)

[[1 3]
 [2 4]
 [5 9]]
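
Transposing a rank 1 array does nothing, because it has only one axis. To obtain a column or row vector, the array has to be given a second dimension first. This is a small sketch; the names `col` and `row` are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)         # (3,): transposing a rank 1 array has no effect

col = v.reshape(3, 1)    # explicit column vector of shape (3, 1)
row = v[np.newaxis, :]   # explicit row vector of shape (1, 3)
print(col.shape, row.shape)
print(row.T.shape)       # transposing the rank 2 row gives a column
```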


## Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:


In [38]:

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

print (y)
print(" ")

print (x + v)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
 
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
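
Another way to avoid the explicit loop, shown here only as a sketch for comparison, is to stack copies of `v` into a full matrix with `np.tile` and add that matrix elementwise. The name `vv` is an assumption; the result matches `x + v`, which needs no such intermediate matrix.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

vv = np.tile(v, (4, 1))   # stack 4 copies of v on top of each other
print(vv.shape)           # (4, 3)

y = x + vv                # ordinary elementwise sum of two (4, 3) arrays
print(np.array_equal(y, x + v))
```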


The expression x + v works even though x has shape (4, 3) and v has shape (3,) due to broadcasting; it behaves as if v actually had shape (4, 3), where each row was a copy of v, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
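
The rules above can be sketched as a small function that works on shapes only. `broadcast_shape` is a hypothetical helper written for this illustration, not a numpy function; its result is compared against the shape that numpy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    # Rule 1: prepend 1s to the shorter shape until the lengths match.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        # Rules 2 and 3: sizes must be equal, or one of them must be 1.
        if da != db and da != 1 and db != 1:
            raise ValueError("shapes %s and %s are not compatible"
                             % (shape_a, shape_b))
        # Rule 4: take the size that is not 1 (for positive sizes this
        # is the elementwise maximum).
        result.append(db if da == 1 else da)
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))   # (4, 3)
print(broadcast_shape((3, 1), (2,)))   # (3, 2)

# Cross-check against numpy for the first pair of shapes.
print(np.broadcast(np.empty((4, 3)), np.empty((3,))).shape)
```

Incompatible shapes such as (2, 3) and (2,) make the helper raise a `ValueError`, just as numpy refuses to broadcast them.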

In [24]:

# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:

print (np.reshape(v, (3, 1)) * w)

[[ 4  5]
 [ 8 10]
 [12 15]]


In [25]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:
x = np.array([[1,2,3], [4,5,6]])

print ((x.T + w).T)


[[ 5  6  7]
 [ 9 10 11]]


In [26]:
print (x * 2)


[[ 2  4  6]
 [ 8 10 12]]


In [40]:
b = np.arange(12)     # The integers 0 through 11 as a rank 1 array
print (b)
a = b.reshape(3, 4)   # The same data arranged as 3 rows and 4 columns
print (a)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
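
Two details of `reshape` are worth a short sketch: one dimension may be given as -1, in which case numpy infers it from the total number of elements, and for a contiguous array such as the one produced by `np.arange` the result is a view of the same data rather than a copy.

```python
import numpy as np

b = np.arange(12)
a = b.reshape(3, -1)   # -1 lets numpy infer the second dimension
print(a.shape)         # (3, 4)

a[0, 0] = 100          # the reshaped array is a view here...
print(b[0])            # ...so b sees the change: 100

print(np.shares_memory(a, b))
```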


In [43]:
a= np.random.random(3)
a

array([ 0.09724263,  0.37838261,  0.42192933])

In [48]:
b = np.random.random(2)
b

array([ 0.44255706,  0.72342816])

In [49]:
np.concatenate((a,b),axis=0)   # Join a and b end to end along axis 0

array([ 0.09724263,  0.37838261,  0.42192933,  0.44255706,  0.72342816])
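
For rank 2 arrays, the `axis` argument decides whether rows or columns are appended. The following is a minimal sketch with made-up arrays `p` and `q`; all dimensions other than the one being joined must match.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])   # shape (2, 2)
q = np.array([[5, 6]])           # shape (1, 2)

# axis=0 appends q as a new row; the column counts must match.
rows = np.concatenate((p, q), axis=0)
print(rows.shape)                # (3, 2)

# axis=1 appends q.T, of shape (2, 1), as a new column; the row counts must match.
cols = np.concatenate((p, q.T), axis=1)
print(cols.shape)                # (2, 3)
```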