# Working with NumPy

## Statistics Functions


|      Category   |        Functions                                          |
|-----------------|-----------------------------------------------------------|
|Order Statistics | `amin, amax, ptp, percentile`                             |
|Summary          | `mean, median, std, var, average`                         |
|Correlating      | `corrcoef, correlate, cov`                                |
|Histogram        | `histogram, histogram2d, histogramdd, bincount, digitize` |
|Sums, products   | `sum, prod, cumsum, cumprod, diff`                        |
|Min, Max         | `min, max, argmin, argmax`                                |


In [1]:
import numpy as np
x = np.random.normal(10, 4, 50)        # Random sample from N(10, 16)
np.mean(x)

10.114881891459294

In [2]:
np.std(x)

4.0880128913459215

Many of these functions are also available as ndarray methods.

In [3]:
print("Mean :", x.mean(), "\nVariance ", x.var())

Mean : 10.114881891459294 
Variance  16.711849399810443
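A few of the other functions from the table can be tried on a small array. The sketch below uses a made-up array `y` (not part of the notebook's data) so that the results are easy to check by hand.

```python
import numpy as np

y = np.array([3, 1, 4, 1, 5, 9, 2, 6])   # illustrative data, chosen arbitrarily

value_range = np.ptp(y)          # max - min = 9 - 1 = 8
p50 = np.percentile(y, 50)       # 50th percentile = 3.5
med = np.median(y)               # same as the 50th percentile = 3.5
running = np.cumsum(y)           # running totals: [3, 4, 8, 9, 14, 23, 25, 31]
where_max = np.argmax(y)         # index of the largest element = 5

print(value_range, p50, med, where_max)
print(running)
```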


## Matrix operations

A two-dimensional NumPy array can be treated as a matrix. NumPy also provides a `matrix` class, which is discussed later.
Addition, subtraction, and multiplication by a scalar can be performed on two-dimensional arrays using the usual arithmetic operators.

### Addition, Subtraction

NumPy's vectorized arithmetic operators directly provide addition and subtraction for matrices of the same shape:

A + B  
A - B
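
As a minimal runnable sketch, assuming two made-up matrices `M` and `N` of the same shape (these names are illustrative and are not used elsewhere in the notebook):

```python
import numpy as np

# Two 2 x 3 matrices; the values are arbitrary
M = np.array([[1, 2, 3],
              [4, 5, 6]])
N = np.array([[10, 20, 30],
              [40, 50, 60]])

total = M + N        # element-wise sum
difference = N - M   # element-wise difference

print(total)         # [[11 22 33] [44 55 66]]
print(difference)    # [[ 9 18 27] [36 45 54]]
```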


### Scalar multiplication

The vectorized multiplication operator, together with the broadcasting mechanism, directly provides scalar multiplication:

c * A   # Here c is a scalar
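
A small sketch of the same idea, with an arbitrary scalar and an illustrative matrix `M` (both are assumptions for the example):

```python
import numpy as np

c = 3                          # illustrative scalar
M = np.array([[1, 2],
              [3, 4]])

scaled = c * M                 # c is broadcast to every element of M
print(scaled)                  # [[ 3  6] [ 9 12]]
print(np.array_equal(c * M, M * c))   # True: the order of the operands does not matter
```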

### Matrix multiplication

Matrix multiplication can be performed using the matrix multiplication operator `@`.

In [4]:
A = np.array([[2, 3, 1],
              [3, 5, 2]])
B = np.array([[2, 3],
              [1, 2],
              [3, 2]])
A

array([[2, 3, 1],
       [3, 5, 2]])

In [5]:
B

array([[2, 3],
       [1, 2],
       [3, 2]])

In [6]:
C = A @ B
C

array([[10, 14],
       [17, 23]])

Alternatively, matrix multiplication can be performed using the `dot` function of NumPy or the `dot` method of an ndarray.

In [7]:
np.dot(A, B)    # Using "dot" function

array([[10, 14],
       [17, 23]])

In [8]:
A.dot(B)        # Using "dot" method

array([[10, 14],
       [17, 23]])

In [9]:
B.dot(A)

array([[13, 21,  8],
       [ 8, 13,  5],
       [12, 19,  7]])

### Linear Algebra package

Many matrix operations are available in NumPy's linear algebra sub-package (`np.linalg`).

#### Matrix inverse

In [10]:
D = np.linalg.inv(C)
D

array([[-2.875,  1.75 ],
       [ 2.125, -1.25 ]])

In [11]:
C @ D  # Verify inverse

array([[1., 0.],
       [0., 1.]])

In [12]:
D @ C

array([[1., 0.],
       [0., 1.]])
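
The printed products look exactly like the identity matrix, but floating-point round-off can leave tiny differences in the last digits. One way to check an inverse programmatically is to compare against `np.eye` with a tolerance, for example with `np.allclose`. The sketch below recreates `C` from its values shown above.

```python
import numpy as np

C = np.array([[10, 14],
              [17, 23]])
D = np.linalg.inv(C)

# Compare with a tolerance instead of ==, since round-off may affect the last bits
print(np.allclose(C @ D, np.eye(2)))   # True
print(np.allclose(D @ C, np.eye(2)))   # True
```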

#### QR Decomposition

In [13]:
Q, R = np.linalg.qr(A) 
Q

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])

In [14]:
R

array([[-3.60555128, -5.82435206, -2.21880078],
       [ 0.        ,  0.2773501 ,  0.2773501 ]])

In [15]:
Q @ R  # Verify decomposition

array([[2., 3., 1.],
       [3., 5., 2.]])

In [16]:
Q1, R1 = np.linalg.qr(B)

In [17]:
Q1

array([[-0.53452248, -0.57735027],
       [-0.26726124, -0.57735027],
       [-0.80178373,  0.57735027]])

In [18]:
R1

array([[-3.74165739, -3.74165739],
       [ 0.        , -1.73205081]])

In [19]:
Q1 @ R1

array([[2., 3.],
       [1., 2.],
       [3., 2.]])

In [20]:
Q @ Q.T

array([[1., 0.],
       [0., 1.]])

In [21]:
Q

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])

In [22]:
np.linalg.inv(Q)

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])
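
The cells above suggest that the inverse of `Q` equals its transpose, which is the defining property of an orthogonal matrix. The sketch below checks this with a tolerance, and also shows that for the tall factor `Q1` (3 x 2, from the default reduced mode of `np.linalg.qr`) only `Q1.T @ Q1` is an identity matrix.

```python
import numpy as np

A = np.array([[2, 3, 1],
              [3, 5, 2]])
B = np.array([[2, 3],
              [1, 2],
              [3, 2]])

Q, R = np.linalg.qr(A)      # Q is 2 x 2, R is 2 x 3
Q1, R1 = np.linalg.qr(B)    # reduced mode: Q1 is 3 x 2, R1 is 2 x 2

# Square Q: the inverse equals the transpose
print(np.allclose(np.linalg.inv(Q), Q.T))      # True
# Tall Q1: its columns are orthonormal, so Q1.T @ Q1 is the 2 x 2 identity
print(np.allclose(Q1.T @ Q1, np.eye(2)))       # True
# Q1 @ Q1.T is a 3 x 3 matrix of rank 2, so it cannot be the identity
print(np.allclose(Q1 @ Q1.T, np.eye(3)))       # False
```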

**Homework**: Find other functions in the `linalg` submodule.

## Stacking Arrays

Arrays can be stacked together using the following functions.  

### Vertical stacking

In [23]:
A

array([[2, 3, 1],
       [3, 5, 2]])

In [24]:
A1 = np.array([[4, 6, 3]])
A1

array([[4, 6, 3]])

In [25]:
np.vstack((A, A1))   # Stacking vertically

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

### Horizontal stacking

In [26]:
A2 = B[:2]
A2

array([[2, 3],
       [1, 2]])

In [27]:
A         # For ready reference

array([[2, 3, 1],
       [3, 5, 2]])

In [28]:
np.hstack((A, A2))   # Horizontal stacking

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

### Block stacking

In [29]:
A11 = A
A12 = A2
A21 = A1
A22 = np.array([[5, 5]])
AA = np.block([[A11, A12], [A21, A22]])  # Assemble array from given blocks
AA

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2],
       [4, 6, 3, 5, 5]])

In [30]:
np.hstack((A11, A12))

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

In [31]:
np.block([[A11, A12]])

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

In [32]:
np.vstack((A11, A21))

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

In [33]:
np.block([[A11],[A21]])

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

**Homework**: Also see the function `column_stack` and the index objects `r_` and `c_`.
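
As a possible starting point for this exercise, the sketch below applies the three names to two made-up one-dimensional arrays `u` and `v`. It covers only the simplest use of each.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

cols = np.column_stack((u, v))   # 1-D arrays become columns -> shape (3, 2)
joined = np.r_[u, v]             # concatenation along the first axis -> shape (6,)
side = np.c_[u, v]               # for 1-D inputs, same result as column_stack

print(cols)      # [[1 4] [2 5] [3 6]]
print(joined)    # [1 2 3 4 5 6]
print(side)      # [[1 4] [2 5] [3 6]]
```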

## Splitting arrays

Using `hsplit`, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur.  

For example,  

`np.hsplit(a, 3)       # Split a into 3 equal parts`  
`np.hsplit(a, (3, 5))  # Split a after the third and the fifth column`  

Similarly, `vsplit` splits along the vertical axis, and `array_split` allows one to specify along which axis to split.
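
The calls described above can be sketched on a small illustrative array `a` (a 2 x 6 array chosen for the example). Note that `array_split`, unlike `hsplit` with an integer argument, also accepts a count that does not divide the axis evenly.

```python
import numpy as np

a = np.arange(12).reshape(2, 6)          # illustrative 2 x 6 array

thirds = np.hsplit(a, 3)                 # three 2 x 2 blocks
parts = np.hsplit(a, (3, 5))             # columns 0-2, columns 3-4, column 5
rows = np.vsplit(a, 2)                   # two 1 x 6 blocks
uneven = np.array_split(a, 4, axis=1)    # 6 columns into 4 parts: widths 2, 2, 1, 1

print([p.shape for p in thirds])   # [(2, 2), (2, 2), (2, 2)]
print([p.shape for p in parts])    # [(2, 3), (2, 2), (2, 1)]
print([p.shape for p in rows])     # [(1, 6), (1, 6)]
print([p.shape for p in uneven])   # [(2, 2), (2, 2), (2, 1), (2, 1)]
```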


In [59]:
AA

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2],
       [4, 6, 3, 5, 5]])

In [34]:
np.hsplit(AA, (2, 4))

[array([[2, 3],
        [3, 5],
        [4, 6]]),
 array([[1, 2],
        [2, 1],
        [3, 5]]),
 array([[3],
        [2],
        [5]])]

In [35]:
AA1, AA2, AA3 = np.hsplit(AA, (2, 4))   # Split at column 2 and 4
AA1

array([[2, 3],
       [3, 5],
       [4, 6]])

In [36]:
AA2

array([[1, 2],
       [2, 1],
       [3, 5]])

In [37]:
AA3

array([[3],
       [2],
       [5]])

In [38]:
AA[:,:4]    # First 4 columns of AA

array([[2, 3, 1, 2],
       [3, 5, 2, 1],
       [4, 6, 3, 5]])

In [39]:
np.hsplit(AA[:,:4], 2)     # Split into 2 matrices with equal number of columns

[array([[2, 3],
        [3, 5],
        [4, 6]]),
 array([[1, 2],
        [2, 1],
        [3, 5]])]

## Statistical Functions with additional arguments

In [40]:
salary = (np.random.randint(15, 25, 30)*1000).reshape(30,1)
savings = salary*(np.random.random(30).reshape(30,1)*0.25)

In [41]:
Data = np.hstack((salary, savings))
Data

array([[19000.        ,  1641.82113136],
       [19000.        ,  2343.39280839],
       [20000.        ,   499.33235801],
       [16000.        ,   284.98866914],
       [16000.        ,  1316.93282645],
       [16000.        ,  3518.59623582],
       [21000.        ,  4331.22724263],
       [15000.        ,  3695.27113178],
       [20000.        ,  3135.85535746],
       [23000.        ,  1973.81767835],
       [22000.        ,  2174.23031027],
       [16000.        ,  1121.16892674],
       [15000.        ,  2931.51729507],
       [19000.        ,  4462.68167467],
       [17000.        ,  2794.30625109],
       [21000.        ,  1546.07833994],
       [22000.        ,  4430.01995682],
       [21000.        ,   872.59407032],
       [23000.        ,   176.09842441],
       [23000.        ,  5488.33090708],
       [23000.        ,  1463.9604703 ],
       [24000.        ,  2029.47875407],
       [18000.        ,  1137.17960577],
       [18000.        ,  3063.30341152],
       [15000.  

In [42]:
Data.mean(axis=0)     # Compute Column means

array([19500.        ,  2311.60983572])

In [43]:
Data.mean()

10905.804917859648

Thus,  
the average salary is Rs. 19500.000, and  
the average savings is Rs. 2311.610.

In [44]:
np.corrcoef(Data.T)       # "corrcoef" function assumes variables in rows, and obs in columns

array([[1.        , 0.01803566],
       [0.01803566, 1.        ]])

Thus, the correlation coefficient between salary and savings is 0.018.
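
The effect of the `axis` argument, and of the row/column layout expected by `corrcoef`, can be seen on a small data set. The sketch below uses four made-up observations of two variables. The `rowvar=False` argument of `np.corrcoef` is an alternative to transposing the data.

```python
import numpy as np

# Made-up data: 4 observations (rows) of 2 variables (columns)
data = np.array([[1.0, 2.0],
                 [2.0, 4.1],
                 [3.0, 5.9],
                 [4.0, 8.2]])

col_means = data.mean(axis=0)   # one mean per column -> [2.5, 5.05]
row_means = data.mean(axis=1)   # one mean per row    -> shape (4,)

r_transposed = np.corrcoef(data.T)             # variables in rows (default layout)
r_rowvar = np.corrcoef(data, rowvar=False)     # variables in columns

print(col_means)
print(np.allclose(r_transposed, r_rowvar))     # True: both give the same 2 x 2 matrix
```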

## `random` sub-package of NumPy

We have already seen some functions of the `random` sub-package in use. The `random` sub-package (module) of NumPy provides functions for generating random numbers and for selecting random samples.

In [45]:
from numpy import random

In [46]:
xx = np.array([1, 2, 3, 4, 5])
xx

array([1, 2, 3, 4, 5])

In [47]:
random.permutation(xx)

array([5, 1, 4, 2, 3])

In [48]:
random.permutation(xx)

array([5, 4, 3, 2, 1])

In [49]:
xx

array([1, 2, 3, 4, 5])

In [50]:
random.shuffle(xx)

In [51]:
xx

array([2, 5, 4, 3, 1])
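
The cells above illustrate the difference between the two functions: `permutation` returns a shuffled copy and leaves its argument untouched, while `shuffle` reorders the array in place and returns `None`. A minimal sketch that checks this, using an illustrative array `values`:

```python
import numpy as np
from numpy import random

values = np.array([1, 2, 3, 4, 5])

shuffled_copy = random.permutation(values)           # new array, random order
still_ordered = values.tolist() == [1, 2, 3, 4, 5]   # original is unchanged

result = random.shuffle(values)                      # reorders values in place

print(still_ordered)    # True
print(result)           # None
print(values)           # same elements, order is random
```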

In [52]:
xx = random.normal(50, 5, 50)
xx

array([49.20262822, 47.72280694, 51.00944708, 48.32825594, 52.65641562,
       43.91621026, 47.82007481, 49.02697354, 48.37924383, 56.17539339,
       45.49282088, 51.89685635, 54.17894714, 51.45967143, 50.75310245,
       40.84489543, 56.04658809, 51.62636056, 47.58146569, 49.81383567,
       49.29211286, 44.26503303, 48.08904565, 45.03624036, 51.2850196 ,
       47.54216616, 51.46172695, 53.98352454, 48.54894647, 46.9240746 ,
       52.75012913, 55.62875   , 49.03115106, 44.18937825, 45.8389551 ,
       60.03120879, 42.42536789, 48.76530683, 41.74936606, 46.9422677 ,
       48.57737474, 45.50144249, 51.55694299, 52.7895853 , 51.14045786,
       53.02366112, 56.29297951, 54.5532345 , 49.25068039, 52.84317152])

In [53]:
xx.mean()

49.664825894817625

In [54]:
random.permutation(xx).mean()

49.664825894817625

In [55]:
random.permutation(xx).mean()

49.664825894817604

In [56]:
random.random()

0.7625630083552134

In [57]:
10 + random.random()*20

12.154727747382424
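
Since `random.random()` returns a value in [0, 1), the expression above produces a value in [10, 30). The sketch below writes the scaling in general form and compares it with `random.uniform`, which does the same rescaling internally. The names `low` and `high` are chosen for this example.

```python
from numpy import random

low, high = 10, 30                              # end points implied by 10 + random.random()*20

u = low + random.random() * (high - low)        # one draw, scaled by hand
v = random.uniform(low, high)                   # one draw with the built-in function
sample = random.uniform(low, high, 1000)        # many draws at once

print(low <= u < high)                          # True
print(sample.min() >= low, sample.max() <= high)
```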

In [61]:
np.sum((x-np.mean(x))**2)/np.size(x)

16.711849399810443
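
The expression above divides by `np.size(x)`, which matches the default behaviour of `np.var` (`ddof=0`). Dividing by `n - 1` instead corresponds to `ddof=1`. A small sketch with made-up data `z`, chosen so that the results are easy to verify by hand:

```python
import numpy as np

z = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # illustrative data, mean = 5

n = z.size
ss = np.sum((z - z.mean())**2)      # sum of squared deviations = 32

pop_var = ss / n                    # divides by n      -> 4.0
sample_var = ss / (n - 1)           # divides by n - 1  -> 32/7

print(np.isclose(pop_var, np.var(z)))              # True (default ddof=0)
print(np.isclose(sample_var, np.var(z, ddof=1)))   # True
```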

In [63]:
x.size

50

In [64]:
x

array([10.47422309,  5.85933787,  5.22194903, 15.09620192, 10.34514738,
       12.32789177,  8.61093322, 12.71286817, 11.6622428 , 19.55196994,
        3.87840221,  7.71880291, 18.92194984,  9.45801403, 10.55723191,
       16.08835664, 10.64730729, 12.79519056,  8.35745953,  8.60852166,
       11.62734142,  8.47370995,  3.46248927, 13.41182683, 11.5617367 ,
        4.41724093, 11.47108283, 12.46862265, 11.03409574, 19.74140009,
        7.66473947,  8.88020004,  3.58810905, 10.23561497, 14.21915032,
       10.97129287, 16.34313925,  4.65959642, 15.20202773,  9.0428641 ,
        7.51112398,  6.70551395,  6.54290553,  3.85176995,  9.98769966,
        4.12849539, 11.73515391,  8.71921739,  6.30643803, 12.88549439])

In [65]:
x.mean()

10.114881891459294