# Working with Numpy

## Statistics Functions


|      Category   |        Functions                                          |
|-----------------|-----------------------------------------------------------|
|Order Statistics | `amin, amax, ptp, percentile`                             |
|Summary          | `mean, median, std, var, average`                         |
|Correlating      | `corrcoef, correlate, cov`                                |
|Histogram        | `histogram, histogram2d, histogramdd, bincount, digitize` |
|Sums, products   | `sum, prod, cumsum, cumprod, diff`                        |
|Min, Max         | `min, max, argmin, argmax`                                |


In [1]:
import numpy as np
x = np.random.normal(10, 4, 50)        # Random sample from N(10, 16)
np.mean(x)

10.175942425808412

In [2]:
np.std(x)

4.385136297810001

Many of these functions are also available as ndarray methods.

In [3]:
print("Mean :", x.mean(), "\nVariance ", x.var())

Mean : 10.175942425808412 
Variance  19.229420350370802
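The order statistics listed in the table can be tried on a small array whose answers are easy to check by hand. The sketch below uses made-up data (the array `y` is an assumption, not part of the sample above) and the function forms `np.min`, `np.max`, `np.ptp`, `np.median` and `np.percentile`.

```python
import numpy as np

# Hypothetical, hand-checkable sample (already sorted for readability)
y = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0])

lowest = np.min(y)                     # smallest value
highest = np.max(y)                    # largest value
value_range = np.ptp(y)                # "peak to peak": max - min
middle = np.median(y)                  # middle value of the sorted data
q1, q3 = np.percentile(y, [25, 75])    # first and third quartiles

print(lowest, highest, value_range, middle, q1, q3)
```

`np.percentile` interpolates linearly between neighbouring values by default, so quartiles need not be actual data points.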


## Matrix operations

A two-dimensional Numpy array can be treated as a matrix. Numpy also provides a matrix class, which shall be discussed later.
Addition, subtraction, and multiplication by a scalar can be performed using the usual arithmetic operators on two-dimensional arrays.

### Addition, Subtraction

Numpy's vectorized arithmetic operators readily provide matrix addition and subtraction for matrices of the same shape:

A + B  
A - B


### Scalar multiplication

The vectorized multiplication operator, coupled with the broadcasting mechanism, readily provides scalar multiplication:

c * A   # Here c is a scalar
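As a concrete illustration of the three operations above, here is a minimal sketch. The matrices `X` and `Y` and the scalar `c` are made-up values chosen only for this example.

```python
import numpy as np

# Two hypothetical matrices of the same shape (2 x 3)
X = np.array([[2, 3, 1],
              [3, 5, 2]])
Y = np.array([[1, 0, 2],
              [4, 1, 0]])
c = 3                      # a scalar

total = X + Y              # elementwise sum
difference = X - Y         # elementwise difference
scaled = c * X             # c is broadcast to every element of X

print(total)
print(difference)
print(scaled)
```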

### Matrix multiplication

Matrix multiplication can be performed using the matrix multiplication operator `@`.

In [4]:
A = np.array([[2, 3, 1],
              [3, 5, 2]])
B = np.array([[2, 3],
              [1, 2],
              [3, 2]])
A

array([[2, 3, 1],
       [3, 5, 2]])

In [5]:
B

array([[2, 3],
       [1, 2],
       [3, 2]])

In [6]:
C = A @ B
C

array([[10, 14],
       [17, 23]])

Alternatively, matrix multiplication can be performed using the `dot` function of Numpy or the `dot` method of ndarray.

In [7]:
np.dot(A, B)    # Using "dot" function

array([[10, 14],
       [17, 23]])

In [8]:
A.dot(B)        # Using "dot" method

array([[10, 14],
       [17, 23]])

In [9]:
B.dot(A)

array([[13, 21,  8],
       [ 8, 13,  5],
       [12, 19,  7]])
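A common source of confusion is the difference between `@` and `*`. The sketch below reuses the matrices `A` and `B` from the cells above and contrasts the matrix product with the elementwise product; `np.matmul` is the function form of the `@` operator.

```python
import numpy as np

A = np.array([[2, 3, 1],
              [3, 5, 2]])
B = np.array([[2, 3],
              [1, 2],
              [3, 2]])

product = A @ B                    # matrix product, shape (2, 2)
same_product = np.matmul(A, B)     # function form of the @ operator
elementwise = A * B.T              # elementwise product, shapes must match: (2, 3)

print(product)
print(elementwise)
```

Note that `A * B` itself would fail here, because shapes (2, 3) and (3, 2) cannot be broadcast together.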

### Linear Algebra package

Many matrix operations are available in the linear algebra sub-package (`np.linalg`) of Numpy.

#### Matrix inverse

In [10]:
D = np.linalg.inv(C)
D

array([[-2.875,  1.75 ],
       [ 2.125, -1.25 ]])

In [11]:
C @ D  # Verify inverse

array([[1., 0.],
       [0., 1.]])

In [12]:
D @ C

array([[1., 0.],
       [0., 1.]])
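The products above print as an exact identity, but in general round-off leaves tiny errors, so a tolerance-based check is safer than comparing with `==`. The following sketch verifies the inverse with `np.allclose` and also shows what happens for a singular matrix; the matrix `S` is a made-up example.

```python
import numpy as np

C = np.array([[10, 14],
              [17, 23]])
D = np.linalg.inv(C)

# Compare with the identity matrix up to floating-point tolerance
is_inverse = np.allclose(C @ D, np.eye(2)) and np.allclose(D @ C, np.eye(2))

# Hypothetical singular matrix: the second row is twice the first
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
    singular_detected = False
except np.linalg.LinAlgError:
    # Raised by np.linalg.inv when the matrix has no inverse
    singular_detected = True

print(is_inverse, singular_detected)
```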

#### QR Decomposition

In [14]:
A

array([[2, 3, 1],
       [3, 5, 2]])

In [15]:
Q, R = np.linalg.qr(A) 
Q

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])

In [16]:
R

array([[-3.60555128, -5.82435206, -2.21880078],
       [ 0.        ,  0.2773501 ,  0.2773501 ]])

In [17]:
Q @ R  # Verify decomposition

array([[2., 3., 1.],
       [3., 5., 2.]])

In [18]:
Q1, R1 = np.linalg.qr(B)

In [19]:
Q1

array([[-0.53452248, -0.57735027],
       [-0.26726124, -0.57735027],
       [-0.80178373,  0.57735027]])

In [20]:
R1

array([[-3.74165739, -3.74165739],
       [ 0.        , -1.73205081]])

In [21]:
Q1 @ R1

array([[2., 3.],
       [1., 2.],
       [3., 2.]])

In [22]:
Q @ Q.T

array([[1., 0.],
       [0., 1.]])

In [23]:
Q

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])

In [24]:
np.linalg.inv(Q)

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])
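The checks above can also be written as explicit tests. This sketch uses the tall matrix `B` from earlier and assumes the default (reduced) mode of `np.linalg.qr`, in which `Q1` is 3 x 2. It shows that the columns of `Q1` are orthonormal even though `Q1` is not square, and that `R1` is upper triangular.

```python
import numpy as np

B = np.array([[2, 3],
              [1, 2],
              [3, 2]])
Q1, R1 = np.linalg.qr(B)     # reduced QR: Q1 is 3 x 2, R1 is 2 x 2

columns_orthonormal = np.allclose(Q1.T @ Q1, np.eye(2))   # Q1' Q1 = I
rows_orthonormal = np.allclose(Q1 @ Q1.T, np.eye(3))      # fails: Q1 is not square
r_upper_triangular = np.allclose(R1, np.triu(R1))         # nothing below the diagonal
reconstructs_b = np.allclose(Q1 @ R1, B)                  # Q1 R1 = B

print(columns_orthonormal, rows_orthonormal, r_upper_triangular, reconstructs_b)
```

For the 2 x 2 matrix `Q` obtained from `A`, both `Q.T @ Q` and `Q @ Q.T` equal the identity, which is why `np.linalg.inv(Q)` coincides with `Q.T` there.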

**Home Work**: Find other functions in the `linalg` submodule.

## Stacking Arrays

Arrays can be stacked together using the following commands  

### Vertical stacking

In [25]:
A

array([[2, 3, 1],
       [3, 5, 2]])

In [26]:
A1 = np.array([[4, 6, 3]])
A1

array([[4, 6, 3]])

In [27]:
np.vstack((A, A1))   # Stacking vertically

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

### Horizontal stacking

In [28]:
A2 = B[:2]
A2

array([[2, 3],
       [1, 2]])

In [29]:
A         # For ready reference

array([[2, 3, 1],
       [3, 5, 2]])

In [30]:
np.hstack((A, A2))   # Horizontal stacking

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

### Block stacking

In [31]:
A11 = A
A12 = A2
A21 = A1
A22 = np.array([[5, 5]])
AA = np.block([[A11, A12], [A21, A22]])   # Assemble array from given blocks
AA

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2],
       [4, 6, 3, 5, 5]])

In [32]:
np.hstack((A11, A12))

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

In [33]:
np.block([[A11, A12]])

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

In [34]:
np.vstack((A11, A21))

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

In [35]:
np.block([[A11],[A21]])

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])
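For two-dimensional arrays, `vstack` and `hstack` give the same results as `np.concatenate` with `axis=0` and `axis=1` respectively. The sketch below reuses `A`, `A1` and `A2` from the cells above to illustrate this.

```python
import numpy as np

A = np.array([[2, 3, 1],
              [3, 5, 2]])
A1 = np.array([[4, 6, 3]])
A2 = np.array([[2, 3],
               [1, 2]])

v = np.concatenate((A, A1), axis=0)   # rows appended: column counts must match
h = np.concatenate((A, A2), axis=1)   # columns appended: row counts must match

print(v.shape, h.shape)
```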

**Home Work**: Also see the function `column_stack` and the index helpers `r_` and `c_`.

## Splitting arrays

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur.  
 
For example,  

`np.hsplit(a, 3)       # Split a into 3 equal parts`  
`np.hsplit(a, (3, 5))  # Split a after the third and the fifth column`  

Similarly, `vsplit` splits along the vertical axis, and `array_split` allows one to specify along which axis to split.
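The calls described above can be tried on a small array. In this sketch the 2 x 6 array `a` is a made-up example; it also shows that `array_split`, unlike `hsplit`, accepts a number of pieces that does not divide the axis evenly.

```python
import numpy as np

a = np.arange(12).reshape(2, 6)          # hypothetical 2 x 6 array

thirds = np.hsplit(a, 3)                 # three 2 x 2 arrays
parts = np.hsplit(a, (3, 5))             # columns 0-2, columns 3-4, column 5
rows = np.vsplit(a, 2)                   # two 1 x 6 arrays
uneven = np.array_split(a, 4, axis=1)    # 6 columns in 4 pieces: widths 2, 2, 1, 1

print([p.shape for p in thirds])
print([p.shape for p in parts])
print([p.shape for p in rows])
print([p.shape for p in uneven])
```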


In [37]:
AA

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2],
       [4, 6, 3, 5, 5]])

In [38]:
np.hsplit(AA, (2, 4))

[array([[2, 3],
        [3, 5],
        [4, 6]]),
 array([[1, 2],
        [2, 1],
        [3, 5]]),
 array([[3],
        [2],
        [5]])]

In [39]:
AA1, AA2, AA3 = np.hsplit(AA, (2, 4))   # Split at column 2 and 4
AA1

array([[2, 3],
       [3, 5],
       [4, 6]])

In [40]:
AA2

array([[1, 2],
       [2, 1],
       [3, 5]])

In [41]:
AA3

array([[3],
       [2],
       [5]])

In [42]:
AA[:,:4]    # First 4 columns of AA

array([[2, 3, 1, 2],
       [3, 5, 2, 1],
       [4, 6, 3, 5]])

In [43]:
np.hsplit(AA[:, :4], 2)     # Split into 2 matrices with equal number of columns

[array([[2, 3],
        [3, 5],
        [4, 6]]),
 array([[1, 2],
        [2, 1],
        [3, 5]])]

## Statistical Functions with additional arguments

In [44]:
salary = (np.random.randint(15, 25, 30)*1000).reshape(30,1)
savings = salary*(np.random.random(30).reshape(30,1)*0.25)

In [45]:
Data = np.hstack((salary, savings))
Data

array([[24000.        ,  5985.48389896],
       [17000.        ,  1399.7056481 ],
       [23000.        ,  4453.41012757],
       [24000.        ,   652.07806942],
       [18000.        ,  2816.10633045],
       [15000.        ,  1497.8649375 ],
       [23000.        ,  3359.0142547 ],
       [18000.        ,  3510.16362748],
       [16000.        ,  1949.96220694],
       [23000.        ,  3866.72736907],
       [24000.        ,  3703.37711351],
       [23000.        ,  2110.10307362],
       [21000.        ,  4406.81188887],
       [20000.        ,  4013.78961448],
       [17000.        ,  3781.53135365],
       [20000.        ,  1982.23325225],
       [20000.        ,  2944.25970795],
       [17000.        ,   428.14479619],
       [22000.        ,  5226.8086927 ],
       [17000.        ,  1674.24631062],
       [20000.        ,  3036.06845468],
       [24000.        ,  3874.28851875],
       [24000.        ,  4185.83164615],
       [15000.        ,   288.12796893],
       [19000.  

In [46]:
Data.mean(axis=0)     # Compute Column means

array([19833.33333333,  2957.26033992])

In [47]:
Data.mean()

11395.296836626976

Thus,  
the average salary is Rs. 19833.333, and  
the average savings is Rs. 2957.260

In [48]:
np.corrcoef(Data.T)       # "corrcoef" function assumes variables in rows, and obs in columns

array([[1.        , 0.49904995],
       [0.49904995, 1.        ]])

Thus, the correlation coefficient between Salary and Savings is approximately 0.499
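The effect of the `axis` argument is easier to see on a tiny data set. The sketch below uses made-up data with observations in rows and variables in columns. It also shows that passing `rowvar=False` to `np.corrcoef` is an alternative to transposing the data.

```python
import numpy as np

# Hypothetical data: 4 observations (rows) of 2 variables (columns);
# the second column is exactly twice the first
data = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 6.0],
                 [4.0, 8.0]])

col_means = data.mean(axis=0)    # one mean per column: [2.5, 5.0]
row_means = data.mean(axis=1)    # one mean per row: [1.5, 3.0, 4.5, 6.0]
overall = data.mean()            # mean of all 8 values: 3.75

r_transposed = np.corrcoef(data.T)             # variables in rows
r_rowvar = np.corrcoef(data, rowvar=False)     # variables in columns

print(col_means, row_means, overall)
print(r_rowvar)
```

Because the two columns are perfectly linearly related, the off-diagonal correlation here is 1 up to round-off.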

## `random` sub-package of Numpy 

We have already seen use of some functions of the random sub-package. The `random` sub-package (module) of Numpy provides functions for generating random numbers or selecting random samples.

In [49]:
from numpy import random

In [50]:
xx = np.array([1, 2, 3, 4, 5])
xx

array([1, 2, 3, 4, 5])

In [51]:
random.permutation(xx)

array([4, 3, 5, 1, 2])

In [52]:
random.permutation(xx)

array([1, 3, 4, 2, 5])

In [53]:
xx

array([1, 2, 3, 4, 5])

In [54]:
random.shuffle(xx)

In [55]:
xx

array([5, 1, 3, 2, 4])
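The cells above show that `permutation` returns a reordered copy, while `shuffle` reorders the array in place. The sketch below makes the same point reproducibly. It assumes the newer `np.random.default_rng` generator interface rather than the module-level functions used above, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)      # seeded generator, so reruns give the same order
xx = np.array([1, 2, 3, 4, 5])

shuffled_copy = rng.permutation(xx)  # new array; xx itself is untouched
unchanged = np.array_equal(xx, [1, 2, 3, 4, 5])

result = rng.shuffle(xx)             # reorders xx in place and returns None

print(shuffled_copy, unchanged, result, xx)
```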

In [56]:
xx = random.normal(50, 5, 50)
xx

array([54.866593  , 48.3567351 , 51.70860642, 51.37899078, 51.02071992,
       46.89360698, 47.66163006, 54.48871491, 45.55914128, 51.32888348,
       55.61137486, 60.08986286, 46.36868955, 52.08692473, 49.42902527,
       49.46160109, 55.95267483, 54.33663525, 48.02818773, 51.91575557,
       51.57628008, 47.27347651, 54.09200113, 51.99692803, 59.856424  ,
       44.3061176 , 51.59471249, 37.97789687, 52.62128495, 48.14155058,
       43.91618482, 54.60997113, 39.84870836, 45.36656601, 46.30357316,
       47.66763705, 44.9550605 , 46.50468962, 49.59795797, 46.66855479,
       47.25163905, 46.10929715, 41.5507482 , 46.75185427, 51.94520925,
       52.7611176 , 54.02864382, 47.45499371, 51.02302879, 44.64889361])

In [57]:
xx.mean()

49.49890709470538

In [58]:
random.permutation(xx).mean()

49.49890709470538

In [59]:
random.permutation(xx).mean()

49.498907094705395

In [60]:
random.random()

0.2711035775575886

In [61]:
10 + random.random()*20

24.464988198170296
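The expression `10 + random.random()*20` rescales a uniform draw from [0, 1) to the interval [10, 30). A minimal sketch of the same idea for many draws at once follows, alongside the `uniform` method that does the rescaling directly. It assumes the `np.random.default_rng` generator interface, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for reproducibility

scaled = 10 + rng.random(1000) * 20     # shift and stretch draws from [0, 1)
direct = rng.uniform(10, 30, 1000)      # same distribution, requested directly

print(scaled.min(), scaled.max())
print(direct.min(), direct.max())
```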

In [62]:
n = x.size                             # Number of observations
np.sum((x - np.mean(x))**2) / n        # Population variance; should agree with x.var()
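The hand computation above divides the sum of squared deviations by the number of observations `n`, which is what `np.var` does by default. The sketch below uses a small made-up sample to compare this with the sample variance, which divides by `n - 1` and is obtained by passing `ddof=1`.

```python
import numpy as np

# Hypothetical sample with mean 5 and squared deviations summing to 32
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

pop_var_by_hand = np.sum((x - np.mean(x))**2) / n           # 32 / 8 = 4.0
sample_var_by_hand = np.sum((x - np.mean(x))**2) / (n - 1)  # 32 / 7

pop_var = np.var(x)              # default ddof=0 divides by n
sample_var = np.var(x, ddof=1)   # ddof=1 divides by n - 1

print(pop_var_by_hand, pop_var)
print(sample_var_by_hand, sample_var)
```

The same `ddof` argument is accepted by `np.std` and by the `var` and `std` methods of ndarray.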

In [63]:
3**2

9