# Numpy Package

## Creating a numpy array

In [3]:
import numpy as np         # np is the standard alias for numpy 
data = [6, 7.5, 8, 0, 1]   # Python list object
x = np.array(data)         # Using the array function of numpy

The above statements create a one-dimensional numpy array `x`.

In [2]:
another_data = [[1, 2, 3, 4], [5, 6, 7, 8]]
y = np.array(another_data)

These statements create a two-dimensional numpy array `y`.  
The type of object is `numpy.ndarray` as shown below.

In [3]:
type(x)

numpy.ndarray

In [4]:
type(y)

numpy.ndarray

### Attributes of numpy array  

#### dtype
The dtype attribute returns the common data type of the array elements.  

In [5]:
x.dtype

dtype('float64')

In [6]:
y.dtype

dtype('int32')

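Note that these dtypes have been chosen automatically by the `array` function. The default integer dtype depends on the platform and the NumPy version, so `y.dtype` may be `int64` on other systems.  
It is also possible to specify the dtype explicitly while creating an ndarray. This is shown below.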

In [7]:
x1 = np.array(data, dtype = 'float32')
x1.dtype

dtype('float32')

#### ndim  
The ndim attribute returns the number of dimensions of an array.

In [8]:
x.ndim

1

In [9]:
y.ndim

2

#### shape

The shape of an ndarray is a tuple that specifies the size of each dimension.

In [10]:
x.shape

(5,)

In [11]:
y.shape

(2, 4)

#### size
The size of an ndarray is the total number of elements in the array.

In [12]:
print(x)
x.size

[6.  7.5 8.  0.  1. ]


5

In [13]:
print(y)
y.size

[[1 2 3 4]
 [5 6 7 8]]


8

#### itemsize
The itemsize attribute returns the size of one array element in bytes.

In [14]:
x.itemsize   # Recall that dtype of x is float64

8

In [15]:
x1.itemsize  # dtype is float32

4

In [16]:
y.itemsize  # dtype is int32

4
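
The attributes above are related: `size` is the product of the entries of `shape`, and the memory taken by the elements is `size * itemsize`, which numpy also exposes as `nbytes`. The following is a minimal sketch; the name `arr` is only illustrative, and the dtype is given explicitly so that the numbers do not depend on the platform.

```python
import numpy as np

# Explicit dtype so that itemsize is the same on every platform
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int32)

print(arr.ndim)      # 2  (number of axes)
print(arr.shape)     # (2, 4)
print(arr.size)      # 8  (2 * 4 elements)
print(arr.itemsize)  # 4  (bytes per int32 element)

# Total bytes used by the elements: size * itemsize, same as arr.nbytes
total_bytes = arr.size * arr.itemsize
print(total_bytes, arr.nbytes)   # 32 32
```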

### Array Scalars
When a specific element of an array is extracted by *indexing the array*, the extracted element is an array scalar.

In [17]:
a = x[2]
type(a)

numpy.float64

Array scalars support the same methods and attributes as ndarrays.

In [18]:
a.ndim

0

## Different ways of creating numpy arrays
### Using other python structures

We have already seen how to create an ndarray by passing a Python list to the `array` function of numpy.

### Using built-in numpy functions

Numpy also provides a number of built-in functions for creating new arrays.


In [19]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [20]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [21]:
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [22]:
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [23]:
np.arange(2, 1, -0.1)

array([2. , 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1])

In [24]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [25]:
np.empty(7)

array([0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
       0.00000000e+000, 0.00000000e+000, 3.11525958e-307])
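
Note that `np.empty` does not initialise its elements, so the values printed above are simply whatever was in memory. A few more creation functions are sketched below as a supplement; the variable names are illustrative only.

```python
import numpy as np

filled = np.full((2, 3), 7.0)           # 2x3 array with every element equal to 7.0
grid = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced values, both endpoints included
template = np.array([[1, 2], [3, 4]])
zeros_like = np.zeros_like(template)    # zeros with the shape and dtype of template

print(filled)
print(grid)          # [0.   0.25 0.5  0.75 1.  ]
print(zeros_like)
```

`np.linspace` is often a safer choice than `np.arange` with a fractional step, because the number of points is stated explicitly.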

### Inputting from file

In [5]:
b = np.loadtxt("d:\\data1.txt")

In [6]:
b

array([1.2, 1.4, 1.7, 2.1, 2.4, 3. ])

In [7]:
c = np.loadtxt("data2.txt")

In [8]:
c

array([[1.1, 1.3, 1.5, 1.7, 1.9],
       [2.1, 2.3, 2.5, 2.7, 2.9],
       [3.1, 3.3, 3.5, 3.7, 3.9]])

In [None]:
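with open("data3.txt") as f:   # loadtxt also accepts an open file object
    d = np.loadtxt(f)          # the file is closed automatically after the block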
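
The files used above are not part of these notes, so here is a self-contained sketch that uses an in-memory text buffer as a stand-in for a file. The data and the names `text`, `table` and `first_col` are made up for illustration; `delimiter` and `usecols` are optional arguments of `loadtxt`.

```python
import io
import numpy as np

# Stand-in for a text file: a comment line followed by three comma-separated rows
text = io.StringIO("# x, y, z\n1.0, 2.0, 3.0\n4.0, 5.0, 6.0\n7.0, 8.0, 9.0\n")

table = np.loadtxt(text, delimiter=",")    # lines starting with '#' are skipped by default
text.seek(0)                               # rewind the buffer before reading it again
first_col = np.loadtxt(text, delimiter=",", usecols=0)   # read only column 0

print(table.shape)   # (3, 3)
print(first_col)     # [1. 4. 7.]
```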

In [1]:
np.loadtxt?      # Shows the help for loadtxt (IPython/Jupyter syntax); numpy must already be imported as np


# Working with numpy Arrays

## Indexing numpy array

Numpy ndarrays can be indexed using the standard syntax `x[obj]`, where x is the array and obj the selection. Two kinds of indexing are available: 
* Basic indexing
  * Field access 
  * Slicing 
* Advanced indexing 
  * Using bool array as index
  * Fancy indexing (Using non-bool array as index)

The type of `obj` determines the kind of indexing being used. 
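
As a quick orientation before the detailed examples, the sketch below uses one small array (the name `arr` and its values are illustrative) and varies only the type of `obj`.

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60]])

single = arr[1, 2]                 # tuple of integers -> basic indexing, one element
rows = arr[0:1]                    # slice             -> basic indexing
mask_sel = arr[arr > 25]           # bool array        -> advanced indexing
fancy_sel = arr[[0, 1], [2, 0]]    # integer sequences -> advanced (fancy) indexing

print(single)       # 60
print(rows)         # [[10 20 30]]
print(mask_sel)     # [30 40 50 60]
print(fancy_sel)    # [30 40]
```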

In [12]:
a = np.array([12, 35, 32, 56, 33])
a

array([12, 35, 32, 56, 33])

### Positive integer as index

In [28]:
a[0]         # 0th element - element with index 0

12

In [29]:
a[2]         # Element with index 2

32

In [30]:
i = 3
a[i]         # variable as an index, variable must take integer value

56

In [31]:
a[i-2]       # Expression as an index

35

In [32]:
i/2

1.5

In [33]:
a[i/2]       # expression with non-integer value not allowed

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

In [34]:
s = a[i+1]*2     # indexed array element can participate in an expression

In [35]:
s

66

### Negative integer as index

In [36]:
a[-1]      # refers to 1st value from end

33

In [37]:
a[-2]      # 2nd value from end

56

### Index value must be within the bounds

In [38]:
a[5]

IndexError: index 5 is out of bounds for axis 0 with size 5

This is a runtime error, called an **exception**. The name of the exception is `IndexError`.  
Also note that the only dimension of the one-dimensional array `a` is referred to as `axis 0` in the message.

### Bool array as index

In [13]:
even = a%2 == 0     # even is a bool array (of the same size as a)

In [14]:
even

array([ True, False,  True,  True, False])

In [15]:
a[even]        # The elements corresponding to "True" in the index array are extracted.

array([12, 32, 56])

In [43]:
a[2]=23

In [44]:
a

array([12, 35, 23, 56, 33])

In [45]:
a[a%2==1] 

array([35, 23, 33])

In [46]:
gender = np.array(['Male', 'Male', 'Female', 'Male', 'Female'])

In [47]:
gender

array(['Male', 'Male', 'Female', 'Male', 'Female'], dtype='<U6')

In [48]:
gender=='Male'

array([ True,  True, False,  True, False])

In [49]:
a[gender=='Male']     # Values in a corresponding to Male gender

array([12, 35, 56])
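
Bool arrays can also be combined before they are used as an index. The sketch below is an illustration with made-up names (`scores`, `male_and_even`, and so on). It uses the element-wise operators `&` (and), `|` (or) and `~` (not); the Python keywords `and`, `or` and `not` do not work element-wise on arrays.

```python
import numpy as np

scores = np.array([12, 35, 23, 56, 33])
gender = np.array(['Male', 'Male', 'Female', 'Male', 'Female'])

# The parentheses are required: &, | and ~ bind more tightly than == and >
male_and_even = scores[(gender == 'Male') & (scores % 2 == 0)]
female_or_large = scores[(gender == 'Female') | (scores > 50)]
not_male = scores[~(gender == 'Male')]

print(male_and_even)     # [12 56]
print(female_or_large)   # [23 56 33]
print(not_male)          # [23 33]
```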

### Indexing a 2d array

In [4]:
b = np.array([[1, 2, 3, 4, 5], [11, 22, 33, 44, 55]])

In [5]:
b

array([[ 1,  2,  3,  4,  5],
       [11, 22, 33, 44, 55]])

In [28]:
b.shape

(2, 5)

In [29]:
b[0, 0]

1

In [30]:
b[0, 2]

3

In [31]:
b[1, 3]      # index value 1 for "axis 0", and index value 3 for "axis 1"

44

In [32]:
b[0, -1]     # A negative index has the same meaning as discussed earlier

5

In [33]:
b > 10

array([[False, False, False, False, False],
       [ True,  True,  True,  True,  True]])

In [34]:
b[b>10]     # bool array of the same shape as b can be used as an index for b

array([11, 22, 33, 44, 55])

In [35]:
b

array([[ 1,  2,  3,  4,  5],
       [11, 22, 33, 44, 55]])

#### Syntactic Sugar 

From the syntax `x[obj]` for indexing an ndarray, it may appear that multidimensional indexing such as `x[2, 3]` does not follow the syntax. However, the following examples show that an expression such as `x[i, j]` is actually the same as `x[(i, j)]`, where the index is a tuple.

In [19]:
b[(1, 3)]

44

In [20]:
b[1, 3]

44

Thus, `b[expr1, expr2]` is actually *syntactic sugar* for `b[(expr1, expr2)]`.
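
This is a feature of Python itself rather than of numpy. It can be observed with a small helper class (`ShowIndex` is hypothetical, written only for this illustration) that returns whatever object Python passes to the `[]` operator.

```python
import numpy as np

class ShowIndex:
    """Hypothetical helper: returns the object that Python passes to []."""
    def __getitem__(self, obj):
        return obj

probe = ShowIndex()
print(probe[1, 3])      # (1, 3)
print(probe[0, 2:5])    # (0, slice(2, 5, None))

b = np.array([[1, 2, 3, 4, 5], [11, 22, 33, 44, 55]])
idx = (1, 3)            # a tuple stored in a variable can be used as the index
print(b[idx])           # 44
```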

### Using fewer indices than ndim

In [37]:
b[0]    # Refers to the entire row with index 0

array([1, 2, 3, 4, 5])

In [21]:
b[:,1]    # Column with index 1

array([ 2, 22])

### Slicing

The slicing operation of core Python is also supported in numpy, extended to n dimensions.

To perform slicing, use a ***slice object*** (or a tuple of slice objects) as the index. A slice object is constructed with the syntax `start:end[:step]`; the element at index `end` is not included.

In [39]:
a

array([12, 35, 23, 56, 33])

In [40]:
a[:3]        # All elements up to, but excluding, index 3

array([12, 35, 23])

In [41]:
a[2:]        # All elements with index 2 and above

array([23, 56, 33])

In [42]:
a[2:5]       # All elements with index 2 or more but less than 5

array([23, 56, 33])

In [43]:
a[1:5]

array([35, 23, 56, 33])

In [44]:
a[1:5:2]    # step size can be optionally specified for index update

array([35, 56])

In [45]:
a[4:1:-1]   # A negative step size can also be used

array([33, 56, 23])

In [46]:
b

array([[ 1,  2,  3,  4,  5],
       [11, 22, 33, 44, 55]])

In [47]:
b[0,2:5]   # Slicing can be used for any axis (i.e. dimension)

array([3, 4, 5])

In [48]:
b[:2, 2:4]  # Slicing can also be used for both the axes.

array([[ 3,  4],
       [33, 44]])

In [49]:
a

array([12, 35, 23, 56, 33])

In [50]:
a[:]     # Only ':' character means all index values, from beginning to end

array([12, 35, 23, 56, 33])

In [51]:
b

array([[ 1,  2,  3,  4,  5],
       [11, 22, 33, 44, 55]])

In [52]:
b[:,3]      # Refers to column with index 3

array([ 4, 44])

In [53]:
b[0,:]      # Same as b[0]. Thus, ':' is optional for one or more axes at the end

array([1, 2, 3, 4, 5])

In [54]:
b[0]

array([1, 2, 3, 4, 5])

It is important to note that slicing preserves the number of dimensions of the array, whereas indexing with an integer removes that axis, so the result has one dimension fewer.  

See, for example, the following results.

In [6]:
b[1, 1:4]       # Results in 1-d array.

array([22, 33, 44])

In [7]:
b[1:2, 1:4]    # Results in 2-d array

array([[22, 33, 44]])

In [62]:
b[:2, 3]      # Results in 1-d array

array([ 4, 44])

In [63]:
b[:2, 3:4]      # Results in 2-d array

array([[ 4],
       [44]])

### Fancy Indexing

Fancy indexing is performed when an array (or tuple of arrays) of integers is used as index.  

#### Fancy indexing 1-d array

In [64]:
a = np.array([11, 22, 33, 44, 55])   # a is redefined for the examples that follow
a

array([11, 22, 33, 44, 55])

In [65]:
a[[0, 2, 3]]             # List will be treated as 1-d array when used as index

array([11, 33, 44])

In [66]:
a[np.array([0, 2, 3])]   # Same result

array([11, 33, 44])

In [67]:
a[[1, 2, 2]]

array([22, 33, 33])

In [68]:
a[np.array([[0, 2],[1, 3]])]

array([[11, 33],
       [22, 44]])

In [69]:
a[np.array([[0],[3]])]

array([[11],
       [44]])

It is important to note that **the shape of the result is the same as the shape of the index**.

#### Fancy indexing 2-d array

In [70]:
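b = np.array([[ 1.1 ,  2.2 ,  3.3 ,  4.4 ,  5.5 ],
              [11.1 , 22.2 , 33.3 , 44.4 , 55.5 ],
              [ 1.11,  2.22,  3.33,  4.44,  5.55]])   # b is redefined as a 3x5 float array
b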

array([[ 1.1 ,  2.2 ,  3.3 ,  4.4 ,  5.5 ],
       [11.1 , 22.2 , 33.3 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [71]:
b[([0,2], [1, 3])]     

array([2.2 , 4.44])

In [72]:
b[[0,2], [1, 3]]     # Using the syntactic sugar

array([2.2 , 4.44])

The result may be surprising, as most people would expect a matrix comprising two rows and two columns. Instead, the result is a 1-d array containing the element at (0, 1) and the element at (2, 3).

In general, for fancy indexing of an n-dimensional array with n index arrays, the index arrays must have the same shape (or be broadcastable to a common shape), and the result of such indexing has that same shape.

In [73]:
b[np.array([[0],[2]]), np.array([[1],[3]])]         # both indices have shape (2,1)

array([[2.2 ],
       [4.44]])

In [74]:
id1 = np.array([[0,0],[2, 2]])
id1

array([[0, 0],
       [2, 2]])

In [75]:
id2 = np.array([[1, 2], [1, 2]])
id2

array([[1, 2],
       [1, 2]])

In [76]:
b[id1, id2]

array([[2.2 , 3.3 ],
       [2.22, 3.33]])

In the above example, both the indices `id1` and `id2` are 2-d arrays of shape (2, 2). Therefore, the result is also a 2-d array of shape (2, 2), whose elements are selected by the index pairs formed from the corresponding elements of `id1` and `id2`.  

This example also shows how to obtain the submatrix (subarray) comprising the rows indexed by 0 and 2, and the columns indexed by 1 and 2.

The following example shows an alternative way of achieving the same result.

In [77]:
id1 = np.array([[0], [2]])
id1

array([[0],
       [2]])

In [78]:
id2 = [1, 2]
id2

[1, 2]

In [79]:
b[id1, id2]

array([[2.2 , 3.3 ],
       [2.22, 3.33]])

This example works due to the mechanism of ***broadcasting***. Broadcasting will be discussed later.  

`np.ix_` function provides a simpler way of achieving the same result as above.

In [80]:
b[np.ix_([0,2],[1,2])]

array([[2.2 , 3.3 ],
       [2.22, 3.33]])

The function `np.ix_` evaluates to a tuple of arrays of the required shapes.

In [81]:
np.ix_([0,2],[1,2])

(array([[0],
        [2]]),
 array([[1, 2]]))
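
The three ways of indexing with rows {0, 2} and columns {1, 2} can be compared side by side. The sketch below uses a small integer stand-in array (the values 0 to 14 are chosen only so that the results are easy to check by hand).

```python
import numpy as np

b = np.arange(15).reshape(3, 5)    # stand-in array: rows are 0..4, 5..9, 10..14
rows, cols = [0, 2], [1, 2]

paired = b[rows, cols]                                  # element pairs (0,1) and (2,2)
explicit = b[np.array(rows)[:, np.newaxis], cols]       # shape (2,1) broadcast with shape (2,)
with_ix = b[np.ix_(rows, cols)]                         # same submatrix via np.ix_

print(paired)      # [ 1 12]
print(explicit)    # [[ 1  2]
                   #  [11 12]]
print(np.array_equal(explicit, with_ix))   # True
```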

### Mixing the types of indexing

In [82]:
b

array([[ 1.1 ,  2.2 ,  3.3 ,  4.4 ,  5.5 ],
       [11.1 , 22.2 , 33.3 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [83]:
b[1, 2:4]

array([33.3, 44.4])

In [84]:
b[1, [2, 4]]

array([33.3, 55.5])

In [85]:
b[[1], :]

array([[11.1, 22.2, 33.3, 44.4, 55.5]])

In [86]:
b[[0,2],:]

array([[1.1 , 2.2 , 3.3 , 4.4 , 5.5 ],
       [1.11, 2.22, 3.33, 4.44, 5.55]])

In [87]:
b[:,[2, 4]]

array([[ 3.3 ,  5.5 ],
       [33.3 , 55.5 ],
       [ 3.33,  5.55]])

## Copy vs View

Basic indexing creates a view, whereas advanced indexing creates a copy. A view refers to the elements of the original array and therefore saves memory. A copy, in contrast, is a separate array that holds the same values as the original array.
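
Whether two arrays refer to the same memory can be checked directly with `np.shares_memory`. The following is a minimal sketch; the names `view` and `picked` are illustrative.

```python
import numpy as np

a = np.array([11, 22, 33, 44, 55])

view = a[1:4]            # basic indexing (slice)
picked = a[[1, 2, 3]]    # advanced indexing (list of integers)

print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, picked))   # False
print(view.base is a)                # True: a view records the array it was taken from
```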

### Modifying elements of array

In [88]:
a

array([11, 22, 33, 44, 55])

In [89]:
a[2] = 35
a

array([11, 22, 35, 44, 55])

In [90]:
a[:2]= [10, 20]
a

array([10, 20, 35, 44, 55])

Elements of an array can be modified by assignment, as shown above.

In [91]:
a_part = a[1:4]     # a_part is a view
a_part

array([20, 35, 44])

In [92]:
a[1]=21
a_part

array([21, 35, 44])

Any change in `a` is also reflected in the view `a_part`. Similarly, a change in the view changes the original array.

In [93]:
a_part[0] = 25
a

array([10, 25, 35, 44, 55])

In [94]:
b

array([[ 1.1 ,  2.2 ,  3.3 ,  4.4 ,  5.5 ],
       [11.1 , 22.2 , 33.3 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [95]:
row0 = b[0]       # row0 is a view
b[0,2] = 33.1
b

array([[ 1.1 ,  2.2 , 33.1 ,  4.4 ,  5.5 ],
       [11.1 , 22.2 , 33.3 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [96]:
row0

array([ 1.1,  2.2, 33.1,  4.4,  5.5])

In [97]:
row0[:3] = [1.5, 2.5, 3.5]
row0

array([1.5, 2.5, 3.5, 4.4, 5.5])

In [98]:
b

array([[ 1.5 ,  2.5 ,  3.5 ,  4.4 ,  5.5 ],
       [11.1 , 22.2 , 33.3 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

Note that any change in `b` is reflected in `row0` and vice versa. This is because both refer to the same memory locations.

In [99]:
row1 = b[[1]]        # row1 is a copy
row1

array([[11.1, 22.2, 33.3, 44.4, 55.5]])

In [100]:
b[1, 1] = 22.5
b

array([[ 1.5 ,  2.5 ,  3.5 ,  4.4 ,  5.5 ],
       [11.1 , 22.5 , 33.3 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [101]:
row1

array([[11.1, 22.2, 33.3, 44.4, 55.5]])

Since `row1` is an independent array, a change in `b` doesn't change `row1`.

### Explicitly creating copy

The `copy` method can be used to explicitly create a copy, as shown below.

In [102]:
col12_view = b[:,1:3]
col12_copy = b[:,1:3].copy()

In [103]:
col12_view

array([[ 2.5 ,  3.5 ],
       [22.5 , 33.3 ],
       [ 2.22,  3.33]])

In [104]:
col12_copy

array([[ 2.5 ,  3.5 ],
       [22.5 , 33.3 ],
       [ 2.22,  3.33]])

In [105]:
b[0,2] = 3.6; b[1,2] = 33.6

In [106]:
b

array([[ 1.5 ,  2.5 ,  3.6 ,  4.4 ,  5.5 ],
       [11.1 , 22.5 , 33.6 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [107]:
col12_view

array([[ 2.5 ,  3.6 ],
       [22.5 , 33.6 ],
       [ 2.22,  3.33]])

In [108]:
col12_copy

array([[ 2.5 ,  3.5 ],
       [22.5 , 33.3 ],
       [ 2.22,  3.33]])

## Transforming arrays

### Transposing

Transposing is the operation of interchanging the axes of an array. The `T` attribute of an array returns its transpose.   

In [109]:
a

array([10, 25, 35, 44, 55])

In [110]:
a.T

array([10, 25, 35, 44, 55])

Transposing has no effect on a one-dimensional array, as there is only one axis.

In [111]:
b

array([[ 1.5 ,  2.5 ,  3.6 ,  4.4 ,  5.5 ],
       [11.1 , 22.5 , 33.6 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [112]:
b.T

array([[ 1.5 , 11.1 ,  1.11],
       [ 2.5 , 22.5 ,  2.22],
       [ 3.6 , 33.6 ,  3.33],
       [ 4.4 , 44.4 ,  4.44],
       [ 5.5 , 55.5 ,  5.55]])

On a two-dimensional array (matrix), transposing gives the usual matrix transpose.  

It is important to understand that a 1-d array is neither a row vector nor a column vector. 
Both row vectors and column vectors are 2-d arrays. A row vector has shape (1, n), whereas a column vector has shape (m, 1).

#### Adding an axis

A 1-d array of shape (n,) can be transformed into a 2-d array of shape (1,n) or (n,1) by introducing a new axis as shown below.

In [22]:
a

array([10, 25, 35, 44, 55])

In [113]:
arow = a[np.newaxis, :]
arow

array([[10, 25, 35, 44, 55]])

In [114]:
acolumn = a[:,np.newaxis]
acolumn

array([[10],
       [25],
       [35],
       [44],
       [55]])

In [115]:
arow.T

array([[10],
       [25],
       [35],
       [44],
       [55]])

In [116]:
acolumn.T

array([[10, 25, 35, 44, 55]])

#### `transpose` function/ method

Transpose can also be computed using `transpose` function of numpy or equivalently `transpose` method of ndarray.


<u>Note</u>: *Transpose is a view of the original array, irrespective of how it is obtained*


In [117]:
b

array([[ 1.5 ,  2.5 ,  3.6 ,  4.4 ,  5.5 ],
       [11.1 , 22.5 , 33.6 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [118]:
np.transpose(b)

array([[ 1.5 , 11.1 ,  1.11],
       [ 2.5 , 22.5 ,  2.22],
       [ 3.6 , 33.6 ,  3.33],
       [ 4.4 , 44.4 ,  4.44],
       [ 5.5 , 55.5 ,  5.55]])

In [119]:
b.transpose()     # Same result as above

array([[ 1.5 , 11.1 ,  1.11],
       [ 2.5 , 22.5 ,  2.22],
       [ 3.6 , 33.6 ,  3.33],
       [ 4.4 , 44.4 ,  4.44],
       [ 5.5 , 55.5 ,  5.55]])

The `transpose` function (and method) is more general than the `T` attribute, as it also accepts an axes parameter, a permutation of the axis labels.
Thus, if the shape of x is (m, n, p), and we compute y as  
`y = x.transpose(1, 0, 2)`  
and z as  
`z = x.transpose(1, 2, 0)`  
then the shapes of y and z are (n, m, p) and (n, p, m), respectively.
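
The shape rule can be checked on a concrete array. In the sketch below the sizes (m, n, p) = (2, 3, 4) are arbitrary choices made for illustration.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)     # shape (m, n, p) = (2, 3, 4)

y = x.transpose(1, 0, 2)               # axes reordered to (n, m, p)
z = x.transpose(1, 2, 0)               # axes reordered to (n, p, m)
print(y.shape, z.shape)                # (3, 2, 4) (3, 4, 2)

# The element at position (i, j, k) of x is found at position (j, k, i) of z
print(x[1, 2, 3] == z[2, 3, 1])        # True
```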

[HW: See the methods swapaxes, ravel]


In [120]:
c = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
c      # Shape is (2, 2, 3)

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [121]:
c.transpose()    # shape is (3, 2, 2)

array([[[ 1,  7],
        [ 4, 10]],

       [[ 2,  8],
        [ 5, 11]],

       [[ 3,  9],
        [ 6, 12]]])

In [122]:
c.transpose(2, 1, 0)    # Same result as above

array([[[ 1,  7],
        [ 4, 10]],

       [[ 2,  8],
        [ 5, 11]],

       [[ 3,  9],
        [ 6, 12]]])

In [123]:
c.transpose(0, 2, 1)   # Shape is (2, 3, 2)

array([[[ 1,  4],
        [ 2,  5],
        [ 3,  6]],

       [[ 7, 10],
        [ 8, 11],
        [ 9, 12]]])

### Reshaping

Shape of an array can be changed using the `reshape` method as shown below.

In [124]:
x = np.arange(1, 7)
x

array([1, 2, 3, 4, 5, 6])

In [125]:
y = x.reshape(2, 3)
y

array([[1, 2, 3],
       [4, 5, 6]])

In [126]:
z = x.reshape(3, 2)
z

array([[1, 2],
       [3, 4],
       [5, 6]])

It should be noted that the `reshape` method returns a view of the original array whenever possible (as in the examples above); otherwise it returns a copy.
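
Whether `reshape` can return a view depends on the memory layout of the array being reshaped. The sketch below illustrates both cases and also the use of `-1`, which asks numpy to infer one dimension; the names `t`, `flat` and `w` are illustrative.

```python
import numpy as np

x = np.arange(1, 7)
y = x.reshape(2, 3)                  # contiguous data: reshape returns a view
y[0, 0] = 100
print(x[0])                          # 100, because x and y share memory

t = y.T                              # the transpose is a non-contiguous view
flat = t.reshape(6)                  # this layout cannot be expressed as a view, so a copy is made
print(np.shares_memory(t, flat))     # False

w = x.reshape(3, -1)                 # -1 lets numpy infer the remaining dimension
print(w.shape)                       # (3, 2)
```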

### Additional examples discussed in class

In [127]:
b

array([[ 1.5 ,  2.5 ,  3.6 ,  4.4 ,  5.5 ],
       [11.1 , 22.5 , 33.6 , 44.4 , 55.5 ],
       [ 1.11,  2.22,  3.33,  4.44,  5.55]])

In [128]:
id1 = np.array([[0], [1]])
id2 = np.array([[0, 2, 4]])

In [129]:
id1

array([[0],
       [1]])

In [130]:
id2

array([[0, 2, 4]])

In [131]:
b[id1, id2]

array([[ 1.5,  3.6,  5.5],
       [11.1, 33.6, 55.5]])

In [132]:
b[np.ix_([0, 1], [0, 2, 4])]

array([[ 1.5,  3.6,  5.5],
       [11.1, 33.6, 55.5]])

In [133]:
a

array([10, 25, 35, 44, 55])

In [134]:
a[np.newaxis, :].shape

(1, 5)

In [135]:
a[:,np.newaxis].shape

(5, 1)

In [136]:
a[np.newaxis,:,np.newaxis].shape

(1, 5, 1)

## ndarray methods

An ndarray supports several methods for performing operations on it.

In [1]:
import numpy as np         # needed when starting in a fresh session
x = np.array([5, 10, 15, 20])
a = np.array([[2, 4, 6], [20, 40, 60], [200, 400, 600]])

In [None]:
x.tolist()

In [None]:
a.tolist()

In [None]:
x.tofile("e:\\xdata.out",",")
a.tofile("e:\\adata.out",",")

In [None]:
x.resize((2, 2))

In [None]:
x.reshape((4,))

In [None]:
x

In [None]:
x.transpose()

In [None]:
b = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
             ])

In [None]:
b

In [None]:
b.transpose((0, 1, 2))

In [None]:
b.transpose((0, 2, 1))

In [None]:
b.transpose(1, 2, 0)

In [None]:
y = np.array([5, 2, 9, 6, 11])
y.sort()
y

In [None]:
sid = y.argsort()
print(y[sid[0]])
print(y[sid[1]])

## Statistics Functions


|      Category   |        Functions                                          |
|-----------------|-----------------------------------------------------------|
|Order Statistics | `amin, amax, ptp, percentile`                             |
|Summary          | `mean, median, std, var, average`                         |
|Correlating      | `corrcoef, correlate, cov`                                |
|Histogram        | `histogram, histogram2d, histogramdd, bincount, digitize` |
|Sums, products   | `sum, prod, cumsum, cumprod, diff`                        |
|Min, Max         | `min, max, argmin, argmax`                                |


In [1]:
x = np.random.normal(10, 4, 50)        # Random sample from N(10, 16)
np.mean(x)

11.008510189787973

In [2]:
np.std(x)

3.7684307790090013

Many of these functions are also available as ndarray methods.

In [3]:
print("Mean :", x.mean(), "\nVariance ", x.var())

Mean : 11.008510189787973 
Variance  14.201070536182387
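
By default, `var` and `std` divide by n (the population formulas). The optional `ddof` argument changes the divisor to n - ddof, so `ddof=1` gives the usual unbiased sample variance. A minimal sketch on a small made-up sample:

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # mean is 5.0

pop_var = sample.var()             # divides by n      (ddof=0, the default)
sample_var = sample.var(ddof=1)    # divides by n - 1

print(pop_var)        # 4.0
print(sample_var)     # 32/7, approximately 4.571
print(sample.std())   # 2.0
```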


## Matrix operations

A two-dimensional numpy array can be treated as a matrix. Numpy also provides a matrix class, which shall be discussed later.
Addition, subtraction, and multiplication by a scalar can be performed using the usual arithmetic operators on two-dimensional arrays.

### Addition, Subtraction

The vectorized arithmetic operators of numpy readily provide matrix addition and subtraction for matrices of the same shape.

A + B  
A - B


### Scalar multiplication

The vectorized multiplication operator, coupled with the broadcasting mechanism, readily provides scalar multiplication.

c * A   # Here c is a scalar
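
A concrete instance of the three expressions above, with small made-up matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])
c = 3

total = A + B        # element-wise sum
diff = B - A         # element-wise difference
scaled = c * A       # the scalar c is broadcast to every element of A

print(total)         # [[11 22]
                     #  [33 44]]
print(diff)          # [[ 9 18]
                     #  [27 36]]
print(scaled)        # [[ 3  6]
                     #  [ 9 12]]
```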

### Matrix multiplication

Matrix multiplication can be performed using the matrix multiplication operator `@`.

In [4]:
A = np.array([[2, 3, 1],
              [3, 5, 2]])
B = np.array([[2, 3],
              [1, 2],
              [3, 2]])
A

array([[2, 3, 1],
       [3, 5, 2]])

In [5]:
B

array([[2, 3],
       [1, 2],
       [3, 2]])

In [6]:
C = A @ B
C

array([[10, 14],
       [17, 23]])

Alternatively, matrix multiplication can also be performed using `dot` function of numpy or `dot` method of ndarray.

In [7]:
np.dot(A, B)    # Using "dot" function

array([[10, 14],
       [17, 23]])

In [8]:
A.dot(B)        # Using "dot" method

array([[10, 14],
       [17, 23]])

In [31]:
B.dot(A)

array([[13, 21,  8],
       [ 8, 13,  5],
       [12, 19,  7]])
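
A common mistake is to use `*` where `@` is intended. On ndarrays, `*` multiplies element by element; it is not the matrix product. The sketch below makes the difference visible by multiplying a made-up matrix with the identity matrix.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
identity = np.eye(2, dtype=int)

elementwise = A * identity     # element-wise product: only the diagonal of A survives
matrix_prod = A @ identity     # matrix product: gives A itself

print(elementwise)   # [[1 0]
                     #  [0 4]]
print(matrix_prod)   # [[1 2]
                     #  [3 4]]
```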

### Linear Algebra package

Many matrix operations are available in the linear algebra sub-package (`np.linalg`) of numpy. 

#### Matrix inverse

In [9]:
D = np.linalg.inv(C)
D

array([[-2.875,  1.75 ],
       [ 2.125, -1.25 ]])

In [10]:
C @ D  # Verify inverse

array([[1., 0.],
       [0., 1.]])

In [32]:
D @ C

array([[1., 0.],
       [0., 1.]])
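
When the inverse is needed only to solve a linear system C x = r, `np.linalg.solve` can be used directly without forming the inverse. In the sketch below the right-hand side `rhs` is made up for illustration, and `np.allclose` is used for the comparison because floating-point results are exact only up to rounding error.

```python
import numpy as np

C = np.array([[10.0, 14.0], [17.0, 23.0]])
rhs = np.array([1.0, 2.0])             # hypothetical right-hand side

x_solve = np.linalg.solve(C, rhs)      # solves C x = rhs directly
x_inv = np.linalg.inv(C) @ rhs         # same solution via the explicit inverse

print(x_solve)                                        # approximately [ 0.625 -0.375]
print(np.allclose(x_solve, x_inv))                    # True
print(np.allclose(C @ np.linalg.inv(C), np.eye(2)))   # True
```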

#### QR Decomposition

In [11]:
Q, R = np.linalg.qr(A) 
Q

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])

In [12]:
R

array([[-3.60555128, -5.82435206, -2.21880078],
       [ 0.        ,  0.2773501 ,  0.2773501 ]])

In [13]:
Q @ R  # Verify decomposition

array([[2., 3., 1.],
       [3., 5., 2.]])

In [33]:
Q1, R1 = np.linalg.qr(B)

In [34]:
Q1

array([[-0.53452248, -0.57735027],
       [-0.26726124, -0.57735027],
       [-0.80178373,  0.57735027]])

In [35]:
R1

array([[-3.74165739, -3.74165739],
       [ 0.        , -1.73205081]])

In [36]:
Q1 @ R1

array([[2., 3.],
       [1., 2.],
       [3., 2.]])

In [37]:
Q @ Q.T

array([[1., 0.],
       [0., 1.]])

In [38]:
Q

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])

In [39]:
np.linalg.inv(Q)

array([[-0.5547002 , -0.83205029],
       [-0.83205029,  0.5547002 ]])

 **Home Work**: Find other functions in linalg submodule.

## Stacking Arrays

Arrays can be stacked together using the following commands  

### Vertical stacking

In [14]:
A

array([[2, 3, 1],
       [3, 5, 2]])

In [15]:
A1 = np.array([[4, 6, 3]])
A1

array([[4, 6, 3]])

In [16]:
np.vstack((A, A1))   # Stacking vertically

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

### Horizontal stacking

In [17]:
A2 = B[:2]
A2

array([[2, 3],
       [1, 2]])

In [18]:
A         # For ready reference

array([[2, 3, 1],
       [3, 5, 2]])

In [19]:
np.hstack((A, A2))   # Horizontal stacking

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

### Block stacking

In [20]:
A11 = A
A12 = A2
A21 = A1
A22 = np.array([[5, 5]])
AA = np.block([[A11, A12], [A21, A22]])   # Assemble array from given blocks
AA

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2],
       [4, 6, 3, 5, 5]])

In [40]:
np.hstack((A11, A12))

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

In [41]:
np.block([[A11, A12]])

array([[2, 3, 1, 2, 3],
       [3, 5, 2, 1, 2]])

In [42]:
np.vstack((A11, A21))

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

In [43]:
np.block([[A11],[A21]])

array([[2, 3, 1],
       [3, 5, 2],
       [4, 6, 3]])

**Home Work**: Also see the methods: `column_stack`, `r_`, `c_`

## Splitting arrays

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur.  
 
For example,  

`np.hsplit(a, 3)       # Split a into 3 equal parts`  
`np.hsplit(a, (3, 5))  # Split a after the third and the fifth column`  

Similarly, `vsplit` splits along the vertical axis, and `array_split` allows one to specify along which axis to split.
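
The sketch below illustrates the three functions on a small stand-in array (3 rows, 4 columns). Unlike `hsplit` with a count, `array_split` also accepts a number of parts that does not divide the axis length evenly.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

left, right = np.hsplit(m, 2)              # two blocks of 2 columns each
top, bottom = np.vsplit(m, [1])            # row 0, and the remaining rows
parts = np.array_split(m, 3, axis=1)       # 4 columns into 3 parts of unequal width

print(left.shape, right.shape)             # (3, 2) (3, 2)
print(top.shape, bottom.shape)             # (1, 4) (2, 4)
print([p.shape for p in parts])            # [(3, 2), (3, 1), (3, 1)]
```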


In [21]:
np.hsplit(AA, (2, 4))

[array([[2, 3],
        [3, 5],
        [4, 6]]),
 array([[1, 2],
        [2, 1],
        [3, 5]]),
 array([[3],
        [2],
        [5]])]

In [22]:
AA1, AA2, AA3 = np.hsplit(AA, (2, 4))   # Split at column 2 and 4
AA1

array([[2, 3],
       [3, 5],
       [4, 6]])

In [23]:
AA2

array([[1, 2],
       [2, 1],
       [3, 5]])

In [24]:
AA3

array([[3],
       [2],
       [5]])

In [25]:
AA[:,:4]    # First 4 columns of AA

array([[2, 3, 1, 2],
       [3, 5, 2, 1],
       [4, 6, 3, 5]])

In [26]:
np.hsplit(AA[:,:4], 2)     # Split into 2 matrices with equal number of columns

[array([[2, 3],
        [3, 5],
        [4, 6]]),
 array([[1, 2],
        [2, 1],
        [3, 5]])]

## Statistical Functions with additional arguments

In [27]:
salary = (np.random.randint(15, 25, 30)*1000).reshape(30,1)
savings = salary*(np.random.random(30).reshape(30,1)*0.25)

In [28]:
Data = np.hstack((salary, savings))
Data

array([[20000.        ,  4491.04632311],
       [16000.        ,  3094.85460374],
       [23000.        ,  2939.12801951],
       [21000.        ,  1139.02194462],
       [24000.        ,  4490.63165463],
       [21000.        ,  4844.30019353],
       [20000.        ,  2536.77834987],
       [20000.        ,  2713.98093402],
       [16000.        ,  2833.0472951 ],
       [20000.        ,  1385.43700623],
       [23000.        ,  1106.63450782],
       [18000.        ,  1211.18115818],
       [18000.        ,   209.19664362],
       [16000.        ,  1758.05022181],
       [20000.        ,  1269.83097388],
       [23000.        ,  2730.54501054],
       [23000.        ,   240.79625304],
       [20000.        ,  1849.94956801],
       [17000.        ,  2186.33046025],
       [18000.        ,  1749.358672  ],
       [18000.        ,  3667.69744794],
       [23000.        ,  4117.64547467],
       [16000.        ,  3146.78289941],
       [18000.        ,  4082.62794466],
       [23000.  

In [29]:
Data.mean(axis=0)     # Compute Column means

array([19933.33333333,  2452.2324728 ])

In [45]:
Data.mean()

11192.78290306579

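Thus,  
the average salary is Rs. 19933.333, and  
the average savings is Rs. 2452.232.  
(`Data.mean()` without the `axis` argument averages over all the elements, which is not meaningful here.)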

In [30]:
np.corrcoef(Data.T)       # "corrcoef" function assumes variables in rows, and obs in columns

array([[ 1.        , -0.00883879],
       [-0.00883879,  1.        ]])

Thus, the correlation coefficient between Salary and Savings is -0.009.
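
The effect of the `axis` argument is easier to see on a tiny array with fixed values. In the sketch below the data are made up; `rowvar=False` is an alternative to transposing the data before calling `corrcoef`.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])         # 3 observations (rows) of 2 variables (columns)

col_means = data.mean(axis=0)          # collapses the rows: one value per column
row_sums = data.sum(axis=1)            # collapses the columns: one value per row
overall = data.mean()                  # no axis: all elements together
corr = np.corrcoef(data, rowvar=False) # rowvar=False: variables are in columns

print(col_means)    # [ 2. 20.]
print(row_sums)     # [11. 22. 33.]
print(overall)      # 11.0
```

Since the second column is exactly ten times the first, every entry of `corr` equals 1 up to rounding error.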

## `random` sub-package of Numpy 

We have already seen use of some functions of the random sub-package. The `random` sub-package (module) of Numpy provides functions for generating random numbers or selecting random samples.

In [56]:
from numpy import random

In [57]:
xx = np.array([1, 2, 3, 4, 5])
xx

array([1, 2, 3, 4, 5])

In [58]:
random.permutation(xx)

array([2, 5, 4, 1, 3])

In [59]:
random.permutation(xx)

array([3, 2, 5, 1, 4])

In [60]:
xx

array([1, 2, 3, 4, 5])

In [61]:
random.shuffle(xx)

In [62]:
xx

array([3, 1, 2, 4, 5])

In [67]:
xx = random.normal(50, 5, 50)
xx

array([58.15314575, 48.32587842, 44.390103  , 49.82568976, 54.20323716,
       48.0241806 , 53.14382408, 51.29784845, 55.1944124 , 50.20172664,
       53.62749983, 50.63595826, 55.6913439 , 50.33951889, 46.09041785,
       58.31725139, 45.41354408, 54.19142933, 42.97148298, 48.51856085,
       44.68839008, 52.93974257, 37.97771329, 40.44625952, 43.78687722,
       53.84363922, 51.14767644, 64.3302838 , 50.78819936, 49.92817966,
       56.11654004, 54.8362217 , 47.60311709, 46.24341209, 56.17633553,
       36.85629006, 59.11530802, 54.47661439, 45.49047269, 57.81211755,
       48.92625201, 44.81793883, 51.05783528, 51.33150197, 50.73374063,
       57.26778515, 51.76231104, 58.22376104, 46.68656913, 41.65517366])

In [68]:
xx.mean()

50.51246625341814

In [69]:
random.permutation(xx).mean()

50.51246625341814

In [70]:
random.permutation(xx).mean()

50.51246625341814

In [71]:
random.random()

0.8951855124779879

# Random Sampling

(*Random number generation as discussed in the previous class is a legacy approach for generating random numbers. The approach discussed in this notebook is a new and recommended approach that provides enhancements in the quality of generated random numbers*)  

Random sampling is one of the very important topics in Statistical investigations. 

The `random` submodule of Numpy provides the functions for generating random numbers, including random numbers from various probability distributions.

It is important to understand that the random numbers generated by any computer software are essentially ***pseudo random numbers***. A sequence of pseudo random numbers is a deterministic sequence (generated by an algorithm), which possesses *almost all* the properties of a sequence of ***random numbers***, as verifiable by statistical tests.

## Generator 

Objects of the `Generator` class of the `random` submodule provide methods for generating random numbers. While creating a `Generator` object, we can specify the ***seed*** for the random number generator.

### Default random number generator

The `random` submodule provides a built-in function `default_rng` for creating `Generator` objects. Using `default_rng` is the most common way of creating a Generator object.

In [2]:
import numpy as np
from numpy.random import default_rng
rng = default_rng()     # rng is a Generator object.
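
When a seed is passed to `default_rng`, the generated sequence becomes reproducible: two generators created with the same seed produce the same numbers. In the sketch below the seed value 12345 is arbitrary and chosen only for illustration.

```python
import numpy as np
from numpy.random import default_rng

rng1 = default_rng(12345)     # 12345 is an arbitrary seed
rng2 = default_rng(12345)     # same seed, hence the same stream of numbers

first = rng1.integers(1, 10, 5)
second = rng2.integers(1, 10, 5)
print(np.array_equal(first, second))    # True
```

Without a seed, as in `default_rng()`, the generator is initialised from the operating system and the results differ from run to run.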

As stated earlier, a `Generator` object provides various methods for generating random numbers.

#### `integers` method

The `integers` method generates random integers in the specified interval

In [6]:
rng.integers(1, 10)   # generate a random integer r, with  1 <= r < 10

4

In [9]:
rng.integers(1, 10, endpoint = True)    # generate a random integer r, with  1 <= r <= 10

3

In [10]:
u = rng.integers(1, 10, 30, endpoint = True)    # generate a 1-D array of 30 random integers
u

array([ 5,  7,  7, 10,  4,  5,  5,  4,  6, 10,  9, 10,  1,  4,  6,  8,  7,
       10,  2,  8,  6,  1, 10,  5,  3,  8, 10,  4,  7,  1], dtype=int64)

In [11]:
A = rng.integers(1, 10, (2, 3), endpoint = True)  # generate a 2-D array of shape (2, 3)
A

array([[ 5,  7,  8],
       [ 5, 10,  3]], dtype=int64)

#### `random` method

The `random` method works similarly to the `integers` method, except that it generates random floating-point numbers in the interval [0, 1).

In [12]:
y = rng.random(10)
y

array([0.58807273, 0.52697743, 0.43204898, 0.61227568, 0.59185823,
       0.09167381, 0.14192107, 0.85813229, 0.71857076, 0.6714568 ])

In [3]:
a = 5
b = 10
a + (b-a)*rng.random(10)

array([6.70147936, 5.48592775, 9.88335525, 8.99321791, 5.96675094,
       6.37871916, 5.3618783 , 7.2408383 , 6.20268161, 6.89313098])
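
The rescaling `a + (b-a)*rng.random(n)` shown above produces numbers in the interval [a, b). A `Generator` object also provides the `uniform` method, which draws from [low, high) directly. A minimal sketch, with the seed 0 chosen arbitrarily so that the run is reproducible:

```python
from numpy.random import default_rng

rng = default_rng(0)          # arbitrary seed, for reproducibility only
low, high = 5, 10

scaled = low + (high - low) * rng.random(10)   # manual rescaling of [0, 1)
direct = rng.uniform(low, high, 10)            # built-in method for [low, high)

print(scaled.min() >= low, scaled.max() < high)   # True True
print(direct.min() >= low, direct.max() < high)   # True True
```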

#### `normal` method

The `normal` method generates random numbers from Normal distribution.  

The following command generates 100 random numbers from $N(10.5, 0.7^2)$

In [14]:
x = rng.normal(10.5, 0.7, 100)
x[:10]

array([ 9.95719088, 10.62572136, 10.94329505, 10.12357387, 11.13679273,
       10.44507749, 11.31349497, 10.31250222,  9.43982975, 10.7604429 ])

In [42]:
print('Sample Mean =%7.3f \n'
      'Sample Variance =%7.3f'%(x.mean(), x.var()))

Sample Mean = 10.477 
Sample Variance =  0.544


**Home work :**   
Explore functions to generate random numbers from other probability distributions.  
Visit https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator for more information.

### Random sampling

#### `choice` method

A `Generator` object provides `choice` method for generating a random sample from a population contained in a 1-D array.

In [5]:
popln = np.array(["Club", "Spade", "Heart", "Diamond"])
rng.choice(popln, 10)   # Generate a sample of size 10 with replacement

array(['Heart', 'Club', 'Heart', 'Heart', 'Heart', 'Heart', 'Club',
       'Spade', 'Spade', 'Spade'], dtype='<U7')

In [8]:
# Generate a standard deck of cards
cards = []
for suit in ['H', 'D', 'C', 'S']:
    for val in list(range(2,11))+['A','J','Q','K']:
        cards.append(suit + str(val))
aHand = rng.choice(cards, 5, replace = False)  # Generate a 5-card Hand at random without replacement
aHand

array(['C8', 'H6', 'S9', 'C4', 'C7'], dtype='<U3')

In [12]:
data = rng.integers(1, 5, 100, endpoint = True)
values, freq = np.unique(data, return_counts = True)

In [13]:
values

array([1, 2, 3, 4, 5], dtype=int64)

In [14]:
freq

array([23, 16, 19, 18, 24], dtype=int64)

In [15]:
data2 = rng.normal(0, 1, 100)
cuts = [-3, -2, -1, 0, 1, 2, 3]
data2[:10]

array([-0.00277067,  1.25447517,  0.68666973,  0.39006753,  1.33680151,
       -0.4457524 , -0.55680709, -1.70604164,  0.59194046, -0.79586129])

In [16]:
np.digitize(data2, cuts)[:10]

array([3, 5, 4, 4, 5, 3, 3, 2, 4, 3], dtype=int64)

In [17]:
freq = np.bincount(np.digitize(data2, cuts))

In [18]:
freq

array([ 0,  2,  9, 33, 34, 19,  3], dtype=int64)

In [20]:
np.unique(np.digitize(data2, cuts), return_counts = True)

(array([1, 2, 3, 4, 5, 6], dtype=int64),
 array([ 2,  9, 33, 34, 19,  3], dtype=int64))
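
The same frequency table can be obtained in one step with `np.histogram`. The sketch below uses a small fixed sample (made-up values) so that the counts can be checked by hand. One difference to keep in mind: `np.histogram` includes the right edge in its last bin, whereas `np.digitize` would place a value equal to the last cut in the next bin.

```python
import numpy as np

values = np.array([-2.5, -0.3, 0.1, 0.4, 1.2, 2.9])   # small fixed sample
cuts = [-3, -2, -1, 0, 1, 2, 3]

bin_ids = np.digitize(values, cuts)     # bin i means cuts[i-1] <= value < cuts[i]
via_bincount = np.bincount(bin_ids, minlength=len(cuts))[1:]   # drop bin 0 (values below cuts[0])
via_histogram, edges = np.histogram(values, bins=cuts)

print(bin_ids)          # [1 3 4 4 5 6]
print(via_bincount)     # [1 0 1 2 1 1]
print(via_histogram)    # [1 0 1 2 1 1]
```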

In [1]:
np.bincount?     # Shows the help for bincount (IPython/Jupyter syntax); numpy must already be imported as np
