# <center>**NumPy Introduction**</center>
---

<br>

## **NumPy**
#### NumPy is short for Numerical Python. It is the fundamental package for high-performance scientific computing and data analysis in Python.
#### NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays.


Key features:

- `ndarray`, a fast and space-efficient multidimensional array for large data sets.
- Vectorized arithmetic operations and sophisticated broadcasting capabilities.
- Standard mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.


## Importing the package

In [13]:
import numpy as np

### The NumPy ndarray: A multidimensional array object

A NumPy array is a grid of values, all of the `same type`, indexed by a tuple of nonnegative integers.
Every array has
- a shape: a tuple giving the size of each dimension
- a dtype: the data type of its elements

`np.array()` tries to infer a suitable dtype if one is not given explicitly.
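
As a small sketch of the dtype inference described above: the input values below are arbitrary and chosen only for illustration. The exact integer width NumPy picks depends on the platform, so it is printed rather than assumed.

```python
import numpy as np

# Integers only: NumPy infers an integer dtype (the width is platform dependent)
ints = np.array([1, 2, 3])
print(ints.dtype, ints.shape)

# A single float is enough to make the whole array float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# An explicit dtype overrides the inference
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype, small.shape)
```

Because every element shares one dtype, mixing integers and floats converts all values to floats rather than storing them separately.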

### Creating 1-D array

In [27]:
a1 = np.array([1,2,3,4,5])
print(a1)
print(a1.shape)

[1 2 3 4 5]


(5,)

### Creating 2-D array

In [3]:
a2 = np.array([[1,2,3],[4,7,9]])
print(a2)

[[1 2 3]
 [4 7 9]]


### Creating a 2-D array filled with zeroes

In [4]:
a3 = np.zeros((10,5))
print(a3)
print(a3.shape)
print(a3.ndim)

[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
(10, 5)
2


In [15]:
a4 = np.zeros((2,2,3,3))
print(a4)
print(a4.ndim)

[[[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]]


 [[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]

  [[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]]]
4


### Generating numbers in a range with a fixed step

In [5]:
np.arange(10,47,5)

array([10, 15, 20, 25, 30, 35, 40, 45])

### Generating **n** evenly spaced numbers in a range

In [6]:
np.linspace(0,2,9)

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])

### Generating a random sample of given dimensions
**Note - Random numbers are drawn from the half-open interval [0, 1). Scale and shift them to get values in the required range.**

In [11]:
np.random.random_sample((3, 2))

array([[0.28321905, 0.63638261],
       [0.40964014, 0.14289847],
       [0.5483918 , 0.0762229 ]])
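
The scaling mentioned in the note can be sketched as follows. The bounds `low` and `high` are hypothetical values picked for illustration; any pair with `low < high` works the same way.

```python
import numpy as np

low, high = 5.0, 10.0   # hypothetical target range, chosen for illustration

# random_sample draws from the half-open interval [0, 1)
unit = np.random.random_sample((3, 2))

# Stretch by the width of the range, then shift by its lower bound
scaled = low + (high - low) * unit
print(scaled)
```

The values differ on every run because no seed is set.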

### Inspecting the dimensions of an array

In [12]:
a3.shape

(10, 5)

### Basic array operations
**Note - To provide element-wise array arithmetic similar to that in R, NumPy uses a concept called broadcasting.**


### Broadcasting
Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes
when performing arithmetic operations. Frequently we have a smaller array and a larger array,
and we want to use the smaller array multiple times to perform some operation on the larger array.

In [53]:
a = np.arange(3)
print(a)
b = 1
print(b)
print(a+b)

[0 1 2]
1
[1 2 3]


In [54]:
a = np.arange(6).reshape(2,3)
print(a)
b=np.array([0,1,2])
print(b)
print(a+b)

[[0 1 2]
 [3 4 5]]
[0 1 2]
[[0 2 4]
 [3 5 7]]


In [55]:
b=np.array([0,1]).reshape(2,1)
print(b)
print(a+b)

[[0]
 [1]]
[[0 1 2]
 [4 5 6]]


In [56]:
b=2
print(b)
print(a+b)

2
[[2 3 4]
 [5 6 7]]


In [57]:
a = np.array([1,2]).reshape(2,1)
b = np.array([4,5,6])
print(a)
print(b)
print(a+b)

[[1]
 [2]]
[4 5 6]
[[5 6 7]
 [6 7 8]]


In [16]:
a = np.array([[1,2,3],[4,5,6]])
b = np.array([1,2,3,4])
print(a+b)

ValueError: operands could not be broadcast together with shapes (2,3) (4,) 
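
The error above follows from NumPy's broadcasting rule: shapes are compared dimension by dimension starting from the last one, and two dimensions are compatible when they are equal or one of them is 1. The helper `broadcast_shape` below is a hypothetical function written only to illustrate that rule; it is not part of NumPy.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if the shapes are incompatible.

    Hypothetical helper for illustration; not a NumPy function.
    """
    result = []
    # Walk both shapes from the last dimension backwards,
    # treating missing leading dimensions as size 1
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))   # (2, 3)
print(broadcast_shape((2, 1), (3,)))   # (2, 3)
print(broadcast_shape((2, 3), (4,)))   # None
```

The last call mirrors the failing cell: the trailing dimensions 3 and 4 differ and neither is 1, so the shapes cannot be broadcast together.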

In [29]:
x = np.array([2,4,8,16])
y = np.array([1,1,0,1])

In [30]:
print(x + y)
print(x - y)
print(x * y)

[ 3  5  8 17]
[ 1  3  8 15]
[ 2  4  0 16]


In [40]:
z = np.array([(10,20,30,40),[100,200,300,400]])

print(z + y)
print(z - y)
print(z * y)

[[ 11  21  30  41]
 [101 201 300 401]]
[[  9  19  30  39]
 [ 99 199 300 399]]
[[ 10  20   0  40]
 [100 200   0 400]]


In [5]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1,0,1])
y = x + v  # Add v to each row of x using broadcasting
print(y) 

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


### Matrix multiplication

In [12]:
s = np.array([1,2])
t = np.array([[10,20,30,40],[100,200,300,400]])
print(s)
print(t)
print(np.matmul(s,t))

[1 2]
[[ 10  20  30  40]
 [100 200 300 400]]
[210 420 630 840]
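
As a brief sketch using the same `s` and `t` as the cell above: for arrays, the `@` operator gives the same result as `np.matmul`. The element-wise product is shown next to it to make the difference from matrix multiplication visible.

```python
import numpy as np

s = np.array([1, 2])                                     # shape (2,)
t = np.array([[10, 20, 30, 40], [100, 200, 300, 400]])   # shape (2, 4)

# For arrays, s @ t is equivalent to np.matmul(s, t)
product = s @ t
print(product.shape)   # (4,)

# Element-wise multiplication relies on broadcasting instead:
# reshaping s to a (2, 1) column scales each row of t by one entry of s
scaled_rows = s.reshape(2, 1) * t
print(scaled_rows)

# Summing the scaled rows reproduces the matrix product
print(scaled_rows.sum(axis=0))
```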


### Advanced array operations

In [6]:
a4 = np.random.random_sample(10)
print(a4)
print(np.min(a4))
print(np.max(a4))
print(np.sum(a4))
print(np.cumsum(a4))
print(np.sqrt(a4))
print(np.log(a4))



[0.71933087 0.75118965 0.91572881 0.47081865 0.97909579 0.68919244
 0.45911813 0.45073426 0.61224489 0.44654843]
0.4465484338686575
0.979095786737866
6.4940019247865015
[0.71933087 1.47052053 2.38624934 2.85706799 3.83616377 4.52535621
 4.98447434 5.4352086  6.04745349 6.49400192]
[0.84813376 0.86671198 0.9569372  0.68616226 0.98949269 0.83017615
 0.67758256 0.67136746 0.78246079 0.6682428 ]
[-0.32943384 -0.28609712 -0.08803502 -0.75328229 -0.0211258  -0.37223474
 -0.77844774 -0.79687733 -0.49062293 -0.80620741]
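
The reductions above operate on a 1-D array. On a 2-D array the same functions accept an `axis` argument that selects the dimension to collapse. The small matrix `m` below is a made-up example, used instead of random data so the results are easy to check by hand.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(m))             # sum of all elements: 21
print(np.sum(m, axis=0))     # collapse rows, one sum per column: [5 7 9]
print(np.sum(m, axis=1))     # collapse columns, one sum per row: [ 6 15]
print(np.cumsum(m, axis=1))  # running sum along each row
```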


### Logical operations

In [9]:
a=np.array([1,7,9])
b=np.array([[1,2,9],[5,8,7]])

print(a==b)
print(b>4)

[[ True False  True]
 [False False False]]
[[False False  True]
 [ True  True  True]]


### Subset and slicing

In [86]:
# Given the array c, print:
# - the whole array and its shape
# - the top-right 2 x 2 block
# - the third column (by slice, then by integer-array indexing)

c=np.array([[4,5,10],[5,10,15],[7,8,3],[4,6,9],[10,15,20]])
#print(c)
print(c[:,:])
print(c.shape)
print(c[0:2,1:3])
print(c[:,2])
print(c[[0,1,2,3,4],[2,2,2,2,2]])
print(c[:,:])

[[ 4  5 10]
 [ 5 10 15]
 [ 7  8  3]
 [ 4  6  9]
 [10 15 20]]
(5, 3)
[[ 5 10]
 [10 15]]
[10 15  3  9 20]
[10 15  3  9 20]
[[ 4  5 10]
 [ 5 10 15]
 [ 7  8  3]
 [ 4  6  9]
 [10 15 20]]
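
The cell above does not select a single row. A minimal sketch of that, plus the same block and column written with negative indices, using the same array `c`:

```python
import numpy as np

c = np.array([[4, 5, 10], [5, 10, 15], [7, 8, 3], [4, 6, 9], [10, 15, 20]])

# Second row: a single integer index drops that dimension
second_row = c[1]
print(second_row)        # [ 5 10 15]

# Top-right 2 x 2 block: first two rows, last two columns
top_right = c[:2, -2:]
print(top_right)

# Third (last) column via a negative index
last_column = c[:, -1]
print(last_column)       # [10 15  3  9 20]
```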


In [85]:
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]
print(b)
b[0] = 200
print(b)
print(a[1])
print(a)

[2 3 4]
[200   3   4]
200
[  1 200   3   4   5]
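
The cell above shows that a basic slice is a view: writing to `b` also changed `a`. If an independent array is needed, one option is to call `.copy()` on the slice, as in this sketch. The variable names `view` and `independent` are chosen here only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

view = a[1:4]                # shares memory with a
independent = a[1:4].copy()  # owns its own data

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

# Writing to the copy leaves the original untouched
independent[0] = 200
print(independent)   # [200   3   4]
print(a)             # [1 2 3 4 5]
```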


### Advantage of Numpy Arrays

In [8]:
import time
start_time = time.time()
np.sum(np.arange(10000))
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.0034635066986083984 seconds ---


In [62]:
start_time = time.time()
total = 0
for i in np.arange(10000):
    total = i + total
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.0023949146270751953 seconds ---
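
A single `time.time()` difference on such a small workload is noisy and includes one-off overhead such as building the array, which may be why the loop appears faster in the outputs above. The sketch below repeats each measurement with the standard-library `timeit` module instead. The function names and the repeat count are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

n = 10000
values = np.arange(n)   # build the array once, outside the timed code

def numpy_sum():
    # Vectorized sum, executed inside NumPy
    return np.sum(values)

def loop_sum():
    # Same sum with an explicit Python loop over the array elements
    total = 0
    for i in values:
        total = i + total
    return total

repeats = 20
numpy_seconds = timeit.timeit(numpy_sum, number=repeats) / repeats
loop_seconds = timeit.timeit(loop_sum, number=repeats) / repeats
print("np.sum:      %.6f seconds per call" % numpy_seconds)
print("Python loop: %.6f seconds per call" % loop_seconds)
```

Averaging over several calls gives a steadier comparison than one wall-clock measurement.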


In [10]:
size_of_vec = 100000

In [11]:
# Add two big vectors element by element with a pure-Python loop
t1 = time.time()
X = range(size_of_vec)
Y = range(size_of_vec)
Z_list = []
for i in range(len(X)):
    Z_list.append(X[i] + Y[i])
time.time() - t1
print(Z_list[:50])

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]


In [24]:
t1 = time.time()
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)
Z_numpy = X + Y
float(time.time() - t1)

0.004049062728881836

In [58]:
Z_list == Z_numpy

array([ True,  True,  True, ...,  True,  True,  True])
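
The comparison above returns one boolean per element. When a single yes/no answer is wanted, `np.array_equal` or `.all()` can be used, as sketched below. The vectors here are rebuilt with a smaller, arbitrary size so the example is self-contained.

```python
import numpy as np

size_of_vec = 1000   # smaller than in the cells above, just for illustration

# Same computation done with a list and with NumPy arrays
Z_list = [i + i for i in range(size_of_vec)]
X = np.arange(size_of_vec)
Z_numpy = X + X

# Element-wise comparison gives an array of booleans
elementwise = np.asarray(Z_list) == Z_numpy
print(elementwise.shape)

# Collapse to a single boolean
print(elementwise.all())
print(np.array_equal(Z_list, Z_numpy))
```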

##### <font color='red'> Student Activity Question 6 [5 mins]</font>
    * Iris is a popular dataset often used in R and Python for learning data manipulation and visualization.
    * It contains 150 measurements of iris flowers.
    * The measured attributes are sepal length, sepal width, petal length, and petal width.
    * You can load the dataset from the popular ML library scikit-learn using the code below (copy or type the code into a Python cell).

```python
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data

```
**Questions**
1. Print the type of the data.
2. Print the shape of the array X.
3. Noting that the first 2 columns are sepal data and the last 2 columns are petal data, slice a sub-array with only the petal data and the measurements (rows) from 100 to 140. Call this submatrix Xsub and print it explicitly in the cell.
4. Print the column-wise maximum and minimum of the submatrix.

In [25]:
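from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
print(type(X))
print(X.shape)
# Rows 100 to 140 inclusive (the slice stop is exclusive), petal columns only
Xsub=X[100:141,2:4]
print(Xsub.shape)

# Column-wise maximum and minimum of the submatrix
print(np.max(Xsub,axis=0))
print(np.min(Xsub,axis=0))

<class 'numpy.ndarray'>
(150, 4)
(41, 2)
[6.9 2.5]
[4.5 1.4]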


### Important links
- NumPy array shapes explained: https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r
- Vectorization: https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html
- Broadcasting: https://machinelearningmastery.com/broadcasting-with-numpy-arrays/