<a href="https://colab.research.google.com/github/vvthakral/Python-for-AI-Data-Science/blob/master/Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center>Numpy Basics</center>

NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.



What do we cover in this notebook?

Introduction to NumPy arrays: 
1.   Creation
2.   Shape/Size
3.   Access/Slice
4.   Reverse
5.   Masking

Representing missing values and infinity

Computing mean, min, and max on an ndarray

Reshaping and flattening multidimensional arrays

Creating sequences, repetitions, and random numbers


In [24]:
import numpy as np

# Create a numpy array from list
list_a = [1,2,3]
a = np.array(list_a)   
print(type(a))

#Observe the class of the array printed            

#Shape of the array
print(a.shape)            

#Accessing the array elements, very similar to lists
print(a[0], a[1], a[2])

#updating the array element
a[0] = 0                
print(a)          

#Arrays can have more than one dimension (rank), the one below is a 2D array
b = np.array([[1,2,3],[4,5,6]])    
print(b.shape)                     
print(b[0, 0], b[0, 1])

<class 'numpy.ndarray'>
(3,)
1 2 3
[0 2 3]
(2, 3)
1 2


In [25]:
#Vector 1 dimensional
list_a = [1,2,3]
a = np.array(list_a)
a.shape

(3,)

The key difference between an array and a list is that arrays are designed for vectorized operations, while a Python list is not.

A NumPy array behaves differently from a Python list: arithmetic operators and NumPy functions applied to an array act on every item in the array, rather than on the array object as a whole.
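As a minimal sketch of what "vectorized" means here, the snippet below doubles every value first with a list comprehension and then with a single array expression. The names `values`, `doubled_list`, `doubled_arr`, and `roots` are illustrative only and do not appear elsewhere in the notebook.

```python
import numpy as np

values = [1, 2, 3]

# With a plain list, each element has to be visited explicitly
# (values * 2 would repeat the list instead of doubling it)
doubled_list = [v * 2 for v in values]

# With an array, the arithmetic is applied to every element at once
doubled_arr = np.array(values) * 2

# NumPy functions such as np.sqrt also work element by element
roots = np.sqrt(np.array([1, 4, 9]))

print(doubled_list)  # [2, 4, 6]
print(doubled_arr)   # [2 4 6]
print(roots)         # [1. 2. 3.]
```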

In [26]:
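list_ = [1, 2]
try:
  print(list_ + 2)
except TypeError:
  # A list can only be concatenated with another list, so list + int raises TypeError
  print('We can\'t do such things in python. Can we?')

# NumPy arrays do support this kind of element-wise update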
print('Numpy array a is: ',a)
a += 2
print('Updated value of a is: ',a)

We can't do such things in python. Can we?
Numpy array a is:  [1 2 3]
Updated value of a is:  [3 4 5]


In [27]:
f = 201
print(f)
def somefunc():
  global f
  f = 100
print('Before',f)
print('changing the value')
somefunc()
print('after',f)

201
Before 201
changing the value
after 100


We can't grow a NumPy array in place the way we append to a Python list. To add elements, we have to build a new array and reassign it to the variable.
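One way to see this, sketched with `np.append` (the variable names `arr` and `bigger` are made up for this illustration): the call returns a new array and leaves the original untouched, so the result has to be assigned somewhere.

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append does not grow `arr` in place; it builds and returns a new array
bigger = np.append(arr, 4)

print(arr)     # [1 2 3]
print(bigger)  # [1 2 3 4]

# Reassign the name if the longer array should replace the old one
arr = np.append(arr, [4, 5])
print(arr)     # [1 2 3 4 5]
```

Because every call copies the data, appending inside a long loop can be slow. Collecting values in a list and converting once at the end is often the simpler choice.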

We can specify the datatype of the elements in a NumPy array by setting the dtype argument when creating the array.

We can use dtypes like: 'float', 'int', 'bool', 'str' and 'object'.

In [28]:
list1 = [1, 2, 3, 4]
custom_a = np.array(list1)
print(custom_a)

# Create a boolean array
arr_b = np.array([1, 0, 10], dtype='bool')
print('The boolean array is: ',arr_b)

# Convert to 'int' datatype
print('\nThe integer dtype')
print(custom_a.astype('int'))

custom_a = custom_a.astype('int')
# Convert to int then to str datatype
custom_a.astype('int').astype('str')

[1 2 3 4]
The boolean array is:  [ True False  True]

The integer dtype
[1 2 3 4]


array(['1', '2', '3', '4'], dtype='<U21')

In [29]:
# Convert an array to a list
a.tolist()

[3, 4, 5]

Recap!

1.Unlike lists, NumPy arrays support vectorized operations.

2.We can't change the size of an array in place after creation.

3.All items in an array share the same dtype. Together with the fixed size, this is a large part of why NumPy arrays are faster than Python lists.
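The "same dtype" point can be checked directly. In this small sketch (the names `mixed` and `with_text` are illustrative), NumPy picks one common dtype for all elements when the inputs are mixed.

```python
import numpy as np

# An int next to a float: everything is stored as float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A number next to a string: everything is stored as a Unicode string
with_text = np.array([1, 'two', 3])
print(with_text.dtype.kind)  # U
print(with_text[0] == '1')   # True
```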

In [30]:
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
arr2
print('Shape: ', arr2.shape)

# dtype
print('Datatype: ', arr2.dtype)

# size
print('Size: ', arr2.size)

# ndim
print('Num Dimensions: ', arr2.ndim)


Shape:  (3, 4)
Datatype:  float64
Size:  12
Num Dimensions:  2


In [31]:
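arr2[:2, :2]   # NumPy: first two rows and first two columns
list2[:2][:2]  # no error, but not the 2x2 block: the second [:2] slices the rows again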

[[1, 2, 3, 4], [3, 4, 5, 6]]
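To get the same 2x2 block out of a nested list, the rows have to be sliced first and then each row sliced separately. The following is a minimal sketch with illustrative names (`rows`, `same_rows`, `top_left`), compared against the NumPy slice.

```python
import numpy as np

rows = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]

# rows[:2] is already a list of two rows, so a second [:2] changes nothing
same_rows = rows[:2][:2]
print(same_rows)  # [[1, 2, 3, 4], [3, 4, 5, 6]]

# Slice the rows first, then slice each row
top_left = [row[:2] for row in rows[:2]]
print(top_left)   # [[1, 2], [3, 4]]

# The NumPy slice does both in one step
print(np.array(rows)[:2, :2].tolist())  # [[1, 2], [3, 4]]
```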

In [32]:
#Masking
b = arr2 > 4
b

array([[False, False, False, False],
       [False, False,  True,  True],
       [ True,  True,  True,  True]])

In [33]:
arr2[b]

array([5., 6., 5., 6., 7., 8.])
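Masks can also be combined. The sketch below rebuilds the same values in a fresh array (named `arr` here so it does not touch `arr2`) and combines conditions with `&`, `|`, and `~`.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# Parentheses are required because & and | bind tighter than comparisons
between = (arr > 2) & (arr < 6)
print(arr[between])   # [3. 4. 3. 4. 5. 5.]

outside = (arr <= 2) | (arr >= 7)
print(arr[outside])   # [1. 2. 7. 8.]

# ~ inverts a mask: everything that is NOT between 2 and 6
print(arr[~between])  # [1. 2. 6. 6. 7. 8.]
```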

Missing Values

In [34]:
arr2[1,1] = np.nan  # not a number
arr2[1,2] = np.inf  # infinite
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., nan, inf,  6.],
       [ 5.,  6.,  7.,  8.]])

In [35]:
missing_bool = np.isnan(arr2) | np.isinf(arr2)
missing_bool

array([[False, False, False, False],
       [False,  True,  True, False],
       [False, False, False, False]])

In [36]:
arr2[missing_bool] = -1  
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., -1., -1.,  6.],
       [ 5.,  6.,  7.,  8.]])
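An alternative sketch of the same cleanup, assuming we would rather keep the original array intact: `np.isfinite` flags nan and both infinities in one call, and `np.where` builds a cleaned copy. The names `data`, `bad`, and `cleaned` are illustrative.

```python
import numpy as np

data = np.array([1.0, np.nan, np.inf, 4.0])

# np.isfinite is False for nan, inf and -inf, so ~np.isfinite flags all of them
bad = ~np.isfinite(data)
print(bad)      # [False  True  True False]

# Build a cleaned copy instead of overwriting the original
cleaned = np.where(bad, -1, data)
print(cleaned)  # [ 1. -1. -1.  4.]
```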

How to compute mean, min, max on the ndarray?

In [37]:
print("Mean value is: ", arr2.mean())
print("Max value is: ", arr2.max())
print("Min value is: ", arr2.min())

Mean value is:  3.5833333333333335
Max value is:  8.0
Min value is:  -1.0


In [38]:
#Minimum over the whole array, then per row
print("Overall minimum: ", np.amin(arr2))
print("Row wise minimum: ", np.amin(arr2, axis=1))

Overall minimum:  -1.0
Row wise minimum:  [ 1. -1.  5.]
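The `axis` argument decides which dimension is collapsed: `axis=0` gives one value per column and `axis=1` gives one value per row. Here is a short sketch on a fresh array with the same values (named `arr` for illustration).

```python
import numpy as np

arr = np.array([[1., 2., 3., 4.],
                [3., -1., -1., 6.],
                [5., 6., 7., 8.]])

col_min = arr.min(axis=0)    # collapse the rows: one minimum per column
row_min = arr.min(axis=1)    # collapse the columns: one minimum per row
row_mean = arr.mean(axis=1)  # the same idea works for mean, max, sum, ...

print(col_min)  # [ 1. -1. -1.  4.]
print(row_min)  # [ 1. -1.  5.]
print(row_mean)
```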


Copying array

In [39]:
# Assign a portion of arr2 to arr2a. Slicing returns a view of arr2, not a new array.
arr2a = arr2[:2,:2]  
arr2a[:1, :1] = 100  # 100 will reflect in arr2
print(arr2a)
print('\n',arr2[:2,:2])

[[100.   2.]
 [  3.  -1.]]

 [[100.   2.]
 [  3.  -1.]]


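Strange, right? Slicing a NumPy array returns a view that shares memory with the original, so writing to arr2a also changes arr2.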

In [40]:
#The right method!
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101  # 101 will not reflect in arr2
print(arr2b)
arr2

[[101.   2.]
 [  3.  -1.]]


array([[100.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])
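Whether two arrays share data can be checked with `np.shares_memory`. The sketch below uses illustrative names (`base`, `view`, `copied`, `masked`) and also shows that boolean-mask indexing, unlike slicing, returns a copy.

```python
import numpy as np

base = np.arange(6).reshape(2, 3)

view = base[:, :2]           # basic slicing: a view onto the same data
copied = base[:, :2].copy()  # explicit copy: independent data
masked = base[base > 2]      # boolean indexing: also a copy

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
print(np.shares_memory(base, masked))  # False
```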

Reshaping 

In [41]:
print(arr2)
print('\n')
arr2.reshape(4, 3)

[[100.   2.   3.   4.]
 [  3.  -1.  -1.   6.]
 [  5.   6.   7.   8.]]




array([[100.,   2.,   3.],
       [  4.,   3.,  -1.],
       [ -1.,   6.,   5.],
       [  6.,   7.,   8.]])
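Two related tools are worth a quick sketch: passing `-1` lets NumPy infer one dimension, and `flatten` and `ravel` both collapse an array to 1D. The example assumes a small contiguous array named `arr`. `flatten` always returns a copy, while `ravel` returns a view when it can.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# -1 asks NumPy to infer that dimension from the total size (12 / 6 = 2)
inferred = arr.reshape(-1, 6)
print(inferred.shape)  # (2, 6)

flat_copy = arr.flatten()  # always a copy
flat_view = arr.ravel()    # a view here, because arr is contiguous

flat_copy[0] = 99
print(arr[0, 0])  # 0  (the copy is independent)

flat_view[0] = 99
print(arr[0, 0])  # 99 (the view writes through to arr)
```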

How to create sequences, repetitions and random numbers using numpy?

In [42]:
# Lower limit is 0 by default
print(np.arange(5))  

# 0 to 9
print(np.arange(0, 10))  

# 0 to 9 with step of 2
print(np.arange(0, 10, 2))  

# 10 to 1, decreasing order
print(np.arange(10, 0, -1))

[0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]
[10  9  8  7  6  5  4  3  2  1]


In [43]:
np.linspace(start=1, stop=50, num=10, dtype=int)

array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])

In [44]:
print(np.zeros([2,3]))
print(' ')
print(np.ones([2,2]))

[[0. 0. 0.]
 [0. 0. 0.]]
 
[[1. 1.]
 [1. 1.]]
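For repetitions, `np.repeat` and `np.tile` are the usual tools, and `np.full` fills an array with any constant. This is a minimal sketch; the names `base`, `repeated`, `tiled`, and `sevens` are illustrative.

```python
import numpy as np

base = np.array([1, 2, 3])

# np.repeat repeats each element in turn
repeated = np.repeat(base, 2)
print(repeated)  # [1 1 2 2 3 3]

# np.tile repeats the whole array
tiled = np.tile(base, 2)
print(tiled)     # [1 2 3 1 2 3]

# np.full is like np.zeros / np.ones, but with a value of our choice
sevens = np.full((2, 3), 7)
print(sevens)
```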


In [45]:
# Random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))
print('\n')
# Normal distribution with mean=0 and variance=1 of shape 2,2
print(np.random.randn(2,2))

# Random integers between [0, 10) of shape 2,2
print(np.random.randint(0, 10, size=[2,2]))

# One random number between [0,1)
print(np.random.random())

# Random numbers between [0,1) of shape 2,2
print(np.random.random(size=[2,2]))

# Pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))  

# Pick 10 items from a given list with a predefined probability 'p'
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))  # picks more o's

[[0.51485894 0.79186099]
 [0.53738844 0.62897699]]


[[-0.38192398 -0.54315642]
 [ 0.0672156  -0.80718178]]
[[9 7]
 [7 8]]
0.02715317239891557
[[0.33554238 0.08674587]
 [0.84617247 0.9503866 ]]
['e' 'i' 'e' 'o' 'u' 'u' 'o' 'u' 'i' 'e']
['a' 'a' 'o' 'a' 'o' 'o' 'e' 'u' 'o' 'i']
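The outputs above change on every run. If repeatable results are needed, one option is a seeded generator. Note that this sketch swaps in the newer `np.random.default_rng` interface rather than the `np.random.*` functions used in the cell above, and the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same stream
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.integers(0, 10, size=5)  # integers in [0, 10)
draws_b = rng_b.integers(0, 10, size=5)
print(np.array_equal(draws_a, draws_b))  # True

# The generator offers the same kinds of draws as the functions above
uniform = rng_a.random(size=(2, 2))      # floats in [0, 1)
vowels = rng_a.choice(['a', 'e', 'i', 'o', 'u'], size=10,
                      p=[0.3, 0.1, 0.1, 0.4, 0.1])
print(uniform.shape, vowels.shape)       # (2, 2) (10,)
```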


In [46]:
a = (np.random.choice(['a','b','c','i'], size=10000))
a[a=='i']

array(['i', 'i', 'i', ..., 'i', 'i', 'i'], dtype='<U1')

In [47]:
import numpy as np
options  = ['i', 'o', 'u']
option_prob = [0.5, 0.1, 0.4]
selected = np.random.choice(options, size=1000, p=option_prob)  # picks more i's
total_size = selected.size
total_size

1000

In [48]:
for i in options:
  count_i = selected[selected==i].size
  print('{} occurs {} times in the custom made array and the ratio of occurrence is {}'.format(i,count_i,count_i/total_size))

i occurs 505 times in the custom made array and the ratio of occurrence is 0.505
o occurs 113 times in the custom made array and the ratio of occurrence is 0.113
u occurs 382 times in the custom made array and the ratio of occurrence is 0.382


In [49]:

#Python way of counting 
print('\nPython way of counting\n')
for i in options:
  count = 0
  for j in selected:
    if j==i:
      count+=1
  print('count of {} is {}'.format(i,count))  


Python way of counting

count of i is 505
count of o is 113
count of u is 382
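Both loops can be replaced by a single call. The following is a minimal sketch using `np.unique` with `return_counts=True` on a small hand-written array (`letters` is made up for this example so the counts are easy to verify by eye).

```python
import numpy as np

letters = np.array(['i', 'o', 'u', 'i', 'i', 'u'])

# np.unique returns the sorted distinct values and, optionally, their counts
values, counts = np.unique(letters, return_counts=True)
for value, count in zip(values, counts):
    print('{} occurs {} times'.format(value, count))

# For a single value, counting the True entries of a mask is enough
n_i = np.count_nonzero(letters == 'i')
print(n_i)  # 3
```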


In [50]:
# Create an array

arr_rand = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
print("Array: ", arr_rand)

# Positions where value > 5
index_gt5 = np.where(arr_rand >5)
print("Positions where value > 5: ", index_gt5)

Array:  [8 8 3 7 7 0 4 2 5 2]
Positions where value > 5:  (array([0, 1, 3, 4]),)
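`np.where` returns a tuple with one index array per dimension, which is why the output above is wrapped in parentheses. The sketch below unpacks it, uses the positions to fetch the values, and shows the three-argument form that picks between two values. The labels 'gt5' and 'le5' are made up for this example.

```python
import numpy as np

arr_rand = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

# Take the first (and only) index array out of the tuple
positions = np.where(arr_rand > 5)[0]
print(positions)            # [0 1 3 4]
print(arr_rand[positions])  # [8 8 7 7]

# Three-argument form: choose a value per element based on the condition
labels = np.where(arr_rand > 5, 'gt5', 'le5')
print(labels)

# Positions of the largest and smallest values (first occurrence)
print(np.argmax(arr_rand), np.argmin(arr_rand))  # 0 5
```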
