# From 1 to 100 NumPy Exercises for Data Analysis (Python)

This is the first series of NumPy exercises collected from different sources.
>**_The goal:_** to practice, learn, and teach using this set of exercises.

If you find an error or a better solution, please feel free to open an issue or a pull request :D

_Sources: [here](sources.json)_

** 1.- Import numpy as np and see the version **

In [2]:
import numpy as np
np.__version__

'1.14.0'

** 2.- How to create a 1D array? **
   - Create a 1D array of numbers from 0 to 9

In [5]:
a = np.arange(0,10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

** 3.- How to create a boolean array? **
  - Create a 3×3 numpy array of all True’s

In [16]:
a = np.ones((3,3), dtype=bool)
a

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

** 4.- How to extract items that satisfy a given condition from 1D array? **
   - Extract all odd numbers from arr

In [13]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[arr%2 != 0]



array([1, 3, 5, 7, 9])

** 5.- How to replace items that satisfy a condition with another value in numpy array? **
  - Replace all odd numbers in arr with -1

In [10]:
arr[arr%2 != 0] = -1
arr

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

** 6.- How to replace items that satisfy a condition without affecting the original array? **
  - Replace all odd numbers in arr with -1 without changing arr

In [18]:
arr = np.arange(10)  # start again from the original values (exercise 5 modified arr in place)
arrModified = np.copy(arr)
arrModified[arrModified%2 != 0] = -1
# alternatively use
# arrModified = np.where(arr % 2 == 1,  -1, arr)
print(arr)
print(arrModified)

[0 1 2 3 4 5 6 7 8 9]
[ 0 -1  2 -1  4 -1  6 -1  8 -1]


** 7.- How to reshape an array? **
  - Convert a 1D array to a 2D array with 2 rows

In [25]:
arr = np.arange(10)
arr.reshape(2,5)
# alternatively use
# arr.reshape(2, -1)  # Setting to -1 automatically decides the number of cols

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

** 8.- How to stack two arrays vertically? **
  - Stack arrays a and b vertically

In [38]:
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
np.concatenate((a, b), axis = 0)
# Method 2:
#np.vstack([a, b])
# Method 3:
#np.r_[a, b]

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

** 9.- How to stack two arrays horizontally? **
   - Stack the arrays a and b horizontally.

In [39]:
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
np.concatenate((a, b), axis = 1)
# Method 2:
#np.hstack([a, b])

# Method 3:
#np.c_[a, b]

array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

** 10.- How to generate custom sequences in numpy without hardcoding? **
  - Create the following pattern without hardcoding. Use only numpy functions and the below input array a.

In [56]:
a = np.array([1,2,3])
b = np.repeat(a, [3,3,3])
np.hstack([b, np.tile(a, 2)])

# alternatively use (np.tile(a, 2) matches the 15-element output shown here)
#np.r_[np.repeat(a, 3), np.tile(a, 2)]


array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3])

** 11.- How to get the common items between two python numpy arrays? **
   - Get the common items between a and b

In [66]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

np.intersect1d(a,b)


array([2, 4])

** 12.- How to remove from one array those items that exist in another? **
  - From array a remove all items present in array b

In [78]:
a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])
print(a,b)
a[np.isin(a,b, invert=True)]

# Alternative use
# np.setdiff1d(a,b)

[1 2 3 4 5] [5 6 7 8 9]


array([1, 2, 3, 4])

** 13.- How to get the positions where elements of two arrays match? **
   - Get the positions where elements of a and b match

In [81]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

np.where(a == b)

(array([1, 3, 5, 7]),)

** 14.- How to extract all numbers between a given range from a numpy array? **
   - Get all items between 5 and 10 from a.

In [98]:
a = np.array([2, 6, 1, 9, 10, 3, 27])
a
a[(a>=5) & (a<=10)]

# Alternatively use (np.where returns the positions, so index a with them)
# a[np.where((a>=5) & (a<=10))]

array([ 6,  9, 10])

** 15.- How to make a python function that handles scalars to work on numpy arrays? **
  - Convert the function maxx that works on two scalars, to work on two arrays.

In [110]:
def maxx(x, y):
    """Get the maximum of two items"""
    if x >= y:
        return x
    else:
        return y

maxx(1, 5)

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

pairMax = np.vectorize(maxx)

pairMax(a,b)
    

array([6, 7, 9, 8, 9, 7, 5])
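
`np.vectorize` is mainly a convenience: it still calls the Python function once per element. For this particular function, NumPy's built-in `np.maximum` gives the same elementwise result directly. A minimal sketch, reusing the arrays from this exercise (the name `pair_max` is only for illustration):

```python
import numpy as np

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# np.maximum compares the two arrays element by element
pair_max = np.maximum(a, b)
print(pair_max)
```

`np.vectorize` remains useful when no built-in equivalent of the scalar function exists.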

** 16.- How to swap two columns in a 2d numpy array? **
  - Swap columns 1 and 2 in the array arr.

In [146]:
arr = np.arange(9).reshape(3,3)
print(arr)
arr[:, [1,0,2]]

[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([[1, 0, 2],
       [4, 3, 5],
       [7, 6, 8]])

** 17.- How to swap two rows in a 2d numpy array? **
  - Swap rows 1 and 2 in the array arr:

In [147]:
arr = np.arange(9).reshape(3,3)
print(arr)
arr[[1,0,2],: ]

[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

** 19.- How to reverse the columns of a 2D array? **
  - Reverse the columns of a 2D array arr.

In [154]:
arr = np.arange(9).reshape(3,-1)
print(arr)

arr[:,::-1]


[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([[2, 1, 0],
       [5, 4, 3],
       [8, 7, 6]])

** 20.- How to create a 2D array containing random floats between 5 and 10? **
   - Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.

In [169]:
x = np.random.uniform(5,10, size=(5,3))
x

array([[8.64128082, 7.90304249, 5.69307691],
       [7.1679364 , 9.37886172, 7.51012144],
       [9.23000027, 5.86419845, 8.79863706],
       [5.62736394, 9.94383594, 6.01858064],
       [8.44725249, 8.75168736, 7.32301138]])

** 21.- How to print only 3 decimal places in python numpy array? **
   - Print or show only 3 decimal places of the numpy array rand_arr.

In [25]:
rand_arr = np.random.random((5,3))
print(type(rand_arr))

# Limit to 3 decimal places
np.set_printoptions(precision=3)
rand_arr

<class 'numpy.ndarray'>


array([[0.649, 0.838, 0.912],
       [0.238, 0.545, 0.018],
       [0.404, 0.157, 0.725],
       [0.81 , 0.034, 0.524],
       [0.645, 0.613, 0.508]])

** 22.- How to pretty print a numpy array by suppressing the scientific notation (like 1e10)? **
   - Pretty print rand_arr by suppressing the scientific notation (like 1e10)

In [29]:
np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3
rand_arr

np.set_printoptions(precision=6, suppress=True)
rand_arr

array([[0.000543, 0.000278, 0.000425],
       [0.000845, 0.000005, 0.000122],
       [0.000671, 0.000826, 0.000137]])

** 23.- How to limit the number of items printed in output of numpy array? **
   - Limit the number of items printed in python numpy array a to a maximum of 6 elements.

In [35]:
a = np.arange(15)
a

np.set_printoptions(suppress=False, threshold=6)
a


array([ 0,  1,  2, ..., 12, 13, 14])

** 24.- How to print the full numpy array without truncating **
   - Print the full numpy array a without truncating.

In [39]:
np.set_printoptions(threshold=6)
a = np.arange(15)
a
np.set_printoptions(threshold=1000) # this is the default value
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
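
Since `np.set_printoptions` changes the setting globally, a temporary change can be easier to manage. The sketch below assumes NumPy 1.15 or later, where the `np.printoptions` context manager is available; the 2000-element array and the variable names are made up for illustration.

```python
import sys
import numpy as np

a = np.arange(2000)  # larger than the default threshold of 1000
before = np.get_printoptions()["threshold"]

# Inside the block nothing is summarized with "...";
# the previous print options are restored automatically on exit.
with np.printoptions(threshold=sys.maxsize):
    full_text = np.array2string(a)

after = np.get_printoptions()["threshold"]
print("..." in full_text, before == after)
```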

** 25.- How to import a dataset with numbers and texts keeping the text intact in python numpy? **
   - Import the iris dataset keeping the text intact.

In [46]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object') # with dtype=None, a 1D structured array (one tuple per row) is returned instead
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

iris[:10] # first 10

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)

** 26.- How to extract a particular column from 1D array of tuples? **
   - Extract the text column species from the 1D iris imported in previous question.

In [50]:
iris[:10, 4] # first 10

array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
       b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
       b'Iris-setosa', b'Iris-setosa'], dtype=object)

** 27.- How to convert a 1d array of tuples to a 2d numpy array? **
   - Convert the 1D iris to 2D array iris_2d by omitting the species text field.

In [75]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype=None, encoding=None, usecols=[0,1,2,3])
iris_2d[:10] # first 10

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

** 28.- How to compute the mean, median, standard deviation of a numpy array? **
   - Find the mean, median, standard deviation of iris's sepallength (1st column)

In [100]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype=float)
mean = np.mean(iris[:,0])
median = np.median(iris[:,0])
std = np.std(iris[:,0])
print("The mean is: ", mean)
print("The median is: ", median)
print("The std is: ", std)

The mean is:  5.843333333333334
The median is:  5.8
The std is:  0.8253012917851409


** 29.- How to normalize an array so the values range exactly between 0 and 1? **
   - Create a normalized form of iris's sepallength whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.

In [113]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

sepalMax, sepalMin = sepallength.max(), sepallength.min()
print(sepalMax, sepalMin)
sepalForm = (sepallength - sepalMin)/(sepalMax - sepalMin)
# Alternatively use
#sepalForm = (sepallength - sepalMin)/np.ptp(sepallength)
# help(np.ptp)
print(sepalForm)

7.9 4.3
[0.222222 0.166667 0.111111 0.083333 0.194444 0.305556 0.083333 0.194444
 0.027778 0.166667 0.305556 0.138889 0.138889 0.       0.416667 0.388889
 0.305556 0.222222 0.388889 0.222222 0.305556 0.222222 0.083333 0.222222
 0.138889 0.194444 0.194444 0.25     0.25     0.111111 0.138889 0.305556
 0.25     0.333333 0.166667 0.194444 0.333333 0.166667 0.027778 0.222222
 0.194444 0.055556 0.027778 0.194444 0.222222 0.138889 0.222222 0.083333
 0.277778 0.194444 0.75     0.583333 0.722222 0.333333 0.611111 0.388889
 0.555556 0.166667 0.638889 0.25     0.194444 0.444444 0.472222 0.5
 0.361111 0.666667 0.361111 0.416667 0.527778 0.361111 0.444444 0.5
 0.555556 0.5      0.583333 0.638889 0.694444 0.666667 0.472222 0.388889
 0.333333 0.333333 0.416667 0.472222 0.305556 0.472222 0.666667 0.555556
 0.361111 0.333333 0.333333 0.5      0.416667 0.194444 0.361111 0.388889
 0.388889 0.527778 0.222222 0.388889 0.555556 0.416667 0.777778 0.555556
 0.611111 0.916667 0.166667 0.833333 0.666667 0.80555

** 30.- How to compute the softmax score? **
   - Compute the softmax score of sepallength.

In [115]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

softmax = lambda x : np.exp(x)/np.sum(np.exp(x))
softmax(sepallength)

array([0.00222 , 0.001817, 0.001488, 0.001346, 0.002008, 0.002996,
       0.001346, 0.002008, 0.001102, 0.001817, 0.002996, 0.001644,
       0.001644, 0.000997, 0.00447 , 0.004044, 0.002996, 0.00222 ,
       0.004044, 0.00222 , 0.002996, 0.00222 , 0.001346, 0.00222 ,
       0.001644, 0.002008, 0.002008, 0.002453, 0.002453, 0.001488,
       0.001644, 0.002996, 0.002453, 0.003311, 0.001817, 0.002008,
       0.003311, 0.001817, 0.001102, 0.00222 , 0.002008, 0.001218,
       0.001102, 0.002008, 0.00222 , 0.001644, 0.00222 , 0.001346,
       0.002711, 0.002008, 0.01484 , 0.008144, 0.013428, 0.003311,
       0.009001, 0.004044, 0.007369, 0.001817, 0.009947, 0.002453,
       0.002008, 0.00494 , 0.005459, 0.006033, 0.003659, 0.010994,
       0.003659, 0.00447 , 0.006668, 0.003659, 0.00494 , 0.006033,
       0.007369, 0.006033, 0.008144, 0.009947, 0.01215 , 0.010994,
       0.005459, 0.004044, 0.003311, 0.003311, 0.00447 , 0.005459,
       0.002996, 0.005459, 0.010994, 0.007369, 0.003659, 0.003
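
The lambda above works for small inputs such as sepal lengths, but `np.exp` overflows for large values. A common variant subtracts the maximum first, which leaves the result mathematically unchanged. A minimal sketch, with a hypothetical helper name `stable_softmax` and made-up sample values:

```python
import numpy as np

def stable_softmax(x):
    """Softmax that subtracts the maximum before exponentiating."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)   # the largest exponent becomes 0, so exp() cannot overflow
    exps = np.exp(shifted)
    return exps / exps.sum()

sample = np.array([5.1, 4.9, 7.0, 6.3])  # made-up values, not the iris column
scores = stable_softmax(sample)
print(scores, scores.sum())
```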

** 31.- How to find the percentile scores of a numpy array? **
   - Find the 5th and 95th percentile of iris's sepallength

In [118]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

np.percentile(sepallength, [5, 95])
# help(np.percentile)

array([4.6  , 7.255])

** 32.- How to insert values at random positions in an array? **
   - Insert np.nan values at 20 random positions in iris_2d dataset

In [133]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
print(iris_2d.shape)

iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
iris_2d[:10]


(150, 5)


array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)
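
Note that two independent `np.random.randint` calls can pick the same (row, column) pair more than once, so fewer than 20 cells may end up as nan. One way to get exactly 20 distinct positions is to sample flat indices without replacement. This is a sketch on a random stand-in array rather than the iris file; the seed and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(100)           # arbitrary seed
data = rng.uniform(0, 10, size=(150, 4))   # stand-in for the four numeric iris columns

# Draw 20 distinct flat positions, then convert them to (row, col) pairs
flat_idx = rng.choice(data.size, size=20, replace=False)
rows, cols = np.unravel_index(flat_idx, data.shape)
data[rows, cols] = np.nan

print(np.isnan(data).sum())
```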

** 33.- How to find the position of missing values in numpy array? **
   - Find the number and position of missing values in iris_2d's sepallength (1st column)

In [148]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float')
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

print("Number of missing values: ", np.isnan(iris_2d[:, 0]).sum())
print("Positions of missing values: ", np.where(np.isnan(iris_2d[:, 0])))
# np.info(np.where)

Number of missing values:  4
Positions of missing values:  (array([  4, 111, 119, 149]),)


** 34.- How to filter a numpy array based on two or more conditions? **
   - Filter the rows of iris_2d that have petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0

In [156]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
#iris_2d[:10] # exploring

iris_2d[np.where((iris_2d[:,2] > 1.5) & (iris_2d[:,0] < 5.0))]

array([[4.8, 3.4, 1.6, 0.2],
       [4.8, 3.4, 1.9, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [4.9, 2.4, 3.3, 1. ],
       [4.9, 2.5, 4.5, 1.7]])

** 35.- How to drop rows that contain a missing value from a numpy array? **
   - Select the rows of iris_2d that do not have any nan value.

In [167]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
iris_2d[:10] # exploring

iris_2d[np.sum(np.isnan(iris_2d), axis = 1) == 0][:5]

array([[4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4]])
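
The same filter can also be written with a boolean mask built from `any(axis=1)`, which some readers find easier to follow than summing the nan flags. A small sketch on a hand-made array (the values are illustrative only):

```python
import numpy as np

x = np.array([[5.1, 3.5, 1.4, 0.2],
              [np.nan, 3.0, 1.4, 0.2],
              [4.7, 3.2, np.nan, 0.2],
              [4.6, 3.1, 1.5, 0.2]])

has_nan = np.isnan(x).any(axis=1)   # True for rows with at least one nan
complete_rows = x[~has_nan]         # keep only the rows without nan
print(complete_rows)
```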

** 36.- How to find the correlation between two columns of a numpy array? **
  - Find the correlation between SepalLength(1st column) and PetalLength(3rd column) in iris_2d

In [176]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

np.corrcoef(iris_2d[:,0], iris_2d[:,2]) #[0,1]

array([[1.      , 0.871754],
       [0.871754, 1.      ]])

** 37.- How to find if a given array has any null values? **
   - Find out if iris_2d has any missing values.

In [177]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

np.isnan(iris_2d).any()

False

** 38.- How to replace all missing values with 0 in a numpy array? **
   - Replace all occurrences of nan with 0 in numpy array

In [194]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(iris_2d.shape[0], size=20), np.random.randint(iris_2d.shape[1], size=20)] = np.nan
iris_2d[:20] # exploring

iris_2d[np.isnan(iris_2d)] = 0
iris_2d[:10]


array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [0. , 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

** 39.- How to find the count of unique values in a numpy array? **
   - Find the unique values and the count of unique values in iris's species

In [208]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

species = np.array([row.tolist()[4] for row in iris])
print(species.shape)

# Get the unique values and the counts
np.unique(species, return_counts=True)

(150,)


(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
       dtype='|S15'), array([50, 50, 50]))

** 40.-  How to convert a numeric to a categorical (text) array? **
   - Bin the petal length (3rd) column of iris_2d to form a text array, such that if petal length is:
    
    * Less than 3 --> 'small'
    * 3-5 --> 'medium'
    * >=5 --> 'large'

In [214]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Bin petallength 
petalLengthBin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])
# help(np.digitize)

# Map it to respective category
labelMap = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petalLengthCat = [labelMap[x] for x in petalLengthBin]

# View
petalLengthCat[:4]

['small', 'small', 'small', 'small']
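
The dictionary lookup can also be replaced by indexing an array of labels with the bin numbers, which keeps the result as a NumPy array. A minimal sketch with made-up petal lengths; note that the bins here are `[3, 5]` so that `np.digitize` returns 0, 1 or 2 directly:

```python
import numpy as np

petal_length = np.array([1.4, 3.0, 4.7, 5.0, 6.9])  # made-up sample
labels = np.array(['small', 'medium', 'large'])

# With bins [3, 5]: x < 3 -> 0, 3 <= x < 5 -> 1, x >= 5 -> 2
bin_idx = np.digitize(petal_length, [3, 5])
petal_cat = labels[bin_idx]
print(petal_cat)
```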

** 41.- How to create a new column from existing columns of a numpy array? **
   - Create a new column for volume in iris_2d, where volume is (pi x petallength x sepal_length^2)/3

In [4]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

iris_2d[:10]

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)

In [69]:
piXPetal = np.dot(np.pi, iris_2d[:,2].astype('float'))
volume = (piXPetal * np.square(iris_2d[:,0].astype('float'))) / 3

volume = volume[:, np.newaxis]
output = np.concatenate((iris_2d, volume), axis=1)
output[:10]

# Alternatively use
#output = np.hstack([iris_2d, volume])


array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa',
        38.13265162927291],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa',
        35.200498485922445],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa',
        33.238050274980004],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa',
        36.65191429188092],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa',
        51.911677007917746],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa',
        31.022180256648003],
       [b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa',
        39.269908169872416],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa',
        28.38324242763259],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa',
        37.714819806345474]], dtype=object)

** 42.- How to do probabilistic sampling in numpy? **
   - Randomly sample iris's species such that setosa is twice the number of versicolor and virginica

In [74]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

species = iris[:, 4]

np.random.seed(100)

# getting uniform samples
probs = np.r_[np.linspace(0, 0.500, num=50), 
              np.linspace(0.501, .750, num=50), 
              np.linspace(.751, 1.0, num=50)]

# getting the indexes
index = np.searchsorted(probs, np.random.random(150))
speciesOutput = species[index]
print(np.unique(speciesOutput, return_counts=True))

(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
      dtype=object), array([77, 37, 36]))
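
A different way to get a similar result is to sample the species labels directly with explicit probabilities. This is not the approach used above (which samples rows through `np.searchsorted`); it is a sketch that assumes the target proportions are 0.5, 0.25 and 0.25, with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(100)  # arbitrary seed
species_names = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

# p gives setosa twice the weight of each of the other two species
sample = rng.choice(species_names, size=150, p=[0.5, 0.25, 0.25])
values, counts = np.unique(sample, return_counts=True)
print(dict(zip(values, counts)))
```

The counts are only approximately in a 2:1:1 ratio, since each draw is random.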


** 43.- How to get the second largest value of an array when grouped by another array? **
   - What is the value of second longest petallength of species setosa

In [81]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Getting all petallength rows that belong to setosa
petalLengthSetosa = iris[iris[:, 4] == b'Iris-setosa', [2]].astype('float')
print(" Petal Length - Setosa rows, ", petalLengthSetosa)
# Getting the second largest distinct value (np.unique already returns sorted values)
np.unique(petalLengthSetosa)[-2]


 Petal Length - Setosa rows,  [1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
 1.7 1.5 1.7 1.5 1.  1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2
 1.3 1.5 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4]


1.7

** 44.- How to sort a 2D array by a column **
   - Sort the iris dataset based on sepallength column.

In [92]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

iris[np.argsort(iris[:,0])][:10] # argsort returns indices

array([[b'4.3', b'3.0', b'1.1', b'0.1', b'Iris-setosa'],
       [b'4.4', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.4', b'3.0', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.5', b'2.3', b'1.3', b'0.3', b'Iris-setosa'],
       [b'4.6', b'3.6', b'1.0', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'4.6', b'3.2', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)

** 45.- How to find the most frequent value in a numpy array? **
   - Find the most frequent value of petal length (3rd column) in iris dataset.

In [216]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

values, counts = np.unique(iris[:, 2], return_counts=True)
values[np.argmax(counts)]

b'1.5'

** 46.- How to find the position of the first occurrence of a value greater than a given value? **
   - Find the position of the first occurrence of a value greater than 1.0 in petalwidth 4th column of iris dataset.

In [247]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='float')
iris[:10] # exploring

print("Position: ", np.argwhere(iris[:, 3].astype(float) > 1.0)[0])
# help(np.argwhere)

Position:  [50]


** 47.-  How to replace all values greater than a given value with a given cutoff? **
   - From the array a, replace all values greater than 30 with 30 and all values less than 10 with 10.

In [262]:
np.random.seed(100)
a = np.random.uniform(1,50, 20)

# this modifies a
a[np.argwhere(a>30)] = 30
a[np.argwhere(a<10)] = 10
a

# alternatively use (these do not modify a)
# np.clip(a, a_min=10, a_max=30) # help(np.clip)
# a
# or
# print(np.where(a < 10, 10, np.where(a > 30, 30, a)))
# a

array([27.62684215, 14.64009987, 21.80136195, 30.        , 10.        ,
       10.        , 30.        , 30.        , 10.        , 29.17957314,
       30.        , 11.25090398, 10.08108276, 10.        , 11.76517714,
       30.        , 30.        , 10.        , 30.        , 14.42961361])

** 48.- How to get the positions of top n values from a numpy array? **
   - Get the positions of top 5 maximum values in a given array a.

In [280]:
np.random.seed(100)
a = np.random.uniform(1,50, 20)

np.argsort(a)[::-1][:5]

array([15, 10,  3,  7, 18])

** 49.-  How to compute the row wise counts of all possible values in an array? **
   - Compute the counts of unique values row-wise.

In [299]:
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
print("Input data: \n ", arr)

def counts_of_all_values_rowwise(arr2d):
    # Unique values and their counts, row by row
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]
    # Count of every possible value in each row (0 when the value is absent)
    return [[int(b[a == i][0]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array]

print(np.arange(1,11))
counts_of_all_values_rowwise(arr)

Input data: 
  [[ 9  9  4  8  8  1  5  3  6  3]
 [ 3  3  2  1  9  5  1 10  7  3]
 [ 5  2  6  4  5  5  4  8  2  2]
 [ 8  8  1  3 10 10  4  3  6  9]
 [ 2  1  8  7  3  1  9  3  6  2]
 [ 9  2  6  5  3  9  4  6  1 10]]
[ 1  2  3  4  5  6  7  8  9 10]


[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
 [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
 [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
 [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
 [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
 [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]
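
The row-wise counts can also be computed without Python loops by comparing every cell against every distinct value through broadcasting. A minimal sketch on a small hand-made array (names are illustrative):

```python
import numpy as np

arr = np.array([[1, 1, 3],
                [2, 3, 3],
                [2, 2, 2]])

values = np.unique(arr)  # sorted distinct values of the whole array
# Compare each cell with each distinct value -> shape (rows, cols, n_values),
# then count the matches along the column axis.
row_counts = (arr[:, :, None] == values).sum(axis=1)
print(values)
print(row_counts)
```

The intermediate boolean array has one entry per cell per distinct value, so this trades memory for speed on large inputs.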

** 50.- How to convert an array of arrays into a flat 1d array? **
   - Convert array_of_arrays into a flat linear 1d array.

In [309]:
arr1 = np.arange(3)
arr2 = np.arange(3,7)
arr3 = np.arange(7,10)

array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)  # dtype=object is required for arrays of different lengths in recent NumPy
print("Input data: \n", array_of_arrays)

np.concatenate(array_of_arrays)
    

Input data: 
 [array([0, 1, 2]) array([3, 4, 5, 6]) array([7, 8, 9])]


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

** 51.- How to generate one-hot encodings for an array in numpy? **
   - Compute the one-hot encodings (dummy binary variables for each unique value in the array)

In [341]:
np.random.seed(101) 
arr = np.random.randint(1,4, size=6)
print("Input data: \n", arr)

maxValue = np.max(arr)
encoding = np.zeros((len(arr),maxValue))

# encoding
j=0
for i in arr:
    if(i!=0):
        encoding[j,i-1] = 1
    j+=1

encoding

# alternatively use
#(arr[:, None] == np.unique(arr)).view(np.int8)

Input data: 
 [2 3 2 2 2 1]


array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])

** 52.- How to create row numbers grouped by a categorical variable? **
   - Create row numbers grouped by a categorical variable. Use the following sample from iris species as input.

In [363]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
speciesSample = np.sort(np.random.choice(species, size=20))
print("Input data: \n", speciesSample)

sampleOutput = []
j = 0
for i in range(len(speciesSample)):
    if i != 0:
        if speciesSample[i] == speciesSample[i-1]:
            sampleOutput.append(j)
        else:
            j=0
            sampleOutput.append(j) 
    else:
        sampleOutput.append(j)
    j+=1

sampleOutput

# Alternatively use
# for val in np.unique(speciesSample):
#        for i, grp in enumerate(speciesSample[speciesSample==val]):
#             sampleOutput.append(i)

# sampleOutput

Input data: 
 ['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-virginica'
 'Iris-virginica' 'Iris-virginica' 'Iris-virginica']


[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]
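
For input that is already sorted by group, as it is here, the row numbers can also be derived without an explicit loop: subtract each group's start position from a running index. A sketch on a short made-up array:

```python
import numpy as np

groups = np.array(['a', 'a', 'a', 'b', 'b', 'c'])  # must already be sorted by group

# First position and size of each group
_, start, counts = np.unique(groups, return_index=True, return_counts=True)
# Running index minus the start position of the group each element belongs to
row_numbers = np.arange(len(groups)) - np.repeat(start, counts)
print(row_numbers)
```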

** 53.- How to create group ids based on a given categorical variable? **
   - Create group ids based on a given categorical variable. Use the following sample from iris species as input.

In [369]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
speciesSample = np.sort(np.random.choice(species, size=20))
print("Input data: \n", speciesSample)

sampleOutput = []
j = 0
for i in range(len(speciesSample)):
    if i != 0:
        if speciesSample[i] == speciesSample[i-1]:
            sampleOutput.append(j)
        else:
            j+=1
            sampleOutput.append(j) 
    else:
        sampleOutput.append(j)

sampleOutput

# Alternatively use
# sampleOutput = [np.argwhere(np.unique(speciesSample) == s).tolist()[0][0] \
#           for val in np.unique(speciesSample) for s in speciesSample[speciesSample==val]]
# sampleOutput

Input data: 
 ['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-virginica'
 'Iris-virginica' 'Iris-virginica' 'Iris-virginica']


[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
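
`np.unique` can produce group ids directly through `return_inverse=True`, and unlike the loop above it does not need the input to be sorted. A minimal sketch with made-up labels:

```python
import numpy as np

groups = np.array(['setosa', 'setosa', 'versicolor', 'virginica', 'versicolor'])

# inverse[i] is the index of groups[i] in the sorted array of unique labels
uniques, group_ids = np.unique(groups, return_inverse=True)
print(uniques)
print(group_ids)
```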

** 54.- How to rank items in an array using numpy? **
   - Create the ranks for the given numeric array a.

In [380]:
np.random.seed(10)
a = np.random.randint(20, size=10)
print("Input data: \n", a)

print(a.argsort().argsort())

Input data: 
 [ 9  4 15  0 17 16 17  8  9  0]
[4 2 6 0 8 7 9 3 5 1]


** 55.- How to rank items in a multidimensional array using numpy? **
   - Create a rank array of the same shape as a given numeric array a.


In [386]:
np.random.seed(10)
a = np.random.randint(20, size=[2,5])
print("Input data: \n", a)

print(a.ravel().argsort().argsort().reshape(a.shape))
# help(np.ravel)

Input data: 
 [[ 9  4 15  0 17]
 [16 17  8  9  0]]
[[4 2 6 0 8]
 [7 9 3 5 1]]


** 56.- How to find the maximum value in each row of a 2D numpy array? **
   - Compute the maximum for each row in the given array.



In [397]:
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
print("Input data: \n", a)

np.apply_along_axis(np.max, arr=a, axis=1)

Input data: 
 [[9 9 4]
 [8 8 1]
 [5 3 6]
 [3 3 3]
 [2 1 9]]


array([9, 8, 6, 3, 9])

** 57.- How to compute the min-by-max for each row of a 2D numpy array? **
   - Compute the min-by-max for each row of the given 2D numpy array.

In [399]:
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
print("Input data: \n", a)

np.apply_along_axis(lambda x: np.min(x)/np.max(x), arr = a, axis=1)

Input data: 
 [[9 9 4]
 [8 8 1]
 [5 3 6]
 [3 3 3]
 [2 1 9]]


array([0.44444444, 0.125     , 0.5       , 1.        , 0.11111111])
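
`np.apply_along_axis` calls the Python function once per row. A possible alternative, sketched here with the array hard-coded from the input above, is to use the `axis` argument of the built-in reductions for both the row maximum and the min-by-max ratio.

```python
import numpy as np

# Same values as the input printed above, hard-coded to avoid depending on the RNG
a = np.array([[9, 9, 4],
              [8, 8, 1],
              [5, 3, 6],
              [3, 3, 3],
              [2, 1, 9]])

row_max = a.max(axis=1)                     # maximum of each row
min_by_max = a.min(axis=1) / a.max(axis=1)  # row minimum divided by row maximum

print(row_max)
print(min_by_max)
```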

** 58.- How to find the duplicate records in a numpy array? **
   - Find the duplicate entries (2nd occurrence onwards) in the given numpy array and mark them as True. First time occurrences should be False.



In [408]:
np.random.seed(100)
a = np.random.randint(0, 5, 10)
print('Input data: \n', a)

output = np.full(a.shape[0], True) # an array full of True values
uniquePositions = np.unique(a, return_index=True)[1]
output[uniquePositions] = False # make the unique values False
output

Input data: 
 [0 0 3 0 2 4 2 2 2 2]


array([False,  True, False,  True, False, False,  True,  True,  True,
        True])

** 59.- How to find the grouped mean in numpy? **
   - Find the mean of a numeric column grouped by a categorical column in a 2D numpy array

In [424]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
iris[:10]

species = iris[:,4]
column = iris[:,0].astype("float") #sepallength
output = []
for group_val in np.unique(species):
    output.append([group_val, column[species==group_val].mean()])

output

[[b'Iris-setosa', 5.006],
 [b'Iris-versicolor', 5.936],
 [b'Iris-virginica', 6.587999999999998]]
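
For larger inputs the Python loop can be avoided. The sketch below assumes integer group codes from `np.unique(..., return_inverse=True)` and sums per group with `np.bincount`. The toy `species` and `values` arrays are made up for illustration and only stand in for the iris columns.

```python
import numpy as np

# Hypothetical toy data standing in for the species and sepallength columns
species = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# groups: sorted unique labels; codes: index of each row's label in groups
groups, codes = np.unique(species, return_inverse=True)

sums = np.bincount(codes, weights=values)  # sum of values per group
counts = np.bincount(codes)                # number of rows per group
group_means = sums / counts

for g, m in zip(groups, group_means):
    print(g, m)
```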

** 60.- How to convert a PIL image to numpy array? **
   - Import the image from the following URL and convert it to a numpy array.

URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'


In [431]:
from io import BytesIO
from PIL import Image
import requests

# Import image from URL
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
response = requests.get(URL)

# Read it as an image
img = Image.open(BytesIO(response.content))

# Optionally resize (size is a (width, height) tuple)
img = img.resize((150, 150))

# Convert to numpy array
arr = np.asarray(img)

# Optionally convert it back to an image and show it
im = Image.fromarray(np.uint8(arr))
im.show()

** 61.- How to drop all missing values from a numpy array? **
   - Drop all nan values from a 1D numpy array

In [447]:
a = np.array([1,2,3,np.nan,5,6,7,np.nan])
print("Input data: \n",a)

mask = np.where(np.isnan(a))[0]
np.delete(a, mask)

# alternatively use
# a[~np.isnan(a)]

Input data: 
 [ 1.  2.  3. nan  5.  6.  7. nan]


array([1., 2., 3., 5., 6., 7.])

** 62.- How to compute the euclidean distance between two arrays? **
   - Compute the euclidean distance between two arrays a and b.

In [448]:
a = np.array([1,2,3,4,5])
b = np.array([4,5,6,7,8])

dist = np.linalg.norm(a-b)
dist
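
As a sanity check, `np.linalg.norm(a-b)` should agree with the textbook formula, the square root of the sum of squared differences. The sketch below compares the two; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

dist_norm = np.linalg.norm(a - b)            # 2-norm of the difference vector
dist_manual = np.sqrt(np.sum((a - b) ** 2))  # same quantity written out explicitly

print(dist_norm, dist_manual)
```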

** 63.- How to find all the local maxima (or peaks) in a 1d array? **
   - Find all the peaks in a 1D numpy array a. Peaks are points surrounded by smaller values on both sides.


In [481]:
a = np.array([1, 3, 7, 1, 2, 6, 0, 1,1,1,5,2])
print("Input data: \n", a)

output = []
# The first and last elements have only one neighbour, so they are skipped
for i in range(1, len(a) - 1):
    if a[i] > a[i-1] and a[i] > a[i+1]:
        output.append(i)
output

# Alternatively use -- this is best
# doublediff = np.diff(np.sign(np.diff(a)))
# doublediff
# peakLocations = np.where(doublediff == -2)[0] + 1
# peakLocations

Input data: 
 [1 3 7 1 2 6 0 1 1 1 5 2]


[2, 5, 10]
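
The definition of a peak (strictly larger than both neighbours) can also be written directly with shifted slices. This is a sketch of one possible vectorized form, with illustrative variable names.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1, 1, 1, 5, 2])

inner = a[1:-1]  # candidates: every element that has two neighbours
left = a[:-2]    # left neighbour of each candidate
right = a[2:]    # right neighbour of each candidate

# +1 converts positions within `inner` back to positions within `a`
peaks = np.where((inner > left) & (inner > right))[0] + 1
print(peaks)
```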

** 64.- How to subtract a 1d array from a 2d array, where each item of 1d array subtracts from respective row? **
   - Subtract the 1d array b_1d from the 2d array a_2d, such that each item of b_1d subtracts from respective row of a_2d.

In [491]:
a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]])
b_1d = np.array([1,2,3])
print("Array a: \n", a_2d)
print("Array b: \n", b_1d)

print("Result: \n", a_2d - b_1d[:,None])

Array a: 
 [[3 3 3]
 [4 4 4]
 [5 5 5]]
Array b: 
 [1 2 3]
Result: 
 [[2 2 2]
 [2 2 2]
 [2 2 2]]
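
The `[:,None]` indexing matters because of broadcasting: it turns `b_1d` into a column of shape `(3, 1)`, so each item lines up with a row. A short sketch contrasting it with the plain subtraction, which lines items up with columns instead:

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

col = b_1d[:, None]        # shape (3, 1): one value per row
row_wise = a_2d - col      # b_1d[i] is subtracted from every item of row i
col_wise = a_2d - b_1d     # b_1d[j] is subtracted from every item of column j

print(col.shape)
print(row_wise)
print(col_wise)
```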


** 65.- How to find the index of the n'th repetition of an item in an array? **
   - Find the index of the 5th repetition of number 1 in x.

In [497]:
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
print("Input data: \n", x)
np.argwhere(x==1)[4]

Input data: 
 [1 2 1 1 3 4 3 1 1 2 1 1 2]


array([8])

** 66.- How to convert numpy's datetime64 object to datetime's datetime object? **
   - Convert numpy's datetime64 object to datetime's datetime object

In [505]:
import datetime
dt64 = np.datetime64('2018-02-25 22:10:10')
print("Input data: \n", dt64)

dt64.astype(datetime.datetime)

Input data: 
 2018-02-25T22:10:10


datetime.datetime(2018, 2, 25, 22, 10, 10)

** 67.- How to compute the moving average of a numpy array? **
   - Compute the moving average of window size 3, for the given 1D array.

In [509]:
np.random.seed(100)
Z = np.random.randint(10, size=10)
print("Input data: \n", Z)

np.convolve(Z, np.ones(3)/3, mode='valid')
# help(np.convolve)

Input data: 
 [8 8 3 7 7 0 4 2 5 2]


array([6.33333333, 6.        , 5.66666667, 4.66666667, 3.66666667,
       2.        , 3.66666667, 3.        ])
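
Another common way to get the same moving average is through cumulative sums. The helper `moving_average` below is a hypothetical name used for this sketch, and the input is hard-coded from the data above.

```python
import numpy as np

def moving_average(x, n=3):
    # Hypothetical helper: mean over each full window of length n
    c = np.cumsum(x, dtype=float)
    # After this step c[i] holds the sum of the n values ending at position i
    c[n:] = c[n:] - c[:-n]
    return c[n - 1:] / n

# Same values as the input printed above
Z = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
print(moving_average(Z, n=3))
```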

** 68.- How to create a numpy array sequence given only the starting point, length and the step? **
   - Create a numpy array of length 10, starting from 5, with a step of 3 between consecutive numbers

In [516]:
stop = 3 * 10 + 5
np.arange(5,stop,3)

array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])
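
The stop value can be avoided altogether by scaling `np.arange(length)`. A minimal sketch, where `seq` is a hypothetical helper name:

```python
import numpy as np

def seq(start, length, step):
    # Hypothetical helper: `length` values beginning at `start`, spaced `step` apart
    return start + step * np.arange(length)

print(seq(5, 10, 3))
```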

** 69.- How to fill in missing dates in an irregular series of numpy dates? **
   - Given an array containing a non-continuous sequence of dates, make it a continuous sequence by filling in the missing dates.



In [541]:
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print("Input data: \n", dates)

output = []
for date, diffDays in zip(dates, np.diff(dates)):
    output.append(np.arange(date, (date+diffDays)))

completeDates = np.array(output).reshape(-1)
# add the last day
completeDates = np.hstack([completeDates, dates[-1]])
print("Complete sequence of dates: \n",completeDates)

Input data: 
 ['2018-02-01' '2018-02-03' '2018-02-05' '2018-02-07' '2018-02-09'
 '2018-02-11' '2018-02-13' '2018-02-15' '2018-02-17' '2018-02-19'
 '2018-02-21' '2018-02-23']
Complete sequence of dates: 
 ['2018-02-01' '2018-02-02' '2018-02-03' '2018-02-04' '2018-02-05'
 '2018-02-06' '2018-02-07' '2018-02-08' '2018-02-09' '2018-02-10'
 '2018-02-11' '2018-02-12' '2018-02-13' '2018-02-14' '2018-02-15'
 '2018-02-16' '2018-02-17' '2018-02-18' '2018-02-19' '2018-02-20'
 '2018-02-21' '2018-02-22' '2018-02-23']
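
When every day between the first and last date is wanted, a single `np.arange` over the whole span may be enough. A sketch under that assumption, with `filled` as an illustrative name:

```python
import numpy as np

dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)

# The stop of np.arange is exclusive, so add one day to include the last date
filled = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'))
print(filled)
```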


** 70.- How to create strides from a given 1D array? **
   - From the given 1d array arr, generate a 2d matrix using strides, with a window length of 4 and strides of 2, like [[0,1,2,3], [2,3,4,5], [4,5,6,7]..]

In [585]:
arr = np.arange(15) 
print("Input data: \n", arr)

def gen_strides(a, stride_len=2, window_len=4):
    # number of complete windows that fit in the array
    n_strides = ((a.size-window_len)//stride_len) + 1
    return np.array([a[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)])

print(gen_strides(arr, stride_len=2, window_len=4))

Input data: 
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[[ 0  1  2  3]
 [ 2  3  4  5]
 [ 4  5  6  7]
 [ 6  7  8  9]
 [ 8  9 10 11]
 [10 11 12 13]]
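
Newer NumPy releases ship a helper for this. The sketch below assumes NumPy 1.20 or later (newer than the version shown at the top of this notebook), where `sliding_window_view` is available in `numpy.lib.stride_tricks`. It builds every window of length 4 and then keeps every second one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

arr = np.arange(15)

# All windows of length 4 (step 1), then every 2nd window for a stride of 2
windows = sliding_window_view(arr, window_shape=4)[::2]
print(windows)
```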


** 71.- Create a null vector of size 10 **

In [588]:
nVector = np.zeros(10)
nVector

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

** 72.- How to find the memory size of any array? **

In [601]:
arr = np.arange(20)
print("Input data: \n", arr)

print("%d bytes" % (arr.size * arr.itemsize))

Input data: 
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
160 bytes
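
The same number is available directly as the `nbytes` attribute. In this sketch the dtype is fixed to `int64` so the result does not depend on the platform's default integer size.

```python
import numpy as np

# dtype is set explicitly; the default integer size can differ between platforms
arr = np.arange(20, dtype=np.int64)

print("%d bytes" % arr.nbytes)  # equals arr.size * arr.itemsize
```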


** 73.- How to get the documentation of the numpy add function from the command line? **

In [606]:
$ python -c "import numpy; numpy.info(numpy.add)"

** 74.- Create a null vector of size 10 whose fifth value is 1 **

In [607]:
arr = np.zeros(10)
arr[4]=1
arr

array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])

** 75.- Create a vector with values ranging from 10 to 49 **

In [610]:
v = np.arange(10,50)
v

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
       44, 45, 46, 47, 48, 49])

** 76.- Reverse a vector (first element becomes last) **

In [617]:
v = np.arange(1,11)
v[::-1]

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

** 77.- Create a 3x3 matrix with values ranging from 0 to 8 **

In [621]:
m = np.arange(0,9).reshape(3,3)
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

** 78.- Find indices of non-zero elements from [1,2,0,0,4,0] **

In [624]:
v = [1,2,0,0,4,0]
np.nonzero(v)[0]

array([0, 1, 4])

** 79.- Create a 3x3 identity matrix **

In [626]:
im = np.eye(3,3)
im

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

** 80.- Create a 3x3x3 array with random values **

In [632]:
np.random.seed(100)
arr = np.random.random((3,3,3))
arr.shape

(3, 3, 3)

** 81.- Create a 10x10 array with random values and find the minimum and maximum values **

In [641]:
np.random.seed(100)
arr = np.random.random((10,10))
print("The min value is: ", np.min(arr))
print("The max value is: ", np.max(arr))

The min value is:  0.004718856190972565
The max value is:  0.9921580365105283


** 82.- Create a random vector of size 30 and find the mean value **

In [644]:
np.random.seed(100)
v = np.random.random(30)
print("The vector mean is: ", v.mean())

The vector mean is:  0.44116670266566466


** 83.- Create a 2d array with 1 on the border and 0 inside **

In [649]:
arr = np.ones((10,10))
arr[1:-1,1:-1] = 0
arr

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

** 84.- How to add a border (filled with 0's) around an existing array? **

In [651]:
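arr = np.ones((8,8))
# pad the existing array with one row/column of zeros on every side
arr = np.pad(arr, pad_width=1, mode='constant', constant_values=0)
arr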

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 1., 1., 1., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

** 85.- Create a 5x5 matrix with values 1,2,3,4 just below the diagonal **

In [24]:
m = np.diag(1+np.arange(4),k=-1)
print(m)

[[0 0 0 0 0]
 [1 0 0 0 0]
 [0 2 0 0 0]
 [0 0 3 0 0]
 [0 0 0 4 0]]


** 86.- Create an 8x8 matrix and fill it with a checkerboard pattern **

In [73]:
c = np.zeros((8,8), dtype=int)
c[::2,::2]=1
c[1::2,1::2]=1
c

array([[1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1]])

** 87.- Consider a (6,7,8) shape array, what is the index (x,y,z) of the 100th element? **

In [83]:
# Flat indices are zero-based, so the 100th element has flat index 99
np.unravel_index(99, (6,7,8))

(1, 5, 3)
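
The mapping between a flat position and an `(x, y, z)` index can be checked by hand. The sketch below uses an arbitrary flat position (42, chosen only for illustration), recomputes it from the row-major formula, and round-trips it through `np.ravel_multi_index`.

```python
import numpy as np

shape = (6, 7, 8)
flat = 42  # arbitrary zero-based flat position, chosen only for illustration

x, y, z = np.unravel_index(flat, shape)

# In C (row-major) order the flat position is x*(7*8) + y*8 + z
by_hand = x * shape[1] * shape[2] + y * shape[2] + z

# np.ravel_multi_index is the inverse of np.unravel_index
round_trip = np.ravel_multi_index((x, y, z), shape)

print(int(x), int(y), int(z), int(by_hand), int(round_trip))
```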

** 88.- Create a checkerboard 8x8 matrix using the tile function **

In [90]:
c = np.tile(([0,1],[1,0]), (4,4))
c

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0]])

** 89.- Normalize a 5x5 random matrix **

In [99]:
np.random.seed(123)
m = np.random.random((5,5))
print("Input data: \n", m)
normalized = (m - m.min())/(m.max()-m.min())
print("Output data: \n", normalized)

Input data: 
 [[0.69646919 0.28613933 0.22685145 0.55131477 0.71946897]
 [0.42310646 0.9807642  0.68482974 0.4809319  0.39211752]
 [0.34317802 0.72904971 0.43857224 0.0596779  0.39804426]
 [0.73799541 0.18249173 0.17545176 0.53155137 0.53182759]
 [0.63440096 0.84943179 0.72445532 0.61102351 0.72244338]]
Output data: 
 [[0.69134813 0.24586343 0.18149608 0.53375766 0.71631841]
 [0.39456516 1.         0.67871147 0.45734477 0.36092125]
 [0.30778888 0.72671997 0.41135597 0.         0.36735576]
 [0.73643209 0.13333586 0.12569274 0.51230105 0.51260093]
 [0.62396223 0.85741574 0.72173197 0.59858193 0.71954765]]
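
"Normalize" is sometimes read as standardizing to zero mean and unit standard deviation rather than rescaling to [0, 1]. A sketch of that alternative reading, assuming the same seed and shape as above:

```python
import numpy as np

np.random.seed(123)
m = np.random.random((5, 5))

# Standardize: subtract the overall mean and divide by the overall standard deviation
standardized = (m - m.mean()) / m.std()

print(standardized.mean(), standardized.std())
```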


** 90.- Multiply a 5x3 matrix by a 3x2 matrix (real matrix product)  **

In [117]:
np.random.seed(123)
A = np.random.randint(100, size=(5,3))
B = np.random.randint(100, size=(3,2))
print("Matrix A: \n",A)
print("Matrix B: \n",B)
C = np.dot(A,B)
print("Result C: \n",C)

# the @ operator (Python 3.5 and above) computes the same matrix product
A@B

Matrix A: 
 [[66 92 98]
 [17 83 57]
 [86 97 96]
 [47 73 32]
 [46 96 25]]
Matrix B: 
 [[83 78]
 [36 96]
 [80 68]]
Result C: 
 [[16630 20644]
 [ 8959 13170]
 [18310 22548]
 [ 9089 12850]
 [ 9274 14504]]


array([[16630, 20644],
       [ 8959, 13170],
       [18310, 22548],
       [ 9089, 12850],
       [ 9274, 14504]])

** 91.- Given a 1D array, negate all elements which are between 3 and 8, in place. **

In [136]:
arr = np.arange(11)
print("Input data: \n",arr)
# 3 is excluded and 8 is included by this mask
arr[(3 < arr) & (arr <= 8)] *= -1
print("Output data: \n",arr)


Input data: 
 [ 0  1  2  3  4  5  6  7  8  9 10]
Output data: 
 [ 0  1  2  3 -4 -5 -6 -7 -8  9 10]


** 92.- How to round a float array away from zero? **

In [154]:
np.random.seed(123)
arr = np.random.uniform(-10,+10,10)
print("Input data: \n", arr)
print(np.copysign(np.ceil(np.abs(arr)), arr))

Input data: 
 [ 3.92938371 -4.2772133  -5.46297093  1.02629538  4.3893794  -1.5378708
  9.61528397  3.69659477 -0.38136197 -2.15764964]
[ 4. -5. -6.  2.  5. -2. 10.  4. -1. -3.]


** 93.- How to find common values between two arrays? **

In [157]:
a = np.r_[1,2,3,4,5,6]
b = np.r_[2,4,6,8,10,12]
np.intersect1d(a,b)

array([2, 4, 6])

** 94.- How to get the dates of yesterday, today and tomorrow? **

In [169]:
today = np.datetime64(datetime.datetime.now())
print("Today date: ", today)
yesterday = today - np.timedelta64(24, 'h')
print("Yesterday date: ", yesterday)
tomorrow = today + np.timedelta64(24, 'h')
print("Tomorrow date: ", tomorrow)

Today date:  2018-05-16T15:55:35.843513
Yesterday date:  2018-05-15T15:55:35.843513
Tomorrow date:  2018-05-17T15:55:35.843513
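
If only calendar dates are needed, without the time of day, one option is to work at day resolution. A sketch using `np.datetime64('today', 'D')`:

```python
import numpy as np

today = np.datetime64('today', 'D')         # current date at day resolution
yesterday = today - np.timedelta64(1, 'D')
tomorrow = today + np.timedelta64(1, 'D')

print("Yesterday date: ", yesterday)
print("Today date: ", today)
print("Tomorrow date: ", tomorrow)
```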


** 95.- How to get all the dates corresponding to the month of July 2016? **

In [179]:
july = np.arange('2016-07', '2016-08', dtype='datetime64[D]')
july

array(['2016-07-01', '2016-07-02', '2016-07-03', '2016-07-04',
       '2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08',
       '2016-07-09', '2016-07-10', '2016-07-11', '2016-07-12',
       '2016-07-13', '2016-07-14', '2016-07-15', '2016-07-16',
       '2016-07-17', '2016-07-18', '2016-07-19', '2016-07-20',
       '2016-07-21', '2016-07-22', '2016-07-23', '2016-07-24',
       '2016-07-25', '2016-07-26', '2016-07-27', '2016-07-28',
       '2016-07-29', '2016-07-30', '2016-07-31'], dtype='datetime64[D]')

** 96.- Extract the integer part of a random array using 5 different methods **

In [210]:
np.random.seed(123)
arr = np.random.uniform(0,10,10)
print("Input data: \n", arr)
# For positive, non-integer values all five expressions give the integer part
print("Method 1: \n", np.ceil(arr)-1)
print("Method 2: \n", np.floor(arr))
print("Method 3: \n", arr.astype('int'))
print("Method 4: \n", np.trunc(arr))
print("Method 5: \n", arr - arr%1)

Input data: 
 [6.96469186 2.86139335 2.26851454 5.51314769 7.1946897  4.2310646
 9.80764198 6.84829739 4.80931901 3.92117518]
Method 1: 
 [6. 2. 2. 5. 7. 4. 9. 6. 4. 3.]
Method 2: 
 [6. 2. 2. 5. 7. 4. 9. 6. 4. 3.]
Method 3: 
 [6 2 2 5 7 4 9 6 4 3]
Method 4: 
 [6. 2. 2. 5. 7. 4. 9. 6. 4. 3.]
Method 5: 
 [6. 2. 2. 5. 7. 4. 9. 6. 4. 3.]


** 97.- Create a 5x5 matrix with row values ranging from 0 to 4 **

In [222]:
arr = np.zeros((5,5))
arr += np.arange(5)
print(arr)

# Alternatively use
#np.tile((np.arange(5)),5).reshape(5,-1)

[[0. 1. 2. 3. 4.]
 [0. 1. 2. 3. 4.]
 [0. 1. 2. 3. 4.]
 [0. 1. 2. 3. 4.]
 [0. 1. 2. 3. 4.]]


** 98.- Consider a generator function that generates 10 integers and use it to build an array **

In [231]:
def generate(n):
    # a generator function: yields the integers 0..n-1 one at a time
    for x in range(n):
        yield x

arr = np.fromiter(generate(10), dtype=int)
print("The output: \n", arr)

The output: 
 [0 1 2 3 4 5 6 7 8 9]


** 99.- Create a vector of size 10 with values ranging from 0 to 1, both excluded **

In [241]:
v = np.linspace(0,1,11,endpoint=False)[1:]
v

# Alternatively use random values (not evenly spaced)
# np.random.seed(123)
# v = np.random.random(10)
# v

array([0.09090909, 0.18181818, 0.27272727, 0.36363636, 0.45454545,
       0.54545455, 0.63636364, 0.72727273, 0.81818182, 0.90909091])

** 100.- Create a random vector of size 10 and sort it **

In [244]:
np.random.seed(123)
v = np.sort(np.random.random(10))
v

array([0.22685145, 0.28613933, 0.39211752, 0.42310646, 0.4809319 ,
       0.55131477, 0.68482974, 0.69646919, 0.71946897, 0.9807642 ])