<script>
    function findAncestor (el, name) {
        while ((el = el.parentElement) && el.nodeName.toLowerCase() !== name);
        return el;
    }
    function colorAll(el, textColor) {
        el.style.color = textColor;
        Array.from(el.children).forEach((e) => {colorAll(e, textColor);});
    }
    function setBackgroundImage(src, textColor) {
        var section = findAncestor(document.currentScript, 'section');
        if (section) {
            section.setAttribute('data-background-image', src);
            if (textColor) colorAll(section, textColor);
        }
    }
</script>

<style>
h1 {
  border: 1.5px solid #333;
  padding: 8px 12px;
  background-image: linear-gradient(#2774AE,#ebf8e1, #FFD100);
  position: static;
}
</style>

<h1 style='color:white'> Statistics 21 <br/> Python & Other Technologies for Data Science </h1>

<h3 style='color:white'>Vivian Lew, PhD - Friday, Week 7</h3>

<script>
    setBackgroundImage('Window1.jpg');
</script>

# more Numpy

## Week 7 Friday
## with gratitude to Miles Chen, PhD

Based on Python Data Science Handbook by Jake VanderPlas

In [1]:
import numpy as np


## more on Numpy's features

- Recall that NumPy arrays look like core Python lists.
- But a list can hold elements of different data types, while a NumPy array holds a single data type.
- NumPy arrays need less space (less memory).
- Operations on NumPy arrays are faster and less awkward to write.
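
To make the second and third bullets concrete, here is a small sketch. The variable names are illustrative and not from the lecture; it relies only on the `dtype`, `itemsize`, and `nbytes` attributes of NumPy arrays.

```python
import numpy as np

# A list can mix types; NumPy converts everything to one common type.
mixed = [1, 2.5, 3]            # int and float in one list
arr = np.array(mixed)
print(arr.dtype)               # float64: the ints were upcast to float

# Every element occupies the same number of bytes,
# so the data buffer is exactly size * itemsize bytes.
big = np.arange(1000, dtype=np.int64)
print(big.itemsize)            # 8 bytes per int64 element
print(big.nbytes)              # 8000 bytes of data
```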

In [2]:
# Why "awkward"? Example: convert temperatures from Celsius to Fahrenheit.

c = [40.4, 28.1, 14.4, 16.8, -2.3] # list

print([x * 9/5 + 32 for x in c]) # list comprehension

print(np.array(c) * 9/5 + 32) # convert to numpy array instead

[104.72, 82.58, 57.92, 62.24, 27.86]
[104.72  82.58  57.92  62.24  27.86]


Readability is one of the goals of Python programming, and here NumPy wins: the conversion reads like the formula itself.

In [3]:
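# what about speed?
import time

vec_size = 100000

def base():
    # core Python: add two ranges element by element in a list comprehension
    t1 = time.perf_counter()  # perf_counter is a high-resolution clock meant for timing
    x = range(vec_size)
    y = range(vec_size)
    z = [x[i] + y[i] for i in range(len(x))]
    return time.perf_counter() - t1

def numpy():
    # NumPy: a single vectorized addition
    t1 = time.perf_counter()
    x = np.arange(vec_size)
    y = np.arange(vec_size)
    z = x + y
    return time.perf_counter() - t1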

time1 = base()
time2 = numpy()
print(f"Numpy is {round(time1/time2, 1)}x faster!")

Numpy is 24.3x faster!


In [4]:
# a single timing is noisy, so repeat the comparison 100 times

my_times = np.zeros(100)
for i in range(100):
    my_times[i] = round(base()/numpy(), 1)
print(my_times[0:10])
np.min(my_times), np.median(my_times).round(1), np.max(my_times)

[ 25.6  25.9  88.   60.5 183.4 194.8  33.2  77.6 171.6 193.7]


(25.6, 199.6, 212.5)

## Concatenating (combining, merging) Arrays

In [5]:
x = np.arange(4)
y = np.arange(100, 104)
print(x)
print(y)

[0 1 2 3]
[100 101 102 103]


In [6]:
np.concatenate([x, y])

array([  0,   1,   2,   3, 100, 101, 102, 103])

`np.concatenate` has an argument for axis. The axes are 0-indexed.

In [7]:
np.concatenate([x,y], axis = 0)

array([  0,   1,   2,   3, 100, 101, 102, 103])

Let's try to concatenate in the other direction by specifying `axis = 1`.

In [8]:
np.concatenate([x,y], axis = 1) # throws an error

AxisError: axis 1 is out of bounds for array of dimension 1

In [9]:
# you can't use axis with index 1, 
# because axis index 1 does not exist for x
x.shape 

(4,)

In [10]:
np.vstack([x,y])   # vstack stacks 1D arrays vertically, as the rows of a 2D array

array([[  0,   1,   2,   3],
       [100, 101, 102, 103]])

In [11]:
# alternatively, reshape x and y into 1x4 arrays and then concatenate them
x.reshape(1,4)

array([[0, 1, 2, 3]])

In [12]:
y.reshape(1,4)

array([[100, 101, 102, 103]])

In [13]:
np.concatenate([x.reshape(1,4), y.reshape(1,4)], axis = 0)

array([[  0,   1,   2,   3],
       [100, 101, 102, 103]])

Note that concatenating 2D arrays along axis 0 stacks them by rows, so the result has more rows. In a 2D array, axis 0 indexes the rows and axis 1 indexes the columns.
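
One way to keep the axes straight is to watch the shapes: concatenating along an axis makes that axis longer and leaves the other one alone. A minimal sketch (the names `r1` and `r2` are illustrative):

```python
import numpy as np

r1 = np.arange(4).reshape(1, 4)         # shape (1, 4)
r2 = np.arange(100, 104).reshape(1, 4)  # shape (1, 4)

rows = np.concatenate([r1, r2], axis=0)  # axis 0 grows: 1 + 1 rows
cols = np.concatenate([r1, r2], axis=1)  # axis 1 grows: 4 + 4 columns

print(rows.shape)  # (2, 4)
print(cols.shape)  # (1, 8)
```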

In [14]:
np.concatenate([x.reshape(1,4), y.reshape(1,4)], axis = 1)

array([[  0,   1,   2,   3, 100, 101, 102, 103]])

In [15]:
xm = np.arange(6).reshape((2,3))
ym = np.arange(100,106,1).reshape((2,3))
print(xm)
print(ym)

[[0 1 2]
 [3 4 5]]
[[100 101 102]
 [103 104 105]]


In [16]:
xm.shape

(2, 3)

In [17]:
ym.shape

(2, 3)

In [18]:
print(np.concatenate([xm,ym], axis = 0))  # axis = 0 is the default
# a shape is reported as (rows, columns)
# concatenating along axis 0 adds rows

[[  0   1   2]
 [  3   4   5]
 [100 101 102]
 [103 104 105]]


In [19]:
print(np.concatenate([xm,ym], axis = 1))
# concatenating along axis 1 will concatenate along columns

[[  0   1   2 100 101 102]
 [  3   4   5 103 104 105]]


In [20]:
np.vstack([xm, ym])

array([[  0,   1,   2],
       [  3,   4,   5],
       [100, 101, 102],
       [103, 104, 105]])

In [21]:
np.hstack([xm, ym])

array([[  0,   1,   2, 100, 101, 102],
       [  3,   4,   5, 103, 104, 105]])

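You can always use `np.vstack` and `np.hstack` for 2D arrays. For 2D input they give the same result as `np.concatenate` with `axis = 0` and `axis = 1`, respectively.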

## Math Operators with numpy arrays

In [22]:
print(x)
print(y)

[0 1 2 3]
[100 101 102 103]


In [23]:
x + 5

array([5, 6, 7, 8])

In [24]:
x + y  # elementwise addition

array([100, 102, 104, 106])

In [25]:
x * y # elementwise multiplication

array([  0, 101, 204, 309])

In [26]:
np.sum(x * y)

614

In [27]:
np.dot(x,y)   # 0 * 100 + 1 * 101 + 2 * 102 + 3 * 103

614

In [28]:
x @ y # matrix multiplication operator; for two 1D arrays it gives the dot product

614

In [29]:
print(xm)
print(ym)

[[0 1 2]
 [3 4 5]]
[[100 101 102]
 [103 104 105]]


In [30]:
xm + 5

array([[ 5,  6,  7],
       [ 8,  9, 10]])

In [31]:
xm + ym  # elementwise addition

array([[100, 102, 104],
       [106, 108, 110]])

In [32]:
print(xm)
print(ym)

[[0 1 2]
 [3 4 5]]
[[100 101 102]
 [103 104 105]]


In [33]:
xm * ym # element-wise multiplication

array([[  0, 101, 204],
       [309, 416, 525]])

In [34]:
np.multiply(xm, ym) # element-wise multiplication

array([[  0, 101, 204],
       [309, 416, 525]])

In [35]:
print(xm)
print(ym)

[[0 1 2]
 [3 4 5]]
[[100 101 102]
 [103 104 105]]


In [36]:
np.dot(xm, ym.T)

array([[ 305,  314],
       [1214, 1250]])

In [37]:
xm.dot(ym.T)

array([[ 305,  314],
       [1214, 1250]])

In [38]:
xm @ ym.T

array([[ 305,  314],
       [1214, 1250]])
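
The transpose in the cells above is needed because matrix multiplication requires the inner dimensions to match: a (2, 3) array can multiply a (3, 2) array, but not another (2, 3) array. A small sketch of that rule, using the same values as `xm` and `ym`:

```python
import numpy as np

xm = np.arange(6).reshape((2, 3))
ym = np.arange(100, 106).reshape((2, 3))

# (2, 3) @ (3, 2) -> (2, 2): the inner dimensions (3 and 3) match
print((xm @ ym.T).shape)

# (3, 2) @ (2, 3) -> (3, 3): also valid, but a different result
print((xm.T @ ym).shape)

# (2, 3) @ (2, 3): the inner dimensions (3 and 2) do not match
try:
    xm @ ym
    matmul_failed = False
except ValueError as err:
    matmul_failed = True
    print("ValueError:", err)
```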

## Basic Math

In [39]:
x = np.arange(10) 
print(x)

[0 1 2 3 4 5 6 7 8 9]


In [40]:
print(x + 1)

[ 1  2  3  4  5  6  7  8  9 10]


In [41]:
print(x + 1.) # convert to float

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]


In [42]:
#subtraction
print(x - 5)

[-5 -4 -3 -2 -1  0  1  2  3  4]


In [43]:
print(x - 5.0) # convert to float

[-5. -4. -3. -2. -1.  0.  1.  2.  3.  4.]


In [44]:
# multiplication
print(x * 2)

[ 0  2  4  6  8 10 12 14 16 18]


In [45]:
print(x * 2.) # convert to float

[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18.]


In [46]:
# division
print(x / 2)

[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]


In [47]:
print(-x)

[ 0 -1 -2 -3 -4 -5 -6 -7 -8 -9]


In [48]:
print(x ** 2)

[ 0  1  4  9 16 25 36 49 64 81]


In [49]:
print(x % 2) # modulo division

[0 1 0 1 0 1 0 1 0 1]


In [50]:
print(abs(-x)) 

[0 1 2 3 4 5 6 7 8 9]


## Trig functions
Note that the functions are preceded by `np.`, and pi is not a built-in but is available in NumPy as `np.pi`.

In [51]:
theta = np.linspace(0, np.pi, 6)
print(theta)

[0.         0.62831853 1.25663706 1.88495559 2.51327412 3.14159265]


In [52]:
print(np.sin(theta))

[0.00000000e+00 5.87785252e-01 9.51056516e-01 9.51056516e-01
 5.87785252e-01 1.22464680e-16]


In [53]:
print(np.cos(theta).round(decimals = 4)) 
# the method ndarray.round() rounds every element of the array in one call

[ 1.     0.809  0.309 -0.309 -0.809 -1.   ]


In [54]:
print(np.tan(theta))

[ 0.00000000e+00  7.26542528e-01  3.07768354e+00 -3.07768354e+00
 -7.26542528e-01 -1.22464680e-16]
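
The values `1.22464680e-16` in the sine and tangent output are floating-point round-off: `np.pi` is only an approximation of π, so `np.sin(np.pi)` is tiny but not exactly 0. When comparing such results to an expected value, `np.isclose` and `np.allclose` are safer than `==`. A short sketch:

```python
import numpy as np

theta = np.linspace(0, np.pi, 6)
s = np.sin(theta)

print(s[-1] == 0)             # False: the value is about 1.2e-16, not 0
print(np.isclose(s[-1], 0))   # True: equal within a small tolerance

# sin is symmetric about pi/2, so the array should match its own reverse
print(np.allclose(s, s[::-1]))  # True
```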


## Log and Exp

In [55]:
x = np.array([1, 10, 100])
print(np.log(x))   # natural log
print(np.log10(x)) # common log

[0.         2.30258509 4.60517019]
[0. 1. 2.]


In [56]:
y = np.arange(3)
print(np.exp(y))  # e^y

[1.         2.71828183 7.3890561 ]


In [57]:
print(np.exp2(y))  # 2^y

[1. 2. 4.]


In [58]:
print(np.power(3, y)) # 3^y, elementwise

[1 3 9]


# Aggregates

You can use Python's built-in `sum()` or NumPy's `np.sum()`.

On NumPy arrays, `np.sum()` is much faster than `sum()`.

In [59]:
x = np.arange(100)
print(x)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]


In [60]:
print(sum(x))

4950


In [61]:
print(np.sum(x))

4950


In [62]:
big_array = np.random.rand(10000)
%timeit sum(big_array)
%timeit np.sum(big_array)  # the np version is much faster

616 µs ± 7.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
3.82 µs ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


`%timeit` chooses the number of loops automatically. The slower statement gets fewer loops (1,000 instead of 100,000) to keep the overall execution time within a reasonable range.
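
`%timeit` is an IPython magic, so it only works in a notebook or IPython session. In a plain Python script, a rough equivalent is the standard-library `timeit` module, whose `Timer.autorange()` method also picks the loop count automatically. A sketch under that assumption (the helper name `per_loop` is made up for this example, and the timings will differ from machine to machine):

```python
import timeit
import numpy as np

big_array = np.random.rand(10000)

def per_loop(func):
    """Return (loops, seconds per loop) for calling func(big_array)."""
    timer = timeit.Timer(lambda: func(big_array))
    loops, total = timer.autorange()   # loops chosen so that total >= 0.2 s
    return loops, total / loops

loops_py, t_py = per_loop(sum)       # built-in sum
loops_np, t_np = per_loop(np.sum)    # NumPy sum

print(f"sum:    {loops_py} loops, {t_py * 1e6:.2f} µs per loop")
print(f"np.sum: {loops_np} loops, {t_np * 1e6:.2f} µs per loop")
```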

## min and max

In [63]:
print(min(big_array))
print(max(big_array))

0.00010151167156935426
0.9998909135354603


In [64]:
print(np.min(big_array))
print(np.max(big_array))

0.00010151167156935426
0.9998909135354603


In [65]:
%timeit min(big_array)
%timeit np.min(big_array)  # the np version is much faster

410 µs ± 1.06 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
5.07 µs ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


## summaries for matrices

In [66]:
np.random.seed(1)
# M = np.random.random((3, 4))
M = np.arange(12)
np.random.shuffle(M)
M = np.reshape(M, (3,4))
print(M)

[[ 2  3  4 10]
 [ 1  6  0  7]
 [11  9  8  5]]


In [67]:
sum(M) # built-in sum iterates over the rows and adds them, giving column totals

array([14, 18, 12, 22])

In [68]:
np.sum(M) # np.sum adds up every element by default

66

In [69]:
print(M)

[[ 2  3  4 10]
 [ 1  6  0  7]
 [11  9  8  5]]


In [70]:
np.sum(M, axis = 0)  # np.sum with the axis specified
# a matrix has two axes: axis 0 is rows, axis 1 is columns
# axis = 0 sums over the rows (collapsing them), so you end up with column totals

array([14, 18, 12, 22])

In [71]:
np.sum(M, axis = 1)

array([19, 14, 33])

In [72]:
np.min(M, axis = 0)

array([1, 3, 0, 5])

In [73]:
print(M)

[[ 2  3  4 10]
 [ 1  6  0  7]
 [11  9  8  5]]


In [74]:
np.std(M)

3.452052529534663

In [75]:
np.std(M, axis = 0)

array([4.49691252, 2.44948974, 3.26598632, 2.05480467])

In [76]:
np.mean(M, axis = 1)

array([4.75, 3.5 , 8.25])
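
A way to remember which axis is which: the axis you name is the one that disappears from the shape. A sketch using the same values as `M` above (typed in directly so it does not depend on the random seed):

```python
import numpy as np

M = np.array([[ 2, 3, 4, 10],
              [ 1, 6, 0,  7],
              [11, 9, 8,  5]])   # shape (3, 4)

col_totals = M.sum(axis=0)   # axis 0 (length 3) is collapsed -> shape (4,)
row_totals = M.sum(axis=1)   # axis 1 (length 4) is collapsed -> shape (3,)

print(col_totals, col_totals.shape)
print(row_totals, row_totals.shape)
```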

## Summaries for higher dimensional arrays

In [77]:
np.random.seed(1)
A = np.ones(24)
np.random.shuffle(A)
A = np.reshape(A, (2, 3, 4)) # two sheets, 3 rows, 4 columns
print(A)

[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]


In [78]:
np.sum(A, axis = 0) # sum across "sheets"

array([[2., 2., 2., 2.],
       [2., 2., 2., 2.],
       [2., 2., 2., 2.]])

In [79]:
np.sum(A, axis = 1) # sum across rows

array([[3., 3., 3., 3.],
       [3., 3., 3., 3.]])

In [80]:
np.sum(A, axis = 2) # sum across columns

array([[4., 4., 4.],
       [4., 4., 4.]])
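
Because `A` is all ones, the sums above only reveal how many elements were added together. With distinct values it is easier to see which elements are combined. A sketch with an illustrative array `B` (not from the lecture) of the same shape:

```python
import numpy as np

B = np.arange(24).reshape((2, 3, 4))   # two sheets, 3 rows, 4 columns

over_sheets = B.sum(axis=0)   # shape (3, 4): B[0] + B[1]
over_rows = B.sum(axis=1)     # shape (2, 4): adds the 3 rows within each sheet
over_cols = B.sum(axis=2)     # shape (2, 3): adds the 4 columns within each row

print(over_sheets.shape, over_rows.shape, over_cols.shape)
print(np.array_equal(over_sheets, B[0] + B[1]))   # True
```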

## dealing with nan
`nan` ("not a number") is a special float value. We often use it in place of a missing value.
`nan` exists only as a float; integer types have no `nan`.

In [81]:
x = float("nan")  # direct creation of nan
print(x)
print(type(x))

nan
<class 'float'>


In [82]:
y = float("inf")  # y is the float representation of infinity
print(y / y)  # these calculations will yield a nan result
print(y - y)

nan
nan


In [83]:
np.sum([x, 2])

nan

In [84]:
np.nansum([x, 2])   # ignores nan values, like the option na.rm = TRUE in R

2.0
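
Two properties of `nan` are worth seeing in code: it is not equal to anything, including itself, so missing values are detected with `np.isnan` rather than `==`; and, because it is a float, it cannot be stored in an integer array. A short sketch:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)         # False: nan never compares equal
print(np.isnan(values))         # [False  True False]
print(np.isnan(values).sum())   # 1 missing value

ints = np.array([1, 2, 3])      # integer dtype
try:
    ints[0] = np.nan            # integers have no nan
    assignment_failed = False
except ValueError as err:
    assignment_failed = True
    print("ValueError:", err)
```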

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
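
A few of the functions from the table applied to an array with a missing value. The data are invented for illustration:

```python
import numpy as np

temps = np.array([1.0, np.nan, 3.0, 5.0])   # made-up data with one missing value

print(np.mean(temps))       # nan: a single nan poisons the ordinary mean
print(np.nanmean(temps))    # 3.0: mean of 1, 3, 5
print(np.nanmax(temps))     # 5.0
print(np.nanargmin(temps))  # 0: position of the smallest non-nan value
```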

## Broadcasting

This is similar to recycling values in R, but it only works when the dimensions of the arrays are compatible.
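
"Compatible" has a precise meaning in NumPy: shapes are compared from the last dimension backwards, a missing dimension counts as 1, and two dimensions are compatible when they are equal or one of them is 1. The sketch below implements that rule by hand in a helper named `broadcast_shape` (a hypothetical name for this example, not a NumPy function) and checks one case against what NumPy actually produces.

```python
from itertools import zip_longest
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hand-written version of the broadcasting rule for two shapes."""
    result = []
    # walk both shapes from the last dimension; pad the shorter one with 1s
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(reversed(result))

print(broadcast_shape((3, 3), (3,)))     # (3, 3)
print(broadcast_shape((10, 1), (11,)))   # (10, 11)

# NumPy agrees: adding arrays of these shapes gives the predicted shape
print((np.ones((10, 1)) + np.ones(11)).shape)   # (10, 11)

try:
    broadcast_shape((2, 3), (2,))        # 3 vs 2: neither equal nor 1
except ValueError as err:
    print("ValueError:", err)
```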

In [85]:
a = np.array([1,2,3])
b = np.array([4,5,6])
print(a + b) # this works, element by element

[5 7 9]


In [86]:
c = np.array([7,8])
print(a + c)  # doesn't work: shapes (3,) and (2,) are not compatible

ValueError: operands could not be broadcast together with shapes (3,) (2,) 

In [87]:
print(a)

[1 2 3]


In [88]:
e = np.ones([3,3])
print(e)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [89]:
print(e + a)  # the array a gets 'broadcast' across all three rows

[[2. 3. 4.]
 [2. 3. 4.]
 [2. 3. 4.]]


In [90]:
print(a.reshape([3,1]))  # we reshape a to be a 3x1 array

[[1]
 [2]
 [3]]


In [91]:
print(e + a.reshape([3,1])) # the reshaped array is broadcast across columns

[[2. 2. 2.]
 [3. 3. 3.]
 [4. 4. 4.]]


In [92]:
d = np.vstack([a,b])  # we stack the arrays a and b vertically
print(d)

[[1 2 3]
 [4 5 6]]


In [93]:
a

array([1, 2, 3])

In [94]:
print(d + a)  # a is broadcast across both rows of d

[[2 4 6]
 [5 7 9]]


In [95]:
print(c)

[7 8]


In [96]:
print(d)

[[1 2 3]
 [4 5 6]]


In [97]:
print(d + c)  # c does not have compatible dimensions

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

In [98]:
print(d + c.reshape([2,1]))  # after we reshape c to be a column, we can broadcast it

[[ 8  9 10]
 [12 13 14]]


In [99]:
e = np.arange(10).reshape((10, 1))
f = np.arange(11)
print(e)
print(f)

[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
[ 0  1  2  3  4  5  6  7  8  9 10]


In [100]:
print(e + f)  ## e and f are broadcast into compatible matrices and then added

[[ 0  1  2  3  4  5  6  7  8  9 10]
 [ 1  2  3  4  5  6  7  8  9 10 11]
 [ 2  3  4  5  6  7  8  9 10 11 12]
 [ 3  4  5  6  7  8  9 10 11 12 13]
 [ 4  5  6  7  8  9 10 11 12 13 14]
 [ 5  6  7  8  9 10 11 12 13 14 15]
 [ 6  7  8  9 10 11 12 13 14 15 16]
 [ 7  8  9 10 11 12 13 14 15 16 17]
 [ 8  9 10 11 12 13 14 15 16 17 18]
 [ 9 10 11 12 13 14 15 16 17 18 19]]


In [101]:
print(e * f)  ## e and f are broadcast into compatible matrices and then multiplied element-wise

[[ 0  0  0  0  0  0  0  0  0  0  0]
 [ 0  1  2  3  4  5  6  7  8  9 10]
 [ 0  2  4  6  8 10 12 14 16 18 20]
 [ 0  3  6  9 12 15 18 21 24 27 30]
 [ 0  4  8 12 16 20 24 28 32 36 40]
 [ 0  5 10 15 20 25 30 35 40 45 50]
 [ 0  6 12 18 24 30 36 42 48 54 60]
 [ 0  7 14 21 28 35 42 49 56 63 70]
 [ 0  8 16 24 32 40 48 56 64 72 80]
 [ 0  9 18 27 36 45 54 63 72 81 90]]


In [102]:
print(d)

[[1 2 3]
 [4 5 6]]


In [103]:
d.reshape((1,6)) + d.reshape((6,1))

array([[ 2,  3,  4,  5,  6,  7],
       [ 3,  4,  5,  6,  7,  8],
       [ 4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10],
       [ 6,  7,  8,  9, 10, 11],
       [ 7,  8,  9, 10, 11, 12]])

# Boolean Operators in NumPy

In [104]:
x = np.arange(6)
print(x)

[0 1 2 3 4 5]


In [105]:
print(x < 3)

[ True  True  True False False False]


In [106]:
print(x >= 3)

[False False False  True  True  True]


In [107]:
print(x == 3)

[False False False  True False False]


In [108]:
# the results can then be used to subset
print(x[x >= 3])

[3 4 5]


In [109]:
np.sum(x >= 3) # True = 1, False = 0, so sum counts how many are true

3

In [110]:
np.mean(x >= 3)  # finds the proportion that is True

0.5

In [111]:
print(~(x == 3)) # use the tilde for negation of boolean values

[ True  True  True False  True  True]


In [112]:
print(~x == 3) # be careful if you leave off the parentheses: ~ is applied to x first, then the result is compared to 3

[False False False False False False]


In [113]:
~x

array([-1, -2, -3, -4, -5, -6])
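
The last two cells show why the parentheses matter: `~` binds more tightly than `==`, and on an integer array `~x` is the bitwise complement, which equals `-x - 1`. For "not equal", `!=` avoids the issue altogether. A quick check:

```python
import numpy as np

x = np.arange(6)

print(np.array_equal(~x, -x - 1))          # True: complement of an integer
print(np.array_equal(~(x == 3), x != 3))   # True: two ways to say "not equal to 3"

# ~x == 3 is evaluated as (~x) == 3, i.e. -x - 1 == 3, true only where x == -4
print(np.any(~x == 3))                     # False: x has no -4
```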

### Working with matrices

In [114]:
y = np.arange(12).reshape([3,4])
print(y)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [115]:
print(y >= 6)

[[False False False False]
 [False False  True  True]
 [ True  True  True  True]]


In [116]:
np.sum(y >= 6)

6

In [117]:
np.sum(y >= 6, axis = 0)  # you can perform sums and other aggregate functions axis-wise on the boolean matrix

array([1, 1, 2, 2])

In [118]:
np.sum(y >= 6, axis = 1)

array([0, 2, 4])

## Bitwise (element-wise) Boolean operators

In [119]:
a = np.array([True, True, False, False])
b = np.array([True, False, True, False])
print(a)
print(b)

[ True  True False False]
[ True False  True False]


In [120]:
print(a & b) # bitwise and

[ True False False False]


In [121]:
print(a | b) # bitwise or

[ True  True  True False]


In [122]:
print(a ^ b) # bitwise xor (exclusive or): True where a and b differ, False where they are the same

[False  True  True False]


In [123]:
print(~a)  # bitwise not (like complement)

[False False  True  True]


In [124]:
np.any(a)

True

In [125]:
np.all(a)

False
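
These operators are what you use to combine comparisons when subsetting. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `<` or `>`, and Python's `and`/`or` do not work elementwise on arrays. A sketch with an illustrative array:

```python
import numpy as np

x = np.arange(10)

middle = x[(x > 2) & (x < 7)]     # both conditions true
ends = x[(x <= 2) | (x >= 7)]     # at least one condition true
print(middle)   # [3 4 5 6]
print(ends)     # [0 1 2 7 8 9]

try:
    (x > 2) and (x < 7)           # 'and' needs a single True/False
    and_failed = False
except ValueError as err:
    and_failed = True
    print("ValueError:", err)
```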

# fancy indexing
Regular lists in Python do not support fancy indexing (indexing with a list or array of indices), but NumPy arrays do!  It's like indexing in R, just simpler.

In [126]:
np.random.seed(1)
x = np.random.randint(100, size = 10)
print(x)

[37 12 72  9 75  5 79 64 16  1]


In [127]:
index = [0, 1, 5]
print(x[index])

[37 12  5]


In [128]:
a = [1, 4, 7]
b = [2, 3, 8]
ind = np.vstack([a,b])
print(ind)

[[1 4 7]
 [2 3 8]]


In [129]:
print(x[ind])

[[12 75 64]
 [72  9 16]]


In [130]:
X = np.arange(12).reshape((3, 4))
print(X)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [131]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

array([ 2,  5, 11])
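
With two index arrays, NumPy pairs them up element by element, so `X[row, col]` picks the entries at (0, 2), (1, 1), and (2, 3). The index arrays also follow the broadcasting rules: turning `row` into a column makes NumPy combine every row with every column instead. A sketch of both behaviors:

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

paired = X[row, col]                 # one element per (row, col) pair
print(paired)                        # [ 2  5 11]

# row as a (3, 1) column broadcasts against col of shape (3,)
grid = X[row[:, np.newaxis], col]    # shape (3, 3): all rows, columns 2, 1, 3
print(grid)
```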

- `np.sort()` works like `sort()` in R
- `np.argsort()` returns the indices that would put the values in sorted order, like `order()` in R

In [132]:
np.random.seed(2)
x = np.arange(5)
np.random.shuffle(x)
print(x)

[2 4 1 3 0]


In [133]:
x.sort() # sorts x in place
print(x)

[0 1 2 3 4]


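Recall `sorted()` and `.sort()` in core Python: `np.sort()` returns a sorted copy like `sorted()`, while the array method `.sort()` sorts in place like `list.sort()`.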

In [134]:
np.random.shuffle(x)
print(x)
print(np.sort(x)) # returns a sorted copy; the original is unchanged
print(x)

[4 1 0 2 3]
[0 1 2 3 4]
[4 1 0 2 3]


In [135]:
y = np.array([5, 2, 1, 4])
print(y)
print(y.argsort()) # returns the indices to sort an array

[5 2 1 4]
[2 1 3 0]


In [136]:
d = y.argsort()
y[d]

array([1, 2, 4, 5])
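
The main use of `argsort` is to reorder one array by the values in another, which `np.sort` alone cannot do. A sketch with made-up names and scores; reversing the index array with `[::-1]` gives descending order:

```python
import numpy as np

names = np.array(["Ann", "Bo", "Cy", "Di"])   # made-up data
scores = np.array([5, 2, 1, 4])

order = np.argsort(scores)        # indices that sort scores ascending
print(names[order])               # ['Cy' 'Bo' 'Di' 'Ann']
print(names[order[::-1]])         # ['Ann' 'Di' 'Bo' 'Cy'], highest score first
```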

## Sorting along rows or columns

A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the axis argument. For example:

In [137]:
np.random.seed(1)
X = np.random.randint(0, 10, (4, 6))
print(X)

[[5 8 9 5 0 0]
 [1 7 6 9 2 4]
 [5 2 4 2 4 7]
 [7 9 1 7 0 6]]


In [138]:
# sort each column of X
# np.sort returns a sorted copy of X; it does not modify X
np.sort(X, axis=0)

array([[1, 2, 1, 2, 0, 0],
       [5, 7, 4, 5, 0, 4],
       [5, 8, 6, 7, 2, 6],
       [7, 9, 9, 9, 4, 7]])

In [139]:
# sort each row of X
np.sort(X, axis=1)

array([[0, 0, 5, 5, 8, 9],
       [1, 2, 4, 6, 7, 9],
       [2, 2, 4, 4, 5, 7],
       [0, 1, 6, 7, 7, 9]])

In [140]:
X[0,:] # selecting a row

array([5, 8, 9, 5, 0, 0])

In [141]:
print(X)

[[5 8 9 5 0 0]
 [1 7 6 9 2 4]
 [5 2 4 2 4 7]
 [7 9 1 7 0 6]]


In [142]:
X[0:2,:] # selecting rows

array([[5, 8, 9, 5, 0, 0],
       [1, 7, 6, 9, 2, 4]])

In [143]:
np.random.seed(1)
my_rows = X.shape[0]
my_index = np.random.choice(my_rows, size=2, replace=False)
print(my_index)
print(type(my_index))
print(X[my_index, :]) # indexing an array with an array

[3 2]
<class 'numpy.ndarray'>
[[7 9 1 7 0 6]
 [5 2 4 2 4 7]]


In [144]:
X[:,1].argsort()  # the argsort for the column index 1

array([2, 1, 0, 3])

In [145]:
print(X[ X[:,1].argsort() , : ])  # 'subset' X by the argsort to arrange X by the column

[[5 2 4 2 4 7]
 [1 7 6 9 2 4]
 [5 8 9 5 0 0]
 [7 9 1 7 0 6]]


<h1> Statistics 21 <br/> Have an excellent weekend! </h1>

<script>
    setBackgroundImage('Window1.jpg', 'black');
</script>