# Comparisons, Masks, and Boolean Logic

- Examine and manipulate values within NumPy arrays
- Masking is useful to extract, modify, count, or otherwise manipulate the values in an array based on some criterion
- For example: 
1. Count all values greater than a certain value
2. Remove all outliers that are above some threshold

**Suppose we have rainfall data from a city and want to answer the following questions:**
- How many days were without rain?
- How many days had rain?
- How many rainy days had more than 5mm of rain?
- How many rainy days had less than 15mm of rain?

- NumPy ufuncs are faster than loops for element-wise arithmetic operations over arrays
- We can also use ufuncs for element-wise comparisons over arrays, and then manipulate the results to answer our questions

- Comparison operators such as < (less than) and > (greater than) are also implemented as ufuncs
- The result of a comparison operator is always an array with Boolean data type
- There are six standard comparison operators: <, >, <=, >=, !=, ==

In [1]:
import numpy as np
import pandas as pd

# Comparison and Boolean operators

## Comparison operators <, >, <=, >=, !=, ==

In [3]:
x = np.array([1,2,3,4,5]) # one-dimensional array

In [4]:
x < 3 #less than

array([ True,  True, False, False, False])

In [5]:
np.less(x,3)

array([ True,  True, False, False, False])

In [8]:
x > 3 # greater than

array([False, False, False,  True,  True])

In [6]:
np.greater(x,3)

array([False, False, False,  True,  True])

In [9]:
x <= 3 # less than or equal

array([ True,  True,  True, False, False])

In [7]:
np.less_equal(x,3)

array([ True,  True,  True, False, False])

In [10]:
x >= 3 # greater than or equal

array([False, False,  True,  True,  True])

In [8]:
np.greater_equal(x,3)

array([False, False,  True,  True,  True])
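The two operators not shown above, `!=` and `==`, follow the same pattern. A small sketch reusing the same `x` (the expression `2 * x == x ** 2` is just an illustrative choice) shows that both sides of a comparison may be array expressions and that the result always has Boolean dtype:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# The two operators not demonstrated above, with their ufunc equivalents
ne = x != 3                      # same as np.not_equal(x, 3)
eq = x == 3                      # same as np.equal(x, 3)
assert np.array_equal(ne, np.not_equal(x, 3))
assert np.array_equal(eq, np.equal(x, 3))

# Both sides of a comparison can be array expressions, compared element by element
same = (2 * x) == (x ** 2)       # True where 2x equals x squared

print(ne)          # [ True  True False  True  True]
print(eq)          # [False False  True False False]
print(same)        # [False  True False False False]
print(same.dtype)  # bool
```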

In [5]:
# Two-dimensional array

rng = np.random.RandomState(0) 
# RandomState exposes a number of methods for generating random numbers 
# drawn from a variety of probability distributions.
x = rng.randint(10, size=(3,4))

In [6]:
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

In [35]:
x <6 

array([[ True,  True,  True,  True],
       [False, False,  True,  True],
       [ True,  True, False, False]])

In [36]:
# So far the results have been Boolean arrays; next we turn them into numbers, such as counts

## Working with Boolean Arrays

In [7]:
print(x)

[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]


In [38]:
# To count the number of True entries in a boolean array
# how many values less than 6?
np.count_nonzero(x<6)

8

In [8]:
# another way to get this information is to use np.sum
np.sum(x<6)

8

In [9]:
# The benefit of np.sum() is that the summation can also be done along rows or columns:
# how many values less than 6 are in each row?

np.sum(x<6, axis=1)

array([4, 2, 2])

In [10]:
np.sum(x<6, axis=0) # how many values less than 6 are in each column?

array([2, 2, 2, 2])
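These counts work because NumPy treats `True` as 1 and `False` as 0 when aggregating. A minimal sketch, re-creating the same `x` explicitly, that makes the conversion visible and uses `np.mean` to get the fraction of `True` entries instead of the count:

```python
import numpy as np

# Same 3x4 array as above, written out explicitly
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 6
# True becomes 1 and False becomes 0, which is why np.sum counts True entries
as_ints = mask.astype(int)
total = as_ints.sum()                      # 8, same as np.sum(mask)

# The mean of a Boolean array is therefore the fraction of True entries
fraction = np.mean(mask)                   # 8 / 12
per_row_fraction = np.mean(mask, axis=1)   # [1.0, 0.5, 0.5]
print(total, fraction, per_row_fraction)
```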

In [11]:
# Using np.any() and np.all()

# are there any values greater than 8?

np.any(x > 8)

True

In [12]:
# are there any values less than zero?

np.any(x < 0)

False

In [13]:
# are all values less than 10?

np.all(x < 10)

True

In [14]:
# are all values equal to 6?

np.all(x == 6)

False

In [15]:
# Both np.any() and np.all() can also be applied along an axis

# are all values in each row less than 8?

np.all(x < 8, axis=1)

array([ True, False,  True])

In [16]:
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])
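Two further points, sketched here with the same `x` written out explicitly: passing `axis=0` asks the question column by column, and Python's built-in `any()` and `all()` are not drop-in replacements for `np.any()` and `np.all()` on a multidimensional array, because they iterate over rows and need a single truth value for each row:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

# With axis=0 the question is asked separately for each column
any_col_gt_8 = np.any(x > 8, axis=0)   # [False  True False False]
all_col_lt_8 = np.all(x < 8, axis=0)   # [ True False  True  True]

# The built-in any() loops over rows; the truth value of a
# multi-element row is ambiguous, so NumPy raises ValueError
builtin_error = None
try:
    any(x > 8)
except ValueError as err:
    builtin_error = str(err)
print(any_col_gt_8, all_col_lt_8, builtin_error, sep="\n")
```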

## Example: Seattle rainfall 

In [9]:
df = pd.read_csv(r'seattle-weather.csv', encoding='utf8', engine='python')

In [10]:
df

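| Unnamed: 0 | date | precipitation | temp_max | temp_min | wind | weather |
|---|---|---|---|---|---|---|
| 0 | 2012/1/1 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | 2012/1/2 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | 2012/1/3 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | 2012/1/4 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | 2012/1/5 | 1.3 | 8.9 | 2.8 | 6.1 | rain |
| ... | ... | ... | ... | ... | ... | ... |
| 1456 | 2015/12/27 | 8.6 | 4.4 | 1.7 | 2.9 | fog |
| 1457 | 2015/12/28 | 1.5 | 5.0 | 1.7 | 1.3 | fog |
| 1458 | 2015/12/29 | 0.0 | 7.2 | 0.6 | 2.6 | fog |
| 1459 | 2015/12/30 | 0.0 | 5.6 | -1.0 | 3.4 | sun |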


In [11]:
df['year'] = pd.DatetimeIndex(df['date']).year

In [12]:
df

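| Unnamed: 0 | date | precipitation | temp_max | temp_min | wind | weather | year |
|---|---|---|---|---|---|---|---|
| 0 | 2012/1/1 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle | 2012 |
| 1 | 2012/1/2 | 10.9 | 10.6 | 2.8 | 4.5 | rain | 2012 |
| 2 | 2012/1/3 | 0.8 | 11.7 | 7.2 | 2.3 | rain | 2012 |
| 3 | 2012/1/4 | 20.3 | 12.2 | 5.6 | 4.7 | rain | 2012 |
| 4 | 2012/1/5 | 1.3 | 8.9 | 2.8 | 6.1 | rain | 2012 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1456 | 2015/12/27 | 8.6 | 4.4 | 1.7 | 2.9 | fog | 2015 |
| 1457 | 2015/12/28 | 1.5 | 5.0 | 1.7 | 1.3 | fog | 2015 |
| 1458 | 2015/12/29 | 0.0 | 7.2 | 0.6 | 2.6 | fog | 2015 |
| 1459 | 2015/12/30 | 0.0 | 5.6 | -1.0 | 3.4 | sun | 2015 |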


In [13]:
df2012 = df[df['year']==2012]

In [14]:
rainfall = df2012['precipitation']

In [15]:
rainfall

0       0.0
1      10.9
2       0.8
3      20.3
4       1.3
       ... 
361     4.1
362     0.0
363     1.5
364     0.0
365     0.0
Name: precipitation, Length: 366, dtype: float64

## Boolean Operators

**Bitwise logic operators:**
- &  bitwise_and
- |  bitwise_or
- ^  bitwise_xor
- ~  bitwise_not
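A minimal sketch on two small Boolean arrays (`a` and `b` are illustrative), checking each operator against the corresponding ufunc. The last line is a caution: on integer arrays these really are bitwise operations, so `~` is not a logical NOT there, which is one reason to keep masks in Boolean dtype:

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# Each operator is shorthand for a NumPy ufunc
assert np.array_equal(a & b, np.bitwise_and(a, b))
assert np.array_equal(a | b, np.bitwise_or(a, b))
assert np.array_equal(a ^ b, np.bitwise_xor(a, b))
assert np.array_equal(~a, np.bitwise_not(a))

print(a & b)   # [ True False False False]
print(a | b)   # [ True  True  True False]
print(a ^ b)   # [False  True  True False]
print(~a)      # [False False  True  True]

# On integers, ~ flips every bit (two's complement), so it is not logical NOT
print(~np.array([0, 1]))   # [-1 -2]
```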

**Comparison operators and Boolean operators are useful for compound questions**

In [43]:
# compound operators
# All rainy days with less than 15mm and greater than 5mm
# here parentheses are important because of operator precedence rules

np.sum((rainfall < 15) & (rainfall > 5))

55
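Why the parentheses matter, as a small sketch on a few made-up values (`r` is hypothetical, not the Seattle data). In Python, `&` binds more tightly than `<` and `>`, and the keywords `and`/`or` cannot combine arrays element by element; the sketch records the error each mistake raises instead of letting it stop the script:

```python
import numpy as np

# Hypothetical precipitation values in mm, for illustration only
r = np.array([0.0, 10.9, 0.8, 20.3, 6.1, 13.5])

# Correct: parenthesize each comparison, then combine with &
between = (r > 5) & (r < 15)
print(between)          # [False  True False False  True  True]
print(np.sum(between))  # 3

# Without parentheses, & binds tighter than > and <, so Python reads
# r > 5 & r < 15 as the chained comparison r > (5 & r) < 15, which fails
precedence_error = None
try:
    r > 5 & r < 15
except (TypeError, ValueError) as err:
    precedence_error = type(err).__name__

# The keywords `and` / `or` ask for the truth value of each whole array,
# which is ambiguous for arrays with more than one element
keyword_error = None
try:
    (r > 5) and (r < 15)
except ValueError as err:
    keyword_error = str(err)
print(precedence_error, keyword_error)
```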

In [None]:
# Use of the negation operator ~
# De Morgan's law: A AND B is equivalent to NOT((NOT A) OR (NOT B))

In [45]:
np.sum(~((rainfall >= 15) | (rainfall <= 5)))

55
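De Morgan's law can be checked numerically. A sketch on synthetic values drawn with `RandomState(0)` (an assumption for illustration, not the rainfall data). The rewrite from `~(r > 5)` to `r <= 5` relies on the data having no NaNs, because every comparison with NaN is `False`:

```python
import numpy as np

# Synthetic values (not the Seattle data), drawn uniformly between 0 and 30
rng = np.random.RandomState(0)
r = rng.uniform(0, 30, size=1000)

A = r > 5
B = r < 15

# De Morgan's law: A AND B is equivalent to NOT((NOT A) OR (NOT B))
lhs = A & B
rhs = ~(~A | ~B)

# Without NaNs, ~(r > 5) equals r <= 5 and ~(r < 15) equals r >= 15,
# which gives the form used in the cell above
rhs_explicit = ~((r <= 5) | (r >= 15))

print(np.array_equal(lhs, rhs), np.array_equal(lhs, rhs_explicit))  # True True
```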

In [46]:
rainfall.value_counts()

0.0     189
0.3      12
1.5      10
0.5       9
0.8       8
       ... 
17.3      1
6.9       1
10.4      1
27.4      1
23.9      1
Name: precipitation, Length: 67, dtype: int64

**Using the Seattle rainfall data, answer the questions below:**
- How many days were without rain in 2012?
- How many days had rain in 2012?
- How many rainy days had more than 5mm of rain in 2012?
- How many rainy days had less than 15mm of rain in 2012?

In [16]:
df

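| Unnamed: 0 | date | precipitation | temp_max | temp_min | wind | weather | year |
|---|---|---|---|---|---|---|---|
| 0 | 2012/1/1 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle | 2012 |
| 1 | 2012/1/2 | 10.9 | 10.6 | 2.8 | 4.5 | rain | 2012 |
| 2 | 2012/1/3 | 0.8 | 11.7 | 7.2 | 2.3 | rain | 2012 |
| 3 | 2012/1/4 | 20.3 | 12.2 | 5.6 | 4.7 | rain | 2012 |
| 4 | 2012/1/5 | 1.3 | 8.9 | 2.8 | 6.1 | rain | 2012 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1456 | 2015/12/27 | 8.6 | 4.4 | 1.7 | 2.9 | fog | 2015 |
| 1457 | 2015/12/28 | 1.5 | 5.0 | 1.7 | 1.3 | fog | 2015 |
| 1458 | 2015/12/29 | 0.0 | 7.2 | 0.6 | 2.6 | fog | 2015 |
| 1459 | 2015/12/30 | 0.0 | 5.6 | -1.0 | 3.4 | sun | 2015 |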


In [17]:
df['precipitation']

0        0.0
1       10.9
2        0.8
3       20.3
4        1.3
        ... 
1456     8.6
1457     1.5
1458     0.0
1459     0.0
1460     0.0
Name: precipitation, Length: 1461, dtype: float64

In [18]:
# rainy days with more than 5mm (all years in the dataset)
np.sum(df['precipitation']>5)

263

In [19]:
# rainy days with less than 15mm (all years in the dataset)

np.sum((df['precipitation']<15) & (df['precipitation']>0))

533

In [49]:
# days without rain in 2012
np.sum(rainfall==0)

189

In [50]:
# days with rain in 2012
np.sum(rainfall!=0)

177

In [51]:
# rainy days greater than 5mm in 2012
np.sum(rainfall>5)

78

In [52]:
# rainy days with less than 15mm in 2012

np.sum((rainfall<15) & (rainfall>0))

154

In [53]:
# rainy days greater than 15mm in 2012

np.sum(rainfall>15)

23
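The same four questions can be answered on any precipitation Series. A sketch on a tiny made-up Series `s`; the `Series.between(..., inclusive="neither")` alternative is an assumption about the pandas version (the string form of `inclusive` needs pandas 1.3 or later):

```python
import numpy as np
import pandas as pd

# A small made-up Series standing in for `rainfall` (values in mm)
s = pd.Series([0.0, 10.9, 0.8, 20.3, 0.0, 15.0, 4.1])

dry = np.sum(s == 0)                   # days without rain
wet = np.sum(s != 0)                   # days with rain
light = np.sum((s > 0) & (s < 15))     # rainy days with less than 15mm
# Series.between with inclusive="neither" (pandas >= 1.3) expresses 0 < s < 15
light_between = s.between(0, 15, inclusive="neither").sum()

print(dry, wet, light, light_between)  # 2 5 3 3
print(dry + wet == len(s))             # True
```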

# Boolean Arrays as Masks

- A Boolean array used as a mask selects the values of an array that satisfy a condition
- For example, all values less than 5

In [54]:
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

In [55]:
x<5

array([[False,  True,  True,  True],
       [False, False,  True, False],
       [ True,  True, False, False]])

In [56]:
# Masking is done by indexing the array with a Boolean array of the same shape
# It returns a 1D array of the values where the mask is True

x[x<5]

array([0, 3, 3, 3, 2, 4])
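Masks serve both examples from the introduction: counting values above a threshold and removing outliers. A sketch on the same `x`, with a threshold of 6 chosen arbitrarily; assigning through a mask modifies the array in place, so the capping step works on a copy:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

# Count values greater than a threshold
n_above = np.sum(x > 6)          # the values 7, 9, 7 -> 3

# "Remove" outliers above the threshold by keeping the rest (result is 1D)
kept = x[x <= 6]

# Modify values through a mask: cap everything above 6 at 6
capped = x.copy()
capped[capped > 6] = 6
print(n_above, kept, capped, sep="\n")
```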

In [57]:
# We are free to operate on these values as we wish
# For example, we can compute some relevant statistics on our Seattle rainfall data

In [91]:
pd.set_option('display.max_rows', 500)

In [92]:
rainfall

0       0.0
1      10.9
2       0.8
3      20.3
4       1.3
5       2.5
6       0.0
7       0.0
8       4.3
9       1.0
10      0.0
11      0.0
12      0.0
13      4.1
14      5.3
15      2.5
16      8.1
17     19.8
18     15.2
19     13.5
20      3.0
21      6.1
22      0.0
23      8.6
24      8.1
25      4.8
26      0.0
27      0.0
28     27.7
29      3.6
30      1.8
31     13.5
32      0.0
33      0.0
34      0.0
35      0.0
36      0.0
37      0.3
38      2.8
39      2.5
40      2.5
41      0.8
42      1.0
43     11.4
44      2.5
45      0.0
46      1.8
47     17.3
48      6.4
49      0.0
50      3.0
51      0.8
52      8.6
53      0.0
54     11.4
55      0.0
56      1.3
57      0.0
58      3.6
59      0.8
60      0.0
61      2.0
62      0.0
63      0.0
64      6.9
65      0.5
66      0.0
67      0.0
68      3.6
69     10.4
70     13.7
71     19.3
72      9.4
73      8.6
74     23.9
75      8.4
76      9.4
77      3.6
78      2.0
79      3.6
80      1.3
81      4.1
82      0.0
83  

In [93]:
# Construct a mask on all rainy days

rain = rainfall>0
rainfall[rain]

1      10.9
2       0.8
3      20.3
4       1.3
5       2.5
8       4.3
9       1.0
13      4.1
14      5.3
15      2.5
16      8.1
17     19.8
18     15.2
19     13.5
20      3.0
21      6.1
23      8.6
24      8.1
25      4.8
28     27.7
29      3.6
30      1.8
31     13.5
37      0.3
38      2.8
39      2.5
40      2.5
41      0.8
42      1.0
43     11.4
44      2.5
46      1.8
47     17.3
48      6.4
50      3.0
51      0.8
52      8.6
54     11.4
56      1.3
58      3.6
59      0.8
61      2.0
64      6.9
65      0.5
68      3.6
69     10.4
70     13.7
71     19.3
72      9.4
73      8.6
74     23.9
75      8.4
76      9.4
77      3.6
78      2.0
79      3.6
80      1.3
81      4.1
86      4.8
87      1.3
88     27.4
89      5.6
90     13.2
91      1.5
93      1.5
95      4.6
96      0.3
101     2.3
102     0.5
106     8.1
107     1.8
108     1.8
109    10.9
110     6.6
114     4.3
115    10.7
116     3.8
117     0.8
119     4.3
120     4.3
121     0.5
122     0.5
123    18.5
124 

In [62]:
# Construct a mask of summer days (June 21st is the 173rd day of 2012, i.e. index 172 counting from 0)
# This mask selects the 89 index positions 174 through 262

summer = (np.arange(366)-173<90) & (np.arange(366)-173>0)
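It can help to check which calendar dates a positional mask actually covers. A sketch that lines up the positions with dates, assuming the 2012 rows are sorted by date with exactly one row per day (366 rows, matching `rainfall` above). The date-based `summer_dates` uses an assumed definition of June 21 through September 21:

```python
import numpy as np
import pandas as pd

# One entry per day of 2012 (a leap year, 366 days)
dates = pd.date_range("2012-01-01", "2012-12-31", freq="D")
assert len(dates) == 366

# June 21 is day-of-year 173, i.e. position 172 when counting from 0
print(dates[172].date())                                  # 2012-06-21

# The positional mask used above
summer_pos = (np.arange(366) - 173 < 90) & (np.arange(366) - 173 > 0)
print(dates[summer_pos][0].date(), dates[summer_pos][-1].date())  # 2012-06-23 2012-09-19

# A date-based alternative (assumed definition: June 21 to September 21 inclusive)
summer_dates = (dates >= "2012-06-21") & (dates <= "2012-09-21")
print(summer_pos.sum(), summer_dates.sum())               # 89 93
```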

In [94]:
rainfall[summer]

174     8.6
175     0.0
176     0.5
177     0.0
178     0.0
179     0.0
180     0.3
181     3.0
182     0.0
183     2.0
184     5.8
185     0.0
186     0.0
187     0.0
188     0.0
189     0.0
190     1.5
191     0.0
192     0.0
193     0.0
194     0.5
195     0.0
196     0.0
197     0.3
198     0.0
199     0.0
200     0.0
201    15.2
202     0.0
203     1.0
204     0.0
205     0.0
206     0.0
207     0.0
208     0.0
209     0.0
210     0.0
211     0.0
212     0.0
213     0.0
214     0.0
215     0.0
216     0.0
217     0.0
218     0.0
219     0.0
220     0.0
221     0.0
222     0.0
223     0.0
224     0.0
225     0.0
226     0.0
227     0.0
228     0.0
229     0.0
230     0.0
231     0.0
232     0.0
233     0.0
234     0.0
235     0.0
236     0.0
237     0.0
238     0.0
239     0.0
240     0.0
241     0.0
242     0.0
243     0.0
244     0.0
245     0.0
246     0.0
247     0.0
248     0.0
249     0.0
250     0.0
251     0.0
252     0.3
253     0.3
254     0.0
255     0.0
256     0.0
257 

In [83]:
# Median precipitation on rainy days in 2012

np.median(rainfall[rain])

4.1

In [84]:
# Min precipitation on rainy days in 2012

np.min(rainfall[rain])

0.3

In [74]:
# Max precipitation on rainy days in 2012

np.max(rainfall[rain])

54.1

In [67]:
# Median precipitation on summer days in 2012

np.median(rainfall[summer])

0.0

In [75]:
# Min precipitation on summer days in 2012

np.min(rainfall[summer])

0.0

In [68]:
# Maximum precipitation on summer days in 2012

np.max(rainfall[summer])

15.2

In [69]:
# Median precipitation on rainy non-summer days in 2012

np.median(rainfall[rain & ~summer])

4.3

In [72]:
# Minimum precipitation on rainy non-summer days in 2012

np.min(rainfall[rain & ~summer])

0.3

In [76]:
# Maximum precipitation on rainy non-summer days in 2012

np.max(rainfall[rain & ~summer])

54.1

- By combining Boolean operations, masking operations, and aggregates, we can quickly answer questions like these

# Fancy Indexing

- Previously we saw simple indices (x[0]), slices (x[:5]), and Boolean masks (x[x<5])
- In fancy indexing, we pass an array of indices instead of a single scalar, so multiple array elements can be accessed at once
- This allows us to quickly access and modify complicated subsets of an array's values

## Fancy Indexing in a single dimension

In [113]:
rand = np.random.RandomState(42)

x = rand.randint(100, size=10) #1D array

In [114]:
x

array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74])

In [115]:
# Suppose we want to access three different elements

[x[3], x[7], x[2]]

[71, 86, 14]

In [116]:
# Alternatively, we can pass a single list or array of indices to obtain the same result

ind = [3, 7, 2]
x[ind]

array([71, 86, 14])

- With fancy indexing, the shape of the result reflects the shape of the index arrays rather than the shape of the array being indexed

In [120]:
ind = np.array([[3,7],
              [4,5]])

In [124]:
x[ind]

array([[71, 86],
       [60, 20]])

## Fancy Indexing in multiple dimensions

In [125]:
X = np.arange(12).reshape(3,4)

In [126]:
X

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [128]:
# As in standard indexing, the first index refers to the row and the second to the column

row = np.array([0,1,2])
col = np.array([2,1,3])

In [129]:
X[row,col]

array([ 2,  5, 11])

In [130]:
# Notice that the first value in the result is X[0, 2]
# The pairing of indices in fancy indexing follows all the broadcasting rules

In [132]:
# Combining a column vector of row indices with a row vector of column indices broadcasts to a two-dimensional result

X[row[:, np.newaxis], col]

array([[ 2,  1,  3],
       [ 6,  5,  7],
       [10,  9, 11]])
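The result shape follows the broadcast shape of the index arrays. A sketch confirming this with the same `X`, `row`, and `col`, using `np.broadcast_shapes` (available in NumPy 1.20 and later) and `np.ix_`, which builds the same pair of broadcastable index arrays:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# The result takes the broadcast shape of the index arrays, not the shape of X
print(X[row, col].shape)                                          # (3,)
print(X[row[:, np.newaxis], col].shape)                           # (3, 3)
print(np.broadcast_shapes(row[:, np.newaxis].shape, col.shape))   # (3, 3)

# np.ix_ builds the same (column vector, row vector) pair of index arrays
print(np.array_equal(X[row[:, np.newaxis], col], X[np.ix_(row, col)]))  # True

# Index arrays whose shapes cannot be broadcast together raise an IndexError
mismatch_msg = None
try:
    X[np.array([0, 1]), col]          # shapes (2,) and (3,)
except IndexError as err:
    mismatch_msg = str(err)
```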

# Combined Indexing

- To get more powerful operations, fancy indexing can be combined with other indexing schemes such as simple indices, slicing, and masking

In [134]:
print(X)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [135]:
# Combine fancy and simple indices

X[2, [2, 0, 1]]

array([10,  8,  9])

In [136]:
# Combine fancy and slicing

X[1:, [2, 0, 1]]

array([[ 6,  4,  5],
       [10,  8,  9]])

In [137]:
# Combine fancy indexing with mask

mask = np.array([1,0,1,0], dtype = bool)

In [138]:
mask

array([ True, False,  True, False])

In [139]:
X[row[:,np.newaxis], mask]

array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
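Along a single axis, a Boolean mask behaves like the integer indices of its `True` entries, which `np.nonzero` returns. A sketch with the same `X`, `row`, and `mask`, also mixing a slice with the mask:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# The mask selects the same columns as the indices of its True entries
cols = np.nonzero(mask)[0]          # array([0, 2])
a = X[row[:, np.newaxis], mask]
b = X[row[:, np.newaxis], cols]
print(np.array_equal(a, b))         # True

# Slicing and masking can also be mixed: rows 1 onward, masked columns
print(X[1:, mask])
# [[ 4  6]
#  [ 8 10]]
```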

- Combining all of these indexing options leads to a very flexible set of operations for accessing and modifying array values

# Modify Values with Fancy Indexing

- Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array

In [148]:
x = np.arange(10)
i = np.array([2,1,8,4])

In [150]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [151]:
x[i]

array([2, 1, 8, 4])

In [159]:
x[i]=99

In [160]:
print(x)

[ 0 99 99  3 99  5  6  7 99  9]


In [161]:
# Any assignment-type operator can be used

#x[i] = x[i] - 10
x[i] -= 10

In [162]:
x

array([ 0, 89, 89,  3, 89,  5,  6,  7, 89,  9])

In [163]:
print(x)

[ 0 89 89  3 89  5  6  7 89  9]
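One distinction worth keeping in mind, sketched below with fresh arrays: reading values with fancy indexing returns a copy, while assigning through a fancy index (or writing into a basic slice, which is a view) changes the original array:

```python
import numpy as np

x = np.arange(10)
i = np.array([2, 1, 8, 4])

# Reading with fancy indexing returns a copy, so changing y leaves x alone
y = x[i]
y[:] = -1
unchanged = x.copy()   # [0 1 2 3 4 5 6 7 8 9]

# Assigning through a fancy index writes into x itself
x[i] = -1              # x is now [ 0 -1 -1  3 -1  5  6  7 -1  9]

# A basic slice is a view, so writing into it also changes x
s = x[:3]
s[:] = 100
print(unchanged)
print(x)               # [100 100 100   3  -1   5   6   7  -1   9]
```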


In [164]:
x = np.zeros(10)

In [165]:
x

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [166]:
x[[0,0]] = [4,6] # two values for the same index: the later one (6) is what remains

In [167]:
print(x)

[6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [168]:
# Incrementing with fancy indexing when indices repeat

i = [2,3,3,4,4,4]

In [169]:
x[i] +=1 
# evaluated as x[i] = x[i] + 1, so a repeated index is assigned again rather than incremented once per occurrence
print(x) 

[6. 0. 1. 1. 1. 0. 0. 0. 0. 0.]


In [170]:
# To apply the operation once per occurrence of each index, use np.add.at

x = np.zeros(10)
np.add.at(x, i, 1)
print(x)

[0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
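The difference comes from buffering: `x[i] += 1` reads all of `x[i]`, adds 1, and writes the results back, so a repeated index is simply written more than once, while `np.add.at` performs an unbuffered addition once per occurrence. A sketch comparing the two, plus `np.bincount`, which gives the same counts when the indices are small non-negative integers:

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]

# Buffered: each index in i ends up with 1, however often it repeats
x = np.zeros(10)
x[i] += 1

# Unbuffered: the addition is applied once per occurrence of an index
y = np.zeros(10)
np.add.at(y, i, 1)

# Counting occurrences of small non-negative integers gives the same totals
z = np.bincount(i, minlength=10)
print(x)  # [0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]
print(y)  # [0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
print(z)  # [0 0 1 2 3 0 0 0 0 0]
```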
