# NumPy 1D arrays: Conditional filtering and summary statistics

It's useful to know how to select portions of NumPy arrays based on their values. For example, from a dataset of information about students, you might want to select all students who live more than 20 miles from the school they attend.

We'll walk through this kind of selection using a technique called conditional filtering with boolean masks.

We'll also cover combining NumPy 1D arrays with `np.concatenate` and calculating summary statistics such as the mean and the standard deviation.

## Setup

Please import NumPy as its accepted alias, `np`.

In [1]:
import numpy as np

## Conditional Filtering with Boolean Masks

NumPy allows conditions to be applied directly to arrays.
The result is a boolean mask, which can then be used to select only the values that satisfy the condition.

In [2]:
# Initialize array
arr = np.array([1, 4, 2, 7, 3, 9, 0])
arr

array([1, 4, 2, 7, 3, 9, 0])

### Apply a condition: values < 3
To filter to values less than 3, we first construct a "mask" of boolean values: `True` if the value at the corresponding index in the array satisfies the condition, `False` if not.

In [3]:
# Note that mask will be an array of boolean values
mask = arr < 3
mask

array([ True, False,  True, False, False, False,  True])

### Use the mask to filter the array
To apply the boolean mask to filter the array, you can place it within `[ ]`, just like slicing or selecting values by their index. The resulting array contains only the values from the original array that match the conditional filter (those that were `True` in the boolean mask).

In [4]:
filtered = arr[mask]
filtered

array([1, 2, 0])

You can also skip the creation of the mask variable and apply the conditional filter directly while slicing/indexing. This is simpler and more concise, but sometimes making a separate boolean mask can be helpful if you want to examine which specific values match the condition.

In [None]:
arr[arr < 3]
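
The same pattern applies to the student example from the introduction. The sketch below uses made-up distances (the `distances` array and its values are hypothetical, not from a real dataset) to filter students who live more than 20 miles from school, count them, and combine two conditions with `&`.

```python
import numpy as np

# Hypothetical distances (in miles) from each student's home to their school
distances = np.array([3.5, 22.0, 18.2, 40.1, 20.0, 7.9, 25.3])

# Boolean mask: True where the student lives more than 20 miles away
far_mask = distances > 20
far_students = distances[far_mask]

# True counts as 1, so summing the mask counts the matching students
num_far = far_mask.sum()

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
mid_range = distances[(distances >= 10) & (distances <= 25)]

print(far_students)
print(num_far)
print(mid_range)
```

Note that a distance of exactly 20.0 is excluded by `> 20` but included by the inclusive range check.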

## Combining arrays with `np.concatenate`

We can merge multiple 1D NumPy arrays into one array by using the `np.concatenate` function.

In [5]:
array_one = np.array([1, 2, 3])
array_two = np.array([4, 5, 6])

# Concatenate the arrays
concatenated_array = np.concatenate([array_one, array_two])
print("Concatenated 1D Array:", concatenated_array)

Concatenated 1D Array: [1 2 3 4 5 6]


In NumPy v2.0.0 and later, you can also use `np.concat()`, which is an alias for `np.concatenate()`. Note: Coursera labs currently use an older version of NumPy, so calling `np.concat()` there raises an `AttributeError`. You can try it in your local Python environment.

In [7]:
array_one = np.array([1, 2, 3])
array_two = np.array([4, 5, 6])

concatenated_array = np.concat([array_one, array_two])
print(concatenated_array)

AttributeError: module 'numpy' has no attribute 'concat'
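
If the same code needs to run on both older and newer NumPy versions, one option is to look up `np.concat` at runtime and fall back to `np.concatenate` when it is missing. This is only a minimal sketch; the name `concat_fn` is introduced here for illustration.

```python
import numpy as np

array_one = np.array([1, 2, 3])
array_two = np.array([4, 5, 6])

# Use np.concat if this NumPy version provides it; otherwise use np.concatenate
concat_fn = getattr(np, "concat", np.concatenate)

concatenated_array = concat_fn([array_one, array_two])
print(concatenated_array)
```

In practice, simply calling `np.concatenate` everywhere is the simplest choice, since it is available in both old and new versions.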

## Summary Statistics in NumPy

You can find the mean and the standard deviation of a NumPy array by using its `mean()` and `std()` methods.

### Mean

In [None]:
arr.mean()

### Standard Deviation and `ddof=1`
By default, NumPy's `std()` divides by the full number of data points in the array, $N$. This is often called the "biased" estimate: when the data are a sample from a larger population, it tends to underestimate the population standard deviation. It also returns a value (zero) for a single data point ($N = 1$), even though one data point tells us nothing about spread.

Instead, when working with a sample, we usually want to divide by the number of data points minus one, $N-1$ (called Bessel's correction), which makes the underlying variance estimate unbiased. To do this in NumPy, set the `ddof` argument to `1`.

In [None]:
arr.std()   # Default: ddof=0 (biased)

Use `.std(ddof=1)` for the sample standard deviation with Bessel's correction.

In [None]:
arr.std(ddof=1)
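
To make the role of `ddof` concrete, the sketch below computes both versions by hand for the same `arr` used earlier and checks them against NumPy's results. The variable names (`n`, `squared_deviations`, `std_population`, `std_sample`) are chosen here for illustration.

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 3, 9, 0])
n = arr.size

# Squared distance of each value from the mean
squared_deviations = (arr - arr.mean()) ** 2

# ddof=0: divide by N
std_population = np.sqrt(squared_deviations.sum() / n)

# ddof=1: divide by N - 1 (Bessel's correction)
std_sample = np.sqrt(squared_deviations.sum() / (n - 1))

# Both comparisons should print True
print(np.isclose(std_population, arr.std()))
print(np.isclose(std_sample, arr.std(ddof=1)))
```

Because the `ddof=1` version divides by a smaller number, it is always at least as large as the default result.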

# Practice Exercise 1

Given the array of even numbers from 2 to 20 below, extract the values between 9 and 15.

In [8]:
even_num = np.arange(2, 21, 2)
even_num

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [10]:
mask = (even_num >= 9) & (even_num <= 15)
mask

array([False, False, False, False,  True,  True,  True, False, False,
       False])

In [11]:
filtered = even_num[mask]
filtered

array([10, 12, 14])

# Practice Exercise 2

Given the two arrays `a` and `b` defined below:

- concatenate them into a single array.

- concatenate the result with the `even_num` array from the previous exercise. Make sure that the `even_num` values appear first.

In [12]:
a = np.arange(10, 18)
b = np.arange(15, 25)

In [13]:
single_array = np.concatenate([a, b])
single_array

array([10, 11, 12, 13, 14, 15, 16, 17, 15, 16, 17, 18, 19, 20, 21, 22, 23,
       24])

In [14]:
add_even = np.concatenate([even_num, single_array])
add_even

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 10, 11, 12, 13, 14, 15, 16,
       17, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])