<center><h1>The `NA` Type in R</h1></center>
<center><h3>Paul Stey</h3></center>
<center><h3>2021-10-05</h3></center>


# 1. `NA` Type
  - Used to represent missing data
  - Frequently occurs in "real" data sets

In [None]:
a <- NA           # `NA` is the missing data literal

In [None]:
a + 4             # NA values propagate

In [None]:
(42 + a)/2        # returns NA
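
As a small sketch of what "propagation" means in practice: the bare `NA` literal is a logical value, and R also provides typed variants such as `NA_real_` and `NA_character_`. Arithmetic with `NA` yields `NA`, but a logical operation returns a definite answer when the result does not depend on the missing value. The variable names below are illustrative only.

```r
a <- NA

class(a)                # "logical": the bare literal is a logical NA
class(NA_real_)         # "numeric": typed missing value for doubles
class(NA_character_)    # "character": typed missing value for strings

a + 4                   # NA: arithmetic propagates the missing value

# Logical operators return a definite value when the answer
# is the same whatever the missing value would have been
NA & FALSE              # FALSE
NA | TRUE               # TRUE
NA & TRUE               # NA: the answer depends on the missing value
```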

## 1.1 Checking for `NA`

  - Not necessarily obvious: comparing a value to `NA` with `==` does not return `TRUE`

In [None]:
print(a)          # recall `a` is NA

In [None]:
a == NA           # returns NA, not TRUE: the comparison itself propagates the NA

In [None]:
NA == NA          # also NA, not TRUE ¯\_(ツ)_/¯

### 1.1.1 Correctly Checking for `NA` Values

In [None]:
is.na(a)          # the `is.na()` function lets us check for missingness

In [None]:
is.na(NA)         # yep

In [None]:
is.na(42)         # nope
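
The difference matters most inside conditionals. The sketch below assumes we want to branch on whether `a` is missing: `if` stops with an error when its condition is `NA`, so the `==` version fails while the `is.na()` version works. The names `result` and `safe` are made up for this illustration.

```r
a <- NA

# `a == NA` evaluates to NA, and `if` cannot branch on NA, so this errors.
# tryCatch() is used here only so the example keeps running.
result <- tryCatch(
  if (a == NA) "missing" else "observed",
  error = function(e) "error: condition is NA"
)
print(result)

# `is.na()` always returns TRUE or FALSE, so it is safe in a condition
safe <- if (is.na(a)) "missing" else "observed"
print(safe)
```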

## 1.2 Containers with `NA` Values

  - Propagated `NA`s can lead to surprising behavior

In [None]:
v <- c(3, 2, NA, 5)            # create vector with NA

In [None]:
mean(v)                        # recall the NA propagates

In [None]:
mean(v, na.rm = TRUE)          # `na.rm = TRUE` removes NAs
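
The `na.rm` argument is not specific to `mean()`. As a brief sketch, several other base R summary functions accept it in the same way:

```r
v <- c(3, 2, NA, 5)

sum(v)                  # NA: the missing value propagates
sum(v, na.rm = TRUE)    # 10
max(v, na.rm = TRUE)    # 5
min(v, na.rm = TRUE)    # 2
sd(v, na.rm = TRUE)     # standard deviation of the three observed values
```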

### 1.2.1 Finding `NA`s in a Vector

In [None]:
w <- c(4, 5, 33, NA, 7)

is.na(w)
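
Because `is.na()` returns a logical vector, it combines naturally with other base R functions. A short sketch of common follow-up questions (where are the `NA`s, how many are there, and what is left without them):

```r
w <- c(4, 5, 33, NA, 7)

which(is.na(w))     # position(s) of the missing values
sum(is.na(w))       # number of missing values (TRUE counts as 1)
anyNA(w)            # is there at least one missing value?
w[!is.na(w)]        # keep only the observed values
```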

# 2. Applied Example

 - We are given a large vector with many missing values
 - We want to replace the missing values with the mean of the non-missing values

In [None]:
n <- 10000                      # make up our sample size

x <- rnorm(n)                   # simulate 10k draws from normal dist'n

n_miss <- 100                   # number of missing values to generate

x[sample(1:n, n_miss)] <- NA    # set `n_miss` values to be NA

## 2.1 Finding `NA` Values

  - We can use `is.na()` to find our missing values

In [None]:
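print(head(x, 20))              # peek at the first values; missing ones print as NA
sum(is.na(x))                   # count the missing values (should equal `n_miss`)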

### 2.1.1 Replace `NA`s with Mean

In [None]:
idx_miss <- which(is.na(x))          # get indices of missing values

print(idx_miss)

In [None]:
mean_obs <- mean(x, na.rm = TRUE)   # get mean of observed values

In [None]:
x[idx_miss] <- mean_obs             # replace NAs with `mean_obs`

In [None]:
isTRUE(all.equal(mean_obs, mean(x)))  # confirm mean of `x` matches `mean_obs` (up to floating-point tolerance)

In [None]:
any(is.na(x))                       # confirm there are no longer NA values in `x`
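
The steps above can be collected into a small reusable function. This is only a sketch: the name `impute_mean()` is hypothetical (it is not a base R function), and the simulated vector `y` and the call to `set.seed()` are assumptions added so the example is reproducible.

```r
# Hypothetical helper: replace NAs in a numeric vector with the
# mean of the observed (non-missing) values
impute_mean <- function(x) {
  stopifnot(is.numeric(x))
  miss <- is.na(x)                     # logical mask of missing values
  if (all(miss)) return(x)             # nothing observed, nothing to impute
  x[miss] <- mean(x, na.rm = TRUE)     # fill in the observed mean
  x
}

set.seed(1)                            # assumption: fixed seed for reproducibility
y <- rnorm(1000)
y[sample.int(1000, 50)] <- NA          # set 50 values to NA

y_imp <- impute_mean(y)

anyNA(y_imp)                                            # FALSE
isTRUE(all.equal(mean(y_imp), mean(y, na.rm = TRUE)))   # TRUE
```

Mean imputation leaves the mean unchanged, but it shrinks the variance of the vector, since every imputed value sits exactly at the center.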