# Complete Case Analysis
Complete-case analysis (CCA), also called "listwise deletion" of cases, consists of discarding observations with any missing values. In other words, we only keep observations that have values for all the variables.

`CCA can be applied to both categorical and numerical variables.`
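In pandas, CCA amounts to a call to `DataFrame.dropna()`, which removes every row that has at least one missing value, whatever the column type. The sketch below uses a small toy DataFrame with hypothetical column names (`num_var`, `cat_var`) that contains one numerical and one categorical variable.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one numerical and one categorical variable,
# each missing a value in a different row.
df = pd.DataFrame({
    "num_var": [8.5, np.nan, 11.2, 9.5],
    "cat_var": ["A", "A", None, "B"],
})

# Complete-case analysis: keep only rows with no missing value in any column.
df_cca = df.dropna()

print(df_cca)
print(df.shape, "->", df_cca.shape)  # (4, 2) -> (2, 2)
```

Rows 1 and 2 are dropped because each has one missing value, regardless of whether that value is numerical (`NaN`) or categorical (`None`).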

### Assumptions

CCA works well when the data is missing completely at random (MCAR), that is, when the probability of a value being missing is unrelated to any value in the dataset, observed or not. In this scenario, excluding observations with missing information is the same as randomly excluding some observations from the dataset. Therefore, the dataset after CCA remains a fair representation of the original dataset.
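A quick simulation makes the assumption concrete. The sketch below draws a synthetic normal variable, then removes values in two hypothetical ways: completely at random (each value has a 20% chance of going missing) and not at random (every value above 55 goes missing). Under MCAR the mean of the complete cases stays close to the full-data mean; when the missingness depends on the value itself, CCA shifts the mean.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic variable (hypothetical): 10,000 draws from a normal distribution.
n = 10_000
x = rng.normal(loc=50, scale=10, size=n)

# MCAR: each value has the same 20% chance of being missing,
# regardless of its own value or any other variable.
x_mcar = pd.Series(np.where(rng.random(n) < 0.2, np.nan, x))

# Not at random (for contrast): every value above 55 goes missing.
x_mnar = pd.Series(np.where(x > 55, np.nan, x))

full_mean = x.mean()
mcar_mean = x_mcar.dropna().mean()  # complete cases under MCAR
mnar_mean = x_mnar.dropna().mean()  # complete cases under MNAR

print(f"full data:      {full_mean:.2f}")
print(f"CCA under MCAR: {mcar_mean:.2f}")  # close to the full-data mean
print(f"CCA under MNAR: {mnar_mean:.2f}")  # noticeably lower
```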

#### Advantages

* No data manipulation required.

* Preserves variable distribution (assuming data is missing completely at random).

#### Disadvantages

* It can discard a large fraction of the original dataset (if missing data is abundant).

* Excluded observations could be informative (if data is not missing completely at random).

* CCA can produce a biased dataset (i.e., the complete cases no longer represent the original data) if the data is not missing completely at random.

* A model trained only on complete cases never sees missing values, so it will not be able to handle missing data when deployed in production; those values must be dealt with separately.
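The first disadvantage can bite even when each variable has little missing data, because a row is dropped if any of its values is missing. The sketch below simulates a hypothetical table of 20 numerical variables in which every value is independently missing with probability 5%; only about 0.95^20, roughly 36%, of the rows survive CCA.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical setup: 20 variables, each value missing independently with p = 5%.
n_rows, n_vars, p_missing = 5_000, 20, 0.05

df = pd.DataFrame(rng.normal(size=(n_rows, n_vars)))
mask = rng.random((n_rows, n_vars)) < p_missing
df = df.mask(mask)  # replace the masked cells with NaN

kept = len(df.dropna()) / n_rows           # fraction of rows left after CCA
expected = (1 - p_missing) ** n_vars       # probability a row is complete, about 0.358

print(f"rows kept after CCA: {kept:.3f} (theory: {expected:.3f})")
```

The independence of the missing values is an assumption of the simulation; in real data, missing values often cluster in the same rows, which would lose fewer rows.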

#### When to use CCA

* Data is missing completely at random

* No more than 5% of the total dataset contains missing data

`In practice, CCA may be an acceptable method when the proportion of missing information is small. There is no strict rule for how much missing data counts as small. As general guidance, however, if the proportion of missing data is 5% or less, CCA could be a viable option.`
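The 5% guideline can be checked directly with pandas. The sketch below uses a hypothetical helper, `missing_summary`, on toy data, and assumes the guideline refers to the fraction of rows containing at least one missing value; checking the fraction per variable is an equally reasonable reading, so the helper reports both.

```python
import numpy as np
import pandas as pd

THRESHOLD = 0.05  # the 5% guideline from the text

def missing_summary(df: pd.DataFrame):
    """Hypothetical helper: missing fraction per variable and
    fraction of rows with at least one missing value."""
    per_var = df.isnull().mean()
    rows_with_na = df.isnull().any(axis=1).mean()
    return per_var, rows_with_na

# Toy data (hypothetical): 100 rows, missing values in rows 3, 10 and 50.
df = pd.DataFrame({"a": np.arange(100, dtype=float), "b": ["x"] * 100})
df.loc[[3, 10], "a"] = np.nan
df.loc[[10, 50], "b"] = np.nan

per_var, rows_na = missing_summary(df)
print(per_var)                                # 2% missing in each variable
print(f"rows with any NA: {rows_na:.1%}")     # 3.0%

# Apply CCA only if the guideline is met.
use_cca = rows_na <= THRESHOLD
df_cca = df.dropna() if use_cca else df
print(df_cca.shape)                           # (97, 2)
```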

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Display all columns of the dataframe.
pd.set_option('display.max_columns', None)
```
```python
# Load the House Prices dataset and explore its shape (rows and columns).
data = pd.read_csv('train.csv')
data.shape
```

```
(1460, 81)
```
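Once the data is loaded, one way to check whether CCA preserves a variable's distribution is to compare its summary statistics in the original data and in the complete-case subset (histograms with matplotlib would be the visual equivalent). The helper `compare_before_after_cca` below is hypothetical and is demonstrated on synthetic data; the commented line only indicates how it might be called on `data`.

```python
import numpy as np
import pandas as pd

def compare_before_after_cca(df: pd.DataFrame, variables):
    """Hypothetical helper: mean and std of numerical variables in the
    original data vs. the rows that are complete for all `variables`."""
    df_cca = df.dropna(subset=variables)
    summary = pd.DataFrame({
        "original_mean": df[variables].mean(),   # uses all observed values
        "cca_mean": df_cca[variables].mean(),
        "original_std": df[variables].std(),
        "cca_std": df_cca[variables].std(),
    })
    return summary, len(df_cca) / len(df)

# Synthetic demo (hypothetical): values removed completely at random.
rng = np.random.default_rng(1)
demo = pd.DataFrame({"v1": rng.normal(100, 15, 2000), "v2": rng.normal(0, 1, 2000)})
demo.loc[rng.choice(2000, 60, replace=False), "v1"] = np.nan
demo.loc[rng.choice(2000, 40, replace=False), "v2"] = np.nan

summary, kept = compare_before_after_cca(demo, ["v1", "v2"])
print(summary.round(2))
print(f"fraction of rows kept: {kept:.3f}")

# On the House Prices data, pass numerical columns of `data` with few NAs:
# compare_before_after_cca(data, [...])
```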