# 10-process-data
> Importing, cleaning, testing, and saving data

This series of notebooks covers the steps to scrape, prepare, validate, and save the data.

#### Helpful packages and preliminaries

In [None]:
# Data access and processing
import pandas as pd
import numpy as np

# Testing assertions
import great_expectations as ge

In [None]:
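# Placeholders: fill in the paths before running the notebook
data_in_filename = ''       # raw input data
cleaned_data_filename = ''  # cleaned data, read by the checks below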

### Using Great Expectations for data checking

First, we will import the cleaned data into a Great Expectations dataset and take a look at it using `head()`.

In [None]:
ge_df = ge.read_csv(cleaned_data_filename)

In [None]:
ge_df.head()

#### Set target column

In [None]:
# Placeholder: name of the column to validate
target_column_name = ''
target_column = ge_df[target_column_name]

#### Checking value counts

In [None]:
target_column.value_counts()

#### Expect column values to be in a set

In [None]:
# Placeholder: list every value the column is allowed to contain
expected_values = []
ge_df.expect_column_values_to_be_in_set(target_column_name, expected_values)

#### Expect column values to be *mostly* from a certain set

In [None]:
# Use the 'mostly' parameter to set the minimum fraction of values that must be in the set
ratio_of_expected = 0.95
ge_df.expect_column_values_to_be_in_set(target_column_name, expected_values, mostly=ratio_of_expected)
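The effect of `mostly` can be sketched in plain Python, independent of Great Expectations. This is a minimal illustration, not the library's implementation: it assumes missing values are excluded before the ratio is computed, and `fraction_in_set` and `passes_mostly` are hypothetical helper names introduced only for this example.

```python
def fraction_in_set(values, allowed):
    """Return the fraction of non-missing values that appear in `allowed`."""
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        # Assumption: with nothing to check, treat the column as fully compliant.
        return 1.0
    allowed = set(allowed)
    return sum(v in allowed for v in non_missing) / len(non_missing)


def passes_mostly(values, allowed, mostly=1.0):
    """True when at least `mostly` of the non-missing values are in `allowed`."""
    return fraction_in_set(values, allowed) >= mostly


# Hypothetical sample column: 19 expected labels and 1 unexpected label.
sample = ["a"] * 10 + ["b"] * 9 + ["z"]
print(fraction_in_set(sample, ["a", "b"]))      # 0.95
print(passes_mostly(sample, ["a", "b"], 0.95))  # True
print(passes_mostly(sample, ["a", "b"]))        # False
```

With the default of `1.0` a single unexpected value fails the check, which is why a lower threshold is useful for columns known to contain a small amount of noise.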

#### Expect column values to be unique

In [None]:
ge_df.expect_column_values_to_be_unique(target_column_name)
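When a uniqueness check fails, it helps to know which values repeat. The sketch below shows one way to list them in plain Python; `duplicated_values` is a hypothetical helper, not part of Great Expectations, and it assumes missing values should be ignored.

```python
from collections import Counter


def duplicated_values(values):
    """Return the sorted non-missing values that occur more than once."""
    counts = Counter(v for v in values if v is not None)
    return sorted(v for v, n in counts.items() if n > 1)


# Hypothetical ID column with two repeated IDs and one missing entry.
ids = [101, 102, 103, 102, None, 101]
print(duplicated_values(ids))  # [101, 102]
```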

#### Expect column values to not be null

In [None]:
ge_df.expect_column_values_to_not_be_null(target_column_name)

#### Expect column values to be in a certain range

In [None]:
# Placeholders: replace with the real bounds (both are inclusive by default)
expected_min = 0
expected_max = 0
ge_df.expect_column_values_to_be_between(target_column_name, min_value=expected_min, max_value=expected_max)
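The range check treats both bounds as inclusive, so leaving both placeholders at `0` only passes a column made up entirely of zeros. A rough plain-Python sketch of the same idea follows; `out_of_range` is a hypothetical helper written for illustration, and it assumes missing values are skipped and that a bound of `None` means "no limit on that side".

```python
def out_of_range(values, min_value=None, max_value=None):
    """Return the non-missing values outside the inclusive [min_value, max_value] range."""
    unexpected = []
    for v in values:
        if v is None:
            continue  # Assumption: missing values are not range-checked.
        if min_value is not None and v < min_value:
            unexpected.append(v)
        elif max_value is not None and v > max_value:
            unexpected.append(v)
    return unexpected


# Hypothetical age column; 0 and 120 sit exactly on the bounds and are accepted.
ages = [0, 17, 42, 120, 131, -3]
print(out_of_range(ages, min_value=0, max_value=120))  # [131, -3]
```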

#### More possible expectations

The full list of available expectations is in the legacy Great Expectations glossary:
https://legacy.docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html#expectation-glossary