# Building a Data Validation Workflow with Great Expectations

Prerequisites:
* Install great_expectations (>=0.13.12)
* Set up a working deployment of great_expectations by running the CLI command "great_expectations --v3-api init"
* Set up the datasource by running the CLI command "great_expectations --v3-api datasource new"
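
The version requirement in the first prerequisite can be checked before running the notebook. The sketch below is a rough check using only the standard library; the helper names (`parse_version`, `meets_minimum`, `installed_version`) are made up for this example, and the parsing ignores pre-release or local version suffixes.

```python
from importlib import metadata

MIN_VERSION = (0, 13, 12)  # minimum version listed in the prerequisites


def parse_version(text):
    """Turn '0.13.12' into (0, 13, 12), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_VERSION):
    """Return True if the installed version string is at least the minimum."""
    return parse_version(installed) >= minimum


def installed_version(package="great_expectations"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("great_expectations is not installed")
else:
    print(version, "ok" if meets_minimum(version) else "too old")
```

For anything beyond a quick check, a dedicated version parser such as the one in the `packaging` library is likely a safer choice than this hand-rolled comparison.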

In [1]:
import great_expectations as ge
import pandas as pd
from pandas_profiling import ProfileReport
from pandas_profiling.utils.cache import cache_file

## 1. Loading data

In [2]:
file_name = cache_file(
    "titanic.csv",
    "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df = pd.read_csv(file_name)

## 2. Profiling data with pandas-profiling

In [3]:
profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)

## 3. Data Validation

### 3.1 Building the data_context

In [4]:
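# Load the data context of the existing great_expectations deployment.
# To point at a specific project directory instead, use:
# ge.data_context.DataContext(context_root_dir="my_ge_root_directory/")
data_context = ge.data_context.DataContext()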

### 3.2 Generating the validation suite from the profiling report

In [5]:
suite = profile.to_expectation_suite(
    suite_name="titanic_expectations",
    data_context=data_context,
    save_suite=False,
    run_validation=False,
    build_data_docs=False,
)

In [6]:
data_context.save_expectation_suite(suite)
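
To make the idea of an expectation suite more concrete, here is a toy illustration in plain Python. It is not how great_expectations or pandas-profiling implement expectations; the rows are made-up values and the functions (`expect_not_null`, `expect_between`) are hypothetical stand-ins that only mimic the general shape of a check and its result.

```python
# Made-up rows standing in for a few Titanic records.
rows = [
    {"PassengerId": 1, "Age": 22.0, "Sex": "male"},
    {"PassengerId": 2, "Age": None, "Sex": "female"},
    {"PassengerId": 3, "Age": 130.0, "Sex": "female"},
]


def expect_not_null(rows, column):
    """Check that no value in the column is missing."""
    unexpected = [row[column] for row in rows if row[column] is None]
    return {
        "expectation": "not_null",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


def expect_between(rows, column, min_value, max_value):
    """Check that non-missing values fall in [min_value, max_value].

    Missing values are skipped here for simplicity.
    """
    values = [row[column] for row in rows if row[column] is not None]
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "expectation": "between",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


# A "suite" is just a list of checks applied to the same data.
toy_suite = [
    lambda data: expect_not_null(data, "PassengerId"),
    lambda data: expect_not_null(data, "Age"),
    lambda data: expect_between(data, "Age", 0, 120),
]

toy_results = [check(rows) for check in toy_suite]
for result in toy_results:
    print(result)
```

The suite generated from the profiling report plays the same role at a larger scale: a named collection of checks that can be saved and later run against new data.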

### 3.3 Starting data validation

Legacy versions run validation through the action_list_operator, which must be defined in great_expectations.yml. With the v3 API used here, validation runs through a checkpoint instead.

#### Creating a new checkpoint from the CLI
> great_expectations --v3-api checkpoint new titanic_checkpoint

In [7]:
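# Legacy-style wrapper around the DataFrame; run_checkpoint below does not use it.
batch = ge.dataset.PandasDataset(df, expectation_suite=suite)
# Run the checkpoint created with the CLI command above.
results = data_context.run_checkpoint(
    checkpoint_name="titanic_checkpoint"
)
# Keep the first validation result identifier so it can be opened in the data docs.
validation_result_identifier = results.list_validation_result_identifiers()[0]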

### 3.4 Building and opening data docs

In [8]:
data_context.build_data_docs()

{'local_site': 'file:///Users/yee/Dropbox/Python/ge_tutorials/great_expectations_pdprofiler/great_expectations/uncommitted/data_docs/local_site/index.html'}
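
The dictionary shown above maps each site name to a `file://` URL. If a filesystem path is needed, for example to check that the index page exists or to copy the site elsewhere, the URL can be converted with the standard library. This is a minimal sketch: `file_url_to_path` is a hypothetical helper, the path in `site_urls` is a placeholder rather than real output, and the conversion assumes a POSIX-style path.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def file_url_to_path(url):
    """Convert a file:// URL into a Path (POSIX-style paths assumed)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URL: {url}")
    # unquote turns percent-escapes such as %20 back into plain characters.
    return Path(unquote(parsed.path))


# Placeholder standing in for the value returned by build_data_docs().
site_urls = {
    "local_site": "file:///tmp/great_expectations/uncommitted/data_docs/local_site/index.html"
}

index_paths = {name: file_url_to_path(url) for name, url in site_urls.items()}
for name, path in index_paths.items():
    print(name, path, "exists" if path.exists() else "missing")
```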

In [9]:
data_context.open_data_docs(validation_result_identifier)