# From Expectations to Synthetic Data Generation

## 3. Synthetic data & expectations

After generating the synthetic data, we need to assess its quality. In this flow we focus only on data fidelity, assessed with both `pandas-profiling` and `great-expectations`.


### The dataset - Real and Synthetic data

In [3]:
import json
import pandas as pd

dataset_name = "Cardiovascular"
real = pd.read_csv('cardio.csv')
synth = pd.read_csv(f'synth_{dataset_name}')

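# Read the JSON profile generated from the real data
with open(f'.profile_{dataset_name}.json') as f:
    json_profile = json.load(f)
# The first pass yields a JSON string (json.loads requires one), so decode it again
json_profile = json.loads(json_profile)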
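The two-step decode above (`json.load` followed by `json.loads`) can look redundant. A minimal sketch of why it may be needed, assuming the profile was saved by passing an already serialized JSON string to `json.dump`; the profile content and the in-memory buffer here are made up for illustration:

```python
import io
import json

# Hypothetical profile content, standing in for the real report.
profile_as_json_string = json.dumps({"table": {"n": 3}})

# Writing a JSON *string* with json.dump encodes it a second time.
buffer = io.StringIO()
json.dump(profile_as_json_string, buffer)
buffer.seek(0)

first_pass = json.load(buffer)        # still a str
second_pass = json.loads(first_pass)  # now a dict

print(type(first_pass).__name__, type(second_pass).__name__)
```

If the file had been written from a dict instead, the first `json.load` would already return a dict and the second call would raise a `TypeError`.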

#### Profiling the synthetic data

In [4]:
from pandas_profiling import ProfileReport

In [5]:
title = f"Synth: {dataset_name}"
synth_profile = ProfileReport(synth, title=title)
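Alongside the profile report, a quick numeric check can make the fidelity comparison more concrete. The sketch below is not part of the original flow: `compare_summary_stats` is a hypothetical helper, the two small frames are made-up stand-ins for `real` and `synth`, and it compares only the mean and standard deviation of the numeric columns both frames share.

```python
import pandas as pd


def compare_summary_stats(real_df: pd.DataFrame, synth_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column mean/std for real vs. synthetic data."""
    # Only compare numeric columns present in both frames.
    real_num = real_df.select_dtypes(include="number")
    synth_num = synth_df.select_dtypes(include="number")
    shared = [c for c in real_num.columns if c in synth_num.columns]

    stats = pd.DataFrame(
        {
            "real_mean": real_num[shared].mean(),
            "synth_mean": synth_num[shared].mean(),
            "real_std": real_num[shared].std(),
            "synth_std": synth_num[shared].std(),
        }
    )
    # Absolute gap between the means, one row per shared column.
    stats["abs_mean_diff"] = (stats["real_mean"] - stats["synth_mean"]).abs()
    return stats


# Toy frames for illustration only; in the notebook these would be `real` and `synth`.
toy_real = pd.DataFrame(
    {"age": [50, 60, 70], "weight": [60.0, 70.0, 80.0], "label": ["a", "b", "c"]}
)
toy_synth = pd.DataFrame(
    {"age": [52, 58, 73], "weight": [61.0, 69.0, 83.0], "label": ["a", "c", "b"]}
)

summary = compare_summary_stats(toy_real, toy_synth)
print(summary)
```

Matching means and standard deviations are a necessary but far from sufficient sign of fidelity, so a table like this complements the profile report and the expectation suite rather than replacing them.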

#### Running the expectations

In [6]:
import great_expectations as ge

data_context = ge.data_context.DataContext(context_root_dir="great_expectations")

# Load the previously built suite
suite = data_context.get_expectation_suite(f"{dataset_name}_expectations")

In [10]:
batch = ge.dataset.PandasDataset(synth, expectation_suite=suite)

In [11]:
results = data_context.run_validation_operator(
    "action_list_operator", assets_to_validate=[batch]
)
validation_result_identifier = results.list_validation_result_identifiers()[0]

In [12]:
# Build and open the Data Docs
data_context.build_data_docs()
data_context.open_data_docs(validation_result_identifier)
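Besides browsing the Data Docs, it can help to list which expectations the synthetic data failed. The sketch below works on a plain dictionary and does not call `great_expectations` at all. The dictionary shape (a `results` list whose items carry `success` and `expectation_config`) is an assumption about how a serialized validation result looks, and both `failed_expectations` and the example data are hypothetical.

```python
from typing import Any, Dict, List


def failed_expectations(validation: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list the expectation types that did not pass."""
    failed = []
    for item in validation.get("results", []):
        if not item.get("success", False):
            config = item.get("expectation_config", {})
            failed.append(config.get("expectation_type", "<unknown>"))
    return failed


# Made-up result dict for illustration; not output from the actual run.
example = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {"expectation_type": "expect_column_to_exist"},
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_mean_to_be_between"
            },
        },
    ],
}

print(failed_expectations(example))
```

Each failed expectation points to a property of the real data that the synthetic data does not reproduce, which makes the list a natural starting point for a closer look at fidelity.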