# Quick start

*EPAM Syngen* is an unsupervised tabular data generation tool based on a variational autoencoder (VAE). 
It supports common tabular data types (floats, integers, datetime, text, categorical, binary) and can generate linked tables that share keys, using a simple statistical approach. 
The SDK exposes simple programmatic entry points for training, inference, and report generation, as well as for loading and saving data in the supported formats: *CSV*, *Avro*, and *Excel*. The data must be stored locally and encoded in UTF-8.

This notebook demonstrates how to use the SDK. After installing the package, use the main SDK class `Syngen` to run training, inference, or report generation, and the class `DataIO` to load and save data in the supported formats.

Python *3.10* or *3.11* is required to run the library. The library is tested on Linux and Windows operating systems.
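A quick way to confirm that the current interpreter meets this requirement before installing is sketched below. The helper name `is_supported_python` is hypothetical and used only for illustration; it is not part of the SDK.

```python
import sys

# (major, minor) pairs listed as supported in the text above.
SUPPORTED_VERSIONS = {(3, 10), (3, 11)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is one of the supported versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if not is_supported_python():
    major, minor = sys.version_info[0], sys.version_info[1]
    print(f"Python {major}.{minor} is outside the supported versions (3.10, 3.11)")
```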

# Installation

Install the *syngen* library. The command below pulls the release candidate `0.10.33rc0` from TestPyPI and resolves its dependencies from PyPI:

In [1]:
!pip install  --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --use-pep517 --no-cache-dir syngen==0.10.33rc0

Looking in indexes: https://test.pypi.org/simple/, https://pypi.org/simple/
Collecting syngen==0.10.33rc0
  Downloading https://test-files.pythonhosted.org/packages/52/a6/9ff634fd9839f9bc7b1a2cc5f91ae63e864b0a18c4355c355d704bd7fb36/syngen-0.10.33rc0-py3-none-any.whl.metadata (45 kB)
Collecting aiohttp>=3.10.11 (from syngen==0.10.33rc0)
  Downloading aiohttp-3.13.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (8.1 kB)
Collecting attrs (from syngen==0.10.33rc0)
  Downloading attrs-25.4.0-py3-none-any.whl.metadata (10 kB)
Collecting avro (from syngen==0.10.33rc0)
  Downloading avro-1.12.1-py2.py3-none-any.whl.metadata (1.6 kB)
Collecting base32-crockford (from syngen==0.10.33rc0)
  Downloading base32_crockford-0.3.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting boto3 (from syngen==0.10.33rc0)
  Downloading boto3-1.42.21-py3-none-any.whl.metadata (6.8 kB)
Collecting click (from syngen==0.10.33rc0)
  Downloading click-8.3.1-py3-none-any.whl.m

# Class initialization

```python
Syngen(
    metadata_path: Optional[str] = None,  # use a metadata file in the '.yaml', '.yml' format for a training or an inference of one or multiple tables to centralize all parameters in one place
    table_name: Optional[str] = None,     # required for a single-table training or inference process; an arbitrary string used to name the directories where artifacts are stored
    source: Optional[str] = None,         # optional for a single-table training or inference process; a path to the file that you want to use as a reference
    loader: Optional[Callable[[str], pd.DataFrame]] = None  # optional for a training or inference process of one or multiple tables; a callback function that returns a sample of the original data for a given table
)
```
### Attributes description:

- **`metadata_path`** *(Optional[str], default: None)*: a path to a metadata file in *'.yaml'* or *'.yml'* format, used during a training or inference process to centralize all parameters for one or multiple tables in one place.
- **`table_name`** *(Optional[str], default: None)*: a required parameter for a single-table training or inference process; an arbitrary string used to name the directories where artifacts are stored.
- **`source`** *(Optional[str], default: None)*: an optional parameter for a single-table training or inference process; a path to the file that you want to use as a reference.
- **`loader`** *(Optional[Callable[[str], pd.DataFrame]], default: None)*: an optional parameter for a training or inference process of one or multiple tables; a callback function that returns a sample of the original data for a given table.

***Note***: You can provide the required attributes in one of four ways:
1. Using `metadata_path`: Define the metadata of one or multiple tables and their relationships in one place.
2. Using `table_name` and `source`: For a single-table training or inference process, provide both `table_name` and `source` if the original data is supplied directly from a `source`.
3. Using `table_name` and `loader`: For a single-table training or inference process, provide both `table_name` and `loader` if the original data is supplied via a callback function (`loader`).
4. Using `metadata_path` and `loader`: For a training or inference process of one or multiple tables, provide both `metadata_path` and `loader` if the original data of the tables is supplied via a callback function (`loader`).
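The four combinations above can be summarized in a small decision helper. This is only an illustrative sketch: `describe_init_mode` is a hypothetical function, not part of the SDK, and it does not reproduce whatever validation `Syngen` performs internally (for example, it assumes `metadata_path` takes precedence when several arguments are given).

```python
from typing import Callable, Optional


def describe_init_mode(
    metadata_path: Optional[str] = None,
    table_name: Optional[str] = None,
    source: Optional[str] = None,
    loader: Optional[Callable] = None,
) -> str:
    """Name which of the four documented argument combinations is being used."""
    if metadata_path is not None:
        # Assumption: a metadata file takes precedence over single-table arguments.
        return "metadata_path + loader" if loader is not None else "metadata_path"
    if table_name is not None and source is not None:
        return "table_name + source"
    if table_name is not None and loader is not None:
        return "table_name + loader"
    raise ValueError(
        "Provide 'metadata_path', or 'table_name' together with 'source' or 'loader'"
    )


print(describe_init_mode(table_name="housing", source="../examples/example-data/housing.csv"))
```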

In [2]:
# Example: initialize a 'Syngen' instance for a single-table training or inference process
# by providing the 'source' and 'table_name' parameters
from syngen.sdk import Syngen


launcher_for_single_table_with_source = Syngen(
    source="../examples/example-data/housing.csv", 
    table_name="housing"
)

2026-01-05 13:51:37.278531: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2026-01-05 13:51:37.278570: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2026-01-05 13:51:37.279502: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


In [11]:
# Example: initialize a 'Syngen' instance for a single-table training or inference process
# by providing the 'loader' and 'table_name' parameters
import os
import pandas as pd

from syngen.sdk import Syngen



def get_dataframe(table_name: str) -> pd.DataFrame:
    """
    The demonstration callback function to load a CSV file 
    and get a pandas DataFrame from the 'example-data' directory.
    
    Parameters
    ----------
    table_name : str
        The name of the table (CSV file without extension) to load
    
    Returns
    -------
    pd.DataFrame
        The loaded data as a pandas DataFrame
    """
    path_to_example_data = f"../examples/example-data/{table_name}.csv"
    
    if not os.path.exists(path_to_example_data):
        raise FileNotFoundError(f"The CSV file '{table_name}.csv' does not exist")
    
    return pd.read_csv(path_to_example_data)



launcher_for_single_table_with_loader = Syngen(
    loader=get_dataframe, 
    table_name="housing"
)

In [12]:
# Example: initialize a 'Syngen' instance for multiple tables
from syngen.sdk import Syngen


launcher_for_multiple_tables = Syngen(
    metadata_path="../examples/example-metadata/housing_metadata.yaml"
)

# Launch training

You can start a training process using the SDK entry point `Syngen().train(...)`. It trains a model and saves the model artifacts to disk in the *'./model_artifacts'* directory. The SDK mirrors the CLI options, so you can pass the same parameters programmatically. Below is a complete description of all available parameters:

```python
train(
    self,
    epochs: int = 10,                     # a number of training epochs
    drop_null: bool = False,              # whether to drop rows with at least one missing value
    row_limit: Optional[int] = None,      # a number of rows to train over
    reports: Union[str, Tuple[str], List[str]] = "none",  # report types: "none", "accuracy", "sample", "metrics_only", "all"
    log_level: Literal["TRACE", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"] = "INFO", # a logging level
    batch_size: int = 32,                 # a training batch size
    fernet_key: Optional[str] = None      # a name of the environment variable containing the Fernet key for secure storage of the data subset
)
```

### Parameters description:

- **`epochs`** *(int, default: 10)*: The number of training epochs. Must be ≥ 1. Because an early stopping mechanism is implemented, a larger number of epochs is generally the safer choice.

- **`drop_null`** *(bool, default: False)*: Whether to drop rows containing at least one missing value before training. When `False`, missing values are handled during the training process.

- **`row_limit`** *(Optional[int], default: None)*: The maximum number of rows to use for training. If specified and smaller than the total number of rows, a random subset of the specified size is selected. Useful for testing or for working with large datasets.

- **`reports`** *(Union[str, Tuple[str], List[str]], default: "none")*: Controls the generation of quality reports. Accepts a single string or a list of strings:
  - `"none"` - no reports generated (default)
  - `"accuracy"` - generates an accuracy report comparing synthetic data (same size as the original) with the original dataset to estimate the quality of the training process
  - `"sample"` - generates a sample report showing distribution comparisons between the original data and the sampled subset of that data
  - `"metrics_only"` - outputs metrics to stdout without generating an accuracy report
  - `"all"` - generates both accuracy and sample reports

  List example: `["accuracy", "sample"]` to generate multiple report types

  *Note*: Report generation may require significant time for large tables (>10,000 rows)

- **`log_level`** *(str, default: "INFO")*: A logging level for the training process. Accepted values: `"TRACE"`, `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`.

- **`batch_size`** *(int, default: 32)*: A training batch size. Must be ≥ 1. Splits training into batches to optimize memory usage. Smaller batches use less RAM but may increase training time.

- **`fernet_key`** *(Optional[str], default: None)*: The name of the environment variable containing a 44-character URL-safe base64-encoded Fernet key. When provided, the data subset is encrypted on disk (stored in the `.dat` format). If not provided, the data is stored unencrypted in the `.pkl` format. **Important**: The same key must be used during inference and report generation to decrypt the data.


*Note:* For full documentation, metadata file format, and additional details, please refer to [README.md](../README.md)
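Because `fernet_key` expects the *name* of an environment variable rather than the key itself, a key has to be generated and exported first. The sketch below builds one with the standard library only: a Fernet key is 32 random bytes in URL-safe base64, which yields the 44 characters mentioned above (`cryptography.fernet.Fernet.generate_key()` produces the same format). The variable name `SYNGEN_FERNET_KEY` and the helper `generate_fernet_key` are assumptions made for illustration.

```python
import base64
import os

# Hypothetical environment variable name; any name can be used.
ENV_VAR_NAME = "SYNGEN_FERNET_KEY"


def generate_fernet_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64 (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


key = generate_fernet_key()
# Visible only to the current process and its children.
os.environ[ENV_VAR_NAME] = key
print(len(key))

# The variable name, not the key, is then passed to the SDK, for example:
# launcher_for_single_table_with_source.train(fernet_key=ENV_VAR_NAME)
```

Keep the generated key in a safe place: without it, the encrypted data subset cannot be decrypted later for inference or report generation.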

In [5]:
# Example: train a single table, fetching the data via the 'source' parameter


launcher_for_single_table_with_source.train(
    epochs=5,
    drop_null=False,
    row_limit=1000, 
    batch_size=100,
    reports="all",
    log_level="DEBUG"
)

2026-01-05 13:58:26.039 | DEBUG    | syngen.ml.validation_schema.validation_schema:validate_schema:348 - The schema of the metadata is valid
2026-01-05 13:58:26.044 | INFO     | syngen.ml.config.validation:_collect_errors:435 - The validation of the metadata has been passed successfully
2026-01-05 13:58:26.045 | INFO     | syngen.train:launch_train:104 - The training process will be executed according to the information mentioned in 'train_settings' in the metadata file. If appropriate information is absent from the metadata file, then the values of parameters sent through CLI will be used. Otherwise, the values of parameters will be defaulted.
2026-01-05 13:58:26.121 | INFO     | syngen.ml.processors.processors:_preprocess_data:150 - The subset of rows was set to 1000

2026-01-05 13:58:32.433 | INFO     | syngen.ml.vae.models.model:fit_sampler:187 - Creating BayesianGaussianMixture
2026-01-05 13:58:32.434 | INFO     | syngen.ml.vae.models.model:fit_sampler:189 - Fitting BayesianGaussianMixture
2026-01-05 13:58:34.673 | INFO     | syngen.ml.vae.models.model:fit_sampler:191 - Finished fitting BayesianGaussianMixture
2026-01-05 13:58:34.724 | INFO     | syngen.ml.vae.wrappers.wrappers:save_state:545 - Saved VAE state in model_artifacts/resources/housing/vae/checkpoints
2026-01-05 13:58:34.724 | INFO     | syngen.ml.handlers.handlers:__fit_model:191 - Finished VAE training
2026-01-05 13:58:34.724 | INFO     | syngen.ml.handlers.handlers:handle:143 - No

 1/32 [..............................] - ETA: 4s

2026-01-05 13:58:35.227 | INFO     | syngen.ml.vae.wrappers.wrappers:load_state:554 - Loaded VAE state from model_artifacts/resources/housing/vae/checkpoints
2026-01-05 13:58:35.228 | DEBUG    | syngen.ml.handlers.handlers:handle:490 - Infer model with parameters: size=1000, run_parallel=False, batch_size=1000, random_seed=1
2026-01-05 13:58:35.228 | INFO     | syngen.ml.handlers.handlers:handle:491 - Total of 1 batch(es)
2026-01-05 13:58:35.228 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 1 of 1
2026-01-05 13:58:35.228 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 13:58:35.229 | INFO     | s

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 7882.68it/s]
2026-01-05 13:58:35.505 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 13:58:35.510 | INFO     | syngen.ml.strategies.strategies:run:236 - Synthesis of the table - 'housing' was completed. Synthetic data saved in 'model_artifacts/tmp_store/housing/merged_infer_housing.csv'
2026-01-05 13:58:35.510 | INFO     | syngen.ml.reporters.reporters:_log_and_update_progress:287 - The calculation of sample metrics for the table - 'housing' has started
2026-01-05 13:58:38.539 | INFO     | syngen.ml.reporters.reporters:_log_and_update_progress:287 - The sample report of the table - 'housing' has been generated
2026-01-05

In [13]:
# Example: train a single table, fetching the data via the 'loader' parameter


launcher_for_single_table_with_loader.train(
    epochs=5,
    drop_null=False,
    row_limit=1000, 
    batch_size=100, 
    reports="all",
    log_level="DEBUG"
)

2026-01-05 14:05:31.215 | DEBUG    | syngen.ml.validation_schema.validation_schema:validate_schema:348 - The schema of the metadata is valid
2026-01-05 14:05:31.220 | INFO     | syngen.ml.worker.worker:_remove_existed_artifact:104 - The artifacts located in the path - 'model_artifacts/resources/housing/' was removed
2026-01-05 14:05:31.220 | INFO     | syngen.ml.config.validation:_collect_errors:435 - The validation of the metadata has been passed successfully
2026-01-05 14:05:31.222 | INFO     | syngen.train:launch_train:104 - The training process will be executed according to the information mentioned in 'train_settings' in the metadata file. If appropriate information is absent from the metadata file, then the values of parameters sent through CLI will be used. Otherwise,

2026-01-05 14:05:37.243 | INFO     | syngen.ml.vae.models.model:fit_sampler:187 - Creating BayesianGaussianMixture
2026-01-05 14:05:37.244 | INFO     | syngen.ml.vae.models.model:fit_sampler:189 - Fitting BayesianGaussianMixture
2026-01-05 14:05:39.193 | INFO     | syngen.ml.vae.models.model:fit_sampler:191 - Finished fitting BayesianGaussianMixture
2026-01-05 14:05:39.236 | INFO     | syngen.ml.vae.wrappers.wrappers:save_state:545 - Saved VAE state in model_artifacts/resources/housing/vae/checkpoints
2026-01-05 14:05:39.236 | INFO     | syngen.ml.handlers.handlers:__fit_model:191 - Finished VAE training
2026-01-05 14:05:39.237 | INFO     | syngen.ml.handlers.handlers:handle:143 - No

 1/32 [..............................] - ETA: 4s

2026-01-05 14:05:39.692 | INFO     | syngen.ml.vae.wrappers.wrappers:load_state:554 - Loaded VAE state from model_artifacts/resources/housing/vae/checkpoints
2026-01-05 14:05:39.693 | DEBUG    | syngen.ml.handlers.handlers:handle:490 - Infer model with parameters: size=1000, run_parallel=False, batch_size=1000, random_seed=1
2026-01-05 14:05:39.693 | INFO     | syngen.ml.handlers.handlers:handle:491 - Total of 1 batch(es)
2026-01-05 14:05:39.693 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 1 of 1
2026-01-05 14:05:39.693 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:05:39.693 | INFO     | s

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 8929.23it/s]
2026-01-05 14:05:39.956 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:05:39.961 | INFO     | syngen.ml.strategies.strategies:run:236 - Synthesis of the table - 'housing' was completed. Synthetic data saved in 'model_artifacts/tmp_store/housing/merged_infer_housing.csv'
2026-01-05 14:05:39.962 | INFO     | syngen.ml.reporters.reporters:_log_and_update_progress:287 - The calculation of sample metrics for the table - 'housing' has started
2026-01-05 14:05:39.978 | INFO     | syngen.ml.data_loaders.dataframe_fetcher:fetch_data:21 - Successfully fetched dataframe for table: housing
2026-01-05 14:05:44.071

In [14]:
launcher_for_single_table_with_loader.execution_artifacts

{'housing': {'losses_path': 'model_artifacts/system_store/losses/losses-housing-2026-01-05-14-05-31-243731.csv',
  'path_to_input_data': 'model_artifacts/tmp_store/housing/input_data_housing.pkl',
  'generated_reports': {'sample_report': 'model_artifacts/resources/housing/reports/sample-report-2026_01_05_14_05_44_070491.html',
   'accuracy_report': 'model_artifacts/resources/housing/reports/accuracy-report-2026_01_05_14_06_05_805637.html'}}}
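The dictionary returned by `execution_artifacts` can be walked with plain Python, for instance to list the generated reports per table. The sketch below assumes the structure shown in the output above (a `generated_reports` mapping under each table name); `collect_report_paths` is a hypothetical helper, and the sample dictionary simply copies the paths printed above.

```python
from typing import Dict


def collect_report_paths(execution_artifacts: dict) -> Dict[str, Dict[str, str]]:
    """Map each table name to its {report_type: path} dictionary."""
    return {
        table: dict(info.get("generated_reports", {}))
        for table, info in execution_artifacts.items()
    }


# Sample data copied from the output shown above.
example_artifacts = {
    "housing": {
        "losses_path": "model_artifacts/system_store/losses/losses-housing-2026-01-05-14-05-31-243731.csv",
        "path_to_input_data": "model_artifacts/tmp_store/housing/input_data_housing.pkl",
        "generated_reports": {
            "sample_report": "model_artifacts/resources/housing/reports/sample-report-2026_01_05_14_05_44_070491.html",
            "accuracy_report": "model_artifacts/resources/housing/reports/accuracy-report-2026_01_05_14_06_05_805637.html",
        },
    }
}

for table, reports in collect_report_paths(example_artifacts).items():
    for report_type, path in reports.items():
        print(f"{table}: {report_type} -> {path}")
```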

In [7]:
# Example: train multiple tables with relationships


launcher_for_multiple_tables.train(log_level="DEBUG")

2026-01-05 14:01:03.815 | DEBUG    | syngen.ml.validation_schema.validation_schema:validate_schema:348 - The schema of the metadata is valid
2026-01-05 14:01:03.817 | INFO     | syngen.ml.worker.worker:_remove_existed_artifact:104 - The artifacts located in the path - 'model_artifacts/resources/housing-properties/' was removed
2026-01-05 14:01:03.820 | INFO     | syngen.ml.config.validation:_collect_errors:435 - The validation of the metadata has been passed successfully
2026-01-05 14:01:03.820 | INFO     | syngen.train:launch_train:104 - The training process will be executed according to the information mentioned in 'train_settings' in the metadata file. If appropriate information is absent from the metadata file, then the values of parameters sent through CLI will be used.

2026-01-05 14:01:19.121 | INFO     | syngen.ml.vae.models.model:fit_sampler:187 - Creating BayesianGaussianMixture
2026-01-05 14:01:19.122 | INFO     | syngen.ml.vae.models.model:fit_sampler:189 - Fitting BayesianGaussianMixture
2026-01-05 14:01:20.183 | INFO     | syngen.ml.vae.models.model:fit_sampler:191 - Finished fitting BayesianGaussianMixture
2026-01-05 14:01:20.219 | INFO     | syngen.ml.vae.wrappers.wrappers:save_state:545 - Saved VAE state in model_artifacts/resources/housing-properties/vae/checkpoints
2026-01-05 14:01:20.219 | INFO     | syngen.ml.handlers.handlers:__fit_model:191 - Finished VAE training
2026-01-05 14:01:20.220 | INFO     | syngen.ml.handlers.handlers:handle:143

2026-01-05 14:01:35.951 | INFO     | syngen.ml.vae.models.model:fit_sampler:187 - Creating BayesianGaussianMixture
2026-01-05 14:01:35.952 | INFO     | syngen.ml.vae.models.model:fit_sampler:189 - Fitting BayesianGaussianMixture
2026-01-05 14:01:37.258 | INFO     | syngen.ml.vae.models.model:fit_sampler:191 - Finished fitting BayesianGaussianMixture
2026-01-05 14:01:37.285 | INFO     | syngen.ml.vae.wrappers.wrappers:save_state:545 - Saved VAE state in model_artifacts/resources/housing-conditions/vae/checkpoints
2026-01-05 14:01:37.286 | INFO     | syngen.ml.handlers.handlers:__fit_model:191 - Finished VAE training
2026-01-05 14:01:37.286 | INFO     | syngen.ml.handlers.handlers:handle:143

2026-01-05 14:01:37.633 | INFO     | syngen.ml.vae.wrappers.wrappers:load_state:554 - Loaded VAE state from model_artifacts/resources/housing-properties/vae/checkpoints
2026-01-05 14:01:37.634 | DEBUG    | syngen.ml.handlers.handlers:handle:490 - Infer model with parameters: size=790, run_parallel=False, batch_size=790, random_seed=1
2026-01-05 14:01:37.634 | INFO     | syngen.ml.handlers.handlers:handle:491 - Total of 1 batch(es)
2026-01-05 14:01:37.634 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing_properties'. Generating the batch 1 of 1
2026-01-05 14:01:37.634 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:01:37.635 | I

2026-01-05 14:01:38.043 | INFO     | syngen.ml.vae.wrappers.wrappers:load_state:554 - Loaded VAE state from model_artifacts/resources/housing-conditions/vae/checkpoints
2026-01-05 14:01:38.043 | DEBUG    | syngen.ml.handlers.handlers:handle:490 - Infer model with parameters: size=1799, run_parallel=False, batch_size=1000, random_seed=1
2026-01-05 14:01:38.044 | INFO     | syngen.ml.handlers.handlers:handle:491 - Total of 2 batch(es)
2026-01-05 14:01:38.044 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing_conditions'. Generating the batch 1 of 2
2026-01-05 14:01:38.044 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:01:38.044 |

 1/25 [>.............................] - ETA: 0s

2026-01-05 14:01:38.187 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing_conditions'. Generating the batch 2 of 2
2026-01-05 14:01:38.187 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:01:38.188 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing_conditions' started.

Generation of the data...: 100%|██████████| 3/3 [00:00<00:00, 3974.39it/s]
2026-01-05 14:01:38.280 | INFO     | syngen.ml.handlers.handlers:generate_keys:428 - The 'households' assigned as a foreign_key feature
2026-01-05 14:01:38.302 | INFO     | syngen.ml.strategies.strategies:run:236 - Synthesis of the table - 'housing_conditions' was completed. Synthetic data saved in 'model_artifacts/tmp_store/housing-conditions/merged_infer_housing-conditions.csv'
2026-01-05 14:01:38.303 | INFO     | syngen.ml.reporters.reporters:_log_and_update_progress:287 - The calculation of sample metrics for the table - 'housing_conditions' has started
2026-01-05 14:01:39.129 | INFO     | syngen.ml.reporters.reporters:_log_and_update_progress:287 - The sample report of the table - 'housing_conditions' has b

The *'execution_artifacts'* attribute of the **Syngen** class provides information about the artifacts generated during the training process:

In [8]:
from pprint import pprint

pprint(launcher_for_multiple_tables.execution_artifacts)

{'housing_conditions': {'generated_reports': {'accuracy_report': 'model_artifacts/resources/housing-conditions/reports/accuracy-report-2026_01_05_14_01_43_779222.html',
                                              'sample_report': 'model_artifacts/resources/housing-conditions/reports/sample-report-2026_01_05_14_01_39_128961.html'},
                        'losses_path': 'model_artifacts/system_store/losses/losses-housing-conditions-2026-01-05-14-01-20-232256.csv',
                        'path_to_input_data': 'model_artifacts/tmp_store/housing-conditions/input_data_housing-conditions.pkl'},
 'housing_properties': {'generated_reports': {'accuracy_report': 'model_artifacts/resources/housing-properties/reports/accuracy-report-2026_01_05_14_01_53_397026.html',
                                              'sample_report': 'model_artifacts/resources/housing-properties/reports/sample-report-2026_01_05_14_01_44_986258.html'},
                        'losses_path': 'model_artifacts/system_sto

# Launch generation of synthetic data

You can start an inference process using the SDK entrypoint `Syngen().infer(...)`. The SDK mirrors the CLI options so you can pass the same parameters programmatically. Below is a complete description of all available parameters:

```python
infer(
    self,
    size: int = 100,                      # the desired number of rows to generate
    run_parallel: bool = False,           # whether to use multiprocessing (feasible for tables > 50000 rows)
    batch_size: Optional[int] = None,     # an inference batch size
    random_seed: Optional[int] = None,    # if specified, generates a reproducible result
    reports: Union[str, List[str]] = "none",  # report types: "none", "accuracy", "metrics_only", "all"
    log_level: str = "INFO",              # a logging level
    fernet_key: Optional[str] = None      # a name of the environment variable containing the Fernet key for decrypting the data subset
)
```
### Parameters description:

- **`size`** *(int, default: 100)*: The desired number of synthetic rows to generate. Must be ≥ 1.

- **`run_parallel`** *(bool, default: False)*: Whether to use multiprocessing for data generation. Set to `True` to enable parallel processing, which is recommended when generating large tables (more than 50,000 rows).

- **`batch_size`** *(Optional[int], default: None)*: The inference batch size. Must be ≥ 1. If specified, the generation is split into batches to reduce memory usage.

- **`random_seed`** *(Optional[int], default: None)*: A random seed for reproducible generation. Must be ≥ 0.

- **`reports`** *(Union[str, Tuple[str], List[str]], default: "none")*: Controls the generation of quality reports. Accepts a single string or a list of strings:
  - `"none"` - no reports generated (default)
  - `"accuracy"` - generates an accuracy report comparing original and synthetic data patterns to verify the quality of the generated data
  - `"metrics_only"` - outputs metrics to stdout without generating an accuracy report
  - `"all"` - generates an accuracy report (same as `"accuracy"`)

  List example: `["accuracy", "metrics_only"]` to generate multiple report types

  *Note*: Report generation may require significant time for large generated tables (>10,000 rows)

- **`log_level`** *(str, default: "INFO")*: A logging level for the inference process. Accepted values: `"TRACE"`, `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"`, `"CRITICAL"`.

- **`fernet_key`** *(Optional[str], default: None)*: The name of the environment variable containing a 44-character URL-safe base64-encoded Fernet key. When provided, the data subset is decrypted for report generation. **Important**: The same key used during training must be used during report generation to successfully decrypt the data.


*Note:* For full documentation, metadata file format, and additional details, please refer to [README.md](../README.md)
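To get a feel for how `size` and `batch_size` interact, the number of batches can be estimated as the ceiling of `size / batch_size`. This matches the log lines in this notebook ("Total of 20 batch(es)" for `size=100000`, `batch_size=5000`; "Total of 2 batch(es)" for `size=1799`, `batch_size=1000`), but the helper below is a hypothetical sketch rather than the library's own logic, and treating `batch_size=None` as a single batch is an assumption.

```python
import math
from typing import Optional


def expected_batches(size: int, batch_size: Optional[int] = None) -> int:
    """Estimate how many batches an inference run is split into."""
    if size < 1:
        raise ValueError("'size' must be >= 1")
    if batch_size is None:
        # Assumption: without an explicit batch size, everything is generated in one batch.
        return 1
    if batch_size < 1:
        raise ValueError("'batch_size' must be >= 1")
    return math.ceil(size / batch_size)


print(expected_batches(100000, 5000))  # 20
print(expected_batches(1799, 1000))    # 2
```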

In [15]:
# Example: inference for a single table

launcher_for_single_table_with_source.infer(
    size=100000,
    run_parallel=False,
    batch_size=5000,
    random_seed=42,
    reports="all"
)

2026-01-05 14:18:24.290 | INFO     | syngen.ml.worker.worker:_remove_existed_artifact:104 - The artifacts located in the path - 'model_artifacts/tmp_store/housing/merged_infer_housing.csv' was removed
2026-01-05 14:18:24.291 | INFO     | syngen.ml.worker.worker:_remove_existed_artifact:104 - The artifacts located in the path - 'model_artifacts/tmp_store/housing/infer_message.success' was removed
2026-01-05 14:18:24.291 | INFO     | syngen.ml.config.validation:_collect_errors:435 - The validation of the metadata has been passed successfully
2026-01-05 14:18:24.292 | INFO     | syngen.infer:launch_infer:74 - The inference process will be executed according to the information mentioned in 'infer_settings' in the metadata file. If appropriate information is absent from the metadata file, t

  1/157 [..............................] - ETA: 21s

2026-01-05 14:18:24.777 | INFO     | syngen.ml.vae.wrappers.wrappers:load_state:554 - Loaded VAE state from model_artifacts/resources/housing/vae/checkpoints
2026-01-05 14:18:24.777 | INFO     | syngen.ml.handlers.handlers:handle:491 - Total of 20 batch(es)
2026-01-05 14:18:24.778 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 1 of 20
2026-01-05 14:18:24.778 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:18:24.778 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4564.44it/s]

2026-01-05 14:18:25.380 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:18:25.380 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 2 of 20
2026-01-05 14:18:25.380 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:18:25.381 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4705.97it/s]

2026-01-05 14:18:25.832 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:18:25.833 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 3 of 20
2026-01-05 14:18:25.833 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:18:25.833 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4631.80it/s]




[32m2026-01-05 14:18:26.296[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:26.296[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 4 of 20[0m
[32m2026-01-05 14:18:26.297[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:26.297[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4833.67it/s]




[32m2026-01-05 14:18:26.750[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:26.751[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 5 of 20[0m
[32m2026-01-05 14:18:26.751[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:26.751[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4955.14it/s]




[32m2026-01-05 14:18:27.200[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:27.200[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 6 of 20[0m
[32m2026-01-05 14:18:27.200[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:27.201[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4673.56it/s]




[32m2026-01-05 14:18:27.652[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:27.652[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 7 of 20[0m
[32m2026-01-05 14:18:27.652[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:27.652[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5041.23it/s]




[32m2026-01-05 14:18:28.105[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:28.105[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 8 of 20[0m
[32m2026-01-05 14:18:28.105[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:28.105[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4792.00it/s]




[32m2026-01-05 14:18:28.556[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:28.556[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 9 of 20[0m
[32m2026-01-05 14:18:28.556[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:28.557[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4343.97it/s]




[32m2026-01-05 14:18:29.024[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:29.025[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 10 of 20[0m
[32m2026-01-05 14:18:29.025[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:29.025[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5061.14it/s]




[32m2026-01-05 14:18:29.477[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:29.478[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 11 of 20[0m
[32m2026-01-05 14:18:29.478[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:29.478[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4957.27it/s]




[32m2026-01-05 14:18:29.930[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:29.930[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 12 of 20[0m
[32m2026-01-05 14:18:29.930[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:29.930[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4731.55it/s]




[32m2026-01-05 14:18:30.391[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:30.391[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 13 of 20[0m
[32m2026-01-05 14:18:30.391[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:30.392[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4973.30it/s]




[32m2026-01-05 14:18:30.857[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:30.857[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 14 of 20[0m
[32m2026-01-05 14:18:30.857[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:30.857[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4665.05it/s]




[32m2026-01-05 14:18:31.315[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:31.315[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 15 of 20[0m
[32m2026-01-05 14:18:31.315[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:31.316[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4906.14it/s]




[32m2026-01-05 14:18:31.769[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:31.769[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 16 of 20[0m
[32m2026-01-05 14:18:31.769[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:31.770[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4971.70it/s]




[32m2026-01-05 14:18:32.217[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:32.217[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 17 of 20[0m
[32m2026-01-05 14:18:32.217[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:32.218[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5031.88it/s]




[32m2026-01-05 14:18:32.678[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:32.679[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 18 of 20[0m
[32m2026-01-05 14:18:32.679[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:32.679[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4805.97it/s]




[32m2026-01-05 14:18:33.146[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:33.147[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 19 of 20[0m
[32m2026-01-05 14:18:33.147[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:33.147[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4961.00it/s]




[32m2026-01-05 14:18:33.595[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:33.595[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 20 of 20[0m
[32m2026-01-05 14:18:33.596[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:18:33.596[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4457.71it/s]
[32m2026-01-05 14:18:34.078[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:18:34.488[0m | [1mINFO    [0m | [36msyngen.ml.strategies.strategies[0m:[36mrun[0m:[36m236[0m - [1mSynthesis of the table - 'housing' was completed. Synthetic data saved in 'model_artifacts/tmp_store/housing/merged_infer_housing.csv'[0m
[32m2026-01-05 14:18:34.489[0m | [1mINFO    [0m | [36msyngen.ml.reporters.reporters[0m:[36m_log_and_update_progress[0m:[36m287[0m - [1mThe calculation of accuracy metrics for the table - 'housing' has started[0m
[32m2026-01-05 14:18:35.911[0m | [1mINFO    [0m | [36msyngen.ml.metrics.accuracy_test.accuracy_test[0m:[36m_fetch_metrics[0m:[36m193[0m - [1mMedian accuracy is 0.8454[0m
Generating bivariate distributions...: 100%|
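
The log above reports `Total of 20 batch(es)` for `size=100000` and `batch_size=5000`. The arithmetic behind that count can be sketched as follows. This is only an illustration: the helper `batch_sizes` is hypothetical, and it assumes that rows are split into full batches with one smaller final batch when `size` is not a multiple of `batch_size`. The library's internal splitting may differ in detail.

```python
import math


def batch_sizes(size: int, batch_size: int) -> list[int]:
    """Split `size` rows into batches of at most `batch_size` rows.

    Hypothetical helper: assumes full batches followed by one smaller
    final batch when `size` is not a multiple of `batch_size`.
    """
    if size <= 0 or batch_size <= 0:
        raise ValueError("size and batch_size must be positive")
    full, remainder = divmod(size, batch_size)
    sizes = [batch_size] * full
    if remainder:
        sizes.append(remainder)
    # The number of batches equals ceil(size / batch_size).
    assert len(sizes) == math.ceil(size / batch_size)
    return sizes


# Values taken from the infer() call in this notebook.
batches = batch_sizes(size=100_000, batch_size=5_000)
print(len(batches))
```

A smaller `batch_size` lowers the number of rows generated at once at the cost of more batches; a larger one does the opposite.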

In [16]:
launcher_for_single_table_with_source.execution_artifacts

{'housing': {'path_to_input_data': 'model_artifacts/tmp_store/housing/input_data_housing.pkl',
  'path_to_generated_data': 'model_artifacts/tmp_store/housing/merged_infer_housing.csv',
  'generated_reports': {'accuracy_report': 'model_artifacts/tmp_store/housing/reports/accuracy-report-2026_01_05_14_19_03_500677.html'}}}
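
The dictionary shown above maps each table name to the paths of its artifacts. A small sketch of how the generated-data paths could be pulled out of it is given below. The function `generated_data_paths` is a hypothetical helper, not part of the SDK, and the `artifacts` dictionary is copied from the output above rather than read from a live launcher.

```python
from pathlib import Path


def generated_data_paths(execution_artifacts: dict) -> dict[str, Path]:
    """Map each table name to the path of its generated (synthetic) data.

    Hypothetical helper: tables without a 'path_to_generated_data'
    entry are skipped.
    """
    return {
        table: Path(info["path_to_generated_data"])
        for table, info in execution_artifacts.items()
        if "path_to_generated_data" in info
    }


# Copied from the `execution_artifacts` output shown in this notebook.
artifacts = {
    "housing": {
        "path_to_input_data": "model_artifacts/tmp_store/housing/input_data_housing.pkl",
        "path_to_generated_data": "model_artifacts/tmp_store/housing/merged_infer_housing.csv",
        "generated_reports": {
            "accuracy_report": "model_artifacts/tmp_store/housing/reports/accuracy-report-2026_01_05_14_19_03_500677.html"
        },
    }
}

paths = generated_data_paths(artifacts)
print(paths["housing"])
```

In a live session, `launcher_for_single_table_with_source.execution_artifacts` would take the place of the copied dictionary, and each resulting path could then be opened with a CSV reader of choice.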

In [17]:
# Example of inference for a single table, using the launcher configured with a loader

launcher_for_single_table_with_loader.infer(
    size=100000,
    run_parallel=False,
    batch_size=5000,
    random_seed=42,
    reports="all"
)

2026-01-05 14:19:26.392 | INFO     | syngen.ml.worker.worker:_remove_existed_artifact:104 - The artifacts located in the path - 'model_artifacts/tmp_store/housing/merged_infer_housing.csv' was removed
2026-01-05 14:19:26.393 | INFO     | syngen.ml.worker.worker:_remove_existed_artifact:104 - The artifacts located in the path - 'model_artifacts/tmp_store/housing/infer_message.success' was removed
2026-01-05 14:19:26.394 | INFO     | syngen.ml.config.validation:_collect_errors:435 - The validation of the metadata has been passed successfully
2026-01-05 14:19:26.394 | INFO     | syngen.infer:launch_infer:74 - The inference process will be executed according to the information mentioned in 'infer_settings' in the metadata file. If appropriate information is absent from the metadata file, t

  1/157 [..............................] - ETA: 21s

2026-01-05 14:19:26.864 | INFO     | syngen.ml.vae.wrappers.wrappers:load_state:554 - Loaded VAE state from model_artifacts/resources/housing/vae/checkpoints
2026-01-05 14:19:26.864 | INFO     | syngen.ml.handlers.handlers:handle:491 - Total of 20 batch(es)
2026-01-05 14:19:26.864 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 1 of 20
2026-01-05 14:19:26.864 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:26.865 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 3877.09it/s]

2026-01-05 14:19:27.445 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:27.446 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 2 of 20
2026-01-05 14:19:27.446 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:27.446 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4068.55it/s]

2026-01-05 14:19:27.904 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:27.905 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 3 of 20
2026-01-05 14:19:27.905 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:27.905 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4310.69it/s]

2026-01-05 14:19:28.373 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:28.374 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 4 of 20
2026-01-05 14:19:28.374 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:28.374 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4269.60it/s]

2026-01-05 14:19:28.823 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:28.823 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 5 of 20
2026-01-05 14:19:28.823 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:28.823 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4563.54it/s]

2026-01-05 14:19:29.285 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:29.285 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 6 of 20
2026-01-05 14:19:29.285 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:29.286 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 3945.39it/s]

2026-01-05 14:19:29.741 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:29.741 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 7 of 20
2026-01-05 14:19:29.741 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:29.742 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4635.52it/s]

2026-01-05 14:19:30.199 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:30.200 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 8 of 20
2026-01-05 14:19:30.200 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:30.200 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4520.17it/s]

2026-01-05 14:19:30.660 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:30.661 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 9 of 20
2026-01-05 14:19:30.661 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:30.661 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4325.65it/s]

2026-01-05 14:19:31.128 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:31.129 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 10 of 20
2026-01-05 14:19:31.129 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:31.129 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4972.23it/s]

2026-01-05 14:19:31.587 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:31.587 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 11 of 20
2026-01-05 14:19:31.587 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:31.587 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 3612.95it/s]

2026-01-05 14:19:32.038 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:32.039 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 12 of 20
2026-01-05 14:19:32.039 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:32.039 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4518.84it/s]

2026-01-05 14:19:32.500 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:32.501 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 13 of 20
2026-01-05 14:19:32.501 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:32.501 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4858.61it/s]

2026-01-05 14:19:32.964 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:32.964 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 14 of 20
2026-01-05 14:19:32.964 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:32.964 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4772.66it/s]

2026-01-05 14:19:33.424 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:33.425 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 15 of 20
2026-01-05 14:19:33.425 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:33.425 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4664.11it/s]

2026-01-05 14:19:33.887 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:33.888 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 16 of 20
2026-01-05 14:19:33.888 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:33.888 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5006.22it/s]

2026-01-05 14:19:34.332 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:34.332 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 17 of 20
2026-01-05 14:19:34.332 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:34.332 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4956.21it/s]

2026-01-05 14:19:34.777 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:34.777 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 18 of 20
2026-01-05 14:19:34.777 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:34.778 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4446.54it/s]

2026-01-05 14:19:35.238 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:19:35.238 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing'. Generating the batch 19 of 20
2026-01-05 14:19:35.238 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:19:35.239 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4261.32it/s]




[32m2026-01-05 14:19:35.703[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:19:35.703[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing'. Generating the batch 20 of 20[0m
[32m2026-01-05 14:19:35.703[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:19:35.703[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4578.48it/s]
[32m2026-01-05 14:19:36.155[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:19:36.513[0m | [1mINFO    [0m | [36msyngen.ml.strategies.strategies[0m:[36mrun[0m:[36m236[0m - [1mSynthesis of the table - 'housing' was completed. Synthetic data saved in 'model_artifacts/tmp_store/housing/merged_infer_housing.csv'[0m
[32m2026-01-05 14:19:36.514[0m | [1mINFO    [0m | [36msyngen.ml.reporters.reporters[0m:[36m_log_and_update_progress[0m:[36m287[0m - [1mThe calculation of accuracy metrics for the table - 'housing' has started[0m
[32m2026-01-05 14:19:37.791[0m | [1mINFO    [0m | [36msyngen.ml.metrics.accuracy_test.accuracy_test[0m:[36m_fetch_metrics[0m:[36m193[0m - [1mMedian accuracy is 0.8454[0m
Generating bivariate distributions...: 100%|

In [18]:
launcher_for_single_table_with_loader.execution_artifacts

{'housing': {'path_to_input_data': 'model_artifacts/tmp_store/housing/input_data_housing.pkl',
  'path_to_generated_data': 'model_artifacts/tmp_store/housing/merged_infer_housing.csv',
  'generated_reports': {'accuracy_report': 'model_artifacts/tmp_store/housing/reports/accuracy-report-2026_01_05_14_20_04_759854.html'}}}

In [19]:
# Example: inference for multiple tables with relationships


launcher_for_multiple_tables.infer(log_level="DEBUG")

[32m2026-01-05 14:20:17.683[0m | [34m[1mDEBUG   [0m | [36msyngen.ml.validation_schema.validation_schema[0m:[36mvalidate_schema[0m:[36m348[0m - [34m[1mThe schema of the metadata is valid[0m
[32m2026-01-05 14:20:17.683[0m | [1mINFO    [0m | [36msyngen.ml.worker.worker[0m:[36m_remove_existed_artifact[0m:[36m104[0m - [1mThe artifacts located in the path - 'model_artifacts/tmp_store/housing-properties/merged_infer_housing-properties.csv' was removed[0m
[32m2026-01-05 14:20:17.683[0m | [1mINFO    [0m | [36msyngen.ml.worker.worker[0m:[36m_remove_existed_artifact[0m:[36m104[0m - [1mThe artifacts located in the path - 'model_artifacts/tmp_store/housing-properties/infer_message.success' was removed[0m
[32m2026-01-05 14:20:17.683[0m | [1mINFO    [0m | [36msyngen.ml.worker.worker[0m:[36m_remove_existed_artifact[0m:[36m104[0m - [1mThe artifacts located in the path - 'model_artifacts/tmp_store/housing-conditions/merged_infer_housing-conditions.csv' wa



[32m2026-01-05 14:20:18.042[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36mload_state[0m:[36m554[0m - [1mLoaded VAE state from model_artifacts/resources/housing-properties/vae/checkpoints[0m
[32m2026-01-05 14:20:18.043[0m | [34m[1mDEBUG   [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m490[0m - [34m[1mInfer model with parameters: size=90, run_parallel=False, batch_size=90, random_seed=10, reports - 'accuracy'[0m
[32m2026-01-05 14:20:18.043[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m491[0m - [1mTotal of 1 batch(es)[0m
[32m2026-01-05 14:20:18.043[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_properties'. Generating the batch 1 of 1[0m
[32m2026-01-05 14:20:18.043[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:



[32m2026-01-05 14:20:18.418[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36mload_state[0m:[36m554[0m - [1mLoaded VAE state from model_artifacts/resources/housing-conditions/vae/checkpoints[0m
[32m2026-01-05 14:20:18.419[0m | [34m[1mDEBUG   [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m490[0m - [34m[1mInfer model with parameters: size=90, run_parallel=False, batch_size=90, random_seed=10, reports - 'accuracy'[0m
[32m2026-01-05 14:20:18.419[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m491[0m - [1mTotal of 1 batch(es)[0m
[32m2026-01-05 14:20:18.419[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_conditions'. Generating the batch 1 of 1[0m
[32m2026-01-05 14:20:18.419[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:

The *'execution_artifacts'* attribute of the **Syngen** class provides information about the artifacts generated during the inference process:

In [20]:
pprint(launcher_for_multiple_tables.execution_artifacts)

{'housing_conditions': {'generated_reports': {'accuracy_report': 'model_artifacts/tmp_store/housing-conditions/reports/accuracy-report-2026_01_05_14_20_23_407937.html'},
                        'path_to_generated_data': 'model_artifacts/tmp_store/housing-conditions/merged_infer_housing-conditions.csv',
                        'path_to_input_data': 'model_artifacts/tmp_store/housing-conditions/input_data_housing-conditions.pkl'},
 'housing_properties': {'generated_reports': {'accuracy_report': 'model_artifacts/tmp_store/housing-properties/reports/accuracy-report-2026_01_05_14_20_31_359266.html'},
                        'path_to_generated_data': 'model_artifacts/tmp_store/housing-properties/merged_infer_housing-properties.csv',
                        'path_to_input_data': 'model_artifacts/tmp_store/housing-properties/input_data_housing-properties.pkl'}}


# Data security: Using Fernet Key for encryption

In the current implementation, a sample of the original data is stored on disk during training. To protect sensitive information, you can use a **Fernet key** to encrypt this data.

## What is a Fernet Key?

A Fernet key is a 44-character URL-safe base64-encoded string used for symmetric encryption. When a key is provided, the data subset is stored on disk encrypted, in the `.dat` format instead of the unencrypted `.pkl` format.
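To make the format concrete, the sketch below builds a string of the same shape using only the standard library: 32 random bytes, URL-safe base64-encoded, which gives 44 characters. The helper `looks_like_fernet_key` is a hypothetical name for illustration and is not part of Syngen or of the `cryptography` package. It checks only the shape of a key, not whether the key matches any encrypted data.

```python
import base64
import os


def looks_like_fernet_key(key: str) -> bool:
    """Return True if `key` has the shape of a Fernet key:
    44 URL-safe base64 characters that decode to exactly 32 bytes."""
    if len(key) != 44:
        return False
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except ValueError:
        # Covers both non-ASCII input and malformed base64
        return False
    return len(raw) == 32


# 32 random bytes encoded as URL-safe base64 give a 44-character string
sample_key = base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")
is_valid_shape = looks_like_fernet_key(sample_key)
```

This only illustrates the format. For real use, generate the key with the `cryptography` package.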

## How to Generate a Fernet Key

You can generate a Fernet key using the following code:

In [21]:
# Generate a Fernet key
from cryptography.fernet import Fernet

fernet_key = Fernet.generate_key().decode("utf-8")

## Setting the Fernet Key as an environment variable

After generating the key, you need to store it as an environment variable. This can be done in your terminal or programmatically in Python.

### Option 1: Set in Terminal (Linux/macOS)

```bash
export MY_FERNET_KEY='your_generated_fernet_key_here'
```

### Option 2: Set in Terminal (Windows)

```cmd
set MY_FERNET_KEY=your_generated_fernet_key_here
```

### Option 3: Set programmatically in Python

```python
import os
os.environ['MY_FERNET_KEY'] = 'your_generated_fernet_key_here'
```
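Because the SDK receives the name of the variable rather than the key itself, an unset variable or a typo in the name is easy to miss. A minimal sketch of a pre-flight check follows. `require_env_var` is a hypothetical helper, not a Syngen API, and the value set here is only a placeholder.

```python
import os


def require_env_var(name: str) -> str:
    """Return `name` if an environment variable with that name is set
    and non-empty; raise a RuntimeError otherwise."""
    if not os.environ.get(name):
        raise RuntimeError(
            f"Environment variable '{name}' is not set; "
            "set it before running training or inference."
        )
    return name


# setdefault keeps an existing value, so a real key is never overwritten
os.environ.setdefault("MY_FERNET_KEY", "your_generated_fernet_key_here")
env_var_name = require_env_var("MY_FERNET_KEY")
```

The returned name is what you would pass as the `fernet_key` argument.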

## Using the Fernet Key in training

When training with encryption, pass the name of the environment variable (not the Fernet key itself) to the `fernet_key` parameter:

In [22]:
# Example: training with Fernet key encryption

import os
from syngen.sdk import Syngen

# Step 1: Set the Fernet key as an environment variable
os.environ["MY_FERNET_KEY"] = fernet_key  # Using the key generated above

# Step 2: Train with encryption enabled

launcher_for_encrypted_data = Syngen(
    source="../examples/example-data/housing.csv", 
    table_name="housing_encrypted"
)

launcher_for_encrypted_data.train(
    epochs=5,
    row_limit=1000, 
    batch_size=32, 
    fernet_key="MY_FERNET_KEY"  # Pass the environment variable name, not the key itself
)

[32m2026-01-05 14:24:10.227[0m | [1mINFO    [0m | [36msyngen.ml.config.validation[0m:[36m_collect_errors[0m:[36m435[0m - [1mThe validation of the metadata has been passed successfully[0m
[32m2026-01-05 14:24:10.229[0m | [1mINFO    [0m | [36msyngen.train[0m:[36mlaunch_train[0m:[36m104[0m - [1mThe training process will be executed according to the information mentioned in 'train_settings' in the metadata file. If appropriate information is absent from the metadata file, then the values of parameters sent through CLI will be used. Otherwise, the values of parameters will be defaulted.[0m
[32m2026-01-05 14:24:10.301[0m | [1mINFO    [0m | [36msyngen.ml.processors.processors[0m:[36m_preprocess_data[0m:[36m150[0m - [1mThe subset of rows was set to 1000[0m
[32m2026-01-05 14:24:10.305[0m | [1mINFO    [0m | [36msyngen.ml.worker.worker[0m:[36m_train_table[0m:[36m432[0m - [1mTraining process of the table - 'housing_encrypted' has started[0m
[32m2026



[32m2026-01-05 14:24:27.599[0m | [1mINFO    [0m | [36msyngen.ml.vae.models.model[0m:[36mfit_sampler[0m:[36m187[0m - [1mCreating BayesianGaussianMixture[0m
[32m2026-01-05 14:24:27.600[0m | [1mINFO    [0m | [36msyngen.ml.vae.models.model[0m:[36mfit_sampler[0m:[36m189[0m - [1mFitting BayesianGaussianMixture[0m
[32m2026-01-05 14:24:29.227[0m | [1mINFO    [0m | [36msyngen.ml.vae.models.model[0m:[36mfit_sampler[0m:[36m191[0m - [1mFinished fitting BayesianGaussianMixture[0m
[32m2026-01-05 14:24:29.272[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36msave_state[0m:[36m545[0m - [1mSaved VAE state in model_artifacts/resources/housing-encrypted/vae/checkpoints[0m
[32m2026-01-05 14:24:29.272[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36m__fit_model[0m:[36m191[0m - [1mFinished VAE training[0m
[32m2026-01-05 14:24:29.272[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m143[0

In [23]:
launcher_for_encrypted_data.execution_artifacts

{'housing_encrypted': {'losses_path': 'model_artifacts/system_store/losses/losses-housing-encrypted-2026-01-05-14-24-10-306201.csv',
  'path_to_input_data': 'model_artifacts/tmp_store/housing-encrypted/input_data_housing-encrypted.dat',
  'generated_reports': {}}}

## Using the Fernet Key in inference

**Important**: When generating synthetic data during inference, you must use the **same Fernet key** that was used during training. This allows the system to decrypt the stored data subset for report generation.

If the Fernet key is not provided or doesn't match the key used in training, the inference process will fail when it tries to access the encrypted data.
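One way to catch a key mismatch before starting a long run is to record a fingerprint (a SHA-256 digest) of the key at training time and compare it before inference. This is only a sketch under stated assumptions: the helpers `key_fingerprint` and `matches_training_key` are hypothetical, Syngen is not known to provide such a check, and the variable `DEMO_FERNET_KEY` with its placeholder values exists only for this demonstration.

```python
import hashlib
import hmac
import os


def key_fingerprint(env_var_name: str) -> str:
    """Return the SHA-256 hex digest of the key held in the environment variable."""
    key = os.environ[env_var_name]
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def matches_training_key(env_var_name: str, expected_fingerprint: str) -> bool:
    """Compare the current key's fingerprint with the one recorded at training time."""
    return hmac.compare_digest(key_fingerprint(env_var_name), expected_fingerprint)


# Demonstration with throwaway placeholder values (not real Fernet keys)
os.environ["DEMO_FERNET_KEY"] = "key-used-for-training"
training_fingerprint = key_fingerprint("DEMO_FERNET_KEY")
same_key_ok = matches_training_key("DEMO_FERNET_KEY", training_fingerprint)

# Simulate a different key being present at inference time
os.environ["DEMO_FERNET_KEY"] = "a-different-key"
different_key_ok = matches_training_key("DEMO_FERNET_KEY", training_fingerprint)
```

The fingerprint reveals nothing practical about a randomly generated key, so it can be stored next to the model artifacts, while the key itself stays in the environment.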

In [24]:
# Example: Inference with Fernet key decryption
# The environment variable 'MY_FERNET_KEY' is already set from the training step

# Inference with the same Fernet key
launcher_for_encrypted_data.infer(
    size=100000,
    batch_size=5000, 
    random_seed=42,
    reports="all",
    fernet_key="MY_FERNET_KEY"  # Must be the same key that was used in training
)

[32m2026-01-05 14:27:11.952[0m | [1mINFO    [0m | [36msyngen.ml.data_loaders.data_loaders[0m:[36mload_data[0m:[36m706[0m - [1mData stored at the path - 'model_artifacts/tmp_store/housing-encrypted/input_data_housing-encrypted.dat' has been successfully decrypted and loaded.[0m
[32m2026-01-05 14:27:11.953[0m | [1mINFO    [0m | [36msyngen.ml.config.validation[0m:[36m_collect_errors[0m:[36m435[0m - [1mThe validation of the metadata has been passed successfully[0m
[32m2026-01-05 14:27:11.953[0m | [1mINFO    [0m | [36msyngen.infer[0m:[36mlaunch_infer[0m:[36m74[0m - [1mThe inference process will be executed according to the information mentioned in 'infer_settings' in the metadata file. If appropriate information is absent from the metadata file, then the values of parameters sent through CLI will be used. Otherwise, the values of parameters will be defaulted.[0m
[32m2026-01-05 14:27:11.954[0m | [1mINFO    [0m | [36msyngen.ml.worker.worker[0m:[36m_i

  1/157 [..............................] - ETA: 21s

[32m2026-01-05 14:27:12.445[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36mload_state[0m:[36m554[0m - [1mLoaded VAE state from model_artifacts/resources/housing-encrypted/vae/checkpoints[0m
[32m2026-01-05 14:27:12.445[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m491[0m - [1mTotal of 20 batch(es)[0m
[32m2026-01-05 14:27:12.445[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 1 of 20[0m
[32m2026-01-05 14:27:12.446[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:12.446[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 3821.53it/s]




[32m2026-01-05 14:27:13.026[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:13.027[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 2 of 20[0m
[32m2026-01-05 14:27:13.027[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:13.027[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 3885.58it/s]




[32m2026-01-05 14:27:13.483[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:13.484[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 3 of 20[0m
[32m2026-01-05 14:27:13.484[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:13.484[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 3915.25it/s]




[32m2026-01-05 14:27:13.948[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:13.949[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 4 of 20[0m
[32m2026-01-05 14:27:13.949[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:13.949[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4446.97it/s]




[32m2026-01-05 14:27:14.402[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:14.402[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 5 of 20[0m
[32m2026-01-05 14:27:14.402[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:14.403[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4130.10it/s]




[32m2026-01-05 14:27:14.873[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:14.873[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 6 of 20[0m
[32m2026-01-05 14:27:14.873[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:14.874[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4569.86it/s]




[32m2026-01-05 14:27:15.326[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:15.327[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 7 of 20[0m
[32m2026-01-05 14:27:15.327[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:15.327[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4100.37it/s]




[32m2026-01-05 14:27:15.782[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:15.782[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 8 of 20[0m
[32m2026-01-05 14:27:15.782[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:15.783[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4194.69it/s]




[32m2026-01-05 14:27:16.246[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:16.246[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 9 of 20[0m
[32m2026-01-05 14:27:16.246[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:16.247[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4546.90it/s]




[32m2026-01-05 14:27:16.685[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:16.686[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 10 of 20[0m
[32m2026-01-05 14:27:16.686[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:16.686[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4531.71it/s]




[32m2026-01-05 14:27:17.149[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:17.149[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 11 of 20[0m
[32m2026-01-05 14:27:17.149[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:17.150[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4988.90it/s]




[32m2026-01-05 14:27:17.602[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:17.603[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 12 of 20[0m
[32m2026-01-05 14:27:17.603[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:17.603[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4790.50it/s]




[32m2026-01-05 14:27:18.056[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:18.057[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 13 of 20[0m
[32m2026-01-05 14:27:18.057[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:18.057[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5115.00it/s]




[32m2026-01-05 14:27:18.508[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:18.508[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 14 of 20[0m
[32m2026-01-05 14:27:18.508[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:18.509[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4974.38it/s]




[32m2026-01-05 14:27:18.963[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:18.964[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 15 of 20[0m
[32m2026-01-05 14:27:18.964[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:18.964[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5049.51it/s]




[32m2026-01-05 14:27:19.421[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:19.421[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 16 of 20[0m
[32m2026-01-05 14:27:19.421[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:19.421[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5029.14it/s]




[32m2026-01-05 14:27:19.885[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:19.885[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 17 of 20[0m
[32m2026-01-05 14:27:19.885[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:19.886[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5164.24it/s]




[32m2026-01-05 14:27:20.339[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
[32m2026-01-05 14:27:20.339[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mhandle[0m:[36m503[0m - [1mData synthesis for the table - 'housing_encrypted'. Generating the batch 18 of 20[0m
[32m2026-01-05 14:27:20.339[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun[0m:[36m348[0m - [1mStart data synthesis[0m
[32m2026-01-05 14:27:20.340[0m | [1mINFO    [0m | [36msyngen.ml.handlers.handlers[0m:[36mrun_separate[0m:[36m323[0m - [1mVAE generation for 'housing_encrypted' started.[0m




Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4982.43it/s]




[32m2026-01-05 14:27:20.801[0m | [1mINFO    [0m | [36msyngen.ml.vae.wrappers.wrappers[0m:[36m_restore_nan_values[0m:[36m135[0m - [1mColumn 'total_bedrooms' has 0 (0.0%) empty values generated[0m
2026-01-05 14:27:20.802 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing_encrypted'. Generating the batch 19 of 20
2026-01-05 14:27:20.802 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:27:20.802 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing_encrypted' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 5063.92it/s]

2026-01-05 14:27:21.258 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:27:21.259 | INFO     | syngen.ml.handlers.handlers:handle:503 - Data synthesis for the table - 'housing_encrypted'. Generating the batch 20 of 20
2026-01-05 14:27:21.259 | INFO     | syngen.ml.handlers.handlers:run:348 - Start data synthesis
2026-01-05 14:27:21.259 | INFO     | syngen.ml.handlers.handlers:run_separate:323 - VAE generation for 'housing_encrypted' started.

Generation of the data...: 100%|██████████| 11/11 [00:00<00:00, 4893.13it/s]
2026-01-05 14:27:21.722 | INFO     | syngen.ml.vae.wrappers.wrappers:_restore_nan_values:135 - Column 'total_bedrooms' has 0 (0.0%) empty values generated
2026-01-05 14:27:22.103 | INFO     | syngen.ml.strategies.strategies:run:236 - Synthesis of the table - 'housing_encrypted' was completed. Synthetic data saved in 'model_artifacts/tmp_store/housing-encrypted/merged_infer_housing-encrypted.csv'
2026-01-05 14:27:22.104 | INFO     | syngen.ml.reporters.reporters:_log_and_update_progress:287 - The calculation of accuracy metrics for the table - 'housing_encrypted' has started
2026-01-05 14:27:22.105 | INFO     | syngen.ml.data_loaders.data_loaders:load_data:706 - Data stored at the path - 'model_artifacts/tmp_st

In [25]:
launcher_for_encrypted_data.execution_artifacts

{'housing_encrypted': {'path_to_input_data': 'model_artifacts/tmp_store/housing-encrypted/input_data_housing-encrypted.dat',
  'path_to_generated_data': 'model_artifacts/tmp_store/housing-encrypted/merged_infer_housing-encrypted.csv',
  'generated_reports': {'accuracy_report': 'model_artifacts/tmp_store/housing-encrypted/reports/accuracy-report-2026_01_05_14_27_50_276239.html'}}}

## Using the Fernet Key with the metadata file

You can also specify the Fernet key in the metadata file, for both training and inference. The value is the name of the environment variable that holds the key, not the key itself:

```yaml
global:
  encryption:
    fernet_key: MY_FERNET_KEY  # Name of the environment variable

TABLE_NAME:
  train_settings:
    source: "./data/table.csv"
  
  infer_settings:
    size: 100
  
  # You can also specify per-table encryption
  encryption:
    fernet_key: MY_FERNET_KEY
```

Then use it in your code:

```python
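from syngen.sdk import Syngen

# The environment variable named in the metadata file (MY_FERNET_KEY) must be set beforehand

# Training with the metadata file and the Fernet key
Syngen(metadata_path="path/to/metadata.yaml").train()

# Inference with the metadata file and the Fernet key
Syngen(metadata_path="path/to/metadata.yaml").infer()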
```

## Important security notes

⚠️ **Critical security considerations:**

1. **Store the key securely**: Never hardcode the Fernet key in your code or commit it to a version control system.
2. **Key recovery is impossible**: If you lose the Fernet key, the encrypted data cannot be recovered.
3. **Same key required**: Always use the same Fernet key for training, inference, and report generation.
4. **Environment variable**: Use an environment variable to pass the Fernet key to the SDK.
5. **Key length**: The Fernet key must be exactly 44 characters long (32 bytes, URL-safe base64-encoded).
6. **Production environments**: In production, use a secure secret management service (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, etc.).
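
The length rule in item 5 follows from the key format: a Fernet key is 32 random bytes encoded with URL-safe base64, which always yields 44 characters. The sketch below generates such a key with the standard library only (it mirrors what `cryptography.fernet.Fernet.generate_key()` returns) and checks a candidate key before it is exported. The helper names `generate_fernet_key` and `is_valid_fernet_key` are illustrative and are not part of the SDK.

```python
import base64
import binascii
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes as a URL-safe base64 string (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check that the key is 44 characters long and decodes to exactly 32 bytes."""
    if len(key) != 44:
        return False
    try:
        return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32
    except (binascii.Error, ValueError):
        return False


key = generate_fernet_key()
assert is_valid_fernet_key(key)

# Expose the key through an environment variable; only the variable name
# ("MY_FERNET_KEY") is then passed to the SDK, never the key itself.
os.environ["MY_FERNET_KEY"] = key
```

Running the check before training gives an early, readable failure instead of an error deep inside the encryption step.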

## What happens without a Fernet key?

If you don't provide a `fernet_key` parameter:
- Data subset is stored **unencrypted** in the `.pkl` format
- No decryption is needed during the inference or the report generation
- Suitable for non-sensitive data or development environments

With a `fernet_key`:
- Data subset is stored **encrypted** in the `.dat` format
- Decryption is required during the inference or the report generation using the same Fernet key
- Recommended for sensitive or production data
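
The two storage formats can be told apart by file extension. As a small illustration, the hypothetical helper below (not part of the SDK) inspects a `path_to_input_data` value such as the one reported in `execution_artifacts` and tells whether a Fernet key is needed to read it.

```python
from pathlib import Path


def needs_fernet_key(path_to_input_data: str) -> bool:
    """Return True for an encrypted '.dat' subset, False for a plain '.pkl' one."""
    suffix = Path(path_to_input_data).suffix
    if suffix == ".dat":
        return True
    if suffix == ".pkl":
        return False
    raise ValueError(f"Unexpected data subset format: {suffix!r}")


# Path taken from the execution artifacts of the encrypted example
encrypted = "model_artifacts/tmp_store/housing-encrypted/input_data_housing-encrypted.dat"
print(needs_fernet_key(encrypted))  # True
```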

# Generate quality reports separately

Sometimes you may want to generate quality reports after training and/or inference has already completed, for example to evaluate the quality of the input data or of the generated data. The SDK provides the `Syngen().generate_quality_reports(...)` method, which generates quality reports for a table from existing artifacts without re-running training or inference.

This method is useful when:
- You completed training/inference without quality reports (with `reports="none"`)
- You want to generate additional report types later
- You want to separate the computation-intensive training/inference from report generation


```python
generate_quality_reports(
    self,
    table_name: str,                        # required: the name of the table to generate quality reports for
    reports: Union[str, Tuple[str], List[str]], # required: report types to generate
    fernet_key: Optional[str] = None        # optional: a Fernet key for decrypting encrypted data
)
```

### Parameters description:

- **`table_name`** *(str, required)*: The name of the table to generate reports for.

- **`reports`** *(Union[str, Tuple[str], List[str]], required)*: Controls which quality reports to generate. Accepts a single string, or a list or tuple of strings:
  - `"accuracy"` - generates an accuracy report comparing original and synthetic data
  - `"metrics_only"` - outputs metrics information to stdout without generating an accuracy report
  - `"sample"` - generates a sample report showing distribution comparisons between original data and the data subset used for a training process
  - `"all"` - generates all available reports (*"accuracy"* and *"sample"*)
  
  List example: `["accuracy", "sample"]` to generate multiple report types.

  *Note*: Report generation may take significant time for large tables (>10,000 rows).

- **`fernet_key`** *(Optional[str], default: None)*: The name of the environment variable containing the Fernet key used to decrypt the original data subset. **Important**: Must be the same key used during training if the data was encrypted.

### Required artifacts

To generate quality reports, the following artifacts must exist:

**For accuracy reports (`"accuracy"` or `"metrics_only"`):**
- Training must be completed successfully
- Inference must be completed successfully

**For a `"sample"` report:**
- Training must be completed successfully
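
The requirements above can be written down as a small lookup. The sketch below is purely illustrative: `REQUIRED_STAGES` and `required_stages` are hypothetical names, not SDK objects, and the stage labels `"train"` and `"infer"` are chosen here for readability.

```python
from typing import List, Set, Tuple, Union

# Stages that must have completed for each report type (from the lists above)
REQUIRED_STAGES = {
    "accuracy": {"train", "infer"},
    "metrics_only": {"train", "infer"},
    "sample": {"train"},
}


def required_stages(reports: Union[str, Tuple[str, ...], List[str]]) -> Set[str]:
    """Return the set of stages needed to produce the requested reports."""
    names = [reports] if isinstance(reports, str) else list(reports)
    stages: Set[str] = set()
    for name in names:
        if name == "all":  # "all" expands to "accuracy" and "sample"
            stages |= REQUIRED_STAGES["accuracy"] | REQUIRED_STAGES["sample"]
        elif name in REQUIRED_STAGES:
            stages |= REQUIRED_STAGES[name]
        else:
            raise ValueError(f"Unknown report type: {name!r}")
    return stages


print(sorted(required_stages("sample")))                # ['train']
print(sorted(required_stages(["accuracy", "sample"])))  # ['infer', 'train']
```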

### Key notes:

- The method uses existing artifacts and does not re-run training or inference.
- All required artifacts must be present in the `model_artifacts` directory.
- The `table_name` must exactly match the table name used during training/inference.
- If the data was encrypted during training, the same `fernet_key` must be provided.

*Note:* For full documentation and additional details, please refer to [README.md](../README.md)

In [26]:
# Example 1: Generate an accuracy report after training and inference completed without quality reports.
# Assume training and inference were already completed with reports="none".
# Now generate the accuracy report separately.

launcher_for_multiple_tables.generate_quality_reports(
    table_name="housing_conditions",
    reports="accuracy",
    log_level="DEBUG"
)

2026-01-05 14:39:44.321 | INFO     | syngen.ml.reporters.reporters:_log_and_update_progress:287 - The calculation of accuracy metrics for the table - 'housing_conditions' has started
2026-01-05 14:39:44.485 | INFO     | syngen.ml.metrics.accuracy_test.accuracy_test:_fetch_metrics:193 - Median accuracy is 0.8853
Generating bivariate distributions...: 100%|██████████| 3/3 [00:02<00:00,  1.31it/s]
2026-01-05 14:39:47.678 | INFO     | syngen.ml.metrics.accuracy_test.accuracy_test:_fetch_metrics:221 - Median of differences of correlations is 0
2026-01-05 14:39:47.828 | INFO     | syngen.ml.metrics.accuracy_test.accuracy_test:_fetch_metrics:229 - Mean clusters homogeneity is 0.1833
2026-01-05 14:39:47.829 | INFO     | syngen.ml.metrics.metrics_classes.metrics:

In [27]:
launcher_for_multiple_tables.execution_artifacts

{'housing_conditions': {'generated_reports': {'accuracy_report': 'model_artifacts/tmp_store/housing-conditions/reports/accuracy-report-2026_01_05_14_39_48_565198.html'}}}

In [28]:
# Example 2: Generate a sample report after training completed

# Generate a sample report to compare original data with its subset
launcher_for_multiple_tables.generate_quality_reports(
    table_name="housing_properties",
    reports="sample"
)

AttributeError: 'TrainConfig' object has no attribute 'loader'

In [None]:
# Example 3: Generate multiple reports at once

# Generate both accuracy and sample reports
launcher_for_multiple_tables.generate_quality_reports(
    table_name="housing_conditions",
    reports=["accuracy", "sample"]
)

# Or use "all" to generate all available reports
launcher_for_multiple_tables.generate_quality_reports(
    table_name="housing_conditions",
    reports="all"
)

In [None]:
# Example 4: Generate quality reports for encrypted data
import os

# Ensure the Fernet key environment variable is set
# (Should be the same key used during training)
os.environ['MY_FERNET_KEY'] = fernet_key

# Generate quality reports with decryption
launcher_for_encrypted_data.generate_quality_reports(
    table_name="housing_encrypted",
    reports="accuracy",
    fernet_key="MY_FERNET_KEY"  # Same key used in training
)

In [None]:
# Example 5: Generate the "metrics_only" report (without a full accuracy report)
# Output metrics to stdout


launcher_for_multiple_tables.generate_quality_reports(
    table_name="housing",
    reports="metrics_only"
)

# Using Custom Data Loaders

The `loader` parameter allows you to load data from non-standard sources such as databases, APIs, cloud storage, or any other custom data source. This is particularly useful when:
- Your data is stored in a database (PostgreSQL, MySQL, MongoDB, etc.)
- You need to fetch data from an API or web service
- Your data is in cloud storage (AWS S3, Azure Blob, Google Cloud Storage)
- You need to apply custom preprocessing before training/inference
- The file format is not directly supported

## Custom Loader Requirements

A custom loader function must meet these requirements:

1. **Accept `table_name` as the first parameter** (str): The name of the table to load
2. **Return a pandas DataFrame**: The loaded data in DataFrame format

## Function Signature

```python
def custom_loader(table_name: str) -> pd.DataFrame:
    """
    Load data for the specified table.
    
    Parameters
    ----------
    table_name : str
        The name of the table to load
    
    Returns
    -------
    pd.DataFrame
        The loaded data as a pandas DataFrame
    """
    # Your loading logic here
    return dataframe
```
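
Before handing a function to the SDK, its signature can be checked against the first requirement. The following is a minimal sketch using only the standard library; `check_loader_signature` is a hypothetical helper, and it assumes that the first parameter is literally named `table_name` and that the loader is called with the table name only, so any further parameters need default values.

```python
import inspect
from typing import Callable


def check_loader_signature(loader: Callable) -> None:
    """Raise TypeError unless `loader` can be called as loader(table_name)."""
    params = list(inspect.signature(loader).parameters.values())
    if not params or params[0].name != "table_name":
        raise TypeError(
            f"'{loader.__name__}' must accept 'table_name' as its first parameter"
        )
    # Assumption: the loader receives only the table name, so every other
    # regular parameter must carry a default value.
    for param in params[1:]:
        is_regular = param.kind not in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        )
        if is_regular and param.default is inspect.Parameter.empty:
            raise TypeError(
                f"Parameter '{param.name}' of '{loader.__name__}' needs a default value"
            )


def good_loader(table_name: str, encoding: str = "utf-8"):
    """Stub with a valid signature; a real loader would return a DataFrame."""


def bad_loader(path: str):
    """Stub whose first parameter has the wrong name."""


check_loader_signature(good_loader)  # passes silently
```

The second requirement (returning a pandas DataFrame) can only be verified by actually calling the loader, so it is not covered here.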

## Example: Basic Custom Loader

The repository includes a working example in [examples/example-package/custom_loader.py](../examples/example-package/custom_loader.py):

```python
# File: examples/example-package/custom_loader.py
from pathlib import Path
from typing import Optional

import pandas as pd


def get_dataframe(table_name: str, encoding: Optional[str] = "utf-8") -> pd.DataFrame:
    """
    Load a CSV file as a pandas DataFrame from the example-data directory.
    
    Parameters
    ----------
    table_name : str
        The name of the table (CSV file without extension) to load
    encoding : Optional[str], default="utf-8"
        The encoding to use when reading the CSV file
    
    Returns
    -------
    pd.DataFrame
        The loaded data as a pandas DataFrame
    """
    # Resolve the "example-data" directory relative to this file
    current_dir = Path(__file__).resolve().parent
    path_to_example_data = current_dir / "example-data" / f"{table_name}.csv"
    
    if not path_to_example_data.exists():
        raise FileNotFoundError(f"The CSV file '{table_name}.csv' does not exist")
    
    return pd.read_csv(path_to_example_data, encoding=encoding)
```

## How to Use the Custom Loader

### Step 1: Ensure the loader module is importable

Make sure the module that defines your loader is importable, for example by installing it as a package or by adding its directory to `sys.path`.

### Step 2: Use the loader during the initialization of the class Syngen

```python
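import sys

from syngen.sdk import Syngen

# Make custom_loader.py importable; adjust the path to your project layout
sys.path.append("../examples/example-package")
from custom_loader import get_dataframe

launcher = Syngen(loader=get_dataframe, table_name="housing")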

launcher.train(
    epochs=5,
    row_limit=1000,
    batch_size=32
)

launcher.infer(
    size=10000,
    batch_size=5000,
    reports="accuracy"
)
```

## Advanced Examples

### Example: Database Loader

```python
# File: my_loaders/db_loader.py
import pandas as pd
from sqlalchemy import create_engine

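def load_from_postgres(table_name: str) -> pd.DataFrame:
    """Load a whole table from a PostgreSQL database."""
    # "TEST_CONNECTION_STRING" is a placeholder; replace it with a real SQLAlchemy URL
    engine = create_engine("TEST_CONNECTION_STRING")
    # The table name is interpolated into the SQL text, so pass only trusted names
    query = f"SELECT * FROM {table_name}"
    return pd.read_sql(query, engine)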
```

Usage:
```python
Syngen(loader=load_from_postgres, table_name="test_table")
```
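
Because the query in `load_from_postgres` is built with string interpolation, the value of `table_name` ends up in the SQL text verbatim. A minimal sketch of a guard that could run before the query is built, assuming table names are plain (optionally schema-qualified) identifiers; `validate_table_name` is a hypothetical helper, not an SDK or SQLAlchemy function:

```python
import re

# Letters, digits and underscores, optionally prefixed with a schema name
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def validate_table_name(table_name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise ValueError."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"Unsafe table name: {table_name!r}")
    return table_name


query = f"SELECT * FROM {validate_table_name('test_table')}"
print(query)  # SELECT * FROM test_table
```

Names that need quoting (spaces, mixed case preserved by the database, and so on) would be rejected by this pattern and require a different approach.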

### Example: API Loader

```python
# File: my_loaders/api_loader.py
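import pandas as pd
import requests

# Placeholder base URL; replace it with the address of your service
API_URL = "https://example.com/api"


def load_from_api(table_name: str) -> pd.DataFrame:
    """Load data from a REST API that returns a JSON array of records."""
    response = requests.get(f"{API_URL}/{table_name}", timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    data = response.json()
    return pd.DataFrame(data)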
```

### Example: Cloud Storage Loader

```python
# File: my_loaders/cloud_loader.py
from io import StringIO

import boto3
import pandas as pd

# Placeholder bucket name; replace it with your own bucket
BUCKET_NAME = "my-bucket"


def load_from_s3(table_name: str) -> pd.DataFrame:
    """Load the CSV object '<table_name>.csv' from an AWS S3 bucket."""
    s3_client = boto3.client("s3")
    obj = s3_client.get_object(Bucket=BUCKET_NAME, Key=f"{table_name}.csv")
    data = obj["Body"].read().decode("utf-8")
    return pd.read_csv(StringIO(data))
```

# Data loading and saving: DataIO class

The SDK provides the `DataIO` class for loading and saving data in various supported formats with optional encryption and format settings. This class is useful when you need to:
- Load and save data in different file formats (*CSV*, *Avro*, *Excel*, etc.)
- Load and save data with specific format settings
- Work with encrypted data files

## Class initialization

```python
DataIO(
    path: str,                          # required: a path to the data file
    fernet_key: Optional[str] = None,   # optional: a Fernet key for encrypted data
    **kwargs                            # optional: format settings for CSV or Excel tables, or a schema for an Avro file
)
```

### Parameters description:

- **`path`** *(str, required)*: the path to the data file to load or save. Supported formats include:
  - CSV files: `.csv`, `.psv`, `.tsv`, `.txt`
  - Avro files: `.avro`
  - Excel files: `.xls`, `.xlsx`

- **`fernet_key`** *(Optional[str], default: None)*: the name of the environment variable containing the Fernet key for encrypted data operations.

- **`**kwargs`**: Optional format settings and/or a schema for reading and writing data. Available parameters depend on the file format:

  **For tables in '.csv', '.psv', '.tsv', '.txt' formats:**
  - `sep` *(str)*: Delimiter to use (e.g., `','`, `';'`, `'\t'`)
  - `quotechar` *(str)*: Character used to denote the start and end of a quoted item (default: `'"'`)
  - `quoting` *(str)*: Quoting behavior - `"all"`, `"minimal"`, `"non-numeric"`, `"none"`
  - `escapechar` *(str)*: Character used to escape other characters
  - `encoding` *(str)*: Encoding to use (e.g., `'utf-8'`, `'latin-1'`)
  - `header` *(Optional[Union[int, List[int], Literal["infer"]]])*: Row number(s) containing column labels and marking the start of the data
  - `skiprows` *(Optional[Union[int, List[int]]])*: Lines to skip at the start of the file
  - `on_bad_lines` *(Literal["error", "warn", "skip"])*: Action to take on malformed lines
  - `engine` *(Optional[Literal["c", "python"]])*: Parser engine to use
  - `na_values` *(Optional[List[str]])*: Additional strings to recognize as NA/NaN

  **For Excel formats (.xls, .xlsx):**
  - `sheet_name` *(Optional[Union[str, int, List[Union[int, str]]]])*: Name or index of the sheet(s) to read
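
To make the CSV settings concrete, the sketch below uses Python's standard `csv` module rather than the SDK: it writes a small semicolon-delimited file with minimal quoting and reads it back with the same delimiter and quote character. It assumes that `sep`, `quotechar` and `quoting` carry their usual meaning (as in `pandas.read_csv`); how `DataIO` applies them internally is not shown here.

```python
import csv
import io

rows = [["id", "comment"], ["1", "plain text"], ["2", 'contains ; and "quotes"']]

# Write with ';' as the delimiter; only fields that need it are quoted ("minimal")
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
print(buffer.getvalue())

# Read back with the same settings; a mismatched delimiter would split fields incorrectly
buffer.seek(0)
restored = list(csv.reader(buffer, delimiter=";", quotechar='"'))
assert restored == rows
```

Only the last field is quoted on output, because it is the only one that contains the delimiter or the quote character.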

### Available methods

`load_data(**kwargs)`

Loads data from the specified file path and returns it as a pandas DataFrame.

`load_schema()`

Returns the original schema of the loaded data, including column names and data types. Available only for data in the *'.avro'* format.

`save_data(df, **kwargs)`

Saves a pandas DataFrame to the specified file path, applying the configured format settings or schema if any were provided.

*Note:* For full documentation and additional details, please refer to [README.md](../README.md)

In [None]:
# Example 1: Load CSV data with default settings

from syngen.sdk import DataIO

data_io = DataIO(path="../examples/example-data/housing.csv")

df = data_io.load_data()

print(f"Loaded {df.shape[0]} rows and {df.shape[1]} columns")
print("\nFirst few rows:")
print(df.head())

In [None]:
# Example 2: Load CSV data with custom format settings

from syngen.sdk import DataIO

data_io = DataIO(
    path="../examples/example-data/escaped_quoted_table.csv",
    sep=',',           # delimiter
    quotechar='"',     # quote character
    quoting="minimal", # quoting style
    encoding='utf-8',  # encoding
    header=0           # use first row as header
)

df = data_io.load_data()

print(f"Loaded {df.shape[0]} rows and {df.shape[1]} columns")
print("\nFirst few rows:")
print(df.head())

In [None]:
# Example 3: Load Avro data and get its schema

from syngen.sdk import DataIO


data_io = DataIO(path="../examples/example-data/avro_file.avro")

df = data_io.load_data()

schema = data_io.load_schema()

print("Data schema:")
print(schema)

In [None]:
# Example 4: Save data to a file

import pandas as pd
from syngen.sdk import DataIO


sample_data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']
})


data_io = DataIO(
    path="../examples/example-data/sample_output.csv",
    sep=',',
    encoding='utf-8'
)


data_io.save_data(sample_data)

print("Data saved successfully to '../examples/example-data/sample_output.csv'")