# Tutorial 10: Sequential Synthesis
In this tutorial, we explore the **Sequential Synthesis** approach using
the `syn_seq` plugin in `synthcity`. Sequential synthesis allows us to
model variables one-by-one (column-by-column), using conditional relationships
learned from the real data. The main idea is:

1. Synthesize the first variable (often by sampling without replacement, "SWR").
2. Synthesize the second variable conditioned on the first.
3. Continue in the same way for each subsequent variable, conditioning on the variables already synthesized.

This approach can preserve complex dependencies among columns better than
methods that model each column's marginal distribution independently.
We'll demonstrate it on a survey dataset loaded from `ods.csv`, with demographic,
income, smoking, and satisfaction variables, and compare the resulting synthetic
data with the original.
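
The steps above can be illustrated with a toy, pure-Python sketch. `sequential_synthesize` is a hypothetical helper, not synthcity's implementation, and the rows are illustrative only. It samples the first column without replacement, then draws each later column from the real rows whose earlier columns match the values already synthesized for that row. This exact-match lookup stands in for a fitted per-column model such as CART; because it only reuses observed combinations, every synthetic row reproduces a combination found in the real data, which is one reason to fit a model that pools similar records instead.

```python
import random
from collections import defaultdict


def sequential_synthesize(rows, syn_order, n, seed=0):
    """Toy sequential synthesizer (hypothetical helper, not synthcity's code).

    rows:      real data as a list of dicts
    syn_order: order in which columns are synthesized
    n:         number of synthetic rows (n <= len(rows), since step 1 samples
               without replacement)
    """
    rng = random.Random(seed)
    first = syn_order[0]

    # Step 1: sample the first column without replacement ("SWR").
    synthetic = [{first: v} for v in rng.sample([r[first] for r in rows], n)]

    # Steps 2..k: for each later column, group its real values by the values of
    # all earlier columns, then draw from the group matching the values already
    # synthesized for this row (a stand-in for a fitted model such as CART).
    for j, col in enumerate(syn_order[1:], start=1):
        parents = syn_order[:j]
        pools = defaultdict(list)
        for r in rows:
            pools[tuple(r[p] for p in parents)].append(r[col])
        for s in synthetic:
            s[col] = rng.choice(pools[tuple(s[p] for p in parents)])
    return synthetic


# Illustrative rows only (not taken from ods.csv).
real = [
    {"sex": "FEMALE", "smoke": "NO", "ls": "PLEASED"},
    {"sex": "FEMALE", "smoke": "YES", "ls": "MIXED"},
    {"sex": "MALE", "smoke": "NO", "ls": "MOSTLY SATISFIED"},
    {"sex": "MALE", "smoke": "YES", "ls": "PLEASED"},
    {"sex": "FEMALE", "smoke": "NO", "ls": "MOSTLY SATISFIED"},
]
synthetic = sequential_synthesize(real, ["sex", "smoke", "ls"], n=len(real))
for row in synthetic:
    print(row)
```

With `n=len(real)`, sampling without replacement makes the first column an exact permutation of the real `sex` values, so its marginal distribution is preserved exactly.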


In [22]:
!pip install synthcity



In [23]:

import sys
import warnings

warnings.filterwarnings("ignore")

# synthcity absolute
import synthcity.logger as log
from synthcity.plugins import Plugins

log.add(sink=sys.stderr, level="INFO")


## 1) Load the data
We use a survey dataset stored in a local CSV file (`ods.csv`) and keep the 11 columns we want to synthesize.


In [24]:
import pandas as pd

# Adjust the path to wherever ods.csv is stored on your machine
ods = pd.read_csv("ods.csv")
# Preprocessing: keep only the columns we want to synthesize
target = ["sex", "age", "edu", "marital", "income", "smoke", "nociga", "wkabdur", "ls", "wkabint", "date"]
ods = ods[target]

print(ods.head())

      sex  age                   edu  marital  income smoke  nociga  wkabdur  \
0  FEMALE   57    VOCATIONAL/GRAMMAR  MARRIED   800.0    NO      -8       -8   
1    MALE   20    VOCATIONAL/GRAMMAR   SINGLE   350.0    NO      -8       -8   
2  FEMALE   18    VOCATIONAL/GRAMMAR   SINGLE     NaN    NO      -8       -8   
3  FEMALE   78  PRIMARY/NO EDUCATION  WIDOWED   900.0    NO      -8       -8   
4  FEMALE   54    VOCATIONAL/GRAMMAR  MARRIED  1500.0   YES      20       -8   

                 ls wkabint        date  
0           PLEASED      NO  1979-10-07  
1  MOSTLY SATISFIED      NO         NaN  
2           PLEASED      NO         NaN  
3             MIXED      NO  1958-08-11  
4  MOSTLY SATISFIED      NO  1980-06-08  



## 2) Create a Syn_SeqDataLoader
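Instead of a `GenericDataLoader`, we use the specialized `Syn_SeqDataLoader`,
configured through a `user_custom` dictionary. Its `syn_order` entry sets the
sequence in which columns are synthesized; if it is not provided, the DataFrame's
column order is used. The other entries set per-column methods (`method`), codes
such as `-8` that should be treated as special values (`special_value`),
column-type overrides (`col_type`), and the predictors allowed for particular
columns (`variable_selection`).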


In [25]:
from synthcity.plugins.core.dataloader import Syn_SeqDataLoader

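user_custom = {
    # Order in which the columns are synthesized
    'syn_order': ["sex", "age", "edu", "marital", "income", "smoke", "nociga", "wkabdur", "ls", "wkabint", "date"],
    # Per-column synthesis methods; left empty here, so the defaults shown in the summary below are used
    'method': {},
    # Codes to treat as special values rather than ordinary numbers
    'special_value': {'income': [-8], 'nociga': [-8]},
    # Explicit column-type overrides
    'col_type': {"date": "date"},
    # Predictors allowed for specific columns (by default, a column uses all earlier columns)
    'variable_selection': {
        "nociga": ["sex", "age", "edu", "marital", "smoke"],
        "ls": ['sex', 'age', 'income'],
    },
}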
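
Before building the loader, it can help to check that every column named in `user_custom` actually exists in the data. The sketch below is a hypothetical helper (not part of synthcity); it assumes `syn_order` should list each column exactly once and falls back to the DataFrame's column order when `syn_order` is absent, as described above.

```python
def validate_user_custom(columns, user_custom):
    """Hypothetical sanity check for a user_custom dict (not part of synthcity).

    Returns a list of problem descriptions; an empty list means none were found.
    """
    columns = list(columns)
    problems = []
    # Assumption: syn_order lists every column exactly once; default is the column order.
    syn_order = user_custom.get("syn_order", columns)
    if sorted(syn_order) != sorted(columns):
        problems.append("syn_order is not a permutation of the data columns")
    # Keys of these per-column settings must be existing column names.
    for key in ("method", "special_value", "col_type", "variable_selection"):
        for col in user_custom.get(key, {}):
            if col not in columns:
                problems.append(f"{key} refers to unknown column {col!r}")
    # Predictors listed in variable_selection must be existing column names too.
    for col, predictors in user_custom.get("variable_selection", {}).items():
        for p in predictors:
            if p not in columns:
                problems.append(f"variable_selection[{col!r}] uses unknown column {p!r}")
    return problems


columns = ["sex", "age", "edu", "marital", "income", "smoke",
           "nociga", "wkabdur", "ls", "wkabint", "date"]
example_custom = {
    "syn_order": columns,
    "special_value": {"income": [-8], "nociga": [-8]},
    "col_type": {"date": "date"},
    "variable_selection": {"ls": ["sex", "age", "income"]},
}
print(validate_user_custom(columns, example_custom))  # []
print(validate_user_custom(columns, {"col_type": {"dat": "date"}}))
# ["col_type refers to unknown column 'dat'"]
```

In the notebook itself, `validate_user_custom(ods.columns, user_custom)` would run the same check against the real configuration.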


In [26]:
loader = Syn_SeqDataLoader(
    ods,
    target_column="income",
    sensitive_columns=["income"],
    user_custom=user_custom,
    max_categories=15,
)


[INFO] Syn_SeqEncoder summary:
  (column, converted_type, method)

  (sex, category, swr)
    --> 
  (age, numeric, cart)
    --> 
  (edu, category, cart)
    --> 
  (marital, category, cart)
    --> 
  (income, numeric, cart)
    --> 
  (smoke, category, cart)
    --> 
  (nociga, numeric, cart)
    --> 
  (wkabdur, numeric, cart)
    --> 
  (ls, category, cart)
    --> 
  (wkabint, category, cart)
    --> 
  (date, numeric, cart)

  - special_value => {'income': [-8], 'nociga': [-8], 'wkabdur': [-8]}

  - variable_selection_:
         sex  age  edu  marital  income  smoke  nociga  wkabdur  ls  wkabint  \
sex        0    0    0        0       0      0       0        0   0        0   
age        1    0    0        0       0      0       0        0   0        0   
edu        1    1    0        0       0      0       0        0   0        0   
marital    1    1    1        0       0      0       0        0   0        0   
income     1    1    1        1       0      0       0        0   

In [27]:
# Check the type each column was converted to
loader.info()['converted_type']

{'sex': 'category',
 'age': 'numeric',
 'edu': 'category',
 'marital': 'category',
 'income': 'numeric',
 'smoke': 'category',
 'nociga': 'numeric',
 'wkabdur': 'numeric',
 'ls': 'category',
 'wkabint': 'category',
 'date': 'numeric'}


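When it is created, the `Syn_SeqDataLoader` logs a summary: the converted type
(numeric or category) and synthesis method of each column, the special values,
and the variable-selection matrix. Note that `wkabdur` appears among the special
values even though only `income` and `nociga` were listed in `user_custom`.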
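
The `variable_selection_` matrix can be read as: row = column being synthesized, column = candidate predictor, 1 = predictor used. The rows visible in the summary are lower-triangular, i.e. each column is predicted from all columns synthesized before it. The sketch below, a hypothetical helper that is not synthcity's code, rebuilds a matrix of that shape from `syn_order` and the `variable_selection` overrides, and rejects predictors that come later in the order (they would not exist yet when the column is generated).

```python
def variable_selection_matrix(syn_order, overrides=None):
    """Hypothetical helper mirroring the layout of the variable_selection_ summary.

    Row = column being synthesized, column = candidate predictor, 1 = used.
    By default a column is predicted from every column synthesized before it.
    """
    overrides = overrides or {}
    matrix = {}
    for i, col in enumerate(syn_order):
        earlier = syn_order[:i]
        predictors = overrides.get(col, earlier)
        # A column can only be conditioned on columns generated before it.
        late = [p for p in predictors if p not in earlier]
        if late:
            raise ValueError(f"{col!r} cannot use {late}: not synthesized before it")
        matrix[col] = {p: int(p in predictors) for p in syn_order}
    return matrix


syn_order = ["sex", "age", "edu", "marital", "income", "smoke",
             "nociga", "wkabdur", "ls", "wkabint", "date"]
overrides = {
    "nociga": ["sex", "age", "edu", "marital", "smoke"],
    "ls": ["sex", "age", "income"],
}
matrix = variable_selection_matrix(syn_order, overrides)
for col, row in matrix.items():
    print(f"{col:8}", [row[p] for p in syn_order])
```

The `sex`, `age`, `edu`, `marital`, and `income` rows of this sketch match the rows shown in the summary above; the `nociga` and `ls` rows follow the overrides from `user_custom`.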

## 3) List available plugins
Recall from earlier tutorials that you can see all generative model plugins
with `Plugins().list()`. We'll specifically focus on `"syn_seq"` here.



In [28]:
Plugins().list()

[2025-01-22T16:11:24.511187+0900][4740][CRITICAL] module disabled: C:\Users\hsrhe\Desktop\synthcity\src\synthcity\plugins\generic\plugin_goggle.py


['image_adsgan',
 'timevae',
 'ctgan',
 'ddpm',
 'nflow',
 'survival_gan',
 'image_cgan',
 'privbayes',
 'bayesian_network',
 'decaf',
 'aim',
 'survival_ctgan',
 'syn_seq',
 'marginal_distributions',
 'adsgan',
 'dpgan',
 'radialgan',
 'uniform_sampler',
 'survae',
 'pategan',
 'fflows',
 'survival_nflow',
 'great',
 'dummy_sampler',
 'tvae',
 'timegan',
 'rtvae',
 'arf']

In [29]:
# #test
# encoded_loader, enc_dict = loader.encode()
# encoded_df = encoded_loader.dataframe()
# print(encoded_df.dtypes)
#     # print(encoded_df)
# print(encoded_df)
# print(enc_dict['syn_order'])
# print(enc_dict['converted_type'])

In [30]:
# #test
# # 2) Decoding
# decoded_loader = encoded_loader.decode()
# decoded_df = decoded_loader.dataframe()

# print("\n=== Decoded DataFrame ===")
# print(decoded_df.head())
# print(decoded_df.dtypes)

You should see `"syn_seq"` in the returned list.

## 4) Load and train the Sequential Synthesis Model
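The `syn_seq` plugin lets you specify how each column is synthesized, through the `method` entry of `user_custom`:
- `"swr"` = sample without replacement
- `"cart"`, `"rf"`, `"pmm"`, `"logreg"`, etc. for the remaining columns

Typically, `"swr"` is used for the first column and `"cart"` or `"rf"` for subsequent columns. The encoder summary above shows the default assignment: `swr` for the first column (`sex`) and `cart` for all others. You can, however, choose any method for each column.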
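
The default assignment visible in the encoder summary can be sketched as follows. `resolve_methods` is hypothetical, and the sketch assumes `user_custom["method"]` maps column names to method names, such as `{"income": "rf"}`; check your synthcity version for the exact names it accepts.

```python
def resolve_methods(syn_order, user_methods=None, first="swr", default="cart"):
    """Hypothetical helper: assign a synthesis method to every column.

    Mirrors the defaults visible in the encoder summary (first column "swr",
    all later columns "cart"); entries in user_methods take precedence.
    """
    user_methods = user_methods or {}
    methods = {}
    for i, col in enumerate(syn_order):
        fallback = first if i == 0 else default
        methods[col] = user_methods.get(col, fallback)
    return methods


order = ["sex", "age", "edu", "marital", "income"]
print(resolve_methods(order, {"income": "rf"}))
# {'sex': 'swr', 'age': 'cart', 'edu': 'cart', 'marital': 'cart', 'income': 'rf'}
```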


In [31]:
syn_model = Plugins().get("syn_seq")


[2025-01-22T16:11:24.558967+0900][4740][CRITICAL] module disabled: C:\Users\hsrhe\Desktop\synthcity\src\synthcity\plugins\generic\plugin_goggle.py


In [32]:
syn_model.fit(loader)

[2025-01-22T16:11:24.688211+0900][4740][INFO] [INFO] Syn_Seq aggregator: fitting columns...
[2025-01-22T16:11:24.688211+0900][4740][INFO] Fitting 'sex' => stored distribution from real data. Done.


Fitting 'age' with 'cart' ... Done!
Fitting 'edu' with 'cart' ... Done!
Fitting 'marital' with 'cart' ... Done!
Fitting 'income_cat' with 'cart' ... Done!
Fitting 'income' with 'cart' ... Done!
Fitting 'smoke' with 'cart' ... Done!
Fitting 'nociga_cat' with 'cart' ... Done!
Fitting 'nociga' with 'cart' ... Done!
Fitting 'wkabdur_cat' with 'cart' ... Done!
Fitting 'wkabdur' with 'cart' ... Done!
Fitting 'ls' with 'cart' ... Done!
Fitting 'wkabint' with 'cart' ... Done!
Fitting 'date' with 'cart' ... Done!


<synthcity.plugins.generic.plugin_syn_seq.Syn_SeqPlugin at 0x2e3a4a90640>

In [33]:
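# Inspect the original dtypes and the label encoders used for the categorical columns
# (note: `_encoder` and `_label_encoders` are private attributes and may change)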
df = loader.dataframe()
print(df.dtypes)
loader._encoder._label_encoders

sex         object
age          int64
edu         object
marital     object
income     float64
smoke       object
nociga       int64
wkabdur      int64
ls          object
wkabint     object
date        object
dtype: object


{'sex': LabelEncoder(),
 'edu': LabelEncoder(),
 'marital': LabelEncoder(),
 'income_cat': LabelEncoder(),
 'smoke': LabelEncoder(),
 'nociga_cat': LabelEncoder(),
 'wkabdur_cat': LabelEncoder(),
 'ls': LabelEncoder(),
 'wkabint': LabelEncoder()}


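**Note**: During fitting, the plugin logs each column as it is fitted, along with
the method used. Each column with special values (`income`, `nociga`, `wkabdur`)
gets a helper column (`income_cat`, `nociga_cat`, `wkabdur_cat`) that is fitted
immediately before it; these helpers also appear among the label encoders listed
above. The variable-selection matrix is printed earlier, when the loader is created.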
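
One way to picture the `*_cat` helper columns is to split a column with special values into a categorical part (which special value, if any) and a numeric part (the ordinary number). The sketch below illustrates that idea only: `split_special` and `merge_special` are hypothetical helpers, and synthcity's encoder may represent the split differently. With such a split, the categorical part can be synthesized first, and the numeric part only has to model the ordinary values.

```python
import math


def split_special(values, special_values):
    """Hypothetical illustration (not synthcity's encoder): split a numeric column
    into a categorical part and a numeric part.

    cat: the special value itself, "missing" for NaN/None, or "num" for a regular number
    num: the regular number, or NaN when the entry is special or missing
    """
    cat, num = [], []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            cat.append("missing")
            num.append(math.nan)
        elif v in special_values:
            cat.append(v)
            num.append(math.nan)
        else:
            cat.append("num")
            num.append(v)
    return cat, num


def merge_special(cat, num):
    """Inverse of split_special: rebuild the original column."""
    return [n if c == "num" else (math.nan if c == "missing" else c)
            for c, n in zip(cat, num)]


income = [800.0, 350.0, math.nan, -8.0, 1500.0]
income_cat, income_num = split_special(income, special_values=[-8])
print(income_cat)  # ['num', 'num', 'missing', -8.0, 'num']
```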

## 5) Generate synthetic data
We generate as many synthetic rows as there are in the original data (`nrows=len(ods)`).



In [34]:
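# Optional generation constraints (not used in this tutorial).
# Note: "bmi" and "target" are not columns of `ods`; replace them with columns
# from this dataset before enabling this cell.
# constraints = {
#   "target":[
#     ("bmi", ">", 0.15),
#     ("target", ">", 0)
#   ]
# }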

In [35]:
# Generate as many synthetic rows as the original dataset has
synthetic_loader = syn_model.generate(nrows=len(ods))

[2025-01-22T16:11:24.821242+0900][4740][INFO] Generating 'sex' => done.


Generating 'age' => done.
Generating 'edu' => done.
Generating 'marital' => done.
Generating 'income_cat' => done.
Generating 'income' => done.
Generating 'smoke' => done.
Generating 'nociga_cat' => done.
Generating 'nociga' => done.
Generating 'wkabdur_cat' => done.
Generating 'wkabdur' => done.
Generating 'ls' => done.
Generating 'wkabint' => done.
Generating 'date' => done.


In [36]:
# Inspect the first few synthetic rows
synthetic_df = synthetic_loader.dataframe()
print(synthetic_df.head())

      sex   age                       edu   marital  income smoke  nociga  \
0  FEMALE  51.0        VOCATIONAL/GRAMMAR  DIVORCED  1200.0   YES    30.0   
1    MALE  25.0  POST-SECONDARY OR HIGHER    SINGLE  2000.0    NO    20.0   
2  FEMALE  56.0        VOCATIONAL/GRAMMAR  DIVORCED  1300.0   YES     8.0   
3    MALE  53.0        VOCATIONAL/GRAMMAR   MARRIED  1368.0    NO    20.0   
4    MALE  30.0        VOCATIONAL/GRAMMAR   MARRIED  1500.0    NO    20.0   

   wkabdur                   ls             wkabint     date  
0     24.0              UNHAPPY                  NO  13983.0  
1     13.0                MIXED                  NO  24940.0  
2     12.0  MOSTLY DISSATISFIED  YES, TO EU COUNTRY  15517.0  
3      1.0              PLEASED  YES, TO EU COUNTRY  24324.0  
4      1.0              PLEASED                  NO  26122.0  


In [37]:
# # third party
# import matplotlib.pyplot as plt

# syn_model.plot(plt, loader)

# plt.show()

In [38]:
# Flag values that make up at least 70% of a column in the synthetic data
threshold = 0.7
for col in synthetic_df.columns:
    value_ratios = synthetic_df[col].value_counts(normalize=True)
    high_ratios = value_ratios[value_ratios >= threshold]
    print(f"Values in column '{col}' with proportion >= {threshold}:")
    print(high_ratios, "\n")

Values in column 'sex' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'age' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'edu' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'marital' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'income' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'smoke' with proportion >= 0.7:
smoke
NO    0.7324
Name: proportion, dtype: float64 

Values in column 'nociga' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'wkabdur' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'ls' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'wkabint' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 

Values in column 'date' with proportion >= 0.7:
Series([], Name: proportion, dtype: float64) 



In [39]:
# Check the unique values and their frequencies in each column of the original data
for col in ods.columns:
    print(f"Value counts for column '{col}':")
    print(ods[col].value_counts(), "\n")

Value counts for column 'sex':
sex
FEMALE    2818
MALE      2182
Name: count, dtype: int64 

Value counts for column 'age':
age
61    113
58    110
53    110
57    109
60    109
     ... 
90      7
92      4
91      4
96      2
97      1
Name: count, Length: 79, dtype: int64 

Value counts for column 'edu':
edu
VOCATIONAL/GRAMMAR          1613
SECONDARY                   1482
PRIMARY/NO EDUCATION         962
POST-SECONDARY OR HIGHER     936
Name: count, dtype: int64 

Value counts for column 'marital':
marital
MARRIED               2979
SINGLE                1253
WIDOWED                531
DIVORCED               199
DE FACTO SEPARATED      22
LEGALLY SEPARATED        7
Name: count, dtype: int64 

Value counts for column 'income':
income
-8.0       603
 1000.0    282
 1500.0    264
 2000.0    260
 1200.0    212
          ... 
 1115.0      1
 1598.0      1
 2536.0      1
 909.0       1
 3450.0      1
Name: count, Length: 406, dtype: int64 

Value counts for column 'smoke':
smoke
NO     3713
YES    1277
Name: count, dtype: int64 

Value counts for column 'nociga':
nociga

In [40]:
# Check the unique values and their frequencies in each column of the synthetic data
for col in synthetic_df.columns:
    print(f"Value counts for column '{col}':")
    print(synthetic_df[col].value_counts(), "\n")

Value counts for column 'sex':
sex
FEMALE    2767
MALE      2233
Name: count, dtype: int64 

Value counts for column 'age':
age
61.0    121
35.0    115
53.0    111
62.0    108
60.0    104
       ... 
87.0     11
90.0      6
91.0      5
92.0      4
96.0      2
Name: count, Length: 78, dtype: int64 

Value counts for column 'edu':
edu
VOCATIONAL/GRAMMAR          1566
SECONDARY                   1483
PRIMARY/NO EDUCATION        1015
POST-SECONDARY OR HIGHER     929
nan                            7
Name: count, dtype: int64 

Value counts for column 'marital':
marital
MARRIED               2938
SINGLE                1312
WIDOWED                524
DIVORCED               192
DE FACTO SEPARATED      23
nan                      6
LEGALLY SEPARATED        5
Name: count, dtype: int64 

Value counts for column 'income':
income
1000.0    444
2000.0    362
1500.0    335
1200.0    279
1300.0    193
         ... 
865.0       1
2619.0      1
570.0       1
1408.0      1
1037.0      1
Name: count, Length: 331, dtype: int64 

Value counts for column 'smoke':
smoke
NO     36
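
Comparing long value-count listings by eye is tedious. One number per categorical column is the total variation distance (TVD) between the real and synthetic frequency distributions: half the sum of the absolute differences in proportions, from 0 (identical) to 1 (no overlap). The sketch below is a plain-Python helper, not a synthcity metric, applied to the `sex` counts printed above.

```python
def total_variation_distance(real_counts, syn_counts):
    """Total variation distance between two categorical distributions given as
    {category: count} dicts: half the sum of absolute differences in proportions.
    0 means identical marginals, 1 means the categories do not overlap at all."""
    real_total = sum(real_counts.values())
    syn_total = sum(syn_counts.values())
    categories = set(real_counts) | set(syn_counts)
    return 0.5 * sum(
        abs(real_counts.get(c, 0) / real_total - syn_counts.get(c, 0) / syn_total)
        for c in categories
    )


# Counts copied from the 'sex' value counts printed above.
real_sex = {"FEMALE": 2818, "MALE": 2182}
syn_sex = {"FEMALE": 2767, "MALE": 2233}
print(round(total_variation_distance(real_sex, syn_sex), 4))  # 0.0102
```

In the notebook, the count dicts can come from `ods[col].value_counts().to_dict()` and `synthetic_df[col].value_counts().to_dict()`. The synthetic `edu` and `marital` columns list a `nan` category while `value_counts()` drops missing values in the real data, so decide how missing values should be treated before comparing.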

In [41]:
# Compare the first rows of the original and synthetic data
original_df = loader.dataframe()
synthetic_df = synthetic_loader.dataframe()
print(original_df.head())
print(synthetic_df.head())

      sex  age                   edu  marital  income smoke  nociga  wkabdur  \
0  FEMALE   57    VOCATIONAL/GRAMMAR  MARRIED   800.0    NO      -8       -8   
1    MALE   20    VOCATIONAL/GRAMMAR   SINGLE   350.0    NO      -8       -8   
2  FEMALE   18    VOCATIONAL/GRAMMAR   SINGLE     NaN    NO      -8       -8   
3  FEMALE   78  PRIMARY/NO EDUCATION  WIDOWED   900.0    NO      -8       -8   
4  FEMALE   54    VOCATIONAL/GRAMMAR  MARRIED  1500.0   YES      20       -8   

                 ls wkabint        date  
0           PLEASED      NO  1979-10-07  
1  MOSTLY SATISFIED      NO         NaN  
2           PLEASED      NO         NaN  
3             MIXED      NO  1958-08-11  
4  MOSTLY SATISFIED      NO  1980-06-08  
      sex   age                       edu   marital  income smoke  nociga  \
0  FEMALE  51.0        VOCATIONAL/GRAMMAR  DIVORCED  1200.0   YES    30.0   
1    MALE  25.0  POST-SECONDARY OR HIGHER    SINGLE  2000.0    NO    20.0   
2  FEMALE  56.0        VOCATIONAL/GR

## Benchmarking metrics

| **Metric**                                         | **Description**                                                                                                            |
|----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| sanity.data\_mismatch.score                        | Data types mismatch between the real/synthetic features                                                                    |
| sanity.common\_rows\_proportion.score              | Real data copy-paste in the synthetic data                                                                                 |
| sanity.nearest\_syn\_neighbor\_distance.mean       | Computes the mean distance from the real data to the closest neighbor in the synthetic data                                |
| sanity.close\_values\_probability.score            | the probability of close values between the real and synthetic data.                                                       |
| sanity.distant\_values\_probability.score          | the probability of distant values between the real and synthetic data.                                                     |
| stats.jensenshannon\_dist.marginal                 | the average Jensen-Shannon distance                                                                                        |
| stats.chi\_squared\_test.marginal                  | the one-way chi-square test.                                                                                               |
| stats.feature\_corr.joint                          | the correlation/strength-of-association of features in data-set with both categorical and continuous features              |
| stats.inv\_kl\_divergence.marginal                 | the average inverse of the Kullback–Leibler Divergence metric.                                                             |
| stats.ks\_test.marginal                            | the Kolmogorov-Smirnov test for goodness of fit.                                                                           |
| stats.max\_mean\_discrepancy.joint                 | Empirical maximum mean discrepancy. The lower the result the more evidence that distributions are the same.                |
| stats.prdc.precision                               | precision between the two manifolds                                                                                        |
| stats.prdc.recall                                  | recall between the two manifolds                                                                                           |
| stats.prdc.density                                 | density between the two manifolds                                                                                          |
| stats.prdc.coverage                                | coverage between the two manifolds                                                                                         |
| stats.alpha\_precision.delta\_precision\_alpha\_OC | Delta precision                                                                                                            |
| stats.alpha\_precision.delta\_coverage\_beta\_OC   | Delta coverage                                                                                                             |
| stats.alpha\_precision.authenticity\_OC            | Authenticity                                                                                                               |
| stats.survival\_km\_distance.optimism              | Kaplan-Meier distance between real-synthetic data                                                                          |
| stats.survival\_km\_distance.abs\_optimism         | Kaplan-Meier metrics absolute distance between real-syn data                                                               |
| stats.survival\_km\_distance.sightedness           | Kaplan-Meier metrics distance on the temporal axis                                                                         |
| performance.linear\_model.gt.c\_index              | Train on real, test on the test real data using CoxPH: C-Index                                                             |
| performance.linear\_model.gt.brier\_score          | Train on real, test on the test real data using CoxPH: Brier score                                                         |
| performance.linear\_model.syn\_id.c\_index         | Train on synthetic, test on the train real data using CoxPH: C-Index                                                       |
| performance.linear\_model.syn\_id.brier\_score     | Train on synthetic, test on the train real data using CoxPH: Brier score                                                   |
| performance.linear\_model.syn\_ood.c\_index        | Train on synthetic, test on the test real data using CoxPH: C-Index                                                        |
| performance.linear\_model.syn\_ood.brier\_score    | Train on synthetic, test on the test real data using CoxPH: Brier score                                                    |
| performance.mlp.gt.c\_index                        | Train on real, test on the test real data using NN: C-Index                                                                |
| performance.mlp.gt.brier\_score                    | Train on real, test on the test real data using NN: Brier score                                                            |
| performance.mlp.syn\_id.c\_index                   | Train on synthetic, test on the train real data using NN: C-Index                                                          |
| performance.mlp.syn\_id.brier\_score               | Train on synthetic, test on the train real data using NN: Brier score                                                      |
| performance.mlp.syn\_ood.c\_index                  | Train on synthetic, test on the test real data using NN: C-Index                                                           |
| performance.mlp.syn\_ood.brier\_score              | Train on synthetic, test on the test real data using NN: Brier score                                                       |
| performance.xgb.gt.c\_index                        | Train on real, test on the test real data using XGB: C-Index                                                               |
| performance.xgb.gt.brier\_score                    | Train on real, test on the test real data using XGB: Brier score                                                           |
| performance.xgb.syn\_id.c\_index                   | Train on synthetic, test on the train real data using XGB: C-Index                                                         |
| performance.xgb.syn\_id.brier\_score               | Train on synthetic, test on the train real data using XGB: Brier score                                                     |
| performance.xgb.syn\_ood.c\_index                  | Train on synthetic, test on the test real data using XGB: C-Index                                                          |
| performance.xgb.syn\_ood.brier\_score              | Train on synthetic, test on the test real data using XGB: Brier score                                                      |
| performance.feat\_rank\_distance.corr              | Correlation for the rank distances between the feature importance on real and synthetic data                               |
| performance.feat\_rank\_distance.pvalue            | p-value for the rank distances between the feature importance on real and synthetic data                                   |
| detection.detection\_xgb.mean                      | The average AUCROC score for detecting synthetic data using an XGBoost.                                                    |
| detection.detection\_mlp.mean                      | The average AUCROC score for detecting synthetic data using a NN.                                                          |
| detection.detection\_gmm.mean                      | The average AUCROC score for detecting synthetic data using a GMM.                                                         |
| privacy.delta-presence.score                       | the maximum re-identification probability on the real dataset from the synthetic dataset.                                  |
| privacy.k-anonymization.gt                         | the k-anon for the real data                                                                                               |
| privacy.k-anonymization.syn                        | the k-anon for the synthetic data                                                                                          |
| privacy.k-map.score                                | the minimum value k that satisfies the k-map rule.                                                                         |
| privacy.distinct l-diversity.gt                    | the l-diversity for the real data                                                                                          |
| privacy.distinct l-diversity.syn                   | the l-diversity for the synthetic data                                                                                     |
| privacy.identifiability\_score.score               | the re-identification score on the real dataset from the synthetic dataset.                                                |
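
As an example of how a privacy metric such as k-anonymity works: group the records by their quasi-identifiers (attributes that could help link a record to a person); k is the size of the smallest group. The sketch below is a plain-Python illustration on made-up rows; synthcity's `privacy.k-anonymization` metrics may choose quasi-identifiers or handle continuous columns differently.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share the same
    quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Illustrative rows only (not taken from ods.csv).
rows = [
    {"sex": "FEMALE", "marital": "MARRIED", "income": 800.0},
    {"sex": "FEMALE", "marital": "MARRIED", "income": 1500.0},
    {"sex": "MALE", "marital": "SINGLE", "income": 350.0},
    {"sex": "MALE", "marital": "SINGLE", "income": 2000.0},
    {"sex": "MALE", "marital": "SINGLE", "income": 1000.0},
]
print(k_anonymity(rows, ["sex", "marital"]))            # 2
print(k_anonymity(rows, ["sex", "marital", "income"]))  # 1
```

Adding a near-unique attribute such as `income` to the quasi-identifiers drops k to 1, meaning at least one record is unique on those attributes.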

## Benchmark the quality of plugins

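`Benchmarks.evaluate` fits each listed plugin (here only `syn_seq`) on the loader, generates `synthetic_size` rows, and scores them with the metrics above, repeating the process `repeats` times.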

In [42]:
loader

Unnamed: 0,sex,age,edu,marital,income,smoke,nociga,wkabdur,ls,wkabint,date
0,FEMALE,57,VOCATIONAL/GRAMMAR,MARRIED,800.0,NO,-8,-8,PLEASED,NO,1979-10-07
1,MALE,20,VOCATIONAL/GRAMMAR,SINGLE,350.0,NO,-8,-8,MOSTLY SATISFIED,NO,
2,FEMALE,18,VOCATIONAL/GRAMMAR,SINGLE,,NO,-8,-8,PLEASED,NO,
3,FEMALE,78,PRIMARY/NO EDUCATION,WIDOWED,900.0,NO,-8,-8,MIXED,NO,1958-08-11
4,FEMALE,54,VOCATIONAL/GRAMMAR,MARRIED,1500.0,YES,20,-8,MOSTLY SATISFIED,NO,1980-06-08
...,...,...,...,...,...,...,...,...,...,...,...
4995,MALE,56,SECONDARY,MARRIED,2500.0,NO,-8,-8,MOSTLY SATISFIED,NO,1979-12-17
4996,FEMALE,59,SECONDARY,MARRIED,-8.0,YES,25,-8,MOSTLY SATISFIED,NO,1976-06-18
4997,FEMALE,20,SECONDARY,SINGLE,1000.0,NO,-8,-8,MOSTLY SATISFIED,NO,
4998,MALE,34,SECONDARY,MARRIED,2600.0,NO,-8,-8,MOSTLY SATISFIED,NO,2003-06-20


In [21]:
# synthcity absolute
from synthcity.benchmark import Benchmarks

score = Benchmarks.evaluate(
    [(f"test_{model}", model, {}) for model in ["syn_seq"]],
    loader,
    synthetic_size=1000,
    repeats=2,
)

KeyError: 'converted_type'

## Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!

### Star [Synthcity](https://github.com/vanderschaarlab/synthcity) on GitHub

- The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we're building.


### Checkout other projects from vanderschaarlab
- [HyperImpute](https://github.com/vanderschaarlab/hyperimpute)
- [AutoPrognosis](https://github.com/vanderschaarlab/autoprognosis)
