This repository benchmarks tools for tabular data synthesis, providing insights and comparisons that help users identify the most suitable tool for generating synthetic data for their specific use case. It contains the code for the paper: Benchmarking Tabular Data Synthesis: Evaluating Tools, Metrics, and Datasets on Commodity Hardware for End-Users.
Each tool needs its own directory named `toolname-main` containing:
- the model code (added here using a git submodule)
- `run_tool.py` (see the hedged sketch after this list)
- `python-version.txt`
- `requirements.txt`
- `special-torch.txt` (optional, for tools that need a special torch build installed before the other requirements)
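
The exact interface of `run_tool.py` is defined by each tool's wrapper in this repository; purely as an illustration, a minimal entry point could look like the sketch below. The argument names (`--input`, `--output`, `--rows`) and the column-wise resampling "model" are assumptions, not the benchmark's actual API.

```python
# Hypothetical run_tool.py sketch: read the real dataset, "train" a generator,
# and write a fake dataset. All names and arguments here are illustrative.
import argparse
import pandas as pd

def main():
    parser = argparse.ArgumentParser(description="Generate a synthetic copy of a tabular dataset.")
    parser.add_argument("--input", required=True, help="path to the real .csv dataset")
    parser.add_argument("--output", required=True, help="path for the generated fake .csv")
    parser.add_argument("--rows", type=int, default=None, help="number of synthetic rows (default: same as input)")
    args = parser.parse_args()

    real = pd.read_csv(args.input)
    n_rows = args.rows or len(real)

    # Placeholder "synthesizer": resample each column independently with replacement.
    # A real tool would fit and sample from its generative model here instead.
    fake = pd.DataFrame({
        col: real[col].sample(n=n_rows, replace=True).reset_index(drop=True)
        for col in real.columns
    })
    fake.to_csv(args.output, index=False)

if __name__ == "__main__":
    main()
```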
Put the real dataset inside the data folder as a file with a .csv extension. If you want to compare several tools, pre-processing the original dataset and scaling it first is advised (see the sketch below).
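
For instance, a minimal pre-processing step could look like the following sketch (assuming pandas and scikit-learn are installed; the file names and the choice of min-max scaling are only illustrative):

```python
# Hedged example of pre-processing/scaling a dataset before benchmarking.
# File names and the scaler choice are assumptions, not part of the benchmark.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("data/adult_raw.csv")   # hypothetical raw file
df = df.dropna()                         # drop incomplete rows

num_cols = df.select_dtypes(include="number").columns
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])  # scale numeric columns to [0, 1]

df.to_csv("data/adult.csv", index=False)  # the .csv the benchmark will read from data/
```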
Inside `TDSbenchmark`, run:

`python benchmark.py`

Then, when prompted, provide the path to the experiment .json file. For examples of .json configuration files, see the "experiments" folder (the snippet below shows how to inspect one).
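
If you are unsure what an experiment configuration contains, you can pretty-print one of the shipped examples first. A small sketch, assuming the `experiments/per_dataset/adult.json` file referenced in the example below:

```python
# Inspect an existing experiment configuration before running the benchmark.
import json

with open("experiments/per_dataset/adult.json") as f:
    config = json.load(f)

print(json.dumps(config, indent=2))  # pretty-print the configuration keys and values
```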
The benchmark will run the shell script "monitor_usage" to create a .csv file with resource usage (CPU, GPU, memory, time) during the benchmark, as illustrated below. On macOS, use monitor_usageMac.sh.
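
As a rough, Python-based stand-in for what such a monitor does (the actual script is a shell script; `psutil` is an assumption used only for this illustration):

```python
# Simplified illustration of periodic resource sampling written to a CSV.
# This is NOT the monitor_usage script itself, just a sketch of the idea.
import csv
import time
import psutil  # assumption: psutil is installed

with open("usage_sample.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "cpu_percent", "memory_percent"])
    for _ in range(10):  # sample roughly once per second for ~10 seconds
        writer.writerow([
            time.time(),
            psutil.cpu_percent(interval=1),
            psutil.virtual_memory().percent,
        ])
```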
For example, run `python benchmark.py` and, when prompted, enter `experiments/per_dataset/adult.json`.
When finished, the benchmark saves the following (a quick inspection sketch follows the list):
- A fake dataset under fake_datasets/toolname/toolname dataset.csv
- Performance files: one for CPU and memory usage, one for GPU usage, and several for the other evaluation metrics. For the complete list of evaluation metrics, see the original paper.
- For result plots of our benchmark, see the "results" folder.
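
To sanity-check the outputs, you can load the real and fake datasets with pandas; the paths below are placeholders for whatever the benchmark actually wrote:

```python
# Hedged sketch for a quick comparison of the generated data against the real data.
import pandas as pd

real = pd.read_csv("data/adult.csv")                               # the real dataset placed in data/
fake = pd.read_csv("fake_datasets/toolname/toolname dataset.csv")  # placeholder tool output path

print(real.shape, fake.shape)  # same columns and comparable row counts are expected
print(fake.describe())         # basic statistics of the synthetic data
```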