# Generate data from scratch - bootstrap your data

This notebook introduces the concept and methodology of data bootstrapping using YData Fabric. Data bootstrapping is a technique for producing robust, varied data even if you have no access to real data. Through practical code examples, you will learn how to generate synthetic datasets from scratch. This process is invaluable for building test cases, setting up software development environments, hypothesis testing and privacy-preserving data sharing.

By the end of this notebook, you'll be equipped with the tools to implement data bootstrapping in your own projects, ensuring you can derive meaningful insights even from minimal or skewed data sources.

### Leverage Metadata to generate new test data

YData Fabric's **FakerSynthesizer** leverages the concept of Metadata to automate synthetic data generation as much as possible. What does this mean? The **FakerSynthesizer** uses your knowledge of the business expectations for the data, such as variable types (float, int, text, etc.), distributions, formats and behaviours (is missing data expected?), to generate new data records even if you have no access to real data.

For demonstration purposes, this notebook uses the Metadata from an existing dataset, but you can create and provide your own Metadata.
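
To make the idea of metadata-driven generation concrete, here is a minimal, library-free sketch. It does not use the YData Fabric API: `toy_metadata`, `fake_value` and `fake_records` are hypothetical names introduced only for illustration, and the dictionary layout is an assumption, not the structure of Fabric's Metadata object. It only shows how declared types, ranges and an expected missing rate can be enough to produce new records without any real data.

```python
import random

# Hypothetical, simplified metadata: NOT the YData Fabric Metadata object.
# Each column declares a type, a value range or a set of categories,
# and the expected share of missing values.
toy_metadata = {
    "age": {"type": "int", "min": 18, "max": 90, "missing_rate": 0.0},
    "income": {"type": "float", "min": 0.0, "max": 150_000.0, "missing_rate": 0.1},
    "segment": {"type": "category", "values": ["A", "B", "C"], "missing_rate": 0.0},
}


def fake_value(spec, rng):
    """Draw one value that respects a single column specification."""
    if rng.random() < spec["missing_rate"]:
        return None  # emulate the expected missing data
    if spec["type"] == "int":
        return rng.randint(spec["min"], spec["max"])
    if spec["type"] == "float":
        return round(rng.uniform(spec["min"], spec["max"]), 2)
    if spec["type"] == "category":
        return rng.choice(spec["values"])
    raise ValueError(f"Unsupported column type: {spec['type']}")


def fake_records(metadata, n_rows, seed=0):
    """Generate n_rows records from the column specifications alone."""
    rng = random.Random(seed)  # seeded for reproducible output
    return [
        {name: fake_value(spec, rng) for name, spec in metadata.items()}
        for _ in range(n_rows)
    ]


records = fake_records(toy_metadata, n_rows=5)
for row in records:
    print(row)
```

The sketch samples uniformly within each declared range, which is a deliberate simplification; the distributions and formats mentioned above would replace the uniform draws in a more realistic generator.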

In [None]:
# Importing YData's packages
from ydata.labs import DataSources
# Reading the Dataset from the DataSource
datasource = DataSources.get(uid='9984f231-f795-453a-93d7-cabc58f3ea75', namespace='b9efe5f3-b8ae-4d4d-8ea6-209ddffc5bbc')
dataset = datasource.dataset
# Getting the computed Metadata, which provides the profile overview information in the Labs
metadata = datasource.metadata
print(metadata)

## Generate Synthetic data with Fabric FakerSynthesizer

In [None]:
from ydata.synthesizers.faker.model import FakerSynthesizer

synth = FakerSynthesizer()

# Fit your Faker synthesizer to the desired Metadata
synth.fit(metadata=metadata)

### Generate new data samples

In [None]:
sample = synth.sample(1000)

In [None]:
sample.head()

### Save your Synthesizer for later usage

In [None]:
synth.save('synth.pkl')