# MultiTable synthesis - Managing Parent-Child Table Relationships with Composite Keys

Relational databases often include parent-child table relationships where composite keys—keys composed of multiple columns—play a crucial role in maintaining data integrity. In these relationships, the child table must reference a valid combination of keys from the parent table, ensuring consistency and preserving the relational structure.

When generating synthetic data for such schemas, it's essential to:
- Maintain the integrity of composite key relationships across parent and child tables.
- Ensure that all composite key combinations in the child table exist in the parent table.
- Generate realistic and coherent data that reflects the schema constraints.
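
The second requirement is stricter than checking each key column on its own: a child row can carry a `leagueID` and a `gameID` that both appear in the parent, yet never together as a pair. The following is a minimal, library-free sketch of that check on toy rows. The data, the `playerID` column, and the helper names `composite_key` and `find_orphans` are illustrative assumptions, not part of the YData SDK.

```python
# Toy rows standing in for the parent (games) and child (appearances) tables.
games = [
    {"leagueID": 1, "gameID": 10},
    {"leagueID": 1, "gameID": 11},
    {"leagueID": 2, "gameID": 20},
]
appearances = [
    {"leagueID": 1, "gameID": 10, "playerID": 100},
    {"leagueID": 2, "gameID": 20, "playerID": 101},
    # Both values exist in the parent, but never together as a pair.
    {"leagueID": 2, "gameID": 11, "playerID": 102},
]

KEY_COLUMNS = ("leagueID", "gameID")


def composite_key(row, columns=KEY_COLUMNS):
    """Build the composite key of a row as a tuple of its key column values."""
    return tuple(row[column] for column in columns)


def find_orphans(child_rows, parent_rows, columns=KEY_COLUMNS):
    """Return child rows whose composite key has no match in the parent."""
    parent_keys = {composite_key(row, columns) for row in parent_rows}
    return [row for row in child_rows if composite_key(row, columns) not in parent_keys]


orphans = find_orphans(appearances, games)
print(orphans)
```

An empty result means every composite key combination in the child table exists in the parent table. With pandas DataFrames, the same check can be expressed as a left merge on the key columns with `indicator=True`, keeping the rows marked `left_only`.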

This notebook serves as a guide to creating synthetic datasets for databases with composite key dependencies. We will explore techniques to ensure relational integrity, enabling reliable testing, machine learning model training, or database simulation scenarios.

In this example, we will explore how this can be achieved using the [Football database](https://www.kaggle.com/datasets/technika148/football-database).

### Read the data from the Data Catalog

In [7]:
# Importing YData's packages
from ydata.labs import DataSources
# Reading the Dataset from the DataSource
datasource = DataSources.get(uid='{insert-datasource-id}')
dataset = datasource.dataset
# Getting the calculated Metadata to get the profile overview information in the labs
metadata = datasource.metadata
print(metadata)

MultiMetadata Summary

Tables Summary
Number of tables: 3

    Table name  # cols  # nrows Primary keys Foreign keys PK characteristics    FK characteristics Notes
0      leagues       3        5   [leagueID]                            [id]                            
1        games      34    12680     [gameID]   [leagueID]               [id]  {'leagueID': ['id']}      
2  appearances      18   356513           []     [gameID]                       {'gameID': ['id']}      

Relations Summary
Number of relations: 2

         Table    Column Parent Table Parent Column Relation Type
0        games  leagueID      leagues      leagueID           1-n
1  appearances    gameID        games        gameID           1-n

### Configure relations and synthetic data generator details

In this case, the child table (`appearances`) expects the composite key combinations of the parent table (`games`) to be preserved as-is in the output of the synthetic data generation process. For that reason, we explicitly declare the expected composite relation by naming the key columns in both the child and the parent table.

In [2]:
# Set the relation between composite keys that exist in different tables
composite_keys = {
    "table": "appearances",
    "columns": ["leagueID", "gameID"],
    "parent_table": "games",
    "parent_columns": ["leagueID", "gameID"],
}

dataset.schema.add_composite_keys(**composite_keys)

## Synthetic data generation - train & sampling

### Train

In [None]:
from ydata.synthesizers.multitable.model import MultiTableSynthesizer

synthesizer = MultiTableSynthesizer()

synthesizer.fit(
    X=dataset,
    metadata=metadata,
)

### Sample

In [None]:
# Importing YData's packages
from ydata.labs import Connectors
# Getting a previously created Connector
dest_connector = Connectors.get(uid='{insert-connector-id}')

In [None]:
synthesizer.sample(1., connector=dest_connector.connector)