This repository provides Python scripts and examples for generating synthetic non-IID (not independent and identically distributed) datasets for federated learning simulations. Users can control the level of heterogeneity by adjusting the alpha parameter.
Federated learning simulations often require data that is distributed unevenly (non-IID) across many clients. This repository makes it straightforward to create datasets with varying levels of heterogeneity, so that federated learning algorithms can be simulated and evaluated under realistic data distribution conditions.
To install and set up the repository, follow these steps:
```bash
git clone https://github.com/AhmadTaheri2021/Federated-NonIID-Data-Generator.git
cd Federated-NonIID-Data-Generator
pip install -r requirements.txt
```
Dependencies:
- numpy
- matplotlib
- seaborn
- pandas
- jupyter
An illustrative example is provided in the notebook located at:
examples/usage_example.ipynb
Here's a brief example of usage:
```python
import numpy as np
from data_generator.data_partitioning import generate_distributed_datasets

# Example dataset
X_train = np.random.rand(1000, 20)
Y_train = np.random.randint(0, 10, 1000)

# Define client names and alpha values
client_names = [f'Client_{i}' for i in range(5)]
alpha_values = [0.01, 0.1, 1.0]

# Generate datasets
datasets = generate_distributed_datasets(
    X_train,
    Y_train,
    num_of_clients=5,
    client_names=client_names,
    alpha_values=alpha_values,
    show_dist=True
)
```
Parameters:
- X_train, Y_train: Input training features and labels.
- num_of_clients: Number of federated learning clients.
- client_names: List of client identifiers.
- alpha_values: List of alpha values, each controlling the degree of non-IID distribution:
  - Lower alpha: more heterogeneous data across clients.
  - Higher alpha: more homogeneous data across clients.
- show_dist: Enables visualization of the generated distributions.
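The README does not say how alpha is used internally. A common way to implement this kind of alpha-controlled label skew is Dirichlet partitioning: for each class, client proportions are drawn from Dir(alpha, ..., alpha) and that class's samples are split accordingly. The sketch below assumes that technique; dirichlet_partition is a hypothetical helper, not part of this repository's API, and generate_distributed_datasets may work differently.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: split sample indices across clients with label skew.

    Assumption: Dirichlet label-skew partitioning, which is not necessarily
    what this repository implements. Returns one index array per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        # Shuffle the indices of all samples belonging to this class
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Fraction of this class that each client receives; small alpha
        # concentrates the mass on a few clients, large alpha spreads it evenly
        proportions = rng.dirichlet(np.full(num_clients, alpha))
        # Turn cumulative proportions into num_clients - 1 cut points
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]


labels = np.random.default_rng(1).integers(0, 10, 1000)
for alpha in (0.1, 100.0):
    parts = dirichlet_partition(labels, num_clients=5, alpha=alpha)
    print(alpha, [len(p) for p in parts])
```

Every sample is assigned to exactly one client. With a small alpha such as 0.1, clients typically end up dominated by a few classes (and a client may even receive no samples), while with a large alpha such as 100 each client's class mix is close to the global one.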
The function provides visualization tools to quickly assess how data distribution varies across different clients and alpha settings:
- Bar plots: Display the number of samples allocated per client.
- Heatmaps: Visualize the distribution of different classes across clients.
See the example notebook for detailed visualizations.
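The bar plots and heatmaps described above can be approximated from a per-client class-count matrix. The sketch below assumes the partition is available as a list of index arrays, one per client; the README does not document the return format of generate_distributed_datasets, so client_class_counts and plot_distribution are hypothetical helpers rather than the repository's own plotting code.

```python
import numpy as np


def client_class_counts(labels, client_indices, num_classes):
    """Hypothetical helper: (num_clients, num_classes) matrix of class counts.

    Assumes labels are integers in [0, num_classes).
    """
    labels = np.asarray(labels)
    counts = np.zeros((len(client_indices), num_classes), dtype=int)
    for client_id, idx in enumerate(client_indices):
        counts[client_id] = np.bincount(
            labels[np.asarray(idx, dtype=int)], minlength=num_classes
        )
    return counts


def plot_distribution(counts, client_names):
    """Hypothetical helper: bar plot of samples per client plus a class heatmap."""
    # Plotting libraries are imported lazily so the counting helper works without them
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(12, 4))
    # Bar plot: total number of samples allocated to each client
    ax_bar.bar(client_names, counts.sum(axis=1))
    ax_bar.set_ylabel("Samples")
    # Heatmap: rows are clients, columns are classes
    sns.heatmap(counts, annot=True, fmt="d", cmap="viridis",
                yticklabels=client_names, ax=ax_heat)
    ax_heat.set_xlabel("Class")
    fig.tight_layout()
    return fig


labels = np.array([0, 0, 1, 2, 2, 2])
counts = client_class_counts(labels, [np.array([0, 1, 2]), np.array([3, 4, 5])], 3)
print(counts)
```

Row sums of the matrix give the per-client sample counts shown in the bar plot; calling plot_distribution(counts, ["Client_0", "Client_1"]) would draw both views, assuming matplotlib and seaborn are installed.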
Distributed under the MIT License. See the LICENSE file for more information.