# How to use the FairFL-Data framework

This framework generates data for fair Federated Learning settings and for Federated Learning settings in general. The data is not synthetic: it is based on the ACSIncome and ACSEmployment datasets. During generation, the data is evaluated with respect to fairness metrics. As our implementation is based on the Flower Datasets module, we recommend reading its documentation at https://flower.ai/docs/datasets/index.html.


## 1. Generating a FairFL Dataset

To generate a dataset and its partitions, specify the name of the dataset (ACSIncome or ACSEmployment) and the states you want to load. The states are stored internally as "splits". For each state that should be included in the internal evaluations, you additionally need to specify a partitioner. If neither states nor partitioners are specified, the framework defaults to loading all states as a single partition. For details on using partitioners, see https://flower.ai/docs/datasets/tutorial-use-partitioners.html.
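
To make the idea of per-state partitioning concrete, the following is a minimal, standard-library sketch of what an IID split does conceptually: shuffle the row indices of a state and deal them into near-equal parts. It is not the framework's or Flower's implementation. The helper `iid_partition` and the row counts in `rows_per_state` are hypothetical and exist only for illustration.

```python
import random


def iid_partition(num_rows, num_partitions, seed=42):
    """Shuffle row indices and deal them into near-equal partitions.

    Hypothetical helper for illustration; not part of FairFL-Data or Flower.
    """
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    # Striding keeps partition sizes within one row of each other.
    return [indices[i::num_partitions] for i in range(num_partitions)]


# Same shape as a per-state mapping of partition counts.
partitioners = {"CT": 5, "AK": 2}
# Made-up row counts, for illustration only.
rows_per_state = {"CT": 23, "AK": 9}

partitions = {
    state: iid_partition(rows_per_state[state], n)
    for state, n in partitioners.items()
}

for state, parts in partitions.items():
    print(state, [len(p) for p in parts])
```

Every row index ends up in exactly one partition of its state, and because the assignment is random, each partition is expected to follow roughly the same distribution as the state as a whole.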

In the following example, we apply the default IIDPartitioner, which is initialized internally when you pass the desired number of partitions per state.


In [None]:
from src.fairfl_data.FairFederatedDataset import FairFederatedDataset

# Specify the FL dataset: 5 partitions for CT and 2 for AK
ffds = FairFederatedDataset(
    dataset="ACSIncome",
    states=["CT", "AK"],
    partitioners={"CT": 5, "AK": 2},
    fl_setting=None,
    fairness_metric="DP",
    fairness_level="attribute",
)

# The data is only downloaded when one of the following is called,
# so the call additionally returns the evaluation for the complete dataset.
# Load the first partition of CT
partition_CT_0 = ffds.load_partition(split="CT", partition_id=0)

# Load the complete state CT
split_CT = ffds.load_split("CT")

# Store the data in the "data" directory
ffds.save_dataset("data")

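The call above sets `fairness_metric="DP"`. Assuming that DP stands for demographic parity (the source does not spell this out), the sketch below shows one common way to quantify it on a dataset: the largest gap in positive-label rate between groups of a sensitive attribute. This is an illustrative, standard-library example on toy data, not the evaluation code of the framework. The function name `demographic_parity_difference` and the group labels are hypothetical.

```python
def demographic_parity_difference(labels, groups):
    """Return (gap, rates): the largest difference in positive-label rate
    between any two groups, and the per-group rates.

    Hypothetical helper for illustration; assumes binary 0/1 labels.
    """
    by_group = {}
    for y, g in zip(labels, groups):
        by_group.setdefault(g, []).append(y)
    rates = {g: sum(ys) / len(ys) for g, ys in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


# Toy data: 8 rows, binary label, two groups of a sensitive attribute.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(labels, groups)
print(rates)  # per-group positive-label rates
print(gap)    # 0.0 would mean equal rates across groups
```

A gap of 0 means that all groups have the same share of positive labels, and larger values indicate a stronger imbalance. Computing the same quantity per partition would show how the imbalance differs between clients.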

## 2. Modifying the data during creation

## 3. Evaluating the data on a set of models

# Training a