# Data Augmentation Test

Data augmentation in three steps:
1. Data Augmentation Function: Add small Gaussian noise to the existing data to create new samples.
2. Combine Original and Augmented Data: Stack the original rows and the augmented rows into one dataset.
3. Re-train the Autoencoder: Train the autoencoder on the combined dataset to improve its ability to generate realistic synthetic data.

Training on the combined dataset can help the autoencoder produce more realistic synthetic data when only limited real-world data is available.

In [None]:
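# Data augmentation function
import numpy as np

def augment_data(data, noise_factor=0.1):
    # Add zero-mean Gaussian noise with standard deviation noise_factor to every value
    augmented_data = data + noise_factor * np.random.normal(size=data.shape)
    return augmented_data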
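
The function above adds noise with the same standard deviation to every column, which suits data whose features share a common scale. As a minimal sketch of one alternative, assuming features on very different scales, the noise can be scaled by each column's standard deviation. The name `augment_data_scaled`, its `rng` parameter, and the toy values are hypothetical and not part of the notebook.

```python
import numpy as np

def augment_data_scaled(data, noise_factor=0.05, rng=None):
    """Hypothetical variant: noise std is noise_factor times each column's std."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    column_std = data.std(axis=0)  # one scale per feature
    noise = rng.normal(size=data.shape) * noise_factor * column_std
    return data + noise

# Made-up rows standing in for three features on different scales.
toy = np.array([[25.0, 30000.0, 12.0],
                [40.0, 52000.0, 16.0],
                [58.0, 81000.0, 18.0]])
toy_augmented = augment_data_scaled(toy, rng=np.random.default_rng(0))
print(toy_augmented.shape)  # (3, 3)
```

Passing a seeded generator makes the augmentation reproducible, which helps when comparing training runs.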

In [None]:
# Create one noisy copy of the existing data (noise_factor=0.05 overrides the default of 0.1)
augmented_data = augment_data(real_data, noise_factor=0.05)

In [None]:
# Combine original and augmented data
combined_data = np.vstack([real_data, augmented_data])

In [None]:
# Re-train the autoencoder on the combined (original + augmented) data; input and target are the same array
autoencoder.fit(combined_data, combined_data, epochs=1000, batch_size=4, shuffle=True, verbose=0)

In [None]:
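# Generate synthetic data
# NOTE: this feeds latent-sized vectors to the full autoencoder, which works only if
# encoding_dim equals the input width; otherwise pass them to the decoder part alone.
encoded_data = np.random.normal(size=(1000, encoding_dim))  # Sample from the latent space
synthetic_data = autoencoder.predict(encoded_data)
# De-normalize. NOTE: meaningful only if the model was trained on standardized data and these
# are the statistics used to standardize it; here they come from the training array itself.
synthetic_data = synthetic_data * combined_data.std(axis=0) + combined_data.mean(axis=0)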
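
The de-normalization step above inverts a standardization only if it reuses the mean and standard deviation that were computed before normalizing. The following is a minimal sketch of that round trip on made-up values; the names `raw`, `raw_mean`, and `raw_std` are assumptions for illustration and do not appear in the notebook.

```python
import numpy as np

# Made-up raw values for illustration only.
raw = np.array([[25.0, 30000.0, 12.0],
                [40.0, 52000.0, 16.0],
                [58.0, 81000.0, 18.0]])

# Compute the statistics once, on the raw data, and keep them.
raw_mean = raw.mean(axis=0)
raw_std = raw.std(axis=0)

normalized = (raw - raw_mean) / raw_std      # what a model would train on
restored = normalized * raw_std + raw_mean   # exact inverse of the line above

print(np.allclose(restored, raw))  # True
```

Applying the same inverse to model outputs returns them to the original units, provided the model was trained on `normalized` rather than on `raw`.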

In [None]:
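# Wrap the synthetic samples in a DataFrame; the three names must match the column order of the data
import pandas as pd

synthetic_data_df = pd.DataFrame(synthetic_data, columns=['Age', 'Income', 'Education Level'])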

In [None]:
# Display the synthetic data
print(synthetic_data_df.head())
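
Printing the first rows shows the format but not whether the synthetic values resemble the real ones. One quick check, sketched here under the assumption that both datasets are available as numeric arrays with matching columns, is to compare per-column means and standard deviations. The helper `summary_gap` and the stand-in arrays are hypothetical and generated only for this example.

```python
import numpy as np

def summary_gap(real, synthetic):
    """Hypothetical helper: per-column gap in mean and std between two arrays."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    mean_gap = np.abs(real.mean(axis=0) - synthetic.mean(axis=0))
    std_gap = np.abs(real.std(axis=0) - synthetic.std(axis=0))
    return mean_gap, std_gap

# Stand-in arrays drawn from the same distribution, not real notebook data.
rng = np.random.default_rng(0)
stand_in_real = rng.normal(loc=[40.0, 50000.0], scale=[10.0, 15000.0], size=(500, 2))
stand_in_synthetic = rng.normal(loc=[40.0, 50000.0], scale=[10.0, 15000.0], size=(500, 2))

mean_gap, std_gap = summary_gap(stand_in_real, stand_in_synthetic)
print(mean_gap, std_gap)
```

Small gaps relative to each column's scale are a necessary sign of realism, not a sufficient one, since matching means and spreads says nothing about correlations between columns.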

In [None]:
# Optionally save to a CSV file
synthetic_data_df.to_csv('synthetic_data.csv', index=False)