# Use case - Dataset augmentation: Anomaly class augmentation

## Mobile Network Anomaly Detection dataset

<span style="font-size:15px; line-height:1.5em"> Traditionally, the design of a cellular network focuses on optimizing energy and resources so that operation stays smooth even during peak hours (i.e. periods with higher traffic load). To adapt to varying user demand as efficiently as possible, both in energy savings and in the use of frequency resources, it helps to detect which behaviours in the utilization of the network are abnormal. <br>
In this notebook, we use YData's Synthesizer to obtain a better-balanced dataset and improve the detection of these anomalies. <br><br>
<u>Target:</u>
The target variable can assume two values: <br>
• 0 (normal):  activity corresponds to normal behavior of any working day. <br>
• 1 (unusual): activity differs from the behavior usually observed. 
    
The dataset used to demo dataset augmentation can be found at ["Kaggle - Anomaly detection in 4G cellular networks"](https://www.kaggle.com/c/anomaly-detection-in-4g-cellular-networks/).
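
Since the goal is a better-balanced dataset, it is useful to quantify how imbalanced the target is before augmenting. The following is a minimal sketch using only the standard library; the helper name `class_balance` and the toy labels are illustrative assumptions, not part of the dataset or of YData's API.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: (count, share)} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Toy labels for illustration only; they are not taken from the real dataset.
toy_labels = [0, 0, 0, 1, 0, 0, 1, 0]
balance = class_balance(toy_labels)
for label, (n, share) in balance.items():
    print(f"class {label}: {n} rows ({share:.1%})")
```

With the real data, the same helper could be called on the target column once it is loaded as a pandas DataFrame, for example `class_balance(final_df[target_column])`, where `target_column` stands for whatever name the target has in the file.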

## 0 - Imports 

Import all the packages needed for the data-reading step. 

In [1]:
import json
from ydata.connectors import GCSConnector, LocalConnector
from ydata.connectors.filetype import FileType
from ydata.utils.formats import read_json

## 1 - Load Data

Load the data from Google Cloud Storage with YData's connector. 

In [2]:
# Initialize the YData's connector
token = read_json('gcs_credentials.json')
connector = GCSConnector(token['project_id'], keyfile_dict=token)
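
The cell above indexes `token['project_id']` directly, so a malformed key file fails with a bare `KeyError`. As a hedged sketch, the key could be checked first. The field names below are the ones normally present in a Google service-account key JSON. The helper `check_service_account_key` is hypothetical, and the connector may well perform its own validation.

```python
# Fields normally present in a Google service-account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(token):
    """Return the key dict unchanged, or raise if expected fields are missing."""
    missing = [field for field in REQUIRED_FIELDS if field not in token]
    if missing:
        raise ValueError("credentials are missing fields: " + ", ".join(missing))
    return token


# Dummy values for illustration only; a real key comes from the JSON file.
dummy_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "dummy",
    "client_email": "dummy@example.com",
}
checked = check_service_account_key(dummy_key)
print(checked["project_id"])
```

In the notebook this would wrap the existing call, e.g. `token = check_service_account_key(read_json('gcs_credentials.json'))`.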

In [4]:
# Read the data from the Cloud Storage
data = connector.read_file('gs://ydata_testdata/timeseries/telco/data.csv', sep=';', file_type=FileType.CSV)
final_df = data.to_pandas()

## 2 - Store Data 

Make the data available to the next steps by storing it locally. Here we use pandas to save the file, but YData's LocalConnector could also be used.

In [5]:
final_df.to_csv('data.csv', index=False)

## 3 - Create Artifacts 

Create the artifact to show the downloaded data on the platform's pipeline. 

In [6]:
# Build the table visualization: this is the metadata that Kubeflow needs to display some rows of the dataset.
import json

metadata = {
    'outputs': [{
        'type': 'table',
        'storage': 'inline',
        'format': 'csv',
        'header': list(final_df.columns),
        'source': final_df.to_csv(header=False, index=False)
    }]
}

with open("mlpipeline-ui-metadata.json", 'w') as metadata_file:
    json.dump(metadata, metadata_file)
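
The cell above inlines the whole dataset in the metadata file, although the viewer only needs a few rows. A minimal sketch of a preview-sized variant follows, written with the standard library so it runs on its own. The helper `table_metadata`, its `max_rows` parameter, and the toy rows are assumptions made for illustration.

```python
import csv
import io
import json


def table_metadata(header, rows, max_rows=10):
    """Build inline-table metadata from at most `max_rows` rows."""
    buffer = io.StringIO()
    # Use "\n" line endings so the CSV text is the same on every platform.
    csv.writer(buffer, lineterminator="\n").writerows(rows[:max_rows])
    return {
        "outputs": [{
            "type": "table",
            "storage": "inline",
            "format": "csv",
            "header": list(header),
            "source": buffer.getvalue(),
        }]
    }


# Toy rows for illustration only; they are not taken from the real dataset.
toy_header = ["cell", "load", "label"]
toy_rows = [["A", 0.2, 0], ["B", 0.9, 1], ["C", 0.4, 0]]
metadata = table_metadata(toy_header, toy_rows, max_rows=2)
print(json.dumps(metadata, indent=2))
```

With the notebook's DataFrame, the call might look like `table_metadata(final_df.columns, final_df.head(10).values.tolist())`, and the result can be written to `mlpipeline-ui-metadata.json` with `json.dump` exactly as before.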