# Master's Thesis
## Preprocessing

**Imports and Definitions**
- The necessary libraries are loaded here and important variables are defined

**Imports and settings for this script**
- Import libraries and set variables for this script

**Check directories and necessary config files**
- Create output directories for results if needed

**Read original CSI files, clean up the data and save it under a new filename**
- In this step, the original CSI file is read, cleaned up and written to the "PathConverted" directory with a new file name. The configuration file for the mapping is called "FileMapping" and contains the line number, FilenameOld and FilenameNew.

**Calculate amplitude**
- The amplitude of the CSI data is required for the next steps. It is calculated from the imaginary and real parts. The already cleaned CSI data is loaded for this purpose, the amplitude is calculated, and the result is saved as a new file with the extension "_a.csv" in the "PathConverted" directory.

**Calculate window size of the Hampel filter**
- The Hampel filter is used to remove outliers from the CSI data. This requires calculating the window size in advance. This step has been moved to the "PreprocessingControll.ipynb" file, as the value only needs to be calculated once. The calculated value of 6 is used in the following step to remove the outliers, and the filtered data is saved with the extension "_ah.csv".

**Create Scenario for Machine Learning**
- The last step is to create the scenario files. These contain the scenarios that are then analyzed using machine learning algorithms. The files are saved in the "PathScenario" directory.


## Imports and Definitions

In [1]:
# Import sklearn
import sklearn

# Import pandas
import pandas as pd

# Import numpy
import numpy as np

# To calculate amplitude and phase
import math

# Measure runtime of a Jupyter notebook code cell
from timeit import default_timer as timer

# Used to check if file exists
import os

# Used to check if directory exists
import pathlib

# Import helper functions for operating system calls
import SubOperationSystem

# check os
if os.name == 'nt':
    print("OS is Windows")
    Delimiter = '\\'
    
else:
    print("OS is Linux")
    Delimiter = '/'
    
# Path of datasets (root directory)
PathDataset = 'Dataset' + Delimiter

# Path of the original CSI files
PathDatasetSub = PathDataset + 'CsiFilesRah' + Delimiter

# Path of the converted files
PathConverted = PathDataset + 'Converted' + Delimiter

# Set path for scenario files
PathScenario = PathDataset + 'Scenario' + Delimiter

# Set path for result files
PathResult = PathDataset + 'Result' + Delimiter

# Set path for plot files
PathPlot = PathDataset + 'Plot' + Delimiter

# Set path for config files
PathConfig = 'FilesConfig' + Delimiter

# Scenariofile (file with info about the ten scenarios)
FileScenario = 'FileScenario.csv'

# Mappingfile (file with info about original and converted filenames)
FileMapping = 'FileMapping.csv'

OS is Windows


## Imports and settings for this script

In [2]:
# ScenariofilePre - needed to create Scenario10
FileScenarioPre = 'FileScenarioPreconvert.csv'

## Check directories and necessary config files

In [3]:
# Create directory "PathConverted" if it does not exist
pathlib.Path(PathConverted).mkdir(parents = True, exist_ok = True)
  
# Create directory "PathResult" if it does not exist
pathlib.Path(PathResult).mkdir(parents = True, exist_ok = True)

# Create directory "PathScenario" if it does not exist
pathlib.Path(PathScenario).mkdir(parents = True, exist_ok = True)

# check if mapping file exists, else exit
# (the config files are read from "PathConfig" later, so the check uses the same path)
if not SubOperationSystem.checkIfFileExists(PathConfig + FileMapping, True):
    exit()
    
# check if scenario file exists, else exit
if not SubOperationSystem.checkIfFileExists(PathConfig + FileScenario, True):
    exit()
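
The helper `SubOperationSystem.checkIfFileExists` is not shown in this notebook. The following is a minimal sketch of how such a helper could look, assuming the second argument controls whether a message is printed when the file is missing. The names `check_if_file_exists` and `verbose` are hypothetical, and the real helper may behave differently.

```python
from pathlib import Path


def check_if_file_exists(path, verbose):
    """Return True if `path` points to an existing file.

    Hypothetical stand-in for SubOperationSystem.checkIfFileExists.
    """
    exists = Path(path).is_file()
    # Only report a missing file when the caller asked for a message.
    if not exists and verbose:
        print(f"File not found: {path}")
    return exists
```

With `verbose=False` the function can be used silently, as in the "output file already exists, skip it" checks of the later cells.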


## Scenario10

The Scenario10 file is a combination of different raw CSI files. These files are needed for Scenario10. For details, see PreprocessingCreateSpezialFiles.ipynb.

Run **PreprocessingCreateSpezialFiles.ipynb** here if necessary.

## Read original CSI files, clean up the data and save it under a new filename

- Allow only permitted MAC addresses
- Select only the necessary columns
- Remove parentheses and spaces
- Set the label column to the line number from the mapping file
- Save the result under the new filename in the "PathConverted" directory
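
The filtering and clean-up steps listed above can be sketched on a tiny, made-up DataFrame. The values, the second MAC address in the sample rows, and the reduced column range are assumptions for illustration only; the real files contain the subcarrier columns 6 to 31 and 33 to 58.

```python
import pandas as pd

# Tiny stand-in for a raw CSI export; the values are made up.
raw = pd.DataFrame({
    "source_mac_addr": ["ec:94:cb:6e:73:8c", "aa:bb:cc:dd:ee:ff"],
    "csi_subcarrier_6": ["(3, 4)", "(1, 1)"],
    "csi_subcarrier_7": ["(-6, 8)", "(2, 2)"],
})
allowed_macs = ["ec:94:cb:6e:73:8c", "ec:94:cb:6e:7c:64"]

# Keep only rows from permitted senders.
clean = raw[raw["source_mac_addr"].isin(allowed_macs)]

# Keep only the CSI columns (label-based slices include both ends).
clean = clean.loc[:, "csi_subcarrier_6":"csi_subcarrier_7"]

# Strip parentheses and spaces so each cell holds two comma-separated numbers.
clean = clean.replace(r"[() ]", "", regex=True)

# Tag every row with the class label (here a made-up line number).
clean = clean.assign(label=1)
```

After these steps, only the row from the permitted sender remains and its cells have the form `3,4`, which is the format the amplitude step expects.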

In [4]:
# Read config file "FileMapping"
dfFiles = pd.read_csv(PathConfig + FileMapping)

# loop through config file "FileMapping"
for ind in dfFiles.index:
        
    # read Linenumber, FilenameOld and Filenamenew
    LineNumber,FilenameOld,FilenameNew = (dfFiles['LineNumber'][ind], dfFiles['FilenameOld'][ind], dfFiles['FilenameNew'][ind])

    # Set Delimiter for Linux (only for FilenameOld)
    if Delimiter == "/":
        FilenameOld = FilenameOld.replace("\\\\", "/")
    
    # check if "FilenameOld" file exists
    if not SubOperationSystem.checkIfFileExists(PathDatasetSub + FilenameOld, True):
        exit()
        
    else:
        # if output file exists -> next file
        if SubOperationSystem.checkIfFileExists(PathConverted + FilenameNew, False):
            continue
    
    # read dataset
    df = pd.read_csv(PathDatasetSub + FilenameOld)
    
    # query only allowed mac addresses
    df.query('source_mac_addr in ["ec:94:cb:6e:73:8c", "ec:94:cb:6e:7c:64"]',inplace=True)
    
    # use only columns with important CSI-Data
    df = pd.concat([df.loc[:,'csi_subcarrier_6':'csi_subcarrier_31'],df.loc[:,'csi_subcarrier_33':'csi_subcarrier_58']],axis=1)

    # remove "(", ")" and spaces (raw string avoids invalid escape sequences)
    df = df.replace(r'[() ]', '', regex=True)
    
    # set value of label
    df['label'] = LineNumber

    # export new file with "FilenameNew" without index
    df.to_csv(PathConverted + FilenameNew, index=False)


## Calculate amplitude

- This section calculates the amplitude for each converted CSI file and saves the result with the extension "_a.csv".
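
The amplitude of a complex value is the square root of the sum of the squared imaginary and real parts. As an alternative to a cell-by-cell loop, the same calculation can be sketched column-wise with NumPy. This sketch assumes that every CSI cell holds two comma-separated numbers and that the label column is named `label`; the function name `amplitude_frame` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def amplitude_frame(df, label_column="label", decimals=4):
    """Convert comma-separated string cells to amplitudes, keeping the label."""
    result = pd.DataFrame(index=df.index)
    for column in df.columns:
        if column == label_column:
            result[column] = df[column]
            continue
        # Split each cell into its two components as floats.
        parts = df[column].str.split(",", expand=True).astype(float)
        # np.hypot(a, b) computes sqrt(a**2 + b**2) element-wise.
        result[column] = np.hypot(parts[0], parts[1]).round(decimals)
    return result


example = pd.DataFrame({
    "csi_subcarrier_6": ["3,4", "0,-2"],
    "csi_subcarrier_7": ["-6,8", "5,12"],
    "label": [1, 1],
})
amplitudes = amplitude_frame(example)
```

Because the amplitude is symmetric in both components, the order of the imaginary and real part within a cell does not affect the result.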

In [5]:
# loop through config file "FileMapping"
for ind in dfFiles.index:
        
    # get filenames
    FilenameNew = (dfFiles['FilenameNew'][ind])
    
    # rename filename with amplitude label
    FILEEXTENSION = FilenameNew.replace(".csv", "_a.csv")

    # first check if file exists
    if not SubOperationSystem.checkIfFileExists(PathConverted + FilenameNew, True):
        exit()
    
    else:
        # if output file exists -> next file
        if SubOperationSystem.checkIfFileExists(PathConverted + FILEEXTENSION, False):
            continue
    
    # read converted dataset
    dfAmplitude = pd.read_csv(PathConverted + FilenameNew)
    
    # get count of columns (the last column is the label)
    ENDCOLUMN = len(dfAmplitude.columns)-1

    # loop through rows
    for MAINCOUNTER in range(len(dfAmplitude)):
        
        # loop through columns
        for COUNTER in range(0, ENDCOLUMN):
        
            # split values in cell, separated by comma
            (IMAGINAR,REAL) = (dfAmplitude.iloc[MAINCOUNTER, COUNTER]).split(',')
        
            # calculate amplitude; the phase is computed as well but not stored
            AMPLITUDE = (round(math.sqrt(float(IMAGINAR)** 2 + float(REAL)** 2),4))
            PHASE = (round(math.atan2(float(IMAGINAR), float(REAL)),4))
            
            # set new value
            dfAmplitude.iloc[MAINCOUNTER, COUNTER] = AMPLITUDE
            
    # Export dataset without index and other file name
    dfAmplitude.to_csv(PathConverted + FILEEXTENSION, index=False)
    

## Window size for the Hampel filter

The notebook "PreprocessingControll.ipynb" was executed to determine the window size for the Hampel filter. The calculated value is 6 and is used in the next step. See "PreprocessingControll.ipynb" for more details.

Run **PreprocessingControll.ipynb** if necessary.
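
To illustrate what the filter does, here is a minimal NumPy sketch of the Hampel idea: each value is compared with the median of its neighbourhood and replaced by that median if it deviates by more than `n_sigma` scaled median absolute deviations. This is not the implementation of the `hampel` package; the function name `hampel_sketch` and the parameter `half_window` are hypothetical, and the window handling may differ from the package's `window_size`.

```python
import numpy as np


def hampel_sketch(values, half_window=3, n_sigma=3.0):
    """Replace outliers by the local median (illustrative only)."""
    values = np.asarray(values, dtype=float)
    filtered = values.copy()
    # Scale factor that makes the MAD comparable to a standard
    # deviation for normally distributed data.
    scale = 1.4826
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        median = np.median(window)
        mad = scale * np.median(np.abs(window - median))
        # Flag the point if it lies too far from the local median.
        if np.abs(values[i] - median) > n_sigma * mad:
            filtered[i] = median
    return filtered


signal = [1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1, 0.9, 1.0]
cleaned = hampel_sketch(signal)
```

In this made-up signal only the spike of 9.0 is replaced by the local median of 1.0; the remaining values pass through unchanged.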


## Remove outliers

In [6]:
# import hampel filter
from hampel import hampel

# loop through config file "FileMapping"
for ind in dfFiles.index:
    
    # get filenames
    LineNumber,FilenameNew = (dfFiles['LineNumber'][ind], dfFiles['FilenameNew'][ind])
        
    # rename filename with amplitude label
    DataFileAmplitude = FilenameNew.replace(".csv", "_a.csv")
    
    # check if file exists
    if not SubOperationSystem.checkIfFileExists(PathConverted + DataFileAmplitude, True):
        exit()

    else:
        # rename filename with hampel label
        DataFileFilter = FilenameNew.replace(".csv", "_ah.csv")

        # if output file exists -> next file
        # (PathConverted already ends with the delimiter)
        if SubOperationSystem.checkIfFileExists(PathConverted + DataFileFilter, False):
            continue

    # read csv file to dataframe
    dfCSI = pd.read_csv(PathConverted + DataFileAmplitude)

    # copy df to numpy arrays, because hampel needs array input
    ArrayCsiUnfiltered = dfCSI.to_numpy()
    ArrayCsiFiltered = ArrayCsiUnfiltered.copy()

    # hampel filter to remove outliers, applied column by column
    for y in range(dfCSI.shape[1]):
        ArrayCsiFiltered[:,y] = hampel(ArrayCsiUnfiltered[:,y], window_size=6, n_sigma=3.0).filtered_data

    # round hampel values from 15 to 4 decimal places (once, after the loop)
    ArrayCsiFilteredRounded = np.round(ArrayCsiFiltered, 4)
    
    # add column names
    dfFilter = pd.DataFrame(ArrayCsiFilteredRounded, columns=dfCSI.columns)
    
    # set value of label
    dfFilter['label'] = LineNumber
    
    # save pandas dataframe to file
    dfFilter.to_csv(PathConverted + DataFileFilter, index=False)
  


## Create Scenario for Machine Learning

In [7]:
# open scenario mapping file to read
dfScenario = pd.read_csv(PathConfig + FileScenario, index_col=0)

# Read config file "FileMapping" (indexed by its first column)
dfFiles = pd.read_csv(PathConfig + FileMapping, index_col=0)

# loop through scenario dataframe
for ind in dfScenario.index:
    
    # get scenario name and datasets from dataframe
    Scenario,Datasets = (dfScenario['Scenario'][ind], dfScenario['Datasets'][ind])

    # split into dataset items (separated by whitespace)
    DatasetItems = Datasets.split()

    # create scenario file with all scenario datasets to write;
    # the handle gets its own name so the "FileScenario" filename is not overwritten
    with open(PathScenario + Scenario + "_ah.csv", 'w') as FileScenarioOut:
    
        # loop through list
        for DatasetItem in DatasetItems:
        
            # filename of the needed dataset in the column 'FilenameNew'
            # (the index lookup needs an int value)
            DatasetFilename = dfFiles.loc[int(DatasetItem), 'FilenameNew']
                    
            # rename filename with hampel label
            FileDataHampel = DatasetFilename.replace(".csv", "_ah.csv")
                    
            # create dataframe hampel
            dfFilter = pd.read_csv(PathConverted + FileDataHampel)
            
            # set value of label
            dfFilter['label'] = DatasetItem
            
            # append DataFrame to file; write the header only if the file is empty
            # ("lineterminator" replaces "line_terminator" from pandas 1.5 on)
            dfFilter.to_csv(FileScenarioOut, index=False, lineterminator='\n',
                            header=(FileScenarioOut.tell() == 0))
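
The cell above appends each dataset to an open file. As an alternative, the same assembly can be sketched in memory with `pd.concat` and written once at the end. The function name `build_scenario` and the sample data are hypothetical; the sketch assumes that each dataset fits into memory and that the label is the dataset number.

```python
import pandas as pd


def build_scenario(frames_by_label):
    """Stack per-dataset frames into one scenario frame, relabeling each part."""
    parts = [
        frame.assign(label=label)
        for label, frame in frames_by_label.items()
    ]
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat(parts, ignore_index=True)


# Made-up filtered amplitude data for two datasets.
frames = {
    1: pd.DataFrame({"csi_subcarrier_6": [5.0, 5.1]}),
    2: pd.DataFrame({"csi_subcarrier_6": [2.0]}),
}
scenario = build_scenario(frames)
```

The combined frame could then be written with a single `to_csv(..., index=False)` call, which avoids the separate header handling for the first dataset.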
