### Table of Contents

* [1. Import Packages](#import_packages)
* [2. Read Data](#read_data)
* [3. Encoding](#encoding)

## 1. Import Packages <a class="anchor" id="import_packages"></a>

In [1]:
import pandas as pd
import numpy as np

from sklearn import preprocessing

import glob
import os

## 2. Read Data <a class="anchor" id="read_data"></a>

<img src="img/model.png" alt="Model" style="width: 700px; float: left;"/>

In [2]:
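# Folder that contains the merged CSV files
path = './../CSVs/mergedCSVs'
# Sort the file list so the processing order is reproducible
all_files = sorted(glob.glob(os.path.join(path, '*.csv')))

dataframes = []
dataframeNames = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)

    # Keep each dataframe together with its file name
    dataframes.append(df)
    dataframeNames.append(os.path.basename(filename))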
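
The loading step can also be written so that each dataframe is stored under its file name. The following is only a minimal sketch and not part of the original notebook: the helper `load_csv_folder` and the throwaway demo files are hypothetical and exist just to make the example self-contained.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV in `folder` into a dict keyed by file name (hypothetical helper)."""
    frames = {}
    # glob returns files in arbitrary order, so sort for reproducibility
    for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        frames[os.path.basename(filename)] = pd.read_csv(filename, header=0)
    return frames


# Demo with two throwaway CSV files instead of the real merged CSVs
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2]}).to_csv(os.path.join(tmp, "b.csv"), index=False)
    pd.DataFrame({"a": [3]}).to_csv(os.path.join(tmp, "a.csv"), index=False)
    demo_frames = load_csv_folder(tmp)

print(list(demo_frames))  # file names in sorted order
```

A dict avoids keeping two parallel lists in sync, at the cost of a slightly different loop in the later cells.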

## 3. Encoding <a class="anchor" id="encoding"></a>

All dataframes share the same categorical columns.

**Nominal features:**
- region
- gender
- code_module
- disability

**Ordinal features:**
- imd_band
- age_band
- highest_education
- code_presentation
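
The two groups are encoded differently because only ordinal features carry a ranking. As a rough illustration on toy data (the values below are placeholders and are not taken from the real CSVs; the ranking in `age_rank` is an assumption made for this example):

```python
import pandas as pd

# Toy data with placeholder values (not taken from the real CSVs)
toy = pd.DataFrame({
    "gender": ["F", "M", "F"],
    "age_band": ["young", "old", "middle"],
})

# Nominal: one indicator column per category, no order implied
gender_dummies = pd.get_dummies(toy["gender"], prefix="gender")

# Ordinal: integer codes that follow an explicitly stated ranking
age_rank = {"young": 0, "middle": 1, "old": 2}  # assumed order
age_codes = toy["age_band"].map(age_rank)

print(list(gender_dummies.columns))  # ['gender_F', 'gender_M']
print(age_codes.tolist())            # [0, 2, 1]
```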

In [3]:
nominalFeatures = ['region',
                   'gender',
                   'code_module',
                   'disability']

ordinalFeatures = ['imd_band',
                   'age_band',
                   'highest_education',
                   'code_presentation']

In [4]:
for df, dfName in zip(dataframes, dataframeNames):

    # Keep only the numeric columns
    numericDF = df.select_dtypes(include=np.number)

    # One-hot encode the nominal features
    for entry in nominalFeatures:
        numericDF = numericDF.join(pd.get_dummies(df[entry]))

    # Encode the ordinal features as integers
    # Note: LabelEncoder numbers the classes in sorted order,
    # which is not necessarily their ordinal ranking
    label_encoder = preprocessing.LabelEncoder()

    for entry in ordinalFeatures:
        numericDF[entry] = label_encoder.fit_transform(df[entry])

    # The student ID is not needed
    numericDF = numericDF.drop(columns=['id_student'])

    # Append the target column
    finalDF = numericDF.join(df['final_result'])

    # Save the encoded dataframe under its original file name
    finalDF.to_csv('./../CSVs/encodedCSVs/' + dfName, index=False)
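
One point worth checking for the ordinal columns: `LabelEncoder` assigns codes in sorted (alphabetical) order of the class labels, which may differ from the intended ranking. The sketch below shows the difference on placeholder values and uses scikit-learn's `OrdinalEncoder` with an explicit category order as one possible alternative. The column `level` and its values are hypothetical. The real category orders would have to be taken from the data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Placeholder values, not taken from the real data
levels = pd.DataFrame({"level": ["medium", "low", "high", "low"]})

# LabelEncoder sorts the classes, so the codes follow alphabetical order
label_codes = LabelEncoder().fit_transform(levels["level"])
print(label_codes.tolist())  # high=0, low=1, medium=2

# OrdinalEncoder accepts the intended ranking explicitly
ranking = [["low", "medium", "high"]]  # assumed order for this example
ordinal_codes = OrdinalEncoder(categories=ranking).fit_transform(levels[["level"]])
print(ordinal_codes.ravel().tolist())  # low=0, medium=1, high=2
```

Whether the alphabetical codes are acceptable depends on the downstream model: tree-based models are usually less sensitive to the exact code order than linear models.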