# Aging Oven - Data Preparation for Process Mining

## Setup

### Confidentiality
**All information in this document is strictly confidential**
**Copyright (C) 2019 HES-SO Valais-Wallis - All Rights Reserved**

### Import sub-modules

In [35]:
# Import required sub-modules

# python
import sys
import os
import enum
# import datetime

# iPython
import IPython
# from IPython.display import display

# pandas
import pandas as pd

# numpy
import numpy as np

# plotly
import plotly as ply
# import plotly.figure_factory as ff
ply.offline.init_notebook_mode(connected=True)

# Print the versions of the modules in use
print("python: {}".format(sys.version))
print("    - os")
print("    - datetime")
print("    - enum")
print("ipython {}".format(IPython.__version__))
print("pandas: {}".format(pd.__version__))
print("numpy: {}".format(np.__version__))
print("plotly: {}".format(ply.__version__))

python: 3.7.1 (default, Dec 14 2018, 19:28:38) 
[GCC 7.3.0]
    - os
    - datetime
    - enum
ipython 7.2.0
pandas: 0.23.4
numpy: 1.15.4
plotly: 3.7.1


### Configurations

In [36]:
# Set up the local input directory
inputDir = "input/"

if not os.path.exists(inputDir):
    os.makedirs(inputDir)
if not os.path.isdir(inputDir):
    raise NotADirectoryError("{} is not a directory".format(inputDir))

In [37]:
# Set up the local output directory
outputDir = "output/"

if not os.path.exists(outputDir):
    os.makedirs(outputDir)
if not os.path.isdir(outputDir):
    raise NotADirectoryError("{} is not a directory".format(outputDir))
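The two directory cells repeat the same check, so they could share one helper. The sketch below is only an illustration: `ensure_dir` is a hypothetical name that does not appear in the notebook, and the demo runs in a temporary directory so it does not touch `input/` or `output/`.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` if it is missing and confirm that it is a directory."""
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(path, exist_ok=True)
    if not os.path.isdir(path):
        raise NotADirectoryError("{} is not a directory".format(path))
    return path


# Demo in a throwaway directory (hypothetical usage, not part of the notebook)
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = ensure_dir(os.path.join(tmp, "input"))
    demo_dir_exists = os.path.isdir(demo_dir)
    # Calling the helper a second time on an existing directory is harmless.
    ensure_dir(demo_dir)
```

With such a helper, the two cells would reduce to `inputDir = ensure_dir("input/")` and `outputDir = ensure_dir("output/")`.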

In [38]:
# Graph output type
class GraphOutputOption(enum.Enum):
    none = 'none'            # Do not generate any plots
    inline = 'inline'        # Generate inline plots only
    htmlFile = 'htmlFile'    # Generate plots in external HTML files
    both = 'both'            # Generate plots both inline and in external html files


notebookGraphingOutputs = GraphOutputOption('both')
GraphAutoOpenHTML = False              # Auto open external HTML files [True/False]
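The notebook does not show how `notebookGraphingOutputs` is consumed later. As a minimal sketch, a plotting cell could test the option with two small predicates. The helper names `wants_inline` and `wants_html_file` are assumptions made for this example; the enum is repeated only so the block runs on its own.

```python
import enum


class GraphOutputOption(enum.Enum):
    none = 'none'            # Do not generate any plots
    inline = 'inline'        # Generate inline plots only
    htmlFile = 'htmlFile'    # Generate plots in external HTML files
    both = 'both'            # Generate plots both inline and in HTML files


def wants_inline(option):
    """Return True when plots should be rendered inside the notebook."""
    return option in (GraphOutputOption.inline, GraphOutputOption.both)


def wants_html_file(option):
    """Return True when plots should be written to external HTML files."""
    return option in (GraphOutputOption.htmlFile, GraphOutputOption.both)


notebookGraphingOutputs = GraphOutputOption('both')
inline_enabled = wants_inline(notebookGraphingOutputs)
html_enabled = wants_html_file(notebookGraphingOutputs)
```

Keeping the decision in one place means a new output mode only requires changing the predicates, not every plotting cell.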

## Data Import

### Data source

Prepared CSV file provided by Jerome: ``2018_DataRevenuPresse_V2_Clean.csv``

### CSV to pandas DF

In [39]:
# Import the CSV file into a pandas DataFrame
ovenExcelExportFileName = '2018_DataRevenuPresse_V2_Clean.csv'
ovenExcelExportFilePath = inputDir + ovenExcelExportFileName
rawOvenDf = pd.read_csv(ovenExcelExportFilePath, sep=';')

rawOvenDf.head()

Unnamed: 0,ordre,qtePlan,pdtPlan,debut,fin,qteBon,poidsBon,noFour,noCharge,profile,alliage,tempRevenu1,dureeRevenu1,tempRevenu2,dureeRevenu2
0,300306,2,FOUR_7,20180207 083714,20180208 014135,2,430.0,67,2018670046,P62244,6065,153,17.0,,
1,299569,2,FOUR_9,20180110 063615,20180110 234222,2,252.0,65,2018650010,P62147,6084,158,17.0,,
2,301578,44,FOUR_9,20180403 101227,20180404 031148,43,4351.795455,68,2018680107,P62900,6065,153,17.0,,
3,300704,12,FOUR_9,20180306 203310,20180307 133158,12,1450.0,65,2018650071,P63388,6065,153,17.0,,
4,303693,13,FOUR_9,20180514 191318,20180515 051344,13,2837.0,49,2018490139,P63192,6084,140,6.0,175.0,4.0
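The `debut` and `fin` columns are read as plain strings. Judging from the sample rows they appear to follow a `YYYYMMDD HHMMSS` layout, which is an assumption inferred from the data and not a documented format. Under that assumption, a minimal sketch of converting them to timestamps, using two of the rows shown above, could look like this (`bakeDuration` is a hypothetical column name):

```python
import pandas as pd

# Two rows copied from the head() output above
sample = pd.DataFrame({
    "ordre": [300306, 303693],
    "debut": ["20180207 083714", "20180514 191318"],
    "fin": ["20180208 014135", "20180515 051344"],
})

# Assumed layout: YYYYMMDD HHMMSS
timestamp_format = "%Y%m%d %H%M%S"
for col in ("debut", "fin"):
    sample[col] = pd.to_datetime(sample[col], format=timestamp_format)

# Elapsed time between the start and the end of each bake
sample["bakeDuration"] = sample["fin"] - sample["debut"]
```

Passing an explicit `format` makes the parse fail loudly on a malformed value instead of silently guessing a different layout.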


### Data Header Description

In [40]:
# Count the number of columns in the Data
print("Number of Columns: {}".format(rawOvenDf.shape[1]))
print("Number of Rows   : {}".format(rawOvenDf.shape[0]))
# List all headers
print("Header Elements  : {}".format(list(rawOvenDf)))

Number of Columns: 15
Number of Rows   : 6158
Header Elements  : ['ordre', 'qtePlan', 'pdtPlan', 'debut', 'fin', 'qteBon', 'poidsBon', 'noFour', 'noCharge', 'profile', 'alliage', 'tempRevenu1', 'dureeRevenu1', 'tempRevenu2', 'dureeRevenu2']


***
**'ordre'**: Short for the French 'ordre de fabrication' ('O.F.'), meaning 'Work Order'. Constellium work order number
***
**'qtePlan'**: To be clarified
***
**'pdtPlan'**: The planned oven location
***
**'debut'**:  The date and time at which the bake for this OF started
***
**'fin'**:  The date and time at which the bake for this OF ended
***
**'qteBon'**:  To be clarified
***
**'poidsBon'**:  To be clarified
***
**'noFour'**: The actual Oven ID in which the bake was done
***
**'noCharge'**:  A unique ID for each oven bake
***
**'profile'**: A unique profile ID number (internal to CVSA)
***
**'alliage'**: CVSA internal alloy code (note: it uses the same alloy family,
e.g. 6XXX, as the international standard)
***
**'tempRevenu1'**: The temperature of the first aging step ('revenu') in the recipe (in °C)
***
**'dureeRevenu1'**: The duration of the first aging step in the recipe (in hours)
***
**'tempRevenu2'**: The temperature of the second aging step, if it exists in the recipe (in °C)
***
**'dureeRevenu2'**: The duration of the second aging step, if it exists in the recipe (in hours)
***
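The recipe columns can be used to tell single-step from two-step recipes. The sketch below assumes that the `temp*` columns hold temperatures in °C and the `duree*` columns hold durations in hours, as the column names and the sample values suggest. The names `hasSecondStep` and `recipeHours` are hypothetical, and the two rows are copied from the head() output.

```python
import pandas as pd

# Rows 0 and 4 of the head() output; NaN marks a missing second step
sample = pd.DataFrame({
    "noCharge": [2018670046, 2018490139],
    "tempRevenu1": [153, 140],
    "dureeRevenu1": [17.0, 6.0],
    "tempRevenu2": [float("nan"), 175.0],
    "dureeRevenu2": [float("nan"), 4.0],
})

# A recipe has a second step when its second temperature is present
sample["hasSecondStep"] = sample["tempRevenu2"].notna()

# Total planned recipe time, counting a missing second step as zero hours
sample["recipeHours"] = sample["dureeRevenu1"] + sample["dureeRevenu2"].fillna(0)
```

For these two rows the planned totals are 17 and 10 hours, which is close to the elapsed time between `debut` and `fin` in the same rows.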

## Data preparation

### Drop data not needed for process mining

In [41]:
ovenDf = rawOvenDf.drop(['qteBon', 'noCharge', 'pdtPlan', 'poidsBon', 'qtePlan', 'fin', 'tempRevenu1', 'dureeRevenu1', 'tempRevenu2', 'dureeRevenu2'], axis=1)
ovenDf.head()

Unnamed: 0,ordre,debut,noFour,profile,alliage
0,300306,20180207 083714,67,P62244,6065
1,299569,20180110 063615,65,P62147,6084
2,301578,20180403 101227,68,P62900,6065
3,300704,20180306 203310,65,P63388,6065
4,303693,20180514 191318,49,P63192,6084


### Reorder Columns

In [42]:
cols = ['ordre', 'debut', 'noFour', 'profile', 'alliage']
ovenDf = ovenDf[cols]
ovenDf.head()

Unnamed: 0,ordre,debut,noFour,profile,alliage
0,300306,20180207 083714,67,P62244,6065
1,299569,20180110 063615,65,P62147,6084
2,301578,20180403 101227,68,P62900,6065
3,300704,20180306 203310,65,P63388,6065
4,303693,20180514 191318,49,P63192,6084
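Dropping columns and then reordering the remainder can also be written as a single selection, since indexing with a list returns the columns in the listed order. This is an alternative sketch and not the notebook's method. The small `raw` frame stands in for `rawOvenDf` and contains only a subset of its columns.

```python
import pandas as pd

# Stand-in for rawOvenDf with a few of its columns (rows from the head() output)
raw = pd.DataFrame({
    "ordre": [300306, 299569],
    "qtePlan": [2, 2],
    "fin": ["20180208 014135", "20180110 234222"],
    "alliage": [6065, 6084],
    "profile": ["P62244", "P62147"],
    "noFour": [67, 65],
    "debut": ["20180207 083714", "20180110 063615"],
})

# Columns to keep, in the order wanted for the output
keep = ["ordre", "debut", "noFour", "profile", "alliage"]

# Select and order in one step; .copy() detaches the result from `raw`
selected = raw[keep].copy()
```

Listing the columns to keep, instead of the columns to drop, also means that a new column added to the CSV later does not leak into the output.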


## Output data

In [45]:
ovenDf.to_csv(outputDir + 'FullProcess.csv', index=False)
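Process mining tools generally expect an event log with a case identifier, an activity label, and a timestamp. The notebook does not state which column plays which role, so the mapping below is only one possible reading: `ordre` as the case, `noFour` as the activity, and `debut` as the timestamp. The column names `case_id`, `activity`, and `timestamp` and the `OVEN_` prefix are hypothetical.

```python
import pandas as pd

# Two rows in the shape of the exported ovenDf
oven = pd.DataFrame({
    "ordre": [300306, 299569],
    "debut": ["20180207 083714", "20180110 063615"],
    "noFour": [67, 65],
})

# Assumed role of each column in the event log
event_log = oven.rename(columns={
    "ordre": "case_id",
    "noFour": "activity",
    "debut": "timestamp",
})

# Parse the timestamp (assumed layout: YYYYMMDD HHMMSS)
event_log["timestamp"] = pd.to_datetime(event_log["timestamp"],
                                        format="%Y%m%d %H%M%S")

# Turn the numeric oven ID into a readable activity label
event_log["activity"] = "OVEN_" + event_log["activity"].astype(str)

# Order events by case, then chronologically within each case
event_log = event_log.sort_values(["case_id", "timestamp"]).reset_index(drop=True)
```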