# Big G Express - Data Exploration

## Team: Elden Ring

<img src="https://eldenring.wiki.fextralife.com/file/Elden-Ring/mirel_pastor_of_vow.jpg" alt="PRAISE DOG" style="width:806px;height:600px;"/>

#### PRAISE THE DOG!

In [134]:
import pandas as pd
import numpy as np
from datetime import datetime

In [153]:
faults = pd.read_csv('../data/J1939Faults.csv', low_memory=False, parse_dates=['EventTimeStamp', 'LocationTimeStamp'], index_col='EventTimeStamp')
service_fault = pd.read_excel('../data/Service Fault Codes_1_0_0_167.xlsx')
vehicle_diagnostic = pd.read_csv('../data/VehicleDiagnosticOnboardData.csv')


  for idx, row in parser.parse():


## Exploratory Data Analysis

In [154]:
print(faults.shape)
print(service_fault.shape)
print(vehicle_diagnostic.shape)

(1187335, 19)
(7124, 14)
(12821626, 4)


Faults joins to vehicle_diagnostic with RecordID = FaultID

Also, columns actionDescription and faultValue in the faults are unused.

`faults['actionDescription'].isna().sum()`

We also remove 2169 EquipmentID that have more than 5 characters

In [155]:
faults = (
    faults.drop(['actionDescription', 'faultValue'], axis=1)
    [faults['EquipmentID'].str.len() <= 5]
)

There are three service locations that appear in the dataset. The fault signals might be going on and off there. In order to eliminate those counts, we check if the Latitutde and Longitude coordinates of the truck are within 0.01 units (in both Lat and Long directions) next to a service location. The 0.01 represent, roughly, the distance of a mile.

Doing so, we eliminate 131778 events.

Few keyponts from questions to Josh Treet: 
- throw 2001 dates out (the super old), mistake with an integer overflow that took a few days to correct
- any time telling before a derate is great
- derates are going to be related to emissions conditions
- coolant level codes (and some others) can often flip between on and off
- derate + light continuing on, it's the same event (a pulse of it)
- spn + fmi together determine the fault code
- most trucks fairly similar/same (within like 4 years)
- maybe costs about $500 if misspredicted potential derate 

In [156]:
for lat, lon in [(36.0666667, -86.4347222), (35.5883333, -86.4438888), (36.1950, -83.174722)]:
    
    faults = faults.loc[~((abs(lat - faults['Latitude']) <= 0.01) &
                          (abs(lon - faults['Longitude']) <= 0.01))]

In [157]:
faults.head()

Unnamed: 0_level_0,RecordID,ESS_Id,eventDescription,ecuSoftwareVersion,ecuSerialNumber,ecuModel,ecuMake,ecuSource,spn,fmi,active,activeTransitionCount,EquipmentID,MCTNumber,Latitude,Longitude,LocationTimeStamp
EventTimeStamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
2015-02-21 10:47:13,1,990349,Low (Severity Low) Engine Coolant Level,unknown,unknown,unknown,unknown,0,111,17,True,2,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25
2015-02-21 11:34:34,2,990360,,unknown,unknown,unknown,unknown,11,629,12,True,127,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10
2015-02-21 11:35:31,3,990364,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,False,127,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26
2015-02-21 11:35:33,4,990370,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,True,127,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08
2015-02-21 11:39:41,5,990416,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37


In [158]:
faults.shape

(1053758, 17)

For the columns in faults:
- ESS_Id – the event subscriber service event that contained the fault
- EventTimeStamp – when the event took place
- eventDescription – brief text of meaning of the code (not always present)
- actionDescription – never seen this filled in
- ecuSoftwareVersion – version string from the reporting vehicle computer system
- ecuSerialNumber – Serial number of the reporting Engine Control Module (ECM)
- ecuModel -Model of the reporting ECM
- ecuMake – Manufacturer of the reporting ECM
- ecuSource –
- spn – Fault code being reported
- fmi – Failure Mode associated with the Fault Code
- active – whether the code is being set or being removed
- activeTransitionCount – Number of times code has been set/unset
- faultValue – never seen used
- EquipmentID – Assigned truck number of the unit in question - 1122 different trucks
- MCTNumber – Communications Terminal assigned to the truck
- Latitude – Latitude at time of event
- Longitude – Longitude at time of event
- LocationTimeStamp – Time latitude and longitude were obtained

There are 1045 trucks in the dataset, 1185166 rows; 498 have partial derail, 211 total and there's 182 with both.

Finding out below the trucks that have only partial derail, total, both, or neither.

In [110]:
all_trucks = faults['EquipmentID'].unique()
partial_derate = faults.loc[(faults['spn'] == 1569) & (faults['fmi'] == 31)]['EquipmentID'].unique()
total_derate = faults.loc[faults['spn'] == 5246]['EquipmentID'].unique()

partial_derate_only = partial_derate[np.isin(partial_derate, total_derate, invert=True)]
total_derate_only = total_derate[np.isin(total_derate, partial_derate, invert=True)]
partial_and_total_derate = np.intersect1d(partial_derate, total_derate)
no_derate = all_trucks[np.isin(all_trucks, partial_derate_only, invert=True) | np.isin(all_trucks, total_derate_only, invert=True)]



In [111]:
print(len(partial_derate_only))
print(len(total_derate_only))
print(len(partial_and_total_derate))
print(len(no_derate))

316
29
182
527


In [144]:
faults.loc[faults['spn'] == 5246]['fmi'].value_counts()

0     618
16    143
15    103
19     61
14      8
Name: fmi, dtype: int64

In [159]:
faults.head()

Unnamed: 0_level_0,RecordID,ESS_Id,eventDescription,ecuSoftwareVersion,ecuSerialNumber,ecuModel,ecuMake,ecuSource,spn,fmi,active,activeTransitionCount,EquipmentID,MCTNumber,Latitude,Longitude,LocationTimeStamp
EventTimeStamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
2015-02-21 10:47:13,1,990349,Low (Severity Low) Engine Coolant Level,unknown,unknown,unknown,unknown,0,111,17,True,2,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25
2015-02-21 11:34:34,2,990360,,unknown,unknown,unknown,unknown,11,629,12,True,127,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10
2015-02-21 11:35:31,3,990364,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,False,127,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26
2015-02-21 11:35:33,4,990370,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,True,127,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08
2015-02-21 11:39:41,5,990416,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37


In [179]:
# not yet working
#faults.sort_index().rolling('24h').apply(lambda s: s['spn'].value_counts())

DataError: No numeric types to aggregate

## Vehicle Diagnostic

For vehicle diagnostic:
- Id -  the record Id
- Name – the name of the diagnostic
- Value – the value for that diagnostic
- FaultId – foreign key to the QCJ1939Fault record

In [30]:
vehicle_diagnostic.head()

Unnamed: 0,Id,Name,Value,FaultId
0,1,IgnStatus,False,1
1,2,EngineOilPressure,0,1
2,3,EngineOilTemperature,96.74375,1
3,4,TurboBoostPressure,0,1
4,5,EngineLoad,11,1


In [25]:
service_fault.head()

Unnamed: 0,Published in CES 14602,Cummins Fault Code,Revision,PID,SID,MID,J1587 FMI,SPN,J1939 FMI,J2012 Pcode,Lamp Color,Lamp Device,Cummins Description,Algorithm Description
0,Y,111,167,Not Mapped,254,0,12,629,12,P0606,Red,Stop / Shutdown,Engine Control Module Critical Internal Failur...,Error internal to the ECM related to memory ha...
1,Y,112,167,Not Mapped,20,128,7,635,7,Not Mapped,Red,Stop / Shutdown,Engine Timing Actuator Driver Circuit - Mechan...,Mechanical failure in the engine timing actuat...
2,Y,113,167,Not Mapped,20,128,3,635,3,Not Mapped,Amber,Warning,Engine Timing Actuator Driver Circuit - Voltag...,High signal voltage detected at the engine tim...
3,Y,114,167,Not Mapped,20,128,4,635,4,Not Mapped,Amber,Warning,Engine Timing Actuator Driver Circuit - Voltag...,Low voltage detected at the engine timing actu...
4,Y,115,167,190,Not Mapped,Not Mapped,2,612,2,P0008,Red,Stop / Shutdown,Engine Magnetic Speed/Position Lost Both of Tw...,The ECM has detected that the primary and back...
