Buckets
This article explains the BucketProcessor, its preprocessing methods and the BucketEntry datatype.
BucketEntry is a data type for converting VaultEntryTypes into a format suitable for machine learning. It holds all relevant and implicit information contained in a list of VaultEntrys. Because VaultEntrys influence each other (for example in terms of act times), a more comprehensive data type is needed to model these characteristics for the machine learning processes. The image below gives a rough overview of the transformation process. For a more detailed description, see the "How To" part of this article.
A BucketEntry holds the information from the attached VaultEntry and all relevant information for the timestamp that the BucketEntry stands for. The information stored inside the BucketEntry is used to calculate and set all values needed by the machine learner.
During the creation of the list of BucketEntrys, at least one BucketEntry is created for each timestamp, starting with the timestamp of the first given VaultEntry and ending with the timestamp of the last given VaultEntry.
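The per-timestamp creation described above can be sketched as follows. This is a minimal illustration, not the BucketProcessor's actual code: the class `BucketTimelineSketch`, the type `SketchBucket`, the method `createTimeline`, and the configurable step between timestamps are all assumptions made for this example. The sketch only builds the empty timeline; attaching VaultEntrys (which can lead to more than one BucketEntry per timestamp) is left out.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

class BucketTimelineSketch {

    // Hypothetical stand-in for a BucketEntry: a running number plus its timestamp.
    static final class SketchBucket {
        final int number;
        final LocalDateTime timestamp;

        SketchBucket(int number, LocalDateTime timestamp) {
            this.number = number;
            this.timestamp = timestamp;
        }
    }

    // Creates one empty bucket per step from the first to the last timestamp (both inclusive).
    static List<SketchBucket> createTimeline(LocalDateTime first, LocalDateTime last, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        if (last.isBefore(first)) {
            throw new IllegalArgumentException("last must not be before first");
        }
        List<SketchBucket> buckets = new ArrayList<>();
        int number = 1;
        for (LocalDateTime t = first; !t.isAfter(last); t = t.plus(step)) {
            buckets.add(new SketchBucket(number++, t));
        }
        return buckets;
    }
}
```

With a first timestamp of 08:00, a last timestamp of 08:05, and an assumed step of one minute, `createTimeline` returns six buckets, one for each minute from 08:00 to 08:05.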
FinalBucketEntrys are created out of the BucketEntrys at the end of the VaultEntry data processing and only contain information that is necessary for the machine learner. FinalBucketEntrys are also the entries that are handed to the ML-exporter.
The BucketEventTriggers class (found in the vault.container package) gathers many lists of VaultEntryTypes to group them for certain preprocessing steps. The lists are organized as HashSets, or as key-value pairs in HashMaps. Whenever a new VaultEntryType is needed, it has to be added manually to the correct HashSet or HashMap according to its desired handling.
New ML-relevant and one-hot VaultEntryTypes are placed into one of these HashSets / HashMaps:
- TRIGGER_EVENT_ACT_TIME_GIVEN
- TRIGGER_EVENT_ACT_TIME_TILL_NEXT_EVENT
- TRIGGER_EVENT_ACT_TIME_ONE
- TRIGGER_EVENTS_NOT_YET_SET
New ML-relevant but not one-hot VaultEntryTypes are placed into one of these HashSets / HashMaps:
- TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_SET
- TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_GIVEN
- TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_TILL_NEXT_EVENT
- TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_ONE
- TRIGGER_EVENT_NOT_ONE_HOT_VALUE_IS_A_TIMESTAMP
If VaultEntryTypes are to be summed up, put all the VaultEntryTypes that should be added together into a newly created HashSet and add that HashSet to:
- HASHSETS_TO_SUM_UP
All VaultEntryTypes in this HashSet will be interpolated:
- HASHSET_FOR_LINEAR_INTERPOLATION
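The manual registration described above might look roughly like the following sketch. Everything in it is hypothetical: the enum `SketchType` and its values, the shapes of the three collections (which of the real groups are HashSets and which are HashMaps is not stated here), the act time of 180 minutes, and the helper `handlingOf`. It only illustrates the idea of grouping types by their desired handling.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TriggerGroupsSketch {

    // Hypothetical entry types; the real VaultEntryType values are not listed in this article.
    enum SketchType { EXAMPLE_MEAL, EXAMPLE_BOLUS, EXAMPLE_EXERCISE, EXAMPLE_GLUCOSE }

    // Assumed shape: a map from type to a fixed act time in minutes.
    static final Map<SketchType, Integer> ACT_TIME_GIVEN = new HashMap<>();
    // Assumed shape: a plain set of types that stay active until the next event.
    static final Set<SketchType> ACT_TIME_TILL_NEXT_EVENT = new HashSet<>();
    // Assumed shape: a plain set of types whose values are interpolated.
    static final Set<SketchType> FOR_LINEAR_INTERPOLATION = new HashSet<>();

    static {
        // Registering a new type means adding it by hand to the group that matches its handling.
        ACT_TIME_GIVEN.put(SketchType.EXAMPLE_MEAL, 180);
        ACT_TIME_TILL_NEXT_EVENT.add(SketchType.EXAMPLE_EXERCISE);
        FOR_LINEAR_INTERPOLATION.add(SketchType.EXAMPLE_GLUCOSE);
    }

    // Reports the handling a type was registered for; unregistered types are reported as such.
    static String handlingOf(SketchType type) {
        if (ACT_TIME_GIVEN.containsKey(type)) {
            return "act time given: " + ACT_TIME_GIVEN.get(type) + " min";
        }
        if (ACT_TIME_TILL_NEXT_EVENT.contains(type)) {
            return "act time till next event";
        }
        if (FOR_LINEAR_INTERPOLATION.contains(type)) {
            return "linear interpolation";
        }
        return "not registered";
    }
}
```

In this sketch, a type that was never added to any group (here `EXAMPLE_BOLUS`) is reported as "not registered", which mirrors the point that new types are not picked up automatically.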
How To
To start processing, call the runProcess() method of the BucketProcessor class. The executed steps are summarized below:
- create a list of BucketEntrys out of the given list of VaultEntrys
- there is at least one BucketEntry per timestamp, starting with the timestamp of the first VaultEntry and ending with the timestamp of the last VaultEntry
- set and update all the data inside the BucketEntrys
- remove all unneeded BucketEntrys
- after this the list of BucketEntrys will only have one BucketEntry per timestamp containing all the needed data
- compute all needed values and merge all values that need to be merged
- set up data for interpolation
- run the interpolation
- create a list of FinalBucketEntrys according to the wanted BucketEntry size
- FinalBucketEntrys only contain data that is relevant for the ML
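The interpolation step in the list above can be illustrated with a small sketch of linear interpolation over one value per timestamp. This is not the BucketProcessor's implementation: the class `LinearInterpolationSketch`, the method `interpolate`, the use of `NaN` to mark timestamps without a value, and the choice to leave leading and trailing gaps untouched are all assumptions made for this example.

```java
import java.util.Arrays;

class LinearInterpolationSketch {

    // Fills NaN gaps that lie between two known values by linear interpolation.
    // Gaps before the first or after the last known value stay NaN (an assumption of this sketch).
    static double[] interpolate(double[] values) {
        double[] result = Arrays.copyOf(values, values.length);
        int lastKnown = -1; // index of the most recent non-NaN value
        for (int i = 0; i < result.length; i++) {
            if (Double.isNaN(result[i])) {
                continue;
            }
            if (lastKnown >= 0 && i - lastKnown > 1) {
                // Constant increment per timestamp between the two known values.
                double step = (result[i] - result[lastKnown]) / (i - lastKnown);
                for (int j = lastKnown + 1; j < i; j++) {
                    result[j] = result[lastKnown] + step * (j - lastKnown);
                }
            }
            lastKnown = i;
        }
        return result;
    }
}
```

For the input `{100, NaN, NaN, 130, NaN}`, the two inner gaps are filled with 110 and 120, while the trailing gap remains `NaN` because there is no later value to interpolate towards.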
Created by OpenDiabetesVault Team 2018