Buckets

DrDanielR edited this page Mar 31, 2018 · 25 revisions

This article explains the BucketProcessor, its preprocessing methods and the BucketEntry datatype.

Contents

  1. Outline
  2. BucketEntry
  3. FinalBucketEntry
  4. BucketEventTriggers
  5. How To Run

Outline

BucketEntry is the data type used to convert VaultEntrys into a format suitable for machine learning. It holds all the relevant information, explicit and implicit, contained in a list of VaultEntrys. Because VaultEntrys influence each other (for example through their act times), a more comprehensive data type is needed to model these dependencies for the machine learning process. Please see the image below for a rough overview of the transformation process. For a more detailed description of this process, go to the "How To Run" part of this article.

BucketEntry

BucketEntrys hold the information from the attached VaultEntry together with all relevant information for the timestamp that the BucketEntry represents. The information stored inside the BucketEntry is used to calculate and set all values needed by the machine learner.

While the list of BucketEntrys is being created, at least one BucketEntry is created for each timestamp, starting with the timestamp of the first given VaultEntry and ending with the timestamp of the last given VaultEntry.
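The timestamp coverage described above can be illustrated with a minimal sketch. It is not the BucketProcessor implementation: the class name, the `timeline` helper and the one-minute step are assumptions made for this example only.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

class BucketTimelineSketch {

    // Hypothetical helper: returns every timestamp from first to last (inclusive),
    // spaced by the given step. Each returned timestamp would get at least one bucket.
    static List<LocalDateTime> timeline(LocalDateTime first, LocalDateTime last, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<LocalDateTime> result = new ArrayList<>();
        for (LocalDateTime t = first; !t.isAfter(last); t = t.plus(step)) {
            result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2018, 3, 31, 10, 0);
        LocalDateTime last = LocalDateTime.of(2018, 3, 31, 10, 3);
        // Assumption: one bucket per minute.
        System.out.println(timeline(first, last, Duration.ofMinutes(1)).size()); // prints 4
    }
}
```

Timestamps that have no VaultEntry still appear in the result, which is why the list can contain buckets that carry no entry of their own.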

FinalBucketEntry

FinalBucketEntrys are created out of the BucketEntrys at the end of the VaultEntry data processing and only contain information that is necessary for the machine learner. FinalBucketEntrys are also the entries that are handed to the ML-exporter.

BucketEventTriggers

The BucketEventTriggers class (found in the vault.container package) gathers several collections of VaultEntryTypes and groups them for specific preprocessing steps. The collections are HashSets or, where key-value pairs are needed, HashMaps. Whenever a new VaultEntryType is needed, it has to be added manually to the correct HashSet or HashMap according to its desired handling.

New ML-relevant and one-hot VaultEntryTypes are placed into one of these HashSets / HashMaps:

  • TRIGGER_EVENT_ACT_TIME_GIVEN
  • TRIGGER_EVENT_ACT_TIME_TILL_NEXT_EVENT
  • TRIGGER_EVENT_ACT_TIME_ONE
  • TRIGGER_EVENTS_NOT_YET_SET

New ML-relevant but not one-hot VaultEntryTypes are placed into one of these HashSets / HashMaps:

  • TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_SET
  • TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_GIVEN
  • TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_TILL_NEXT_EVENT
  • TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_ONE
  • TRIGGER_EVENT_NOT_ONE_HOT_VALUE_IS_A_TIMESTAMP
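Registering a new type in one of these collections could look roughly like the sketch below. The container shapes are assumptions: the page does not state which names are HashSets and which are HashMaps, nor what the map values mean. The `EntryType` enum, its members and the value 240 are hypothetical stand-ins, not members of the real VaultEntryType.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TriggerRegistrySketch {

    // Hypothetical stand-in for VaultEntryType.
    enum EntryType { EXAMPLE_BOLUS, EXAMPLE_MEAL, EXAMPLE_HEART_RATE }

    // Assumed shape: one-hot types mapped to a configured act time (unit assumed to be minutes).
    static final Map<EntryType, Integer> TRIGGER_EVENT_ACT_TIME_GIVEN = new HashMap<>();
    // Assumed shape: a plain set of one-hot types.
    static final Set<EntryType> TRIGGER_EVENT_ACT_TIME_ONE = new HashSet<>();
    // Assumed shape: a plain set of ML-relevant types that are not one-hot.
    static final Set<EntryType> TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_ONE = new HashSet<>();

    static {
        // Each new type is added by hand to exactly the collection matching its handling.
        TRIGGER_EVENT_ACT_TIME_GIVEN.put(EntryType.EXAMPLE_BOLUS, 240); // 240 is a made-up value
        TRIGGER_EVENT_ACT_TIME_ONE.add(EntryType.EXAMPLE_MEAL);
        TRIGGER_EVENT_NOT_ONE_HOT_ACT_TIME_ONE.add(EntryType.EXAMPLE_HEART_RATE);
    }

    // Hypothetical lookup: a type counts as one-hot if it sits in one of the one-hot collections.
    static boolean isOneHot(EntryType type) {
        return TRIGGER_EVENT_ACT_TIME_GIVEN.containsKey(type)
                || TRIGGER_EVENT_ACT_TIME_ONE.contains(type);
    }

    public static void main(String[] args) {
        System.out.println(isOneHot(EntryType.EXAMPLE_BOLUS));      // prints true
        System.out.println(isOneHot(EntryType.EXAMPLE_HEART_RATE)); // prints false
    }
}
```

Keeping the one-hot and not-one-hot groups in separate collections means a lookup like `isOneHot` needs no extra flag on the type itself.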

VaultEntryTypes whose values are to be summed up are grouped in a newly created HashSet that contains all the VaultEntryTypes to be added together. This new HashSet is then added to the following HashSet:

  • HASHSETS_TO_SUM_UP
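A minimal sketch of this grouping follows, assuming HASHSETS_TO_SUM_UP is a set of sets. Type names are plain strings here and are hypothetical; the real class works on VaultEntryTypes and may be structured differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SumUpSketch {

    // Assumed shape: every inner set is one group of types whose values are added together.
    static final Set<Set<String>> HASHSETS_TO_SUM_UP = new HashSet<>();

    // Hypothetical helper: sums the values of all types in one group.
    // Types without a value at this timestamp contribute 0.
    static double sumGroup(Set<String> group, Map<String, Double> valuesByType) {
        double sum = 0.0;
        for (String type : group) {
            sum += valuesByType.getOrDefault(type, 0.0);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Build the new group first, then register it in the outer set.
        Set<String> group = new HashSet<>();
        group.add("EXAMPLE_TYPE_A");
        group.add("EXAMPLE_TYPE_B");
        HASHSETS_TO_SUM_UP.add(group);

        Map<String, Double> values = new HashMap<>();
        values.put("EXAMPLE_TYPE_A", 1.5);
        values.put("EXAMPLE_TYPE_B", 2.0);
        values.put("EXAMPLE_TYPE_C", 4.0); // not in the group, so it is ignored

        System.out.println(sumGroup(group, values)); // prints 3.5
    }
}
```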

All VaultEntryTypes in this HashSet will be interpolated:

  • HASHSET_FOR_LINEAR_INTERPOLATION
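For illustration, linear interpolation over a series with one value per timestamp can be sketched as below. This is a generic version of the technique, not the BucketProcessor code. The use of NaN to mark missing values, and the choice to leave gaps at the start and end of the series untouched, are assumptions of this example.

```java
import java.util.Arrays;

class LinearInterpolationSketch {

    // Fills runs of NaN that lie between two known values with evenly spaced values.
    // Leading and trailing NaN runs are left as they are.
    static double[] interpolate(double[] values) {
        double[] out = Arrays.copyOf(values, values.length);
        int lastKnown = -1; // index of the most recent non-NaN value
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                continue;
            }
            if (lastKnown >= 0 && i - lastKnown > 1) {
                double step = (out[i] - out[lastKnown]) / (i - lastKnown);
                for (int j = lastKnown + 1; j < i; j++) {
                    out[j] = out[lastKnown] + step * (j - lastKnown);
                }
            }
            lastKnown = i;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] series = {100.0, Double.NaN, Double.NaN, 130.0};
        System.out.println(Arrays.toString(interpolate(series))); // prints [100.0, 110.0, 120.0, 130.0]
    }
}
```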

How To Run

To start processing, call the function runProcess() from the BucketProcessor class. The executed steps are summarized below:

  • create a list of BucketEntrys out of the given list of VaultEntrys
    • there is at least one BucketEntry per timestamp, starting with the timestamp of the first VaultEntry and ending with the timestamp of the last VaultEntry
  • set and update all the data inside the BucketEntrys
  • remove all unneeded BucketEntrys
    • after this the list of BucketEntrys will only have one BucketEntry per timestamp containing all the needed data
  • compute all needed values and merge all values that need to be merged
  • set up data for interpolation
  • run the interpolation
  • create a list of FinalBucketEntrys according to the wanted BucketEntry size
    • FinalBucketEntrys only contain data that is relevant for the ML
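The reduction to one BucketEntry per timestamp in the list above can be pictured with the following simplified sketch. It is an illustration only: the `Bucket` class, the minute index and the string keys are hypothetical, and letting the later entry win on a key clash is an arbitrary choice, not documented behaviour of the BucketProcessor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergePerTimestampSketch {

    // Hypothetical, simplified bucket: a minute index plus values keyed by type name.
    static final class Bucket {
        final int minute;
        final Map<String, Double> values = new HashMap<>();

        Bucket(int minute) {
            this.minute = minute;
        }
    }

    // Collapses all buckets that share a minute into a single bucket holding their combined values.
    // The order of first appearance is preserved; on a key clash the later bucket wins.
    static List<Bucket> mergePerTimestamp(List<Bucket> buckets) {
        Map<Integer, Bucket> merged = new LinkedHashMap<>();
        for (Bucket b : buckets) {
            merged.computeIfAbsent(b.minute, m -> new Bucket(m)).values.putAll(b.values);
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        Bucket a = new Bucket(0);
        a.values.put("EXAMPLE_TYPE_A", 1.0);
        Bucket b = new Bucket(0);
        b.values.put("EXAMPLE_TYPE_B", 2.0);
        Bucket c = new Bucket(1);
        c.values.put("EXAMPLE_TYPE_A", 3.0);

        List<Bucket> input = new ArrayList<>();
        input.add(a);
        input.add(b);
        input.add(c);
        System.out.println(mergePerTimestamp(input).size()); // prints 2
    }
}
```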