Compact Prediction Tree - Sequence Prediction

An example of sequence prediction based on the CPT algorithm

Docs available here and here

Main part of the algorithm is based on CPT implementation from here https://github.com/NeerajSarwan/CPT

Tweaked for my own needs.

DataSet:

Consisting of over 44k sequences of different lengths (from 2 items to 10) reprensenting over129k items
there exists 370 differents CODES
train.csv file desciption
- ID: gouping id for sequence
- CODE: sequence item
- LINE_NB: position of item in sequence

Target:

Predict the last item in sequence.

Training:

Training the model consist of building the Tree, Inverted index and Lookup Table

Tree: hierachical tree modeling the sequences
Inverted Index: dictionnary giving in which sequence each code is used

here CODE: 'PX9' is used in sequences 8,1,4,5
Lookup Table: dictionnary giving node adress of last element of a sequence:

Predictions:

Concept, For a given sequence:

find all sequences containing any its item using Inverted Index
Rebuild the original sequences using the Lookup Table (avoiding to save the original data)
then for each original similar sequence:
- find position corresponding to the last item in the sequence to predict
- calculate a score for each possible following item (check docs for global description)
return the n elements with biggest score

API:

Simple API using FastAPI and can be run Docker

Next steps:

Due to nature of data, we can see that only the 2 or 3 preceding items are important. Thus, I decided to switch to a more classical approach using Decision tree / Random Forest calssifiers. it also permits more flexible approach allowing additional input feature to improve model performance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Compact Prediction Tree - Sequence Prediction

DataSet:

Target:

Training:

Predictions:

API:

Next steps:

Files

README.md

Latest commit

History

README.md

File metadata and controls

Compact Prediction Tree - Sequence Prediction

DataSet:

Target:

Training:

Predictions:

API:

Next steps: