- Build a predictive classification model (ensuring optimal features and classifier).
- Train the model on data entries corresponding to the months of June-Dec, and test the model on data entries corresponding to Feb-March.
- Generate user-bahavior clusters based on the purchasing behavior data for the complete dataset.
- Build a semi-supervised self labelling model to estimate 'Revenue' for the missing records in Oct-Dec (presumably) and then fit your classifier.
- Test classification performance on Feb-March data set with and without the self-labelled data.
https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset
Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1908) were positive class samples ending with shopping.
Sakar, C.O., Polat, S.O., Katircioglu, M. et al. Real-time prediction of online shoppers’ purchasing intention using multilayer perceptron and LSTM recurrent neural networks. Neural Comput & Applic 31, 6893–6908 (2019). https://doi.org/10.1007/s00521-018-3523-0