This repository is the official implementation of the CIKM 2023 paper "FATA-Trans: Field And Time-Aware Transformer for Sequential Tabular Data". Some code scripts are adapted from Tabular Transformers for Modeling Multivariate Time Series.
- Python3 >= 3.9
- torch==1.13.0
- transformers==4.26.1
- tqdm==4.64.1
- scikit-learn==1.2.0
- matplotlib==3.6.2
- numpy==1.24.2
- pandas==1.1.5

These packages can be installed directly by running:

```
pip install -r requirements.txt
```
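Since the pins above are exact, it can help to verify the environment before running the notebooks. A minimal stdlib sketch (the pinned versions are copied from the list above; the helper itself is not part of the repository):

```python
import sys
from importlib import metadata

# Exact versions pinned in requirements.txt (subset of the list above).
PINNED = {"torch": "1.13.0", "transformers": "4.26.1", "pandas": "1.1.5"}

def check_env():
    """Return a list of human-readable mismatches between the pinned
    requirements and the current environment (empty list = all good)."""
    issues = []
    if sys.version_info < (3, 9):
        issues.append(f"Python >= 3.9 required, found {sys.version.split()[0]}")
    for pkg, want in PINNED.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            issues.append(f"{pkg} not installed (want {want})")
        else:
            if have != want:
                issues.append(f"{pkg}: have {have}, want {want}")
    return issues

for issue in check_env():
    print(issue)
```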
The synthetic transaction dataset is provided in the TabFormer GitHub repository.
The Amazon product reviews datasets are available here. In our paper, we used the "5-core" subsets.
- Run `preprocess_IBM_v2.ipynb` or `preprocess_amazon_liang.ipynb` to split the raw dataset files into train/val/test CSV files.
- Run `preload_dataset.ipynb` to execute the first-stage processing.
- Run either `process_IBM_dataset.ipynb` or `process_amazon_dataset.ipynb` to get the model-specific dataset.
- Run the notebooks named `run_main_....ipynb` to pretrain, fine-tune, train from scratch, or export embeddings from a model. (You can also run `main_ibm.py` or `main_amazon.py` directly.)
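The first step above splits the raw files into train/val/test CSVs. A minimal stdlib sketch of a chronological split (the actual split logic and ratios live in the preprocessing notebooks; the 80/10/10 fractions and toy columns here are assumptions for illustration):

```python
import csv
import io

def split_rows(rows, train_frac=0.8, val_frac=0.1):
    """Split already-ordered rows chronologically into train/val/test.
    The fractions are illustrative defaults, not the paper's settings."""
    n = len(rows)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

# Tiny in-memory CSV standing in for a raw transaction file.
raw = "user,amount,timestamp\n" + "\n".join(f"u{i},{i * 10},{i}" for i in range(10))
rows = list(csv.DictReader(io.StringIO(raw)))
train, val, test = split_rows(rows)
print(len(train), len(val), len(test))  # → 8 1 1
```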
Linux bash scripts under the `sh_commands` directory can be used to run the Jupyter notebooks mentioned above with the Python module papermill (we used version 2.4.0). For model- or dataset-specific settings, please refer to these bash scripts.
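The bash scripts drive the notebooks through papermill's CLI; the same can be done from Python via papermill's `execute_notebook` API. A hedged sketch (notebook names come from this README; the output directory and the skip-if-missing behavior are assumptions, not the repository's logic):

```python
import os

def run_notebooks(notebooks, out_dir="executed_notebooks"):
    """Execute each notebook with papermill, skipping files that do not
    exist. Returns the list of notebooks actually executed."""
    executed = []
    for nb in notebooks:
        if not os.path.exists(nb):
            continue
        import papermill as pm  # requires `pip install papermill`
        os.makedirs(out_dir, exist_ok=True)
        pm.execute_notebook(nb, os.path.join(out_dir, nb))
        executed.append(nb)
    return executed

# Example: run the two shared preprocessing stages in order.
run_notebooks(["preload_dataset.ipynb", "process_IBM_dataset.ipynb"])
```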