Code for pretraining neural operator transformers on multiple PDE datasets. More details will be added soon.

All datasets are stored in HDF5 format and contain a `data` field. Some datasets are stored as individual HDF5 files; others are stored within a single HDF5 file.
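As a minimal sketch of this storage layout, the snippet below writes and reads a `data` field with `h5py`. The shape `(samples, timesteps, H, W)` and the file name are illustrative assumptions, not the repository's actual schema.

```python
import h5py
import numpy as np

# Write a tiny example file with a single "data" field
# (shape here is a hypothetical: samples x timesteps x H x W).
with h5py.File("example.h5", "w") as f:
    f.create_dataset("data", data=np.random.rand(4, 10, 64, 64).astype(np.float32))

# Read it back; h5py loads only the slices you index.
with h5py.File("example.h5", "r") as f:
    data = f["data"]
    print(data.shape)
    first_trajectory = data[0]  # one trajectory loaded into memory
```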
The script `data_generation/preprocess.py` preprocesses the datasets from each source. Download the original files from the sources below and preprocess them into the `data/` folder.
| Dataset | Link |
| --- | --- |
| FNO data | Here |
| PDEBench data | Here |
| PDEArena data | Here |
| CFDbench data | Here |
All dataset configurations live in `utils/make_master_file.py`. When a new dataset is merged, you should add a configuration dict for it. The dict stores only relative paths, so the code can run from any location.
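To illustrate, a dataset entry might look like the dict below. The field names (`path`, `scatter_storage`, sizes, timestep counts) are assumptions for the sketch; consult `utils/make_master_file.py` for the actual schema.

```python
# Hypothetical dataset entry of the kind registered in
# utils/make_master_file.py; all field names are illustrative.
DATASET_DICT = {
    "ns2d_fno_1e-5": {
        "path": "data/ns2d_fno_1e-5.h5",  # relative path under data/
        "scatter_storage": False,          # True: one HDF5 file per sample
        "train_size": 1000,
        "test_size": 200,
        "t_in": 10,                        # input timesteps
        "t_ar": 1,                         # auto-regressive rollout step
    },
}
```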
```shell
mkdir data
```
A single-GPU pretraining script, `train_temporal.py`, is provided. You can start a training run with

```shell
python train_temporal.py --model FNO --train_paths ns2d_fno_1e-5 --test_paths ns2d_fno_1e-5 --gpu 0
```
Alternatively, write a configuration file such as `configs/ns2d.yaml` and launch it; free GPUs are selected automatically:

```shell
python trainer.py --config_file ns2d.yaml
python parallel_trainer.py --config_file ns2d_parallel.yaml
```
YAML is used for the configuration files, and any command-line argument can be specified as a key. To run multiple tasks, move the swept parameters into the `tasks` section:

```yaml
model: DPOT
width: 512
tasks:
  lr: [0.001, 0.0001]
  batch_size: [256, 32]
```

Submitting this configuration to `trainer.py` starts 2 tasks.
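The expansion above can be sketched as follows. The actual logic lives in `trainer.py`; this sketch assumes the swept lists are zipped position-wise (consistent with two length-2 lists yielding 2 tasks, not a 4-task Cartesian product).

```python
# Sketch: expand a base config plus a `tasks` section into concrete runs.
# Assumption: lists under `tasks` are zipped position-wise.
def expand_tasks(base, tasks):
    n = len(next(iter(tasks.values())))  # number of runs = list length
    runs = []
    for i in range(n):
        run = dict(base)
        run.update({key: values[i] for key, values in tasks.items()})
        runs.append(run)
    return runs

base = {"model": "DPOT", "width": 512}
tasks = {"lr": [0.001, 0.0001], "batch_size": [256, 32]}
for run in expand_tasks(base, tasks):
    print(run)
```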
Install the following packages via conda:

```shell
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install matplotlib scikit-learn scipy pandas h5py -c conda-forge
conda install timm einops tensorboard -c conda-forge
```
- `README.md`
- `train_temporal.py`: main script for single-GPU pre-training of the auto-regressive model
- `trainer.py`: framework for automatically scheduling training tasks for parameter tuning
- `utils/`
  - `criterion.py`: relative-error loss functions
  - `griddataset.py`: dataset for the mixture of temporal uniform-grid datasets
  - `make_master_file.py`: dataset configuration file
  - `normalizer`: normalization methods (#TODO: implement instance reversible norm)
  - `optimizer`: Adam/AdamW/Lamb optimizers supporting complex numbers
  - `utilities.py`: other auxiliary functions
- `configs/`: configuration files for pre-training or fine-tuning
- `models/`
  - `dpot.py`: DPOT model
  - `fno.py`: FNO with group normalization
  - `mlp.py`
- `data_generation/`: code for preprocessing data (ask hzk if you want to use it)
  - `darcy/`
  - `ns2d/`