An evaluation framework from the paper "Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?"
This repository provides a unified framework for evaluating modern deep neural networks on small tabular datasets. The benchmark covers 31 field- and farm-scale digital soil mapping datasets from LimeSoDa.
- Datasets: Uses soil datasets from the LimeSoDa repository with proximal soil sensing and remote sensing features.
- Models: Implements 15+ models with a unified interface:
  - Classical ML: Linear Regression, Ridge, Lasso, PLSR, Random Forest, XGBoost
  - MLP-based NNs: MLP, TabM, RealMLP
  - Retrieval-based NNs: TabR, ModernNCA
  - Attention-based NNs: AutoInt, FT-Transformer, ExcelFormer, T2G-Former, AMFormer
  - In-context learning foundation models: TabPFN
- Configuration: Experiment settings are defined via YAML configuration files. Configuration files for datasets with a feature-to-sample ratio < 1 are in the config/pss/ folder, while configurations for high-dimensional datasets with a ratio > 1 (including MIR/NIR spectroscopy features) are in the config/spectroscopic/ folder.
- Preprocessing: Built-in support for PCA, feature scaling, and numerical embeddings.
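
The folder split described under Configuration depends only on the ratio of features to samples. A minimal sketch of that rule follows. The function name `config_folder_for` is hypothetical and not part of this repository, and sending a ratio of exactly 1 to the spectroscopic folder is an assumption, since the description above only covers ratios below and above 1.

```python
def config_folder_for(n_samples: int, n_features: int) -> str:
    """Return the config folder suggested by the feature-to-sample ratio.

    Hypothetical helper for illustration; not part of the repository.
    """
    if n_samples <= 0 or n_features <= 0:
        raise ValueError("n_samples and n_features must be positive")

    ratio = n_features / n_samples

    # Ratio < 1: fewer features than samples.
    if ratio < 1:
        return "config/pss/"

    # Ratio > 1: high-dimensional data such as MIR/NIR spectra.
    # A ratio of exactly 1 also lands here, which is an assumption.
    return "config/spectroscopic/"


if __name__ == "__main__":
    # Illustrative dataset shapes, not taken from LimeSoDa.
    print(config_folder_for(n_samples=100, n_features=12))
    print(config_folder_for(n_samples=60, n_features=1700))
```

The sample shapes in the usage lines are made up for illustration and do not describe any specific LimeSoDa dataset.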
Requirements: Python 3.10+

    pip install -r requirements.txt

Run experiments using YAML configuration files:

    python benchmark.py --config config/pss/limesoda_mlp.yaml

Example configuration files are provided in the config/pss/ and config/spectroscopic/ folders.
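
To run every configuration in a folder rather than a single file, the command above can be generated once per YAML file. The sketch below only builds the command lines and does not execute them. The helper name `benchmark_commands` is hypothetical, and it assumes that all configuration files use the `.yaml` extension, as `limesoda_mlp.yaml` does.

```python
import sys
from pathlib import Path


def benchmark_commands(config_dir: str) -> list[list[str]]:
    """Build one `benchmark.py --config <file>` command per YAML file.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Sort for a reproducible run order.
    configs = sorted(Path(config_dir).glob("*.yaml"))
    # sys.executable is the interpreter running this script, used here
    # in place of the bare `python` command.
    return [
        [sys.executable, "benchmark.py", "--config", path.as_posix()]
        for path in configs
    ]


if __name__ == "__main__":
    for command in benchmark_commands("config/pss"):
        print(" ".join(command))
        # To run instead of print: subprocess.run(command, check=True)
```

Printing first makes it easy to check the list of experiments before committing to a long run.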
Complete experimental results, including optimized hyperparameters for all dataset-model combinations and model predictions, are available in results.tar.gz.
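
The internal layout of the archive is not described here, so a reasonable first step is to list its contents before extracting anything. The sketch below uses only Python's standard `tarfile` module. The function name `list_archive` is hypothetical, and the path assumes that results.tar.gz has been downloaded to the current directory.

```python
import tarfile


def list_archive(path: str, limit: int = 20) -> list[str]:
    """Return up to `limit` member names from a gzip-compressed tar archive."""
    names: list[str] = []
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if len(names) >= limit:
                break
            names.append(member.name)
    return names


if __name__ == "__main__":
    # Assumes the archive sits in the current working directory.
    for name in list_archive("results.tar.gz"):
        print(name)
```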
@misc{barkov2025modern,
title = {Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale {Digital} {Soil} {Mapping}?},
author = {Viacheslav Barkov and Jonas Schmidinger and Robin Gebbers and Martin Atzmueller},
year = {2025},
eprint = {2508.09888},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
url = {https://arxiv.org/abs/2508.09888},
}