wueth/High-Cardinality-Covariates-Regularization

High-Cardinality-Covariates-Regularization

This repository contains the R code for our paper "High-Cardinality Categorical Covariates in Network Regressions", which can be downloaded from SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4549049

Abstract. High-cardinality (nominal) categorical covariates are challenging in regression modeling because they lead to high-dimensional models. E.g., in generalized linear models (GLMs), categorical covariates can be implemented by dummy coding which results in high-dimensional regression parameters for high-cardinality categorical covariates. It is difficult to find the correct structure of interactions in high-cardinality covariates, and such high-dimensional models are prone to overfitting. Various regularization strategies can be applied to prevent overfitting. In neural network regressions, a popular way of dealing with categorical covariates is entity embedding, and, typically, overfitting is taken care of by exploiting early stopping strategies. In the case of high-cardinality categorical covariates, this often leads to very early stopping, resulting in a poor predictive model. Building on Avanzi, Taylor, Wang and Wong (arXiv 2023), we introduce new versions of random effects entity embedding of categorical covariates. In particular, having a hierarchical structure in the categorical covariates, we propose a recurrent neural network architecture and a Transformer architecture, respectively, for random effects entity embedding that give us very accurate regression models.
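
To make the dimensionality issue from the abstract concrete, the following minimal base-R sketch contrasts dummy coding with an entity-embedding lookup. It is an illustration only and is not taken from the paper's code: the data are simulated, and the names `region`, `n_levels`, `d` and `embedding` are hypothetical. The embedding weights here are random placeholders, whereas in a network they would be learned during training.

```r
# Hypothetical toy data: one categorical covariate with 500 levels
set.seed(1)
n_levels <- 500
dat <- data.frame(
  region = factor(sample(seq_len(n_levels), size = 5000, replace = TRUE),
                  levels = seq_len(n_levels))
)

# Dummy coding as in a GLM: an intercept plus one column per non-reference level
X <- model.matrix(~ region, data = dat)
dim(X)

# Entity embedding: each level is mapped to a d-dimensional vector.
# The weights below are random placeholders (assumption for illustration);
# a neural network would fit them together with the other parameters.
d <- 2
embedding <- matrix(rnorm(n_levels * d, sd = 0.1), nrow = n_levels, ncol = d)
Z <- embedding[as.integer(dat$region), , drop = FALSE]
dim(Z)
```

With dummy coding the covariate enters the model through `n_levels` design-matrix columns (including the intercept), while the embedding passes only `d` inputs per observation to the subsequent layers.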
