# Feature Selection

This user guide details how DNAMite can be used for feature selection and feature-sparse prediction.

### Why Bother with Feature Selection?

When training a black-box machine learning model, it is common practice to use all available features, even for high-dimensional datasets, since modern ML models can easily handle many features. When training a glass-box model, however, we need to care about both predictive performance and the accuracy and utility of explanations. While glass-box models often have good accuracy on high-dimensional datasets, their explanations are much more likely to be impaired in such settings. In particular, when a dataset contains sets of correlated features, additive models like DNAMite run into identifiability issues: there is no unique way to spread contribution across the correlated features.
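The identifiability issue can be seen in a toy setting. The sketch below is not DNAMite code; it assumes a hypothetical linear additive model with two perfectly correlated features (`x2` is a copy of `x1`) and shows that very different splits of contribution between the two features yield identical predictions, so the data alone cannot say which explanation is right.

```python
import numpy as np

# Toy illustration (hypothetical, not DNAMite): two perfectly correlated features.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = x1.copy()  # exact duplicate of x1


def additive_prediction(w1, w2):
    """Additive model with one linear shape function per feature."""
    return w1 * x1 + w2 * x2


# Different ways of spreading the same total contribution across x1 and x2.
splits = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (3.0, -2.0)]
preds = [additive_prediction(w1, w2) for w1, w2 in splits]

# Every split produces the same predictions, so the per-feature
# contributions are not identifiable from predictive performance alone.
all_equal = all(np.allclose(preds[0], p) for p in preds[1:])
print(all_equal)
```

Each split would produce a very different per-feature explanation, even though the model's predictions are unchanged. Selecting a sparse subset of features is one way to reduce this ambiguity.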

### DNAMite Example

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
from sklearn.model_selection import train_test_split
import torch

# Use a GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the training split and separate the features from the binary target.
df_train = pd.read_csv("mortality_tab_train.csv")
X_train = df_train.drop(["target"], axis=1)
y_train = df_train["target"]

# Load the test split in the same way.
df_test = pd.read_csv("mortality_tab_test.csv")
X_test = df_test.drop(["target"], axis=1)
y_test = df_test["target"]
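Before fitting, it can help to check how many strongly correlated feature pairs a frame such as `X_train` contains. The following is a minimal sketch using plain pandas and NumPy, not a DNAMite utility: the helper `highly_correlated_pairs` and the `0.9` threshold are hypothetical choices for illustration, and the demo runs on synthetic data rather than the mortality dataset.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(X, threshold=0.9):
    """Return (col_a, col_b, |corr|) for numeric column pairs above threshold.

    Hypothetical helper for illustration; not part of DNAMite.
    """
    corr = X.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    values = corr.to_numpy()
    pairs = []
    # Visit each unordered pair once (upper triangle, excluding the diagonal).
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if values[i, j] >= threshold:
                pairs.append((cols[i], cols[j], float(values[i, j])))
    return pairs


# Synthetic demo: "a_copy" is a slightly noisy copy of "a"; "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a + 0.01 * rng.normal(size=500),
    "b": rng.normal(size=500),
})
pairs = highly_correlated_pairs(demo)
print(pairs)
```

On real data the same call would be `highly_correlated_pairs(X_train)`. Pearson correlation only captures linear relationships, so this is a rough screen rather than a complete diagnosis.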

In [7]:
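from dnamite.models import DNAMiteBinaryClassifier

# n_features is the number of input columns; fit_pairs=False is used here,
# and the training log below reports main effects ("MAINS") only.
model = DNAMiteBinaryClassifier(n_features=X_train.shape[1], device=device, fit_pairs=False)
model.fit(X_train, y_train)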

Discretizing features...

100%|██████████| 714/714 [00:00<00:00, 1236.06it/s]

SPlIT 0
TRAINING MAINS
Early stopping at 8 epochs: Test loss has not improved for 5 consecutive epochs.

SPlIT 1
TRAINING MAINS
Early stopping at 7 epochs: Test loss has not improved for 5 consecutive epochs.

SPlIT 2
TRAINING MAINS
Early stopping at 7 epochs: Test loss has not improved for 5 consecutive epochs.

SPlIT 3
TRAINING MAINS
Early stopping at 7 epochs: Test loss has not improved for 5 consecutive epochs.

SPlIT 4
TRAINING MAINS
Early stopping at 7 epochs: Test loss has not improved for 5 consecutive epochs.

IndexError: index 0 is out of bounds for axis 0 with size 0

### Hyperparameters

DNAMite has multiple hyperparameters that can be set to control the 