# Advanced Predictive Models in R: Lasso, Trees, Random Forests, and Boosting

---
This notebook demonstrates how to use more advanced machine learning models in R, including:
- **Lasso regression (linear and logistic, using the `gamlr` package)**
- **Decision Trees**
- **Random Forests**
- **Boosting**

We will use the `banking.csv` dataset as an example. The code assumes it contains a binary outcome column `y` and a numeric `age` column. Please ensure you have the required packages installed:

```r
install.packages(c("gamlr", "rpart", "randomForest", "xgboost", "caret", "pROC", "data.table", "ggplot2"))
```


```r
# Load libraries
library(gamlr)
library(rpart)
library(randomForest)
library(xgboost)
library(caret)
library(pROC)
library(data.table)
library(ggplot2)
```


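```r
# Load data. Text columns are read as factors so that model.matrix, rpart,
# and randomForest all treat them as categorical predictors
data <- fread("banking.csv", stringsAsFactors = TRUE)
str(data)
summary(data)
```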


## 1. Lasso Regression (Linear and Logistic, using `gamlr`)

### Linear Lasso: Predicting a continuous variable (`age`)


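```r
# Prepare data for linear lasso: dummy-code the predictors, drop the intercept
# column, and exclude the binary outcome y from the predictors
X <- model.matrix(age ~ . - y, data)[, -1]
y_age <- data$age

# Fit the lasso regularization path
lasso_mod <- gamlr(X, y_age)
plot(lasso_mod)

# Cross-validated lasso (folds are assigned at random, so fix the seed)
set.seed(123)
cv_lasso <- cv.gamlr(X, y_age)
plot(cv_lasso)
coef(cv_lasso, select = "min")
```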
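To show what the lasso does at a single penalty, here is a minimal coordinate-descent lasso in base R, fitted to simulated data. This is a textbook sketch and not `gamlr`'s algorithm. `gamlr` fits a whole path of penalties (and its gamma-lasso generalization), and its scaling of λ need not match the objective used here. The function names `soft_threshold` and `lasso_cd` are hypothetical.

```r
# Hypothetical sketch: coordinate-descent lasso at one penalty value (base R only)

# Soft-thresholding: the closed-form solution of the one-dimensional lasso problem
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Minimizes (1 / (2n)) * ||y - X b||^2 + lambda * sum(|b|).
# X is centered and scaled and y is centered, so no intercept is needed;
# coefficients are returned on the standardized scale.
lasso_cd <- function(X, y, lambda, max_iter = 200, tol = 1e-8) {
  n <- nrow(X)
  X <- scale(X)
  y <- y - mean(y)
  b <- rep(0, ncol(X))
  col_ss <- colSums(X^2) / n  # equals (n - 1) / n after scale()
  for (iter in seq_len(max_iter)) {
    b_old <- b
    for (j in seq_len(ncol(X))) {
      # Partial residual: y minus the fit from every predictor except j
      r_j <- y - X[, -j, drop = FALSE] %*% b[-j]
      z <- sum(X[, j] * r_j) / n
      b[j] <- soft_threshold(z, lambda) / col_ss[j]
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

# Simulated data: only the first two of ten predictors affect the response
set.seed(1)
n <- 200
p <- 10
X_sim <- matrix(rnorm(n * p), n, p)
y_sim <- 3 * X_sim[, 1] - 2 * X_sim[, 2] + rnorm(n)

b_hat <- lasso_cd(X_sim, y_sim, lambda = 0.5)
round(b_hat, 3)  # the eight noise coefficients are typically exactly zero
```

The exact zeros come from `soft_threshold`. This sparsity is what the coefficient paths in `plot(lasso_mod)` display across many penalty values.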


### Logistic Lasso: Predicting a binary outcome (`y`)


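```r
# Prepare data for logistic lasso (age is excluded from the predictors)
X_bin <- model.matrix(y ~ . - age, data)[, -1]
# Code the outcome as numeric 0/1 for family = "binomial"; this works whether
# y is stored as 0/1 or as a two-level factor such as no/yes
y_bin <- as.numeric(as.factor(data$y)) - 1

# Fit the logistic lasso path
lasso_logit <- gamlr(X_bin, y_bin, family = "binomial")
plot(lasso_logit)

# Cross-validated logistic lasso
set.seed(123)
cv_lasso_logit <- cv.gamlr(X_bin, y_bin, family = "binomial")
plot(cv_lasso_logit)
coef(cv_lasso_logit, select = "min")
```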
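The `select` argument of `coef()` chooses a penalty from the cross-validation results. The `gamlr` documentation describes two rules. `"min"` takes the penalty with the lowest mean out-of-sample error. `"1se"` takes the largest penalty whose error is within one standard error of that minimum, which gives a sparser model. The helper `select_lambda` below is hypothetical and runs on made-up numbers. It illustrates the two rules and does not reproduce the package's internals.

```r
# Hypothetical helper illustrating the "min" and "1se" selection rules.
# lambda: candidate penalties; cv_err: mean out-of-sample error for each;
# cv_se: the standard error of that mean.
select_lambda <- function(lambda, cv_err, cv_se) {
  i_min <- which.min(cv_err)
  # "min": the penalty with the lowest mean cross-validated error
  lambda_min <- lambda[i_min]
  # "1se": the largest (most regularized) penalty whose error is within
  # one standard error of that minimum
  within_1se <- cv_err <= cv_err[i_min] + cv_se[i_min]
  lambda_1se <- max(lambda[within_1se])
  c(min = lambda_min, `1se` = lambda_1se)
}

# Made-up cross-validation results, for illustration only
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)
cv_err <- c(2.00, 1.40, 1.10, 1.05, 1.08)
cv_se  <- c(0.10, 0.08, 0.07, 0.06, 0.06)

select_lambda(lambda, cv_err, cv_se)  # min = 0.10, 1se = 0.25
```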


## 2. Decision Trees


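```r
# Fit a classification tree (age excluded, as in the other classifiers)
tree_mod <- rpart(y ~ . - age, data = data, method = "class")
plot(tree_mod, uniform = TRUE, margin = 0.1)
text(tree_mod, use.n = TRUE, all = TRUE, cex = 0.8)
printcp(tree_mod)  # cross-validated error (xerror) for each complexity parameter cp
```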
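For classification, `rpart` by default scores candidate splits with the Gini index (`parms = list(split = "gini")`, with `"information"` as the alternative). The base-R sketch below searches thresholds on a single numeric feature and keeps the one with the lowest size-weighted child impurity. It ignores categorical splits, surrogate splits, and cp-based pruning, which `rpart` also handles. The names `gini` and `best_split` and the toy data are hypothetical.

```r
# Hypothetical sketch: how a classification tree scores candidate splits (base R only)

# Gini impurity of a set of class labels: 1 - sum over classes of p_k^2
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Try every threshold t on one numeric feature (rule: x <= t goes left) and
# score it by the size-weighted Gini impurity of the two child nodes
best_split <- function(x, labels) {
  thresholds <- sort(unique(x))
  thresholds <- thresholds[-length(thresholds)]  # the largest value would leave the right child empty
  scores <- vapply(thresholds, function(t) {
    left <- labels[x <= t]
    right <- labels[x > t]
    (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
  }, numeric(1))
  list(threshold = thresholds[which.min(scores)], impurity = min(scores))
}

# Toy data: "yes" becomes more common as x grows
x_toy <- 1:8
y_toy <- c("no", "no", "no", "no", "yes", "no", "yes", "yes")
best_split(x_toy, y_toy)  # threshold 4, weighted impurity 0.1875
```

A full tree applies this search recursively to each child node and over all predictors.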


## 3. Random Forests


```r
# Fit a random forest for classification
set.seed(123)
rf_mod <- randomForest(as.factor(y) ~ . - age, data = data, ntree = 200, importance = TRUE)
print(rf_mod)  # out-of-bag (OOB) error estimate and confusion matrix
varImpPlot(rf_mod)
```
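A random forest injects randomness in two places. First, each tree is grown on a bootstrap sample of the rows, and the rows left out ("out-of-bag") give the OOB error estimate that `print(rf_mod)` reports. Second, each split considers only `mtry` randomly chosen predictors, and `randomForest` defaults to `floor(sqrt(p))` for classification. The base-R sketch below illustrates both steps. The names `bootstrap_oob`, `n_rows`, and `n_pred` are hypothetical.

```r
# Hypothetical sketch of the two sources of randomness in a random forest (base R only)

# 1. Bootstrap rows: rows never drawn are out-of-bag (OOB) for that tree
bootstrap_oob <- function(n) {
  in_bag <- sample.int(n, size = n, replace = TRUE)
  list(in_bag = in_bag, oob = setdiff(seq_len(n), in_bag))
}

set.seed(42)
n_rows <- 10000
oob_frac <- replicate(20, length(bootstrap_oob(n_rows)$oob) / n_rows)
mean(oob_frac)  # close to (1 - 1/n)^n, i.e. about exp(-1) = 0.368

# 2. Random predictor subsets: each split may only try mtry predictors
n_pred <- 20
mtry <- floor(sqrt(n_pred))
sample.int(n_pred, mtry)  # the predictors one split would be allowed to consider
```

Because roughly a third of the rows are out-of-bag for any given tree, every row receives predictions from many trees that never saw it. Those predictions make the OOB error a built-in hold-out estimate.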


## 4. Boosting (using xgboost)


```r
# Prepare data for xgboost: a numeric design matrix and a 0/1 label
Xmat <- model.matrix(y ~ . - age, data)[, -1]
yvec <- as.numeric(as.factor(data$y)) - 1  # 0/1 whether y is stored as 0/1 or as no/yes

# Split into train (80%) and test (20%) sets
set.seed(123)
train_idx <- sample(seq_len(nrow(Xmat)), size = floor(0.8 * nrow(Xmat)))
dtrain <- xgb.DMatrix(data = Xmat[train_idx, ], label = yvec[train_idx])
dtest <- xgb.DMatrix(data = Xmat[-train_idx, ], label = yvec[-train_idx])

# Fit the model with xgb.train(), which takes the xgb.DMatrix plus a parameter list;
# the argument conventions of the higher-level xgboost() wrapper differ across package versions
params <- list(objective = "binary:logistic", max_depth = 4, eta = 0.1, eval_metric = "auc")
xgb_mod <- xgb.train(params = params, data = dtrain, nrounds = 100, verbose = 0)

# Feature importance
importance <- xgb.importance(model = xgb_mod)
xgb.plot.importance(importance)

# Predict test-set probabilities and evaluate
preds <- predict(xgb_mod, dtest)
roc_obj <- roc(yvec[-train_idx], preds)
plot(roc_obj, main = "ROC Curve (Boosting)")
auc(roc_obj)
```
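Boosting builds an additive model on the log-odds scale, with each new tree correcting the errors of the current fit. The base-R sketch below is a simplified, first-order version. Each round fits a one-split regression tree to the negative gradient of the log-loss, which is `y - p`, and adds it scaled by a learning rate, the role `eta = 0.1` plays in the model above. xgboost itself uses second-order (gradient and Hessian) updates, regularized leaf weights, all predictors, and deeper trees (depth 4 above), so this is only an illustration of the idea. All names and the simulated data are hypothetical.

```r
# Hypothetical sketch: first-order gradient boosting for 0/1 outcomes (base R only)

# Fit a one-split regression tree ("stump") to targets r on one numeric feature x
fit_stump <- function(x, r) {
  thresholds <- sort(unique(x))
  thresholds <- thresholds[-length(thresholds)]
  sse <- vapply(thresholds, function(t) {
    left <- r[x <= t]
    right <- r[x > t]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  t_best <- thresholds[which.min(sse)]
  list(t = t_best, left = mean(r[x <= t_best]), right = mean(r[x > t_best]))
}

predict_stump <- function(stump, x) ifelse(x <= stump$t, stump$left, stump$right)

log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))

# Each round: fit a stump to the negative gradient y - p, then take a step of size eta
boost_logit <- function(x, y, n_rounds = 50, eta = 0.1) {
  f <- rep(qlogis(mean(y)), length(y))  # start from the overall log-odds
  for (m in seq_len(n_rounds)) {
    p <- plogis(f)
    stump <- fit_stump(x, y - p)
    f <- f + eta * predict_stump(stump, x)
  }
  f
}

# Simulated data: P(y = 1) rises with x
set.seed(7)
x_sim <- runif(500)
y_sim <- rbinom(500, 1, plogis(-2 + 4 * x_sim))
f_hat <- boost_logit(x_sim, y_sim)

loss_start <- log_loss(y_sim, rep(mean(y_sim), length(y_sim)))
loss_boosted <- log_loss(y_sim, plogis(f_hat))
c(start = loss_start, boosted = loss_boosted)  # training log-loss falls after boosting
```

A small `eta` means each round corrects only part of the remaining error. More rounds are then needed, but the fit is usually less prone to overfitting, which is the same trade-off as between `eta` and `nrounds` in xgboost.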


## 5. Model Comparison and Summary

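The models above are assessed in different ways. The lasso fits are judged by cross-validated deviance, the tree by the cross-validated error in `printcp`, the random forest by its out-of-bag error, and only the boosting model by AUC on a held-out test set. For a fair comparison, score every classifier on the same held-out rows with a common metric such as ROC AUC, or use cross-validation with identical folds for all models.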
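ROC AUC has a simple interpretation: it is the probability that a randomly chosen positive row is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it from ranks (the Mann-Whitney statistic) and applies it to two models' scores on the same rows. With 0/1 labels and scores where higher means more likely positive, it should agree with the empirical AUC that `pROC::auc()` reports. The helper `auc_rank`, the model names, and the toy scores are hypothetical, not results from `banking.csv`.

```r
# Hypothetical helper: AUC via the rank (Mann-Whitney) formula, base R only.
# labels: 0/1 outcomes; scores: predicted probabilities, higher meaning "more likely 1".
auc_rank <- function(labels, scores) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # tied scores get their average rank, which counts ties as 1/2
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy test-set labels and the scores two hypothetical models assign to the same rows
test_labels <- c(0, 0, 1, 1, 0, 1)
model_scores <- list(
  model_a = c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90),
  model_b = rep(0.5, 6)  # an uninformative model
)
auc_by_model <- sapply(model_scores, auc_rank, labels = test_labels)
auc_by_model  # model_a = 8/9 (about 0.889), model_b = 0.5
```

Keeping `test_labels` fixed and only swapping in each model's scores is what makes the comparison fair. Every model is judged on exactly the same rows.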

This notebook provides a template for using advanced models in R. For your own data, adjust the target variable and predictors as needed.