<h1 style="text-align:center;">XGBoost Alternative Base Learners</h1>

### What are Base Learners?

In the context of XGBoost, a base learner is an individual model that contributes to the overall prediction by focusing on specific aspects or patterns in the data. Think of it as a team member in a relay race, where each runner (base learner) has a specific section of the track (part of the problem) they're best at.

XGBoost, being a boosting algorithm, works by combining multiple base learners to form a strong predictive model. Each new base learner tries to correct the errors made by the previous ones, leading to a more accurate and robust model over time.
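The error-correcting loop described above can be shown with a minimal from-scratch sketch. It assumes squared-error loss and uses scikit-learn's `DecisionTreeRegressor` as the base learner on the diabetes data. XGBoost itself also uses second-order gradient information and regularized trees, so this shows only the core idea, not XGBoost's implementation.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

learning_rate = 0.3
n_rounds = 50

# Round 0: a constant prediction (the mean target) before any learner is added
prediction = np.full(y.shape, y.mean())
learners = []
rmse_history = [np.sqrt(np.mean((y - prediction) ** 2))]

for _ in range(n_rounds):
    # Each new learner is trained on the errors the current ensemble still makes
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    # Add a damped version of its correction to the ensemble's prediction
    prediction += learning_rate * tree.predict(X)
    learners.append(tree)
    rmse_history.append(np.sqrt(np.mean((y - prediction) ** 2)))

print(f"Training RMSE with a constant guess: {rmse_history[0]:.2f}")
print(f"Training RMSE after {n_rounds} shallow trees: {rmse_history[-1]:.2f}")
```

Each tree on its own is a weak, depth-2 model. Only the accumulated corrections make the ensemble accurate.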

The term "base learners" in the context of ensemble learning methods like XGBoost refers to the individual models that constitute the building blocks of the final, more complex model. Let's break this down in a simple, Feynman-style explanation.

#### Analogy: Building a House

Imagine you're building a house. Instead of constructing it in one go, you start with smaller, foundational elements like bricks or wooden beams. Each of these elements is essential, but on their own, they're not very functional as a house. However, when you combine them strategically, they form the robust and complex structure of a house.

### Base Learners in XGBoost

In XGBoost:

- **The "House"**: The final predictive model you're trying to build.
- **The "Bricks/Wooden Beams"**: These are the base learners. Each base learner is a simple model (like a decision tree in the case of `gbtree`).

### Why "Base"?

Base learners are called so because they are the basic, fundamental models that serve as the starting point for creating a more sophisticated and powerful ensemble model through methods like boosting in XGBoost. Each one contributes a part of the knowledge, and together they form a comprehensive, strong predictive model.

- **Foundational Role**: Just like bricks in a house, each base learner forms a part of the foundation of the overall model. They're the starting point.
- **Building Blocks**: They are the basic units that are combined and improved upon to create a more complex and powerful model.
- **Simplicity**: Each base learner is usually a simple model. Their simplicity allows for specialization – each one learns and adapts to different parts or patterns in the data.
- **Combined Strength**: Individually, these learners might be weak (just as a single brick isn't much use in a storm), but when combined (like bricks in a house), they significantly improve the model's performance.

### Understanding "Learners"

In the realm of machine learning, a "learner" is a model that learns from data. This learning process involves recognizing patterns, making predictions, or uncovering insights based on the input data it is trained on. The key attributes of learners in this context include:

1. **Pattern Recognition**: Learners analyze data to identify patterns. For example, in a dataset of housing prices, a learner might discover that larger houses tend to be more expensive.

2. **Adaptation**: They adapt their internal parameters based on the data they're exposed to. This is similar to how a student learns new topics in school and gets better with practice and study.

3. **Predictive Ability**: Once trained, learners can make predictions on new, unseen data. If you've trained a learner on past housing data, it can estimate the price of a new house it's never seen before, based on its learned patterns.

4. **Improvement Over Time**: As they are exposed to more data or more rounds of training, learners usually improve. Their predictions become more accurate, and they become better at handling complex data.

### In the Context of XGBoost

- **Individual Decision Makers**: Each base learner in XGBoost can be thought of as an individual decision-maker. It makes predictions based on the part of the data it has been trained on.

- **Sequential Improvement**: In boosting methods like XGBoost, each subsequent learner pays more attention to the errors made by previous learners. This is a learning process where each step builds upon the previous one, refining and improving the overall predictions.

- **Collective Knowledge**: While each learner has its own set of skills and knowledge, the real power lies in their combination. Together, they form a comprehensive model that leverages the collective learning from all individual learners.

In summary, the "learners" in "base learners" are individual models that learn from data. They are the active components in the machine learning process, capable of understanding, adapting, and making predictions based on the data they're trained on. In ensemble methods like XGBoost, these learners work together, each contributing its piece of knowledge, to form a more accurate and robust predictive model.

### Types of Base Learners in XGBoost

1. **gbtree**: This is the default and most commonly used base learner in XGBoost. It stands for Gradient Boosted Trees. Each base learner is a decision tree that's optimized to correct the mistakes of the preceding trees in the series.

2. **gblinear**: This base learner uses linear models instead of trees. It's like switching from a sprinter who's great at curves to a long-distance runner for a straight stretch in our relay analogy. It's particularly useful when the relationship in your data is more linear.

3. **dart**: DART stands for Dropouts meet Multiple Additive Regression Trees. It's a variation of gbtree, with a twist: during training, it randomly drops (ignores) some of the trees. This is akin to occasionally skipping a runner to prevent over-reliance on any particular one, enhancing the model's ability to generalize and reducing overfitting.

### Illustrative Code Examples

The snippets below sketch how each base learner is selected through the `booster` parameter of XGBoost's native `xgb.train` API. The datasets are chosen only to make each snippet runnable; adjust the objective and other parameters to your own problem.

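1. **gbtree**:
   ```python
   import xgboost as xgb
   from sklearn.datasets import load_breast_cancer

   # Binary classification data wrapped in XGBoost's DMatrix format
   X, y = load_breast_cancer(return_X_y=True)
   dtrain = xgb.DMatrix(X, label=y)

   # parameters for gbtree
   params = {
       'objective': 'binary:logistic',
       'booster': 'gbtree',
       # other parameters as needed
   }

   # training the model
   model = xgb.train(params, dtrain, num_boost_round=10)
   ```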

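2. **gblinear**:
   ```python
   import xgboost as xgb
   from sklearn.datasets import load_diabetes

   # Regression data wrapped in XGBoost's DMatrix format
   X, y = load_diabetes(return_X_y=True)
   dtrain = xgb.DMatrix(X, label=y)

   # parameters for gblinear ('reg:squarederror' replaces the deprecated 'reg:linear')
   params = {
       'objective': 'reg:squarederror',
       'booster': 'gblinear',
       # other parameters as needed
   }

   # training the model
   model = xgb.train(params, dtrain, num_boost_round=10)
   ```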

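3. **dart**:
   ```python
   import xgboost as xgb
   from sklearn.datasets import load_diabetes

   # Regression data wrapped in XGBoost's DMatrix format
   X, y = load_diabetes(return_X_y=True)
   dtrain = xgb.DMatrix(X, label=y)

   # parameters for dart ('reg:squarederror' replaces the deprecated 'reg:linear')
   params = {
       'objective': 'reg:squarederror',
       'booster': 'dart',
       'sample_type': 'uniform',
       'normalize_type': 'tree',
       'rate_drop': 0.1,  # dropout rate
       # other parameters as needed
   }

   # training the model
   model = xgb.train(params, dtrain, num_boost_round=10)
   ```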

In these examples, `dtrain` holds the training data as an `xgb.DMatrix`, and `num_boost_round` specifies the number of boosting rounds. The parameters within `params` vary based on the base learner and the specific problem you're addressing.


```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor, XGBClassifier, XGBRFRegressor, XGBRFClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression, Lasso, Ridge
from sklearn.model_selection import GridSearchCV, KFold
# helper_file is a local module that is not shown; splitX_y, used later, comes from it
from helper_file import *
from sklearn.metrics import mean_squared_error as MSE

import os
import warnings

# Silence FutureWarnings in this process...
warnings.filterwarnings('ignore', category=FutureWarning)
# ...and in subprocesses started later (e.g. parallel workers), which read this variable at startup
os.environ['PYTHONWARNINGS'] = 'ignore::FutureWarning'
```

```python
X, y = load_diabetes(return_X_y=True)
```

```python
# Shuffle the data and split it into 5 folds
kfold = KFold(n_splits=5, shuffle=True, random_state=43)
```

```python
def regression_model(model):
    """Return the mean RMSE of `model` across the 5 cross-validation folds."""
    # scikit-learn scorers follow "higher is better", so the MSE comes back negated
    scores = cross_val_score(model, X, y, scoring='neg_mean_squared_error', cv=kfold)

    rmse = (-scores) ** 0.5

    return rmse.mean()
```
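To make explicit what `regression_model` computes, the sketch below reproduces it fold by fold with scikit-learn's `KFold.split` and `clone`. It assumes the same 5-fold, `random_state=43` split and uses `LinearRegression` only because it is deterministic. The helper name `manual_mean_rmse` is our own. Note that the result is the mean of the per-fold RMSEs, not the square root of the pooled MSE.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=43)


def manual_mean_rmse(model):
    """Mean of per-fold RMSE, computed explicitly one fold at a time."""
    fold_rmse = []
    for train_idx, test_idx in kfold.split(X):
        # Fit a fresh copy of the model on the training part of this fold
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        mse = mean_squared_error(y[test_idx], fitted.predict(X[test_idx]))
        fold_rmse.append(np.sqrt(mse))
    return np.mean(fold_rmse)


manual = manual_mean_rmse(LinearRegression())
scores = cross_val_score(LinearRegression(), X, y, scoring='neg_mean_squared_error', cv=kfold)
automatic = np.sqrt(-scores).mean()
print(manual, automatic)
```

Because `KFold` with a fixed `random_state` yields the same splits on every call, both numbers agree.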

```python
regression_model(XGBRegressor(booster='gblinear'))
# Output: 228.64606316805893

regression_model(LinearRegression())
# Output: 55.22870075899542

regression_model(Lasso())
# Output: 61.90246701513051

regression_model(Ridge())
# Output: 58.67555683787269

regression_model(XGBRegressor(booster='gbtree'))
# Output: 64.73701409766304
```

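With default settings, `gbtree` (RMSE ≈ 64.74) actually outperforms `gblinear` (RMSE ≈ 228.65) here, so this comparison alone does not show that a linear model is best. The stronger evidence for linearity comes from the scikit-learn linear models: plain `LinearRegression` (≈ 55.23), `Ridge` (≈ 58.68), and `Lasso` (≈ 61.90) all beat XGBoost with `gbtree` as the base learner. The default `gblinear` clearly needs tuning, which the grid searches below carry out.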

The hyperparameters for the `gblinear` booster in XGBoost are designed specifically for linear models, differing significantly from the `gbtree` booster's hyperparameters, which are more suitable for tree-based models. Let's break down these hyperparameters and explain some new terms:

1. **reg_lambda**
   - **Explanation**: It's a regularization term (L2, similar to Ridge Regression) that helps to prevent overfitting by penalizing large coefficients.
   - **Default**: 0
   - **Range**: [0, ∞)
   - **Example**: Setting `reg_lambda` to a higher value (e.g., 1 or 2) can help reduce model complexity and overfitting.

2. **reg_alpha**
   - **Explanation**: This is another regularization term (L1, similar to Lasso Regression) used to prevent overfitting, particularly useful in situations with high dimensionality.
   - **Default**: 0
   - **Range**: [0, ∞)
   - **Example**: Increasing `reg_alpha` (e.g., setting it to 0.5 or 1) can help in feature selection, as it tends to shrink the coefficients of less important features to zero.

3. **updater**
   - **Explanation**: This parameter dictates the algorithm used for optimizing the linear model in each boosting round.
   - **Options**: `shotgun` (employs hogwild parallelism with coordinate descent for a non-deterministic solution) and `coord_descent` (ordinary coordinate descent for a deterministic solution).
   - **Default**: `shotgun`
   - **Note**: Coordinate descent optimizes the model by improving one parameter at a time while keeping others fixed.

4. **feature_selector**
   - **Explanation**: Determines the method for selecting features during the boosting process.
   - **Options**:
     - `cyclic`: Iterates over features one by one in a fixed order.
     - `shuffle`: Similar to `cyclic`, but shuffles the features in each round.
     - `random`: Chooses features randomly.
     - `greedy`: Selects the feature with the highest gradient magnitude.
     - `thrifty`: Approximately greedy, but reorders features based on the magnitude of their weight changes.
   - **Default**: `cyclic`
   - **Compatibility**: Must be paired with a compatible `updater`. The `shotgun` updater supports only `cyclic` and `shuffle`; `random`, `greedy`, and `thrifty` require `coord_descent` (which also accepts `cyclic` and `shuffle`).
   - **Note**: The `greedy` method can be computationally expensive, especially for large datasets.

5. **top_k**
   - **Explanation**: Specifies the number of features to be considered by the `greedy` and `thrifty` feature selectors during coordinate descent.
   - **Default**: 0 (considers all features)
   - **Range**: [0, maximum number of features]
   - **Example**: Setting `top_k` to a value like 5 means that, when using `greedy` or `thrifty`, only the 5 features with the largest magnitude of univariate weight change are considered in each boosting round.

In summary, these hyperparameters allow fine-tuning of the `gblinear` booster in XGBoost, offering control over regularization, feature selection, and the optimization algorithm. Adjusting these settings can significantly impact the model's performance, especially in terms of complexity, overfitting, and computational efficiency.

```python
# Load the diabetes dataset
diabetes_data = load_diabetes()

# Convert the data into a DataFrame
# The independent variables (features)
df_features = pd.DataFrame(diabetes_data.data, columns=diabetes_data.feature_names)

# The dependent variable (target)
df_target = pd.DataFrame(diabetes_data.target, columns=["target"])

# Combining features and target into one DataFrame
df_diabetes = pd.concat([df_features, df_target], axis=1)

# To display the first few rows of the DataFrame
display(df_diabetes.head())
```

|   | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.00567 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.02593 | 141.0 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 | 206.0 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 | 135.0 |


```python
def reg_grid_search(params, reg=XGBRegressor(booster='gblinear', objective='reg:squarederror')):
    """Grid-search `params` for `reg` with 5-fold CV and print the best RMSE."""
    # Instantiate GridSearchCV as grid_reg
    grid_reg = GridSearchCV(reg, params, scoring='neg_mean_squared_error', cv=kfold)

    # Fit grid_reg on the full X and y; GridSearchCV cross-validates internally
    grid_reg.fit(X, y)

    # Extract and print the best params
    best_params = grid_reg.best_params_
    print(f"Best params: {best_params}")

    # Convert the best (negated) mean fold MSE into an RMSE and print it.
    # This is sqrt(mean MSE), slightly different from regression_model's mean of fold RMSEs.
    best_score = np.sqrt(-grid_reg.best_score_)
    print(f"Best score: {best_score}")
```

```python
reg_grid_search(params={'reg_alpha': [0.001, 0.01, 0.1, 0.5, 1, 5]})
# Output:
# Best params: {'reg_alpha': 0.1}
# Best score: 55.18001653715114

reg_grid_search(params={'reg_lambda': [0.001, 0.01, 0.1, 0.5, 1, 5]})
# Output:
# Best params: {'reg_lambda': 0.001}
# Best score: 56.08981230197895

reg_grid_search(params={'feature_selector': ['shuffle']})
# Output:
# Best params: {'feature_selector': 'shuffle'}
# Best score: 55.56479371486922

reg_grid_search(params={'feature_selector': ['random', 'greedy', 'thrifty'], 'updater': ['coord_descent']})
# Output:
# Best params: {'feature_selector': 'greedy', 'updater': 'coord_descent'}
# Best score: 55.29281522007418

reg_grid_search(params={'feature_selector': ['greedy', 'thrifty'], 'updater': ['coord_descent'], 'top_k': [3, 5, 7, 9]})
# Output:
# Best params: {'feature_selector': 'greedy', 'top_k': 9, 'updater': 'coord_descent'}
# Best score: 55.29281522007418
```


### Linear datasets

Let us create a synthetic dataset generated by a simple linear rule plus randomness, and compare how the base learners handle it.

```python
# Set the range of X values from 1 to 99 (np.arange excludes the stop value)
X = np.arange(1, 100)

# Declare a random seed using NumPy to ensure the consistency of the results
np.random.seed(2)

# Create an empty list defined as y
y = []

# Loop through X, multiplying each entry by a random number from -0.2 to 0.2
for i in X:
    y.append(i * np.random.uniform(-0.2, 0.2))

# Transform y to a NumPy array for machine learning
y = np.array(y)

# Reshape X into a single column because scikit-learn expects a 2D feature array;
# y is reshaped the same way
X = X.reshape(X.shape[0], 1)
y = y.reshape(y.shape[0], 1)
```

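We now have a dataset in which each y value is its X value multiplied by a random factor between -0.2 and 0.2, so y follows a linear rule with randomness built into the slope.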
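The loop-based construction can also be written in vectorized NumPy form. The sketch below assumes NumPy's legacy global seeding (`np.random.seed`) as in the cell above. It checks that drawing all 99 factors in one `np.random.uniform` call gives the same values as drawing them one at a time. The variable names are our own, chosen so the notebook's `X` and `y` are left untouched.

```python
import numpy as np

# Loop-based construction, as in the cell above
np.random.seed(2)
X_loop = np.arange(1, 100)
y_loop = np.array([i * np.random.uniform(-0.2, 0.2) for i in X_loop])

# Vectorized equivalent: draw all 99 random factors in a single call
np.random.seed(2)
X_vec = np.arange(1, 100)
y_vec = X_vec * np.random.uniform(-0.2, 0.2, size=X_vec.shape[0])

# One column per feature, as scikit-learn expects
X_vec = X_vec.reshape(-1, 1)

print("same values:", np.allclose(y_loop, y_vec))
print("feature shape:", X_vec.shape)
```

The vectorized form avoids a Python-level loop, which matters once the dataset grows beyond a toy size.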

Let's run the regression_model function again with gblinear as the base learner:

```python
regression_model(XGBRegressor(booster='gblinear', objective='reg:squarederror'))
# Output: 6.3469143783045165
```

```python
# Run the regression_model function with gbtree as the base learner
regression_model(XGBRegressor(booster='gbtree', objective='reg:squarederror'))
# Output: 9.275567742823819
```

It is clear from the above that `gblinear` performs much better in our constructed linear dataset.

We also try `LinearRegression` on the same dataset next.

```python
regression_model(LinearRegression())
# Output: 6.346935807599345
```

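`gblinear` is a good choice when linear models might outperform tree-based ones. It slightly surpassed `LinearRegression` on the real diabetes data once tuned (RMSE 55.180 vs. 55.229) and on the simulated data with default settings (6.346914 vs. 6.346936), although both margins are tiny. It is well suited to large, linear datasets, and it also works for classification tasks, which we will soon explore.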

### Comparing dart
The `dart` base learner is similar to `gbtree` in that both are gradient-boosted trees. They differ primarily in that `dart` removes trees (a technique called dropout) during each round of boosting.

```python
# Reload the diabetes data (X and y were overwritten by the synthetic dataset)
X, y = load_diabetes(return_X_y=True)
regression_model(XGBRegressor(booster='dart', objective='reg:squarederror'))
# Output: 64.73701334758088
```

The `dart` base learner gives essentially the same result as the `gbtree` base learner (64.737 in both cases). This is expected with default settings: `rate_drop` defaults to 0.0, so `dart` drops no trees and effectively behaves like `gbtree`.

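Let's see how `dart` compares with `gbtree` on a different dataset; we use classification this time around. The dataset has 303 rows and 13 features, and despite the variable name `census_df`, its columns (such as `cp`, `chol`, and `thalach`) match the UCI heart-disease data.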

### `dart` with XGBClassifier

```python
# file_path points to the classification CSV file; it is defined elsewhere and not shown
census_df = pd.read_csv(file_path)
census_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB
```


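```python
# splitX_y comes from helper_file (not shown); judging by its use, it splits off the 'target' column as y
census_X, census_y = splitX_y(census_df, 'target')

print(f"shape of target vector: {census_y.shape}")
print(f"shape of feature matrix: {census_X.shape}")
# Output:
# shape of target vector: (303,)
# shape of feature matrix: (303, 13)
```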


```python
def classification_model(model):
    """Return the mean accuracy of `model` across the 5 cross-validation folds."""
    scores = cross_val_score(model, census_X, census_y, scoring='accuracy', cv=kfold)

    return scores.mean()
```

```python
classification_model(XGBClassifier(booster='gbtree'))
# Output: 0.795136612021858

classification_model(XGBClassifier(booster='dart'))
# Output: 0.795136612021858
```

The two scores are identical. As in the regression example, `rate_drop` defaults to 0.0, so no trees are dropped and `dart` behaves exactly like `gbtree`.

We also want to try `gblinear` on this dataset. `gblinear` can handle classification by passing its linear combination of features through the sigmoid function, just as logistic regression does.

```python
classification_model(XGBClassifier(booster='gblinear'))
# Output: 0.782295081967213
```

Let's compare that with logistic regression.

```python
classification_model(LogisticRegression(max_iter=1000))
# Output: 0.8281420765027322
```

In this scenario, `gblinear` trails logistic regression by roughly 4.6 percentage points of accuracy (0.782 vs. 0.828). XGBoost's `gblinear` remains a reasonable substitute for traditional logistic regression in classification tasks, though not a better one on this dataset.

### DART Hyperparameters Overview
DART, an extension of the gbtree approach in XGBoost, introduces its own hyperparameters. These control how trees are dropped, influencing model behavior and performance.

Key DART-specific Hyperparameters:

1. **Sample Type (`sample_type`)**:
   - **Purpose**: Dictates the method for tree dropout.
   - **Options**: 'uniform' (equal chance for all trees) or 'weighted' (chance based on tree weights).
   - **Default**: 'uniform'.
   - **Range**: ['uniform', 'weighted'].

2. **Normalize Type (`normalize_type`)**:
   - **Purpose**: Determines how new tree weights are calculated relative to dropped trees.
   - **Options**: 'tree' (equal to individual dropped tree weight) or 'forest' (equal to cumulative weight of dropped trees).
   - **Default**: 'tree'.
   - **Range**: ['tree', 'forest'].

3. **Rate Drop (`rate_drop`)**:
   - **Purpose**: Sets the proportion of trees to be dropped.
   - **Default**: 0.0 (no dropout).
   - **Range**: [0.0, 1.0].

4. **One Drop (`one_drop`)**:
   - **Purpose**: Guarantees at least one tree drop per round.
   - **Default**: 0 (no guarantee).
   - **Range**: 0 or 1 (a flag).

5. **Skip Drop (`skip_drop`)**:
   - **Purpose**: Sets the probability of skipping the dropout procedure entirely in a given boosting round; in a skipped round, the new tree is added as in gbtree. A non-zero `skip_drop` takes priority over `rate_drop` and `one_drop`.
   - **Default**: 0.0 (dropout is never skipped).
   - **Range**: [0.0, 1.0].

Experimenting with these parameters allows for fine-tuning DART's behavior, potentially enhancing model scores.
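To make the interplay of `rate_drop`, `one_drop`, and `skip_drop` concrete, here is a simplified sketch of how one round could pick trees to drop under `sample_type='uniform'`. It is modeled on the parameter descriptions above, not on XGBoost's source. The function name `select_dropped_trees` is hypothetical, and weighted sampling and weight normalization (`normalize_type`) are left out.

```python
import numpy as np


def select_dropped_trees(n_trees, rate_drop=0.0, one_drop=False, skip_drop=0.0, rng=None):
    """Hypothetical helper: indices of existing trees dropped in one DART round (uniform)."""
    rng = np.random.default_rng() if rng is None else rng
    # With probability skip_drop, this round skips dropout and behaves like gbtree
    if skip_drop > 0.0 and rng.random() < skip_drop:
        return []
    # Otherwise each existing tree is dropped independently with probability rate_drop
    dropped = [i for i in range(n_trees) if rng.random() < rate_drop]
    # one_drop forces at least one dropped tree whenever trees exist
    if one_drop and not dropped and n_trees > 0:
        dropped = [int(rng.integers(n_trees))]
    return dropped


rng = np.random.default_rng(0)
print(select_dropped_trees(50, rng=rng))                 # defaults: nothing is dropped
print(select_dropped_trees(50, one_drop=True, rng=rng))  # exactly one tree
print(select_dropped_trees(50, rate_drop=0.2, rng=rng))  # about 10 trees on average
```

This sketch also suggests why `sample_type` and `normalize_type` on their own should not change results while `rate_drop` is 0.0: with nothing dropped, they have nothing to act on.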

```python
classification_model(XGBClassifier(booster='dart', one_drop=1))
# Output: 0.8049180327868852
```

```python
regression_model(XGBRegressor(booster='dart', objective='reg:squarederror', sample_type='weighted'))
# Output: 64.73701334758088

regression_model(XGBRegressor(booster='dart', objective='reg:squarederror', normalize_type='forest'))
# Output: 64.73701334758088

regression_model(XGBRegressor(booster='dart', objective='reg:squarederror', one_drop=1))
# Output: 63.211287303891616
```

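Note that `sample_type` and `normalize_type` left the score unchanged: with the default `rate_drop` of 0.0, no trees are dropped, so settings that govern how trees are dropped have nothing to act on. `one_drop=1` forces at least one drop per round, which is why it changed the score. For `rate_drop`, the fraction of trees to drop, a range of values can be searched with the `reg_grid_search` function as follows: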

```python
reg_grid_search(params={'rate_drop': [0.01, 0.1, 0.2, 0.4]},
                reg=XGBRegressor(booster='dart', objective='reg:squarederror', one_drop=1))
# Output:
# Best params: {'rate_drop': 0.2}
# Best score: 61.924817223973925
```

```python
reg_grid_search(params={'skip_drop': [0.01, 0.1, 0.2, 0.4]},
                reg=XGBRegressor(booster='dart', objective='reg:squarederror'))
# Output:
# Best params: {'skip_drop': 0.01}
# Best score: 64.79907448108045
```


### Understanding DART in XGBoost:
DART is a notable option in XGBoost. It accepts all gbtree parameters, so switching from gbtree to DART while tweaking settings is straightforward. The benefit lies in experimenting with DART-specific parameters such as `one_drop`, `rate_drop`, and `normalize_type`. In this notebook, combining `one_drop=1` with `rate_drop=0.2` lowered the diabetes RMSE from about 64.74 to 61.92. Trying DART as a base learner is therefore a worthwhile step when developing an XGBoost model.

With this grasp of DART, let's now shift our focus to random forests.

### Exploring Random Forests in XGBoost
In XGBoost, random forests can be implemented in two ways. One approach is to use random forests as the base learner. The other is to employ XGBoost's specialized versions: `XGBRFRegressor` for regression tasks and `XGBRFClassifier` for classification. We begin by considering random forests as alternative base learners within the XGBoost framework.

In XGBoost, creating a random forest isn't as straightforward as setting a booster hyperparameter to 'random forest'. Instead, tweak the 'num_parallel_tree' hyperparameter. By default, it's 1, meaning each boosting round builds one tree. Increase this number, and you turn gbtree (or dart) into a boosted random forest. Here, each round constructs several trees in parallel, essentially forming a forest.

Quick Overview of `num_parallel_tree`:
- It specifies the tree count in each boosting round.
- Default is 1.
- Range is [1, infinity).
- A value greater than 1 morphs the booster into a random forest.

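With multiple trees per round, the learner isn't a single tree but a forest. Because XGBoost shares key hyperparameters with traditional random forests, namely row subsampling (`subsample`) and per-split column sampling (`colsample_bynode`), setting `num_parallel_tree` above 1 effectively creates a random forest base learner. Setting those sampling parameters below 1 is what makes the trees within a round differ from one another; at their default of 1, every tree in a round sees the same data.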

Practical Experiment:
- Use `XGBRegressor` with `booster=gbtree` and `num_parallel_tree=25`. This implies each boosting round has 25 trees.
- The observed score was 65.96604877151103, almost identical to boosting a single gbtree. This similarity arises because gradient boosting learns from previous trees' mistakes, and a robust initial random forest leaves little room for improvement.

Key Insight:
Gradient boosting excels through its learning process. It's often more effective to use a smaller `num_parallel_tree` value, like 5.

Experiment with `num_parallel_tree=5`:
- The score improved only marginally, to 65.96445649315855, a difference of about 0.002.

Conclusion:
In XGBoost, when random forests are used as base learners, lower `num_parallel_tree` values often work better. With this understanding of random forest implementation in XGBoost, the next step is to explore building random forests as original XGBoost models.

### Building Random Forests with XGBoost Models:
XGBoost isn't just for gradient boosting; it also offers tools for creating random forests, namely XGBRFRegressor and XGBRFClassifier.

As per the XGBoost documentation (check it out [here](https://xgboost.readthedocs.io/en/latest/tutorials/rf.html)), their random forest implementation is still experimental. This means default settings might change. Here's a snapshot of the defaults as of 2020:

1. **Number of Estimators (`n_estimators`)**:
   - In XGBRFRegressor and XGBRFClassifier, use 'n_estimators' instead of 'num_parallel_tree'.
   - Default: 100 (translates to the number of parallel trees).
   - Range: [1, infinity).

2. **Learning Rate (`learning_rate`)**:
   - Unlike gradient boosters, XGBRFRegressor and XGBRFClassifier generally don't benefit from adjusting the learning rate since they involve a single round of tree building.
   - Default: 1.
   - Range: [0, 1].

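3. **Subsampling Parameters (`subsample`, `colsample_bynode`)**:
   - Each tree is trained on a random fraction of the rows (`subsample`) and considers a random fraction of the features at each split (`colsample_bynode`). This makes the trees differ from one another and reduces the chance of overfitting.
   - Defaults: 0.8.
   - Range: (0, 1].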

Remember, when using `XGBRFRegressor` and `XGBRFClassifier`, the method is more akin to bagging, as seen in traditional random forests, rather than gradient boosting. This distinction is key in understanding how these models function and should be configured.

```python
regression_model(XGBRFRegressor(objective='reg:squarederror'))
# Output: 58.82439992207376

regression_model(RandomForestRegressor())
# Output: 58.80168283500812
```

```python
classification_model(XGBRFClassifier())
# Output: 0.8118579234972676

classification_model(RandomForestClassifier())
# Output: 0.8117486338797815
```

### Exploring XGBoost's Random Forest Capabilities:
In XGBoost, setting `num_parallel_tree` above 1 turns your base learner into a random forest. However, remember, boosting thrives on learning from simpler models. Hence, keep `num_parallel_tree` close to 1 for most cases. Use random forests as base learners judiciously, particularly if single-tree boosting isn't cutting it.

On the flip side, XGBoost's own random forest implementations, `XGBRFRegressor` and `XGBRFClassifier`, offer a solid alternative to scikit-learn's versions. In these experiments they performed on par with them (RMSE 58.82 vs. 58.80 on the diabetes data; accuracy 0.8119 vs. 0.8117 on the classification data). Considering XGBoost's reputation in the machine learning community, these tools are certainly worth integrating into your toolkit.