# RandomForestClassifier

In [None]:
RandomForestClassifier(name: str,
                       cursor = None,
                       n_estimators: int = 10,
                       max_features = "auto",
                       max_leaf_nodes: int = 1e9, 
                       sample: float = 0.632,
                       max_depth: int = 5,
                       min_samples_leaf: int = 1,
                       min_info_gain: float = 0.0,
                       nbins: int = 32)

Creates a RandomForestClassifier object that uses Vertica's highly distributed and scalable Random Forest algorithm. Random forest is an ensemble learning method for classification: it builds many decision trees at training time and predicts the class that is the mode of the classes predicted by the individual trees.
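The voting step described above can be illustrated in plain Python. This is a minimal sketch of taking the mode of per-tree predictions, not Vertica's internal implementation; the function name `majority_vote` and the class labels are made up for the illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class among the per-tree predictions."""
    # most_common(1) yields [(label, count)] for the top label; on ties,
    # the label encountered first in the input wins.
    return Counter(tree_predictions).most_common(1)[0][0]


# One predicted label per tree for a single row (hypothetical values).
votes = ["survived", "died", "survived", "survived", "died"]
print(majority_vote(votes))
```

With more trees (a larger `n_estimators`), a single unusual tree has less influence on the final vote.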

### Parameters

<table id="parameters">
    <tr> <th>Name</th> <th>Type</th> <th>Optional</th> <th>Description</th> </tr>
    <tr> <td><div class="param_name">name</div></td> <td><div class="type">str</div></td> <td><div class = "no">&#10060;</div></td> <td>Name of the model to be stored in the database.</td> </tr>
    <tr> <td><div class="param_name">cursor</div></td> <td><div class="type">DBcursor</div></td> <td><div class = "yes">&#10003;</div></td> <td>Vertica DB cursor.</td> </tr>
    <tr> <td><div class="param_name">n_estimators</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>The number of trees in the forest, an integer between 0 and 1000, inclusive.</td> </tr>
    <tr> <td><div class="param_name">max_features</div></td> <td><div class="type">int / str</div></td> <td><div class = "yes">&#10003;</div></td> <td>The number of randomly chosen features from which to pick the best feature to split on at a given tree node. It can be an integer or one of the following two methods.<br><ul>
                                                        <li><b>auto :</b> square root of the total number of predictors.</li>
                                                        <li><b>max :</b> number of predictors.</li></ul></td> </tr>
    <tr> <td><div class="param_name">max_leaf_nodes</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>The maximum number of leaf nodes a tree in the forest can have, an integer between 1 and 1e9, inclusive.</td> </tr>
    <tr> <td><div class="param_name">sample</div></td> <td><div class="type">float</div></td> <td><div class = "yes">&#10003;</div></td> <td>The portion of the input data set that is randomly picked for training each tree, a float between 0.0 and 1.0, inclusive.</td> </tr>
    <tr> <td><div class="param_name">max_depth</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>The maximum depth for growing each tree, an integer between 1 and 100, inclusive.</td> </tr>
    <tr> <td><div class="param_name">min_samples_leaf</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>The minimum number of samples each branch must have after splitting a node, an integer between 1 and 1e6, inclusive. A split that causes fewer remaining samples is discarded.</td> </tr>
    <tr> <td><div class="param_name">min_info_gain</div></td> <td><div class="type">float</div></td> <td><div class = "yes">&#10003;</div></td> <td>The minimum threshold for including a split, a float between 0.0 and 1.0, inclusive. A split with information gain less than this threshold is discarded.</td> </tr>
    <tr> <td><div class="param_name">nbins</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>The number of bins to use for continuous features, an integer between 2 and 1000, inclusive.</td> </tr>
</table>
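As a rough illustration of the table above, the sketch below checks a set of values against the documented inclusive ranges and shows how the two `max_features` methods translate into a feature count. Both helpers (`out_of_range`, `resolve_max_features`) are hypothetical and not part of the library; rounding the square root down is an assumption, since the table does not say how it is rounded.

```python
import math

# Inclusive ranges copied from the parameter table above.
PARAMETER_RANGES = {
    "n_estimators": (0, 1000),
    "max_leaf_nodes": (1, 1e9),
    "sample": (0.0, 1.0),
    "max_depth": (1, 100),
    "min_samples_leaf": (1, 1e6),
    "min_info_gain": (0.0, 1.0),
    "nbins": (2, 1000),
}


def out_of_range(params):
    """Return the names of parameters whose values fall outside the documented range."""
    return [
        name
        for name, value in params.items()
        if name in PARAMETER_RANGES
        and not PARAMETER_RANGES[name][0] <= value <= PARAMETER_RANGES[name][1]
    ]


def resolve_max_features(max_features, n_predictors):
    """Turn a max_features setting into a number of candidate features per split."""
    if max_features == "auto":
        # Assumption: the square root is rounded down, with a floor of 1.
        return max(1, math.isqrt(n_predictors))
    if max_features == "max":
        return n_predictors
    if isinstance(max_features, int) and 1 <= max_features <= n_predictors:
        return max_features
    raise ValueError(f"unsupported max_features: {max_features!r}")


settings = {"n_estimators": 20, "sample": 0.7, "max_depth": 3, "nbins": 32}
print(out_of_range(settings))           # names of any out-of-range parameters
print(resolve_max_features("auto", 9))  # square root of 9 predictors
print(resolve_max_features("max", 9))   # all 9 predictors
```

Checking values on the client side like this only catches range mistakes early; the database remains the authority on what it accepts.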

### Attributes

After the object is created, all the parameters become attributes. Fitting the model creates the following additional attributes:

<table id="parameters">
    <tr> <th>Name</th> <th>Type</th>  <th>Description</th> </tr>
    <tr> <td><div class="param_name">classes</div></td> <td><div class="type">list</div></td> <td>List of all the response classes.</td> </tr>
    <tr> <td><div class="param_name">input_relation</div></td> <td><div class="type">str</div></td> <td>Training relation.</td> </tr>
    <tr> <td><div class="param_name">X</div></td> <td><div class="type">list</div></td> <td>List of the predictors.</td> </tr>
    <tr> <td><div class="param_name">y</div></td> <td><div class="type">str</div></td> <td>Response column.</td> </tr>
    <tr> <td><div class="param_name">test_relation</div></td> <td><div class="type">str</div></td> <td>Relation used to test the model. The model methods are abstractions that simplify evaluation, and many of them use the test relation to evaluate the model. If empty, the training relation is used for testing. You can change it at any time by setting the test_relation attribute of the object.</td> </tr>
</table>
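The fallback rule for `test_relation` can be written out as a one-line check. This is only a sketch of the documented behavior; the helper `effective_test_relation` and the relation names are hypothetical and do not come from the library.

```python
def effective_test_relation(input_relation, test_relation=""):
    """Return the relation that evaluation would use under the documented fallback."""
    # An empty test_relation means "evaluate on the training relation".
    return test_relation or input_relation


# Hypothetical relation names, for illustration only.
print(effective_test_relation("public.titanic"))
print(effective_test_relation("public.titanic", "public.titanic_test"))
```

Keep in mind that metrics computed on the training relation tend to be optimistic, so setting a separate test relation gives a more honest evaluation.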

### Methods

<table id="parameters">
    <tr> <th>Name</th> <th>Description</th> </tr>
    <tr> <td><a href="../Classification/classification_report">classification_report</a></td> <td>Computes a classification report using multiple metrics to evaluate the model (AUC, accuracy, PRC AUC, F1...). For multiclass classification, it treats each category in turn as the positive class and computes the metrics for it.</td> </tr>
    <tr> <td><a href="../Classification/confusion_matrix">confusion_matrix</a></td> <td>Computes the model confusion matrix.</td> </tr>
    <tr> <td><a href="../Classification/deploySQL">deploySQL</a></td> <td>Returns the SQL code needed to deploy the model.</td> </tr>
    <tr> <td><a href="../Classification/drop">drop</a></td> <td>Drops the model from the Vertica DB.</td> </tr>
    <tr> <td><a href="../Classification/export_graphviz">export_graphviz</a></td> <td>Converts the input tree to graphviz.</td> </tr>
    <tr> <td><a href="../Classification/features_importance">features_importance</a></td> <td>Computes the model's feature importance using the Gini index.</td> </tr>
    <tr> <td><a href="../Classification/fit">fit</a></td> <td>Trains the model.</td> </tr>
    <tr> <td><a href="../Classification/get_tree">get_tree</a></td> <td>Returns a tablesample with all the input tree information.</td> </tr>
    <tr> <td><a href="../Classification/lift_chart">lift_chart</a></td> <td>Draws the model Lift Chart.</td> </tr>
    <tr> <td><a href="../Classification/plot_tree">plot_tree</a></td> <td>Draws the input tree. The anytree module must be installed on the machine.</td> </tr>
    <tr> <td><a href="../Classification/prc_curve">prc_curve</a></td> <td>Draws the model PRC curve.</td> </tr>
    <tr> <td><a href="../Classification/predict">predict</a></td> <td>Predicts using the input relation.</td> </tr>
    <tr> <td><a href="../Classification/roc_curve">roc_curve</a></td> <td>Draws the model ROC curve.</td> </tr>
    <tr> <td><a href="../Classification/score">score</a></td> <td>Computes the model score.</td> </tr>
    
</table>
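The multiclass behavior noted for classification_report, where each category is treated as the positive class in turn, can be sketched on in-memory lists. This is an illustration of the one-vs-rest idea only; the function `one_vs_rest_report`, the chosen metrics, and the sample labels are assumptions and do not reproduce the library's actual output.

```python
def one_vs_rest_report(y_true, y_pred):
    """Compute precision, recall and F1 per class, treating each class as positive in turn."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for positive in sorted(set(y_true)):
        # Count outcomes with `positive` as the positive class and all others as negative.
        tp = sum(t == positive and p == positive for t, p in pairs)
        fp = sum(t != positive and p == positive for t, p in pairs)
        fn = sum(t == positive and p != positive for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[positive] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical true and predicted labels for four rows.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
for label, metrics in one_vs_rest_report(y_true, y_pred).items():
    print(label, metrics)
```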

### Example

In [5]:
from vertica_ml_python.learn.ensemble import RandomForestClassifier
model = RandomForestClassifier(name = "public.rf_titanic",
                               n_estimators = 20,
                               max_features = "auto",
                               max_leaf_nodes = 32, 
                               sample = 0.7,
                               max_depth = 3,
                               min_samples_leaf = 5,
                               min_info_gain = 0.0,
                               nbins = 32)
print(model)

<RandomForestClassifier>
