Here are some explained details for the classifier parameters:

- **feature_columns**: An iterable containing all the feature columns used by the model. All items in the set should be instances of classes derived from `FeatureColumn`.
- **n_classes**: Defaults to 2. The number of classes in a classification problem.
- **model_dir**: Directory in which to save model parameters, the graph, etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model.
- **label_vocabulary**: A list of strings representing all possible label values. If provided, labels must be of string type and their values must be present in the label_vocabulary list. If label_vocabulary is omitted, it is assumed that the labels are already encoded as integer values within {0, 1} for n_classes=2, or encoded as integer values in {0, 1, ..., n_classes-1} for n_classes>2. If a vocabulary is not provided and the labels are strings, an error will be raised.
- **head**: A `head_lib._Head` instance; its loss is calculated for metrics purposes only and is not used for training. If not provided, one will be created automatically based on the other parameters.
- **n_trees**: The number of trees to create. Defaults to 100. There usually isn't any accuracy gain from using higher values (assuming deep enough trees are built).
- **max_nodes**: Defaults to 10,000. No tree is allowed to grow beyond max_nodes nodes, and training stops when all trees in the forest are this large.
- **num_splits_to_consider**: Defaults to `sqrt(num_features)`. In the extremely randomized tree training algorithm, only this many potential splits are evaluated for each tree node.
- **split_after_samples**: Defaults to 250. In our online version of extremely randomized tree training, we pick a split for a node after it has accumulated this many training samples.
- **config**: A `RunConfig` object to configure runtime settings.
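For illustration, constructing the proposed classifier with a few of these parameters might look like the sketch below; the `tf.estimator.TensorForestClassifier` path, the feature shape, and the model directory are assumptions for this example, not a released API:

```python
import tensorflow as tf

# Hypothetical usage sketch of the estimator proposed in this document;
# `TensorForestClassifier` is not part of a released TensorFlow API.
feature_columns = [tf.feature_column.numeric_column('features', shape=(4,))]

classifier = tf.estimator.TensorForestClassifier(
    feature_columns=feature_columns,
    n_classes=3,                     # three-way classification
    model_dir='/tmp/tensor_forest',  # checkpoints and graph are saved here
    n_trees=100,                     # forest size; higher rarely helps accuracy
    max_nodes=10000,                 # stop growing a tree at this many nodes
    split_after_samples=250)         # samples accumulated before picking a split
```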
Here are some explained details for the regressor parameters:
- **feature_columns**: An iterable containing all the feature columns used by the model. All items in the set should be instances of classes derived from `FeatureColumn`.
- **model_dir**: Directory in which to save model parameters, the graph, etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model.
- **label_dimension**: Defaults to 1. Number of regression targets per example.
- **head**: A `head_lib._Head` instance; its loss is calculated for metrics purposes only and is not used for training. If not provided, one will be created automatically based on the other parameters.
- **n_trees**: The number of trees to create. Defaults to 100. There usually isn't any accuracy gain from using higher values.
- **max_nodes**: Defaults to 10,000. No tree is allowed to grow beyond max_nodes nodes, and training stops when all trees in the forest are this large.
- **num_splits_to_consider**: Defaults to `sqrt(num_features)`. In the extremely randomized tree training algorithm, only this many potential splits are evaluated for each tree node.
- **split_after_samples**: Defaults to 250. In our online version of extremely randomized tree training, we pick a split for a node after it has accumulated this many training samples.
- **config**: A `RunConfig` object to configure runtime settings.
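A matching sketch for the regressor, under the same assumptions (`tf.estimator.TensorForestRegressor` is the interface proposed here, not a released API; the feature shape is illustrative):

```python
import tensorflow as tf

# Hypothetical usage sketch mirroring the classifier example above.
feature_columns = [tf.feature_column.numeric_column('features', shape=(10,))]

regressor = tf.estimator.TensorForestRegressor(
    feature_columns=feature_columns,
    label_dimension=1,         # one regression target per example
    n_trees=100,
    num_splits_to_consider=3,  # ~sqrt(10) candidate splits per node
    split_after_samples=250)
```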
### First version supported features
Since the trees are independent, the distributed version would distribute the trees to be trained evenly among all the available workers. Every tree would have two tf.resources available for training.
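As a rough illustration of that even split (the helper below is hypothetical, not part of the proposal), a round-robin assignment of tree ids to workers could look like:

```python
# Hypothetical helper: evenly assign tree ids to workers, round-robin.
def assign_trees_to_workers(n_trees, n_workers):
    """Return, for each worker, the list of tree ids it will train."""
    assignments = [[] for _ in range(n_workers)]
    for tree_id in range(n_trees):
        assignments[tree_id % n_workers].append(tree_id)
    return assignments

# 100 trees across 4 workers -> 25 trees per worker.
print([len(trees) for trees in assign_trees_to_workers(100, 4)])
```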
## Differences from the latest contrib version
- Simplified code with only a limited subset of features (obviously, excluding all the experimental ones)
- New estimator interface, support for new feature columns and losses
## Future Work
Add sample importance: right now we don't support sample importance, which is a widely used [feature](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html#sklearn.ensemble.ExtraTreesClassifier.fit).