Encode categorical features as a one-hot numeric array.
The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme. This creates a binary column for each category and returns a sparse matrix or dense array (depending on the sparse\_output parameter).
By default, the encoder derives the categories from the unique values in each feature. Alternatively, you can specify the categories manually.
This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels.
Note: a one-hot encoding of y labels should use a LabelBinarizer instead.
Read more in the User Guide. For a comparison of different encoders, refer to: Comparing Target Encoder with Other Encoders.
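To make the encoding scheme concrete, here is a minimal, dependency-free TypeScript sketch of what fitting and transforming amount to conceptually. It is not this library's implementation (the class delegates to Python), and the helper names `compareCategories`, `fitCategories` and `oneHot` are hypothetical.

```typescript
type Category = string | number;

// Order categories as a sorted unique list: numbers numerically,
// everything else by string comparison. (Illustrative ordering only.)
function compareCategories(a: Category, b: Category): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

// "Fit": collect the sorted unique values of each feature (column).
function fitCategories(X: Category[][]): Category[][] {
  const nFeatures = X.length > 0 ? X[0].length : 0;
  const categories: Category[][] = [];
  for (let j = 0; j < nFeatures; j++) {
    const seen: Category[] = [];
    for (const row of X) {
      if (seen.indexOf(row[j]) === -1) seen.push(row[j]);
    }
    categories.push(seen.sort(compareCategories));
  }
  return categories;
}

// "Transform": one binary column per category, concatenated across features.
function oneHot(X: Category[][], categories: Category[][]): number[][] {
  return X.map((row) => {
    const encodedRow: number[] = [];
    categories.forEach((cats, j) => {
      for (const c of cats) encodedRow.push(row[j] === c ? 1 : 0);
    });
    return encodedRow;
  });
}

const trainRows: Category[][] = [["Male", 1], ["Female", 3], ["Female", 2]];
const fittedCategories = fitCategories(trainRows);
// fittedCategories: [["Female", "Male"], [1, 2, 3]]
const encodedRows = oneHot([["Female", 1], ["Male", 3]], fittedCategories);
// encodedRows: [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
```

Each input feature contributes as many output columns as it has categories, which is why the two-feature input above yields five columns.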
new OneHotEncoder(opts?: object): OneHotEncoder;
Name | Type | Description |
---|---|---|
opts? | object | - |
opts.categories? | "auto" | Categories (unique values) per feature: Default Value 'auto' |
opts.drop? | any [] \| "first" \| "if_binary" | Specifies a methodology to use to drop one of the categories per feature. This is useful in situations where perfectly collinear features cause problems, such as when feeding the resulting data into an unregularized linear regression model. However, dropping one category breaks the symmetry of the original representation and can therefore induce a bias in downstream models, for instance for penalized linear classification or regression models. |
opts.dtype? | any | Desired dtype of output. |
opts.feature_name_combiner? | "concat" | Callable with signature def callable(input\_feature, category) that returns a string. This is used to create feature names to be returned by get\_feature\_names\_out. "concat" concatenates encoded feature name and category with feature + "\_" + str(category). E.g. feature X with values 1, 6, 7 creates feature names X\_1, X\_6, X\_7. Default Value 'concat' |
opts.handle_unknown? | "ignore" \| "error" \| "infrequent_if_exist" | Specifies the way unknown categories are handled during transform. Default Value 'error' |
opts.max_categories? | number | Specifies an upper limit to the number of output features for each input feature when considering infrequent categories. If there are infrequent categories, max\_categories includes the category representing the infrequent categories along with the frequent categories. If undefined, there is no limit to the number of output features. |
opts.min_frequency? | number | Specifies the minimum frequency below which a category will be considered infrequent. |
opts.sparse? | boolean | Will return a sparse matrix if set to true, otherwise an array. Default Value true |
opts.sparse_output? | boolean | Will return a sparse matrix if set to true, otherwise an array. Default Value true |
Defined in: generated/preprocessing/OneHotEncoder.ts:31
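The effect of the handle\_unknown and drop options on a single feature can be sketched in plain TypeScript. This is an illustration under assumptions, not the library's code: `encodeValue` is a hypothetical helper, only the "error" and "ignore" modes are modelled, and dropping is expressed as the index of the omitted category.

```typescript
type Cat = string | number;

// Encode one value of one feature against its known categories.
// handleUnknown mirrors the documented "error" and "ignore" options;
// dropIdx, if given, is the index of a category whose column is omitted.
function encodeValue(
  cats: Cat[],
  value: Cat,
  handleUnknown: "error" | "ignore" = "error",
  dropIdx?: number
): number[] {
  const idx = cats.indexOf(value);
  if (idx === -1 && handleUnknown === "error") {
    throw new Error("Found unknown category: " + String(value));
  }
  const out: number[] = [];
  cats.forEach((_, i) => {
    if (i !== dropIdx) out.push(i === idx ? 1 : 0);
  });
  return out;
}

const colours: Cat[] = ["blue", "green", "red"];
const knownRow = encodeValue(colours, "green");              // [0, 1, 0]
const unknownRow = encodeValue(colours, "pink", "ignore");   // [0, 0, 0]
const droppedRow = encodeValue(colours, "blue", "error", 0); // [0, 0]
```

Note that with a dropped category, the dropped value and an ignored unknown value both encode to all zeros, so the two cannot be told apart from the output alone.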
Disposes of the underlying Python resources.
Once dispose() is called, the instance is no longer usable.
dispose(): Promise<void>;
Promise<void>
Defined in: generated/preprocessing/OneHotEncoder.ts:162
Fit OneHotEncoder to X.
fit(opts: object): Promise<any>;
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike [] | The data to determine the categories of each feature. |
opts.y? | any | Ignored. This parameter exists only for compatibility with Pipeline. |
Promise<any>
Defined in: generated/preprocessing/OneHotEncoder.ts:179
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit\_params and returns a transformed version of X.
fit_transform(opts: object): Promise<any[]>;
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike [] | Input samples. |
opts.fit_params? | any | Additional fit parameters. |
opts.y? | ArrayLike | Target values (undefined for unsupervised transformations). |
Promise<any[]>
Defined in: generated/preprocessing/OneHotEncoder.ts:219
Get output feature names for transformation.
get_feature_names_out(opts: object): Promise<any>;
Name | Type | Description |
---|---|---|
opts | object | - |
opts.input_features? | any | Input features. |
Promise<any>
Defined in: generated/preprocessing/OneHotEncoder.ts:266
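The documented "concat" naming rule (feature + "\_" + str(category)) can be illustrated with a small standalone sketch. `concatCombiner` and `featureNamesOut` are hypothetical helpers written for this example; they are not part of the library.

```typescript
// The documented "concat" rule: feature + "_" + str(category).
function concatCombiner(feature: string, category: string | number): string {
  return feature + "_" + String(category);
}

// Build output names for every (feature, category) pair, in column order.
function featureNamesOut(
  inputFeatures: string[],
  categoriesPerFeature: (string | number)[][],
  combiner: (feature: string, category: string | number) => string = concatCombiner
): string[] {
  const result: string[] = [];
  inputFeatures.forEach((feature, j) => {
    for (const category of categoriesPerFeature[j]) {
      result.push(combiner(feature, category));
    }
  });
  return result;
}

const outNames = featureNamesOut(["X"], [[1, 6, 7]]);
// outNames: ["X_1", "X_6", "X_7"]
```

Passing a different combiner function changes only the naming, not the number or order of output columns.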
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
get_metadata_routing(opts: object): Promise<any>;
Name | Type | Description |
---|---|---|
opts | object | - |
opts.routing? | any | A MetadataRequest encapsulating routing information. |
Promise<any>
Defined in: generated/preprocessing/OneHotEncoder.ts:304
Initializes the underlying Python resources.
This instance is not usable until the Promise returned by init() resolves.
init(py: PythonBridge): Promise<void>;
Name | Type |
---|---|
py | PythonBridge |
Promise<void>
Defined in: generated/preprocessing/OneHotEncoder.ts:108
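Because the instance is unusable before init() resolves and after dispose() is called, it can help to wrap the lifecycle in one function. The sketch below assumes only the method shapes documented on this page: `EncoderLike` is a hypothetical structural stand-in rather than the library's class, and `fitAndTransform` is an illustrative helper. How a PythonBridge is obtained is outside the scope of this page.

```typescript
// Structural stand-in for the methods documented on this page; the real
// class and the PythonBridge type come from the library, not this sketch.
interface EncoderLike<Bridge> {
  init(py: Bridge): Promise<void>;
  fit(opts: { X?: unknown[][] }): Promise<unknown>;
  transform(opts: { X?: unknown[][] }): Promise<unknown>;
  dispose(): Promise<void>;
}

// init() must resolve before any other call; dispose() releases the
// Python-side resources even if fit or transform throws.
async function fitAndTransform<Bridge>(
  encoder: EncoderLike<Bridge>,
  py: Bridge,
  trainX: unknown[][],
  newX: unknown[][]
): Promise<unknown> {
  await encoder.init(py);
  try {
    await encoder.fit({ X: trainX });
    return await encoder.transform({ X: newX });
  } finally {
    await encoder.dispose();
  }
}
```

The try/finally keeps the dispose() call on every path, at the cost of making the encoder single-use; keep the instance alive instead if you need to call transform repeatedly.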
Convert the data back to the original representation.
When unknown categories are encountered (all zeros in the one-hot encoding), undefined is used to represent this category. If the feature with the unknown category has a dropped category, the dropped category will be its inverse.
For a given input feature, if there is an infrequent category, ‘infrequent_sklearn’ will be used to represent the infrequent category.
inverse_transform(opts: object): Promise<ArrayLike[]>;
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike | The transformed data. |
Promise<ArrayLike[]>
Defined in: generated/preprocessing/OneHotEncoder.ts:343
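The inversion rules described above (all zeros map to undefined, or to the dropped category when one exists) can be sketched for a single feature as follows. `invertBlock` is a hypothetical helper written for illustration; it is not the library's implementation.

```typescript
type Label = string | number;

// Invert one feature's one-hot block. `cats` lists all categories for the
// feature; `dropIdx` is the index of a dropped category, if any.
function invertBlock(block: number[], cats: Label[], dropIdx?: number): Label | undefined {
  // Categories that actually have a column, in column order.
  const kept = cats.filter((_, i) => i !== dropIdx);
  const hot = block.indexOf(1);
  if (hot !== -1) return kept[hot];
  // All zeros: the dropped category if there is one, otherwise unknown.
  return dropIdx === undefined ? undefined : cats[dropIdx];
}

const sizes: Label[] = ["large", "medium", "small"];
const invKnown = invertBlock([0, 1, 0], sizes);      // "medium"
const invUnknown = invertBlock([0, 0, 0], sizes);    // undefined
const invDropped = invertBlock([0, 0], sizes, 0);    // "large"
const invKept = invertBlock([0, 1], sizes, 0);       // "small"
```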
Set output container.
See Introducing the set_output API for an example on how to use the API.
set_output(opts: object): Promise<any>;
Name | Type | Description |
---|---|---|
opts | object | - |
opts.transform? | "default" \| "pandas" | Configure output of transform and fit\_transform. |
Promise<any>
Defined in: generated/preprocessing/OneHotEncoder.ts:380
Transform X using one-hot encoding.
If there are infrequent categories for a feature, the infrequent categories will be grouped into a single category.
transform(opts: object): Promise<ArrayLike>;
Name | Type | Description |
---|---|---|
opts | object | - |
opts.X? | ArrayLike [] | The data to encode. |
Promise<ArrayLike>
Defined in: generated/preprocessing/OneHotEncoder.ts:415
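The grouping of infrequent categories can be pictured with the following standalone sketch. It assumes min\_frequency is an absolute count and that the shared infrequent column comes last within the feature's block; `splitInfrequent` and `encodeGrouped` are hypothetical helpers, not library functions.

```typescript
// Split categories into frequent ones and those seen fewer than
// minFrequency times in the training column.
function splitInfrequent(
  column: string[],
  minFrequency: number
): { frequent: string[]; infrequent: string[] } {
  const counts: { [category: string]: number } = {};
  for (const v of column) counts[v] = (counts[v] || 0) + 1;
  const frequent: string[] = [];
  const infrequent: string[] = [];
  Object.keys(counts).sort().forEach((category) => {
    (counts[category] < minFrequency ? infrequent : frequent).push(category);
  });
  return { frequent, infrequent };
}

// Encode a value with one column per frequent category plus a final
// shared column for every infrequent category.
function encodeGrouped(value: string, frequent: string[], infrequent: string[]): number[] {
  const out: number[] = frequent.map((c) => (c === value ? 1 : 0));
  out.push(infrequent.indexOf(value) !== -1 ? 1 : 0);
  return out;
}

const pets = ["cat", "cat", "cat", "dog", "dog", "snake", "rabbit"];
const groups = splitInfrequent(pets, 2);
// groups.frequent: ["cat", "dog"]; groups.infrequent: ["rabbit", "snake"]
const snakeRow = encodeGrouped("snake", groups.frequent, groups.infrequent); // [0, 0, 1]
const catRow = encodeGrouped("cat", groups.frequent, groups.infrequent);     // [1, 0, 0]
```

Here "rabbit" and "snake" share one output column, so four distinct categories produce three columns.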
boolean = false
Defined in: generated/preprocessing/OneHotEncoder.ts:29
boolean = false
Defined in: generated/preprocessing/OneHotEncoder.ts:28
PythonBridge
Defined in: generated/preprocessing/OneHotEncoder.ts:27
string
Defined in: generated/preprocessing/OneHotEncoder.ts:24
any
Defined in: generated/preprocessing/OneHotEncoder.ts:25
The categories of each feature determined during fitting (in order of the features in X and corresponding with the output of transform). This includes the category specified in drop (if any).
categories_(): Promise<any>;
Promise<any>
Defined in: generated/preprocessing/OneHotEncoder.ts:448
drop\_idx\_\[i\] is the index in categories\_\[i\] of the category to be dropped for each feature.
drop_idx_(): Promise<any[]>;
Promise<any[]>
Defined in: generated/preprocessing/OneHotEncoder.ts:473
Callable with signature def callable(input\_feature, category) that returns a string. This is used to create feature names to be returned by get\_feature\_names\_out.
feature_name_combiner(): Promise<any>;
Promise<any>
Defined in: generated/preprocessing/OneHotEncoder.ts:548
Names of features seen during fit. Defined only when X has feature names that are all strings.
feature_names_in_(): Promise<ArrayLike>;
Promise<ArrayLike>
Defined in: generated/preprocessing/OneHotEncoder.ts:523
Number of features seen during fit.
n_features_in_(): Promise<number>;
Promise<number>
Defined in: generated/preprocessing/OneHotEncoder.ts:498
py(): PythonBridge;
PythonBridge
Defined in: generated/preprocessing/OneHotEncoder.ts:95
py(pythonBridge: PythonBridge): void;
Name | Type |
---|---|
pythonBridge | PythonBridge |
void
Defined in: generated/preprocessing/OneHotEncoder.ts:99