Add-ProduceHashedWordBagsTransform

Transforms a text column into a bag of hashed n-gram counts.

Description

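Transforms a text column into a bag of hashed n-gram counts. N-grams extracted from the input text are hashed into a fixed number of slots, and the output column holds the count for each slot as a known-size vector of Single.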

Syntax

Add-ProduceHashedWordBagsTransform [-OutputColumn] <String> [[-InputColumn] <String>] [-Bits <Int32>] [-NgramLength <Int32>] [-SkipLength <Int32>] [-Seed <UInt32>] [-MaxInverts <Int32>] [-DontUseAllLengths] [-DontUseOrderedHashing] [-AppendTo <EstimatorChain<ITransformer>>] [-AppendScope <TransformerScope>] [-Context <MLContext>] [<CommonParameters>]
Add-ProduceHashedWordBagsTransform [-OutputColumn] <String> -InputColumns <String[]> [-Bits <Int32>] [-NgramLength <Int32>] [-SkipLength <Int32>] [-Seed <UInt32>] [-MaxInverts <Int32>] [-DontUseAllLengths] [-DontUseOrderedHashing] [-AppendTo <EstimatorChain<ITransformer>>] [-AppendScope <TransformerScope>] [-Context <MLContext>] [<CommonParameters>]

Parameters

-OutputColumn

Name of the column that results from transforming the input column or columns. Its data type is a known-size vector of Single.

Type: System.String
Required: True
Position: 0
Default value: None
Accept pipeline input: False
Accept wildcard characters: False

-InputColumn

Name of the column to take the data from. This estimator operates over a vector of text.

Type: System.String
Required: False
Position: 1
Default value: null
Accept pipeline input: False
Accept wildcard characters: False

-InputColumns

Names of the columns to take the data from. This estimator operates over a vector of text.

Type: System.String[]
Required: True
Position: named
Default value: None
Accept pipeline input: False
Accept wildcard characters: False

-Bits

Number of bits to hash into. Must be between 1 and 30, inclusive.

Type: System.Int32
Required: False
Position: named
Default value: 16
Accept pipeline input: False
Accept wildcard characters: False
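
As a rough illustration of what the bit count controls, the sketch below assumes that hashing into `Bits` bits yields 2^Bits possible slots and that a token's slot is the low `Bits` bits of a 32-bit hash. The function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function the transform actually uses.

```python
import zlib


def slot_count(bits: int) -> int:
    """Number of slots available when hashing into `bits` bits (2 ** bits)."""
    if not 1 <= bits <= 30:
        raise ValueError("bits must be between 1 and 30, inclusive")
    return 1 << bits


def slot_index(token: str, bits: int) -> int:
    """Slot for `token`: the low `bits` bits of a 32-bit hash.

    zlib.crc32 is an assumption made for illustration only.
    """
    return zlib.crc32(token.encode("utf-8")) & (slot_count(bits) - 1)


print(slot_count(16))  # 65536 slots at the default of 16 bits
print(0 <= slot_index("fox", 16) < slot_count(16))  # True
```

Fewer bits give a smaller output vector but more collisions between distinct n-grams; more bits do the opposite.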

-NgramLength

N-gram length.

Type: System.Int32
Required: False
Position: named
Default value: 1
Accept pipeline input: False
Accept wildcard characters: False

-SkipLength

Maximum number of tokens to skip when constructing an n-gram.

Type: System.Int32
Required: False
Position: named
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
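
To make the effect of skipping concrete, here is a minimal sketch of skip-gram generation. It assumes that `SkipLength` bounds the total number of tokens skipped inside a single n-gram; the transform itself may count skips differently. The function name `skip_grams` is hypothetical.

```python
from itertools import combinations


def skip_grams(tokens, n, skip_length=0):
    """Return n-grams of length `n`, skipping at most `skip_length` tokens in total."""
    grams = []
    for start in range(len(tokens)):
        # Positions that may follow `start` without exceeding the skip budget.
        window = range(start + 1, min(len(tokens), start + n + skip_length))
        for rest in combinations(window, n - 1):
            grams.append((tokens[start],) + tuple(tokens[i] for i in rest))
    return grams


words = ["a", "b", "c", "d"]
print(skip_grams(words, 2, 0))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(skip_grams(words, 2, 1))  # [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

With a skip length of 0, only contiguous n-grams are produced, which matches the default.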

-Seed

Hashing seed.

Type: System.UInt32
Required: False
Position: named
Default value: 314489979
Accept pipeline input: False
Accept wildcard characters: False
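
The role of a hashing seed can be sketched as follows: the same seed always reproduces the same hash for a token, while a different seed generally changes it. `seeded_hash` is a hypothetical helper, and CRC-32 with a starting value is an assumption used in place of the transform's real hash function.

```python
import zlib

DEFAULT_SEED = 314489979  # default value listed for -Seed


def seeded_hash(token: str, seed: int = DEFAULT_SEED) -> int:
    """32-bit hash of `token` that depends on `seed` (CRC-32 as a stand-in)."""
    return zlib.crc32(token.encode("utf-8"), seed)


# Same seed: reproducible. Different seeds: different hash values.
print(seeded_hash("fox") == seeded_hash("fox"))
print(seeded_hash("fox", seed=1) != seeded_hash("fox", seed=2))
```

Keeping the seed fixed between training and scoring is what makes the slot assignment stable across runs.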

-MaxInverts

During hashing, a mapping is built between the original values and the hash values they produce. Text representations of the original values are stored in the slot names of the annotations for the new column. Because hashing can map many input values to the same hash, MaxInverts sets an upper bound on the number of distinct input values per hash that are retained. 0 retains no input values. -1 retains all input values that map to each hash.

Type: System.Int32
Required: False
Position: named
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
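
The retention rule described above can be sketched like this. The helper `build_inverts` is hypothetical, CRC-32 stands in for the real hash, and keeping the first values seen when the bound is reached is an assumption; the transform may choose which values to keep differently.

```python
import zlib
from collections import defaultdict


def build_inverts(values, bits, max_inverts):
    """Return {slot: [original values]} with at most `max_inverts` values per slot.

    max_inverts == 0 keeps nothing; max_inverts == -1 keeps every distinct value.
    """
    if max_inverts == 0:
        return {}
    mask = (1 << bits) - 1
    inverts = defaultdict(list)
    for value in values:
        slot = zlib.crc32(value.encode("utf-8")) & mask
        kept = inverts[slot]
        if value in kept:
            continue  # only distinct values are recorded
        if max_inverts == -1 or len(kept) < max_inverts:
            kept.append(value)
    return dict(inverts)


words = ["a", "b", "c", "d", "e"]
print(build_inverts(words, bits=1, max_inverts=0))   # {}
print(build_inverts(words, bits=1, max_inverts=1))   # one word per slot
print(build_inverts(words, bits=1, max_inverts=-1))  # every word, grouped by slot
```

With only two slots and five words, at least one slot receives several words, so the bound visibly trims what is retained.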

-DontUseAllLengths

By default, n-grams of every length from 1 up to NgramLength are included. Specify this switch to include only n-grams of exactly NgramLength.

Type: System.Management.Automation.SwitchParameter
Required: False
Position: named
Default value: False
Accept pipeline input: False
Accept wildcard characters: False
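
The difference between the two modes can be illustrated with a small sketch. The `ngrams` function and its `use_all_lengths` flag are hypothetical names; the flag is assumed to be the inverse of this switch, and skipping is left out for brevity.

```python
def ngrams(tokens, ngram_length, use_all_lengths=True):
    """Contiguous n-grams of every length up to `ngram_length`, or of that length only."""
    lengths = range(1, ngram_length + 1) if use_all_lengths else [ngram_length]
    return [
        tuple(tokens[i:i + n])
        for n in lengths
        for i in range(len(tokens) - n + 1)
    ]


words = ["the", "quick", "fox"]
print(ngrams(words, 2))
# [('the',), ('quick',), ('fox',), ('the', 'quick'), ('quick', 'fox')]
print(ngrams(words, 2, use_all_lengths=False))
# [('the', 'quick'), ('quick', 'fox')]
```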

-DontUseOrderedHashing

By default, the position of each source column is included in the hash, which matters when there are multiple source columns. Specify this switch to exclude the position from the hash.

Type: System.Management.Automation.SwitchParameter
Required: False
Position: named
Default value: False
Accept pipeline input: False
Accept wildcard characters: False
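
A minimal sketch of what including the column position in the hash means: with ordered hashing, the same word coming from two different source columns hashes differently, so it is counted separately; without it, both occurrences share one slot. The helpers `hash_key` and `hashed_bag` are hypothetical, and prefixing the column index before hashing with CRC-32 is an assumption about how the position could be mixed in, not the transform's actual scheme.

```python
import zlib
from collections import Counter


def hash_key(token, column_index, ordered, seed=314489979):
    """32-bit hash of a token, optionally mixed with its source column position."""
    key = f"{column_index}:{token}" if ordered else token
    return zlib.crc32(key.encode("utf-8"), seed)


def hashed_bag(columns, ordered=True, bits=16):
    """Count tokens per slot across several source columns."""
    mask = (1 << bits) - 1
    bag = Counter()
    for column_index, tokens in enumerate(columns):
        for token in tokens:
            bag[hash_key(token, column_index, ordered) & mask] += 1
    return bag


columns = [["red", "fox"], ["red", "hen"]]
# Without the position, "red" hashes identically for both columns.
print(hash_key("red", 0, ordered=False) == hash_key("red", 1, ordered=False))
# With the position, the two occurrences hash differently.
print(hash_key("red", 0, ordered=True) != hash_key("red", 1, ordered=True))
print(sum(hashed_bag(columns).values()))  # 4 tokens counted in total
```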

-AppendTo

Append the created estimator to the end of this chain.

Type: Microsoft.ML.Data.EstimatorChain<Microsoft.ML.ITransformer>
Required: False
Position: named
Default value: null
Accept pipeline input: True (ByValue)
Accept wildcard characters: False

-AppendScope

The scope allows for 'tagging' the estimators (and subsequently transformers) in the chain to be used 'only for training', 'for training and evaluation' etc.

Type: Microsoft.ML.Data.TransformerScope
Required: False
Position: named
Default value: Everything
Accept pipeline input: False
Accept wildcard characters: False

-Context

The context on which to perform the action. If omitted, the current (cached) context will be used.

Type: Microsoft.ML.MLContext
Required: False
Position: named
Default value: Current context
Accept pipeline input: False
Accept wildcard characters: False

Common parameters

This cmdlet supports the common parameters: Verbose, Debug, ErrorAction, ErrorVariable, WarningAction, WarningVariable, OutBuffer, PipelineVariable, and OutVariable. For more information, see about_CommonParameters.

Inputs

Type: Microsoft.ML.Data.EstimatorChain<Microsoft.ML.ITransformer>
Description: You can pipe the EstimatorChain to append to this cmdlet.

Outputs

Type: Microsoft.ML.Data.EstimatorChain<Microsoft.ML.ITransformer>
Description: This cmdlet returns the appended EstimatorChain.