Transforms a text column into a bag of hashed n-gram counts.
Add-ProduceHashedWordBagsTransform [-OutputColumn] <String> [[-InputColumn] <String>] [-Bits <Int32>] [-NgramLength <Int32>] [-SkipLength <Int32>] [-Seed <UInt32>] [-MaxInverts <Int32>] [-DontUseAllLengths] [-DontUseOrderedHashing] [-AppendTo <EstimatorChain<ITransformer>>] [-AppendScope <TransformerScope>] [-Context <MLContext>] [<CommonParameters>]
Add-ProduceHashedWordBagsTransform [-OutputColumn] <String> -InputColumns <String[]> [-Bits <Int32>] [-NgramLength <Int32>] [-SkipLength <Int32>] [-Seed <UInt32>] [-MaxInverts <Int32>] [-DontUseAllLengths] [-DontUseOrderedHashing] [-AppendTo <EstimatorChain<ITransformer>>] [-AppendScope <TransformerScope>] [-Context <MLContext>] [<CommonParameters>]
-OutputColumn
Name of the column produced by transforming the input column(s). This column's data type will be a known-size vector of Single.
Type: System.String
Required: True
Position: 0
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
-InputColumn
Name of the column to take the data from. This estimator operates over a vector of text.
Type: System.String
Required: False
Position: 1
Default value: null
Accept pipeline input: False
Accept wildcard characters: False
-InputColumns
Names of the columns to take the data from. This estimator operates over a vector of text.
Type: System.String[]
Required: True
Position: named
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
-Bits
Number of bits to hash into. Must be between 1 and 30, inclusive.
Type: System.Int32
Required: False
Position: named
Default value: 16
Accept pipeline input: False
Accept wildcard characters: False
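
To make the effect of `-Bits` and `-Seed` concrete, here is a minimal Python sketch of hashing tokens into a fixed number of slots. It assumes the output vector has 2^Bits slots (65,536 at the default of 16), and it uses `zlib.crc32` purely as a stand-in hash; the function name `hashed_bag` is hypothetical and this is not the hash function the cmdlet uses internally.

```python
import zlib
from collections import Counter

def hashed_bag(tokens, bits=16, seed=314489979):
    """Count tokens per hash slot; illustrative stand-in, not the cmdlet's hash."""
    if not 1 <= bits <= 30:
        raise ValueError("bits must be between 1 and 30, inclusive")
    mask = (1 << bits) - 1          # keeps the low `bits` bits: 2**bits slots
    counts = Counter()
    for token in tokens:
        # crc32 accepts a starting value, used here to play the role of a seed
        slot = zlib.crc32(token.encode("utf-8"), seed) & mask
        counts[slot] += 1
    return counts

bag = hashed_bag("the cat saw the cat".split(), bits=4)
```

Fewer bits give a smaller vector but more collisions, because distinct tokens are more likely to land in the same slot.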
-NgramLength
N-gram length.
Type: System.Int32
Required: False
Position: named
Default value: 1
Accept pipeline input: False
Accept wildcard characters: False
-SkipLength
Maximum number of tokens to skip when constructing an n-gram.
Type: System.Int32
Required: False
Position: named
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
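
The interaction between `-NgramLength` and `-SkipLength` can be sketched in Python as follows. This is one plausible reading, assuming that the total number of tokens skipped inside a single n-gram may not exceed the skip length; the helper name `skip_ngrams` is hypothetical and the cmdlet's exact enumeration may differ.

```python
from itertools import combinations

def skip_ngrams(tokens, ngram_length=1, skip_length=0):
    """List n-grams that skip at most `skip_length` tokens in total (assumed semantics)."""
    grams = []
    for start in range(len(tokens)):
        # The last token of the n-gram may sit at most this far from `start`
        window_end = min(len(tokens), start + ngram_length + skip_length)
        candidates = range(start + 1, window_end)
        # Choose the remaining n-1 positions, in order, from the window
        for picks in combinations(candidates, ngram_length - 1):
            grams.append(tuple(tokens[i] for i in (start, *picks)))
    return grams

tokens = ["a", "b", "c"]
contiguous = skip_ngrams(tokens, ngram_length=2, skip_length=0)
with_skips = skip_ngrams(tokens, ngram_length=2, skip_length=1)
```

Under this reading, a skip length of 0 yields only contiguous n-grams, while a skip length of 1 additionally pairs each token with the one two positions ahead.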
-Seed
Hashing seed.
Type: System.UInt32
Required: False
Position: named
Default value: 314489979
Accept pipeline input: False
Accept wildcard characters: False
-MaxInverts
During hashing, mappings are constructed between the original values and the hash values they produce. Text representations of the original values are stored in the slot names of the annotations for the new column. Because hashing can map many input values to the same hash, MaxInverts sets an upper bound on the number of distinct input values retained per hash. 0 retains no input values. -1 retains all input values mapping to each hash.
Type: System.Int32
Required: False
Position: named
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
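
The bound that `-MaxInverts` places on retained values can be illustrated with a short Python sketch. It assumes that the first distinct values seen for a slot are the ones kept once the bound is reached, which the text above does not specify; `slot_names` is a hypothetical helper and `zlib.crc32` is only a stand-in hash.

```python
import zlib

def slot_names(values, bits=16, max_inverts=0):
    """Map each hash slot to the original values retained for it."""
    mask = (1 << bits) - 1
    kept = {}
    for value in values:
        slot = zlib.crc32(value.encode("utf-8")) & mask
        names = kept.setdefault(slot, [])
        if value in names:
            continue                      # only distinct values are recorded
        # -1 keeps everything; otherwise stop once the bound is reached
        if max_inverts == -1 or len(names) < max_inverts:
            names.append(value)
    # Drop slots for which nothing was retained
    return {slot: names for slot, names in kept.items() if names}

words = ["red", "green", "blue", "red", "teal"]
none_kept = slot_names(words, bits=1, max_inverts=0)
one_each = slot_names(words, bits=1, max_inverts=1)
all_kept = slot_names(words, bits=1, max_inverts=-1)
```

With only one bit there are two slots, so several words necessarily share a slot, and the bound determines how many of them remain recoverable from the slot names.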
-DontUseAllLengths
When set, only n-grams of exactly NgramLength are produced. By default, all n-gram lengths up to NgramLength are included.
Type: System.Management.Automation.SwitchParameter
Required: False
Position: named
Default value: False
Accept pipeline input: False
Accept wildcard characters: False
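
The difference between including all n-gram lengths and only the requested length can be shown with a small Python sketch. It covers contiguous n-grams only and ignores skipping; the function name `ngrams` and the `use_all_lengths` flag are illustrative, with `use_all_lengths=False` standing in for passing `-DontUseAllLengths`.

```python
def ngrams(tokens, ngram_length, use_all_lengths=True):
    """Return contiguous n-grams of every length up to `ngram_length`, or only that length."""
    lengths = range(1, ngram_length + 1) if use_all_lengths else [ngram_length]
    return [
        tuple(tokens[i:i + n])
        for n in lengths
        for i in range(len(tokens) - n + 1)
    ]

tokens = ["a", "b", "c"]
all_lengths = ngrams(tokens, 2)                          # unigrams and bigrams
exact_only = ngrams(tokens, 2, use_all_lengths=False)    # bigrams only
```

Including the shorter lengths keeps single-word counts available alongside the longer phrases, at the cost of more distinct items competing for the same hash slots.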
-DontUseOrderedHashing
When set, the position of each source column is not included in the hash. By default, the position is included, which matters when there are multiple source columns.
Type: System.Management.Automation.SwitchParameter
Required: False
Position: named
Default value: False
Accept pipeline input: False
Accept wildcard characters: False
-AppendTo
Append the created estimator to the end of this chain.
Type: Microsoft.ML.Data.EstimatorChain<Microsoft.ML.ITransformer>
Required: False
Position: named
Default value: null
Accept pipeline input: True (ByValue)
Accept wildcard characters: False
-AppendScope
The scope tags the estimators (and subsequently the transformers) in the chain so that they are used, for example, only for training, or for both training and evaluation.
Type: Microsoft.ML.Data.TransformerScope
Required: False
Position: named
Default value: Everything
Accept pipeline input: False
Accept wildcard characters: False
-Context
The context on which to perform the action. If omitted, the current (cached) context will be used.
Type: Microsoft.ML.MLContext
Required: False
Position: named
Default value: Current context
Accept pipeline input: False
Accept wildcard characters: False
This cmdlet supports the common parameters: Verbose, Debug, ErrorAction, ErrorVariable, WarningAction, WarningVariable, OutBuffer, PipelineVariable, and OutVariable. For more information, see about_CommonParameters.
Inputs

Type | Description |
---|---|
Microsoft.ML.Data.EstimatorChain<Microsoft.ML.ITransformer> | You can pipe the EstimatorChain to append to this cmdlet. |
Outputs

Type | Description |
---|---|
Microsoft.ML.Data.EstimatorChain<Microsoft.ML.ITransformer> | This cmdlet returns the appended EstimatorChain. |