public interface CountVectorizerParams extends Params, HasInputCol, HasOutputCol
Params for CountVectorizer and CountVectorizerModel.

Modifier and Type | Method and Description |
---|---|
BooleanParam | binary(): Binary toggle to control the output vector values. |
boolean | getBinary() |
double | getMaxDF() |
double | getMinDF() |
double | getMinTF() |
int | getVocabSize() |
DoubleParam | maxDF(): Specifies the maximum number of different documents a term could appear in to be included in the vocabulary. |
DoubleParam | minDF(): Specifies the minimum number of different documents a term must appear in to be included in the vocabulary. |
DoubleParam | minTF(): Filter to ignore rare words in a document. |
StructType | validateAndTransformSchema(StructType schema): Validates and transforms the input schema. |
IntParam | vocabSize(): Maximum size of the vocabulary. |
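Methods inherited from interface HasInputCol: getInputCol, inputCol
Methods inherited from interface HasOutputCol: getOutputCol, outputCol
Methods inherited from interface Params: clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Identifiable: toString, uid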
BooleanParam binary()
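binary() is described here only as a toggle on the output vector values. Spark's documentation for this param says that when it is true, every nonzero count (after the minTF filter) is set to 1. The plain-Java sketch below mimics that effect on a dense count array. BinaryToggleSketch and binarize are hypothetical names, not Spark API, and the dense array is a simplification made only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of the binary toggle; not Spark code.
public class BinaryToggleSketch {

    // Returns a copy of a term-count vector in which every nonzero count
    // becomes 1.0 and every zero stays 0.0.
    static double[] binarize(double[] counts) {
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = counts[i] != 0.0 ? 1.0 : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] counts = {0.0, 3.0, 1.0, 0.0, 7.0};
        System.out.println(Arrays.toString(binarize(counts))); // [0.0, 1.0, 1.0, 0.0, 1.0]
    }
}
```

Binary output is mainly useful for models of presence or absence (for example, Bernoulli-style models) rather than raw counts.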
boolean getBinary()
double getMaxDF()
double getMinDF()
double getMinTF()
int getVocabSize()
DoubleParam maxDF()
Default: (2^63) - 1
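Spark documents maxDF as accepting either an absolute document count or a fraction: a value >= 1 is a maximum number of documents, and a value in [0, 1) is a maximum fraction of the corpus. Terms whose document frequency exceeds the resolved threshold are left out of the vocabulary. The sketch below assumes that rule and takes precomputed document frequencies as input. MaxDfSketch, resolveMaxDocs and applyMaxDF are hypothetical names, not Spark API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of maxDF filtering; not Spark's implementation.
public class MaxDfSketch {

    // Resolves maxDF to an absolute document count: values >= 1 are counts,
    // values in [0, 1) are fractions of the corpus size.
    static double resolveMaxDocs(double maxDF, long numDocs) {
        return maxDF >= 1.0 ? maxDF : maxDF * numDocs;
    }

    // Keeps only the terms whose document frequency does not exceed the threshold.
    static Map<String, Long> applyMaxDF(Map<String, Long> docFreq, double maxDF, long numDocs) {
        double maxDocs = resolveMaxDocs(maxDF, numDocs);
        Map<String, Long> kept = new TreeMap<>();
        for (Map.Entry<String, Long> e : docFreq.entrySet()) {
            if (e.getValue() <= maxDocs) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Precomputed document frequencies for a hypothetical 10-document corpus.
        Map<String, Long> df = Map.of("the", 10L, "spark", 4L, "vector", 2L);
        // 0.5 means "in at most half of the documents", i.e. at most 5 of 10.
        System.out.println(applyMaxDF(df, 0.5, 10)); // {spark=4, vector=2}
        // An absolute threshold of 4 documents keeps the same two terms.
        System.out.println(applyMaxDF(df, 4.0, 10)); // {spark=4, vector=2}
        // Long.MAX_VALUE stands in for the default and keeps every term.
        System.out.println(applyMaxDF(df, (double) Long.MAX_VALUE, 10)); // {spark=4, the=10, vector=2}
    }
}
```

The very large default effectively disables the upper bound, so maxDF only prunes terms when it is set explicitly.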
DoubleParam minDF()
Default: 1.0
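minDF is the lower bound on document frequency. Under the same count-or-fraction rule Spark documents for maxDF (>= 1 is a document count, [0, 1) is a fraction of the corpus), the default of 1.0 keeps any term that appears in at least one document. The sketch below computes document frequencies from tokenized documents, counting each term once per document, and then applies the threshold. MinDfSketch and its methods are hypothetical, not Spark API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of minDF filtering during vocabulary building;
// not Spark's implementation.
public class MinDfSketch {

    // Counts, for each term, how many distinct documents contain it.
    static Map<String, Long> documentFrequency(List<List<String>> docs) {
        Map<String, Long> df = new TreeMap<>();
        for (List<String> doc : docs) {
            // A term repeated inside one document still counts only once.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1L, Long::sum);
            }
        }
        return df;
    }

    // Keeps terms whose document frequency reaches the resolved threshold:
    // minDF >= 1 is a document count, minDF in [0, 1) a fraction of the corpus.
    static Set<String> applyMinDF(List<List<String>> docs, double minDF) {
        double minDocs = minDF >= 1.0 ? minDF : minDF * docs.size();
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Long> e : documentFrequency(docs).entrySet()) {
            if (e.getValue() >= minDocs) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("a", "b", "b"),
                List.of("a", "c"),
                List.of("a", "b"));
        System.out.println(documentFrequency(docs)); // {a=3, b=2, c=1}
        System.out.println(applyMinDF(docs, 1.0));   // [a, b, c]
        System.out.println(applyMinDF(docs, 2.0));   // [a, b]
        System.out.println(applyMinDF(docs, 0.9));   // [a]
    }
}
```

With 3 documents, minDF = 0.9 resolves to 2.7 documents, so only a term present in all three survives.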
DoubleParam minTF()
Note that this parameter is only used in the transform of CountVectorizerModel and does not affect fitting.
Default: 1.0
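minTF works per document rather than per corpus. Spark's documentation states that a value >= 1 is a minimum count within the document and a value in [0, 1) is a fraction of that document's token count. Terms below the threshold are dropped from that document's vector. The sketch below applies that rule to one tokenized document. MinTfSketch and filterByMinTF are hypothetical names, not Spark API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the per-document minTF filter applied at transform
// time; not Spark's implementation.
public class MinTfSketch {

    // Counts the terms of one document, then drops terms below the threshold.
    // minTF >= 1 is an absolute count; minTF in [0, 1) is a fraction of the
    // document's token count.
    static Map<String, Integer> filterByMinTF(List<String> doc, double minTF) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String term : doc) {
            counts.merge(term, 1, Integer::sum);
        }
        double threshold = minTF >= 1.0 ? minTF : minTF * doc.size();
        counts.values().removeIf(c -> c < threshold);
        return counts;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("spark", "ml", "spark", "spark", "vector");
        System.out.println(filterByMinTF(doc, 1.0)); // {ml=1, spark=3, vector=1}
        System.out.println(filterByMinTF(doc, 2.0)); // {spark=3}
        System.out.println(filterByMinTF(doc, 0.5)); // {spark=3}
    }
}
```

Because the filter runs only at transform time, changing minTF on a fitted model changes the output vectors without changing the learned vocabulary.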
StructType validateAndTransformSchema(StructType schema)
IntParam vocabSize()
Default: 2^18 (262,144)
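Spark's documentation describes vocabSize as keeping only the top vocabSize terms ordered by term frequency across the corpus. The sketch below counts every occurrence of each term over all documents and keeps the most frequent ones. It ignores minDF and maxDF, and it breaks ties alphabetically only so the output is deterministic; that tie-breaking is an assumption, not documented Spark behavior. VocabSizeSketch and buildVocabulary are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of vocabSize truncation; not Spark's implementation.
public class VocabSizeSketch {

    // Counts each term across the whole corpus and returns the vocabSize most
    // frequent terms, highest frequency first (ties broken alphabetically).
    static List<String> buildVocabulary(List<List<String>> docs, int vocabSize) {
        Map<String, Long> termFreq = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                termFreq.merge(term, 1L, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(termFreq.entrySet());
        entries.sort((x, y) -> {
            int byFreq = Long.compare(y.getValue(), x.getValue()); // descending frequency
            return byFreq != 0 ? byFreq : x.getKey().compareTo(y.getKey());
        });
        List<String> vocab = new ArrayList<>();
        for (int i = 0; i < Math.min(vocabSize, entries.size()); i++) {
            vocab.add(entries.get(i).getKey());
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("a", "b", "c", "a"),
                List.of("a", "b"),
                List.of("d"));
        System.out.println(buildVocabulary(docs, 2));  // [a, b]
        System.out.println(buildVocabulary(docs, 3));  // [a, b, c]
        System.out.println(buildVocabulary(docs, 10)); // [a, b, c, d]
    }
}
```

A vocabSize larger than the number of distinct terms simply keeps them all, as the last call shows.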