public class RegexTokenizer extends UnaryTransformer<String,scala.collection.Seq<String>,RegexTokenizer> implements DefaultParamsWritable
A regex based tokenizer that extracts tokens either by using the provided regex pattern to split the text (default) or by repeatedly matching the regex (if gaps is false).
Optional parameters also allow filtering tokens using a minimal length.
It returns an array of strings that can be empty.

| Constructor and Description |
|---|
| RegexTokenizer() | 
| RegexTokenizer(String uid) | 
| Modifier and Type | Method and Description | 
|---|---|
| RegexTokenizer | copy(ParamMap extra)Creates a copy of this instance with the same UID and some extra params. | 
| BooleanParam | gaps()Indicates whether regex splits on gaps (true) or matches tokens (false). | 
| boolean | getGaps() | 
| int | getMinTokenLength() | 
| String | getPattern() | 
| boolean | getToLowercase() | 
| static RegexTokenizer | load(String path) | 
| IntParam | minTokenLength()Minimum token length, greater than or equal to 0. | 
| Param<String> | pattern()Regex pattern used to match delimiters if gaps is true or tokens if gaps is false. | 
| static MLReader<T> | read() | 
| RegexTokenizer | setGaps(boolean value) | 
| RegexTokenizer | setMinTokenLength(int value) | 
| RegexTokenizer | setPattern(String value) | 
| RegexTokenizer | setToLowercase(boolean value) | 
| BooleanParam | toLowercase()Indicates whether to convert all characters to lowercase before tokenizing. | 
| String | toString() | 
| String | uid()An immutable unique ID for the object and its derivatives. | 
Methods inherited from class org.apache.spark.ml.UnaryTransformer: inputCol, outputCol, setInputCol, setOutputCol, transform, transformSchema

Methods inherited from class org.apache.spark.ml.Transformer: transform, transform, transform

Methods inherited from class org.apache.spark.ml.PipelineStage: params

Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable: write

Methods inherited from interface org.apache.spark.ml.util.MLWritable: save

Methods inherited from interface org.apache.spark.ml.param.shared.HasInputCol: getInputCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasOutputCol: getOutputCol

Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.internal.Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

public RegexTokenizer(String uid)
public RegexTokenizer()
public static RegexTokenizer load(String path)
public static MLReader<T> read()
public String uid()
Specified by: uid in interface Identifiable

public IntParam minTokenLength()
public RegexTokenizer setMinTokenLength(int value)
public int getMinTokenLength()
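The effect of minTokenLength can be illustrated with a minimal plain-Java sketch (this is not Spark's implementation; the `tokenize` helper below is hypothetical): after splitting, tokens shorter than the threshold are simply dropped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MinTokenLengthDemo {
    // Hypothetical helper mimicking the documented minTokenLength filter:
    // split on whitespace, then drop tokens shorter than minTokenLength.
    static List<String> tokenize(String text, int minTokenLength) {
        return Arrays.stream(Pattern.compile("\\s+").split(text))
                .filter(t -> t.length() >= minTokenLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // With minTokenLength = 2, the single-character token "a" is dropped.
        System.out.println(tokenize("a spark ml demo", 2)); // [spark, ml, demo]
    }
}
```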
public BooleanParam gaps()
public RegexTokenizer setGaps(boolean value)
public boolean getGaps()
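The two modes controlled by gaps can be sketched in plain Java (a hypothetical illustration, not Spark's code): with gaps true the pattern is treated as a delimiter and the text is split on it; with gaps false the pattern is repeatedly matched and each match becomes a token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GapsDemo {
    // Hypothetical sketch of the documented gaps semantics.
    static List<String> tokenize(String text, String pattern, boolean gaps) {
        if (gaps) {
            // gaps = true: the regex marks the gaps between tokens.
            return Arrays.asList(Pattern.compile(pattern).split(text));
        }
        // gaps = false: the regex matches the tokens themselves.
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("one two three", "\\s+", true));   // [one, two, three]
        System.out.println(tokenize("one, two; three", "\\w+", false)); // [one, two, three]
    }
}
```

Note that both modes can produce the same tokens from different inputs: splitting on whitespace and matching word characters are complementary ways of describing the same tokenization.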
public Param<String> pattern()
Regex pattern used to match delimiters if gaps is true or tokens if gaps is false.
Default: "\\s+"

public RegexTokenizer setPattern(String value)
public String getPattern()
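To make the pattern parameter concrete, here is a small plain-Java sketch (the `split` helper is hypothetical, not part of Spark): the default "\\s+" splits on runs of whitespace, while a custom pattern such as ",\\s*" splits on commas with optional trailing spaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PatternDemo {
    // Hypothetical sketch of the pattern parameter in split (gaps = true) mode.
    static List<String> split(String text, String pattern) {
        return Arrays.asList(Pattern.compile(pattern).split(text));
    }

    public static void main(String[] args) {
        System.out.println(split("a  b\tc", "\\s+"));  // [a, b, c]
        System.out.println(split("a, b,c", ",\\s*"));  // [a, b, c]
    }
}
```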
public final BooleanParam toLowercase()
public RegexTokenizer setToLowercase(boolean value)
public boolean getToLowercase()
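The toLowercase step happens before tokenizing, as a plain-Java sketch shows (hypothetical helper, not Spark's implementation): lowercasing first means the pattern and any downstream vocabulary only need to handle lowercase text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LowercaseDemo {
    // Hypothetical sketch of the documented toLowercase behavior:
    // lowercase the input first, then split on whitespace.
    static List<String> tokenize(String text, boolean toLowercase) {
        String input = toLowercase ? text.toLowerCase(Locale.ROOT) : text;
        return Arrays.asList(input.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Spark ML RegexTokenizer", true));  // [spark, ml, regextokenizer]
        System.out.println(tokenize("Spark ML RegexTokenizer", false)); // [Spark, ML, RegexTokenizer]
    }
}
```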
public RegexTokenizer copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See Params.defaultCopy().
Specified by: copy in interface Params
Overrides: copy in class UnaryTransformer<String,scala.collection.Seq<String>,RegexTokenizer>
Parameters: extra - (undocumented)

public String toString()
Specified by: toString in interface Identifiable
Overrides: toString in class Object