org.apache.spark.ml.feature
Class Word2Vec

Object
  extended by org.apache.spark.ml.PipelineStage
      extended by org.apache.spark.ml.Estimator<Word2VecModel>
          extended by org.apache.spark.ml.feature.Word2Vec
All Implemented Interfaces:
java.io.Serializable, Logging, Params

public final class Word2Vec
extends Estimator<Word2VecModel>

:: Experimental :: Word2Vec trains a model of Map(String, Vector), i.e. it maps each word to a vector (a "code") that can be used in further natural language processing or machine learning.
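
As a rough illustration of the Map(String, Vector) idea, the plain-Java sketch below (no Spark dependency) models a trained result as a lookup table from a word to a fixed-length vector. The class name WordCodeSketch, the sample words, and the vector values are hypothetical and hand-written. A real model learns its vectors from the data passed to fit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of the Spark API.
final class WordCodeSketch {
    // Assumed dimension of each word's code; plays the role of vectorSize.
    static final int VECTOR_SIZE = 3;

    // Builds a hand-written word -> vector table standing in for a trained model.
    static Map<String, double[]> sampleCodes() {
        Map<String, double[]> codes = new HashMap<>();
        codes.put("spark", new double[] {0.12, -0.40, 0.88});
        codes.put("vector", new double[] {0.05, 0.31, -0.27});
        return codes;
    }

    // Returns the code for a word, or null if the word is not in the vocabulary.
    static double[] lookup(Map<String, double[]> codes, String word) {
        return codes.get(word);
    }

    public static void main(String[] args) {
        Map<String, double[]> codes = sampleCodes();
        double[] code = lookup(codes, "spark");
        System.out.println("dimension = " + code.length);
        System.out.println("known word? " + (lookup(codes, "hadoop") != null));
    }
}
```

Every vector in the table has the same length, which is what the vectorSize param controls in the estimator.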

See Also:
Serialized Form

Constructor Summary
Word2Vec()
           
Word2Vec(String uid)
           
 
Method Summary
 Word2Vec copy(ParamMap extra)
          Creates a copy of this instance with the same UID and some extra params.
 Word2VecModel fit(DataFrame dataset)
          Fits a model to the input data.
 int getMinCount()
           
 int getNumPartitions()
           
 int getVectorSize()
           
 IntParam minCount()
          The minimum number of times a token must appear to be included in the word2vec model's vocabulary.
 IntParam numPartitions()
          Number of partitions into which the input sentences (sequences of words) are split.
 Word2Vec setInputCol(String value)
           
 Word2Vec setMaxIter(int value)
           
 Word2Vec setMinCount(int value)
           
 Word2Vec setNumPartitions(int value)
           
 Word2Vec setOutputCol(String value)
           
 Word2Vec setSeed(long value)
           
 Word2Vec setStepSize(double value)
           
 Word2Vec setVectorSize(int value)
           
 StructType transformSchema(StructType schema)
          :: DeveloperApi :: Derives the output schema from the input schema.
 String uid()
           
 StructType validateAndTransformSchema(StructType schema)
          Validate and transform the input schema.
 IntParam vectorSize()
          The dimension of the vector (code) that each word is transformed into.
 
Methods inherited from class org.apache.spark.ml.Estimator
fit, fit, fit, fit
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

Word2Vec

public Word2Vec(String uid)

Word2Vec

public Word2Vec()

Method Detail

uid

public String uid()

setInputCol

public Word2Vec setInputCol(String value)

setOutputCol

public Word2Vec setOutputCol(String value)

setVectorSize

public Word2Vec setVectorSize(int value)

setStepSize

public Word2Vec setStepSize(double value)

setNumPartitions

public Word2Vec setNumPartitions(int value)

setMaxIter

public Word2Vec setMaxIter(int value)

setSeed

public Word2Vec setSeed(long value)

setMinCount

public Word2Vec setMinCount(int value)

fit

public Word2VecModel fit(DataFrame dataset)
Description copied from class: Estimator
Fits a model to the input data.

Specified by:
fit in class Estimator<Word2VecModel>
Parameters:
dataset - the input data to fit the model to
Returns:
the fitted Word2VecModel

transformSchema

public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi ::

Derives the output schema from the input schema.

Specified by:
transformSchema in class PipelineStage
Parameters:
schema - the input schema
Returns:
the output schema derived from the input schema

copy

public Word2Vec copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly.

Specified by:
copy in interface Params
Specified by:
copy in class Estimator<Word2VecModel>
Parameters:
extra - the extra params to apply to the copy
Returns:
a copy of this instance with the same UID
See Also:
defaultCopy()

vectorSize

public IntParam vectorSize()
The dimension of the vector (code) that each word is transformed into.

Returns:
(undocumented)

getVectorSize

public int getVectorSize()

numPartitions

public IntParam numPartitions()
Number of partitions into which the input sentences (sequences of words) are split.

Returns:
(undocumented)

getNumPartitions

public int getNumPartitions()

minCount

public IntParam minCount()
The minimum number of times a token must appear to be included in the word2vec model's vocabulary.

Returns:
(undocumented)
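
The minCount rule can be sketched in plain Java, without Spark: count every token across all sentences, then keep only the tokens that appear at least minCount times. The class MinCountSketch and the method vocabulary are hypothetical names used for illustration. This shows the filtering rule as described here, not Spark's internal implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: not part of the Spark API.
final class MinCountSketch {
    // Returns the tokens that occur at least minCount times across all sentences.
    static Set<String> vocabulary(List<List<String>> sentences, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum); // increment the token's count
            }
        }
        Set<String> vocab = new TreeSet<>(); // sorted, for stable output
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                vocab.add(entry.getKey());
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("spark", "ml", "spark"),
            Arrays.asList("spark", "word", "ml"));
        // "spark" occurs 3 times, "ml" twice, "word" once.
        System.out.println(vocabulary(sentences, 2)); // prints [ml, spark]
    }
}
```

With minCount set to 1, every token that occurs in the input would be kept.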

getMinCount

public int getMinCount()

validateAndTransformSchema

public StructType validateAndTransformSchema(StructType schema)
Validate and transform the input schema.

Parameters:
schema - the input schema to validate
Returns:
the transformed schema