class Word2Vec extends Serializable with Logging
Word2Vec creates vector representation of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns vector representation of words in the vocabulary. The vector representation can be used as features in natural language processing and machine learning algorithms.
We used skip-gram model in our implementation and hierarchical softmax method to train the model. The variable names in the implementation matches the original C implementation.
For original C implementation, see https://code.google.com/p/word2vec/ For research papers, see Efficient Estimation of Word Representations in Vector Space and Distributed Representations of Words and Phrases and their Compositionality.
- Annotations
- @Since("1.1.0")
- Source
- Word2Vec.scala
- Alphabetic
- By Inheritance
- Word2Vec
- Logging
- Serializable
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Instance Constructors
-  new Word2Vec()
Type Members
-   implicit  class LogStringContext extends AnyRef- Definition Classes
- Logging
 
Value Members
-   final  def !=(arg0: Any): Boolean- Definition Classes
- AnyRef → Any
 
-   final  def ##: Int- Definition Classes
- AnyRef → Any
 
-   final  def ==(arg0: Any): Boolean- Definition Classes
- AnyRef → Any
 
-    def MDC(key: LogKey, value: Any): MDC- Attributes
- protected
- Definition Classes
- Logging
 
-   final  def asInstanceOf[T0]: T0- Definition Classes
- Any
 
-    def clone(): AnyRef- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
 
-   final  def eq(arg0: AnyRef): Boolean- Definition Classes
- AnyRef
 
-    def equals(arg0: AnyRef): Boolean- Definition Classes
- AnyRef → Any
 
-    def fit[S <: Iterable[String]](dataset: JavaRDD[S]): Word2VecModelComputes the vector representation of each word in vocabulary (Java version). Computes the vector representation of each word in vocabulary (Java version). - dataset
- a JavaRDD of words 
- returns
- a Word2VecModel 
 - Annotations
- @Since("1.1.0")
 
-    def fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModelComputes the vector representation of each word in vocabulary. Computes the vector representation of each word in vocabulary. - dataset
- an RDD of sentences, each sentence is expressed as an iterable collection of words 
- returns
- a Word2VecModel 
 - Annotations
- @Since("1.1.0")
 
-   final  def getClass(): Class[_ <: AnyRef]- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
-    def hashCode(): Int- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
-    def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean- Attributes
- protected
- Definition Classes
- Logging
 
-    def initializeLogIfNecessary(isInterpreter: Boolean): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-   final  def isInstanceOf[T0]: Boolean- Definition Classes
- Any
 
-    def isTraceEnabled(): Boolean- Attributes
- protected
- Definition Classes
- Logging
 
-    def log: Logger- Attributes
- protected
- Definition Classes
- Logging
 
-    def logBasedOnLevel(level: Level)(f: => MessageWithContext): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logDebug(msg: => String, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logDebug(entry: LogEntry, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logDebug(entry: LogEntry): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logDebug(msg: => String): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logError(msg: => String, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logError(entry: LogEntry, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logError(entry: LogEntry): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logError(msg: => String): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logInfo(msg: => String, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logInfo(entry: LogEntry, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logInfo(entry: LogEntry): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logInfo(msg: => String): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logName: String- Attributes
- protected
- Definition Classes
- Logging
 
-    def logTrace(msg: => String, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logTrace(entry: LogEntry, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logTrace(entry: LogEntry): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logTrace(msg: => String): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logWarning(msg: => String, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logWarning(entry: LogEntry, throwable: Throwable): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logWarning(entry: LogEntry): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-    def logWarning(msg: => String): Unit- Attributes
- protected
- Definition Classes
- Logging
 
-   final  def ne(arg0: AnyRef): Boolean- Definition Classes
- AnyRef
 
-   final  def notify(): Unit- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
-   final  def notifyAll(): Unit- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
-    def setLearningRate(learningRate: Double): Word2Vec.this.typeSets initial learning rate (default: 0.025). Sets initial learning rate (default: 0.025). - Annotations
- @Since("1.1.0")
 
-    def setMaxSentenceLength(maxSentenceLength: Int): Word2Vec.this.typeSets the maximum length (in words) of each sentence in the input data. Sets the maximum length (in words) of each sentence in the input data. Any sentence longer than this threshold will be divided into chunks of up to maxSentenceLengthsize (default: 1000)- Annotations
- @Since("2.0.0")
 
-    def setMinCount(minCount: Int): Word2Vec.this.typeSets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5). Sets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5). - Annotations
- @Since("1.3.0")
 
-    def setNumIterations(numIterations: Int): Word2Vec.this.typeSets number of iterations (default: 1), which should be smaller than or equal to number of partitions. Sets number of iterations (default: 1), which should be smaller than or equal to number of partitions. - Annotations
- @Since("1.1.0")
 
-    def setNumPartitions(numPartitions: Int): Word2Vec.this.typeSets number of partitions (default: 1). Sets number of partitions (default: 1). Use a small number for accuracy. - Annotations
- @Since("1.1.0")
 
-    def setSeed(seed: Long): Word2Vec.this.typeSets random seed (default: a random long integer). Sets random seed (default: a random long integer). - Annotations
- @Since("1.1.0")
 
-    def setVectorSize(vectorSize: Int): Word2Vec.this.typeSets vector size (default: 100). Sets vector size (default: 100). - Annotations
- @Since("1.1.0")
 
-    def setWindowSize(window: Int): Word2Vec.this.typeSets the window of words (default: 5) Sets the window of words (default: 5) - Annotations
- @Since("1.6.0")
 
-   final  def synchronized[T0](arg0: => T0): T0- Definition Classes
- AnyRef
 
-    def toString(): String- Definition Classes
- AnyRef → Any
 
-   final  def wait(arg0: Long, arg1: Int): Unit- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
 
-   final  def wait(arg0: Long): Unit- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
 
-   final  def wait(): Unit- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
 
-    def withLogContext(context: Map[String, String])(body: => Unit): Unit- Attributes
- protected
- Definition Classes
- Logging
 
Deprecated Value Members
-    def finalize(): Unit- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
- (Since version 9)