class Word2Vec extends Serializable with Logging
Word2Vec creates vector representations of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns vector representations of the words in that vocabulary. These vector representations can be used as features in natural language processing and machine learning algorithms.
This implementation uses the skip-gram model and trains it with the hierarchical softmax method. The variable names in the implementation match those of the original C implementation.
For the original C implementation, see https://code.google.com/p/word2vec/ For the research papers, see Efficient Estimation of Word Representations in Vector Space and Distributed Representations of Words and Phrases and their Compositionality.
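A minimal usage sketch (assumes a live SparkContext `sc`; the corpus path is hypothetical and the corpus is whitespace-tokenized):

```scala
import org.apache.spark.mllib.feature.Word2Vec
import org.apache.spark.rdd.RDD

// Each sentence is an iterable collection of words.
val sentences: RDD[Seq[String]] =
  sc.textFile("data/corpus.txt") // hypothetical path
    .map(_.split(" ").toSeq)

val word2vec = new Word2Vec()
  .setVectorSize(100) // dimensionality of the learned vectors
  .setMinCount(5)     // drop tokens seen fewer than 5 times
  .setSeed(42L)       // fix the seed for reproducibility

val model = word2vec.fit(sentences)
```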
- Annotations
- @Since( "1.1.0" )
- Source
- Word2Vec.scala
- Linear Supertypes
- Word2Vec
- Logging
- Serializable
- Serializable
- AnyRef
- Any
Instance Constructors
- new Word2Vec()
Value Members
-
def
fit[S <: Iterable[String]](dataset: JavaRDD[S]): Word2VecModel
Computes the vector representation of each word in the vocabulary (Java version).
- dataset
a JavaRDD of words
- returns
a Word2VecModel
- Annotations
- @Since( "1.1.0" )
-
def
fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModel
Computes the vector representation of each word in the vocabulary.
- dataset
an RDD of sentences; each sentence is expressed as an iterable collection of words
- returns
a Word2VecModel
- Annotations
- @Since( "1.1.0" )
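The returned Word2VecModel can then be queried (a minimal sketch; assumes a live SparkContext `sc` and uses a toy corpus):

```scala
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}

val sentences = sc.parallelize(Seq(
  Seq("apache", "spark", "cluster"),
  Seq("spark", "word", "vectors")
))

val model: Word2VecModel =
  new Word2Vec().setMinCount(1).setVectorSize(10).fit(sentences)

// Vector for a single word (throws if the word is not in the vocabulary).
val v = model.transform("spark")

// The closest words to the query by cosine similarity.
val synonyms: Array[(String, Double)] = model.findSynonyms("spark", 2)
```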
-
def
setLearningRate(learningRate: Double): Word2Vec.this.type
Sets the initial learning rate (default: 0.025).
- Annotations
- @Since( "1.1.0" )
-
def
setMaxSentenceLength(maxSentenceLength: Int): Word2Vec.this.type
Sets the maximum length (in words) of each sentence in the input data. Any sentence longer than this threshold will be divided into chunks of up to maxSentenceLength words (default: 1000).
- Annotations
- @Since( "2.0.0" )
-
def
setMinCount(minCount: Int): Word2Vec.this.type
Sets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5).
- Annotations
- @Since( "1.3.0" )
-
def
setNumIterations(numIterations: Int): Word2Vec.this.type
Sets number of iterations (default: 1), which should be smaller than or equal to number of partitions.
- Annotations
- @Since( "1.1.0" )
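Because the iteration count should not exceed the partition count, the two settings are typically chosen together (a sketch):

```scala
val w2v = new Word2Vec()
  .setNumPartitions(4) // more partitions train faster, at some cost in accuracy
  .setNumIterations(4) // should be <= the number of partitions
```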
-
def
setNumPartitions(numPartitions: Int): Word2Vec.this.type
Sets number of partitions (default: 1). Use a small number for accuracy.
- Annotations
- @Since( "1.1.0" )
-
def
setSeed(seed: Long): Word2Vec.this.type
Sets random seed (default: a random long integer).
- Annotations
- @Since( "1.1.0" )
-
def
setVectorSize(vectorSize: Int): Word2Vec.this.type
Sets vector size (default: 100).
- Annotations
- @Since( "1.1.0" )
-
def
setWindowSize(window: Int): Word2Vec.this.type
Sets the size, in words, of the context window (default: 5).
- Annotations
- @Since( "1.6.0" )