class Word2Vec extends Serializable with Logging
Word2Vec creates vector representations of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns a vector representation for each word in the vocabulary. These vectors can be used as features in natural language processing and machine learning algorithms.
The implementation uses the skip-gram model and trains it with hierarchical softmax. Variable names in the implementation match those in the original C implementation.
For the original C implementation, see https://code.google.com/p/word2vec/. For the research papers, see "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".
- Annotations
- @Since( "1.1.0" )
- Source
- Word2Vec.scala
- Linear Supertypes
- Logging, Serializable, Serializable, AnyRef, Any
Instance Constructors
- new Word2Vec()
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
fit[S <: Iterable[String]](dataset: JavaRDD[S]): Word2VecModel
Computes the vector representation of each word in the vocabulary (Java version).
- dataset
a JavaRDD of sentences, each expressed as an iterable collection of words
- returns
a Word2VecModel
- Annotations
- @Since( "1.1.0" )
-
def
fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModel
Computes the vector representation of each word in the vocabulary.
- dataset
an RDD of sentences, each expressed as an iterable collection of words
- returns
a Word2VecModel
- Annotations
- @Since( "1.1.0" )
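A minimal sketch of calling fit on an RDD of tokenized sentences, assuming Spark MLlib is on the classpath and a local master is acceptable. The object name, the two-sentence corpus, and the parameter values are illustrative only and are not recommended settings.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object Word2VecFitSketch {
  // Trains a small model; the values below are chosen only so that a tiny corpus works.
  def train(sentences: RDD[Seq[String]]): Word2VecModel =
    new Word2Vec()
      .setVectorSize(10) // dimension of each learned vector
      .setMinCount(1)    // keep every token, since the toy corpus is tiny
      .setSeed(42L)      // fixed seed for repeatable initialization
      .fit(sentences)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("Word2VecFitSketch")
    val sc = new SparkContext(conf)
    try {
      // Each sentence is an iterable collection of words, as fit expects.
      val corpus = Seq(
        "spark runs word2vec on a cluster",
        "word2vec learns a vector for each word"
      ).map(_.split(" ").toSeq)
      val model = train(sc.parallelize(corpus))
      // Print the dimension of the vector learned for one word.
      println(model.transform("word2vec").size)
    } finally sc.stop()
  }
}
```

The returned Word2VecModel holds the learned vectors locally, so it can still be queried after the SparkContext is stopped.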
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
setLearningRate(learningRate: Double): Word2Vec.this.type
Sets the initial learning rate (default: 0.025).
- Annotations
- @Since( "1.1.0" )
-
def
setMaxSentenceLength(maxSentenceLength: Int): Word2Vec.this.type
Sets the maximum length (in words) of each sentence in the input data. Any sentence longer than this threshold will be divided into chunks of up to maxSentenceLength words (default: 1000).
- Annotations
- @Since( "2.0.0" )
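The chunking behaviour described for setMaxSentenceLength can be pictured with plain Scala collections. This is an illustrative sketch, not the code Spark runs internally; the object and method names are hypothetical.

```scala
// Hypothetical helper object, not part of Spark.
object SentenceChunkingSketch {
  // Splits one sentence into consecutive chunks of at most maxSentenceLength words.
  def chunk(sentence: Seq[String], maxSentenceLength: Int): Seq[Seq[String]] = {
    require(maxSentenceLength > 0, "maxSentenceLength must be positive")
    sentence.grouped(maxSentenceLength).toList
  }
}
```

With a limit of 2, a five-word sentence yields two chunks of two words followed by one chunk holding the remaining word.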
-
def
setMinCount(minCount: Int): Word2Vec.this.type
Sets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5).
- Annotations
- @Since( "1.3.0" )
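The effect of minCount on the vocabulary can be sketched with plain Scala collections. The names below are hypothetical and the code only illustrates the counting rule; it is not Spark's internal vocabulary builder.

```scala
// Hypothetical helper object, not part of Spark.
object MinCountVocabSketch {
  // Counts tokens across all sentences and keeps those seen at least minCount times.
  def vocabulary(sentences: Seq[Seq[String]], minCount: Int): Map[String, Int] =
    sentences.flatten
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
      .filter { case (_, count) => count >= minCount }
}
```

Tokens that fall below the threshold are simply absent from the resulting map, which mirrors how rare words end up with no vector in the trained model.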
-
def
setNumIterations(numIterations: Int): Word2Vec.this.type
Sets the number of iterations (default: 1), which should be less than or equal to the number of partitions.
- Annotations
- @Since( "1.1.0" )
-
def
setNumPartitions(numPartitions: Int): Word2Vec.this.type
Sets the number of partitions (default: 1). Use a small number for accuracy: fewer partitions generally give more accurate vectors, at the cost of less parallelism.
- Annotations
- @Since( "1.1.0" )
-
def
setSeed(seed: Long): Word2Vec.this.type
Sets the random seed (default: a random long integer).
- Annotations
- @Since( "1.1.0" )
-
def
setVectorSize(vectorSize: Int): Word2Vec.this.type
Sets the vector size, that is, the dimension of each word vector (default: 100).
- Annotations
- @Since( "1.1.0" )
-
def
setWindowSize(window: Int): Word2Vec.this.type
Sets the size of the context window, in words (default: 5).
- Annotations
- @Since( "1.6.0" )
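To make the window parameter concrete, the sketch below lists the (center word, context word) pairs that a skip-gram model would consider for one sentence. It is a simplification that uses the full window at every position, and the object and method names are hypothetical; it is not taken from the Spark source.

```scala
// Hypothetical helper object, not part of Spark.
object ContextWindowSketch {
  // Pairs each word with its neighbours at most `window` positions away on either side.
  def pairs(sentence: IndexedSeq[String], window: Int): Seq[(String, String)] =
    for {
      i <- sentence.indices
      j <- math.max(0, i - window) to math.min(sentence.length - 1, i + window)
      if j != i
    } yield (sentence(i), sentence(j))
}
```

A larger window produces more pairs per word and so captures broader context, at the cost of more training work per sentence.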
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()