org.apache.spark.ml.feature.LSH<MinHashLSHModel>

org.apache.spark.ml.feature.MinHashLSH

All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging, org.apache.spark.ml.feature.LSHParams, Params, HasInputCol, HasOutputCol, HasSeed, DefaultParamsWritable, Identifiable, MLWritable

public class MinHashLSH extends org.apache.spark.ml.feature.LSH<MinHashLSHModel> implements HasSeed

LSH class for Jaccard distance.
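For reference, the Jaccard distance between two non-empty sets A and B is usually defined as below. This is the standard textbook definition, given here as background rather than quoted from this page; the symbol d_J is notation chosen for this note.

```latex
d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}
```

Identical sets have distance 0 and disjoint sets have distance 1.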

The input can be dense or sparse vectors, but sparse vectors are more efficient. For example, Vectors.sparse(10, Array((2, 1.0), (3, 1.0), (5, 1.0))) describes a space of 10 elements in which the set contains elements 2, 3, and 5. Every input vector must have at least one non-zero entry, and all non-zero values are treated as binary "1" values.
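To make the set interpretation concrete, here is a minimal plain-Java sketch that treats a binary sparse vector as the set of its non-zero indices and computes the Jaccard distance between two such sets. It does not use Spark; the class and method names (JaccardSketch, toSet, jaccardDistance) are hypothetical and exist only for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: models a binary sparse vector by the set of its non-zero indices.
// This is not Spark code; all names here are hypothetical.
final class JaccardSketch {

    // Collects the non-zero indices of a sparse vector into a set.
    static Set<Integer> toSet(int[] nonZeroIndices) {
        Set<Integer> s = new TreeSet<>();
        for (int i : nonZeroIndices) {
            s.add(i);
        }
        return s;
    }

    // Jaccard distance: 1 - (size of intersection) / (size of union).
    static double jaccardDistance(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            // Mirrors the requirement that every input has at least one non-zero entry.
            throw new IllegalArgumentException("each set needs at least one element");
        }
        Set<Integer> intersection = new TreeSet<>(a);
        intersection.retainAll(b);
        Set<Integer> union = new TreeSet<>(a);
        union.addAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // {2, 3, 5} corresponds to the sparse vector in the example above.
        Set<Integer> a = toSet(new int[] {2, 3, 5});
        Set<Integer> b = toSet(new int[] {3, 5, 7});
        // Intersection {3, 5} has 2 elements, union {2, 3, 5, 7} has 4.
        System.out.println(jaccardDistance(a, b)); // prints 0.5
    }
}
```

Because only the positions of non-zero values matter, the actual values stored in the vector (1.0 here) play no role in the distance.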

References: Wikipedia on MinHash
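As a rough illustration of the MinHash idea referenced above, the following plain-Java sketch builds one random hash function per hash table from a seed and keeps, for each function, the minimum hash value over a set's elements. It is a generic sketch under stated assumptions, not Spark's internal implementation: the hash family (a * (x + 1) + b) mod p, the prime 2^31 - 1, and the names MinHashSketch, signature, and estimatedSimilarity are all choices made for this example.

```java
import java.util.Random;

// Generic MinHash sketch; NOT Spark's implementation. All names are hypothetical.
final class MinHashSketch {
    // Assumed modulus for the hash family: the Mersenne prime 2^31 - 1.
    static final long PRIME = 2147483647L;

    private final long[] a; // multipliers, each in [1, PRIME - 1]
    private final long[] b; // offsets, each in [0, PRIME - 1]

    // One (a, b) pair per hash table; the seed makes the draw reproducible.
    MinHashSketch(int numHashTables, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashTables];
        b = new long[numHashTables];
        for (int i = 0; i < numHashTables; i++) {
            a[i] = 1 + rnd.nextInt((int) PRIME - 1);
            b[i] = rnd.nextInt((int) PRIME);
        }
    }

    // For each hash function, keep the minimum hash over the set's elements.
    long[] signature(int[] nonZeroIndices) {
        if (nonZeroIndices.length == 0) {
            throw new IllegalArgumentException("need at least one non-zero index");
        }
        long[] sig = new long[a.length];
        for (int t = 0; t < a.length; t++) {
            long min = Long.MAX_VALUE;
            for (int idx : nonZeroIndices) {
                long h = (a[t] * (idx + 1L) + b[t]) % PRIME;
                if (h < min) {
                    min = h;
                }
            }
            sig[t] = min;
        }
        return sig;
    }

    // Fraction of matching positions; a rough estimate of Jaccard similarity.
    static double estimatedSimilarity(long[] s1, long[] s2) {
        int matches = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                matches++;
            }
        }
        return (double) matches / s1.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(64, 12345L);
        long[] s1 = mh.signature(new int[] {2, 3, 5});
        long[] s2 = mh.signature(new int[] {3, 5, 7});
        System.out.println(estimatedSimilarity(s1, s2));
    }
}
```

In this sketch, two instances built with the same seed and number of hash tables produce identical signatures for the same set, which is the kind of reproducibility a seed parameter is meant to provide. More hash functions generally tighten the similarity estimate at the cost of more computation.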

See Also:

Serialized Form

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| MinHashLSH() | |
| MinHashLSH(String uid) | |
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| MinHashLSH | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params. |
| static MinHashLSH | load(String path) | |
| static MLReader<T> | read() | |
| final LongParam | seed() | Param for random seed. |
| MinHashLSH | setInputCol(String value) | |
| MinHashLSH | setNumHashTables(int value) | |
| MinHashLSH | setOutputCol(String value) | |
| MinHashLSH | setSeed(long value) | |
| StructType | transformSchema(StructType schema) | Check transform validity and derive the output schema from the input schema. |
| String | uid() | An immutable unique ID for the object and its derivatives. |

Methods inherited from class org.apache.spark.ml.feature.LSH
fit, getInputCol, getNumHashTables, getOutputCol, inputCol, numHashTables, org$apache$spark$ml$feature$LSHParams$_setter_$numHashTables_$eq, org$apache$spark$ml$param$shared$HasInputCol$_setter_$inputCol_$eq, org$apache$spark$ml$param$shared$HasOutputCol$_setter_$outputCol_$eq, outputCol, save, validateAndTransformSchema, write

Methods inherited from class org.apache.spark.ml.Estimator
fit, fit, fit, fit

Methods inherited from class org.apache.spark.ml.PipelineStage
params

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.param.shared.HasSeed
getSeed

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Constructor Details
- MinHashLSH
  
  public MinHashLSH(String uid)
- MinHashLSH
  
  public MinHashLSH()
Method Details
- load
  
  public static MinHashLSH load(String path)
- read
  
  public static MLReader<T> read()
- seed
  
  public final LongParam seed()
  
  Description copied from interface: HasSeed
  
  Param for random seed.
  
  Specified by:
  
  seed in interface HasSeed
  
  Returns:
  
  (undocumented)
- uid
  
  public String uid()
  
  Description copied from interface: Identifiable
  
  An immutable unique ID for the object and its derivatives.
  
  Specified by:
  
  uid in interface Identifiable
  
  Returns:
  
  (undocumented)
- setInputCol
  
  public MinHashLSH setInputCol(String value)
  
  Overrides:
  
  setInputCol in class org.apache.spark.ml.feature.LSH<MinHashLSHModel>
- setOutputCol
  
  public MinHashLSH setOutputCol(String value)
  
  Overrides:
  
  setOutputCol in class org.apache.spark.ml.feature.LSH<MinHashLSHModel>
- setNumHashTables
  
  public MinHashLSH setNumHashTables(int value)
  
  Overrides:
  
  setNumHashTables in class org.apache.spark.ml.feature.LSH<MinHashLSHModel>
- setSeed
  
  public MinHashLSH setSeed(long value)
- transformSchema
  
  public StructType transformSchema(StructType schema)
  
  Description copied from class: PipelineStage
  
  Check transform validity and derive the output schema from the input schema.
  We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks that do not depend on other parameters are handled by Param.validate().
  A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
  
  Specified by:
  
  transformSchema in class PipelineStage
  
  Parameters:
  
  schema - (undocumented)
  
  Returns:
  
  (undocumented)
- copy
  
  public MinHashLSH copy(ParamMap extra)
  
  Description copied from interface: Params
  
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
  
  Specified by:
  
  copy in interface Params
  
  Specified by:
  
  copy in class Estimator<MinHashLSHModel>
  
  Parameters:
  
  extra - (undocumented)
  
  Returns:
  
  (undocumented)
