Summarizer

object Summarizer extends Logging

Tools for vectorized statistics on MLlib Vectors.

The methods in this package provide various statistics for Vectors contained inside DataFrames.

This class lets users pick the statistics they would like to extract for a given column. Here is an example in Scala:

import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.Summarizer
import org.apache.spark.sql.Row
val dataframe = ... // Some dataframe containing a feature column and a weight column
val multiStatsDF = dataframe.select(
    Summarizer.metrics("min", "max", "count").summary($"features", $"weight"))
// summary(...) returns a single struct column, so the metrics are nested one level down
val Row(Row(minVec, maxVec, count)) = multiStatsDF.first()

If one wants to get a single metric, shortcuts are also available:

val meanDF = dataframe.select(Summarizer.mean($"features"))
val Row(meanVec) = meanDF.first()
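The two snippets above leave the DataFrame elided. As a minimal end-to-end sketch, the following builds a two-row DataFrame in memory and computes both a weighted mean through the builder and an unweighted mean through the shortcut. The object name `SummarizerExample`, the sample values, the column alias `summary`, and the local `SparkSession` settings are all illustrative assumptions, not part of the API.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.stat.Summarizer
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical wrapper object, used only for illustration.
object SummarizerExample {

  // Returns (weighted mean, unweighted mean) for a tiny in-memory dataset.
  def run(spark: SparkSession): (Vector, Vector) = {
    import spark.implicits._

    // Two feature vectors with weights 1.0 and 2.0 (made-up sample data).
    val df = Seq(
      (Vectors.dense(2.0, 3.0, 5.0), 1.0),
      (Vectors.dense(4.0, 6.0, 7.0), 2.0)
    ).toDF("features", "weight")

    // The builder produces one struct column; alias it, then select its fields.
    val Row(weightedMean: Vector, _: Vector) = df
      .select(Summarizer.metrics("mean", "variance")
        .summary($"features", $"weight").as("summary"))
      .select("summary.mean", "summary.variance")
      .first()

    // The single-metric shortcut returns the vector directly.
    val Row(plainMean: Vector) = df.select(Summarizer.mean($"features")).first()

    (weightedMean, plainMean)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SummarizerExample")
      .getOrCreate()
    try {
      val (weighted, plain) = run(spark)
      println(s"weighted mean = $weighted, unweighted mean = $plain")
    } finally {
      spark.stop()
    }
  }
}
```

Selecting the struct fields by name, rather than pattern matching on a nested Row, tends to be easier to read when several metrics are requested at once.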

Note: Currently, the performance of this interface is about 2x~3x slower than using the RDD interface.

Annotations: @Since( "2.3.0" )
Source: Summarizer.scala

Linear Supertypes

Logging, AnyRef, Any

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def count(col: Column): Column

Annotations
@Since( "2.3.0" )
def count(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def max(col: Column): Column

Annotations
@Since( "2.3.0" )
def max(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
def mean(col: Column): Column

Annotations
@Since( "2.3.0" )
def mean(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
def metrics(metrics: String*): SummaryBuilder
Given a list of metrics, provides a builder that in turn computes those metrics from a column.
See the documentation of Summarizer for an example.
The following metrics are accepted (case sensitive):
- mean: a vector that contains the coefficient-wise mean.
- sum: a vector that contains the coefficient-wise sum.
- variance: a vector that contains the coefficient-wise variance.
- std: a vector that contains the coefficient-wise standard deviation.
- count: the count of all vectors seen.
- numNonZeros: a vector with the number of non-zeros for each coefficient.
- max: the maximum for each coefficient.
- min: the minimum for each coefficient.
- normL2: the Euclidean norm for each coefficient.
- normL1: the L1 norm of each coefficient (sum of the absolute values).
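To make "coefficient-wise" concrete, here is a plain-Scala sketch that computes several of the listed metrics over a small collection of equal-length vectors, without weights and without Spark. It is only an illustration of the definitions above, not how Summarizer is implemented; the object name `ColumnwiseMetrics` and its method names are hypothetical.

```scala
// Hypothetical helper illustrating unweighted, coefficient-wise statistics.
// Each inner Seq is one vector; all vectors are assumed to have the same length.
object ColumnwiseMetrics {

  // Regroup the data so that each inner Seq holds one coefficient across all vectors.
  private def columns(rows: Seq[Seq[Double]]): Seq[Seq[Double]] = rows.transpose

  def sum(rows: Seq[Seq[Double]]): Seq[Double] =
    columns(rows).map(_.sum)

  def mean(rows: Seq[Seq[Double]]): Seq[Double] =
    columns(rows).map(c => c.sum / c.size)

  def max(rows: Seq[Seq[Double]]): Seq[Double] =
    columns(rows).map(_.max)

  def min(rows: Seq[Seq[Double]]): Seq[Double] =
    columns(rows).map(_.min)

  // L1 norm per coefficient: sum of absolute values.
  def normL1(rows: Seq[Seq[Double]]): Seq[Double] =
    columns(rows).map(_.map(math.abs).sum)

  // Euclidean (L2) norm per coefficient: square root of the sum of squares.
  def normL2(rows: Seq[Seq[Double]]): Seq[Double] =
    columns(rows).map(c => math.sqrt(c.map(x => x * x).sum))

  // Number of non-zero entries per coefficient.
  def numNonZeros(rows: Seq[Seq[Double]]): Seq[Int] =
    columns(rows).map(_.count(_ != 0.0))
}
```

For the two vectors `Seq(1.0, -2.0, 0.0)` and `Seq(3.0, 0.0, 4.0)`, this sketch gives a sum of `(4.0, -2.0, 4.0)`, a mean of `(2.0, -1.0, 2.0)`, an L1 norm of `(4.0, 2.0, 4.0)`, and non-zero counts of `(2, 1, 1)`. Every result has one entry per coefficient, which is why most metrics are returned as vectors.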
metrics
metrics that can be provided.
returns
a builder.

Annotations
@Since( "2.3.0" ) @varargs()
Exceptions thrown
IllegalArgumentException if one of the metric names is not understood.
Note: Currently, the performance of this interface is about 2x~3x slower than using the RDD interface.
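Because metric names are case sensitive and unknown names raise IllegalArgumentException, it can help to validate user-supplied names before building a query. The sketch below assumes the exception is raised when `metrics` is called, rather than later when the query runs; the object `MetricNameCheck` and its `rejection` method are hypothetical names introduced for illustration.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.ml.stat.Summarizer

// Hypothetical helper, not part of the Summarizer API.
object MetricNameCheck {

  // Returns the error message if `name` is rejected, or None if it is accepted.
  // Assumes Summarizer.metrics validates names eagerly.
  def rejection(name: String): Option[String] =
    Try(Summarizer.metrics(name)) match {
      case Failure(e: IllegalArgumentException) => Some(e.getMessage)
      case Failure(other)                       => throw other // unexpected failure: rethrow
      case Success(_)                           => None
    }
}
```

Under that assumption, `MetricNameCheck.rejection("mean")` would be empty, while a differently cased name such as `"Mean"` would yield an error message.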
def min(col: Column): Column

Annotations
@Since( "2.3.0" )
def min(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def normL1(col: Column): Column

Annotations
@Since( "2.3.0" )
def normL1(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
def normL2(col: Column): Column

Annotations
@Since( "2.3.0" )
def normL2(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def numNonZeros(col: Column): Column

Annotations
@Since( "2.3.0" )
def numNonZeros(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
def std(col: Column): Column

Annotations
@Since( "3.0.0" )
def std(col: Column, weightCol: Column): Column

Annotations
@Since( "3.0.0" )
def sum(col: Column): Column

Annotations
@Since( "3.0.0" )
def sum(col: Column, weightCol: Column): Column

Annotations
@Since( "3.0.0" )
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
def variance(col: Column): Column

Annotations
@Since( "2.3.0" )
def variance(col: Column, weightCol: Column): Column

Annotations
@Since( "2.3.0" )
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
