org.apache.spark.sql

DataFrameStatFunctions

final class DataFrameStatFunctions extends AnyRef

:: Experimental :: Statistical functions for DataFrames.

Annotations
@Experimental()
Since

1.4.0

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def corr(col1: String, col2: String): Double

    Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.

    col1

    the name of the column

    col2

    the name of the column to calculate the correlation against

    returns

    The Pearson Correlation Coefficient as a Double.

    Since

    1.4.0
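
    To make the computation concrete, here is a minimal plain-Scala sketch of the Pearson coefficient on two in-memory columns. It assumes both columns have already been collected as `Seq[Double]` values of equal length; it is not Spark's distributed implementation, and the object name `PearsonSketch` is hypothetical.

```scala
object PearsonSketch {
  /** Pearson correlation r = cov(x, y) / (stddev(x) * stddev(y)), computed from centered sums. */
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size >= 2, "columns must have equal length >= 2")
    val n = xs.size
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Sum of products of deviations; the 1 / (n - 1) factors cancel in the ratio.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val syy = ys.map(y => (y - meanY) * (y - meanY)).sum
    sxy / math.sqrt(sxx * syy)
  }
}
```

    For example, `PearsonSketch.pearson(Seq(1.0, 2.0, 3.0), Seq(2.0, 4.0, 6.0))` yields 1.0 for perfectly linearly related columns. On a DataFrame, the corresponding call is `df.stat.corr("x", "y")`, where `df`, `"x"`, and `"y"` are placeholders.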

  9. def corr(col1: String, col2: String, method: String): Double

    Calculates the correlation of two columns of a DataFrame. Currently only the Pearson Correlation Coefficient is supported. For Spearman correlation, consider using the RDD methods in MLlib's Statistics.

    col1

    the name of the column

    col2

    the name of the column to calculate the correlation against

    method

    the correlation method; currently only "pearson" is supported

    returns

    The Pearson Correlation Coefficient as a Double.

    Since

    1.4.0
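
    This method supports only the Pearson coefficient. As a rough illustration of what the suggested Spearman alternative measures, the sketch below computes Spearman's rank correlation as the Pearson coefficient of average ranks, where tied values share the mean of the positions they occupy. It is plain Scala on in-memory `Seq[Double]` columns, the object `SpearmanSketch` is hypothetical, and it is not the MLlib implementation.

```scala
object SpearmanSketch {
  /** 1-based ranks; tied values receive the average of the positions they occupy. */
  def ranks(xs: Seq[Double]): Vector[Double] = {
    val sorted = xs.zipWithIndex.sortBy(_._1).toVector // stable sort keeps original indices
    val out = Array.fill(xs.size)(0.0)
    var i = 0
    while (i < sorted.size) {
      var j = i
      while (j + 1 < sorted.size && sorted(j + 1)._1 == sorted(i)._1) j += 1
      val avgRank = (i + j) / 2.0 + 1.0 // positions i..j hold the same value
      for (k <- i to j) out(sorted(k)._2) = avgRank
      i = j + 1
    }
    out.toVector
  }

  /** Pearson correlation of two equally sized columns. */
  private def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    val n = xs.size
    val (mx, my) = (xs.sum / n, ys.sum / n)
    val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sxx = xs.map(x => (x - mx) * (x - mx)).sum
    val syy = ys.map(y => (y - my) * (y - my)).sum
    sxy / math.sqrt(sxx * syy)
  }

  /** Spearman's rho: the Pearson correlation applied to the ranks of each column. */
  def spearman(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size >= 2, "columns must have equal length >= 2")
    pearson(ranks(xs), ranks(ys))
  }
}
```

    Because ranks ignore the spacing between values, a monotonic but non-linear pair such as `Seq(1.0, 2.0, 3.0, 4.0)` and `Seq(1.0, 4.0, 9.0, 16.0)` has a Spearman value of 1.0, while Pearson on the raw values is below 1. In MLlib, the RDD-based counterpart is `Statistics.corr(x, y, "spearman")`.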

  10. def cov(col1: String, col2: String): Double

    Calculates the sample covariance of two numerical columns of a DataFrame.

    col1

    the name of the first column

    col2

    the name of the second column

    returns

    the sample covariance of the two columns.

    Since

    1.4.0
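
    "Sample" covariance means the sum of co-deviations is divided by n - 1 rather than n. A minimal plain-Scala sketch on in-memory columns follows; the object name `CovSketch` is hypothetical, and this is not Spark's distributed code.

```scala
object CovSketch {
  /** Sample covariance: sum((x - meanX) * (y - meanY)) / (n - 1). */
  def sampleCov(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size >= 2, "columns must have equal length >= 2")
    val n = xs.size
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum / (n - 1)
  }
}
```

    With `Seq(1.0, 2.0, 3.0)` and `Seq(2.0, 4.0, 6.0)`, the co-deviation sum is 4 and n - 1 = 2, so the result is 2.0; the population covariance would be 4/3 instead. The DataFrame equivalent is `df.stat.cov("x", "y")`, with placeholder names.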

  11. def crosstab(col1: String, col2: String): DataFrame

    Computes a pair-wise frequency table of the given columns, also known as a contingency table. The number of distinct values in each column should be less than 1e4, and at most 1e6 non-zero pair frequencies will be returned. The first column of each row holds a distinct value of col1, and the remaining column names are the distinct values of col2. The first column is named $col1_$col2, i.e. the two column names joined by an underscore. Counts are returned as Longs; pairs that have no occurrences have null as their counts.

    col1

    The name of the first column. Its distinct values form the first item of each row.

    col2

    The name of the second column. Its distinct values become the column names of the DataFrame.

    returns

    A DataFrame containing the contingency table.

    Since

    1.4.0
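
    The shape of the result can be sketched in plain Scala over in-memory (col1, col2) pairs. In this hypothetical `CrosstabSketch`, `None` stands in for the null count of a pair that never occurs, and distinct values are sorted only to make the output deterministic; the description above does not promise any row or column order.

```scala
object CrosstabSketch {
  type Row = (String, Vector[Option[Long]])

  /** Returns the header (first name "col1_col2", then the distinct col2 values) and one row per
    * distinct col1 value, holding that value followed by its pair counts. */
  def crosstab(pairs: Seq[(String, String)], col1: String, col2: String): (Vector[String], Vector[Row]) = {
    // Count every observed (col1 value, col2 value) pair.
    val counts: Map[(String, String), Long] =
      pairs.groupBy(identity).map { case (pair, hits) => pair -> hits.size.toLong }
    val rowKeys = pairs.map(_._1).distinct.sorted.toVector
    val colKeys = pairs.map(_._2).distinct.sorted.toVector
    val header = s"${col1}_$col2" +: colKeys
    // counts.get returns None for pairs that never occur, mirroring the null counts.
    val rows = rowKeys.map(a => a -> colKeys.map(b => counts.get((a, b))))
    (header, rows)
  }
}
```

    For the pairs a/x, a/x, a/y, b/x with column names `key` and `value`, the header is `key_value, x, y` and the rows are `a, 2, 1` and `b, 1, null`.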

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. def freqItems(cols: Seq[String]): DataFrame

    (Scala-specific) Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou. Uses a default support of 1%.

    This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.

    cols

    the names of the columns to search frequent items in.

    returns

    A local DataFrame with an array of frequent items for each column.

    Since

    1.4.0
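
    The frequent element count technique can be sketched on a single in-memory column. This hypothetical `FreqItemsSketch` keeps at most floor(1 / support) counters; when a new value arrives and every counter is in use, it decrements all counters and drops those that reach zero. Any value occurring in more than support * n of the n rows is guaranteed to survive, but rarer values can survive too, which is where the false positives come from. It is a sketch of the textbook technique under these assumptions, not Spark's implementation.

```scala
import scala.collection.mutable

object FreqItemsSketch {
  /** One-pass, counter-based frequent items over a single column.
    * Values occurring in more than support * n rows are always returned; others may be too. */
  def freqItems[T](values: Seq[T], support: Double = 0.01): Set[T] = {
    require(support > 0.0 && support <= 1.0, s"support ($support) must be in (0, 1]")
    val capacity = (1 / support).toInt // maximum number of live counters
    val counters = mutable.Map.empty[T, Long]
    for (v <- values) {
      if (counters.contains(v)) {
        counters(v) += 1
      } else if (counters.size < capacity) {
        counters(v) = 1L
      } else {
        // Table full: cancel one occurrence of every tracked value plus the incoming one.
        for (key <- counters.keys.toList) {
          val left = counters(key) - 1
          if (left == 0) counters.remove(key) else counters(key) = left
        }
      }
    }
    counters.keySet.toSet
  }
}
```

    Called without a support argument, the sketch uses 0.01, matching the 1% default. With support 0.4 on the column `a, a, a, b, c, a, d, a`, it returns `{a, d}`: `a` (5 of 8 rows) is truly frequent, while `d` (1 of 8) is a false positive. On a DataFrame, the corresponding call is `df.stat.freqItems(Seq("col"), 0.4)`, with placeholder names.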

  16. def freqItems(cols: Seq[String], support: Double): DataFrame

    (Scala-specific) Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou.

    This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.

    cols

    the names of the columns to search frequent items in.

    support

    The minimum frequency for an item to be considered frequent. Should be greater than 1e-4.

    returns

    A local DataFrame with an array of frequent items for each column.

    Since

    1.4.0

  17. def freqItems(cols: Array[String]): DataFrame

    Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou. Uses a default support of 1%.

    This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.

    cols

    the names of the columns to search frequent items in.

    returns

    A local DataFrame with an array of frequent items for each column.

    Since

    1.4.0

  18. def freqItems(cols: Array[String], support: Double): DataFrame

    Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou. The support should be greater than 1e-4.

    This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.

    cols

    the names of the columns to search frequent items in.

    support

    The minimum frequency for an item to be considered frequent. Should be greater than 1e-4.

    returns

    A local DataFrame with an array of frequent items for each column.

    Since

    1.4.0
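
    In a counter-based pass of the kind Karp, Schenker, and Papadimitriou describe, the number of counters kept per column grows as 1 / support, so the documented minimum of 1e-4 corresponds to roughly 10,000 counters. A hypothetical validation helper that follows the documented bound might look like this; the name `SupportCheck` and the upper limit of 1.0 are assumptions.

```scala
object SupportCheck {
  /** Validates support against the documented bound (greater than 1e-4) and returns the
    * number of counters a counter-based pass would keep per column: floor(1 / support). */
  def counterBudget(support: Double): Int = {
    require(support > 1e-4 && support <= 1.0, s"support ($support) should be in (1e-4, 1.0]")
    (1 / support).toInt
  }
}
```

    For instance, support 0.25 yields a budget of 4 counters, support 0.4 yields 2, and support 1e-5 is rejected.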

  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
