Packages

c

org.apache.spark.sql

KeyValueGroupedDataset

class KeyValueGroupedDataset[K, V] extends sql.api.KeyValueGroupedDataset[K, V, Dataset]

A Dataset has been logically grouped by a user specified grouping key. Users should not construct a KeyValueGroupedDataset directly, but should instead call groupByKey on an existing Dataset.

Source
KeyValueGroupedDataset.scala
Since

2.0.0

Linear Supertypes
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. KeyValueGroupedDataset
  2. KeyValueGroupedDataset
  3. Serializable
  4. AnyRef
  5. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Type Members

  1. type KVDS[KY, VL] = KeyValueGroupedDataset[KY, VL]

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def agg[U1, U2, U3, U4, U5, U6, U7, U8](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5], col6: TypedColumn[V, U6], col7: TypedColumn[V, U7], col8: TypedColumn[V, U8]): Dataset[(K, U1, U2, U3, U4, U5, U6, U7, U8)]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  5. def agg[U1, U2, U3, U4, U5, U6, U7](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5], col6: TypedColumn[V, U6], col7: TypedColumn[V, U7]): Dataset[(K, U1, U2, U3, U4, U5, U6, U7)]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  6. def agg[U1, U2, U3, U4, U5, U6](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5], col6: TypedColumn[V, U6]): Dataset[(K, U1, U2, U3, U4, U5, U6)]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  7. def agg[U1, U2, U3, U4, U5](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5]): Dataset[(K, U1, U2, U3, U4, U5)]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  8. def agg[U1, U2, U3, U4](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4]): Dataset[(K, U1, U2, U3, U4)]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  9. def agg[U1, U2, U3](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3]): Dataset[(K, U1, U2, U3)]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  10. def agg[U1, U2](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2]): Dataset[(K, U1, U2)]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  11. def agg[U1](col1: TypedColumn[V, U1]): Dataset[(K, U1)]

    Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.

    Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  12. def aggUntyped(columns: TypedColumn[_, _]*): Dataset[_]

    Internal helper function for building typed aggregations that return tuples.

    Internal helper function for building typed aggregations that return tuples. For simplicity and code reuse, we do this without the help of the type system and then use helper functions that cast appropriately for the user facing interface.

    Attributes
    protected
    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  13. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  14. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  15. def cogroup[U, R](other: KeyValueGroupedDataset[K, U], f: CoGroupFunction[K, V, U, R], encoder: Encoder[R]): Dataset[R]

    (Java-specific) Applies the given function to each cogrouped data.

    (Java-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  16. def cogroup[U, R](other: KeyValueGroupedDataset[K, U])(f: (K, Iterator[V], Iterator[U]) => IterableOnce[R])(implicit arg0: Encoder[R]): Dataset[R]

    (Scala-specific) Applies the given function to each cogrouped data.

    (Scala-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  17. def cogroupSorted[U, R](other: KeyValueGroupedDataset[K, U], thisSortExprs: Array[Column], otherSortExprs: Array[Column], f: CoGroupFunction[K, V, U, R], encoder: Encoder[R]): Dataset[R]

    (Java-specific) Applies the given function to each sorted cogrouped data.

    (Java-specific) Applies the given function to each sorted cogrouped data. For each unique group, the function will be passed the grouping key and 2 sorted iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    This is equivalent to KeyValueGroupedDataset#cogroup, except for the iterators to be sorted according to the given sort expressions. That sorting does not add computational complexity.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  18. def cogroupSorted[U, R](other: KeyValueGroupedDataset[K, U])(thisSortExprs: Column*)(otherSortExprs: Column*)(f: (K, Iterator[V], Iterator[U]) => IterableOnce[R])(implicit arg0: Encoder[R]): Dataset[R]

    (Scala-specific) Applies the given function to each sorted cogrouped data.

    (Scala-specific) Applies the given function to each sorted cogrouped data. For each unique group, the function will be passed the grouping key and 2 sorted iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    This is equivalent to KeyValueGroupedDataset#cogroup, except for the iterators to be sorted according to the given sort expressions. That sorting does not add computational complexity.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  19. def count(): Dataset[(K, Long)]

    Returns a Dataset that contains a tuple with each key and the number of items present for that key.

    Returns a Dataset that contains a tuple with each key and the number of items present for that key.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  20. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  22. def flatMapGroups[U](f: FlatMapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]

    (Java-specific) Applies the given function to each group of data.

    (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  23. def flatMapGroups[U](f: (K, Iterator[V]) => IterableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data.

    (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  24. def flatMapGroupsWithState[S, U](func: FlatMapGroupsWithStateFunction[K, V, S, U], outputMode: OutputMode, stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S]): Dataset[U]

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    func

    Function to be called on every group.

    outputMode

    The output mode of the function.

    stateEncoder

    Encoder for the state type.

    outputEncoder

    Encoder for the output type.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while.

    initialState

    The user provided state that will be initialized when the first batch of data is processed in the streaming query. The user defined function will be called on the state data even if there are no other values in the group. To covert a Dataset ds of type of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S], use

    ds.groupByKey(x => x._1).mapValues(_._2)

    See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  25. def flatMapGroupsWithState[S, U](func: FlatMapGroupsWithStateFunction[K, V, S, U], outputMode: OutputMode, stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout): Dataset[U]

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    func

    Function to be called on every group.

    outputMode

    The output mode of the function.

    stateEncoder

    Encoder for the state type.

    outputEncoder

    Encoder for the output type.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while. See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  26. def flatMapGroupsWithState[S, U](outputMode: OutputMode, timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S])(func: (K, Iterator[V], GroupState[S]) => Iterator[U])(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    outputMode

    The output mode of the function.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while.

    initialState

    The user provided state that will be initialized when the first batch of data is processed in the streaming query. The user defined function will be called on the state data even if there are no other values in the group. To covert a Dataset ds of type of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S], use

    ds.groupByKey(x => x._1).mapValues(_._2)

    See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    func

    Function to be called on every group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  27. def flatMapGroupsWithState[S, U](outputMode: OutputMode, timeoutConf: GroupStateTimeout)(func: (K, Iterator[V], GroupState[S]) => Iterator[U])(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    outputMode

    The output mode of the function.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while. See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    func

    Function to be called on every group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  28. def flatMapSortedGroups[U](SortExprs: Array[Column], f: FlatMapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]

    (Java-specific) Applies the given function to each group of data.

    (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    This is equivalent to KeyValueGroupedDataset#flatMapGroups, except for the iterator to be sorted according to the given sort expressions. That sorting does not add computational complexity.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  29. def flatMapSortedGroups[U](sortExprs: Column*)(f: (K, Iterator[V]) => IterableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data.

    (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    This is equivalent to KeyValueGroupedDataset#flatMapGroups, except for the iterator to be sorted according to the given sort expressions. That sorting does not add computational complexity.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  30. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  31. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  32. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  33. def keyAs[L](implicit arg0: Encoder[L]): KeyValueGroupedDataset[L, V]

    Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type.

    Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as as on Dataset.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  34. def keys: Dataset[K]

    Returns a Dataset that contains each unique key.

    Returns a Dataset that contains each unique key. This is equivalent to doing mapping over the Dataset to extract the keys and then running a distinct operation on those.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  35. def mapGroups[U](f: MapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]

    (Java-specific) Applies the given function to each group of data.

    (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  36. def mapGroups[U](f: (K, Iterator[V]) => U)(implicit arg0: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data.

    (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  37. def mapGroupsWithState[S, U](func: MapGroupsWithStateFunction[K, V, S, U], stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S]): Dataset[U]

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    func

    Function to be called on every group.

    stateEncoder

    Encoder for the state type.

    outputEncoder

    Encoder for the output type.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while.

    initialState

    The user provided state that will be initialized when the first batch of data is processed in the streaming query. The user defined function will be called on the state data even if there are no other values in the group. See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  38. def mapGroupsWithState[S, U](func: MapGroupsWithStateFunction[K, V, S, U], stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout): Dataset[U]

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    func

    Function to be called on every group.

    stateEncoder

    Encoder for the state type.

    outputEncoder

    Encoder for the output type.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while. See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  39. def mapGroupsWithState[S, U](func: MapGroupsWithStateFunction[K, V, S, U], stateEncoder: Encoder[S], outputEncoder: Encoder[U]): Dataset[U]

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    func

    Function to be called on every group.

    stateEncoder

    Encoder for the state type.

    outputEncoder

    Encoder for the output type. See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  40. def mapGroupsWithState[S, U](timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S])(func: (K, Iterator[V], GroupState[S]) => U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    timeoutConf

    Timeout Conf, see GroupStateTimeout for more details

    initialState

    The user provided state that will be initialized when the first batch of data is processed in the streaming query. The user defined function will be called on the state data even if there are no other values in the group. To convert a Dataset ds of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S] do

    ds.groupByKey(x => x._1).mapValues(_._2)

    See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    func

    Function to be called on every group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  41. def mapGroupsWithState[S, U](timeoutConf: GroupStateTimeout)(func: (K, Iterator[V], GroupState[S]) => U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while. See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    func

    Function to be called on every group.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  42. def mapGroupsWithState[S, U](func: (K, Iterator[V], GroupState[S]) => U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

    (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    func

    Function to be called on every group. See org.apache.spark.sql.Encoder for more details on what types are encodable to Spark SQL.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  43. def mapValues[W](func: MapFunction[V, W], encoder: Encoder[W]): KeyValueGroupedDataset[K, W]

    Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.

    Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.

    // Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>>
    Dataset<Tuple2<String, Integer>> ds = ...;
    KeyValueGroupedDataset<String, Integer> grouped =
      ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT());
    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  44. def mapValues[W](func: (V) => W)(implicit arg0: Encoder[W]): KeyValueGroupedDataset[K, W]

    Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.

    Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.

    // Create values grouped by key from a Dataset[(K, V)]
    ds.groupByKey(_._1).mapValues(_._2) // Scala
    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  45. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  46. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  47. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  48. val queryExecution: QueryExecution
  49. def reduceGroups(f: ReduceFunction[V]): Dataset[(K, V)]

    (Java-specific) Reduces the elements of each group of data using the specified binary function.

    (Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  50. def reduceGroups(f: (V, V) => V): Dataset[(K, V)]

    (Scala-specific) Reduces the elements of each group of data using the specified binary function.

    (Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.

    Definition Classes
    KeyValueGroupedDatasetKeyValueGroupedDataset
  51. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  52. def toString(): String
    Definition Classes
    KeyValueGroupedDataset → AnyRef → Any
  53. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  54. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  55. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

Inherited from api.KeyValueGroupedDataset[K, V, Dataset]

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped