org.apache.spark.sql.KeyValueGroupedDataset<K,V>

All Implemented Interfaces: Serializable

public abstract class KeyValueGroupedDataset<K,V> extends Object implements Serializable

A Dataset that has been logically grouped by a user-specified grouping key. Users should not construct a KeyValueGroupedDataset directly, but should instead call groupByKey on an existing Dataset.

Since:

2.0.0

See Also:

Serialized Form

Constructor Summary

Constructors

Constructor

Description

KeyValueGroupedDataset()
Method Summary

Modifier and Type

Method

Description

<U1> Dataset<scala.Tuple2<K,U1>>

agg(TypedColumn<V,U1> col1)

Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.

<U1, U2> Dataset<scala.Tuple3<K,U1,U2>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3> Dataset<scala.Tuple4<K,U1,U2,U3>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5> Dataset<scala.Tuple6<K,U1,U2,U3,U4,U5>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5, U6> Dataset<scala.Tuple7<K,U1,U2,U3,U4,U5,U6>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5, U6, U7> Dataset<scala.Tuple8<K,U1,U2,U3,U4,U5,U6,U7>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5, U6, U7, U8> Dataset<scala.Tuple9<K,U1,U2,U3,U4,U5,U6,U7,U8>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7, TypedColumn<V,U8> col8)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U, R> Dataset<R>

cogroup(KeyValueGroupedDataset<K,U> other, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)

(Java-specific) Applies the given function to each cogrouped data.

<U, R> Dataset<R>

cogroup(KeyValueGroupedDataset<K,U> other, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$28)

(Scala-specific) Applies the given function to each cogrouped data.

<U, R> Dataset<R>

cogroupSorted(KeyValueGroupedDataset<K,U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)

(Java-specific) Applies the given function to each sorted cogrouped data.

abstract <U, R> Dataset<R>

cogroupSorted(KeyValueGroupedDataset<K,U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$29)

(Scala-specific) Applies the given function to each sorted cogrouped data.

Dataset<scala.Tuple2<K,Object>>

count()

Returns a Dataset that contains a tuple with each key and the number of items present for that key.

<U> Dataset<U>

flatMapGroups(FlatMapGroupsFunction<K,V,U> f, Encoder<U> encoder)

(Java-specific) Applies the given function to each group of data.

<U> Dataset<U>

flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3)

(Scala-specific) Applies the given function to each group of data.

<S, U> Dataset<U>

flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<S, U> Dataset<U>

flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

abstract <S, U> Dataset<U>

flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator<U>> func, Encoder<S> evidence$14, Encoder<U> evidence$15)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

abstract <S, U> Dataset<U>

flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<U> Dataset<U>

flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K,V,U> f, Encoder<U> encoder)

(Java-specific) Applies the given function to each group of data.

abstract <U> Dataset<U>

flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce<U>> f, Encoder<U> evidence$4)

(Scala-specific) Applies the given function to each group of data.

abstract <L> KeyValueGroupedDataset<L,V>

keyAs(Encoder<L> evidence$1)

Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type.

abstract Dataset<K>

keys()

Returns a Dataset that contains each unique key.

<U> Dataset<U>

mapGroups(MapGroupsFunction<K,V,U> f, Encoder<U> encoder)

(Java-specific) Applies the given function to each group of data.

<U> Dataset<U>

mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f, Encoder<U> evidence$5)

(Scala-specific) Applies the given function to each group of data.

<S, U> Dataset<U>

mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<S, U> Dataset<U>

mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<S, U> Dataset<U>

mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

abstract <S, U> Dataset<U>

mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$10, Encoder<U> evidence$11)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

abstract <S, U> Dataset<U>

mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$8, Encoder<U> evidence$9)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

abstract <S, U> Dataset<U>

mapGroupsWithState(scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$6, Encoder<U> evidence$7)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<W> KeyValueGroupedDataset<K,W>

mapValues(MapFunction<V,W> func, Encoder<W> encoder)

Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.

abstract <W> KeyValueGroupedDataset<K,W>

mapValues(scala.Function1<V,W> func, Encoder<W> evidence$2)

Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.

Dataset<scala.Tuple2<K,V>>

reduceGroups(ReduceFunction<V> f)

(Java-specific) Reduces the elements of each group of data using the specified binary function.

abstract Dataset<scala.Tuple2<K,V>>

reduceGroups(scala.Function2<V,V,V> f)

(Scala-specific) Reduces the elements of each group of data using the specified binary function.

abstract <U> Dataset<U>

transformWithState(StatefulProcessor<K,V,U> statefulProcessor, String eventTimeColumnName, OutputMode outputMode, Encoder<U> evidence$17)

(Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

<U> Dataset<U>

transformWithState(StatefulProcessor<K,V,U> statefulProcessor, String eventTimeColumnName, OutputMode outputMode, Encoder<U> outputEncoder, Encoder<U> evidence$19)

(Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

abstract <U> Dataset<U>

transformWithState(StatefulProcessor<K,V,U> statefulProcessor, TimeMode timeMode, OutputMode outputMode, Encoder<U> evidence$16)

(Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

<U> Dataset<U>

transformWithState(StatefulProcessor<K,V,U> statefulProcessor, TimeMode timeMode, OutputMode outputMode, Encoder<U> outputEncoder, Encoder<U> evidence$18)

(Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

abstract <U, S> Dataset<U>

transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, String eventTimeColumnName, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, Encoder<U> evidence$22, Encoder<S> evidence$23)

(Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

<U, S> Dataset<U>

transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, String eventTimeColumnName, Encoder<U> outputEncoder, Encoder<S> initialStateEncoder, Encoder<U> evidence$26, Encoder<S> evidence$27)

(Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

abstract <U, S> Dataset<U>

transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, TimeMode timeMode, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, Encoder<U> evidence$20, Encoder<S> evidence$21)

(Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

<U, S> Dataset<U>

transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, TimeMode timeMode, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, Encoder<U> outputEncoder, Encoder<S> initialStateEncoder, Encoder<U> evidence$24, Encoder<S> evidence$25)

(Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- KeyValueGroupedDataset
  
  public KeyValueGroupedDataset()
Method Details
- agg
 
 public <U1> Dataset<scala.Tuple2<K,U1>> agg(TypedColumn<V,U1> col1)
 
 Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- agg
 
 public <U1, U2> Dataset<scala.Tuple3<K,U1,U2>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2)
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- agg
 
 public <U1, U2, U3> Dataset<scala.Tuple4<K,U1,U2,U3>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3)
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- agg
 
 public <U1, U2, U3, U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4)
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- agg
 
 public <U1, U2, U3, U4, U5> Dataset<scala.Tuple6<K,U1,U2,U3,U4,U5>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5)
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.0.0
- agg
 
 public <U1, U2, U3, U4, U5, U6> Dataset<scala.Tuple7<K,U1,U2,U3,U4,U5,U6>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6)
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 col6 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.0.0
- agg
 
 public <U1, U2, U3, U4, U5, U6, U7> Dataset<scala.Tuple8<K,U1,U2,U3,U4,U5,U6,U7>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7)
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 col6 - (undocumented)
 
 col7 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.0.0
- agg
 
 public <U1, U2, U3, U4, U5, U6, U7, U8> Dataset<scala.Tuple9<K,U1,U2,U3,U4,U5,U6,U7,U8>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7, TypedColumn<V,U8> col8)
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 col6 - (undocumented)
 
 col7 - (undocumented)
 
 col8 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.0.0
- cogroup
 
 public <U, R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$28)
 
 (Scala-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 
 Parameters:
 
 other - (undocumented)
 
 f - (undocumented)
 
 evidence$28 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- cogroup
 
 public <U, R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)
 
 (Java-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 
 Parameters:
 
 other - (undocumented)
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
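
The calling convention described for cogroup can be modeled without Spark. The following is a minimal plain-Java sketch, not Spark code: CogroupSketch, CoGroupFn, and countsPerKey are hypothetical names, in-memory maps stand in for the two grouped Datasets, and the sketch assumes that a key present on only one side yields an empty iterator for the other side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class CogroupSketch {
    /** Hypothetical stand-in for CoGroupFunction<K,V,U,R>: the key plus one iterator per side. */
    interface CoGroupFn<K, V, U, R> {
        Iterator<R> call(K key, Iterator<V> left, Iterator<U> right);
    }

    /** Calls f once per key found on either side (assumption: a missing side is an empty iterator). */
    static <K extends Comparable<K>, V, U, R> List<R> cogroup(
            Map<K, List<V>> left, Map<K, List<U>> right, CoGroupFn<K, V, U, R> f) {
        TreeSet<K> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<R> out = new ArrayList<>();
        for (K key : keys) {
            Iterator<V> l = left.getOrDefault(key, Collections.emptyList()).iterator();
            Iterator<U> r = right.getOrDefault(key, Collections.emptyList()).iterator();
            // Everything the function returns for this key is appended to the result.
            f.call(key, l, r).forEachRemaining(out::add);
        }
        return out;
    }

    /** Example function: emits "key:leftCount/rightCount" for every key. */
    static List<String> countsPerKey(Map<String, List<Integer>> left, Map<String, List<String>> right) {
        return cogroup(left, right, (key, l, r) -> {
            int nl = 0;
            while (l.hasNext()) { l.next(); nl++; }
            int nr = 0;
            while (r.hasNext()) { r.next(); nr++; }
            return List.of(key + ":" + nl + "/" + nr).iterator();
        });
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> left = Map.of("a", List.of(1, 2), "b", List.of(3));
        Map<String, List<String>> right = Map.of("b", List.of("x"), "c", List.of("y", "z"));
        System.out.println(countsPerKey(left, right));
    }
}
```

The sketch visits keys in sorted order only to keep its output deterministic; no particular ordering of groups should be inferred for the real method.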
- cogroupSorted
 
 public abstract <U, R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K,U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$29)
 
 (Scala-specific) Applies the given function to each sorted cogrouped data. For each unique group, the function will be passed the grouping key and 2 sorted iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This is equivalent to cogroup(org.apache.spark.sql.KeyValueGroupedDataset<K, U>, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>>, org.apache.spark.sql.Encoder<R>), except that the iterators are sorted according to the given sort expressions. That sorting does not add computational complexity.
 Parameters:
 
 other - (undocumented)
 
 thisSortExprs - (undocumented)
 
 otherSortExprs - (undocumented)
 
 f - (undocumented)
 
 evidence$29 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.4.0
 
 See Also:
 
 org.apache.spark.sql.api.KeyValueGroupedDataset#cogroup
- cogroupSorted
 
 public <U, R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K,U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)
 
 (Java-specific) Applies the given function to each sorted cogrouped data. For each unique group, the function will be passed the grouping key and 2 sorted iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This is equivalent to cogroup(org.apache.spark.sql.KeyValueGroupedDataset<K, U>, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>>, org.apache.spark.sql.Encoder<R>), except that the iterators are sorted according to the given sort expressions. That sorting does not add computational complexity.
 Parameters:
 
 other - (undocumented)
 
 thisSortExprs - (undocumented)
 
 otherSortExprs - (undocumented)
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.4.0
 
 See Also:
 
 org.apache.spark.sql.api.KeyValueGroupedDataset#cogroup
- count
 
 public Dataset<scala.Tuple2<K,Object>> count()
 
 Returns a Dataset that contains a tuple with each key and the number of items present for that key.
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
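
As an illustration of the (key, count) result described above, here is a minimal plain-Java sketch, not Spark code. CountPerKeySketch and countPerKey are hypothetical names, and a list plus a key function stand in for the grouped Dataset. The Object in the documented return type presumably corresponds to a Scala Long after erasure; the sketch uses Long for the counts on that assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class CountPerKeySketch {
    /** Groups items by keyOf and returns key -> number of items with that key. */
    static <K extends Comparable<K>, V> Map<K, Long> countPerKey(List<V> items, Function<V, K> keyOf) {
        Map<K, Long> counts = new TreeMap<>();
        for (V item : items) {
            // Start at 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(keyOf.apply(item), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Key each word by its length.
        System.out.println(countPerKey(List.of("a", "bb", "cc", "ddd"), String::length));
    }
}
```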
- flatMapGroups
 
 public <U> Dataset<U> flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3)
 
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduceGroups function or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 
 f - (undocumented)
 
 evidence$3 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- flatMapGroups
 
 public <U> Dataset<U> flatMapGroups(FlatMapGroupsFunction<K,V,U> f, Encoder<U> encoder)
 
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduceGroups function or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
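
The advice to avoid materializing a group's iterator can be illustrated with a plain-Java sketch, not Spark code. FlatMapGroupsSketch, FlatMapGroupsFn, and runningTotals are hypothetical names, and the in-memory map is only a stand-in for the grouped Dataset. The part of interest is runningTotals, which wraps the input iterator and produces output lazily instead of copying the group into a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FlatMapGroupsSketch {
    /** Hypothetical stand-in for FlatMapGroupsFunction<K,V,U>. */
    interface FlatMapGroupsFn<K, V, U> {
        Iterator<U> call(K key, Iterator<V> values);
    }

    /** Calls f once per group and concatenates whatever each call returns. */
    static <K extends Comparable<K>, V, U> List<U> flatMapGroups(
            Map<K, List<V>> groups, FlatMapGroupsFn<K, V, U> f) {
        List<U> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> e : new TreeMap<>(groups).entrySet()) {
            f.call(e.getKey(), e.getValue().iterator()).forEachRemaining(out::add);
        }
        return out;
    }

    /** Emits a running total per element; only one long of state is kept per group. */
    static Iterator<String> runningTotals(String key, Iterator<Integer> values) {
        return new Iterator<String>() {
            private long total = 0;

            @Override
            public boolean hasNext() {
                return values.hasNext();
            }

            @Override
            public String next() {
                total += values.next();
                return key + "=" + total;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = Map.of("a", List.of(1, 2, 3), "b", List.of(10));
        List<String> out = flatMapGroups(groups, FlatMapGroupsSketch::runningTotals);
        System.out.println(out);
    }
}
```

A function written this way never holds more than the current element, so its memory use does not grow with the size of the group.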
- flatMapGroupsWithState
 
 public abstract <S, U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13)
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Parameters:
 
 func - Function to be called on every group.
 
 outputMode - The output mode of the function.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$12 - (undocumented)
 
 evidence$13 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.2.0
- flatMapGroupsWithState
 
 public abstract <S, U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator<U>> func, Encoder<S> evidence$14, Encoder<U> evidence$15)
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 Parameters:
 
 func - Function to be called on every group.
 
 outputMode - The output mode of the function.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 
 initialState - The user-provided state that will be initialized when the first batch of data is processed in the streaming query. The user-defined function will be called on the state data even if there are no other values in the group. To convert a Dataset ds of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S], use
 ds.groupByKey(x => x._1).mapValues(_._2)
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$14 - (undocumented)
 
 evidence$15 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.2.0
- flatMapGroupsWithState
 
 public <S, U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Parameters:
 
 func - Function to be called on every group.
 
 outputMode - The output mode of the function.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.2.0
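
The idea of per-group state that is saved across invocations can be modeled in plain Java. This is a deliberately simplified sketch, not Spark code: GroupStateSketch and processBatch are hypothetical names, a HashMap stands in for the per-key GroupState, each call to processBatch stands in for one trigger, and timeouts and output modes are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupStateSketch {
    /** Per-key running counts that survive between batches (stand-in for GroupState<S>). */
    private final Map<String, Long> state = new HashMap<>();

    /** Processes one batch: each group sees its new values together with its saved state. */
    List<String> processBatch(Map<String, List<String>> batch) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : new TreeMap<>(batch).entrySet()) {
            long updated = state.getOrDefault(e.getKey(), 0L) + e.getValue().size();
            state.put(e.getKey(), updated);       // analogous to updating the group's state
            out.add(e.getKey() + "=" + updated);  // the object returned for this group
        }
        return out;
    }

    public static void main(String[] args) {
        GroupStateSketch sketch = new GroupStateSketch();
        System.out.println(sketch.processBatch(Map.of("a", List.of("x", "y"), "b", List.of("z"))));
        // The count for "a" continues from the previous batch instead of restarting.
        System.out.println(sketch.processBatch(Map.of("a", List.of("w"))));
    }
}
```

For a static batch Dataset the equivalent would be a single processBatch call, which matches the statement above that the function is invoked once per group.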
- flatMapGroupsWithState
 
 public <S, U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 Parameters:
 
 func - Function to be called on every group.
 
 outputMode - The output mode of the function.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 
 initialState - The user-provided state that will be initialized when the first batch of data is processed in the streaming query. The user-defined function will be called on the state data even if there are no other values in the group. To convert a Dataset ds of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S], use
 ds.groupByKey(x => x._1).mapValues(_._2)
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.2.0
- flatMapSortedGroups
 
 public abstract <U> Dataset<U> flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce<U>> f, Encoder<U> evidence$4)
 
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduceGroups function or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 This is equivalent to flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
 Parameters:
 
 sortExprs - (undocumented)
 
 f - (undocumented)
 
 evidence$4 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.4.0
 
 See Also:
 
 org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
- flatMapSortedGroups
 
 public <U> Dataset<U> flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K,V,U> f, Encoder<U> encoder)
 
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduceGroups function or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 This is equivalent to flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
 Parameters:
 
 SortExprs - (undocumented)
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.4.0
 
 See Also:
 
 org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
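
What a sorted per-group iterator makes possible can be shown with a plain-Java sketch, not Spark code. SortedGroupsSketch and smallestPerKey are hypothetical names, and a Comparator stands in for the sort expressions. The sketch sorts each group explicitly, which is a modeling shortcut and says nothing about how the real method obtains the ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

final class SortedGroupsSketch {
    /** Like a per-group flat map, but each group's iterator is ordered by the given comparator. */
    static <K extends Comparable<K>, V, U> List<U> flatMapSortedGroups(
            Map<K, List<V>> groups, Comparator<V> order,
            BiFunction<K, Iterator<V>, Iterator<U>> f) {
        List<U> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> e : new TreeMap<>(groups).entrySet()) {
            List<V> sorted = new ArrayList<>(e.getValue());
            sorted.sort(order); // modeling shortcut: explicit in-memory sort
            f.apply(e.getKey(), sorted.iterator()).forEachRemaining(out::add);
        }
        return out;
    }

    /** Because the iterator is sorted, the smallest value is simply the first element. */
    static List<String> smallestPerKey(Map<String, List<Integer>> groups) {
        return flatMapSortedGroups(groups, Comparator.<Integer>naturalOrder(), (key, it) -> {
            List<String> result = new ArrayList<>();
            if (it.hasNext()) {
                result.add(key + ":" + it.next());
            }
            return result.iterator();
        });
    }

    public static void main(String[] args) {
        System.out.println(smallestPerKey(Map.of("a", List.of(3, 1, 2), "b", List.of(9, 7))));
    }
}
```

The user function reads only the first element of each group and ignores the rest, so it never needs to hold a whole group in memory.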
- keyAs
 
 public abstract <L> KeyValueGroupedDataset<L,V> keyAs(Encoder<L> evidence$1)
 
 Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as the as method on Dataset.
 
 Parameters:
 
 evidence$1 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- keys
 
 public abstract Dataset<K> keys()
 
 Returns a Dataset that contains each unique key. This is equivalent to mapping over the Dataset to extract the keys and then running a distinct operation on them.
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
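
The stated equivalence (map each element to its key, then take the distinct values) can be written out with plain Java streams. This is an illustrative sketch, not Spark code; KeysSketch and the key function are hypothetical.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class KeysSketch {
    /** Extracts the key of every item, then removes duplicates. */
    static <K, V> List<K> keys(List<V> items, Function<V, K> keyOf) {
        return items.stream()
                .map(keyOf)     // mapping over the data to extract the keys
                .distinct()     // keep each key once
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Key each word by its first letter.
        System.out.println(keys(List.of("apple", "avocado", "banana"), s -> s.substring(0, 1)));
    }
}
```

The stream version keeps keys in first-seen order; no such ordering should be assumed for the Dataset returned by keys().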
- mapGroups
 
 public <U> Dataset<U> mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f, Encoder<U> evidence$5)
 
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduceGroups function or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 
 f - (undocumented)
 
 evidence$5 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- mapGroups
 
 public <U> Dataset<U> mapGroups(MapGroupsFunction<K,V,U> f, Encoder<U> encoder)
 
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduceGroups function or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
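
 The advice above about not materializing a group's iterator can be made concrete with a small sketch. The code below is plain Java with no Spark dependency; it only mirrors the (key, iterator) shape of the function passed to mapGroups. The class GroupIteratorSketch and the method sumGroup are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-Java model (no Spark): a per-group function that consumes its
// iterator one element at a time instead of copying it into a list.
final class GroupIteratorSketch {
    // Streaming style: only two counters are kept, however large the group is.
    static String sumGroup(String key, Iterator<Integer> values) {
        long sum = 0;
        long count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        return key + ": sum=" + sum + ", count=" + count;
    }

    public static void main(String[] args) {
        Iterator<Integer> group = Arrays.asList(3, 5, 7).iterator();
        System.out.println(sumGroup("a", group)); // prints "a: sum=15, count=3"
    }
}
```

 A version that first collected the iterator into a list would compute the same result but would need memory proportional to the size of the group.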
- mapGroupsWithState
 
 public abstract <S, U> Dataset<U> mapGroupsWithState(scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$6, Encoder<U> evidence$7)
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Parameters:
 
 func - Function to be called on every group.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$6 - (undocumented)
 
 evidence$7 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.2.0
- mapGroupsWithState
 
 public abstract <S, U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$8, Encoder<U> evidence$9)
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Parameters:
 
 func - Function to be called on every group.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$8 - (undocumented)
 
 evidence$9 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.2.0
- mapGroupsWithState
 
 public abstract <S, U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$10, Encoder<U> evidence$11)
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 Parameters:
 
 func - Function to be called on every group.
 
 timeoutConf - Timeout Conf, see GroupStateTimeout for more details
 
 initialState - The user provided state that will be initialized when the first batch of data is processed in the streaming query. The user defined function will be called on the state data even if there are no other values in the group. To convert a Dataset ds of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S] do
 ds.groupByKey(x => x._1).mapValues(_._2)
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$10 - (undocumented)
 
 evidence$11 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.2.0
- mapGroupsWithState
 
 public <S, U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder)
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Parameters:
 
 func - Function to be called on every group.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.2.0
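
 The idea of per-group state that is saved across invocations can be modelled without Spark. The following is a minimal plain-Java sketch of a running count per key; a Map plays the role of the state store, and each call to processTrigger stands in for one trigger of a streaming query. The class StatefulGroupsSketch and its method are hypothetical names, and the sketch does not use GroupState or any other Spark type. In this model only groups that received rows in a trigger are processed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Spark): a Map holds the per-group state that
// survives from one trigger to the next.
final class StatefulGroupsSketch {
    private final Map<String, Long> stateByKey = new HashMap<>();

    // Processes one trigger: groups the new rows by key (the row itself is
    // the key here), runs the update once per group, and saves the new state.
    Map<String, Long> processTrigger(List<String> newRows) {
        Map<String, Long> countsInTrigger = new TreeMap<>();
        for (String row : newRows) {
            countsInTrigger.merge(row, 1L, Long::sum);
        }
        Map<String, Long> output = new TreeMap<>();
        for (Map.Entry<String, Long> group : countsInTrigger.entrySet()) {
            long previous = stateByKey.getOrDefault(group.getKey(), 0L);
            long updated = previous + group.getValue();
            stateByKey.put(group.getKey(), updated); // state update for this group
            output.put(group.getKey(), updated);     // one result per group
        }
        return output;
    }

    public static void main(String[] args) {
        StatefulGroupsSketch query = new StatefulGroupsSketch();
        System.out.println(query.processTrigger(Arrays.asList("a", "b", "a"))); // {a=2, b=1}
        System.out.println(query.processTrigger(Arrays.asList("a", "c")));      // {a=3, c=1}
    }
}
```

 Calling processTrigger exactly once corresponds to the static batch case, where the function is invoked once per group.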
- mapGroupsWithState
 
 public <S, U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Parameters:
 
 func - Function to be called on every group.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.2.0
- mapGroupsWithState
 
 public <S, U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Parameters:
 
 func - Function to be called on every group.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 
 initialState - The user provided state that will be initialized when the first batch of data is processed in the streaming query. The user defined function will be called on the state data even if there are no other values in the group.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Since:
 
 3.2.0
- mapValues
 
 public abstract <W> KeyValueGroupedDataset<K,W> mapValues(scala.Function1<V,W> func, Encoder<W> evidence$2)
 Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
 
 // Create values grouped by key from a Dataset[(K, V)]
 ds.groupByKey(_._1).mapValues(_._2) // Scala
 Parameters:
 
 func - (undocumented)
 
 evidence$2 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.1.0
- mapValues
 
 public <W> KeyValueGroupedDataset<K,W> mapValues(MapFunction<V,W> func, Encoder<W> encoder)
 Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
 
 // Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>>
 Dataset<Tuple2<String, Integer>> ds = ...;
 KeyValueGroupedDataset<String, Integer> grouped =
   ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT());
 Parameters:
 
 func - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 2.1.0
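
 The statement that the grouping key is unchanged can be illustrated with a plain-Java model that does not depend on Spark. In the sketch below the key is always computed from the original element, while only the stored value is transformed. The class MapValuesSketch and the method groupThenMapValues are hypothetical names, and java.util.Map.Entry stands in for Tuple2.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model (no Spark): groups elements by key and stores a
// transformed value for each element.
final class MapValuesSketch {
    static <T, K, W> Map<K, List<W>> groupThenMapValues(
            List<T> data, Function<T, K> keyOf, Function<T, W> valueOf) {
        Map<K, List<W>> grouped = new LinkedHashMap<>();
        for (T element : data) {
            K key = keyOf.apply(element);        // key comes from the original element
            grouped.computeIfAbsent(key, k -> new ArrayList<>())
                   .add(valueOf.apply(element)); // only the value is transformed
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new SimpleEntry<String, Integer>("a", 1),
                new SimpleEntry<String, Integer>("b", 2),
                new SimpleEntry<String, Integer>("a", 3));
        Map<String, List<Integer>> grouped =
                groupThenMapValues(pairs, e -> e.getKey(), e -> e.getValue());
        System.out.println(grouped); // {a=[1, 3], b=[2]}
    }
}
```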
- reduceGroups
 
 public abstract Dataset<scala.Tuple2<K,V>> reduceGroups(scala.Function2<V,V,V> f)
 
 (Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
 
 Parameters:
 
 f - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
- reduceGroups
 
 public Dataset<scala.Tuple2<K,V>> reduceGroups(ReduceFunction<V> f)
 
 (Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
 
 Parameters:
 
 f - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Since:
 
 1.6.0
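
 The requirement that the function be commutative and associative matters because the order in which elements of a group are combined is not fixed. The following is a minimal plain-Java sketch, with no Spark dependency, that reduces the same values in two different orders. The class ReduceOrderSketch and its reduce helper are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java model (no Spark): reduces a non-empty list from left to right.
final class ReduceOrderSketch {
    static int reduce(List<Integer> values, BinaryOperator<Integer> f) {
        int acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = f.apply(acc, values.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> oneOrder = Arrays.asList(1, 2, 3);
        List<Integer> otherOrder = Arrays.asList(3, 2, 1);

        // Addition is commutative and associative: both orders give 6.
        System.out.println(reduce(oneOrder, Integer::sum));
        System.out.println(reduce(otherOrder, Integer::sum));

        // Subtraction is neither: the orders give -4 and 0.
        System.out.println(reduce(oneOrder, (x, y) -> x - y));
        System.out.println(reduce(otherOrder, (x, y) -> x - y));
    }
}
```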
- transformWithState
 
 public abstract <U> Dataset<U> transformWithState(StatefulProcessor<K,V,U> statefulProcessor, TimeMode timeMode, OutputMode outputMode, Encoder<U> evidence$16)
 
 (Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. The user can act on the per-group set of input rows along with keyed state, and can choose to output zero or more rows. For a streaming DataFrame, the interface methods are invoked repeatedly for new rows in each trigger, and the user's state variables are stored persistently across invocations.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 timeMode - The time mode semantics of the stateful processor for timers and TTL.
 
 outputMode - The output mode of the stateful processor.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$16 - (undocumented)
 
 Returns:
 
 (undocumented)
- transformWithState
 
 public abstract <U> Dataset<U> transformWithState(StatefulProcessor<K,V,U> statefulProcessor, String eventTimeColumnName, OutputMode outputMode, Encoder<U> evidence$17)
 
 (Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. The user can act on the per-group set of input rows along with keyed state, and can choose to output zero or more rows. For a streaming DataFrame, the interface methods are invoked repeatedly for new rows in each trigger, and the user's state variables are stored persistently across invocations.
 Downstream operators use the specified eventTimeColumnName to calculate the watermark. Note that TimeMode is set to EventTime to ensure that the watermark flows correctly.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 eventTimeColumnName - eventTime column in the output dataset. Any operations after transformWithState will use the new eventTimeColumn. The user needs to ensure that the eventTime for emitted output adheres to the watermark boundary, otherwise the streaming query will fail.
 
 outputMode - The output mode of the stateful processor.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$17 - (undocumented)
 
 Returns:
 
 (undocumented)
- transformWithState
 
 public <U> Dataset<U> transformWithState(StatefulProcessor<K,V,U> statefulProcessor, TimeMode timeMode, OutputMode outputMode, Encoder<U> outputEncoder, Encoder<U> evidence$18)
 
 (Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. The user can act on the per-group set of input rows along with keyed state, and can choose to output zero or more rows. For a streaming DataFrame, the interface methods are invoked repeatedly for new rows in each trigger, and the user's state variables are stored persistently across invocations.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 timeMode - The time mode semantics of the stateful processor for timers and TTL.
 
 outputMode - The output mode of the stateful processor.
 
 outputEncoder - Encoder for the output type.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$18 - (undocumented)
 
 Returns:
 
 (undocumented)
- transformWithState
 
 public <U> Dataset<U> transformWithState(StatefulProcessor<K,V,U> statefulProcessor, String eventTimeColumnName, OutputMode outputMode, Encoder<U> outputEncoder, Encoder<U> evidence$19)
 
 (Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. The user can act on the per-group set of input rows along with keyed state, and can choose to output zero or more rows.
 For a streaming DataFrame, the interface methods are invoked repeatedly for new rows in each trigger, and the user's state variables are stored persistently across invocations.
 Downstream operators use the specified eventTimeColumnName to calculate the watermark. Note that TimeMode is set to EventTime to ensure that the watermark flows correctly.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 eventTimeColumnName - eventTime column in the output dataset. Any operations after transformWithState will use the new eventTimeColumn. The user needs to ensure that the eventTime for emitted output adheres to the watermark boundary, otherwise the streaming query will fail.
 
 outputMode - The output mode of the stateful processor.
 
 outputEncoder - Encoder for the output type.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$19 - (undocumented)
 
 Returns:
 
 (undocumented)
- transformWithState
 
 public abstract <U, S> Dataset<U> transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, TimeMode timeMode, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, Encoder<U> evidence$20, Encoder<S> evidence$21)
 
 (Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. Functions as the function above, but with additional initial state.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 timeMode - The time mode semantics of the stateful processor for timers and TTL.
 
 outputMode - The output mode of the stateful processor.
 
 initialState - User provided initial state that will be used to initiate state for the query in the first batch.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$20 - (undocumented)
 
 evidence$21 - (undocumented)
 
 Returns:
 
 (undocumented)
- transformWithState
 
 public abstract <U, S> Dataset<U> transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, String eventTimeColumnName, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, Encoder<U> evidence$22, Encoder<S> evidence$23)
 
 (Scala-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. Functions as the function above, but with additional eventTimeColumnName for output.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 eventTimeColumnName - eventTime column in the output dataset. Any operations after transformWithState will use the new eventTimeColumn. The user needs to ensure that the eventTime for emitted output adheres to the watermark boundary, otherwise the streaming query will fail.
 
 outputMode - The output mode of the stateful processor.
 
 initialState - User provided initial state that will be used to initiate state for the query in the first batch.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$22 - (undocumented)
 
 evidence$23 - (undocumented)
 
 Returns:
 
 (undocumented)
- transformWithState
 
 public <U, S> Dataset<U> transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, TimeMode timeMode, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, Encoder<U> outputEncoder, Encoder<S> initialStateEncoder, Encoder<U> evidence$24, Encoder<S> evidence$25)
 
 (Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. Functions as the function above, but with additional initialStateEncoder for state encoding.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 timeMode - The time mode semantics of the stateful processor for timers and TTL.
 
 outputMode - The output mode of the stateful processor.
 
 initialState - User provided initial state that will be used to initiate state for the query in the first batch.
 
 outputEncoder - Encoder for the output type.
 
 initialStateEncoder - Encoder for the initial state type.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$24 - (undocumented)
 
 evidence$25 - (undocumented)
 
 Returns:
 
 (undocumented)
- transformWithState
 
 public <U, S> Dataset<U> transformWithState(StatefulProcessorWithInitialState<K,V,U,S> statefulProcessor, OutputMode outputMode, KeyValueGroupedDataset<K,S> initialState, String eventTimeColumnName, Encoder<U> outputEncoder, Encoder<S> initialStateEncoder, Encoder<U> evidence$26, Encoder<S> evidence$27)
 
 (Java-specific) Invokes methods defined in the stateful processor used in arbitrary state API v2. Functions as the function above, but with additional eventTimeColumnName for output.
 Downstream operators use the specified eventTimeColumnName to calculate the watermark. Note that TimeMode is set to EventTime to ensure that the watermark flows correctly.
 
 Parameters:
 
 statefulProcessor - Instance of statefulProcessor whose functions will be invoked by the operator.
 
 outputMode - The output mode of the stateful processor.
 
 initialState - User provided initial state that will be used to initiate state for the query in the first batch.
 
 eventTimeColumnName - eventTime column in the output dataset. Any operations after transformWithState will use the new eventTimeColumn. The user needs to ensure that the eventTime for emitted output adheres to the watermark boundary, otherwise the streaming query will fail.
 
 outputEncoder - Encoder for the output type.
 
 initialStateEncoder - Encoder for the initial state type.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$26 - (undocumented)
 
 evidence$27 - (undocumented)
 
 Returns:
 
 (undocumented)
