Class KeyValueGroupedDataset<K,V>
- All Implemented Interfaces:
Serializable
A Dataset that has been logically grouped by a user-specified grouping key. Users should not construct a KeyValueGroupedDataset directly, but should instead call groupByKey on an existing Dataset.
- Since:
- 2.0.0
Method Summary
Modifier and Type, Method, Description
- agg(TypedColumn<V, U1> col1)
Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2)
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3)
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4)
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5)
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6)
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7)
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7, TypedColumn<V, U8> col8)
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
- <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30)
- <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
- <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21)
- count()
Returns a Dataset that contains a tuple with each key and the number of items present for that key.
- <U> Dataset<U> flatMapGroups(FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder)
(Java-specific) Applies the given function to each group of data.
- <U> Dataset<U> flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$22)
(Scala-specific) Applies the given function to each group of data.
- <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
- <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13)
- <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$10, Encoder<U> evidence$11)
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <U> Dataset<U> flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder)
(Java-specific) Applies the given function to each group of data.
- <U> Dataset<U> flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3)
(Scala-specific) Applies the given function to each group of data.
- <L> KeyValueGroupedDataset<L, V> keyAs
Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type.
- keys()
Returns a Dataset that contains each unique key.
- <U> Dataset<U> mapGroups(MapGroupsFunction<K, V, U> f, Encoder<U> encoder)
(Java-specific) Applies the given function to each group of data.
- <U> Dataset<U> mapGroups(scala.Function2<K, scala.collection.Iterator<V>, U> f, Encoder<U> evidence$23)
(Scala-specific) Applies the given function to each group of data.
- <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder)
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
- <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$8, Encoder<U> evidence$9)
- <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$6, Encoder<U> evidence$7)
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> mapGroupsWithState(scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$4, Encoder<U> evidence$5)
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <W> KeyValueGroupedDataset<K, W> mapValues(MapFunction<V, W> func, Encoder<W> encoder)
Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.
- <W> KeyValueGroupedDataset<K, W> mapValues
Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.
- org.apache.spark.sql.execution.QueryExecution queryExecution()
- reduceGroups
(Java-specific) Reduces the elements of each group of data using the specified binary function.
- reduceGroups(scala.Function2<V, V, V> f)
(Scala-specific) Reduces the elements of each group of data using the specified binary function.
- toString()
Methods inherited from class org.apache.spark.sql.api.KeyValueGroupedDataset
cogroup, cogroup, cogroupSorted, cogroupSorted, flatMapGroupsWithState, flatMapGroupsWithState, mapGroupsWithState, mapGroupsWithState
-
Method Details
-
agg
Description copied from class: KeyValueGroupedDataset
Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
- Returns:
- (undocumented)
-
agg
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
col2 - (undocumented)
- Returns:
- (undocumented)
-
agg
public <U1,U2,U3> Dataset<scala.Tuple4<K, U1, U2, U3>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
col2 - (undocumented)
col3 - (undocumented)
- Returns:
- (undocumented)
-
agg
public <U1,U2,U3,U4> Dataset<scala.Tuple5<K, U1, U2, U3, U4>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
col2 - (undocumented)
col3 - (undocumented)
col4 - (undocumented)
- Returns:
- (undocumented)
-
agg
public <U1,U2,U3,U4,U5> Dataset<scala.Tuple6<K, U1, U2, U3, U4, U5>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
col2 - (undocumented)
col3 - (undocumented)
col4 - (undocumented)
col5 - (undocumented)
- Returns:
- (undocumented)
-
agg
public <U1,U2,U3,U4,U5,U6> Dataset<scala.Tuple7<K, U1, U2, U3, U4, U5, U6>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
col2 - (undocumented)
col3 - (undocumented)
col4 - (undocumented)
col5 - (undocumented)
col6 - (undocumented)
- Returns:
- (undocumented)
-
agg
public <U1,U2,U3,U4,U5,U6,U7> Dataset<scala.Tuple8<K, U1, U2, U3, U4, U5, U6, U7>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
col2 - (undocumented)
col3 - (undocumented)
col4 - (undocumented)
col5 - (undocumented)
col6 - (undocumented)
col7 - (undocumented)
- Returns:
- (undocumented)
-
agg
public <U1,U2,U3,U4,U5,U6,U7,U8> Dataset<scala.Tuple9<K, U1, U2, U3, U4, U5, U6, U7, U8>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7, TypedColumn<V, U8> col8)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
col1 - (undocumented)
col2 - (undocumented)
col3 - (undocumented)
col4 - (undocumented)
col5 - (undocumented)
col6 - (undocumented)
col7 - (undocumented)
col8 - (undocumented)
- Returns:
- (undocumented)
-
cogroup
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30)
-
cogroup
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
-
cogroupSorted
public <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21)
-
cogroupSorted
public <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
-
count
Description copied from class: KeyValueGroupedDataset
Returns a Dataset that contains a tuple with each key and the number of items present for that key.
- Overrides:
count in class KeyValueGroupedDataset<K, V, Dataset>
- Returns:
- (undocumented)
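As a rough illustration of the result shape, the following sketch models per-key counting with plain Java collections. It does not call Spark; the class CountPerKeySketch and the method countPerKey are hypothetical names introduced only for this example, and the sorted map is used purely to make the printed order predictable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class CountPerKeySketch {
    // Groups the values by keyFn and counts the members of each group,
    // mirroring the (key, count) pairs that count() is documented to return.
    static <K extends Comparable<K>, V> Map<K, Long> countPerKey(List<V> values, Function<V, K> keyFn) {
        return values.stream()
                .collect(Collectors.groupingBy(keyFn, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "bb", "cc", "ddd");
        // Key each word by its length: one word of length 1, two of length 2, one of length 3.
        System.out.println(countPerKey(words, String::length)); // {1=1, 2=2, 3=1}
    }
}
```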
-
flatMapGroups
public <U> Dataset<U> flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$22)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
flatMapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
f - (undocumented)
evidence$22 - (undocumented)
- Returns:
- (undocumented)
-
flatMapGroups
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
flatMapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
f - (undocumented)
encoder - (undocumented)
- Returns:
- (undocumented)
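The advice about not materializing a group's iterator can be pictured with a plain-Java model. The sketch below does not use Spark: FlatMapGroupsSketch, evensOf and flatMapGroups are hypothetical names, and an in-memory map stands in for the grouped data. The per-group function walks the iterator one element at a time and keeps only the outputs it selects, so it may emit zero or more elements per group.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FlatMapGroupsSketch {
    // Hypothetical per-group function: emits "<key>:<value>" for every even value.
    // It reads the iterator element by element and never copies the whole group.
    static Iterator<String> evensOf(String key, Iterator<Integer> values) {
        List<String> out = new ArrayList<>();
        while (values.hasNext()) {
            int v = values.next();
            if (v % 2 == 0) {
                out.add(key + ":" + v);
            }
        }
        return out.iterator();
    }

    // Applies evensOf to each group and concatenates whatever each call returns.
    static List<String> flatMapGroups(Map<String, List<Integer>> groups) {
        List<String> result = new ArrayList<>();
        // The TreeMap copy only makes the iteration order deterministic for this demo.
        for (Map.Entry<String, List<Integer>> e : new TreeMap<>(groups).entrySet()) {
            evensOf(e.getKey(), e.getValue().iterator()).forEachRemaining(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = Map.of("a", List.of(1, 2, 4), "b", List.of(3));
        // Group "a" contributes two elements, group "b" contributes none.
        System.out.println(flatMapGroups(groups)); // [a:2, a:4]
    }
}
```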
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$10, Encoder<U> evidence$11)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Specified by:
flatMapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
outputMode - The output mode of the function.
timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
func - Function to be called on every group.
evidence$10 - (undocumented)
evidence$11 - (undocumented)
- Returns:
- (undocumented)
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13)
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Overrides:
flatMapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
func - Function to be called on every group.
outputMode - The output mode of the function.
stateEncoder - Encoder for the state type.
outputEncoder - Encoder for the output type.
timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
- Returns:
- (undocumented)
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
-
flatMapSortedGroups
public <U> Dataset<U> flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
This is equivalent to KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
- Specified by:
flatMapSortedGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
sortExprs - (undocumented)
f - (undocumented)
evidence$3 - (undocumented)
- Returns:
- (undocumented)
- See Also:
- org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
-
flatMapSortedGroups
public <U> Dataset<U> flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
This is equivalent to KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
- Overrides:
flatMapSortedGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
SortExprs - (undocumented)
f - (undocumented)
encoder - (undocumented)
- Returns:
- (undocumented)
- See Also:
- org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
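To show why a sorted per-group iterator can matter, here is a plain-Java model in which the per-group function depends on seeing values in order. It is not Spark code: SortedGroupsSketch, gapsPerKey and gaps are hypothetical names, the natural-order comparator stands in for the sort expressions, and the model sorts an in-memory copy of each group purely for simplicity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedGroupsSketch {
    // For each key, sorts that group's values and hands a sorted iterator
    // to the per-group function, which computes gaps between neighbours.
    static Map<String, List<Integer>> gapsPerKey(Map<String, List<Integer>> groups) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            List<Integer> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.naturalOrder()); // stands in for the sort expressions
            result.put(e.getKey(), gaps(sorted.iterator()));
        }
        return result;
    }

    // Per-group function: differences between consecutive values.
    // The result is only meaningful when the iterator is already sorted.
    static List<Integer> gaps(Iterator<Integer> sortedValues) {
        List<Integer> out = new ArrayList<>();
        if (!sortedValues.hasNext()) {
            return out;
        }
        int prev = sortedValues.next();
        while (sortedValues.hasNext()) {
            int cur = sortedValues.next();
            out.add(cur - prev);
            prev = cur;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = Map.of("a", List.of(5, 1, 3), "b", List.of(10));
        // "a" is seen as 1, 3, 5 and yields two gaps; "b" has a single value and yields none.
        System.out.println(gapsPerKey(groups)); // {a=[2, 2], b=[]}
    }
}
```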
-
keyAs
Description copied from class: KeyValueGroupedDataset
Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as the as method on Dataset.
- Specified by:
keyAs in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
evidence$1 - (undocumented)
- Returns:
- (undocumented)
-
keys
Description copied from class: KeyValueGroupedDataset
Returns a Dataset that contains each unique key. This is equivalent to mapping over the Dataset to extract the keys and then running a distinct operation on them.
- Specified by:
keys in class KeyValueGroupedDataset<K, V, Dataset>
- Returns:
- (undocumented)
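The "map, then distinct" equivalence can be sketched with plain Java streams. This is a model of the described behaviour rather than Spark code; KeysSketch and its keys method are hypothetical names, and the first-seen ordering of the result is a property of this sketch only, not something the text promises for a Dataset.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class KeysSketch {
    // Extracts a key from every value, then removes duplicate keys.
    static <K, V> List<K> keys(List<V> values, Function<V, K> keyFn) {
        return values.stream()
                .map(keyFn)   // step 1: map each element to its key
                .distinct()   // step 2: keep each key once
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "bb", "cc", "d");
        // Keys are word lengths; lengths 1 and 2 each appear twice but are reported once.
        System.out.println(keys(words, String::length)); // [1, 2]
    }
}
```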
-
mapGroups
public <U> Dataset<U> mapGroups(scala.Function2<K, scala.collection.Iterator<V>, U> f, Encoder<U> evidence$23)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
mapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
f - (undocumented)
evidence$23 - (undocumented)
- Returns:
- (undocumented)
-
mapGroups
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
mapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
f - (undocumented)
encoder - (undocumented)
- Returns:
- (undocumented)
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$4, Encoder<U> evidence$5)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Specified by:
mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
func - Function to be called on every group. See Encoder for more details on what types are encodable to Spark SQL.
evidence$4 - (undocumented)
evidence$5 - (undocumented)
- Returns:
- (undocumented)
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$6, Encoder<U> evidence$7)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Specified by:
mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
func - Function to be called on every group.
evidence$6 - (undocumented)
evidence$7 - (undocumented)
- Returns:
- (undocumented)
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$8, Encoder<U> evidence$9)
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Overrides:
mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
func - Function to be called on every group.
stateEncoder - Encoder for the state type.
outputEncoder - Encoder for the output type. See Encoder for more details on what types are encodable to Spark SQL.
- Returns:
- (undocumented)
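The idea of state that survives across triggers can be modelled without Spark. In the sketch below, each call to processBatch plays the role of one trigger, and a plain map is a hypothetical stand-in for the per-group state store; Spark's GroupState offers more than this model shows. GroupStateSketch and processBatch are names invented for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupStateSketch {
    // Hypothetical per-key state: a running count of events seen so far.
    private final Map<String, Integer> stateByKey = new HashMap<>();

    // Processes one batch ("trigger"). For each key present in the batch, the
    // batch's events are added to the saved count and the new total is emitted.
    Map<String, Integer> processBatch(List<String> batch) {
        Map<String, Integer> perKeyInBatch = new TreeMap<>();
        for (String key : batch) {
            perKeyInBatch.merge(key, 1, Integer::sum);
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, Integer> e : perKeyInBatch.entrySet()) {
            // merge stores the updated state and returns it.
            int updated = stateByKey.merge(e.getKey(), e.getValue(), Integer::sum);
            output.put(e.getKey(), updated);
        }
        return output;
    }

    public static void main(String[] args) {
        GroupStateSketch sketch = new GroupStateSketch();
        System.out.println(sketch.processBatch(List.of("a", "b", "a"))); // {a=2, b=1}
        // The count for "a" carries over from the first batch.
        System.out.println(sketch.processBatch(List.of("a")));           // {a=3}
    }
}
```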
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Overrides:
mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
func - Function to be called on every group.
stateEncoder - Encoder for the state type.
outputEncoder - Encoder for the output type.
timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
- Returns:
- (undocumented)
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
-
mapValues
Description copied from class: KeyValueGroupedDataset
Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
// Create values grouped by key from a Dataset[(K, V)]
ds.groupByKey(_._1).mapValues(_._2) // Scala
- Specified by:
mapValues in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
func - (undocumented)
evidence$2 - (undocumented)
- Returns:
- (undocumented)
-
mapValues
Description copied from class: KeyValueGroupedDataset
Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
// Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>>
Dataset<Tuple2<String, Integer>> ds = ...;
KeyValueGroupedDataset<String, Integer> grouped =
ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT());
- Overrides:
mapValues in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
func - (undocumented)
encoder - (undocumented)
- Returns:
- (undocumented)
-
queryExecution
public org.apache.spark.sql.execution.QueryExecution queryExecution()
-
reduceGroups
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
- Specified by:
reduceGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
f - (undocumented)
- Returns:
- (undocumented)
-
reduceGroups
Description copied from class: KeyValueGroupedDataset
(Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
- Overrides:
reduceGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
f - (undocumented)
- Returns:
- (undocumented)
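The requirement that the function be commutative and associative can be seen in a small plain-Java model. The sketch does not use Spark; ReduceGroupsSketch and its reduceGroups helper are hypothetical names, and the model folds each group's values left to right in the order given. Addition yields the same result for any arrival order, while subtraction does not, which is the kind of order dependence the text warns about.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

class ReduceGroupsSketch {
    // Reduces each (non-empty) group's values with f, folding left to right
    // in whatever order the values happen to be listed.
    static Map<String, Integer> reduceGroups(Map<String, List<Integer>> groups,
                                             BinaryOperator<Integer> f) {
        Map<String, Integer> out = new TreeMap<>();
        groups.forEach((key, values) -> out.put(key, values.stream().reduce(f).get()));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> oneOrder = Map.of("k", List.of(1, 2, 3));
        Map<String, List<Integer>> otherOrder = Map.of("k", List.of(3, 2, 1));

        // Addition is commutative and associative: both orders agree.
        System.out.println(reduceGroups(oneOrder, Integer::sum));   // {k=6}
        System.out.println(reduceGroups(otherOrder, Integer::sum)); // {k=6}

        // Subtraction is neither: the result depends on arrival order.
        System.out.println(reduceGroups(oneOrder, (a, b) -> a - b));   // {k=-4}
        System.out.println(reduceGroups(otherOrder, (a, b) -> a - b)); // {k=0}
    }
}
```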
-
toString
-