agg {SparkR}    R Documentation

Summarize data across columns

Description

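Compute aggregates by specifying a list of columns.

When x is a SparkDataFrame, agg aggregates over the entire SparkDataFrame without groups. When x is a GroupedData, the aggregates are computed per group and the resulting SparkDataFrame also contains the grouping columns. summarize is an alias for agg.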

Usage

## S4 method for signature 'SparkDataFrame'
agg(x, ...)

## S4 method for signature 'SparkDataFrame'
summarize(x, ...)

agg(x, ...)

summarize(x, ...)

## S4 method for signature 'GroupedData'
agg(x, ...)

## S4 method for signature 'GroupedData'
summarize(x, ...)

Arguments

x

a SparkDataFrame or GroupedData.

...

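the aggregations to compute, given either as <column> = <aggFunction> pairs or as named Column expressions such as newColName = aggFunction(column); see Details.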

Details

df2 <- agg(df, <column> = <aggFunction>)
df2 <- agg(df, newColName = aggFunction(column))
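The two calling forms can be compared side by side. The following is a minimal sketch, assuming a local Spark installation that sparkR.session() can reach; the people data and the result names by_name and by_expr are hypothetical and only serve the illustration.

```r
library(SparkR)

# Assumption: Spark is installed locally and reachable by SparkR.
sparkR.session(master = "local[1]")

# Hypothetical sample data, not taken from the SparkR documentation.
people <- createDataFrame(data.frame(
  name = c("Ann", "Bob", "Cy"),
  age  = c(30, 25, 41),
  stringsAsFactors = FALSE
))

# Form 1: <column> = <aggFunction>, with the function given as a string.
# The result column receives an auto-generated name.
by_name <- collect(agg(people, age = "sum"))

# Form 2: newColName = aggFunction(column), with explicit result names.
by_expr <- collect(agg(people,
                       ageSum = sum(people$age),
                       ageMax = max(people$age)))

print(by_name)
print(by_expr)

sparkR.session.stop()
```

The second form is usually easier to work with downstream, because the result column names are chosen by the caller rather than generated by Spark.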

Value

A SparkDataFrame.

Note

agg since 1.4.0

summarize since 1.4.0

See Also

Other SparkDataFrame functions: $, $,SparkDataFrame-method, $<-, $<-,SparkDataFrame-method, select, select, select,SparkDataFrame,Column-method, select,SparkDataFrame,character-method, select,SparkDataFrame,list-method; SparkDataFrame-class; [, [,SparkDataFrame-method, [[, [[,SparkDataFrame,numericOrcharacter-method, subset, subset, subset,SparkDataFrame-method; arrange, arrange, arrange, arrange,SparkDataFrame,Column-method, arrange,SparkDataFrame,character-method, orderBy,SparkDataFrame,characterOrColumn-method; as.data.frame, as.data.frame,SparkDataFrame-method; attach, attach,SparkDataFrame-method; cache, cache, cache,SparkDataFrame-method; collect, collect, collect,SparkDataFrame-method; colnames, colnames, colnames,SparkDataFrame-method, colnames<-, colnames<-, colnames<-,SparkDataFrame-method, columns, columns, columns,SparkDataFrame-method, names, names,SparkDataFrame-method, names<-, names<-,SparkDataFrame-method; coltypes, coltypes, coltypes,SparkDataFrame-method, coltypes<-, coltypes<-, coltypes<-,SparkDataFrame,character-method; count,SparkDataFrame-method, nrow, nrow, nrow,SparkDataFrame-method; createOrReplaceTempView, createOrReplaceTempView, createOrReplaceTempView,SparkDataFrame,character-method; crossJoin, crossJoin,SparkDataFrame,SparkDataFrame-method; dapplyCollect, dapplyCollect, dapplyCollect,SparkDataFrame,function-method; dapply, dapply, dapply,SparkDataFrame,function,structType-method; describe, describe, describe, describe,SparkDataFrame,ANY-method, describe,SparkDataFrame,character-method, describe,SparkDataFrame-method, summary, summary, summary,SparkDataFrame-method; dim, dim,SparkDataFrame-method; distinct, distinct, distinct,SparkDataFrame-method, unique, unique,SparkDataFrame-method; dropDuplicates, dropDuplicates, dropDuplicates,SparkDataFrame-method; dropna, dropna, dropna,SparkDataFrame-method, fillna, fillna, fillna,SparkDataFrame-method, na.omit, na.omit, na.omit,SparkDataFrame-method; drop, drop, drop, drop,ANY-method, 
drop,SparkDataFrame-method; dtypes, dtypes, dtypes,SparkDataFrame-method; except, except, except,SparkDataFrame,SparkDataFrame-method; explain, explain, explain,SparkDataFrame-method; filter, filter, filter,SparkDataFrame,characterOrColumn-method, where, where, where,SparkDataFrame,characterOrColumn-method; first, first, first, first,SparkDataFrame-method, first,characterOrColumn-method; gapplyCollect, gapplyCollect, gapplyCollect, gapplyCollect,GroupedData-method, gapplyCollect,SparkDataFrame-method; gapply, gapply, gapply, gapply,GroupedData-method, gapply,SparkDataFrame-method; groupBy, groupBy, groupBy,SparkDataFrame-method, group_by, group_by, group_by,SparkDataFrame-method; head, head,SparkDataFrame-method; histogram, histogram,SparkDataFrame,characterOrColumn-method; insertInto, insertInto, insertInto,SparkDataFrame,character-method; intersect, intersect, intersect,SparkDataFrame,SparkDataFrame-method; isLocal, isLocal, isLocal,SparkDataFrame-method; join, join,SparkDataFrame,SparkDataFrame-method; limit, limit, limit,SparkDataFrame,numeric-method; merge, merge, merge,SparkDataFrame,SparkDataFrame-method; mutate, mutate, mutate,SparkDataFrame-method, transform, transform, transform,SparkDataFrame-method; ncol, ncol,SparkDataFrame-method; persist, persist, persist,SparkDataFrame,character-method; printSchema, printSchema, printSchema,SparkDataFrame-method; randomSplit, randomSplit, randomSplit,SparkDataFrame,numeric-method; rbind, rbind, rbind,SparkDataFrame-method; registerTempTable, registerTempTable, registerTempTable,SparkDataFrame,character-method; rename, rename, rename,SparkDataFrame-method, withColumnRenamed, withColumnRenamed, withColumnRenamed,SparkDataFrame,character,character-method; repartition, repartition, repartition,SparkDataFrame-method; sample, sample, sample,SparkDataFrame,logical,numeric-method, sample_frac, sample_frac, sample_frac,SparkDataFrame,logical,numeric-method; saveAsParquetFile, saveAsParquetFile, 
saveAsParquetFile,SparkDataFrame,character-method, write.parquet, write.parquet, write.parquet,SparkDataFrame,character-method; saveAsTable, saveAsTable, saveAsTable,SparkDataFrame,character-method; saveDF, saveDF, saveDF,SparkDataFrame,character-method, write.df, write.df, write.df, write.df,SparkDataFrame-method; schema, schema, schema,SparkDataFrame-method; selectExpr, selectExpr, selectExpr,SparkDataFrame,character-method; showDF, showDF, showDF,SparkDataFrame-method; show, show, show,Column-method, show,GroupedData-method, show,SparkDataFrame-method, show,WindowSpec-method; storageLevel, storageLevel,SparkDataFrame-method; str, str,SparkDataFrame-method; take, take, take,SparkDataFrame,numeric-method; union, union, union,SparkDataFrame,SparkDataFrame-method, unionAll, unionAll, unionAll,SparkDataFrame,SparkDataFrame-method; unpersist, unpersist, unpersist,SparkDataFrame-method; withColumn, withColumn, withColumn,SparkDataFrame,character,Column-method; with, with,SparkDataFrame-method; write.jdbc, write.jdbc, write.jdbc,SparkDataFrame,character,character-method; write.json, write.json, write.json,SparkDataFrame,character-method; write.orc, write.orc, write.orc,SparkDataFrame,character-method; write.text, write.text, write.text,SparkDataFrame,character-method

Other agg_funcs: avg, avg, avg,Column-method; countDistinct, countDistinct, countDistinct,Column-method, n_distinct, n_distinct, n_distinct,Column-method; count, count, count,Column-method, count,GroupedData-method, n, n, n,Column-method; first, first, first, first,SparkDataFrame-method, first,characterOrColumn-method; kurtosis, kurtosis, kurtosis,Column-method; last, last, last,characterOrColumn-method; max, max,Column-method; mean, mean,Column-method; min, min,Column-method; sd, sd, sd,Column-method, stddev, stddev, stddev,Column-method; skewness, skewness, skewness,Column-method; stddev_pop, stddev_pop, stddev_pop,Column-method; stddev_samp, stddev_samp, stddev_samp,Column-method; sumDistinct, sumDistinct, sumDistinct,Column-method; sum, sum,Column-method; var_pop, var_pop, var_pop,Column-method; var_samp, var_samp, var_samp,Column-method; var, var, var,Column-method, variance, variance, variance,Column-method

Examples

## Not run: 
##D  df2 <- agg(df, age = "sum")  # new column gets an auto-generated name such as 'sum(age)'
##D  df3 <- agg(df, ageSum = sum(df$age)) # Creates a new column named ageSum
##D  df4 <- summarize(df, ageMax = max(df$age)) # summarize is an alias for agg
## End(Not run)
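The examples above aggregate over the whole SparkDataFrame. For the GroupedData signature, the sketch below shows one aggregate row per group, with the grouping column kept in the result. It assumes a local Spark installation; the staff data and the names by_dept and result are hypothetical.

```r
library(SparkR)

# Assumption: Spark is installed locally and reachable by SparkR.
sparkR.session(master = "local[1]")

# Hypothetical sample data.
staff <- createDataFrame(data.frame(
  dept   = c("a", "a", "b"),
  salary = c(10, 20, 5),
  stringsAsFactors = FALSE
))

# groupBy() returns a GroupedData; agg() then computes one row per group.
by_dept <- agg(groupBy(staff, "dept"),
               total = sum(staff$salary),
               n     = n(staff$salary))

# Sort before collecting so the row order is deterministic.
result <- collect(arrange(by_dept, "dept"))
print(result)  # columns: dept (grouping column), total, n

sparkR.session.stop()
```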

[Package SparkR version 2.1.0 Index]