DataFrame

Instance Constructors

new DataFrame(sqlContext: SQLContext, logicalPlan: LogicalPlan)

A constructor that automatically analyzes the logical plan. This reports errors eagerly as the DataFrame is constructed, unless SQLConf.dataFrameEagerAnalysis is turned off.

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
def agg(expr: Column, exprs: Column*): DataFrame

Aggregates on the entire DataFrame without groups.
```
// df.agg(...) is a shorthand for df.groupBy().agg(...)
df.agg(max($"age"), avg($"salary"))
df.groupBy().agg(max($"age"), avg($"salary"))
```

Annotations
@varargs()
def agg(exprs: Map[String, String]): DataFrame

(Java-specific) Aggregates on the entire DataFrame without groups.
```
// df.agg(...) is a shorthand for df.groupBy().agg(...)
df.agg(Map("age" -> "max", "salary" -> "avg"))
df.groupBy().agg(Map("age" -> "max", "salary" -> "avg"))
```
def agg(exprs: Map[String, String]): DataFrame

(Scala-specific) Aggregates on the entire DataFrame without groups.
```
// df.agg(...) is a shorthand for df.groupBy().agg(...)
df.agg(Map("age" -> "max", "salary" -> "avg"))
df.groupBy().agg(Map("age" -> "max", "salary" -> "avg"))
```
def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame

(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, count.
```
// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(
  "age" -> "max",
  "expense" -> "sum"
)
```
def apply(colName: String): Column

Selects a column based on the column name and returns it as a Column.
def as(alias: Symbol): DataFrame

(Scala-specific) Returns a new DataFrame with an alias set.
def as(alias: String): DataFrame

Returns a new DataFrame with an alias set.
final def asInstanceOf[T0]: T0

Definition Classes
Any
def cache(): DataFrame.this.type

Definition Classes
DataFrame → RDDApi
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def col(colName: String): Column

Selects a column based on the column name and returns it as a Column.
def collect(): Array[Row]

Returns an array that contains all of the Rows in this DataFrame.

Definition Classes
DataFrame → RDDApi
def collectAsList(): List[Row]

Returns a Java list that contains all of the Rows in this DataFrame.

Definition Classes
DataFrame → RDDApi
def columns: Array[String]

Returns all column names as an array.
def count(): Long

Returns the number of rows in the DataFrame.

Definition Classes
DataFrame → RDDApi
def createJDBCTable(url: String, table: String, allowExisting: Boolean): Unit

Saves this DataFrame to a JDBC database at url under the table name table. This runs a CREATE TABLE statement followed by a series of INSERT INTO statements. If allowExisting is true, any existing table with the given name is dropped first; if it is false, an exception is thrown if the table already exists.
def describe(cols: String*): DataFrame

Computes statistics for numeric columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical columns.
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame. If you want to programmatically compute summary statistics, use the agg function instead.
```
df.describe("age", "height").show()

// output:
// summary age   height
// count   10.0  10.0
// mean    53.3  178.05
// stddev  11.6  15.7
// min     18.0  163.0
// max     92.0  192.0
```
Annotations
@varargs()
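For programmatic use, the description above recommends agg over describe. The following is a minimal sketch of that approach, assuming a local SparkContext and a made-up three-row dataset (the names sc, sqlContext, df, and the sample values are illustrative, not part of this API page):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{avg, max}

// Local context for illustration only.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("agg-summary"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical sample data.
val df = sc.parallelize(Seq(("Ann", 20), ("Bob", 30), ("Cid", 40))).toDF("name", "age")

// Request exactly the statistics needed; the result has one row
// with one column per aggregate expression.
val summary = df.agg(avg(df("age")), max(df("age"))).head()
val meanAge = summary.getDouble(0)
val maxAge = summary.getInt(1)

sc.stop()
```

Because the aggregates are named explicitly, the shape of the result does not depend on the schema that describe happens to produce.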
def distinct: DataFrame

Returns a new DataFrame that contains only the unique rows from this DataFrame.

Definition Classes
DataFrame → RDDApi
def dtypes: Array[(String, String)]

Returns all column names and their data types as an array.
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def except(other: DataFrame): DataFrame

Returns a new DataFrame containing rows in this frame but not in another frame. This is equivalent to EXCEPT in SQL.
def explain(): Unit

Prints only the physical plan to the console for debugging purposes.
def explain(extended: Boolean): Unit

Prints the plans (logical and physical) to the console for debugging purposes.
def explode[A, B](inputColumn: String, outputColumn: String)(f: (A) ⇒ TraversableOnce[B])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[B]): DataFrame

(Scala-specific) Returns a new DataFrame where a single column has been expanded to zero or more rows by the provided function. This is similar to a LATERAL VIEW in HiveQL. All columns of the input row are implicitly joined with each value that is output by the function.
```
df.explode("words", "word") { words: String => words.split(" ") }
```
def explode[A <: Product](input: Column*)(f: (Row) ⇒ TraversableOnce[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame

(Scala-specific) Returns a new DataFrame where each row has been expanded to zero or more rows by the provided function. This is similar to a LATERAL VIEW in HiveQL. The columns of the input row are implicitly joined with each row that is output by the function.
The following example uses this function to count the number of books which contain a given word:
```
case class Book(title: String, words: String)
val df: DataFrame = ...  // a DataFrame of Book rows, e.g. an RDD[Book] converted with toDF()

case class Word(word: String)
val allWords = df.explode('words) {
  case Row(words: String) => words.split(" ").map(Word(_))
}

val bookCountPerWord = allWords.groupBy("word").agg(countDistinct("title"))
```
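The row-expansion semantics can also be modeled with ordinary Scala collections. The sketch below is not Spark code: it uses a plain Seq in place of a DataFrame, with made-up sample books, purely to illustrate how each input row is paired with every row the function emits and how the per-word book count follows from that.

```scala
// Plain-Scala model of explode; no Spark involved.
case class Book(title: String, words: String)
case class Word(word: String)

// Hypothetical sample data.
val books = Seq(
  Book("Learning Spark", "spark sql"),
  Book("Spark Internals", "spark core")
)

// Each input row is joined with every row produced by the function,
// mirroring how explode keeps the input columns alongside the new ones.
val allWords: Seq[(Book, Word)] =
  books.flatMap(book => book.words.split(" ").map(w => (book, Word(w))))

// Count the distinct titles that contain each word.
val bookCountPerWord: Map[String, Int] =
  allWords
    .groupBy { case (_, word) => word.word }
    .map { case (word, rows) => word -> rows.map(_._1.title).distinct.size }
```

A function that returns an empty collection for a row drops that row from the result, which is the "zero or more rows" case in the description.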
def filter(conditionExpr: String): DataFrame

Filters rows using the given SQL expression.
```
peopleDf.filter("age > 15")
```

def filter(condition: Column): DataFrame

Filters rows using the given condition.

```
// The following are equivalent:
peopleDf.filter($"age" > 15)
peopleDf.where($"age" > 15)
```

def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def first(): Row

Returns the first row. Alias for head().

Definition Classes
DataFrame → RDDApi
def flatMap[R](f: (Row) ⇒ TraversableOnce[R])(implicit arg0: ClassTag[R]): RDD[R]

Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.

Definition Classes
DataFrame → RDDApi
def foreach(f: (Row) ⇒ Unit): Unit

Applies a function f to all rows.

Definition Classes
DataFrame → RDDApi
def foreachPartition(f: (Iterator[Row]) ⇒ Unit): Unit

Applies a function f to each partition of this DataFrame.

Definition Classes
DataFrame → RDDApi
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def groupBy(col1: String, cols: String*): GroupedData

Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
This is a variant of groupBy that can only group by existing columns using column names (i.e. cannot construct expressions).
```
// Compute the average for all numeric columns grouped by department.
df.groupBy("department").avg()

// Compute the max age and average salary, grouped by department and gender.
df.groupBy("department", "gender").agg(Map(
  "salary" -> "avg",
  "age" -> "max"
))
```
Annotations
@varargs()

def groupBy(cols: Column*): GroupedData

Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.

```
// Compute the average for all numeric columns grouped by department.
df.groupBy($"department").avg()

// Compute the max age and average salary, grouped by department and gender.
df.groupBy($"department", $"gender").agg(Map(
  "salary" -> "avg",
  "age" -> "max"
))
```

Annotations
@varargs()

def hashCode(): Int

Definition Classes
AnyRef → Any
def head(): Row

Returns the first row.
def head(n: Int): Array[Row]

Returns the first n rows.
def insertInto(tableName: String): Unit

:: Experimental :: Adds the rows from this DataFrame to the specified table. Throws an exception if the table already exists.

Annotations
@Experimental()
def insertInto(tableName: String, overwrite: Boolean): Unit

:: Experimental :: Adds the rows from this DataFrame to the specified table, optionally overwriting the existing data.

Annotations
@Experimental()
def insertIntoJDBC(url: String, table: String, overwrite: Boolean): Unit

Saves this DataFrame to a JDBC database at url under the table name table. Assumes the table already exists and has a compatible schema. If you pass true for overwrite, it will TRUNCATE the table before performing the INSERTs.
The table must already exist in the database. It must have a schema that is compatible with the schema of this DataFrame; inserting the rows of the DataFrame in order via the simple statement INSERT INTO table VALUES (?, ?, ..., ?) should not fail.
def intersect(other: DataFrame): DataFrame

Returns a new DataFrame containing rows only in both this frame and another frame. This is equivalent to INTERSECT in SQL.
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def isLocal: Boolean

Returns true if the collect and take methods can be run locally (without any Spark executors).
def javaRDD: JavaRDD[Row]

Returns the content of the DataFrame as a JavaRDD of Rows.
def javaToPython: JavaRDD[Array[Byte]]

Converts a JavaRDD to a PythonRDD.

Attributes
protected[org.apache.spark.sql]
def join(right: DataFrame, joinExprs: Column, joinType: String): DataFrame

Join with another DataFrame, using the given join expression. The following performs a full outer join between df1 and df2.
```
// Scala:
import org.apache.spark.sql.functions._
df1.join(df2, $"df1Key" === $"df2Key", "outer")

// Java:
import static org.apache.spark.sql.functions.*;
df1.join(df2, col("df1Key").equalTo(col("df2Key")), "outer");
```
right
Right side of the join.
joinExprs
Join expression.
joinType
One of: inner, outer, left_outer, right_outer, semijoin.
def join(right: DataFrame, joinExprs: Column): DataFrame

Inner join with another DataFrame, using the given join expression.
```
// The following two are equivalent:
df1.join(df2, $"df1Key" === $"df2Key")
df1.join(df2).where($"df1Key" === $"df2Key")
```
def join(right: DataFrame): DataFrame

Cartesian join with another DataFrame.
Note that cartesian joins are very expensive without an extra filter that can be pushed down.
right
Right side of the join operation.
def limit(n: Int): DataFrame

Returns a new DataFrame by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new DataFrame.
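The difference between limit and head shows up in the return types. The following is a minimal sketch, assuming a local SparkContext and a made-up three-row dataset (sc, sqlContext, df, and the sample values are illustrative names, not part of this API page):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

// Local context for illustration only.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("limit-vs-head"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical sample data.
val df = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c"))).toDF("id", "name")

// limit returns another DataFrame, so further operations can be chained on it.
val firstTwo: DataFrame = df.limit(2)
val limitedCount = firstTwo.select("name").count()

// head returns the rows themselves as a local array.
val rows: Array[Row] = df.head(2)

sc.stop()
```

Use limit when the truncated result feeds further DataFrame operations, and head when the rows are needed directly in the driver program.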
val logicalPlan: LogicalPlan

Attributes
protected[org.apache.spark.sql]
def map[R](f: (Row) ⇒ R)(implicit arg0: ClassTag[R]): RDD[R]

Returns a new RDD by applying a function to all rows of this DataFrame.

Definition Classes
DataFrame → RDDApi
def mapPartitions[R](f: (Iterator[Row]) ⇒ Iterator[R])(implicit arg0: ClassTag[R]): RDD[R]

Returns a new RDD by applying a function to each partition of this DataFrame.

Definition Classes
DataFrame → RDDApi
def na: DataFrameNaFunctions

Returns a DataFrameNaFunctions for working with missing data.
```
// Dropping rows containing any null values.
df.na.drop()
```
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def numericColumns: Seq[Expression]

Attributes
protected[org.apache.spark.sql]
def orderBy(sortExprs: Column*): DataFrame

Returns a new DataFrame sorted by the given expressions. This is an alias of the sort function.

Annotations
@varargs()
def orderBy(sortCol: String, sortCols: String*): DataFrame

Returns a new DataFrame sorted by the given expressions. This is an alias of the sort function.

Annotations
@varargs()
def persist(newLevel: StorageLevel): DataFrame.this.type

Definition Classes
DataFrame → RDDApi
def persist(): DataFrame.this.type

Definition Classes
DataFrame → RDDApi
def printSchema(): Unit

Prints the schema to the console in a nice tree format.
val queryExecution: QueryExecution
def rdd: RDD[Row]

Returns the content of the DataFrame as an RDD of Rows.
def registerTempTable(tableName: String): Unit

Registers this DataFrame as a temporary table using the given name. The lifetime of this temporary table is tied to the SQLContext that was used to create this DataFrame.
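Once registered, the temporary table can be queried with SQL through the same SQLContext. The following is a minimal sketch, assuming a local SparkContext and made-up data; the table name people and all values are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local context for illustration only.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("temp-table"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical sample data.
val people = sc.parallelize(Seq(("Ann", 34), ("Bob", 12))).toDF("name", "age")

// The table name is only visible through the SQLContext that created the DataFrame.
people.registerTempTable("people")

val adults = sqlContext.sql("SELECT name FROM people WHERE age > 15")
val adultNames = adults.collect().map(_.getString(0))

sc.stop()
```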
def repartition(numPartitions: Int): DataFrame

Returns a new DataFrame that has exactly numPartitions partitions.

Definition Classes
DataFrame → RDDApi
def resolve(colName: String): NamedExpression

Attributes
protected[org.apache.spark.sql]
def sample(withReplacement: Boolean, fraction: Double): DataFrame

Returns a new DataFrame by sampling a fraction of rows, using a random seed.
withReplacement
Sample with replacement or not.
fraction
Fraction of rows to generate.
def sample(withReplacement: Boolean, fraction: Double, seed: Long): DataFrame

Returns a new DataFrame by sampling a fraction of rows.
withReplacement
Sample with replacement or not.
fraction
Fraction of rows to generate.
seed
Seed for sampling.
def save(source: String, mode: SaveMode, options: Map[String, String]): Unit

:: Experimental :: (Scala-specific) Saves the contents of this DataFrame based on the given data source, SaveMode specified by mode, and a set of options.

Annotations
@Experimental()
def save(source: String, mode: SaveMode, options: Map[String, String]): Unit

:: Experimental :: Saves the contents of this DataFrame based on the given data source, SaveMode specified by mode, and a set of options.

Annotations
@Experimental()
def save(path: String, source: String, mode: SaveMode): Unit

:: Experimental :: Saves the contents of this DataFrame to the given path based on the given data source and SaveMode specified by mode.

Annotations
@Experimental()
def save(path: String, source: String): Unit

:: Experimental :: Saves the contents of this DataFrame to the given path based on the given data source, using SaveMode.ErrorIfExists as the save mode.

Annotations
@Experimental()
def save(path: String, mode: SaveMode): Unit

:: Experimental :: Saves the contents of this DataFrame to the given path and SaveMode specified by mode, using the default data source configured by spark.sql.sources.default.

Annotations
@Experimental()
def save(path: String): Unit

:: Experimental :: Saves the contents of this DataFrame to the given path, using the default data source configured by spark.sql.sources.default and SaveMode.ErrorIfExists as the save mode.

Annotations
@Experimental()
def saveAsParquetFile(path: String): Unit

Saves the contents of this DataFrame as a parquet file, preserving the schema. Files that are written out using this method can be read back in as a DataFrame using the parquetFile function in SQLContext.
def saveAsTable(tableName: String, source: String, mode: SaveMode, options: Map[String, String]): Unit

:: Experimental :: (Scala-specific) Creates a table from the contents of this DataFrame based on a given data source, SaveMode specified by mode, and a set of options.
Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.

Annotations
@Experimental()
def saveAsTable(tableName: String, source: String, mode: SaveMode, options: Map[String, String]): Unit

:: Experimental :: Creates a table at the given path from the contents of this DataFrame based on a given data source, SaveMode specified by mode, and a set of options.
Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.

Annotations
@Experimental()
def saveAsTable(tableName: String, source: String, mode: SaveMode): Unit

:: Experimental :: Creates a table at the given path from the contents of this DataFrame based on a given data source and SaveMode specified by mode.
Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.

Annotations
@Experimental()
def saveAsTable(tableName: String, source: String): Unit

:: Experimental :: Creates a table at the given path from the contents of this DataFrame based on a given data source, using SaveMode.ErrorIfExists as the save mode.
Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.

Annotations
@Experimental()
def saveAsTable(tableName: String, mode: SaveMode): Unit

:: Experimental :: Creates a table from the contents of this DataFrame, using the default data source configured by spark.sql.sources.default and the SaveMode specified by mode.
Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.

Annotations
@Experimental()
def saveAsTable(tableName: String): Unit

:: Experimental :: Creates a table from the contents of this DataFrame. It will use the default data source configured by spark.sql.sources.default. This will fail if the table already exists.
Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.

Annotations
@Experimental()
def schema: StructType

Returns the schema of this DataFrame.
def select(col: String, cols: String*): DataFrame

Selects a set of columns. This is a variant of select that can only select existing columns using column names (i.e. cannot construct expressions).
```
// The following two are equivalent:
df.select("colA", "colB")
df.select($"colA", $"colB")
```
Annotations
@varargs()
def select(cols: Column*): DataFrame

Selects a set of expressions.
```
df.select($"colA", $"colB" + 1)
```
Annotations
@varargs()
def selectExpr(exprs: String*): DataFrame

Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
```
df.selectExpr("colA", "colB as newName", "abs(colC)")
```
Annotations
@varargs()
def show(): Unit

Displays the top 20 rows of the DataFrame in a tabular form.

def show(numRows: Int): Unit

Displays the DataFrame in a tabular form. For example:

```
year  month AVG('Adj Close) MAX('Adj Close)
1980  12    0.503218        0.595103
1981  01    0.523289        0.570307
1982  02    0.436504        0.475256
1983  03    0.410516        0.442194
1984  04    0.450090        0.483521
```

numRows
Number of rows to show.

def sort(sortExprs: Column*): DataFrame

Returns a new DataFrame sorted by the given expressions. For example:
```
df.sort($"col1", $"col2".desc)
```
Annotations
@varargs()
def sort(sortCol: String, sortCols: String*): DataFrame

Returns a new DataFrame sorted by the specified column, all in ascending order.
```
// The following 3 are equivalent
df.sort("sortcol")
df.sort($"sortcol")
df.sort($"sortcol".asc)
```
Annotations
@varargs()
val sqlContext: SQLContext
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def take(n: Int): Array[Row]

Returns the first n rows in the DataFrame.

Definition Classes
DataFrame → RDDApi
def toDF(colNames: String*): DataFrame

Returns a new DataFrame with columns renamed. This can be quite convenient when converting an RDD of tuples into a DataFrame with meaningful names. For example:
```
val rdd: RDD[(Int, String)] = ...
rdd.toDF()  // this implicit conversion creates a DataFrame with column name _1 and _2
rdd.toDF("id", "name")  // this creates a DataFrame with column name "id" and "name"
```
Annotations
@varargs()
def toDF(): DataFrame

Returns the object itself.
def toJSON: RDD[String]

Returns the content of the DataFrame as an RDD of JSON strings.
def toJavaRDD: JavaRDD[Row]

Returns the content of the DataFrame as a JavaRDD of Rows.
def toString(): String

Definition Classes
DataFrame → AnyRef → Any
def unionAll(other: DataFrame): DataFrame

Returns a new DataFrame containing the union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.
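The three set-style operations on this page (unionAll, intersect, and except) can be compared side by side. The following is a minimal sketch, assuming a local SparkContext and two made-up DataFrames with the same schema; all names and values are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local context for illustration only.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("set-ops"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical sample data; the row (2, "b") appears in both frames.
val left = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")
val right = sc.parallelize(Seq((2, "b"), (3, "c"))).toDF("id", "name")

// unionAll keeps duplicates, so the shared row is counted twice.
val unionCount = left.unionAll(right).count()

// intersect keeps only rows present in both frames.
val commonIds = left.intersect(right).collect().map(_.getInt(0))

// except keeps rows of the left frame that are absent from the right frame.
val onlyLeftIds = left.except(right).collect().map(_.getInt(0))

sc.stop()
```

To get SQL UNION semantics rather than UNION ALL, follow unionAll with distinct.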
def unpersist(): DataFrame.this.type

Definition Classes
DataFrame → RDDApi
def unpersist(blocking: Boolean): DataFrame.this.type

Definition Classes
DataFrame → RDDApi
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
def where(condition: Column): DataFrame

Filters rows using the given condition. This is an alias for filter.
```
// The following are equivalent:
peopleDf.filter($"age" > 15)
peopleDf.where($"age" > 15)
```
def withColumn(colName: String, col: Column): DataFrame

Returns a new DataFrame by adding a column.
def withColumnRenamed(existingName: String, newName: String): DataFrame

Returns a new DataFrame with a column renamed.
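Both withColumn and withColumnRenamed return new DataFrames and leave the original unchanged. The following is a minimal sketch, assuming a local SparkContext and made-up data; the column names nextId and label are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local context for illustration only.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("with-column"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical sample data.
val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")

// Add a column computed from an existing one.
val withNext = df.withColumn("nextId", df("id") + 1)

// Rename a column; the other columns are carried over as-is.
val renamed = withNext.withColumnRenamed("name", "label")

val originalColumns = df.columns.toSeq
val newColumns = renamed.columns.toSeq

sc.stop()
```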

Deprecated Value Members

def toSchemaRDD: DataFrame

Left here for backward compatibility.

Annotations
@deprecated
Deprecated
(Since version 1.3.0) use toDF

class DataFrame extends RDDApi[Row] with Serializable
