org.apache.spark.sql
Class DataFrameWriter

Object
  extended by org.apache.spark.sql.DataFrameWriter

public final class DataFrameWriter
extends Object

:: Experimental :: Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores, etc.). Use DataFrame.write to access this.

Since:
1.4.0

Method Summary
 DataFrameWriter format(String source)
          Specifies the underlying output data source.
 void insertInto(String tableName)
          Inserts the content of the DataFrame to the specified table.
 void jdbc(String url, String table, java.util.Properties connectionProperties)
          Saves the content of the DataFrame to an external database table via JDBC.
 void json(String path)
          Saves the content of the DataFrame in JSON format at the specified path.
 DataFrameWriter mode(SaveMode saveMode)
          Specifies the behavior when data or table already exists.
 DataFrameWriter mode(String saveMode)
          Specifies the behavior when data or table already exists.
 DataFrameWriter option(String key, String value)
          Adds an output option for the underlying data source.
 DataFrameWriter options(scala.collection.Map<String,String> options)
          (Scala-specific) Adds output options for the underlying data source.
 DataFrameWriter options(java.util.Map<String,String> options)
          Adds output options for the underlying data source.
 void parquet(String path)
          Saves the content of the DataFrame in Parquet format at the specified path.
 DataFrameWriter partitionBy(scala.collection.Seq<String> colNames)
          Partitions the output by the given columns on the file system.
 DataFrameWriter partitionBy(String... colNames)
          Partitions the output by the given columns on the file system.
 void save()
          Saves the content of the DataFrame to the output configured via format, mode, and options (e.g. a path set with option("path", ...)).
 void save(String path)
          Saves the content of the DataFrame at the specified path.
 void saveAsTable(String tableName)
          Saves the content of the DataFrame as the specified table.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

partitionBy

public DataFrameWriter partitionBy(String... colNames)
Partitions the output by the given columns on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme.

This is only applicable for Parquet at the moment.

Parameters:
colNames - the columns to partition the output by
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0
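In the Hive-style layout mentioned above, each partition column value becomes its own subdirectory under the output path. A minimal sketch of how such partition paths are derived (illustrative Python, not Spark's actual implementation; `partition_path` is a hypothetical helper):

```python
def partition_path(base, partition_cols, row):
    """Build a Hive-style partition directory for one row.

    Each partition column becomes a `name=value` path segment,
    in the order the columns were given to partitionBy().
    """
    segments = ["{}={}".format(c, row[c]) for c in partition_cols]
    return "/".join([base] + segments)

row = {"year": 2015, "month": 6, "amount": 9.99}
print(partition_path("/data/events", ["year", "month"], row))
# /data/events/year=2015/month=6
```

Non-partition columns (here, amount) stay in the data files themselves; only the partition columns are encoded in the directory names.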

mode

public DataFrameWriter mode(SaveMode saveMode)
Specifies the behavior when data or table already exists. Options include:
- SaveMode.Overwrite: overwrite the existing data.
- SaveMode.Append: append the data.
- SaveMode.Ignore: ignore the operation (i.e. no-op).
- SaveMode.ErrorIfExists: the default option; throws an exception at runtime.

Parameters:
saveMode - the SaveMode to use
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0

mode

public DataFrameWriter mode(String saveMode)
Specifies the behavior when data or table already exists. Options include:
- overwrite: overwrite the existing data.
- append: append the data.
- ignore: ignore the operation (i.e. no-op).
- error: the default option; throws an exception at runtime.

Parameters:
saveMode - one of "overwrite", "append", "ignore", "error"
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0
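The string names map to the same behaviors as the SaveMode constants above. A small model of the documented decision logic (illustrative Python; `resolve_save_mode` is a hypothetical function, not Spark source):

```python
def resolve_save_mode(mode, target_exists):
    """Return the action taken for a given save mode when the
    target may already exist. Mirrors the documented semantics."""
    mode = mode.lower()
    if not target_exists:
        return "write"                    # nothing to collide with
    if mode == "overwrite":
        return "replace existing data"
    if mode == "append":
        return "append to existing data"
    if mode == "ignore":
        return "no-op"
    if mode == "error":                   # the default mode
        raise ValueError("target already exists")
    raise ValueError("unknown save mode: " + mode)

print(resolve_save_mode("ignore", target_exists=True))   # no-op
print(resolve_save_mode("append", target_exists=False))  # write
```

Note that the mode only matters when the target already exists; all modes write normally to a fresh target.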

format

public DataFrameWriter format(String source)
Specifies the underlying output data source. Built-in options include "parquet", "json", etc.

Parameters:
source - the name of the output data source, e.g. "parquet" or "json"
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0

option

public DataFrameWriter option(String key,
                              String value)
Adds an output option for the underlying data source.

Parameters:
key - the option name
value - the option value
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0

options

public DataFrameWriter options(scala.collection.Map<String,String> options)
(Scala-specific) Adds output options for the underlying data source.

Parameters:
options - the options to add
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0

options

public DataFrameWriter options(java.util.Map<String,String> options)
Adds output options for the underlying data source.

Parameters:
options - the options to add
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0
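format, mode, option, and options each return the writer itself, so configuration accumulates through chained calls before a terminal save. A stripped-down model of that builder pattern (illustrative Python; `ToyWriter` and its assumed defaults are hypothetical, and the real writer does far more):

```python
class ToyWriter:
    """Accumulates write configuration, then 'saves' it."""

    def __init__(self):
        self._source = "parquet"   # assumed default, for illustration
        self._mode = "error"       # documented default save mode
        self._options = {}

    def format(self, source):
        self._source = source
        return self                # returning self enables chaining

    def mode(self, save_mode):
        self._mode = save_mode
        return self

    def option(self, key, value):
        self._options[key] = value
        return self

    def options(self, opts):
        self._options.update(opts)
        return self

    def save(self, path):
        # the terminal operation consumes the accumulated state
        return (self._source, self._mode, dict(self._options), path)

result = ToyWriter().format("json").mode("overwrite") \
                    .option("compression", "gzip").save("/tmp/out")
print(result)  # ('json', 'overwrite', {'compression': 'gzip'}, '/tmp/out')
```

The design choice here is that configuration methods are cheap and side-effect free; nothing touches storage until a terminal method such as save runs.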

partitionBy

public DataFrameWriter partitionBy(scala.collection.Seq<String> colNames)
Partitions the output by the given columns on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme.

This is only applicable for Parquet at the moment.

Parameters:
colNames - the columns to partition the output by
Returns:
this DataFrameWriter, for call chaining
Since:
1.4.0

save

public void save(String path)
Saves the content of the DataFrame at the specified path.

Parameters:
path - the path to save the output to
Since:
1.4.0

save

public void save()
Saves the content of the DataFrame to the output configured via format, mode, and options (e.g. a path set with option("path", ...)).

Since:
1.4.0

insertInto

public void insertInto(String tableName)
Inserts the content of the DataFrame into the specified table. It requires that the schema of the DataFrame is the same as the schema of the table.

Because it inserts data into an existing table, the format and options settings are ignored.

Parameters:
tableName - the name of an existing table
Since:
1.4.0
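Because insertInto targets an existing table, only the data flows in; the schemas must already line up. A toy model of that precondition (illustrative Python; `insert_into` and the dict-backed catalog are hypothetical):

```python
def insert_into(tables, table_name, schema, rows):
    """Append rows to an existing table, refusing on schema mismatch.

    `tables` maps name -> (schema, rows); a stand-in for a catalog.
    """
    if table_name not in tables:
        raise KeyError("no such table: " + table_name)
    existing_schema, existing_rows = tables[table_name]
    if existing_schema != schema:
        raise ValueError("schema of the DataFrame must match the table")
    existing_rows.extend(rows)   # no table creation, no format change

tables = {"events": (("id", "name"), [(1, "a")])}
insert_into(tables, "events", ("id", "name"), [(2, "b")])
print(tables["events"][1])  # [(1, 'a'), (2, 'b')]
```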

saveAsTable

public void saveAsTable(String tableName)
Saves the content of the DataFrame as the specified table.

If the table already exists, the behavior of this function depends on the save mode specified by the mode function (the default is to throw an exception). When the mode is Overwrite, the schema of the DataFrame does not need to match that of the existing table. When the mode is Append, the schema of the DataFrame needs to match that of the existing table, and the format and options settings are ignored.

Parameters:
tableName - the name of the table to save to
Since:
1.4.0

jdbc

public void jdbc(String url,
                 String table,
                 java.util.Properties connectionProperties)
Saves the content of the DataFrame to an external database table via JDBC. If the table already exists in the external database, the behavior of this function depends on the save mode specified by the mode function (the default is to throw an exception).

Don't create too many partitions in parallel on a large cluster; otherwise Spark might overwhelm your external database system.

Parameters:
url - JDBC database URL of the form jdbc:subprotocol:subname
table - Name of the table in the external database.
connectionProperties - JDBC database connection arguments, a list of arbitrary string key/value pairs. Normally at least a "user" and "password" property should be included.
Since:
1.4.0
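JDBC itself is Java-side, but the effect of this method can be modeled with any SQL database: create the table, then insert the DataFrame's rows. A rough analogy using Python's built-in sqlite3 (an illustrative sketch of the default ErrorIfExists behavior, not the actual JDBC code path; `save_via_sql` is a hypothetical helper):

```python
import sqlite3

def save_via_sql(conn, table, rows):
    """Write (id, name) rows into `table`, roughly what a JDBC save
    does under the default ErrorIfExists mode."""
    cur = conn.cursor()
    # ErrorIfExists analogue: CREATE TABLE without IF NOT EXISTS
    # raises if the table is already present.
    cur.execute("CREATE TABLE {} (id INTEGER, name TEXT)".format(table))
    cur.executemany("INSERT INTO {} VALUES (?, ?)".format(table), rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
save_via_sql(conn, "people", [(1, "alice"), (2, "bob")])
print(conn.execute("SELECT COUNT(*) FROM people").fetchone()[0])  # 2
```

In the real method, each Spark partition opens its own connection and issues its own inserts, which is why the warning above about too many parallel partitions applies.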

json

public void json(String path)
Saves the content of the DataFrame in JSON format at the specified path. This is equivalent to:

   format("json").save(path)
 

Parameters:
path - the path to save the JSON output to
Since:
1.4.0

parquet

public void parquet(String path)
Saves the content of the DataFrame in Parquet format at the specified path. This is equivalent to:

   format("parquet").save(path)
 

Parameters:
path - the path to save the Parquet output to
Since:
1.4.0