Spark 3.4.0 ScalaDoc - org.apache.spark.sql.DataFrameWriter

final def !=(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def ##(): Int

Definition Classes: AnyRef → Any

final def ==(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def asInstanceOf[T0]: T0

Definition Classes: Any

def bucketBy(numBuckets: Int, colName: String, colNames: String*): DataFrameWriter[T]

Buckets the output by the given columns. If specified, the output is laid out on the file system similarly to Hive's bucketing scheme, but with a different bucket hash function, so it is not compatible with Hive's bucketing.

This applies to all file-based data sources (e.g. Parquet, JSON) starting with Spark 2.1.0.

Annotations: @varargs()
Since: 2.0
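A minimal sketch of a bucketed write, assuming a local SparkSession with the default in-memory catalog. The object name `BucketByExample`, the method `writeBucketed`, and the table names are hypothetical. As far as we know, bucketing in Spark 3.x is honored only by table writes such as saveAsTable; calling bucketBy before save(path) is rejected with an AnalysisException.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object BucketByExample {
  // Writes a small DataFrame as a bucketed Parquet table and returns the
  // number of rows read back from that table. Table names are illustrative.
  def writeBucketed(spark: SparkSession, table: String): Long = {
    import spark.implicits._
    val events = Seq((1, "click"), (2, "view"), (3, "click"), (4, "buy"))
      .toDF("user_id", "event")

    events.write
      .format("parquet")
      .bucketBy(4, "user_id")   // hash rows into 4 buckets on user_id
      .sortBy("user_id")        // optional: sort rows within each bucket
      .mode(SaveMode.Overwrite)
      .saveAsTable(table)       // bucketing needs a table write; save(path) rejects it

    spark.table(table).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bucketBy-sketch").master("local[*]").getOrCreate()
    try println(writeBucketed(spark, "bucketed_events")) finally spark.stop()
  }
}
```

The bucket count is fixed at write time, so readers that later join or aggregate on `user_id` can take advantage of the layout only if they use the same number of buckets.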

def clone(): AnyRef

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

def csv(path: String): Unit

Saves the content of the DataFrame in CSV format at the specified path. This is equivalent to:

format("csv").save(path)

The CSV-specific options for writing CSV files are listed under Data Source Option in the documentation for the version you use.

Since: 2.0.0
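A minimal sketch of a CSV write, assuming a local SparkSession and a scratch output path of your choosing (the default `/tmp/people_csv` is hypothetical, as are the object and method names). It sets the "header" CSV option so that column names are written, then reads the files back to count the data rows.

```scala
import org.apache.spark.sql.SparkSession

object CsvWriteExample {
  // Writes three rows as CSV with a header line, then reads the directory
  // back (treating the first line as a header) and returns the row count.
  def writeAndCount(spark: SparkSession, path: String): Long = {
    import spark.implicits._
    val people = Seq(("alice", 34), ("bob", 29), ("carol", 41)).toDF("name", "age")

    people.write
      .option("header", "true")  // emit column names as the first line of each file
      .mode("overwrite")
      .csv(path)                 // same as .format("csv").save(path)

    spark.read.option("header", "true").csv(path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-sketch").master("local[*]").getOrCreate()
    try println(writeAndCount(spark, args.headOption.getOrElse("/tmp/people_csv"))) finally spark.stop()
  }
}
```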

final def eq(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def equals(arg0: Any): Boolean

Definition Classes: AnyRef → Any

def finalize(): Unit

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( classOf[java.lang.Throwable] )

def format(source: String): DataFrameWriter[T]

Specifies the underlying output data source. Built-in options include "parquet", "json", etc.

Since: 1.4.0
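A minimal sketch of choosing the output source by its short name, assuming a local SparkSession; `FormatExample`, `writeWithFormats`, and the base directory are hypothetical. The same DataFrame is written once per format and read back with the matching reader.

```scala
import org.apache.spark.sql.SparkSession

object FormatExample {
  // Writes the same rows once per named source and reads each output back
  // with the matching reader, returning row counts keyed by source name.
  def writeWithFormats(spark: SparkSession, baseDir: String, formats: Seq[String]): Map[String, Long] = {
    import spark.implicits._
    val df = Seq((1L, "a"), (2L, "b")).toDF("id", "value")

    formats.map { fmt =>
      val path = s"$baseDir/$fmt"
      df.write.format(fmt).mode("overwrite").save(path)  // source picked by short name
      fmt -> spark.read.format(fmt).load(path).count()
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("format-sketch").master("local[*]").getOrCreate()
    try {
      println(writeWithFormats(spark, "/tmp/format_sketch", Seq("parquet", "json")))
    } finally {
      spark.stop()
    }
  }
}
```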

final def getClass(): Class[_]

Definition Classes: AnyRef → Any
Annotations: @native()

def hashCode(): Int

Definition Classes: AnyRef → Any
Annotations: @native()

def insertInto(tableName: String): Unit

Inserts the content of the DataFrame into the specified table. It requires that the schema of the DataFrame is the same as the schema of the table.

Since: 1.4.0

Note: Unlike saveAsTable, insertInto ignores the column names and just uses position-based resolution. For example:

scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
scala> Seq((3, 4)).toDF("j", "i").write.insertInto("t1")
scala> Seq((5, 6)).toDF("a", "b").write.insertInto("t1")
scala> sql("select * from t1").show
+---+---+
|  i|  j|
+---+---+
|  5|  6|
|  3|  4|
|  1|  2|
+---+---+

Because it inserts data into an existing table, format or options will be ignored.

Note: SaveMode.ErrorIfExists and SaveMode.Ignore behave as SaveMode.Append in insertInto, because insertInto is not a table-creating operation.
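Because insertInto matches columns by position, a DataFrame whose columns arrive in a different order can be realigned by name before inserting. A minimal sketch, assuming a local SparkSession with the in-memory catalog; `InsertByNameExample` and `insertByName` are hypothetical names, and `t1` mirrors the REPL example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object InsertByNameExample {
  // insertInto resolves columns by position, so reorder the incoming columns
  // to match the target table's declared order before inserting.
  def insertByName(spark: SparkSession, df: DataFrame, table: String): Unit = {
    val targetOrder: Array[String] = spark.table(table).columns
    df.select(targetOrder.map(col): _*)  // fails at analysis if a target column is missing
      .write
      .insertInto(table)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("insertInto-sketch").master("local[*]").getOrCreate()
    import spark.implicits._
    Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
    // Columns arrive as (j, i); insertByName realigns them so that i = 3 and j = 4.
    insertByName(spark, Seq((4, 3)).toDF("j", "i"), "t1")
    spark.table("t1").show()
    spark.stop()
  }
}
```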

final def isInstanceOf[T0]: Boolean

Definition Classes: Any

def jdbc(url: String, table: String, connectionProperties: Properties): Unit

Saves the content of the DataFrame to an external database table via JDBC. If the table already exists in the external database, the behavior of this function depends on the save mode specified by the mode function (by default, an exception is thrown).

Don't create too many partitions in parallel on a large cluster: each partition writes through its own database connection, so too many concurrent writers might crash your external database systems.

JDBC-specific options and parameters for storing tables via JDBC are documented under Data Source Option for the version you use.

table: Name of the table in the external database.
connectionProperties: JDBC database connection arguments, a list of arbitrary string tag/value pairs. Normally at least a "user" and a "password" property should be included. "batchsize" can be used to control the number of rows per insert. "isolationLevel" can be one of "NONE", "READ_COMMITTED", "READ_UNCOMMITTED", "REPEATABLE_READ", or "SERIALIZABLE", corresponding to the standard transaction isolation levels defined by JDBC's Connection object, with a default of "READ_UNCOMMITTED".

Since: 1.4.0
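A minimal sketch of the JDBC write path. The helper names `connectionProps` and `appendToTable`, the credentials, and any URL you pass (e.g. a hypothetical `jdbc:postgresql://db.example.com:5432/analytics`) are assumptions; actually writing requires a reachable database and its JDBC driver on the classpath. Coalescing to a few partitions is one way to follow the partition advice above, since each partition opens its own connection.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

object JdbcWriteExample {
  // Builds connection properties using the keys described above.
  // All values are placeholders; substitute real credentials.
  def connectionProps(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    props.setProperty("batchsize", "1000")               // rows per INSERT batch
    props.setProperty("isolationLevel", "READ_COMMITTED")
    props
  }

  // Appends df to a table that may already exist; with the default mode an
  // existing table would make the write throw instead.
  def appendToTable(df: DataFrame, url: String, table: String, props: Properties): Unit = {
    df.coalesce(4)                 // at most 4 partitions, so at most 4 concurrent connections
      .write
      .mode(SaveMode.Append)
      .jdbc(url, table, props)
  }
}
```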

def json(path: String): Unit

Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path. This is equivalent to:

format("json").save(path)

The JSON-specific options for writing JSON files are listed under Data Source Option in the documentation for the version you use.

Since: 1.4.0
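A minimal sketch illustrating the JSON Lines layout, assuming a local SparkSession and a scratch path; `JsonWriteExample` and `writeAndCheck` are hypothetical names. It reads the output once as raw text and once as JSON: because each row is written as one JSON object per line, both counts should match the number of rows written.

```scala
import org.apache.spark.sql.SparkSession

object JsonWriteExample {
  // Writes rows as JSON Lines and returns (raw text lines, rows parsed back).
  def writeAndCheck(spark: SparkSession, path: String): (Long, Long) = {
    import spark.implicits._
    val people = Seq(("alice", 34), ("bob", 29), ("carol", 41)).toDF("name", "age")

    people.write.mode("overwrite").json(path)  // same as .format("json").save(path)

    val rawLines = spark.read.text(path).count()    // one line per JSON object
    val parsedRows = spark.read.json(path).count()  // one row per parsed object
    (rawLines, parsedRows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("json-sketch").master("local[*]").getOrCreate()
    try println(writeAndCheck(spark, "/tmp/people_json")) finally spark.stop()
  }
}
```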

def mode(saveMode: String): DataFrameWriter[T]

Specifies the behavior when data or table already exists. Options include:

overwrite: overwrite the existing data.
append: append the data.
ignore: ignore the operation (i.e. no-op).
error or errorifexists: default option, throw an exception at runtime.

Since: 1.4.0
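A minimal sketch of the string form, assuming a local SparkSession and a scratch path (object and method names are hypothetical). It appends to a Parquet directory, then writes again with the default mode, which we expect to fail with an AnalysisException because the path already exists.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object ModeStringExample {
  // Returns (rows in the directory afterwards, whether the default-mode write failed).
  def appendThenDefault(spark: SparkSession, path: String): (Long, Boolean) = {
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("n")

    df.write.mode("overwrite").parquet(path)  // start from a clean directory
    df.write.mode("append").parquet(path)     // adds three more rows

    val failed =
      try { df.write.parquet(path); false }   // default mode: error / errorifexists
      catch { case _: AnalysisException => true }

    (spark.read.parquet(path).count(), failed)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mode-sketch").master("local[*]").getOrCreate()
    try println(appendThenDefault(spark, "/tmp/mode_sketch")) finally spark.stop()
  }
}
```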

def mode(saveMode: SaveMode): DataFrameWriter[T]

Specifies the behavior when data or table already exists. Options include:

SaveMode.Overwrite: overwrite the existing data.
SaveMode.Append: append the data.
SaveMode.Ignore: ignore the operation (i.e. no-op).
SaveMode.ErrorIfExists: throw an exception at runtime.

The default option is ErrorIfExists.

Since: 1.4.0
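A minimal sketch of the enum form using SaveMode.Ignore, assuming a local SparkSession and a scratch path (object and method names are hypothetical). The second write targets a path that already exists, so it should be skipped silently and the directory should still hold only the rows from the first write.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveModeIgnoreExample {
  // Writes two rows, then attempts to write three more with SaveMode.Ignore;
  // returns the number of rows actually present afterwards.
  def ignoreSecondWrite(spark: SparkSession, path: String): Long = {
    import spark.implicits._
    Seq("a", "b").toDF("letter").write.mode(SaveMode.Overwrite).json(path)

    // The path now exists, so this write is a no-op rather than an error.
    Seq("x", "y", "z").toDF("letter").write.mode(SaveMode.Ignore).json(path)

    spark.read.json(path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ignore-sketch").master("local[*]").getOrCreate()
    try println(ignoreSecondWrite(spark, "/tmp/ignore_sketch")) finally spark.stop()
  }
}
```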

final def ne(arg0: AnyRef): Boolean

Definition Classes: AnyRef

final def notify(): Unit

Definition Classes: AnyRef
Annotations: @native()

final def notifyAll(): Unit

Definition Classes: AnyRef
Annotations: @native()

def option(key: String, value: Double): DataFrameWriter[T]

Adds an output option for the underlying data source.