pyspark.pandas.DataFrame.spark.to_table

spark.to_table(name, format=None, mode='overwrite', partition_cols=None, index_col=None, **options)

Write the DataFrame into a Spark table. DataFrame.spark.to_table() is an alias of DataFrame.to_table().

Parameters

name: str, required

Table name in Spark.

format: str, optional

Specifies the output data source format. Some common ones are:

‘delta’
‘parquet’
‘orc’
‘json’
‘csv’

mode: str {‘append’, ‘overwrite’, ‘ignore’, ‘error’, ‘errorifexists’}, default ‘overwrite’

Specifies the behavior of the save operation when the table already exists.

‘append’: Append the new data to existing data.
‘overwrite’: Overwrite existing data.
‘ignore’: Silently ignore this operation if data already exists.
‘error’ or ‘errorifexists’: Throw an exception if data already exists.

partition_cols: str or list of str, optional, default None

Names of partitioning columns.

index_col: str or list of str, optional, default None

Column names to be used in Spark to represent pandas-on-Spark’s index. The index name in pandas-on-Spark is ignored. By default, the index is always lost.

options

Additional options passed directly to Spark.

Returns

None

See also

read_table
DataFrame.spark.to_spark_io
DataFrame.to_parquet

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='ME')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])
>>> df
                 date country  code
0 2012-01-31 12:00:00      KR     1
1 2012-02-29 12:00:00      US     2
2 2012-03-31 12:00:00      JP     3

>>> df.to_table('%s.my_table' % db, partition_cols='date')