Input/Output

Data Generator

range(start[, end, step, num_partitions])

Create a DataFrame with some range of numbers.

Spark Metastore Table

read_table(name[, index_col])

Read a Spark table and return a DataFrame.

DataFrame.to_table(name[, format, mode, …])

Write the DataFrame into a Spark table.

Delta Lake

read_delta(path[, version, timestamp, index_col])

Read a Delta Lake table on some file system and return a DataFrame.

DataFrame.to_delta(path[, mode, …])

Write the DataFrame out as a Delta Lake table.

Parquet

read_parquet(path[, columns, index_col, …])

Load a Parquet object from the file path, returning a DataFrame.

DataFrame.to_parquet(path[, mode, …])

Write the DataFrame out as a Parquet file or directory.

ORC

read_orc(path[, columns, index_col])

Load an ORC object from the file path, returning a DataFrame.

DataFrame.to_orc(path[, mode, …])

Write the DataFrame out as an ORC file or directory.

Generic Spark I/O

read_spark_io([path, format, schema, index_col])

Load a DataFrame from a Spark data source.

DataFrame.to_spark_io([path, format, mode, …])

Write the DataFrame out to a Spark data source.

Flat File / CSV

read_csv(path[, sep, header, names, …])

Read a CSV (comma-separated) file into a DataFrame or Series.

DataFrame.to_csv([path, sep, na_rep, …])

Write the object to a comma-separated values (CSV) file.

Clipboard

read_clipboard([sep])

Read text from the clipboard and pass it to read_csv.

DataFrame.to_clipboard([excel, sep])

Copy object to the system clipboard.

Excel

read_excel(io[, sheet_name, header, names, …])

Read an Excel file into a pandas-on-Spark DataFrame or Series.

DataFrame.to_excel(excel_writer[, …])

Write object to an Excel sheet.

JSON

read_json(path[, lines, index_col])

Read JSON data from the given path into a DataFrame.

DataFrame.to_json([path, compression, …])

Convert the object to a JSON string.

HTML

read_html(io[, match, flavor, header, …])

Read HTML tables into a list of DataFrame objects.

DataFrame.to_html([buf, columns, col_space, …])

Render a DataFrame as an HTML table.

SQL

read_sql_table(table_name, con[, schema, …])

Read a SQL database table into a DataFrame.

read_sql_query(sql, con[, index_col])

Read a SQL query into a DataFrame.

read_sql(sql, con[, index_col, columns])

Read a SQL query or database table into a DataFrame.