pyspark.sql.DataFrame.to_pandas_on_spark

DataFrame.to_pandas_on_spark(index_col=None)

Converts the existing DataFrame into a pandas-on-Spark DataFrame.

If a pandas-on-Spark DataFrame is converted to a Spark DataFrame and then back to pandas-on-Spark, it will lose the index information and the original index will be turned into a normal column.

This is only available if pandas (and PyArrow, which pandas-on-Spark also requires) is installed.

Parameters
index_col: str or list of str, optional, default: None

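Name of the Spark column to use as the index, or a list of names to build a MultiIndex. If None, the columns are kept as data columns and a default index is attached.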

See also

pyspark.pandas.frame.DataFrame.to_spark

Examples

>>> df.show()  
+----+----+
|Col1|Col2|
+----+----+
|   a|   1|
|   b|   2|
|   c|   3|
+----+----+
>>> df.to_pandas_on_spark()  
  Col1  Col2
0    a     1
1    b     2
2    c     3

We can specify the index columns.

>>> df.to_pandas_on_spark(index_col="Col1")
      Col2
Col1
a        1
b        2
c        3