pyspark.sql.DataFrame.cache

DataFrame.cache() → pyspark.sql.dataframe.DataFrame

Persists the DataFrame with the default storage level (MEMORY_AND_DISK).

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns
DataFrame

Cached DataFrame.

Notes

In Spark 2.0, the default storage level changed to MEMORY_AND_DISK to match Scala.

Examples

>>> df = spark.range(1)
>>> df.cache()
DataFrame[id: bigint]
>>> df.explain()
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- InMemoryTableScan ...