pyspark.sql.DataFrame.explain

DataFrame.explain(extended=None, mode=None)

Prints the (logical and physical) plans to the console for debugging purposes.

New in version 1.3.0.

Parameters
extended : bool, optional

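Default False. If False, prints only the physical plan; if True, prints both the logical and physical plans. If a string is passed and mode is not specified, the string is used as the mode (for example, df.explain("cost") behaves like df.explain(mode="cost")).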

mode : str, optional

Specifies the expected output format of the plans.

  • simple: Print only a physical plan.

  • extended: Print both logical and physical plans.

  • codegen: Print a physical plan and the generated code, if available.

  • cost: Print a logical plan and statistics, if available.

  • formatted: Split the explain output into two sections: a physical plan outline and node details.

Changed in version 3.0.0: Added optional argument mode to specify the expected output format of plans.
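The rules above for combining extended and mode can be summarized as a small pure-Python function. This is a hypothetical sketch, not PySpark's implementation: resolve_explain_mode and DOCUMENTED_MODES are illustrative names, rejecting a call that sets both arguments is an assumption, and the real method reports an unknown mode name with its own error rather than the ValueError used here.

```python
from typing import Optional, Union

# The five output formats listed in the mode parameter description.
DOCUMENTED_MODES = ("simple", "extended", "codegen", "cost", "formatted")


def resolve_explain_mode(
    extended: Union[bool, str, None] = None,
    mode: Optional[str] = None,
) -> str:
    """Return the output format explain() would use (illustrative sketch)."""
    # Assumption: setting both arguments at once is treated as an error.
    if extended is not None and mode is not None:
        raise TypeError("extended and mode should not be set together")
    if extended is None and mode is None:
        resolved = "simple"  # df.explain(): physical plan only
    elif isinstance(extended, bool):
        # df.explain(True) -> extended, df.explain(False) -> simple
        resolved = "extended" if extended else "simple"
    elif isinstance(extended, str):
        resolved = extended  # df.explain("cost") acts like mode="cost"
    elif isinstance(mode, str):
        resolved = mode  # df.explain(mode="formatted")
    else:
        raise TypeError("extended must be a bool or str, and mode must be a str")
    # Hypothetical validation against the documented list of modes.
    if resolved not in DOCUMENTED_MODES:
        raise ValueError(f"unknown explain mode: {resolved!r}")
    return resolved
```

For instance, resolve_explain_mode("cost") and resolve_explain_mode(mode="cost") both return "cost", mirroring the statement that a string passed as extended works as if mode had been given.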

Examples

>>> df.explain()
== Physical Plan ==
*(1) Scan ExistingRDD[age#0,name#1]
>>> df.explain(True)
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
...
>>> df.explain(mode="formatted")
== Physical Plan ==
* Scan ExistingRDD (1)
(1) Scan ExistingRDD [codegen id : 1]
Output [2]: [age#0, name#1]
...
>>> df.explain("cost")
== Optimized Logical Plan ==
...Statistics...
...
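Because explain() prints its plans instead of returning them, inspecting them programmatically means working with captured text. The sketch below splits such text into sections using the "== ... ==" header lines seen in the examples above. split_plan_sections is a hypothetical helper, and it assumes every section starts with a header of exactly that shape.

```python
import re
from typing import Dict, List, Optional

# Matches section headers such as "== Physical Plan ==".
_HEADER = re.compile(r"^== (.+) ==$")


def split_plan_sections(explain_text: str) -> Dict[str, str]:
    """Map each section title in explain() output to its body text."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    body: List[str] = []
    for line in explain_text.splitlines():
        match = _HEADER.match(line.strip())
        if match:
            # A new header closes the previous section, if any.
            if current is not None:
                sections[current] = "\n".join(body).strip()
            current, body = match.group(1), []
        elif current is not None:
            body.append(line)
    # Close the final section.
    if current is not None:
        sections[current] = "\n".join(body).strip()
    return sections
```

Applied to the output of df.explain() shown above, this yields a single "Physical Plan" entry. To obtain the text in the first place, one option is wrapping the call in contextlib.redirect_stdout, assuming your PySpark version writes the plan through Python's print; verify this on your version before relying on it.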