pyspark.sql.DataFrame.observe

DataFrame.observe(observation: Observation, *exprs: pyspark.sql.column.Column) → DataFrame

Define (named) metrics to observe on the DataFrame through an Observation instance.

The metrics are computed when an action runs on the returned DataFrame. A user can then retrieve them by accessing Observation.get.

New in version 3.3.0.

Parameters
observation : Observation

an Observation instance to obtain the metric.

exprs : list of Column

column expressions (Column) that define the metrics to observe.

Returns
DataFrame

the observed DataFrame.

Notes

This method does not support streaming datasets.

Examples

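>>> from pyspark.sql.functions import col, count, lit, max
>>> from pyspark.sql import Observation
>>> # Assumed sample data (two rows, maximum age 5); `spark` is an active SparkSession.
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
>>> observation = Observation("my metrics")
>>> observed_df = df.observe(observation, count(lit(1)).alias("count"), max(col("age")))
>>> observed_df.count()
2
>>> observation.get
{'count': 2, 'max(age)': 5}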
