pyspark.sql.DataFrame.semanticHash

DataFrame.semanticHash()

Returns a hash code of the logical query plan of this DataFrame.

New in version 3.1.0.

Notes

Unlike the standard hash code, this hash is calculated on a simplified query plan that ignores cosmetic differences such as attribute names.

This API is a developer API.

Examples

>>> spark.range(10).selectExpr("id as col0").semanticHash()
1855039936
>>> spark.range(10).selectExpr("id as col1").semanticHash()
1855039936
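
Because cosmetically different plans hash to the same value, the hash can serve as a grouping key for DataFrames that are likely to be semantically equal. The following is a minimal sketch of that idea, not part of the PySpark API: `group_by_semantic_hash`, `demo`, and the labels in `frames` are hypothetical names introduced here for illustration. The helper only assumes each object exposes `semanticHash()`, and `demo` assumes PySpark 3.1 or later and an existing `SparkSession`.

```python
from collections import defaultdict


def group_by_semantic_hash(named_frames):
    """Bucket labels by the semantic hash of their DataFrame.

    `named_frames` is a hypothetical mapping of label -> DataFrame.
    Any object with a semanticHash() method is accepted.
    """
    buckets = defaultdict(list)
    for name, frame in named_frames.items():
        buckets[frame.semanticHash()].append(name)
    return dict(buckets)


def demo(spark):
    """Illustrative usage; assumes `spark` is an active SparkSession."""
    frames = {
        "col0": spark.range(10).selectExpr("id as col0"),
        "col1": spark.range(10).selectExpr("id as col1"),
        "doubled": spark.range(10).selectExpr("id * 2 as col0"),
    }
    # Labels that share a bucket have plans with the same semantic hash.
    for hash_code, names in group_by_semantic_hash(frames).items():
        print(hash_code, names)
```

A hash is a single integer, so distinct plans could in principle collide. When an exact answer matters, `DataFrame.sameSemantics(other)` can be used to confirm that two DataFrames in the same bucket really have equivalent plans.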