pyspark.sql.functions.levenshtein

pyspark.sql.functions.levenshtein(left: ColumnOrName, right: ColumnOrName, threshold: Optional[int] = None) → pyspark.sql.column.Column

Computes the Levenshtein distance of the two given strings.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
left : Column or str

first column value.

right : Column or str

second column value.

threshold : int, optional

if set, the Levenshtein distance of the two given strings is returned only when it is less than or equal to the threshold; otherwise -1 is returned.

Returns
Column

Levenshtein distance as integer value.

Examples

>>> from pyspark.sql.functions import levenshtein
>>> df0 = spark.createDataFrame([('kitten', 'sitting',)], ['l', 'r'])
>>> df0.select(levenshtein('l', 'r').alias('d')).collect()
[Row(d=3)]
>>> df0.select(levenshtein('l', 'r', 2).alias('d')).collect()
[Row(d=-1)]