pyspark.sql.functions.regexp_replace

pyspark.sql.functions.regexp_replace(string: ColumnOrName, pattern: Union[str, pyspark.sql.column.Column], replacement: Union[str, pyspark.sql.column.Column]) → pyspark.sql.column.Column

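Replace all substrings of the specified string value that match the regular expression pattern with replacement. The pattern uses Java regular expression syntax, and the replacement may refer to capture groups as $1, $2, and so on.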

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.
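As a rough illustration of what regexp_replace computes for a single row, here is a plain-Python model. It is a sketch under stated assumptions, not Spark's implementation: regexp_replace_model is a hypothetical helper, Python's re module stands in for Java's regex engine (the two agree on simple patterns like the one in the examples but differ in places), and the replacement is treated as literal text with no group references. Like Spark, it returns null (None here) when any input is null.

```python
import re
from typing import Optional


def regexp_replace_model(value: Optional[str], pattern: Optional[str],
                         replacement: Optional[str]) -> Optional[str]:
    """Hypothetical plain-Python model of one row of regexp_replace.

    Assumptions: Python `re` instead of Java regex, and the replacement
    is inserted literally (no $1-style group references).
    """
    # Spark yields null when any argument is null; mirror that with None.
    if value is None or pattern is None or replacement is None:
        return None
    # re.sub replaces every non-overlapping match, scanning left to right.
    # A callable replacement keeps re.sub from interpreting backslashes.
    return re.sub(pattern, lambda _m: replacement, value)


print(regexp_replace_model("100-200", r"(\d+)", "--"))  # -----
print(regexp_replace_model(None, r"(\d+)", "--"))       # None
```

Both numbers in "100-200" match, so each becomes "--" while the separating "-" is kept, giving five dashes.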

Parameters
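string : Column or str

column name or column containing the string value

pattern : Column or str

column containing the regexp pattern, or a str holding the pattern itself (a str is used as a literal pattern, not as a column name)

replacement : Column or str

column containing the replacement, or a str holding the replacement text itself (a str is used as a literal, not as a column name)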
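The difference between Column and str arguments for pattern and replacement can be modeled in plain Python. In this hypothetical sketch (Python's re standing in for Java regex, replacements treated literally, and the rows invented for illustration), Column arguments mean each row supplies its own pattern and replacement, while str arguments apply one literal pattern and replacement to every row.

```python
import re

# Hypothetical rows shaped like the doctest DataFrame: (str, pattern, replacement).
rows = [
    ("100-200", r"(\d+)", "--"),
    ("a1b22c333", r"\d{2,}", "#"),
]

# Column arguments: the pattern and replacement are read from each row.
per_row = [re.sub(p, lambda _m: r, s) for s, p, r in rows]

# str arguments: one literal pattern and replacement shared by all rows.
shared = [re.sub(r"\d+", lambda _m: "N", s) for s, _, _ in rows]

print(per_row)  # ['-----', 'a1b#c#']
print(shared)   # ['N-N', 'aNbNcN']
```

In the second row, \d{2,} skips the single digit "1" and replaces only "22" and "333".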

Returns
Column

string with every substring that matches pattern replaced.

Examples

>>> from pyspark.sql.functions import col, regexp_replace
>>> df = spark.createDataFrame([("100-200", r"(\d+)", "--")], ["str", "pattern", "replacement"])
>>> df.select(regexp_replace('str', r'(\d+)', '--').alias('d')).collect()
[Row(d='-----')]
>>> df.select(regexp_replace("str", col("pattern"), col("replacement")).alias('d')).collect()
[Row(d='-----')]
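The capture group in (\d+) above is never referenced, but the replacement can use it: Spark follows Java's Matcher replacement rules, where $1 refers to the first group and a backslash makes the next character literal, so regexp_replace('str', r'(\d+)-(\d+)', '$2-$1') on the example row yields '200-100'. Python's re.sub spells group references as \g<1> instead. When prototyping a replacement in plain Python, a small translator like the hypothetical to_python_replacement below can bridge the two; it is a sketch that deliberately covers only single-digit $n references and backslash escapes.

```python
import re

_DIGITS = "0123456789"


def to_python_replacement(java_rep: str) -> str:
    """Hypothetical helper: translate a simple Java-style replacement to re.sub syntax.

    Covers only $0-$9 and backslash escapes; ${name} groups and multi-digit
    group numbers, which Java also accepts, are out of scope.
    """
    out = []
    i = 0
    while i < len(java_rep):
        ch = java_rep[i]
        nxt = java_rep[i + 1] if i + 1 < len(java_rep) else ""
        if ch == "\\" and nxt:
            # Java: a backslash makes the next character literal.
            # A literal backslash must be doubled for re.sub.
            out.append("\\\\" if nxt == "\\" else nxt)
            i += 2
        elif ch == "$" and nxt and nxt in _DIGITS:
            # Java group reference $n becomes Python's \g<n>.
            out.append(f"\\g<{nxt}>")
            i += 2
        else:
            # Copy everything else; a lone "$" or trailing backslash, which
            # Java would reject, is kept literally in this sketch.
            out.append("\\\\" if ch == "\\" else ch)
            i += 1
    return "".join(out)


print(re.sub(r"(\d+)-(\d+)", to_python_replacement("$2-$1"), "100-200"))  # 200-100
```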