pyspark.sql.functions.regexp_extract

pyspark.sql.functions.regexp_extract(str: ColumnOrName, pattern: str, idx: int) → pyspark.sql.column.Column[source]

Extract a specific group matched by the Java regex regexp, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
strColumn or str

target column to work on.

patternstr

regex pattern to apply.

idxint

matched group id.

Returns
Column

matched value specified by idx group id.

Examples

>>> df = spark.createDataFrame([('100-200',)], ['str'])
>>> df.select(regexp_extract('str', r'(\d+)-(\d+)', 1).alias('d')).collect()
[Row(d='100')]
>>> df = spark.createDataFrame([('foo',)], ['str'])
>>> df.select(regexp_extract('str', r'(\d+)', 1).alias('d')).collect()
[Row(d='')]
>>> df = spark.createDataFrame([('aaaac',)], ['str'])
>>> df.select(regexp_extract('str', '(a+)(b)?(c)', 2).alias('d')).collect()
[Row(d='')]