pyspark.sql.functions.regexp_extract(str: ColumnOrName, pattern: str, idx: int) → pyspark.sql.column.Column

Extract a specific group matched by a Java regex from the specified string column. If the regex does not match, or the specified group does not match, an empty string is returned.

New in version 1.5.0.


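>>> from pyspark.sql.functions import regexp_extract
>>> df = spark.createDataFrame([('100-200',)], ['str'])
>>> df.select(regexp_extract('str', r'(\d+)-(\d+)', 1).alias('d')).collect()
[Row(d='100')]
>>> df = spark.createDataFrame([('foo',)], ['str'])
>>> df.select(regexp_extract('str', r'(\d+)', 1).alias('d')).collect()
[Row(d='')]
>>> df = spark.createDataFrame([('aaaac',)], ['str'])
>>> df.select(regexp_extract('str', '(a+)(b)?(c)', 2).alias('d')).collect()
[Row(d='')]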
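
The empty-string behavior described above (no match at all, or a match in which the requested group did not participate) can be sketched in plain Python without a Spark session. This is only an illustration: the helper name `regexp_extract_py` is hypothetical, and it uses Python's `re` module as a stand-in for Java regex, whose syntax differs in places. It also does not try to reproduce how Spark treats a group index larger than the number of groups in the pattern.

```python
import re


def regexp_extract_py(s: str, pattern: str, idx: int) -> str:
    """Hypothetical pure-Python stand-in for the per-row semantics.

    Assumption: Python's `re` approximates the Java regex engine well
    enough for simple patterns like the ones in the examples.
    """
    # Search anywhere in the string for the first match of the pattern.
    m = re.search(pattern, s)
    if m is None:
        # The regex did not match at all.
        return ""
    # An optional group that did not participate in the match is None.
    group = m.group(idx)
    return group if group is not None else ""


print(repr(regexp_extract_py("100-200", r"(\d+)-(\d+)", 1)))  # '100'
print(repr(regexp_extract_py("foo", r"(\d+)", 1)))            # ''
print(repr(regexp_extract_py("aaaac", "(a+)(b)?(c)", 2)))     # ''
```

The third call mirrors the last example: the pattern matches `aaaac` as a whole, but the optional group `(b)?` captures nothing, so the result is an empty string rather than a null.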