pyspark.sql.functions.substr

pyspark.sql.functions.substr(str: ColumnOrName, pos: ColumnOrName, len: Optional[ColumnOrName] = None) → pyspark.sql.column.Column

Returns the substring of str that starts at pos and has length len, or the slice of a byte array that starts at pos and has length len.

New in version 3.5.0.

Parameters

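str (Column or str): The input column, containing strings or byte arrays.
pos (Column or str): The column giving the position in str at which the substring starts. Positions are 1-based, as the examples show.
len (Column or str, optional): The column giving the length of the substring. If omitted, the substring runs to the end of str.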

Examples

>>> import pyspark.sql.functions as sf
>>> spark.createDataFrame(
...     [("Spark SQL", 5, 1,)], ["a", "b", "c"]
... ).select(sf.substr("a", "b", "c")).show()
+---------------+
|substr(a, b, c)|
+---------------+
|              k|
+---------------+

>>> import pyspark.sql.functions as sf
>>> spark.createDataFrame(
...     [("Spark SQL", 5, 1,)], ["a", "b", "c"]
... ).select(sf.substr("a", "b")).show()
+------------------------+
|substr(a, b, 2147483647)|
+------------------------+
|                   k SQL|
+------------------------+
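
The indexing in the two examples above can be modeled in plain Python. This is only a sketch under stated assumptions, not Spark's implementation: `substr_sketch` is a hypothetical helper, it handles only `pos >= 1` and a non-negative length, and it works on Python strings rather than columns.

```python
from typing import Optional


def substr_sketch(s: str, pos: int, length: Optional[int] = None) -> str:
    """Hypothetical pure-Python model of substr for pos >= 1 only."""
    if pos < 1:
        # Assumption: other positions are outside the scope of this sketch.
        raise ValueError("this sketch only models pos >= 1")
    if length is not None and length < 0:
        raise ValueError("this sketch only models non-negative lengths")

    start = pos - 1  # convert the 1-based position to a 0-based index
    if length is None:
        # No length given: take everything from pos to the end of the string.
        return s[start:]
    return s[start:start + length]


# Mirrors the two examples above: pos=5 in "Spark SQL" is the letter "k".
print(substr_sketch("Spark SQL", 5, 1))
print(substr_sketch("Spark SQL", 5))
```

The only step that differs from ordinary Python slicing is the shift from a 1-based position to a 0-based index.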
