pyspark.sql.functions.from_csv

pyspark.sql.functions.from_csv(col: ColumnOrName, schema: Union[pyspark.sql.column.Column, str], options: Optional[Dict[str, str]] = None) → pyspark.sql.column.Column

Parses a column containing a CSV string into a row with the specified schema. Returns null if the string cannot be parsed.

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

a column, or the name of a column, containing CSV strings

schema : Column or str

a column, or Python string literal with schema in DDL format, to use when parsing the CSV column.

options : dict, optional

options to control parsing. Accepts the same options as the CSV data source. See Data Source Option for the version you use.

Returns
Column

a column of parsed CSV values

Examples

>>> from pyspark.sql.functions import from_csv, schema_of_csv
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(from_csv(df.value, "a INT, b INT, c INT").alias("csv")).collect()
[Row(csv=Row(a=1, b=2, c=3))]
>>> value = data[0][0]
>>> df.select(from_csv(df.value, schema_of_csv(value)).alias("csv")).collect()
[Row(csv=Row(_c0=1, _c1=2, _c2=3))]
>>> data = [("   abc",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'ignoreLeadingWhiteSpace': True}
>>> df.select(from_csv(df.value, "s string", options).alias("csv")).collect()
[Row(csv=Row(s='abc'))]