pyspark.sql.functions.from_csv

pyspark.sql.functions.from_csv(col, schema, options=None)

Parses a column containing a CSV string into a row with the specified schema. Returns null if the string cannot be parsed.
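The behavior described above can be modeled in plain Python to make "parse into a row with a schema" concrete. The helper `parse_csv_row` and the list-of-pairs schema representation below are hypothetical and are not part of PySpark. The model is deliberately simplified: it treats a field-count mismatch or a failed type conversion as unparseable and returns `None` for the whole row. Spark's real handling of malformed input also depends on parser options such as `mode`.

```python
import csv
from typing import Any, Callable, Optional

# Hypothetical schema representation: ordered (field name, converter) pairs.
Schema = list[tuple[str, Callable[[str], Any]]]


def parse_csv_row(line: str, schema: Schema) -> Optional[dict[str, Any]]:
    """Parse one CSV line into a dict keyed by field name, or None if unparseable."""
    # csv.reader handles quoting; only the first record of the line is used.
    fields = next(csv.reader([line]), [])
    if len(fields) != len(schema):
        return None  # simplification: any field-count mismatch is unparseable
    row = {}
    for (name, convert), raw in zip(schema, fields):
        try:
            row[name] = convert(raw)
        except ValueError:
            return None  # a field could not be converted to its declared type
    return row


schema = [("a", int), ("b", int), ("c", int)]
print(parse_csv_row("1,2,3", schema))  # {'a': 1, 'b': 2, 'c': 3}
print(parse_csv_row("1,x,3", schema))  # None
```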

New in version 3.0.0.

Parameters
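col : Column or str

a string column in CSV format, or the name of such a column

schema : Column or str

a schema string in DDL format (for example, "a INT, b INT"), or a Column that evaluates to one (for example, the result of schema_of_csv), to use when parsing the CSV column.

options : dict, optional

options that control parsing. Accepts the same options as the CSV data source; see the CSV "Data Source Option" documentation for the Spark version you use.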

Examples

>>> from pyspark.sql.functions import from_csv, schema_of_csv
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(from_csv(df.value, "a INT, b INT, c INT").alias("csv")).collect()
[Row(csv=Row(a=1, b=2, c=3))]
>>> value = data[0][0]
>>> df.select(from_csv(df.value, schema_of_csv(value)).alias("csv")).collect()
[Row(csv=Row(_c0=1, _c1=2, _c2=3))]
>>> data = [("   abc",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'ignoreLeadingWhiteSpace': True}
>>> df.select(from_csv(df.value, "s string", options).alias("csv")).collect()
[Row(csv=Row(s='abc'))]
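The schema strings in these examples ("a INT, b INT, c INT" and "s string") are flat DDL field lists. As a rough sketch of how such a string maps to named, typed fields, the hypothetical helper `parse_simple_ddl` below splits it into (name, converter) pairs. It is not Spark's DDL parser: it assumes no nested or parameterized types (a type such as DECIMAL(10,2) contains a comma and would break the naive split), and its type table covers only a few names. Type names are matched case-insensitively, which is why "string" and "INT" both work.

```python
from typing import Any, Callable

# Hypothetical mapping from a few DDL type names to Python converters.
# Spark supports many more types; this table only covers simple cases.
_CONVERTERS: dict[str, Callable[[str], Any]] = {
    "INT": int,
    "BIGINT": int,
    "DOUBLE": float,
    "STRING": str,
}


def parse_simple_ddl(ddl: str) -> list[tuple[str, Callable[[str], Any]]]:
    """Split a flat DDL string such as 'a INT, b INT' into (name, converter) pairs."""
    fields = []
    for part in ddl.split(","):
        name, type_name = part.split()  # expects exactly 'name TYPE'
        fields.append((name, _CONVERTERS[type_name.upper()]))
    return fields


schema = parse_simple_ddl("a INT, b INT, c INT")
print([name for name, _ in schema])  # ['a', 'b', 'c']
```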