StructType¶
-
class pyspark.sql.types.StructType(fields: Optional[List[pyspark.sql.types.StructField]] = None)[source]¶

Struct type, consisting of a list of StructField.

This is the data type representing a Row.

Iterating a StructType will iterate over its StructFields. A contained StructField can be accessed by its name or position.

Examples
>>> from pyspark.sql.types import *
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct1["f1"]
StructField('f1', StringType(), True)
>>> struct1[0]
StructField('f1', StringType(), True)
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", CharType(10), True)])
>>> struct2 = StructType([StructField("f1", CharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct2 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", IntegerType(), False)])
>>> struct1 == struct2
False
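As noted above, iterating a StructType yields its contained StructField objects in order; a minimal sketch (field names chosen only for illustration):

>>> struct = StructType([StructField("f1", StringType(), True),
...     StructField("f2", IntegerType(), False)])
>>> [f.name for f in struct]
['f1', 'f2']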
The example below demonstrates how to create a DataFrame based on a struct created using StructType and StructField:
>>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("languagesSkills", ArrayType(StringType())),
... ])
>>> df = spark.createDataFrame(data=data, schema=schema)
>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- languagesSkills: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> df.show()
+-----+---------------+
| name|languagesSkills|
+-----+---------------+
|Alice|  [Java, Scala]|
|  Bob|[Python, Scala]|
+-----+---------------+
Methods

add(field[, data_type, nullable, metadata])
    Construct a StructType by adding new elements to it, to define the schema.

fieldNames()
    Returns all field names in a list.

fromInternal(obj)
    Converts an internal SQL object into a native Python object.

fromJson(json)
    Constructs StructType from a schema defined in JSON format.

json()

needConversion()
    Does this type need conversion between Python object and internal SQL object.

toInternal(obj)
    Converts a Python object into an internal SQL object.

typeName()

Methods Documentation
-
add(field: Union[str, pyspark.sql.types.StructField], data_type: Union[str, pyspark.sql.types.DataType, None] = None, nullable: bool = True, metadata: Optional[Dict[str, Any]] = None) → pyspark.sql.types.StructType[source]¶

Construct a StructType by adding new elements to it, to define the schema. The method accepts either:

- A single parameter which is a StructField object.
- Between 2 and 4 parameters as (name, data_type, nullable (optional), metadata (optional)). The data_type parameter may be either a String or a DataType object.
- Parameters
    - field : str or StructField
        Either the name of the field or a StructField object
    - data_type : DataType, optional
        If present, the DataType of the StructField to create
    - nullable : bool, optional
        Whether the field to add should be nullable (default True)
    - metadata : dict, optional
        Any additional metadata (default None)
- Returns
    StructType
Examples
>>> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", StringType(), True, None)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add(StructField("f1", StringType(), True))
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add("f1", "string", True)
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
-
fieldNames() → List[str][source]¶

Returns all field names in a list.
Examples
>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
-
fromInternal(obj: Tuple) → pyspark.sql.types.Row[source]¶

Converts an internal SQL object into a native Python object.
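A minimal sketch of this conversion, assuming a schema with a single string field (which needs no per-field conversion), rebuilding a Row from the tuple used internally:

>>> from pyspark.sql.types import Row, StringType, StructField, StructType
>>> struct = StructType([StructField("name", StringType(), True)])
>>> struct.fromInternal(("Alice",))
Row(name='Alice')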
-
classmethod fromJson(json: Dict[str, Any]) → pyspark.sql.types.StructType[source]¶

Constructs StructType from a schema defined in JSON format.

Below is a JSON schema it must adhere to:
{ "title":"StructType", "description":"Schema of StructType in json format", "type":"object", "properties":{ "fields":{ "description":"Array of struct fields", "type":"array", "items":{ "type":"object", "properties":{ "name":{ "description":"Name of the field", "type":"string" }, "type":{ "description": "Type of the field. Can either be another nested StructType or primitive type", "type":"object/string" }, "nullable":{ "description":"If nulls are allowed", "type":"boolean" }, "metadata":{ "description":"Additional metadata to supply", "type":"object" }, "required":[ "name", "type", "nullable", "metadata" ] } } } } }
- Parameters
    - json : dict or a dict-like object e.g. JSON object
        This “dict” must have a “fields” key that returns an array of fields, each of which must have specific keys (name, type, nullable, metadata).
- Returns
    StructType
Examples
>>> json_str = '''
... {
...     "fields": [
...         {
...             "metadata": {},
...             "name": "Person",
...             "nullable": true,
...             "type": {
...                 "fields": [
...                     {
...                         "metadata": {},
...                         "name": "name",
...                         "nullable": false,
...                         "type": "string"
...                     },
...                     {
...                         "metadata": {},
...                         "name": "surname",
...                         "nullable": false,
...                         "type": "string"
...                     }
...                 ],
...                 "type": "struct"
...             }
...         }
...     ],
...     "type": "struct"
... }
... '''
>>> import json
>>> scheme = StructType.fromJson(json.loads(json_str))
>>> scheme.simpleString()
'struct<Person:struct<name:string,surname:string>>'
-
json() → str¶
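json() serializes the schema to the JSON representation that fromJson accepts; a minimal round-trip sketch, assuming the standard serialization inherited from DataType:

>>> import json
>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> StructType.fromJson(json.loads(struct.json())) == struct
True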
-
needConversion() → bool[source]¶

Does this type need conversion between Python object and internal SQL object.
This is used to avoid the unnecessary conversion for ArrayType/MapType/StructType.
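For StructType the method reports True, since Row and namedtuple values are converted to plain tuples internally; a minimal sketch:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> StructType([StructField("f1", StringType(), True)]).needConversion()
True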
-
classmethod typeName() → str¶
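typeName() returns the lowercase name of the type as it appears in schema strings; a minimal sketch:

>>> StructType.typeName()
'struct'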