StructType
class pyspark.sql.types.StructType(fields: Optional[List[pyspark.sql.types.StructField]] = None)[source]
Struct type, consisting of a list of StructField.

This is the data type representing a Row.

Iterating a StructType will iterate over its StructFields. A contained StructField can be accessed by its name or position.

Examples

>>> from pyspark.sql.types import *
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct1["f1"]
StructField('f1', StringType(), True)
>>> struct1[0]
StructField('f1', StringType(), True)

>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", CharType(10), True)])
>>> struct2 = StructType([StructField("f1", CharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct2 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", IntegerType(), False)])
>>> struct1 == struct2
False

The example below demonstrates how to create a DataFrame based on a struct created using StructType and StructField:

>>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("languagesSkills", ArrayType(StringType())),
... ])
>>> df = spark.createDataFrame(data=data, schema=schema)
>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- languagesSkills: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> df.show()
+-----+---------------+
| name|languagesSkills|
+-----+---------------+
|Alice|  [Java, Scala]|
|  Bob|[Python, Scala]|
+-----+---------------+

Methods

add(field[, data_type, nullable, metadata])
    Construct a StructType by adding new elements to it, to define the schema.
fieldNames()
    Returns all field names in a list.
fromInternal(obj)
    Converts an internal SQL object into a native Python object.
fromJson(json)
    Constructs StructType from a schema defined in JSON format.
json()
needConversion()
    Does this type need conversion between Python object and internal SQL object.
toInternal(obj)
    Converts a Python object into an internal SQL object.
typeName()

Methods Documentation
add(field: Union[str, pyspark.sql.types.StructField], data_type: Union[str, pyspark.sql.types.DataType, None] = None, nullable: bool = True, metadata: Optional[Dict[str, Any]] = None) → pyspark.sql.types.StructType[source]
Construct a StructType by adding new elements to it, to define the schema. The method accepts either:

a) A single parameter which is a StructField object.

b) Between 2 and 4 parameters as (name, data_type, nullable (optional), metadata (optional)). The data_type parameter may be either a String or a DataType object.
Parameters
field : str or StructField
    Either the name of the field or a StructField object.
data_type : DataType, optional
    If present, the DataType of the StructField to create.
nullable : bool, optional
    Whether the field to add should be nullable (default True).
metadata : dict, optional
    Any additional metadata (default None).
 
Returns

StructType
Examples

>>> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", StringType(), True, None)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add(StructField("f1", StringType(), True))
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add("f1", "string", True)
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
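As a usage note, a schema assembled through chained add calls can be handed straight to createDataFrame. A minimal sketch (the column names and sample rows are illustrative, and a running SparkSession named spark is assumed, as in the class-level example):

>>> from pyspark.sql.types import IntegerType, StringType, StructType
>>> schema = (StructType()
...     .add("name", StringType(), False)
...     .add("age", IntegerType(), True))
>>> df = spark.createDataFrame([("Alice", 30), ("Bob", None)], schema)
>>> df.printSchema()
root
 |-- name: string (nullable = false)
 |-- age: integer (nullable = true)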
fieldNames() → List[str][source]
Returns all field names in a list.

Examples

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
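fieldNames is also handy on the schema of an existing DataFrame, since df.schema is a StructType; a small sketch (the DDL schema string and data are illustrative, and a SparkSession named spark is assumed):

>>> df = spark.createDataFrame([("Alice", 30)], "name string, age int")
>>> df.schema.fieldNames()
['name', 'age']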
fromInternal(obj: Tuple) → pyspark.sql.types.Row[source]
Converts an internal SQL object into a native Python object.
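No example is given above; as a hedged sketch, the internal representation of a struct value is a plain tuple, which fromInternal turns back into a Row (assuming no contained field needs its own conversion):

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fromInternal(("a",))
Row(f1='a')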
classmethod fromJson(json: Dict[str, Any]) → pyspark.sql.types.StructType[source]
Constructs StructType from a schema defined in JSON format.

Below is a JSON schema it must adhere to:

{
  "title": "StructType",
  "description": "Schema of StructType in json format",
  "type": "object",
  "properties": {
    "fields": {
      "description": "Array of struct fields",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "description": "Name of the field",
            "type": "string"
          },
          "type": {
            "description": "Type of the field. Can either be another nested StructType or primitive type",
            "type": "object/string"
          },
          "nullable": {
            "description": "If nulls are allowed",
            "type": "boolean"
          },
          "metadata": {
            "description": "Additional metadata to supply",
            "type": "object"
          }
        },
        "required": ["name", "type", "nullable", "metadata"]
      }
    }
  }
}

Parameters
json : dict or a dict-like object, e.g. a JSON object
    This "dict" must have a "fields" key that returns an array of fields, each of which must have specific keys (name, type, nullable, metadata).
Returns

StructType
Examples

>>> json_str = '''
... {
...     "fields": [
...         {
...             "metadata": {},
...             "name": "Person",
...             "nullable": true,
...             "type": {
...                 "fields": [
...                     {
...                         "metadata": {},
...                         "name": "name",
...                         "nullable": false,
...                         "type": "string"
...                     },
...                     {
...                         "metadata": {},
...                         "name": "surname",
...                         "nullable": false,
...                         "type": "string"
...                     }
...                 ],
...                 "type": "struct"
...             }
...         }
...     ],
...     "type": "struct"
... }
... '''
>>> import json
>>> scheme = StructType.fromJson(json.loads(json_str))
>>> scheme.simpleString()
'struct<Person:struct<name:string,surname:string>>'
json() → str
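The method carries no rendered docstring here (it is inherited from DataType). As an illustrative sketch, it serializes the schema to a JSON string whose parsed form fromJson accepts back, so the two round-trip:

>>> import json
>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> StructType.fromJson(json.loads(struct.json())) == struct
True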
needConversion() → bool[source]
Does this type need conversion between Python object and internal SQL object.

This is used to avoid the unnecessary conversion for ArrayType/MapType/StructType.
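As an illustrative check, StructType always reports that conversion is needed (Row objects must be converted to internal tuples), while a primitive type such as StringType does not:

>>> from pyspark.sql.types import StringType, StructType
>>> StructType().needConversion()
True
>>> StringType().needConversion()
False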
classmethod typeName() → str
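typeName is likewise inherited from DataType; for this class it returns the lowercase name used in simple strings and JSON output, as a quick check illustrates:

>>> StructType.typeName()
'struct'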
 