
Avro processing functions defined for Column.

Usage

from_avro(x, ...)

to_avro(x, ...)

# S4 method for characterOrColumn
from_avro(x, jsonFormatSchema, ...)

# S4 method for characterOrColumn
to_avro(x, jsonFormatSchema = NULL)

Arguments

x

Column to compute on.

...

additional argument(s) passed as parser options.

jsonFormatSchema

character. Avro schema as a JSON-formatted string.
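As a rough illustration of what a jsonFormatSchema value looks like, the sketch below assembles a record schema string in base R. The helper name avro_record_schema and its arguments are hypothetical (they are not part of SparkR), and it assumes every field should be a nullable union of a primitive Avro type; it does no escaping of field names.

```r
# Hypothetical helper (not part of SparkR): build an Avro record schema as a
# JSON string from a named character vector mapping field names to Avro
# primitive types. Each field is declared as a [type, "null"] union.
avro_record_schema <- function(name, fields, namespace = "example.avro") {
  field_json <- sprintf(
    '{"type": ["%s", "null"], "name": "%s"}',
    unname(fields), names(fields)
  )
  sprintf(
    '{"type": "record", "namespace": "%s", "name": "%s", "fields": [%s]}',
    namespace, name, paste(field_json, collapse = ", ")
  )
}

# A two-field schema; the resulting string can be passed as jsonFormatSchema.
schema <- avro_record_schema(
  "Iris",
  c(Sepal_Length = "double", Species = "string")
)
```

Building the string from a named vector keeps the field list in one place, which may be easier to maintain than a hand-written JSON literal.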

Details

from_avro: Converts a binary column in Avro format into its corresponding Catalyst value. The specified schema must match the data being read; otherwise the behavior is undefined: the call may fail or return an arbitrary result. To deserialize the data with a compatible, evolved schema, set the expected Avro schema via the avroSchema option, passed as an additional argument.
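The evolved-schema case might look roughly like the following sketch. The record and field names (User, name, country), the column name payload, and the helper read_evolved are illustrative assumptions, not part of SparkR. Calling the helper requires an active Spark session with the Avro module deployed and a SparkDataFrame that has a binary payload column.

```r
# Writer schema: the schema the binary payload was actually encoded with.
writer_schema <- paste0(
  '{"type": "record", "name": "User", "fields": [',
  '{"name": "name", "type": "string"}]}'
)

# Reader schema: a compatible, evolved version that adds a field with a default.
reader_schema <- paste0(
  '{"type": "record", "name": "User", "fields": [',
  '{"name": "name", "type": "string"},',
  '{"name": "country", "type": "string", "default": "unknown"}]}'
)

# Hypothetical helper: decode the binary column "payload" of a SparkDataFrame.
# The actual (writer) schema is still passed as jsonFormatSchema; the evolved
# schema is forwarded through `...` as the avroSchema option.
read_evolved <- function(df) {
  decoded <- SparkR::from_avro(
    SparkR::column("payload"),
    writer_schema,
    avroSchema = reader_schema
  )
  SparkR::select(df, decoded)
}
```

Under these assumptions, the decoded struct would follow the reader schema, so it would be expected to contain the added country field as well.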

to_avro: Converts a column into Avro-format binary.
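When the output schema should be stated explicitly rather than derived from the column type, jsonFormatSchema can be supplied to to_avro. The sketch below assumes a SparkDataFrame with a nullable string column named Species; the helper encode_species and the variable species_schema are hypothetical names, and running the helper needs an active Spark session with the Avro module deployed.

```r
# Avro schema for a nullable string value, written as a JSON union.
species_schema <- '["string", "null"]'

# Hypothetical helper: encode the "Species" column as Avro binary using the
# explicit schema above instead of the schema Spark would derive on its own.
encode_species <- function(df) {
  encoded <- SparkR::to_avro(SparkR::column("Species"), species_schema)
  SparkR::select(df, encoded)
}
```

Passing the same schema string later to from_avro keeps the writer and reader sides in agreement.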

Note

Avro has been a built-in but external data source module since Spark 2.4. Deploy the application as described in the deployment section of the "Apache Avro Data Source Guide".

from_avro since 3.1.0

to_avro since 3.1.0

Examples

if (FALSE) {
df <- createDataFrame(iris)
schema <- paste(
  c(
    '{"type": "record", "namespace": "example.avro", "name": "Iris", "fields": [',
    '{"type": ["double", "null"], "name": "Sepal_Length"},',
    '{"type": ["double", "null"], "name": "Sepal_Width"},',
    '{"type": ["double", "null"], "name": "Petal_Length"},',
    '{"type": ["double", "null"], "name": "Petal_Width"},',
    '{"type": ["string", "null"], "name": "Species"}]}'
  ),
  collapse = "\n"
)

df_serialized <- select(
  df,
  alias(to_avro(alias(struct(column("*")), "fields")), "payload")
)

df_deserialized <- select(
  df_serialized,
  from_avro(df_serialized$payload, schema)
)

head(df_deserialized)
}