Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions; a short sketch follows this package list.

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

    Definition Classes
    apache
  • package sql

    Allows the execution of relational queries, including those expressed in SQL using Spark.

    Definition Classes
    spark
  • package api

    Contains API classes that are specific to a single language (i.e. Java).

    Definition Classes
    sql
  • package artifact
    Definition Classes
    sql
  • package avro
    Definition Classes
    sql
  • package catalog
    Definition Classes
    sql
  • package catalyst
    Definition Classes
    sql
  • package columnar
    Definition Classes
    sql
  • CachedBatch
  • CachedBatchSerializer
  • SimpleMetricsCachedBatch
  • SimpleMetricsCachedBatchSerializer
  • package connector
    Definition Classes
    sql
  • package exceptions
    Definition Classes
    sql
  • package expressions
    Definition Classes
    sql
  • package jdbc
    Definition Classes
    sql
  • package ml
    Definition Classes
    sql
  • package scripting
    Definition Classes
    sql
  • package sources

    A set of APIs for adding data sources to Spark SQL.

    Definition Classes
    sql
  • package streaming
    Definition Classes
    sql
  • package types

    Contains a type system for attributes produced by relations, including complex types like structs, arrays and maps.

    Definition Classes
    sql
  • package util
    Definition Classes
    sql
  • package vectorized
    Definition Classes
    sql
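
For illustration, here is a minimal sketch of the implicit-conversion mechanism described under the spark package above. The object name and data are made up; the point is that an RDD[(Int, Int)] picks up groupByKey from org.apache.spark.rdd.PairRDDFunctions without any explicit wrapping.

    import org.apache.spark.{SparkConf, SparkContext}

    object PairOpsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("pair-ops-sketch").setMaster("local[*]"))
        // RDD[(Int, Int)]: groupByKey is not defined on RDD itself; it is
        // supplied by PairRDDFunctions through an implicit conversion.
        val pairs = sc.parallelize(Seq((1, 10), (1, 11), (2, 20)))
        pairs.groupByKey().collect().foreach(println)
        sc.stop()
      }
    }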

package columnar

Type Members

  1. trait CachedBatch extends AnyRef

    Basic interface that all cached batches of data must support. This is primarily to allow for metrics to be handled outside of the encoding and decoding steps in a standard way; a minimal implementation is sketched after this member list.

    Annotations
    @DeveloperApi() @Since("3.1.0")
  2. trait CachedBatchSerializer extends Serializable

    Provides APIs that handle transformations of SQL data associated with the cache/persist APIs.

    Annotations
    @DeveloperApi() @Since("3.1.0")
  3. trait SimpleMetricsCachedBatch extends CachedBatch

    A CachedBatch that stores some simple metrics that can be used for filtering of batches with the SimpleMetricsCachedBatchSerializer. The metrics are returned by the stats value. For each column in the batch, five columns of metadata are needed in the row; see the second sketch after this member list.

    Annotations
    @DeveloperApi() @Since("3.1.0")
  4. abstract class SimpleMetricsCachedBatchSerializer extends CachedBatchSerializer with Logging

    Provides basic filtering for CachedBatchSerializer implementations. The requirement for extending this class is that all of the batches produced by your serializer are instances of SimpleMetricsCachedBatch. It does not calculate the metrics that need to be stored in the batches; that is up to each implementation. The required metrics are just min and max values, and even those are optional, especially for complex types. Because the metrics are simple, and compression is likely to be applied to the data as well, each implementation is left to decide the most efficient way to calculate them, possibly combining the calculation with compression passes over the data.

    Annotations
    @DeveloperApi() @Since("3.1.0")
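
As a concrete, purely illustrative example of the CachedBatch contract, the hedged sketch below stores each row as an opaque serialized byte array. The type name is hypothetical; only the two metric accessors from the trait are required, which is what lets metrics be handled outside the encoding and decoding steps.

    import org.apache.spark.sql.columnar.CachedBatch

    // Hypothetical minimal batch: the payload is opaque; Spark only needs
    // the two metrics declared by the CachedBatch trait.
    case class ByteArrayCachedBatch(rows: Array[Array[Byte]]) extends CachedBatch {
      override def numRows: Int = rows.length
      override def sizeInBytes: Long = rows.iterator.map(_.length.toLong).sum
    }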
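
A second sketch, again with hypothetical names, extends SimpleMetricsCachedBatch instead. The stats layout shown (lower bound, upper bound, null count, row count, size in bytes per column) is an assumed reading of the five-columns-per-field requirement above; consult your Spark version's sources for the authoritative order and types.

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
    import org.apache.spark.sql.columnar.SimpleMetricsCachedBatch

    // Hypothetical batch carrying the stats row consumed by
    // SimpleMetricsCachedBatchSerializer when filtering batches.
    case class MetricsCachedBatch(rows: Array[Array[Byte]], stats: InternalRow)
      extends SimpleMetricsCachedBatch {
      override def numRows: Int = rows.length
      // sizeInBytes is inherited and derived from the stats row.
    }

    // Assumed five-field layout for a single Int column with values 1..99,
    // no nulls, three rows, and twelve bytes of data.
    val statsForOneIntColumn: InternalRow =
      new GenericInternalRow(Array[Any](1, 99, 0, 3, 12L))

A serializer producing such batches would extend SimpleMetricsCachedBatchSerializer, which supplies buildFilter, and would be selected at startup through the spark.sql.cache.serializer configuration.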
