org.apache.spark.sql.parquet

ParquetRelation2

case class ParquetRelation2(path: String)(sqlContext: SQLContext) extends CatalystScan with Logging with Product with Serializable

An alternative to ParquetRelation that plugs in using the data sources API. This class is not currently intended as a full replacement for the parquet support in Spark SQL, though it is likely that it will eventually subsume the existing physical plan implementation.

Compared with the current implementation, this class has the following notable differences:

Partitioning: Partitions are discovered automatically and must take the form of key=value/ directories located at path. Currently only a single partitioning column is supported, and it must be an integer. This class supports both fully self-describing data, which contains the partition key, and data where the partition key is present only in the folder structure. Whether the partitioning key is present in the data is also detected automatically. The null partition is not yet supported.
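The directory convention described above can be illustrated with a small, Spark-free sketch. PartitionDirs, parse, discover and dataIncludesKey are hypothetical names introduced for illustration only; this is not the class's actual discovery code, just one way the stated rules (key=value names, a single integer column, key optionally present in the data) might be checked.

```scala
// Hypothetical helper, not part of Spark: illustrates the key=value directory rules.
object PartitionDirs {
  // Matches a whole directory name such as "year=2014" or "year=2014/" with an integer value.
  private val KeyValue = """([^=/]+)=(-?\d+)/?""".r

  /** Returns Some((key, value)) for a well-formed integer partition directory name. */
  def parse(dirName: String): Option[(String, Int)] = dirName match {
    case KeyValue(key, value) =>
      // toInt can still fail on overflow, so guard it.
      scala.util.Try(value.toInt).toOption.map(v => (key, v))
    case _ => None
  }

  /** Collects partition values, requiring exactly one partitioning column. */
  def discover(dirNames: Seq[String]): Either[String, (String, Seq[Int])] = {
    // Names that do not look like key=<int> (for example "_SUCCESS") are skipped.
    val parsed = dirNames.flatMap(parse)
    parsed.map(_._1).distinct match {
      case Seq(key) => Right((key, parsed.map(_._2)))
      case Seq()    => Left("no partition directories found")
      case keys     => Left(s"expected one partitioning column, found: ${keys.mkString(", ")}")
    }
  }

  /** Assumed detection rule: the key counts as "in the data" if a field carries its name. */
  def dataIncludesKey(dataFieldNames: Seq[String], key: String): Boolean =
    dataFieldNames.contains(key)
}
```

Returning an Either keeps the "single partitioning column" restriction explicit: a layout that mixes keys is rejected rather than silently merged.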

Metadata: The metadata is discovered automatically by reading the first parquet file present. There is currently no support for working with files that have different schemas. Additionally, when parquet metadata caching is turned on, the FileStatus objects for all data are cached to improve the speed of interactive querying. When data is added to a table, the table must be dropped and recreated to pick up the changes.

Statistics: Statistics for the size of the table are automatically populated during metadata discovery.
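The caching behaviour described under Metadata and Statistics can be sketched without Spark. FileEntry and CachedListing are hypothetical stand-ins (FileEntry is not Hadoop's FileStatus); the sketch only assumes that the file listing is captured once at construction, which is why newly added data stays invisible until the relation is recreated.

```scala
// Hypothetical stand-in for a file listing entry; not Hadoop's FileStatus.
final case class FileEntry(path: String, length: Long)

// Hypothetical relation-like class: lists files once and caches the result.
final class CachedListing(list: () => Seq[FileEntry]) {
  // Evaluated once at construction; files added later are not seen by this instance.
  val files: Seq[FileEntry] = list()

  // Size statistic populated during discovery, from the cached listing.
  val sizeInBytes: Long = files.map(_.length).sum
}
```

Under these assumptions, an instance built before a new file appears keeps reporting the old size, while a freshly constructed instance sees the new total. This mirrors the drop-and-recreate requirement stated above.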

Annotations
@DeveloperApi()
Linear Supertypes
Serializable, Serializable, Product, Equals, Logging, CatalystScan, BaseRelation, AnyRef, Any

Instance Constructors

  1. new ParquetRelation2(path: String)(sqlContext: SQLContext)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def buildScan(output: Seq[Attribute], predicates: Seq[Expression]): RDD[Row]

    Definition Classes
    ParquetRelation2 → CatalystScan
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. val dataIncludesKey: Boolean

  10. val dataSchema: catalyst.types.StructType

  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  16. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  17. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  18. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  24. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. val path: String

  32. val schema: catalyst.types.StructType

    Definition Classes
    ParquetRelation2 → BaseRelation
  33. val sizeInBytes: Long

    Returns an estimated size of this relation in bytes. This information is used by the planner to decide when it is safe to broadcast a relation and can be overridden by sources that know the size ahead of time. By default, the system will assume that tables are too large to broadcast. This method will be called multiple times during query planning and thus should not perform expensive operations for each invocation.

    Definition Classes
    ParquetRelation2 → BaseRelation
  34. def sparkContext: SparkContext

  35. val sqlContext: SQLContext

    Definition Classes
    ParquetRelation2 → BaseRelation
  36. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  37. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
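
The sizeInBytes contract listed among the value members (used for broadcast decisions, pessimistic by default, cheap to call repeatedly) can be illustrated with a minimal sketch. SizedRelation, KnownSize and BroadcastCheck are hypothetical names, and both the Long.MaxValue default and the less-than-or-equal comparison are illustrative assumptions rather than what Spark's planner necessarily does.

```scala
// Hypothetical trait mirroring only the sizeInBytes member of a relation.
trait SizedRelation {
  // Pessimistic default: assume the table is too large to broadcast.
  // Long.MaxValue is an illustrative choice, not necessarily Spark's value.
  def sizeInBytes: Long = Long.MaxValue
}

// A source that knows its size ahead of time overrides the default.
final class KnownSize(fileLengths: Seq[Long]) extends SizedRelation {
  // A val is computed once at construction, so repeated planner calls stay cheap.
  override val sizeInBytes: Long = fileLengths.sum
}

object BroadcastCheck {
  /** Assumed rule: broadcast only when the estimate fits within the threshold. */
  def canBroadcast(relation: SizedRelation, thresholdBytes: Long): Boolean =
    relation.sizeInBytes <= thresholdBytes
}
```

Overriding the method with a val is one simple way to honour the "no expensive operations for each invocation" requirement, since the sum is evaluated only once.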
