@Evolving
public interface Scan
A logical representation of a data source scan. This logical representation is shared between batch scans, micro-batch streaming scans, and continuous streaming scans. Data sources must implement the corresponding methods in this interface to match what the table promises to support. For example, toBatch() must be implemented if the Table that creates this Scan reports TableCapability.BATCH_READ support in its Table.capabilities().
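For illustration, a minimal batch-capable implementation might look like the following sketch (MyScan and the inline Batch stub are hypothetical names, not part of the API; toBatch() is overridden because the owning Table is assumed to report BATCH_READ):

```java
import org.apache.spark.sql.connector.read.Batch;
import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.PartitionReaderFactory;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.types.StructType;

// Hypothetical Scan for a table that reports TableCapability.BATCH_READ.
public class MyScan implements Scan {
  private final StructType schema;

  public MyScan(StructType schema) {
    this.schema = schema;
  }

  @Override
  public StructType readSchema() {
    // The schema actually read, after column pruning and other optimizations.
    return schema;
  }

  @Override
  public String description() {
    // Override the default (the class name) with something meaningful.
    return "MyScan(schema=" + schema.simpleString() + ")";
  }

  @Override
  public Batch toBatch() {
    // Required because the owning Table reports BATCH_READ; the default throws.
    return new Batch() {
      @Override
      public InputPartition[] planInputPartitions() {
        return new InputPartition[0]; // a real source would split its data here
      }

      @Override
      public PartitionReaderFactory createReaderFactory() {
        throw new UnsupportedOperationException("reader omitted in this sketch");
      }
    };
  }
}
```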
Modifier and Type | Interface and Description
---|---
static class | Scan.ColumnarSupportMode: This enum defines how the columnar support for the partitions of the data source should be determined.
Modifier and Type | Method and Description
---|---
default Scan.ColumnarSupportMode | columnarSupportMode(): Subclasses can implement this method to indicate whether support for columnar data should be determined by each partition or set as a default for the whole scan.
default String | description(): A description string of this scan, which may include information such as which filters are configured and the values of important options like path.
StructType | readSchema(): Returns the actual schema of this data source scan, which may differ from the physical schema of the underlying storage, as column pruning or other optimizations may happen.
default CustomTaskMetric[] | reportDriverMetrics(): Returns an array of custom metrics which are collected with values at the driver side only.
default CustomMetric[] | supportedCustomMetrics(): Returns an array of supported custom metrics with name and description.
default Batch | toBatch(): Returns the physical representation of this scan for a batch query.
default ContinuousStream | toContinuousStream(String checkpointLocation): Returns the physical representation of this scan for a streaming query with continuous mode.
default MicroBatchStream | toMicroBatchStream(String checkpointLocation): Returns the physical representation of this scan for a streaming query with micro-batch mode.
StructType readSchema()
Returns the actual schema of this data source scan, which may differ from the physical schema of the underlying storage, as column pruning or other optimizations may happen.
default String description()
A description string of this scan, which may include information such as which filters are configured for this scan and the values of important options like path. The description does not need to include readSchema(), as Spark already knows it.
By default this returns the class name of the implementation. Please override it to provide a meaningful description.
default Batch toBatch()
Returns the physical representation of this scan for a batch query. By default this method throws an exception; data sources must override it if the Table that creates this scan reports TableCapability.BATCH_READ support in its Table.capabilities().
If the scan supports runtime filtering and implements SupportsRuntimeFiltering, this method may be called multiple times. Therefore, implementations can cache some state to avoid planning the job twice (see the sketch below).
Throws: UnsupportedOperationException
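One way to realize that caching pattern is sketched below, assuming a hypothetical planBatch() hook where the expensive planning happens:

```java
import org.apache.spark.sql.connector.read.Batch;
import org.apache.spark.sql.connector.read.Scan;

// Hypothetical Scan that memoizes its Batch, since toBatch() may be
// called more than once when runtime filtering is involved.
public abstract class CachingScan implements Scan {
  private Batch cachedBatch;

  @Override
  public Batch toBatch() {
    if (cachedBatch == null) {
      cachedBatch = planBatch(); // expensive planning runs only once
    }
    return cachedBatch;
  }

  // Hypothetical hook where the actual job planning happens.
  protected abstract Batch planBatch();
}
```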
default MicroBatchStream toMicroBatchStream(String checkpointLocation)
Returns the physical representation of this scan for a streaming query with micro-batch mode. By default this method throws an exception; data sources must override it if the Table that creates this scan reports TableCapability.MICRO_BATCH_READ support in its Table.capabilities().
Parameters: checkpointLocation - a path to Hadoop FS scratch space that can be used for failure recovery. Data streams for the same logical source in the same query will be given the same checkpointLocation.
Throws: UnsupportedOperationException
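A streaming-capable Scan might wire this method up along the following lines (MyStreamingScan and the stubbed MicroBatchStream are hypothetical; a real source would persist offset state under checkpointLocation):

```java
import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.PartitionReaderFactory;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.streaming.MicroBatchStream;
import org.apache.spark.sql.connector.read.streaming.Offset;
import org.apache.spark.sql.types.StructType;

// Hypothetical Scan for a table that reports TableCapability.MICRO_BATCH_READ.
public class MyStreamingScan implements Scan {
  private final StructType schema;

  public MyStreamingScan(StructType schema) {
    this.schema = schema;
  }

  @Override
  public StructType readSchema() {
    return schema;
  }

  @Override
  public MicroBatchStream toMicroBatchStream(String checkpointLocation) {
    // checkpointLocation is scratch space on Hadoop FS; a real source would
    // persist progress state there for failure recovery.
    return new MicroBatchStream() {
      @Override public Offset initialOffset() { throw new UnsupportedOperationException("sketch"); }
      @Override public Offset latestOffset() { throw new UnsupportedOperationException("sketch"); }
      @Override public Offset deserializeOffset(String json) { throw new UnsupportedOperationException("sketch"); }
      @Override public InputPartition[] planInputPartitions(Offset start, Offset end) { return new InputPartition[0]; }
      @Override public PartitionReaderFactory createReaderFactory() { throw new UnsupportedOperationException("sketch"); }
      @Override public void commit(Offset end) { }
      @Override public void stop() { }
    };
  }
}
```

toContinuousStream(String) below follows the same pattern, returning a ContinuousStream instead.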
default ContinuousStream toContinuousStream(String checkpointLocation)
Returns the physical representation of this scan for a streaming query with continuous mode. By default this method throws an exception; data sources must override it if the Table that creates this scan reports TableCapability.CONTINUOUS_READ support in its Table.capabilities().
Parameters: checkpointLocation - a path to Hadoop FS scratch space that can be used for failure recovery. Data streams for the same logical source in the same query will be given the same checkpointLocation.
Throws: UnsupportedOperationException
default CustomMetric[] supportedCustomMetrics()
Returns an array of supported custom metrics with name and description. By default it returns an empty array.
default CustomTaskMetric[] reportDriverMetrics()
Returns an array of custom metrics which are collected with values at the driver side only.
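A sketch of how the two metric hooks might fit together (BytesReadMetric, MeteredScan, and the bytesPlanned field are hypothetical; the driver-reported value is keyed by the same name as a declared metric):

```java
import org.apache.spark.sql.connector.metric.CustomMetric;
import org.apache.spark.sql.connector.metric.CustomTaskMetric;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.types.StructType;

// Hypothetical metric that sums per-task values on the driver side.
class BytesReadMetric implements CustomMetric {
  @Override public String name() { return "bytesRead"; }
  @Override public String description() { return "total bytes read by this scan"; }
  @Override public String aggregateTaskMetrics(long[] taskMetrics) {
    long sum = 0;
    for (long v : taskMetrics) sum += v;
    return String.valueOf(sum);
  }
}

// Hypothetical Scan that declares the metric and reports a driver-side value.
public class MeteredScan implements Scan {
  private final StructType schema;
  private long bytesPlanned; // hypothetical value gathered during planning

  public MeteredScan(StructType schema) {
    this.schema = schema;
  }

  @Override
  public StructType readSchema() {
    return schema;
  }

  @Override
  public CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] { new BytesReadMetric() };
  }

  @Override
  public CustomTaskMetric[] reportDriverMetrics() {
    // Driver-only value, keyed by the name of a supported metric.
    return new CustomTaskMetric[] {
      new CustomTaskMetric() {
        @Override public String name() { return "bytesRead"; }
        @Override public long value() { return bytesPlanned; }
      }
    };
  }
}
```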
default Scan.ColumnarSupportMode columnarSupportMode()
Subclasses can implement this method to indicate whether support for columnar data should be determined by each partition or set as a default for the whole scan.
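For instance, a scan whose partitions always produce columnar batches could declare that up front, as in this sketch (ColumnarScan is a hypothetical name):

```java
import org.apache.spark.sql.connector.read.Scan;

// Hypothetical Scan that always produces columnar data, so support is
// declared once for the whole scan instead of per partition.
public abstract class ColumnarScan implements Scan {
  @Override
  public Scan.ColumnarSupportMode columnarSupportMode() {
    return Scan.ColumnarSupportMode.SUPPORTED;
  }
}
```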