@Evolving
public interface Scan
A logical representation of a data source scan. This logical representation is shared between batch scans, micro-batch streaming scans, and continuous streaming scans. Data sources must implement the corresponding methods in this interface to match what the table promises to support. For example, toBatch() must be implemented if the Table that creates this Scan reports TableCapability.BATCH_READ support in its Table.capabilities().
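For illustration, a minimal batch-capable implementation might look like the following sketch (MyScan and the inline Batch stub are hypothetical names, not part of the API; toBatch() is overridden because the owning Table is assumed to report BATCH_READ):

```java
import org.apache.spark.sql.connector.read.Batch;
import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.PartitionReaderFactory;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.types.StructType;

// Hypothetical Scan for a table that reports TableCapability.BATCH_READ.
public class MyScan implements Scan {
  private final StructType schema;

  public MyScan(StructType schema) {
    this.schema = schema;
  }

  @Override
  public StructType readSchema() {
    // The schema actually read, after column pruning and other optimizations.
    return schema;
  }

  @Override
  public String description() {
    // Override the default (the class name) with something meaningful.
    return "MyScan(schema=" + schema.simpleString() + ")";
  }

  @Override
  public Batch toBatch() {
    // Required because the owning Table reports BATCH_READ; the default throws.
    return new Batch() {
      @Override
      public InputPartition[] planInputPartitions() {
        return new InputPartition[0]; // a real source would split its data here
      }

      @Override
      public PartitionReaderFactory createReaderFactory() {
        throw new UnsupportedOperationException("reader omitted in this sketch");
      }
    };
  }
}
```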
Modifier and Type | Interface and Description
---|---
static class | Scan.ColumnarSupportMode: This enum defines how the columnar support for the partitions of the data source should be determined.
Modifier and Type | Method and Description
---|---
default Scan.ColumnarSupportMode | columnarSupportMode(): Subclasses can implement this method to indicate whether support for columnar data should be determined by each partition or set as a default for the whole scan.
default String | description(): A description string of this scan, which may include information such as which filters are configured and the values of important options like path.
StructType | readSchema(): Returns the actual schema of this data source scan, which may differ from the physical schema of the underlying storage, as column pruning or other optimizations may happen.
default CustomTaskMetric[] | reportDriverMetrics(): Returns an array of custom metrics which are collected with values at the driver side only.
default CustomMetric[] | supportedCustomMetrics(): Returns an array of supported custom metrics with name and description.
default Batch | toBatch(): Returns the physical representation of this scan for a batch query.
default ContinuousStream | toContinuousStream(String checkpointLocation): Returns the physical representation of this scan for a streaming query with continuous mode.
default MicroBatchStream | toMicroBatchStream(String checkpointLocation): Returns the physical representation of this scan for a streaming query with micro-batch mode.
StructType readSchema()
Returns the actual schema of this data source scan, which may differ from the physical schema of the underlying storage, as column pruning or other optimizations may happen.
default String description()
A description string of this scan, which may include information such as which filters are configured for this scan and the values of important options like path. The description does not need to include readSchema(), as Spark already knows it.
By default this returns the class name of the implementation. Please override it to provide a meaningful description.
default Batch toBatch()
Returns the physical representation of this scan for a batch query. By default this method throws an exception; data sources must override it if the Table that creates this scan reports TableCapability.BATCH_READ support in its Table.capabilities().
If the scan supports runtime filtering and implements SupportsRuntimeFiltering, this method may be called multiple times. Therefore, implementations can cache some state to avoid planning the job twice (see the sketch below).
Throws: UnsupportedOperationException
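One way to realize that caching pattern is sketched below, assuming a hypothetical planBatch() hook where the expensive planning happens:

```java
import org.apache.spark.sql.connector.read.Batch;
import org.apache.spark.sql.connector.read.Scan;

// Hypothetical Scan that memoizes its Batch, since toBatch() may be
// called more than once when runtime filtering is involved.
public abstract class CachingScan implements Scan {
  private Batch cachedBatch;

  @Override
  public Batch toBatch() {
    if (cachedBatch == null) {
      cachedBatch = planBatch(); // expensive planning runs only once
    }
    return cachedBatch;
  }

  // Hypothetical hook where the actual job planning happens.
  protected abstract Batch planBatch();
}
```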
default MicroBatchStream toMicroBatchStream(String checkpointLocation)
Returns the physical representation of this scan for a streaming query with micro-batch mode. By default this method throws an exception; data sources must override it if the Table that creates this scan reports TableCapability.MICRO_BATCH_READ support in its Table.capabilities().
Parameters: checkpointLocation - a path to Hadoop FS scratch space that can be used for failure recovery. Data streams for the same logical source in the same query will be given the same checkpointLocation.
Throws: UnsupportedOperationException
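A streaming-capable Scan might wire this method up along the following lines (MyStreamingScan and the stubbed MicroBatchStream are hypothetical; a real source would persist offset state under checkpointLocation):

```java
import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.PartitionReaderFactory;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.streaming.MicroBatchStream;
import org.apache.spark.sql.connector.read.streaming.Offset;
import org.apache.spark.sql.types.StructType;

// Hypothetical Scan for a table that reports TableCapability.MICRO_BATCH_READ.
public class MyStreamingScan implements Scan {
  private final StructType schema;

  public MyStreamingScan(StructType schema) {
    this.schema = schema;
  }

  @Override
  public StructType readSchema() {
    return schema;
  }

  @Override
  public MicroBatchStream toMicroBatchStream(String checkpointLocation) {
    // checkpointLocation is scratch space on Hadoop FS; a real source would
    // persist progress state there for failure recovery.
    return new MicroBatchStream() {
      @Override public Offset initialOffset() { throw new UnsupportedOperationException("sketch"); }
      @Override public Offset latestOffset() { throw new UnsupportedOperationException("sketch"); }
      @Override public Offset deserializeOffset(String json) { throw new UnsupportedOperationException("sketch"); }
      @Override public InputPartition[] planInputPartitions(Offset start, Offset end) { return new InputPartition[0]; }
      @Override public PartitionReaderFactory createReaderFactory() { throw new UnsupportedOperationException("sketch"); }
      @Override public void commit(Offset end) { }
      @Override public void stop() { }
    };
  }
}
```

toContinuousStream(String) below follows the same pattern, returning a ContinuousStream instead.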
default ContinuousStream toContinuousStream(String checkpointLocation)
Returns the physical representation of this scan for a streaming query with continuous mode. By default this method throws an exception; data sources must override it if the Table that creates this scan reports TableCapability.CONTINUOUS_READ support in its Table.capabilities().
Parameters: checkpointLocation - a path to Hadoop FS scratch space that can be used for failure recovery. Data streams for the same logical source in the same query will be given the same checkpointLocation.
Throws: UnsupportedOperationException
default CustomMetric[] supportedCustomMetrics()
Returns an array of supported custom metrics with name and description. By default it returns an empty array.
default CustomTaskMetric[] reportDriverMetrics()
Returns an array of custom metrics which are collected with values at the driver side only.
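A sketch of how the two metric hooks might fit together (BytesReadMetric, MeteredScan, and the bytesPlanned field are hypothetical; the driver-reported value is keyed by the same name as a declared metric):

```java
import org.apache.spark.sql.connector.metric.CustomMetric;
import org.apache.spark.sql.connector.metric.CustomTaskMetric;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.types.StructType;

// Hypothetical metric that sums per-task values on the driver side.
class BytesReadMetric implements CustomMetric {
  @Override public String name() { return "bytesRead"; }
  @Override public String description() { return "total bytes read by this scan"; }
  @Override public String aggregateTaskMetrics(long[] taskMetrics) {
    long sum = 0;
    for (long v : taskMetrics) sum += v;
    return String.valueOf(sum);
  }
}

// Hypothetical Scan that declares the metric and reports a driver-side value.
public class MeteredScan implements Scan {
  private final StructType schema;
  private long bytesPlanned; // hypothetical value gathered during planning

  public MeteredScan(StructType schema) {
    this.schema = schema;
  }

  @Override
  public StructType readSchema() {
    return schema;
  }

  @Override
  public CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] { new BytesReadMetric() };
  }

  @Override
  public CustomTaskMetric[] reportDriverMetrics() {
    // Driver-only value, keyed by the name of a supported metric.
    return new CustomTaskMetric[] {
      new CustomTaskMetric() {
        @Override public String name() { return "bytesRead"; }
        @Override public long value() { return bytesPlanned; }
      }
    };
  }
}
```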
default Scan.ColumnarSupportMode columnarSupportMode()
Subclasses can implement this method to indicate whether support for columnar data should be determined by each partition or set as a default for the whole scan.
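For instance, a scan whose partitions always produce columnar batches could declare that up front, as in this sketch (ColumnarScan is a hypothetical name):

```java
import org.apache.spark.sql.connector.read.Scan;

// Hypothetical Scan that always produces columnar data, so support is
// declared once for the whole scan instead of per partition.
public abstract class ColumnarScan implements Scan {
  @Override
  public Scan.ColumnarSupportMode columnarSupportMode() {
    return Scan.ColumnarSupportMode.SUPPORTED;
  }
}
```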