An interface to represent a data distribution requirement, which specifies how the records should
be distributed among the data partitions (one DataReader outputs data for one partition).
Note that this interface has nothing to do with the data ordering inside one
partition (the output records of a single DataReader).
An instance of this interface is created and provided by Spark, then consumed by
Partitioning.satisfy(Distribution). This means data source developers don't need to
implement this interface themselves, but they do need to handle as many concrete
implementations of this interface as possible in Partitioning.satisfy(Distribution).
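To illustrate, a data source's Partitioning.satisfy implementation typically checks the concrete type of the Distribution it receives and returns whether its physical layout meets that requirement. The sketch below is self-contained for illustration: the Distribution, ClusteredDistribution, and HashPartitioning classes here are simplified stand-ins for the Spark interfaces, not the real Spark classes, and the subset-check logic is one plausible policy, not Spark's.

```java
import java.util.Arrays;
import java.util.List;

public class DistributionExample {
    // Simplified stand-in for the Spark-provided Distribution interface.
    interface Distribution {}

    // Stand-in for a clustered distribution: records sharing the same values
    // in the clustered columns must end up in the same partition.
    static class ClusteredDistribution implements Distribution {
        final String[] clusteredColumns;
        ClusteredDistribution(String... cols) { this.clusteredColumns = cols; }
    }

    // A hypothetical data source partitioning that hashes rows by some columns.
    static class HashPartitioning {
        final List<String> hashColumns;
        HashPartitioning(List<String> cols) { this.hashColumns = cols; }

        // Handle every concrete Distribution implementation we recognize;
        // be conservative and return false for anything unknown.
        boolean satisfy(Distribution d) {
            if (d instanceof ClusteredDistribution) {
                ClusteredDistribution c = (ClusteredDistribution) d;
                // Hashing on a subset of the clustered columns guarantees that
                // rows with equal clustered-column values share a partition.
                return Arrays.asList(c.clusteredColumns).containsAll(hashColumns);
            }
            return false; // unrecognized distribution requirement
        }
    }

    public static void main(String[] args) {
        HashPartitioning p = new HashPartitioning(Arrays.asList("user_id"));
        // Satisfied: we hash on user_id, a subset of the clustered columns.
        System.out.println(p.satisfy(new ClusteredDistribution("user_id", "country")));
        // Not satisfied: an unrecognized Distribution implementation.
        System.out.println(p.satisfy(new Distribution() {}));
    }
}
```

The conservative false branch matters: returning false for an unrecognized requirement is always safe (Spark can fall back to a shuffle), whereas wrongly returning true would produce incorrect results.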
Concrete implementations so far: