Interface ShufflePartitionWriter
This writer stores bytes for one (mapper, reducer) pair, corresponding to one shuffle block.
Since: 3.0.0
Method Summary
Modifier and Type | Method | Description
long | getNumBytesWritten() | Returns the number of bytes written either by this writer's output stream opened by openStream() or the byte channel opened by openChannelWrapper().
default Optional<WritableByteChannelWrapper> | openChannelWrapper() | Opens and returns a WritableByteChannelWrapper for transferring bytes from input byte channels to the underlying shuffle data store.
OutputStream | openStream() | Open and return an OutputStream that can write bytes to the underlying data store.
-
Method Details
-
openStream
OutputStream openStream() throws IOException
Open and return an OutputStream that can write bytes to the underlying data store.
This method will only be called once on this partition writer in the map task, to write the bytes to the partition. The output stream will only be used to write the bytes for this partition. The map task closes this output stream upon writing all the bytes for this block, or if the write fails for any reason.
Implementations that intend to combine the bytes for all the partitions written by this map task should reuse the same OutputStream instance across all the partition writers provided by the parent ShuffleMapOutputWriter. In that case, ensure that OutputStream.close() does not close the resource, since it will be reused across partition writes. The underlying resources should be cleaned up in ShuffleMapOutputWriter.commitAllPartitions(long[]) and ShuffleMapOutputWriter.abort(Throwable).
Throws: IOException
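The shared-stream advice above can be sketched as a small wrapper whose close() marks the partition as finished without closing the shared resource. This is a minimal sketch, not Spark code: CloseShieldOutputStream is a hypothetical helper name, and flushing on close is an assumption about what a given data store needs.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper (not part of Spark): hands out a per-partition view of a
// shared stream. Closing the view never closes the shared stream.
class CloseShieldOutputStream extends FilterOutputStream {
    private boolean closed = false;

    CloseShieldOutputStream(OutputStream shared) {
        super(shared);
    }

    @Override
    public void write(int b) throws IOException {
        ensureOpen();
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        // Delegate the whole range at once instead of byte by byte.
        out.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true;
            // Flush only; the shared stream stays open for the next partition.
            out.flush();
        }
    }

    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("Partition stream is already closed");
        }
    }
}
```

In this sketch, an openStream() implementation would return a new CloseShieldOutputStream around the one shared stream, and the shared stream itself would be closed in the parent writer's commitAllPartitions(long[]) or abort(Throwable).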
-
openChannelWrapper
default Optional<WritableByteChannelWrapper> openChannelWrapper() throws IOException
Opens and returns a WritableByteChannelWrapper for transferring bytes from input byte channels to the underlying shuffle data store.
This method will only be called once on this partition writer in the map task, to write the bytes to the partition. The channel will only be used to write the bytes for this partition. The map task closes this channel upon writing all the bytes for this block, or if the write fails for any reason.
Implementations that intend to combine the bytes for all the partitions written by this map task should reuse the same channel instance across all the partition writers provided by the parent ShuffleMapOutputWriter. In that case, ensure that Closeable.close() does not close the resource, since the channel will be reused across partition writes. The underlying resources should be cleaned up in ShuffleMapOutputWriter.commitAllPartitions(long[]) and ShuffleMapOutputWriter.abort(Throwable).
This method is primarily for advanced optimizations where bytes can be copied from the input spill files to the output channel without copying data into memory. If such optimizations are not supported, the implementation should return Optional.empty(). By default, the implementation returns Optional.empty().
Note that the map task closes the returned WritableByteChannelWrapper itself, but not the underlying channel that is returned by WritableByteChannelWrapper.channel(). Ensure that the underlying channel is cleaned up in Closeable.close(), ShuffleMapOutputWriter.commitAllPartitions(long[]), or ShuffleMapOutputWriter.abort(Throwable).
Throws: IOException
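The close semantics described above can be illustrated with a wrapper whose close() leaves a shared channel open. This is a sketch under stated assumptions: ChannelWrapperLike is a local stand-in that only mirrors the shape described here (a Closeable exposing channel()) so the example compiles without Spark on the classpath, and SharedChannelWrapper is a hypothetical name.

```java
import java.io.Closeable;
import java.nio.channels.WritableByteChannel;

// Local stand-in for the wrapper shape described above (assumption: a
// Closeable with a channel() accessor). It is not the Spark interface.
interface ChannelWrapperLike extends Closeable {
    WritableByteChannel channel();
}

// Hypothetical wrapper for the "one channel shared by all partitions" case.
class SharedChannelWrapper implements ChannelWrapperLike {
    private final WritableByteChannel shared;
    private boolean closed = false;

    SharedChannelWrapper(WritableByteChannel shared) {
        this.shared = shared;
    }

    @Override
    public WritableByteChannel channel() {
        return shared;
    }

    @Override
    public void close() {
        // Mark this partition as done, but keep the shared channel open;
        // it is expected to be closed when all partitions are committed
        // or the map output is aborted.
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

An implementation that opens a separate channel per partition would instead close that channel in close(), which is the first of the three cleanup points listed above.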
-
getNumBytesWritten
long getNumBytesWritten()
Returns the number of bytes written either by this writer's output stream opened by openStream() or the byte channel opened by openChannelWrapper().
This can be different from the number of bytes given by the caller. For example, the stream might compress or encrypt the bytes before persisting the data to the backing data store.
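To see why the persisted count can differ from what the caller supplied, the sketch below counts bytes at the point where they reach the backing store, underneath a compressing stream. CountingOutputStream is a hypothetical helper, and GZIP is used only as a convenient stand-in for whatever compression or encryption an implementation applies.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: counts the bytes that actually reach the wrapped stream.
class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    // The value a getNumBytesWritten() implementation could report.
    long getCount() {
        return count;
    }
}

class CompressedCountDemo {
    // Writes `input` through GZIP and returns the number of bytes persisted.
    static long persistedBytes(byte[] input, ByteArrayOutputStream store) throws IOException {
        CountingOutputStream counter = new CountingOutputStream(store);
        try (OutputStream compressed = new GZIPOutputStream(counter)) {
            compressed.write(input);
        }
        return counter.getCount();
    }
}
```

For highly compressible input, such as a long run of zero bytes, the returned count is much smaller than the input length, and it matches the size of the backing store.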
-