Class PortableDataStream

Object
org.apache.spark.input.PortableDataStream
All Implemented Interfaces:
Serializable, scala.Serializable

public class PortableDataStream extends Object implements scala.Serializable
A class that allows DataStreams to be serialized and moved around, by not creating the underlying stream until it needs to be read.
Note:
TaskAttemptContext is not serializable, resulting in the confBytes construct; CombineFileSplit is not serializable, resulting in the splitBytes construct.
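The deferred-creation design described above can be sketched with a small serializable holder that carries only the information needed to recreate the stream, never the open stream itself. This is an illustrative analogue, not Spark's implementation; the class name and field here are hypothetical.

```java
import java.io.*;

// Hypothetical analogue of PortableDataStream's design: the holder is
// serializable because it stores only a path, not an open stream.
// The stream itself is created lazily, when open() is called.
class LazyStreamHolder implements Serializable {
    private final String path; // enough information to recreate the stream later

    LazyStreamHolder(String path) {
        this.path = path;
    }

    // The stream is created only when it is actually needed.
    DataInputStream open() throws IOException {
        return new DataInputStream(new FileInputStream(path));
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("demo", ".bin");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[]{1, 2, 3});
        }
        // The holder could be serialized and shipped elsewhere before this point.
        LazyStreamHolder holder = new LazyStreamHolder(tmp.getPath());
        try (DataInputStream in = holder.open()) {
            System.out.println(in.readByte()); // prints 1
        }
        tmp.delete();
    }
}
```

In Spark itself, instances of this class are obtained from `SparkContext.binaryFiles`, which returns pairs of file path and `PortableDataStream`.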
  • Constructor Details

    • PortableDataStream

      public PortableDataStream(org.apache.hadoop.mapreduce.lib.input.CombineFileSplit isplit, org.apache.hadoop.mapreduce.TaskAttemptContext context, Integer index)
  • Method Details

    • getConfiguration

      public org.apache.hadoop.conf.Configuration getConfiguration()
    • getPath

      public String getPath()
    • open

      public DataInputStream open()
      Creates a new DataInputStream from the split and context. The caller is responsible for closing the stream after use.
      Returns:
      a new DataInputStream for this split
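      Because open() hands ownership of the stream to the caller, a try-with-resources block is the idiomatic way to guarantee it is closed. The sketch below substitutes an in-memory stream for the one open() would return; the helper name is ours, not Spark's.

```java
import java.io.*;

class OpenUsageDemo {
    // Reads one int from a stream and always closes it, as the caller of
    // PortableDataStream.open() is expected to do.
    static int readOneInt(DataInputStream in) throws IOException {
        try (DataInputStream stream = in) { // close() runs even if readInt throws
            return stream.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {0, 0, 0, 42}; // big-endian encoding of the int 42
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        System.out.println(readOneInt(in)); // prints 42
    }
}
```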
    • toArray

      public byte[] toArray()
      Reads the whole file as a byte array.
      Returns:
      the contents of the file as a byte array
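      toArray() is a convenience over open(): it drains the stream into memory and closes it for you. An equivalent drain loop over a plain stream looks like this (the helper name is ours, not Spark's):

```java
import java.io.*;

class DrainDemo {
    // Reads a stream to exhaustion into a byte array, then closes it,
    // mirroring what toArray() does with the underlying file.
    static byte[] drain(InputStream in) throws IOException {
        try (InputStream stream = in;
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = stream.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30};
        byte[] copy = drain(new ByteArrayInputStream(data));
        System.out.println(copy.length); // prints 3
    }
}
```

      Note that materializing the whole file assumes it fits in memory; for large files, prefer open() and stream the contents incrementally.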