pyspark.streaming.StreamingContext.binaryRecordsStream

StreamingContext.binaryRecordsStream(directory: str, recordLength: int) → pyspark.streaming.dstream.DStream[bytes]

Create an input stream that monitors a directory on a Hadoop-compatible file system for new files and reads them as flat binary files with fixed-length records. Files must be written to the monitored directory by “moving” them from another location within the same file system, so that each file appears in the directory complete. Files whose names start with a dot (.) are ignored.

Parameters

directory (str): Directory to load data from
recordLength (int): Length of each record in bytes
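
As a rough illustration of how the two parameters fit together, the sketch below assumes a hypothetical 12-byte record layout (a 4-byte integer followed by an 8-byte double) and a local file system. The names `RECORD_FORMAT`, `decode_record`, `publish_file`, and `build_stream` are made up for this example and are not part of PySpark; only `binaryRecordsStream` and `DStream.map` are. `build_stream` expects an already constructed `StreamingContext`.

```python
import os
import struct

# Hypothetical record layout (an assumption for illustration):
# a 4-byte little-endian int followed by an 8-byte little-endian double.
RECORD_FORMAT = "<id"
RECORD_LENGTH = struct.calcsize(RECORD_FORMAT)  # 12 bytes per record


def decode_record(record: bytes) -> tuple:
    """Unpack one fixed-length record into (sensor_id, value)."""
    return struct.unpack(RECORD_FORMAT, record)


def publish_file(records, staging_dir: str, monitored_dir: str, name: str) -> str:
    """Write a file in a staging directory, then move it into place.

    The staging directory is assumed to be on the same local file system
    as the monitored directory, so the move is a plain rename.
    """
    staging_path = os.path.join(staging_dir, name)
    with open(staging_path, "wb") as f:
        for sensor_id, value in records:
            f.write(struct.pack(RECORD_FORMAT, sensor_id, value))
    final_path = os.path.join(monitored_dir, name)
    os.replace(staging_path, final_path)  # move the finished file in one step
    return final_path


def build_stream(ssc, monitored_dir: str):
    """Attach a decoded record stream to an existing StreamingContext."""
    raw = ssc.binaryRecordsStream(monitored_dir, RECORD_LENGTH)
    return raw.map(decode_record)
```

With a running context, `build_stream(ssc, "/data/incoming").pprint()` would print the decoded tuples for each batch; the path here is a placeholder. Writing to a staging directory first keeps partially written files out of the monitored directory.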
