Class InputFormatInfo

Object
org.apache.spark.scheduler.InputFormatInfo
All Implemented Interfaces:
org.apache.spark.internal.Logging

public class InputFormatInfo extends Object implements org.apache.spark.internal.Logging
:: DeveloperApi :: Parses and holds information about inputFormat (and files) specified as a parameter.
  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Constructor
    Description
    InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>>
    computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
    Computes the preferred locations based on input(s) and returned a location to block map.
    org.apache.hadoop.conf.Configuration
     
    boolean
    equals(Object other)
     
    int
     
     
    boolean
     
    boolean
     
     
     

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
  • Constructor Details

    • InputFormatInfo

      public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
  • Method Details

    • computePreferredLocations

      public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
      Computes the preferred locations based on input(s) and returned a location to block map. Typical use of this method for allocation would follow some algo like this:

      a) For each host, count number of splits hosted on that host. b) Decrement the currently allocated containers on that host. c) Compute rack info for each host and update rack to count map based on (b). d) Allocate nodes based on (c) e) On the allocation result, ensure that we don't allocate "too many" jobs on a single node (even if data locality on that is very high) : this is to prevent fragility of job if a single (or small set of) hosts go down.

      go to (a) until required nodes are allocated.

      If a node 'dies', follow same procedure.

      PS: I know the wording here is weird, hopefully it makes some sense !

      Parameters:
      formats - (undocumented)
      Returns:
      (undocumented)
    • configuration

      public org.apache.hadoop.conf.Configuration configuration()
    • inputFormatClazz

      public Class<?> inputFormatClazz()
    • path

      public String path()
    • mapreduceInputFormat

      public boolean mapreduceInputFormat()
    • mapredInputFormat

      public boolean mapredInputFormat()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object