Package org.apache.spark.scheduler
Class InputFormatInfo
java.lang.Object
  org.apache.spark.scheduler.InputFormatInfo
All Implemented Interfaces:
org.apache.spark.internal.Logging
:: DeveloperApi ::
Parses and holds information about inputFormat (and files) specified as a parameter.
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary

Constructor and Description:
InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
Method Summary

Modifier and Type / Method and Description:
static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>>  computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
    Computes the preferred locations based on input(s) and returns a location-to-block map.
org.apache.hadoop.conf.Configuration  configuration()
boolean  equals(Object obj)
int  hashCode()
Class<?>  inputFormatClazz()
boolean  mapredInputFormat()
boolean  mapreduceInputFormat()
String  path()
String  toString()
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
Constructor Details

InputFormatInfo
public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
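A minimal construction sketch (not part of the original documentation; the Configuration, input format class, and path below are hypothetical placeholders):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.scheduler.InputFormatInfo

// Describe one input: a Hadoop Configuration, the InputFormat class used
// to read it, and the path it lives at (hypothetical here).
val info = new InputFormatInfo(
  new Configuration(),
  classOf[TextInputFormat],
  "hdfs:///data/events")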
Method Details

computePreferredLocations
public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)

Computes the preferred locations based on input(s) and returns a location-to-block map. Typical use of this method for allocation would follow an algorithm like this:

a) For each host, count the number of splits hosted on that host.
b) Decrement the number of containers currently allocated on that host.
c) Compute rack info for each host and update the rack-to-count map based on (b).
d) Allocate nodes based on (c).
e) On the allocation result, ensure that we don't allocate "too many" containers on a single node (even if data locality on that node is very high): this is to prevent the job from becoming fragile if a single host (or a small set of hosts) goes down.

Go to (a) until the required number of nodes is allocated. If a node 'dies', follow the same procedure. A sketch of this loop appears after the parameter list below.

Parameters:
formats - (undocumented)
Returns:
(undocumented)
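Below is a hedged sketch of the loop described above, assuming hypothetical input paths, a hypothetical container target (required), and a hypothetical per-host cap (maxPerHost); a real allocator would also aggregate demand by rack at step (c):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.scheduler.{InputFormatInfo, SplitInfo}

object PreferredLocationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Hypothetical inputs; the paths must exist for split computation to succeed.
    val formats = Seq(
      new InputFormatInfo(conf, classOf[TextInputFormat], "hdfs:///data/part-0"),
      new InputFormatInfo(conf, classOf[TextInputFormat], "hdfs:///data/part-1"))

    // host -> splits hosted on that host, from the method documented above.
    val splitsByHost: Map[String, Set[SplitInfo]] =
      InputFormatInfo.computePreferredLocations(formats)

    val required = 4    // hypothetical number of containers to place
    val maxPerHost = 2  // step (e): cap per host to limit fragility
    var allocated = Map.empty[String, Int].withDefaultValue(0)
    var chosen = List.empty[String]

    // Go to (a) until the required nodes are allocated, or no host qualifies.
    var progress = true
    while (chosen.size < required && progress) {
      // (a) split count per host, (b) minus containers already placed there,
      // (e) skipping hosts that have reached the cap.
      val demand = splitsByHost.collect {
        case (host, splits) if allocated(host) < maxPerHost =>
          host -> (splits.size - allocated(host))
      }
      if (demand.isEmpty) progress = false
      else {
        // (c)/(d) a real allocator would fold in rack topology here; this
        // sketch greedily takes the host with the most unserved splits.
        val (host, _) = demand.maxBy(_._2)
        chosen ::= host
        allocated += host -> (allocated(host) + 1)
      }
    }
    println(s"allocation order: ${chosen.reverse.mkString(", ")}")
  }
}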
configuration
public org.apache.hadoop.conf.Configuration configuration()
inputFormatClazz
public Class<?> inputFormatClazz()

path
public String path()
mapreduceInputFormat
public boolean mapreduceInputFormat()

mapredInputFormat
public boolean mapredInputFormat()
toString
public String toString()

hashCode
public int hashCode()

equals
public boolean equals(Object obj)