public class InputFormatInfo
extends Object
implements org.apache.spark.internal.Logging
| Constructor and Description | 
|---|
| InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path) | 
| Modifier and Type | Method and Description | 
|---|---|
| static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> | computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats) Computes the preferred locations based on the input(s) and returns a location-to-block map. | 
| org.apache.hadoop.conf.Configuration | configuration() | 
| boolean | equals(Object other) | 
| int | hashCode() | 
| Class<?> | inputFormatClazz() | 
| boolean | mapredInputFormat() | 
| boolean | mapreduceInputFormat() | 
| String | path() | 
| String | toString() | 
Methods inherited from interface org.apache.spark.internal.Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration,
                       Class<?> inputFormatClazz,
                       String path)
public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
Typical use of this method for allocation would follow an algorithm like this:
a) For each host, count the number of splits hosted on that host.
b) Decrement the currently allocated containers on that host.
c) Compute rack info for each host and update the rack-to-count map based on (b).
d) Allocate nodes based on (c).
e) On the allocation result, ensure that we don't allocate "too many" jobs on a single node (even if data locality on that node is very high); this keeps the job from becoming fragile if a single host (or a small set of hosts) goes down.
Go back to (a) until the required nodes are allocated.
If a node 'dies', follow the same procedure.
PS: I know the wording here is weird, hopefully it makes some sense!
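The counting and capping steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not Spark's implementation: the class `PreferredLocationSketch` and its methods are hypothetical names, splits are represented as plain strings rather than `SplitInfo` objects, the host-to-splits map stands in for the result of `computePreferredLocations`, and the rack-aware steps (c) and (d) are left out.

```java
import java.util.*;

public class PreferredLocationSketch {

    // Step (a): count how many splits each host holds.
    // The input map is a stand-in for the host-to-splits result (assumption).
    static Map<String, Integer> countSplitsPerHost(Map<String, Set<String>> hostToSplits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : hostToSplits.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    // Step (b): subtract containers already allocated on each host,
    // never letting the remaining count drop below zero.
    static Map<String, Integer> subtractAllocated(Map<String, Integer> counts,
                                                  Map<String, Integer> allocated) {
        Map<String, Integer> remaining = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int used = allocated.getOrDefault(e.getKey(), 0);
            remaining.put(e.getKey(), Math.max(0, e.getValue() - used));
        }
        return remaining;
    }

    // Step (e): cap the request per host so that one node is not overloaded,
    // even when its data locality is very high. maxPerHost is a hypothetical knob.
    static Map<String, Integer> capPerHost(Map<String, Integer> remaining, int maxPerHost) {
        Map<String, Integer> capped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            capped.put(e.getKey(), Math.min(maxPerHost, e.getValue()));
        }
        return capped;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> hostToSplits = new HashMap<>();
        hostToSplits.put("host1", new HashSet<>(Arrays.asList("s1", "s2", "s3", "s4")));
        hostToSplits.put("host2", new HashSet<>(Arrays.asList("s5")));

        Map<String, Integer> allocated = new HashMap<>();
        allocated.put("host1", 1); // one container already runs on host1

        Map<String, Integer> requests =
            capPerHost(subtractAllocated(countSplitsPerHost(hostToSplits), allocated), 2);
        System.out.println(requests); // prints {host1=2, host2=1}
    }
}
```

In this sketch host1 holds four splits and already has one container, so three remain, and the cap of two then limits the request for that host. A real allocator would rerun these steps as containers are granted or lost.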
formats - (undocumented)
public org.apache.hadoop.conf.Configuration configuration()
public Class<?> inputFormatClazz()
public String path()
public boolean mapreduceInputFormat()
public boolean mapredInputFormat()
public String toString()
Overrides: toString in class Object
public int hashCode()
Overrides: hashCode in class Object
public boolean equals(Object other)
Overrides: equals in class Object