Package org.apache.spark.scheduler
Class InputFormatInfo
Object
org.apache.spark.scheduler.InputFormatInfo
- All Implemented Interfaces:
- org.apache.spark.internal.Logging
:: DeveloperApi ::
 Parses and holds information about inputFormat (and files) specified as a parameter.
Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging:
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary

Constructors:
InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
Method Summary

static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.immutable.Seq<InputFormatInfo> formats)
    Computes the preferred locations based on input(s) and returns a location-to-block map.
org.apache.hadoop.conf.Configuration configuration()
boolean equals(Object obj)
int hashCode()
Class<?> inputFormatClazz()
boolean mapredInputFormat()
boolean mapreduceInputFormat()
String path()
String toString()

Methods inherited from interface org.apache.spark.internal.Logging:
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Constructor Details

InputFormatInfo

public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
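For orientation, a minimal construction sketch in Scala; the Hadoop Configuration, the choice of TextInputFormat, and the HDFS path below are illustrative assumptions rather than values taken from this page.

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
  import org.apache.spark.scheduler.InputFormatInfo

  // One InputFormatInfo per input: the Hadoop configuration, the InputFormat class,
  // and the path whose splits should be examined.
  val logsInfo = new InputFormatInfo(new Configuration(), classOf[TextInputFormat], "hdfs:///data/logs")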
Method Details

computePreferredLocations

public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.immutable.Seq<InputFormatInfo> formats)

Computes the preferred locations based on input(s) and returns a location-to-block map. Typical use of this method for allocation would follow an algorithm like this (a sketch in Scala follows the parameter list below):

a) For each host, count the number of splits hosted on that host.
b) Decrement the currently allocated containers on that host.
c) Compute rack info for each host and update the rack-to-count map based on (b).
d) Allocate nodes based on (c).
e) On the allocation result, ensure that we don't allocate "too many" jobs on a single node (even if data locality on that node is very high): this is to prevent the job from becoming fragile if a single host (or a small set of hosts) goes down.

Go to (a) until the required nodes are allocated. If a node 'dies', follow the same procedure.

Parameters:
- formats - (undocumented)
Returns:
- (undocumented)
 
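A hedged Scala sketch of the pattern described above: it calls computePreferredLocations on one illustrative input and then applies steps (a), (b), (d) and (e). The path, input format, already-allocated counts, and per-host cap are assumptions, and rack handling (step c) is elided.

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
  import org.apache.spark.scheduler.{InputFormatInfo, SplitInfo}

  // Illustrative input: a single path read via TextInputFormat (both are assumptions).
  val formats = Seq(
    new InputFormatInfo(new Configuration(), classOf[TextInputFormat], "hdfs:///data/logs"))

  // Host -> splits that prefer that host.
  val splitsByHost: Map[String, Set[SplitInfo]] =
    InputFormatInfo.computePreferredLocations(formats)

  // Hypothetical count of containers already allocated per host (input to step b).
  val alreadyAllocated: Map[String, Int] = Map.empty

  // (a) + (b): splits per host, minus containers already running there.
  val pendingByHost: Map[String, Int] = splitsByHost.map { case (host, splits) =>
    host -> math.max(splits.size - alreadyAllocated.getOrElse(host, 0), 0)
  }

  // (d) + (e): request containers on the hosts with the most local splits, capping the
  // per-host request so the job does not hinge on a small set of nodes.
  val maxPerHost = 4 // illustrative cap
  val requestPlan: Seq[(String, Int)] = pendingByHost.toSeq
    .sortBy { case (_, pending) => -pending }
    .map { case (host, pending) => host -> math.min(pending, maxPerHost) }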
configuration

public org.apache.hadoop.conf.Configuration configuration()

inputFormatClazz

public Class<?> inputFormatClazz()

path

public String path()

mapreduceInputFormat

public boolean mapreduceInputFormat()

mapredInputFormat

public boolean mapredInputFormat()
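The two flags above are not documented on this page; the Scala sketch below assumes mapreduceInputFormat reports an InputFormat class from the new org.apache.hadoop.mapreduce API and mapredInputFormat one from the old org.apache.hadoop.mapred API. The paths are placeholders.

  import org.apache.hadoop.conf.Configuration
  import org.apache.spark.scheduler.InputFormatInfo

  val conf = new Configuration()

  // "New" Hadoop API (org.apache.hadoop.mapreduce).
  val newApiInfo = new InputFormatInfo(
    conf, classOf[org.apache.hadoop.mapreduce.lib.input.TextInputFormat], "hdfs:///data/a")

  // "Old" Hadoop API (org.apache.hadoop.mapred).
  val oldApiInfo = new InputFormatInfo(
    conf, classOf[org.apache.hadoop.mapred.TextInputFormat], "hdfs:///data/b")

  // Under the assumption stated above, the flags report which API family the class belongs to.
  println(newApiInfo.mapreduceInputFormat) // expected: true
  println(oldApiInfo.mapredInputFormat)    // expected: true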
toString

public String toString()

hashCode

public int hashCode()

equals

public boolean equals(Object obj)