org.apache.spark.scheduler
Class InputFormatInfo

Object
  extended by org.apache.spark.scheduler.InputFormatInfo
All Implemented Interfaces:
Logging

public class InputFormatInfo
extends Object
implements Logging

:: DeveloperApi :: Parses and holds information about the inputFormat (and the files it reads) specified as a parameter.


Constructor Summary
InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
           
 
Method Summary
static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
          Computes the preferred locations based on the input(s) and returns a location-to-block map.
 org.apache.hadoop.conf.Configuration configuration()
           
 boolean equals(Object other)
           
 int hashCode()
           
 Object inputFormatClazz()
           
 boolean mapredInputFormat()
           
 boolean mapreduceInputFormat()
           
 String path()
           
 String toString()
           
 
Methods inherited from class Object
getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

InputFormatInfo

public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration,
                       Class<?> inputFormatClazz,
                       String path)
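
A minimal construction sketch in Scala (the default Hadoop Configuration, the TextInputFormat class, and the HDFS path below are illustrative assumptions, not part of this API):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.scheduler.InputFormatInfo

    // Describe one input: the Hadoop configuration, the input format class,
    // and the path it reads. The path is a placeholder for illustration.
    val info = new InputFormatInfo(
      new Configuration(),
      classOf[TextInputFormat],
      "hdfs:///data/events")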
Method Detail

computePreferredLocations

public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
Computes the preferred locations based on the input(s) and returns a location-to-block map. A typical allocation algorithm using this method would proceed as follows:

a) For each host, count the number of splits hosted on that host.
b) Decrement the containers currently allocated on that host.
c) Compute rack info for each host and update the rack -> count map based on (b).
d) Allocate nodes based on (c).
e) On the allocation result, ensure that we don't allocate "too many" jobs on a single node (even if data locality on that node is very high): this prevents the job from becoming fragile if a single host (or a small set of hosts) goes down.

Repeat from (a) until the required number of nodes has been allocated.

If a node 'dies', follow the same procedure. A usage sketch follows the parameter list below.

Parameters:
formats - the input formats (and paths) to compute preferred locations for
Returns:
a map from host name to the set of splits (blocks) preferred on that host
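
A hedged usage sketch in Scala (the paths and the format class are placeholder assumptions): it describes two inputs and asks for the host-to-splits map, from which locality information such as a per-host split count can be derived.

    import scala.collection.immutable
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.scheduler.{InputFormatInfo, SplitInfo}

    val conf = new Configuration()

    // One InputFormatInfo per input; the paths are illustrative only.
    val formats = Seq(
      new InputFormatInfo(conf, classOf[TextInputFormat], "hdfs:///data/events"),
      new InputFormatInfo(conf, classOf[TextInputFormat], "hdfs:///data/users"))

    // host -> set of splits (blocks) preferred on that host
    val locations: immutable.Map[String, immutable.Set[SplitInfo]] =
      InputFormatInfo.computePreferredLocations(formats)

    // Step (a) of the algorithm above: count splits per host.
    val splitsPerHost = locations.map { case (host, splits) => host -> splits.size }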

configuration

public org.apache.hadoop.conf.Configuration configuration()

inputFormatClazz

public Object inputFormatClazz()

path

public String path()

mapreduceInputFormat

public boolean mapreduceInputFormat()

mapredInputFormat

public boolean mapredInputFormat()

toString

public String toString()
Overrides:
toString in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

equals

public boolean equals(Object other)
Overrides:
equals in class Object