org.apache.spark.sql.sources
Class BaseRelation

Object
  extended by org.apache.spark.sql.sources.BaseRelation
Direct Known Subclasses:
HadoopFsRelation

public abstract class BaseRelation
extends Object

::DeveloperApi:: Represents a collection of tuples with a known schema. Classes that extend BaseRelation must be able to produce the schema of their data in the form of a StructType. Concrete implementations should inherit from one of the descendant Scan classes, which define various abstract methods for execution.

BaseRelations must also define an equality function that returns true only when the two instances will return the same data. This equality function is used to determine when it is safe to substitute cached results for a given relation.

Since:
1.3.0
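
As a minimal sketch (assuming the Spark 1.x external data sources API; the class PeopleRelation and its path field are hypothetical, invented for illustration), a concrete relation mixes in one of the Scan traits, produces its schema as a StructType, and defines data-based equality:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation over a text file of "name,age" lines.
class PeopleRelation(val path: String)
                    (@transient override val sqlContext: SQLContext)
  extends BaseRelation with TableScan {

  // Produce the schema of the data in the form of a StructType.
  override def schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)))

  // TableScan is one of the descendant Scan traits; it defines the
  // abstract execution method buildScan().
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.textFile(path).map { line =>
      val Array(name, age) = line.split(",")
      Row(name, age.trim.toInt)
    }

  // Equality holds only when two instances return the same data, so the
  // planner can safely substitute cached results for this relation.
  override def equals(other: Any): Boolean = other match {
    case that: PeopleRelation => this.path == that.path
    case _ => false
  }

  override def hashCode(): Int = path.hashCode
}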

Constructor Summary
BaseRelation()
           
 
Method Summary
 boolean needConversion()
          Whether the objects in Row need to be converted to the internal representation, for example: java.lang.String -> UTF8String, java.math.BigDecimal -> Decimal
abstract  StructType schema()
           
 long sizeInBytes()
          Returns an estimated size of this relation in bytes.
abstract  SQLContext sqlContext()
           
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BaseRelation

public BaseRelation()

Method Detail

sqlContext

public abstract SQLContext sqlContext()

schema

public abstract StructType schema()

sizeInBytes

public long sizeInBytes()
Returns an estimated size of this relation in bytes. This information is used by the planner to decide when it is safe to broadcast a relation, and can be overridden by sources that know the size ahead of time. By default, the system will assume that tables are too large to broadcast. This method will be called multiple times during query planning and thus should not perform expensive operations on each invocation.

Note that it is always better to overestimate the size than to underestimate it, because underestimation could lead to execution plans that are suboptimal (e.g. broadcasting a very large table).

Returns:
the estimated size of this relation, in bytes
Since:
1.3.0
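
As a sketch (the FileBackedRelation name and the java.io.File length lookup are illustrative assumptions, not part of the API), a source backed by a single local file could report the file length, cached so that repeated calls during planning stay cheap:

import org.apache.spark.sql.sources.BaseRelation

// Hypothetical relation that knows its on-disk footprint ahead of time.
abstract class FileBackedRelation(path: String) extends BaseRelation {

  // Cache the lookup once: sizeInBytes is called multiple times during
  // query planning, so it must not do expensive work per invocation.
  private lazy val fileLength: Long = new java.io.File(path).length()

  // Overestimate rather than underestimate: returning 0 for a missing
  // file would wrongly make this relation a broadcast candidate, so
  // fall back to "too large to broadcast".
  override def sizeInBytes: Long =
    if (fileLength > 0) fileLength else Long.MaxValue
}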

needConversion

public boolean needConversion()
Whether the objects in Row need to be converted to the internal representation, for example: java.lang.String -> UTF8String, java.math.BigDecimal -> Decimal

Note: The internal representation is not stable across releases and thus data sources outside of Spark SQL should leave this as true.

Returns:
true if the rows produced by this relation need to be converted to the internal representation
Since:
1.4.0
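
For illustration only: the opt-out is meant for sources inside Spark SQL that already produce the internal representation. The sketch below assumes a Spark version (1.5+) where org.apache.spark.sql.catalyst.InternalRow exists; InternalRelation and internalRows are hypothetical names. External data sources should leave needConversion as true and return plain Rows.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.{BaseRelation, TableScan}

// Hypothetical *internal* source whose rows are already in the internal
// format (UTF8String, Decimal, ...), so Spark SQL must not convert again.
abstract class InternalRelation extends BaseRelation with TableScan {

  override def needConversion: Boolean = false

  // Assumed to be produced elsewhere, already in the internal format.
  protected def internalRows: RDD[InternalRow]

  // The erasure cast hands rows in the internal format through the
  // RDD[Row]-typed buildScan contract when needConversion is false.
  override def buildScan(): RDD[Row] =
    internalRows.asInstanceOf[RDD[Row]]
}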