Class JdbcRDD<T>

Object
org.apache.spark.rdd.RDD<T>
org.apache.spark.rdd.JdbcRDD<T>
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging

public class JdbcRDD<T> extends RDD<T> implements org.apache.spark.internal.Logging
An RDD that executes a SQL query on a JDBC connection and reads results. For usage example, see test case JdbcRDDSuite.

param: getConnection a function that returns an open Connection. The RDD takes care of closing the connection.

param: sql the text of the query. The query must contain two ? placeholders for parameters used to partition the results. For example,

   select title, author from books where ? <= id and id <= ?

param: lowerBound the minimum value of the first placeholder
param: upperBound the maximum value of the second placeholder. The lower and upper bounds are inclusive.
param: numPartitions the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
param: mapRow a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc.; the RDD takes care of calling next. The default maps a ResultSet to an array of Object.
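
For instance, a minimal Scala sketch of constructing this RDD (the JDBC URL, books table, and row mapping below are illustrative assumptions, not part of this API):

   import java.sql.{DriverManager, ResultSet}
   import org.apache.spark.{SparkConf, SparkContext}
   import org.apache.spark.rdd.JdbcRDD

   val sc = new SparkContext(new SparkConf().setAppName("JdbcRDDExample").setMaster("local[*]"))

   // Hypothetical JDBC URL; substitute your own database, driver, and credentials.
   val url = "jdbc:h2:mem:testdb"

   val rdd = new JdbcRDD(
     sc,
     () => DriverManager.getConnection(url),  // getConnection: the RDD closes it per partition
     "select title, author from books where ? <= id and id <= ?",
     1L,    // lowerBound, bound to the first ? (inclusive)
     100L,  // upperBound, bound to the second ? (inclusive)
     3,     // numPartitions
     (rs: ResultSet) => (rs.getString(1), rs.getString(2)))  // mapRow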
  • Constructor Details

    • JdbcRDD

      public JdbcRDD(SparkContext sc, scala.Function0<Connection> getConnection, String sql, long lowerBound, long upperBound, int numPartitions, scala.Function1<ResultSet,T> mapRow, scala.reflect.ClassTag<T> evidence$1)
  • Method Details

    • resultSetToObjectArray

      public static Object[] resultSetToObjectArray(ResultSet rs)
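      This helper converts the current ResultSet row into an array of its column values, and is the default mapRow of the Scala constructor. A brief sketch, reusing the hypothetical sc and url from the class example above, passes it explicitly:

         val rows = new JdbcRDD(
           sc,
           () => DriverManager.getConnection(url),
           "select title, author from books where ? <= id and id <= ?",
           1L, 100L, 3,
           JdbcRDD.resultSetToObjectArray _)  // each element is an Array[Object]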
    • create

      public static <T> JavaRDD<T> create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, String sql, long lowerBound, long upperBound, int numPartitions, Function<ResultSet,T> mapRow)
      Create an RDD that executes a SQL query on a JDBC connection and reads results. For usage example, see test case JavaAPISuite.testJavaJdbcRDD.

      Parameters:
      sc - the JavaSparkContext on which to create the RDD
      connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
      sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. For example,

         select title, author from books where ? <= id and id <= ?

      lowerBound - the minimum value of the first placeholder
      upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
      numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
      mapRow - a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc.; the RDD takes care of calling next.
      Returns:
      an RDD whose elements are produced by applying mapRow to each row of the query result
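      A Scala sketch of calling this factory under the same hypothetical setup (url from the class example; the query and bounds are illustrative):

         import java.sql.{Connection, DriverManager, ResultSet}
         import org.apache.spark.api.java.{JavaRDD, JavaSparkContext}
         import org.apache.spark.api.java.function.{Function => JFunction}
         import org.apache.spark.rdd.JdbcRDD

         val javaSc = new JavaSparkContext(sc)  // wrap the existing SparkContext

         val factory = new JdbcRDD.ConnectionFactory {
           override def getConnection: Connection = DriverManager.getConnection(url)
         }

         val titles: JavaRDD[String] = JdbcRDD.create(
           javaSc,
           factory,
           "select title, author from books where ? <= id and id <= ?",
           1L, 100L, 3,
           new JFunction[ResultSet, String] {
             override def call(rs: ResultSet): String = rs.getString(1)
           })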
    • create

      public static JavaRDD<Object[]> create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, String sql, long lowerBound, long upperBound, int numPartitions)
      Create an RDD that executes a SQL query on a JDBC connection and reads results. Each row is converted into an Object array. For usage example, see test case JavaAPISuite.testJavaJdbcRDD.

      Parameters:
      sc - the JavaSparkContext on which to create the RDD
      connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
      sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. For example,

         select title, author from books where ? <= id and id <= ?

      lowerBound - the minimum value of the first placeholder
      upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
      numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
      Returns:
      an RDD whose elements are the rows of the query result, each converted to an array of Object
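      Under the same assumptions, this mapRow-less variant defaults each row to an array of Object:

         val raw: JavaRDD[Array[Object]] = JdbcRDD.create(
           javaSc, factory,
           "select title, author from books where ? <= id and id <= ?",
           1L, 100L, 3)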
    • getPartitions

      public Partition[] getPartitions()
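      The inclusive [lowerBound, upperBound] range is divided into numPartitions contiguous sub-ranges, one per partition. Below is a minimal sketch of arithmetic consistent with the documented example of (1, 20, 2) producing (1, 10) and (11, 20); it is an illustration, not the class's actual implementation:

         def splitBounds(lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[(Long, Long)] = {
           val length = BigInt(1) + upperBound - lowerBound  // number of ids; bounds are inclusive
           (0 until numPartitions).map { i =>
             val start = lowerBound + (length * i / numPartitions).toLong
             val end   = lowerBound + (length * (i + 1) / numPartitions).toLong - 1
             (start, end)  // values bound to the two ? placeholders for partition i
           }
         }

         // splitBounds(1L, 20L, 2) == Seq((1L, 10L), (11L, 20L))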
    • compute

      public scala.collection.Iterator<T> compute(Partition thePart, TaskContext context)
      Description copied from class: RDD
      :: DeveloperApi :: Implemented by subclasses to compute a given partition.
      Specified by:
      compute in class RDD<T>
      Parameters:
      thePart - the partition to compute
      context - the TaskContext of the running task
      Returns:
      an iterator over the rows of the computed partition