Class JdbcRDD<T>

Object
  org.apache.spark.rdd.RDD<T>
    org.apache.spark.rdd.JdbcRDD<T>
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, scala.Serializable

public class JdbcRDD<T> extends RDD<T> implements org.apache.spark.internal.Logging
An RDD that executes a SQL query on a JDBC connection and reads results. For a usage example, see the test case JdbcRDDSuite.

param: getConnection a function that returns an open Connection. The RDD takes care of closing the connection.
param: sql the text of the query. The query must contain two ? placeholders for parameters used to partition the results. For example,

   select title, author from books where ? <= id and id <= ?

param: lowerBound the minimum value of the first placeholder
param: upperBound the maximum value of the second placeholder. The lower and upper bounds are inclusive.
param: numPartitions the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
param: mapRow a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc.; the RDD takes care of calling next. The default maps a ResultSet to an array of Object.
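
To make the placeholder mechanics concrete, here is a plain-JDBC sketch (an illustration only, not the class's actual implementation) of how one partition's inclusive bounds fill the two ? placeholders. The in-memory H2 URL and the books table are assumptions carried over from the example query; JdbcRDD does the equivalent work once per partition and hands each row to mapRow instead of printing it.

   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.PreparedStatement;
   import java.sql.ResultSet;

   public class PartitionQuerySketch {
     public static void main(String[] args) throws Exception {
       String sql = "select title, author from books where ? <= id and id <= ?";
       long lower = 1, upper = 10;   // one partition's bounds, both inclusive
       try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:testdb");
            PreparedStatement stmt = conn.prepareStatement(sql)) {
         stmt.setLong(1, lower);     // first placeholder  <- partition lower bound
         stmt.setLong(2, upper);     // second placeholder <- partition upper bound
         try (ResultSet rs = stmt.executeQuery()) {
           while (rs.next()) {       // JdbcRDD calls next(); mapRow should only read columns
             System.out.println(rs.getString(1) + " by " + rs.getString(2));
           }
         }
       }
     }
   }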
  • Constructor Details

    • JdbcRDD

      public JdbcRDD(SparkContext sc, scala.Function0<Connection> getConnection, String sql, long lowerBound, long upperBound, int numPartitions, scala.Function1<ResultSet,T> mapRow, scala.reflect.ClassTag<T> evidence$1)
  • Method Details

    • resultSetToObjectArray

      public static Object[] resultSetToObjectArray(ResultSet rs)
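      No description is generated for this helper. Judging from its name and the class-level documentation, it maps the current row of rs to an Object array with one entry per column, i.e. the default mapRow behavior; a minimal sketch of referencing it explicitly as a row-mapping function:

        import java.sql.ResultSet;
        import org.apache.spark.api.java.function.Function;
        import org.apache.spark.rdd.JdbcRDD;

        public class RowMapperSketch {
          // Usable wherever create(...) expects a mapRow function; equivalent to
          // omitting mapRow and getting a JavaRDD<Object[]> back.
          static final Function<ResultSet, Object[]> MAP_ROW = JdbcRDD::resultSetToObjectArray;
        }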
    • create

      public static <T> JavaRDD<T> create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, String sql, long lowerBound, long upperBound, int numPartitions, Function<ResultSet,T> mapRow)
      Create an RDD that executes a SQL query on a JDBC connection and reads results. For a usage example, see the test case JavaAPISuite.testJavaJdbcRDD.

      Parameters:
      connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
      sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. For example,
      
         select title, author from books where ? <= id and id <= ?
         
      lowerBound - the minimum value of the first placeholder
      upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
      numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
      mapRow - a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc.; the RDD takes care of calling next. The default maps a ResultSet to an array of Object.
      sc - the JavaSparkContext used to create the RDD
      Returns:
      a JavaRDD containing one mapped object per result row
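      A minimal sketch of this overload, assuming a local master and an in-memory H2 database (hypothetical URL, driver on the classpath) containing the books table from the example query:

        import java.sql.DriverManager;
        import java.sql.ResultSet;

        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.rdd.JdbcRDD;

        public class JdbcRddCreateSketch {
          public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[2]", "JdbcRddCreateSketch");

            JavaRDD<String> titles = JdbcRDD.create(
                sc,
                // ConnectionFactory: a connection is opened per partition and closed by the RDD.
                () -> DriverManager.getConnection("jdbc:h2:mem:testdb"),
                "select title, author from books where ? <= id and id <= ?",
                1L, 20L,   // inclusive bounds substituted for the two ? placeholders
                2,         // numPartitions: the range is split into (1, 10) and (11, 20)
                // mapRow: only read column values; the RDD advances the cursor.
                rs -> rs.getString(1) + " by " + rs.getString(2));

            titles.collect().forEach(System.out::println);
            sc.stop();
          }
        }

      Both lambdas target serializable functional interfaces (ConnectionFactory and Function) and run on the executors, so they should not capture non-serializable state such as an already-open Connection.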
    • create

      public static JavaRDD<Object[]> create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, String sql, long lowerBound, long upperBound, int numPartitions)
      Create an RDD that executes a SQL query on a JDBC connection and reads results. Each row is converted into an Object array. For a usage example, see the test case JavaAPISuite.testJavaJdbcRDD.

      Parameters:
      connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
      sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. For example,
      
         select title, author from books where ? <= id and id <= ?
         
      lowerBound - the minimum value of the first placeholder
      upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
      numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
      sc - the JavaSparkContext used to create the RDD
      Returns:
      a JavaRDD containing one Object array per result row
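      A similar sketch for this overload, under the same assumptions (hypothetical H2 URL, books table); each element of the returned RDD is an Object[] with one entry per selected column:

        import java.sql.DriverManager;

        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.rdd.JdbcRDD;

        public class JdbcRddObjectArraySketch {
          public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[2]", "JdbcRddObjectArraySketch");

            JavaRDD<Object[]> rows = JdbcRDD.create(
                sc,
                () -> DriverManager.getConnection("jdbc:h2:mem:testdb"),
                "select title, author from books where ? <= id and id <= ?",
                1L, 20L, 2);

            for (Object[] row : rows.collect()) {
              System.out.println(row[0] + " by " + row[1]);   // row[0] = title, row[1] = author
            }
            sc.stop();
          }
        }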
    • getPartitions

      public Partition[] getPartitions()
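      No description is generated for getPartitions. As an illustration only (not the class's actual code), the sketch below splits the inclusive [lowerBound, upperBound] range into numPartitions contiguous sub-ranges, which reproduces the documented example of bounds 1 and 20 over 2 partitions:

        public class BoundSplitSketch {
          public static void main(String[] args) {
            long lowerBound = 1, upperBound = 20;
            int numPartitions = 2;
            long length = upperBound - lowerBound + 1;
            for (int i = 0; i < numPartitions; i++) {
              long start = lowerBound + i * length / numPartitions;
              long end = lowerBound + (i + 1) * length / numPartitions - 1;
              // prints: partition 0: (1, 10)   partition 1: (11, 20)
              System.out.println("partition " + i + ": (" + start + ", " + end + ")");
            }
          }
        }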
    • compute

      public scala.collection.Iterator<T> compute(Partition thePart, TaskContext context)
      Description copied from class: RDD
      :: DeveloperApi :: Implemented by subclasses to compute a given partition.
      Specified by:
      compute in class RDD<T>
      Parameters:
      thePart - the partition to compute
      context - the TaskContext of the running task
      Returns:
      an iterator over the mapped rows of the given partition