Class JdbcDialect

Object
org.apache.spark.sql.jdbc.JdbcDialect
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging
Direct Known Subclasses:
AggregatedDialect, DatabricksDialect, DB2Dialect, DerbyDialect, MsSqlServerDialect, MySQLDialect, OracleDialect, PostgresDialect, SnowflakeDialect, TeradataDialect

public abstract class JdbcDialect extends Object implements Serializable, org.apache.spark.internal.Logging
:: DeveloperApi :: Encapsulates everything (extensions, workarounds, quirks) to handle the SQL dialect of a certain database or jdbc driver. Lots of databases define types that aren't explicitly supported by the JDBC spec. Some JDBC drivers also report inaccurate information; for instance, BIT(n>1) being reported as a BIT type is quite common, even though BIT in JDBC is meant for single-bit values. Also, there does not appear to be a standard name for an unbounded string or binary type; we use BLOB and CLOB by default but override with database-specific alternatives when these are absent or do not behave correctly.

The core responsibility of a dialect is type mapping: getCatalystType is used when reading from a JDBC table and getJDBCType is used when writing to a JDBC table. If getCatalystType returns None, the default type handling is used for the given JDBC type. Similarly, if getJDBCType returns None, the default type handling is used for the given Catalyst type.

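As an illustration of the read and write type-mapping hooks described above, the following is a minimal Scala sketch of a custom dialect. The dialect name, the jdbc:mydb URL prefix, and the chosen type mappings are hypothetical; registering the dialect through JdbcDialects.registerDialect makes it visible to Spark's JDBC data source.

  import java.sql.Types
  import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
  import org.apache.spark.sql.types._

  // Hypothetical dialect for a database whose JDBC URLs start with "jdbc:mydb".
  object MyCustomDialect extends JdbcDialect {

    // Claim only URLs with the (hypothetical) "jdbc:mydb" prefix.
    override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

    // Read path: map single-bit BIT columns to BooleanType; returning None
    // falls back to the default type mapping for everything else.
    override def getCatalystType(
        sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
      if (sqlType == Types.BIT && size == 1) Some(BooleanType) else None

    // Write path: store StringType columns as TEXT instead of the default.
    override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
      case StringType => Some(JdbcType("TEXT", Types.VARCHAR))
      case _          => None
    }
  }

  // Register the dialect so it is consulted for matching JDBC URLs.
  JdbcDialects.registerDialect(MyCustomDialect)
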
  • Constructor Details

    • JdbcDialect

      public JdbcDialect()
  • Method Details

    • alterTable

      public String[] alterTable(String tableName, scala.collection.immutable.Seq<TableChange> changes, int dbMajorVersion)
      Alter an existing table.

      Parameters:
      tableName - The name of the table to be altered.
      changes - Changes to apply to the table.
      dbMajorVersion - The major version of the database.
      Returns:
      The SQL statements to use for altering the table.
    • beforeFetch

      public void beforeFetch(Connection connection, scala.collection.immutable.Map<String,String> properties)
      Override connection specific properties to run before a select is made. This is in place to allow dialects that need special treatment to optimize behavior.
      Parameters:
      connection - The connection object
      properties - The connection properties. This is passed through from the relation.
    • canHandle

      public abstract boolean canHandle(String url)
      Check if this dialect instance can handle a certain jdbc url.
      Parameters:
      url - the jdbc url.
      Returns:
      True if the dialect can be applied on the given jdbc url.
      Throws:
      NullPointerException - if the url is null.
    • classifyException

      public AnalysisException classifyException(Throwable e, String errorClass, scala.collection.immutable.Map<String,String> messageParameters, String description)
      Gets a dialect-specific exception, classifies it, and wraps it in an AnalysisException.
      Parameters:
      e - The dialect specific exception.
      errorClass - The error class to assign when e cannot be classified
      messageParameters - The message parameters of errorClass
      description - The error description
      Returns:
      AnalysisException or its sub-class.
    • classifyException

      public AnalysisException classifyException(String message, Throwable e)
      Deprecated.
      Please override the classifyException method with an error class. Since 4.0.0.
      Gets a dialect-specific exception, classifies it, and wraps it in an AnalysisException.
      Parameters:
      message - The error message to be placed to the returned exception.
      e - The dialect specific exception.
      Returns:
      AnalysisException or its sub-class.
    • compileAggregate

      public scala.Option<String> compileAggregate(AggregateFunc aggFunction)
      Deprecated.
      use org.apache.spark.sql.jdbc.JdbcDialect.compileExpression instead. Since 3.4.0.
      Converts an aggregate function into a String representing a SQL expression.
      Parameters:
      aggFunction - The aggregate function to be converted.
      Returns:
      Converted value.
    • compileExpression

      public scala.Option<String> compileExpression(Expression expr)
      Converts a V2 expression into a String representing a SQL expression.
      Parameters:
      expr - The V2 expression to be converted.
      Returns:
      Converted value.
    • compileValue

      public Object compileValue(Object value)
      Converts a value into a SQL expression.
      Parameters:
      value - The value to be converted.
      Returns:
      Converted value.
    • convertJavaDateToDate

      public Date convertJavaDateToDate(Date d)
      Converts an instance of java.sql.Date to a custom java.sql.Date value.
      Parameters:
      d - the date value returned from JDBC ResultSet getDate method.
      Returns:
      the date value after conversion
    • convertJavaTimestampToTimestamp

      public Timestamp convertJavaTimestampToTimestamp(Timestamp t)
      Converts an instance of java.sql.Timestamp to a custom java.sql.Timestamp value.
      Parameters:
      t - represents a specific instant in time based on the hybrid calendar which combines Julian and Gregorian calendars.
      Returns:
      the timestamp value after conversion
      Throws:
      IllegalArgumentException - if t is null
    • convertJavaTimestampToTimestampNTZ

      public LocalDateTime convertJavaTimestampToTimestampNTZ(Timestamp t)
      Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the value stored in a remote database. JDBC dialects should override this function to provide implementations that suit their JDBC drivers.
      Parameters:
      t - Timestamp returned from JDBC driver getTimestamp method.
      Returns:
      A LocalDateTime representing the same wall clock time as the timestamp in database.
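      A minimal Scala sketch of such an override (placed inside a custom dialect like the one in the class overview): it delegates to java.sql.Timestamp#toLocalDateTime, which keeps the wall-clock fields; drivers with unusual calendar handling may need something more elaborate.

        // Sketch: preserve the wall-clock time reported by the driver.
        override def convertJavaTimestampToTimestampNTZ(t: java.sql.Timestamp): java.time.LocalDateTime =
          t.toLocalDateTime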
    • convertTimestampNTZToJavaTimestamp

      public Timestamp convertTimestampNTZToJavaTimestamp(LocalDateTime ldt)
      Converts a LocalDateTime representing a TimestampNTZ type to an instance of java.sql.Timestamp.
      Parameters:
      ldt - representing a TimestampNTZType.
      Returns:
      A Java Timestamp representing this LocalDateTime.
    • createConnectionFactory

      public scala.Function1<Object,Connection> createConnectionFactory(org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions options)
      Returns a factory for creating connections to the given JDBC URL. In general, creating a connection has nothing to do with the JDBC partition id, but sometimes it is needed, for example with a database that has multiple shard nodes.
      Parameters:
      options - JDBC options that contain the url, table, and other information.
      Returns:
      The factory method for creating JDBC connections with the RDD partition ID. -1 means the connection is being created at the driver side.
      Throws:
      IllegalArgumentException - if the driver could not open a JDBC connection.
    • createIndex

      public String createIndex(String indexName, Identifier tableIdent, NamedReference[] columns, Map<NamedReference,Map<String,String>> columnsProperties, Map<String,String> properties)
      Build a create index SQL statement.

      Parameters:
      indexName - the name of the index to be created
      tableIdent - the table on which the index is to be created
      columns - the columns on which the index is to be created
      columnsProperties - the properties of the columns on which the index is to be created
      properties - the properties of the index to be created
      Returns:
      the SQL statement to use for creating the index.
    • createSchema

      public void createSchema(Statement statement, String schema, String comment)
      Create schema with an optional comment. Empty string means no comment.
      Parameters:
      statement - The Statement object used to execute SQL statements.
      schema - The name of the schema to be created.
      comment - The comment for the schema; an empty string means no comment.
    • createTable

      public void createTable(Statement statement, String tableName, String strSchema, org.apache.spark.sql.execution.datasources.jdbc.JdbcOptionsInWrite options)
      Create the table if it does not exist. Certain options, such as table_options or partition_options, can be appended when creating a new table. E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8"

      Parameters:
      statement - The Statement object used to execute SQL statements.
      tableName - The name of the table to be created.
      strSchema - The schema of the table to be created.
      options - The JDBC options. It contains the create table option, which can be table_options or partition_options.
    • dropIndex

      public String dropIndex(String indexName, Identifier tableIdent)
      Build a drop index SQL statement.

      Parameters:
      indexName - the name of the index to be dropped.
      tableIdent - the table from which the index is to be dropped.
      Returns:
      the SQL statement to use for dropping the index.
    • dropSchema

      public String dropSchema(String schema, boolean cascade)
    • dropTable

      public String dropTable(String table)
      Build a SQL statement to drop the given table.

      Parameters:
      table - the table name
      Returns:
      The SQL statement to use for dropping the table.
    • functions

      public scala.collection.immutable.Seq<scala.Tuple2<String,UnboundFunction>> functions()
      List the user-defined functions in the JDBC dialect.
      Returns:
      a sequence of tuples mapping function names to user-defined functions.
    • getAddColumnQuery

      public String getAddColumnQuery(String tableName, String columnName, String dataType)
    • getCatalystType

      public scala.Option<DataType> getCatalystType(int sqlType, String typeName, int size, MetadataBuilder md)
      Get the custom datatype mapping for the given jdbc meta information.

      Guidelines for mapping database defined timestamps to Spark SQL timestamps:

      • TIMESTAMP WITHOUT TIME ZONE if preferTimestampNTZ -> TimestampNTZType
      • TIMESTAMP WITHOUT TIME ZONE if !preferTimestampNTZ -> TimestampType(LTZ)
      • TIMESTAMP WITH TIME ZONE -> TimestampType(LTZ)
      • TIMESTAMP WITH LOCAL TIME ZONE -> TimestampType(LTZ)
      • If the TIMESTAMP cannot be distinguished by sqlType and typeName, preferTimestampNTZ is respected for now, but we may need to add another option in the future if necessary.

      Parameters:
      sqlType - Refers to Types constants, or other constants defined by the target database, e.g. -101 is Oracle's TIMESTAMP WITH TIME ZONE type. This value is returned by ResultSetMetaData.getColumnType(int).
      typeName - The column type name used by the database (e.g. "BIGINT UNSIGNED"). This is sometimes used to determine the target data type when sqlType is not sufficient if multiple database types are conflated into a single id. This value is returned by ResultSetMetaData.getColumnTypeName(int).
      size - The size of the type, e.g. the maximum precision for numeric types, length for character string, etc. This value is returned by ResultSetMetaData.getPrecision(int).
      md - Result metadata associated with this type. This contains additional information from ResultSetMetaData or user specified options.
      Returns:
      An Option of the actual DataType (a subclass of DataType), or None if the default type mapping should be used.
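      Following the guidelines above, for example, a dialect could map a vendor-specific TIMESTAMP WITH TIME ZONE code (such as Oracle's -101, mentioned in the sqlType description) to TimestampType. A Scala sketch, assumed to live inside a custom dialect like the one in the class overview:

        // Sketch: vendor-specific TIMESTAMP WITH TIME ZONE -> TimestampType (LTZ);
        // None defers to the default type mapping.
        override def getCatalystType(
            sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
          sqlType match {
            case -101 => Some(TimestampType)
            case _    => None
          }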
    • getDayTimeIntervalAsMicros

      public long getDayTimeIntervalAsMicros(String daytimeStr)
      Converts a day-time interval string to a long value in microseconds.

      Parameters:
      daytimeStr - the day-time interval string
      Returns:
      the number of total microseconds in the interval
      Throws:
      IllegalArgumentException - if the input string is invalid
    • getDeleteColumnQuery

      public String getDeleteColumnQuery(String tableName, String columnName)
    • getFullyQualifiedQuotedTableName

      public String getFullyQualifiedQuotedTableName(Identifier ident)
      Return the DB-specific quoted and fully qualified table name.
      Parameters:
      ident - The identifier of the table.
      Returns:
      The quoted and fully qualified table name.
    • getJDBCType

      public scala.Option<JdbcType> getJDBCType(DataType dt)
      Retrieve the JDBC/SQL type for a given datatype.
      Parameters:
      dt - The datatype (e.g. StringType)
      Returns:
      The new JdbcType if there is an override for this DataType, None otherwise
    • getJdbcSQLQueryBuilder

      public JdbcSQLQueryBuilder getJdbcSQLQueryBuilder(org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions options)
      Returns the SQL builder for the SELECT statement.
      Parameters:
      options - JDBC options of the table.
      Returns:
      The SQL query builder for building the SELECT statement.
    • getLimitClause

      public String getLimitClause(Integer limit)
      Returns the LIMIT clause for the SELECT statement
      Parameters:
      limit - (undocumented)
      Returns:
      (undocumented)
    • getOffsetClause

      public String getOffsetClause(Integer offset)
      Returns the OFFSET clause for the SELECT statement
      Parameters:
      offset - (undocumented)
      Returns:
      (undocumented)
    • getRenameColumnQuery

      public String getRenameColumnQuery(String tableName, String columnName, String newName, int dbMajorVersion)
    • getSchemaCommentQuery

      public String getSchemaCommentQuery(String schema, String comment)
    • getSchemaQuery

      public String getSchemaQuery(String table)
      The SQL query that should be used to discover the schema of a table. It only needs to ensure that the result set has the same schema as the table, such as by calling "SELECT * ...". Dialects can override this method to return a query that works best in a particular database.
      Parameters:
      table - The name of the table.
      Returns:
      The SQL query to use for discovering the schema.
    • getTableCommentQuery

      public String getTableCommentQuery(String table, String comment)
    • getTableExistsQuery

      public String getTableExistsQuery(String table)
      Get the SQL query that should be used to find if the given table exists. Dialects can override this method to return a query that works best in a particular database.
      Parameters:
      table - The name of the table.
      Returns:
      The SQL query to use for checking the table.
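      A hedged Scala sketch of an override; the exact probe query is a dialect choice, and a query that returns no rows but fails when the table is missing is one common, inexpensive idiom.

        // Sketch: succeeds (with zero rows) iff the table exists.
        override def getTableExistsQuery(table: String): String =
          s"SELECT 1 FROM $table WHERE 1=0"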
    • getTableSample

      public String getTableSample(org.apache.spark.sql.execution.datasources.v2.TableSampleInfo sample)
    • getTruncateQuery

      public String getTruncateQuery(String table)
      The SQL query that should be used to truncate a table. Dialects can override this method to return a query that is suitable for a particular database. For PostgreSQL, for instance, a different query is used to prevent TRUNCATE from affecting other tables.
      Parameters:
      table - The table to truncate
      Returns:
      The SQL query to use for truncating a table
    • getTruncateQuery

      public String getTruncateQuery(String table, scala.Option<Object> cascade)
      The SQL query that should be used to truncate a table. Dialects can override this method to return a query that is suitable for a particular database. For PostgreSQL, for instance, a different query is used to prevent TRUNCATE from affecting other tables.
      Parameters:
      table - The table to truncate
      cascade - Whether or not to cascade the truncation
      Returns:
      The SQL query to use for truncating a table
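      As a Scala sketch of the kind of override described above (modeled loosely on the PostgreSQL behavior), a dialect might restrict truncation to the named table and cascade only on request:

        // Sketch: truncate only the named table; cascade only when asked to.
        override def getTruncateQuery(table: String, cascade: Option[Boolean]): String =
          cascade match {
            case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
            case _          => s"TRUNCATE TABLE ONLY $table"
          }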
    • getUpdateColumnNullabilityQuery

      public String getUpdateColumnNullabilityQuery(String tableName, String columnName, boolean isNullable)
    • getUpdateColumnTypeQuery

      public String getUpdateColumnTypeQuery(String tableName, String columnName, String newDataType)
    • getYearMonthIntervalAsMonths

      public int getYearMonthIntervalAsMonths(String yearmonthStr)
      Converts a year-month interval string to an int value in months.

      Parameters:
      yearmonthStr - the year-month interval string
      Returns:
      the number of total months in the interval
      Throws:
      IllegalArgumentException - if the input string is invalid
    • indexExists

      public boolean indexExists(Connection conn, String indexName, Identifier tableIdent, org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions options)
      Checks whether an index exists

      Parameters:
      indexName - the name of the index
      tableIdent - the table on which the index is to be checked
      options - JDBCOptions of the table
      conn - The JDBC connection used to check the index
      Returns:
      true if the index with indexName exists in the table, false otherwise
    • insertIntoTable

      public String insertIntoTable(String table, StructField[] fields)
      Returns an Insert SQL statement template for inserting a row into the target table via JDBC conn. Use "?" as placeholder for each value to be inserted. E.g. INSERT INTO t ("name", "age", "gender") VALUES (?, ?, ?)

      Parameters:
      table - The name of the table.
      fields - The fields of the row that will be inserted.
      Returns:
      The SQL query to use for inserting data into the table.
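      The template described above can be sketched in Scala roughly as follows (a sketch, not necessarily the exact built-in implementation), quoting each column name with quoteIdentifier and emitting one "?" placeholder per field; StructField comes from org.apache.spark.sql.types.

        // Sketch: INSERT INTO <table> ("c1", "c2", ...) VALUES (?, ?, ...)
        override def insertIntoTable(table: String, fields: Array[StructField]): String = {
          val columns = fields.map(f => quoteIdentifier(f.name)).mkString(", ")
          val placeholders = fields.map(_ => "?").mkString(", ")
          s"INSERT INTO $table ($columns) VALUES ($placeholders)"
        }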
    • isCascadingTruncateTable

      public scala.Option<Object> isCascadingTruncateTable()
      Return Some[true] iff TRUNCATE TABLE causes cascading by default. Some[true]: TRUNCATE TABLE causes cascading. Some[false]: TRUNCATE TABLE does not cause cascading. None: The behavior of TRUNCATE TABLE is unknown (default).
      Returns:
      (undocumented)
    • isSupportedFunction

      public boolean isSupportedFunction(String funcName)
      Returns whether the database supports the given function.
      Parameters:
      funcName - Upper-cased function name
      Returns:
      True if the database supports the function.
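      A Scala sketch of an override; the whitelist below is entirely hypothetical and mainly illustrates that names are compared in upper case.

        // Sketch: only advertise a small set of functions for pushdown.
        override def isSupportedFunction(funcName: String): Boolean =
          Set("ABS", "COALESCE", "UPPER", "LOWER").contains(funcName)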
    • listIndexes

      public TableIndex[] listIndexes(Connection conn, Identifier tableIdent, org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions options)
      Lists all the indexes in this table.
      Parameters:
      conn - (undocumented)
      tableIdent - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
    • listSchemas

      public String[][] listSchemas(Connection conn, org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions options)
      Lists all the schemas in this database.
      Parameters:
      conn - (undocumented)
      options - (undocumented)
      Returns:
      (undocumented)
    • quoteIdentifier

      public String quoteIdentifier(String colName)
      Quotes the identifier. This is used to put quotes around the identifier in case the column name is a reserved keyword, or in case it contains characters that require quotes (e.g. space).
      Parameters:
      colName - The name of the identifier to quote.
      Returns:
      The quoted identifier.
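      A Scala sketch of an override using ANSI-style double quotes; databases that quote with backticks or brackets would do the equivalent with their own delimiters.

        // Sketch: wrap in double quotes and escape embedded double quotes.
        override def quoteIdentifier(colName: String): String = {
          val escaped = colName.replace("\"", "\"\"")
          "\"" + escaped + "\""
        }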
    • removeSchemaCommentQuery

      public String removeSchemaCommentQuery(String schema)
    • renameTable

      public String renameTable(String oldTable, String newTable)
      Deprecated.
      Please override renameTable method with identifiers. Since 3.5.0.
      Rename an existing table.

      Parameters:
      oldTable - The existing table.
      newTable - New name of the table.
      Returns:
      The SQL statement to use for renaming the table.
    • renameTable

      public String renameTable(Identifier oldTable, Identifier newTable)
      Rename an existing table.

      Parameters:
      oldTable - The existing table.
      newTable - New name of the table.
      Returns:
      The SQL statement to use for renaming the table.
    • schemasExists

      public boolean schemasExists(Connection conn, org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions options, String schema)
      Check whether the schema exists.
      Parameters:
      conn - (undocumented)
      options - (undocumented)
      schema - (undocumented)
      Returns:
      (undocumented)
    • supportsLimit

      public boolean supportsLimit()
      Returns true if the dialect supports the LIMIT clause.

      Note: Some built-in dialects support the LIMIT clause with a workaround; see OracleDialect.OracleSQLQueryBuilder and MsSqlServerDialect.MsSqlServerSQLQueryBuilder.

      Returns:
      (undocumented)
    • supportsOffset

      public boolean supportsOffset()
      Returns true if the dialect supports the OFFSET clause.

      Note: Some built-in dialects support the OFFSET clause with a workaround; see OracleDialect.OracleSQLQueryBuilder and MySQLDialect.MySQLSQLQueryBuilder.

      Returns:
      (undocumented)
    • supportsTableSample

      public boolean supportsTableSample()
    • updateExtraColumnMeta

      public void updateExtraColumnMeta(Connection conn, ResultSetMetaData rsmd, int columnIdx, MetadataBuilder metadata)
      Get extra column metadata for the given column.

      Parameters:
      conn - The connection currently being used.
      rsmd - The metadata of the current result set.
      columnIdx - The index of the column.
      metadata - The metadata builder to store the extra column information.