DefaultSource

Constructors
Constructor and Description
`DefaultSource()`

Methods
| Modifier and Type | Method and Description |
| --- | --- |
| `scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>` | `buildReader(SparkSession sparkSession, StructType dataSchema, StructType partitionSchema, StructType requiredSchema, scala.collection.Seq<Filter> filters, scala.collection.immutable.Map<java.lang.String,java.lang.String> options, org.apache.hadoop.conf.Configuration hadoopConf)` |
| `scala.Option<StructType>` | `inferSchema(SparkSession sparkSession, scala.collection.immutable.Map<java.lang.String,java.lang.String> options, scala.collection.Seq<org.apache.hadoop.fs.FileStatus> files)` |
| `scala.collection.immutable.Map<java.lang.String,java.lang.String>` | `prepareRead(SparkSession sparkSession, scala.collection.immutable.Map<java.lang.String,java.lang.String> options, scala.collection.Seq<org.apache.hadoop.fs.FileStatus> files)` |
| `org.apache.spark.sql.execution.datasources.OutputWriterFactory` | `prepareWrite(SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<java.lang.String,java.lang.String> options, StructType dataSchema)` |
| `java.lang.String` | `shortName()` The string that represents the format that this data source provider uses. |
| `java.lang.String` | `toString()` |

java.lang.Object
- org.apache.spark.ml.source.libsvm.DefaultSource

All Implemented Interfaces:

org.apache.spark.sql.execution.datasources.FileFormat, DataSourceRegister
```
public class DefaultSource
extends java.lang.Object
implements org.apache.spark.sql.execution.datasources.FileFormat, DataSourceRegister
```
The `libsvm` package implements the Spark SQL data source API for loading LIBSVM data as a DataFrame. The loaded DataFrame has two columns: `label`, containing labels stored as doubles, and `features`, containing feature vectors stored as Vectors.
To use the LIBSVM data source, set "libsvm" as the format in DataFrameReader and optionally specify options, for example:
```
   // Scala
   val df = spark.read.format("libsvm")
     .option("numFeatures", "780")
     .load("data/mllib/sample_libsvm_data.txt")

   // Java (Dataset and Row are in org.apache.spark.sql)
   Dataset<Row> df = spark.read().format("libsvm")
     .option("numFeatures", "780")
     .load("data/mllib/sample_libsvm_data.txt");
```
The LIBSVM data source supports the following options:
- "numFeatures": number of features. If unspecified or nonpositive, the number of features is determined automatically at the cost of one additional pass over the data. Specifying it is also useful when the dataset is already split into multiple files and you want to load them separately, because some features may not be present in certain files, which leads to inconsistent feature dimensions.
- "vectorType": feature vector type, "sparse" (default) or "dense".
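
The effect of "numFeatures" can be illustrated without Spark. The following is a minimal sketch, not Spark's implementation: it assumes the usual LIBSVM text layout (`<label> <index>:<value> ...` with one-based indices), and the class `LibsvmSketch` and its methods are hypothetical names introduced only for this illustration. It shows how inferring the dimension per file can give different results for two parts of the same dataset, and how a line expands into a dense array once the dimension is fixed.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; hypothetical helper, not part of Spark.
final class LibsvmSketch {

    // Largest one-based feature index on a line such as "1 1:0.5 3:2.0".
    static int maxIndex(String line) {
        String[] tokens = line.trim().split("\\s+");
        int max = 0;
        for (int i = 1; i < tokens.length; i++) { // tokens[0] is the label
            int colon = tokens[i].indexOf(':');
            max = Math.max(max, Integer.parseInt(tokens[i].substring(0, colon)));
        }
        return max;
    }

    // Dimension implied by a set of lines; this scan is the "additional pass".
    static int inferNumFeatures(List<String> lines) {
        int dim = 0;
        for (String line : lines) {
            dim = Math.max(dim, maxIndex(line));
        }
        return dim;
    }

    // Expands one line into a dense array of length numFeatures.
    // Throws ArrayIndexOutOfBoundsException if an index exceeds numFeatures.
    static double[] toDense(String line, int numFeatures) {
        double[] values = new double[numFeatures];
        String[] tokens = line.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            values[index - 1] = Double.parseDouble(tokens[i].substring(colon + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Two parts of one dataset; feature 5 never appears in the first part.
        List<String> fileA = Arrays.asList("1 1:0.5 3:2.0", "0 2:1.5");
        List<String> fileB = Arrays.asList("1 1:0.1 5:4.0");
        System.out.println(inferNumFeatures(fileA)); // 3
        System.out.println(inferNumFeatures(fileB)); // 5
        // With the dimension fixed at 5, both parts yield vectors of equal length.
        System.out.println(Arrays.toString(toDense(fileA.get(0), 5)));
    }
}
```

In this sketch the two parts infer dimensions 3 and 5, which is the inconsistency that setting "numFeatures" explicitly is meant to avoid when files are loaded separately.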
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.sql.execution.datasources.FileFormat
buildReaderWithPartitionValues, buildWriter, supportBatch

Constructor Detail
- DefaultSource
```
public DefaultSource()
```

Method Detail

shortName
```
public java.lang.String shortName()
```
Description copied from interface: DataSourceRegister
The string that represents the format that this data source provider uses. This is overridden by children to provide a nice alias for the data source. For example:
```
   override def shortName(): String = "parquet"
 
```
Specified by:

shortName in interface DataSourceRegister

Returns:
(undocumented)

toString
```
public java.lang.String toString()
```
Overrides:

toString in class java.lang.Object

inferSchema

public scala.Option<StructType> inferSchema(SparkSession sparkSession,
    scala.collection.immutable.Map<java.lang.String,java.lang.String> options,
    scala.collection.Seq<org.apache.hadoop.fs.FileStatus> files)

Specified by: inferSchema in interface org.apache.spark.sql.execution.datasources.FileFormat

prepareRead

public scala.collection.immutable.Map<java.lang.String,java.lang.String> prepareRead(SparkSession sparkSession,
    scala.collection.immutable.Map<java.lang.String,java.lang.String> options,
    scala.collection.Seq<org.apache.hadoop.fs.FileStatus> files)

Specified by: prepareRead in interface org.apache.spark.sql.execution.datasources.FileFormat

prepareWrite

public org.apache.spark.sql.execution.datasources.OutputWriterFactory prepareWrite(SparkSession sparkSession,
    org.apache.hadoop.mapreduce.Job job,
    scala.collection.immutable.Map<java.lang.String,java.lang.String> options,
    StructType dataSchema)

Specified by: prepareWrite in interface org.apache.spark.sql.execution.datasources.FileFormat

buildReader

public scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReader(SparkSession sparkSession,
    StructType dataSchema,
    StructType partitionSchema,
    StructType requiredSchema,
    scala.collection.Seq<Filter> filters,
    scala.collection.immutable.Map<java.lang.String,java.lang.String> options,
    org.apache.hadoop.conf.Configuration hadoopConf)

Specified by: buildReader in interface org.apache.spark.sql.execution.datasources.FileFormat
