read.df {SparkR}    R Documentation

Load a SparkDataFrame

Description

Returns the dataset in a data source as a SparkDataFrame.

Usage

## Default S3 method:
read.df(path = NULL, source = NULL, schema = NULL,
  na.strings = "NA", ...)

## Default S3 method:
loadDF(path = NULL, source = NULL, schema = NULL, ...)

Arguments

path

The path of files to load

source

The name of the external data source

schema

The data schema defined in structType

na.strings

Default string value for NA when source is "csv"
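
For instance, a sketch of reading a CSV file in which missing values are written as "NULL" (the file path and the header option passed through '...' are assumed for illustration):

## Not run: 
##D sparkR.session()
##D df <- read.df("path/to/people.csv", source = "csv", na.strings = "NULL", header = "true")
## End(Not run)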

Details

The data source is specified by the 'source' argument and a set of options (...). If 'source' is not specified, the default data source configured by "spark.sql.sources.default" will be used.
Similar to R's read.csv, when 'source' is "csv", a value of "NA" will be interpreted as NA by default.
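
As an assumed illustration of the default data source behaviour (this sketch relies on "spark.sql.sources.default" keeping its stock value of "parquet"; the path is illustrative):

## Not run: 
##D sparkR.session()
##D # No 'source' given, so the default data source ("parquet" unless reconfigured) is used
##D df <- read.df("data/people.parquet")
## End(Not run)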

Value

SparkDataFrame

Note

read.df since 1.4.0

loadDF since 1.6.0

Examples

## Not run: 
##D sparkR.session()
##D df1 <- read.df("path/to/file.json", source = "json")
##D schema <- structType(structField("name", "string"),
##D                      structField("info", "map<string,double>"))
##D df2 <- read.df(mapTypeJsonPath, "json", schema)
##D df3 <- loadDF("data/test_table", "parquet", mergeSchema = "true")
## End(Not run)

[Package SparkR version 2.0.0 Index]