Distributed Data Connector R package
The dataconnector package allows R users to read CSV and ORC files from HDFS and the local file system.

The package is extensible: support for new file formats and file systems can be added without changing the existing readers.

It can be used together with Distributed R to distribute file loading across cores and machines.

Supported formats:

  • CSV
  • ORC

Supported file systems:

  • HDFS
  • Local FS

Installation:

$ git clone --recursive https://github.com/vertica/r-dataconnector.git 
$ R CMD INSTALL r-dataconnector/dataconnector

Usage:

# Load a CSV file from the local file system
df <- csv2dataframe(url='/tmp/test.csv', schema='age:int64,name:string')

# Load a CSV file from HDFS
df <- csv2dataframe(url='hdfs:///test.csv', schema='age:int64,name:string')

# Load an ORC file from HDFS
df <- orc2dataframe(url='hdfs:///test.orc')

# Write an R object (here, an existing object named mymodel) to HDFS
object2hdfs(mymodel, 'hdfs:///file.out', overwrite=1)
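
The schema argument of csv2dataframe is a comma-separated list of column:type pairs, as in 'age:int64,name:string'. The sketch below derives such a string from an empty template data frame. make_schema is a hypothetical helper, not part of dataconnector, and it maps only the two type names that appear in the examples above (R integer to int64, R character to string); any other column class raises an error rather than guessing a type name.

```r
# Hypothetical helper (not part of dataconnector): build a 'column:type'
# schema string from a template data frame. Only the type names seen in
# the README examples are mapped; anything else is rejected.
make_schema <- function(template) {
  type_map <- c(integer = "int64", character = "string")
  # First class of each column, named by column name
  cls <- vapply(template, function(col) class(col)[1], character(1))
  unknown <- setdiff(unique(cls), names(type_map))
  if (length(unknown) > 0) {
    stop("no known schema type for R class(es): ", paste(unknown, collapse = ", "))
  }
  # Join name:type pairs with commas
  paste(names(template), type_map[cls], sep = ":", collapse = ",")
}

template <- data.frame(age = integer(0), name = character(0),
                       stringsAsFactors = FALSE)
schema <- make_schema(template)
print(schema)
```

The resulting string can then be passed as the schema argument, for example csv2dataframe(url='/tmp/test.csv', schema=schema). stringsAsFactors = FALSE is set explicitly so that character columns are not turned into factors on R versions older than 4.0.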

Documentation (from the R help system):

R> ?csv2dataframe
R> ?orc2dataframe


License:

Apache 2.0.