write back into text format after reading a file stored into HDFS and running rhwatch in rhipe #42

bedantaguru · 2016-05-10T04:02:38Z

I was trying out rhipe and RHadoop [rmr rhdfs rhbase etc.] series of packages.

Now in both of the packages [rhipe and rmr] I can ingest / read data stored in csv or text files. Both of them kind of support creating new file formats, but I find rmr has more support for it, or at least more resources to get started. This requirement is useful when one plans to do some data processing on raw data stored in HDFS and finally wants to store it back to HDFS in a format recognizable by other Hadoop components such as Hive, Impala, etc. Both packages can write in their native format, which is recognizable only by the package itself. The rmr package also supports a few other formats.

For reference related to rmr, have a look at this page: https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/getting-data-in-and-out.md

However, for rhipe I did not find any such document, and the various ways I tried all failed.

So my question is: how can I write back into text [as an example; any other recognizable format will also work] after reading a file stored in HDFS and running rhwatch in rhipe?

I have asked the same question here: https://stackoverflow.com/questions/37129039/getting-data-in-and-out-of-rhipe-r-hadoop

saptarshiguha · 2016-05-10T04:27:29Z

I'll respond to this tomorrow.
Cheers
SG

saptarshiguha · 2016-05-10T16:13:15Z

Hello,

It is true that RHIPE doesn't have many input/output formats, but it
does have some; sadly, they are not documented.

If you load RHIPE and type rhoptions()$ioformats you'll see the options
available:

text input/output
sequencefile storing protobufs (RHIPE's serialization) input/output
mapfile (which behave like on disk hashtables) input/output

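An example of text output:

    # 'i' (input path), 'o' (output folder) and 'reducers' are assumed to be
    # defined elsewhere in the session.
    y <- rhwatch(
        map = function(a, b) {
            # format the numeric value as fixed-point text with no decimals
            b <- formatC(b, format = "f", digits = 0)
            rhcollect(NULL, c(a, b))
        },
        reduce = reducers,
        input  = i,
        output = rhfmt(type = 'text',
                       folder = o,
                       writeKey = FALSE,
                       field.sep = "\t",
                       stringquote = ""),
        read = FALSE)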

This converts an input file where 'a' is a character vector and 'b' is a
numeric vector. I want to write the output as follows:

fields without a key (writeKey=FALSE)
each element separated by \t (field.sep="\t")
no string quotes (stringquote="")
output placed in 'folder'
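
To make the effect of those options concrete, here is a small local sketch in base R (no RHIPE or Hadoop involved). It only imitates what one output line would look like, assuming the text format joins the elements of each value with field.sep as described above. The sample records and the variable names (`a`, `b`, `fixed`, `out_lines`, `tmp`, `parsed`) are made up for illustration.

```r
# Local illustration only: base R, no RHIPE or Hadoop required.
# Hypothetical sample records standing in for the (a, b) pairs seen by the map.
a <- c("id1", "id2")
b <- c(100000, 3.7)

# as.character() can switch to scientific notation (e.g. "1e+05"),
# which other tools may not read back as intended.
naive <- as.character(b)

# formatC(format = "f", digits = 0) forces fixed notation with no decimals,
# as in the map function above.
fixed <- formatC(b, format = "f", digits = 0)

# Assumed shape of a text-output line: no key column, elements joined by
# a tab, no quotes around strings.
out_lines <- vapply(
  seq_along(a),
  function(k) paste(c(a[k], fixed[k]), collapse = "\t"),
  character(1)
)

# Round trip through a temporary local file, the way a tab-delimited
# reader would see it.
tmp <- tempfile(fileext = ".tsv")
writeLines(out_lines, tmp)
parsed <- read.delim(tmp, header = FALSE, sep = "\t", quote = "",
                     stringsAsFactors = FALSE)
print(out_lines)
print(parsed)
```

Lines of this shape (tab-delimited, unquoted, no key) are the kind of plain text that delimited-text readers generally expect, which is the point of writeKey=FALSE and stringquote="".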

Also present in the package is HBase input (not sure if output works).

See
https://github.com/tesseradata/RHIPE/blob/d3eed56735ece58a7a39e44cd48cfd3522212766/src/main/R/rhfmt.R

But for that to work you'll need this JAR file, which translates HBase to
RHIPE ( https://github.com/saptarshiguha/RhipeHbaseMozilla )

RHIPE io formats are fairly pluggable i.e. you can write your own.

HTH
Saptarshi

bedantaguru mentioned this issue May 31, 2016

[feature request] There should be a function like drWrite.csv delta-rho/datadr#93

Open