Faunus Graph

Dan LaRocque edited this page Sep 5, 2014 · 13 revisions
This is the documentation for Faunus 0.4.
Faunus was merged into Titan and renamed Titan-Hadoop in version 0.5.
Documentation for the latest Titan version is available at http://s3.thinkaurelius.com/docs/titan/current.

The source of any Faunus job is a FaunusGraph. A FaunusGraph is simply a wrapper around a collection of Hadoop- and Faunus-specific configurations. Most importantly, it captures the location and type of both the input graph and the output graph. A FaunusGraph is typically created using one of the FaunusFactory.open() methods.



FaunusGraph Construction

A Faunus configuration file is used to construct a FaunusGraph. Assume a file named bin/faunus.properties with the contents shown below.

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
faunus.input.location=graph-of-the-gods.json
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=output
faunus.output.location.overwrite=true

With FaunusFactory, a configuration file is turned into a FaunusGraph. The toString() of the FaunusGraph denotes the input and output formats of the graph. For instance, as seen below, the input is a graph of type GraphSON and the output is also a graph of type GraphSON.

gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
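
The relationship between the properties file and the toString() label can be illustrated with a small standalone sketch. It does not use Faunus at all: it reads the same two format keys with java.util.Properties and assumes, based only on the session output above, that the label is built from the lowercased simple class names. The class FaunusLabelSketch and its methods are hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

public class FaunusLabelSketch {

    // Parses properties-file text (same syntax as bin/faunus.properties).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Strips the package and lowercases the remaining class name.
    static String simpleName(String className) {
        String name = className.substring(className.lastIndexOf('.') + 1);
        return name.toLowerCase(Locale.ROOT);
    }

    // Assumption: the label has the shape faunusgraph[input->output],
    // as seen in the Gremlin session; this is not the Faunus source.
    static String label(Properties props) {
        String in = props.getProperty("faunus.graph.input.format");
        String out = props.getProperty("faunus.graph.output.format");
        return "faunusgraph[" + simpleName(in) + "->" + simpleName(out) + "]";
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "# input graph parameters",
            "faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat",
            "faunus.input.location=graph-of-the-gods.json",
            "# output data parameters",
            "faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat",
            "faunus.output.location=output");
        System.out.println(label(parse(text)));
    }
}
```

With the two GraphSON format classes from the file above, the sketch prints the same label as the Gremlin session.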

Hadoop-Specific Configurations

A FaunusGraph is loaded with Hadoop-specific configuration information that is propagated from the master cluster configuration (e.g. set up during cluster construction) down to the various job-level configurations.

gremlin> g.getConf()    
==>keep.failed.task.files=false
==>io.seqfile.compress.blocksize=1000000
==>dfs.df.interval=60000
==>dfs.datanode.failed.volumes.tolerated=0
==>mapreduce.reduce.input.limit=-1
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>dfs.max.objects=0
==>dfs.https.client.keystore.resource=ssl-client.xml
==>mapred.local.dir.minspacestart=0
...

Note that FaunusGraph.getConf(String prefix) accepts a prefix and returns only the properties whose keys begin with it.

gremlin> g.getConf('mapred')
==>mapred.disk.healthChecker.interval=60000
==>mapred.task.tracker.http.address=0.0.0.0:50060
==>mapred.userlog.retain.hours=24
==>mapred.local.dir.minspacestart=0
==>mapred.cluster.reduce.memory.mb=-1
==>mapred.reduce.parallel.copies=5
...
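
The effect of the prefix argument can be sketched without Hadoop or Faunus as a plain filter over key/value pairs. The class PrefixFilterSketch and the method withPrefix are hypothetical names for this illustration only; the sample keys and values are copied from the session output above. The real getConf(String) may differ in return type and ordering.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixFilterSketch {

    // Returns the entries whose key starts with the given prefix,
    // preserving the iteration order of the input map.
    static Map<String, String> withPrefix(Map<String, String> conf, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A handful of entries taken from the g.getConf() listing.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("keep.failed.task.files", "false");
        conf.put("dfs.max.objects", "0");
        conf.put("mapred.userlog.retain.hours", "24");
        conf.put("mapred.local.dir.minspacestart", "0");

        // Print in the same "==>key=value" shape as the Gremlin console.
        for (Map.Entry<String, String> entry : withPrefix(conf, "mapred").entrySet()) {
            System.out.println("==>" + entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

Because the match is a simple string prefix, a narrower prefix such as mapred.local selects a subset of what mapred selects.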

Faunus Properties

Within the global configuration, there are Faunus-specific configurations. These properties can be isolated with FaunusGraph.getConf('faunus'). In general, any prefix string can be provided (e.g. mapred or mapred.map).

gremlin> g.getConf('faunus')        
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true

Moreover, FaunusGraph provides getters and setters for reading and changing the most commonly used properties.

gremlin> g.setGraphOutputFormat(NoOpOutputFormat.class)
==>null
gremlin> g
==>faunusgraph[graphsoninputformat->noopoutputformat]
gremlin> g.getGraphOutputFormat()
==>class com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
gremlin> g.getProperties()       
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=graph-of-the-gods.json
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output
==>faunus.output.location.overwrite=true

Chaining Graphs

To conclude, a useful FaunusGraph method is getNextGraph(). This method generates a new FaunusGraph that is the “inverse” of the current graph, with its input format and its input and output locations reconfigured so that the output of one job can serve as the input of the next.

gremlin> g = FaunusFactory.open('bin/faunus.properties')
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> h = g.getNextGraph()
==>faunusgraph[graphsoninputformat->graphsonoutputformat]
gremlin> h.getConf('faunus')
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
==>faunus.input.location=output/job-1
==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=output_
==>faunus.output.location.overwrite=true
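
The property changes visible in the session above can be mimicked with a small standalone sketch. It is not the Faunus implementation and only reproduces what the output shows: the new input location points inside the old output location, and the new output location is the old one with an underscore appended. Two things are assumptions made for illustration: the name of the last job directory (job-1 here) is passed in as a parameter rather than discovered, and the input format is guessed by swapping "OutputFormat" for "InputFormat" in the class name, which happens to hold for GraphSON. NextGraphSketch and next are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NextGraphSketch {

    // Builds the configuration of the "next" graph from the current one.
    // lastJobDir is a hypothetical parameter, e.g. "job-1".
    static Map<String, String> next(Map<String, String> current, String lastJobDir) {
        Map<String, String> next = new LinkedHashMap<>(current);
        String output = current.get("faunus.output.location");
        String outputFormat = current.get("faunus.graph.output.format");

        // Assumption: the matching input format differs only in the class-name suffix.
        next.put("faunus.graph.input.format",
                 outputFormat.replace("OutputFormat", "InputFormat"));
        // The previous job's output becomes this graph's input.
        next.put("faunus.input.location", output + "/" + lastJobDir);
        // A fresh output location, as seen in the session ("output_").
        next.put("faunus.output.location", output + "_");
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> g = new LinkedHashMap<>();
        g.put("faunus.graph.input.format",
              "com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat");
        g.put("faunus.input.location", "graph-of-the-gods.json");
        g.put("faunus.graph.output.format",
              "com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat");
        g.put("faunus.output.location", "output");

        for (Map.Entry<String, String> entry : next(g, "job-1").entrySet()) {
            System.out.println("==>" + entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

Applying next repeatedly would keep appending underscores to the output location, which is one simple way to keep each chained job from overwriting the previous job's data.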