This library provides a small interface for writing JSON reports from your Hadoop MapReduce jobs. I built this because I got tired of constantly rolling my own :) It's a good fit for generating small JSON reports about your data, rather than large blocks of text output.
json-output-format is available on Maven Central, via Sonatype OSS:
<dependency>
    <groupId>com.zackehh</groupId>
    <artifactId>json-output-format</artifactId>
    <version>1.0.1</version>
</dependency>
It's super simple to use:
- Extend JsonOutputFormat
- Override convertKey and convertValue
- Optionally, override merge
Here is an example of counting into a JSON report (this is a variant of the famous word count example):
// Hadoop and Jackson 2.x imports. JsonOutputFormat itself comes from this
// library; its import is left out because its package is not shown here.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

/**
 * Counts the number of word occurrences into a JSON format.
 */
public class IntegerJsonOutputFormat extends JsonOutputFormat<Text, IntWritable> {

    /**
     * Converts a Text key to a String key.
     *
     * @param key the Text key.
     * @return a String representation.
     */
    @Override
    protected String convertKey(Text key) {
        return key.toString();
    }

    /**
     * Converts an IntWritable to a JsonNode containing an Integer value.
     *
     * @param value the IntWritable value.
     * @return a JsonNode containing an Integer value.
     */
    @Override
    protected JsonNode convertValue(IntWritable value) {
        return JsonNodeFactory.instance.numberNode(value.get());
    }

    /**
     * Defines a merge strategy for merging clashing keys. In this case
     * we sum the left and right as we're interested in the total.
     *
     * @param left the existing JsonNode value.
     * @param right the new JsonNode value.
     * @return the JsonNode value to persist going forward.
     */
    @Override
    protected JsonNode merge(JsonNode left, JsonNode right) {
        return JsonNodeFactory.instance.numberNode(left.asLong(0) + right.asLong(0));
    }
}
In the typical example of a word count, you'd receive output like this:
word_one 15
word_two 30
word_three 45
Using the above IntegerJsonOutputFormat, you'd receive this instead:
{
  "word_one": 15,
  "word_two": 30,
  "word_three": 45
}
Voila, nice to read output :)
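If the merge override feels abstract, the following plain-Java sketch may help. It does not use the library at all: MergeSketch, merge and buildReport are hypothetical names, and a Map stands in for the JSON report. It only illustrates the idea of summing the existing (left) and new (right) values when the same key shows up twice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeSketch {

    /** Hypothetical stand-in for the merge override: sums the existing and new values. */
    static long merge(long left, long right) {
        return left + right;
    }

    /** Folds (word, count) pairs into one report, merging any clashing keys. */
    static Map<String, Long> buildReport(String[] words, long[] counts) {
        Map<String, Long> report = new LinkedHashMap<>();
        for (int i = 0; i < words.length; i++) {
            // First sighting stores the count; later sightings go through merge.
            report.merge(words[i], counts[i], MergeSketch::merge);
        }
        return report;
    }

    public static void main(String[] args) {
        String[] words = {"word_one", "word_two", "word_one"};
        long[] counts = {10, 30, 5};
        System.out.println(buildReport(words, counts)); // prints {word_one=15, word_two=30}
    }
}
```

Swapping the body of merge for something like Math.max(left, right) would keep the largest count per key instead of the total, which is the kind of choice the merge override is there for.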
By default, your files are written as json_output-r-<id>.json, in the traditional Hadoop format. You can customise the initial file name and extension by using the following configuration options:
conf.set("jof.ext", ".bak");          // defaults to ".json"
conf.set("jof.file", "my_filename");  // defaults to "json_output"
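To get a feel for how these two options change the resulting file name, here is a rough plain-Java sketch. FileNameSketch and outputName are hypothetical names, and the five-digit zero padding of the id is an assumption borrowed from Hadoop's usual part-r-00000 naming; the library may format the id differently.

```java
import java.util.Locale;

public class FileNameSketch {

    /**
     * Builds a reducer output file name of the form base-r-id-ext.
     * The five-digit zero padding is an assumption based on Hadoop's
     * conventional part-r-00000 names, not something taken from this library.
     */
    static String outputName(String base, int partition, String ext) {
        return String.format(Locale.ROOT, "%s-r-%05d%s", base, partition, ext);
    }

    public static void main(String[] args) {
        // With the defaults ("json_output" and ".json")
        System.out.println(outputName("json_output", 0, ".json")); // prints json_output-r-00000.json
        // With jof.file = "my_filename" and jof.ext = ".bak"
        System.out.println(outputName("my_filename", 3, ".bak"));  // prints my_filename-r-00003.bak
    }
}
```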
You can also create your own ObjectMapper rather than using the default one, in case you wish to configure the JSON serialization features. For example, the following uses an ObjectMapper which indents your JSON report:
public class IntegerJsonOutputFormat extends JsonOutputFormat<Text, IntWritable> {

    // convertKey and convertValue overrides omitted for brevity

    /**
     * Creates an ObjectMapper which allows for indentation.
     */
    @Override
    protected ObjectMapper createMapper() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.enable(SerializationFeature.INDENT_OUTPUT);
        return mapper;
    }
}