JSON Output Format

This library provides a small interface for writing JSON reports from your Hadoop MapReduce jobs. I built it because I got tired of constantly rolling my own :) It's a good fit for generating small JSON reports about your data, rather than large blocks of text output.

Setup

json-output-format is available on Maven Central, via Sonatype OSS:

<dependency>
    <groupId>com.zackehh</groupId>
    <artifactId>json-output-format</artifactId>
    <version>1.0.1</version>
</dependency>

Usage

It's super simple to use:

1. Extend JsonOutputFormat
2. Override convertKey and convertValue
3. Optionally, override merge

Example

Here is an example that counts word occurrences into a JSON report (a variant of the famous word count example):

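// Jackson 2 packages, matching the SerializationFeature usage further down.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
// Package inferred from the repository layout (src/main/java/com/zackehh/outputformat).
import com.zackehh.outputformat.JsonOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

/**
 * Counts the number of word occurrences into a JSON format.
 */
public class IntegerJsonOutputFormat extends JsonOutputFormat<Text, IntWritable> {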

    /**
     * Converts a Text key to a String key.
     *
     * @param key the Text key.
     * @return a String representation.
     */
    @Override
    protected String convertKey(Text key) {
        return key.toString();
    }

    /**
     * Converts an IntWritable to a JsonNode containing an Integer value.
     *
     * @param value the IntWritable value.
     * @return a JsonNode containing an Integer value.
     */
    @Override
    protected JsonNode convertValue(IntWritable value) {
        return JsonNodeFactory.instance.numberNode(value.get());
    }

    /**
     * Defines a merge strategy for merging clashing keys. In this case
     * we sum the left and right as we're interested in the total.
     *
     * @param left the existing JsonNode value.
     * @param right the new JsonNode value.
     * @return the JsonNode value to persist going forward.
     */
    @Override
    protected JsonNode merge(JsonNode left, JsonNode right) {
        return JsonNodeFactory.instance.numberNode(left.asLong(0) + right.asLong(0));
    }

}

In the typical example of a word count, you'd receive output like this:

word_one  15
word_two  30
word_three  45

Using the above IntegerJsonOutputFormat, you'd receive this instead:

{
  "word_one": 15,
  "word_two": 30,
  "word_three": 45
}

Voilà, nice-to-read output :)
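To make the role of merge a bit more concrete, here is a minimal stand-alone sketch of what happens when the same key shows up more than once. It deliberately uses plain Java collections rather than Hadoop and Jackson types, so MergeSketch, buildReport and toJson are hypothetical names for illustration only and not part of this library; the real work is done by the convertKey, convertValue and merge overrides shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/**
 * Stand-alone illustration of merging clashing keys into one report.
 * This is NOT the library's implementation; it only mimics the idea
 * with a plain Map so it can run without Hadoop or Jackson.
 */
class MergeSketch {

    /** Folds (key, value) pairs into one report, merging clashing keys. */
    static Map<String, Long> buildReport(String[] keys, long[] values, BinaryOperator<Long> merge) {
        Map<String, Long> report = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Map.merge stores the value for a new key, or applies `merge`
            // to (existing, incoming) when the key is already present.
            report.merge(keys[i], values[i], merge);
        }
        return report;
    }

    /** Renders the report as a flat JSON object (keys are assumed to need no escaping). */
    static String toJson(Map<String, Long> report) {
        StringBuilder json = new StringBuilder("{");
        String separator = "";
        for (Map.Entry<String, Long> entry : report.entrySet()) {
            json.append(separator).append('"').append(entry.getKey()).append("\":").append(entry.getValue());
            separator = ",";
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        String[] keys = {"word_one", "word_two", "word_one"};
        long[] values = {10, 30, 5};
        // Summing left and right, as IntegerJsonOutputFormat.merge does above.
        System.out.println(toJson(buildReport(keys, values, Long::sum)));
        // prints {"word_one":15,"word_two":30}
    }
}
```

Swapping Long::sum for something like (left, right) -> right would keep only the latest value per key instead of the total, which is the kind of choice the merge override is there for.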

Customization

Output

By default, your files are written as json_output-r-<id>.json, following the traditional Hadoop naming scheme. You can customise the base file name and the extension with the following configuration options:

conf.set("jof.ext", ".bak");         // defaults to ".json"
conf.set("jof.file", "my_filename"); // defaults to "json_output"
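
As a rough sketch of how those two options combine into a file name, the snippet below builds the name from the option values and their defaults. It is an illustration only: java.util.Properties stands in for the Hadoop Configuration, OutputNameSketch and fileName are hypothetical names, and the five-digit zero-padded id is an assumption borrowed from Hadoop's usual part-r-00000 naming rather than something confirmed for this library.

```java
import java.util.Locale;
import java.util.Properties;

/**
 * Illustrates how "jof.file" and "jof.ext" could shape the output name.
 * Properties is used here as a stand-in for Hadoop's Configuration.
 */
class OutputNameSketch {

    /** Builds <file>-r-<id><ext>, falling back to the documented defaults. */
    static String fileName(Properties conf, int partition) {
        String base = conf.getProperty("jof.file", "json_output");
        String ext = conf.getProperty("jof.ext", ".json");
        // Assumption: the id is zero-padded to five digits, like part-r-00000.
        return String.format(Locale.ROOT, "%s-r-%05d%s", base, partition, ext);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        System.out.println(fileName(defaults, 0)); // json_output-r-00000.json

        Properties custom = new Properties();
        custom.setProperty("jof.ext", ".bak");
        custom.setProperty("jof.file", "my_filename");
        System.out.println(fileName(custom, 3)); // my_filename-r-00003.bak
    }
}
```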

Serialization

You can also supply your own ObjectMapper instead of the default one if you wish to configure the JSON serialization features. For example, the following uses an ObjectMapper which indents your JSON report:

public class IntegerJsonOutputFormat extends JsonOutputFormat<Text, IntWritable> {

    // convertKey, convertValue and merge are omitted here; see the Example section.

    /**
     * Creates an ObjectMapper which allows for indentation.
     */
    @Override
    protected ObjectMapper createMapper() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.enable(SerializationFeature.INDENT_OUTPUT);
        return mapper;
    }

}
