Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LzoJsonRecordReader and JsonLoader may fail with NullPointerException #65

Closed
miguno opened this issue Jul 27, 2011 · 8 comments
Closed

Comments

@miguno
Copy link
Contributor

miguno commented Jul 27, 2011

We have run into some issues with LzoJsonRecordReader and JsonLoader where on some JSON data -- we haven't been able to narrow it down yet -- both classes will fail with a NullPointerException.

For instance, here is the relevant snippet of LzoJsonRecordReader#decodeLineToJson(). What seems to happen is that LzoJsonRecordReader#nextKeyValue() passes a null value to decodeLineToJson(). The problem might therefore originate in the line "int newSize = in_.readLine(currentLine_);" in nextKeyValue(). From what I remember, json-simple's parse method might actually return a null value as well (= adding a safeguard to LzoJsonLoader and JsonLoader might be a good idea anyways) but in this case the null value is already passed to json-simple, before the latter will actually return itself with a null value.

A similar NPE issue exists for the JsonLoader#parseStringToTuple().

/*
    LzoJsonRecordReader.java
*/

 public static boolean decodeLineToJson(JSONParser parser, Text line, MapWritable value) {
    try {
      JSONObject jsonObj = (JSONObject)parser.parse(line.toString());
      for (Object key: jsonObj.keySet()) {      /* *** The NPE is thrown here when jsonObj == null *** */
        Text mapKey = new Text(key.toString());
        Text mapValue = new Text();
        if (jsonObj.get(key) != null) {
          mapValue.set(jsonObj.get(key).toString());
        }

        value.put(mapKey, mapValue);
      }
      return true;
    } catch (ParseException e) {
      LOG.warn("Could not json-decode string: " + line, e);
      return false;
    } catch (NumberFormatException e) {
      LOG.warn("Could not parse field into number: " + line, e);
      return false;
    }

A quick fix is to simply catch for a NPE in LzoJsonRecordReader#decodeLineToJson() and JsonLoader#parseStringToTuple() as is already being done for e.g. ParseException. This fix has worked fine for us. As I said above, however, this might not fix the root of the problem.

I am not so familiar with the code, so I am not sure whether just catching NPEs is the best fix. Maybe you have a better idea!

@miguno
Copy link
Contributor Author

miguno commented Jul 27, 2011

Oh, and this is latest ElephantBird 2.0.x from the eb-dev integration branch (commit 50ad2a6).

@rangadi
Copy link
Contributor

rangadi commented Jul 27, 2011

JSONParser#parse does mention null as on of the possible returns. But could not easily see how that occurs.
Do you have the input that produces this problem?

yes, checking for null and logging a warning would be a good fix (same as a parse error).

@miguno
Copy link
Contributor Author

miguno commented Jul 29, 2011

I do have the input data that produces this problem, but I'd have to narrow it down to the exact snippet that triggers the bug. I already have a patch available, but I want to add a proper unit test + test data to it. Will work on this next week.

@miguno miguno closed this as completed Jul 29, 2011
@miguno miguno reopened this Jul 29, 2011
@miguno
Copy link
Contributor Author

miguno commented Aug 30, 2011

I have not forgotten this issue. Stay tuned!

@miguno
Copy link
Contributor Author

miguno commented Nov 10, 2011

I sent you a pull request at https://github.com/kevinweil/elephant-bird/pull/99.

The commit is based on the eb-dev branch as of commit 06608de.

@miguno
Copy link
Contributor Author

miguno commented Nov 10, 2011

Edit: New pull request with slightly improved patch at https://github.com/kevinweil/elephant-bird/pull/99. I closed the old pull request #98.

@miguno
Copy link
Contributor Author

miguno commented Nov 11, 2011

See my follow-up comments on commit 86bfa834d0, addressing Raghu's questions.

@miguno
Copy link
Contributor Author

miguno commented Nov 28, 2011

The patch was merged in, see https://github.com/kevinweil/elephant-bird/pull/102.

@miguno miguno closed this as completed Nov 28, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants