Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent NPE in TStructDescriptor when an unexpected enum is encountered ... #364

Merged
merged 3 commits into from
Jan 2, 2014

Conversation

dvryaboy
Copy link
Contributor

...by returning an empty map.

When an out-of-date thrift definition is used, and a new enum field is encountered, currently the enum map can be null, which results in an NPE. It seems like it is better to return an empty map and let the caller deal with the fact that the map doesn't contain all the enums the caller will encounter (the caller is then free to increment error counters, etc).

@dvryaboy
Copy link
Contributor Author

@rangadi and @julienledem please take a look. The specific context where I encountered this was parquet conversion, where the stack trace looked like this:

java.lang.NullPointerException
    at com.twitter.elephantbird.thrift.TStructDescriptor$Field.getEnumValues(TStructDescriptor.java:313)
    at parquet.thrift.ThriftSchemaConverter.toThriftField(ThriftSchemaConverter.java:123)
    at parquet.thrift.ThriftSchemaConverter.toStructType(ThriftSchemaConverter.java:71)
    at parquet.thrift.ThriftSchemaConverter.toThriftField(ThriftSchemaConverter.java:105)
    at parquet.thrift.ThriftSchemaConverter.toThriftField(ThriftSchemaConverter.java:120)
    at parquet.thrift.ThriftSchemaConverter.toStructType(ThriftSchemaConverter.java:71)
    at parquet.thrift.ThriftSchemaConverter.toThriftField(ThriftSchemaConverter.java:105)
    at parquet.thrift.ThriftSchemaConverter.toStructType(ThriftSchemaConverter.java:71)
    at parquet.thrift.ThriftSchemaConverter.toStructType(ThriftSchemaConverter.java:59)
    at parquet.hadoop.thrift.ThriftBytesWriteSupport.init(ThriftBytesWriteSupport.java:108)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:273)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:247)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:521)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:636)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1310)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)

@rangadi
Copy link
Contributor

rangadi commented Dec 26, 2013

isEnum() would be false in this case. getEnumValueOf() is valid only after isEnum() is checked.

@dvryaboy
Copy link
Contributor Author

That means we are in the rather odd situation where field.getType can return enum, but isEnum returns false, and getEnumValues() throws an NPE.

@dvryaboy
Copy link
Contributor Author

On further reflection, I see why returning an empty collection is bad -- but just blowing up with an NPE is not super nice either. Maybe we should change all methods that dereference enumMap and enumIdMap to check them for null and return null if they are undefined?

@dvryaboy
Copy link
Contributor Author

Updated the PR with a better null check, as described in the comment above.

@rangadi
Copy link
Contributor

rangadi commented Dec 26, 2013

That means we are in the rather odd situation where field.getType can return enum, but isEnum returns false, and getEnumValues() throws an NPE.

I don't think that is possible. If TType is ENUM, isEnum would always be true.

Can you point me to the struct that triggered this?

@rangadi
Copy link
Contributor

rangadi commented Dec 26, 2013

I am fine with the patch (return 'null' rather than NPE). This is essentially a work around for not having to check isEnum().

JavaDoc added is very useful.. have a couple of comments there.

/**
*
* @return true if the field is a known enum. Note that this can return false
* in two cases: if the field is not an enum at all, or if the field you are reading
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the JavaDoc.

case (1) : yes, is false when a field is not enum (according to Thrift definition on classpath).
case (2) : What does it mean? TStructDescriptor is a singleton descriptor for each Thrift struct, only based on definition on the classpath. It does not depend on the message on the wire at all. May be the app stored separately that field_x is an Enum (which changed while reading), that is a different issue.

@dvryaboy
Copy link
Contributor Author

For (2), I am not entirely sure how this is happening, to be honest. I'm encountering this NPE when trying to convert a thrift file to parquet, using an out of date thrift definition. I assume that it's something like reading from the binary data and looking at the type bits in the stream to id the field type, and encountering a mismatch with what our inspection of the generated code gives us.

@rangadi
Copy link
Contributor

rangadi commented Dec 27, 2013

default Thrift deserializer would ignore the field since the type on the wire does not match type reported by runtime Thrift struct.

Though this patch can go in, looks like we should fix Parquet converter that is reading the thrift message.

I think it is better to remove the (2) in the javadoc.

@aniket486
Copy link
Contributor

If TType is ENUM, isEnum would always be true.

I found a case with typedef where this is not true. This is possibly also the reason for parquet conversion failure.

private void printInfo (TStructDescriptor struct) {
    for (Field f : struct.getFields()) {
      System.out.println(f.getType());
      System.out.println(f.isEnum());
    }
  }

  @Test
  public void testThriftConverter() throws Exception {
    printInfo(TStructDescriptor.getInstance(TestNoTypeDef.class));
    printInfo(TStructDescriptor.getInstance(TestTypeDef.class));
  }
enum TestPhoneType {
  MOBILE = 0,
  HOME = 1,
  WORK = 2
}

typedef TestPhoneType PHONETYPE

struct TestNoTypeDef {
  1: TestPhoneType phn
}

struct TestTypeDef {
  1: PHONETYPE phn
}

Output: (16== ENUM)

16
true
16
false

@rangadi
Copy link
Contributor

rangadi commented Dec 27, 2013

ah, typedefs again. can you try the same with #356 applied?

We should merge #356

@aniket486
Copy link
Contributor

Works with #356.

@dvryaboy
Copy link
Contributor Author

Ah that's what it was. Thanks guys. Does the change to return a null instead of throwing an NPE still make sense? Should I just change the javadocs to get rid of (2)?

@rangadi
Copy link
Contributor

rangadi commented Dec 29, 2013

I think so. We can merge this patch with an update to javadoc (you also need to merge with latest master).

public TEnum getEnumValueOf(String name) {
return enumMap.get(name);
return enumMap == null ? null : enumMap.get(name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without reading the docs for optional -- would that require the user of the library to depend on guava? The current main user outside of EB is parquet, and that lib has almost no external dependencies to simplify integration.

On Dec 29, 2013, at 11:20 AM, "P. Oscar Boykin" notifications@github.com wrote:

In core/src/main/java/com/twitter/elephantbird/thrift/TStructDescriptor.java:

 public TEnum getEnumValueOf(String name) {
  •  return enumMap.get(name);
    
  •  return enumMap == null ? null : enumMap.get(name);
    
    why not use Guava Optional here?

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Optional.html


Reply to this email directly or view it on GitHub.

…o tstruct_enum_npe

Conflicts:
	core/src/main/java/com/twitter/elephantbird/thrift/TStructDescriptor.java
@dvryaboy
Copy link
Contributor Author

dvryaboy commented Jan 1, 2014

Javadoc updated.

rangadi added a commit that referenced this pull request Jan 2, 2014
Prevent NPE in TStructDescriptor when an unexpected enum is encountered
@rangadi rangadi merged commit 9e6a12e into twitter:master Jan 2, 2014
@rangadi
Copy link
Contributor

rangadi commented Jan 2, 2014

merged.

@julienledem
Copy link
Contributor

thanks! Looks good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants