Skip to content

Java: Delayed unsafe deserialization #8501

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

artem-smotrakov
Copy link
Contributor

Deserialization can sometimes be implemented in two steps. An untrusted serialized object can be stored in a field but actual deserialization happens only when the object is necessary. CVE-2016-6194 in RabbitMQ is an example of such scenario (GitHub issue). Untrusted data that comes from a response is stored in RMQObjectMessage.buf field. Then, deserialization happens when getObject() method is called. Currently, java/unsafe-deserialization query doesn't catch this.

I'd like to propose the following updates to the query:

  • Add a flow step to the query that propagates taint from a byte array to a field. I suppose this step can match quite a lot of expressions. I guess enabling this step by default for all configs may be too much, but I think it should be okay for java/unsafe-deserialization query because unsafe deserialization is a severe issue.
  • Add flow steps for methods in DataInput and ObjectInput that read byte arrays.
  • Add flow sources and steps for RabbitMQ.

With these updates, the query detects CVE-2016-6194. Let me know what you think.

Copy link
Contributor

@Marcono1234 Marcono1234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully these comments are useful. Feel free to consider them only suggestions since I am not a member of this project.

exists(FieldAccess access, Field field | field = access.getField() |
field.getType().hasName("byte[]") and
access = fromNode.asExpr() and
field.getAnAccess() = toNode.asExpr()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not an issue, but should this cast the access to FieldRead to explicitly exclude FieldWrite?

Suggested change
field.getAnAccess() = toNode.asExpr()
field.getAnAccess().(FieldRead) = toNode.asExpr()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I'm going to move this to a separate PR, let's discuss it there. Thanks for the suggestion!

ObjectInputStream ois = new ObjectInputStream(bais);
return ois.readObject(); // $unsafeDeserialization
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be good to also add a test where an assignment to a byte[] field transfers taint, e.g.:
(assuming that is the reason why you chose FieldAccess instead of the more specific FieldRead for your byteArrayFieldFlowStep predicate above)

...

class ObjectHolder {
    private byte[] buffer;

    public void init(byte[] tainted) throws Exception {
        buffer = tainted;
    }

    public Object getObject() throws Exception {
        ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
        ObjectInputStream ois = new ObjectInputStream(bais);
        return ois.readObject(); // $unsafeDeserialization
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the idea!

@smowton
Copy link
Contributor

smowton commented Mar 21, 2022

I suggest splitting this PR into two parts: the uncontroversial addition of new sources, sinks and library-specific propagation steps, but then let's consider separately the difficult issue of when it's reasonable to treat a field write as a jump step (one that discards calling context, and supposes that probably at some point a read step takes place)

@pwntester
Copy link
Contributor

@artem-smotrakov can you show us in which specific case you need the byteArrayFieldFlowStep? It seems like there may be some framework-specific lifecycle (the framework calls user-defined methods as part of an event loop or similar) which we could model (example)

Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
@artem-smotrakov
Copy link
Contributor Author

Thanks for the review and suggestions @Marcono1234 !

@artem-smotrakov
Copy link
Contributor Author

I suggest splitting this PR into two parts: the uncontroversial addition of new sources, sinks and library-specific propagation steps, but then let's consider separately the difficult issue of when it's reasonable to treat a field write as a jump step (one that discards calling context, and supposes that probably at some point a read step takes place)

Hi @smowton Sure, let's split it. I'm going to add a model for JMS API as well. If you're not sure about byteArrayFieldFlowStep, how about adding an experimental query for this kind of delayed deserialization?

@artem-smotrakov
Copy link
Contributor Author

hey @pwntester

can you show us in which specific case you need the byteArrayFieldFlowStep? It seems like there may be some framework-specific lifecycle (the framework calls user-defined methods as part of an event loop or similar) which we could model (example)

I got this idea while analysing a fix for CVE-2016-6194 in RabbitMQ. Here is the details.

RMQObjectMessage stores serialized data in buf field

https://github.com/rabbitmq/rabbitmq-jms-client/blob/3b907c4baea8ef9eb9f3fd73dd209d12948689fd/src/main/java/com/rabbitmq/jms/client/message/RMQObjectMessage.java#L26

Untrusted data can be written to buf by calling writeBody() method

https://github.com/rabbitmq/rabbitmq-jms-client/blob/3b907c4baea8ef9eb9f3fd73dd209d12948689fd/src/main/java/com/rabbitmq/jms/client/message/RMQObjectMessage.java#L74

But actuall deserialization of untrusted data happens later when getObject() is called

https://github.com/rabbitmq/rabbitmq-jms-client/blob/3b907c4baea8ef9eb9f3fd73dd209d12948689fd/src/main/java/com/rabbitmq/jms/client/message/RMQObjectMessage.java#L58

Honestly, it doesn't look framework-specific to me. Untrusted data flows via byte-arrays and standard Java I/O objects. That is why I thought we can cover this case in the query for unsafe deserialization. In CVE-2016-6194, untrusted data comes from an RabbitMQ message, therfore I think it makes sense to add a source for that.

As I suggested above, we could probably create an experimental query for this kind of delayed deserialization instaead of updating the existing one. What do you think?

@artem-smotrakov
Copy link
Contributor Author

artem-smotrakov commented Apr 17, 2022

I suggest splitting this PR into two parts: the uncontroversial addition of new sources, sinks and library-specific propagation steps

Hi @smowton I've moved this part to #8765 . Also, I've added a model for JMS API as @pwntester suggested.

@artem-smotrakov
Copy link
Contributor Author

let's consider separately the difficult issue of when it's reasonable to treat a field write as a jump step (one that discards calling context, and supposes that probably at some point a read step takes place)

@smowton I've moved this to a new experimental query so that this wouldn't affect the standard one.

#8766

Hopefully these comments are useful. Feel free to consider them only suggestions since I am not a member of this project.

@Marcono1234 I've applied your suggestions in the new pull request. Please have a look. Your commenta are as usual very useful, I am always happy to see them!

@artem-smotrakov
Copy link
Contributor Author

Hi @pwntester

can you show us in which specific case you need the byteArrayFieldFlowStep? It seems like there may be some framework-specific lifecycle

Honestly, it doesn't look framework-specific to me. Untrusted data flows via byte-arrays and standard Java I/O objects.

As an example, please have a look at the test from the new PR

https://github.com/github/codeql/pull/8766/files#diff-e799d2006a36e76240a753169ed054410e13dbd379365629a410d212bc91d0e2

@artem-smotrakov
Copy link
Contributor Author

This PR has been split to two

#8765 Adds new flow sources and steps for JMS API versions 1 and 2, RabbitMQ, and a few for Java I/O
#8766 Adds a new experimental query for delayed unsafe deserialization

Closing it.

@artem-smotrakov artem-smotrakov deleted the rabbitmq-deserialization branch April 18, 2022 10:48
@artem-smotrakov artem-smotrakov restored the rabbitmq-deserialization branch April 18, 2022 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants