
Introduce semantic layer to prepare to share range analysis #7986


Closed
wants to merge 9 commits into from


@dbartol dbartol commented Feb 11, 2022

We'd like to share Java's range analysis with C++ (and Swift, C#, and maybe Go). Range analysis has several other interesting analyses on which it depends, including sign analysis, modulus analysis, and a little bit of constant analysis and nullness analysis. Sign analysis and modulus analysis are already shared with C#, but everything else is still Java-specific.

The first step to sharing all of this with other languages is to get the Java-specific portion separated from the sharable portion. After separate conversations with @aschackmull and @rdmarsh2, it seemed like a good time to consider a language-neutral interface that would allow all of these semantic analyses to work with multiple languages without having to adapt each {language, analysis} pair one at a time.
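To make the {language, analysis} point concrete, here is a minimal sketch of the idea in plain Java. It is not the QL code in this PR: `SemExpr`, `SemConstant`, `SemAdd`, the toy front-end classes, and the simplified sign analysis are all hypothetical names invented for illustration. The analysis is written once against the neutral interface, and each language contributes only a thin adapter.

```java
/** Illustrative only: hypothetical names, not the QL modules introduced by this PR. */
final class SemanticLayerSketch {
    // ---- Language-neutral interface that analyses are written against ----
    interface SemExpr {}
    interface SemConstant extends SemExpr { long value(); }
    interface SemAdd extends SemExpr { SemExpr left(); SemExpr right(); }

    enum Sign { NEGATIVE, ZERO, POSITIVE, UNKNOWN }

    // ---- One analysis, written once (toy sign analysis, ignores overflow) ----
    static Sign sign(SemExpr e) {
        if (e instanceof SemConstant) {
            long v = ((SemConstant) e).value();
            return v < 0 ? Sign.NEGATIVE : v == 0 ? Sign.ZERO : Sign.POSITIVE;
        }
        if (e instanceof SemAdd) {
            Sign l = sign(((SemAdd) e).left());
            Sign r = sign(((SemAdd) e).right());
            if (l == Sign.ZERO) return r;
            if (r == Sign.ZERO) return l;
            return l == r ? l : Sign.UNKNOWN;
        }
        return Sign.UNKNOWN;
    }

    // ---- Toy "language A" front end: literals are kept as source text ----
    static final class LangALiteral {
        final String text;
        LangALiteral(String text) { this.text = text; }
    }

    /** Adapter: wraps a language A literal and exposes it as a SemConstant. */
    static final class LangALiteralAdapter implements SemConstant {
        private final LangALiteral wrapped;
        LangALiteralAdapter(LangALiteral wrapped) { this.wrapped = wrapped; }
        @Override public long value() { return Long.parseLong(wrapped.text); }
    }

    // ---- Toy "language B" front end: literals are already parsed ----
    static final class LangBLiteral {
        final long parsed;
        LangBLiteral(long parsed) { this.parsed = parsed; }
    }

    /** Adapter: wraps a language B literal and exposes it as a SemConstant. */
    static final class LangBLiteralAdapter implements SemConstant {
        private final LangBLiteral wrapped;
        LangBLiteralAdapter(LangBLiteral wrapped) { this.wrapped = wrapped; }
        @Override public long value() { return wrapped.parsed; }
    }

    /** Language-neutral addition node built from already-adapted operands. */
    static final class Add implements SemAdd {
        private final SemExpr left;
        private final SemExpr right;
        Add(SemExpr left, SemExpr right) { this.left = left; this.right = right; }
        @Override public SemExpr left() { return left; }
        @Override public SemExpr right() { return right; }
    }
}
```

With this shape, adding a language means writing adapters only, and adding an analysis means writing it once against `SemExpr`; no per-pair glue is needed.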

This PR factors out all of the dependencies that Java's range analysis had on import java. In their place, I've introduced a few modules under semmle.code.java.semantic.*:

  • SemanticExpr - Currently just directly wraps Java's Expr class, with the minimal set of subclasses and member predicates to make range analysis work. We'll need to think more about the right interface to expose here, such that it can be implemented relatively easily for each language.
  • SemanticSSA - Wraps a small subset of Java's SSA library, plus the SsaReadPosition stuff that was previously internal to range analysis.
  • SemanticCFG - Wraps a small subset of Java's CFG library.
  • SemanticGuard - Wraps a small subset of Java's guards library.
  • SemanticType - Unlike the wrappers above, this is a separate concrete type system, populated from Java's type system. The interface is basically copied from what we've already been using as the IR's type system. Key differences from the Java type system include:
    • Numeric types are just described by their kind (signed, unsigned, FP) and size.
    • Character types are just integer types.
    • All pointers and references are just a single "address" type.
    • Classes (the object layout part, not the reference) are just an "opaque" type with a specific size.
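As a rough illustration of the SemanticType simplifications listed above, the following plain-Java sketch maps Java types to a (kind, size) description. It is not the QL implementation from this PR: `SemType`, `Kind`, `fromJavaType`, and `opaque` are hypothetical names, `java.lang.Class` stands in for the QL `Type` class, and the 8-byte address size is an assumption (the PR text does not give one). The primitive sizes and the unsigned 16-bit `char` follow the Java language definition.

```java
import java.util.Objects;

/** Illustrative only: a Java model of the simplified type system, not the PR's QL code. */
final class SemanticTypeSketch {
    enum Kind { SIGNED_INT, UNSIGNED_INT, FLOATING_POINT, ADDRESS, OPAQUE }

    /** A semantic type is described only by its kind and its size in bytes. */
    static final class SemType {
        final Kind kind;
        final int byteSize;

        SemType(Kind kind, int byteSize) {
            this.kind = Objects.requireNonNull(kind);
            this.byteSize = byteSize;
        }

        boolean isNumeric() {
            return kind == Kind.SIGNED_INT || kind == Kind.UNSIGNED_INT
                || kind == Kind.FLOATING_POINT;
        }

        @Override public boolean equals(Object o) {
            return o instanceof SemType
                && ((SemType) o).kind == kind
                && ((SemType) o).byteSize == byteSize;
        }

        @Override public int hashCode() { return Objects.hash(kind, byteSize); }

        @Override public String toString() { return kind + "(" + byteSize + ")"; }
    }

    // Assumption: a 64-bit target. The PR does not state an address size.
    static final int ADDRESS_BYTES = 8;

    /** Maps a Java type to its semantic description. */
    static SemType fromJavaType(Class<?> type) {
        if (type == byte.class)   return new SemType(Kind.SIGNED_INT, 1);
        if (type == short.class)  return new SemType(Kind.SIGNED_INT, 2);
        if (type == char.class)   return new SemType(Kind.UNSIGNED_INT, 2); // char is just an integer
        if (type == int.class)    return new SemType(Kind.SIGNED_INT, 4);
        if (type == long.class)   return new SemType(Kind.SIGNED_INT, 8);
        if (type == float.class)  return new SemType(Kind.FLOATING_POINT, 4);
        if (type == double.class) return new SemType(Kind.FLOATING_POINT, 8);
        if (type.isPrimitive()) {
            // boolean and void are not discussed in the PR text, so this sketch rejects them.
            throw new IllegalArgumentException("not modeled in this sketch: " + type);
        }
        // Every reference (class, interface, array) collapses to one address type.
        return new SemType(Kind.ADDRESS, ADDRESS_BYTES);
    }

    /** Object layout of a class: opaque, known only by a caller-supplied size. */
    static SemType opaque(int byteSize) {
        return new SemType(Kind.OPAQUE, byteSize);
    }
}
```

Note that `String` and `int[]` map to the same address type here, which is the point of the simplification: range analysis only needs to know signedness and width, not the full Java type hierarchy.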

I'm not claiming that any of the above semantic interfaces are what we should wind up with, but they're a good starting point because they show what we actually use today.

In adapting the existing Java analysis code to use the new interfaces, I started by replacing all uses of the Java-specific types with their semantic equivalents, then fixed the resulting compiler errors. Anything that was truly Java-specific was factored out into a separate file. For any dependencies on already-shared files, like sign analysis and modulus analysis, I added semantic wrappers for those files to avoid modifying any shared file. As a follow-up, we can port the sign and modulus analyses to the semantic interface and remove the need for those wrappers.

I don't expect to actually merge these changes into the Java repo until I've had a chance to try out the now-sharable analysis on a C++ adapter that exposes the semantic interfaces. This PR is mostly so interested parties can take a look. @aschackmull @rdmarsh2 @hvitved.

@dbartol dbartol closed this by deleting the head repository May 12, 2025