Skip to content

Add Superblock Scope #2665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Conversation

yelhamer
Copy link
Collaborator

This PR is a suggested implementation for the superblock scope suggested in #2657 .

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@yelhamer yelhamer changed the title initial commit: add superblock scope Add Superblock Scope May 15, 2025
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @yelhamer, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello everyone, gemini-code-assist here to provide a summary of this pull request. This PR introduces the superblock scope to capa, addressing the suggestion in issue #2657. The goal is to enable the identification of capabilities that manifest across a contiguous sequence of basic blocks within a function, often referred to as a superblock. This allows for more granular and potentially more accurate rule matching compared to just function or basic block scopes, by considering the control flow structure.

Highlights

  • New Superblock Scope: Introduces a new SUPERBLOCK scope in capa.rules to represent contiguous sequences of basic blocks within a function. Rules can now be defined and matched at this scope.
  • Superblock Matching Logic: Adds a SuperblockMatcher class in capa.capabilities.static.py responsible for building a basic block flow graph for a function, collecting features from all basic blocks, performing initial rule matching at the superblock scope, and then pruning matches to ensure they only span contiguous basic blocks (superblocks) of length 2 or more.
  • Superblock Address Type: Defines a new SuperblockAddress class in capa.features.address.py to represent the location of a matched superblock, storing a list of the addresses of the basic blocks it contains.
  • Extractor Updates: Adds abstract methods get_next_basic_blocks and get_basic_block_size to the StaticFeatureExtractor base class, and implements them for the Binja, dnfile, and Vivisect extractors. This provides the necessary control flow and size information for the SuperblockMatcher.
  • Integration into Analysis and Rendering: Integrates the superblock matching process into the static analysis pipeline (find_code_capabilities, find_static_capabilities), updates the result document structure (capa.render.result_document.py) and freezing logic (capa.features.freeze/__init__.py) to include and handle superblock matches and addresses, and updates the verbose renderer (capa.render.vverbose.py, capa.render.verbose.py) to display superblock locations.
  • Rule Parsing and Validation: Updates the rule parsing logic (capa.rules.__init__.py) to support the superblock subscope within function scope rules and updates feature inheritance to reflect the new scope hierarchy (instruction -> basic block -> superblock -> function).

Changelog

Click here to see the changelog
  • capa/capabilities/static.py
    • Added superblock_matches to CodeCapabilities.
    • Introduced FlowGraphNode and SuperblockMatcher classes for superblock identification and matching.
    • Modified find_code_capabilities to use SuperblockMatcher and include superblock results.
    • Modified find_static_capabilities to aggregate and process superblock matches.
  • capa/features/address.py
    • Added SuperblockAddress class to represent a sequence of basic block addresses.
  • capa/features/extractors/base_extractor.py
    • Added abstract methods get_next_basic_blocks and get_basic_block_size.
  • capa/features/extractors/binja/extractor.py
    • Implemented get_next_basic_blocks and get_basic_block_size for the Binja extractor.
  • capa/features/extractors/dnfile/extractor.py
    • Implemented get_next_basic_blocks and get_basic_block_size for the dnfile extractor.
  • capa/features/extractors/viv/extractor.py
    • Imported envi.
    • Implemented get_next_basic_blocks and get_basic_block_size for the Vivisect extractor.
  • capa/features/freeze/init.py
    • Added SUPERBLOCK to AddressType enum.
    • Updated Address model to handle SuperblockAddress values.
    • Added from_capa and to_capa logic for SuperblockAddress.
  • capa/loader.py
    • Added logic to track and include matched superblocks in the static layout computation.
    • Updated FunctionLayout structure to include matched_superblocks.
  • capa/render/result_document.py
    • Added SuperblockLayout model.
    • Added matched_superblocks field to FunctionLayout.
  • capa/render/verbose.py
    • Added formatting for SuperblockAddress.
  • capa/render/vverbose.py
    • Modified rendering functions (render_statement, render_node, render_match) to handle the is_superblock context for potentially rendering basic block locations within a superblock.
    • Added a debug log and relaxed a check in render_rules for file scope matches (potentially unrelated).
  • capa/rules/init.py
    • Added SUPERBLOCK to the Scope enum.
    • Included SUPERBLOCK in STATIC_SCOPES and STATIC_SCOPE_ORDER.
    • Updated SUPPORTED_FEATURES inheritance: basic block features are available in superblock scope, and superblock features are available in function scope.
    • Added logic to build_statements to support the superblock subscope within function rules.
    • Updated the error message for instruction subscope to include superblock as a valid parent.
    • Added SUPERBLOCK to STATIC_SCOPES_BY_DEPTH.
    • Added superblock_rules property to RuleSet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Basic blocks in a line,
A superblock, quite fine.
Features matched with care,
What malware is there?

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the superblock scope, which is a significant new feature for capa. The overall structure and integration across different modules seem well-thought-out.

However, there are a few critical concerns:

  1. Testing: The PR description states "No new tests needed". For a feature of this magnitude, comprehensive unit tests for SuperblockMatcher, SuperblockAddress, and integration tests for rules utilizing the new superblock scope are essential to ensure correctness and prevent regressions. This is a critical omission.
  2. Documentation: Similarly, "No documentation update needed" is indicated. The new superblock scope, its behavior, and how to write rules for it must be documented for users and rule developers.

Addressing these points, along with the specific code comments below, will greatly improve the quality and reliability of this new feature. The introduction of SuperblockMatcher and the associated logic in _prune is quite complex and warrants careful review and thorough testing.

Summary of Findings

  • Missing Tests: The pull request lacks tests for the new superblock scope functionality. Unit tests for SuperblockMatcher and SuperblockAddress, as well as integration tests for rules using this scope, are crucial.
  • Missing Documentation: The new superblock scope is not documented. Users and rule developers will need guidance on how to use this feature and write rules for it.
  • Potential Runtime Error in Superblock Pruning: In capa/capabilities/static.py, the get_superblock_head function uses an undefined variable locations, which will likely cause a NameError.
  • Incorrect Handling of File-Scope Rule Non-Matches in Renderer: In capa/render/vverbose.py, the logic for rendering file-scope rules appears to crash if a rule doesn't match, due to an IndexError when accessing matches[0].
  • Limited Superblock Support for .NET (dnfile): The dnfile extractor stubs get_next_basic_blocks, meaning superblock analysis will not work for .NET files using this extractor. This limitation should be clarified.
  • Code Complexity: The _prune method in SuperblockMatcher is complex and could benefit from simplification or more detailed comments for better maintainability.

Merge Readiness

This pull request introduces a valuable new feature. However, due to the critical issues identified (missing tests, missing documentation, and potential runtime errors), I recommend that these changes not be merged until these issues are thoroughly addressed. The complexity of the new matching logic also warrants careful testing. I am unable to approve this pull request; please ensure these concerns are resolved and further reviews are conducted before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant