-
Notifications
You must be signed in to change notification settings - Fork 615
Add Superblock Scope #2665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add Superblock Scope #2665
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @yelhamer, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello everyone, gemini-code-assist here to provide a summary of this pull request. This PR introduces the superblock
scope to capa, addressing the suggestion in issue #2657. The goal is to enable the identification of capabilities that manifest across a contiguous sequence of basic blocks within a function, often referred to as a superblock. This allows for more granular and potentially more accurate rule matching compared to just function or basic block scopes, by considering the control flow structure.
Highlights
- New Superblock Scope: Introduces a new
SUPERBLOCK
scope incapa.rules
to represent contiguous sequences of basic blocks within a function. Rules can now be defined and matched at this scope. - Superblock Matching Logic: Adds a
SuperblockMatcher
class incapa.capabilities.static.py
responsible for building a basic block flow graph for a function, collecting features from all basic blocks, performing initial rule matching at the superblock scope, and then pruning matches to ensure they only span contiguous basic blocks (superblocks) of length 2 or more. - Superblock Address Type: Defines a new
SuperblockAddress
class incapa.features.address.py
to represent the location of a matched superblock, storing a list of the addresses of the basic blocks it contains. - Extractor Updates: Adds abstract methods
get_next_basic_blocks
andget_basic_block_size
to theStaticFeatureExtractor
base class, and implements them for the Binja, dnfile, and Vivisect extractors. This provides the necessary control flow and size information for theSuperblockMatcher
. - Integration into Analysis and Rendering: Integrates the superblock matching process into the static analysis pipeline (
find_code_capabilities
,find_static_capabilities
), updates the result document structure (capa.render.result_document.py
) and freezing logic (capa.features.freeze/__init__.py
) to include and handle superblock matches and addresses, and updates the verbose renderer (capa.render.vverbose.py
,capa.render.verbose.py
) to display superblock locations. - Rule Parsing and Validation: Updates the rule parsing logic (
capa.rules.__init__.py
) to support thesuperblock
subscope within function scope rules and updates feature inheritance to reflect the new scope hierarchy (instruction -> basic block -> superblock -> function).
Changelog
Click here to see the changelog
- capa/capabilities/static.py
- Added
superblock_matches
toCodeCapabilities
. - Introduced
FlowGraphNode
andSuperblockMatcher
classes for superblock identification and matching. - Modified
find_code_capabilities
to useSuperblockMatcher
and include superblock results. - Modified
find_static_capabilities
to aggregate and process superblock matches.
- Added
- capa/features/address.py
- Added
SuperblockAddress
class to represent a sequence of basic block addresses.
- Added
- capa/features/extractors/base_extractor.py
- Added abstract methods
get_next_basic_blocks
andget_basic_block_size
.
- Added abstract methods
- capa/features/extractors/binja/extractor.py
- Implemented
get_next_basic_blocks
andget_basic_block_size
for the Binja extractor.
- Implemented
- capa/features/extractors/dnfile/extractor.py
- Implemented
get_next_basic_blocks
andget_basic_block_size
for the dnfile extractor.
- Implemented
- capa/features/extractors/viv/extractor.py
- Imported
envi
. - Implemented
get_next_basic_blocks
andget_basic_block_size
for the Vivisect extractor.
- Imported
- capa/features/freeze/init.py
- Added
SUPERBLOCK
toAddressType
enum. - Updated
Address
model to handleSuperblockAddress
values. - Added
from_capa
andto_capa
logic forSuperblockAddress
.
- Added
- capa/loader.py
- Added logic to track and include matched superblocks in the static layout computation.
- Updated
FunctionLayout
structure to includematched_superblocks
.
- capa/render/result_document.py
- Added
SuperblockLayout
model. - Added
matched_superblocks
field toFunctionLayout
.
- Added
- capa/render/verbose.py
- Added formatting for
SuperblockAddress
.
- Added formatting for
- capa/render/vverbose.py
- Modified rendering functions (
render_statement
,render_node
,render_match
) to handle theis_superblock
context for potentially rendering basic block locations within a superblock. - Added a debug log and relaxed a check in
render_rules
for file scope matches (potentially unrelated).
- Modified rendering functions (
- capa/rules/init.py
- Added
SUPERBLOCK
to theScope
enum. - Included
SUPERBLOCK
inSTATIC_SCOPES
andSTATIC_SCOPE_ORDER
. - Updated
SUPPORTED_FEATURES
inheritance: basic block features are available in superblock scope, and superblock features are available in function scope. - Added logic to
build_statements
to support thesuperblock
subscope within function rules. - Updated the error message for
instruction
subscope to includesuperblock
as a valid parent. - Added
SUPERBLOCK
toSTATIC_SCOPES_BY_DEPTH
. - Added
superblock_rules
property toRuleSet
.
- Added
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Basic blocks in a line,
A superblock, quite fine.
Features matched with care,
What malware is there?
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the superblock scope, which is a significant new feature for capa. The overall structure and integration across different modules seem well-thought-out.
However, there are a few critical concerns:
- Testing: The PR description states "No new tests needed". For a feature of this magnitude, comprehensive unit tests for
SuperblockMatcher
,SuperblockAddress
, and integration tests for rules utilizing the newsuperblock
scope are essential to ensure correctness and prevent regressions. This is a critical omission. - Documentation: Similarly, "No documentation update needed" is indicated. The new
superblock
scope, its behavior, and how to write rules for it must be documented for users and rule developers.
Addressing these points, along with the specific code comments below, will greatly improve the quality and reliability of this new feature. The introduction of SuperblockMatcher
and the associated logic in _prune
is quite complex and warrants careful review and thorough testing.
Summary of Findings
- Missing Tests: The pull request lacks tests for the new superblock scope functionality. Unit tests for
SuperblockMatcher
andSuperblockAddress
, as well as integration tests for rules using this scope, are crucial. - Missing Documentation: The new superblock scope is not documented. Users and rule developers will need guidance on how to use this feature and write rules for it.
- Potential Runtime Error in Superblock Pruning: In
capa/capabilities/static.py
, theget_superblock_head
function uses an undefined variablelocations
, which will likely cause aNameError
. - Incorrect Handling of File-Scope Rule Non-Matches in Renderer: In
capa/render/vverbose.py
, the logic for rendering file-scope rules appears to crash if a rule doesn't match, due to anIndexError
when accessingmatches[0]
. - Limited Superblock Support for .NET (dnfile): The
dnfile
extractor stubsget_next_basic_blocks
, meaning superblock analysis will not work for .NET files using this extractor. This limitation should be clarified. - Code Complexity: The
_prune
method inSuperblockMatcher
is complex and could benefit from simplification or more detailed comments for better maintainability.
Merge Readiness
This pull request introduces a valuable new feature. However, due to the critical issues identified (missing tests, missing documentation, and potential runtime errors), I recommend that these changes not be merged until these issues are thoroughly addressed. The complexity of the new matching logic also warrants careful testing. I am unable to approve this pull request; please ensure these concerns are resolved and further reviews are conducted before merging.
This PR is a suggested implementation for the superblock scope suggested in #2657 .
Checklist