
feat(query): polyglot from_directory with auto-detection#13

Merged
tob-scott-a merged 1 commit into main from polyglot-from-directory on Apr 23, 2026

@tob-scott-a
Collaborator

`QueryEngine.from_directory` previously accepted one language at a time, but real repositories mix languages (Python + Solidity contracts, TypeScript + Rust, etc.). Callers had to build two engines and figure out how to combine them, or give up on multi-language analysis.

The `language` argument now accepts:

  • `"auto"`: walks the tree, detects every supported language with at least one matching file, parses each, and merges the results into a single graph. Skips common vendor directories (`node_modules`, `.venv`, `target`, etc.).
  • `"python,rust"`: an explicit comma-separated list, for when auto-detection would pull in too much or miss something.
  • `"python"`: a single language (unchanged; when exactly one language is specified, the single-language path is preserved byte-for-byte).
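A minimal sketch of how the three accepted forms of `language` could be normalized into one list. The PR does not show its parser, so `resolve_languages`, the `SUPPORTED` set, the case-folding, and the error behaviour below are all hypothetical illustrations, not the engine's actual code.

```python
from typing import Callable, List

# Hypothetical set of supported languages; the real engine's list may differ.
SUPPORTED = {"python", "rust", "typescript", "solidity"}


def resolve_languages(spec: str, detect: Callable[[], List[str]]) -> List[str]:
    """Normalize a `language` argument into an ordered list of language names.

    "auto"        -> whatever detect() reports
    "python,rust" -> the explicit list, in the order given
    "python"      -> a one-element list
    """
    spec = spec.strip().lower()
    if spec == "auto":
        langs = detect()
        if not langs:
            # Assumption: an empty detection result is treated as an error.
            raise ValueError("no supported languages detected")
        return langs

    langs: List[str] = []
    for part in spec.split(","):
        name = part.strip()
        if not name:
            continue  # tolerate stray commas such as "python,"
        if name not in SUPPORTED:
            raise ValueError(f"unsupported language: {name!r}")
        if name not in langs:
            langs.append(name)  # drop duplicates, keep first position
    if not langs:
        raise ValueError("empty language list")
    return langs


if __name__ == "__main__":
    print(resolve_languages("python, rust", detect=lambda: []))
    # ['python', 'rust']
```

With this shape, a caller can route a one-element result straight to the existing single-language code path and only take the merge path for two or more languages, which is one simple way to keep single-language output unchanged.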

Entrypoint detection runs on the merged graph, so a repo with a Python `main()` and a Solidity `external` function surfaces both in `attack_surface()` from a single analyze call.
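The merge-then-detect ordering can be illustrated with toy types. `Node`, `Graph`, `merge`, and `entrypoints` below are hypothetical stand-ins rather than the engine's real classes; prefixing node ids with the language name is an assumed way to keep, say, a Python `main` and a Rust `main` from colliding, and the entrypoint rules are reduced to the two examples the description mentions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    language: str
    kind: str             # e.g. "function"
    visibility: str = ""  # only used for Solidity in this sketch


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)


def merge(graphs: Dict[str, Graph]) -> Graph:
    """Merge per-language graphs, namespacing every node id by language."""
    merged = Graph()
    for lang, g in graphs.items():
        for node_id, node in g.nodes.items():
            merged.nodes[f"{lang}:{node_id}"] = node
        for src, dst in g.edges:
            merged.edges.append((f"{lang}:{src}", f"{lang}:{dst}"))
    return merged


def entrypoints(g: Graph) -> List[str]:
    """Toy rules: a Python `main` function or a Solidity `external` function."""
    found = []
    for node_id, n in g.nodes.items():
        if n.kind != "function":
            continue
        if n.language == "python" and n.name == "main":
            found.append(node_id)
        elif n.language == "solidity" and n.visibility == "external":
            found.append(node_id)
    return sorted(found)


if __name__ == "__main__":
    py = Graph(nodes={"app.main": Node("main", "python", "function")})
    sol = Graph(nodes={"Vault.withdraw": Node("withdraw", "solidity", "function", "external")})
    print(entrypoints(merge({"python": py, "solidity": sol})))
    # ['python:app.main', 'solidity:Vault.withdraw']
```

Because detection only ever sees the merged graph, one pass yields entrypoints from every language that was parsed.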

Also exposes `detect_languages(path)` as a public helper for callers that want the list of detected languages without building a graph.
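A sketch of what such a helper might look like, assuming detection by file extension. The `EXTENSIONS` map and `VENDOR_DIRS` set are hypothetical; the project's real `detect_languages(path)` may use different extensions, a different skip list, or content-based rules.

```python
import os
from typing import Dict, List

# Hypothetical extension map and vendor skip list.
EXTENSIONS: Dict[str, str] = {
    ".py": "python",
    ".rs": "rust",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".sol": "solidity",
}
VENDOR_DIRS = {"node_modules", ".venv", "venv", "target", ".git", "__pycache__"}


def detect_languages(path: str) -> List[str]:
    """Return every supported language with at least one matching file under `path`."""
    found = set()
    for root, dirs, files in os.walk(path):
        # Prune vendor directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in VENDOR_DIRS]
        for name in files:
            lang = EXTENSIONS.get(os.path.splitext(name)[1].lower())
            if lang:
                found.add(lang)
    return sorted(found)
```

Assigning to `dirs[:]` (rather than rebinding `dirs`) is what makes the pruning work: with the default top-down walk, `os.walk` consults the same list object to decide which subdirectories to visit.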

12 new tests: detection on single/multi/empty/vendor-heavy directories, auto-merge behavior, explicit list handling, error paths, and a regression guard that single-language behavior is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tob-scott-a merged commit b3448e8 into main on Apr 23, 2026
13 checks passed
tob-scott-a deleted the polyglot-from-directory branch on April 23, 2026 05:54