Preserve original_index across all conversion interfaces #105

Merged
PythonFZ merged 6 commits into main from feat/preserve-original-index on Apr 14, 2026

Conversation

@PythonFZ
Member

@PythonFZ PythonFZ commented Apr 14, 2026

Summary

  • Fixes networkx2rdkit #104: original_index is now preserved across all nx/rdkit/ase conversions
  • networkx2rdkit: stores original_index as an RDKit atom int property
  • networkx2ase: stores original_index list in atoms.info
  • rdkit2ase: carries original_index from RDKit atom properties to atoms.info
  • rdkit2networkx: reads original_index from RDKit atom property when available, falls back to GetIdx()
  • Introduces molify.constants module with StrEnum keys (NodeAttr, EdgeAttr, GraphAttr) replacing magic strings
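
The bullets above amount to a store-then-read-with-fallback pattern. The following is a minimal, library-free sketch of that pattern, not molify's implementation: plain dictionaries stand in for NetworkX node attributes and RDKit atom properties, and `store_on_atoms` and `read_from_atoms` are hypothetical helper names. Only the key name `original_index` comes from the PR.

```python
ORIGINAL_INDEX = "original_index"  # key name taken from the PR summary


def store_on_atoms(nodes: dict[int, dict]) -> list[dict]:
    """Renumber graph nodes to atoms 0..n-1, copying original_index if set."""
    atoms = []
    for node_id in sorted(nodes):
        props = {}
        if ORIGINAL_INDEX in nodes[node_id]:
            # Stored as an int, mirroring the "atom int property" bullet.
            props[ORIGINAL_INDEX] = int(nodes[node_id][ORIGINAL_INDEX])
        atoms.append(props)
    return atoms


def read_from_atoms(atoms: list[dict]) -> list[int]:
    """Read original_index per atom, falling back to the positional index."""
    return [props.get(ORIGINAL_INDEX, idx) for idx, props in enumerate(atoms)]


# A subgraph whose node IDs are not sequential (hypothetical data).
subgraph = {5: {ORIGINAL_INDEX: 5}, 7: {ORIGINAL_INDEX: 7}, 9: {ORIGINAL_INDEX: 9}}
preserved = read_from_atoms(store_on_atoms(subgraph))

# Atoms that never carried the property fall back to 0..n-1.
fallback = read_from_atoms([{}, {}, {}])
```

Here `preserved` keeps the labels 5, 7, 9 even though the atoms were renumbered, while `fallback` is simply 0, 1, 2.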

Test plan

  • 9 new tests covering all 4 conversion functions (sequential, subgraph, auto bond orders, fallback behavior)
  • Full suite: 450 tests pass, zero regressions
  • Pre-commit (ruff lint + format) clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Molecule conversions now support preservation and propagation of original atom indices when converting between ASE, NetworkX, and RDKit formats.
  • Refactor

    • Standardized internal attribute key management across molecule conversion modules.
  • Tests

    • Added comprehensive test coverage for original index propagation across supported molecule format conversions.
  • Chores

    • Updated .gitignore to exclude Git worktree directories.

PythonFZ and others added 2 commits April 14, 2026 16:04
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add original_index propagation to networkx2rdkit, networkx2ase,
rdkit2ase, and rdkit2networkx. Introduce molify.constants module
with StrEnum keys (NodeAttr, EdgeAttr, GraphAttr) replacing magic
strings in ase2x, networkx2x, and rdkit2x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Apr 14, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c928498-e363-4212-8a77-a41b3e85529a

📥 Commits

Reviewing files that changed from the base of the PR and between 3ecc77b and 4648c18.

📒 Files selected for processing (9)
  • .gitignore
  • src/molify/ase2x.py
  • src/molify/compress.py
  • src/molify/constants.py
  • src/molify/pack.py
  • src/molify/smiles2x.py
  • src/molify/substructure.py
  • tests/integration/test_ase2x.py
  • tests/integration/test_networkx2x.py
📝 Walkthrough

Walkthrough

Introduces a standardized constants module defining string-based enums for NetworkX attribute keys. Refactors three conversion modules (ase2x, networkx2x, rdkit2x) to use these constants instead of string literals, and implements optional propagation of ORIGINAL_INDEX metadata throughout the conversion pipeline (RDKit ↔ NetworkX ↔ ASE).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration: .gitignore | Added .worktrees/ directory to ignore patterns. |
| Constants Module: src/molify/constants.py | New module defining three StrEnum classes: NodeAttr (position, atomic_number, original_index, charge), EdgeAttr (bond_order), and GraphAttr (pbc, cell, connectivity, smiles). |
| Core Conversions: src/molify/ase2x.py, src/molify/networkx2x.py, src/molify/rdkit2x.py | Replaced hardcoded string attribute keys with constant references. Added optional propagation of NodeAttr.ORIGINAL_INDEX through the conversion pipeline: reads from source (RDKit properties or networkx nodes), stores in intermediate formats, and writes to target (ASE atoms.info or RDKit atom properties). |
| Integration Tests: tests/integration/test_networkx2x.py, tests/test_rdkit2ase.py | Added comprehensive test suites verifying ORIGINAL_INDEX preservation across conversions: TestNetworkx2RdkitOriginalIndex, TestNetworkx2AseOriginalIndex, TestRdkit2AseOriginalIndex, and TestRdkit2NetworkxOriginalIndex. Tests cover subgraphs, edge cases with None bond orders, and fallback behavior when properties are absent. |

Sequence Diagram(s)

sequenceDiagram
    participant RDKit as RDKit Molecule
    participant NX as NetworkX Graph
    participant ASE as ASE Atoms
    
    rect rgba(100, 150, 200, 0.5)
    Note over RDKit,ASE: Forward Conversion: RDKit → NetworkX → ASE
    RDKit->>NX: rdkit2networkx()<br/>NodeAttr.ORIGINAL_INDEX<br/>from atom property
    NX->>NX: store in node attrs
    NX->>ASE: networkx2ase()<br/>collect ORIGINAL_INDEX
    ASE->>ASE: store in atoms.info[<br/>NodeAttr.ORIGINAL_INDEX]
    end
    
    rect rgba(200, 150, 100, 0.5)
    Note over RDKit,ASE: Reverse Conversion: NetworkX → RDKit
    NX->>RDKit: networkx2rdkit()<br/>read NodeAttr.ORIGINAL_INDEX
    RDKit->>RDKit: SetProp() on atom<br/>with ORIGINAL_INDEX value
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A nimble refactor, constants now reign,
String keys traded for enum domain,
Original indices preserved with care,
Through pipelines they dance everywhere!
The molecule remembers its kin,
As conversions spin 'round and begin. 🔄✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: introducing preservation of original_index across all conversion interfaces (networkx2rdkit, networkx2ase, rdkit2ase, rdkit2networkx) as the primary objective. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #104 by preserving original_index across conversions. networkx2rdkit stores it as RDKit atom property, while rdkit2ase, rdkit2networkx, and networkx2ase propagate it appropriately per the issue requirements. |
| Out of Scope Changes check | ✅ Passed | The PR introduces a new molify.constants module with StrEnum keys to replace magic strings, which is supporting infrastructure for the primary objective but not strictly required by issue #104. However, this standardization aligns with best practices and enables consistent attribute referencing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%. |

PythonFZ and others added 2 commits April 14, 2026 16:11
`enum.StrEnum` was introduced in Python 3.11 but the project supports
>=3.10. Use `class StrEnum(str, Enum)` polyfill on older versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
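
A minimal sketch of the kind of version-gated polyfill this commit message describes, using only the standard library. It is an illustration, not the contents of `molify/constants.py`: the `__str__` override is an added assumption to keep `str()` consistent with `enum.StrEnum`, and only the `ORIGINAL_INDEX` member is shown.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:

    class StrEnum(str, Enum):
        """Fallback for Python 3.10: members are real str instances."""

        def __str__(self) -> str:
            # Match enum.StrEnum, whose str() is the member value.
            return str(self.value)


class NodeAttr(StrEnum):
    # Only one member shown; the PR's enum defines several more.
    ORIGINAL_INDEX = "original_index"


key = NodeAttr.ORIGINAL_INDEX
```

On either branch, `key` compares equal to the plain string `"original_index"`, which is what lets existing code that expects string keys keep working.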
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/integration/test_networkx2x.py (1)

612-664: Excellent test coverage for original_index preservation.

The tests comprehensively cover:

  • Basic propagation from NetworkX to RDKit atoms
  • Subgraph scenarios with non-sequential node IDs
  • Auto bond order determination preserving original indices
  • NetworkX to ASE propagation

Minor: Static analysis flagged that node_id is extracted but unused in the loop at lines 622, 642, and 659. Consider renaming to _node_id to signal intentional disuse.

♻️ Proposed fix for unused loop variables
-        for i, (node_id, attributes) in enumerate(graph.nodes(data=True)):
+        for i, (_node_id, attributes) in enumerate(graph.nodes(data=True)):

Apply similarly at lines 642 and 659.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_networkx2x.py` around lines 612 - 664, Tests in
TestNetworkx2RdkitOriginalIndex use an unused loop variable named node_id in the
enumerations; rename node_id to _node_id in the three test methods
(test_original_index_stored_on_rdkit_atoms,
test_original_index_preserved_with_subgraph,
test_original_index_preserved_with_none_bond_orders) where the loop is written
as "for i, (node_id, attributes) in enumerate(...)" so static analysis stops
flagging unused variables while leaving behavior unchanged.
src/molify/ase2x.py (1)

165-186: LGTM with a note on cross-module consistency.

The ase2networkx function correctly uses GraphAttr.CONNECTIVITY and other constants for reading/writing attributes.

Note: The relevant code snippets show that src/molify/pack.py, src/molify/substructure.py, src/molify/compress.py, and src/molify/smiles2x.py still use the hardcoded string "connectivity". Since GraphAttr.CONNECTIVITY evaluates to "connectivity" (via StrEnum), this won't cause runtime issues. However, for long-term maintainability, consider migrating those modules to use the constants in a follow-up PR.
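
The claim that the literal and the constant are interchangeable can be checked with a small standard-library sketch. The `GraphAttr` class below is a stand-in written with the `(str, Enum)` form rather than an import from molify, and the connectivity payload is hypothetical.

```python
from enum import Enum


class GraphAttr(str, Enum):
    # Stand-in for molify.constants.GraphAttr; only this member is shown.
    CONNECTIVITY = "connectivity"


info = {}
info["connectivity"] = [(0, 1, 1.0)]     # written with the string literal
via_enum = info[GraphAttr.CONNECTIVITY]  # read back with the constant

info[GraphAttr.CONNECTIVITY] = []        # overwrite using the constant
n_keys = len(info)                       # the dict still has a single entry
```

Because the member is a `str` subclass that hashes and compares like its value, both spellings address the same dictionary entry.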

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/molify/ase2x.py` around lines 165 - 186, Replace hardcoded "connectivity"
string usages in src/molify/pack.py, src/molify/substructure.py,
src/molify/compress.py, and src/molify/smiles2x.py with the
GraphAttr.CONNECTIVITY constant (the same constant used in ase2networkx) so
attribute reads/writes use the shared enum; locate places that access atom/info
or graph.graph with "connectivity" and swap the literal for
GraphAttr.CONNECTIVITY, ensuring imports of GraphAttr are added where missing
and tests still pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/molify/constants.py`:
- Around line 1-27: The file uses StrEnum (Python 3.11+) which breaks on Python
3.10; replace StrEnum usage by switching to an Enum that preserves string
behavior—import Enum and make the enums inherit (str, Enum) instead of StrEnum
for NodeAttr, EdgeAttr, and GraphAttr (keep the same member names and string
values such as "position", "bond_order", etc.) so existing code that expects
string-like enums continues to work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 57047009-1147-4100-a4ed-ca4496d6712a

📥 Commits

Reviewing files that changed from the base of the PR and between a1d86eb and 3ecc77b.

📒 Files selected for processing (7)
  • .gitignore
  • src/molify/ase2x.py
  • src/molify/constants.py
  • src/molify/networkx2x.py
  • src/molify/rdkit2x.py
  • tests/integration/test_networkx2x.py
  • tests/test_rdkit2ase.py

Comment thread src/molify/constants.py
PythonFZ and others added 2 commits April 14, 2026 16:24
Both _create_graph_from_connectivity and _add_node_properties now check
atoms.info for stored original_index before falling back to atom.index.
This closes the round-trip gap where nx -> ase -> nx lost the indices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
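
A minimal, library-free sketch of the lookup order described in the commit message above. A plain dict stands in for `atoms.info`, `node_labels` is a hypothetical helper rather than the actual `_add_node_properties` code, and the length check is an added assumption.

```python
ORIGINAL_INDEX = "original_index"


def node_labels(info: dict, n_atoms: int) -> list[int]:
    """Return stored original indices if usable, else positional indices."""
    stored = info.get(ORIGINAL_INDEX)
    # Assumption: only trust the stored list if it has one entry per atom.
    if stored is not None and len(stored) == n_atoms:
        return list(stored)
    return list(range(n_atoms))


with_info = node_labels({ORIGINAL_INDEX: [5, 7, 9]}, 3)
without_info = node_labels({}, 3)
```

With the stored list present, the labels 5, 7, 9 survive the nx -> ase -> nx round trip; without it, the result is the positional 0, 1, 2.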
Replace remaining hardcoded "connectivity" and "smiles" string literals
in compress.py, pack.py, smiles2x.py, and substructure.py with
GraphAttr constants. Rename unused node_id to _node_id in tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@PythonFZ PythonFZ merged commit d1fcc79 into main Apr 14, 2026
22 checks passed
@PythonFZ PythonFZ deleted the feat/preserve-original-index branch April 14, 2026 14:43

Development

Successfully merging this pull request may close these issues.

networkx2rdkit

1 participant