
Phase 2: Noosphere Engine analytics modules#188

Open
user1303836 wants to merge 4 commits into main from feature/noosphere-analytics


@user1303836
Owner

Summary

Implements six Phase 2 analytics modules for the Noosphere Engine, following the architecture spec from chaos_research.md (ARCHITECT-1 through ARCHITECT-4).

Modules

  • Resonance Mirror (resonance_mirror/): Discord embed visualization of community coherence. Uses color coding driven by the egregore index (EI), Unicode bars, and trend arrows. Slash command: /mirror
  • Attractor Dashboard (attractor_dashboard/): 9-metric display (coherence, momentum, topic entropy/churn, activity entropy, reply depth, modularity) with change-point detection via ruptures. Slash command: /dashboard
  • Cordyceps Audit (cordyceps_audit/): Standalone bot influence measurement. Herfindahl index for message concentration + vocabulary Jaccard for linguistic mimicry. Slash command: /cordyceps
  • Morphogenetic Field (morphogenetic_field/): User interaction graph with cosine coupling scores, NetworkX modularity. Slash command: /morph
  • Cryptobiosis Mode (cryptobiosis.py): 3-state machine (active/entering/cryptobiotic) with configurable activity thresholds. Emits cryptobiosis_trigger events.
  • Pathology Detection (pathology.py + pathology_cog.py): Echo chamber, bot dominance, server death, flame war, clique formation. Z-score with hybrid absolute thresholds per Arch-2/Arch-3 spec. Slash command: /pathology
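The two Cordyceps measures named above are standard formulas. A minimal sketch, assuming per-user message counts keyed by str IDs and a term Counter for bot and human messages; the function names are hypothetical and this is not the module's actual audit.py code. The 200-term default mirrors the VocabularyTracker cap described in the review.

```python
from collections import Counter


def herfindahl_index(message_counts: dict[str, int]) -> float:
    """Sum of squared message shares: 1.0 when one author sent every message."""
    total = sum(message_counts.values())
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in message_counts.values())


def vocabulary_jaccard(
    bot_terms: Counter[str], human_terms: Counter[str], top_n: int = 200
) -> float:
    """|A & B| / |A | B| over the top_n most common terms on each side."""
    bot_top = {term for term, _ in bot_terms.most_common(top_n)}
    human_top = {term for term, _ in human_terms.most_common(top_n)}
    union = bot_top | human_top
    if not union:
        return 0.0
    return len(bot_top & human_top) / len(union)
```

For example, a channel where one bot sent 90 of 100 messages and two humans sent 5 each has an HHI of about 0.815, versus roughly 0.333 for three authors posting evenly.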

Architecture

  • All modules consume CommunityStateVector via on_state_vector_updated event listener
  • All unprompted output goes through bot.output_governor.send() (OutputGovernor pattern)
  • Shared data models in noosphere/shared/models.py (CommunityStateVector, ProcessedMessage)
  • Dependencies on Phase 0/1 infrastructure are coded against the spec interfaces

Dependencies added

  • ruptures -- change-point detection for Attractor Dashboard
  • networkx -- graph analysis for Morphogenetic Field modularity

Tests

85 tests covering all business logic across 7 test files:

  • Resonance mirror: bar rendering, color coding, trend arrows
  • Attractor dashboard: metric extraction, change-point detection, formatting
  • Cordyceps audit: Herfindahl index, vocabulary Jaccard, parasitism scoring
  • Morphogenetic field: user state, coupling computation, graph modularity
  • Cryptobiosis: state machine transitions, wake/sleep behavior
  • Pathology: all 5 detectors with healthy/flagged cases, z-score thresholds
  • Shared models: CommunityStateVector and ProcessedMessage

Verification

  • ruff check . -- All checks passed
  • ruff format --check . -- All files formatted
  • mypy src/ -- Success: no issues found
  • pytest tests/ -- 676 passed (85 new + 591 existing)

Implement six Phase 2 features for the Noosphere Engine:

- Resonance Mirror: Discord embed visualization of community coherence
  with EI-driven color coding, Unicode bars, and trend arrows
- Attractor Dashboard: 9-metric display (coherence, momentum, topic
  entropy/churn, activity entropy, reply depth, modularity) with
  change-point detection via ruptures library
- Cordyceps Audit: Standalone bot influence measurement using Herfindahl
  index for message concentration and vocabulary Jaccard for linguistic
  mimicry detection
- Morphogenetic Field: User interaction graph with cosine coupling
  scores and NetworkX modularity computation
- Cryptobiosis Mode: 3-state machine (active/entering/cryptobiotic)
  with configurable activity thresholds
- Pathology Detection: Echo chamber, bot dominance, server death, flame
  war, and clique formation detection using z-score with hybrid absolute
  thresholds

All modules consume the CommunityStateVector via the discord.py event bus
and follow the OutputGovernor pattern from the architecture spec.

85 tests covering all business logic. Dependencies: ruptures, networkx.

@greptile-apps greptile-apps bot left a comment


Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@user1303836
Owner Author

Code Review: PR #188 -- Noosphere Engine Phase 2 Analytics

Verdict: Changes Requested -- 6 issues to address before merge, 5 positive observations.


Issues Requiring Changes

1. shared/models.py diverges from canonical shared/data_models.py -- shared/models.py

This PR ships its own shared/models.py with int IDs, list[float] embeddings, and bare str classification. Per our alignment agreement:

  • IDs must be str (existing codebase uses String(36) throughout)
  • Embeddings should be np.ndarray | None (not list[float])
  • Classification should use MessageClassification enum (not bare str)
  • The canonical shared models live in shared/data_models.py (owned by dev-foundation)

This file needs to be deleted and all imports redirected to intelstream.noosphere.shared.data_models. This is the same issue we flagged during interface alignment -- it must be resolved before merge.

2. int IDs used throughout all modules -- all cog and business logic files

All cogs use int for guild_id/user_id (e.g., _latest: dict[int, CommunityStateVector] in resonance_mirror/cog.py:25, dict[int, dict[int, int]] in cordyceps_audit/cog.py:25, UserState.user_id: int in morphogenetic_field/field.py:15). These must be str to match the existing codebase convention and the canonical shared models.

Affected files:

  • resonance_mirror/cog.py lines 25-26
  • cordyceps_audit/audit.py line 14 (dict[int, int])
  • cordyceps_audit/cog.py lines 24-26
  • morphogenetic_field/field.py lines 15-16, 31-32, 38, 41, 63, 71, 86
  • morphogenetic_field/cog.py lines 22-23
  • cryptobiosis.py line 85
  • pathology_cog.py lines 27-28
  • All test files using int IDs

3. CryptobiosisState and PathologyType use bare enum.Enum -- cryptobiosis.py:17, pathology.py:11

Per alignment agreement, all enums must use str, enum.Enum for JSON serialization compatibility. Currently:

```python
import enum
class CryptobiosisState(enum.Enum): ...  # Should be (str, enum.Enum)
class PathologyType(enum.Enum): ...      # Should be (str, enum.Enum)
```

4. MorphogeneticFieldCog._on_message passes msg.embedding as-is to update_user -- morphogenetic_field/cog.py:33

update_user expects list[float] (field.py:44), but the canonical ProcessedMessage.embedding is np.ndarray | None. Two problems:

  • No null check: if msg.embedding is None, np.array(None) will produce a 0-d array, breaking all downstream math
  • Type mismatch: np.ndarray gets re-wrapped in np.array() unnecessarily

Fix: guard against None, and accept np.ndarray directly in update_user.

5. top_couplings is O(n^2) with no limit on user count -- morphogenetic_field/field.py:86-94

For large servers with thousands of active users, top_couplings computes all pairwise cosine similarities. With 1000 users, that's ~500k dot products. Consider either:

  • Capping the user pool to the most-active N users before computing
  • Or documenting this as a known limitation for Phase 2

6. Attractor dashboard interaction_modularity in METRIC_FIELDS -- attractor_dashboard/metrics.py:19

The dashboard displays interaction_modularity, but this field on CommunityStateVector defaults to 0.0 and nothing in Phase 1 computes it. The Morphogenetic Field module computes modularity separately in its own state. This means the dashboard will always show 0.000 for Modularity unless something populates csv.interaction_modularity. Either:

  • Document that this field requires the Morphogenetic Field module to be running and wired into the state vector pipeline
  • Or add a math.nan sentinel check and display "N/A" when the field is uncomputed
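The sentinel option amounts to a NaN default plus a formatter that checks for it. A minimal sketch; StateVectorSketch is a hypothetical one-field stand-in for CommunityStateVector, and format_metric is an illustrative name.

```python
import math
from dataclasses import dataclass


@dataclass
class StateVectorSketch:
    # Hypothetical subset of CommunityStateVector; NaN marks "not yet populated".
    interaction_modularity: float = math.nan


def format_metric(value: float, precision: int = 3) -> str:
    """Render a metric for the dashboard, showing N/A for uncomputed (NaN) values."""
    if math.isnan(value):
        return "N/A"
    return f"{value:.{precision}f}"
```

Note that NaN never compares equal to itself, so the check has to use math.isnan rather than value == math.nan.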

Positive Observations

  • Business logic / cog separation: Every module cleanly separates pure computation (analyzer.py, audit.py, metrics.py, field.py, pathology.py) from Discord cog wiring. This makes testing straightforward and is a good pattern.

  • Pathology detection design: Hybrid approach using both z-score (relative to baseline) and absolute thresholds is exactly what Arch-2/Arch-3 specifies. The detect_echo_chamber dual condition (csv.egregore_index > 0.85 AND csv.topic_entropy < 1.0) OR (ei_z > 2.0 AND entropy_z < -2.0) is well-designed.

  • AttractorDashboard bounded history: MAX_HISTORY = 168 (one week of hourly snapshots) with proper truncation at history[-MAX_HISTORY:] prevents unbounded growth. Good -- this is exactly what the MetricsComputer in PR #187 (Add Noosphere Engine foundation, Phase 0 + Phase 1) needs.

  • CryptobiosisMonitor state machine: Clean 3-state machine (active/entering/cryptobiotic) with separate entering-threshold and dormancy-threshold. The wakeup-threshold for sustained activity before returning to active is a thoughtful detail that prevents flapping.

  • Cordyceps VocabularyTracker: Counter.most_common(top_n) caps memory by only tracking the top 200 terms per side. Clean design.
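The hybrid echo-chamber rule praised above can be sketched as follows. The thresholds come straight from the review; the baseline windows, the use of the sample standard deviation, and the function signatures are assumptions rather than the code in pathology.py.

```python
import statistics


def z_score(value: float, baseline: list[float]) -> float:
    """Standard score against a baseline window; 0.0 when it is undefined."""
    if len(baseline) < 2:
        return 0.0
    stdev = statistics.stdev(baseline)  # assumed: sample standard deviation
    if stdev == 0:
        return 0.0
    return (value - statistics.fmean(baseline)) / stdev


def detect_echo_chamber(
    egregore_index: float,
    topic_entropy: float,
    ei_history: list[float],
    entropy_history: list[float],
) -> bool:
    # Absolute condition: fires even without a baseline (e.g. a brand-new server).
    absolute = egregore_index > 0.85 and topic_entropy < 1.0
    # Relative condition: a sharp rise in EI together with a sharp drop in entropy.
    ei_z = z_score(egregore_index, ei_history)
    entropy_z = z_score(topic_entropy, entropy_history)
    relative = ei_z > 2.0 and entropy_z < -2.0
    return absolute or relative
```

The absolute branch catches servers that have always been homogeneous, which a pure z-score would treat as normal; the relative branch catches a sudden shift that never crosses the fixed thresholds.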


Test Coverage Assessment

85 tests across 7 files. Coverage is thorough:

  • Cordyceps: 16 tests (HHI, Jaccard, parasitism, tracker)
  • Morphogenetic: 13 tests (user state, coupling, interactions, modularity)
  • Pathology: 11 tests (all 5 detectors + scan aggregation)
  • Cryptobiosis: 6 tests (all state transitions)
  • Attractor: 10 tests (extraction, change points, formatting)
  • Resonance: 11 tests (color, bar, trend, lines)
  • Shared models: 4 tests

Missing: no cog-level integration tests. That's acceptable for Phase 2 -- cog wiring is mostly boilerplate that delegates to tested business logic.


@greptile

- Rename shared/models.py to shared/data_models.py (canonical location)
- Adopt dev-foundation's CommunityStateVector: sentiment_alignment and
  interaction_modularity default to math.nan instead of 0.0
- Adopt dev-foundation's ProcessedMessage: add is_bot field, use
  MessageClassification enum, embedding as np.ndarray | None
- Add MessageClassification enum to constants.py
- Update cordyceps cog to use msg.is_bot instead of bot user ID check
- Update morphogenetic field cog to handle None embedding
- Update field.py to accept np.ndarray via np.asarray
- Update all imports across 9 source files and 4 test files
- Update test_shared_models.py for new model shapes

Align with dev-foundation's canonical data_models.py, which uses str
for guild_id, user_id, channel_id, and message_id throughout.

- Update data_models.py: all ID fields from int to str
- Update morphogenetic_field: UserState, CouplingResult, MorphogeneticField
  and all method signatures to use str IDs
- Update cordyceps_audit: message_counts dict keys from int to str
- Update all cog dict type annotations from dict[int, ...] to dict[str, ...]
- Convert discord.py int IDs via str() in on_message and command handlers
- Update all test files to use string IDs
- Use enum.StrEnum for PathologyType, CryptobiosisState,
  MessageClassification (fixes ruff UP042)
- Document O(n^2) complexity in top_couplings
- Note interaction_modularity dependency on Phase 1 metrics_computer
@user1303836
Owner Author

Re-review: All 6 Items Addressed

I've re-read every source and test file in this PR against the original review. All items are resolved.

Original Issues -- Status

| # | Issue | Status |
|---|-------|--------|
| 1 | shared/models.py diverges from canonical data_models.py | Fixed -- renamed to shared/data_models.py; matches canonical ProcessedMessage and CommunityStateVector |
| 2 | int Discord IDs throughout | Fixed -- all IDs are str; cogs convert at the boundary via str(interaction.guild.id); tests use "1", "42", etc. |
| 3 | Enums use bare enum.Enum | Fixed -- CryptobiosisState, PathologyType, and MessageClassification all use enum.StrEnum |
| 4 | MorphogeneticFieldCog._on_message missing null check on msg.embedding | Fixed -- if msg.embedding is None: return at line 32 |
| 5 | top_couplings O(n^2) | Acceptable -- comment documents the tradeoff; fine for typical guild sizes |
| 6 | Dashboard interaction_modularity always 0 | Improved -- CommunityStateVector now defaults to math.nan instead of 0.0, clearly signaling "not yet populated" |

Verification Checklist

  • All imports point to intelstream.noosphere.shared.data_models (not the old shared/models)
  • ProcessedMessage has is_bot: bool field; CordycepsAuditCog uses msg.is_bot instead of bot user ID comparison
  • CommunityStateVector has sentiment_alignment, interaction_modularity, fractal_dimension, lyapunov_exponent, gromov_curvature with math.nan defaults
  • Event listeners use correct signatures: on_state_vector_updated receives CommunityStateVector, on_message_processed receives ProcessedMessage
  • AttractorDashboardCog._history bounded by MAX_HISTORY = 168
  • MorphogeneticFieldCog._reply_cache bounded with eviction at 10,000 entries
  • Test data uses str IDs consistently across all 7 test files
  • 85 new tests covering all 6 modules

Minor Note (not blocking)

data_models.py uses TYPE_CHECKING imports for datetime, numpy, and MessageClassification (lines 7-12). This is the same pattern from PR #187 and was flagged there -- not a new issue for this PR.

Verdict: Approve

This PR is clean and ready to merge. Well-structured modules with good test coverage.
