voardwalker-code/MA-Memory-Architect

MA v1.0 — Memory Architect

MA is the standalone AI development agent for NekoCore OS. It builds, researches, writes code, manages projects, runs recurring tasks, and maintains its own memory — all from a browser GUI or terminal CLI.

MA runs as a self-contained Node.js server with zero npm dependencies.


Quick Start

# 1. Start the server
node MA-Server.js

# 2. Open the GUI
# → http://localhost:3850

# 3. Configure your LLM (click ⚙ in the GUI, or use the CLI)
node MA-cli.js

On first launch, MA copies ma-config.example.json → MA-Config/ma-config.json. Edit it or configure via the GUI.


Features

| Feature | Description |
| --- | --- |
| Multi-LLM Support | OpenRouter, Ollama, OpenAI-compatible endpoints |
| Task Engine | 8 task types with planning → execution → summary pipeline |
| Workspace Tools | File read/write/list/delete/move/mkdir — sandboxed to MA-workspace/ |
| Command Execution | Sandboxed shell with configurable whitelist (30+ defaults) |
| Web Search & Fetch | Search the web, fetch & extract page text |
| Memory System | Episodic + semantic memory with keyword search |
| Knowledge Base | 9 reference docs loaded on-demand by topic |
| Project Archives | Persistent project state with open/close/status lifecycle |
| Agent Catalog | 6 specialist agents (code-reviewer, senior-coder, etc.) |
| Blueprint System | Task-type-specific execution guides for plan/execute/summarize phases |
| Slash Commands | 25 commands for health, memory, knowledge, projects, config, pulses, chores, models |
| File Context | Auto-detects file paths in chat and reads them for context |
| Drag & Drop | Drop files into the GUI chat — content sent to MA as context |
| Ollama Integration | Browse local models, pull new ones, auto-fill maxTokens from model info |
| Intelligent Model Routing | Evaluates tasks, selects the best model from the user roster, local-first, learns from results |
| Model Performance Tracking | Records model grades per task type/language, avoids poor performers, promotes good ones |
| Token Budget | Tracks context usage, reserves response budget, shows usage bar (up to 1M tokens) |
| Auto Self-Review | Reads back written files to verify completeness |
| History Compression | Compresses older chat turns to fit long conversations in context |
| Continuation | Graceful stop/continue when hitting token limits |
| Pulse Engine | Timer-driven recurring tasks: health scans, chore execution |
| Chores System | Repeating tasks delegated to agents, graded by MA |
| Health Scanner | 20-file integrity check with critical/warning reporting |
| User Guide | Built-in HTML user guide accessible from the GUI (? button) |

Architecture

MA/
├── MA-Server.js           HTTP server (port 3850)
├── MA-cli.js              Terminal CLI
├── MA-server/             Core modules (15 files)
│   ├── MA-core.js         Bootstrap, state, chat orchestration
│   ├── MA-llm.js          LLM calling (OpenRouter / Ollama / model management)
│   ├── MA-tasks.js        Intent classifier + task runner
│   ├── MA-pulse.js        Pulse engine (timers, health scans, chores)
│   ├── MA-model-router.js Intelligent model selection + performance tracking
│   ├── MA-workspace-tools.js  Tool execution engine
│   ├── MA-cmd-executor.js Sandboxed shell + whitelist
│   ├── MA-web-fetch.js    Web search / fetch
│   ├── MA-memory.js       Memory store (episodic/semantic)
│   ├── MA-project-archive.js  Project lifecycle
│   ├── MA-agents.js       Agent catalog
│   ├── MA-health.js       System health scanner
│   ├── MA-rake.js         RAKE keyword extraction
│   ├── MA-bm25.js         BM25 search scoring
│   └── MA-yake.js         YAKE keyword extraction
├── MA-client/             Browser GUI
│   └── MA-index.html      Single-file SPA
├── MA-Config/             Runtime config (gitignored)
├── MA-entity/             Entity definitions + agent roster
├── MA-knowledge/          Reference documentation (9 docs)
├── MA-blueprints/         Task execution guides
│   ├── core/              5 core blueprints
│   ├── modules/           8 task-type blueprints
│   ├── nekocore/          NekoCore build blueprint (5 parts)
│   └── rem-system/        REM System build blueprint (6 layers)
├── MA-workspace/          Sandboxed project workspace
│   ├── rem-system/        REM System Core (23 modules, 205 tests)
│   └── nekocore/          NekoCore Cognitive Mind (97 modules, 176 tests)
├── MA-logs/               Pulse logs (health scans, chore results)
└── MA-scripts/            Utility scripts

Configuration

LLM Setup

Edit MA-Config/ma-config.json or use the GUI settings panel (⚙):

{
  "type": "openrouter",
  "endpoint": "https://openrouter.ai/api/v1/chat/completions",
  "apiKey": "sk-or-...",
  "model": "anthropic/claude-sonnet-4",
  "maxTokens": 12288
}
| Field | Values | Default |
| --- | --- | --- |
| type | openrouter, ollama | |
| endpoint | API URL | |
| apiKey | Your key (blank for Ollama) | |
| model | Model identifier | |
| maxTokens | 1024–1000000 | 12288 |
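The field constraints above can be sketched as a small validator. This is illustrative only (MA's actual validation code is not shown here); the function name and error messages are assumptions, while the allowed types, the apiKey rule, and the maxTokens range come from the table and prose above.

```javascript
// Illustrative config validator (not MA's actual code): enforces the
// field constraints documented in the table above.
function validateConfig(cfg) {
  const errors = [];
  if (!['openrouter', 'ollama'].includes(cfg.type)) {
    errors.push(`type must be "openrouter" or "ollama", got "${cfg.type}"`);
  }
  if (typeof cfg.endpoint !== 'string' || cfg.endpoint.length === 0) {
    errors.push('endpoint is required');
  }
  if (cfg.type === 'openrouter' && !cfg.apiKey) {
    errors.push('apiKey is required for openrouter');
  }
  const max = cfg.maxTokens ?? 12288; // documented default
  if (!Number.isInteger(max) || max < 1024 || max > 1000000) {
    errors.push('maxTokens must be an integer in 1024–1000000');
  }
  return { ok: errors.length === 0, errors };
}
```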

Ollama (Local)

{
  "type": "ollama",
  "endpoint": "http://localhost:11434",
  "apiKey": "",
  "model": "llama3.1:8b",
  "maxTokens": 8192
}

When Ollama is selected in the GUI, the model field becomes a dropdown populated from your local Ollama instance. Selecting a model auto-fills maxTokens from the model's context length. You can also pull new models directly from the settings panel.
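The maxTokens auto-fill step can be sketched as follows. This is an illustrative helper, not MA's code; it assumes Ollama's /api/show response shape (a model_info object with an architecture-prefixed context_length key such as "llama.context_length"), which may vary by Ollama version.

```javascript
// Illustrative auto-fill of maxTokens from an Ollama /api/show response.
// The "<arch>.context_length" key in model_info is an assumption about
// the response shape; fall back to a sane default if it is absent.
function maxTokensFromShow(show, fallback = 8192) {
  const info = show.model_info ?? {};
  const key = Object.keys(info).find(k => k.endsWith('.context_length'));
  return key ? info[key] : fallback;
}
```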

Model Roster (Intelligent Routing)

MA can route tasks to different models based on job requirements. Configure a roster of available models in MA-Config/model-roster.json or via /models add:

{
  "models": [
    {
      "id": "ollama/llama3.1:8b",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "endpoint": "http://localhost:11434",
      "contextWindow": 131072,
      "tier": "local",
      "strengths": ["python", "javascript"],
      "weaknesses": ["rust"]
    }
  ]
}

MA evaluates each task's complexity, language, and context needs, then selects the best model:

  • Local models first — always prefers free local models when they can handle the job
  • Performance learning — tracks model grades (A–F) per task type and language
  • Strength/weakness matching — avoids models with known weaknesses for the task
  • Tier escalation — only uses premium models for complex/architect-level work
  • Cost efficiency — prefers cheaper models when quality is comparable

Use /models research <name> to have MA research a model's capabilities via the LLM.
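The selection criteria above can be sketched as a scoring pass over the roster. The weights, field names, and grade-key format here are assumptions for illustration, not MA's actual routing algorithm; only the local-first preference, strength/weakness matching, and A–F grades come from the bullets above.

```javascript
// Illustrative routing score (weights and grade-key format are
// assumptions): local-first, strength/weakness matching, learned grades.
const GRADE_POINTS = { A: 4, B: 3, C: 2, D: 1, F: 0 };

function scoreModel(model, task, grades) {
  if (model.weaknesses?.includes(task.language)) return -Infinity; // hard avoid
  let score = 0;
  if (model.tier === 'local') score += 2;            // prefer free local models
  if (model.strengths?.includes(task.language)) score += 2;
  const grade = grades[`${model.id}:${task.type}`];  // hypothetical key format
  if (grade) score += GRADE_POINTS[grade] ?? 0;
  return score;
}

function routeTask(roster, task, grades = {}) {
  return roster
    .map(m => ({ model: m, score: scoreModel(m, task, grades) }))
    .sort((a, b) => b.score - a.score)[0]?.model ?? null;
}
```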

Command Whitelist

MA can only execute whitelisted commands. Manage the whitelist via:

  • GUI: Settings → Command Whitelist tab
  • Slash: /whitelist, /whitelist add, /whitelist remove, /whitelist reset
  • File: MA-Config/cmd-whitelist.json

Default whitelist includes: cargo, rustc, python, node, npm, gcc, go, git, cat, grep, and more. Dangerous binaries (rm, curl, bash, powershell, etc.) are always blocked.
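The check described above can be sketched as follows; this is an illustrative approximation, not MA-cmd-executor.js itself, and the block-list contents are taken from the examples in the sentence above.

```javascript
// Illustrative whitelist check: the command's first token must be
// whitelisted and must not be on the always-blocked list, which wins
// even if the binary was added to the whitelist.
const ALWAYS_BLOCKED = new Set(['rm', 'curl', 'bash', 'powershell']);

function isAllowed(cmd, whitelist) {
  const binary = cmd.trim().split(/\s+/)[0];
  if (ALWAYS_BLOCKED.has(binary)) return false;
  return whitelist.includes(binary);
}
```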


Ports

| Port | Purpose |
| --- | --- |
| 3850 | Default |
| 3851–3860 | Fallback range if default is busy |

MA uses smart port management: if port 3850 is occupied, the server identifies what's running, prompts you, and starts on the next available port. Background launches (e.g. from the process manager) auto-resolve without prompting.


API Reference

| Endpoint | Method | Body / Query | Description |
| --- | --- | --- | --- |
| /api/chat | POST | { message, history?, attachments? } | Chat / run tasks |
| /api/config | GET | | Get config status |
| /api/config | POST | { type, endpoint, apiKey, model, maxTokens } | Set config |
| /api/entity | GET | | Get entity info |
| /api/health | GET | | System health scan |
| /api/commands | GET | | List available slash commands |
| /api/slash | POST | { command } | Execute slash command |
| /api/whitelist | GET | | Get command whitelist |
| /api/whitelist/add | POST | { binary, subcommands? } | Add to whitelist |
| /api/whitelist/remove | POST | { binary } | Remove from whitelist |
| /api/whitelist/reset | POST | {} | Reset to defaults |
| /api/ollama/models | GET | ?endpoint=... | List local Ollama models |
| /api/ollama/show | POST | { endpoint?, model } | Get model info (context length, etc.) |
| /api/ollama/pull | POST | { endpoint?, model } | Pull a model from Ollama |
| /api/pulse/status | GET | | Pulse timer status + config |
| /api/pulse/config | POST | { healthScan?, choreCheck? } | Update pulse config |
| /api/pulse/start | POST | | Start all pulses |
| /api/pulse/stop | POST | | Stop all pulses |
| /api/pulse/logs | GET | ?type=health&lines=50 | Read pulse logs |
| /api/chores | GET | | List all chores |
| /api/chores/add | POST | { name, description?, assignTo?, intervalMs? } | Add a chore |
| /api/chores/update | POST | { id, ...fields } | Update a chore |
| /api/chores/remove | POST | { id } | Remove a chore |
| /api/models/roster | GET | | List model roster |
| /api/models/add | POST | { provider, model, endpoint, ... } | Add model to roster |
| /api/models/update | POST | { id, ...fields } | Update a roster model |
| /api/models/remove | POST | { id } | Remove model from roster |
| /api/models/route | POST | { message, taskType?, agentRole? } | Test model routing for a job |
| /api/models/performance | GET | | All model performance records |
| /api/models/research | POST | { model } | Research model capabilities via LLM |
| /api/memory/search | GET | ?query=...&limit=5 | Search memories |
| /api/memory/store | POST | { type, content, meta } | Store a memory |
| /api/memory/stats | GET | | Memory statistics |
| /api/memory/ingest | POST | { filePath } | Ingest file to memory |

Tools Available to MA

MA uses these tools via [TOOL:name {json}] blocks in LLM output. Params are validated with Zod schemas.

| Tool | Usage | Description |
| --- | --- | --- |
| ws_list | `[TOOL:ws_list {"path":"dir/"}]` | List directory |
| ws_read | `[TOOL:ws_read {"path":"file"}]` | Read file (≤32KB) |
| ws_write | `[TOOL:ws_write {"path":"file"}]content[/TOOL]` | Write file |
| ws_append | `[TOOL:ws_append {"path":"file"}]content[/TOOL]` | Append to file |
| ws_delete | `[TOOL:ws_delete {"path":"file"}]` | Delete file/folder |
| ws_mkdir | `[TOOL:ws_mkdir {"path":"dir/"}]` | Create directory |
| ws_move | `[TOOL:ws_move {"src":"old","dst":"new"}]` | Move/rename file |
| web_search | `[TOOL:web_search {"query":"search"}]` | Web search |
| web_fetch | `[TOOL:web_fetch {"url":"https://..."}]` | Fetch page text |
| cmd_run | `[TOOL:cmd_run {"cmd":"command"}]` | Run shell command |

All file tools are sandboxed to MA-workspace/. Command execution is sandboxed via the whitelist.
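Extracting these blocks from LLM output can be sketched with a single regex pass. This is an illustrative parser, not MA's actual implementation; it handles the flat JSON params shown in the table above (no nested braces) and the optional `content[/TOOL]` body form.

```javascript
// Illustrative parser for [TOOL:name {json}] blocks. The params JSON is
// assumed flat (no nested braces), matching the examples in the table;
// the body between ] and [/TOOL] is optional.
function parseToolBlocks(text) {
  const blocks = [];
  const re = /\[TOOL:(\w+)\s+(\{[^}]*\})\](?:([\s\S]*?)\[\/TOOL\])?/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    blocks.push({ tool: m[1], params: JSON.parse(m[2]), body: m[3] ?? null });
  }
  return blocks;
}
```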


Memory & Session Persistence

How Chats & Memories Are Stored

MA maintains persistent memory across sessions using a flat-file storage system with full text indexing. Every conversation, task, and insight is automatically stored and becomes searchable context for future interactions.

Memory Types

| Type | Storage | Purpose | Persistence |
| --- | --- | --- | --- |
| Episodic | MA-entity/entity_ma/memories/episodic/ | Individual conversation events, tasks completed, interactions | ✓ Full |
| Semantic | MA-entity/entity_ma/memories/semantic/ | Abstracted knowledge, patterns, insights extracted from episodes | ✓ Full |
| Chat History | MA-Config/chat-history.json | Full chat transcript (user messages + MA responses) | ✓ Full |

Each memory record is a folder containing:

  • record.json — Metadata (id, type, topics, importance, decay, timestamps, access history)
  • semantic.txt — Plain-text content (what gets loaded into LLM context)

Storage Location

MA-entity/entity_ma/
├── memories/
│   ├── episodic/           ← Conversation events (~500 bytes–50 KB per memory)
│   │   ├── mem_xxx_yyy/
│   │   │   ├── record.json
│   │   │   └── semantic.txt
│   │   └── mem_aaa_bbb/
│   │       └── ...
│   ├── semantic/           ← Extracted knowledge (auto-compressed)
│   │   └── mem_ccc_ddd/
│   │       └── ...
│   └── index/
│       ├── memoryIndex.json  ← Fast topic-to-memory lookup tables
│       └── topicIndex.json   ← Topic frequency tracking
└── index/
    └── memoryIndex.json     ← Global index for all memories

Capacity & Scaling

MA's memory system has no hard storage limit and scales linearly:

  • Per-entity capacity: Depends on available disk space. Typical usage is ~50–200 MB per 10,000 memories.
  • Indexing: In-memory lookup uses three indexes (topic-to-memory, topic counts, recency). Indexes are cached and persisted to disk.
  • Search time: O(1) topic lookup + O(n) BM25 scoring over matching candidates, capped at configurable limit (default 10 results).
  • Retrieval: Automatic pagination via limit parameter; default returns top 10 most relevant memories per query.

Typical memory volume per user:

  • First week: ~50–100 episodic memories (5–10 per day)
  • First month: ~400–800 memories across episodic + semantic
  • After 1 year: ~5,000–15,000 memories (manageable, searches remain <100ms)

Retrieval & Ranking

When you ask MA something, it automatically searches stored memories using:

  1. Keyword Extraction — Your message is analyzed with RAKE (Rapid Automatic Keyword Extraction) and YAKE (Yet Another Keyword Extractor) to pull out key topics and phrases.

  2. Index Lookup — MA checks the topic index for matching memories in O(1) time.

  3. BM25 Scoring — For each candidate memory, MA ranks by:

    • Relevance (45%): How well the memory's topics match your query
    • Importance (35%): Manually weighted importance score (0.0–1.0) assigned when the memory was stored
    • Recency Decay (20%): Older memories decay naturally over time (1% per day for standard memories, but minimum floor of 0.1 so nothing is ever fully forgotten)
  4. Access tracking — Every retrieved memory gets an updated access count and timestamp. Frequently used memories are candidates for consolidation into semantic knowledge.

Result: Most relevant memories appear first; you get consistent context retrieval without manual tagging.
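The combined score can be sketched directly from the documented numbers: 45% relevance, 35% importance, 20% recency, with 1%/day decay and a 0.1 floor. The function and field names are illustrative assumptions; only the weights and decay parameters come from the list above.

```javascript
// Illustrative ranking score using the documented weights and decay.
function recencyFactor(ageDays) {
  return Math.max(0.1, 1 - 0.01 * ageDays); // 1%/day decay, floor 0.1
}

// relevance in [0,1] (e.g. a normalized BM25 score), mem.importance in [0,1].
function rankScore(mem, relevance, nowDays) {
  const ageDays = nowDays - mem.storedAtDays;
  return 0.45 * relevance + 0.35 * mem.importance + 0.20 * recencyFactor(ageDays);
}
```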

Cross-Session Continuity

MA remembers everything from previous conversations:

  • Start a new chat session → automatic memory search for relevant prior work
  • Mention a past project → MA finds all related memories and loads context
  • Reference your preferences → MA recalls them from semantic memory even months later
  • Return after days/weeks → Full chat history available plus all insights from intervening work

Example: If you ask "How did we solve the async bug last month?" MA will:

  1. Search episodic memories for "async bug" + related terms
  2. Retrieve the original debugging session
  3. Return the solution code + decision rationale
  4. Surface any follow-up notes or related errors

Memory Operations via API

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/memory/search | GET | Search all memories by query, returns ranked results with scores |
| /api/memory/store | POST | Manually store episodic or semantic memory (rarely needed — MA auto-stores) |
| /api/memory/stats | GET | Get counts: episodic + semantic, total memory size, index health |
| /api/memory/ingest | POST | Ingest a file (project archive, codebase, documentation) as chunked semantic memories |

Example stats output:

{
  "episodic": 342,
  "semantic": 47,
  "total": 389,
  "indexHealth": "valid",
  "diskUsage": "12.4 MB",
  "topicsTracked": 156,
  "avgAccessCount": 2.3
}

Memory Best Practices

  1. Search before starting — Use /memory stats or /memory search <topic> to understand what MA already knows about a subject.

  2. Tag important findings — When storing manual memories via the API, set importance: 0.7+ for insights you'll want to prioritize in future searches.

  3. Ingest documentation — Use /memory ingest <file> to load project READMEs, architecture docs, or codebase snapshots. MA chunks them automatically and indexes all topics.

  4. Review consolidation — As episodic memories age, MA automatically compresses related ones into semantic knowledge. Check with /memory stats to see the semantic knowledge growing.

  5. Reset if needed — Use node MA-Reset-All.js to clear all memories (wipes MA-entity/entity_ma/memories/ completely). Chat history is separate and can be reset independently.


Health Check

node -e "const h=require('./MA-server/MA-health');console.log(h.formatReport(h.scan()))"

Reports: file count, critical errors, warnings. Checks JS syntax, JSON validity, HTML tag balance.


Made With MA

Projects built using MA:

  • NekoCore OS — Cognitive WebOS built with Memory Architect

Version

MA v1.0 — Part of NekoCore OS.

License

MIT License - See LICENSE for details.

Part of NekoCore OS.

About

Memory Architect (MA) is a standalone AI development agent that automates your entire development workflow. Written in Node.js with zero npm dependencies, MA builds, researches, writes code, manages projects, and maintains its own persistent memory across sessions.
