
Answer Engineering

Answer Engineering is a Python library for steering language model generation with explicit, local rules.

It is designed for situations where the path to an answer matters, not just the final output — for example when models must follow engineering practices, clinical protocols, safety procedures, or organizational standards.

Instead of retraining the model or post-processing the output, Answer Engineering intervenes during generation and redirects specific steps in real time. The result is behavior that is inspectable, reproducible, and policy-constrained.


Quickstart

The fastest way to understand the value of Answer Engineering is to run the notebook:

The problem

Language models are very good at producing output — sometimes too good.

For example, in vibe-coding tasks, models often start implementing new code immediately because they are trained to generate solutions. Human engineers, however, frequently pause to check whether an existing library or component can be reused instead.

This mismatch leads to unnecessary complexity, duplicated logic, and code that violates team conventions.

What the notebook does

The notebook demonstrates how a single rule can redirect the generation path.

The rule says:

Replace "Implement" with "Consider reusing an existing" and continue from there

This intervention happens locally during generation — at the moment the target phrase appears — not after the answer is finished.
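The mechanism can be illustrated with a toy sketch (this is not the library's actual API): a deterministic stand-in "model" continues from whatever prefix is visible, and the rule rewrites the buffer the moment the trigger phrase appears, before the next step is generated.

```python
# Toy sketch of local interception during generation (not the real API).
# A deterministic stand-in "model" continues from the visible prefix; the
# rule edits the buffer as soon as the trigger appears, so every later
# step conditions on the corrected text.

TRIGGER, EDIT = "Implement", "Consider reusing an existing"

def toy_model(prefix):
    """Stand-in for a model: deterministic continuation of the prefix."""
    if prefix == "":
        return "Implement"
    if prefix.startswith(EDIT):
        return " CSV parsing library." if not prefix.endswith(".") else None
    return " a CSV parser from scratch." if not prefix.endswith(".") else None

def generate(with_rule):
    buffer = ""
    while (step := toy_model(buffer)) is not None:
        buffer += step
        # Local intervention: edit the trajectory the moment the trigger
        # appears, before the next step is generated.
        if with_rule and TRIGGER in buffer:
            buffer = buffer.replace(TRIGGER, EDIT, 1)
    return buffer

print(generate(with_rule=False))  # baseline: implements from scratch
print(generate(with_rule=True))   # guided: reuses an existing library
```

The baseline run implements from scratch, while the guided run diverges onto the reuse path immediately after the edit, mirroring the two notebook runs described below.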

What you will see

The notebook runs the same prompt twice:

  1. Baseline generation

    • The model immediately writes new code from scratch.
  2. Generation with Answer Engineering

    • The rule intercepts the reasoning step.
    • The model pauses to evaluate reuse options.
    • The final answer uses an existing component instead of reimplementing one.

The saved output cell shows the divergence clearly: the baseline implements, while the guided run reuses.

This is the core idea of Answer Engineering:

Change the trajectory, and the outcome changes naturally.


Why this project exists

After running the Quickstart, the key observation becomes hard to ignore: generation can be corrected locally, and the model will continue naturally from the corrected state.

Large language models do not maintain hidden commitments to earlier tokens. They simply continue from the visible text prefix. This means that when a trajectory step is edited — for example, redirecting "Implement" toward "Consider reusing an existing component" — the model proceeds as if that step had always been written that way.

In other words, trajectory correction is not a hack. It is a property of how autoregressive generation works.
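The prefix-only property can be made concrete with a minimal sketch (hypothetical strings, not the library's code): a toy next-token function that sees nothing but the visible prefix necessarily continues an edited prefix exactly as it would have continued an original one.

```python
# Toy "model" whose next step depends only on the visible prefix,
# illustrating why an edited step is continued as if it had always
# been written that way. Hypothetical continuations, for illustration.

def next_token(prefix):
    if prefix.endswith("Implement"):
        return " a new parser from scratch."
    if prefix.endswith("Consider reusing an existing"):
        return " parser library."
    return None  # no continuation rule applies: generation stops

def complete(prefix):
    while (tok := next_token(prefix)) is not None:
        prefix += tok
    return prefix

print(complete("Plan: Implement"))
print(complete("Plan: Consider reusing an existing"))
```

There is no hidden state carrying a commitment to the original wording; the continuation is a function of the prefix alone.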

Once this is understood, allowing generation to proceed without correction in protocol-sensitive settings starts to look like an unnecessary risk. If a step can be repaired immediately, the downstream reasoning — and the final outcome — can change in a predictable way.

This matters beyond individual steps.

Small trajectory corrections accumulate. A corrected assumption leads to a different branch of reasoning. A different branch of reasoning leads to a different decision. And a different decision often determines whether the system behaves safely, efficiently, or correctly.

Answer Engineering exists to make this capability explicit and reliable.

It provides a runtime layer that can:

  • detect when a trajectory enters a risky or non-compliant path
  • apply deterministic local edits at that moment
  • continue generation from the corrected state
  • record what changed and why

The result is not just cleaner intermediate steps, but more dependable final answers — because the reasoning path itself stayed within the required boundaries.

This repository includes both the runtime implementation and a reproducible evaluation pipeline that demonstrates this effect in a controlled benchmark.

For the full research description of the system, see:


Repository structure

This repository contains two related layers.

For the full documentation index, start with docs/README.md.

answer_engineering

The runtime library.

It provides:

  • rule parsing and compilation
  • deterministic trajectory intervention
  • observable runtime behavior
  • telemetry and inspection tools

This is the component you use to integrate Answer Engineering into applications.

ae_paper_reproduction

The research and evaluation layer.

It provides:

  • notebooks used in the paper
  • telemetry aggregation and reporting
  • reproducibility workflows
  • metric generation for the manuscript

You typically do not need this layer to use the runtime, but it is included so that all reported results can be reproduced exactly.


What works today

The current implementation supports rule-guided generation through a narrow public runtime API:

Current code-faithful documentation:

What this project is not

This repository is not currently a general-purpose agent framework, a production LLM serving platform, a broad prompt-engineering toolkit, or a generic safety moderation library.

Its core concern is controlled generation under explicit local rules.

Core runtime model

The canonical public call is GenerationRuntime.generate(request, policy).

The current execution path is described in Runtime model and Runtime entry points. At a high level:

GenerationRuntime.generate(...)
StreamSession.run()
GreedyDecoder.decode()
ExecutionSession.apply_step(...)
PlanRunner
GenerationResult

When rules are present, the runtime monitors the evolving answer, evaluates compiled rule plans, applies deterministic text edits, records telemetry, and continues generation from the edited state. This is not just post-processing: intervention happens during generation.
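The monitor, edit, and record cycle can be sketched as follows. The names (`RulePlan`, `Telemetry`, `apply_plans`) are illustrative stand-ins, not the runtime's real classes; the rule text is the same example used in the rule system section below.

```python
# Hypothetical sketch of the monitor/edit/record cycle described above.
# RulePlan, Telemetry, and apply_plans are illustrative names, not the
# library's actual API.
from dataclasses import dataclass, field

@dataclass
class RulePlan:
    trigger: str
    replacement: str
    fired: bool = False  # enforce once-only application

@dataclass
class Telemetry:
    events: list = field(default_factory=list)

def apply_plans(buffer, plans, telemetry):
    """Evaluate compiled plans against the evolving answer, apply
    deterministic edits, and record what changed and why."""
    for plan in plans:
        if not plan.fired and plan.trigger in buffer:
            buffer = buffer.replace(plan.trigger, plan.replacement, 1)
            plan.fired = True
            telemetry.events.append(
                {"rule": plan.trigger, "edit": plan.replacement}
            )
    return buffer

plans = [RulePlan("sensorineural hearing loss",
                  "sudden sensorineural hearing loss")]
tel = Telemetry()
out = apply_plans("Assess sensorineural hearing loss promptly.", plans, tel)
print(out)              # edited buffer
print(len(tel.events))  # one recorded intervention
```

In the real runtime this evaluation happens inside the decoding loop on the partial answer, so subsequent tokens are generated from the edited state rather than the original one.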

Rule system

Rules are authored in a compact Markdown-based domain-specific language and compiled into executable plans. The exact syntax is documented in Rule language reference, and practical authoring guidance is in Writing rules.

Rule families:

  • Replace — normalize protocol-critical terminology by replacing matched text with approved alternatives.
  • After — insert approved text after an anchor once the relevant concept has appeared.
  • Avoid — detect risky trajectories using prefix/postfix guards and redirect generation through fallback or probed continuations.
  • Force — enforce a required statement within a scope.
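The intended effect of three of the families can be sketched as plain text transformations. These helpers are hypothetical and greatly simplified, not the library's implementation; Avoid is omitted because redirecting a trajectory through fallback or probed continuations requires decoder-level access that a string function cannot show.

```python
# Illustrative semantics of three rule families as text transformations.
# Hypothetical helpers, not the library's implementation.

def replace_rule(text, target, approved):
    """Replace: swap matched text for the approved alternative (once)."""
    return text.replace(target, approved, 1)

def after_rule(text, anchor, insertion):
    """After: insert approved text after the anchor's first occurrence."""
    i = text.find(anchor)
    if i < 0:
        return text
    end = i + len(anchor)
    return text[:end] + insertion + text[end:]

def force_rule(text, required):
    """Force: ensure the required statement appears within the scope."""
    return text if required in text else text + " " + required

print(replace_rule("Start heparin now.", "heparin", "low-dose heparin"))
print(after_rule("Check airway.", "airway", " and breathing"))
print(force_rule("Document findings.", "Escalate if unstable."))
```

In the actual runtime these edits are applied to the partial answer during decoding, with guards, scopes, and modifiers governing when each family may fire.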

A minimal rule looks like this:

## Replace (once): sensorineural hearing loss
With:
- sudden sensorineural hearing loss

For the full grammar, modifiers, guard operators, scope syntax, options, and template expansion rules, see Rule language reference.

Minimal shape

from answer_engineering import GenerationRuntime, GenerationRequest, GenerationPolicy

# MODEL_ID, QUESTION, RULES, and SYSTEM_PROMPT are placeholders for your
# model identifier, user question, rule text, and system prompt.
runtime = GenerationRuntime(MODEL_ID)
answer = runtime.generate(
    GenerationRequest(question=QUESTION),
    GenerationPolicy(
        rules=RULES,
        system_prompt=SYSTEM_PROMPT,
    ),
)

Minimal story

Load a model, ask a question, and apply a ruleset during generation.

The ruleset defines local trajectory edits that are enforced while the answer is being produced. The resulting output reflects those enforced constraints and can be inspected together with the associated runtime telemetry.

Installation

For local development:

python -m pip install -U pip
python -m pip install -e ".[dev,hf]"

Then validate the repository:

./scripts/check

Contribution and validation details are in CONTRIBUTING.md. CI is defined in .github/workflows/ci.yml.

Reproducing the paper

The main reproduction entry point is:

Reproducibility documentation:

The reproduction layer emits structured artifacts such as evaluation summaries, telemetry summaries, paper tables, manifests, and generated TeX metrics. The current artifact flow is described in Paper artifacts.

Repository layout

Documentation map

Start here based on what you need:

Current status

This is an initial public implementation and research artifact. The core runtime is functional, tested, and documented, but the repository is not architecturally finished.

The most accurate current-state summary is in Current codebase reality. In short: the runtime package has a relatively narrow public API and stronger subsystem boundaries, while the reproduction and paper layer remains more shaped by active research workflows.

Backward compatibility is not guaranteed.

Expected future development

Future work is expected in both architecture and capabilities.

Planned architectural directions include clearer runtime/reproduction boundaries, stronger extension seams, improved ownership of scoring and candidate-selection components, and continued convergence between documentation, tests, and implementation. See Current architecture, Codebase reality, and Extension points.

Planned capability directions include causal trajectory repair, alternative trajectory tracking, branch-aware scoring, uncertainty signaling, partial-history editing, and richer multi-rule protocol control. See Trajectory control vision and Functionality roadmap.

The long-term goal is not merely “more rules”. The target is a runtime layer that can identify where a protocol violation appeared, what earlier commitment caused it, which repairs are valid, whether multiple trajectories remain plausible, and how uncertainty should be surfaced.

Development validation

Before committing changes, run:

./scripts/check

This repository uses formatting, linting, type checking, convention checks, tests, and package-build validation. Details are in CONTRIBUTING.md, Golden snapshots, and the convention documents under conventions/.

Citation

If you use this repository as a research artifact, cite the paper:

Victor Lavrenko and Anastasiia Molodnitskaia.
Answer Engineering: Local Trajectory Editing for Protocol-Constrained Decision Making in Large Language Models.
2026.

Paper files:

License

MIT. See LICENSE.

