InferLength

InferLength is an installable Python package for static output-length prediction on local LLM inference workloads. It leverages on-the-fly activations and token entropy to predict output length accurately at negligible cost, as introduced in "Predicting LLM Output Length via Entropy-Guided Representations".

The package supports:

  • training on the ForeLen benchmark, custom prompt data, or merged datasets
  • length prediction from a saved model artifact on disk, without retraining
  • a generic decorator that can attach predictions and local runtime logging to functions

The paper's Progressive Length Prediction method is out of scope for this repository; the implementation here covers static EGTP only.

Installation

  • Python 3.9+
  • PyTorch 2.4+

Install directly from the public repo:

pip install git+https://github.com/vastava/InferLength.git

For local development:

git clone https://github.com/vastava/InferLength.git
cd InferLength
pip install -e .

CLI

Fetch and normalize a ForeLen config:

egtp dataset fetch-forelen \
  --config-name qwen2.5-0.5b-longseq \
  --cache-dir .cache/forelen \
  --output outputs/forelen_qwen_longseq.jsonl

Normalize proprietary data:

egtp dataset normalize \
  --custom-file my_data.csv \
  --prompt-column prompt \
  --response-column response \
  --model-id /models/my-local-llm \
  --output outputs/my_data.jsonl
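The normalize command above expects a CSV with one prompt column and one response column, whose names are passed via --prompt-column and --response-column. A minimal sketch of writing such a file (the rows below are made up for illustration):

```python
import csv

# Minimal CSV in the shape `egtp dataset normalize` consumes: one prompt
# column and one response column per row. Column names must match the
# --prompt-column / --response-column flags; the rows here are illustrative.
rows = [
    {"prompt": "Summarize this report.", "response": "The report covers Q3 revenue."},
    {"prompt": "Translate 'hello' to French.", "response": "bonjour"},
]

with open("my_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "response"])
    writer.writeheader()
    writer.writerows(rows)
```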

Train a predictor from ForeLen:

egtp train \
  --data-mode forelen \
  --forelen-config qwen2.5-0.5b-longseq \
  --cache-dir .cache/forelen \
  --model-id Qwen/Qwen2.5-0.5B-Instruct \
  --artifact-dir outputs/qwen-longseq-artifact

Train a predictor from proprietary data:

egtp train \
  --data-mode custom \
  --model-id /models/my-local-llm \
  --normalized-data outputs/my_data.jsonl \
  --artifact-dir outputs/my-artifact

Predict from a saved artifact:

egtp predict \
  --artifact-dir outputs/my-artifact \
  --prompt "Summarize the attached report in five bullets."

Python API

from egtp import EGTPDataset, EGTPTrainer, TrainConfig, TrainedPredictor, with_length_prediction

dataset = EGTPDataset.load_custom(
    "my_data.csv",
    column_map={"prompt_text": "prompt", "response_text": "response"},
    model_id_or_path="/models/my-local-llm",
)

trainer = EGTPTrainer(device="auto")
predictor = trainer.fit(
    dataset=dataset,
    model_id_or_path="/models/my-local-llm",
    train_config=TrainConfig(epochs=20, patience=3),
)
predictor.save("artifacts/my-local-llm")

loaded = TrainedPredictor.load("artifacts/my-local-llm")

@with_length_prediction(
    predictor=loaded,
    prompt_extractor=lambda args, kwargs: kwargs["prompt"],
    response_extractor=lambda result: result["text"],
    log_path="outputs/runtime_log.jsonl",
)
def run_inference(*, prompt: str):
    return {"text": "hello world"}
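The decorator pattern above can be illustrated with a minimal stand-in. This is not the package's implementation, only a sketch of how a wrapper with prompt_extractor/response_extractor hooks can append one JSON record per call to a JSONL log; the record field names here are assumptions, not the package's actual log schema.

```python
import functools
import json

def logging_wrapper(*, prompt_extractor, response_extractor, log_path):
    """Illustrative stand-in for a with_length_prediction-style decorator:
    extracts the prompt before the call and the response after it, then
    appends one JSON record per invocation to a JSONL file."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            prompt = prompt_extractor(args, kwargs)
            result = fn(*args, **kwargs)
            response = response_extractor(result)
            record = {"prompt": prompt, "response": response}  # field names assumed
            with open(log_path, "a") as f:
                f.write(json.dumps(record) + "\n")
            return result
        return wrapper
    return decorate

@logging_wrapper(
    prompt_extractor=lambda args, kwargs: kwargs["prompt"],
    response_extractor=lambda result: result["text"],
    log_path="runtime_log.jsonl",
)
def run_inference(*, prompt: str):
    return {"text": "hello world"}

run_inference(prompt="Say hello.")
```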

Notes

  • ForeLen is downloaded on demand from Hugging Face and is not bundled into the wheel.
  • Custom rows must provide prompt text and either response text or numeric target length.
  • Target length is defined as output token count under the training model tokenizer.
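To make the target-length definition above concrete: the label for each row is the number of tokens in the response under the training model's tokenizer. A hedged sketch with a toy whitespace tokenizer standing in for the real one (in practice the count comes from the tokenizer of the model passed via --model-id):

```python
# Toy stand-in tokenizer for illustration only: the real target length is
# computed with the training model's tokenizer, not whitespace splitting.
def toy_tokenize(text: str) -> list[str]:
    return text.split()

def target_length(response_text: str) -> int:
    # Target length = output token count of the response.
    return len(toy_tokenize(response_text))

print(target_length("The report covers Q3 revenue."))  # 5 whitespace tokens
```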

Example Usage Notebooks

Citation

@inproceedings{xie2026predicting,
  title={Predicting LLM Output Length via Entropy-Guided Representations},
  author={Xie, Huanyi and Chen, Yubin and Wang, Liangyu and Hu, Lijie and Wang, Di},
  booktitle={International Conference on Learning Representations},
  year={2026}
}
