Feature/prompt compression #345

Merged
Ingvarstep merged 4 commits into main from feature/prompt_compression
Apr 18, 2026

Summary

Introduces prompt compression for uni-encoder GLiNER models (span, token, and relation-extraction variants). For a fixed label set, prompt embeddings are computed once and reused at inference, so the encoder no longer re-processes the <<ENT>>label1<<ENT>>...<<SEP>> prefix on every call. This shortens the input sequence and cuts attention cost, which speeds up inference, especially for short sequences.
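To see why dropping the label prefix helps most on short inputs, here is a rough back-of-the-envelope sketch. It assumes self-attention cost grows with the square of the sequence length and uses made-up token counts (`text_tokens`, `tokens_per_label` are illustrative parameters, not values from this PR). It is an estimate of the trend, not a measurement of GLiNER.

```python
def prompt_length(num_labels: int, tokens_per_label: int = 2) -> int:
    """Tokens in the label prefix: one marker token per label, the label's
    own word pieces, and one trailing separator (all counts are assumptions)."""
    return num_labels * (1 + tokens_per_label) + 1


def attention_cost_ratio(text_tokens: int, num_labels: int, tokens_per_label: int = 2) -> float:
    """Quadratic attention cost with the prefix, relative to the cost without it."""
    full = text_tokens + prompt_length(num_labels, tokens_per_label)
    return (full / text_tokens) ** 2


# Same four labels, short vs. long input text.
short_ratio = attention_cost_ratio(text_tokens=20, num_labels=4)
long_ratio = attention_cost_ratio(text_tokens=400, num_labels=4)
print(f"short: {short_ratio:.2f}x, long: {long_ratio:.2f}x")
```

Under these assumptions the prefix adds 13 tokens, which dominates a 20-token input but is nearly negligible next to a 400-token one.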

What's included

  • BaseGLiNER.compress_prompt_embeddings(texts, labels, rel_labels=None, batch_size=8, ...): runs a forward pass over a calibration corpus, extracts the pre-projection <<ENT>> (and optionally <<REL>>) token representations, averages them per label, and stores the resulting (L, D) matrix, one row per label, as a non-trainable parameter on the underlying model.
  • Precomputed inference path: config.precomputed_prompts_mode=True switches forward / predict_entities to look up the stored embeddings instead of prepending label tokens. State travels through state_dict, so save_pretrained / from_pretrained round-trip the compressed model automatically.
  • Relation-extraction support: pass rel_labels=... to compress <<REL>> prompts for relex models alongside entities.
  • End-to-end distillation (opt-in): distill=True makes compress_prompt_embeddings a one-call pipeline. The pre-compression model first generates pseudo-labels over texts, prompts are compressed, and the compressed model is fine-tuned on those pseudo-labels to recover the small accuracy drop from averaging. Exposes distill_threshold, distill_epochs, distill_lr, distill_batch_size, distill_output_dir, and distill_train_kwargs for control.
  • Benchmark script: benchmarks/eval_compressed_biomed.py compares raw vs. compressed (optionally distilled) GLiNER on knowledgator/biomed_NER, reporting F1, latency, and speedup.
  • Docs: docs/usage.md gains a "⚡ Prompt Compression" section covering the basic flow, relation extraction, and end-to-end distillation.
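The per-label averaging step described above can be sketched in plain Python. This is an illustration only: it uses lists in place of tensors, the function name `average_prompt_embeddings` and the toy vectors are hypothetical, and the actual implementation in this PR may differ.

```python
from collections import defaultdict


def average_prompt_embeddings(observations, labels):
    """Average the vectors collected for each label across calibration batches.

    observations: iterable of (label, vector) pairs, one per label occurrence.
    labels: ordered label list; row i of the result belongs to labels[i].
    Returns an L x D matrix as a list of lists.
    """
    sums = {}
    counts = defaultdict(int)
    for label, vec in observations:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1

    # Every requested label needs at least one calibration vector.
    missing = [label for label in labels if counts[label] == 0]
    if missing:
        raise ValueError(f"no calibration vectors for labels: {missing}")

    return [[s / counts[label] for s in sums[label]] for label in labels]


# Toy data: two labels, D = 2, vectors invented for illustration.
observations = [
    ("person", [1.0, 2.0]),
    ("person", [3.0, 4.0]),
    ("location", [0.0, 10.0]),
]
matrix = average_prompt_embeddings(observations, ["person", "location"])
```

Here `matrix` has one row per label in the order given, which mirrors the (L, D) layout the PR stores on the model.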

Usage

 from gliner import GLiNER

 # calibration_texts: a list of strings representative of the target domain
 model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

 # One-call compression + distillation
 model.compress_prompt_embeddings(
     texts=calibration_texts,
     labels=["person", "organization", "location", "date"],
     batch_size=16,
     distill=True,
     distill_epochs=3,
 )

 model.save_pretrained("./gliner-compressed")

Trade-offs

  • Label set becomes fixed per compressed model — adding labels requires re-running compression.
  • Prompt averaging loses some context sensitivity; enabling distill=True typically recovers it.
  • Applies to uni-encoder variants only (bi-encoder already uses a separate label encoder).
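Because the label set is fixed at compression time, callers may want to check a requested label list before inference. The helper below is a hypothetical sketch, not part of this PR: `labels_needing_recompression` is an invented name, and it assumes the caller keeps the list of labels used for compression alongside the saved model.

```python
def labels_needing_recompression(compressed_labels, requested_labels):
    """Return the requested labels that have no stored prompt embedding."""
    known = set(compressed_labels)
    return [label for label in requested_labels if label not in known]


# Labels the model was compressed with (taken from the usage example).
compressed = ["person", "organization", "location", "date"]

missing = labels_needing_recompression(compressed, ["person", "product"])
if missing:
    print(f"re-run compression to add: {missing}")
```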

@Ingvarstep Ingvarstep merged commit 7d87fd4 into main Apr 18, 2026
@Ingvarstep Ingvarstep deleted the feature/prompt_compression branch April 18, 2026 09:12