Feature/prompt compression by Ingvarstep · Pull Request #345 · urchade/GLiNER

Ingvarstep · 2026-04-15T10:52:53Z

Summary

Introduces prompt compression for uni-encoder GLiNER models (span, token, and relation-extraction variants). For a fixed label set, prompt embeddings are computed once and reused at inference, so the encoder no longer re-processes the <>label1<>...<> label prefix on every call. This shortens the input sequence and cuts attention cost. The speedup is largest for short inputs, where the label prefix makes up a bigger share of the sequence.
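As a rough illustration of why short inputs benefit most, the sketch below estimates the self-attention cost ratio with and without the label prefix. It assumes attention cost grows with the square of sequence length and that compression removes the prefix from the encoder input entirely. The 20-token prefix is a hypothetical value, not a measurement from this PR.

```python
def attention_cost_ratio(text_tokens: int, prefix_tokens: int) -> float:
    """Rough ratio of self-attention cost after vs. before prompt compression.

    Assumes cost scales with sequence_length ** 2 and that compression drops
    the label prefix from the encoder input. Linear-cost terms (feed-forward
    layers, projections) shrink less than quadratically, so measured savings
    will be smaller than this estimate.
    """
    before = (text_tokens + prefix_tokens) ** 2
    after = text_tokens ** 2
    return after / before


# Hypothetical prefix: 4 labels, each a marker token plus a few subword tokens.
PREFIX_TOKENS = 20

for n in (20, 100, 500):
    ratio = attention_cost_ratio(n, PREFIX_TOKENS)
    print(f"text={n:>3} tokens  attention cost after/before = {ratio:.2f}")
```

Under these assumptions a 20-token text keeps a quarter of its original attention cost, while a 500-token text saves under 10%.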

What's included

BaseGLiNER.compress_prompt_embeddings(texts, labels, rel_labels=None, batch_size=8, ...): runs forward passes over a calibration corpus, extracts the pre-projection <> (and optionally <>) token representations, averages them per label, and stores the resulting (L, D) matrix (L = number of labels, D = hidden size) as a non-trainable parameter on the underlying model.
Precomputed inference path: config.precomputed_prompts_mode=True switches forward / predict_entities to look up the stored embeddings instead of prepending label tokens. The embeddings are saved in the state_dict, so save_pretrained / from_pretrained round-trip the compressed model automatically.
Relation-extraction support: pass rel_labels=... to compress <> prompts for relex models alongside the entity prompts.
End-to-end distillation (opt-in): distill=True turns compress_prompt_embeddings into a one-call pipeline. The pre-compression model first generates pseudo-labels over texts, the prompts are then compressed, and the compressed model is fine-tuned on those pseudo-labels to recover the small accuracy drop caused by averaging. distill_threshold, distill_epochs, distill_lr, distill_batch_size, distill_output_dir, and distill_train_kwargs control this step.
Benchmark script: benchmarks/eval_compressed_biomed.py compares uncompressed vs. compressed (optionally distilled) GLiNER on knowledgator/biomed_NER, reporting F1, latency, and speedup.
Docs: docs/usage.md gains a "⚡ Prompt Compression" section covering the basic flow, relation extraction, and end-to-end distillation.
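The compression and lookup steps in the first two items can be sketched in a few lines. This is a minimal NumPy sketch, not the PR's code: it assumes the per-occurrence prompt-token vectors have already been collected from the calibration forward passes into a label-to-vectors mapping, and the names compress_label_embeddings and PrecomputedPrompts are hypothetical. The PR stores the matrix as a non-trainable parameter on the model; a read-only array stands in for that here.

```python
from __future__ import annotations

import numpy as np


def compress_label_embeddings(
    label_vectors: dict[str, list[np.ndarray]], labels: list[str]
) -> np.ndarray:
    """Average each label's prompt-token vectors into one row of an (L, D) matrix.

    label_vectors (hypothetical input format) maps each label to the vectors
    collected for its prompt token across calibration batches. Row order
    follows `labels`.
    """
    rows = [np.stack(label_vectors[label]).mean(axis=0) for label in labels]
    matrix = np.stack(rows)
    matrix.setflags(write=False)  # stand-in for a non-trainable parameter
    return matrix


class PrecomputedPrompts:
    """Hypothetical lookup table used in place of prepending label tokens."""

    def __init__(self, labels: list[str], matrix: np.ndarray) -> None:
        self.index = {label: i for i, label in enumerate(labels)}
        self.matrix = matrix

    def lookup(self, requested: list[str]) -> np.ndarray:
        # Stored rows for `requested`, shape (len(requested), D).
        # A label outside the compressed set raises KeyError.
        return self.matrix[[self.index[label] for label in requested]]


# Toy calibration output: 3 occurrences of "person", 2 of "location", D = 4.
rng = np.random.default_rng(0)
vectors = {
    "person": [rng.normal(size=4) for _ in range(3)],
    "location": [rng.normal(size=4) for _ in range(2)],
}
labels = ["person", "location"]

matrix = compress_label_embeddings(vectors, labels)
prompts = PrecomputedPrompts(labels, matrix)
print(matrix.shape)                        # (2, 4)
print(prompts.lookup(["location"]).shape)  # (1, 4)
```

Averaging is what makes inference cheap (one stored row per label), and it is also the source of the context-sensitivity loss noted under Trade-offs.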

Usage

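```python
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

# Unlabeled, in-domain texts used to average the label prompts and, with
# distill=True, to generate pseudo-labels. In practice, use a larger corpus.
calibration_texts = [
    "Apple opened a new office in Berlin on 3 March 2024.",
    "Angela Merkel met Emmanuel Macron in Paris last week.",
]

# One-call compression + distillation
model.compress_prompt_embeddings(
    texts=calibration_texts,
    labels=["person", "organization", "location", "date"],
    batch_size=16,
    distill=True,
    distill_epochs=3,
)

model.save_pretrained("./gliner-compressed")
```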
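The ordering inside distill=True matters: pseudo-labels come from the model before its prompts are averaged, so the uncompressed model acts as the teacher. The sketch below shows that ordering with hypothetical callables (predict, compress, fine_tune) and an assumed entity-dict format with a score field. It is not the PR's implementation, and treating distill_threshold as a minimum score for keeping a pseudo-label is an assumption.

```python
from __future__ import annotations

from typing import Callable

# Assumed entity format: {"start": int, "end": int, "label": str, "score": float}
Entity = dict


def distill_pipeline(
    texts: list[str],
    labels: list[str],
    predict: Callable[[str, list[str]], list[Entity]],
    compress: Callable[[list[str], list[str]], None],
    fine_tune: Callable[[list[dict]], None],
    threshold: float = 0.5,
) -> list[dict]:
    # 1. Teacher pass: pseudo-label with the *uncompressed* model, keeping
    #    only spans at or above the threshold (assumed role of distill_threshold).
    pseudo_labels = []
    for text in texts:
        kept = [e for e in predict(text, labels) if e["score"] >= threshold]
        pseudo_labels.append({"text": text, "entities": kept})

    # 2. Replace the label prefix with averaged prompt embeddings.
    compress(texts, labels)

    # 3. Student pass: fine-tune the compressed model on the teacher's output.
    fine_tune(pseudo_labels)
    return pseudo_labels


# Stubs that record call order, standing in for real model operations.
calls: list[str] = []


def fake_predict(text: str, labels: list[str]) -> list[Entity]:
    calls.append("predict")
    return [
        {"start": 0, "end": 5, "label": "person", "score": 0.9},
        {"start": 9, "end": 15, "label": "date", "score": 0.2},
    ]


def fake_compress(texts: list[str], labels: list[str]) -> None:
    calls.append("compress")


def fake_fine_tune(examples: list[dict]) -> None:
    calls.append("fine_tune")


result = distill_pipeline(
    ["Alice on Monday"], ["person", "date"],
    fake_predict, fake_compress, fake_fine_tune,
)
print(calls)  # ['predict', 'compress', 'fine_tune']
```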

Trade-offs

Label set becomes fixed per compressed model — adding labels requires re-running compression.
Prompt averaging loses some context sensitivity; enabling distill=True typically recovers the accuracy lost to averaging.
Applies to uni-encoder variants only (bi-encoder models already encode labels with a separate label encoder).
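Because the label set is fixed at compression time, a caller may want to check a requested label set before running inference. A minimal hypothetical helper, assuming the list of compressed labels is available to the caller (where the PR records it is not stated here):

```python
def missing_labels(compressed: list[str], requested: list[str]) -> list[str]:
    """Return requested labels that have no precomputed embedding.

    A non-empty result means compression must be re-run with the extended
    label set. Order of `requested` is preserved and duplicates are dropped.
    """
    available = set(compressed)
    missing: list[str] = []
    for label in requested:
        if label not in available and label not in missing:
            missing.append(label)
    return missing


compressed_labels = ["person", "organization", "location", "date"]
print(missing_labels(compressed_labels, ["person", "date"]))     # []
print(missing_labels(compressed_labels, ["person", "disease"]))  # ['disease']
```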

Ingvarstep added 4 commits April 15, 2026 12:35

implement prompts compression technique

4181dd4

add benchmarking of prompt compression

ac110ee

add direct distilation inside prompt embeddings compression

ab32183

update docs on prompts compression

94894e1

Ingvarstep merged commit 7d87fd4 into main Apr 18, 2026

Ingvarstep deleted the feature/prompt_compression branch April 18, 2026 09:12

Ingvarstep merged 4 commits into main from feature/prompt_compression
