What's Changed
- Back on nix main. by @Narsil in #2979
- hotfix: fix trtllm CI build on release by @Hugoch in #2981
- Add `strftime_now` callable function for `minijinja` chat templates by @alvarobartt in #2983
- impureWithCuda: fix gcc version by @danieldk in #2990
- Improve qwen vl impl by @drbh in #2943
- Using the "lockfile". by @Narsil in #2992
- Triton fix by @sywangyi in #2995
- [Backend] Bump TRTLLM to v.0.17.0 by @mfuntowicz in #2991
- Updating mllama after strftime. by @Narsil in #2993
- Use kernels from the kernel hub by @danieldk in #2988
- fix Qwen VL break in intel platform by @sywangyi in #3002
- Update the flaky mllama test. by @Narsil in #3015
- Preventing single user hugging the server to death by asking by @Narsil in #3016
- Putting back the NCCL forced upgrade. by @Narsil in #2999
- Support sigmoid scoring function in GPTQ-MoE by @danieldk in #3017
- [Backend] Add Llamacpp backend by @angt in #2975
- Use eetq kernel from the hub by @danieldk in #3029
- Update README.md by @celsowm in #3024
- Add `loop_controls` feature to `minijinja` to handle `{% break %}` by @alvarobartt in #2998
- Pinning trufflehog. by @Narsil in #3032
- It's find in some machine. using hf_hub::api::sync::Api to download c… by @Narsil in #3030
- Improve Transformers support by @Cyrilvallez in #2970
- feat: add initial qwen2.5-vl model and test by @drbh in #2971
- Using public external registry (to use external runners for CI). by @Narsil in #3031
- Having less logs in case of failure for checking CI more easily. by @Narsil in #3037
- feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable for telemetry by @Hugoch in #3027
- update ipex and torch to 2.6 for cpu by @sywangyi in #3039
- flashinfer 0.2.0.post1 -> post2 by @danieldk in #3040
- fix qwen2 vl crash in continuous batching by @sywangyi in #3004
- Simplify logs2. by @Narsil in #3045
- Update Gradio ChatInterface configuration in consuming_tgi.md by @angt in #3042
- Improve tool call message processing by @drbh in #3036
- Use `rotary` kernel from the Hub by @danieldk in #3041
- Add Neuron backend by @dacorvo in #3033
- You need to seek apparently. by @Narsil in #3049
- some minor fix by @sywangyi in #3048
- fix: run linters and fix formatting by @drbh in #3057
- Avoid running neuron integration tests twice by @dacorvo in #3054
- Add Gaudi Backend by @baptistecolle in #3055
- Fix two edge cases in `RadixTrie::find` by @danieldk in #3067
- Add property-based testing for `RadixAllocator` by @danieldk in #3068
- feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. by @Hugoch in #3061
- Preparing for release. by @Narsil in #3060
- Fix a tiny typo in `monitoring.md` tutorial by @sadra-barikbin in #3056
- Patch rust release. by @Narsil in #3069
New Contributors
- @angt made their first contribution in #2975
- @celsowm made their first contribution in #3024
- @dacorvo made their first contribution in #3033
- @sadra-barikbin made their first contribution in #3056
Full Changelog: v3.1.0...v3.1.1