
olla-v0.0.19


github-actions released this 09 Oct 23:40
· 445 commits to main since this release
1b9ffd6

This release includes several performance fixes (with a noticeable uplift on ARM) and critical fixes for all architectures, and adds support for SGLang and LemonadeSDK.

All users are encouraged to upgrade to this release.

What's Changed

Full Changelog: v0.0.18...v0.0.19

Changelog

  • 554b2fa GetHealthyEndpointsForModel could leak targets that no longer exist.
  • 4d3e12d adds parser
  • dcf3c52 adds the parser and converter
  • 267dcd2 atomic catalog store
  • 716e57f avoid alloc on response times
  • 203ce4a cleanup
  • 9cb11c9 constants for linting, will add more later
  • c7a7fc9 doc refresh
  • 7aeb09f documentation
  • 6748a50 documentation updates
  • c688fce factory too
  • 6ab4a15 fixed warnings and missed sglang reference
  • 16fa9d5 handler bits
  • ccc8f58 hotpath: reduce allocations
  • e2be222 initial SGLang work
  • 4d3d3e4 initial configuration based on what's available
  • 12a7d14 initial lemonade bits
  • 1d65097 note about format
  • 1b9ffd6 openai
  • c091490 perf: avoid resolvereference call if endpoint URL has no path
  • 985d8eb perf: avoid GC pressure and preallocate
  • fbaece8 perf: reduce string allocations
  • dcb9050 race fix: method instead of module level
  • e012a30 reduce hashing and allocations
  • 21de3da refactor and slightly different way to infer capabilities
  • 3b19336 refactor to use benchmark
  • 77a4b8c refactor test
  • 53e83a6 rune fix
  • 3fcf132 slightly more complex fix to improve allocations in unified memory registry
  • 319f442 update docs and make supported backends a table.
  • 35b6cab update readme
  • 837dc42 use map rather than MapOf (deprecated)
  • f9e8a69 wire up handler too and initial profile