chore(deps): update container image docker.io/localai/localai to v2.16.0 by renovate #22420
This PR contains the following updates:
v2.15.0-cublas-cuda11-ffmpeg-core -> v2.16.0-cublas-cuda11-ffmpeg-core
v2.15.0-cublas-cuda11-core -> v2.16.0-cublas-cuda11-core
v2.15.0-cublas-cuda12-ffmpeg-core -> v2.16.0-cublas-cuda12-ffmpeg-core
v2.15.0-cublas-cuda12-core -> v2.16.0-cublas-cuda12-core
v2.15.0-ffmpeg-core -> v2.16.0-ffmpeg-core
v2.15.0 -> v2.16.0
Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
Release Notes
mudler/LocalAI (docker.io/localai/localai)
v2.16.0
Compare Source
Welcome to LocalAI's latest update!
🎉🎉🎉 woot woot! So excited to share this release, a lot of new features are landing in LocalAI!!!!! 🎉🎉🎉
🌟 Introducing Distributed Llama.cpp Inferencing
It is now possible to distribute the inferencing workload across different workers with llama.cpp models!
This feature has landed with https://github.com/mudler/LocalAI/pull/2324 and is based on the upstream work of @rgerganov in https://github.com/ggerganov/llama.cpp/pull/6829.
How it works: a front-end server (LocalAI) handles the OpenAI API-compatible requests, and workers (llama.cpp) are used to distribute the workload. This makes it possible to run larger models split across different nodes!
How to use it
To start workers to offload the computation you can run:
Alternatively, you can follow the llama.cpp README and build the rpc-server (https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md), which is still compatible with LocalAI.
When starting the LocalAI server, which accepts the API requests, you can set the list of worker addresses with LLAMACPP_GRPC_SERVERS:
LLAMACPP_GRPC_SERVERS="address1:port,address2:port" local-ai run
At this point, the workload hitting the LocalAI server should be distributed across the nodes!
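As a minimal sketch of assembling that variable, the snippet below joins a list of worker addresses into the comma-separated form shown above. The `join_workers` helper and the example addresses are hypothetical and not part of LocalAI; only the `LLAMACPP_GRPC_SERVERS` variable and the `local-ai run` command come from the release notes.

```shell
#!/bin/sh
# Join worker addresses (host:port) into the comma-separated form used by
# LLAMACPP_GRPC_SERVERS. join_workers is a hypothetical helper, not a
# LocalAI command.
join_workers() {
    joined=""
    for addr in "$@"; do
        case "$addr" in
            *:*) ;;  # looks like host:port, keep it
            *) echo "skipping '$addr': expected host:port" >&2; continue ;;
        esac
        # Prepend a comma only when something has already been collected.
        joined="${joined:+$joined,}$addr"
    done
    printf '%s\n' "$joined"
}

# Placeholder worker addresses; replace them with your own.
LLAMACPP_GRPC_SERVERS=$(join_workers 192.168.1.10:50052 192.168.1.11:50052)
export LLAMACPP_GRPC_SERVERS
echo "LLAMACPP_GRPC_SERVERS=$LLAMACPP_GRPC_SERVERS"

# With the variable exported, the server would then be started with:
#   local-ai run
```

Exporting the variable first is equivalent to the inline form used in the release notes; it just keeps the list easier to edit when there are many workers.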
🤖 Peer2Peer llama.cpp
LocalAI is the first free, open-source AI project offering complete, decentralized, peer2peer yet private LLM inferencing on top of the libp2p protocol. There is no "public swarm" to offload the computation to; instead, it empowers you to build your own cluster of local and remote machines to distribute LLM computation.
This feature leverages the ability of llama.cpp to distribute the workload explained just above and features from one of my other projects, https://github.com/mudler/edgevpn.
LocalAI builds on top of the two and lets you create a private peer2peer network between nodes without centralizing connections or manually configuring IP addresses: it unlocks fully decentralized, private, peer-to-peer inferencing. It also works across different NATed networks (it uses DHT and mDNS as discovery mechanisms).
How it works: A pre-shared token can be generated and shared between workers and the server to form a private, decentralized, p2p network.
You can see the feature in action here:
How to use it
Start the server with --p2p. A token is displayed; copy it and press enter.
You can re-use the same token later by restarting the server with --p2ptoken (or P2P_TOKEN).
(Note: you can also supply the token via args.)
At this point, you should see messages in the server logs stating that new workers were found.
Interested in trying it out? As we are still updating the documentation, you can read the full instructions here: https://github.com/mudler/LocalAI/pull/2343
📜 Advanced Function calling support with Mixed JSON Grammars
LocalAI gets better at function calling with mixed grammars!
With this release, LocalAI introduces support for mixed JSON BNF grammars. This lets you specify a grammar that allows the LLM to output both structured JSON and free text.
How to use it:
To enable mixed grammars, set function.mixed_mode = true in the YAML configuration file.
This feature significantly enhances LocalAI's ability to interpret and manipulate JSON data coming from the LLM through a more flexible and powerful grammar system. Users can now combine multiple grammar types within a single JSON structure, allowing for dynamic parsing and validation scenarios.
Grammars can also be turned off entirely, leaving the user to determine how the data from the LLM is parsed so that LocalAI interprets it correctly and remains compliant with the OpenAI REST spec.
For example, to interpret Hermes results, one can annotate regexes in function.json_regex_match to extract the LLM response.
Note that regexes can still be used when mixed grammars are enabled.
This is especially important for models that do not support grammars, such as transformers or OpenVINO models, which can now support function calling as well. As we update the docs, further documentation can be found in the PRs listed in the changelog below.
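To illustrate the idea behind regex-based extraction, here is a minimal sketch that pulls a JSON payload out of free text with `sed`. It is not LocalAI's implementation: the `<tool_call>` wrapper is an assumption about how a Hermes-style model formats its output, `extract_tool_call` and the sample response are hypothetical, and `sed` stands in for whatever regex engine LocalAI applies to function.json_regex_match.

```shell
#!/bin/sh
# Hypothetical raw model output. The <tool_call> wrapper is an assumption
# about the model's output format, used here only for illustration.
response='Sure, calling it now. <tool_call>{"name": "get_weather", "arguments": {"city": "Rome"}}</tool_call>'

# extract_tool_call is a hypothetical helper: it prints whatever sits between
# <tool_call> and </tool_call> on a single line, or nothing if no match.
extract_tool_call() {
    printf '%s\n' "$1" | sed -n 's/.*<tool_call>\(.*\)<\/tool_call>.*/\1/p'
}

extract_tool_call "$response"
```

The captured group is the part that would then be parsed as JSON, while the surrounding free text is ignored. This sketch only handles a single tool call on one line.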
🚀 New Model Additions and Updates
Our model gallery continues to grow with exciting new additions like Aya-35b, Mistral-0.3, Hermes-Theta and updates to existing models ensuring they remain at the cutting edge.
This release brings major enhancements to tool calling support. Besides making the default models in the AIO images more performant, you can now try an enhanced out-of-the-box experience with function calling in the Hermes model family (Hermes-2-Pro-Mistral and Hermes-2-Theta-Llama-3).
Our LocalAI function model!
I have fine-tuned a function-calling model specifically to take full advantage of LocalAI's grammar support. You can already find it in the model gallery and on Hugging Face.
🔄 Single Binary Release: Simplified Deployment and Management
In our continuous effort to streamline the user experience and deployment process, LocalAI v2.16.0 introduces a single binary release. This enhancement, thanks to @sozercan's contributions, consolidates all variants (CUDA and non-CUDA releases) and dependencies into one compact executable file.
This change simplifies installation and updates, reduces compatibility issues, and speeds up setup for new users and existing deployments, as binary releases are now more portable than ever!
🔧 Bug Fixes and Improvements
A host of bug fixes have been implemented to ensure smoother operation and integration. Key fixes include enhancements to the Intel build process, stability adjustments for setuptools in Python backends, and critical updates ensuring the successful build of p2p configurations.
Migrating Python Backends: From Conda to UV
LocalAI has migrated its Python backends from Conda to UV. This transition, thanks to @cryptk's contributions, enhances the efficiency and scalability of our backend operations. Users will experience faster setup times and reduced complexity, streamlining the development process and making it easier to manage dependencies across different environments.
📣 Let's Make Some Noise!
A gigantic THANK YOU to everyone who’s contributed—your feedback, bug squashing, and feature suggestions are what make LocalAI shine. To all our heroes out there supporting other users and sharing their expertise, you’re the real MVPs!
Remember, LocalAI thrives on community support—not big corporate bucks. If you love what we're building, show some love! A shoutout on social (@LocalAI_OSS and @mudler_it on twitter/X), joining our sponsors, or simply starring us on GitHub makes all the difference.
Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy
Thanks a ton, and.. enjoy this release!
What's Changed
Bug fixes 🐛
Exciting New Features 🎉
flash_attention and no_kv_offloading by @mudler in https://github.com/mudler/LocalAI/pull/2310
🧠 Models
📖 Documentation and examples
👒 Dependencies
Other Changes
New Contributors
Full Changelog: mudler/LocalAI@v2.15.0...v2.16.0
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about these updates again.
This PR has been generated by Renovate Bot.