🌐 WebThinker: Empowering Large Reasoning Models with Deep Research Capability
-
Updated
May 30, 2025 - Python
🌐 WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
Ollama负载均衡服务器 | 一款高性能、易配置的开源负载均衡服务器,优化Ollama负载。它能够帮助您提高应用程序的可用性和响应速度,同时确保系统资源的有效利用。
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
A user-friendly Command-line/SDK tool that makes it quickly and easier to deploy open-source LLMs on AWS
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
Breaking long thought processes of o1-like LLMs, such as DeepSeek-R1, QwQ
Add a description, image, and links to the qwq topic page so that developers can more easily learn about it.
To associate your repository with the qwq topic, visit your repo's landing page and select "manage topics."