🛠️ Awesome LMs with Tools

Language models (LMs) are powerful, yet they are mostly suited to text-generation tasks. Tools have substantially improved their performance on tasks that require complex skills.

Based on our recent survey about LM-used tools, "What Are Tools Anyway? A Survey from the Language Model Perspective", we provide a structured list of literature relevant to tool-augmented LMs.

Tool basics ($\S2$)
Tool use paradigm ($\S3$)
Scenarios ($\S4$)
Advanced methods ($\S5$)
Evaluation ($\S6$)

If you find our paper or code useful, please cite the paper:

@article{wang2024what,
  title={What Are Tools Anyway? A Survey from the Language Model Perspective},
  author={Zhiruo Wang and Zhoujun Cheng and Hao Zhu and Daniel Fried and Graham Neubig},
  journal={arXiv preprint arXiv:2403.15452},
  year={2024}
}

$\S2$ Tool Basics

$\S2.1$ What are tools? 🛠️

Definition and discussion of animal-used tools

Animal tool behavior: the use and manufacture of tools by animals Shumaker, Robert W., Kristina R. Walkup, and Benjamin B. Beck. 2011 [Book]
Early discussions on LM-used tools

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]
A survey on augmented LMs, including tool augmentation

Augmented Language Models: a Survey Mialon, Grégoire, et al. 2023.02 [Paper]

$\S2.3$ Tools and "Agents" 🤖

Definition of agents

Artificial Intelligence: A Modern Approach Russell, Stuart J., and Peter Norvig. 2016 [Book]
Survey about agents that perceive and act in the environment

The Rise and Potential of Large Language Model Based Agents: A Survey Xi, Zhiheng, et al. 2023.09 [Preprint]
Survey about the cognitive architectures for language agents

Cognitive Architectures for Language Agents Sumers, Theodore R., et al. 2023.09 [Paper]

$\S3$ The basic tool use paradigm

Early works that set up the commonly used tooling paradigm

Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]
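The paradigm these works set up can be read as a loop: the LM generates text, and when it emits a tool call, generation pauses, the tool is executed, and the tool output is inserted into the context before generation resumes. The sketch below is a minimal illustration under stated assumptions, not any paper's implementation: it assumes a Toolformer-style `[Tool(argument)]` call syntax, and the scripted "LM", the `Lookup` tool, and the `-> result` insertion format are all hypothetical stand-ins.

```python
import re
from typing import Callable, Dict, List

# A tool call at the end of the generated text, e.g. "[Lookup(France)]".
TOOL_CALL = re.compile(r"\[(\w+)\(([^)]*)\)\]$")

# Hypothetical tool registry: tool name -> function from argument string to result string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Lookup": lambda key: {"France": "Paris", "Japan": "Tokyo"}.get(key, "unknown"),
}

def make_scripted_lm(chunks: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LM: returns the next pre-written chunk, or "" when done."""
    remaining = iter(chunks)
    return lambda context: next(remaining, "")

def generate_with_tools(lm: Callable[[str], str], prompt: str, max_steps: int = 10) -> str:
    context = prompt
    for _ in range(max_steps):
        chunk = lm(context)
        if not chunk:  # the LM has stopped generating
            break
        context += chunk
        call = TOOL_CALL.search(context)
        if call:
            # Pause generation, run the tool, and splice its output into the context
            # so that the LM can condition on it when generation resumes.
            name, argument = call.group(1), call.group(2)
            result = TOOLS[name](argument)
            context = context[: call.end() - 1] + f" -> {result}]"
    return context

lm = make_scripted_lm(["The capital of France is [Lookup(France)]", " Paris."])
print(generate_with_tools(lm, "A: "))
# A: The capital of France is [Lookup(France) -> Paris] Paris.
```

A real system would replace `make_scripted_lm` with calls to an actual model and would stop decoding as soon as the call syntax is complete.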

Inference-time prompting

Provide in-context examples of tool use on visual programming problems

Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]
Tool learning via in-context examples on reasoning problems involving text or multi-modal inputs

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models Lu, Pan, et al. 2024 [Paper]
In-context-learning-based tool use for reasoning problems in BigBench and MMLU

ART: Automatic multi-step reasoning and tool-use for large language models Paranjape, Bhargavi, et al. 2023.03 [Preprint]
Providing tool documentation for in-context tool learning

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models Hsieh, Cheng-Yu, et al. 2023.08 [Preprint]

Learning by training

Training on human-annotated examples of (NL input, tool-using solution output) pairs

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Li, Minghao, et al. 2023.12 [Paper]

Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems Kadlčík, Marek, et al. 2023 [Paper]
Training on model-synthesized examples

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Huang, Yue, et al. 2023.10 [Paper]

Making Language Models Better Tool Learners with Execution Feedback Qiao, Shuofei, et al. 2023.05 [Preprint]

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error Wang, Boshi, et al. 2024.03 [Preprint]
Self-training with bootstrapped examples

Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

$\S4$ Scenarios

Knowledge access 📚

Collect data from structured knowledge sources, e.g., databases and knowledge graphs

LaMDA: Language Models for Dialog Applications Thoppilan, Romal, et al. 2022.01 [Paper]

TALM: Tool Augmented Language Models Parisi, Aaron, Yao Zhao, and Noah Fiedel. 2022.05 [Preprint]

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings Hao, Shibo, et al. 2024 [Paper]

ToolQA: A Dataset for LLM Question Answering with External Tools Zhuang, Yuchen, et al. 2024 [Paper]

Middleware for LLMs: Tools are Instrumental for Language Agents in Complex Environments Gu, Yu, et al. 2024 [Paper]

GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information Jin, Qiao, et al. 2024 [Paper]
Search for information on the web

Internet-augmented language models through few-shot prompting for open-domain question answering Lazaridou, Angeliki, et al. 2022.03 [Paper]

Internet-Augmented Dialogue Generation Komeili, Mojtaba, Kurt Shuster, and Jason Weston. 2022 [Paper]
Viewing retrieval models as tools under the retrieval-augmented generation context

Retrieval-based Language Models and Applications Asai, Akari, et al. 2023 [Tutorial]

Augmented Language Models: a Survey Mialon, Grégoire, et al. 2023.02 [Paper]

Computation activities 🔣

Using a calculator for math calculations

Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems Kadlčík, Marek, et al. 2023 [Paper]
Using programs or a Python interpreter to perform more complex operations

PAL: Program-aided Language Models Gao, Luyu, et al. 2023 [Paper]

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks Chen, Wenhu, et al. 2022.11 [Paper]

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback Wang, Xingyao, et al. 2023.09 [Paper]

MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning Das, Debrup, et al. 2024 [Paper]

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving Gou, Zhibin, et al. 2023.09 [Paper]
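In the program-aided setting, the LM writes a program instead of computing the answer itself, and an interpreter executes it. A minimal sketch, assuming a hypothetical convention that the generated program stores its final answer in a variable named `ans`; the program below is a hand-written stand-in for model output, not taken from any of the papers above.

```python
from typing import Any

# A hypothetical program an LM might generate for the word problem:
# "Olivia has $23. She bought five bagels for $3 each. How much money does she have left?"
GENERATED_PROGRAM = """
money_initial = 23
bagels = 5
bagel_cost = 3
money_spent = bagels * bagel_cost
ans = money_initial - money_spent
"""

def run_generated_program(program: str, answer_var: str = "ans") -> Any:
    """Execute LM-written code and read the answer from `answer_var`.

    Emptying __builtins__ only limits what the program can call; it is NOT a
    security sandbox. Untrusted model output should run in an isolated process.
    """
    namespace: dict = {"__builtins__": {}}
    exec(program, namespace)
    return namespace[answer_var]

print(run_generated_program(GENERATED_PROGRAM))  # 8
```

The arithmetic is delegated to the interpreter, so the LM only needs to get the program structure right.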
Tools for more advanced business activities, e.g., in finance, medicine, and education

On the Tool Manipulation Capability of Open-source Large Language Models Xu, Qiantong, et al. 2023.05 [Paper]

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback Wang, Xingyao, et al. 2023.09 [Paper]

AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning Jin, Qiao, et al. 2024.02 [Paper]

Interaction with the world 🌐

Access real-time or real-world information such as weather or location

On the Tool Manipulation Capability of Open-source Large Language Models Xu, Qiantong, et al. 2023.05 [Paper]

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]
Managing personal events such as calendars or emails

Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]
Tools in embodied environments, e.g., the Minecraft world

Voyager: An Open-Ended Embodied Agent with Large Language Models Wang, Guanzhi, et al. 2023.05 [Paper]
Tools interacting with the physical world

ProgPrompt: Generating Situated Robot Task Plans using Large Language Models Singh, Ishika, et al. 2023 [Paper]

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks Shridhar, Mohit, et al. 2020 [Paper]

Autonomous chemical research with large language models Boiko, Daniil A., et al. 2023 [Paper]

Non-textual modalities 🎞️

Tools providing access to information in non-textual modalities

ViperGPT: Visual Inference via Python Execution for Reasoning Surís, Dídac, Sachit Menon, and Carl Vondrick. 2023 [Paper]

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action Yang, Zhengyuan, et al. 2023.03 [Preprint]

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn Gao, Difei, et al. 2023.06 [Preprint]
Tools that can answer questions about data in other modalities

Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

Special-skilled models 🤗

Text-generation models that can perform specific tasks, e.g., question answering, machine translation

Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

ART: Automatic multi-step reasoning and tool-use for large language models Paranjape, Bhargavi, et al. 2023.03 [Preprint]
Integration of available models on Hugging Face, TorchHub, TensorHub, etc.

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face Shen, Yongliang, et al. 2024 [Paper]

Gorilla: Large Language Model Connected with Massive APIs Patil, Shishir G., et al. 2023.05 [Paper]

TaskBench: Benchmarking Large Language Models for Task Automation Shen, Yongliang, et al. 2023.11 [Paper]

$\S5$ Advanced methods

$\S5.1$ Complex tool selection and usage 🧐

Train retrievers that map natural language instructions to tool documentation

DocPrompting: Generating Code by Retrieving the Docs Zhou, Shuyan, et al. 2022.07 [Paper]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]
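The retrieval step maps an instruction to the tool whose documentation best matches it. The works above train retrievers for this; the sketch below swaps in an untrained bag-of-words cosine similarity purely to show the input/output shape of the step. The tool names and documentation strings are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a learned text encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_tools(instruction: str, docs: Dict[str, str], k: int = 1) -> List[str]:
    """Rank tools by similarity between the instruction and each tool's documentation."""
    query = bag_of_words(instruction)
    ranked = sorted(docs, key=lambda name: cosine(query, bag_of_words(docs[name])), reverse=True)
    return ranked[:k]

# Hypothetical tool documentation.
DOCS = {
    "get_weather": "Return the current weather forecast for a given city.",
    "send_email": "Send an email message to a recipient address.",
    "convert_currency": "Convert an amount of money from one currency to another.",
}

print(retrieve_tools("What is the weather in Paris today?", DOCS))  # ['get_weather']
```

A trained dense retriever would replace `bag_of_words` and `cosine` with learned embeddings, which lets it match instructions to documentation that shares no surface words.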
Ask LMs to write hypothetical tool descriptions and search for relevant tools

CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets Yuan, Lifan, et al. 2023.09 [Paper]
Complex tool usage, e.g., parallel calls

Function Calling and Other API Updates Eleti, Atty, et al. 2023.06 [Blog]

An LLM Compiler for Parallel Function Calling Kim, Sehoon, et al. 2023.12 [Paper]
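When an LM emits several tool calls that do not depend on one another, they can be executed concurrently instead of one after another. A minimal sketch, assuming calls arrive as a hypothetical list of `(tool_name, keyword_arguments)` pairs; the two tools are hypothetical and use `time.sleep` to simulate API latency. Dependency-aware scheduling, as in LLMCompiler, is not shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical tools; time.sleep stands in for the network latency of a real API.
def get_weather(city: str) -> str:
    time.sleep(0.2)
    return f"Sunny in {city}"

def get_local_time(city: str) -> str:
    time.sleep(0.2)
    return f"12:00 in {city}"

TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather, "get_local_time": get_local_time}

def run_parallel(calls: List[Tuple[str, Dict[str, Any]]]) -> List[Any]:
    """Execute independent tool calls concurrently; results keep the order of `calls`."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(TOOLS[name], **kwargs) for name, kwargs in calls]
        return [future.result() for future in futures]

start = time.perf_counter()
results = run_parallel([("get_weather", {"city": "Paris"}), ("get_local_time", {"city": "Tokyo"})])
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Threads suit I/O-bound calls such as web APIs; total latency is roughly that of the slowest call rather than the sum of all calls.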

$\S5.2$ Tools in programmatic contexts 👩‍💻

Domain-specific logical forms to query structured data

Semantic Parsing on Freebase from Question-Answer Pairs Berant, Jonathan, et al. 2013 [Paper]

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task Yu, Tao, et al. 2018.09 [Paper]

Break It Down: A Question Understanding Benchmark Wolfson, Tomer, et al. 2020 [Paper]
Domain-specific actions for agentic tasks such as web navigation

Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration Liu, Evan Zheran, et al. 2018.02 [Paper]

WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents Yao, Shunyu, et al. 2022.07 [Paper]

WebArena: A Realistic Web Environment for Building Autonomous Agents Zhou, Shuyan, et al. 2023.07 [Paper]
Using external Python libraries as tools

ToolCoder: Teach Code Generation Models to use API search tools Zhang, Kechi, et al. 2023.05 [Paper]
Using expert-designed functions as tools to answer questions about images

Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

ViperGPT: Visual Inference via Python Execution for Reasoning Surís, Dídac, Sachit Menon, and Carl Vondrick. 2023 [Paper]
Using GPT as a tool to query external Wikipedia knowledge for table-based question answering

Binding Language Models in Symbolic Languages Cheng, Zhoujun, et al. 2022.10 [Paper]
Incorporate a QA API and operation APIs to assist table-based question answering

API-Assisted Code Generation for Question Answering on Varied Table Structures Cao, Yihan, et al. 2023.12 [Paper]

$\S5.3$ Tool creation and reuse 👩‍🔬

Approaches to abstract libraries for domain-specific logical forms from a large corpus

DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning Ellis, Kevin, et al. 2020.06 [Paper]

Leveraging Language to Learn Program Abstractions and Search Heuristics Wong, Catherine, et al. 2021 [Paper]

Top-Down Synthesis for Library Learning Bowers, Matthew, et al. 2023 [Paper]

LILO: Learning Interpretable Libraries by Compressing and Documenting Code Grand, Gabriel, et al. 2023.10 [Paper]
Make and learn skills (JavaScript programs) in the embodied Minecraft world

Voyager: An Open-Ended Embodied Agent with Large Language Models Wang, Guanzhi, et al. 2023.05 [Paper]
Leverage LMs as tool makers on BigBench tasks

Large Language Models as Tool Makers Cai, Tianle, et al. 2023.05 [Preprint]
Create tools for math and table QA tasks by example-wise tool making

CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation Qian, Cheng, et al. 2023.05 [Paper]
Make tools via heuristic-based training and tool deduplication

CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets Yuan, Lifan, et al. 2023.09 [Paper]
Learning tools by refactoring a small number of programs

ReGAL: Refactoring Programs to Discover Generalizable Abstractions Stengel-Eskin, Elias, Archiki Prasad, and Mohit Bansal. 2024.01 [Preprint]
A training-free approach to make tools via execution consistency

🎁 TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks Wang, Zhiruo, Daniel Fried, and Graham Neubig. 2024.01 [Preprint]
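Execution consistency can be illustrated as agreement among executed candidates: sample several candidate programs, run each, and keep the answer that most runs agree on. The sketch below shows only this selection step, under the hypothetical convention that each program stores its answer in `ans`; the candidates are hand-written stand-ins for model samples, and TroVE's full procedure (including growing and trimming the toolbox) is not reproduced here.

```python
from collections import Counter
from typing import List, Optional, Tuple

def execute(program: str) -> Optional[object]:
    """Run a candidate program and return its `ans` variable, or None if it fails."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # NOTE: not sandboxed; illustration only
    except Exception:
        return None
    return namespace.get("ans")

def select_by_agreement(candidates: List[str]) -> Tuple[Optional[object], Optional[str]]:
    """Return the majority answer (answers must be hashable) and the first program producing it."""
    valid = [(p, out) for p in candidates if (out := execute(p)) is not None]
    if not valid:
        return None, None
    answer, _ = Counter(out for _, out in valid).most_common(1)[0]
    program = next(p for p, out in valid if out == answer)
    return answer, program

# Hypothetical sampled solutions; the first one defines a reusable helper (a "tool").
CANDIDATES = [
    "def add_all(xs):\n    return sum(xs)\nans = add_all([3, 4, 5])",
    "ans = 3 + 4 + 5",
    "ans = 3 * 4 * 5",
    "ans = undefined_helper(3, 4, 5)",
]

answer, program = select_by_agreement(CANDIDATES)
print(answer)  # 12
```

Because selection relies only on running the candidates, it needs no extra training; a helper function defined in the selected program is a natural candidate for reuse as a tool.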

$\S6$ Evaluation: Testbeds

$\S6.1.1$ Repurposed existing datasets

Datasets that require reasoning over texts

Measuring Mathematical Problem Solving With the MATH Dataset Hendrycks, Dan, et al. 2021.03 [Paper]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Srivastava, Aarohi, et al. 2022.06 [Paper]
Datasets that require reasoning over structured data, e.g., tables

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning Lu, Pan, et al. 2022.09 [Paper]

Compositional Semantic Parsing on Semi-Structured Tables Pasupat, Panupong, and Percy Liang. 2015 [Paper]

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation Cheng, Zhoujun, et al. 2022 [Paper]
Datasets that require reasoning over other modalities, e.g., images and image pairs

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering Hudson, Drew A., and Christopher D. Manning. 2019.02 [Paper]

A Corpus for Reasoning about Natural Language Grounded in Photographs Suhr, Alane, et al. 2019 [Paper]
Example datasets that require a retriever model (tool) to solve

Natural Questions: A Benchmark for Question Answering Research Kwiatkowski, Tom, et al. 2019 [Paper]

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension Joshi, Mandar, et al. 2017 [Paper]

$\S6.1.2$ Aggregated API benchmarks

Collect APIs from RapidAPI and use models to synthesize examples for evaluation

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]
Collect APIs from PublicAPIs and use models to synthesize examples

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]
Collect APIs from PublicAPIs and manually annotate examples for evaluation

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Li, Minghao, et al. 2023.12 [Paper]
Collect APIs from OpenAI plugin list and use models to synthesize examples

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Huang, Yue, et al. 2023.10 [Paper]
Collect neural model tools from Hugging Face Hub, TorchHub, and TensorHub

Gorilla: Large Language Model Connected with Massive APIs Patil, Shishir G., et al. 2023.05 [Paper]
Collect neural model tools from Hugging Face

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face Shen, Yongliang, et al. 2024 [Paper]
Collect tools from Hugging Face and PublicAPIs

TaskBench: Benchmarking Large Language Models for Task Automation Shen, Yongliang, et al. 2023.11 [Paper]
Collect real-world action sequences on macOS/iPadOS/iOS

ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents Shen, Haiyang, et al. 2024.07 [Paper]
