ToolMaker

LLM Agents Making Agent Tools

Georg Wölflein (1,2), Dyke Ferber (1,3), Daniel Truhn (4), Ognjen Arandjelović (2), Jakob N. Kather (1,3,5)

1 Else Kröner Fresenius Center for Digital Health, TU Dresden, Germany
2 School of Computer Science, University of St Andrews, UK
3 National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Germany
4 Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Germany
5 Department of Medicine I, University Hospital Dresden, Germany

Abstract

Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers, hindering the applicability of LLM agents in domains that demand large numbers of highly specialised tools, such as the life sciences and medicine. Motivated by the growing trend of scientific studies accompanied by public code repositories, we propose ToolMaker, a novel agentic framework that autonomously transforms papers with code into LLM-compatible tools. Given a short task description and a repository URL, ToolMaker autonomously installs required dependencies and generates code to perform the task, using a closed-loop self-correction mechanism to iteratively diagnose and rectify errors. To evaluate our approach, we introduce a benchmark comprising 15 diverse and complex computational tasks spanning both medical and non-medical domains, with over 100 unit tests to objectively assess tool correctness and robustness. ToolMaker correctly implements 80% of the tasks, substantially outperforming current state-of-the-art software engineering agents. ToolMaker is therefore a step towards fully autonomous agent-based scientific workflows.

TLDR: We develop an agentic framework for autonomously creating LLM-compatible tools from papers with associated code repositories.
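The closed-loop self-correction idea from the abstract (generate code, run it, feed the error back, try again) can be illustrated with a minimal sketch. This is not ToolMaker's implementation: `self_correct`, `Attempt`, `toy_generate` and `toy_execute` are hypothetical names, a stub stands in for the LLM, and a bare `exec` stands in for running the candidate inside the environment where the repository's dependencies were installed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str
    error: Optional[str]  # None when the candidate ran without raising


def self_correct(
    task: str,
    generate: Callable[[str, list], str],
    execute: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> tuple:
    """Hypothetical generate-run-diagnose loop (illustrative, not ToolMaker's code).

    Returns (code, history): the first candidate that executes without error,
    or None if every attempt failed, plus the full list of attempts.
    """
    history: list = []
    for _ in range(max_attempts):
        code = generate(task, history)  # e.g. an LLM call that sees earlier errors
        error = execute(code)           # e.g. run in a sandbox; traceback text or None
        history.append(Attempt(code, error))
        if error is None:
            return code, history
    return None, history


def toy_generate(task: str, history: list) -> str:
    # Stub "LLM": the first draft is buggy, the retry (after seeing an error) is fixed.
    return "result = 1 / 0" if not history else "result = 42"


def toy_execute(code: str) -> Optional[str]:
    # Stub executor: runs the snippet and reports the exception, if any, as text.
    try:
        exec(code, {})
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


code, history = self_correct("compute the answer", toy_generate, toy_execute)
print(code)              # result = 42
print(len(history))      # 2
print(history[0].error)  # ZeroDivisionError: division by zero
```

Keeping the full attempt history, rather than only the last error, lets the generator see which fixes were already tried; a real system would also need a sandbox and a check against the task's expected outputs, not just "did it raise".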

Citation

If you would like to cite our work, please use:

@misc{wolflein2025toolmaker,
  author        = {W\"{o}lflein, Georg and Ferber, Dyke and Truhn, Daniel and Arandjelovi\'{c}, Ognjen and Kather, Jakob Nikolas},
  title         = {{LLM} Agents Making Agent Tools},
  year          = {2025},
  eprint        = {2502.11705},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2502.11705}
}

Acknowledgements

We thank Junhao Liang, Michaela Unger, and David Charatan for contributing tasks to the benchmark. We also appreciate Jan Clusmann, Tim Lenz, and Lina Hadji-Kyriacou for their feedback on the manuscript, and are grateful to Annelies Blätterlein for designing the ToolMaker logo.