CodeAct#

Note

LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, an approach that is usually limited by a constrained action space and restricted flexibility. This work proposes using executable Python code to consolidate LLM agents’ actions into a unified action space (CodeAct).

What is CodeAct?#

In Fig. 2, we first introduce a general multi-turn interaction framework for LLM agents’ real-world usage that considers three roles: agent, user, and environment. CodeAct employs Python code to consolidate all actions for agent-environment interaction. In CodeAct, each action emitted to the environment is a piece of Python code, and the agent receives the output of code execution (e.g., results, errors) as its observation.
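The action-observation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `execute_action` is a hypothetical helper, and running untrusted code with `exec` in the host process is assumed here only for brevity (a real environment would sandbox it).

```python
import contextlib
import io


def execute_action(code: str, namespace: dict) -> str:
    """Run one code action and return its observation (hypothetical helper)."""
    buffer = io.StringIO()
    try:
        # Anything the action prints becomes part of the observation.
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Errors are reported back to the agent instead of stopping the loop.
        buffer.write(f"{type(exc).__name__}: {exc}\n")
    return buffer.getvalue()


namespace = {}
ok_observation = execute_action("print(sum(range(5)))", namespace)
err_observation = execute_action("print(1 / 0)", namespace)
```

Here `ok_observation` holds the printed result and `err_observation` holds the error message, so both outcomes reach the agent through the same channel.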

CodeAct Shows the Promise as a Strong Tool Use Framework#

Setup. We re-purpose API-Bank to test LLMs’ API-calling performance. For each evaluation instance, we instruct the LLM to generate one atomic tool call in one of three formats: a Python function call, a JSON object, or a text expression in a pre-defined format.
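To make the three action formats concrete, the sketch below expresses the same atomic tool call in each of them. The tool `add_numbers`, the JSON schema (`tool` / `args`), and the text syntax are all assumptions made for illustration; they are not the formats used in API-Bank.

```python
import json


def add_numbers(a: int, b: int) -> int:
    """Hypothetical tool used only for this illustration."""
    return a + b


TOOLS = {"add_numbers": add_numbers}

# The same atomic call in three (assumed) action formats.
code_action = "add_numbers(a=2, b=3)"
json_action = '{"tool": "add_numbers", "args": {"a": 2, "b": 3}}'
text_action = "add_numbers: a=2, b=3"


def run_code(action: str):
    # A code action is evaluated directly with the tools in scope.
    return eval(action, dict(TOOLS))


def run_json(action: str):
    call = json.loads(action)
    return TOOLS[call["tool"]](**call["args"])


def run_text(action: str):
    # The text format needs its own ad hoc parser.
    name, _, arg_text = action.partition(":")
    kwargs = {}
    for pair in arg_text.split(","):
        key, _, value = pair.partition("=")
        kwargs[key.strip()] = int(value)
    return TOOLS[name.strip()](**kwargs)


results = [run_code(code_action), run_json(json_action), run_text(text_action)]
```

All three routes invoke the same tool with the same arguments; the difference lies in how much format-specific parsing the environment has to supply.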

Results. For most LLMs, CodeAct achieves comparable or better performance even on atomic actions (the simplest tool-use scenario).

CodeAct Gets More Done with Fewer Interactions#

M3ToolEval. To the best of our knowledge, no existing tool-use benchmark contains complex tasks that require composing multiple tools while also supporting the evaluation of different action formats. Hence, we curate a benchmark, M3ToolEval, to fill this gap.

Setup. We allow the model to generate fully functional Python code that enables control and data flow (e.g., if-statement, for-loop). Within each turn, the model can either emit an action or propose an answer to be verified by an exact match with the ground-truth solution. The interaction terminates when a maximum of 10 interaction turns is reached or a correct solution has been submitted.
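The interaction protocol above can be sketched as a small loop. This is a rough illustration under stated assumptions, not the benchmark's harness: `run_episode`, the `("action", ...)` / `("answer", ...)` tuple convention, the tool `price_of`, and the scripted agent that stands in for a model are all hypothetical.

```python
import contextlib
import io

MAX_TURNS = 10  # turn limit taken from the setup described above


def run_episode(agent, ground_truth: str, tools: dict):
    """Return (solved, turns_used) for one task (hypothetical harness)."""
    namespace = dict(tools)
    observation = ""
    for turn in range(1, MAX_TURNS + 1):
        kind, content = agent(observation)
        if kind == "answer":
            # Answers are checked by exact match against the ground truth.
            if content == ground_truth:
                return True, turn
            observation = "Incorrect answer."
            continue
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(content, namespace)
        except Exception as exc:
            buffer.write(f"{type(exc).__name__}: {exc}\n")
        observation = buffer.getvalue()
    return False, MAX_TURNS


def price_of(item: str) -> int:
    """Hypothetical tool."""
    return {"apple": 3, "pear": 4, "kiwi": 5}[item]


def scripted_agent(observation: str):
    """Stand-in for a model: one code action, then an answer."""
    if not observation:
        # A single action composes several tool calls with a loop and a branch.
        code = (
            "total = 0\n"
            "for item in ['apple', 'pear', 'kiwi']:\n"
            "    price = price_of(item)\n"
            "    if price > 3:\n"
            "        total += price\n"
            "print(total)\n"
        )
        return "action", code
    return "answer", observation.strip()


solved, turns = run_episode(scripted_agent, "9", {"price_of": price_of})
```

Because the loop and the conditional live inside one code action, the three tool calls and the filtering take a single turn, and the answer is submitted on the second.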

Results. CodeAct generally achieves a higher task success rate than the JSON and text action formats. Moreover, it requires fewer interaction turns on average.

CodeAct Benefits from Multi-turn Interactions and Existing Software Packages#

Thanks to its extensive knowledge of Python learned during pre-training, the LLM agent can automatically import the correct Python libraries to solve tasks without requiring user-provided tools or demonstrations.
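A small sketch of how these two properties might interact: state persists across turns, so a library imported in one action remains available in the next, and an error observation gives the agent a chance to revise. The two scripted actions below stand in for model output, and the convention of storing the outcome in a variable named `result` is an assumption made for this example.

```python
# Two scripted actions standing in for consecutive model turns.
actions = [
    # Turn 1: imports a standard-library package but uses an undefined name.
    "import statistics\nresult = statistics.median(values)",
    # Turn 2: fixes the mistake; the earlier import is still in scope.
    "values = [3, 1, 2]\nresult = statistics.median(values)",
]

namespace = {}  # shared across turns so imports and variables persist
observations = []
for action in actions:
    try:
        exec(action, namespace)
        observations.append(str(namespace.get("result")))
    except Exception as exc:
        # The error text is what the agent would see before revising.
        observations.append(f"{type(exc).__name__}: {exc}")
```

The first observation is a `NameError` message and the second is the computed median, without any tool having been supplied by the user.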