AI Agents are these days facilitated by libraries and low-code platforms. AI agents are rightly called digital workers. Generative AI acquires agentic nature by tool calling — models’ abilities are extended beyond conversations. Tools are executed (functions) so as to take action on behalf of the user. The complex problem is thus tackled. It is a multi-step problem requiring sound decision-making by interacting with a number of external data sources.
There are two vital expressions of reasoning — reasoning through evaluation and planning and reasoning by employing tools.
In evaluation and planning, an agent splits up the problem by planning iteratively. It assesses progress at each step, adapts its approach to the task.
It can use Chain-of-Thought (CoT) technique, ReAct technique and Prompt Decomposition.
All these techniques enhance model’s ability to break down a problem into smaller components and to address these. It works iteratively. It takes into account results from each stage.
There is reasoning through tool use. Here the agent interacts with its environment. It has to choose a tool. These tools retrieve the data. They execute code and call APIs. The success depends upon proper use of tool calls.
It is not always necessary to combine these two to create solutions. OpenAI’s o1 model is good at reasoning through evaluation and planning. It uses CoT. It has scored high on Codeforces. It generates text-based responses. It currently lacks explicit tool calling abilities, though the responses could suggest tools based on their descriptions.
Many other models are fine-tuned for tool calling. They generate function calls and interact with APIs. The Berkeley Function Calling Leaderboard (BFCL) compares different models on tool calling tasks.
Both types of reasoning are good enough independently. When combined, they create agents that can effectively breakdown complicated tasks and autonomously interact with the environment.
If there are too many demands on a single agent, it can overwhelm it. There could be a collection of many agents and prompts working behind the scene collectively to complete the task.
There should be proper tool selection. Most models use JSON format for tool calling. There are other formats such as YAML or XML. Regardless of the format, the model needs to include appropriate parameters for each tool call. The dataset used should be diverse and cover complexity of multi-step multi-turn function calling.