Function Calling and Tool Use in LLMs
Function calling (also called tool use) is the mechanism that transforms an LLM from a text generator into an agent that can interact with external systems. By declaring a set of available functions, you allow the model to request a function call with specific arguments; your application executes the call and returns the result to the model. This guide covers the full function calling lifecycle, advanced patterns, and common pitfalls.
How Function Calling Works
When you include a tools or functions array in your API request, the model can choose to respond with a tool_call instead of a text message. The tool_call specifies the function name and the arguments as a JSON object conforming to the function's input schema. Your application code executes the function with those arguments and returns the result to the model in a tool_result message. The model then uses the result to generate its final text response. This cycle can repeat several times within a single user turn — each round is a new API request in which the model calls one function, receives the result, then calls another — enabling multi-step reasoning over external data.
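The execution side of this loop can be sketched as follows. The message shapes follow OpenAI-style Chat Completions conventions, and the get_weather tool is a hypothetical stand-in for a real function:

```python
import json

# Hypothetical local tool used for illustration.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 18}

TOOLS = {"get_weather": get_weather}

def execute_tool_calls(tool_calls):
    """Run each call the model requested and build tool_result messages
    to append to the conversation before the next model request."""
    results = []
    for call in tool_calls:
        func = TOOLS[call["name"]]
        args = json.loads(call["arguments"])  # arguments arrive as a JSON string
        output = func(**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results
```

The returned messages are appended to the conversation history and sent back in the next request, at which point the model either calls another tool or produces its final text answer.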
Defining Good Tool Schemas
The quality of your tool schema directly determines how reliably the model uses the tool. Good schemas have: (1) a clear, action-oriented function name (get_user_by_email not user); (2) a description that explains when to use the tool and what it returns (not just what it does); (3) parameter descriptions that include valid value ranges, format expectations, and examples; (4) required vs. optional parameters correctly marked; (5) enum values for parameters with a fixed set of valid values. The model reads your descriptions at every call, so treating them as documentation that a careful developer would write produces significantly better tool selection and argument generation.
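A schema following these guidelines might look like the sketch below. The get_user_by_email tool and its fields are illustrative, written in the JSON Schema shape most providers accept:

```python
# Illustrative tool definition; field names and plan tiers are assumptions.
get_user_by_email_schema = {
    "name": "get_user_by_email",
    "description": (
        "Look up a single user account by email address. Use this when the "
        "conversation supplies an email and account details are needed. "
        "Returns the user's id, display name, and plan tier, or an error "
        "if no user matches."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Full email address, e.g. 'ada@example.com'.",
            },
            "plan_filter": {
                "type": "string",
                "enum": ["free", "pro", "enterprise"],
                "description": "Optional: only match users on this plan tier.",
            },
        },
        "required": ["email"],  # email is mandatory; plan_filter is optional
    },
}
```

Note that the description says when to use the tool and what it returns, the parameter descriptions include a format example, and the fixed set of plan tiers is an enum rather than a free-form string.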
Parallel and Sequential Tool Calls
GPT-4o and Claude both support parallel tool calls — the model can request multiple tool executions in a single response turn. For example, when asked "What is the weather in London and New York?", the model may call get_weather for both cities at once rather than waiting for one result before requesting the other. Parallel calls reduce latency for independent operations. Sequential calls are necessary when one tool's output is needed as input to another; these arrive across successive turns, because the model must see each result before it can construct the next call. When the model returns several tool calls in a single response, your application can generally execute them concurrently, since the model had no results available with which to chain them.
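A minimal sketch of concurrent execution for calls that arrive together in one response, assuming a hypothetical get_weather stub:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def get_weather(city: str) -> dict:  # hypothetical stub tool
    return {"city": city, "temp_c": 18}

TOOLS = {"get_weather": get_weather}

def run_independent_calls(tool_calls):
    """Execute independent tool calls concurrently; results keep request order."""
    def run_one(call):
        args = json.loads(call["arguments"])
        return {
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(TOOLS[call["name"]](**args)),
        }
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_one, tool_calls))
```

Threads suit I/O-bound tools (HTTP calls, database queries); pool.map preserves the order of the requested calls, which keeps tool_call_id matching straightforward.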
Tool Choice and Forcing Tool Use
By default, the model decides whether to call a tool or respond with text. You can override this: setting tool_choice to "required" forces the model to call at least one tool; setting it to a specific function name forces it to call that function. Forcing a specific tool is useful for structured data extraction — declare a single extract_data tool with your desired output schema, force the model to call it, and parse the arguments as your output. This is a reliable pattern for structured output that does not require OpenAI's JSON Schema mode. Never force tool choice for conversational responses where the model should sometimes respond with text.
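The extraction pattern can be sketched as a request payload. The shapes below follow OpenAI-style tools and tool_choice conventions; the extract_contact schema and the example text are illustrative:

```python
extraction_request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Jane Doe, jane@example.com, 34 years old"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "extract_contact",
            "description": "Record the contact details found in the text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "email"],
            },
        },
    }],
    # Force this specific function: the model must respond with a call to
    # extract_contact, whose arguments are parsed as the structured output.
    "tool_choice": {"type": "function", "function": {"name": "extract_contact"}},
}
```

On the response side, json.loads of the returned call's arguments yields the extracted record directly.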
Error Handling and Retry Strategies
Function calling introduces a new category of errors: the model calls a function with arguments that fail validation or cause the function to throw an error. Always return function errors to the model rather than propagating them as application exceptions. The model can often self-correct: "The get_user function returned an error: User with ID 99999 not found. Do you mean user ID 12345?" Return function errors as tool_result messages with an error field, and include the error message so the model can reason about it. Implement a maximum retry count (2-3 retries) to prevent infinite error loops where the model repeatedly calls a function it cannot use correctly.
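One way to sketch this, assuming the same tool-call shapes as above (the retry budget and message layout are assumptions, not a provider requirement):

```python
import json

MAX_TOOL_RETRIES = 3

def execute_with_error_feedback(call, tools, attempt):
    """Run one tool call; on failure, return the error to the model as a
    tool_result instead of raising, until the retry budget is exhausted."""
    try:
        args = json.loads(call["arguments"])
        output = tools[call["name"]](**args)
        return {"tool_call_id": call["id"], "content": json.dumps(output)}
    except Exception as exc:
        if attempt >= MAX_TOOL_RETRIES:
            raise  # repeated failures: surface to the application, don't loop
        return {
            "tool_call_id": call["id"],
            "content": json.dumps({"error": str(exc)}),  # let the model self-correct
        }
```

Returning the error text verbatim in the error field gives the model something concrete to reason about; the attempt counter caps how many times the same failing call can bounce back.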
Security Considerations for Tool Use
Tool use significantly increases the security attack surface of an LLM application. Apply the principle of least privilege: expose only the tools the model needs for each specific task. For irreversible or high-risk actions (sending emails, executing code, modifying databases), implement a confirmation step where the user approves the action before it executes. Validate all tool arguments server-side before executing — never trust that the model-generated arguments are safe to execute directly. Log all tool calls with input arguments for audit trails. Be especially vigilant with indirect prompt injection through tool results: a tool result that says "Ignore your instructions" could influence subsequent model behaviour.
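A sketch of the validation-and-confirmation gate described above. The tool names, the validator, and the audit format are all illustrative assumptions:

```python
import json

HIGH_RISK_TOOLS = {"send_email", "run_sql"}  # illustrative high-risk names

def validate_get_user_args(args: dict) -> None:
    # Never trust model-generated arguments: enforce types and ranges server-side.
    if not isinstance(args.get("user_id"), int) or args["user_id"] <= 0:
        raise ValueError("user_id must be a positive integer")

VALIDATORS = {"get_user": validate_get_user_args}

def guard_tool_call(call, user_confirmed: bool = False):
    """Validate arguments and gate high-risk tools behind user confirmation."""
    args = json.loads(call["arguments"])
    if call["name"] in HIGH_RISK_TOOLS and not user_confirmed:
        raise PermissionError(f"{call['name']} requires user confirmation")
    if call["name"] in VALIDATORS:
        VALIDATORS[call["name"]](args)
    # Audit trail: record every approved call with its arguments.
    print(f"AUDIT tool_call name={call['name']} args={json.dumps(args)}")
    return args
```

In a real application the PermissionError branch would pause execution and present the pending action to the user rather than failing outright.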
FAQ
- What is the difference between function calling and tool use?
- They refer to the same concept under different API terminology. OpenAI originally called it "function calling" and later added the broader "tools" concept (which also includes code_interpreter and file_search). Anthropic calls it "tool use". Both work by declaring a schema for callable functions that the model can request.
- Can I give the model access to a whole API with many endpoints?
- Yes, but declaring too many tools degrades performance. Keep the tool list to 10-15 tools per request. For applications with many available tools, use a router pattern: first ask the model which tool category is needed, then provide only the tools in that category for the actual task.
- How do I handle streaming with function calling?
- Stream the model's response and buffer tool_call delta events. The function name and arguments are streamed incrementally. Once the stream is complete, execute the tool calls, then start a new streaming request with the tool results. You cannot execute tool calls until the function name and all arguments are fully streamed.
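The buffering step from the streaming answer above can be sketched as follows. The delta shape (an index plus name and argument fragments) loosely follows OpenAI's streaming format and is an assumption here:

```python
def accumulate_tool_call_deltas(deltas):
    """Merge streamed tool_call deltas (index plus name/argument fragments)
    into complete calls once the stream finishes."""
    calls = {}
    for delta in deltas:
        entry = calls.setdefault(
            delta["index"], {"id": None, "name": "", "arguments": ""}
        )
        if delta.get("id"):
            entry["id"] = delta["id"]
        entry["name"] += delta.get("name", "")
        entry["arguments"] += delta.get("arguments", "")
    return [calls[i] for i in sorted(calls)]
```

Only after the stream ends are the accumulated arguments valid JSON, which is why execution must wait for the full buffer.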
Related Guides
- Prompt Engineering Basics: A Practical Guide
- Prompt Injection: Risks and Prevention