Function-calling is a powerful feature in large language models (LLMs), enabling interactions with external systems, real-time data retrieval, and the automation of complex workflows. OpenAI, for example, allows developers to register functions directly with their assistant, eliminating the need to include the functions in the prompt itself. However, there’s an often-overlooked drawback: hidden costs. In fact, OpenAI’s system includes all registered functions in the prompt during processing, creating significant token overhead with each request — even when functions are not actually used.
Measuring Hidden Costs of Registered Functions
To understand the impact, let’s start with a baseline prompt that doesn’t require any function calls:
Capital of Canada
This simple prompt, with no functions registered, uses 65 input tokens.
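For reference, here is a minimal sketch of how such a measurement can be taken with the OpenAI Python SDK, reading prompt_tokens from the usage object returned by the Chat Completions API (the model name is only illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Baseline: a plain prompt with no functions registered
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": "Capital of Canada"}],
)

print(response.usage.prompt_tokens)  # input tokens billed for this request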
Once we register functions — even without calling them — the token count increases significantly due to the automatic attachment of the full function set. For our measurements, let’s use OpenAI’s sample get_weather function:
{
  "name": "get_weather",
  "description": "Determine weather in my location",
  "strict": true,
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g., San Francisco, CA"
      },
      "unit": {
        "type": "string",
        "enum": ["c", "f"]
      }
    },
    "additionalProperties": false,
    "required": ["location", "unit"]
  }
}
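Registering this function amounts to passing it through the tools parameter of the same request. A sketch of the measurement with the function attached, reusing the client from the earlier snippet (get_weather_schema is simply the JSON definition above as a Python dict):

# The JSON definition shown above, as a Python dict
get_weather_schema = {
    "name": "get_weather",
    "description": "Determine weather in my location",
    "strict": True,
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state, e.g., San Francisco, CA"},
            "unit": {"type": "string", "enum": ["c", "f"]},
        },
        "additionalProperties": False,
        "required": ["location", "unit"],
    },
}

# Same prompt as before, now with the function registered
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Capital of Canada"}],
    tools=[{"type": "function", "function": get_weather_schema}],
)

print(response.usage.prompt_tokens)  # higher than the 65-token baseline, even though no call is made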
Token Overhead with Registered Functions
Here are some measurements using the simple prompt against varying numbers of registered functions:
This overhead accrues even when the functions are never invoked, inflating the token count of every request simply because the functions are registered. Note also that a basic function like get_weather has a minimal definition, so the cost is manageable with a single function. With a larger set, such as 50 functions, more detailed descriptions are needed so the LLM can accurately distinguish between them, and many real-world functions are inherently more complex than a simple weather lookup. The overhead values in the table therefore represent the most optimistic estimates; actual usage could be substantially larger.
Additional Costs When Functions Are Called
When a function is actually called — such as with the prompt “Weather in Taipei” — the token usage increases further:
In this case, there are in fact two submissions to the LLM: the first carries the original prompt, and the second includes the results of the invoked function. It appears that even in the second submission, the entire function set is attached to the prompt.
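A sketch of that two-submission flow for the "Weather in Taipei" prompt, again assuming the client and get_weather_schema from the earlier snippets; the local get_weather result is a placeholder:

import json

messages = [{"role": "user", "content": "Weather in Taipei"}]
tools = [{"type": "function", "function": get_weather_schema}]

# First submission: the model responds with a tool call instead of an answer
first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
tool_call = first.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Execute the function locally (placeholder result)
result = {"location": args["location"], "temperature": 28, "unit": args["unit"]}

# Second submission: the tool result goes back, and the full function set rides along again
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)})
second = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

print(first.usage.prompt_tokens, second.usage.prompt_tokens)  # both include the function overhead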
Contextual Function-Calling
To address these hidden costs, a more efficient approach could be Contextual Function-Calling. Instead of registering a static list of functions, Contextual Function-Calling dynamically selects functions based on the specific prompt and attaches only those that are relevant. This approach would break function-calling into two phases:
1. Function Identification Phase: The LLM evaluates the prompt to identify relevant functions, filtering out any that don’t apply. If none are needed, the LLM proceeds without adding functions to the prompt.
2. Parameter Mapping Phase: Once relevant functions are selected, the LLM extracts prompt entities and maps them to function parameters.
This approach could greatly reduce token usage by including only the functions that are contextually necessary. For example, with Contextual Function-Calling, a prompt like “Capital of Canada” would avoid unnecessary functions, while a prompt like “Weather in Taipei” would attach only the get_weather function. By focusing on context to determine function relevance, this method could significantly cut costs, particularly in complex cases with extensive function libraries.
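A minimal sketch of what the two phases might look like, assuming a local catalog of function schemas and reusing the client and get_weather_schema from above; the catalog, routing prompt, and model choices are all hypothetical:

import json

# Hypothetical catalog: function name -> full JSON schema
FUNCTION_CATALOG = {"get_weather": get_weather_schema}

def identify_functions(prompt: str) -> list[str]:
    # Phase 1: send only names and short descriptions, ask for the relevant subset
    summary = "\n".join(f"- {name}: {schema['description']}" for name, schema in FUNCTION_CATALOG.items())
    selection = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheaper model can handle the routing step
        messages=[
            {"role": "system", "content": "Return a JSON array of function names relevant to the request, or [] if none."},
            {"role": "user", "content": f"Functions:\n{summary}\n\nRequest: {prompt}"},
        ],
    )
    return json.loads(selection.choices[0].message.content)

def answer(prompt: str):
    # Phase 2: attach only the selected functions so the model can map prompt entities to parameters
    relevant = identify_functions(prompt)
    kwargs = {"model": "gpt-4o", "messages": [{"role": "user", "content": prompt}]}
    if relevant:
        kwargs["tools"] = [{"type": "function", "function": FUNCTION_CATALOG[name]} for name in relevant]
    return client.chat.completions.create(**kwargs)

# "Capital of Canada" attaches no functions; "Weather in Taipei" attaches only get_weather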
Contextual Function-Calling introduces a dynamic, context-aware approach to function selection, potentially transforming function-calling into a far more cost-effective and scalable solution in LLM applications.