Using cached prompts can save up to 90% on API input costs, the company said.
Anthropic announced Wednesday that it is introducing prompt caching in the application programming interface (API) for its Claude family of generative AI models, allowing developers to save frequently used prompts between API calls.
Prompt caching allows customers to provide Claude with long prompts that can then be referred to in subsequent requests without having to send the prompt again. “With prompt caching, customers can provide Claude with more background knowledge and example outputs—all while reducing costs by up to 90% and latency by up to 85% for long prompts,” the company said in its announcement.
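In practice, a developer flags the long, reusable portion of a prompt so the server can store and reuse it across requests. The sketch below assumes Anthropic's Python SDK and the prompt-caching beta header documented at launch; the model ID, placeholder document, and user question are illustrative rather than prescriptive.

```python
import anthropic

# Illustrative stand-in for the long context being cached; in the beta,
# prompts below a per-model minimum token count are not cached.
LONG_BACKGROUND_DOCUMENT = "<several thousand tokens of reference material>"

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Opt in to the prompt-caching public beta.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": LONG_BACKGROUND_DOCUMENT,
            # Marks the end of the prefix to cache; later requests that
            # repeat this exact prefix read it from the cache instead of
            # reprocessing it at the full input-token price.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points of the document."}
    ],
)
print(response.content[0].text)
```

The savings come from the pricing asymmetry: per Anthropic's launch pricing, writing a prefix to the cache costs somewhat more than a normal input pass, but every subsequent read of that prefix is billed at a small fraction of the base input rate.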
The feature is now available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus, its largest model, coming “soon.”
A 2023 paper from researchers at Yale University and Google explained that, by saving prompts on the inference server, …