Add LLM caching to your app
Return results in milliseconds and don't waste calls on identical requests
Round-trip times to LLM providers can be lengthy, often 2-3 seconds per request. With caching, you can return results for identical queries in milliseconds, and you skip paying the LLM provider for a newly generated response.
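As a rough sketch of the idea (not Velvet's implementation), the TypeScript below fronts OpenAI's chat completions endpoint with an in-memory cache. The `cacheKey` helper, the one-hour TTL, and the `Map` store are assumptions for illustration; a production cache would usually live in Redis or Postgres and would include sampling parameters like `temperature` in the key.

```typescript
import { createHash } from "node:crypto";

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// In-memory store: request hash -> { body, expiresAt }. An assumption for
// this sketch; swap in Redis or Postgres for anything beyond one process.
const cache = new Map<string, { body: unknown; expiresAt: number }>();
const TTL_MS = 60 * 60 * 1000; // 1 hour, arbitrary for the example

// Deterministic key over everything that affects the completion.
function cacheKey(model: string, messages: ChatMessage[]): string {
  return createHash("sha256")
    .update(JSON.stringify({ model, messages }))
    .digest("hex");
}

async function cachedChatCompletion(model: string, messages: ChatMessage[]) {
  const key = cacheKey(model, messages);
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.body; // identical request: milliseconds, no provider cost
  }

  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  });
  const body = await res.json();

  // Only cache successful responses so errors aren't replayed.
  if (res.ok) cache.set(key, { body, expiresAt: Date.now() + TTL_MS });
  return body;
}
```

Because the key hashes the full request payload, any change to the model or messages misses the cache, so only truly identical requests are served from it.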
What we’ll cover in this article
- Use a cache layer to reduce latency and costs
- How to implement caching
- Reduce latency by an average of 87% (a quick way to check this yourself is sketched after this list)
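To sanity-check latency numbers like the 87% figure above, you can time an uncached call against a cached repeat of the same request. This snippet builds on the hypothetical `cachedChatCompletion` sketch above; the model name and prompt are placeholders.

```typescript
// Rough latency comparison: the first call hits the API, the second
// is served from the cache.
async function measure() {
  const messages = [{ role: "user" as const, content: "What is LLM caching?" }];

  const t0 = Date.now();
  await cachedChatCompletion("gpt-4o-mini", messages);
  console.log(`uncached: ${Date.now() - t0} ms`); // typically seconds

  const t1 = Date.now();
  await cachedChatCompletion("gpt-4o-mini", messages);
  console.log(`cached:   ${Date.now() - t1} ms`); // typically a few ms
}

measure().catch(console.error);
```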