Add LLM caching to your app

Return results in milliseconds and don't waste calls on identical requests

Round-trip times to LLM providers can be lengthy, often 2-3 seconds or more per request. With caching, you can return results for identical queries in milliseconds, and you won't pay the LLM provider for a regenerated response.
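To make the pattern concrete, here is a minimal sketch of a cache layer in TypeScript. It is illustrative rather than any specific product's implementation: it assumes the official OpenAI Node SDK and uses an in-memory Map for clarity, where a production layer would more likely sit in Redis or Postgres with a TTL and eviction policy. The idea is to hash every parameter that affects the response into a deterministic key, and only call the provider on a cache miss.

```ts
import crypto from "node:crypto";
import OpenAI from "openai"; // assumes the official `openai` Node SDK (v4)

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// In-memory cache for illustration; swap in Redis/Postgres for production.
const cache = new Map<string, string>();

// Build a deterministic key from everything that affects the response.
function cacheKey(
  model: string,
  messages: OpenAI.Chat.ChatCompletionMessageParam[]
): string {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify({ model, messages }))
    .digest("hex");
}

async function cachedCompletion(
  model: string,
  messages: OpenAI.Chat.ChatCompletionMessageParam[]
): Promise<string> {
  const key = cacheKey(model, messages);

  // Cache hit: return in milliseconds, with no call to the provider.
  const hit = cache.get(key);
  if (hit !== undefined) return hit;

  // Cache miss: pay the round trip once, then store the result.
  const completion = await openai.chat.completions.create({ model, messages });
  const text = completion.choices[0].message.content ?? "";
  cache.set(key, text);
  return text;
}
```

Calling cachedCompletion twice with byte-identical arguments means the second call returns straight from the Map without a network round trip, which is where both the latency and cost savings come from.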

What we’ll cover in this article

  • Why a cache layer reduces latency and costs

  • How to implement caching

  • How caching reduces latency by an average of 87%
