Add LLM caching to your app

Return results in milliseconds and don't waste calls on identical requests

Round-trip times to LLM providers can be lengthy, often 2-3 seconds or more per request. With caching, you can return results for identical queries in milliseconds, and you won't pay the LLM provider for a regenerated response.
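To make the pattern concrete, here is a minimal sketch of a cache layer in TypeScript. It is illustrative rather than any specific product's implementation: it assumes the official OpenAI Node SDK and uses an in-memory Map for clarity, where a production layer would more likely sit in Redis or Postgres with a TTL and eviction policy. The idea is to hash every parameter that affects the response into a deterministic key, and only call the provider on a cache miss.

```ts
import crypto from "node:crypto";
import OpenAI from "openai"; // assumes the official `openai` Node SDK (v4)

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// In-memory cache for illustration; swap in Redis/Postgres for production.
const cache = new Map<string, string>();

// Build a deterministic key from everything that affects the response.
function cacheKey(
  model: string,
  messages: OpenAI.Chat.ChatCompletionMessageParam[]
): string {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify({ model, messages }))
    .digest("hex");
}

async function cachedCompletion(
  model: string,
  messages: OpenAI.Chat.ChatCompletionMessageParam[]
): Promise<string> {
  const key = cacheKey(model, messages);

  // Cache hit: return in milliseconds, with no call to the provider.
  const hit = cache.get(key);
  if (hit !== undefined) return hit;

  // Cache miss: pay the round trip once, then store the result.
  const completion = await openai.chat.completions.create({ model, messages });
  const text = completion.choices[0].message.content ?? "";
  cache.set(key, text);
  return text;
}
```

Calling cachedCompletion twice with byte-identical arguments means the second call returns straight from the Map without a network round trip, which is where both the latency and cost savings come from.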

What we’ll cover in this article

  • Why a cache layer reduces latency and costs

  • How to implement caching

  • How caching reduces latency by an average of 87%
