Cache
LLM Foundry caches all responses by default.
If a request matches a previous request on all of the attributes below, the previous (cached) response is returned.
- URL: e.g. /open/v1/chat/completions
- HTTP Method: e.g. POST
- Body: e.g. {"messages": [{"role": "user", "content": "Hello"}]}
- HTTP Headers: a request with a different set of HTTP headers is treated as a new request and is not served from the cache. However, these headers are ignored when comparing: Authorization, Cache-Control, Cookie, Host, Origin, User-Agent, Set-Cookie, Connection, Accept, Accept-Encoding, Accept-Language, Content-Length, Vary, Sec-*, X-* (that is, any header whose name starts with Sec- or X-).
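To make the matching rules concrete, here is a minimal Python sketch of how a cache key could be derived from these attributes. It illustrates the rules under stated assumptions and is not LLM Foundry's actual implementation: the function names cache_key and is_ignored, the SHA-256 hashing and the JSON encoding are all hypothetical. Header names are lower-cased because HTTP header names are case-insensitive.

```python
import hashlib
import json

# Headers listed in the documentation as ignored when comparing requests
# (stored lower-case, since HTTP header names are case-insensitive).
IGNORED_HEADERS = {
    "authorization", "cache-control", "cookie", "host", "origin",
    "user-agent", "set-cookie", "connection", "accept",
    "accept-encoding", "accept-language", "content-length", "vary",
}
# Sec-* and X-* match any header name with these prefixes.
IGNORED_PREFIXES = ("sec-", "x-")


def is_ignored(name: str) -> bool:
    """Return True if this header should not affect the (hypothetical) cache key."""
    lower = name.lower()
    return lower in IGNORED_HEADERS or lower.startswith(IGNORED_PREFIXES)


def cache_key(method: str, url: str, body: bytes, headers: dict[str, str]) -> str:
    """Hypothetical cache key built from the URL, method, body and non-ignored headers."""
    relevant = sorted(
        (name.lower(), value) for name, value in headers.items() if not is_ignored(name)
    )
    # Hash the body separately so arbitrary bytes can be folded into the JSON.
    parts = [method, url, hashlib.sha256(body).hexdigest(), relevant]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


body = b'{"messages": [{"role": "user", "content": "Hello"}]}'
url = "/open/v1/chat/completions"
# Same URL, method and body; differences only in ignored headers and header-name case.
k1 = cache_key("POST", url, body, {"Authorization": "Bearer a", "Content-Type": "application/json"})
k2 = cache_key("POST", url, body, {"authorization": "Bearer b", "content-type": "application/json", "X-Request-Id": "42"})
print(k1 == k2)  # True
# A non-ignored header with a different value produces a different key.
k3 = cache_key("POST", url, body, {"Content-Type": "text/plain"})
print(k1 == k3)  # False
```

Hashing the sorted, lower-cased header pairs means header order and name casing never affect the key, while any change to the URL, method, body or a non-ignored header does.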
The X-Cache: HIT response header is set if the request was served from a cached response.
Send a Cache-Control: no-cache header to bust the cache, i.e. bypass previously cached responses.
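Both headers can be exercised from a client. The following minimal Python sketch uses only the standard library; the function names build_request and served_from_cache, the Bearer token format and the full endpoint URL are assumptions for illustration, not part of LLM Foundry's documented API.

```python
import json
import urllib.request


def build_request(url: str, payload: dict, token: str, bypass_cache: bool = False) -> urllib.request.Request:
    """Build a JSON POST request; optionally ask the server to skip cached responses."""
    headers = {
        "Content-Type": "application/json",
        # Assumed bearer-token format. Authorization is ignored when matching cached requests.
        "Authorization": f"Bearer {token}",
    }
    if bypass_cache:
        # Cache-Control: no-cache bypasses previously cached responses.
        headers["Cache-Control"] = "no-cache"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def served_from_cache(response_headers) -> bool:
    """Return True if the response carries the X-Cache: HIT header.

    Accepts any object with a .get() method, such as the case-insensitive
    headers object on a response from urllib.request.urlopen.
    """
    value = response_headers.get("X-Cache")
    return value is not None and value.strip().upper() == "HIT"
```

For a live call, pass the request to urllib.request.urlopen and call served_from_cache(resp.headers) on the response; the endpoint URL and token are placeholders you must supply.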