The problem we want to solve: you deploy a new version of your site, but some browsers keep using the old CSS and JS files until you manually force-refresh with Ctrl-F5.
You are probably missing a `cache-control: public, must-revalidate` header, or a variation of it. It is not enough to specify only an `etag`, because the browser might choose to cache the file indefinitely, without ever revalidating it.
Quick recap on HTTP caching
Since HTTP has been around for a while, this topic has grown huge, so I'm going to narrow it down to what I consider most important. When a browser requests a file, it first looks into its cache to see what caching metadata it has for it:
- I already have a copy of this file, and you said it expires on 2020-05-03 20:00.
- I already have a copy of this file, and you said it was last modified on 2020-05-03 20:00.
- I already have a copy of this file, and the etag hash is DEADBEEF.
The first one is expiration-based caching; the second and third are validation-based caching. The first involves the `Expires` header, the second the `last-modified` header, and the third the `etag` header.
The main difference between expiration-based and validation-based caching is that expiration-based caching never validates the cache entry: the browser will not even send a request to the server when it encounters an asset cached with an unexpired `Expires` header.
With validation-based caching, the browser sends an `If-Modified-Since` or `If-None-Match` (carrying the etag) header when requesting the file, and the server can use this information to check whether the cached copy the browser has is still valid. If the server decides the cached file is valid, it sends a `304 Not Modified` status code with an empty body, telling the browser to use the cached file.
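The exchange above can be sketched in a few lines. This is a toy simulation, not any real framework's API; the file table and function names are made up for illustration:

```python
import hashlib

# Hypothetical in-memory "site": one static asset.
FILES = {"/app.js": b"console.log('hello');"}

def etag_for(body):
    """The server derives the etag from the file contents."""
    return hashlib.sha1(body).hexdigest()

def handle_request(path, if_none_match=None):
    """Return (status, body, headers), honoring conditional requests."""
    body = FILES[path]
    etag = etag_for(body)
    if if_none_match == etag:
        # The browser's cached copy is still valid: 304, empty body.
        return 304, b"", {"ETag": etag}
    return 200, body, {"ETag": etag,
                       "Cache-Control": "public, must-revalidate"}

# First request: full response; the browser stores the body and the etag.
status, body, headers = handle_request("/app.js")
# Later revalidation: the browser presents the etag, the server answers 304.
status2, body2, _ = handle_request("/app.js", if_none_match=headers["ETag"])
```

If the file changes between the two requests, the etags no longer match and the server sends a fresh 200 with the new body.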
Expiration-based caching does not make sense for static assets like CSS/JS/images (see later for a special case), since you don't know in advance when, or if, you will modify them; that leaves us with validation-based caching. Incidentally, many web servers (nginx definitely does) are configured to send both `etag` and `last-modified` headers by default. So we should never run into the scenario of the browser using stale JS/CSS files, since if a deployment modifies these files, both the etag and the last-modified date change as well. So what gives?
Unfortunately, sending down these headers does not mean the browser will revalidate the cached files. Without also sending a `cache-control` header, the browser may serve the cached file for a long time without ever revalidating it (which can also be useful, see later). This is not the browser misbehaving: the HTTP caching spec (RFC 7234) allows caches to compute a heuristic freshness lifetime when no explicit expiration is given, commonly based on how long ago the file was last modified. This is how Firefox behaves today, and it's very counterintuitive.
So the fix is to also specify a cache-control header. It can take many values, but the one we need is:
Cache-Control: public, must-revalidate
This tells the browser to cache the assets but, before using them, always revalidate with the server (through the use of etags or last-modified). You can put this either in an nginx `location` directive, or specify it on your backend, and it will solve the problem mentioned above.
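In nginx, that could look like the following (the location pattern and document root here are just assumptions for illustration; etag and last-modified are emitted by default):

```nginx
location ~* \.(css|js)$ {
    root /var/www/site;  # assumed document root
    # Cache, but revalidate (via etag/last-modified) before every use.
    add_header Cache-Control "public, must-revalidate";
}
```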
Going the extra mile
The solution above is already good, but we can make it even better. Currently, the browser still sends a request for every file to check whether the cache is valid, but we can avoid that as well. If you set up your build pipeline so that it always generates a unique filename based on the contents of the file, you can fall back to expiration-style behaviour, and the browser will not request the file at all after it has been cached once. Webpack can generate such filenames via the `[contenthash]` placeholder in `output.filename`, and other build systems can do it as well. If you go this route, you should also specify:
Cache-Control: public, max-age=31536000, immutable
On your files, just to make sure. Note that `immutable` only applies while the response is fresh, which is why it needs an explicit `max-age`.