
Caching Pitfalls

Caching is always a tradeoff: in return for quick access to some data without having to recalculate it, you accept that the data returned may not be up-to-date.

This means you have to be careful about exactly what data you cache, particularly with regard to complex queries and searches.

Overly-specific cached data

Suppose you are caching search results for an art website. If your search results include information specific to the user who requested them (e.g. whether they liked an artwork that appears in the results), then you have a problem: even if all the other search parameters are the same, you can't return this cached data for another user's search, because that user would be shown as having liked an artwork even if they never did.

Caching works best if it involves generic data that is commonly requested. If data specific to a user or some other entity is included in a result, it becomes very hard to cache that result efficiently, as the number of cache misses (where there is no cached data for the request being made) will be very high.
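One way around this is to cache only the generic part of the result and add the user-specific part after the cache lookup. The following is a minimal sketch of that idea, not a definitive implementation. The names `search_cache`, `cache_key`, `cached_search`, `search_for_user`, and `run_query` are all hypothetical, and a plain dictionary stands in for whatever cache you actually use.

```python
from typing import Callable

# Hypothetical in-memory cache standing in for a real cache backend.
search_cache: dict[tuple, list[dict]] = {}


def cache_key(params: dict) -> tuple:
    # Key only on the search parameters, never on the user, so that
    # every user running the same search shares one cache entry.
    # Assumes the parameter values are hashable (strings, numbers).
    return tuple(sorted(params.items()))


def cached_search(params: dict, run_query: Callable[[dict], list[dict]]) -> list[dict]:
    # Return the generic results, running the query only on a cache miss.
    key = cache_key(params)
    if key not in search_cache:
        search_cache[key] = run_query(params)
    return search_cache[key]


def search_for_user(
    params: dict,
    liked_ids: set[int],
    run_query: Callable[[dict], list[dict]],
) -> list[dict]:
    generic = cached_search(params, run_query)
    # Build new dicts so the shared cached entries are never mutated
    # with one user's "liked" flags.
    return [{**art, "liked": art["id"] in liked_ids} for art in generic]
```

Here the set of liked artwork IDs is assumed to be cheap to fetch per user, so only the expensive search itself is shared between users.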

Poorly designed cache expiry times

Depending on your use case, some data will be considered "outdated" faster than other data. For example, caching the search results for new artworks on your website with an expiry time of 1 week would be problematic: it would cut down on a lot of database queries, but would also mean visitors would see exactly the same results in that week, making them think the website was not used much or was not working properly.

Similarly, if the results are only cached for 30 seconds, the opposite problem happens: a user is unlikely to notice whether any artworks were submitted in the last 30 seconds, so such a short expiry time buys almost no visible freshness, while refreshing the cache at that rate puts a much higher load on your database than necessary. You have to strike a balance between ensuring the data is not outdated and making the caching time long enough to significantly reduce the load on your database servers.
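To make the tradeoff concrete, here is a minimal sketch of a cache where each entry gets its own expiry time, so you can choose a different value for each kind of data. The `TTLCache` class and its `get_or_compute` method are hypothetical names for illustration, and the clock is injectable only so that the behaviour is easy to test.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Tiny in-memory cache with a per-entry expiry time (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Maps key -> (expiry timestamp, cached value).
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(
        self, key: Hashable, ttl_seconds: float, compute: Callable[[], Any]
    ) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Still fresh: serve the cached value without touching the database.
            return entry[1]
        # Missing or expired: recompute and store with a new expiry time.
        value = compute()
        self._entries[key] = (now + ttl_seconds, value)
        return value
```

With something like this, a "new artworks" search might be cached for a few minutes while rarely changing data is cached for hours. Those durations are assumptions, and the right values depend on how quickly each kind of data goes stale on your site.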