Introduction to caching

Caching is a fairly big topic, but for the purposes of this chapter we are looking at server-side caching: caching performed by your web servers, cache servers and other infrastructure - i.e. not by your user's computer.

The approach we will pay the most attention to here is caching data in a dedicated cache server, such as a Redis server. This is very beneficial for any website that serves dynamic content, as it relieves a lot of the strain on your other infrastructure - especially your database server - when it comes to expensive work like running database searches.

Although databases are made to be extremely efficient, large and complex search queries can take considerable time. On its own that isn't a big deal - you'll rarely see a query take ages in the console outside of a huge production database - but if, for example, your users are running a ton of searches and you don't cache them, it can quickly bring your database server to its knees.

The objective: caching data that doesn't need to be up-to-date

For example: let's suppose a user on your website searches for "huge anime tiddies". Your database goes and searches a gigantic table of data for information matching that query, and returns the result to the user.

The user is not going to care whether searching for "huge anime tiddies" at, say, 17:59 on a Monday returns a different result than searching at 17:58 that same Monday. For one, they don't even know whether anybody uploaded new content matching that term in that timeframe; more importantly, it's not necessary.

When you have a few users, this doesn't matter: the load is small. When you have, say, a million users - and 10,000 of them are searching for anime tiddies at any given time - the load is huge. Your database server is performing a ton of searches, likely returning exactly the same results, yet performing all the same expensive computations and checking the entire contents of the database tables involved in the query every time.

Instead, it makes sense to cache the results of the query for a given time period, depending on how up-to-date the information needs to be. If you cache it for 15 minutes, then the query is only run once every 15 minutes at most; anyone searching for "huge anime tiddies" when there is already a cached result gets the cached result. Unlike your database server, the cache is not searching for anything or performing any computation; it's just returning the result it already has. This is much more efficient for your website, resulting in faster searches and less strain on your servers.

Cache server characteristics

Data is not persistent

Cache servers are not designed to store data "persistently". Think of it like your computer's memory versus its hard disk: the memory is faster than the hard disk and is used for caching, but loses its contents when you switch the computer off. Cache servers work on the same principle: part of their performance comes from keeping all data in memory, which also means they lose that data if they are shut down. Bear this in mind when storing data in cache servers.

Storage capacity is far lower than other server types

To be fast, cache servers keep all data in RAM rather than on disk. As a result, they offer far less total storage capacity than disk-backed server types.

This is not generally a problem, but it does mean being careful not to store unnecessarily large values in your cache server, because doing so can quickly fill it up with more data than it can hold.
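As an illustration, Redis lets you cap how much memory it will use and choose what happens when that cap is reached, via directives in redis.conf. The values below are examples, not recommendations - the right limit depends on your server:

```
# Cap the cache at 256 MB of memory (example value only)
maxmemory 256mb

# When the cap is reached, evict the least recently used keys
# rather than refusing new writes
maxmemory-policy allkeys-lru
```

An eviction policy like allkeys-lru suits caching well: old, rarely-requested results get dropped automatically to make room for fresh ones, so the cache filling up degrades gracefully instead of failing.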