Storing User Sessions

Most websites store user "sessions" to keep users logged in, among other things. The sections below explain what a session is used for and, more importantly, where it should be stored.

User sessions aren't only for logged-in users

A user session can be for any website visitor. A non-logged-in user might, for example, choose to use a dark or light theme on your website; generally, this is handled in a session kept for that visitor.
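As a rough illustration of the point above, here is a minimal sketch in Python of a session created for a visitor who never logs in. The sessions dictionary and the get_or_create_session function are hypothetical names made up for this example; a real framework would normally handle this for you and send the session ID to the browser in a cookie.

```python
import secrets

# Hypothetical in-memory session store: session ID -> session data.
sessions = {}

def get_or_create_session(session_id=None):
    """Return (session_id, data), creating a session for first-time visitors."""
    if session_id is None or session_id not in sessions:
        # Random ID; normally sent back to the browser as a cookie.
        session_id = secrets.token_hex(16)
        sessions[session_id] = {}
    return session_id, sessions[session_id]

# A visitor who never logs in still gets a session.
sid, data = get_or_create_session()
data["theme"] = "dark"

# On the next request the browser sends the same ID back.
_, data_again = get_or_create_session(sid)
print(data_again["theme"])  # dark
```

Note that nothing in this session identifies a user account; it only remembers a preference for one visitor.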

Understanding how user sessions are stored

In very small example websites, the entire tech stack might be on one webserver. In these cases, the server would store user sessions in local files on that webserver. In production, however, most websites run on more than one webserver, and even those that run on just one are better off planning for a future in which there are several. This has important implications, and understanding them means explaining the concept of a load balancer.
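To make the single-webserver case concrete, the following Python sketch stores each session as a small JSON file on the local disk. The directory, file naming, and function names are assumptions chosen for illustration, not how any particular framework lays out its session files.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical directory where this one webserver keeps its session files.
SESSION_DIR = Path(tempfile.mkdtemp(prefix="sessions_"))

def save_session(session_id, data):
    """Write the session data to a local file named after the session ID."""
    (SESSION_DIR / f"{session_id}.json").write_text(json.dumps(data))

def load_session(session_id):
    """Read the session back, or return None if this server has no such file."""
    path = SESSION_DIR / f"{session_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())

save_session("abc123", {"user_id": 42})
print(load_session("abc123"))   # {'user_id': 42}
print(load_session("missing"))  # None
```

The important detail is that the files live on one machine's disk, so only that machine can read them.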

Load Balancers

In website terms, a load balancer is a sort of gatekeeper. If you imagine a big website like Amazon.com, there are likely thousands of webservers that each host the "Amazon" website, and when you enter the URL in your browser, your request might be handled by any one of them. Each time you view an Amazon page, your browser's request goes to the load balancer, and the load balancer passes it on to one of the webservers. The point of the load balancer is both to prevent any one webserver from being overloaded and to allow multiple webservers to be used.

Without a load balancer, your browser wouldn't know where to go when you typed "amazon.com" into the address bar; behind the scenes, the DNS entry for Amazon has to resolve to some IP address. That IP address, for the purposes of this explanation, is the load balancer: it picks a webserver for your request, forwards the request there, and relays the resulting Amazon webpage back to your browser. This lets a website use multiple webservers while still having one "point" of entry.

The problem then is: suppose you log in. Your session is stored on the webserver that handled your login, not on the load balancer. If a later request is directed to a different Amazon webserver, how will that webserver know you're still logged in? Under this system, it wouldn't. Its local files do not contain the session data the first webserver saved.
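The problem described above can be reproduced in a few lines of Python. This is a simplified sketch: the WebServer class, the server names, and the round-robin rotation are assumptions for illustration, and each server's "local files" are modelled as a plain dictionary.

```python
from itertools import cycle

class WebServer:
    """A webserver that keeps sessions only in its own local storage."""
    def __init__(self, name):
        self.name = name
        self.local_sessions = {}  # stands in for this server's session files

    def login(self, session_id, user):
        self.local_sessions[session_id] = {"user": user}

    def is_logged_in(self, session_id):
        return session_id in self.local_sessions

servers = [WebServer("web1"), WebServer("web2")]
round_robin = cycle(servers)  # the load balancer simply rotates through servers

# First request: the login is handled by web1.
first = next(round_robin)
first.login("abc123", "alice")

# Second request: the load balancer picks web2, which has never seen the session.
second = next(round_robin)
print(first.name, first.is_logged_in("abc123"))    # web1 True
print(second.name, second.is_logged_in("abc123"))  # web2 False
```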

There are two common solutions to this problem, and you will have to choose one of them.

Option 1: Sticky Sessions (not recommended)

In a sticky sessions configuration, the load balancer remembers which webserver it sent a particular user to, and keeps sending that user to the same webserver for the duration of their session. This way, the webserver can still keep the user's session in its local files, and since the user is always directed to the webserver that holds their session, the session is never lost.
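A sticky load balancer can be sketched as a lookup table from session ID to webserver. The StickyLoadBalancer class below is hypothetical and deliberately simplified; real load balancers often achieve the same effect with a cookie or a hash of the client address rather than an in-memory table.

```python
from itertools import cycle

class StickyLoadBalancer:
    """Sends each session to the same webserver every time."""
    def __init__(self, server_names):
        self._next_server = cycle(server_names)  # used only for new sessions
        self._assignments = {}                   # session ID -> server name

    def route(self, session_id):
        # A session seen for the first time is assigned the next server in rotation.
        if session_id not in self._assignments:
            self._assignments[session_id] = next(self._next_server)
        # Every later request for that session goes to the same server.
        return self._assignments[session_id]

lb = StickyLoadBalancer(["web1", "web2"])
print(lb.route("alice-session"))  # web1
print(lb.route("bob-session"))    # web2
print(lb.route("alice-session"))  # web1
```

Notice that once a session is assigned, the route method never reconsiders the choice, whatever the load on that server.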

The problem with this approach is twofold. Firstly, sticky sessions make it harder for the load balancer to do the job it is named for - load balancing. If the webserver your session is 'stuck to' is overloaded, the load balancer can't move you to another webserver with less load, because your session isn't on the other webserver; it can only balance the load by directing visitors who have no existing session to a less busy webserver.

Secondly, storing session data on the webservers themselves is fairly inefficient. Sessions are generally stored on disk, so retrieving them can be a little slow. As a result, another option is usually preferred.

Option 2: Managing sessions in a cache server (recommended)

Instead of holding user session data on the webservers themselves, you can configure the webservers to store user session data in a central location - a cache server, designed for holding small amounts of data to be stored and retrieved extremely quickly. Redis servers are very popular for this, but other choices exist.

A cache server, such as one running Redis, has far less storage than a typical hard disk or webserver. This is intentional; the cache server keeps everything in memory (RAM), which makes it extremely fast. In addition, cache servers (Redis included) are built for simplicity, avoiding complex operations that could take a long time.

By keeping all user sessions in this one location, there is no longer a need for sticky sessions. Session storage and retrieval are faster, because the data is in memory rather than on disk, and the load balancer can move a user between webservers at any time without disrupting their session.
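The following Python sketch shows the idea of a shared session store. To keep it runnable without any external service, the SessionCache class is an in-memory stand-in for a cache server, not a Redis client; the class names, the session expiry, and the 30-minute default are all assumptions made for this example.

```python
import time

class SessionCache:
    """In-memory stand-in for a cache server such as Redis (hypothetical class)."""
    def __init__(self):
        self._data = {}  # session ID -> (expiry timestamp, session data)

    def set(self, session_id, data, ttl_seconds=1800):
        # Assumed behaviour: sessions expire after ttl_seconds.
        self._data[session_id] = (time.monotonic() + ttl_seconds, data)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if time.monotonic() >= expires_at:
            del self._data[session_id]  # drop expired sessions on read
            return None
        return data

class WebServer:
    """A webserver that keeps no sessions locally and asks the shared cache."""
    def __init__(self, name, cache):
        self.name = name
        self.cache = cache

    def login(self, session_id, user):
        self.cache.set(session_id, {"user": user})

    def current_user(self, session_id):
        session = self.cache.get(session_id)
        return session["user"] if session else None

cache = SessionCache()
web1 = WebServer("web1", cache)
web2 = WebServer("web2", cache)

web1.login("abc123", "alice")       # login handled by web1
print(web2.current_user("abc123"))  # alice
print(web2.current_user("nope"))    # None
```

Because both webservers read from the same store, it no longer matters which one the load balancer picks for a given request.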

Various server-side frameworks have tools for doing this. PHP has tools for allowing sessions to be stored and managed in Redis; for instructions on setting up PHP to use Redis as a session handler, take a look at this article: PHP.ini modifications.