Caching

Introduction

Unlike persistent storage tools such as Relational or Non-Relational databases, caching tools are a form of transient datastore designed to keep information for a finite period of time. This is most often done to improve the performance of an application by reducing the load on the database.

Imagine that you are given a large book containing all the names of Noroff’s students and every course they are signed up for, the start and end dates for each of these courses, and so on. Students may approach you and request this information, and you are obliged to find their data and provide it to them when asked. Without a caching strategy, you soon find a long queue of students waiting for their information, as it takes too long to find each unique location in the book to satisfy the demand.

By adding a blank piece of paper to your workflow, you now have the ability to make notes that will save a trip into the pages of the book. For example, you may have noticed that almost every student asks about the summer holiday and the details of their next course. Recording holiday dates in the cache will reduce the time needed by around half per student - a big saving for a small effort. Next, you could begin caching information on the courses by copying out the entire entry into your cache or by simply storing the page number needed to find it.

If you write out the whole entry, you risk running out of space on your page. If you write down the page numbers for each course, you can fit more into your cache, but you still need to consult the book. This tradeoff is the exact type of decision a backend developer will make when deciding on an application’s caching strategy.

Broadly speaking, there are four main types of server-side caching technologies:

  1. In-memory caching: Information is stored in the server's RAM for rapid recall. This is the fastest type of caching, but data is lost when the server is restarted. Popular examples of this form of caching include Memcached and Redis (a minimal sketch of this idea follows the list below).

  2. Disk caching: Information is stored on the hard drive of the server. This is slower than in-memory caching, but data is not lost when the server is restarted. A common example at the OS level is the swap or page file, where memory contents are temporarily written to disk.

  3. Distributed caching: Information is stored in a cluster of servers that share information. This is the most resilient type of caching but is slower than in-memory caching. Popular examples of this form of caching include Memcached, Hazelcast and Aerospike.

  4. Reverse proxy caching: Information is stored by a special server sitting in front of the application server network. Responses are cached as they are generated by the web server. Popular examples of this form of caching include Nginx, Squid and Varnish.
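
To make the first of these categories more concrete, the sketch below shows the core idea behind an in-memory cache in Python: a plain key-value store held in RAM. It is only an illustration of the concept, not how Memcached or Redis actually work, and the class and method names are invented for this example.

    # A minimal in-memory cache: values live in a Python dict (i.e. in RAM),
    # so lookups are very fast, but everything is lost when the process stops.
    class InMemoryCache:
        def __init__(self):
            self._store = {}

        def get(self, key):
            # Returns None on a cache miss.
            return self._store.get(key)

        def set(self, key, value):
            self._store[key] = value

    cache = InMemoryCache()
    cache.set("summer_holiday", "start and end dates go here")
    print(cache.get("summer_holiday"))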

Brief history

Caching has been used to offset the performance limitations of persistent datastores since the dawn of the Internet. Early caching was typically done on the client side rather than on the server side, which led to issues surrounding data duplication and the source of truth.

Disk caching was the first popular approach, due to the relatively low cost of hard drive space in comparison to RAM at that time. As RAM became cheaper, in-memory caching became more popular due to its speed and simplicity.

The earliest reverse proxy caching project that is still in use today, Squid, was developed in 1996 by The National Laboratory for Applied Network Research (NLANR).

Nginx followed in 2002, when Igor Sysoev began developing it as a lightweight alternative to the dominant Apache web server. Although it was originally designed as a web server, it has since become popular due in part to its built-in reverse proxy caching.

In 2003, Brad Fitzpatrick at LiveJournal developed Memcached, the first in-memory caching tool to gain widespread adoption. This was followed by Redis in 2009, which was designed to be a more robust alternative to Memcached.

As the hosting industry moved towards cloud computing, distributed caching became more popular due to the ability to scale out the number of servers in a cluster. This also allowed for the use of shared-nothing architectures (sharding), which are more resilient to failure.

In the modern day, many caching tools offer multiple strategies in a bid to combine the best approaches from all available options. It is commonplace for application operators to employ multiple layers of caches together to provide the best performance. Caches are essential to providing performant API services, and without them, application operators would spend significantly more on resources. End users would also notice a considerable decrease in the performance of all of their favourite web services without this technology.

Indexing

Indexing is a technique used to speed up the retrieval of data from a database. Before the server serves any requests, an index file is created by analysing all of the tables in the database. This file contains the location of each piece of data rather than its entire contents. This allows for a much faster lookup time, as the server does not need to scan the entire table to find the requested data.

In the book analogy, this would be most similar to the approach of recording page numbers for each course. It might make sense to do this early on before the students arrive at your desk. Since the book is not frequently updated, you may only have to create this index once every few months. On the downside, you still need to consult the book to find the data, even if you have a record in your cache. This makes the process slower but more space efficient.
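
To show the same idea in code, the sketch below mirrors the page-number approach: a small lookup table maps each course name to its position, and the full record is then read from the original data. The course data and function name are invented for this illustration; a real database builds and maintains its indexes internally.

    # The full "book": a list of course records we would otherwise scan from start to end.
    courses = [
        {"name": "Backend Development", "start": "January", "end": "June"},
        {"name": "Frontend Development", "start": "February", "end": "July"},
    ]

    # The index stores only locations (list positions), not the records themselves.
    index = {course["name"]: position for position, course in enumerate(courses)}

    def find_course(name):
        # One dictionary lookup replaces a scan of the whole list, but we still
        # need to go back to the original data to read the full record.
        position = index.get(name)
        return courses[position] if position is not None else None

    print(find_course("Frontend Development"))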

Since indexing is done before requests are served, and only needs to be repeated when the structure of the database changes, there is little downside to applying this strategy to your application. In fact, modern database applications will often do this automatically, or at least offer it as a configuration option. Most reference books, for example, already contain an index amongst their first or last pages that serves the same purpose. In our analogy, you may not need to create this index at all!

It’s important to note that indexing is much more challenging to implement in a NoSQL database than in a SQL database since the structure of the data cannot be guaranteed. Consider the difference between indexing a fictional novel and a historical textbook. The novel has chapters and pages, but each chapter is a long block of continuous prose. The textbook has distinct sections with headings, diagrams, images and additional resources. We can write a much more detailed index about the textbook than the novel. This is roughly analogous to the difference between indexing a relational database and a NoSQL database.

Response caching

Response caching is a popular and straightforward strategy for caching data from a server. In the analogy of the book, this would be the equivalent of writing down the results from every student’s request on the paper page. Doing this by hand would be laborious, and over time, you may find yourself copying the entire book manually, but this task is trivial for a software application.

This form of caching is popular amongst API developers, as it is easy to implement and can provide significant performance improvements. Each API request is cached for a set period of time, and the next time an identical request arrives, the server responds with the cached data rather than querying the database. Since the resources needed to save the cached version are smaller than those needed to query the database, there is a net win in terms of response time and cost.
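
A minimal sketch of this idea is shown below. It stores each response keyed by the request path for a fixed number of seconds; the handle_request and fetch_from_database functions are invented stand-ins for a real framework's request handling and database layer.

    import time

    CACHE_LIFETIME = 60     # seconds a cached response is considered fresh
    _response_cache = {}    # request path -> (response, time it was cached)

    def fetch_from_database(path):
        # Placeholder for an expensive database query.
        return {"path": path, "data": "..."}

    def handle_request(path):
        cached = _response_cache.get(path)
        if cached is not None:
            response, cached_at = cached
            if time.time() - cached_at < CACHE_LIFETIME:
                return response  # identical request within the lifetime: skip the database
        response = fetch_from_database(path)
        _response_cache[path] = (response, time.time())
        return response

    print(handle_request("/courses/backend"))  # first call queries the "database"
    print(handle_request("/courses/backend"))  # second call is served from the cache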

This approach is not without limitations or drawbacks, however. If the data served by the API is dynamic and changes frequently or without warning, then the cache will become out of date and will need to be invalidated. The main purpose of an API is to provide reliable information to its users, so the issue of stale data is significant. The primary way to mitigate this issue is to control the lifespan of cached data so that it is refreshed periodically. In the analogy of the book, this is akin to throwing away the paper page every morning and starting afresh.

Although this strategy is popular amongst API developers, it was first used to speed up the response times of more conventional web applications, such as blogs or forums. When viewing an HTML file on the web, it is more likely than not that you are viewing a cached version of that file. This approach is not appropriate for all applications, such as a banking application where data is frequently updated and must be accurate.

Object caching

Object caching is a more sophisticated approach that aims to save on database activity by caching highly requested data from complex queries. In the analogy of the book, this would be the equivalent of writing down only the key details for the most popular courses, rather than the full answer to every student's request. This strikes a balance between the indexing and response caching strategies in terms of speed and space efficiency.

Instead of caching an entire response (and thereby bypassing the API logic altogether), an Object Cache saves the raw result of the query operation and returns it the next time the same query is attempted. This has the same sort of benefits as Response Caching; however, the savings can be shared by more users at once. A cached response only serves one particular request, whereas a result in the object cache might be useful for hundreds of discrete requests.
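
The sketch below illustrates the difference: the cache key is the query itself rather than the incoming request, so any request that ends up running the same query benefits from the stored result. The run_query helper and the example query are invented for this illustration.

    _object_cache = {}  # query string -> raw query result

    def run_query(query):
        # Placeholder for sending the query to the database.
        return ["row 1", "row 2"]

    def cached_query(query):
        if query in _object_cache:
            return _object_cache[query]  # reuse the raw result without touching the database
        result = run_query(query)
        _object_cache[query] = result
        return result

    # Two different endpoints can share the same cached result,
    # because both end up issuing the same underlying query.
    popular_courses = cached_query("SELECT * FROM courses WHERE popular = true")
    homepage_highlights = cached_query("SELECT * FROM courses WHERE popular = true")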

This form of caching is also NoSQL-friendly since it does not require any knowledge of the structure of the data. The only information needed to operate this cache is the query and the result of the query. The language or structure of the database does not affect the operation of this form of cache. In practice, it is common to implement more than one layer of caching, with the Object Cache sitting between the Response Cache and a pre-indexed database.

Content caching

Although we have discussed methods to speed up data retrieval, a modern web application needs many resources besides raw data in order to render and be interacted with. Each page requires a static JavaScript bundle, one or more CSS files, images and other multimedia assets. These files are often large and can take a long time to load compared with an API request, making their optimisation a crucial consideration.

A Content Delivery Network (CDN) aims to cache these static assets and serve them from a server located as geographically close as possible to the user requesting them. For example, a website based in the USA most likely stores its image assets on a server in the USA. If a user from Japan visits the site, they will send their requests to the USA to be fulfilled. Due to the significant physical distance between the origin and destination, this request will take a long time to complete. If, instead, the American company employs a CDN, the request will be rerouted to a server located in Japan or a nearby country - dramatically cutting down the round trip time. For content-heavy applications such as video streaming services, this can significantly improve performance.
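
For the origin server's part, cooperating with a CDN usually comes down to attaching caching headers to static responses so that any cache in front of it knows how long an asset may be reused. The sketch below is only an illustration of that idea; in practice, the web server or framework would normally set these headers, and the function name here is invented.

    # Illustrative only: build response headers for a static asset so that a CDN
    # (or a browser) knows it may cache the file for up to one day.
    def static_asset_headers(filename):
        return {
            "Content-Type": "image/png" if filename.endswith(".png") else "application/octet-stream",
            "Cache-Control": "public, max-age=86400",  # 86400 seconds = 24 hours
        }

    print(static_asset_headers("logo.png"))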

There are, however, similar drawbacks to the other forms of caching. Most importantly, cached assets might become stale if their originals are updated. For example, an important PDF file outlining the legal terms of service might be updated in response to a lawsuit - but the cached version will still be served to users. This could result in legal misunderstandings as customers are viewing different versions of the same file.

Mitigating cache invalidation

As we have seen, any caching strategy has some significant limitations. Most notably, the data that is cached must be kept up-to-date and accurate. If the data is stale, the cache will serve out-of-date information to users. Several strategies can be employed to mitigate this issue, and we will discuss some of the most common approaches below.

Time To Live

TTL (Time-To-Live) is a caching policy that determines how long an item will remain in the cache before it is considered stale and removed. Each item in the cache is assigned a TTL value when it is added to the cache. The cache management system periodically checks the age of each item and removes any items that have exceeded their TTL.

The TTL value is typically set in seconds, and it can be set differently for different types of items or different use cases. For example, a website that displays real-time stock prices may have a very short TTL (e.g. a few seconds) for stock price data, while a website that displays weather forecasts may have a longer TTL (e.g. a few hours) for forecast data.
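
A minimal TTL cache might look like the sketch below: each entry carries its own expiry time, and expired entries are treated as misses. To keep the example short, stale items are cleaned up when they are next requested rather than by a periodic sweep, and the class name and interface are invented for this illustration.

    import time

    class TTLCache:
        def __init__(self):
            self._store = {}  # key -> (value, expiry timestamp)

        def set(self, key, value, ttl_seconds):
            # Each item is given its own time-to-live when it is added.
            self._store[key] = (value, time.time() + ttl_seconds)

        def get(self, key):
            entry = self._store.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if time.time() >= expires_at:
                del self._store[key]  # stale: remove it and report a miss
                return None
            return value

    cache = TTLCache()
    cache.set("stock_price", 123.45, ttl_seconds=5)           # short TTL for fast-moving data
    cache.set("weather_forecast", "sunny", ttl_seconds=3600)  # longer TTL for slow-moving data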

TTL-based caching can help to ensure that the cache remains fresh and up-to-date by removing items that are no longer relevant or accurate. However, it also has some limitations: if the TTL is set too low, items expire before they can be reused and the cache provides little benefit, while if the TTL is set too high, the cache may contain stale data for an extended period of time.

Least Recently Used

Least Recently Used (LRU) is a caching policy that removes the least recently used items from the cache when it becomes full. The idea behind this policy is that items that have been used recently are more likely to be used again in the near future, so they should be kept in the cache.

When an item is added to the cache, it is added to a list or queue that keeps track of the order in which items were used. Each time an item is accessed, it is moved to the front of the list. When the cache becomes full and a new item needs to be added, the item at the back of the list (i.e., the least recently used item) is removed from the cache to make room for the new item.
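
Python's collections.OrderedDict makes this bookkeeping easy to sketch: accessed keys are moved to one end of the ordering, and when the cache is full, the key at the other end (the least recently used one) is evicted. The class name and tiny capacity are invented for this example; for caching function results, Python also provides a ready-made functools.lru_cache decorator.

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self._capacity = capacity
            self._store = OrderedDict()  # keeps keys in order of use

        def get(self, key):
            if key not in self._store:
                return None
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]

        def set(self, key, value):
            if key in self._store:
                self._store.move_to_end(key)
            self._store[key] = value
            if len(self._store) > self._capacity:
                self._store.popitem(last=False)  # evict the least recently used item

    cache = LRUCache(capacity=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # "a" is now the most recently used item
    cache.set("c", 3)  # the cache is full, so "b" (least recently used) is evicted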

LRU caching is relatively simple to implement and has the advantage of being able to adapt to changing usage patterns over time. For example, if a particular item becomes less frequently used over time, it will eventually be removed from the cache to make room for more frequently used items.

LRU caching is often used when the cache size is fixed and the number of items that can be stored in the cache is limited. It also suits scenarios where access patterns change frequently, since the cache naturally keeps the most recently requested items available.

Least Frequently Used

Least Frequently Used (LFU) is a caching policy that removes the least requested items from the cache when it becomes full. The idea behind this policy is that items that have been accessed less frequently are less likely to be needed again in the near future, so they should be the first to be removed from the cache.

When an item is added to the cache, it is also added to a counter that keeps track of how many times it has been accessed. Each time an item is accessed, its counter is incremented. When the cache becomes full and a new item needs to be added, the item with the lowest access count is removed from the cache to make room for the new item.
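
A minimal LFU cache can be sketched with a value store and a matching access counter, as below. Real implementations use more efficient bookkeeping than a full scan for the minimum count, and the class name here is invented for the illustration.

    class LFUCache:
        def __init__(self, capacity):
            self._capacity = capacity
            self._store = {}   # key -> value
            self._counts = {}  # key -> number of accesses

        def get(self, key):
            if key not in self._store:
                return None
            self._counts[key] += 1  # record another access
            return self._store[key]

        def set(self, key, value):
            if key not in self._store and len(self._store) >= self._capacity:
                # Evict the item with the lowest access count.
                least_used = min(self._counts, key=self._counts.get)
                del self._store[least_used]
                del self._counts[least_used]
            self._store[key] = value
            self._counts[key] = self._counts.get(key, 0) + 1

    cache = LFUCache(capacity=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # "a" now has a higher access count than "b"
    cache.set("c", 3)  # "b" has the lowest count, so it is evicted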

It is worth noting that LFU has some limitations. For example, an item that was accessed heavily in the past but is no longer needed can remain in the cache indefinitely, because its high access count protects it from eviction. Another limitation is that it does not account for recency; items accessed recently but infrequently are treated the same as items accessed a long time ago but infrequently.

Resources