Deep Dive Into Caching
You might have heard about caching many times and may even have implemented it in your projects. In this blog, we’ll cover in depth what exactly we mean by caching, how it works, in what scenarios we can use it, and what eviction policies are. Stay with me and let’s start.
What is caching?
Caching is a technique used to store frequently accessed data in a temporary storage area, called a cache, so that we can access it quickly when we require that data.
Caching is a powerful concept that helps you scale your application, raising throughput and keeping latency low.
This can improve the performance and scalability of a system by reducing the number of times the same data needs to be retrieved from a slower, more persistent storage location.
You can understand caching using this simple explanation:
When you visit a website, your browser will automatically cache certain elements of the site, such as images and stylesheets. This means that when you navigate to other pages on the site, the browser doesn’t need to re-download these elements, which can save time and bandwidth.
This is also why the first time you visit a site, it may take longer to load than subsequent visits.
Caching is generally done in memory, as accessing data from memory is much faster than accessing it from disk. Here are a few numbers to justify this:
- Reading from RAM: 10–100 ns (nanoseconds)
- Reading from an SSD: 50–100 µs (microseconds)
- Reading from a hard drive: 5–10 ms (milliseconds)
When it comes to latency, the lower the value, the better it is.
Let’s try to understand caching with a few examples and also understand where we can use it.
Application level
Developers can use caching at the application level by using caching libraries such as Memcached, Redis, and Varnish. These libraries allow developers to store data, such as database query results or rendered pages, in memory so that it can be quickly retrieved when needed, rather than having to be recalculated or re-rendered.
This can significantly improve the performance of an application, especially in situations where the application is under heavy load.
Suppose you have developed an application where people can connect with their friends, chat with them, see their posted media and more.
Here is how your architecture looks:
When a client requests data from the server, the server queries the database for that data and returns it to the user. This looks OK to us, so what is the issue? The issue is that when the load on the server increases, reading data from the database becomes very expensive, since fetching data from disk is slow.
Caching can help us solve this problem. We can use a caching library such as Redis or Memcached to fix this issue. Let’s understand how this will work.
In the diagram above, we have two requests: Request 1 and Request 2.
Request 1 requests data related to user A:
- 1.1 The request is handled by the server.
- 1.2 The server checks the cache for the data first.
- 1.3 We don’t have the data for that user in the cache, so we query the database for it.
- 1.4 The database returns the data to the server.
- 1.5 The server updates the cache with that data.
- 1.6 The server responds to the user with that data.
Request 2 requests data related to user A:
- 2.1 The request is handled by the server.
- 2.2 The server checks the cache for the data.
- 2.3 The data is returned from the cache.
- 2.4 The server responds to the user with the data.
This will take the performance of the application to the next level: instead of reading from disk, the data is fetched from the cache, and accessing data from a cache is extremely fast.
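To make this concrete, here is a minimal sketch of the flow above in Python using the redis client library. The get_user_from_db function, the user:&lt;id&gt; key format, and the TTL are assumptions for illustration, not a prescribed implementation:

```python
import json
import redis

# Connect to a local Redis instance (assumed to be running on the default port).
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

CACHE_TTL_SECONDS = 300  # expire entries after 5 minutes


def get_user_from_db(user_id):
    # Placeholder for a real database query; assumed for illustration.
    return {"id": user_id, "name": "User " + str(user_id)}


def get_user(user_id):
    key = f"user:{user_id}"

    # 1. Check the cache first.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database query needed

    # 2. Cache miss: query the database.
    user = get_user_from_db(user_id)

    # 3. Update the cache so the next request is served from memory.
    cache.set(key, json.dumps(user), ex=CACHE_TTL_SECONDS)
    return user
```

In terms of the steps above, Request 1 takes the miss path (steps 1.3 to 1.5), while Request 2 returns early on the cache hit at step 2.3.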
Network level
Caching can be implemented using proxy servers and content delivery networks (CDNs).
Proxy Server
A proxy server is a server that acts as an intermediary between a client and a server. It can be used to cache frequently requested resources, such as images and documents, to improve the performance of the system.
When a client requests a resource from a server, the proxy server will first check to see if it has a cached copy of the resource. If it does, it will return the cached copy to the client, rather than forwarding the request to the server. This can significantly reduce the amount of traffic on the network and improve the response time for the client.
Suppose there is a company that has a website with a lot of static content, such as images and videos. The company may have servers located in different parts of the world, but the majority of the traffic to the website comes from a specific region.
To improve the performance of the website for users in that region, the company can set up a proxy server in that region, and configure it to cache frequently requested resources.
When a user in that region requests a resource from the website, the request will be handled by the proxy server. The proxy server will check to see if it has a cached copy of the resource, and if it does, it will return the cached copy to the user, rather than forwarding the request to the origin server.
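As a rough illustration, here is a toy caching proxy in Python built on the standard library. The ORIGIN address is a placeholder, the cache never expires, and a real proxy would also honour cache-control headers, but the hit/miss logic is the same:

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical origin server that the proxy forwards cache misses to.
ORIGIN = "http://origin.example.com"

cache = {}  # path -> response body; an in-memory cache with no expiry, for simplicity


class CachingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        body = cache.get(self.path)
        if body is None:
            # Cache miss: forward the request to the origin and store the result.
            with urllib.request.urlopen(ORIGIN + self.path) as resp:
                body = resp.read()
            cache[self.path] = body
        # Serve the (possibly cached) response to the client.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("", 8080), CachingProxy).serve_forever()
```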
Content Delivery Network
A CDN is a network of servers that are distributed around the world, and are used to deliver content to users based on their location.
When a user requests content from a CDN, the closest server to the user will deliver the content, rather than having the user’s request be routed to the origin server. The content is then cached on the CDN server, so that if another user in the same location requests the same content, it can be quickly delivered from the cache.
Here is an example to understand how a CDN works:
- Client A sends a request for image.png to a CDN server.
- The CDN server determines the client’s geographic location and selects the server that is closest to the client.
- The selected server retrieves the requested content from the origin server (the server where the content is stored), sends it back to the client, and stores the content in its cache.
- If another client, B, requests the same content, the CDN can serve it from the cache on the nearest server, which reduces the load on the origin server and improves the performance of the system.
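Here is a small simulation of that flow in Python. The edge regions, the routing rule in nearest_edge, and fetch_from_origin are all simplified assumptions; real CDNs route clients via DNS or anycast:

```python
# A toy simulation of CDN routing and edge caching.
EDGE_SERVERS = {
    "us-east": {},   # each edge server has its own local cache
    "eu-west": {},
    "ap-south": {},
}


def nearest_edge(client_region):
    # Assumption for illustration: the client's region maps directly
    # to an edge server name, with us-east as a fallback.
    return client_region if client_region in EDGE_SERVERS else "us-east"


def fetch_from_origin(path):
    # Placeholder for a request to the origin server.
    return f"<contents of {path}>"


def get_content(client_region, path):
    edge_cache = EDGE_SERVERS[nearest_edge(client_region)]
    if path not in edge_cache:
        # First request from this region: fetch from origin, cache at the edge.
        edge_cache[path] = fetch_from_origin(path)
    return edge_cache[path]


# Client A in eu-west triggers an origin fetch; client B in the same
# region is then served straight from the edge cache.
get_content("eu-west", "/image.png")
get_content("eu-west", "/image.png")
```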
Hardware level
Caching can be implemented using various types of memory such as RAM and SSD. These types of memory are faster than traditional hard drives, and can be used to store frequently accessed data, such as operating system files and application data.
In modern computers, the CPU also has levels of cache memory (L1, L2, and L3), which are small, fast memory stores on the chip that help the CPU quickly access frequently used data. This is called CPU caching.
What are Cache eviction policies?
Cache eviction policies determine how a cache will decide which data to remove when it becomes full.
Different eviction policies may be more appropriate for different types of data and usage patterns, and the choice of eviction policy can have a significant impact on the performance of the cache.
Some of the most common cache eviction policies include:
- Least Recently Used (LRU): This policy removes the least recently used data from the cache. The idea behind this policy is that data that has not been accessed recently is less likely to be accessed in the future. This policy is appropriate for scenarios where the working set of data is relatively stable, and it is safe to assume that data that hasn’t been accessed in a while is no longer needed (see the sketch after this list).
- Least Frequently Used (LFU): This policy removes the least frequently used data from the cache. The idea behind this policy is that data that is accessed less frequently is less likely to be accessed in the future. This policy is appropriate for scenarios where the working set of data is dynamic and it is hard to predict which data will be accessed in the future.
- First In First Out (FIFO): This policy removes the data that was inserted into the cache first. The idea behind this policy is that older data is less likely to be accessed in the future. This policy is appropriate for scenarios where the working set of data is relatively stable, and it is safe to assume that older data is no longer needed.
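As promised above, here is a minimal LRU cache sketch in Python built on OrderedDict. A FIFO cache would be the same structure without the move_to_end call on reads, and an LFU cache would track access counts instead; this is an illustration, not a production cache:

```python
from collections import OrderedDict


class LRUCache:
    """A minimal LRU cache: on overflow, evict the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # keys ordered from least to most recently used

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # cache is full, so "b" (least recently used) is evicted
print(cache.get("b"))  # None
```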
In summary, caching is a technique that can be used to improve the performance and scalability of a system by storing frequently accessed data in a temporary storage area. This allows the system to quickly retrieve the data without the need to access the original data source, which can be slower and more resource-intensive. Caching can be implemented at various levels of a system, such as at the application, network, or hardware level, and the use of appropriate eviction policies can help to optimize the performance of the cache.
If you like this blog and want to read more content like this, do check out my series and follow for email updates.
Do check out these blogs:
- Deep dive into System design
- Most commonly used algorithms.
- Consistent Hashing
- Bit, Bytes And Memory Management
- CAP Theorem Simplified
- Event Driven Architecture