I came across this article from the Facebook blog yesterday about their cache efficiency experiment.

In school, we learned about arranging data in specific ways to take advantage of cache locality in memory, and that was a relatively difficult concept for me.

Now that I've been working for a while, I haven't really had to think about it much; the caches we implement at work are usually just simple in-memory caches to save database lookups.


Still, this Facebook post shows that it's still important to think about how caches work, even in the age of gigabytes of available memory.

There are some very important takeaways from the study.


The idea of the experiment is quite simple:

  • Create a separate endpoint that logs requests for a special img tag, which carries conditional headers:
    • If the image is new to the browser, it sends a regular GET request, and the server responds with the image data (Facebook used the tiniest GIF ever, which is another interesting, though not very useful, read) along with a Last-Modified date and an ETag for the browser to cache
    • If the browser has seen the image before, it sends an If-None-Match and/or If-Modified-Since header, and the server responds with no image data; the browser also receives a fresh Last-Modified date, which becomes the If-Modified-Since header it sends next time
  • Embed the special request on a page that loads all the time
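The server side of the steps above can be sketched in a few lines. This is a minimal sketch under my own assumptions, not Facebook's actual implementation: the `PIXEL` bytes are a stand-in payload, and the `classify` helper is a hypothetical name for the hit/miss logging decision.

```python
import hashlib
from http.server import BaseHTTPRequestHandler

# Stand-in payload for the tracking image; Facebook's actual "tiniest GIF"
# is described in their separate blog post.
PIXEL = b"GIF89a" + b"\x00" * 37
ETAG = '"' + hashlib.md5(PIXEL).hexdigest() + '"'

def classify(headers):
    """'hit' when the browser sent a conditional header (it has the image
    cached), 'miss' when it asked for the full image."""
    if headers.get("If-None-Match") == ETAG or headers.get("If-Modified-Since"):
        return "hit"
    return "miss"

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if classify(self.headers) == "hit":
            # Cached copy is still valid: record a hit, answer 304 with no body.
            self.send_response(304)
            self.end_headers()
        else:
            # First visit (or an evicted cache): record a miss and send the
            # pixel with validators the browser will echo back next time.
            self.send_response(200)
            self.send_header("Content-Type", "image/gif")
            self.send_header("ETag", ETAG)
            self.send_header("Last-Modified", self.date_time_string())
            self.end_headers()
            self.wfile.write(PIXEL)
```

The interesting design point is that no cookies or client-side code are needed: the browser's own cache validation traffic is the measurement.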

So this is a pretty simple idea; it feels like anyone could set up something like this for their project. It could be part of a reporting framework, perhaps?

Even if your project doesn't get more than 10 views a month, it's still good to keep this in the back of your mind in case scaling starts to cause performance degradation.


An important question to ask: how long do browser caches stay populated?

  • I totally did not think about this before, so I'd guess many of you are also saying "aha" right now, realizing this is just as important a question as the cache hit/miss rate
  • From the blog:


"Based on our study, there is a 42% chance that any request will have a cache that is, at most, 47 hours old on the desktop. This is a new dimension, and it might have more impact for some sites than others."
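That cache-age number falls out of the same trick: because the server stamps each response with a fresh Last-Modified date, the If-Modified-Since header a browser sends back tells you when its cached copy was last touched. A hedged sketch of that arithmetic follows; the helper names and sample dates are my own, not from the post.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def cache_age_hours(if_modified_since, now):
    """Hours since the browser last fetched the pixel, derived from the
    If-Modified-Since header it echoes back (hypothetical helper)."""
    then = parsedate_to_datetime(if_modified_since)
    return (now - then).total_seconds() / 3600

def share_younger_than(ages, limit_hours):
    """Fraction of conditional requests whose cache is at most limit_hours old."""
    return sum(1 for a in ages if a <= limit_hours) / len(ages)

# Made-up sample of If-Modified-Since values seen in the logs.
now = datetime(2015, 2, 3, 12, 0, tzinfo=timezone.utc)
ages = [cache_age_hours(h, now) for h in [
    "Mon, 02 Feb 2015 12:00:00 GMT",   # 24 hours ago
    "Sun, 01 Feb 2015 12:00:00 GMT",   # 48 hours ago
    "Tue, 03 Feb 2015 06:00:00 GMT",   # 6 hours ago
]]
```

Aggregating `share_younger_than(ages, 47)` over real traffic is the kind of computation behind the 42%-within-47-hours figure.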

What the authors of the blog suggest:

  • The best practices tell us to use external styles and scripts, include Cache-Control and ETag headers, compress data on the wire, use URLs to expire cached resources, and separate frequently updated resources from long-lived ones
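Two of those practices, long-lived caching and URL-based expiry, combine naturally into one pattern: put a content hash in the file name and cache the resource aggressively, so stale copies can never be served. A small sketch; the helper name and header values are my own illustration, not from the post.

```python
import hashlib

# Long-lived resources: cache for a year; the URL itself changes whenever
# the content does, so expiry is handled by the URL, not the clock.
LONG_LIVED = {"Cache-Control": "public, max-age=31536000, immutable"}

# Frequently updated resources: always revalidate with the server.
FREQUENTLY_UPDATED = {"Cache-Control": "no-cache"}

def fingerprint_url(path, content):
    """Embed a short content hash in the file name, e.g.
    /static/app.css -> /static/app.1a2b3c4d.css (hypothetical helper)."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = path.rpartition(".")
    return f"{stem}.{digest}.{ext}"
```

Serving `fingerprint_url("/static/app.css", css_bytes)` with the `LONG_LIVED` headers gives browsers a copy they never need to revalidate, while the page that references it stays in the frequently-updated bucket.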