From Facebook: Web performance: Cache efficiency exercise

I came across this article on the Facebook engineering blog yesterday about their cache efficiency experiment.

In school, we learned about arranging data in specific ways to take advantage of cache locality in memory, and that was a relatively difficult concept for me.

Now that I've been working for a while, I haven't really had to think about it much; the caches we implement at work are usually just simple in-memory caches to save database lookups.


Still, this Facebook post shows it's still important to think about how caches work, even in the age of gigabytes of available memory.

There are some very important takeaways from the study.


The idea of the experiment is quite simple:

  • Create a separate endpoint to log requests for a special img tag, relying on the browser's conditional request headers:
    • If the image is new to the browser, it sends a regular GET request; the server responds with the image data (Facebook used the tiniest GIF ever, which is another interesting, but not so useful, read) along with a Last-Modified date and an ETag for the browser to cache
    • If the browser has seen the image before, it sends If-None-Match and/or If-Modified-Since headers and receives no image data back, only a new Last-Modified date, which becomes the If-Modified-Since header it sends next time
  • Implement the special request on a page that loads all the time
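The steps above boil down to a small decision on the server side. Here is a minimal sketch of that logic in Python — not Facebook's actual implementation; the function name, ETag value, and outcome labels are all assumptions, and the GIF body is a placeholder:

```python
# Minimal sketch of the probe endpoint's decision logic (hypothetical;
# names and the ETag value are assumptions, not Facebook's actual code).

CURRENT_ETAG = '"probe-v1"'  # any stable opaque string works as an ETag
GIF_BODY = b"GIF89a"         # placeholder; a real 1x1 GIF body would go here

def handle_probe(request_headers):
    """Return (status, body, outcome) for a tracking-pixel request.

    A conditional header means the browser still holds the image in its
    cache (a hit); a plain GET means the cache was empty or evicted (a miss).
    """
    if (request_headers.get("If-None-Match") == CURRENT_ETAG
            or "If-Modified-Since" in request_headers):
        # Respond 304 Not Modified with no body; log a cache hit.
        return 304, b"", "hit"
    # First visit (or the cached copy was evicted): send the image along
    # with ETag/Last-Modified headers, and log a cache miss.
    return 200, GIF_BODY, "miss"
```

Counting the hit and miss outcomes across many requests gives exactly the cache hit/miss rate the post measures.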

So this is a pretty simple idea; it feels like anyone could set up something like this for their project, perhaps as part of a reporting framework.

Even if your project does not get more than 10 views a month, it's still good to think about this and keep it in the back of your head in case scaling starts to cause performance degradation.


Important question to ask: how long do browser caches stay populated?

  • I totally did not think about this before, so I'd assume many of you might also be saying "aha" right now and realizing this is just as important a question as the cache hit/miss rate
  • From the blog:


Based on our study, there is a 42% chance that any request will have a cache that is, at most, 47 hours old on the desktop. This is a new dimension, and it might have more impact for some sites than others

What the authors of the blog suggest:

    • The best practices tell us to use external styles and scripts, include Cache-Control and ETag headers, compress data on the wire, use URLs to expire cached resources, and separate frequently updated resources from long-lived ones
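For the "long-lived resources expired via URLs" part of that advice, an Apache sketch might look like the following. This assumes mod_expires and mod_headers are enabled (sudo a2enmod expires headers); the file extensions are just examples:

```apache
# Far-future caching for versioned static assets: the URL changes
# whenever the content changes, so the cached copy never goes stale.
<FilesMatch "\.(css|js|png|gif)$">
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
    Header set Cache-Control "public, max-age=31536000"
</FilesMatch>
```

Frequently updated resources would instead get a short max-age (or no-cache), keeping them separate from the long-lived ones as the post suggests.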

Initial Apache setup

These are easy items to look up; I just had to look them up too many times, so it feels like a good idea to put them all in one place.

1. Create a virtual host so I don't have to sudo vim every file I change

Apache path (Debian/Ubuntu): /etc/apache2/sites-available/


Create a new config file, or copy an existing one (the filename must end in .conf for a2ensite to find it on Apache 2.4):

cp existing-site.conf new-site.conf

The file will look something like this (Apache 2.4 syntax; Require all granted replaces the old Apache 2.2 Order/Allow directives):

<VirtualHost *:80>
	DocumentRoot "/your/path/here"
	<Directory "/your/path/here">
		Options Indexes FollowSymLinks
		AllowOverride All
		Require all granted
	</Directory>
</VirtualHost>

AllowOverride All is needed if .htaccess files are used

Options FollowSymLinks is required for mod_rewrite rules in .htaccess files

Require all granted - important on Apache 2.4, where it replaces the old Order/Allow access directives

Stack Overflow post as proof


2. Run a2ensite to enable the virtual host

sudo a2ensite new-site

3. Enable mod_rewrite to allow friendly URLs in PHP

sudo a2enmod rewrite
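With the rewrite module enabled and AllowOverride All in place, a common front-controller .htaccess looks like the following — a sketch of the usual pattern, where index.php is an assumed entry point for the app:

```apache
# .htaccess in the DocumentRoot; needs AllowOverride All and mod_rewrite
RewriteEngine On
# Leave requests for real files and directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Send everything else to the front controller
RewriteRule ^ index.php [L]
```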

4. Check the config, then reload or restart (reload re-reads the config; a full restart is needed after enabling modules)

sudo apache2ctl configtest
sudo service apache2 reload
sudo service apache2 restart