HTTP caching

Browsers often cache when you do not want them to. Responses to POST requests, such as form submissions, are not cacheable by default according to RFC 2616, so we only need to worry about GET requests.

A default installation of Apache will not send Expires or Cache-Control headers, so caching will vary between browsers. Here are the headers sent by my web host for the root index.php.

wget --save-headers http://www.adamish.com/ -O - | head

HTTP/1.1 200 OK
Date: Tue, 02 Aug 2011 19:59:59 GMT
Server: Apache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

This means that browsers, proxies and other caches are not told how long to cache the page. Each one falls back on its own heuristics, so results will vary between visitors.

Compare this with the headers served by Facebook, for example, which combine a number of different options, all with the intent of switching off caching altogether. See RFC 2616 section 14 for the gory details.

HTTP/1.0 200 OK
Cache-Control: private, no-cache, no-store, must-revalidate
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
Content-Type: text/html; charset=utf-8

So it all depends on your content. Is it “static” with a regular update cycle, private to each user, or entirely dynamic? If it’s entirely dynamic then you’ll need headers like Facebook’s.
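As a rough sketch, the same set of no-cache headers could be sent from PHP along these lines. The helper name noCacheHeaderLines() is made up for illustration, and the header values are simply copied from the Facebook response above rather than being a recommended minimal set.

```php
<?php
// Hypothetical helper: returns the header lines that ask caches not to store the page.
// The values are copied from the Facebook response shown above.
function noCacheHeaderLines() {
    return array(
        'Cache-Control: private, no-cache, no-store, must-revalidate',
        'Expires: Sat, 01 Jan 2000 00:00:00 GMT', // a date safely in the past
        'Pragma: no-cache',                        // understood by HTTP/1.0 caches
    );
}

// Headers must be sent before any page output.
if (!headers_sent()) {
    foreach (noCacheHeaderLines() as $line) {
        header($line);
    }
}
```

Keeping the list in one function means every dynamic page can send exactly the same headers.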

If it has a regular update cycle then you could send headers that reflect this, like the ones below for 5 minutes. This gives you the performance benefit of caching, but with a short enough update cycle that visitors are never left seeing a week-old copy of your page.

Here is the code; note how care has been taken to stick to RFC 1123 for the date format…

$age = 60 * 5; // 5 mins
header('Expires: ' . gmdate("D, d M Y H:i:s \G\M\T", time() + $age));
header("Cache-Control: max-age=$age");

Static files should really be cached for performance reasons, but at some point you’ll need to update them, and then you’ll be stuck with people still seeing older versions. Several strategies are widely used for this; here they are in ascending order of correctness.

  • Link to files with a random number as a GET parameter, e.g. /js/horseworld.js?4743838 – non-deterministic, so the file is fetched again on every page load, and handling of query strings by caches is not guaranteed by the HTTP specification.
  • Link to files with a version as a GET parameter, e.g. /js/horseworld.js?123 – better, but still not guaranteed by the HTTP specification.
  • Link to files with a hash of the file contents as a GET parameter, e.g. /js/horseworld.js?37a29d5c9a69007eeda5832247f5a389 – better as it is actually linked to the file contents, but again not guaranteed by the HTTP specification.
  • Link to files with a version in the path, e.g. /js/horseworld.v1.2.js, and rename the file every time you modify it. Better, as the path part is actually respected by the HTTP caching specification.
  • Link to files with the file modification time in the path, e.g. /js/horseworld.v1312315748.js, and rewrite to /js/horseworld.js with mod_rewrite. Better still, as the link is updated automatically when the file is modified.
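For comparison, the content-hash strategy from the third bullet might be sketched as below. The function name getHashedUrl() and its $docRoot parameter are assumptions made for illustration; passing the document root in, rather than reading $_SERVER, just makes the function easy to try from the command line.

```php
<?php
// Hypothetical helper illustrating the content-hash strategy.
// $docRoot is the directory that URLs are resolved against.
function getHashedUrl($url, $docRoot) {
    $localfile = rtrim($docRoot, '/') . '/' . ltrim($url, '/');
    if (!is_file($localfile)) {
        return $url; // unknown file: leave the link unchanged
    }
    // md5_file() hashes the file contents, so the URL only changes when the contents do.
    return $url . '?' . md5_file($localfile);
}
```

In a page this would be called as getHashedUrl('/js/horseworld.js', $_SERVER['DOCUMENT_ROOT']). Note that the file is hashed on every call, which may be too slow for large files on a busy site.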

So how do we implement the mod_rewrite scheme? Firstly we need some PHP to rewrite links so that they include the file modification time.

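	// Insert the file's modification time into its name,
	// e.g. /js/main.js becomes /js/main.cachev_1312315748.js
	function getCacheVUrl($url) {
		$localfile = $_SERVER['DOCUMENT_ROOT'] . "/" . $url;
		if (file_exists($localfile)) {
			$base = basename($url);
			$dir = dirname($url);
			// Split the name at its last dot: $m[1] is the stem, $m[2] the extension.
			// The pattern is anchored so extensions such as "mp3" are kept whole.
			if (preg_match('/^(.+)\.([a-z0-9]+)$/i', $base, $m)) {
				$base = $m[1] . ".cachev_" . filemtime($localfile) . "." . $m[2];
			}
			// dirname() drops the trailing slash except for the root directory
			if ($dir != "/") {
				$dir = $dir . "/";
			}
			$retVal = $dir . $base;
		} else {
			// Unknown file: leave the link unchanged
			$retVal = $url;
		}
		return $retVal;
	}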

We’ll also need a mod_rewrite rule in the .htaccess file (mod_rewrite must be enabled too)…

RewriteRule  ^(.+)\.cachev_[0-9]+\.(.+)$  $1.$2 [L]
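One way to sanity-check the pattern without touching the server is to run the same regular expression through PHP's preg_replace(). This is only a sketch: stripCacheV() is a made-up name, and it mimics the rule's pattern rather than mod_rewrite itself. The sample paths have no leading slash because that is how a rule in .htaccess sees them.

```php
<?php
// Hypothetical helper: applies the same pattern as the RewriteRule using PHP's PCRE functions.
function stripCacheV($path) {
    return preg_replace('/^(.+)\.cachev_[0-9]+\.(.+)$/', '$1.$2', $path);
}

echo stripCacheV('js/main.cachev_1312315748.js'), "\n"; // js/main.js
echo stripCacheV('js/main.js'), "\n";                   // js/main.js (no match, unchanged)
```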

And finally change links in pages…

<script type="text/javascript" src="<?php echo getCacheVUrl("/js/main.js"); ?>"></script>