Critical Static Page Caching

By Mike Gifford on April 24, 2007

Note: We've discontinued this process and are using Drupal's boost module as it now fairly effortlessly allows us to produce cached versions of all pages for anonymous users.

Some day there will be a Drupal module that effortlessly allows users to build static caches of their entire site, so that plain old HTML can be delivered to 95% of your visitors and you can choose the refresh rate that is appropriate for you. Most sites have old content that changes only every few months, if at all, yet however many times it has been crawled, search engines seem keen to load each page once again, just to make sure. For much of this content you never want to waste another cycle of your server's CPU loading PHP and hitting MySQL to deliver it, because it isn't really dynamic content.

Unfortunately this solution doesn't exist just yet, for any CMS that I am aware of. There have been some excellent performance advancements within the Drupal core code in Drupal 5, but some pages are going to be very resource intensive to generate. There is an alpha module build for 4.7, considerable discussion about this on Drupal's issue tracker and even a group dedicated to high performance sites. But there wasn't anything that seemed to be ready for production that would generate static caching for my sites.

Pages like the home page, in particular, get hit regularly but don't change all that often. When you are trying to make an impression on a new client, you need your page to load immediately so that you make the best use of whatever time they are going to give you, none of which should be spent waiting for the page to load. There are also times when a single page gets slashdotted and you need a static version of just that one page while it is under heavy load.

I came up with the following hack after talking to a friend who was critiquing Content Management Systems and mentioned that simply having a cache of the front page would do a lot to improve the performance of a site. I realized that I hadn't seen a way to do this, so I cobbled one together using some simple command line tools, cron & a simple Apache configuration.

We are primarily developing on Drupal and doing considerable work with Drupal multi-site installations, so we wanted a means to allow domain-specific caching.

I created a shell script (/home/drupal/bin/static_cache.sh) with the following:

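#!/bin/sh
# Fetch the home page into a temporary file, then swap it into place
# only if wget succeeded, so a failed download never replaces the cache.
cd /home/drupal/files/tmp || exit 1
wget -q -O index.html.tmp http://openconcept.ca/index.php && mv index.html.tmp index.html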
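
A fetch-and-rename script like this can still publish an empty or truncated page if the site is struggling at the moment cron fires. As a minimal sketch of one way to guard against that, the hypothetical helper below (publish_if_valid is my own name, not part of the original script) only moves the download into place when it is non-empty and contains a closing </html> tag. Treating that tag as the sign of a complete page is an assumption about the site's theme.

```shell
#!/bin/sh
# Sketch: only replace the cached copy when the new download looks complete.
# publish_if_valid is a hypothetical helper, not part of the original script.
publish_if_valid() {
    tmp_file=$1
    target=$2
    # Reject empty downloads and pages cut off before the closing </html> tag.
    if [ -s "$tmp_file" ] && grep -qi '</html>' "$tmp_file"; then
        mv "$tmp_file" "$target"
    else
        # Discard the bad download and leave the existing cache untouched.
        rm -f "$tmp_file"
        return 1
    fi
}
```

In the script this would be called after the download, for example: wget -q -O index.html.tmp http://openconcept.ca/index.php; publish_if_valid index.html.tmp index.html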

Remember that this script needs to be made executable in order for it to work - chmod +x /home/drupal/bin/static_cache.sh - does the trick. I then added an AliasMatch directive to the Apache configuration that matches requests whose path is just a / (i.e. the home page) and maps them to my temporary file.

AliasMatch ^/$ /home/drupal/files/tmp/index.html

The cron entry below runs the script at the top of every hour, as none of these sites required anything more frequent than this.

# Min Hour Day Month Weekday
00 * * * * /home/drupal/bin/static_cache.sh >/dev/null
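
For the multi-site case mentioned earlier, the same idea might be extended so that each domain gets its own cached home page. The sketch below is an assumption about how that could look, not the script we actually ran: the per-domain directory layout under /home/drupal/files/tmp and the names cache_file_for and refresh_site are hypothetical. Each site's VirtualHost would also need its own AliasMatch pointing at the matching file.

```shell
#!/bin/sh
# Sketch: one cached home page per domain in a multi-site install.
# The CACHE_ROOT layout and the function names are assumptions for illustration.
CACHE_ROOT=${CACHE_ROOT:-/home/drupal/files/tmp}

# Print the cache file path used for a given domain.
cache_file_for() {
    printf '%s/%s/index.html\n' "$CACHE_ROOT" "$1"
}

# Fetch one domain's home page into its own directory.
refresh_site() {
    domain=$1
    target=$(cache_file_for "$domain")
    mkdir -p "$(dirname "$target")" || return 1
    # Download to a temporary name and rename only if wget succeeded.
    wget -q -O "$target.tmp" "http://$domain/index.php" && mv "$target.tmp" "$target"
}

# Refresh every domain passed on the command line.
for domain in "$@"; do
    refresh_site "$domain" || echo "failed to refresh $domain" >&2
done
```

Cron would then call the script with the list of domains as arguments, and a failure on one domain would be reported on standard error without stopping the others.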

This will work for any version of Drupal, and indeed should work for other CMS applications like Joomla, WordPress & eZ Publish.

About The Author

Mike Gifford is the founder of OpenConcept Consulting Inc, which he started in 1999. Since then, he has been particularly active in developing and extending open source content management systems to allow people to get closer to their content. Before starting OpenConcept, Mike had worked for a number of national NGOs including Oxfam Canada and Friends of the Earth.