Analysis of GoC .gc.ca Domains

By: Mike Gifford on May 30, 2012

I ran into Ben Balter's article Analysis of Federal Executive .Govs and really liked the simple but effective analysis he did of the US government's websites. The technology behind it isn't simple, though: his Site Inspector and WordPress Domain Inventory plugin are great. Together they do a good job of tracking site status, support for non-WWW and IPv6, use of a Content Delivery Network (CDN), choice of Content Management System (CMS), cloud provider, analytics, JavaScript libraries, and HTTPS support. I hope to find some time to add enhancements to this project, but first I was interested in simply gathering data about the nature of the Government of Canada's (GoC) websites.

Ben was able to use a public list of domains published on Data.gov to begin his analysis, but unfortunately the Canadian Open Data site doesn't seem to provide one. Instead, I built a list from the main GoC site and posted just the gc.ca domains to this Wikipedia page, where I hope it will be maintained and expanded. The US list is broken down by agency, but unfortunately I don't have this type of metadata for the domains I've gathered.
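A minimal sketch of that filtering step, assuming the links scraped from the main GoC site are already in a Python list of URL strings. The function name `gc_ca_domains` and the sample URLs are illustrative, not taken from the original tooling:

```python
from urllib.parse import urlsplit


def gc_ca_domains(urls):
    """Return sorted, de-duplicated hostnames that fall under .gc.ca.

    A leading "www." is dropped so that www.example.gc.ca and
    example.gc.ca are counted as one domain.
    """
    domains = set()
    for url in urls:
        # urlsplit only fills in .hostname when the URL has a scheme,
        # so add one to bare entries such as "tbs-sct.gc.ca".
        if "://" not in url:
            url = "http://" + url
        host = urlsplit(url).hostname  # already lower-cased, port removed
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]
        if host == "gc.ca" or host.endswith(".gc.ca"):
            domains.add(host)
    return sorted(domains)


links = [
    "http://www.tbs-sct.gc.ca/index-eng.asp",
    "https://tbs-sct.gc.ca/",
    "http://www.canada.gc.ca",
    "http://www.crr.ca/",
]
print(gc_ca_domains(links))  # ['canada.gc.ca', 'tbs-sct.gc.ca']
```

Stripping "www." before de-duplicating keeps one entry per domain, which matches how the non-www check treats them.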

I surveyed over 220 government domains using this tool and was surprised by some of my findings, even though they are very much in step with those in the US.

Summary of Canadian Government sites:

  • 27 still had no non-www support. Yikes: this is a one-line fix in an Apache config file
  • There was limited support for next-generation IPv6 (see TBS statement)
  • 76 supported HTTPS for encrypted transactions
  • Only 4 use the Akamai CDN to speed up delivery
  • None seem to be using the big US-based cloud providers
  • Google Analytics is only used by 13 sites, though some are probably using more traditional log analysis
  • The most popular JavaScript library in the world, and also in the GoC, is jQuery (42), which is also the library adopted by the Web Experience Toolkit. There is also some use of Dojo (1), MooTools (6) and Prototype (3)
  • Drupal (17) is also the most popular CMS, followed by WordPress (7), SharePoint (3) and finally Joomla (1). More comments & links below.
  • Web servers: Microsoft-IIS 5.0 to 7.5 (104), Apache 2.0.63 to 2.2.3 (69), Zeus 4.2 to 4.3 (13), Lotus-Domino (6), Oracle-Application-Server 9.0.2 to 10.1.3.4.x (6), Zope 2.7.8-final (1), nginx (1)
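The non-www and IPv6 items can be approximated with plain DNS lookups. The sketch below is hypothetical (it is not Site Inspector's code): it treats a domain as having non-www support if the bare name resolves at all, and as IPv6-capable if the www host has an AAAA address. A name that resolves does not prove the site actually serves pages there, so a fuller check would follow up with an HTTP request. The `resolver` parameter is an assumed hook so the lookups can be swapped out for testing.

```python
import socket


def resolves(host, family=socket.AF_UNSPEC, resolver=socket.getaddrinfo):
    """True if `host` resolves to at least one address of `family`."""
    try:
        return bool(resolver(host, 80, family, socket.SOCK_STREAM))
    except (socket.gaierror, UnicodeError):
        return False


def dns_checks(domain, resolver=socket.getaddrinfo):
    """Rough DNS-level versions of the non-www and IPv6 checks.

    `resolver` defaults to socket.getaddrinfo; it is a hypothetical hook
    so that tests can substitute canned answers for live DNS.
    """
    bare = domain[4:] if domain.startswith("www.") else domain
    return {
        # Non-www support: does the bare domain resolve at all?
        "non_www": resolves(bare, resolver=resolver),
        # IPv6 support: does the www host have an AAAA (IPv6) address?
        "ipv6": resolves("www." + bare, socket.AF_INET6, resolver),
    }


# Example (needs network access):
# print(dns_checks("www.tbs-sct.gc.ca"))
```

Results depend on live DNS at scan time, so a re-crawl can legitimately change the counts.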

3rd Party Services

Content Delivery Networks (CDNs) are increasingly being used to improve response times for websites.
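One common way to spot Akamai is to look for its hostnames in a site's CNAME chain. The sketch below assumes that approach (it is not necessarily how Site Inspector does it), and the suffix list is a heuristic rather than an official or complete one. It uses the standard library's `socket.gethostbyname_ex`, which returns a host's canonical name, its aliases and its addresses:

```python
import socket

# Hostname suffixes commonly seen in Akamai CNAME chains. This is a
# heuristic list, not an official or exhaustive one.
AKAMAI_SUFFIXES = (".akamai.net", ".akamaiedge.net", ".edgekey.net", ".edgesuite.net")


def looks_like_akamai(names):
    """True if any name in a resolution chain ends with an Akamai suffix."""
    return any(name.lower().rstrip(".").endswith(AKAMAI_SUFFIXES) for name in names)


def uses_akamai(host, lookup=socket.gethostbyname_ex):
    """Resolve `host` and inspect its canonical name plus any aliases."""
    try:
        canonical, aliases, _addresses = lookup(host)
    except OSError:  # socket.gaierror and socket.herror both subclass OSError
        return False
    return looks_like_akamai([canonical, *aliases])


# Example (needs network access):
# print(uses_akamai("www.example.gc.ca"))
```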

The following use the Akamai CDN:

Google Analytics is a powerful tool, but because of Google's service agreements it has only recently been adopted by several government departments. As far as this script can tell, only these sites are using analytics tools that give them a good understanding of how citizens actually use their sites.
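Detecting Google Analytics from a page's HTML can be sketched with two regular expressions: one for the loader script URLs (`ga.js`, `urchin.js`, `analytics.js`) and one for `UA-` property IDs. This is an assumed heuristic, not the original script's logic; it will miss tracking code loaded indirectly and can be fooled by IDs that merely appear in page text. The sample page is a simplified, made-up snippet.

```python
import re

# Loader scripts of Google Analytics snippets, and "UA-XXXXXX-Y" property IDs.
GA_SCRIPT = re.compile(r"google-analytics\.com/(?:ga|urchin|analytics)\.js", re.IGNORECASE)
GA_PROPERTY = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def google_analytics_ids(html):
    """Return the GA property IDs found in a page.

    Returns ["unknown"] if a loader script is present but no ID is visible
    (for example when the ID is set in an external file), and [] if
    neither is found.
    """
    ids = sorted(set(GA_PROPERTY.findall(html)))
    if ids:
        return ids
    return ["unknown"] if GA_SCRIPT.search(html) else []


SAMPLE = """<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-1234567-1']);
  _gaq.push(['_trackPageview']);
</script>
<script src="http://www.google-analytics.com/ga.js"></script>"""

print(google_analytics_ids(SAMPLE))  # ['UA-1234567-1']
```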

Sites using Google Analytics:

Open Source CMS Solutions

Drupal is used by:

WordPress is used by:

Joomla is used on http://crr.ca, but I was a bit surprised not to see any TYPO3 instances.

Other CMS Solutions

SharePoint was used by 3 sites, but no other proprietary CMS was detected in this survey. SharePoint is used more extensively for intranets.

Interwoven is known to have been deployed at DFAIT, Agriculture Canada, Industry Canada and Canada Post. Several CMS solutions essentially bake their output into flat HTML files, which makes them very difficult to sniff out. It might be possible to identify them by searching for signature URLs or unique files, but this may not work in all instances.
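Probing for signature files might look something like the following sketch. The paths in `SIGNATURE_PATHS` are illustrative guesses for the open source CMSs discussed above (no signature for Interwoven is assumed), and sites that answer 200 for every URL are screened out first with a path that should not exist. The `fetch` parameter is a hypothetical hook so the HTTP layer can be replaced in tests; some servers reject HEAD requests, which this sketch simply treats as "no match".

```python
import urllib.error
import urllib.request

# Illustrative signature paths for the open source CMSs discussed above.
SIGNATURE_PATHS = {
    "Drupal": "/misc/drupal.js",
    "WordPress": "/wp-login.php",
    "Joomla": "/administrator/",
}


def head_status(url, timeout=10):
    """HTTP status code of a HEAD request to `url`, or None on network errors."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise
        return err.code
    except OSError:  # URLError, timeouts, DNS failures
        return None


def guess_cms(base_url, fetch=head_status):
    """Names of CMSs whose signature path answers 200 on this site."""
    base = base_url.rstrip("/")
    # A site that answers 200 for every URL ("soft 404") would match all
    # signatures, so give up if a path that should not exist succeeds.
    if fetch(base + "/no-such-page-cms-probe") == 200:
        return []
    return [name for name, path in SIGNATURE_PATHS.items() if fetch(base + path) == 200]


# Example (needs network access):
# print(guess_cms("http://www.example.gc.ca"))
```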

I had never heard of the Canada Science and Technology Museums Corporation, but it is a Crown corporation running several government sites with what is probably a custom-built software solution. If there are other solutions like this that don't have a critical mass of global users, they are also unlikely to show up in this list.

I decided to write a quick script to read the generator meta tag and pull out other, less common CMS solutions. Using this, I found sites reporting the use of CommonSpot Content Server, PRISM(TM), DotNetNuke and FrontPage. This doesn't necessarily reflect the back end, however.
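The generator check could be written with Python's built-in `html.parser`; this is a sketch of the idea, not the script actually used for the survey. It collects the `content` of every `<meta name="generator">` tag, matching the attribute value case-insensitively:

```python
from html.parser import HTMLParser


class GeneratorParser(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        # Self-closing <meta ... /> tags also arrive here, via the default
        # handle_startendtag().
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "generator":
            content = attributes.get("content", "").strip()
            if content:
                self.generators.append(content)


def generator_tags(html):
    parser = GeneratorParser()
    parser.feed(html)
    parser.close()
    return parser.generators


page = '<head><meta name="Generator" content="DotNetNuke " /></head>'
print(generator_tags(page))  # ['DotNetNuke']
```

The tag is self-reported, so it may be missing, outdated or deliberately altered.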

I'd like to see a few other checks added to this script. I'd like to know:

  • Are RSS, RDFa or Atom feeds available?
  • Do pages pass automated validation for accessibility & HTML compliance?
  • Which version of HTML is being supported?
  • How long do pages take to load?
  • Which versions of known CMSs are in use?
  • Are there privacy policy statements (terms & conditions)?
  • Are there links to social media sites like Twitter, Facebook & YouTube?
  • Is the site mobile-ready (either a responsive theme or the presence of an m.example.com)?
  • Which related sub-domains are linked to from scanned pages?
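A few of these checks (feeds, social media links, and a rough mobile-readiness signal) can be read straight from a page's HTML. The sketch below is hypothetical: it treats a `<meta name="viewport">` tag as a hint of a responsive theme, only recognises the three social networks named above, and does not check for an m. sub-domain, which would need a separate DNS lookup.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
SOCIAL_SITES = {"twitter.com": "Twitter", "facebook.com": "Facebook", "youtube.com": "YouTube"}


class ExtrasParser(HTMLParser):
    """Record advertised feeds, social media links and a viewport meta tag."""

    def __init__(self):
        super().__init__()
        self.feeds = []
        self.social = set()
        self.has_viewport = False

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "alternate" in a.get("rel", "").lower().split():
            # Feed autodiscovery: <link rel="alternate" type="application/rss+xml" ...>
            if a.get("type", "").lower() in FEED_TYPES:
                self.feeds.append(a.get("href", ""))
        elif tag == "a":
            try:
                host = urlsplit(a.get("href", "")).hostname or ""
            except ValueError:  # malformed URL
                return
            for domain, label in SOCIAL_SITES.items():
                # Match the domain itself and sub-domains such as www. or m.
                if host == domain or host.endswith("." + domain):
                    self.social.add(label)
        elif tag == "meta" and a.get("name", "").lower() == "viewport":
            # A viewport tag is only a hint of a responsive (mobile-ready) theme.
            self.has_viewport = True


def page_extras(html):
    parser = ExtrasParser()
    parser.feed(html)
    parser.close()
    return {
        "feeds": parser.feeds,
        "social": sorted(parser.social),
        "viewport": parser.has_viewport,
    }
```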

Even so, this script already provides a lot of information that institutions have no means of keeping track of. Tracking the tools used for sites inside & outside organizational firewalls is often quite difficult.

As noted in Ben Balter's original post:

Please note: This data is to be treated as preliminary and is provided “as is” with no guarantee as to its validity. The source code for all tools used, including the resulting data, is available in GitHub. If you find a systemic error, I encourage you to fork the code and I will try my best to recrawl the list to improve the data’s accuracy.

About The Author

Mike Gifford is the founder of OpenConcept Consulting Inc, which he started in 1999. Since then, he has been particularly active in developing and extending open source content management systems to allow people to get closer to their content. Before starting OpenConcept, Mike had worked for a number of national NGOs including Oxfam Canada and Friends of the Earth.