GovLoop - Knowledge Network for Government

Having a website crash due to high traffic is a failure of management, not load

Today has provided an interesting lesson for several organisations, with the crash of both the David Jones and ClickFrenzy websites in Australia.

But first, some background.

ClickFrenzy is a new 24-hour sale for Australian online retailers starting from 7pm on Tuesday 20 November.

Based on the US 'Cyber Monday' sale, which now attracts over 10 million buyers, ClickFrenzy was designed to entice Australian online shoppers to buy from local online retailers by offering massive discounts on product prices for a short period of time.

The event was announced over a month before it was due to start and has been promoted through newspapers, online and in some retail stores, with the ClickFrenzy team expecting thousands of shoppers to log on, likening it to a "digital boxing day sale".

I've kept an eye on the ClickFrenzy site and signed up to receive an email alert when the sale began.

Just before the sale started I hopped back onto the ClickFrenzy site to see how it was going, and only saw a basic page of text, with no graphics or formatting. Puzzled, I tried reloading - and the site wouldn't load at all.

That's when I hopped onto Twitter and learnt from the #clickfrenzy hashtag that the ClickFrenzy site had already crashed from the load and no-one had any idea when it would be back online.

This meant that the list of participating retailers (many of which had been kept secret) was inaccessible. No shopper knew who had the specials, meaning few sales could occur. Of the retailers that were known to be participating, many had their sites crash too (such as Priceline and Myer).

In competition with ClickFrenzy, David Jones had decided to run its own independent 24-hour sale over a similar time period. Their sale, named 'Christmas Frenzy', was to be run from their main website.

How did their launch go? Their site also crashed, and was down for several hours, taking down not only the shopping site but all their corporate information.


So we had two major online sales on the same day from Australian retailers, and both experienced crashes due to the volume of traffic.

What was to blame? Both claimed the failure was due to unprecedented demand. So many people tried to get onto both sites that their servers could not cope (the same reason given for the mySchools website issues at launch in 2010 and the CFA website issues during the Victorian fires in 2009).

Let's unpick that reasoning.

The world wide web is twenty years old. Amazon.com is 18 years old. The US 'Cyber Monday' sale is six years old.

David Jones is an experienced retailer with significant IT resources, and has been operating an online store for some time. Their Christmas Frenzy sale was planned and well promoted.

Click Frenzy is being run by experienced retailers as well. They built an emailing list of people interested in the event and also widely promoted the sale. The retailers supporting them are large names and operate established online shopping sites as well.

In both cases the organisers had a wealth of experience to draw on: the growth of Amazon, the US Cyber Monday sales, their own website traffic figures and email list sign-ups, not to mention a host of public examples of how to manage web server load well, and badly, from media sites, social networks and even government sites (such as the mySchools and CFA examples above).

There are many IT professionals with experience in managing rapid load changes on web servers.

There are scalable hosting solutions which respond almost instantly to fast-increasing loads, such as during an emergency or with breaking news, and 'scale up' the site to support much larger numbers of simultaneous users. (Though in the case of Christmas Frenzy and Click Frenzy a large increase in load was expected, rather than unexpected.)
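To make 'scaling up' a little more concrete, here is a minimal sketch of the kind of rule a scalable hosting setup applies when traffic rises. It is an illustration only: the function name desired_servers, the 200 requests per second each server is assumed to handle, the 50% target utilisation and the floor and ceiling on server numbers are all figures I have made up for the example, not settings from any real hosting provider or from either of the sites discussed here.

```python
import math


def desired_servers(requests_per_second: float,
                    capacity_per_server: float = 200.0,
                    target_utilisation: float = 0.5,
                    min_servers: int = 2,
                    max_servers: int = 100) -> int:
    """Return how many servers to run for the current request rate.

    All defaults are hypothetical. Each server is only planned to run at
    target_utilisation of its capacity, so there is room for a sudden spike.
    """
    if requests_per_second < 0:
        raise ValueError("requests_per_second must not be negative")
    # Capacity we are willing to use on each server, leaving the rest spare.
    usable_capacity = capacity_per_server * target_utilisation
    needed = math.ceil(requests_per_second / usable_capacity)
    # Never drop below the floor, never grow past the agreed ceiling.
    return max(min_servers, min(max_servers, needed))


if __name__ == "__main__":
    for rate in (50, 500, 5000, 50000):
        print(rate, "requests/s ->", desired_servers(rate), "servers")
```

Note that the largest rate in the example runs into the ceiling of 100 servers. Where that ceiling sits, and who pays for it, is a budget decision rather than a technical one.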

There are even automated processes for testing how much load a website will be able to bear by simulating the impact of thousands or millions of visitors.
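A load test does not need to be elaborate to be useful. The sketch below, using only the Python standard library, fires a batch of simultaneous requests at a web address and reports how many failed and how long the rest took. It is a toy under stated assumptions: DemoHandler is a stand-in site that runs on the local machine so the example is self-contained, the request counts are tiny, and the names timed_request, summarise and run_load_test are my own. A real test would point at a staging copy of the site and would normally use a dedicated load-testing tool.

```python
import statistics
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any proxy settings so requests go straight to the target.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site under test."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def timed_request(url, timeout=5.0):
    """Fetch url once and return (succeeded, seconds_taken)."""
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            response.read()
            ok = response.status == 200
    except Exception:
        ok = False  # any error counts as a failed visit
    return ok, time.perf_counter() - start


def summarise(results):
    """Turn a list of (succeeded, seconds) pairs into a small report."""
    latencies = sorted(seconds for ok, seconds in results if ok)
    report = {
        "requests": len(results),
        "failures": sum(1 for ok, _ in results if not ok),
        "median_s": None,
        "p95_s": None,
    }
    if latencies:
        report["median_s"] = statistics.median(latencies)
        # Approximate 95th percentile by position in the sorted list.
        report["p95_s"] = latencies[int(0.95 * (len(latencies) - 1))]
    return report


def run_load_test(url, total_requests=100, concurrency=10):
    """Send total_requests to url, at most concurrency at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_request, [url] * total_requests))
    return summarise(results)


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        print(run_load_test(f"http://127.0.0.1:{port}/"))
    finally:
        server.shutdown()
        server.server_close()
```

The numbers to watch are the failure count and the slowest response times as the number of simultaneous visitors is raised. The point at which they start climbing is the load the site can actually bear, and it is far better to find that out before launch day.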


In other words, there's no longer any technical reason why any organisation should have their website fail due to expected or anticipated load.


Load is not a reason, it is a justification.

We have the experience, knowledge and technology to manage load changes.

What the Click Frenzy and Christmas Frenzy failures illustrate is that some organisations fail to plan for load. They haven't learnt from the experience of others, don't invest in the right infrastructure and may not even test their sites.

They are effectively crossing their fingers and praying that their website won't crash.

A website crashing when it receives a high level of load that could be expected or planned for is crashing due to a failure of management.


The next time your agency's management asks you to build a website which is expected to have a big launch or large traffic spikes, ask them if they're prepared to invest the funds necessary for a scalable and tested website, built on the appropriate infrastructure to mitigate the risk of sudden large increases in traffic.

If they aren't, then tell them to cross their fingers and pray - and that a website crash due to high traffic is a failure of management, not load.

You might even get a Downfall parody video to memorialise the failure - as Click Frenzy received within two hours of their launch crash.






Comment by GovLoop on November 21, 2012 at 8:38am

So true - holds the same for gov't sites that go down in times of emergency.  No reason these days
