The recent outage at Salesforce.com has created a buzz. Being responsible for the webMethods architecture and infrastructure at my company, I know all too well about unplanned outages and their effects on the customer. Blaming the software vendor for your outage? Well, it depends. Somewhere around 99.999% of all outages are caused by human error: not installing correctly, failing to test the architecture and its cluster capabilities, introducing change into the environment, not following procedures, operating in an unsupported configuration, and the list goes on. I have seen cases where the vendor's software was to blame, but it is pretty rare.
One important lesson I have learned over the years: architect your application and infrastructure with high availability in mind from the start. Most of your clients will not be willing to pay for it and will swear they don't need it. Then the first time it crashes, all hell breaks loose. Sometimes you have to think for them. If a business application is worth doing in the first place (somebody is paying for it), chances are people are going to need it most of the time, which means even small outages will cause you grief.
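To put a number on "small outages", here is a minimal sketch that converts an availability target into the downtime it allows per year. The function name and the targets in the loop are my own illustrative choices, not figures from any particular contract or SLA, and the calculation assumes a 365-day year.

```python
# Hypothetical helper for illustration: yearly downtime allowed by an
# availability target. Assumes a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Example targets chosen for illustration only.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.9% target leaves roughly 525.6 minutes a year, a little under nine hours, while 99.999% leaves only about five minutes. A single bad afternoon can use up the whole budget, which is why it pays to plan for high availability before the first crash rather than after it.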
There are a lot of different ways to build in high availability. Your available budget will help guide you. More on this topic to come.