Out-of-this-World stories from Rocket Software

 

Even IT companies get IT problems …

In the last two weeks, we at Rocket Software have faced two distinct problems with our own IT systems – a critical database server failed, and just today, our managed phone system went down for the morning. It’s only at times like these that our carefully prepared DR (Disaster Recovery) plans can be fully tested. Read on for the problems we faced and how we coped – and ask yourself, how prepared are you for similar eventualities?

We have a number of servers in our office at Rocket Software, and two of them are dedicated SQL Server database servers used for storing a wide range of information, from test databases for improving TempID and databases used for developing new products to our own internal CRM databases. One of these servers failed over the new year holiday, which resulted in the CRM database being off-line for Danny and Stephen, who both like to work from home during the holidays!

All our internal systems are automatically monitored 24 hours a day, come rain or shine, hail or snow, and any snags or outages are immediately reported to the IT team by email and text message. Paul and I were immediately on the case to determine the cause of the server outage, since we can both normally connect to any of our servers from home, at any time. The cause of this outage, however, was later determined to be a faulty network card, which cut off all network access to the server. Since we were both away from the office at the time, we couldn’t physically attend the server to diagnose the problem further or restore the server from backups. This resulted in the CRM database being off-line until the first working day back in the new year. Fixing the server was almost trivial – we simply replaced the network card and full access was restored. We have learned the lesson from this outage and now have failover connectivity to each of our servers.
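For the technically curious, here is a minimal sketch of the kind of check that sits behind monitoring with failover connectivity. It is an illustration only, not our actual monitoring system: the function names, the example addresses and the `alert` callback are all hypothetical, and we have assumed the database listens on TCP port 1433, the SQL Server default. The idea is to try each network path to a server in turn, raise an alert if the primary path is down, and raise a louder one if no path works at all.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_server(
    name: str,
    addresses: Sequence[str],          # primary address first, then failover paths
    port: int,
    alert: Callable[[str], None],      # hypothetical hook: email, text message, etc.
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Optional[str]:
    """Try each address in order; return the first reachable one, or None."""
    for host in addresses:
        if probe(host, port):
            if host != addresses[0]:
                # The server is up, but only via a backup network path.
                alert(f"{name}: primary path down, reachable via {host}")
            return host
    alert(f"{name}: unreachable on all {len(addresses)} network paths")
    return None
```

A scheduler would call something like `check_server("crm-db", ["10.0.0.1", "10.0.0.2"], 1433, send_text)` every minute or so, where `send_text` is whatever alerting function is to hand. Passing the probe in as a parameter keeps the logic easy to test without touching a real network.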

This morning saw us facing a different set of problems – our managed office telephone system failed. We were unable to make or receive calls on our 0845 or 0870 numbers. Obviously, as a company providing critical IT support services, we can’t be in a position where we can’t talk to our customers by phone. Fortunately, we have a second set of phones in the office which use VoIP – Voice over IP – which means that even if the main phones or the BT lines fail, we’ve always got a backup system in place. We’ve also got email, Twitter and the website to communicate the problem to our customers, and we immediately used all these channels to let everyone know what was going on. The phones came back up by lunchtime, but without a backup system in place we would have had very concerned customers unable to get in contact with us.

The one problem the phone outage exposed is that our telecoms provider was not able to immediately redirect our non-geographic 0845 and 0870 numbers to the alternative phone system. Had this facility been in place, nobody outside Rocket Software would even have been aware of the problem. This is something we’re currently looking to find a solution for.

All in all, however, it’s reassuring to know that we have active monitoring in place which alerts us to any problems, and backup systems ready to take over when those problems can’t be immediately resolved.

Why not give us a call to see how we can give you the same peace of mind?


Registered Office: Earl Mill Business Centre, Dowry Street, Oldham OL8 2PF