First, having been through similar crises myself (though none affecting such a massive number of users), I think we should congratulate the Skype team, especially the developers, on their dedication and determination not only to find the problem late last week but also to ensure it does not happen again. I appreciate that they put out regular information updates, and I also understand that their priority had to be resolving the technical challenge at hand. After all, four years of continuous operation with few, if any, outages while building and scaling to support as many as 10 million concurrent users online has to be some kind of record.
Personally, all my Skype services and devices, including IM+ and iSkoop on the BlackBerry as well as a couple of PC-free phones I am evaluating, are working again and have been up since about 2130 GMT Friday.
Skype has provided a high-level explanation of what happened on August 16 to deny the majority of its users access to Skype. Since then the Internet has been rife with attempts to explain the outage, ranging from Russian conspiracies to VoIP architecture issues. But, as Skype readily admits, it was a software problem within its own p2p architecture that caused the outage. Its explanation, however, raises some questions:
"The disruption was initiated by a massive restart of our user’s computers across the globe within a very short timeframe as they re-booted after receiving a routine software update." Which software update? Upgrades to Skype 3.5? Did the Skype bugfix release to 22.214.171.124 play any role? Or, on a broader scale, was it the latest Windows security update, which automatically rebooted PCs and so triggered a mass of simultaneous Skype logins? (Someone has noted that the next Windows patch update is due out on Tuesday, 9/11.)
"This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact." What action has been taken to address the lack of peer-to-peer network resources? Without getting into a war of words, can Skype respond to SightSpeed's comments on the Skype p2p architecture? Is Skype going to "seed" a network of SuperNodes to provide the backup peer-to-peer resources needed to prevent such an outage in future?
As a one-time manager of a telemarketing team, I had to deal with a (pre-VoIP) telecom infrastructure that would occasionally suffer from server/PBX crashes; every hour off the air meant thousands of dollars of lost business. (And the problem largely rested with an inherited, but poorly laid out, internal wiring mesh/mess.) Many small-to-medium businesses are starting to use Skype as a business-critical resource, and third parties are making significant investments in "Extras" that build on Skype for business. What assurance can be given to businesses that they will not encounter this problem again? Does Skype need to set up a premium business service that provides additional resources to prevent such an outage, at least for business users?
And, since Skype is part of a public company, eBay needs to reassure its shareholders (and analysts) that Skype remains a viable investment. Perhaps it could do so through a Skypecast or by using highspeedconferencing.com (provided such a service meets SEC regulations for information dissemination)?
I claim no monopoly on the questions this raises; these are simply the immediate ones that occurred to me as I read Skype's explanation post.
Tags: Skype, Skype outage 2007, Skype 3.5, Windows Update, p2p architecture