Focused capacity management has 'avoided 20 major incidents' at BSkyB
Cost implications 'huge' for IT managers who don't prioritise capacity, says broadcaster
A good IT capacity management policy has "avoided 20 major incidents" at BSkyB in recent months, according to the company's head of capacity, Alan Collier.
"It stops cock-ups," said Collier. "It's not very glamorous and doesn't have shiny lights on it, but in focusing on this we've recently avoided about 20 major incidents on billing systems, and on frontline customer-facing high profile products.
"We're spotting potential massive failures and major incidents, and stopping them well in advance. We're also saving a fortune by going into procurement and purchasing well in advance. We can know six months in advance how much storage we need to buy, what type, and where, which makes a huge difference to the quality of deals we can do with our purchasing."
Collier believes far too many organisations still neglect this area of IT management.
"I think that [failing to address] medium-term capacity decision making, and not understanding the capacity of your services, is really common," said Collier.
"It falls through the gaps. People constantly come up against the problems of not making those decisions.
"They'll be running out of storage or capacity. It's very common in the press for big websites that launch without enough capacity, or which can't grow fast enough."
Cloud's status as "the new big thing", said Collier, makes these kinds of problems even worse.
"Everyone's throwing their services at clouds, assuming those clouds have infinite capacity," said Collier.
"Whether you throw it into a public cloud and presume it's Amazon's problem to sort out capacity there, or you build your own, it's all just another set of infrastructure. You may be provisioning to it well, and managing it in a fast and dynamic way, but ultimately you've got to make the same decisions about your private cloud as you do about any other set of infrastructure."
Even so, said Collier, "people are rushing headlong into cloud assuming that everything will take care of itself in terms of capacity."
When Collier joined BSkyB two years ago, the firm's capacity management team had still not addressed "some basic problems".
"Sky had had a capacity management function for about four or five years, but they've never really got off the ground or achieved anything really useful, I think would be a polite way of putting it," said Collier.
"It's a problem of scale, and that's why doing it at Sky is a challenge. The IT scale here is huge; we're getting on to five petabytes of storage, we're getting on for 10,000 or more virtual machines, and between 400 and 500 different IT services, and that's just the ones you pretend to know about."
To supplement the 10 health monitoring tools Sky was using from "loads of random providers", Collier implemented BMC's Capacity Optimization (BCO) software. One attraction of the product was that it does not require agents to be installed throughout the IT estate. With 9,000 virtual machines and rising (Sky consumes VMs "like candy", he said), 12,000 servers and 600GB of stored data per month, an agent-based solution would have been "an absolute nightmare" to roll out.
"The BCO tool allows us to pull all the information from the back of the health monitoring tools. Initially you've got about 10 different health tools feeding into the BCO tool, and it puts all that in a database and then we can analyse that with a view to looking at long-term trends."
Collier believes the BCO software will help his team to impress on BSkyB's business managers the importance of far-reaching capacity planning.
"That's the icing on the cake, if you can start having that conversation with the business, in terms they can understand. So in our case rather than talking in terms of megabytes or gigabytes we talk in terms of number of customers, or number of TV channels with active users – that's a much more mature conversation you can have with a business, and that's our ultimate goal really."
Collier said that, ultimately, companies need to start realising that "if they don't get their capacity management right, the cost implications are huge, and the reputational damage from having your services go down, having to buy everything late and in a panic gives a bad impression of the IT department generally".