Poor coding and GUI caused America's largest-ever telephone outage
Poorly coded critical system helped Level 3 to accidentally blacklist itself
The largest-ever telephone outage in the US was caused by poor coding and an equally poor graphical user interface (GUI), according to the Federal Communications Commission's (FCC) official investigation into the incident.
In October 2016, telecoms firm Level 3 was hit by an outage that prevented its customers from making and receiving calls for 84 minutes.
Affecting customers across the US, the outage resulted in around 111 million calls being blocked. The FCC report indicates that the situation was largely caused by poor coding in the company's call blacklisting system, combined with an unintuitive GUI.
The outage occurred when a company employee entered phone numbers into the company's network management software in order to block rogue callers.
However, the technician left one field blank rather than making a duplicate entry, believing that it did not need to be filled in.
Instead of ignoring this, though, the software interpreted the empty field as a wildcard that effectively included all of the company's numbers.
Once the change was activated, even calls to emergency services were blocked.
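The FCC report does not publish the vendor's code, so the following is only a minimal sketch of how a blank field can behave as a wildcard. It assumes a hypothetical blocklist that matches numbers by prefix; the function names and the sample numbers are illustrative and are not taken from Level 3's software. In Python, as in many languages, the empty string is a prefix of every string, so a blank entry matches every number unless it is rejected first.

```python
from typing import Iterable, List


def is_blocked_naive(number: str, patterns: Iterable[str]) -> bool:
    """Hypothetical matcher: block a number if it starts with any pattern.

    A blank pattern is a prefix of every string, so it matches all numbers.
    """
    return any(number.startswith(p) for p in patterns)


def validate_patterns(patterns: Iterable[str]) -> List[str]:
    """Reject blank entries instead of silently treating them as wildcards."""
    cleaned = [p.strip() for p in patterns]
    blank = [i for i, p in enumerate(cleaned) if not p]
    if blank:
        raise ValueError(f"blank blocklist field(s) at position(s) {blank}")
    return cleaned


def is_blocked_strict(number: str, patterns: Iterable[str]) -> bool:
    """Same prefix match, but only after the patterns pass validation."""
    return any(number.startswith(p) for p in validate_patterns(patterns))


if __name__ == "__main__":
    rule = ["15550100", ""]  # second field left blank (illustrative values)
    print(is_blocked_naive("911", rule))  # the blank field matches everything
    try:
        is_blocked_strict("911", rule)
    except ValueError as err:
        print("rejected:", err)
```

In this sketch the strict variant fails loudly at input time, which is one way an operator could be warned before a rule takes effect.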
Overall, an estimated 2.9 million voice-over-IP customers and 2.3 million wireless users were caught up in the outage. Of the 111 million blocked calls, 109 million were internet-based.
Branding it an outage with "nationwide impact", the FCC described it as the largest US outage it has ever seen.
Defending Level 3, the FCC said that the company identified the incident within four minutes and began implementing emergency systems to mitigate the worst effects of the outage.
"The technician was unaware of the consequences of leaving a field in the network management software blank," it explained.
"Level 3 personnel had not previously observed or experienced this behaviour in their network management software. According to Level 3, this was the first time that anti-fraud operations in network equipment caused an outage."
According to the FCC report, Level 3 was using "vendor-supplied network management software" at the time. But it has since implemented a provisioning system to ensure that such a fault doesn't happen again.