The following is a report on the voice (phone) services outage experienced on 6 November 2018 between approximately 9:00 and 15:00.
The loss of voice services was caused by an outage at an upstream service provider, which was beyond our control. The upstream provider is essential to the delivery of our voice services.
The upstream provider's issues were the result of a cyber attack that came in two waves during the day. Unfortunately, this attack had more impact than it should have due to a small misconfiguration in the upstream provider's South Island datacentre. The initial DDoS attack was quickly mitigated but resulted in a saturated link that caused corruption of the voice services databases. These databases had to be restored from backup, which took some time. Unfortunately, the network restructuring needed to prevent a recurrence had not yet been carried out when the second wave of the DDoS attack arrived, and the same corruption occurred again. This put the upstream provider back to square one, delaying service restoration.
A Distributed Denial-of-Service (DDoS) attack is a targeted attack, typically against a particular organisation or network segment, where the attacker bombards their target with an enormous amount of data from many sources. The data is meaningless and the goal is to overload the target's systems to prevent them from working, not to gain unauthorised access to systems. It is basically a mischievous attack with tremendous consequences. These attacks often use people's home computers without the owners' knowledge, the computers being under the attacker's control because of a virus infection. The attacker uses this technique to maximise the resources at their disposal, maximise the amount of data they can send simultaneously, and remain anonymous.
A DDoS attack isn't particularly unusual or rare, and everyone connected to the internet is vulnerable to this type of attack. It should be expected, but it cannot easily be protected against without at least some downtime. This downtime would have been limited to around 10 minutes while measures were taken to redirect the bogus traffic. Unfortunately, an infrastructure implementation error by the upstream provider overloaded a link that should not have been affected, causing massive network traffic congestion that ended in database corruption, twice in one day.
The infrastructure configuration error that exacerbated the consequences of the cyber attack has now been corrected, so this failure should not happen again.
Nevertheless, we are somewhat frustrated with this outage and will be taking steps to mitigate our vulnerabilities.
We most sincerely apologise for the inconvenience this issue has caused to all!