Rogers is downed by a config error. The company promises to invest billions in reliability
- Post by: Irjar Jira
- July 25, 2022
Canadian telecom giant Rogers will spend C$10 billion ($7.7 billion) to ensure that the day-long outage earlier this month doesn’t happen again, its CEO has said.
We have also learned that Rogers believes the IT failure was caused by an early-morning configuration change that blocked traffic to the ISP’s core routers, leaving them inoperable for both wired and wireless customers.
CEO Tony Staffieri made the spending pledge on Sunday in a letter shared by Rogers. He described a four-point plan for “enhanced reliability” that he hopes will help customers “restore their confidence in Rogers, and win back [their] trust.” The ISP offers cellular, cable, and broadband internet services in Canada, and has roughly 10 million wireless subscribers alone.
Rogers says it is “physically separating our wireless and internet services in order to create an ‘always-on’ network.” Staffieri said the move would protect broadband internet customers in the event of a network outage: if the core of the cellular network goes down, it will not take out wired internet connectivity and, possibly, vice versa.
Staffieri also said Rogers has partnered with unnamed “leading tech firms” to conduct a thorough review of its network. Rogers said the resulting report would be made available to all Canadians in the wireless industry.
Rogers will make the multi-billion-dollar investment over the next three years. Staffieri said additional oversight, more testing, and “greater use” of artificial intelligence are all on the table to improve service. He did not elaborate, and Rogers spokespeople were unable to provide further information.
Outside of its own operations, Rogers said it is taking steps to ensure 911 call centers, which Rogers subscribers were unable to reach during the outage, remain accessible in future during any network downtime. “We have made meaningful progress on a formal agreement between carriers to switch 911 calls to each other’s networks automatically,” Staffieri said.
Canadian officials aren’t happy
On July 9, Rogers issued a memo from Staffieri. It stated that the outage was largely over and that it was due to “a network failure following a maintenance upgrade in our core network which caused some routers to malfunction early Friday morning.”
That didn’t satisfy Canada’s communications watchdog, which in a letter to Rogers on July 12 said the outage bore striking similarities – and justifications – to a screw-up in April 2021 that knocked Rogers’ services offline.
“Rogers has publicly attributed the cause of this [July 2022] service outage to a maintenance upgrade in its core network. This is reminiscent of another significant network outage in April 2021 that Rogers similarly attributed to a software update,” said Fiona Gilfillan, an executive director at the Canadian Radio-television and Telecommunications Commission (CRTC).
In the letter, Gilfillan said the CRTC needed “comprehensive” information about the events leading up to the outage, what transpired during and after it, and the provider’s plans to prevent another IT failure.
Rogers responded [.DOCX] to the CRTC last Friday. A redacted copy is available on the CRTC website.
However, the redacted version did make some new details public. For example, Rogers admitted that a configuration change made to its routers allowed an excessive amount of internet traffic to pass through them. We’re told this caused the ISP’s core devices to fail.
This sounds suspiciously like a BGP error. This is the relevant section:
“While every effort was made to prevent and limit outages, the consequences of the coding changes affected the network very rapidly,” the response stated.
After identifying the problem, the team “began the process to restart all the internet gateways, core, and distribution routers in order to establish connectivity to our wireless networks (including 9-1-1), enterprise, and cable networks that deliver voice, data, and video connectivity to our customers. The service was gradually restored, beginning in the afternoon and continuing throughout the evening.
“Although Rogers experienced some stability issues over the weekend, which did affect some customers, the network was effectively back up by Friday night.”
We’re also told the configuration update was the sixth stage of a seven-part maintenance project that had been under way for weeks. The update was carried out early in the morning to keep disruption to a minimum. In this instance, however, the entire ISP was affected.
“At 4:43AM EDT, a specific coding was introduced in our distribution routers which triggered the failure of the Rogers IP core network starting at 4:45AM,” the note added.
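For readers wondering how a router can be shielded from that kind of flood, one common safeguard is a per-session limit on the number of routes a device will accept. The Python sketch below is purely illustrative: the class name `PrefixGuard`, its fields, and the limit of 1,000 are hypothetical, and nothing here reflects Rogers’ actual equipment or configuration. It only shows the general idea of shutting a session down rather than letting an oversized update through.

```python
from dataclasses import dataclass


@dataclass
class PrefixGuard:
    """Hypothetical per-session route limit; not Rogers' actual configuration."""

    max_prefixes: int       # assumed ceiling the router can safely hold
    accepted: int = 0       # routes currently accepted on this session
    tripped: bool = False   # True once the session has been shut down

    def offer(self, count: int) -> int:
        """Offer `count` new routes and return how many were accepted."""
        if self.tripped:
            return 0  # session is down; nothing gets through
        if self.accepted + count > self.max_prefixes:
            # Over the limit: drop the session and everything learned on it
            # instead of letting the flood reach the routing table.
            self.tripped = True
            self.accepted = 0
            return 0
        self.accepted += count
        return count


guard = PrefixGuard(max_prefixes=1000)  # illustrative limit only
print(guard.offer(800))  # fits under the limit
print(guard.offer(500))  # would exceed the limit, so the session is shut
print(guard.tripped)
```

The trade-off in a guard like this is that it fails closed: one session is lost, but the device itself keeps running.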
Rogers’ radio stations were also affected by the outage. Some stations were out of service for several hours, while others were down for a few minutes. Some had to resort to impromptu solutions in order to resume transmission. CHST-FM, for example, was cut off from the airwaves throughout the morning.
“Evergreen programming was broadcast from the base of the transmitter, until the engineering team was able to establish a connection using an alternate Internet connection.”