Ensuring network uptime

VERTEL

By Trevor Manning, Director of Operations
Monday, 21 October, 2019

For telecommunications networks, achieving a high-availability end-to-end service requires reliable building blocks, as well as a set of alternatives that do not share any common failure points.

For life- and mission-critical networks, such as hospitals and utilities, or for business-critical networks where ICT is a strategic enabler, ultrahigh network uptime is essential. It’s well accepted that network redundancy is a crucial element in ensuring that uptime.

There are four key elements to consider in the network architecture: customer applications, the core network, access networks and interconnect networks.

Customer applications can be impacted by the local area network (LAN), on-premises infrastructure or the carrier’s network terminating unit (NTU). The core network is expected to provide 99.999% uptime, which equates to a little over five minutes of downtime per year.
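The arithmetic behind the "five nines" figure can be checked with a short sketch. The function name and the choice of an average 365.25-day year are assumptions made for illustration only.

```python
# Minutes in an average year (365.25 days allows for leap years); an assumption.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def annual_downtime_minutes(availability_percent: float) -> float:
    """Expected downtime per year, in minutes, for a given uptime percentage."""
    unavailability = 1 - availability_percent / 100
    return unavailability * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    minutes = annual_downtime_minutes(pct)
    print(f"{pct}% uptime -> {minutes:.2f} minutes of downtime per year")
```

At 99.999% this works out to roughly 5.26 minutes per year; each "nine" removed multiplies the downtime by ten.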

Access networks are less highly available, but fully duplicated hardware can be used to protect against hardware failure. Carriers interconnect in ultrahigh-availability data centres, where switching between networks and applications takes place.

In reality, especially in regional and remote areas, carriers may share some parts of their backhaul with other carriers. This can mean that true redundancy doesn’t exist, potentially compromising the real availability of the network.

If physical infrastructure is shared, any diversity built on top of it is rendered useless. For example, if dual fibres run through the same pit and that pit gets flooded, or the bundled cable is physically cut, then both services will experience an outage at the same time. Even a dual-carrier strategy, if deployed without media diversity, can suffer a joint outage.

Realising close to 100% uptime is only achievable by using multiple carriers and multiple media solutions to eliminate any common failure points.
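A rough way to see why a common failure point matters is the standard series/parallel availability calculation. The sketch below assumes the two paths fail independently of each other, and the availability figures are invented for illustration rather than measured values.

```python
def parallel_availability(a: float, b: float) -> float:
    """Availability of two independent paths where either one can carry the service."""
    return 1 - (1 - a) * (1 - b)


def with_common_element(a: float, b: float, common: float) -> float:
    """Same pair of paths, but both depend on one shared element (in series)."""
    return common * parallel_availability(a, b)


# Assumed figures, for illustration only.
path = 0.999         # availability of each access path
shared_pit = 0.9995  # availability of a pit or duct that both paths run through

independent = parallel_availability(path, path)
shared = with_common_element(path, path, shared_pit)

print(f"two independent paths:    {independent:.6f}")
print(f"two paths sharing a pit:  {shared:.6f}")
```

Under these assumptions the duplicated service can never be more available than the shared pit itself, however reliable each individual path is.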

Organisations relying on critical networks should ask themselves five key questions:

Are there any common points of failure in the network?
Do redundancy paths share any common physical locations?
Could the failure of a single medium take down the redundancy paths, even when multiple carriers are used?
Do protected services terminate in the same physical location?
Do protected services share a power supply?

Answering yes to any of these questions could pose a problem for uptime. Providing a life- or business-critical service requires careful consideration in terms of how true redundancy can be achieved. Protected services are only as good as the weakest element of the carrier’s network.
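The first few questions amount to comparing what each path depends on. A minimal sketch, assuming each path can be described as a set of dependency labels; the labels below are hypothetical and not taken from any real network.

```python
from __future__ import annotations


def common_failure_points(primary: set[str], backup: set[str]) -> set[str]:
    """Return every dependency that appears in both paths."""
    return primary & backup


# Hypothetical dependency inventories for a protected service.
primary_path = {"carrier-a", "fibre", "pit-17", "exchange-north", "power-feed-1"}
backup_path = {"carrier-b", "fibre", "pit-17", "exchange-south", "power-feed-2"}

shared = common_failure_points(primary_path, backup_path)
if shared:
    print("Common points of failure:", ", ".join(sorted(shared)))
else:
    print("No common points of failure found")
```

Here the two carriers differ, but the paths still share a medium and a pit, which is exactly the situation the questions are meant to expose.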

Even with all the electronics, paths and technologies duplicated, soft issues related to people and systems can bring the network down.

Carrier duplication with media diversity is the best form of network service diversity. When carefully constructed, it delivers true network and media redundancy, resulting in an ultrahigh-availability service for critical applications.

There are four key levels at which redundancy should be considered:

Hardware. At the hardware level, all electronic equipment should be fully protected with automatic service restoration.
Path. In microwave networks, at the path level, shallow ducting can cause multipath outages, so it’s important to choose a provider that uses techniques to protect against them. Heavy rain can also affect bands above 10 GHz, so each link should be individually designed for the rain conditions of the area it serves.
Network. At the network level, a ring or mesh topology should be used for the core.
Service. At the service level, it’s important to diversify both carrier and media to avoid failure.
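The network-level point can be illustrated with a small connectivity check: a ring stays connected when any single link fails, while a simple chain does not. This is a sketch on a hypothetical four-node core, not a model of any particular network.

```python
from collections import deque


def is_connected(nodes, links):
    """Breadth-first search: True if every node is reachable from the first one."""
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)


def survives_any_single_link_failure(nodes, links):
    """True if the network stays connected whichever single link is removed."""
    return all(
        is_connected(nodes, [link for link in links if link != failed])
        for failed in links
    )


# Hypothetical four-node core.
nodes = ["A", "B", "C", "D"]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chain = ring[:-1]  # the same nodes without the closing link

print("ring survives a single link failure: ", survives_any_single_link_failure(nodes, ring))
print("chain survives a single link failure:", survives_any_single_link_failure(nodes, chain))
```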

By ensuring redundancy at these levels, uptime can be improved and mission- and life-critical services can operate with more certainty. However, economic considerations can limit the amount of physical redundancy a business can afford.

A cost-effective way to mitigate this risk is to use software-defined wide area networking (SD-WAN) to prioritise critical traffic over multiple physical paths. SD-WAN can help overcome gaps in redundancy by steering traffic onto the clearest available path, so that critical traffic can still find a way through when one path degrades or fails.

Legacy WANs don’t include this functionality, but businesses don’t need to replace existing systems to gain the benefits of SD-WAN. Instead, they can overlay SD-WAN onto existing networks to achieve the required redundancy and reliability without incurring excessive costs or disruption.
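The idea of choosing the clearest path can be sketched as a simple selection over measured path health. The `PathHealth` fields, the 1% loss threshold and the path names are all assumptions for illustration; real SD-WAN products apply their own, vendor-specific policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathHealth:
    """Hypothetical health snapshot for one physical path."""
    name: str
    up: bool
    loss_percent: float
    latency_ms: float


def choose_path(paths: List[PathHealth], max_loss_percent: float = 1.0) -> Optional[PathHealth]:
    """Pick the path for critical traffic: lowest loss first, then lowest latency."""
    usable = [p for p in paths if p.up and p.loss_percent <= max_loss_percent]
    if not usable:
        # No path meets the loss threshold: fall back to any path that is up.
        usable = [p for p in paths if p.up]
    if not usable:
        return None
    return min(usable, key=lambda p: (p.loss_percent, p.latency_ms))


# Assumed measurements: the fibre path is down, two other media remain.
paths = [
    PathHealth("fibre-carrier-a", up=False, loss_percent=0.0, latency_ms=8.0),
    PathHealth("microwave-carrier-b", up=True, loss_percent=0.2, latency_ms=12.0),
    PathHealth("4g-carrier-c", up=True, loss_percent=0.8, latency_ms=45.0),
]

best = choose_path(paths)
print(best.name if best else "no path available")
```

With the fibre path down, critical traffic moves to the microwave path on a different carrier, which is the carrier-plus-media diversity described earlier working as an overlay.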
