Date: 2023-02-17

Introduction

Enterprise disaster recovery (DR) strategy has shifted as organizations rethink how to meet the availability demands placed on their information technology (IT) infrastructure. As more companies adopt the cloud, chief information officers (CIOs) and IT leaders are looking at cloud regional deployment as a way to achieve their DR requirements and meet their recovery service level agreements (SLAs).

When creating a DR strategy, careful consideration must be given to business continuity. An IT organization must evaluate its recovery time objective (RTO, the maximum tolerable time to restore service) and recovery point objective (RPO, the maximum tolerable window of data loss), and then ensure that the cloud architecture can meet both for every disaster type, within the organization's required recovery SLAs.
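The evaluation described above can be sketched as a simple check of each disaster scenario against the stated objectives. This is a minimal illustration, not a prescribed method: the class names, the `unmet_objectives` function, and every scenario and figure below are hypothetical and would need to be replaced with an organization's own estimates.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of data loss


@dataclass(frozen=True)
class DisasterScenario:
    name: str
    estimated_recovery_time: timedelta  # how long restoring service is expected to take
    estimated_data_loss: timedelta      # how much recent data is expected to be lost


def unmet_objectives(objectives, scenarios):
    """Return one message per scenario and objective that the architecture misses."""
    gaps = []
    for scenario in scenarios:
        if scenario.estimated_recovery_time > objectives.rto:
            gaps.append(f"{scenario.name}: recovery time exceeds RTO")
        if scenario.estimated_data_loss > objectives.rpo:
            gaps.append(f"{scenario.name}: data loss exceeds RPO")
    return gaps


if __name__ == "__main__":
    # Hypothetical objectives and estimates, for illustration only.
    objectives = RecoveryObjectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15))
    scenarios = [
        DisasterScenario("single zone failure", timedelta(minutes=5), timedelta(0)),
        DisasterScenario("full region outage", timedelta(hours=12), timedelta(hours=1)),
    ]
    for gap in unmet_objectives(objectives, scenarios):
        print(gap)
```

The point of the sketch is that the check runs over every disaster type, not only the most common one: an architecture that passes for a single zone failure can still miss both objectives for a region-wide outage.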

This paper evaluates key considerations for organizations building an IT or application architecture on the public cloud. It also debunks some common myths about cloud availability and the SLAs that cloud providers offer, helping organizations understand key issues that are often lost in the marketing speak.

Rising Cloud Adoption

There is no denying that cloud adoption is on the rise. Worldwide end-user spending on public cloud services is forecast to grow 20.7 percent to $591.8 billion in 2023, up from $490.3 billion in 2022, according to the latest forecast from Gartner, Inc.1 It is estimated that more than 85 percent of organizations will embrace a cloud-first principle by 2025. Cloud offers a way to scale fast, reduce costs, increase productivity, and reduce operations and management overhead. While moving to the cloud is practically inevitable, it requires companies to rethink their disaster recovery approach and fine-tune their IT architecture to handle outages.

Cloud Myths Debunked

Cloud adoption is growing rapidly because of the many advantages that cloud computing offers over traditional on-premises approaches. However, not all clouds are created equal, and not every cloud has a silver lining.

"My IT deployment is on the public cloud so I don't need disaster recovery."

Moving to the cloud is not a panacea. If there are concerns about data integrity and availability risks, moving to the cloud will not instantly mitigate them. The reality is that no cloud is immune to downtime, and most organizations are less protected than they think.

As part of a cloud evaluation process, it is essential to deploy a parallel DR solution that meets an organization's RTO/RPO needs. Otherwise, the organization, not the cloud service provider (CSP), is responsible for any issues that may arise.

CSPs do not typically include resilient DR as part of their stack. Instead, they recommend that organizations implement out-of-region DR for any business-critical applications. This point is very important and often lost in glossy marketing speak. Many organizations that already run their IT infrastructure on the cloud are blissfully unaware of how vulnerable they really are, especially if their business demands 24/7 availability to serve their end customers. This leads to the next problematic statement.

"My IT is safe because I have regional high availability (multi-AZ deployments)."

Most CSPs highlight features such as "in-region high availability" as a solution to most disaster recovery requirements. While in-region high availability can offer a good degree of resilience, the protection it offers is far from robust. In fact, the small print shows that most CSPs recommend an out-of-region standby environment for business-critical infrastructure and applications to meet RTO/RPO requirements.

Why? The truth is that bigger is not always better, and it is not uncommon for an entire region to go down, rendering in-region high availability null and void.

Even incidents in which only a particular service goes offline within the region can impact application availability.
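The reasoning above can be made concrete with some simple availability arithmetic. The sketch below is illustrative only and rests on strong simplifying assumptions: zone failures are independent of one another, region-wide failures are independent of zone failures and of failures in other regions, and failover is instantaneous and always succeeds. The function names and all availability figures are hypothetical, not measurements from any provider.

```python
def in_region_availability(zone_availability, zones, region_availability):
    """Availability of a multi-AZ deployment inside a single region.

    The deployment is up only if the region itself is up AND at least one
    zone is up, so region_availability is a hard ceiling on the result.
    """
    all_zones_down = (1 - zone_availability) ** zones
    return region_availability * (1 - all_zones_down)


def with_standby_regions(single_region_availability, regions=2):
    """Availability when any one of several independent regions can serve."""
    return 1 - (1 - single_region_availability) ** regions


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the shape of the result.
    region_up = 0.999
    for zones in (1, 2, 3, 6):
        print(zones, in_region_availability(0.99, zones, region_up))

    one_region = in_region_availability(0.99, 3, region_up)
    print("with out-of-region standby:", with_standby_regions(one_region))
```

Under these assumptions, adding zones pushes in-region availability toward the region's own availability but never past it, while a standby in a second region raises the ceiling itself.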

For a recent example, one only needs to look to Microsoft Azure Cloud's lengthy June 2022 outage. For 12 hours, customers had trouble connecting to the East US 2 region. The reason provided was "an unplanned power oscillation in one of our datacenters within one of our availability zones in the East US 2 region," according to a Microsoft report.2

Other recent examples of public cloud outages impacting complete regions and bringing down many services are listed below.

Footnotes

1. Gartner, Inflationary Pressures Creating a Push and Pull Effect for Cloud Spending, 2022

2. CRN, The 10 Biggest Cloud Outages Of 2022 (So Far), 2022

