Imagine the following: You work in the Infrastructure and
Operations (I&O) department of a large retailer with a significant online
e-commerce presence, and at noon today, a critical component of your
infrastructure failed. While you scramble to find a solution, your company's
Website, which brings in tens of thousands of dollars a day, is greeting all of
your potential customers with an error message, and the social networks are
starting to buzz. To make matters worse, today is not a normal day -- it's
one of the highest volume days of the year.
This nightmarish scenario is an extreme example of
downtime occurring at the worst possible moment for a business. But the truth
of the matter is that there is never a good time for downtime, even planned
downtime. As more and more employees become mobile and work-from-home
policies relax, the concept of the 9 to 5 workday has eroded. Furthermore, as
companies become more global, with employees, customers, partners, and
suppliers spread across every time zone, it becomes increasingly difficult to
schedule times when no one is affected.
Compounding the need for always-available services is the
additional fact that your customers are rarely contained between the four walls
of your organization. Today's IT departments are responsible for supporting two
unique sets of customers with different sets of needs: internal employees and
external customers, partners, and suppliers. But these constituencies are more
similar than you think: Just as your internal employees increasingly expect to
perform their jobs anytime and anywhere, your external stakeholders share the
same expectations in their ability to purchase, receive support, or access your
data and systems. Forrester refers to this concept as the extended enterprise because
a business function is rarely, if ever, a self-contained workflow within the
infrastructure confines of the company.
There is no "easy button" when it comes to
running always-on, always-available services; a blend of mature and stable
processes, people, and, of course, technologies is required. For companies that
have matured their approach to high availability and disaster recovery to the
point where they are one and the same -- a concept that Forrester refers to as
business technology resiliency -- it has taken years of refining policies,
adapting responses to downtime, and securing the appropriate levels of
investment.
While you can't transform your organization overnight into
an always-on, always-available enterprise, these
three initial steps will get you on the right path:
Step 1:
Understand the Costs of Downtime of Critical Services
Securing investment in the capabilities required to run an
always-on, always-available enterprise can be difficult, especially if you
don't know your hourly cost of downtime. Because calculating it is such a
complex task, Forrester finds that the majority of companies have not worked
out the cost of downtime for their critical services. Although trying to calculate the impact
of an outage on reputation and customer retention can be a daunting task, just
calculating revenue losses or productivity losses can be a worthwhile exercise.
Remember that not all outages are created equal: Timing
and duration have a significant impact on the costs of downtime. In the
original example, the outage was perfectly timed to impact the largest number
of potential customers and thus have the largest business impact. What if this
outage occurred at 3 a.m. ET instead of noon ET? Or what if it happened on a
different day? Or what if, instead of the Website being down for 4 hours
straight on a single day, it was down for 30 minutes on eight different days?
Shorter-duration outages tend to be less disruptive than longer ones. All of
this must be taken into account when calculating the impact of an outage.
Don't try to tackle the entire infrastructure all at
once; break down your calculations on a service-by-service basis, starting with
the most critical business services. Understanding the costs of downtime will
guide the appropriate level of investment in downtime prevention for these
services.
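As a rough illustration of how timing changes the revenue side of this calculation, here is a minimal Python sketch. Everything in it is an assumption made for the example: the Service class, the revenue_loss function, the $48,000 daily revenue figure, and the hourly traffic profile are hypothetical and would need to be replaced with your own service data.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    daily_revenue: float       # assumed average revenue per day, in dollars
    hourly_share: list[float]  # assumed fraction of daily revenue earned in each hour (24 values)


def revenue_loss(service: Service, start_hour: int, duration_hours: float) -> float:
    """Estimate revenue lost during an outage that starts on the hour."""
    loss = 0.0
    hour = start_hour
    remaining = duration_hours
    while remaining > 0:
        portion = min(1.0, remaining)  # handle a partial final hour
        loss += service.daily_revenue * service.hourly_share[hour % 24] * portion
        remaining -= portion
        hour += 1
    return loss


# Hypothetical traffic profile: quiet overnight, busy during the day.
shares = [0.02] * 8 + [0.06] * 12 + [0.03] * 4
store = Service("web store", daily_revenue=48_000, hourly_share=shares)

noon_outage = revenue_loss(store, start_hour=12, duration_hours=4)
night_outage = revenue_loss(store, start_hour=3, duration_hours=4)
eight_short_outages = 8 * revenue_loss(store, start_hour=12, duration_hours=0.5)

print(f"4 hours from noon:        ${noon_outage:,.0f}")
print(f"4 hours from 3 a.m.:      ${night_outage:,.0f}")
print(f"8 x 30 minutes at noon:   ${eight_short_outages:,.0f}")
```

Under these assumed numbers, the noon outage costs three times as much revenue as the 3 a.m. outage. Note that the eight short outages come out equal to the single 4-hour outage, because this sketch counts only lost revenue; it does not model productivity, reputation, or customer retention, which is where duration tends to matter most.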
Step 2:
Focus Availability on the End-to-End Service, Not on Infrastructure Components
Many companies rigorously track server uptime and storage
uptime, but few succeed in tracking a single service's uptime end to end,
meaning across every infrastructure and software component that works together to
deliver that service. This, however, is the single most important thing
that an IT department can track because it is the metric that gets closest to
the actual customer experience. This is critical in "the age of the
customer," where businesses compete and differentiate themselves more than ever
on the experience of IT-enabled business processes and transactions.
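To see why component-level uptime can be misleading, consider a minimal Python sketch. It assumes that the service depends on every component in a chain and that the components fail independently, in which case the end-to-end availability is the product of the individual availabilities. The component names and figures are hypothetical.

```python
from math import prod


def end_to_end_availability(component_availabilities):
    """Availability of a service that needs every component to be up.

    Assumes components are in series and fail independently.
    """
    return prod(component_availabilities)


# Hypothetical availability figures for each component in the chain.
components = {
    "load balancer": 0.9999,
    "web servers": 0.999,
    "application": 0.999,
    "database": 0.9995,
    "storage": 0.9999,
}

service = end_to_end_availability(components.values())
weakest = min(components.values())

print(f"Weakest component: {weakest:.4%}")
print(f"End-to-end:        {service:.4%}")
```

With these assumed figures, every component reports at least 99.9 percent uptime, yet the service as the customer experiences it comes in at roughly 99.73 percent. Real dependencies are rarely fully independent or strictly serial, so treat this as a first approximation rather than a measurement.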
Step 3:
Match Business Objectives to the Right Mix of Technologies
Once you've calculated your cost of downtime and shifted
your focus to end-to-end availability, the next step is to select the right
technologies to support your critical services. While there are many
technologies that can support the always-on, always-available extended
enterprise -- such as active-active architectures, rapid virtual machine
rebooting, application and service monitoring solutions, or cloud-based
services -- the difficult part is finding an approach that simultaneously
supports your availability objectives and matches what the business is
willing to pay to protect critical services. Many enterprises find it useful to
group services or applications into tiers of criticality and assign standard
recovery time objectives (RTOs) and recovery point objectives (RPOs) as well as
service-level agreements (SLAs) for availability. Organizations can then map
appropriate technologies to the tiers of criticality using the business
requirement.
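One way to picture tiering is as a simple lookup table. The Python sketch below is illustrative only: the tier names, RTO and RPO values, availability targets, and hourly-cost thresholds are all hypothetical, and each organization would set its own based on business requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rto_minutes: int          # recovery time objective
    rpo_minutes: int          # recovery point objective
    availability_sla: float   # target availability for the tier
    min_hourly_cost: float    # assumed threshold: hourly cost of downtime, in dollars


# Hypothetical tiers, ordered from most to least critical.
TIERS = [
    Tier("Tier 1", rto_minutes=15, rpo_minutes=0, availability_sla=0.9999, min_hourly_cost=10_000),
    Tier("Tier 2", rto_minutes=240, rpo_minutes=60, availability_sla=0.999, min_hourly_cost=1_000),
    Tier("Tier 3", rto_minutes=1440, rpo_minutes=1440, availability_sla=0.99, min_hourly_cost=0),
]


def assign_tier(hourly_cost_of_downtime: float) -> Tier:
    """Return the most critical tier whose cost threshold the service meets."""
    for tier in TIERS:
        if hourly_cost_of_downtime >= tier.min_hourly_cost:
            return tier
    return TIERS[-1]


# Hypothetical services and their estimated hourly cost of downtime.
for service, cost in {"web store": 25_000, "email": 1_000, "intranet wiki": 50}.items():
    tier = assign_tier(cost)
    print(f"{service}: {tier.name} (RTO {tier.rto_minutes} min, RPO {tier.rpo_minutes} min)")
```

Using hourly cost of downtime as the only input is a simplification chosen for the example; in practice, regulatory, contractual, and reputational factors may also move a service into a higher tier.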
100 Percent Uptime Is Virtually Impossible
In the end, the goal of the always-on, always-available
enterprise is not 100 percent uptime; rather it is 100 percent service
continuity for your most critical services. While there are many companies that
have gotten very close, sustaining true 100 percent uptime for any extended
period of time is virtually impossible -- there are too many things that can go
wrong, from the infrastructure to the applications to natural disasters, human
error, or even planned maintenance.
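It can help to translate availability targets into the downtime they actually permit. The short Python sketch below does that arithmetic for a 365-day year; the function name and the list of targets are chosen for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability: float, days: int = 365) -> float:
    """Minutes of downtime permitted over the period at a given availability target."""
    return (1.0 - availability) * days * MINUTES_PER_DAY


for target in (0.99, 0.999, 0.9999, 0.99999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.3%} availability allows about {minutes:,.1f} minutes of downtime per year")
```

Each additional "nine" cuts the annual allowance by a factor of ten, from roughly 3.65 days at 99 percent to a little over five minutes at 99.999 percent, which gives a sense of how little room there is for error or maintenance as the target approaches 100 percent.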
Since some downtime is inevitable, it's important for you to shift your attitude from reacting to downtime toward proactive planning, good processes, and preventive efforts. You may not be able to achieve 100 percent uptime, but you can at least strive to make services available when your customers most need them and have rapid response measures in place to make sure services are brought back online as quickly as possible.