How IT managers can win the SLA game

With today's high-availability software, IT managers can embrace SLAs. How? By removing both planned and unplanned downtime, which can affect SLA compliance.

Many IT managers run the other way when they hear the words "service-level agreement." Service-level agreements (SLAs) document IT availability and accessibility requirements. Because they can also include penalties for non-compliance, they act in many ways like a guarantee. That can intimidate the IT organization with limited resources and staff.

With today's robust, cost-effective high-availability software solutions, however, an IT manager can embrace SLAs with complete confidence -- even with a small staff, limited time and tight budget. How? By removing both planned and unplanned downtime as obstacles to meeting SLA requirements for data, application and system availability.

SLAs: IT availability on the line
A typical SLA defines the services to be provided, the roles and responsibilities for how each service will be supported, quantifiable goals, criteria to assess performance, problem resolution procedures, any incentives or penalties, and the costs. From an IT perspective, an SLA will usually define the level of systems, applications and data availability and resiliency required to support business initiatives. In this way, the SLA acts as an excellent communications tool by aligning IT more closely with business strategies and providing an objective way to measure IT effectiveness and value.

For example, a customer may demand that your company guarantee specific delivery times in order to earn its business. To make that happen, business-side executives involved in transacting that agreement will turn to the IT department to identify and clarify how any disruptions in IT services can impact production and logistics processes and on-time deliveries. The IT manager quantifies the risks and costs of IT downtime (including planned downtime and unplanned outages) to the business and develops availability and resiliency strategies to eliminate those risks.

Typically you will determine what to measure and how to measure it after investigating the business needs. For example, an external customer needs just-in-time delivery. The SLA discussions might then center on exactly what that means, breaking down the steps involved in production, logistics and delivery. One step may be the unimpeded exchange of electronic data between the customer's applications and your warehouse and factory floor from 6 a.m. to 11 p.m. six days a week. Therefore, the IT team must specify which applications, networks and servers will need to be available to provide that service, how that performance can be supported, and how it will be measured or audited to demonstrate compliance.
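The service window in that example can be turned into a number that both sides can audit. The following is a minimal sketch in Python, not a prescribed method: the function names and the 30-minute outage figure are hypothetical, chosen only to illustrate measuring availability inside the agreed window rather than around the clock.

```python
def weekly_window_hours(start_hour: int, end_hour: int, days_per_week: int) -> int:
    """Hours per week covered by a daily service window (24-hour clock)."""
    return (end_hour - start_hour) * days_per_week


def availability_pct(window_hours: float, outage_minutes: float) -> float:
    """Percentage of the service window during which the service was up."""
    window_minutes = window_hours * 60
    return 100.0 * (window_minutes - outage_minutes) / window_minutes


# 6 a.m. to 11 p.m., six days a week, as in the example above.
window = weekly_window_hours(6, 23, 6)
print(window)  # 102

# Hypothetical week with 30 minutes of outage inside the window.
print(round(availability_pct(window, 30), 2))  # 99.51
```

Only outages that fall inside the 102-hour window count against the result, which is why the SLA needs to state the window explicitly.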

Downtime: Barrier to SLA compliance
Downtime -- whether unplanned or planned -- can undermine even the most carefully crafted SLA. Traditional high-availability solutions eliminate (or at least significantly minimize) the effects of unplanned outages and events on your ability to deliver seamless IT availability to users, customers and business partners.

Eliminating planned downtime -- the scourge of every IT department -- poses a much more difficult challenge, however. After all, data backups, application upgrades and system maintenance need to be performed. In a typical IT shop, that means bringing production systems down for some length of time -- perhaps several hours a day or week. That can severely compromise the ability to deliver on a stringent service-level requirement.

Active/active high-availability solutions
With an information availability solution, however, your IT department can eliminate the effect of planned downtime on availability. First, high-availability solutions provide the assurance of a synchronized backup server -- the same backup server traditionally viewed as an "insurance policy" against unplanned events. Second, by using that backup server as an active resource for data backups, application upgrades or server maintenance, production systems can remain operational while users, customers and business partners remain unaffected.

This active/active strategy is called switching or role swapping. To be successful, both production and backup systems must be in sync at all times. The IT manager must also be confident in that synchronization process -- confident enough to switch at will.

Through real-time, change-based replication, a high-availability solution ensures that only changed data is replicated to the backup server and that replication occurs in real time. That provides instant, automated switchover/failover capability. Applications and data remain accessible without interruption, enabling you to hit SLA targets while the IT team performs an application upgrade or runs reports.
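As a rough illustration of the idea, the toy Python sketch below journals each change on the production node, ships only the journaled changes to the backup, and swaps roles only once both copies match. Everything here (the `Node` class, `replicate`, `in_sync`, the order keys) is hypothetical and greatly simplified; it does not represent any vendor's product or API.

```python
class Node:
    """A toy server holding key/value data plus a journal of unshipped changes."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.journal = []  # changes made locally but not yet replicated

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.journal.append((key, value))


def replicate(source: Node, target: Node) -> int:
    """Apply only the journaled changes to the target; return how many were shipped."""
    shipped = len(source.journal)
    for key, value in source.journal:
        target.data[key] = value
    source.journal.clear()
    return shipped


def in_sync(a: Node, b: Node) -> bool:
    """True when both nodes hold the same data and nothing is waiting to ship."""
    return a.data == b.data and not a.journal and not b.journal


production, backup = Node("production"), Node("backup")
production.write("order-1001", "picked")
production.write("order-1002", "packed")
replicate(production, backup)

production.write("order-1001", "shipped")
shipped = replicate(production, backup)  # only the one changed record travels

# Role swap: the former backup takes over only if the copies match.
if in_sync(production, backup):
    production, backup = backup, production
```

The point of the sketch is the precondition on the swap: switching at will is only safe when synchronization can be verified first.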

For example, a mid-sized manufacturer needs four to six hours of planned downtime per week to back up a growing volume of data to tape, run reports and perform application and system maintenance. Because of this weekly downtime, it is unable to meet the stringent 24/7 availability standards required by a new customer. With an active/active high-availability solution, the manufacturer would be able to eliminate weekly planned downtime and perform its backups, reporting and maintenance tasks without affecting supply-chain, production floor, logistics or warehouse applications. It would also have protection against unplanned downtime.
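To put the manufacturer's numbers in perspective, the short Python sketch below converts weekly downtime into an availability percentage over a full 168-hour week, and converts a target percentage back into a downtime allowance. The function names and the 99.9% target are assumptions made for illustration; the article does not state a specific percentage.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a round-the-clock week


def weekly_availability_pct(downtime_hours: float) -> float:
    """Availability over a full week, given total downtime in hours."""
    return 100.0 * (HOURS_PER_WEEK - downtime_hours) / HOURS_PER_WEEK


def allowed_downtime_minutes(target_pct: float) -> float:
    """Weekly downtime budget, in minutes, implied by an availability target."""
    return HOURS_PER_WEEK * 60 * (1 - target_pct / 100.0)


for hours in (4, 6):
    print(hours, round(weekly_availability_pct(hours), 2))
# 4 97.62
# 6 96.43

# Hypothetical target of 99.9% availability.
print(round(allowed_downtime_minutes(99.9), 1))  # 10.1
```

Four to six hours of weekly maintenance leaves availability between roughly 96% and 98%, while a target such as the hypothetical 99.9% would allow only about ten minutes a week.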

A switch-ready backup: Can you be sure?
Most high-availability solutions include extensive automation to keep production servers and backup servers in sync. Confidence in switching, however, requires the assurance provided by demonstrable, automatic self-healing and self-correcting features. (IBM calls this category of advanced systems capability autonomics.)

These autonomic features work behind the scenes, constantly assessing the replication process and automatically solving any synchronization issues. Along the way, these smart functions also help improve the efficiency of the production server by solving recurring data or object issues.

To enable you to meet SLA-mandated IT requirements, a high-availability software solution must allow you to switch systems on demand as well as do the following:

  • Provide complete application protection and zero data loss.
  • Provide a complete backup production system with a known-current dataset that can immediately take over.
  • Respond within minutes to unexpected system or server outages.
  • Include extensive, built-in autonomics that continuously monitor replication and self-heal out-of-sync conditions and other problems to ensure the reliability of the backup replica.
  • Provide extensive auditing and reporting capability to document the viability of the backup replica.
  • Include automatic monitoring to ensure that your environment stays within its replication latency targets.
  • Automatically scale and adapt to any increase in transaction volumes, users, business or change in your IT infrastructure.

In addition, once you have established the speed limits for your availability levels -- the recovery point objective (RPO) and recovery time objective (RTO) -- you need to be able to measure whether you are achieving these on an ongoing basis. That means creating the monitoring tools on your own, a task that requires both time and specialized expertise, or selecting a high-availability solution that already incorporates those capabilities and can demonstrate their effectiveness.
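For teams that do build their own checks, the following Python sketch shows the general shape of such a measurement. It assumes, hypothetically, that replication lag can be sampled in seconds as a stand-in for RPO exposure and that switchover drills are timed to test RTO; the `Objectives` class, function names and all figures are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Objectives:
    rpo_seconds: float  # most data, measured as replication lag, you can afford to lose
    rto_seconds: float  # longest acceptable time to restore service


def rpo_breaches(lag_samples: List[float], objectives: Objectives) -> List[float]:
    """Return the lag samples that exceeded the recovery point objective."""
    return [lag for lag in lag_samples if lag > objectives.rpo_seconds]


def meets_rto(switchover_seconds: float, objectives: Objectives) -> bool:
    """True when a timed switchover finished within the recovery time objective."""
    return switchover_seconds <= objectives.rto_seconds


# Hypothetical objectives: 5 seconds of lag, 5 minutes to switch.
objectives = Objectives(rpo_seconds=5, rto_seconds=300)

lag_samples = [1.2, 0.8, 9.0, 2.5]  # hypothetical lag readings, in seconds
print(rpo_breaches(lag_samples, objectives))  # [9.0]
print(meets_rto(120, objectives))  # True
```

Logging these results over time provides the audit trail needed to show that the objectives are being met on an ongoing basis, not just on the day of a test.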

Face the SLA future with confidence
You may not have a choice about whether to embrace SLAs. If you don't already use them, you'll probably come under pressure to do so in the near future. In some industries, such as manufacturing, many larger companies will no longer work with small or mid-sized suppliers without an SLA. Fortunately, by eliminating the barriers of planned and unplanned downtime, high-availability solutions can help you deliver business-winning value.

About the author:
Chris Bartley is a solution manager at Lakeview Technology, as well as the managing director of the Information Availability Institute. Chris has more than 12 years' experience in designing, deploying and supporting information availability solutions, as well as developing and teaching numerous managed availability courses. Chris can be reached at
