How to Plan Disaster Recovery on Oracle Cloud

May 27, 2021

Key considerations

Identifying the right DR strategy for your business-critical apps

  • Multi region DR
  • Application replication between regions
  • Storage replication between regions
  • Database protection between regions

On-Prem to Oracle Cloud Infrastructure DR

  • Establish DR site to OCI
  • Switch over primary to proven OCI

The primary objective of the following architectures is to ensure you can build disaster recovery (DR) into your deployment, so that if an unforeseen event forces you to fail over, E-Business Suite stays up and running.

Outcomes these architectures can provide:

DR within a single region

  • Active-Active components across ADs
  • Active-Passive components across ADs
  • Regional subnets across ADs
  • Load-balancing across ADs
  • Storage synchronization across ADs
  • Database DR across ADs

DR across multiple regions

  • Application replication between regions
  • Storage replication between regions
  • Cross-region copy lets you asynchronously copy object storage datasets
  • Cross-region backup copy for block volumes
  • Database protection between regions

Identifying the Right DR strategy for Business-Critical Apps

To actually select a solution, you can focus just on the parameters of data loss and downtime.

The first thing to do is to think about the two extremes: can my application tolerate hours or days of lost data, AND uncertain recovery time (hours or days, at least)? If so, then you just need basic backup to the cloud. Probably every application in your environment needs at least this basic level of protection.

After all, if an application can tolerate unlimited data loss and downtime, with no backup at all, why not just turn off that system now and save the cost?

Next, let's go to the other extreme. Does your application need to be back online within 30 minutes of a site outage (including the decision time)? Maybe you need something close to zero downtime? If you need that, or you need less than a few seconds of data loss (basically zero data loss after a site-wide outage), then you want the Active / Active solution. Of course, this comes with more cost and effort, but if you need it, Oracle can deliver it.

Most applications fall into the middle ground: they are critical enough to deserve some protection, but do not quite need zero downtime and zero data loss. Here we measure data loss in seconds, and we can ask one more question: how much downtime can you accept? If you need to be back online in less than 4 hours, you want an Active / Standby solution. If you can tolerate a recovery time in the 4-24 hour range, you can use a Pilot Light solution.
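The decision questions above can be sketched as a small function. This is only an illustration: the names `DRTier` and `choose_dr_tier` are made up for this example, and the exact cut-offs (treating "a few seconds" as 5 seconds and "hours of lost data" as one hour or more) are assumptions, not Oracle guidance.

```python
from enum import Enum


class DRTier(Enum):
    BACKUP = "Backup to cloud"
    PILOT_LIGHT = "Pilot Light"
    ACTIVE_STANDBY = "Active / Standby"
    ACTIVE_ACTIVE = "Active / Active"


def choose_dr_tier(max_data_loss_seconds: float, max_downtime_hours: float) -> DRTier:
    """Map tolerated data loss and downtime to a DR tier.

    The thresholds are assumptions taken loosely from the text:
    5 seconds stands in for "a few seconds", 3600 seconds for "hours".
    """
    # Extreme 1: near-zero data loss, or back online within 30 minutes.
    if max_data_loss_seconds < 5 or max_downtime_hours < 0.5:
        return DRTier.ACTIVE_ACTIVE
    # Extreme 2: hours of lost data AND a day or more of downtime are acceptable.
    if max_data_loss_seconds >= 3600 and max_downtime_hours > 24:
        return DRTier.BACKUP
    # Middle ground: decide by how much downtime can be accepted.
    if max_downtime_hours < 4:
        return DRTier.ACTIVE_STANDBY
    return DRTier.PILOT_LIGHT


if __name__ == "__main__":
    for rpo_s, rto_h in [(0, 0.1), (30, 2), (30, 12), (86400, 72)]:
        print(rpo_s, rto_h, choose_dr_tier(rpo_s, rto_h).value)
```

In practice the thresholds would come from a business impact analysis for each application rather than from fixed constants.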

Why Oracle Cloud for Disaster Recovery

When we actually build the DR solution, there is a range of price / performance trade-offs, and we can provide solutions across that whole range.

The two things you are mostly thinking about are data loss, also called Recovery Point Objective (RPO), and downtime, also called Recovery Time Objective (RTO). As always, these two performance metrics are balanced by cost and complexity.

a) At the entry level, we can just back up the data to the cloud. This gets the data and application configuration off-site, so at least we have a starting place to recover from. This is a really basic offer because it means that you will lose all data written since the last backup (so maybe 24 hours of data loss), and it will take time to recover systems from backup, plus any time required to reconfigure them to run in the cloud. This is what we would call a minimum effort: no system should be without at least this level of protection, but most systems will need something better.

b) Next we have the Pilot Light. In this solution we upgrade our database protection to real-time replication, which brings RPO or data loss down to just a few seconds. But we still use the backup and recovery strategy for our application servers, so we have that long recovery time while we restore and reconfigure servers. This solution is good when you want to minimize data loss and keep your costs down, and can tolerate a relatively long downtime – like 24+ hours.

c) The next step up is to configure some stand-by servers that match our application tiers on-premises. Now when we need to fail over, instead of waiting for that long restore time, we have everything ready to go, and just need to switch it into production. This brings our recovery time down to minutes to go along with our already low data loss.

d) At the highest level, we can build an Active / Active solution that gives you Zero downtime and Zero data loss even in the face of a regional disaster. Not every application needs this level of protection, but if the value of a few minutes of downtime or a few seconds of data loss is high enough, we can deliver a solution here.

e) Most customers will select something in the middle of this range (Pilot Light or Active / Standby), depending on their tolerance for downtime when recovering from a disaster.

f) And of course, we recommend that every application get at least the basic backup level of protection. Without it, you are really signing up for unlimited data loss and unlimited downtime following a regional disaster. Very few people will knowingly sign up for that.

Notable point: There is a range of protection options that trade off performance and cost. Active / Passive lets you recover with very low downtime and data loss. Backup is the minimum for all applications.
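To compare options against RPO and RTO targets, one rough approach is to model each plan's worst-case data loss and estimated recovery time. The sketch below is a hypothetical model: the class `RecoveryPlan`, its fields, and all durations are illustrative assumptions, not measured values.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryPlan:
    name: str
    recovery_point_interval: timedelta  # time between backups, or replication lag
    decision_time: timedelta            # time to decide to fail over
    restore_time: timedelta             # time to restore or start servers
    reconfigure_time: timedelta         # time to reconfigure them for the cloud

    @property
    def worst_case_rpo(self) -> timedelta:
        # Worst case: the failure happens just before the next recovery point.
        return self.recovery_point_interval

    @property
    def estimated_rto(self) -> timedelta:
        return self.decision_time + self.restore_time + self.reconfigure_time

    def meets(self, rpo_target: timedelta, rto_target: timedelta) -> bool:
        return self.worst_case_rpo <= rpo_target and self.estimated_rto <= rto_target


if __name__ == "__main__":
    # Illustrative numbers only.
    backup = RecoveryPlan("Backup", timedelta(hours=24), timedelta(hours=1),
                          timedelta(hours=10), timedelta(hours=4))
    pilot = RecoveryPlan("Pilot Light", timedelta(seconds=5), timedelta(hours=1),
                         timedelta(hours=10), timedelta(hours=4))
    for plan in (backup, pilot):
        ok = plan.meets(timedelta(minutes=1), timedelta(hours=24))
        print(plan.name, plan.worst_case_rpo, plan.estimated_rto, ok)
```

With these sample numbers, both plans take the same time to recover, and only the replicated database in the Pilot Light plan brings the data loss within a one-minute target.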

Disaster Recovery Across Multiple Regions

You can achieve true DR across multiple regions in the unlikely event that one region goes down. This reference architecture covers the most robust case, with clustering of supported services across ADs within the primary region, but DR can also be achieved across regions that each have a single AD. This is important to note, as most of the new OCI regions being launched will be single-AD regions.

Active-active components across ADs: Clustering of supported services across ADs provides protection from an AD failure.

Active-passive components across regions: If you are using active-passive to synchronize application servers across ADs, use rsync.
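As a rough illustration of the rsync approach, the sketch below builds an rsync command that mirrors an application directory from the active server to a passive standby over SSH. The function name, the `oracle` user, the host name, and the paths are all hypothetical. The rsync options used are `-a` (archive), `-z` (compress), `--delete` (remove files that no longer exist on the source), `-e ssh`, and `--dry-run`.

```python
import shlex
from typing import List


def build_rsync_command(source_dir: str, standby_host: str, dest_dir: str,
                        ssh_user: str = "oracle", dry_run: bool = False) -> List[str]:
    """Return an rsync argument list that mirrors source_dir to the standby host."""
    # A trailing slash makes rsync copy the directory's contents,
    # not the directory itself.
    src = source_dir.rstrip("/") + "/"
    cmd = ["rsync", "-az", "--delete", "-e", "ssh"]
    if dry_run:
        # Show what would change without transferring anything.
        cmd.append("--dry-run")
    cmd += [src, f"{ssh_user}@{standby_host}:{dest_dir}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    cmd = build_rsync_command("/u01/app", "standby.example.com", "/u01/app", dry_run=True)
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from a scheduled job. Because `--delete` removes files on the standby, a dry run is worth doing first.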

VCN peering across regions: VCNs can connect between regions within a tenancy or even between tenancies. Connectivity is done using Oracle’s internal backbone between regions.

Storage synchronization across AD: Block volume backups can be copied between regions using the console, CLI, SDKs, or REST APIs. Copying block volume backups to another region at regular intervals makes it easier to rebuild applications and data in the destination region if a region-wide disaster occurs in the source region. You can also easily migrate and expand applications to another region. Object Storage cross-region copy asynchronously copies objects between buckets in the same region or to buckets in other regions.
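Copying backups "at regular intervals" comes down to a periodic job that finds recent backups missing from the destination region and copies them. The sketch below shows only that selection logic. The `Backup` type and the `copy_fn` callback are hypothetical stand-ins for whichever mechanism you use (console, CLI, SDK, or REST API), so no real OCI call is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Backup:
    backup_id: str        # hypothetical identifier of a block volume backup
    created_at: datetime


def backups_to_copy(source_backups: Iterable[Backup], already_copied_ids: Set[str],
                    now: datetime, max_age: timedelta) -> List[Backup]:
    """Pick recent source-region backups not yet present in the destination region."""
    cutoff = now - max_age
    pending = [b for b in source_backups
               if b.created_at >= cutoff and b.backup_id not in already_copied_ids]
    # Copy oldest first so the destination catches up in order.
    return sorted(pending, key=lambda b: b.created_at)


def run_copy_cycle(source_backups: Iterable[Backup], already_copied_ids: Set[str],
                   now: datetime, max_age: timedelta,
                   copy_fn: Callable[[str], None]) -> List[str]:
    """Run one scheduled cycle; copy_fn performs the actual cross-region copy."""
    copied = []
    for backup in backups_to_copy(source_backups, already_copied_ids, now, max_age):
        copy_fn(backup.backup_id)
        copied.append(backup.backup_id)
    return copied
```

Keeping the selection separate from the copy call makes the schedule easy to test without touching a real tenancy.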

Database DR across ADs: The use of either Data Guard or Active Data Guard is dependent on your use case and database edition. Active Data Guard requires Enterprise Edition – Extreme Performance.

Disaster Recovery: On-prem to Oracle Cloud Infrastructure

1. Replicate production environment to OCI

2. Set sync policy

  • Configurable policy: hourly, daily, weekly, or per defined schedule
  • Multiple policies can be configured and applied
  • Automatic sync and alerts

3. Provisioning options

  • Pre-provision VMs (hot standby)
  • Dynamically provision VMs: sync to storage (low cost)
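The sync policy and provisioning options above could be captured in configuration along these lines. This is a sketch under assumptions: `SyncPolicy`, `Provisioning`, and the field names are invented for illustration and do not correspond to any specific product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Provisioning(Enum):
    PRE_PROVISIONED = "hot standby"   # VMs already running in OCI
    DYNAMIC = "sync to storage"       # VMs created at failover (low cost)


INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


@dataclass(frozen=True)
class SyncPolicy:
    name: str
    schedule: str                     # "hourly", "daily", or "weekly"
    provisioning: Provisioning
    alert_on_failure: bool = True

    def __post_init__(self) -> None:
        if self.schedule not in INTERVALS:
            raise ValueError(f"unknown schedule: {self.schedule!r}")

    def next_sync(self, last_sync: datetime) -> datetime:
        return last_sync + INTERVALS[self.schedule]

    def is_overdue(self, last_sync: datetime, now: datetime) -> bool:
        # An overdue sync is a candidate for an alert.
        return now > self.next_sync(last_sync)


if __name__ == "__main__":
    # Multiple policies can be defined and applied to different applications.
    policies = {
        "ebs-prod": SyncPolicy("critical", "hourly", Provisioning.PRE_PROVISIONED),
        "ebs-test": SyncPolicy("secondary", "weekly", Provisioning.DYNAMIC),
    }
    last = datetime(2021, 5, 27, 10, 0)
    for app, policy in policies.items():
        print(app, policy.next_sync(last), policy.provisioning.value)
```

A custom "per defined schedule" option would need a richer schedule type than the three fixed intervals used here.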

IT Convergence can provide a complete Disaster Recovery and Backup Platform that extends across physical and virtual environments. Multiple RPO/RTO options give enterprises control over availability vs. cost, ensuring critical applications recover quickly and secondary apps do so in the most cost-effective timeframe.

So let’s take a look at how it works. On the left you see your on-prem estate, including your Oracle EBS applications. On the right you see OCI. Connecting the two is middleware, which uses automation software to run its own small server in Oracle Cloud and knows how to set up, operate, and manage disaster recovery between your on-prem data center and Oracle Cloud Infrastructure.

It detects the configuration of your production servers, recreates them in the cloud, migrates the databases, files, and other data to the cloud, and continuously updates them. Then, either on demand or after a failure, you can restart your E-Business Suite applications in the cloud and let your users reconnect, so that you’re back in business with minimum disruption.

Customers can also choose from a list of certified Cloud MSEs who will manage the entire process on their behalf. Certified Oracle Cloud MSEs have proven expertise, tools, and processes to build, deploy, run, and manage Oracle and non-Oracle workloads on Oracle Cloud Platform, all under a single contract and a single point of contact.

These Cloud MSEs can have the environment up and running, with a full DR test of the EBS environment running in OCI, in a short, specified timeframe (30-45 days*). For a leading US eyewear company, we set up a multi-region environment covering 3 disaster recovery scenarios in less than 6 weeks. You can read the case study here.

Once OCI is validated as a viable, high-performance DR site, switch over the primary to OCI and set up a second availability domain for DR within OCI, as discussed previously.

Other Key Considerations for Designing an Oracle Cloud Architecture

Several architecture configurations are available to match your current on-premises design. The Oracle E-Business Suite Lift and Shift capability reduces the operational risk of moving from on-premises and improves migration success. Oracle Cloud Infrastructure is also very cost-effective for rapid deployment and removal of test and quality assurance environments.

Below are other considerations to review before proceeding with your EBS to OCI migration.

Fundamentals of a Successful EBS Migration

Applying Best practices and Reference Architectures Across OCI Solutions

The importance of disaster recovery plans from expert consultants

Disaster recovery is an essential aspect of any business strategy, especially when it comes to cloud migration. While cloud providers offer some level of disaster recovery solutions, they may not be sufficient for all businesses. Moreover, even with the best disaster recovery plan in place, unexpected disasters can still occur, leading to data loss, downtime, and other costly consequences.

This is where disaster recovery services from expert consultants come into play. These consultants can help businesses develop comprehensive disaster recovery plans that are tailored to their specific needs and provide the expertise needed to navigate any challenges that arise.

Here are some of the key benefits of working with disaster recovery consultants:

Customized disaster recovery plans: Disaster recovery consultants can help businesses develop customized disaster recovery plans that are tailored to their specific needs. These plans take into account the unique risks and challenges facing the business and provide a roadmap for mitigating those risks.

Expertise and knowledge: Disaster recovery consultants have the expertise and knowledge needed to navigate the complexities of disaster recovery. They understand the different types of disasters that can occur and how to prepare for them, as well as the best practices for minimizing downtime and data loss.

Testing and validation: Disaster recovery consultants can help businesses test and validate their disaster recovery plans to ensure they are effective. Regular testing is essential to ensure that the plan is up-to-date and that all systems and procedures are functioning correctly.

Rapid response: In the event of a disaster, disaster recovery consultants can provide rapid response services to help businesses get back up and running as quickly as possible. They can help businesses identify the root cause of the disaster, implement recovery procedures, and restore data and systems.

Cost savings: While disaster recovery services may seem like an additional expense, they can actually save businesses money in the long run. A well-designed disaster recovery plan can minimize downtime and data loss, reducing the impact on the business and its customers.

Conclusion

Creating a DR plan for mission-critical applications involves several key stakeholders with specialized skill sets. It is highly recommended to leverage certified Oracle Cloud MSE partners to avoid roadblocks in your cloud migration journey and to keep your cloud migration costs and timelines from inflating.
