Key considerations
Identifying the right DR strategy for your business-critical apps
- Multi-region DR
- Application replication between regions
- Storage replication between regions
- Database protection between regions
On-Prem to Oracle Cloud Infrastructure DR
- Establish a DR site in OCI
- Switch the primary over to OCI once it is proven
The primary objective of the following architectures is to build disaster recovery (DR) into your deployment, so that if an unforeseen event forces you to fail over, E-Business Suite stays up and running.
Outcomes these architectures can provide:
DR within a single region
- Active-Active components across ADs
- Active-Passive components across ADs
- Regional subnets across ADs
- Load-balancing across ADs
- Storage synchronization across ADs
- Database DR across ADs
DR across multiple regions
- Application replication between regions
- Storage replication between regions
- Cross-region copy lets you asynchronously copy object storage datasets
- Cross-region backup copy for block volumes
- Database protection between regions
Identifying the Right DR Strategy for Business-Critical Apps
To select a solution, you can focus on just two parameters: data loss and downtime.
The first thing to do is to think about the two extremes. Can my application tolerate hours or days of lost data AND an uncertain recovery time (hours or days, at least)? If so, you just need basic backup to the cloud. Practically every application in your environment needs at least this basic level of protection.
If an application could tolerate even more than that, with unlimited data loss and downtime and no protection at all, why not just turn that system off now and save the cost?
Next, let's go to the other extreme. Does your application need to be back online after a site outage in under 30 minutes (including decision time)? Perhaps you need something close to zero downtime? If you need that, or you can accept no more than a few seconds of data loss (basically zero data loss after a site-wide outage), then you want the Active / Active solution. Of course this comes with more cost and effort, but if you need it, Oracle can deliver it.
Most applications fall into the middle ground: they are critical enough to deserve some protection, but may not need zero downtime / zero data loss. For these, data loss is measured in seconds, and one more question remains: how much downtime can you accept? If you need to be back online in less than 4 hours, you want an Active / Standby solution. If you can tolerate a recovery time of 4-24 hours, you can use a Pilot Light solution.
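The decision rules above can be encoded as a small function. This is a minimal sketch, not an Oracle tool: the name `choose_dr_strategy` is hypothetical, and two cut-offs are assumptions (one hour standing in for "hours of lost data", five seconds for "less than a few seconds"); the 30-minute, 4-hour, and 24-hour values come from the text.

```python
# Illustrative thresholds. HOUR_S and NEAR_ZERO_LOSS_S are assumed values;
# the three recovery-time limits are taken from the rules of thumb above.
HOUR_S = 3600.0
NEAR_ZERO_LOSS_S = 5.0
ACTIVE_ACTIVE_RTO_H = 0.5     # back online in under 30 minutes
ACTIVE_STANDBY_RTO_H = 4.0    # back online in under 4 hours
PILOT_LIGHT_RTO_H = 24.0      # 4-24 hour recovery window


def choose_dr_strategy(max_data_loss_s: float, max_downtime_h: float) -> str:
    """Map tolerated data loss (RPO, in seconds) and downtime (RTO, in hours,
    including decision time) to one of the DR tiers described above."""
    # Extreme 1: hours of lost data AND a day or more of downtime are acceptable.
    if max_data_loss_s >= HOUR_S and max_downtime_h >= PILOT_LIGHT_RTO_H:
        return "Backup to cloud"
    # Extreme 2: under 30 minutes of downtime, or essentially zero data loss.
    if max_downtime_h < ACTIVE_ACTIVE_RTO_H or max_data_loss_s < NEAR_ZERO_LOSS_S:
        return "Active / Active"
    # Middle ground: downtime tolerance decides between the two standby tiers.
    if max_downtime_h < ACTIVE_STANDBY_RTO_H:
        return "Active / Standby"
    return "Pilot Light"
```

For example, an application that can lose 30 seconds of data but must be back within 2 hours gives `choose_dr_strategy(30, 2)`, which returns `"Active / Standby"`. The order of the checks mirrors the text: rule out the two extremes first, then let downtime pick within the middle ground.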
Why Oracle Cloud for Disaster Recovery
When we actually build the DR solution, there is a range of price / performance trade-offs, and we can provide solutions across that whole range.
The two things to think about are data loss, also called Recovery Point Objective (RPO), and downtime, also called Recovery Time Objective (RTO). As always, these two performance metrics are balanced against cost and complexity.
a) At the entry level, we simply back up the data to the cloud. This gets the data and application configuration off-site, so at least we have a starting point to recover from. This is a very basic offering: you will lose all data written since the last backup (perhaps 24 hours of data loss), and recovery takes the time to restore systems from backup plus any time required to reconfigure them to run in the cloud. This is the minimum effort; no system should be without at least this level of protection, but most systems will need something better.
b) Next we have the Pilot Light. In this solution we upgrade database protection to real-time replication, which brings RPO (data loss) down to just a few seconds. But we still use backup and recovery for the application servers, so recovery time stays long while we restore and reconfigure servers. This solution fits when you want to minimize data loss and keep costs down, and can tolerate a relatively long downtime, such as 24+ hours.
c) The next step up is to configure standby servers that match our on-premises application tiers. Now when we need to fail over, instead of waiting through a long restore, we have everything ready to go and just need to switch it into production. This brings recovery time down to minutes, to go along with our already low data loss.
d) At the highest level, we can build an Active / Active solution that gives you zero downtime and zero data loss even in the face of a regional disaster. Not every application needs this level of protection, but if a few minutes of downtime or a few seconds of data loss would be costly enough, we can deliver a solution here.
e) Most customers will select something in the middle, either Pilot Light or Active / Standby, depending on their tolerance for downtime when recovering from a disaster.
f) And of course, we recommend that every application get at least the basic backup level of protection. Without it, you are signing up for unlimited data loss and unlimited downtime following a regional disaster. Very few people will knowingly sign up for that.
Notable point: There is a range of protection options that trade off performance against cost. Active / Passive lets you recover with very low downtime and data loss. Backup is the minimum for all applications.
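The RPO gap between tiers a) and b) is simple arithmetic: anything written after the newest recovery point that survives the disaster is lost. A minimal sketch; the function name `data_loss_window` and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(failure_at: datetime, recovery_points: list[datetime]) -> timedelta:
    """Return how much data is lost if the primary fails at `failure_at`.

    `recovery_points` are the moments whose state exists off-site: backup
    completion times for tier a), or replicated transaction times for tier b).
    """
    usable = [p for p in recovery_points if p <= failure_at]
    if not usable:
        raise ValueError("no off-site recovery point exists before the failure")
    return failure_at - max(usable)


# Nightly backups at 01:00 versus replication running a few seconds behind.
failure = datetime(2024, 1, 2, 23, 30)
nightly = [datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 2, 1, 0)]
replicated = [failure - timedelta(seconds=3)]
print(data_loss_window(failure, nightly))     # 22:30:00
print(data_loss_window(failure, replicated))  # 0:00:03
```

With nightly backups, a late-evening failure costs almost a full day of data, which is the "maybe 24 hours" of tier a); real-time replication shrinks the same window to seconds.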
Disaster Recovery Across Multiple Regions
You can achieve true DR across multiple regions for the unlikely event that one region goes down. This reference architecture covers the most robust case, with supported services clustered across ADs within the primary region, but DR can also be achieved across single-AD regions. This is important to note, as most of the new OCI regions being launched will be single-AD regions.
Active-active components across ADs: Clustering of supported services across ADs provides protection from an AD failure.
Active-passive components across regions: If you are using active-passive to keep application servers synchronized across regions, use rsync.
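For the rsync approach, a scheduled job on the primary application server can push the application-tier file system to its passive counterpart in the standby region. A minimal sketch, assuming SSH key access between the hosts; the host name, paths, function names, and the `opc` user are hypothetical placeholders. The database tier is not synchronized this way; it is protected with Data Guard, as described below.

```python
import subprocess


def build_rsync_command(src_dir: str, standby_host: str, dest_dir: str,
                        ssh_user: str = "opc", dry_run: bool = False) -> list[str]:
    """Build an rsync command that mirrors src_dir onto the standby host."""
    cmd = [
        "rsync",
        "-az",        # archive mode (recursive, keeps permissions, times, symlinks) + compression
        "--delete",   # remove files on the standby that no longer exist on the primary
        "-e", "ssh",  # transport over SSH to the standby region
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself, into dest_dir.
    cmd.append(src_dir.rstrip("/") + "/")
    cmd.append(f"{ssh_user}@{standby_host}:{dest_dir}")
    return cmd


def sync_to_standby(src_dir: str, standby_host: str, dest_dir: str) -> None:
    """Run the sync once; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(src_dir, standby_host, dest_dir), check=True)
```

Running `sync_to_standby` from cron or a systemd timer on the primary gives interval-based synchronization. Because `--delete` keeps the standby an exact mirror, accidental deletions on the primary are replicated too, so this complements rather than replaces backups.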
VCN peering across regions: VCNs can be connected across regions within a tenancy, or even across tenancies. Traffic between regions travels over Oracle's internal backbone.
Storage synchronization across regions: Block volume backups can be copied between regions using the Console, CLI, SDKs, or REST APIs. Copying block volume backups to another region at regular intervals makes it easier to rebuild applications and data in the destination region if a region-wide disaster occurs in the source region. It also makes it easy to migrate and expand applications to another region. With Object Storage cross-region copy, objects are copied asynchronously between buckets in the same region or to buckets in other regions.
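Copying block volume backups at regular intervals comes down to two parts: deciding when a copy is due, and asking OCI to copy a backup to the destination region. A minimal sketch, assuming the OCI Python SDK (`oci` package) and a configured `~/.oci/config`; the `Backup` record, both function names, and the interval are hypothetical, and the SDK is imported lazily so the scheduling logic runs without it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Backup:
    backup_id: str      # OCID of a completed block volume backup
    created: datetime


def backup_due_for_copy(backups: list[Backup], last_copy_at: Optional[datetime],
                        interval: timedelta, now: datetime) -> Optional[Backup]:
    """Return the newest backup if the last cross-region copy is older than
    `interval` (or there has never been one); otherwise return None."""
    if last_copy_at is not None and now - last_copy_at < interval:
        return None  # destination region is still within the copy interval
    if not backups:
        return None
    return max(backups, key=lambda b: b.created)


def copy_backup_to_region(backup_id: str, destination_region: str) -> str:
    """Start a cross-region copy with the OCI Python SDK and return the OCID
    of the new backup in the destination region."""
    import oci  # imported here so the helper above works without the SDK

    config = oci.config.from_file()  # default profile in ~/.oci/config
    client = oci.core.BlockstorageClient(config)
    details = oci.core.models.CopyVolumeBackupDetails(
        destination_region=destination_region)
    return client.copy_volume_backup(backup_id, details).data.id
```

The copy itself runs asynchronously, so a real job would poll the new backup in the destination region (for example `"us-phoenix-1"`) until it becomes available before recording `last_copy_at`.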
Database DR across regions: Whether you use Data Guard or Active Data Guard depends on your use case and database edition. Active Data Guard requires Enterprise Edition - Extreme Performance.
Disaster Recovery: On-prem to Oracle Cloud Infrastructure
1. Replicate production environment to OCI
2. Set sync policy
- Configurable policy: hourly, daily, weekly, or per defined schedule
- Multiple policies can be configured and applied
- Automatic sync and alerts
3. Provisioning options
- Pre-provision VMs (hot standby)
- Dynamically provision VMs; sync to storage only (low cost)
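The policy options above (fixed intervals, multiple policies, automatic alerts) can be modeled in a few lines. This is a hypothetical sketch of the idea, not the API of IT Convergence's platform or any other product: `SyncPolicy`, `overdue_workloads`, the workload names, and the alert windows are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SyncPolicy:
    name: str
    interval: timedelta     # how often to sync: hourly, daily, weekly, or custom
    alert_after: timedelta  # alert if no successful sync within this window

    def next_run(self, last_success: datetime) -> datetime:
        return last_success + self.interval


# Assumed alert windows, each somewhat longer than the sync interval.
HOURLY = SyncPolicy("hourly", timedelta(hours=1), timedelta(hours=2))
DAILY = SyncPolicy("daily", timedelta(days=1), timedelta(hours=36))
WEEKLY = SyncPolicy("weekly", timedelta(weeks=1), timedelta(days=8))


def overdue_workloads(assignments: dict[str, SyncPolicy],
                      last_success: dict[str, datetime],
                      now: datetime) -> list[str]:
    """Return the workloads whose last successful sync is older than their
    policy's alert window, or which have never synced."""
    overdue = []
    for workload, policy in assignments.items():
        last = last_success.get(workload)
        if last is None or now - last > policy.alert_after:
            overdue.append(workload)
    return sorted(overdue)
```

Assigning a different policy to each workload (for example, hourly for database files and daily for application files) is one possible reading of "multiple policies can be configured and applied"; the alert check is what turns a missed sync into a notification instead of a silent gap in RPO.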
IT Convergence can provide a complete Disaster Recovery and Backup Platform that extends across physical and virtual environments. Multiple RPO/RTO options give enterprises control over availability vs. cost, ensuring critical applications recover quickly and secondary apps do so in the most cost-effective timeframe.
So let's look at how it works. On the left is your on-premises estate, including your Oracle EBS applications; on the right is OCI. Connecting the two is middleware: automation software that runs its own small server in Oracle Cloud and knows how to set up, operate, and manage disaster recovery between your on-premises data center and Oracle Cloud Infrastructure.
It detects the configuration of your production servers, recreates them in the cloud, migrates all the databases, files, etc. to the cloud, and keeps them continuously updated. Then, either on demand or after a failure, you can restart your E-Business Suite applications in the cloud and let your users reconnect, so that you're back in business with minimal disruption.
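The lifecycle described above, continuous replication followed by a restart in OCI either on demand or after a failure, can be sketched as a small state machine. This is a hypothetical model for illustration only; `DrState`, `next_state`, and the event names are not taken from any product.

```python
from enum import Enum


class DrState(Enum):
    REPLICATING = "replicating"        # production copied to OCI and kept up to date
    FAILING_OVER = "failing_over"      # E-Business Suite being restarted in OCI
    RUNNING_IN_OCI = "running_in_oci"  # users reconnected to the cloud copy


# Allowed (state, event) -> next-state transitions.
TRANSITIONS = {
    (DrState.REPLICATING, "failover_requested"): DrState.FAILING_OVER,  # on demand
    (DrState.REPLICATING, "site_failure"): DrState.FAILING_OVER,        # after a failure
    (DrState.FAILING_OVER, "ebs_started"): DrState.RUNNING_IN_OCI,
}


def next_state(state: DrState, event: str) -> DrState:
    """Apply an event, rejecting transitions the workflow does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.value}") from None
```

Both triggers lead to the same failover path, which matches the text: whether the switch is planned or forced, the cloud copy is restarted the same way. Rejecting unexpected events (such as `ebs_started` while still replicating) keeps a runbook from skipping a step.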
Customers can also choose from a list of certified cloud MSEs, who will manage the entire process on their behalf. Certified Oracle cloud MSEs have proven expertise, tools, and processes to build, deploy, run, and manage Oracle and non-Oracle workloads on Oracle Cloud Platform, all under a single contract and a single point of contact.
These cloud MSEs can have the environment up and running, including a full DR test of the EBS environment running in OCI, within a specified timeframe (30-45 days*). For a leading US eyewear company, we set up a multi-region configuration covering three disaster recovery scenarios in less than 6 weeks. You can read the case study here.
Once OCI is validated as a viable high-performance DR site, switch the primary over to OCI and set up DR in a second availability domain, as discussed previously.
Other Key Considerations for Designing an Oracle Cloud Architecture
Several architecture configurations are available to match your current on-premises design. The Oracle E-Business Suite Lift and Shift capability reduces the operational risk of moving from on-premises and improves the success rate of migrations. Oracle Cloud Infrastructure is also very cost-effective for rapidly deploying and removing test and Quality Assurance environments.
Below are other considerations to review before proceeding with your EBS-to-OCI migration.
Fundamentals of a Successful EBS Migration
Applying Best practices and Reference Architectures Across OCI Solutions
Conclusion
Creating DR for mission-critical applications involves several key stakeholders with specialized skill sets. We highly recommend leveraging certified Oracle Cloud MSE partners to avoid roadblocks in your cloud migration journey and to keep your migration costs and timelines from growing.