Rapid Database Recovery in a Critical Failure Scenario
Challenge
- Database corruption impacting business operations during admissions processing
- Unsuccessful recovery attempts using standard database functionality
- Inability to perform recovery in a disaster recovery (DR) environment
- Risk of substantial data loss and extended production downtime
- Delays in student admissions could severely impact the timely receipt of tuition payments
Solution
Through the Emergency Services framework, ITC immediately executed an intensive, multi-phase recovery plan that combined traditional tools with creative workaround engineering:
- Root Cause Identification & Failure Analysis: Identified the source of the corruption and triaged the issue with Oracle Support.
- Backup Strategy Assessment: Determined that the existing backups could not be used for recovery, identified gaps in the backup policies, and recommended immediate remediation while recovery work continued in parallel.
- Database Creation: Built a new Oracle database matching the original configuration, providing a contingency path in parallel with all other recovery efforts.
- Data Salvage & Reconstruction: Used Oracle Data Pump to export data from the corrupted source, bypassing corrupted blocks, and to populate the new database.
- Workaround Engineering: Reconstructed the complete database by combining data from multiple sources, ultimately restoring all components.
- Post-Recovery Optimization: Applied several configuration changes designed to optimize the production database.
Results
- The production database was successfully rebuilt, free of corruption, and optimized for ongoing use.
- All critical data was imported into the new production database, and data loss was minimized.
- Cutover operations were executed quickly to return services to normal operation.
- Based on the recovery work, ITC provided recommendations for further optimizing the production database and facilitating future disaster recovery operations.
- The team's success in recovering the database and minimizing data loss restored user confidence in the system.