Databricks Implementation for a Global Supplier of Speciality Formulations
Challenge
The customer needed a modern data platform to integrate data from multiple sources, including ERP systems, and to consolidate various reporting tools into one unified solution. A key challenge was the daily extraction and processing of large, diverse datasets, each with different performance requirements. They also wanted to enable self-service analytics while ensuring strong data governance. The overall goals were to reduce manual effort and establish a single source of truth. The engagement began with a pilot focused on Procurement, which added complexity to the project’s planning and execution.
Solution
ITC worked closely with the client to understand their complex data landscape, which involved multiple systems like Oracle EBS, SAP, and MS SQL Server. To unify these sources and enable accurate, self-service analytics, ITC implemented a scalable Data Lakehouse architecture using Databricks. A key component of the solution was a customized Master Data Management (MDM) system, which eliminated data duplication and inconsistency, providing a single source of truth for reliable reporting. The team also introduced a metadata-driven ELT framework to automate data ingestion without the need for code changes, improving scalability and minimizing manual work.
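The idea behind a metadata-driven ELT framework can be illustrated with a minimal sketch. It is not ITC's implementation: the `SourceConfig` fields, the table names, and the in-memory dictionaries standing in for source systems and target tables are all hypothetical, and plain Python is used instead of Spark so the example runs anywhere. The point it shows is that onboarding a new table means adding a metadata row, not changing the ingestion code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class SourceConfig:
    """One metadata row describing a table to ingest (hypothetical schema)."""
    source_system: str
    source_table: str
    target_table: str
    load_type: str = "full"        # "full" or "incremental"
    watermark_column: str = ""     # only used for incremental loads


def ingest(config: SourceConfig,
           source_data: Dict[str, List[Row]],
           target: Dict[str, List[Row]],
           last_watermark: Optional[Any]) -> int:
    """Load one configured table and return the number of rows loaded."""
    rows = source_data[f"{config.source_system}.{config.source_table}"]
    if config.load_type == "incremental" and last_watermark is not None:
        # Append only rows newer than the last recorded watermark.
        rows = [r for r in rows if r[config.watermark_column] > last_watermark]
        target.setdefault(config.target_table, []).extend(rows)
    else:
        # Full load: replace the target contents.
        target[config.target_table] = list(rows)
    return len(rows)


def run_pipeline(configs: List[SourceConfig],
                 source_data: Dict[str, List[Row]],
                 target: Dict[str, List[Row]],
                 watermarks: Dict[str, Any]) -> Dict[str, int]:
    """Generic driver: behaviour is decided entirely by the metadata rows."""
    return {
        cfg.target_table: ingest(cfg, source_data, target,
                                 watermarks.get(cfg.target_table))
        for cfg in configs
    }


# Hypothetical metadata and sample data for illustration only.
CONFIGS = [
    SourceConfig("sap", "purchase_orders", "bronze_purchase_orders"),
    SourceConfig("oracle_ebs", "suppliers", "bronze_suppliers",
                 load_type="incremental", watermark_column="updated_at"),
]
SOURCE_DATA = {
    "sap.purchase_orders": [{"po_id": 1}, {"po_id": 2}],
    "oracle_ebs.suppliers": [
        {"supplier_id": 10, "updated_at": 5},
        {"supplier_id": 11, "updated_at": 9},
    ],
}

target: Dict[str, List[Row]] = {}
loaded = run_pipeline(CONFIGS, SOURCE_DATA, target, {"bronze_suppliers": 5})
print(loaded)
```

In a real Databricks deployment the metadata would more likely live in a control table, and the read and write steps would be Spark operations against the source systems and Delta tables; those details are outside what this case study describes.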
To ensure smooth performance and cost-efficiency, serverless compute and auto-scaling features were used to manage fluctuating workloads. The architecture followed a Medallion design, organizing data into bronze, silver, and gold layers for structured processing, and included audit tables for tracking data pipeline performance. Unity Catalog was implemented for secure, centralized data governance, while Azure DevOps Repos streamlined code management and deployment. To support adoption, ITC provided training and integrated the Databricks AI Assistant, enabling the client’s team to confidently develop and manage data workflows.
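As a rough illustration of the Medallion flow with audit tracking, the sketch below passes a handful of made-up procurement rows through bronze, silver, and gold steps and records row counts and duration for each. Everything in it is an assumption made for the example: the column names, the cleaning rules, the `audited` helper, and the in-memory `audit_log` list that stands in for an audit table. It uses plain Python lists, not Delta tables.

```python
import time
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
audit_log: List[Dict[str, Any]] = []   # stand-in for an audit table


def audited(layer: str,
            step: Callable[[List[Row]], List[Row]],
            rows: List[Row]) -> List[Row]:
    """Run one layer's transformation and record row counts and duration."""
    started = time.perf_counter()
    result = step(rows)
    audit_log.append({
        "layer": layer,
        "rows_in": len(rows),
        "rows_out": len(result),
        "seconds": time.perf_counter() - started,
    })
    return result


def to_bronze(rows: List[Row]) -> List[Row]:
    """Bronze: keep the raw data as received."""
    return [dict(r) for r in rows]


def to_silver(rows: List[Row]) -> List[Row]:
    """Silver: drop rows without a supplier, keep one row per po_id,
    and normalise supplier names."""
    seen = set()
    cleaned = []
    for r in rows:
        if r["supplier"] is None or r["po_id"] in seen:
            continue
        seen.add(r["po_id"])
        cleaned.append({**r, "supplier": r["supplier"].strip().upper()})
    return cleaned


def to_gold(rows: List[Row]) -> List[Row]:
    """Gold: aggregate spend per supplier for reporting."""
    totals: Dict[str, float] = {}
    for r in rows:
        totals[r["supplier"]] = totals.get(r["supplier"], 0.0) + r["amount"]
    return [{"supplier": s, "total_spend": t} for s, t in sorted(totals.items())]


# Made-up sample rows, including a duplicate and a row with a missing supplier.
raw = [
    {"po_id": 1, "supplier": " acme ", "amount": 100.0},
    {"po_id": 1, "supplier": " acme ", "amount": 100.0},
    {"po_id": 2, "supplier": "Acme", "amount": 50.0},
    {"po_id": 3, "supplier": None, "amount": 75.0},
    {"po_id": 4, "supplier": "globex", "amount": 20.0},
]

bronze = audited("bronze", to_bronze, raw)
silver = audited("silver", to_silver, bronze)
gold = audited("gold", to_gold, silver)

for entry in audit_log:
    print(entry["layer"], entry["rows_in"], entry["rows_out"])
```

Recording rows in and rows out per layer is one simple way an audit table can surface where records are dropped or where a step slows down; the actual audit schema used in this project is not described in the source.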
Results
- Integrated SAP, Oracle EBS, and MS SQL Server into a central Data Lakehouse, providing a reliable single source of truth for reporting and analytics.
- Achieved sub-five-minute processing of 5 million rows daily, with global data availability in under 24 hours.
- Automated data ingestion and master data management drastically reduced manual tasks and spreadsheet reliance.
- Enabled AI/ML-driven analysis and forecasting, supporting smarter operational and executive decision-making.
- Customized dashboards and audit tables enhanced tracking of historical costs and future forecasts, leading to better financial planning.