Accelerating Data Platform Migration with Test Automation

October 14, 2024

Highlights

  • Accelerated data product delivery by implementing a Test Automation Framework, including dbt model contracts, for data and ELT migration, using dbt Cloud on the new Snowflake on Azure platform.
  • The framework integrated with data pipelines, supporting the organisation’s aim to implement CI/CD on the data platform.
  • Identified and resolved data quality issues early in the transformation process, streamlining production support through optimisation and automation.
  • Established observability and monitoring for automated data testing, along with Quality Assurance KPIs tailored to the organisation’s needs.
  • Stakeholders and users responded positively, leading to additional streams of data testing automation and quality validation beyond the initial proof of concept, driving adoption across multiple data engineering squads.
  • Consolidated fragmented testing processes by deploying a modular, template-driven approach to data test case design and build.
  • Empowered federated ownership of data testing by individual squads, while maintaining centralised compliance with data testing standards.
  • The framework reduced migration testing time by over 50%, enhanced knowledge transfer between development and testing teams, and shortened data product release cycles by 30%.

Introduction

A leading bank undertook a major initiative to migrate its on-premises data warehouse to a modern data platform on Microsoft Azure and Snowflake, with tight deadlines. To meet these timelines and cut operational costs, the bank focused on modernising processes, boosting efficiency, and maintaining data quality throughout the transition.

To accelerate the migration, the bank prioritised faster data product delivery through test automation and the integration of data pipelines with testing processes. This initiative marked the bank’s move towards adopting Continuous Integration and Continuous Deployment (CI/CD) and DataOps practices within its data platform. These steps supported the bank’s long-term vision of implementing a federated data product architecture, distributing data engineering and analytics workloads while maintaining high standards of quality and compliance.

The core goal was to deploy methodologies that would significantly reduce operational overheads and speed up task completion, driving greater efficiency. One key approach was to introduce test automation, allowing data engineers to focus on higher-value activities. The initiative aimed to achieve the following business objectives:

  • Reduce time and effort: Minimise the resources required for testing data pipelines and speed up deployment to production during data migration.
  • Increase visibility of data quality issues: Detect data quality issues early to optimise production support for the data platform.
  • Enable Compliance by Design: Federate analytics engineering while maintaining strong risk controls across all data testing and engineering processes.

Problem Statements

  • Testing required significant time and effort. User Acceptance Testing (UAT) was a major bottleneck, and the business sought solutions to streamline this process for users.
  • Analytics engineering could not yet be federated. Federation was intended to remove dependencies on the data platform function, but it required a risk-controlled environment to enforce standards and best practices across dispersed teams.
  • The data platform testing function, though essential, was smaller than the engineering function and lacked automation, relying on manual effort to manage testing volumes.
  • Limited observability and automation hindered the detection, resolution, and prevention of data quality issues within the platform.
  • Insufficient visibility into testing activities arose from missing dashboards and metrics, creating blind spots in the process.
  • Unit testing, combined with the consolidation of data and schema changes during migration to the cloud, led to bottlenecks.
  • Testing activities were often manual, repetitive, and lacked standardisation.
  • The data migration build and review process was labour-intensive, involving tasks such as table, schema, and business rule validation, further slowing progress.

To address these problems, the business needed a solution that could automate repetitive data testing tasks and significantly reduce testing effort. This solution had to be easy to adopt and integrate across data engineering teams, forming the foundation for the organisation’s journey towards CI/CD, DataOps, and a federated data product architecture.

The Solution

  • Implemented a test automation framework using dbt Cloud and Snowflake on Azure for data migration, data quality testing, and ELT migration activities. This framework enabled:
    • Automated data validation tests, conducting preliminary unit testing and UAT on the user’s behalf before the formal UAT phase.
    • Early identification and resolution of data quality issues, optimising production support.
    • Validation beyond basic tests like row counts, allowing for more complex assessments of Data Definition Validity, Data Validity, Data Profile, Correctness, Completeness, Consistency, Design Integrity, and Reliability.
  • The framework automated the generation of test cases and steps, which could be applied within data pipelines or on an ad-hoc basis.
  • Quality metrics and test results, previously unavailable, became accessible through the outputs of the test automation framework.
  • The framework introduced a federated test process with governance and controls, establishing a standardised approach to testing across the data platform.
  • A metadata-driven design allowed for consistent and bulk generation of test cases, eliminating manual intervention and ensuring efficiency.
[Figure: Federated Data Product Architecture Enablement]

The Outcomes

  • The Data Domain team pioneered a test automation framework using dbt Cloud on Snowflake, which was widely adopted by various data squads to test migrated data products on the platform.
  • A standardised testing pattern was established, and with the introduction of dbt model contracts as part of development standards, the federation of both data engineering and data testing activities was enabled, aligning with the organisation’s strategic goals.
  • The coaching provided by the Data Domain team introduced new data testing methods, including a template-driven dbt test automation framework and baseline automation for data processes.
  • The framework boosted productivity by reducing the manual, repetitive unit testing required before UAT, allowing data engineers to focus on more value-added tasks.
  • The successful implementation not only improved data engineering efficiency but also set a benchmark for future data migration testing across the organisation.
  • Automation of test cases enabled the creation of regression suites and comprehensive regression testing, both of which were previously lacking or incomplete.
  • The framework accelerated testing across data engineering and testing functions by over 50%, improved knowledge transfer between development and testing teams, and reduced the time to market for data products by 30%.
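
The dbt model contracts mentioned in the outcomes above declare the column names and data types a model is expected to return, and an enforced contract causes the build to fail when the model's output does not match. The Python sketch below only imitates that idea to show why contracts help federated squads. It is not dbt's implementation, and the `Column` class, the `contract_violations` function, and the sample columns are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str


def contract_violations(declared, actual):
    """Return readable differences between declared and actual columns.

    Names and types are compared case-insensitively; this is a simplification
    and does not handle type aliases or precision.
    """
    declared_by_name = {c.name.lower(): c.data_type.lower() for c in declared}
    actual_by_name = {c.name.lower(): c.data_type.lower() for c in actual}
    problems = []
    for name, dtype in declared_by_name.items():
        if name not in actual_by_name:
            problems.append(f"missing column: {name}")
        elif actual_by_name[name] != dtype:
            problems.append(
                f"type mismatch on {name}: declared {dtype}, found {actual_by_name[name]}"
            )
    for name in actual_by_name:
        if name not in declared_by_name:
            problems.append(f"undeclared column: {name}")
    return problems


# Hypothetical contract for a migrated model and the columns it actually produced.
declared = [Column("customer_id", "number"), Column("segment", "varchar"),
            Column("opened_on", "date")]
actual = [Column("CUSTOMER_ID", "NUMBER"), Column("SEGMENT", "NUMBER"),
          Column("branch", "varchar")]

violations = contract_violations(declared, actual)
for violation in violations:
    print(violation)
```

A check of this kind lets a central team publish the expected shape of a data product once, while each squad finds out at build time, rather than during UAT, whether its changes still honour that shape.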

As a result of this success, Data Domain was engaged in additional data platform migration and testing initiatives, including later phases of the data platform migration programme beyond the initial proof of concept.

Read More

Find out more about dbt or read about our work on dbt's page by clicking here.

We are a Snowflake Select Tier Partner. Read more here.
