The shift to agile development and DevOps, with continuous integration and continuous deployment (CI/CD) pipelines, is accelerating the pace of software innovation, while reducing costs. For testing teams, this shift means that test data provisioning must keep up with the faster pace.
Shift-left testing works in tandem with agile software delivery to raise the standard of software quality. With this method, testing begins at the earliest stages of the development process rather than being saved for last. To implement this approach, test data must be prepared even earlier, and it must reflect a variety of functional and non-functional software testing scenarios.
With production data fragmented across multiple enterprise systems, ensuring complete and harmonized test data is a constant struggle. The need to mask data, as required by privacy regulations, and to create synthetic data that augments the existing data set adds another layer of complexity.
For DevOps leaders and testing teams, delivering high-quality test data environments at a rapid pace is critical. This paper reviews the challenges faced on the journey to DevOps test data management, and the steps required to get there.
Catalyzed by the rapid growth in applications, software development has shifted gears, releasing smaller software deliverables in fast sprints. In a DevOps model, deliveries are much smaller in scope and go live in weeks rather than months.
Agile methodology allows for continuous design, development, testing, and deployment throughout the software development life cycle (SDLC).
Agile development methodology divides long projects into smaller “sprints”. The logic is to get software increments into production as soon as possible – instead of waiting long periods for full deployment.
Not only does agile development give companies a competitive edge by getting innovative apps to users sooner, but it also mitigates risk. In this kind of CI/CD model, all phases of the lifecycle (design, development, testing, and deployment) run simultaneously, in a continuous cycle.
Preparing quality test data has always been a challenge, especially with agile development and CI/CD. And with the growing popularity of data services, and the integration of multiple applications, the provisioning of valid test data has become even more complex.
"We’re on a journey to modernize our apps and to realize the benefits of embracing a DevOps methodology. But you hit a roadblock if you don’t have realistic data to test against."
Ward Chewning, VP of Network Services & Shared Platform, AT&T
For a test cycle to be effective, whether it is manual or automated, stable test data that is as close to production as possible must be available. Further, DevOps depends on test data automation, which needs high-quality, consistent, and predictable datasets to run smoothly.
Adequate test data should be available on demand for running fully automated test suites, so that test data never becomes a constraint on automated testing or on the wider test data management process.
Shift-left testing is an approach to software testing in which testing is performed earlier in the software development process. Testing earlier, however, requires realistic test data to be available earlier.
Testing from the earliest stages of the development process increases efficiency and accelerates innovation.
Shift-left testing is common in agile development, where software development is divided into sprints. Each sprint requires its own testing cycles, so creating realistic test data often becomes a bottleneck that cancels out the gains of agile productivity.
Let’s examine the challenges related to test data in DevOps, and then review a practical solution for each of them.
Testing teams must contend with many data constraints, which typically slow down software delivery while hindering quality and agility.
Testing teams sometimes lack access to the necessary data, or the tools to extract it. Enterprise data is typically fragmented across different data sources. For example, an individual customer's data might be stored in dozens of applications, including customer care (CRM), billing, ordering, ticketing, collections, campaign management, churn prediction, and more. Running functional tests that require customer data therefore means provisioning data from all relevant source systems.
Gathering enough production data to cover the required testing scenarios is often challenging.
For example, testers may require the data for 300 customers (across all systems) that meet a certain set of criteria to complete a test scenario, but only 200 production samples are actually available.
Test data management tools need to be able to synthesize (generate) the missing 100 data samples, based on the production samples, while maintaining data integrity across all systems.
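To make the 300/200/100 example concrete, here is a minimal sketch of topping up a production sample with synthetic entities. It assumes just two hypothetical source systems (a CRM with `Customer` records and billing with `Invoice` records); each synthetic customer is cloned from a real sample with perturbed values, and the template's invoices are re-keyed to the new customer ID so the cross-system link survives. Real tools use far richer generation rules; this only illustrates the integrity requirement.

```python
import random
from dataclasses import dataclass

# Hypothetical record types for two source systems (CRM and billing).
@dataclass
class Customer:
    customer_id: int
    segment: str
    city: str

@dataclass
class Invoice:
    invoice_id: int
    customer_id: int  # foreign key pointing at Customer.customer_id
    amount: float

def synthesize(customers, invoices, needed, seed=0):
    """Top up the production sample to `needed` customers by cloning and
    perturbing real samples, re-keying every related invoice."""
    if not customers:
        raise ValueError("at least one production sample is required")
    rng = random.Random(seed)  # seeded so test runs are reproducible
    invoices_by_customer = {}
    for inv in invoices:
        invoices_by_customer.setdefault(inv.customer_id, []).append(inv)
    next_cust = max(c.customer_id for c in customers) + 1
    next_inv = max((i.invoice_id for i in invoices), default=0) + 1
    new_customers, new_invoices = [], []
    for _ in range(max(0, needed - len(customers))):
        template = rng.choice(customers)
        # Mix attributes from different real samples so clones are not exact copies.
        cust = Customer(next_cust, template.segment, rng.choice(customers).city)
        for inv in invoices_by_customer.get(template.customer_id, []):
            amount = round(inv.amount * rng.uniform(0.8, 1.2), 2)
            # The new invoice references the new customer, keeping the link intact.
            new_invoices.append(Invoice(next_inv, cust.customer_id, amount))
            next_inv += 1
        new_customers.append(cust)
        next_cust += 1
    return customers + new_customers, invoices + new_invoices

# 200 production customers, each with two invoices; the scenario needs 300.
prod_customers = [Customer(i, "retail" if i % 2 else "business", ("Lyon", "Oslo", "Lima")[i % 3])
                  for i in range(1, 201)]
prod_invoices = [Invoice(i, (i - 1) // 2 + 1, 100.0) for i in range(1, 401)]
all_customers, all_invoices = synthesize(prod_customers, prod_invoices, needed=300)
```

Cloning from real samples keeps value distributions plausible; because every synthetic invoice is written with the synthetic customer's ID, no invoice ends up pointing at a customer that does not exist.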
In many cases, the data may be available, but it fails to meet the required quality standards for the following reasons:
Adopting a proven test data management strategy enables enterprises to accelerate test data provisioning and increase the quality of software delivery.
Here are the steps companies should follow on the road to delivering agile test data at enterprise complexity and scale.
Start by determining clear criteria upon which the test data collection process will be based. These define the data subsets that should be used in testing the use cases, including the required business entities to cover the testing scenarios, the volume of data required for testing, its sources, its freshness, and more. Teams can make use of an automated data catalog to inventory and classify test data assets, and visually map information supply chains.
Having established which test data is needed, it’s time to extract it from the organization’s production systems. When the required data is dispersed across many different systems and data sources, a test data management tool – that can integrate with the production systems, and extract test data according to predefined rules – comes in handy.
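The first two steps (defining criteria, then extracting by rule) can be sketched together as follows. This is an illustrative sketch, not any product's API: the `Criteria` fields mirror the criteria listed above (business entity, volume, freshness), sources are plain lists of dicts, and it assumes each related system stores a foreign key named `<entity>_id` (e.g. `customer_id`).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable

@dataclass
class Criteria:
    entity: str                        # business entity under test, e.g. "customer"
    predicate: Callable[[dict], bool]  # which entities qualify for the scenario
    volume: int                        # how many entities the scenario needs
    max_age_days: int                  # freshness: skip entities not updated recently

def extract(criteria, sources, today):
    """Select qualifying entity IDs from the entity's own source, then pull
    their related rows from every other source so the subset stays complete."""
    cutoff = today - timedelta(days=criteria.max_age_days)
    master = sources[criteria.entity]
    ids = [row["id"] for row in master
           if criteria.predicate(row) and row["updated"] >= cutoff][: criteria.volume]
    wanted = set(ids)
    subset = {criteria.entity: [r for r in master if r["id"] in wanted]}
    fk = f"{criteria.entity}_id"  # assumed foreign-key naming convention
    for name, rows in sources.items():
        if name != criteria.entity:
            subset[name] = [r for r in rows if r.get(fk) in wanted]
    return subset

sources = {
    "customer": [
        {"id": 1, "plan": "gold", "updated": date(2024, 1, 20)},
        {"id": 2, "plan": "gold", "updated": date(2022, 5, 1)},   # stale, filtered out
        {"id": 3, "plan": "basic", "updated": date(2024, 1, 25)},
    ],
    "billing": [{"customer_id": 1, "amount": 50}, {"customer_id": 3, "amount": 20}],
    "tickets": [{"customer_id": 1, "topic": "outage"}, {"customer_id": 2, "topic": "move"}],
}
crit = Criteria("customer", lambda r: r["plan"] == "gold", volume=10, max_age_days=90)
subset = extract(crit, sources, today=date(2024, 1, 31))
```

Selecting entities first and then following foreign keys into the other systems is what keeps the extracted subset referentially complete, rather than filtering each system independently.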
Testing is an iterative process. When bugs are discovered and fixed, testing should be repeated to ensure quality (regression testing). A test data management strategy should provide the means to quickly roll back the test data that was previously used – by the specific tester, for the specific use case – without impacting the test data currently being used for other tests. Companies should seek a test data management tool that is adaptable, easy to sync with source systems, and capable of rolling back data on demand.
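The rollback requirement can be illustrated with a small in-memory sketch. The `WorkspaceStore` class is hypothetical: it keeps a separate copy of the data and a version history per (tester, use case) pair, so rolling back one workspace never touches another. A real tool would persist snapshots rather than hold deep copies in memory.

```python
import copy

class WorkspaceStore:
    """Hypothetical versioned test data, isolated per (tester, use case)."""

    def __init__(self):
        self._workspaces = {}  # (tester, use_case) -> current data
        self._history = {}     # (tester, use_case) -> list of saved versions

    def load(self, tester, use_case, data):
        key = (tester, use_case)
        self._workspaces[key] = copy.deepcopy(data)
        self._history[key] = [copy.deepcopy(data)]  # version 0 = as provisioned

    def data(self, tester, use_case):
        return self._workspaces[(tester, use_case)]

    def snapshot(self, tester, use_case):
        key = (tester, use_case)
        self._history[key].append(copy.deepcopy(self._workspaces[key]))
        return len(self._history[key]) - 1  # new version number

    def rollback(self, tester, use_case, version=0):
        key = (tester, use_case)
        # Only this workspace is restored; other testers' data is untouched.
        self._workspaces[key] = copy.deepcopy(self._history[key][version])

store = WorkspaceStore()
store.load("alice", "refund-flow", {"balance": 100})
store.load("bob", "refund-flow", {"balance": 100})
store.data("alice", "refund-flow")["balance"] = 0   # Alice's test run mutates her copy
store.data("bob", "refund-flow")["balance"] = 70
bob_v1 = store.snapshot("bob", "refund-flow")       # Bob saves an intermediate state
store.data("bob", "refund-flow")["balance"] = 0
store.rollback("alice", "refund-flow")              # Alice reruns on clean data
store.rollback("bob", "refund-flow", bob_v1)        # Bob returns to his saved state
```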
Any test data management strategy is incomplete without ensuring adequate privacy and security measures via data masking. When dealing with production data, the mission is to ensure data privacy, while maintaining the data’s integrity and keeping it secure. Centralizing the test data from multiple sources into a test data warehouse, leveraging data anonymization to protect it, and securing it along the way, creates a simple and efficient process for meeting data compliance and security requirements.
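One common way to mask while preserving integrity is deterministic pseudonymization: the same real value always maps to the same token, so joins across systems still work. The sketch below uses keyed HMAC-SHA256 from Python's standard library; the key, the "domain" names, and the token format are assumptions for illustration. Note that this produces opaque tokens rather than lifelike values (format-preserving masking would be needed for that), and that pseudonymized data may still count as personal data under some regulations.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager, never from code.
SECRET_KEY = b"replace-with-a-managed-secret"

def mask_value(value: str, domain: str) -> str:
    """Deterministically pseudonymize a value within a masking domain."""
    digest = hmac.new(SECRET_KEY, f"{domain}:{value}".encode(), hashlib.sha256).hexdigest()
    return f"{domain}-{digest[:12]}"

def mask_rows(rows, field_domains):
    """field_domains maps a column name to a masking domain. Columns that hold
    the same real-world value in different systems (crm.id, billing.cust_ref)
    share a domain, so they mask to identical tokens."""
    return [
        {k: mask_value(str(v), field_domains[k]) if k in field_domains else v
         for k, v in row.items()}
        for row in rows
    ]

crm = [{"id": "C-1001", "name": "Dana Levi", "plan": "gold"}]
billing = [{"cust_ref": "C-1001", "amount": 42.0}]
masked_crm = mask_rows(crm, {"id": "customer", "name": "name"})
masked_billing = mask_rows(billing, {"cust_ref": "customer"})
```

After masking, `masked_crm[0]["id"]` equals `masked_billing[0]["cust_ref"]`, so the CRM-to-billing relationship is preserved even though the original identifier no longer appears.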
Besides masking, synthetic data generation is another method for protecting sensitive information. In cases where test teams cannot extract enough production data for testing, or when new software functionality that has no production data yet needs to be tested, test teams need to generate test data to complete the testing. A test data management strategy should include the means to easily generate synthetic data on demand.
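When no production data exists at all, generation has to be rule-based. The sketch below assumes a hypothetical new "loyalty account" feature whose business rules say each tier has a fixed points range; the generator encodes those rules directly and is seeded so every test run gets the same dataset. The field names and ranges are illustrative only.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical business rules for a new feature with no production data yet.
TIER_POINTS = {"bronze": (0, 999), "silver": (1000, 4999), "gold": (5000, 20000)}

def synth_accounts(n, seed=42):
    """Generate n synthetic loyalty accounts that satisfy the tier rules."""
    rng = random.Random(seed)  # fixed seed: reproducible datasets across runs
    start = date(2023, 1, 1)
    accounts = []
    for i in range(n):
        tier = rng.choice(list(TIER_POINTS))
        low, high = TIER_POINTS[tier]
        accounts.append({
            "account_id": f"LA{i:06d}",  # unique, obviously synthetic IDs
            "email": "".join(rng.choices(string.ascii_lowercase, k=8)) + "@example.test",
            "tier": tier,
            "points": rng.randint(low, high),  # always inside the tier's range
            "enrolled": start + timedelta(days=rng.randint(0, 364)),
        })
    return accounts

accounts = synth_accounts(500)
```

Because the values are invented from rules rather than copied from production, this data carries no privacy risk; the trade-off is that it is only as realistic as the rules encoded in the generator.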
After acquiring the necessary test data, generating missing data, and masking data as required, it’s time to move it to the target test environments. Test data management tools should offer a fast and seamless path from multiple source systems to multiple environments. Testers should be able to upload, adjust, and remove test datasets either manually or in an automated manner using CI/CD integration.
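A minimal sketch of provisioning the same dataset into several environments follows. It assumes each target environment is represented by an in-memory SQLite database (a stand-in, not a real environment connector) and tags every row with a dataset name, so a CI job can load, reload, or remove one dataset without disturbing others.

```python
import sqlite3

def open_env():
    """Stand-in for a target test environment (assumption: in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id TEXT, plan TEXT, dataset TEXT, PRIMARY KEY (id, dataset))"
    )
    return conn

def provision(conn, dataset, rows):
    """Load a named dataset; re-running replaces it, so CI jobs can call it idempotently."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM customer WHERE dataset = ?", (dataset,))
        conn.executemany(
            "INSERT INTO customer (id, plan, dataset) VALUES (?, ?, ?)",
            [(r["id"], r["plan"], dataset) for r in rows],
        )

def remove(conn, dataset):
    """Remove one dataset, leaving any other datasets in the environment intact."""
    with conn:
        conn.execute("DELETE FROM customer WHERE dataset = ?", (dataset,))

envs = {"qa": open_env(), "staging": open_env()}
rows = [{"id": "C1", "plan": "gold"}, {"id": "C2", "plan": "basic"}]
for conn in envs.values():
    provision(conn, "sprint-42-refunds", rows)
provision(envs["qa"], "sprint-42-refunds", rows)  # re-run: replaced, not duplicated
remove(envs["staging"], "sprint-42-refunds")
```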
Test data management benefits include:
The effectiveness of both the testing process and the delivered software product rises significantly when proper test data management tools and methods are applied. Provisioning high-quality test data in minutes enables development teams to increase test coverage, accelerate delivery, and improve the organization’s agility.
By quickly delivering the needed test data, teams are able to detect bugs early on in the software development process, and therefore fix them at a much lower cost. In addition, not having to work hard to produce relevant data frees development teams to focus on innovation and move the organization forward.
When test data management is both safe and of the highest quality, teams are able to adhere to privacy regulations, protecting the company's reputation. Reducing production defects and avoiding data breaches increase user trust, helping companies stay one step ahead of the competition.
The move to agile software development, with high-performance test data environments, can save enterprises millions of dollars.
The right test data generator should increase test coverage, accelerate software delivery, reduce testing costs, and enhance the end-user experience. The challenge is to find the most suitable test data management tool for your organization. A good place to start is to examine the top test data management tools for 2023.
The latest innovation in TDM testing is the business entity approach to test data management – where test data is collected from the source systems by business entity (e.g., customer, branch, loan), unified and masked as an entity, and then provisioned to the target test systems by business entity.
This method simplifies and streamlines the test data management process, ensures referential integrity of the test data, and enables complete control of the process.
Further, data for business entities – both structured and unstructured – is ingested and compressed into a centralized test data store, allowing testing teams to instantly perform the following TDM testing actions:
Data in the test data store is masked in flight, before being stored. This includes the personally identifiable information (PII) that is stored in various unstructured data constructs: check images, PDF documents, chat scripts, audio files, XML documents, and more.
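The ingest path described above (unify an entity, mask it in flight, compress it into a central store) can be sketched as follows. `EntityStore` is hypothetical, and the two regular expressions only catch simple email and phone patterns in text; images, PDFs, and audio would first need OCR or transcription, which is not shown.

```python
import json
import re
import zlib

# Illustrative PII patterns only; real detection needs much broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def mask_any(value):
    """Recursively redact PII patterns in strings nested in lists and dicts."""
    if isinstance(value, str):
        return PHONE.sub("<PHONE>", EMAIL.sub("<EMAIL>", value))
    if isinstance(value, list):
        return [mask_any(v) for v in value]
    if isinstance(value, dict):
        return {k: mask_any(v) for k, v in value.items()}
    return value

class EntityStore:
    """Hypothetical central test data store: one compressed blob per business entity."""

    def __init__(self):
        self._blobs = {}

    def ingest(self, entity_id, entity):
        # Masking happens before compression and storage, so raw PII is never persisted.
        masked = mask_any(entity)
        self._blobs[entity_id] = zlib.compress(json.dumps(masked).encode("utf-8"))

    def get(self, entity_id):
        return json.loads(zlib.decompress(self._blobs[entity_id]).decode("utf-8"))

store = EntityStore()
store.ingest("customer:1001", {
    "plan": "gold",
    "chat": ["Hi, reach me at dana@example.com or +1 555-010-2233", "Thanks!"],
})
```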
Test data management (TDM) combines the tools and processes to efficiently provision the required data for software testing, while ensuring compliance.
TDM testing involves the subsetting, transformation, aging, masking, reservation, and versioning of test data.
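Two of these operations, aging and reservation, are easy to misread, so here is a small sketch of each under simple assumptions. Aging shifts every date field by the same offset so relative intervals (for example, the gap between an invoice date and its due date) are preserved. Reservation gives a tester exclusive use of specific entities so parallel test runs do not modify the same records. Both helpers are hypothetical.

```python
from datetime import date, timedelta

def age_dates(rows, fields, days):
    """Data aging: shift the given date fields by a fixed offset."""
    delta = timedelta(days=days)
    return [{k: (v + delta if k in fields else v) for k, v in r.items()} for r in rows]

class Reservations:
    """Reservation: map each entity ID to the tester currently holding it."""

    def __init__(self):
        self._owner = {}

    def reserve(self, tester, entity_ids):
        taken = [e for e in entity_ids if self._owner.get(e, tester) != tester]
        if taken:
            raise RuntimeError(f"already reserved: {taken}")
        for e in entity_ids:
            self._owner[e] = tester

    def release(self, tester):
        self._owner = {e: t for e, t in self._owner.items() if t != tester}

invoices = [{"id": 1, "issued": date(2023, 1, 1), "due": date(2023, 1, 31)}]
aged = age_dates(invoices, {"issued", "due"}, days=365)
r = Reservations()
r.reserve("alice", ["cust-1", "cust-2"])
```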
Its objectives are to ensure that tests are executed with consistent, precise, and relevant data, that is also compliant with data security and privacy regulations. By adopting test data management best practices, enterprises become more agile, enhance the quality of their applications, and minimize the resources needed to test them.
The major challenges of TDM testing include:
Managing enterprise complexity: Test data provisioning becomes a highly complex process when multiple heterogeneous source systems, multiple test environments, and multiple software teams are involved.
Complying with data security and privacy regulations: Test data often contains personal or sensitive information which must be obscured by data masking tools or data tokenization tools.
Ensuring referential consistency: Test data subsets must maintain referential consistency to ensure that they remain complete, even after anonymization.
Ensuring data relevance: Maintaining fresh, accurate data is a challenge, especially when large amounts of test data are required.
Managing huge amounts of data: The massive quantity of test data can be difficult and costly to organize and persist, and can impact performance, scalability, and total cost of ownership.
Anonymizing test data: Test data must be carefully masked to ensure that PII data is never compromised, eliminating risk of breaching data privacy regulations.
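Referential consistency after anonymization can be checked mechanically by looking for "orphan" foreign keys. The sketch below contrasts inconsistent masking (each system hashes customer IDs with its own salt) against consistent masking (one shared salt for the customer-ID domain). Plain salted SHA-256 is used here only to keep the example short; it is not a recommendation for production-grade masking.

```python
import hashlib

def pseudonym(value, salt):
    return hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()[:10]

def orphan_references(parent_rows, parent_key, child_rows, foreign_key):
    """Return child rows whose foreign key has no matching parent row."""
    keys = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in keys]

customers = [{"id": "C1"}, {"id": "C2"}]
orders = [{"order": 10, "customer_id": "C1"}, {"order": 11, "customer_id": "C2"}]

# Inconsistent: each system uses its own salt, so references no longer resolve.
bad_customers = [{"id": pseudonym(c["id"], "crm")} for c in customers]
bad_orders = [{**o, "customer_id": pseudonym(o["customer_id"], "orders")} for o in orders]

# Consistent: one shared salt for the customer-ID domain keeps the links intact.
good_customers = [{"id": pseudonym(c["id"], "customer")} for c in customers]
good_orders = [{**o, "customer_id": pseudonym(o["customer_id"], "customer")} for o in orders]
```

Running `orphan_references` on the inconsistent pair flags both orders, while the consistent pair yields an empty list; a check like this can gate a test data release.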
DevOps test data management integrates testing into the DevOps pipeline by automating the collection, delivery, and management of test data as part of the Continuous Integration / Continuous Delivery (CI/CD) process. The goals are to speed up testing, strengthen cooperation between development and testing teams, and improve overall application quality. When test data management is embedded in the DevOps pipeline, enterprises can better manage the large volumes of data generated during testing, and ensure that tests are executed with accurate, consistent, and relevant data.
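Inside a CI job, this often takes the shape of a setup/teardown wrapper around the test run. The sketch below assumes an in-memory SQLite database stands in for the test environment; the context manager provisions a fresh dataset before the tests and always cleans up afterwards, even if a test fails. `provisioned_test_data` and `check_withdrawal` are hypothetical names.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def provisioned_test_data(rows):
    """Provision a fresh dataset for one CI test job and tear it down afterwards."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance REAL)")
        conn.executemany("INSERT INTO account VALUES (?, ?)", rows)
        conn.commit()
        yield conn
    finally:
        conn.close()  # runs whether the tests passed, failed, or raised

def check_withdrawal(conn):
    """Example test step that mutates the provisioned data."""
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 'A1'")
    (balance,) = conn.execute("SELECT balance FROM account WHERE id = 'A1'").fetchone()
    return balance == 70

with provisioned_test_data([("A1", 100.0), ("A2", 5.0)]) as conn:
    ok = check_withdrawal(conn)
```

Because every job starts from the same provisioned state, test results do not depend on what an earlier pipeline run left behind.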
Test data management benefits include:
Better test data coverage: By linking test data between test cases and requirements, test data management can deliver a 360-degree view of test data coverage and identify error patterns.
Reduced costs: From a test data warehouse, the appropriate data can be provisioned for different testing types (e.g., functional, integration, performance, etc.), reducing the need for redundant data copies and extra storage.
Greater compliance and security: To ensure compliance with data privacy regulations, data masking and synthetic data generation have become intrinsic to test data management.
Test data reusability: Reusable test data is categorized and archived in the test data warehouse for future use by testers, reducing costs even further.
No data copies needed: When a test data warehouse is used by all teams, relational data integrity and optimized storage are maintained.
Better applications, with fewer defects: Shift-left testing identifies and resolves problems earlier in the development cycle. The result is greater customer trust.
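The coverage benefit relies on linking requirements, test cases, and test datasets. A minimal sketch, assuming these links are available as simple mappings (the IDs below are invented), shows the two questions such a view answers: which requirements have test cases without data, and which datasets are associated with the most failures.

```python
from collections import Counter

def coverage_report(requirements, test_data, failed_tests):
    """requirements: requirement ID -> test case IDs; test_data: test case ID -> dataset."""
    uncovered = sorted(req for req, cases in requirements.items()
                       if any(tc not in test_data for tc in cases))
    # Failures grouped by dataset can reveal error patterns tied to specific data.
    failures_by_dataset = Counter(test_data[tc] for tc in failed_tests if tc in test_data)
    return uncovered, failures_by_dataset

requirements = {"REQ-1": ["TC-1", "TC-2"], "REQ-2": ["TC-3"], "REQ-3": ["TC-4"]}
test_data = {"TC-1": "ds-gold-customers", "TC-2": "ds-gold-customers", "TC-3": "ds-churned"}
uncovered, failures = coverage_report(requirements, test_data, ["TC-1", "TC-2", "TC-3"])
```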
The key components of a test data management strategy are:
Provisioning data by business entity: A solid test data management strategy lets testers provision lifelike, trusted data systematically, for any test case, on demand.
Extracting data in flight: The TDM testing team, or test data automation system, should be able to request the necessary test data, on the fly, with no preparation.
Refreshing/syncing data continuously: Testers need a test data management system that can refresh data granularly, for each component, and that is simple to sync for rollbacks, should the need arise.
Anonymizing sensitive data: The ability to unify test data from multiple sources, perform data anonymization or de-anonymization, as needed, while constantly protecting it, is critical to compliance with privacy laws.
Synthesizing data when required: When there’s not enough test data available from production sources, synthetic-data generation tools allow testing teams to fill any gaps with artificial, yet very lifelike, data.
Test data management is an important part of the software development life cycle. It typically begins with the creation of test data, and then continues through the execution of tests, with data being refreshed and synced as needed. The test data is used to improve application quality, and can be reused for future efforts.
Automating TDM testing accelerates agile software delivery and expands regression testing coverage. Enterprises are adopting a 3-step process to do this:
Extract: Connect to all data sources to synchronize data extraction.
Manage: Integrate, mask, transform, subset, and generate test datasets in the test data management platform.
Provision: Provision the test data from the test data management system to the testing environments, on-demand.
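The three steps can be sketched as a simple function pipeline. Everything here is an assumption for illustration: sources are in-memory lists, every source row carries a `customer_id`, "manage" is reduced to subsetting plus masking, and the target environment is an in-memory SQLite database. The unsalted short hash used for masking is for brevity only and would be too weak for real PII.

```python
import hashlib
import sqlite3

def extract(sources):
    """Extract: pull rows from every connected source (here, plain Python lists)."""
    return {name: list(rows) for name, rows in sources.items()}

def manage(data, sensitive, subset_ids):
    """Manage: subset to the chosen customers and mask sensitive columns."""
    def mask(v):
        return hashlib.sha256(str(v).encode()).hexdigest()[:8]
    return {
        name: [{k: mask(v) if k in sensitive else v for k, v in r.items()}
               for r in rows if r["customer_id"] in subset_ids]
        for name, rows in data.items()
    }

def provision(data, conn):
    """Provision: load each managed table into the target test environment."""
    with conn:
        for table, rows in data.items():
            if not rows:
                continue
            cols = list(rows[0])
            # Table and column names come from trusted pipeline config, not user input.
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
            placeholders = ", ".join("?" for _ in cols)
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})",
                             [tuple(r[c] for c in cols) for r in rows])

sources = {
    "crm": [{"customer_id": 1, "email": "a@example.com"},
            {"customer_id": 2, "email": "b@example.com"}],
    "billing": [{"customer_id": 1, "amount": 10.0}, {"customer_id": 2, "amount": 99.0}],
}
conn = sqlite3.connect(":memory:")
provision(manage(extract(sources), sensitive={"email"}, subset_ids={1}), conn)
```

Keeping the stages as separate functions makes each one independently testable and lets a CI/CD job run the whole chain on demand.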
TDM tools provision data from and to legacy and modern systems via native database connectors and APIs. These include: