Test data automation is the process of automatically delivering test data to lower environments, as requested by software and quality engineering teams.
Table of Contents
What is Test Data Automation?
Test Data Automation Drivers
3-Steps to Enterprise Test Data
Test Data for Every Scenario
Key Technology Challenges
Entity-Based Test Data Management Tools
What is Test Data Automation?
Test data automation is the process of automatically preparing, organizing, and delivering test data to the lower environments used by software development and testing teams.
Test data automation integrates test data into an organization's DevOps CI/CD pipelines, ensuring that the test data is complete, precise, and current. These attributes improve the efficiency and quality of the testing process.
There’s no denying the importance of test data management, and of automated test data, as traditional test data provisioning methods fail to keep up with mounting requirements for speed and agility. The right test data generation solution needs to access data from IT environments that are more complex than ever, and at a much faster pace.
Software and quality engineering teams are demanding more automation to deliver software on time. These demands grow more crucial when we understand how the test data process works in many organizations.
Traditionally, multiple teams are involved in fulfilling test data requests, spanning data engineering, DevOps, software development, and testing. Each request must be approved, communicated, and prioritized across teams, before it can be delivered.
So it’s little wonder that test datasets take too long to provision.
The faster pace is a challenge that traditional test data approaches simply cannot handle. Without fresh, high-quality test datasets, testing is delayed. The result is lower test coverage and more false positives and false negatives, all of which hurt product quality.
Test Data Automation Drivers
Two trends are driving the increased demand for automated test data:
-
Delivering complex test datasets at the speed of business
With automated test data, DevOps can integrate test data into continuous integration and delivery (CI/CD) pipelines. This end-to-end automation is crucial for increasing testing efficiency.
-
Provisioning test data earlier for shift-left testing
By testing early in the stages of development, defects are detected earlier, which means that they can be fixed faster and at a lower cost.
These 2 factors demonstrate the critical role of test data automation, and here’s how you can achieve this goal.
3-Steps to Enterprise Test Data
Automated procedures that provision test data by connecting directly to the source systems might create an overload that impacts performance. To prevent this, instead of building one automation flow that extracts data from the source systems, and delivers it directly to the testing environment, the process should be done in 3 separate steps:
-
Step 1: Extract
Connect to all data sources to synchronize data extraction.
-
Step 2: Manage
Integrate, mask, transform, subset, and generate test datasets with test data management.
-
Step 3: Provision
Provision the test data, from the test data management tools, to the testing environments, on-demand.
This division, with test data management playing a central role, ensures that the testing environments receive the test data they need, without bringing down the production systems. When provisioning production-grade test data, the data should be requested from the test data management tools, and not the source system, to prevent a flood of requests that may result in system instability. The 3-step approach is also more secure, because it minimizes direct access to production data.
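The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real product API: `TdmStore`, its method names, and the sample rows are all hypothetical, and an in-memory list stands in for the test data management layer. The point it shows is that the source is read once, while every test request is served from the managed copy.

```python
from dataclasses import dataclass, field


@dataclass
class TdmStore:
    """Hypothetical stand-in for a test data management layer."""
    records: list = field(default_factory=list)
    source_reads: int = 0  # counts how often the source system was touched

    def extract(self, source_rows):
        # Step 1: read from the source system once, not once per test request.
        self.source_reads += 1
        self.records = [dict(row) for row in source_rows]

    def manage(self):
        # Step 2: mask sensitive fields before anything is provisioned.
        for row in self.records:
            row["email"] = f"user{row['id']}@example.com"

    def provision(self, predicate, limit):
        # Step 3: serve a subset on demand from the store, never from the source.
        return [dict(row) for row in self.records if predicate(row)][:limit]


# Illustrative "production" rows (made-up data).
source = [
    {"id": 1, "email": "ann@corp.test", "country": "US"},
    {"id": 2, "email": "bob@corp.test", "country": "DE"},
    {"id": 3, "email": "eve@corp.test", "country": "US"},
]

store = TdmStore()
store.extract(source)
store.manage()

# Two separate test requests, both answered without touching the source again.
us_customers = store.provision(lambda r: r["country"] == "US", limit=10)
de_customers = store.provision(lambda r: r["country"] == "DE", limit=10)
```

However many environments call `provision`, `source_reads` stays at 1, which is the load-protection argument made above in miniature.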
Test Data for Every Scenario
Testing environments are not one-size-fits-all, and require certain datasets to fulfill certain needs, for example:
-
Specific use cases, such as testing the order processing flow for different device types as opposed to testing a billing cycle failure.
-
Different versions of the same application, including changes to the data model.
-
Different development stages, particularly the earlier stages that a shift-left testing approach targets.
-
New applications that do not yet have production data.
The diversity required to meet these needs shows that a copy-paste approach to test data is insufficient. Test data management tools must adjust, transform, augment, and subset the data to fit the specific target environment’s requirements and formats.
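For the last case in the list, a new application with no production data, test data has to be generated rather than copied. The sketch below is one simple way to do that with Python's standard library. The function name `generate_orders` and the field names are assumptions made for illustration, and a seeded random generator is used so that a failing test can be reproduced with the same dataset.

```python
import random


def generate_orders(count, device_types, seed=0):
    """Generate synthetic order rows; field names are illustrative only."""
    rng = random.Random(seed)  # local, seeded RNG keeps runs reproducible
    return [
        {
            "order_id": i,
            "device_type": rng.choice(device_types),
            "amount_cents": rng.randint(100, 50_000),
        }
        for i in range(1, count + 1)
    ]


# Same seed, same dataset: useful when a test needs to be rerun exactly.
orders = generate_orders(5, ["phone", "tablet", "desktop"], seed=42)
```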
Key Technology Challenges
The 3-step approach to test data management is crucial but insufficient for an effective test data management strategy, which requires the following:
-
Synchronized extraction of production data
Traditionally, companies perform a quarterly or monthly extraction of all production data. Stale and outdated test data can limit testing scenarios and impact testing quality. Test data automation should solve this problem with smart synchronization that extracts only fresh data, without impacting the source systems.
-
Masking on the fly
Data masking is a critical step in the test data automation process because it protects sensitive data, supports data governance, and helps the organization adhere to privacy regulations. To prevent exposure, this step should be executed before the unmasked data is stored in the system. Because customer data comes from multiple sources, it is often sent to the staging area unmasked to go through integration and cleansing procedures that ensure consistency. These steps involve many stakeholders, which further increases the risk of data leaks. To prevent this, in-flight data masking tools should be implemented.
-
Multiple-source parameter-based provisioning
When test data comes from various sources and requires parameter-based selection, effectively provisioning it becomes a real struggle. Enterprise test data automation needs to effectively address this complexity.
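The first two challenges above, extracting only fresh data and masking before storage, can be combined in one small sketch. It assumes each source row carries an `updated_at` timestamp that can serve as a watermark. That field, along with `sync_changes`, `mask_email`, and the hard-coded key, is hypothetical and only for illustration. Masking uses a keyed hash (HMAC-SHA256) from the standard library, so the same input always produces the same masked value.

```python
import hashlib
import hmac
from datetime import datetime

MASK_KEY = b"demo-only-key"  # assumption: a real key would come from a secret store


def mask_email(email):
    # Keyed hash: the same input always yields the same token, so joins still work.
    digest = hmac.new(MASK_KEY, email.lower().encode(), hashlib.sha256).hexdigest()
    return f"{digest[:12]}@example.com"


def sync_changes(source_rows, last_sync, store):
    """Copy only rows changed after last_sync, masking them before they are stored."""
    newest = last_sync
    for row in source_rows:
        if row["updated_at"] <= last_sync:
            continue  # unchanged since the previous sync: skip it
        masked = {**row, "email": mask_email(row["email"])}
        store[row["id"]] = masked  # only masked data is ever written
        newest = max(newest, row["updated_at"])
    return newest  # becomes the watermark for the next run


# Illustrative rows (made-up data); only id 2 changed after the last sync.
source = [
    {"id": 1, "email": "ann@corp.test", "updated_at": datetime(2024, 1, 5)},
    {"id": 2, "email": "bob@corp.test", "updated_at": datetime(2024, 3, 9)},
]
store = {}
watermark = sync_changes(source, datetime(2024, 2, 1), store)
```

Because masking happens inside the loop, before the write to `store`, no unmasked copy of the row is ever persisted outside the source system.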
Entity-Based Test Data Management Tools
The entity-based test data management approach addresses the challenges listed above with automated features including:
-
Smart syncing
Business entities enable accurate, highly granular updates, based on specific changes, without having to copy entire databases. This enables data extraction without impacting production system performance.
-
Masking with integrity
Referential integrity is inherent in this data model because different sources are automatically integrated into individual entities. Data can be masked on the fly without any staging areas that might put sensitive data at risk. As a result, when test data is stored as business entities, it is always secure, ready for selection and provisioning.
-
Subsetting
Using a self-service portal or an API, development, IT, and testing teams can easily create parameter-based selections to provision test data on demand. The automated test data is then seamlessly integrated into CI/CD pipelines, to drive fully automated testing cycles.
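To make the entity idea concrete, here is a rough sketch of grouping rows from two sources into one record per customer and then selecting a subset by parameter. It does not represent any vendor's data model or API: `build_entities`, `select_entities`, and the `customer_id` and `segment` fields are hypothetical names chosen for the example.

```python
from collections import defaultdict


def build_entities(customers, orders):
    """Group rows from two sources into one record per customer."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        orders_by_customer[order["customer_id"]].append(order)
    return {
        c["customer_id"]: {**c, "orders": orders_by_customer[c["customer_id"]]}
        for c in customers
    }


def select_entities(entities, limit, **params):
    """Return up to `limit` entities whose top-level fields match all params."""
    matches = [
        e for e in entities.values()
        if all(e.get(key) == value for key, value in params.items())
    ]
    return matches[:limit]


# Illustrative rows from two separate "sources" (made-up data).
customers = [
    {"customer_id": 1, "segment": "prepaid"},
    {"customer_id": 2, "segment": "postpaid"},
    {"customer_id": 3, "segment": "prepaid"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 3},
]

entities = build_entities(customers, orders)
subset = select_entities(entities, limit=1, segment="prepaid")
```

Because each selected customer carries its own orders, the subset stays internally consistent without a separate pass to repair foreign keys.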