A data product is a reusable data asset that bundles data together with
everything needed to make it independently usable by authorized consumers.
Data Products – The Foundation for Data Mesh
A data mesh architecture rests on four principles:
- Decentralized, domain-oriented data ownership
- Data as a product
- Self-service platform
- Federated data governance
The second principle suggests that for a distributed data platform to be successful, domain data teams must apply product thinking to the datasets they deliver – treating their data assets as their products, and the rest of the organization's data consumers as their customers.
Analyst firm Gartner explains that a data mesh architecture is designed with “the specific goal of building business-focused data products”.
Data products are also relevant for centralized data management architectures, such as data fabric, where data products are created, managed, and adapted by central data teams for consumption by authorized data consumers across the company.
This paper defines data products and their attributes, and then discusses the market need, examples, the evolution from project- to product-driven data management, use cases, and how to get started.
The Business Need for Data Products
Today, data is produced at an unprecedented rate, due to the staggering number of digital services and offerings, combined with ubiquitous Internet connectivity. At the same time, data is a company’s most important asset, and critical to business success.
McKinsey: Data-driven companies are
– 23x more likely to acquire customers
– 19x more likely to be profitable
Even though 90% of the world’s data was generated in the past 2 years, enterprise data continues to be managed in data silos, and liberating data from these systems is now the biggest obstacle to delivering data-driven outcomes. This is because enterprise data is:
- Fragmented across hundreds of systems
- Captive in vendor-owned applications that lack a rich API set for data access
- Locked within in-house legacy systems, with little or no know-how of the underlying data models
- Variably structured (or unstructured) in dozens of technologies and formats
- Non-compliant, containing sensitive personal information, which must be anonymized to comply with regulations (GDPR, CPRA, HIPAA, etc.)
Over 80% of enterprise data is “in the dark”, in the sense that
it's inaccessible and not being used – to drive business decisions or to improve customer experiences or operational efficiencies.
It's only weighing companies down.
What is a Data Product?
A data product is a reusable data asset, engineered to deliver a trusted dataset for a specific purpose. It integrates data from relevant source systems, processes the data, ensures that it’s compliant, and makes it instantly accessible to anyone with the right credentials.
A data product shields data consumers from the underlying complexities of the data sources – by decoupling the dataset from its systems, and making it discoverable and accessible as an asset.
A data product generally corresponds to one or more business entities (customers, suppliers, devices, orders, etc.) and is made up of metadata and dataset instances:
Data product metadata
- Static metadata, including the tables and fields used to capture the data product's dataset
- Data connectors, for ingesting and delivering the required dataset from source systems to data consumers (via Kafka, JDBC, CDC, data services, ETL, messaging, or virtualization)
- Sync rules, defining when and how the data product syncs its dataset with its sources
- Business logic, to process, mask, and enrich the raw dataset, prior to delivering it
- Data governance, to ensure the dataset’s quality and privacy are enforced according to internal and external regulations
- Active metadata logs, for capturing data product performance and usage statistics
- Access controls, including authentication and credential validation
Data product dataset
- Managed as a unit, simplifying data processing and access
- Complete, fresh, clean, and compliant – integrated, cleansed, masked, and enriched
- Stored, cached, or virtualized
- Audited automatically, to record every entry and change to the dataset
- Accessible by any authorized data consumer
Data products are built, versioned, tested, deployed, and monitored,
to ensure their ongoing value to the people and systems that use them.
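To make the components listed above more concrete, here is a minimal Python sketch of how a data product's metadata might be declared and applied when a consumer requests a dataset. Every name in it (`DataProduct`, `mask_email`, `crm_source`, `billing_source`) is hypothetical, the connectors are in-memory stand-ins for real source systems, and it does not represent any particular platform's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

Record = dict[str, str]


def mask_email(record: Record) -> Record:
    """Business logic (illustrative): mask the local part of an email address."""
    if "email" not in record:
        return record
    masked = dict(record)
    local, _, domain = record["email"].partition("@")
    masked["email"] = local[:1] + "***@" + domain
    return masked


@dataclass
class DataProduct:
    # Static metadata: the product name and the fields in its dataset.
    name: str
    fields: list[str]
    # Data connectors: callables that fetch raw data for one entity ID.
    connectors: list[Callable[[str], Record]]
    # Business logic applied to the merged record before delivery.
    transforms: list[Callable[[Record], Record]]
    # Access controls: the consumers allowed to read this product.
    authorized: set[str]
    # Active metadata: a usage log of (timestamp, consumer, entity ID).
    usage_log: list[tuple[datetime, str, str]] = field(default_factory=list)

    def get(self, consumer: str, entity_id: str) -> Record:
        if consumer not in self.authorized:
            raise PermissionError(f"{consumer} may not read {self.name}")
        merged: Record = {}
        for fetch in self.connectors:
            merged.update(fetch(entity_id))
        for transform in self.transforms:
            merged = transform(merged)
        self.usage_log.append((datetime.now(timezone.utc), consumer, entity_id))
        # Deliver only the fields declared in the static metadata.
        return {key: merged[key] for key in self.fields}


def crm_source(entity_id: str) -> Record:
    # Stand-in for a CRM connector.
    return {"name": "Ada", "email": "ada@example.com"}


def billing_source(entity_id: str) -> Record:
    # Stand-in for a billing-system connector.
    return {"balance": "42.00"}


customer = DataProduct(
    name="customer",
    fields=["name", "email", "balance"],
    connectors=[crm_source, billing_source],
    transforms=[mask_email],
    authorized={"support_app"},
)
print(customer.get("support_app", "c-1"))
```

The consumer only ever calls `get`: it never sees which source systems were queried or how the masking was applied, which is the decoupling the definition describes. Sync rules and governance policies are left out of the sketch for brevity.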
Data Product Use Cases
Data products are engineered to drive specific business outcomes, through operational and analytical workloads. Here are examples of some common data product use cases:
- Predicting a customer’s propensity to churn, in real time, immediately before a customer service interaction
- Pipelining inventory data from chain stores into a cloud data warehouse for BI analysis
- Preparing a masked test dataset with data masking tools, and integrating it with a CI/CD pipeline, before launching a new version of a wealth management software system
- Tokenizing sensitive customer data prior to AI/ML analysis
- Delivering a consolidated, real-time, and holistic customer dataset to a CRM application, including customer transactions, interactions, and master data.
- Publishing the latest updates on the spread of COVID-19 to an HMO’s patients in high-risk areas
- Moving a legacy application’s data into a new cloud computing environment safely and quickly, while ensuring business continuity
While data products are often associated with analytical workloads,
they are just as important to operational workloads.
Operational Data Products
According to Teresa Tung, Accenture Cloud First Chief Technologist, and holder of 220 patents, an operational data product delivers a holistic, real-time, and trusted dataset of any business entity – such as a customer, vendor, or order – or anything that’s important to the business.
An operational data product moves data between sources and targets, in both directions, and in fractions of a second. And it can selectively store data, to act as an operational datastore, when necessary.
What makes an operational data product so special is that its dataset is always:
- Unified, and complete, for any business entity
- Up-to-date, and enriched with operational intelligence
- Protected, compliant with privacy regulations, and properly governed
- Accessible in real time via data services and a wide range of data delivery methods
In a data tokenization use case for operational data products, Comcast deployed the K2view Data Product Platform, enabling business domains to build, publish, and maintain data products. Authorized data consumers across the company can auto-discover data assets using the platform’s data product catalog.
In this implementation, each data product manages and persists the dataset for each individual customer, in its own high-performance Micro-Database™ – or mini data lake. In the case of Comcast, the platform manages over 30M Micro-Databases, one for each customer.
Comcast created a data product to tokenize sensitive data, where the tokens for each customer are persisted in the customer’s specific Micro-Database, each secured with its own 256-bit encryption key. In a sense, the Micro-Database becomes a “mini-vault”, with zero risk of a mass data breach.
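The per-customer tokenization pattern described above can be illustrated with a short Python sketch. It is not K2view's or Comcast's implementation: `CustomerVault` and `TokenizationProduct` are hypothetical names, an in-memory dictionary stands in for each Micro-Database, and the 256-bit key is used here only to derive tokens with HMAC-SHA256 rather than to encrypt the store at rest.

```python
import hashlib
import hmac
import secrets


class CustomerVault:
    """Hypothetical per-customer token store with its own 256-bit key."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # 32 bytes = 256 bits
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Derive a deterministic token from the value and this vault's key.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        token = "tok_" + digest.hexdigest()[:24]
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


class TokenizationProduct:
    """Routes each request to the vault of the customer it belongs to."""

    def __init__(self) -> None:
        self._vaults: dict[str, CustomerVault] = {}

    def _vault(self, customer_id: str) -> CustomerVault:
        # Create the customer's vault (and key) on first use.
        if customer_id not in self._vaults:
            self._vaults[customer_id] = CustomerVault()
        return self._vaults[customer_id]

    def tokenize(self, customer_id: str, value: str) -> str:
        return self._vault(customer_id).tokenize(value)

    def detokenize(self, customer_id: str, token: str) -> str:
        return self._vault(customer_id).detokenize(token)


product = TokenizationProduct()
token = product.tokenize("cust-1", "123-45-6789")
print(token.startswith("tok_"))
print(product.detokenize("cust-1", token))
```

Because every customer has a separate key and a separate store, the same value tokenized for two customers yields different tokens, and a token from one vault cannot be resolved through another.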
Micro-Databases are foundational for operational workloads because they’re always:
- In perfect sync, with all underlying data sources
- Secured, each with its own 256-bit encryption key
- Compliant, in the context of data privacy regulations, with any sensitive data tokenized or dynamically masked on the fly
- ACID-compatible, so that they can be transformed into databases of record for new apps
- Accessible, in milliseconds, by authorized data consumers across the enterprise
Operational data products enable enterprises to:
- Respond to market demands with new apps and/or updated features
- Make operational datasets easily accessible across use cases and domains
- Deliver a single, trusted, real-time view of any business entity, to any authorized user
The Data Product Lifecycle
Data-driven enterprises have one thing in common: they build data products, as opposed to one-off data projects. Data products are reusable assets focused on business outcomes.
Every data product follows a lifecycle, similar to that of a software product, to iterate and assure that it delivers the desired business outcomes. It looks something like this:
A data product is defined by its business objectives, governance constraints (security and privacy), and data asset inventories. Its design is a function of how the data is to be productized, for consumption via services.
A data product is engineered by locating, accessing, and integrating the needed source data, and then processing it as required. Data services are created to provide consuming applications with access to the data, while data pipelines are engineered to deliver the data to authorized analytical data consumers. The data product is versioned and designed to comply with performance SLAs.
Data products only add value once they’re run in production. But, before that can happen, they must be tested to ensure that the datasets they deliver perform as expected, and are fresh, cleansed, complete, compliant, and ready for high-scale consumption.
The data product is deployed, monitored (for usage, performance, and reliability), maintained, and supported – to quickly address any issues that may arise.
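As an illustration of the testing stage described above, the following Python sketch checks a delivered dataset for completeness, freshness, and compliance before release. The function name `check_dataset`, the `synced_at` field, and the rule that a masked value contains an asterisk are all assumptions made for this example; a real data product would apply its own governance rules.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_dataset(
    records: list[dict],
    required_fields: list[str],
    max_age: timedelta,
    sensitive_fields: list[str],
    now: Optional[datetime] = None,
) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    now = now or datetime.now(timezone.utc)
    problems: list[str] = []
    for i, record in enumerate(records):
        # Completeness: every required field is present and non-empty.
        for name in required_fields:
            if not record.get(name):
                problems.append(f"record {i}: missing {name}")
        # Freshness: the record was synced within the allowed window.
        synced_at = record.get("synced_at")
        if synced_at is None or now - synced_at > max_age:
            problems.append(f"record {i}: stale")
        # Compliance (assumed convention): masked values contain "*".
        for name in sensitive_fields:
            if name in record and "*" not in str(record[name]):
                problems.append(f"record {i}: {name} is not masked")
    return problems


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
dataset = [
    {"id": "c-1", "email": "a***@example.com", "synced_at": now - timedelta(minutes=5)},
    {"id": "", "email": "bob@example.com", "synced_at": now - timedelta(days=2)},
]
for problem in check_dataset(dataset, ["id", "email"], timedelta(hours=1), ["email"], now=now):
    print(problem)
```

Checks like these could run on every new version of a data product, so that a failed check blocks deployment rather than surfacing in production.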
Enter the data product manager
Similar to the software product manager, a data product manager is responsible for delivering business value and ROI from the data products – defining their goals and priorities (together with data engineers and consumers), and continuously working to ensure that the promised value is attained.
Why the cycle?
Data teams are constantly experimenting – implementing new services, deploying them, and monitoring the results. The quicker they go through the cycle, the quicker they learn, and the quicker they deliver incremental value to their customers.
Project vs Product Data Mindset
Traditionally, most companies are project-driven when it comes to data.
For example, if a business domain requires a particular dataset to address a particular need, it typically raises a request with the central data engineering team. That request represents a project to identify, collect, prepare, and deliver the relevant dataset to the business domain. This same pattern is followed every time a new use case emerges, from any domain in the organization.
This “data as a project” approach has some major drawbacks, including slow time-to-delivery, lack of reuse, rigidity, and the risk of delivering wrong and/or incomplete data.
On the other hand, a product-driven approach keeps the entire enterprise’s data needs in mind.
A project-driven approach to data drives greater complexity and minimal reuse,
compared to the simpler, more agile product-driven approach to data.
Data products can be reused to support any number of use cases, serving any number of domains.
Advantages of Data Products
Over time, data products deliver better ROI, and a lower cost-per-use, than data projects. Despite some upfront costs, they quickly evolve to support multiple outcomes and address emerging use cases, with the focus always on accommodating the use case at hand.
For data consumers, data products offer:
- Quicker time-to-insight, using pre-built data products (instead of initiating a new project)
- Full data integrity, ensuring complete, consistent, and compliant data every time
- Situational awareness, where the data is augmented with real-time insights
- Real-time data provisioning, for timely and informed decision-making in operational scenarios
- Data governance, assuring high-quality and compliant data
- Accessibility at any time, to any authorized data consumer
For the enterprise, data products are:
- Business-driven, and outcome-focused
- Agile, with value delivered incrementally
- Reusable, meaning that they're built once, but can be used again and again
- Future-proof, in terms of data architecture
- Trusted, enhancing data trust and integrity
- Collaborative, in the sense that they create a common language between business and IT
How to Get Started with Data Products
Deploy the right platform
K2view provides a Data Product Platform to engineer, test, deploy, and monitor data products that serve a broad variety of workloads.
The platform’s Data Product Studio enables data teams to quickly define and maintain the metadata for data products, including the data schema, connectors, sync policies, data transformations, governance, and more.
Once deployed, a data product manages its dataset within its own hyper-performance Micro-Database™, to support enterprise scale, resilience, and agility.
A “Customer” data product collects data from all sources, prepares it,
and delivers it to authorized data consumers – end-to-end – in real time.
Appoint data product managers
Data product managers need a broad range of skills in the areas of data, analytics, enterprise applications, business analysis, and DataOps. Ultimately responsible for the entire data product lifecycle, they:
- Develop data strategies, determine performance metrics, and promote data literacy across the company
- Ensure business value from data, and maximize the Return On Data Investment
- Close the gap between business and IT, by communicating the needs of data consumers across business domains and working with data engineers to improve data accessibility
In the same way a software product manager defines user needs, prioritizes them, and works with R&D to assure delivery, data product managers collect the needs of data consumers, and collaborate with data engineers and data scientists to deliver on them. Data product managers are the ultimate definers of the data – and also the main champions of data products within the organization.
Go for flexibility
To maximize flexibility, enterprises should choose a platform that deploys on premises, in the cloud (iPaaS), or across hybrid environments – with support for all modern data architectures.
A data fabric architecture is a modular data management framework, which integrates with your existing data and analytics tools. It assumes that data products are defined by a central data and analytics organization, and adapt over time based on automated analysis of active metadata.
A data mesh architecture shifts data strategy to a federated data network. It gives business domains the autonomy and tools to create data products for their needs, and creates a common framework for building, and scaling, product-driven data solutions, in real time.
There are advantages and disadvantages to data mesh vs data fabric, but both architectures leverage data products as a fundamental construct.
Data products are an emerging data construct, adopted by leading, data-driven organizations. Their value stems from making trusted data quick to discover and access, cutting the time to insights, and driving informed, timely decision making.
Data products fuel operational and analytical workloads, and may be deployed in a data mesh or data fabric architecture – on premises, in the cloud, or in a hybrid environment.
Finally, in an age of data minimization – where organizations are encouraged to collect only the data they need, and dispose of it the moment they're through – a data product approach facilitates both the provisioning and deletion of personal or sensitive data in real time.
Data teams should seek a data product platform that manages the entire lifecycle of data products, deploys them at enterprise scale, and offers the flexibility to support multiple data management architectures and operating models.