Practical Guide to Data Products

Data products – The foundation for data mesh

The concept of “data products” was first introduced as a core component of the data mesh architecture and operating model. Data mesh introduces the following 4 principles:

Decentralized, domain-oriented data ownership
Data as a product
Self-service platform
Federated data governance

The second principle suggests that for a distributed data platform to be successful, domain data teams must apply product thinking to the datasets they deliver – treating their data assets as their products, and the rest of the organization's data consumers as their customers.

Analyst firm Gartner explains that a data mesh architecture is designed with “the specific goal of building business-focused data products”.

Data products are also relevant for centralized data management architectures, such as data fabric, where data products are created, managed, and adapted by central data teams for consumption by authorized data consumers across the company.

This guide covers how data products and their attributes are defined, the market need for them (with examples), the evolution from project-driven to product-driven data management, common use cases, and how to get started.

The business need for data products

Today, data is produced at an unprecedented rate, due to the staggering number of digital services and offerings, combined with ubiquitous Internet connectivity. At the same time, data is a company’s most important asset, and critical to business success.

McKinsey: Data-driven companies are

23x more likely to acquire customers
19x more likely to be profitable

Even though 90% of the world’s data was generated in the past 2 years, enterprise data continues to be managed in data silos, and liberating data from these systems is now the biggest obstacle to delivering data-driven outcomes. This is because enterprise data is:

Fragmented in 100s of systems
Captive in vendor-owned applications that lack a rich API set for data access
Locked within in-house legacy systems, with little or no knowhow of the underlying data models
Variably structured (or unstructured) in dozens of technologies and formats
Non-compliant, containing sensitive personal information, which must be anonymized to comply with regulations (GDPR, CPRA, HIPAA, etc.)

Dark data

Over 80% of enterprise data is “in the dark”, in the sense that it's inaccessible and not being used – to drive business decisions or to improve customer experiences or operational efficiencies. It's only weighing companies down.

What is a data product?

A data product is a reusable data asset, engineered to deliver a trusted dataset for a specific purpose. It integrates data from relevant source systems, processes the data, ensures that it’s compliant, and makes it instantly accessible to anyone with the right credentials.

A data product shields data consumers from the underlying complexities of the data sources – by decoupling the dataset from its systems, and making it discoverable and accessible as an asset.

A data product generally corresponds to one or more business entities (customers, suppliers, devices, orders, etc.) and is made up of metadata and dataset instances:

Data product metadata

Static metadata, including the tables and fields used to capture the data product's dataset
Data connectors, for ingesting and delivering the required dataset from source systems to data consumers (via Kafka, JDBC, CDC, data services, ETL, messaging, or virtualization)
Sync rules, defining when and how the data product syncs its dataset with its sources
Business logic, to process, mask, and enrich the raw dataset, prior to delivering it
Data governance, to ensure the dataset’s quality and privacy are enforced according to internal and external regulations
Active metadata logs, for capturing data product performance and usage statistics
Access controls, including authentication and credential validation
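The metadata components listed above can be pictured as a single definition object. The following is a minimal sketch in Python, not the format of any particular platform: every class, field, and function name here (DataProductMetadata, Connector, SyncRule, mask_email, and so on) is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are hypothetical and only mirror the metadata list above.
Record = dict[str, object]


@dataclass
class Connector:
    name: str    # e.g. "crm_jdbc"
    method: str  # e.g. "JDBC", "Kafka", "CDC"


@dataclass
class SyncRule:
    source: str           # connector name this rule applies to
    max_age_seconds: int  # re-sync when cached data is older than this


@dataclass
class DataProductMetadata:
    entity: str                   # business entity, e.g. "customer"
    schema: dict[str, list[str]]  # table name -> field names
    connectors: list[Connector] = field(default_factory=list)
    sync_rules: list[SyncRule] = field(default_factory=list)
    transforms: list[Callable[[Record], Record]] = field(default_factory=list)
    allowed_roles: set[str] = field(default_factory=set)

    def process(self, record: Record) -> Record:
        """Apply the business-logic transforms in order."""
        for transform in self.transforms:
            record = transform(record)
        return record

    def can_access(self, role: str) -> bool:
        """Very simple access control: the role must be explicitly allowed."""
        return role in self.allowed_roles


def mask_email(record: Record) -> Record:
    """Example transform: hide the local part of an email address."""
    masked = dict(record)
    email = str(masked.get("email", ""))
    if "@" in email:
        masked["email"] = "***@" + email.split("@", 1)[1]
    return masked


customer_product = DataProductMetadata(
    entity="customer",
    schema={"customer": ["id", "name", "email"]},
    connectors=[Connector("crm_jdbc", "JDBC")],
    sync_rules=[SyncRule("crm_jdbc", max_age_seconds=60)],
    transforms=[mask_email],
    allowed_roles={"support_agent"},
)
```

Keeping the schema, connectors, sync rules, transforms, and access rules in one object is what lets a data product be versioned and deployed as a unit. Governance policies and active metadata logs are left out of this sketch for brevity.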

Data product dataset

Managed as a unit, simplifying data processing and access
Complete, fresh, clean, and compliant – integrated, cleansed, masked, and enriched
Stored, cached, or virtualized
Audited automatically, to record every entry and change to the dataset
Accessible by any authorized data consumer

Data products are built, versioned, tested, deployed, and monitored, to ensure their ongoing value to the people and systems that use them.

Data product use cases

Data products are engineered to drive specific business outcomes, through operational and analytical workloads. Here are examples of some common data product use cases:

Predicting a customer’s propensity to churn, in real time, immediately before a customer service interaction
Pipelining inventory data from chain stores into a cloud data warehouse for BI analysis
Preparing a masked test dataset with data masking tools, and integrating it with a CI/CD pipeline, before launching a new version of a wealth management software system
Tokenizing sensitive customer data prior to AI/ML analysis
Delivering a consolidated, real-time, and holistic customer dataset to a CRM application, including customer transactions, interactions, and master data.
Publishing updates on the spread of the latest flu to an HMO’s high-risk patients
Moving a legacy application’s data into a new cloud computing environment safely and quickly, while ensuring business continuity

While data products are often associated with analytical workloads, they are just as important to operational workloads.

According to Teresa Tung, Accenture Cloud First Chief Technologist, and holder of 220 patents, an operational data product delivers a holistic, real-time, and trusted dataset of any business entity – such as a customer, vendor, or order – or anything that’s important to the business.

An operational data product moves data between sources and targets, in both directions, and in fractions of a second. And it can selectively store data, to act as an operational datastore, when necessary.

What makes an operational data product so special is that its dataset is always:

Unified, and complete, for any business entity
Up-to-date, and enriched with operational intelligence
Protected, compliant with privacy regulations, and properly governed
Accessible in real time via data services and a wide range of data delivery methods

In a data tokenization use case of operational data products, Comcast deployed K2view Data Product Platform, enabling business domains to build, publish, and maintain data products. Authorized data consumers across the company can auto-discover data assets using the platform’s data product catalog.

In this implementation, each data product manages and persists the dataset for each individual customer, in its own high-performance Micro-Database™ – or mini data lake. In the case of Comcast, the platform manages over 30M Micro-Databases, one for each customer.

Comcast created a data product to tokenize sensitive data, where the tokens for each customer are persisted in the customer’s specific Micro-Database, each secured with its own 256-bit encryption key. In a sense, the Micro-Database becomes a “mini-vault”: because every customer's data is protected separately, a single compromised key would expose only one customer's records rather than enabling a mass data breach.
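To make the per-customer idea concrete, here is a minimal Python sketch of tokenization with one vault and one 256-bit key per customer. It is an illustration only and not how the K2view platform or a Micro-Database is implemented: CustomerVault and vault_for are hypothetical names, the vault lives in memory, and the key is used for keyed hashing (HMAC-SHA256) to derive tokens. Encrypting the stored values at rest is deliberately omitted.

```python
import hashlib
import hmac
import secrets


class CustomerVault:
    """Hypothetical per-customer token store (not the K2view Micro-Database)."""

    def __init__(self) -> None:
        # 32 random bytes = a 256-bit key that belongs to this customer only.
        self._key = secrets.token_bytes(32)
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Keyed hash: the same value always maps to the same token within
        # this vault, but to a different token in any other vault.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        token = "tok_" + digest.hexdigest()[:24]
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for tokens this vault never issued.
        return self._token_to_value[token]


# One vault per customer ID, created on first use.
vaults: dict[str, CustomerVault] = {}


def vault_for(customer_id: str) -> CustomerVault:
    if customer_id not in vaults:
        vaults[customer_id] = CustomerVault()
    return vaults[customer_id]
```

Because each vault has its own key, a token issued for one customer cannot be resolved through another customer's vault, which is the isolation property the “mini-vault” description points to.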

Micro-Databases are foundational for operational workloads because they’re always:

In perfect sync, with all underlying data sources
Secured, each with its own 256-bit encryption key
Compliant, in the context of data privacy regulations, with any sensitive data tokenized or dynamically masked on the fly
ACID-compatible, so that they can be transformed into databases of record for new apps
Accessible, in milliseconds, by authorized data consumers across the enterprise

Operational data products enable enterprises to become more:

Agile: Responding to market demands with new apps and/or updated features
Federated: Making operational datasets easily accessible across use cases and domains
Governed: Delivering a single, trusted, real-time view of any business entity, to any authorized user

The data product lifecycle

Data-driven enterprises have one thing in common: they build data products, as opposed to one-off data projects. Data products are reusable assets focused on business outcomes.

Every data product follows a lifecycle, similar to that of a software product, to iterate and assure that it delivers the desired business outcomes. It looks something like this:

Define
A data product is defined by its business objectives, governance constraints (security and privacy), and data asset inventories. Its design is a function of how the data is to be productized, for consumption via services.

Engineer
A data product is engineered by locating, accessing, and integrating the needed source data, and then processing it as required. Data services are created to provide consuming applications with access to the data, while data pipelines are engineered to deliver the data to authorized analytical data consumers. The data product is versioned and designed to comply with performance SLAs.

Test
Data products only add value once they’re run in production. But, before that can happen, they must be tested to ensure that the datasets they deliver perform as expected, and are fresh, cleansed, complete, compliant, and ready for high-scale consumption.
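As a rough illustration of what such tests might check, the Python sketch below validates one record for completeness, freshness, and compliance. The field names, the one-hour freshness window, and the "***@" masking convention are all assumptions made for this example, not requirements from any specific platform.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("id", "name", "email", "updated_at")  # hypothetical schema
MAX_AGE = timedelta(hours=1)                             # hypothetical freshness SLA


def check_record(record: dict, now: datetime) -> list[str]:
    """Return a list of problems found in one record (empty means it passed)."""
    problems = []

    # Completeness: every required field is present and non-empty.
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            problems.append(f"missing {name}")

    # Freshness: the record was updated within the allowed window.
    updated_at = record.get("updated_at")
    if isinstance(updated_at, datetime) and now - updated_at > MAX_AGE:
        problems.append("stale")

    # Compliance: the email must already be masked by the data product.
    email = record.get("email") or ""
    if email and not email.startswith("***@"):
        problems.append("unmasked email")

    return problems


now = datetime.now(timezone.utc)
fresh = {"id": 1, "name": "Ann", "email": "***@example.com", "updated_at": now}
print(check_record(fresh, now))  # prints []
```

Checks like these could run in a CI pipeline against a sample of the dataset before each new version of the data product is deployed.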

Deploy
The data product is deployed, monitored (for usage, performance, and reliability), maintained, and supported – to quickly address any issues that may arise.
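Monitoring usage and performance can start very simply. The sketch below, with the hypothetical class name UsageLog, records how long each consumer's request takes and summarizes request counts and average latency per consumer. A production deployment would more likely send these figures to a metrics system than keep them in memory.

```python
import time
from collections import defaultdict
from statistics import mean


class UsageLog:
    """Hypothetical in-memory log of data product requests."""

    def __init__(self) -> None:
        self._latencies: dict[str, list[float]] = defaultdict(list)

    def record(self, consumer: str, seconds: float) -> None:
        self._latencies[consumer].append(seconds)

    def timed(self, consumer: str, fetch):
        """Run fetch() and log how long it took for this consumer."""
        start = time.perf_counter()
        result = fetch()
        self.record(consumer, time.perf_counter() - start)
        return result

    def summary(self) -> dict[str, dict[str, float]]:
        return {
            consumer: {"requests": len(values), "avg_seconds": mean(values)}
            for consumer, values in self._latencies.items()
        }
```

For example, wrapping a data service call as log.timed("crm_app", lambda: load_customer(42)), where load_customer is whatever function serves the dataset, adds one latency sample for the "crm_app" consumer.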

Enter the data product manager
Similar to the software product manager, a data product manager is responsible for delivering business value and ROI from the data products – defining their goals and priorities (together with data engineers and consumers), and continuously working to ensure that the promised value is attained.

Why the cycle?

Data teams should be empowered to iterate in data product creation – implement enhancements, deploy them, and monitor performance and usage. The quicker they go through the cycle, the sooner they deliver incremental value to data product consumers.

The data product lifecycle for the AI era

With K2view you can now easily build, test, deploy, and monitor AI-powered data products.

Supercharge your AI, operational, and analytical workloads by quickly creating reusable data products that combine your trusted enterprise data from any source with everything you need to ensure its safe, independent use.

By leveraging AI frameworks, like Retrieval-Augmented Generation (RAG), you can integrate your structured and unstructured enterprise data into your LLM prompts for more precise and up-to-date responses.

That's where generative data products and Micro-Databases come in.

Generative data products organize your fragmented data into 360° views of your business entities – such as customers, products, or loans – each in its own, individually encrypted Micro-Database.

And, despite the fact that millions of Micro-Databases are managed at the same time, each and every user query is immediately met with an accurate, personalized response.
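The grounding step of RAG can be sketched in a few lines of Python. The example assumes the retrieval step has already returned one customer's record as a dictionary. build_prompt and the sample fields are hypothetical, and no LLM is called: the function only shows how entity data might be placed into a prompt.

```python
def build_prompt(question: str, entity: dict[str, object], max_facts: int = 20) -> str:
    """Ground an LLM prompt in one business entity's data (hypothetical helper)."""
    # Turn the entity's fields into short facts, in a stable (sorted) order.
    facts = [f"- {key}: {value}" for key, value in sorted(entity.items())]
    context = "\n".join(facts[:max_facts])
    return (
        "Answer using only the customer data below. "
        "If the answer is not in the data, say so.\n\n"
        f"Customer data:\n{context}\n\n"
        f"Question: {question}"
    )


customer = {"name": "Ann", "plan": "Premium", "open_tickets": 2}
prompt = build_prompt("Which plan is this customer on?", customer)
```

Limiting the context to a single entity's data keeps the prompt small and avoids mixing one customer's information into another customer's answer.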

Micro-Databases are uniquely suited to AI because they're:

Micro-sized, for split-second, low-cost compute
Clean, for enterprise data that users trust
Fresh, for live, personalized user experiences
Isolated, for security guardrail assurance
Protected, for compliance with data privacy laws
Resilient, for query concurrency at any scale

Project vs product data mindset

Traditionally, most companies are project-driven when it comes to data.

For example, if a business domain requires a particular dataset to address a particular need, it typically raises a request with the central data engineering team. That request represents a project to identify, collect, prepare, and deliver the relevant dataset to the business domain. This same pattern is followed every time a new use case emerges, from any domain in the organization.

This “data as a project” approach has some major drawbacks, including slow time-to-delivery, lack of reuse, rigidity, and risk of delivering wrong, and/or incomplete data.

On the other hand, a product-driven approach keeps the entire enterprise’s data needs in mind.

Data products can be reused to support any number of use cases, serving any number of domains

Data product-driven vs. project-driven approaches

A project-driven approach to data drives greater complexity and minimal reuse, compared to the simpler, more agile product-driven approach to data.

Advantages of data products

Over time, data products deliver better ROI, and cost-per-use, than data projects. Despite some upfront costs, they quickly evolve to support multiple outcomes, addressing emerging use cases – where the focus is always on use case accommodation.

For data consumers, data products offer:

Quicker time-to-insight, using pre-built data products (instead of initiating a new project)
Full data integrity, ensuring complete, consistent, and compliant data every time
Situational awareness, where the data is augmented with real-time insights
Real-time data provisioning, for timely and informed decision-making in operational scenarios
Data governance, assuring high-quality and compliant data
Accessibility at any time, to any authorized data consumer

For the enterprise, data products are:

Business-driven, and outcome-focused
Agile, with value delivered incrementally
Reusable, meaning that they're built once, but can be used again and again
Future-proof, in terms of data architecture
Trusted, enhancing data trust and integrity
Collaborative, in the sense that they create a common language between business and IT

Getting started with data products

Deploy the right platform

K2view provides a Data Product Platform to engineer, test, deploy, and monitor data products that serve a broad variety of workloads.

The platform’s Data Product Studio enables data teams to quickly define and maintain the metadata for data products, including the data schema, connectors, sync policies, data transformations, governance, and more.

Once deployed, a data product manages its dataset within its own hyper-performance Micro-Database™, to support enterprise scale, resilience, and agility.


A “Customer” data product collects data from all sources, prepares it, and delivers it to authorized data consumers – end-to-end – in real time.

Appoint data product managers

Data product managers need a broad range of skills in the areas of data, analytics, enterprise applications, business analysis, and DataOps. Ultimately responsible for the entire data product lifecycle, they:

Develop data strategies, determine performance metrics, and promote data literacy across the company
Ensure business value from data, and maximize the Return On Data Investment
Close the gap between business and IT, by communicating the needs of data consumers across business domains and working with data engineers to improve data accessibility.

In the same way a software product manager defines user needs, prioritizes them, and works with R&D to assure delivery, data product managers collect the needs of data consumers, and collaborate with data engineers and data scientists to deliver on them. Data product managers are the ultimate definers of the data – and also the main champions of data products within the organization.

Go for flexibility

To maximize flexibility, enterprises should choose a platform that deploys on premises, in the cloud (iPaaS), or across hybrid environments – with support for all modern data architectures.

A data fabric architecture is a modular data management framework, which integrates with your existing data and analytics tools. It assumes that data products are defined by a central data and analytics organization, and adapt over time based on automated analysis of active metadata.

A data mesh architecture shifts data strategy to a federated data network. It gives business domains the autonomy and tools to create data products for their needs, and creates a common framework for building, and scaling, product-driven data solutions, in real time.

There are advantages and disadvantages to data mesh vs data fabric, but both architectures leverage data products as a fundamental construct.

Conclusion

Data products are an emerging data construct, adopted by leading, data-driven organizations. Their value stems from quick discoverability of, and access to, trusted data, which cuts the time to insights and drives informed, timely decision making.

Data products fuel operational and analytical workloads, and may be deployed in a data mesh or data fabric architecture – on premises, in the cloud, or in a hybrid environment.

Data teams should seek a data product platform that manages the entire lifecycle of data products, deploys them at enterprise scale, and is flexible enough to support multiple data management architectures and operating models.


A data product is a reusable data asset that bundles data together with everything needed to make it independently usable by authorized consumers.