Take a "Data as a Product" Approach to Democratize Data Access

Oren Ezra

CMO, K2View

For data-driven enterprises, data is an incredibly valuable commodity. But to take a "data as a product" approach, the overall quality of that data must be assured. A data product platform does just that.

Table of Contents

The Proliferation of Data
What is a Data-Driven Enterprise?
What is a Data Product?
Data as a Product and the Data Delivery Lifecycle
A New Role: Data Product Manager
Best Practices for Data as a Product
The Business Entity – the Logic Behind Data as a Product
Data Product Platform Based on Business Entities
K2View Data Product Platform: Data as a Product Inside

The Proliferation of Data

As digitization grows, so does the amount of data that’s available to an enterprise. The sheer volume of digital products, services, and business models, combined with greater connectivity to devices, has led data to proliferate exponentially. With 90% of the world’s data created in the past 2 years, enterprises are becoming more and more data-driven.

According to McKinsey, data-driven companies are 23 times more likely to acquire customers and 19 times more likely to be profitable.

What is a Data-Driven Enterprise?

A data-driven enterprise maximizes the value of its data by treating its data as a product, and differentiating data based on its overall quality (e.g., completeness, availability, accessibility, and general fitness for use). It treats data as a product in order to drive business outcomes, for example:

  • A telco predicting likelihood to churn in real time, during a customer interaction

  • A media company serving personalized content to its subscribers

  • A bank promoting a new financial product to a targeted client segment

What is a Data Product?

Data products are a foundational concept of the data mesh.

A data product is created with a specific purpose in mind. It encapsulates everything a data consumer needs to generate value from the data. Common examples include:

  • Delivering a 360-degree view of a customer to a CRM application, including transactions, interactions, and master data.

  • Tokenizing sensitive customer information for use by operational and analytical systems

  • Pipelining retail inventory data from chain stores into a central data lake for AI/ML analysis

  • Preparing a masked test dataset and integrating it with a CI/CD pipeline, in support of the agile delivery of a wealth management system.

Data products often correspond to business entities, e.g., customer, supplier, device, location, warehouse. Since a business entity's data is often scattered across many different source systems, a data product requires integration, unification, and ongoing synchronization of its data with the underlying source systems.

The data product is made of its definition and its data, as described below.

Data product definition:

  • Static metadata, encompassing all the relevant tables and fields that capture the data product's data

  • Synchronization rules, defining how and when data is synced with the source systems

  • Algorithms (code) that transform, process, enrich, and mask the raw data

  • How data is ingested into the data product, and how the data product is delivered or accessed; for example: JDBC, web services, Kafka, CDC, messaging, virtualization

  • Data preparation flows

  • Data lineage from the data product to the source systems 

  • Access controls, including credential validation and authentication

Data product data:

  • Managed as a unit, making it easy to process and access

  • Unified, cleansed, masked, and enriched

  • Persisted, virtualized, or cached

  • Active metadata, including data product usage and performance

  • Audited, with data changes recorded in an audit log

The data product's definition and data are managed separately, with a data product having a single definition, and multiple instances of its data.
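The split between a single definition and multiple data instances can be sketched as a pair of Python dataclasses. This is a minimal illustration under stated assumptions, not how any particular platform implements it: every class, field, and value below (DataProductDefinition, DataProductInstance, sync_rule, and so on) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProductDefinition:
    """Hypothetical definition, shared by every instance of the product."""
    name: str
    fields: tuple[str, ...]            # static metadata: fields the product exposes
    sync_rule: str                     # illustrative label, e.g. "on_demand"
    masked_fields: frozenset[str] = frozenset()
    delivery_methods: tuple[str, ...] = ("web_service",)


@dataclass
class DataProductInstance:
    """Hypothetical instance: the data for one business entity."""
    definition: DataProductDefinition
    entity_id: str
    data: dict[str, str]
    audit_log: list[str] = field(default_factory=list)

    def update(self, key: str, value: str) -> None:
        """Change one field and record the change in the audit log."""
        if key not in self.definition.fields:
            raise KeyError(f"{key!r} is not part of {self.definition.name}")
        self.audit_log.append(f"{key}: {self.data.get(key)!r} -> {value!r}")
        self.data[key] = value

    def view(self) -> dict[str, str]:
        """Return the data with masked fields hidden."""
        return {
            k: ("***" if k in self.definition.masked_fields else v)
            for k, v in self.data.items()
        }


# One definition, two instances of its data.
customer = DataProductDefinition(
    name="customer",
    fields=("name", "email", "plan"),
    sync_rule="on_demand",
    masked_fields=frozenset({"email"}),
)
alice = DataProductInstance(
    customer, "C-001", {"name": "Alice", "email": "a@example.com", "plan": "basic"}
)
bob = DataProductInstance(
    customer, "C-002", {"name": "Bob", "email": "b@example.com", "plan": "pro"}
)
alice.update("plan", "pro")
```

Here both instances point at the same definition object, so a change to masking or sync rules would be made once, while each instance keeps its own data and audit log.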

Data as a Product and the Data Delivery Lifecycle 

To take a "Data as a Product" approach, data teams must adopt a cross-functional product lifecycle approach to data. The data product delivery lifecycle should follow agile principles: short and iterative, so that it delivers quick, incremental value to consumers of the data.

Define and Design the Data Product
Define the data requirements, within the context of the business objectives, the constraints of data privacy and governance, and the inventory of existing data assets. Design how the data will be structured and how it will be componentized as a product, to be consumed via services.

Engineer the Data Product
Build the data product according to the requirements by identifying, integrating, and collating the data from its sources, and then masking it as needed. Create web service APIs so that consuming applications with the right credentials can access the data product, and devise pipelines to securely publish the data to subscribers.

QA the Data Product
Test and validate the data to ensure it is complete, compliant, and timely, and that it can be securely consumed by applications at high scale. 
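As a rough illustration of what automated checks for completeness and timeliness might look like, here is a minimal Python sketch. The function name check_record, the synced_at field, and the thresholds are all assumptions made for this example; real checks would depend on the product's own definition and SLAs.

```python
from datetime import datetime, timedelta, timezone


def check_record(record, required_fields, max_age, now=None):
    """Return a list of quality issues found in one record (empty if none)."""
    now = now or datetime.now(timezone.utc)
    issues = []
    # Completeness: every required field must be present and non-empty.
    for name in required_fields:
        if record.get(name) in (None, ""):
            issues.append(f"missing {name}")
    # Timeliness: the record must have been synced recently enough.
    synced_at = record.get("synced_at")  # hypothetical timestamp field
    if synced_at is None:
        issues.append("no sync timestamp")
    elif now - synced_at > max_age:
        issues.append("stale")
    return issues


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
required = ["customer_id", "email"]
fresh = {
    "customer_id": "C-001",
    "email": "a@example.com",
    "synced_at": now - timedelta(minutes=5),
}
stale = {
    "customer_id": "C-002",
    "email": "",
    "synced_at": now - timedelta(days=2),
}

fresh_issues = check_record(fresh, required, timedelta(hours=1), now)
stale_issues = check_record(stale, required, timedelta(hours=1), now)
```

Checks like these could run in a CI/CD pipeline so that an incomplete or stale data product is caught before consumers see it.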

Support and Maintain the Data Product
Continually monitor data usage, pipeline performance, and reliability, and work closely with data engineering to address issues per defined SLAs.

A New Role: Data Product Manager

In software product development, the software product manager is responsible for gathering user needs, prioritizing them, and working with software development and QA to ensure the right product is delivered at the right time. We believe that there is a place for a similar role in the data team. The data product manager will be responsible for collecting data needs from data consumers (data scientists, data analysts, application owners), prioritizing them, and working closely with data engineering to deliver the data product on time and on budget.

The data product must deliver business value and realize ROI, for example through more informed decision making or quicker application development. For this to happen effectively, data delivery must have a definitive timeline, a kind of service level agreement between IT and the business.

In the Data as a Product approach, data engineers, data testers, and data product managers collaborate to deliver the right data to the right users at the right time.

Best Practices for Data as a Product

Close collaboration
Data collectors and custodians should work closely with their consumers. This calls for experimentation and product evolution, and the ability to develop new features, or roll back changes, as needed.

Agile development
Data products must be developed quickly and reliably, meaning that data assets should be decoupled as much as possible. A data catalog is a good first step.

Comprehensive QA
By definition, building data products is a process. Data teams should always have a good CI/CD setup in place, and do their best to identify issues through automatic testing and data quality checks. And when things go wrong (which they inevitably will), be sure to learn from mistakes and improve the product.

High-speed availability
Data products have to be used by consumers in order to judge their value, so data engineers need to make them available quickly and easily. Standard interfaces should be used to accommodate the needs of diverse teams.

Cross-functional collaboration leads to more flexible development, better QA, and quicker availability.

The Business Entity – the Logic Behind Data as a Product

The most obvious way to engineer a data product is to model it around the business entity that it supports, such as a customer, employee, credit card, product, or anything else that is important to the business.

Each business entity (customer John Smith) should be complete in all its attributes, enriched via analytics (propensity to churn), and easily accessible to any data consumer (person or application) that has access rights to that entity.

Usage of the business entity should be measurable. How is the data accessed, and how long does it take to get to it (response time)? How often is it accessed, and by whom? Who tried to access it, but didn’t have the right credentials? Which insights did it drive? The list goes on and on.
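The questions above (who accessed the data, how often, how fast, and who was denied) can be answered from an access log. The following is a minimal Python sketch assuming a hypothetical AccessEvent record; the field names and the summarize function are illustrative, not part of any specific product.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class AccessEvent:
    """Hypothetical log entry for one attempt to read a business entity."""
    consumer: str
    entity_id: str
    response_ms: float
    authorized: bool


def summarize(events):
    """Aggregate access events into simple usage metrics."""
    granted = [e for e in events if e.authorized]
    return {
        # How often is it accessed, and by whom?
        "accesses_by_consumer": dict(Counter(e.consumer for e in granted)),
        # How long does it take to get to the data?
        "avg_response_ms": mean(e.response_ms for e in granted) if granted else None,
        # Who tried to access it without the right credentials?
        "denied_consumers": sorted({e.consumer for e in events if not e.authorized}),
    }


events = [
    AccessEvent("crm_app", "C-001", 40.0, True),
    AccessEvent("crm_app", "C-002", 60.0, True),
    AccessEvent("analyst_1", "C-001", 20.0, True),
    AccessEvent("batch_job", "C-001", 5.0, False),
]
summary = summarize(events)
```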

The overall quality of the data product must be assured in terms of completeness, integrity, and freshness (that is, the data is always up to date).

Data Product Platform Based on Business Entities

An entity-centric data product platform, which manages, prepares, and delivers data in the form of business entities, is the ideal platform for delivering data products to data consumers, because it inherently defines and manages the entire lifecycle of the data products.

A data product platform defines an intermediary data schema aggregating all the attributes of a business entity (such as a customer, product, location or order) across all systems, in order to prepare and deliver the data as an integrated data product.
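To make the idea of an intermediary schema concrete, here is a minimal Python sketch that gathers one entity's attributes from several source systems into a single unified record. The system names ("crm", "billing"), the field mappings, and the build_entity function are all hypothetical, and the conflict rule (the first source that supplies a field wins) is an assumption chosen for simplicity.

```python
def build_entity(entity_id, sources, field_map):
    """Collect one entity's attributes from several source systems.

    sources:   {system_name: {entity_id: raw_record}}
    field_map: {system_name: {source_field: unified_field}}
    """
    entity = {"id": entity_id}
    for system, records in sources.items():
        raw = records.get(entity_id, {})
        for source_field, unified_field in field_map.get(system, {}).items():
            if source_field in raw:
                # setdefault keeps the value from the first system that
                # supplied this field, so source order sets precedence.
                entity.setdefault(unified_field, raw[source_field])
    return entity


sources = {
    "crm": {"C-001": {"full_name": "John Smith", "mail": "john@example.com"}},
    "billing": {"C-001": {"cust_name": "J. Smith", "balance_due": 42.5}},
}
field_map = {
    "crm": {"full_name": "name", "mail": "email"},
    "billing": {"cust_name": "name", "balance_due": "balance"},
}
customer = build_entity("C-001", sources, field_map)
```

In this sketch the CRM name takes precedence over the billing name because "crm" is listed first; a real platform would more likely make that precedence an explicit, configurable rule.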

Such a platform is key to supporting the data as a product methodology. It integrates data from all sources by business entity, cleansing, validating, enriching, masking, and transforming it in flight. It may be deployed as a data fabric, data mesh, or data hub architecture.

K2View Data Product Platform: Data as a Product Inside

K2View Data Product Platform, with its patented approach to organizing data by business entities, transforms fragmented enterprise data into data products – enabling companies to proactively adopt the data as a product mindset necessary to sustain data-driven leadership.


Learn how the concept of "data as a product" can work for you by reading the Gartner report on data mesh vs. data fabric.