Democratize Data Access with a "Data as a Product" Approach

Oren Ezra

CMO, K2View

"Data as a product", a core principle of the Data Mesh model, realizes its full potential in Data Product Platform. Learn more in this in-depth article.

Table of Contents


The Proliferation of Data
What is a Data-Driven Enterprise?
What is a Data Product?
The 7 Top Benefits of Data Products
Data as a Product and the Data Delivery Lifecycle
A New Role: Data Product Manager
Best Practices for Data as a Product
The Business Entity – the Logic Behind Data as a Product
A Platform Based on Business Entities
Data Product Platform: Data Products Inside

The Proliferation of Data

As digitization grows, so does the amount of data that’s available to an enterprise. The sheer volume of digital products, services, and business models, combined with greater connectivity to devices, has led data to proliferate exponentially. With 90% of the world’s data created in the past 2 years, enterprises are becoming more and more data-driven.

According to McKinsey, data-driven companies are

23 times more likely to acquire customers, and
19 times more likely to be profitable.

 

What is a Data-Driven Enterprise?

A data-driven enterprise maximizes the value of its data by treating its data as a product, and differentiating data based on its overall quality (e.g., completeness, availability, accessibility, and general fitness for use). It treats data as a product in order to drive business outcomes, for example:

  • A telco predicting likelihood to churn in real time, during a customer interaction

  • A media company serving personalized content to its subscribers

  • A bank promoting a new financial product to a targeted client segment

What is a Data Product?

Data products are a foundational concept of the data mesh.

A data product is created with a specific purpose in mind, to make a trusted dataset accessible to authorized data consumers. It encapsulates everything a data consumer needs to generate value from the data. Common examples include:

  • Delivering a Customer 360 dataset to a CRM application, including transactions, interactions, and master data

  • Tokenizing sensitive customer information for use by operational and analytical systems

  • Pipelining retail inventory data from chain stores into a central data lake for AI/ML analysis

  • Preparing a masked test dataset and integrating it with a CI/CD pipeline, in support of the agile delivery of a wealth management system

Data products often correspond to business entities, such as customers, suppliers, devices, locations, or warehouses. Since a business entity's data is often scattered across many different source systems, a data product requires data integration, unification, and ongoing synchronization of its data with the underlying source systems.

The data product comprises its definition (metadata) and the resulting dataset (once instantiated), as described below:

Data product definition (metadata):

  • Static metadata, encompassing all the relevant tables and fields that capture the data product's data

  • Active metadata, including data product usage and performance

  • Synchronization rules, defining how and when data is synced with the source systems

  • Algorithms that transform, process, enrich, and mask the raw data

  • Data connectors that ingest the source data into the data product and deliver the dataset to data consumers; for example: JDBC, web services, Kafka, CDC, messaging, virtualization

  • Data governance policies, ensuring that data governance (quality and privacy compliance) is enforced according to internal and external regulations 

  • Access controls, including credential validation and authentication

Data product dataset:

  • Managed as a unit, making it easy to process and access

  • Unified, cleansed, masked, and enriched

  • Persisted, virtualized, or cached

  • Auditable, ensuring data changes are logged in an audit log

The data product's definition and data are managed separately, with a data product having a single definition, and multiple instances of its data.
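The split between one definition and many instances can be sketched in code. The following is a minimal illustration, not the implementation of any particular platform: every class, function, and field name (`DataProductDefinition`, `DataProductInstance`, `instantiate`, `mask_email`, the `crm_app` role) is hypothetical, and only a few of the metadata components listed above are modeled.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names in this sketch are hypothetical, chosen for illustration only.


@dataclass(frozen=True)
class DataProductDefinition:
    """Metadata shared by every instance of the data product."""
    name: str
    attributes: tuple[str, ...]                  # static metadata: fields to keep
    sync_rule: str                               # e.g. "on-demand"
    transforms: tuple[Callable[[dict], dict], ...] = ()  # mask/enrich steps
    allowed_roles: frozenset[str] = frozenset()  # access control


@dataclass
class DataProductInstance:
    """The dataset for one business entity, built from a definition."""
    definition: DataProductDefinition
    entity_id: str
    data: dict
    audit_log: list[str] = field(default_factory=list)

    def read(self, role: str) -> dict:
        # Every access attempt is logged, whether or not it is granted.
        if role not in self.definition.allowed_roles:
            self.audit_log.append(f"DENIED {role}")
            raise PermissionError(f"{role!r} may not read {self.definition.name}")
        self.audit_log.append(f"READ {role}")
        return dict(self.data)


def instantiate(definition: DataProductDefinition, entity_id: str, raw: dict) -> DataProductInstance:
    # Keep only the declared attributes, then apply each transform in order.
    record = {name: raw.get(name) for name in definition.attributes}
    for transform in definition.transforms:
        record = transform(record)
    return DataProductInstance(definition, entity_id, record)


def mask_email(record: dict) -> dict:
    # Simple illustrative masking: keep the first character and the domain.
    email = record.get("email")
    if email:
        local, _, domain = email.partition("@")
        record = {**record, "email": local[:1] + "***@" + domain}
    return record


customer_360 = DataProductDefinition(
    name="customer_360",
    attributes=("name", "email", "segment"),
    sync_rule="on-demand",
    transforms=(mask_email,),
    allowed_roles=frozenset({"crm_app"}),
)

# One definition, two instances (one per customer).
alice = instantiate(customer_360, "C-1", {"name": "Alice", "email": "alice@example.com", "segment": "gold", "ssn": "x"})
bob = instantiate(customer_360, "C-2", {"name": "Bob", "email": "bob@example.com", "segment": "silver"})
```

Both instances point at the same definition object, so a change to masking or access rules would be made once rather than per dataset.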

The 7 Top Benefits of Data Products

By taking the "data as a product" approach, organizations can enjoy the following benefits:

  1. Ensure their data initiatives are business-driven and outcome-focused

  2. Democratize data access to authorized data consumers across the organization

  3. Enable agile delivery of incremental value through data

  4. Provide a common language between business and IT

  5. Achieve efficiencies through reuse of data products across use cases

  6. Elevate the organization's trust in data

  7. Future-proof their data architectures (data mesh architecture / data fabric architecture / data hub architecture)

Data as a Product and the Data Delivery Lifecycle 

To take a "Data as a Product" approach, data teams must adopt a cross-functional product lifecycle approach to data. The data product delivery lifecycle should follow agile principles, by being short and iterative – to deliver quick, incremental value to authorized data consumers.

Define and Design the Data Product
Define the data requirements, within the context of the business objectives, the constraints of data privacy and governance, and the inventory of existing data assets. Design how the data will be structured and how it will be componentized as a product, to be consumed via services.

Engineer the Data Product
Build the data product according to the requirements by identifying, integrating, and collating the data from its sources, and then employing data masking as needed. Create APIs for data services, to provide consuming applications with the right credentials to access the data product, and design a data pipeline to securely publish the data to subscribers. 

QA the Data Product
Test and validate the data to ensure it is complete, compliant, and timely, and that it can be securely consumed by applications at high scale. 
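As a rough sketch of what automated checks for completeness and timeliness might look like, the function below validates a single record. The required field names and the 24-hour freshness threshold are assumptions made for this example, not values from any standard or product.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rules for this example only.
REQUIRED_FIELDS = ("customer_id", "email", "last_synced")
MAX_AGE = timedelta(hours=24)


def check_record(record: dict, now: datetime) -> list[str]:
    """Return a list of issues; an empty list means the record passed."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            issues.append(f"missing {name}")
    # Timeliness: the record must have been synced recently enough.
    synced = record.get("last_synced")
    if synced is not None and now - synced > MAX_AGE:
        issues.append("stale")
    return issues


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
fresh = {"customer_id": "C-1", "email": "a@example.com", "last_synced": now - timedelta(hours=1)}
stale = {"customer_id": "C-2", "email": "", "last_synced": now - timedelta(days=2)}
```

Checks of this kind could run in a CI/CD pipeline against a masked test dataset before a new version of the data product is published.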

Support and Maintain the Data Product
Continually monitor data usage, pipeline performance, and reliability, and work closely with data engineering to address issues as defined in the SLAs.

A New Role: Data Product Manager

Much like software product development, where the software product manager is responsible for gathering user needs, prioritizing them, and working with software development and QA to ensure the right product is delivered at the right time, we believe that there is a place for a similar role in the data team. The data product manager will be responsible for collecting data needs from data consumers (data scientists, data analysts, application owners), prioritizing them, and working closely with data engineering to deliver the data product on time and on budget.

The data product must deliver business value, and realize ROI, such as more informed decision making, quicker application development, and more. For this to happen effectively, the data delivery must have a definitive timeline – a kind of service level agreement between IT and business.

Right Data in Right Time to Right User

In the Data as a Product approach, data engineers, data testers, and data product managers collaborate to deliver the right data to the right users at the right time.

Best Practices for Data as a Product

Close collaboration
Data collectors and custodians should work closely with their consumers. This calls for experimentation and product evolution, and the ability to develop new features, or roll back changes, as needed.

Agile development
Data products must be developed quickly and reliably, meaning that data assets should be decoupled as much as possible. An automated data catalog would be a good first step.

Comprehensive QA
By definition, building data products is a process. Data teams should always have a good CI/CD setup in place, and do their best to identify issues through automated test data management and data quality checks. And when things go wrong (and they inevitably do), be sure to learn from your mistakes to improve the data product.

High-speed availability
Data products have to be used by consumers in order to judge their value, so data engineers need to make them available quickly and easily. Standard interfaces should be used to accommodate the needs of diverse teams.

The Business Entity – the Logic Behind Data as a Product

The most obvious way to engineer a data product is to model it around the business entity that it supports, such as a customer, employee, credit card, product, or anything else that is important to the business.

Each business entity (customer John Smith) should be complete in all its attributes, enriched via analytics (propensity to churn), and easily accessible to any data consumer (person or application) that has access rights to that entity.

Usage of the business entity should be measurable. How is the data accessed, and how long does it take to get to it (response time)? How often is it accessed, and by whom? Who tried to access it, but didn’t have the right credentials? Which insights did it drive? The list goes on and on.
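Several of these questions can be answered from a simple access log. The sketch below assumes a hypothetical `AccessEvent` record with four fields; a real platform would capture usage in its own format.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AccessEvent:
    """One access attempt against a business entity (hypothetical shape)."""
    consumer: str
    entity_id: str
    response_ms: float
    authorized: bool


def summarize(events: list[AccessEvent]) -> dict:
    granted = [e for e in events if e.authorized]
    return {
        # How often is it accessed, and by whom?
        "reads_by_consumer": dict(Counter(e.consumer for e in granted)),
        # How long does it take to get to the data?
        "avg_response_ms": mean(e.response_ms for e in granted) if granted else None,
        # Who tried to access it without the right credentials?
        "denied_consumers": sorted({e.consumer for e in events if not e.authorized}),
    }


events = [
    AccessEvent("crm_app", "C-1", 12.0, True),
    AccessEvent("crm_app", "C-1", 18.0, True),
    AccessEvent("analyst", "C-1", 30.0, True),
    AccessEvent("intern", "C-1", 5.0, False),
]
summary = summarize(events)
```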

The overall quality of the data product must be assured in terms of completeness, integrity, and freshness (that is, the data is always up to date).

A Platform Based on Business Entities

An entity-centric data product platform, which manages, prepares, and delivers data in the form of business entities, is the ideal platform for delivering data products to data consumers, because it inherently defines and manages the entire lifecycle of the data products.

A data product platform defines an intermediary data schema aggregating all the attributes of a business entity (such as a customer, product, location or order) across all systems, in order to prepare and deliver the data as an integrated data product.

Such a platform is key to supporting the data as a product methodology. It integrates data from all sources by business entity, cleansing, validating, enriching, and transforming it in flight, and applying data masking when required. It may be deployed as a data mesh, data fabric, or customer data platform/hub.
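To make the idea of an entity-level schema concrete, here is a minimal sketch of collecting attributes for the same customer from two source systems. The source names, column names, and the `MAPPINGS` table are all invented for illustration; an actual platform would also handle conflicts, cleansing, and synchronization.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems, each with its own key column.
crm_rows = [{"cust_id": "C-1", "full_name": "John Smith"}]
billing_rows = [
    {"customer": "C-1", "balance": 42.5},
    {"customer": "C-2", "balance": 0.0},
]

# Hypothetical mapping: source -> (key column, {source column: entity attribute}).
MAPPINGS = {
    "crm": ("cust_id", {"full_name": "name"}),
    "billing": ("customer", {"balance": "balance"}),
}


def build_entities(sources: dict) -> dict:
    """Group rows from every source under the business entity they describe."""
    entities = defaultdict(dict)
    for source, rows in sources.items():
        key_col, columns = MAPPINGS[source]
        for row in rows:
            entity = entities[row[key_col]]
            for src_col, attr in columns.items():
                entity[attr] = row[src_col]
    return dict(entities)


customers = build_entities({"crm": crm_rows, "billing": billing_rows})
```

Keying everything by the entity identifier means a consumer asks for one customer and receives every mapped attribute, regardless of which system holds it.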

Data Product Platform: Data Products Inside

Data Product Platform, with its patented approach to organizing data by business entities, transforms fragmented enterprise data into data products – enabling companies to proactively adopt the "data as a product" mindset necessary to sustain data-driven leadership.
