What is Dark Data and How Do Data Products Shed Light on it?

Lion Brotzky

VP Product

Today’s enterprises strive to be “data-driven”, but can’t access a good part of their potentially valuable data. Data products get business-critical data out of the dark.

Table of Contents


The Dark Data Dilemma
What is Dark Data?
Why Does Dark Data Contain Business-Critical Information?
Data Products Illuminate Dark Data
How Data Products Extract Value from Dark Data
Data Products for Operations and Analytics
Become Enlightened with Data Products

The Dark Data Dilemma

Although modern enterprises generate large volumes of potentially valuable data, the vast majority of it (80%) is “in the dark”, in the sense that it is inaccessible and unused.

This “dark data” is not driving business decisions, it’s not being used to improve operational efficiencies, and it’s not improving the customer experience. More often than not, this data deluge actually weighs companies down.

To become truly data-driven, enterprises need the ability to unify fragmented data sources and liberate dark data, so they can access all the information they need in real time.

IT teams and business intelligence units are scrambling to keep up with demands for data and insights across the enterprise. Meanwhile, data consumers, across all domains, lack the information they need to improve business and operational outcomes.

So, in today’s enterprise, all data stakeholders are interested in leveraging their data assets to the fullest.

What is Dark Data?

According to analyst firm Gartner, dark data is the mass of information enterprises collect, process, and store, but generally fail to capitalize on (e.g., for analytics, business relationships, and direct monetization).

Dark data often accounts for the vast majority of an organization’s information assets. But because storing and securing data typically incur more expense and risk than value, companies often retain dark data for compliance purposes only.

Although the bulk of dark data is probably not useful for the business, there is still a lot of business-critical data hidden in the dark.

Why Does Dark Data Contain Business-Critical Information?

Dark data holds so much valuable business information because enterprise data is often:

  • Fragmented across many different systems
    Businesses today use hundreds of enterprise systems, each of which produces data. In every department, and within each system, the volume of data is growing, but data consumers lack the ability to extract and unify it to derive meaningful insights.

  • Held captive in vendor-owned applications
    When data is collected and managed by third-party systems, data consumers within your business struggle to access it due to a lack of APIs, documentation, or expertise.

  • Locked within in-house legacy systems
    Legacy systems usually store data in legacy databases (VSAM, IMS, DB2/MF, DB400), some of which are non-relational and/or very difficult to access technically.

  • Not standardized
    Data that is collected across disparate systems may be structured and unstructured in dozens of technologies and formats. Without a means to unify the data under a common structure, with metadata readily understood by business and IT, it cannot be aggregated in a way that is understandable to data consumers.

  • Non-compliant with the most current privacy laws
    In many cases, the need to protect sensitive data in compliance with the ever-growing number of privacy regulations introduces complexities and time delays.

  • Growing at an exponential rate
    It’s estimated that the amount of digital data businesses generate is doubling every 2 years! It’s no wonder that an enterprise’s ability to collect data far exceeds its ability to analyze it.

Data Products Illuminate Dark Data

A data product approach is the answer to delivering business value from dark data. The concept of data products is derived from the data mesh framework, which asserts that each business domain should be able to define, access, and control its own data products.

A data product is built to fulfill the needs of one or more data consumers. It corresponds to a specific business entity (e.g., a customer, product, location, etc.), enabling data consumers to easily access fresh, complete, and compliant data for analytical and/or operational workloads – wherever and whenever they need it, and no matter where the data originates from.

Data products illuminate dark data, deriving business value for data consumers and domains.

Data products are likely to contain dark data, originating in dozens of fragmented source systems, each of which could rely on different technologies, structures, formats, and terminologies.

Each data product unifies everything a company knows about a particular business entity, including transactions, interactions, and master data – whether structured or unstructured.
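To make the idea of unification concrete, here is a minimal Python sketch of how records about one customer, held in two source systems with different field names, could be merged into a single view. The source names (`crm`, `billing`), the field mappings, and the sample records are all hypothetical and chosen only for illustration; a real data product platform would do far more (matching, cleansing, conflict resolution).

```python
from collections import defaultdict

# Hypothetical per-source mappings: source field name -> common field name.
FIELD_MAPS = {
    "crm": {"cust_id": "customer_id", "full_name": "name", "email_addr": "email"},
    "billing": {"CUSTNO": "customer_id", "BALANCE": "balance"},
}


def to_common(source, record):
    """Rename a source record's fields to the common schema, dropping unmapped ones."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def unify(records_by_source):
    """Merge records from all sources into one dict per customer_id."""
    entities = defaultdict(dict)
    for source, records in records_by_source.items():
        for record in records:
            common = to_common(source, record)
            # Normalize the key to a string so 42 and "42" refer to the same customer.
            key = str(common.pop("customer_id"))
            entities[key].update(common)
    return dict(entities)


# Illustrative sample data: the same customer appears in both systems.
raw = {
    "crm": [{"cust_id": 42, "full_name": "Ada Smith",
             "email_addr": "ada@example.com", "internal_flag": "x"}],
    "billing": [{"CUSTNO": "42", "BALANCE": 19.5}],
}

customer_view = unify(raw)
print(customer_view)
```

In this sketch the business entity (the customer) is the unit of organization: every source record is translated into the common vocabulary first, and only then merged under the entity's key.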

A data product contains everything a data consumer needs to extract value, including the dark data associated with it.

Data products are:

  • Defined via metadata, which includes schema, data processing logic, integration methods, masking techniques, access controls, and more

  • Created from source data, residing in existing systems (where each individual data product has its own data, managed as a holistic unit)

  • Enriched with situational awareness, derived from real-time and offline analytics and/or ML algorithms

  • Continually syncing with their source systems

  • Easily accessed by authorized data consumers, in a variety of data delivery methods
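As a rough illustration of what "defined via metadata" could look like, the Python sketch below declares a schema, a masking rule, and an access-control list in one place, then applies them when a record is delivered. The class name `DataProductDefinition`, its fields, the `mask_email` helper, and the role names are assumptions made for this example, not the API of any particular product.

```python
from dataclasses import dataclass, field


def mask_email(value):
    """Hide the local part of an email address, keeping the domain."""
    _, _, domain = value.partition("@")
    return "***@" + domain


@dataclass
class DataProductDefinition:
    """Hypothetical metadata for one data product."""
    entity: str
    schema: dict                                      # field name -> Python type
    masking: dict = field(default_factory=dict)       # field name -> masking function
    allowed_roles: set = field(default_factory=set)   # roles permitted to read

    def deliver(self, record, role):
        """Return a schema-checked, masked copy of record for an authorized role."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.entity} data")
        out = {}
        for name, expected_type in self.schema.items():
            value = record[name]
            if not isinstance(value, expected_type):
                raise TypeError(f"{name} should be {expected_type.__name__}")
            mask = self.masking.get(name)
            out[name] = mask(value) if mask else value
        return out


customer_product = DataProductDefinition(
    entity="customer",
    schema={"customer_id": str, "email": str, "balance": float},
    masking={"email": mask_email},
    allowed_roles={"support_agent"},
)

record = {"customer_id": "42", "email": "ada@example.com", "balance": 19.5}
delivered = customer_product.deliver(record, role="support_agent")
print(delivered)
```

Keeping the schema, masking, and access rules in the definition rather than in each consuming application means a change to a privacy rule is made once and applies to every delivery.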

How Data Products Extract Value from Dark Data

Data products enable businesses to derive value from dark data by:

  • Providing a single source of truth
    With the ability to gain a holistic, complete view of business entity data, fragmentation and data silos no longer prevent data consumers from seeing the whole picture. A data product delivers all of the relevant information concerning a specific business entity, irrespective of the source systems where dark data may be hidden.

  • Federating autonomy to domains
    Data products are federated, which means each business domain in the organization can manage its own data autonomously – gaining insights from dark data, as needed – without relying on a centralized data management team.

  • Enabling a common language between IT and business
    A data product schema creates a semantic abstraction layer that hides data complexity (such as specific domain expertise, technical jargon, or the code required to extract value from dark data) from data consumers. It basically forces tech-savvy data teams to use a common language that’s easily understood by their business counterparts.

  • Delivering up-to-date, contextualized data
    One of the many challenges of dark data is preventing data from “perishing,” or becoming obsolete over time. For example, customer data might only be relevant at a specific moment in time, such as when an individual customer calls customer support. Data products ensure all of the data provided is fresh, complete, updated, and contextualized to fulfill current objectives.

Customer data is often only good for the moment.
Data products ensure that all data is up to date.
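One simple way to reason about freshness is to track when each source was last synced and flag those that exceed an acceptable age. The Python sketch below shows that idea only; the function name `stale_sources`, the 15-minute threshold, and the source names are hypothetical, and real synchronization would typically rely on mechanisms such as change data capture or event streams.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_synced, max_age, now=None):
    """Return the sorted names of sources whose last sync is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_synced.items() if now - ts > max_age)


# A fixed reference time keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_synced = {
    "crm": now - timedelta(minutes=2),     # synced recently
    "billing": now - timedelta(hours=3),   # not synced for hours
}

stale = stale_sources(last_synced, max_age=timedelta(minutes=15), now=now)
print(stale)  # ['billing']
```

A check like this could run before data is served, for example when a customer calls support, so that stale sources are refreshed first.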

Data Products for Operations and Analytics

Here’s a quick overview of some of the key workloads that data products fuel.

  • Customer Data Hub
    Integrate fragmented customer data from different sources into customer data products stored in a customer data hub. Unify and transform structured and unstructured data into a single customer view to provide a clear, real-time understanding of the customer that can be used by any department in the enterprise at scale.

  • Operational Intelligence
    Gain real-time visibility and insights from data products – to minimize customer churn, assess risk, recommend the next best action, prevent fraud, and so on.

  • Data Privacy and Governance
    Govern customer data to comply with GDPR, CCPA, LGPD, and easily configure new privacy rules.

  • Cloud Migration
    Auto-discover and match data from dispersed on-premises systems and pipeline it to new cloud applications, data lakes, and data warehouses.

  • Test Data Provisioning
    Enable instant, on-demand test data provisioning to support agile software delivery and accelerate development cycles.

  • Legacy Application Modernization
    Build new applications on top of digital entities so you can gradually retire legacy systems for cost rationalization.

  • Data Pipelining
    Create high-speed pipelines that ensure data integrity and agility, from source systems to data lakes and data warehouses.

Become Enlightened with Data Products

If your data is in the dark, so are you. Without the capability to access business entity data in real time, it’s impossible to derive insights that can be used to improve business outcomes.

With data products, you gain instant access to unified, clean, and complete data that would otherwise remain buried within the depths of your enterprise, inaccessible and unusable. With a data product manager as part of your team, you'll make smarter business decisions, improve customer experience, and enhance operational outcomes.

Learn more about data products through the lens of data mesh.