Data Fabric vs Data Lake vs Database

Yuval Perlov

CTO, K2View

This article examines which data store is best suited to real-time, massive-scale, hyper-speed operational use cases, comparing a Data Product Platform (as a data fabric) with a data lake and a database.

Table of Contents

What We Mean by Real-Time Operational Use Cases
What Real-Time Operational Use Cases Require
Big Data Stores Defined
Let’s Evaluate the Options
And the Winner is…

What We Mean by Real-Time Operational Use Cases

In enterprise operations, there are dozens of real-time use cases that require a massive-scale, hyper-speed data architecture capable of supporting thousands, or even millions, of transactions simultaneously. Examples abound:

  • Delivering a single Customer 360, from dozens of underlying legacy systems, to a self-service IVR, customer service agents (CRM), customer self-service portal (web or mobile), chat service agents and bots, as well as field-service technicians

  • Tokenizing data, such as credit card transactions

  • Detecting online fraud

  • Credit scoring

  • Predicting churn, and more…

Service agents need a real-time customer 360 view to be most effective

What Real-Time Operational Use Cases Require

These use cases require a big data platform that can support split-second response times for performing complex queries, while coping with:

  • Live data – Continually updated from operational systems (millions to billions of updates per day).

  • Big, fragmented data – Terabytes of data spread across dozens of massive databases and tables, often in different technologies.

  • A specific instance of an entity – For example, retrieving complete data for a specific customer, location, device, etc.

  • High concurrency – Thousands of requests per second.
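The entity-centric access pattern behind these requirements can be sketched in Python. This is a toy, in-memory illustration under our own assumptions (the `EntityStore` class and its methods are hypothetical, not any product's API): live updates are merged into one entity as they arrive, and each read fetches a single entity by key rather than scanning every table. A lock keeps concurrent readers and writers consistent.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy


class EntityStore:
    """Hypothetical in-memory store keyed by entity ID (illustration only)."""

    def __init__(self):
        self._entities = {}  # entity_id -> dict of attributes
        self._lock = threading.Lock()

    def apply_update(self, entity_id, changes):
        # Live data: merge an incoming change event into a single entity.
        with self._lock:
            self._entities.setdefault(entity_id, {}).update(changes)

    def get(self, entity_id):
        # Single-entity read: a key lookup, not a scan over all records.
        with self._lock:
            return deepcopy(self._entities.get(entity_id, {}))


store = EntityStore()
updates = [
    ("cust-1", {"name": "Ada", "plan": "basic"}),
    ("cust-2", {"name": "Lin", "plan": "pro"}),
    ("cust-1", {"plan": "pro"}),  # a later update overrides the earlier value
]
for entity_id, changes in updates:
    store.apply_update(entity_id, changes)

# High concurrency: many simultaneous single-entity reads.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(store.get, ["cust-1", "cust-2"] * 50))

print(results[0])  # {'name': 'Ada', 'plan': 'pro'}
```

A real platform would distribute entities across nodes and persist them; the single lock here only stands in for whatever consistency mechanism such a system uses.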

Big Data Stores Defined

To determine the best big data store for real-time enterprise operations, let’s start with some basic definitions.

A data lake, according to the Gartner glossary, refers to a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format – structured or unstructured – and maintained in addition to the originating data stores.

A data warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores, and external sources. It combines the data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs.

A database management system (DBMS) is used for the storage and organization of data that typically has defined formats and structures. DBMSs are categorized by their basic structures and by their use or deployment.

  • A relational DBMS typically includes a Structured Query Language (SQL) application programming interface. Its data is organized and accessed according to the relationships between data entities.

  • A non-relational (NoSQL) DBMS is frequently used in big data and real-time web applications. Although optimized for use at massive scale, a non-relational database typically does not enforce relationships between data entities.
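The difference in relationship enforcement can be illustrated with Python's built-in `sqlite3` module. In this sketch, SQLite plays the relational DBMS, and a plain dictionary is a deliberately simplified stand-in for a key-value NoSQL store (it is not modeled on any specific product, and some NoSQL systems do offer limited validation features). The table and key names are hypothetical.

```python
import sqlite3

# Relational DBMS: the schema declares the relationship, and the engine enforces it.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
    relational_rejected = False
except sqlite3.IntegrityError:
    relational_rejected = True  # the engine refused the dangling reference

# Simplified key-value model: nothing checks that customer 99 exists.
kv_store = {"customer:1": {"name": "Ada"}}
kv_store["order:11"] = {"customer_id": 99}  # accepted silently
dangling = "customer:99" not in kv_store

print(relational_rejected, dangling)  # True True
```

In the key-value case, keeping references valid becomes the application's job, which is part of the trade-off for easier horizontal scaling.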

A Data Product Platform (in the form of a data fabric or data mesh) is an integrated layer of connected data that is ingested and normalized from an enterprise's data sources, regardless of the technology, format, or location of those sources. The platform can persist and secure the processed data in its own data store, and deliver it to consuming applications, real-time decisioning/ML/AI engines, and big data stores. It can also integrate, process, and deliver enterprise data in real time.
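The "ingested and normalized" step can be pictured as assembling one business entity, such as a Customer 360, from fragmented sources. The sketch below is a simplified illustration under our own assumptions: the three source extracts, their column names, and the `normalize_id` rule are all hypothetical, and it does not reflect how K2View or any other product implements integration.

```python
from collections import defaultdict

# Hypothetical extracts from three systems, each with its own ID format.
crm_rows = [{"CUST_ID": "0001", "FULL_NAME": "Ada Lovelace"}]
billing_rows = [
    {"customer": 1, "invoice": "INV-7", "amount_cents": 4200},
    {"customer": 1, "invoice": "INV-9", "amount_cents": 1500},
]
ticket_rows = [{"cust": "C-1", "status": "open"}]


def normalize_id(raw):
    """Map each source's ID format to one canonical integer key (assumed rule)."""
    return int(str(raw).removeprefix("C-"))  # requires Python 3.9+


def build_customer_360(crm, billing, tickets):
    # One record per customer, created on first sight of its canonical ID.
    entities = defaultdict(lambda: {"name": None, "invoices": [], "tickets": []})
    for row in crm:
        entities[normalize_id(row["CUST_ID"])]["name"] = row["FULL_NAME"]
    for row in billing:
        entities[normalize_id(row["customer"])]["invoices"].append(
            {"id": row["invoice"], "amount_cents": row["amount_cents"]}
        )
    for row in tickets:
        entities[normalize_id(row["cust"])]["tickets"].append(row["status"])
    return dict(entities)


customers = build_customer_360(crm_rows, billing_rows, ticket_rows)
print(customers[1]["name"], len(customers[1]["invoices"]))  # Ada Lovelace 2
```

Normalizing identifiers before merging is what lets three differently keyed records ("0001", 1, "C-1") land on the same customer.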

Let’s Evaluate the Options

The following summarizes the pros and cons of a Data Product Platform (as a data fabric) versus data lakes, data warehouses, and databases, while also comparing relational with non-relational databases. The focus of this comparison is on massive-scale, high-volume operational use cases, as described above.

Data Lake (Amazon S3, Azure Data Lake, Apache Hadoop, …) and Data Warehouse (Snowflake, Amazon Redshift, Google BigQuery, …)

Pros:

  • Complex data query support, across structured and unstructured data.

Cons:

  • Not optimized for single-entity queries, resulting in slow response times.

  • Live data is not supported, so continually updated data is either unreliable or delivered at unacceptable response times.

Relational Database (Oracle, MS SQL, PostgreSQL, …)

Pros:

  • SQL support, wide adoption, and ease of use.

Cons:

  • Non-linear scalability, requiring costly hardware (hundreds of nodes) to perform complex queries, in near real time, on terabytes of data.

  • Poor handling of high concurrency, resulting in problematic response times.

NoSQL Database (MongoDB, Redis, Cassandra, …)

Pros:

  • Distributed datastore architecture, supporting linear scalability.

Cons:

  • Standard SQL generally not supported, requiring specialized skills.

  • To support data querying, indexes need to be predefined, or complex application logic needs to be built in, hindering time-to-market and agility.

K2View Data Product Platform (fabric, mesh, or hub architectures)

Pros:

  • Full SQL support.

  • Distributed datastore architecture, supporting linear scalability.

  • High concurrency support, with real-time performance.

  • Complex query support for single business entities.
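One way to picture "full SQL" combined with "complex queries for single business entities" is to give each entity its own small SQL database, so a query touches only that entity's rows. The sketch below uses Python's `sqlite3` with in-memory databases; the per-entity layout, the `load_entity_db` helper, and the `invoice` schema are hypothetical and chosen for illustration, not taken from this article or from K2View documentation.

```python
import sqlite3


def load_entity_db(entity_rows):
    """Create a small, hypothetical per-entity SQL database (illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoice (id TEXT, amount_cents INTEGER, paid INTEGER)")
    db.executemany("INSERT INTO invoice VALUES (?, ?, ?)", entity_rows)
    return db


# Each customer's data lives in its own database, keyed by customer ID.
entity_dbs = {
    "cust-1": load_entity_db([("INV-7", 4200, 1), ("INV-9", 1500, 0)]),
    "cust-2": load_entity_db([("INV-3", 900, 0)]),
}

# An ordinary SQL query, scoped to one entity: only that entity's rows are read.
query = """
    SELECT COUNT(*), COALESCE(SUM(amount_cents), 0)
    FROM invoice
    WHERE paid = 0
"""
unpaid_count, unpaid_total = entity_dbs["cust-1"].execute(query).fetchone()
print(unpaid_count, unpaid_total)  # 1 1500
```

Because each query's working set is a single entity, its cost does not grow with the total number of customers, which is the property the comparison above attributes to the platform.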


And the Winner is…

In the debate over a Data Product Platform (as a data fabric) vs a data lake vs a database, K2View is the platform of choice for massive-scale, high-volume, real-time operational use cases. Yet these approaches work even better together. On the one hand, the Data Product Platform can prepare trusted data for lakes and warehouses. On the other hand, lakes and warehouses can feed insights back to the K2View platform for real-time use.

Data Product Platform outperforms all other big data stores for real-time operational use cases.