Data Fabric vs Data Lake vs Database

Yuval Perlov

CTO, K2View

This article compares an operational data fabric, a data lake, and a database to determine which data store is best for real-time, massive-scale, hyper-speed operational use cases.

Table of Contents


What We Mean by Real-Time Operational Use Cases
What Real-Time Operational Use Cases Require
Big Data Stores Defined
Let’s Evaluate the Options
And the Winner is…


What We Mean by Real-Time Operational Use Cases

In enterprise operations, there are dozens of real-time use cases that require a massive-scale, hyper-speed data architecture capable of supporting thousands, or even millions, of transactions simultaneously. Examples abound:

  • Delivering a single 360-degree customer view, built from dozens of underlying legacy systems, to a self-service IVR, customer service agents (CRM), a customer self-service portal (web or mobile), chat agents and bots, and field-service technicians

  • Tokenizing credit card transactions

  • Detecting online fraud

  • Credit scoring

  • Predicting churn, and more…

Service agents need a real-time 360 customer view to be most effective

What Real-Time Operational Use Cases Require

These use cases require a big data platform that can run complex queries with split-second response times, while coping with:

  • Live data – Continually updated from operational systems (millions to billions of updates per day).

  • Big, fragmented data – Terabytes of data spread across dozens of massive databases and tables, often built on different technologies.

  • A specific instance of an entity – For example, retrieving complete data for a specific customer, location, device, etc.

  • High concurrency – Thousands of requests per second.
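To put the "millions to billions of updates per day" figure in perspective, the short Python sketch below converts daily volumes into average per-second rates. It assumes updates are spread evenly across the day, which real traffic rarely is, so peak rates will be higher than these averages.

```python
# Back-of-envelope conversion of daily update volumes into average
# per-second rates (assumes an even spread; real peaks are higher).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def avg_updates_per_second(updates_per_day: int) -> float:
    """Average update rate if updates were spread evenly over the day."""
    return updates_per_day / SECONDS_PER_DAY

for label, per_day in [("1 million/day", 1_000_000), ("1 billion/day", 1_000_000_000)]:
    print(f"{label}: ~{avg_updates_per_second(per_day):,.0f} updates/s")
# 1 million/day: ~12 updates/s
# 1 billion/day: ~11,574 updates/s
```

At the upper end, that is over ten thousand writes per second on average, arriving while thousands of read requests per second are also being served.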

Big Data Stores Defined

To determine the best big data store for real-time enterprise operations, let’s start with some basic definitions.

A data lake, according to the Gartner glossary, refers to a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format – structured or unstructured – and maintained in addition to the originating data stores.
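Keeping a near-exact copy of the source format is often described as schema-on-read: files land as they are, and structure is applied only when they are read. The toy below illustrates the idea with an in-memory dict standing in for object storage; the paths, payloads, and helper names (`land`, `read_records`) are hypothetical.

```python
import csv
import io
import json

# Toy "lake": raw payloads stored byte-for-byte, keyed by a hypothetical object path.
lake: dict[str, bytes] = {}

def land(path: str, payload: bytes) -> None:
    """Store a source extract unchanged, in its original format."""
    lake[path] = payload

def read_records(path: str) -> list[dict]:
    """Apply structure only at read time, chosen by file extension."""
    raw = lake[path].decode("utf-8")
    if path.endswith(".json"):
        return json.loads(raw)
    if path.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"no reader for {path}")

land("crm/customers.json", b'[{"id": "c1", "name": "Ada"}]')
land("billing/invoices.csv", b"customer_id,amount\nc1,42.50\n")

print(read_records("crm/customers.json"))    # [{'id': 'c1', 'name': 'Ada'}]
print(read_records("billing/invoices.csv"))  # [{'customer_id': 'c1', 'amount': '42.50'}]
```

Note that the CSV amount comes back as the string `'42.50'`: no schema or type was imposed when the file was stored.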

A data warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores, and external sources. It combines the data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs.
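The "aggregate, summary form" can be pictured as rolling detail rows up to the grain a report needs. The minimal sketch below assumes hypothetical transaction rows and a (region, month) reporting grain; it is an illustration of aggregation, not how any particular warehouse stores data.

```python
from collections import defaultdict

# Hypothetical detail rows extracted from an operational system.
transactions = [
    {"region": "EMEA", "month": "2024-01", "amount": 120.0},
    {"region": "EMEA", "month": "2024-01", "amount": 80.0},
    {"region": "APAC", "month": "2024-01", "amount": 200.0},
    {"region": "EMEA", "month": "2024-02", "amount": 50.0},
]

def summarize(rows):
    """Roll detail rows up to (region, month) totals and counts."""
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for r in rows:
        key = (r["region"], r["month"])
        totals[key]["total"] += r["amount"]
        totals[key]["count"] += 1
    return dict(totals)

summary = summarize(transactions)
print(summary[("EMEA", "2024-01")])  # {'total': 200.0, 'count': 2}
```

The summary answers reporting questions cheaply, but the individual transactions behind each total are no longer visible in it.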

A database management system (DBMS) is used for the storage and organization of data that typically has defined formats and structures. DBMSs are categorized by their basic structures and by their use or deployment.

  • A relational DBMS typically includes a Structured Query Language (SQL) application programming interface. Its data is organized and accessed according to the relationships between data entities.

  • A non-relational (NoSQL) DBMS is frequently used in big data and real-time web applications. Although optimized for use at massive scale, a non-relational database generally cannot enforce relationships between data entities.
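The difference in relationship handling can be seen by attempting the same invalid write in both models. The sketch below uses Python's built-in `sqlite3` module for the relational side and a plain-dict toy document store for the non-relational side; the toy is a hypothetical stand-in, not any NoSQL product's API, and some NoSQL products do offer optional validation features it does not model.

```python
import sqlite3

# --- Relational: the relationship is declared in the schema and enforced ---
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    customer_id TEXT NOT NULL REFERENCES customer(id),
    amount REAL NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES ('c1', 'Ada')")
conn.execute("INSERT INTO invoice (customer_id, amount) VALUES ('c1', 42.5)")

# The relationship drives queries through a SQL join.
rows = conn.execute(
    "SELECT c.name, SUM(i.amount) FROM customer c "
    "JOIN invoice i ON i.customer_id = c.id GROUP BY c.id"
).fetchall()
print(rows)  # [('Ada', 42.5)]

# An invoice pointing at a non-existent customer is rejected.
sql_rejected = False
try:
    conn.execute("INSERT INTO invoice (customer_id, amount) VALUES ('c999', 10.0)")
except sqlite3.IntegrityError:
    sql_rejected = True

# --- Non-relational: a toy document store (hypothetical, not a real API) ---
docs: dict[str, dict[str, dict]] = {"customers": {}, "invoices": {}}

def put(collection: str, key: str, doc: dict) -> None:
    docs[collection][key] = doc  # no schema, no referential checks

put("customers", "c1", {"name": "Ada", "tags": ["vip"]})
put("customers", "c2", {"name": "Grace", "phone": "+1-555-0100"})  # different shape
put("invoices", "i1", {"customer_id": "c999", "amount": 10.0})     # accepted anyway

# Detecting dangling references becomes the application's job.
orphans = [k for k, d in docs["invoices"].items()
           if d["customer_id"] not in docs["customers"]]

print(sql_rejected, orphans)  # True ['i1']
```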

A data fabric is an integrated layer of connected data that is ingested and normalized from an enterprise's data sources, regardless of the technology, format, or location of those sources. The data fabric can persist and secure the processed data in its own data store, and deliver it to consuming applications, real-time decisioning/ML/AI engines, and big data stores. An operational data fabric can integrate, process, and deliver enterprise data in real time.
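The "ingested and normalized" step is the core of the definition above. The minimal sketch below shows only that normalize-and-connect step, assuming two hypothetical sources that identify the same customer with different key formats (`"0042"` in the CRM, `"C-42"` in billing). It does not model persistence, security, or real-time delivery, and none of the names come from K2View's product. `str.removeprefix` requires Python 3.9 or later.

```python
import json

# Hypothetical source extracts: a CRM export (JSON) and billing rows (tuples),
# each using its own key format for the same customers.
crm_records = [
    '{"cust_id": "0042", "full_name": "Ada Lovelace", "email": "ADA@EXAMPLE.COM"}',
    '{"cust_id": "0043", "full_name": "Grace Hopper", "email": "grace@example.com"}',
]
billing_rows = [
    ("C-42", "2024-01-05", "19.99"),
    ("C-43", "2024-01-09", "5.00"),
    ("C-42", "2024-02-05", "19.99"),
]

def normalize_crm(raw: str) -> dict:
    """Parse JSON and map source fields onto common names and types."""
    rec = json.loads(raw)
    return {"customer_id": int(rec["cust_id"]),  # "0042" -> 42
            "name": rec["full_name"],
            "email": rec["email"].lower()}

def normalize_billing(row: tuple) -> dict:
    """Map a billing tuple onto the same customer_id convention."""
    cust, date, amount = row
    return {"customer_id": int(cust.removeprefix("C-")),  # "C-42" -> 42
            "date": date,
            "amount": float(amount)}

def build_entity(customer_id: int) -> dict:
    """Connect the normalized records into one view of a single customer."""
    profiles = {p["customer_id"]: p for p in map(normalize_crm, crm_records)}
    invoices = [b for b in map(normalize_billing, billing_rows)
                if b["customer_id"] == customer_id]
    return {**profiles[customer_id], "invoices": invoices}

ada = build_entity(42)
print(ada["name"], ada["email"], len(ada["invoices"]))  # Ada Lovelace ada@example.com 2
```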

Let’s Evaluate the Options

The following summarizes the pros and cons of data fabrics, data lakes, and databases, and also compares relational with non-relational databases. The focus of this comparison is on massive-scale, high-volume operational use cases, as described above.

Data Lake (Amazon S3, Azure Data Lake, Apache Hadoop, …) and Data Warehouse (Snowflake, Amazon Redshift, Google BigQuery, …)

Pros

  • Complex query support across structured and unstructured data.

Cons

  • Not optimized for single-entity queries, resulting in slow response times.

  • Live data is not supported, so continually updated data is either unreliable or delivered at unacceptable response times.
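The single-entity con is largely about data layout. The toy below contrasts finding one customer's rows in data kept in arrival order, which means examining every row, with the same rows grouped by customer, where the lookup is a single key access. The row counts and customer ID are made up, and real lake and warehouse engines can skip data using partitioning or file-level statistics, which this sketch ignores.

```python
import random

# Hypothetical event log: 100,000 rows across 10,000 customers, in arrival order.
random.seed(7)
events = [{"customer_id": random.randrange(10_000), "value": i} for i in range(100_000)]

def scan_for_customer(rows, customer_id):
    """Without an entity-keyed layout, every row must be examined."""
    examined = 0
    hits = []
    for r in rows:
        examined += 1
        if r["customer_id"] == customer_id:
            hits.append(r)
    return hits, examined

# Grouping the same rows by entity turns the lookup into one dictionary access.
by_customer: dict[int, list[dict]] = {}
for r in events:
    by_customer.setdefault(r["customer_id"], []).append(r)

hits, examined = scan_for_customer(events, 1234)
print(len(hits), examined)              # examined is 100000 however few rows match
print(len(by_customer.get(1234, [])))   # same rows, no full scan
```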

Relational Database (Oracle, MS SQL, PostgreSQL, …)

Pros

  • SQL support, wide adoption, and ease of use.

Cons

  • Non-linear scalability, requiring costly hardware (hundreds of nodes) to perform complex queries in near real time on terabytes of data.

  • Poor performance under high concurrency, resulting in problematic response times.

NoSQL Database (MongoDB, Redis, Cassandra, …)

Pros

  • Distributed datastore architecture, supporting linear scalability.

Cons

  • SQL is not supported, so specialized skills are required.

  • Indexes must be predefined, or complex application logic built in, to support data querying, which hinders time-to-market and agility.

Operational Data Fabric (K2View)

Pros

  • Full SQL support.

  • Distributed datastore architecture, supporting linear scalability.

  • High-concurrency support with real-time performance.

  • Complex query support for single business entities.

Cons

  • Querying across multiple entities may not be inherently supported, requiring integration with a tool such as Elasticsearch for this purpose.
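The pros and the con above hinge on the difference between queries scoped to one business entity and queries that span many. The sketch below illustrates that split under stated assumptions: each customer gets its own small in-memory SQLite database (a hypothetical layout chosen for illustration, not a description of how K2View stores data), and a plain Python dictionary stands in for a search index such as Elasticsearch; no Elasticsearch API is used.

```python
import sqlite3

def make_entity_db(invoices):
    """Hypothetical per-entity store: one small SQLite database per customer."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoice (month TEXT, amount REAL, status TEXT)")
    db.executemany("INSERT INTO invoice VALUES (?, ?, ?)", invoices)
    return db

entities = {
    "c1": make_entity_db([("2024-01", 50.0, "paid"), ("2024-02", 70.0, "overdue")]),
    "c2": make_entity_db([("2024-01", 20.0, "paid")]),
}

def overdue_total(customer_id):
    """Full SQL against a single entity touches only that entity's small store."""
    (total,) = entities[customer_id].execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoice WHERE status = 'overdue'"
    ).fetchone()
    return total

# A question spanning all entities ("who has overdue invoices?") needs a separate
# index; here a plain inverted index stands in for a search engine.
status_index: dict[str, set[str]] = {}
for cid, db in entities.items():
    for (status,) in db.execute("SELECT DISTINCT status FROM invoice"):
        status_index.setdefault(status, set()).add(cid)

print(overdue_total("c1"))              # 70.0
print(sorted(status_index["overdue"]))  # ['c1']
```

Single-entity SQL stays cheap because it reads one small store; the cross-entity question is answered from the separate index, which has to be kept in sync as entities change.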

And the Winner is…

In the data fabric vs data lake vs database debate, data fabric is the architecture of choice for massive-scale, high-volume, real-time operational use cases. But a data fabric and data lakes or warehouses are even better together: the data fabric can prepare trusted data for lakes and warehouses, while lakes and warehouses can feed insights back to the data fabric for real-time use.

See how K2View Data Fabric easily outperforms all other big data stores for real-time operational use cases.