This article examines which data store is best suited to real-time, massive-scale, hyper-speed operational use cases: a Data Product Platform (as a data fabric), a data lake, or a database.
Table of Contents
What We Mean by Real-Time Operational Use Cases
What Real-Time Operational Use Cases Require
Big Data Stores Defined
Let’s Evaluate the Options
And the Winner is…
What We Mean by Real-Time Operational Use Cases
In enterprise operations, there are dozens of real-time use cases that require a massive-scale, hyper-speed data architecture capable of supporting thousands, or even millions, of transactions simultaneously. Examples abound:
- Delivering a single customer 360, from dozens of underlying legacy systems, to a self-service IVR, customer service agents (CRM), customer self-service portal (web or mobile), chat service agents and bots, as well as field-service technicians
- Tokenizing data, such as credit card transactions
- Detecting online fraud
- Credit scoring
- Predicting churn, and more…
Service agents need a real-time customer 360 view to be most effective.
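Tokenization, one of the use cases listed above, can be illustrated with a minimal sketch. It assumes a vault-style approach in which a sensitive value is swapped for a random token and the mapping is kept in a protected store. The `TokenVault` class and the `tok_` prefix are hypothetical names chosen for illustration, and the in-memory dictionaries stand in for what would be a secured, persistent store in practice.

```python
import secrets


class TokenVault:
    """Toy in-memory vault that maps random tokens to original values."""

    def __init__(self):
        self._by_token = {}  # token -> original value
        self._by_value = {}  # original value -> token

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._by_token:  # collision is very unlikely; retry if it happens
            token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the original value.
        return self._by_token[token]


vault = TokenVault()
card_token = vault.tokenize("4111111111111111")  # well-known test card number
```

Downstream systems would store and pass around `card_token` only, so the original number never leaves the vault.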
What Real-Time Operational Use Cases Require
These use cases require a big data platform that can support split-second response times for performing complex queries, while coping with:
- Live data – Continually updated from operational systems (millions to billions of updates per day).
- Big, fragmented data – Terabytes of data spread across dozens of massive databases / tables, often in different technologies.
- A specific instance of an entity – For example, retrieving complete data for a specific customer, location, device, etc.
- High concurrency – Thousands of requests per second.
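To make the "specific instance of an entity" requirement concrete, here is a rough sketch of assembling one customer's view from several fragmented sources queried in parallel. The three dictionaries are hypothetical stand-ins for separate source systems, and `fetch` and `customer_360` are illustrative names, not part of any product API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for separate source systems.
CRM = {"c-1001": {"name": "Dana Reyes", "segment": "consumer"}}
BILLING = {"c-1001": {"balance_due": 42.50, "currency": "USD"}}
TICKETS = {"c-1001": [{"id": "t-9", "status": "open"}]}

SOURCES = {"crm": CRM, "billing": BILLING, "tickets": TICKETS}


def fetch(source_name, customer_id):
    """Look up one customer's slice of data in a single source."""
    return source_name, SOURCES[source_name].get(customer_id)


def customer_360(customer_id):
    """Query every source in parallel and merge the results into one view."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        results = pool.map(lambda name: fetch(name, customer_id), SOURCES)
        return {"customer_id": customer_id, **dict(results)}


view = customer_360("c-1001")
```

The key point is that the lookup is keyed by a single entity ID, so each request touches only that customer's data rather than scanning whole tables. A source with no record for the customer simply contributes `None` to the merged view.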
Big Data Stores Defined
To determine the best big data store for real-time enterprise operations, let’s start with some basic definitions.
A data lake, according to the Gartner glossary, refers to a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format – structured or unstructured – and maintained in addition to the originating data stores.
A data warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores, and external sources. It combines the data in an aggregate, summary form suitable for enterprise-wide data analysis, and reporting for predefined business needs.
A database management system (DBMS) is used for the storage and organization of data that typically has defined formats and structures. DBMSs are categorized by their basic structures and by their use or deployment.
- A relational DBMS typically includes a Structured Query Language (SQL) application programming interface. Its data is organized and accessed according to the relationships between data entities.
- A non-relational (NoSQL) DBMS is frequently used in big data and real-time web applications. Although optimized for use at massive scale, a non-relational database generally does not enforce relationships between data entities.
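The difference in relationship enforcement can be shown with a small sketch. It uses Python's built-in `sqlite3` module as the relational example and a plain dictionary as a simplified stand-in for a key-value store. The dictionary is an assumption made for illustration: it is not a real NoSQL product, and some NoSQL systems do offer their own validation features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )
    """
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Dana')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")


def insert_orphan_order(connection):
    """Try to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO orders (id, customer_id) VALUES (11, 999)")
        return True
    except sqlite3.IntegrityError:
        # The relational engine rejects the row because customer 999 is missing.
        return False


relational_accepted = insert_orphan_order(conn)

# A plain dict standing in for a key-value store: nothing checks the reference.
kv_store = {"customer:1": {"name": "Dana"}}
kv_store["order:11"] = {"customer_id": 999}
kv_accepted = "order:11" in kv_store
```

Here the relational engine refuses the orphaned order, while the key-value stand-in accepts it, leaving any consistency checks to the application.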
A Data Product Platform (in the form of a data fabric or data mesh) is an integrated layer of connected data that is ingested and normalized from an enterprise's data sources, regardless of their technology, format, or location. The platform can persist and secure the processed data in its own data store, and deliver it to consuming applications, real-time decisioning/ML/AI engines, and big data stores. It can also integrate, process, and deliver enterprise data in real time.
Let’s Evaluate the Options
The following summarizes the pros and cons of a Data Product Platform (as a data fabric) vs a data lake vs databases, while also comparing relational and non-relational databases. The focus of this comparison is on massive-scale, high-volume, operational use cases, as described above.
Data Lake
Pros | Cons
Relational Database (Oracle, MS SQL, PostgreSQL, …)
Pros | Cons
NoSQL Database (MongoDB, Redis, Cassandra, …)
Pros | Cons
Data Product Platform (fabric architecture, mesh architecture, or hub architecture)
Pros
And the Winner is…
In the data fabric vs data lake vs database debate, a Data Product Platform is the platform of choice for massive-scale, high-volume, real-time operational use cases. But fabrics, lakes, and databases are even better together. On the one hand, a Data Product Platform can prepare trusted data for lakes and warehouses. On the other hand, lakes and warehouses can provide insights back to the platform for real-time use.