    Data Fabric vs Data Lake vs Database

    Yuval Perlov

    CTO, K2view

    This article examines which data store is best suited to real-time, massive-scale, high-speed operational use cases: a Data Product Platform (as a data fabric), a data lake, or a database.

    Table of Contents


    What We Mean by Real-Time Operational Use Cases
    What Real-Time Operational Use Cases Require
    Big Data Stores Defined
    Let’s Evaluate the Options
    And the Winner is…


    What We Mean by Real-Time Operational Use Cases

    In enterprise operations, there are dozens of real-time use cases that require a massive-scale, hyper-speed data architecture capable of supporting thousands, or even millions, of transactions simultaneously. Examples abound:

    • Delivering a single customer 360, from dozens of underlying legacy systems, to a self-service IVR, customer service agents (CRM), customer self-service portal (web or mobile), chat service agents and bots, as well as field-service technicians

    • Tokenizing data, such as credit card transactions

    • Detecting online fraud

    • Credit scoring

    • Predicting churn, and more…

    Service agents need a real-time customer 360 view to be most effective.

    What Real-Time Operational Use Cases Require

    These use cases require a big data platform that can support split-second response times for performing complex queries, while coping with:

    • Live data – Continually updated from operational systems (millions to billions of updates per day).

    • Big, fragmented data – Terabytes of data spread across dozens of massive databases and tables, often built on different technologies.

    • A specific instance of an entity – For example, retrieving complete data for a specific customer, location, device, etc.

    • High concurrency – Thousands of requests per second.
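
    To make the "specific instance of an entity" requirement concrete, here is a minimal sketch of assembling one customer's view from fragmented sources. The three in-memory dictionaries and the fetch functions are hypothetical stand-ins for real operational systems, not part of any product described here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source systems, each holding one fragment of customer data.
BILLING = {"c-1": {"balance": 42.5}, "c-2": {"balance": 0.0}}
CRM = {"c-1": {"name": "Ada"}, "c-2": {"name": "Grace"}}
TICKETS = {"c-1": [{"id": 7, "status": "open"}]}


def fetch_billing(customer_id):
    return {"billing": BILLING.get(customer_id, {})}


def fetch_crm(customer_id):
    return {"crm": CRM.get(customer_id, {})}


def fetch_tickets(customer_id):
    return {"tickets": TICKETS.get(customer_id, [])}


FETCHERS = (fetch_billing, fetch_crm, fetch_tickets)


def customer_360(customer_id):
    """Query every source for ONE customer in parallel and merge the fragments."""
    view = {"customer_id": customer_id}
    with ThreadPoolExecutor(max_workers=len(FETCHERS)) as pool:
        for fragment in pool.map(lambda fetch: fetch(customer_id), FETCHERS):
            view.update(fragment)
    return view
```

    Each request touches only the rows belonging to a single customer, which is a very different access pattern from the wide analytical scans that lakes and warehouses are built for.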

    Big Data Stores Defined

    To determine the best big data store for real-time enterprise operations, let’s start with some basic definitions.

    A data lake, according to the Gartner glossary, refers to a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format – structured or unstructured – and maintained in addition to the originating data stores.

    A data warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores, and external sources. It combines the data in an aggregate, summary form suitable for enterprise-wide data analysis, and reporting for predefined business needs.

    A database management system (DBMS) is used for the storage and organization of data that typically has defined formats and structures. DBMSs are categorized by their basic structures and by their use or deployment.

    • A relational DBMS typically includes a Structured Query Language (SQL) application programming interface. Its data is organized and accessed according to the relationships between data entities.

    • A non-relational (NoSQL) DBMS is frequently used in big data and real-time web applications. Although optimized for use at massive scale, a non-relational database typically does not enforce relationships between data entities.
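
    The difference in relationship enforcement can be sketched in a few lines. The example below uses Python's built-in sqlite3 module for the relational side and a plain dictionary as a simplified stand-in for a schemaless store. The table and field names are illustrative assumptions.

```python
import sqlite3


def relational_rejects_orphan():
    """Return True if the relational store refuses an order for a missing customer."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customer(id))"
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders VALUES (10, 1)")  # valid reference
    try:
        conn.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False


def document_store_accepts_orphan():
    """A plain dict stands in for a schemaless store: nothing checks the reference."""
    customers = {1: {"name": "Ada"}}
    orders = {}
    orders[11] = {"customer_id": 99}  # accepted even though customer 99 is absent
    return orders[11]["customer_id"] not in customers
```

    In the second case, any integrity check has to live in application code rather than in the data store.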

    A Data Product Platform (in the form of a data fabric or data mesh) is an integrated layer of connected data that is ingested and normalized from an enterprise's data sources, regardless of the technology, format, or location of those sources. The platform can persist and secure the processed data in its own data store, and deliver it to consuming applications, real-time decisioning/ML/AI engines, and big data stores. It can also integrate, process, and deliver enterprise data in real time.

    Let’s Evaluate the Options

    The following summarizes the pros and cons of a Data Product Platform (as a data fabric) vs a data lake vs databases, and also compares relational and non-relational databases. The comparison focuses on massive-scale, high-volume operational use cases, as described above.

    Data Lake (Amazon S3, Azure Data Lake, Apache Hadoop, …) and Data Warehouse (Snowflake, Amazon Redshift, Google BigQuery, …)

    Pros

    • Complex data query support, across structured and unstructured data

    Cons

    • Not optimized for single entity queries, resulting in slow response times.

    • Live data is not supported, so continually updated data is either unreliable or delivered with unacceptable response times.

    Relational Database (Oracle, MS SQL, PostgreSQL, …)

    Pros

    • SQL support, wide adoption, and ease of use.

    Cons

    • Non-linear scalability, requiring costly hardware (hundreds of nodes) to perform complex queries, in near real time, on terabytes of data.

    • Poor handling of high concurrency, resulting in problematic response times.

    NoSQL Database (MongoDB, Redis, Cassandra, …)

    Pros

    • Distributed datastore architecture, supporting linear scalability.

    Cons

    • SQL not supported, requiring specialized skills.

    • To support data querying, indexes need to be predefined, or complex application logic needs to be built in, hindering time-to-market and agility.
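
    The indexing trade-off in the last point can be illustrated with a toy key-value store. This is a simplified sketch, not the API of MongoDB, Redis, or Cassandra. The class, its methods, and the field names are all hypothetical.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy key-value store: fast by primary key, everything else needs help."""

    def __init__(self, indexed_fields=()):
        self._rows = {}
        # Secondary indexes must be declared up front: field -> value -> keys.
        self._indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, key, row):
        old = self._rows.get(key)
        if old is not None:
            # Remove stale index entries before overwriting the row.
            for field, index in self._indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        self._rows[key] = row
        for field, index in self._indexes.items():
            if field in row:
                index[row[field]].add(key)

    def get(self, key):
        return self._rows.get(key)

    def find(self, field, value):
        """Return (matching keys, number of rows examined)."""
        if field in self._indexes:
            keys = sorted(self._indexes[field].get(value, ()))
            return keys, len(keys)
        # No index on this field: application-side scan over every row.
        keys = sorted(k for k, row in self._rows.items() if row.get(field) == value)
        return keys, len(self._rows)
```

    A query on an indexed field examines only the matching rows, while a query on any other field falls back to scanning the whole store. Supporting a new query pattern therefore means either declaring another index in advance or writing more application logic.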

    Data Product Platform (fabric architecture, mesh architecture, or hub architecture)

    Pros

    • Full SQL support.

    • Distributed datastore architecture, supporting linear scalability.

    • High concurrency support, with real-time performance.

    • Complex query support for single business entities.

     

    And the Winner is…

    In the data fabric vs data lake vs database debate, a Data Product Platform is the platform of choice for massive-scale, high-volume, real-time operational use cases. But fabrics, lakes, and databases are even better together. On the one hand, a Data Product Platform can prepare trusted data for lakes and warehouses. On the other hand, lakes and warehouses can provide insights back to the platform for real-time use.
