Over the past few years, the term “data fabric” has become synonymous with enterprise data integration and management. Analyst firm Gartner lists “data fabric” as one of the “Top 10 Data and Analytics Technology Trends for 2021” and predicts that by 2024, 25% of data management vendors will provide a complete framework for data fabric – up from 5% today.
This paper addresses the what, why, how, and who of data fabric, covering its definition, purpose, architecture, challenges, best practices, benefits, and vendors, as well as a data fabric capability checklist.
Data fabric democratizes data access across the enterprise, at scale. It is a single, unified architecture – with an integrated set of technologies and services, designed to deliver integrated and enriched data – at the right time, in the right method, and to the right data consumer – in support of both operational and analytical workloads.
Data fabric combines key data management technologies – such as data catalog, data governance, data integration, data pipelining, and data orchestration.
Gartner: A data fabric stitches together integrated data from many different sources and delivers it to various data consumers.
Data fabric serves a broad range of business, technical, and organizational alignment drivers.
Gartner: An ideal, complete data fabric design with its many components.
A well-designed data fabric architecture is modular and supports massive scale, as well as distributed multi-cloud, on-premises, and hybrid deployment.
As the diagram above illustrates, as data is provisioned from sources to consumers, it is cataloged, enriched to provide insights and recommendations, prepared, delivered, orchestrated, and engineered.
Data sources range from siloed legacy systems to the most modern cloud environments.
Data consumers of the data fabric include data scientists and data analysts (working with data lakes), marketing analysts (involved with customer segmentation), sales and marketing teams, data privacy specialists (concerned with compliance), cloud architects, and more.
Data mesh architecture addresses the four key issues in data management:
Data fabric is well suited to mesh design because it builds an integrated layer of connected data across a broad range of data sources, providing an instant, holistic view of the business for both analytical and operational workloads.
Data fabric establishes the semantic definition of the different data products, the data ingestion modes, and the necessary governance policies that secure and protect the data.
Also, the various business domains coordinate the deployment of additional data fabric nodes, giving them control of data pipelines and services.
Data mesh architecture implemented with data fabric
A data fabric that can manage, prepare, and deliver data in real time, creates the ideal data mesh core. Of course, data mesh architecture has its implementation challenges, but these are easily handled by data fabric:
| Data mesh implementation challenges | How they are handled by data fabric |
| --- | --- |
| **Requirement for data integration expertise:** Data integration across many different enterprise source systems often requires domain-specific expertise in data pipelining. | **Data as a product:** When a data product is a business entity managed in a virtual data layer, there's no need for domains to deal with underlying source systems. |
| **Federation vs independence:** Achieving the right balance between reliance on central data teams and domain independence isn't simple. | **Enterprise-wide collaboration:** Domain-specific teams, in coordination with centralized data teams, build APIs and pipelines for their data consumers, control and govern access rights, and monitor use. |
| **Real-time and batch data delivery:** Data products must be provisioned to both offline and online data consumers, securely and efficiently, on a single platform. | **Analytical and operational workloads:** Data fabric collects and processes data from underlying systems, to supply data products on demand, for offline and online use cases. |
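The "data as a product" idea in the table above can be sketched in a few lines of code. This is a purely hypothetical illustration (none of the function or field names come from a vendor API): a thin data-product layer assembles one integrated business-entity view, so domain teams never deal with the underlying source systems directly.

```python
# Hypothetical sketch of a "data as a product" layer over multiple source
# systems. All names are illustrative only -- not a vendor API.

def fetch_from_crm(customer_id):
    # Stand-in for a real CRM source-system call
    return {"name": "Jane Doe", "segment": "enterprise"}

def fetch_from_billing(customer_id):
    # Stand-in for a real billing source-system call
    return {"open_invoices": 2, "balance": 150.0}

def customer_data_product(customer_id):
    """Assemble one integrated customer view from many source systems,
    hiding the underlying pipelines from the consuming domain team."""
    record = {"customer_id": customer_id}
    record.update(fetch_from_crm(customer_id))
    record.update(fetch_from_billing(customer_id))
    return record

print(customer_data_product("C-1001"))
```

The key design point is that the consumer sees a single business entity, while integration logic (and any domain-specific pipelining expertise) stays behind the data-product function.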
Visual data lineage is a key technique because relational insights are lost when traditional data modeling and integration tools are used.
To explain how data fabric complements and improves big data stores for operational workloads, a comparison between data fabric, data lakes, and databases is useful.
The following chart summarizes the pros/cons of each data store, as it relates to massive-scale, high-volume, operational use cases.
|Data Lake, DWH||
|Data Fabric||
So, while data fabric is a superior solution for high-scale operational workloads, it is also a complementary technology to data lakes and databases for offline analytical workloads. For such workloads, data fabric can:
In enterprise operations, there are scores of use cases that require a high-scale, high-speed data architecture capable of supporting thousands of simultaneous transactions. Examples include:
Therefore, data fabric must include built-in mechanisms for handling:
Data fabric offers many advantages over alternative data management approaches, such as master data management, data hubs, and data lakes, including:
The operational benefits data fabric provides to enterprises include:
There are multiple vendors that deliver an integrated set of capabilities to support the data fabric architecture. The top 5 data fabric vendors appear below:
|IBM Cloud Pak for Data||
It is commonly held that data fabric is built to support big data analytics – specifically trend analysis, predictive analytics, machine learning, and business intelligence – performed by data scientists, in offline mode, to generate business insights.
But data fabric is equally important for operational use cases – such as churn prediction, credit scoring, data privacy compliance, fraud detection, real-time data governance, and 360 customer view – which rely on accurate, complete, and fresh data.
Data teams don’t want to have one data fabric solution for big data analytics, and another one for operational intelligence. They want a single data fabric for both.
The ideal data fabric optimizes the field of vision – and the depth of understanding – for every single business entity – customer, product, order, and so on. It provides enterprises with clean, fresh data for offline big data analytics (“field of vision”), and delivers real-time, actionable data for online operational analytics (“depth of understanding”).
Data fabric supports both offline data analytics, and online operational intelligence.
K2View is the only data fabric capable of responding to entity-centric data queries in real time, at massive scale, and supporting both operational and analytical workloads.
Here are 5 reasons that K2View has become the data fabric of choice among some of the world’s largest enterprises:
K2View Data Fabric unifies the data for every business entity from all underlying source systems into a single micro-database, one for every instance of a business entity.
A customer micro-DB, for example, unifies everything a company knows about a specific customer – including all interactions (emails, phone calls, website portal visits, chats…), transactions (orders, invoices, payments…) and master data – regardless of underlying source systems, their technologies, and data formats. In this case, one micro-DB is managed for every customer.
The micro-DB may be enriched with new fields that are captured or calculated on the fly – such as KPIs, consent information, propensity to churn, and more. And it can be easily defined, using auto-discovery, to extract a suggested data schema from the underlying systems.
A micro-DB represents everything an enterprise knows about a specific business entity.
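The micro-DB concept described above can be modeled conceptually as a small per-entity record store. The sketch below is an assumption-laden illustration of the idea only (class and field names are invented, and this is not the K2View implementation): one object per customer instance, unifying master data, interactions, and transactions, with room for on-the-fly enrichments such as a churn-propensity KPI.

```python
# Conceptual model of a per-entity "micro-database": one small store per
# customer, unifying interactions, transactions, and master data.
# Illustration only -- not the K2View implementation.

class MicroDB:
    def __init__(self, entity_id):
        self.entity_id = entity_id
        self.master_data = {}     # reference data from source systems
        self.interactions = []    # emails, calls, chats, portal visits...
        self.transactions = []    # orders, invoices, payments...
        self.derived = {}         # on-the-fly enrichments: KPIs, consent...

    def enrich(self, field, value):
        """Add a captured or calculated field, e.g. a churn score."""
        self.derived[field] = value

# One micro-DB is managed per customer instance
fabric = {}
db = MicroDB("C-1001")
db.master_data["name"] = "Jane Doe"
db.interactions.append({"type": "chat", "summary": "billing question"})
db.transactions.append({"type": "order", "amount": 99.0})
db.enrich("churn_propensity", 0.12)  # hypothetical calculated KPI
fabric["C-1001"] = db

print(len(db.interactions), db.derived["churn_propensity"])
```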
Each micro-DB is encrypted with its own unique key, so that each entity is uniquely secured. This maintains the highest level of security for data at rest.
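One common way to give every entity its own key, as described above, is to derive per-entity keys from a master key. The stdlib-only sketch below shows the key-management idea (it deliberately stops short of the cipher itself; a real deployment would pair derived keys with a vetted KMS and an authenticated cipher such as AES-GCM):

```python
# Sketch of per-entity key management: each micro-DB gets its own unique key,
# derived from a master key via HMAC-SHA256. Key management illustration only --
# the actual encryption scheme used by any vendor is not shown here.

import hashlib
import hmac
import secrets

master_key = secrets.token_bytes(32)  # would live in a secure key store

def entity_key(entity_id: str) -> bytes:
    """Derive a unique 256-bit key for one business entity's micro-DB."""
    return hmac.new(master_key, entity_id.encode(), hashlib.sha256).digest()

k1 = entity_key("customer:C-1001")
k2 = entity_key("customer:C-1002")
assert k1 != k2      # every micro-DB is uniquely keyed
assert len(k1) == 32
```

Because derivation is deterministic, the fabric never needs to store hundreds of millions of keys; each entity's key is recomputed on demand from the master key.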
K2View Data Fabric can scale to manage hundreds of millions of secured micro-DBs concurrently, and be deployed in a distributed on-premises, cloud, or hybrid architecture.
K2View has developed an operational data fabric that ingests data from any source, in any data delivery style, and then transforms it for delivery, to any target, in milliseconds.
K2View Data Fabric provides a low-code / no-code framework to create and debug microservices. Using a visual, drag-and-drop builder, microservices can be quickly customized and orchestrated to support any operational use case. This approach lends itself to treating data as a product and supporting mesh architectures.
Users or tokens that need access to a microservice are assigned a role, which defines the level of data access they have. Once a microservice is deployed, K2View Data Fabric controls authentication and authorization so that user access is properly restricted.
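The role-based restriction described above can be sketched as a simple field-level filter. This is a hypothetical illustration of role-to-data mapping (role names, fields, and function names are all invented for the example), not the product's actual authorization mechanism:

```python
# Hypothetical sketch of role-based access to a microservice response:
# each role maps to the set of fields its callers may read.

ROLE_FIELDS = {
    "support_agent": {"name", "open_tickets"},
    "data_privacy":  {"name", "consent_status"},
}

CUSTOMER = {
    "name": "Jane Doe",
    "open_tickets": 1,
    "consent_status": "granted",
    "national_id": "redacted-at-source",  # never exposed to these roles
}

def get_customer_view(role: str) -> dict:
    """Return only the fields the caller's role is authorized to see."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown role -> empty view
    return {k: v for k, v in CUSTOMER.items() if k in allowed}

print(get_customer_view("support_agent"))
```

Authentication (who the user or token is) would sit in front of this check; the sketch covers only the authorization step, where the assigned role defines the level of data access.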
The K2View Data Fabric is a central data hub that delivers a real-time, trusted, and holistic view of any business entity to any consuming applications, data lakes, or data warehouses. The use cases of the data fabric are therefore numerous, and span many departments in the enterprise.