K2VIEW DATA FABRIC

What is a Data Fabric?
The Complete Guide


This paper addresses the what, why, how, and who of data fabric, including data fabric architecture, challenges, benefits, core capabilities, vendors, and more. 


INTRO

Data Fabric – a “must-have” for data-centric enterprises

Over the past few years, the term “data fabric” has become synonymous with enterprise data integration and management. Analyst firm Gartner lists “data fabric” as one of the “Top 10 Data and Analytics Technology Trends for 2021” and predicts that by 2024, 25% of data management vendors will provide a complete framework for data fabric – up from 5% today.

This paper addresses the what, why, how, and who of data fabric, by citing its definition, purpose, architecture, challenges, best practices, benefits, vendors, as well as a data fabric capability checklist.

Chapter 01

Data fabric overview

Data fabric democratizes data access across the enterprise, at scale. It is a single, unified architecture – with an integrated set of technologies and services – designed to deliver integrated and enriched data at the right time, via the right method, to the right data consumer, in support of both operational and analytical workloads.

Data fabric combines key data management technologies – such as data catalog, data governance, data integration, data pipelining, and data orchestration.


Gartner: A data fabric stitches together integrated data from
many different sources and delivers it to various data consumers

 

Chapter 02

Why data fabric

Data fabric serves a broad range of business, technical, and organizational alignment drivers.

Business drivers

  • Reducing time to insights and making more informed decisions, by pipelining data into data lakes and warehouses reliably and quickly.
  • Gaining a real-time, 360-degree view of any business entity – such as a customer, claim, order, device, or retail outlet – to achieve micro-segmentation, reduce customer churn, alert on operational risks, or deliver personalized customer service.
  • Lowering the total cost of ownership to operate, scale, maintain, and change legacy systems by incrementally and quickly modernizing these systems.

Data management drivers

  • Data preparation automation saves data scientists, data engineers, and other IT resources from tedious, repetitive data transformation, cleansing, and enrichment tasks.
  • Gaining access to enterprise data in any data delivery method – including bulk data movement (ETL), data virtualization, data streaming, change data capture, and APIs.
  • A data fabric platform integrates and augments a company’s data management tools currently in use, and enables the retirement of others, for increased cost effectiveness.

Organizational drivers

Chapter 03

Data fabric architecture



Gartner: An ideal, complete data fabric design with its many components.

A well-designed data fabric architecture is modular, and supports massive scale with distributed multi-cloud, on-premise, and hybrid deployment.

As the diagram above illustrates, as data is provisioned from sources to consumers, it is cataloged, enriched to provide insights and recommendations, prepared, delivered, orchestrated, and engineered.

Data sources range from siloed legacy systems to the most modern cloud environments.

Data consumers of the data fabric include data scientists and data analysts (working with data lakes), marketing analysts (involved with customer segmentation), sales and marketing teams, data privacy specialists, cloud architects, and more.

Chapter 04

Data fabric for mesh architecture

Data mesh architecture addresses the four key issues in data management:

  • Data scattered among scores, and at times hundreds, of legacy and cloud systems, making it difficult to achieve a single source of truth
  • The speed and volume of data that data-centric enterprises have to deal with
  • Data hard to get to, when access often requires data engineering
  • Lack of communication between business analysts, operational data consumers, data engineers, and data scientists.


Data fabric is well suited to mesh design because it builds an integrated layer of connected data across a broad range of data sources, for an instant, holistic view of the business, for both analytical and operational workloads.

Data fabric establishes the semantic definition of the different data products, the data ingestion modes, and the necessary governance policies that secure and protect the data.

Also, the various business domains coordinate the deployment of additional data fabric nodes, giving them control of data pipelines and services.


Data mesh architecture implemented with data fabric

A data fabric that can manage, prepare, and deliver data in real time, creates the ideal data mesh core. Of course, data mesh architecture has its implementation challenges, but these are easily handled by data fabric:

Data mesh implementation challenges, and how data fabric handles them:

  • Requirement for data integration expertise: Data integration across many different enterprise source systems often requires domain-specific expertise in data pipelining.
    How data fabric handles it – Data as a product: When a data product is a business entity managed in a virtual data layer, there’s no need for domains to deal with underlying source systems.

  • Federation vs independence: Achieving the right balance between reliance on central data teams and domain independence isn’t simple.
    How data fabric handles it – Enterprise-wide collaboration: Domain-specific teams, in coordination with centralized data teams, build APIs and pipelines for their data consumers, control and govern access rights, and monitor use.

  • Real-time and batch data delivery: Data products must be provisioned to both offline and online data consumers, securely and efficiently, on a single platform.
    How data fabric handles it – Analytical and operational workloads: Data fabric collects and processes data from underlying systems, to supply data products on demand, for offline and online use cases.

 

Chapter 05

Data fabric core capabilities


Visual data lineage is a key technique because relational insights are lost
when traditional data modeling and integration tools are used.

Data fabric supports the following key capabilities integrated into a single platform:

  1. Data catalog
    To classify and inventory data assets, and represent information supply chains visually
  2. Data engineering
    To build reliable and robust data pipelines for both operational and analytical use cases
  3. Data governance
    To assure quality, comply with privacy regulations, and make data available – safely and at scale
  4. Data preparation and orchestration
    To define the data flows from source to target, including the sequence of steps for data cleansing, transformation, masking, enrichment, and validation
  5. Data integration and delivery
    To retrieve data from any source and deliver it to any target, in any method: ETL (bulk), messaging, CDC, virtualization, and APIs
  6. Data persistence layer
    To persist data dynamically in a broad range of relational and non-relational models
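The preparation-and-orchestration capability (item 4) is, in essence, an ordered sequence of steps applied to each record on its way from source to target. The sketch below illustrates that sequencing only; the step names and record layout are assumptions for illustration, not K2View APIs:

```python
# Illustrative source-to-target preparation flow:
# cleanse -> transform -> mask -> enrich -> validate.
# All step names and the record layout are hypothetical.

def cleanse(rec):
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}

def transform(rec):
    rec["country"] = rec["country"].upper()
    return rec

def mask(rec):
    rec["ssn"] = "***-**-" + rec["ssn"][-4:]  # keep last 4 digits only
    return rec

def enrich(rec):
    rec["segment"] = "enterprise" if rec["employees"] > 1000 else "smb"
    return rec

def validate(rec):
    assert rec["country"], "country is required"
    return rec

PIPELINE = [cleanse, transform, mask, enrich, validate]

def run_pipeline(record):
    for step in PIPELINE:
        record = step(record)
    return record

row = {"country": " us ", "ssn": "123-45-6789", "employees": 4200}
print(run_pipeline(row))
# country -> "US", ssn -> "***-**-6789", segment -> "enterprise"
```

The point of modeling each step as a small, composable function is that the orchestration layer can reorder, add, or remove steps per data flow without touching the others.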

Data fabric should also address the following key non-functional capabilities:


  1. Data scale, volume, and performance
    • Dynamically scale both up and down, seamlessly, no matter how large the data volume.
    • Support both operational and analytical workloads, at enterprise scale.

  2. Accessibility
    • Support all data access modes, data sources, and data types, and integrate master and transactional data, at rest, or in motion.
    • Ingest and unify data from on-premise and on-cloud systems, in any format – structured or unstructured.
    • The data fabric logical access layer needs to allow for data consumption, regardless of where or how the data is stored or distributed – so no in-depth knowledge of underlying data sources is necessary.

  3. Distribution
    • Data fabric should be deployable in multi-cloud, on-premise, or hybrid environments.
    • To maintain transactional integrity and data governance capabilities, data fabric needs to support a smart data virtualization strategy.

  4. Security
    • Where data is persisted, it must be encrypted and masked to meet data privacy regulations.
    • Data Fabric should be able to deliver user credentials to the source systems, so that access rights are properly checked and authorized.

Chapter 06

Data fabric vs data lakes vs databases for operational workloads

To explain how data fabric complements and improves big data stores for operational workloads, a comparison between data fabric, data lakes, and databases is useful.

The following chart summarizes the pros/cons of each data store, as it relates to massive-scale, high-volume, operational use cases.

Data Lake, DWH
  Pros:
  • Complex data query support, across structured and unstructured data
  Cons:
  • Not optimized for single-entity queries, resulting in slow response times
  • Live data is not supported, so continually updating data is either unreliable, or delivered at unacceptable response times

Relational Database
  Pros:
  • SQL support, wide adoption, and ease of use
  Cons:
  • Non-linear scalability, requiring costly hardware (hundreds of nodes) to perform complex queries, in near real time, on terabytes of data
  • High concurrency results in problematic response times

NoSQL Database
  Pros:
  • Distributed datastore architecture, supporting linear scalability
  Cons:
  • SQL not supported, requiring specialized skills
  • To support data querying, indexes need to be predefined, or complex application logic needs to be built in, hindering time-to-market and agility

Data Fabric
  Pros:
  • Full SQL support
  • Distributed datastore architecture, supporting linear scalability
  • High-concurrency support, with real-time performance for operational workloads
  • Complex query support for single business entities
  • Support for all integration methods
  • High-scale data preparation and pipelining into data lakes and warehouses for analytical workloads
  • Dynamic data governance

 

So, while data fabric is a superior solution for high-scale operational workloads, it is also complementary to data lakes and databases for offline analytical workloads. For such workloads, data fabric can:

  1. Pipeline fresh, trusted data INTO them, for offline analytics purposes.
  2. Receive business insights FROM them, to embed into real-time operational use cases.

Chapter 07

Data fabric use cases

In enterprise operations, there are scores of use cases that require a high-scale, high-speed data architecture capable of supporting thousands of simultaneous transactions. Examples include:

  • Delivering a 360 customer view
    Delivering a single view of the customer to a self-service IVR, customer service agents (CRM), customer self-service portal (web or mobile), chat service bots, and field-service technicians

  • Complying with data privacy laws
    With a flexible workflow and data automation solution that orchestrates compliance across people, systems, and data – designed to address current and future regulations

  • Pipelining enterprise data into data lakes and warehouses
    Enabling data engineers to prepare and deliver fresh, trusted data – from all sources, to all targets – quickly and at scale

  • Provisioning test data on demand
    Creating a test data warehouse, and delivering anonymized test data to testers and CI/CD pipelines – automatically, and in minutes – with complete data integrity

  • Modernizing legacy systems
    Safely migrating data from legacy systems into data fabric, and then using the fabric as the database of record for newly developed applications

  • Securing credit card transactions
    Protecting sensitive cardholder information by encrypting and tokenizing the original data to avoid data breaches

  • Predicting churn, detecting customer fraud, credit scoring, and more
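The credit-card and test-data use cases above both hinge on tokenization: replacing a sensitive value with a surrogate that downstream systems can safely use. The toy vault below shows the idea only; a production fabric would use a hardened token vault and format-preserving encryption, and all names here are hypothetical:

```python
# Minimal tokenization sketch for the credit-card use case above.
# The in-memory "vault" is purely illustrative.
import secrets

_vault = {}    # token -> original value (would be a secured store)
_reverse = {}  # original value -> token, for deterministic reuse

def tokenize(pan: str) -> str:
    """Replace a card number with a token, keeping the last 4 digits."""
    if pan in _reverse:
        return _reverse[pan]
    token = "tok_" + secrets.token_hex(8) + "_" + pan[-4:]
    _vault[token] = pan
    _reverse[pan] = token
    return token

def detokenize(token: str) -> str:
    return _vault[token]

t = tokenize("4111111111111111")
assert t == tokenize("4111111111111111")   # same PAN always maps to same token
assert detokenize(t) == "4111111111111111"
assert t.endswith("1111")                  # last 4 digits preserved for display
```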

Many of the operational use cases require the data fabric to respond to complex queries in a split second.

Therefore, data fabric must include built-in mechanisms for handling:

  • Live data ingestion
    Continually updated from operational systems (with millions, to billions, of updates per day)

  • Connectivity to disparate systems
    With terabytes of data spread across dozens of massive databases / tables, often in different technologies

  • In-flight data transformation, data cleansing, and data enrichment
    To deliver meaningful insights and influence business outcomes in real time

  • A specific instance of an entity
    Such as, retrieving complete data for a specific customer, location, device, etc.

  • High concurrency
    To the tune of thousands of requests per second

Chapter 08

Data fabric advantages

Data fabric offers many advantages over alternative data management approaches, such as master data management, data hubs, and data lakes, including:

  • Enhanced data management
    Allowing data to be retrieved, validated, and enriched automatically – without any transformation scripts, or third-party tools
  • Expanded data services
    Using innovative engines to manage and synchronize data with full support for SQL, and an embedded web services layer

  • High consistency, durability, and availability
    Meeting enterprise standards, with a distributed database layer and processing engine

  • Excellent performance
    Relying on an architecture capable of running every query on a small amount of data, and in-memory processing

  • Tight security
    Eliminating the possibility of mass data breaches, due to a sophisticated, multi-key encryption engine

Chapter 09

Data fabric benefits

The operational benefits data fabric provides to enterprises include:

  • Simplified data orchestration
    Integrating operators for external databases, business logic, masking, parsing, and streaming

  • Automated test data management
    Generating data from production systems, and then providing high-quality test data to testing teams

  • Rapid data privacy compliance
    Configuring, managing, and auditing Data Subject Access Requests associated with data privacy regulations such as GDPR, CCPA, LGPD, etc.

  • Comprehensive data administration
    Configuring, monitoring, and administrating data, with admin management tools, an intuitive visualization studio, and web admin tools

  • Optimized cost of ownership
    Relying on in-memory performance on commodity hardware, complete linear scalability, and risk-free integration

Chapter 10

Data fabric vendors

There are multiple vendors that deliver an integrated set of capabilities to support the data fabric architecture. The top 5 data fabric vendors appear below:

K2View
  Strengths:
  • Single, integrated platform, combining all data fabric capabilities
  • Data is uniquely organized by business entity, for real-time data pipelining, and “x360” workloads at scale
  • Support for massive data workloads requiring real-time data integration and movement
  • Full support for both analytical and operational workloads
  • Quick deployment (typically in a few weeks) and easy adaptation, supporting agile and CI/CD practices
  • Low total cost of ownership (TCO)
  Concerns:
  • Focus on large enterprises, with relatively few mid-sized customers
  • High concentration of deployments in the telco, healthcare, and financial services markets
  • Few system integration partners outside North America and Europe

Denodo
  Strengths:
  • Focus and strength in data virtualization
  • Catalog serves as a single entry point for enforcing security and governance
  • Broad go-to-market partnerships
  Concerns:
  • Optimization for analytics use cases
  • Complexity in managing and operating the data fabric
  • Not applicable for high-volume operational workloads
  • Processes and effort required to ensure distributed query performance on the platform

Talend
  Strengths:
  • Focus and strength in data integration across multi-cloud and hybrid ecosystems
  • Wide-ranging capabilities for data engineering
  • Broad set of connectors for a large variety of data sources
  Concerns:
  • Not applicable for high-volume operational workloads; best suited for analytics use cases
  • Support required for complex data orchestration and data pipeline operationalization
  • Limited data virtualization capabilities

Informatica
  Strengths:
  • Use of AI and ML for augmented data integration and data quality support
  • Strengths in data integration for optimized analytics, data migration, and MDM
  • Ability to scale in support of complex data integration scenarios
  Concerns:
  • Complex and costly deployment and adaptation
  • Data virtualization support required
  • Limited real-time data pipelining capabilities, making it less suitable for operational workloads, where real-time data integration is required
  • Multiple disjointed tools acquired over time and not yet integrated into a single platform

IBM Cloud Pak for Data
  Strengths:
  • Strong product scalability and performance
  • Diverse data integration delivery styles and architectures
  • Data virtualization and metadata management
  • Improved integration capabilities, repackaged as Cloud Pak for Data
  Concerns:
  • Data fabric is composed of multiple standalone products, creating uncertainty around the platform’s structure, cost, and deployment
  • Complex architecture, resulting in difficult and costly upgrades
  • Self-service and cloud-based data integration capabilities required

Chapter 11

Data fabric for analytics and operations

It is commonly held that data fabric is built to support big data analytics – specifically trend analysis, predictive analytics, machine learning, and business intelligence – performed by data scientists, in offline mode, to generate business insights.

But data fabric is equally important for operational use cases – such as churn prediction, credit scoring, data privacy compliance, fraud detection, real-time data governance, and 360 customer view – which rely on accurate, complete, and fresh data.

Data teams don’t want to have one data fabric solution for big data analytics, and another one for operational intelligence. They want a single data fabric for both.

The ideal data fabric optimizes the field of vision – and the depth of understanding – for every single business entity – customer, product, order, and so on. It provides enterprises with clean, fresh data for offline big data analytics (“field of vision”), and delivers real-time, actionable data for online operational analytics (“depth of understanding”).

Data fabric supports both offline data analytics, and online operational intelligence.

Here’s how:

  1. The data fabric continually provisions high-quality data, based on a 360 view of business entities – such as a certain segment of customers, a line of company products, or all retail outlets in a specific geography – to a data lake or DWH. Using this data, data scientists create and refine Machine Learning (ML) models, while data analysts use Business Intelligence (BI) to analyze trends, segment customers, and perform Root-Cause Analysis (RCA).
  2. The refined ML model is deployed into the data fabric, to be executed in real time for an individual entity (customer, product, location, etc.) – thus “operationalizing” the machine learning algorithm.
  3. Data fabric executes the ML model on demand, in real time, feeding it the individual entity’s complete and current data.
  4. The ML output is instantly returned to the requesting application, and persisted in the data fabric, as part of the entity, for future analysis. Data fabric can also invoke real-time recommendation engines to deliver next-best-actions.
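Steps 2–4 above can be pictured as a small scoring loop: a deployed model is executed on demand against one entity's complete data, and the output is written back onto that entity. The churn model below is a stand-in stub, and all names are assumptions for illustration:

```python
# Sketch of steps 2-4: a trained model deployed into the fabric is
# scored on demand for one entity, and the result persisted back
# onto that entity for future analysis.

def churn_model(features: dict) -> float:
    """Stand-in for a refined ML model deployed into the fabric."""
    score = 0.1
    if features["support_calls_30d"] > 3:
        score += 0.4
    if features["months_since_last_order"] > 6:
        score += 0.3
    return round(score, 2)

entity_store = {
    "cust-1001": {"support_calls_30d": 5, "months_since_last_order": 8},
}

def score_entity(entity_id: str) -> float:
    entity = entity_store[entity_id]   # complete, current entity data
    result = churn_model(entity)       # executed on demand, in real time
    entity["churn_score"] = result     # persisted as part of the entity
    return result

print(score_entity("cust-1001"))  # 0.8
```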

Chapter 12

Why K2View

K2View is the only data fabric capable of responding to entity-centric data queries in real time, at massive scale, and supporting both operational and analytical workloads.

Here are 5 reasons that K2View has become the data fabric of choice among some of the world’s largest enterprises:

A micro-database for every business entity delivers unmatched performance, ease of access, completeness of data, and a common language with the business

K2View Data Fabric unifies the data for every business entity from all underlying source systems into a single micro-database, one for every instance of a business entity.

A customer micro-DB, for example, unifies everything a company knows about a specific customer – including all interactions (emails, phone calls, website portal visits, chats…), transactions (orders, invoices, payments…) and master data – regardless of underlying source systems, their technologies, and data formats. In this case, one micro-DB is managed for every customer.

The micro-DB may be enriched with new fields that are captured or calculated on the fly – such as KPIs, consent information, propensity to churn, and more. And it can be easily defined, using auto-discovery, to extract a suggested data schema from the underlying systems.


A micro-DB represents everything an enterprise knows about a specific business entity.

To maximize performance:

  • Data sync rules define the frequency and events at which each data element in the micro-DB is updated from the source systems.
  • Data virtualization rules define which data will be persisted in the micro-DB, and which will only be cached in memory.
  • Each micro-DB is compressed by approximately 90%, also resulting in lower data transmission costs.
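A data sync rule of the kind described above boils down to a freshness check: refresh a micro-DB element from its source only when its configured window has elapsed. The sketch below illustrates that decision; the rule names, intervals, and function are assumptions, not K2View syntax:

```python
# Illustrative sync-rule check: is this micro-DB element stale?
import time

SYNC_RULES = {
    "customer.master":  {"max_age_sec": 86400},  # refresh daily
    "customer.balance": {"max_age_sec": 60},     # near real time
}

def needs_sync(element, last_synced, now=None):
    """Return True when the element's freshness window has elapsed."""
    now = time.time() if now is None else now
    return now - last_synced > SYNC_RULES[element]["max_age_sec"]

now = 1_000_000.0
assert needs_sync("customer.balance", last_synced=now - 120, now=now)      # stale
assert not needs_sync("customer.master", last_synced=now - 3600, now=now)  # fresh
```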

Each micro-DB is encrypted with its own unique key, so that each entity is uniquely secured. This maintains the highest level of security for data at rest.
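The benefit of one key per entity is that compromising a single key exposes only that entity's micro-DB. A common way to manage such a scheme is to derive each entity key from a master key, so only the master needs protecting; the sketch below uses HMAC-SHA256 as a simple key-derivation function, which is an assumption for illustration and not K2View's actual encryption scheme:

```python
# Per-entity key derivation: each micro-DB gets its own key, derived
# from a master key and the entity ID. HMAC-SHA256 serves as a simple
# KDF here; the master key would live in a KMS/HSM in practice.
import hmac, hashlib

MASTER_KEY = b"demo-master-key"  # illustrative only

def entity_key(entity_id: str) -> bytes:
    return hmac.new(MASTER_KEY, entity_id.encode(), hashlib.sha256).digest()

k1 = entity_key("customer-1001")
k2 = entity_key("customer-1002")
assert k1 != k2                            # every entity gets a distinct key
assert k1 == entity_key("customer-1001")   # derivation is deterministic
assert len(k1) == 32                       # 256-bit key material
```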

K2View Data Fabric can scale to manage hundreds of millions of secured micro-DBs concurrently, and be deployed in a distributed on-premise, on-cloud, or hybrid architecture.

Data is integrated and delivered, from any source, to any target, in any style

K2View has developed an operational data fabric that ingests data from any source, in any data delivery style, and then transforms it for delivery, to any target, in milliseconds.

Microservices deliver a single view of any business entity to consuming applications

K2View Data Fabric provides a low-code / no-code framework to create and debug microservices. Using a visual, drag-and-drop builder, microservices can be quickly customized and orchestrated to support any operational use case. This approach lends itself to treating data as a product and supporting mesh architectures.

Users or tokens that need access to a microservice are assigned a role, which defines the level of data access they have. Once a microservice is deployed, K2View Data Fabric controls authentication and authorization so that user access is properly restricted.
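The role-based model described above reduces to a simple check at request time: does the caller's role grant access to the requested resource? The sketch below illustrates that check; the role names and resources are hypothetical, not K2View's configuration:

```python
# Toy role check for the microservice access model described above.

ROLES = {
    "support_agent": {"customer.profile", "customer.orders"},
    "auditor":       {"customer.profile"},
}

def authorize(role: str, resource: str) -> bool:
    """Return True only if the caller's role grants the resource."""
    return resource in ROLES.get(role, set())

assert authorize("support_agent", "customer.orders")
assert not authorize("auditor", "customer.orders")     # role lacks the grant
assert not authorize("unknown_token", "customer.profile")  # unassigned role
```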

One platform, many use cases

The K2View Data Fabric is a central data hub that delivers a real-time, trusted, and holistic view of any business entity to any consuming application, data lake, or data warehouse. The use cases of the data fabric are therefore numerous, and span many departments in the enterprise.


In summary, the platform delivers:

  • Modular, open, and scalable architecture
    Data integration, transformation, enrichment, preparation, and delivery – integrated in a single, extensible platform

  • Split second, end-to-end, response times
    Enterprise data fabric, built to support real-time operations, with bi-directional data movement between sources and targets

  • Data management for operational and analytical workloads
    Integrated, trusted data is delivered in a split second into consuming applications or pipelined into data lakes and data warehouses for analytical purposes.
