K2VIEW ebook

ETL vs ELT vs eETL

A Comparison of Data Integration Methods

Learn how entity-based ETL (eETL) overcomes the limitations of traditional
ETL and ELT, to deliver clean, ready-to-use data to your business analysts.


INTRO

A revolution in data preparation and delivery

Enterprises are increasingly moving from complex and expensive data pipelines, based on ETL or ELT, to smart, agile data pipelines based on business entities – an approach known as entity-based ETL (eETL).

This paper examines ETL versus ELT, and discusses the logical evolution to eETL, along with its benefits.

Chapter 01

Traditional ETL

The traditional approach to data integration, known as extract-transform-load (ETL), has been around since the 1970s. In the ETL process, data is extracted in bulk from various sources, transformed into the desired format, and then loaded into its target data store.

ETL workflow

  • Extract: Raw data is read and collected from various sources, including message queues, databases, flat files, spreadsheets, data streams, and event streams.
  • Transform: Business rules are applied to clean the data, enrich it, anonymize it if necessary, and format it for analysis.
  • Load: The transformed data is loaded into a big data store, such as a data warehouse, data lake, or non-relational database.
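
To make the workflow concrete, here is a minimal Python sketch of a traditional batch ETL job. It is an illustration only: sqlite3 stands in for both the source system and the data warehouse, and the tables, columns, and transformation rules are invented for the example.

```python
import sqlite3

# In-memory source and target (sqlite3 stands in for any source
# system and any data warehouse; schemas are illustrative).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT, country TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "  ada lovelace ", "ada@example.com", "uk"),
     (2, "alan turing", "alan@example.com", "uk")],
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE dwh_customers (id INTEGER, name TEXT, email TEXT, country TEXT)")

# Extract: read raw records in bulk from the source system
rows = source.execute("SELECT id, name, email, country FROM customers").fetchall()

# Transform: apply business rules in-flight -- clean names, mask
# emails (a stand-in for anonymization), normalize country codes
def transform(row):
    id_, name, email, country = row
    user, domain = email.split("@")
    return (id_, name.strip().title(), user[:2] + "***@" + domain, country.upper())

clean = [transform(r) for r in rows]

# Load: only the transformed data reaches the target store
target.executemany("INSERT INTO dwh_customers VALUES (?, ?, ?, ?)", clean)
target.commit()

print(target.execute("SELECT * FROM dwh_customers").fetchall())
```

Note that the transformation logic lives in the pipeline code itself – this is the custom, transformation-specific scripting that makes traditional ETL costly to maintain, as described below.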

 

ETL steps and timeline

Traditional ETL has the following disadvantages:

  • Smaller extractions: The heavy processing that transformations require (e.g., I/O and CPU load on high-volume data) often forces a compromise on smaller data extractions.
  • Complexity: Traditional ETL consists of custom-coded programs and scripts, tailored to the needs of specific transformations. This means that the data engineering team must develop highly specialized, and often non-transferrable, skill sets for managing its code base.
  • Cost and time consumption: Once set up, adjusting the ETL process can be both costly and time consuming, often requiring lengthy re-engineering cycles – by highly skilled data engineers.
  • Rigidity: Traditional ETL limits the agility of data scientists, who are given only the data that the data engineers have already transformed and prepared to work with – as opposed to the entire pool of raw data.
  • Legacy technology: Traditional ETL was designed primarily for periodic batch migrations, performed on premises, and does not support continuous data streaming. It is also extremely limited when it comes to real-time data ingestion, processing, or integration.

Chapter 02

ELT Pros and Cons

ELT stands for extract-load-transform. Unlike traditional ETL, ELT extracts and loads the data into the target first, and then runs transformations there, often using proprietary scripting. The target is most commonly a data lake or big data store, such as Teradata, Spark, or Hadoop.

ELT workflow

  • Extract: Raw data is read and collected from various sources, including message queues, databases, flat files, spreadsheets, data streams, and event streams.
  • Load: The extracted data is loaded into a data store – whether a data lake, a data warehouse, or a non-relational database.
  • Transform: Data transformations are performed in the data lake or warehouse, primarily using scripts.
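
For contrast, here is the same toy scenario restructured as ELT; again, sqlite3 stands in for a cloud data warehouse, and all names are illustrative. Note how the raw data lands in the target untouched, and the transformation runs afterwards as SQL inside the store.

```python
import sqlite3

# sqlite3 stands in for a cloud data warehouse; names are illustrative.
warehouse = sqlite3.connect(":memory:")

# Extract + Load: land the raw data in the target first, with little
# or no in-flight processing -- this is what makes ELT loading fast.
warehouse.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT, email TEXT, country TEXT)")
warehouse.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?, ?)",
    [(1, "  ada lovelace ", "ADA@example.com", "uk"),
     (2, "alan turing", None, "uk")],
)

# Transform: later, inside the warehouse itself, typically as SQL or
# scripts run on demand -- the raw table remains available for analysis.
warehouse.executescript("""
    CREATE TABLE customers_clean AS
    SELECT id,
           TRIM(name)     AS name,
           LOWER(email)   AS email,
           UPPER(country) AS country
    FROM raw_customers
    WHERE email IS NOT NULL;
""")

print(warehouse.execute("SELECT * FROM customers_clean").fetchall())
```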

 

ELT steps and timeline

Pros

ELT has several advantages over ETL, such as:

  • Fast extraction and loading: Data is delivered into the target system immediately, with very little processing in-flight.
  • Lower upfront development costs: ELT tools are good at moving source data into target systems with minimal user intervention, since user-defined transformations are not required.
  • Low maintenance: ELT was designed for use in the cloud, so things like schema changes can be fully automated.
  • Greater flexibility: Data analysts no longer have to determine what insights and data types they need in advance, but can perform transformations on the data as needed in the data warehouse or lake.
  • Greater trust: All the data, in its raw format, is available for exploration and analysis. No data is lost or mis-transformed along the way.

While the ability to run transformations inside the data store addresses ELT's volume and scale limitations, it does not make the transformations themselves any cheaper or faster. Data scientists, who are high-value company resources, must match, clean, and transform the data – work that accounts for 40% of their time – before even getting to any analytics.


Cons

ELT is not without its challenges:

  • Costly and time consuming: Data scientists need to match, clean, and transform the data before applying analytics.
  • Compliance risks: ELT tools don’t have built-in support for data anonymization (while ensuring referential integrity) and data governance, thereby introducing data privacy risks.
  • Data migration costs and risks: The movement of massive amounts of data, from on-premise to cloud environments, consumes high network bandwidth.
  • Big store requirement: ELT tools require a modern data staging technology, such as a data lake, where the data is loaded. Data teams then transform the data into a data warehouse where it can be sliced and diced for reporting and analysis.
  • Limited connectivity: ELT tools lack connectors to legacy and on-premises systems, although this is becoming less of an issue as ELT products mature and legacy systems are retired.

Chapter 03

ETL vs ELT

The following table summarizes the main differences between ETL and ELT:

Process
  • ETL: Data is extracted from sources, transformed, and then loaded into a DWH/lake
  • ELT: Raw data is extracted and loaded directly into a DWH/lake, where transformation takes place

Primary Use
  • ETL: Smaller amounts of targeted data
  • ELT: Larger amounts of data

Time to insights
  • ETL: Time consuming – refreshed data may take days to receive, and onboarding new data may take weeks or months
  • ELT: Time consuming – data scientists must match, clean, and transform the data before applying analytics

Compliance
  • ETL: Confidential and sensitive information is removed before the data reaches the warehouse
  • ELT: With raw data loaded directly into the warehouse, there are greater chances of accidental data exposure and breaches

Technology
  • ETL: Mature technology, in use for over 20 years and supported by many tools
  • ELT: Comparatively new field, supported by fewer professionals and tools

Cost
  • ETL: Can be costly due to high-scale processing time
  • ELT: Low-cost entry and high scalability, thanks to SaaS-based cloud services
 

 

Learn more about getting the best of both worlds (ETL and ELT) with entity-based ETL pipelining.


Chapter 04

A New Approach: Entity-Based ETL (eETL)

This innovative approach addresses the limitations of both traditional ETL and ELT, delivering trusted, clean, and complete data that can be used immediately to generate insights.

eETL steps and timeline


At the foundation of the eETL approach is a logical abstraction layer that captures all the attributes of any given business entity (such as a customer, product, or order), from all source systems. Accordingly, data is collected, processed, and delivered per business entity (instance) as a complete, clean, and connected data asset.

In the extract phase, the data for a particular entity is collected from all source systems. In the transform phase, the data is cleansed, enriched, anonymized, and transformed – as an entity – according to predefined rules. In the load phase, the entity data is safely delivered to any big data store.
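
The sketch below illustrates this entity-based flow in miniature. It is not K2View's implementation: the source systems, entity fields, and function names are all invented for the example. The point is the unit of work – each customer instance is extracted from every source, transformed as one connected asset, and loaded whole.

```python
from dataclasses import dataclass, field

# Invented stand-ins for three source systems, keyed by customer ID
CRM     = {1: {"name": "ada lovelace", "email": "ada@example.com"}}
ORDERS  = {1: [{"order_id": 101, "total": 42.0}]}
BILLING = {1: {"balance": 7.5}}

@dataclass
class CustomerEntity:
    """All attributes of one customer, collected across source systems."""
    customer_id: int
    name: str
    email: str
    orders: list = field(default_factory=list)
    balance: float = 0.0

def extract(customer_id):
    # Extract: gather the entity's attributes from every source system
    crm = CRM[customer_id]
    return CustomerEntity(
        customer_id=customer_id,
        name=crm["name"],
        email=crm["email"],
        orders=ORDERS.get(customer_id, []),
        balance=BILLING.get(customer_id, {}).get("balance", 0.0),
    )

def transform(entity):
    # Transform: cleanse and anonymize the entity as a single unit,
    # so its fields stay consistent with one another
    user, domain = entity.email.split("@")
    entity.email = user[:2] + "***@" + domain
    entity.name = entity.name.title()
    return entity

def load(target, entity):
    # Load: deliver the complete, connected entity to the target store
    target.append(entity)

target_store = []
for customer_id in CRM:  # entity instances are independent,
    load(target_store, transform(extract(customer_id)))  # so they can run in parallel
print(target_store)
```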

The best of both worlds

eETL represents the best of both (ETL and ELT) worlds, because the data in the target store is:

  • Extracted, transformed, and loaded – from all sources, to any data store, at any scale – via any data integration method: messaging, CDC, streaming, virtualization, JDBC, and APIs
  • Always clean, fresh, and analytics-ready
  • Built, packaged, and reused by data engineers, to be invoked by business analysts and data scientists
  • Continuously enriched and connected, supporting complex queries and avoiding the need for heavy table-join processing

Chapter 05

ETL vs ELT vs eETL

The table below summarizes the ETL vs ELT vs eETL approaches to big data preparation:

Process
  • ETL: Data is extracted from sources, transformed, and then loaded into a DWH/lake
  • ELT: Raw data is extracted and loaded directly into a DWH/lake, where transformation takes place
  • eETL: ETL is run in parallel, per business entity, within a data fabric; data is clean and complete by design

Primary Use
  • ETL: Smaller amounts of targeted data
  • ELT: Larger amounts of data
  • eETL: Supports preparation of large amounts of data, with low impact on sources and destinations; supports packaging and reuse of data preparation flows, essentially operationalizing data preparation

Flexibility
  • ETL: Transforms and loads only the data you define as necessary
  • ELT: Raw data is made readily available, and transformations are built as needed
  • eETL: Data engineers define the data preparation flow; data scientists decide on the scope, timing, and destination of the data

Time to insights
  • ETL: Time consuming
  • ELT: Time consuming
  • eETL: Data preparation is performed instantly and continuously, in real time

Compliance
  • ETL: Confidential and sensitive information is removed before the data reaches the warehouse
  • ELT: With raw data loaded directly into the warehouse, there are greater chances of accidental data exposure and breaches
  • eETL: Data is anonymized during preparation, in full compliance with privacy regulations (GDPR, CCPA)

Technology
  • ETL: Mature technology, in use for over 20 years and supported by many tools
  • ELT: Comparatively new field, supported by fewer professionals and tools
  • eETL: Proven technology, used by large enterprises

Cost
  • ETL: Can be costly due to high-scale processing time
  • ELT: Low-cost entry and high scalability, thanks to SaaS-based cloud services
  • eETL: Low compute and storage costs, since transformation is done per digital entity; eliminating the staging phase saves processing, storage, time, and effort

 

The best method for you 

As explained in the ETL vs ELT comparison above, each method has its own advantages and disadvantages. Applying an entity-based approach to data pipelining overcomes the limitations of both, delivering clean, ready-to-use data at high scale – without compromising on flexibility, data security, or privacy compliance.

Whitepaper

ETL vs ELT vs eETL pipelining

Download the whitepaper to compare ETL, ELT, and entity-based ETL pipelining of data from any source to any lake or DWH.

  • Overcomes the disadvantages of ETL and ELT

  • Continuously delivers complete, clean, and timely data – at any scale

  • Automates and productizes the data preparation process

  • Supports both analytical and operational workloads

Get the whitepaper