K2VIEW WHITEPAPER

ETL vs ELT vs eETL

Comparing ETL vs ELT and introducing the new eETL approach to data pipelining


Intro

Entity-based data pipelining is revolutionizing data prep and delivery

Enterprises are increasingly moving from complex and expensive data pipelines based on ETL or ELT to smart, agile data pipelines based on business entities – "entity-based ETL" (eETL).

This paper examines ETL versus ELT, and discusses the logical evolution to eETL, along with its benefits.

Chapter 01

Traditional ETL

The traditional approach to data integration, known as extract-transform-load (ETL), has been around since the 1970s. In the ETL process, data is extracted in bulk from various sources, transformed into a desired format, then loaded for storage into its target destination.

ETL workflow

  • Extract: Raw data is read and collected from various sources, including message queues, databases, flat files, spreadsheets, data streams, and event streams.
  • Transform: Business rules are applied to clean the data, enrich it, anonymize it if necessary, and format it for analysis.
  • Load: The transformed data is loaded into a big data store, such as a data warehouse, data lake, or non-relational database.

[Figure: ETL steps and timeline]
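To make the workflow concrete, here is a minimal Python sketch of the ETL pattern. The source rows, business rules, and in-memory "warehouse" are hypothetical stand-ins for the real systems described above, not any particular product's implementation:

```python
# Hypothetical raw records, standing in for a bulk read from a source system
source_rows = [
    {"customer_id": "42", "email": "ada@example.com", "country": "uk"},
    {"customer_id": "",   "email": "bob@example.com", "country": "us"},  # incomplete row
]

def extract():
    # Extract: collect raw data (in reality: a DB query, flat file, or queue drain)
    return [dict(row) for row in source_rows]

def transform(records):
    # Transform: apply business rules BEFORE the target ever sees the data
    cleaned = []
    for rec in records:
        if not rec["customer_id"]:               # cleansing rule: drop incomplete rows
            continue
        rec["email"] = "***MASKED***"            # anonymization rule
        rec["country"] = rec["country"].upper()  # formatting rule
        cleaned.append(rec)
    return cleaned

warehouse = []  # stand-in for the target DWH / data lake

def load(records):
    # Load: deliver the prepared data to the target store
    warehouse.extend(records)

load(transform(extract()))  # E -> T -> L
print(warehouse)            # only clean, masked, formatted rows reach the target
```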

Traditional ETL has the following disadvantages:

  • Smaller extractions: The heavy processing that data transformations require (e.g., I/O and CPU processing of high-volume data) often forces a compromise on smaller data extractions.
  • Complexity: Traditional ETL consists of custom-coded programs and scripts, built around the specific needs of specific transformations. This means the data engineering team must develop highly specialized, and often non-transferrable, skill sets to manage its code base.
  • Cost and time consumption: Once set up, adjusting the ETL process is both costly and time consuming, often requiring lengthy re-engineering cycles by highly skilled data engineers.
  • Rigidity: Traditional ETL limits the agility of data scientists, who get to work only with data that has already been transformed and prepared by data engineers, as opposed to the entire pool of raw data.
  • Legacy technology: Traditional ETL was designed primarily for periodic, on-premises batch migrations, and does not support continuous data streaming. It is also extremely limited when it comes to real-time data processing, ingestion, or integration.

Chapter 02

ELT Pros and Cons

ELT stands for extract-load-transform. Unlike traditional ETL, ELT extracts the data and loads it into the target first; transformations are then run inside the target, often using proprietary scripting. The target is most commonly a data lake or big data store, such as Teradata, Spark, or Hadoop.

ELT workflow

  • Extract: Raw data is read and collected from various sources, including message queues, databases, flat files, spreadsheets, data streams, and event streams.
  • Load: The extracted data is loaded into a data store – a data lake, data warehouse, or non-relational database.
  • Transform: Data transformations are performed in the data lake or warehouse, primarily using scripts.

[Figure: ELT steps and timeline]
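For contrast, here is a comparable sketch of the ELT pattern, using an in-memory SQLite database as a stand-in for a cloud DWH or lake (table names and rules are illustrative only). Note that the raw rows land in the target first, and the cleansing and masking run afterwards, inside the store, as SQL:

```python
import sqlite3

# Stand-in "warehouse": an in-memory SQLite DB
# (a real ELT target would be a data lake or cloud DWH)
dwh = sqlite3.connect(":memory:")

# Extract + Load: raw rows land in the target untouched, with no in-flight processing
dwh.execute("CREATE TABLE raw_orders (customer_id TEXT, email TEXT, amount REAL)")
dwh.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("42", "ada@example.com", 19.5), ("", "bob@example.com", 7.0)],
)

# Transform: runs AFTER loading, inside the store, typically as SQL scripts
dwh.execute("""
    CREATE TABLE orders AS
    SELECT customer_id,
           '***MASKED***' AS email,   -- anonymization happens post-load
           amount
    FROM raw_orders
    WHERE customer_id <> ''           -- cleansing happens post-load
""")

print(dwh.execute("SELECT * FROM orders").fetchall())
```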

Pros:
ELT has several advantages over ETL, such as:

  • Fast extraction and loading: Data is delivered into the target system immediately, with very little processing in-flight.
  • Lower upfront development costs: ELT tools are good at moving source data into target systems with minimal user intervention, since user-defined transformations are not required.
  • Low maintenance: ELT was designed for use in the cloud, so things like schema changes can be fully automated.
  • Greater flexibility: Data analysts no longer have to determine what insights and data types they need in advance, but can perform transformations on the data as needed in the data warehouse or lake.
  • Greater trust: All the data, in its raw format, is available for exploration and analysis. No data is lost or mis-transformed along the way.

While transforming the data inside the data store overcomes ETL's volume and scale limitations, it does not make the transformation work itself any cheaper or faster. Data scientists, who are high-value company resources, still need to match, clean, and transform the data – work that accounts for roughly 40% of their time – before getting to any analytics.


Cons:
ELT is not without its challenges:

  • Costly and time consuming: Data scientists need to match, clean, and transform the data before applying analytics.
  • Compliance risks: ELT tools don't have built-in support for data anonymization (while ensuring referential integrity) or data governance, thereby introducing data privacy risks – see the sketch after this list for what referential integrity in anonymization involves.
  • Data migration costs and risks: Moving massive amounts of data from on-premises to cloud environments consumes high network bandwidth.
  • Big store requirement: ELT tools require a modern data staging technology, such as a data lake, where the data is loaded. Data teams then transform the data into a data warehouse, where it can be sliced and diced for reporting and analysis.
  • Limited connectivity: ELT tools lack connectors to legacy and on-premises systems, although this is becoming less of an issue as ELT products mature and legacy systems are retired.
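To unpack the compliance bullet above: anonymizing "while ensuring referential integrity" means the same source value must map to the same masked value in every table, so joins still line up after masking. Below is a minimal sketch of one common way to achieve this – deterministic keyed hashing; the key and field names are hypothetical:

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-per-environment-key"

def pseudonymize(value: str) -> str:
    # Deterministic keyed hash: the same input always yields the same token,
    # so foreign-key relationships between tables survive the masking
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:12]

customers = [{"customer_id": "42", "name": "Ada"}]
orders    = [{"order_id": "7", "customer_id": "42"}]

for row in customers + orders:
    row["customer_id"] = pseudonymize(row["customer_id"])

# Both tables now carry the same token, so the customer-order join still holds
assert customers[0]["customer_id"] == orders[0]["customer_id"]
```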


Chapter 03

ETL vs ELT

The following comparison summarizes the main differences between ETL and ELT:

Process
  ETL:
  • Data is extracted in bulk from sources, transformed, then loaded into a DWH/lake
  • Typically batch
  ELT:
  • Raw data is extracted and loaded directly into a DWH/lake, where it is transformed
  • Typically batch

Primary Use
  ETL:
  • Smaller sets of structured data that require complex data transformation
  • Offline, analytical workloads
  ELT:
  • Massive sets of structured and unstructured data
  • Offline, analytical workloads

Flexibility
  ETL:
  • Rigid, requiring data pipelines to be scripted, tested, and deployed
  • Difficult to adapt, costly to maintain
  ELT:
  • Data scientists and analysts have access to all the raw data
  • Data is prepared for analytics when needed, using self-service tools

Time to insights
  ETL:
  • Slow – data engineers spend a lot of time building data pipelines
  ELT:
  • Slow – data scientists and analysts spend a lot of time preparing the data for analytics

Compliance
  ETL:
  • Confidential and sensitive information is anonymized before being loaded into the target data store
  ELT:
  • With raw data loaded directly into big data stores, there is a greater chance of accidental data exposure and breaches

Technology
  ETL:
  • Mature and stable, in use for 20+ years
  • Supported by many tools
  ELT:
  • Comparatively new, with fewer data connectors and less advanced transformation capabilities
  • Supported by fewer professionals and tools

Bandwidth and computation costs
  ETL:
  • Can be costly due to lengthy, high-scale, and complex data processing
  • High bandwidth costs for large data loads
  • Can impact source systems when extracting large data sets
  ELT:
  • Can be very costly due to cloud-native data transformations
  • Typically requires a staging area
  • High bandwidth costs for large data loads


Chapter 04

A New Approach: Entity-Based ETL (eETL)

An innovative approach – entity-based ETL (eETL) – addresses the limitations of both traditional ETL and ELT, and delivers trusted, clean, and complete data that can immediately be used to generate insights.


At the foundation of the eETL approach is a logical abstraction layer that captures all the attributes of any given business entity (such as a customer, product or order), from all source systems. Accordingly, data is collected, processed, and delivered per business entity (instance) as a complete, clean, and connected data asset.

In the extract phase, the data for a particular entity is collected from all source systems. In the transform phase, the data is cleansed, enriched, anonymized, and transformed – as an entity – according to predefined rules. In the load phase, the entity data is safely delivered to any big data store.
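As a rough illustration only – not K2View's implementation – the sketch below pipelines data one business entity at a time: each customer's attributes are collected from all (hypothetical) source systems, transformed as a single connected unit, and delivered independently, which is what makes per-entity parallelism and real-time operation possible:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source systems, each keyed by the same business entity (a customer)
crm     = {"42": {"name": "Ada", "email": "ada@example.com"}}
billing = {"42": {"balance": 19.5}}

def extract_entity(customer_id):
    # Extract: collect EVERY attribute of ONE customer, from ALL sources
    return {"customer_id": customer_id, **crm[customer_id], **billing[customer_id]}

def transform_entity(entity):
    # Transform: cleanse/enrich/anonymize the entity as one connected unit
    entity["email"] = "***MASKED***"
    return entity

lake = []  # stand-in for the target DWH / lake

# Entities are independent, so the pipeline can run multi-threaded, per entity instance
with ThreadPoolExecutor() as pool:
    for entity in pool.map(lambda cid: transform_entity(extract_entity(cid)), crm):
        lake.append(entity)  # Load: deliver each complete, clean entity

print(lake)
```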

The Best of Both Worlds
eETL represents the best of both (ETL and ELT) worlds, because the data in the target store is:

  • Extracted, transformed, and loaded – from all sources, to any data store, at any scale – via any data integration method: messaging, CDC, streaming, virtualization, JDBC, and APIs
  • Always clean, fresh, and analytics-ready
  • Built, packaged, and reused by data engineers, and invoked by business analysts and data scientists
  • Continuously enriched and connected, to support complex queries and avoid the need for heavy, processing-intensive table joins

Chapter 05

ETL vs ELT vs eETL

The comparison below summarizes the ETL vs ELT vs eETL approaches to big data preparation:

Process
  ETL:
  • Data is extracted in bulk from sources, transformed, then loaded into a DWH/lake
  • Typically batch
  ELT:
  • Raw data is extracted and loaded directly into a DWH/lake, where it is transformed
  • Typically batch
  eETL:
  • ETL is multi-threaded by business entities
  • Data is clean, fresh, and complete by design
  • Batch or real time

Primary Use
  ETL:
  • Smaller sets of structured data that require complex data transformation
  • Offline, analytical workloads
  ELT:
  • Massive sets of structured and unstructured data
  • Offline, analytical workloads
  eETL:
  • Massive amounts of structured and unstructured data, with low impact on sources and destinations
  • Complex data transformation performed in real time at the entity level, leveraging a 360-degree view of the entity
  • Operational and analytical workloads

Flexibility
  ETL:
  • Rigid, requiring data pipelines to be scripted, tested, and deployed
  • Difficult to adapt, costly to maintain
  ELT:
  • Data scientists and analysts have access to all the raw data
  • Data is prepared for analytics when needed, using self-service tools
  eETL:
  • Highly flexible, easy to set up and adapt
  • Data engineers define the entity data flows
  • Data scientists decide on the scope, timing, and destination of the data

Time to insights
  ETL:
  • Slow – data engineers spend a lot of time building data pipelines
  ELT:
  • Slow – data scientists and analysts spend a lot of time preparing the data for analytics
  eETL:
  • Quick – data preparation is done instantly and continuously, in real time

Compliance
  ETL:
  • Confidential and sensitive information is anonymized before being loaded into the target data store
  ELT:
  • With raw data loaded directly into big data stores, there is a greater chance of accidental data exposure and breaches
  eETL:
  • Data is anonymized, in full compliance with privacy regulations (GDPR, CCPA), before being loaded into the target data store

Technology
  ETL:
  • Mature and stable, in use for 20+ years
  • Supported by many tools
  ELT:
  • Comparatively new, with fewer data connectors and less advanced transformation capabilities
  • Supported by fewer professionals and tools

Bandwidth and computation costs
  ETL:
  • Can be costly due to lengthy, high-scale, and complex data processing
  • High bandwidth costs for large data loads
  • Can impact source systems when extracting large data sets
  ELT:
  • Can be very costly due to cloud-native data transformations
  • Typically requires a staging area
  • High bandwidth costs for large data loads
  eETL:
  • Low computing costs, since transformation is done per digital entity, on commodity hardware
  • No data staging
  • Bandwidth costs are reduced by 90% due to smart data compression

The best method for you
As the comparisons above show, each data pipelining method has its own advantages and disadvantages.

Applying an entity-based approach to data pipelining enables you to overcome the limitations of both approaches and deliver clean, ready-to-use data at high scale, without compromising on flexibility, data security, or privacy compliance.

Whitepaper

ETL vs. ELT vs. eETL pipelining

Download the whitepaper to compare ETL, ELT, and entity-based ETL pipelining of data from any source to any lake or DWH.

  • Overcomes the disadvantages of ETL and ELT

  • Continuously delivers complete, clean, and timely data – at any scale

  • Automates and productizes the data preparation process

  • Supports both analytical and operational workloads