The traditional approach to data integration, known as extract-transform-load (ETL), has been around since the 1970s. In the ETL process, data is extracted in bulk from various sources, transformed into a desired format, then loaded for storage into its target destination.
ETL workflow

| Extract | Transform | Load |
|---------|-----------|------|
| Raw data is read and collected from various sources, including message queues, databases, flat files, spreadsheets, data streams, and event streams. | Business rules are applied to clean the data, enrich it, anonymize it if necessary, and format it for analysis. | The transformed data is loaded into a big data store, such as a data warehouse, data lake, or non-relational database. |
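The three ETL phases above can be sketched in miniature. The following is an illustrative example only, using an in-memory list as the source and SQLite as a stand-in for the target store; all record fields and cleansing rules are hypothetical:

```python
import sqlite3

# Hypothetical source: raw customer records, as if read from a flat file or queue.
raw_records = [
    {"id": 1, "name": " Alice ", "email": "ALICE@EXAMPLE.COM"},
    {"id": 2, "name": "Bob", "email": None},  # incomplete record
]

def extract():
    """Extract: read raw data from the source (here, an in-memory stand-in)."""
    return raw_records

def transform(records):
    """Transform: apply business rules (trim, normalize, drop incomplete rows)
    BEFORE the data reaches the target store."""
    return [
        {"id": r["id"], "name": r["name"].strip(), "email": r["email"].lower()}
        for r in records
        if r["email"]  # cleansing rule: discard records without an email
    ]

def load(records, conn):
    """Load: write the already-clean data into the target store."""
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (:id, :name, :email)", records)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM customers").fetchall())
# Only the cleansed record reaches the target.
```

The key property of ETL is visible here: the target store only ever sees transformed, analytics-ready data.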
Traditional ETL has the following disadvantages:

- Transformation logic must be defined up front, before any data is loaded, making pipelines rigid and slow to adapt to new requirements.
- A dedicated transformation engine sits between source and target, which becomes a bottleneck as data volumes grow.
- Data is unavailable for analysis until the entire pipeline has finished running.
ELT stands for extract-load-transform. Unlike traditional ETL, ELT extracts the data and loads it into the target first, where transformations are then run, often using proprietary scripting. The target is most commonly a data lake or big data store, such as Teradata, Spark, or Hadoop.
ELT workflow

| Extract | Load | Transform |
|---------|------|-----------|
| Raw data is read and collected from various sources, including message queues, databases, flat files, spreadsheets, data streams, and event streams. | The extracted data is loaded into a data store, whether a data lake, data warehouse, or non-relational database. | Data transformations are performed in the data lake or warehouse, primarily using scripts. |
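For contrast with the ETL flow, the ELT ordering can be sketched the same way. Again this is only an illustrative example, with SQLite standing in for the big data store and hypothetical records and rules; the point is that raw data lands first and the store's own engine (SQL here) does the transformation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw data goes straight into the target store, untransformed.
conn.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?)",
    [(1, " Alice ", "ALICE@EXAMPLE.COM"), (2, "Bob", None)],
)

# Transform: runs later, inside the store, using the store's own compute.
conn.execute("""
    CREATE TABLE customers AS
    SELECT id, TRIM(name) AS name, LOWER(email) AS email
    FROM raw_customers
    WHERE email IS NOT NULL
""")
print(conn.execute("SELECT * FROM customers").fetchall())
```

Note that the raw table is retained alongside the transformed one, which is what gives ELT its flexibility, and also what creates its compliance exposure: sensitive values sit in the store in raw form.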
Pros:

ELT has several advantages over ETL, such as:

- Loading is fast, because transformation is deferred until after the data lands in the target.
- Transformations run on the target store's own processing power, which scales well in big data platforms.
- Raw data is preserved in the target, so it can be re-transformed for new use cases without re-extraction.
While the ability to transform data in the data store addresses ETL's volume and scale limitations, it does not address the cost of the transformation itself, which remains expensive and time-consuming. Data scientists, who are high-value company resources, must match, clean, and transform the data – work that accounts for roughly 40% of their time – before even getting to any analytics.
Cons:

ELT is not without its challenges:

- Transformation work shifts downstream, typically onto data scientists and analysts, who must prepare the data before they can analyze it.
- Sensitive data is loaded in raw form before it can be masked or anonymized, complicating privacy and compliance requirements.
- Transformations often rely on proprietary scripting in the target store, which is costly to build and maintain.
The following table summarizes the main differences between ETL and ELT:
| | ETL | ELT |
|---|-----|-----|
| Process | Data is transformed in a staging area before being loaded into the target | Data is loaded into the target in raw form and transformed inside it |
| Primary Use | Moderate volumes of structured data, destined for data warehouses | Large volumes of structured and unstructured data, destined for data lakes |
| Flexibility | Rigid: transformations are defined up front, and changes require pipeline rework | Flexible: raw data is retained, so transformations can be added or changed later |
| Time to insights | Slower loading, but data is analytics-ready on arrival | Faster loading, but data must still be transformed before analysis |
| Compliance | Sensitive data can be masked or anonymized before it reaches the target | Raw sensitive data is loaded first, increasing privacy and compliance risk |
| Technology | Mature, with decades of established tooling | Newer, often relying on proprietary scripting in the target store |
| Bandwidth and computation costs | Requires a dedicated transformation engine between source and target | Consumes the target store's compute, which can be costly at scale |
An innovative approach – entity-based ETL (eETL) – addresses the limitations of both traditional ETL and ELT, and delivers trusted, clean, and complete data that can immediately be used to generate insights.
At the foundation of the eETL approach is a logical abstraction layer that captures all the attributes of any given business entity (such as a customer, product or order), from all source systems. Accordingly, data is collected, processed, and delivered per business entity (instance) as a complete, clean, and connected data asset.
In the extract phase, the data for a particular entity is collected from all source systems. In the transform phase, the data is cleansed, enriched, anonymized, and transformed – as an entity – according to predefined rules. In the load phase, the entity data is safely delivered to any big data store.
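The three eETL phases above can be sketched in simplified form. The following is an illustrative example only: the source systems, field names, and cleansing rules are all hypothetical, and plain dictionaries stand in for real integration methods and data stores. What it shows is the entity-level grain: each customer is extracted from every source, transformed as one unit, and delivered whole.

```python
# Hypothetical source systems, each holding a slice of the "customer" entity.
crm = {101: {"name": "Alice", "email": " ALICE@EXAMPLE.COM "}}
billing = {101: {"balance": 42.50}}
orders = {101: [{"order_id": "A-1", "total": 19.99}]}

def build_customer_entity(customer_id):
    """Extract the entity's data from every source, transform it as one unit,
    and return a complete, clean, connected data asset ready to load."""
    entity = {"customer_id": customer_id}
    entity.update(crm.get(customer_id, {}))         # extract from CRM
    entity.update(billing.get(customer_id, {}))     # extract from billing
    entity["orders"] = orders.get(customer_id, [])  # extract from orders
    # Transform at the entity level: cleanse and normalize before delivery.
    if "email" in entity:
        entity["email"] = entity["email"].strip().lower()
    return entity

target_store = {}  # stand-in for any big data store

def load_entity(entity):
    """Load: deliver the complete entity to the target store."""
    target_store[entity["customer_id"]] = entity

load_entity(build_customer_entity(101))
print(target_store[101]["email"])  # alice@example.com
```

Because the entity arrives already connected, a consumer can read one customer's full picture without joining large tables at query time.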
The Best of Both Worlds
eETL represents the best of both (ETL and ELT) worlds, because the data in the target store is:
- Extracted, transformed, and loaded – from all sources, to any data store, at any scale – via any data integration method: messaging, CDC, streaming, virtualization, JDBC, and APIs
- Always clean, fresh, and analytics-ready
- Built, packaged, and reused by data engineers, for invocation by business analysts and data scientists
- Continuously enriched and connected, to support complex queries and avoid the need to run heavy table joins during processing
The table below summarizes the ETL vs ELT vs eETL approaches to big data preparation:
| | ETL | ELT | eETL |
|---|-----|-----|------|
| Process | Transform before loading, in a staging area | Load raw data, then transform inside the target store | Extract, transform, and load per business entity |
| Primary Use | Structured data, data warehouses | Big data, data lakes | Analytical and operational workloads, any data store |
| Flexibility | Rigid, transformations defined up front | Flexible, raw data retained for later transformation | Flexible, entity definitions built once and reused |
| Time to insights | Slow: data is ready only after the full pipeline runs | Delayed: transformation is still needed before analysis | Fast: data arrives clean and analytics-ready |
| Compliance | Data can be masked before loading | Raw sensitive data is loaded first | Data is anonymized at the entity level before delivery |
| Technology | Mature, dedicated ETL tooling | Proprietary scripting in big data stores | Entity-based integration via messaging, CDC, streaming, virtualization, JDBC, and APIs |
| Bandwidth and computation costs | Dedicated transformation engine between source and target | Target store compute, costly at scale | Processing per entity, avoiding heavy table joins |
The best method for you
As explained above in the ETL vs ELT comparison, each data pipeline method has its own advantages and disadvantages.
Applying an entity-based approach to data pipelining enables you to overcome the limitations of both approaches and deliver clean and ready-to-use data, at high scale, without having to compromise on flexibility, data security, and privacy compliance requirements.
Download the whitepaper to compare ETL, ELT and entity-based ETL pipelining of data from any source to any lake or DWH.
- Overcomes the disadvantages of ETL and ELT
- Continuously delivers complete, clean, and timely data – at any scale
- Automates and productizes the data preparation process
- Supports both analytical and operational workloads
Get a live demo of the K2View platform to assess its fit for your use cases.
Experience the power and flexibility of the K2View platform with a 30-day trial.