ETL stands for Extract, Transform and Load and refers to the process of copying data from one or more sources into a destination system.
The data formats in the sources usually differ from the format the destination needs, so the data is converted on its way to the destination to fit its intended use. In many cases that use is data analysis, and the destination system is a data warehouse.
The three steps are:
- Extract: read data from the different source systems.
- Transform: convert the data into a format that can be analyzed.
- Load: store the data in a data warehouse or other destination system.
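The three steps can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production pipeline: the CSV contents, the `sales` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export with inconsistent formatting.
RAW_CSV = """name,amount
 alice ,10.50
BOB,3
carol,7.25
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and convert amounts to integer cents."""
    return [
        (row["name"].strip().title(), round(float(row["amount"]) * 100))
        for row in rows
    ]


def load(records, conn):
    """Load: store the cleaned records in the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (name TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order: extract, then transform, then load."""
    load(transform(extract(text)), conn)


# An in-memory SQLite database stands in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)

# Once loaded, the data is in a shape that is easy to analyze.
total = conn.execute("SELECT SUM(amount_cents) FROM sales").fetchone()[0]
print(total)
```

Keeping each step in its own function means a source or destination can be swapped without touching the transformation logic.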
Apache Beam is an open-source, unified programming model for defining and executing data processing pipelines, including ETL, batch, and stream processing.