Microsoft Azure Data Factory vs. on-premise ETL tools: key features compared

15 September 2020

Computing engine

A classic ETL tool is an on-premise piece of software that requires a good-sized server to be installed on. That server has to be managed, and basic monitoring needs to be set up to check the health of the system. It also means that the size of the server has to be determined up front, and the ETL tool itself has to be maintained and upgraded when necessary. ADF, on the other hand, is a cloud-based, serverless service that requires no hardware and no installation. Since there is no server, there is nothing to monitor or manage, and updates to ADF are handled entirely by Microsoft and rolled out automatically.
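To make the contrast concrete, here is a minimal sketch of triggering a pipeline run with the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory and pipeline names are placeholders; the point is that there is no server on your side, just an API call against a Microsoft-managed service.

```python
# Minimal sketch: triggering an ADF pipeline run from Python.
# Requires the azure-identity and azure-mgmt-datafactory packages.
# "my-rg", "my-factory" and "CopySalesData" are placeholder names.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"  # placeholder
client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# No server to provision or patch: one API call queues the run,
# and Microsoft-managed compute executes it.
run = client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-factory",
    pipeline_name="CopySalesData",
)
print(f"Pipeline run started: {run.run_id}")
```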

Costs

Comparing the cost of the two approaches is nearly impossible, since costs depend on multiple parameters and the pricing models differ completely. With a classic ETL tool, the cost is mainly determined by the price of the server and the license cost of the ETL tool, and it varies with the type of server installed and the choice of tool. ADF uses a pay-as-you-go model: there are no up-front costs and you are only charged for the services you use. Pricing for ADF is roughly based on the time needed to process the data and the performance scaling you choose. The difficulty is finding the right balance between time and performance to keep costs down. A disadvantage of this pricing model is that the invoice is less predictable than in a classic environment: the cost will vary from month to month depending on these parameters.
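By way of illustration, the Python sketch below models this time-versus-performance trade-off. The rates are invented placeholders, not actual Azure prices (those vary by region and change over time), and the workload numbers are equally made up:

```python
# Back-of-the-envelope sketch of ADF's pay-as-you-go model.
# The rates below are illustrative placeholders, not actual Azure prices;
# always check the official Azure pricing page for your region.
DIU_HOUR_RATE = 0.25        # assumed cost per Data Integration Unit hour
ORCHESTRATION_RATE = 1.00   # assumed cost per 1,000 activity runs

def monthly_copy_cost(runs_per_month: int, hours_per_run: float, dius: int) -> float:
    """Estimate the monthly cost of a copy workload."""
    execution = runs_per_month * hours_per_run * dius * DIU_HOUR_RATE
    orchestration = (runs_per_month / 1000) * ORCHESTRATION_RATE
    return execution + orchestration

# Slower run on fewer DIUs vs. a faster run on more DIUs:
print(monthly_copy_cost(runs_per_month=30, hours_per_run=2.0, dius=4))   # 60.03
print(monthly_copy_cost(runs_per_month=30, hours_per_run=0.4, dius=16))  # 48.03
```

With these made-up numbers, the faster, higher-DIU run is also the cheaper one, which is exactly the equilibrium described above.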

Performance

In terms of performance, both classic ETL tools and ADF offer controls to help optimize runtimes, and these are more or less comparable. However, even in well-written ETL, the answer to performance issues is often to add more resources (memory, CPU, disk I/O). This is where ADF is more flexible: performance settings can be changed with one click, even at pipeline level or for a single execution. This way you can assign more resources to heavily loaded pipelines, which results in better performance at lower cost.
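For example, a Copy activity lets you set the number of Data Integration Units for just that activity. Below is a hedged sketch of the relevant fragment of a pipeline definition, written as a Python dict that mirrors the ADF JSON (the activity and dataset names are placeholders):

```python
# Sketch of per-activity performance tuning in an ADF pipeline
# definition, expressed as a Python dict mirroring the ADF JSON.
copy_activity = {
    "name": "CopyLargeFactTable",  # placeholder name
    "type": "Copy",
    "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "SqlSource"},
        "sink": {"type": "ParquetSink"},
        # Raise or lower compute for this one activity only:
        # more Data Integration Units for a heavy load,
        # without touching the rest of the pipeline.
        "dataIntegrationUnits": 16,
        "parallelCopies": 8,
    },
}
```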

Scalability

When using a classic ETL tool, scaling can be a bottleneck for an agile business. Classic ETL tools are not always built to handle large volumes of data, and the underlying server cannot always be swapped easily, which results in a longer time to market. This is one of the main reasons why the cloud is so attractive: your system can be scaled to any desired level in no time, as the sketch below shows.
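For instance, resizing the Spark compute behind ADF's mapping data flows comes down to editing one value in the integration runtime definition. The dict below mirrors the ADF JSON; the runtime name and region are placeholders:

```python
# Sketch of scaling ADF compute by editing one setting: the definition
# of a managed Azure Integration Runtime, as a Python dict. Changing
# coreCount resizes the cluster used by mapping data flows.
azure_ir = {
    "name": "ScalableAzureIR",  # placeholder name
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "West Europe",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,    # scale up to 16, 32, ... as volumes grow
                    "timeToLive": 10,  # minutes the cluster stays warm between runs
                },
            }
        },
    },
}
```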

ADF is not standalone

ADF does not only embed standard ETL transformations; it also integrates more advanced components such as Databricks, Azure Machine Learning and HDInsight. This enables developers to use these more advanced Microsoft Azure services within an ADF pipeline. When these components are used, resources are provisioned automatically when the ADF pipeline is triggered. This can be very useful for building, for example, predictive pipelines, or for testing a specific use case. Such an architecture is not possible within a classic ETL tool, since these components are not integrated out of the box.
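Here is a hedged sketch of what such a pipeline definition can look like, again as a Python dict mirroring the ADF JSON: a standard copy step followed by a Databricks notebook that only runs once the copy succeeds. All names, paths and the linked service are placeholders, and the Copy activity's dataset references are omitted for brevity.

```python
# Sketch of a pipeline mixing a standard copy step with a Databricks
# notebook: the kind of composition a classic ETL tool cannot offer
# out of the box. All names and paths are placeholders.
pipeline = {
    "name": "PredictiveLoadPipeline",
    "properties": {
        "activities": [
            {
                "name": "IngestRawData",
                "type": "Copy",
                # inputs/outputs dataset references omitted for brevity
                "typeProperties": {
                    "source": {"type": "SqlSource"},
                    "sink": {"type": "ParquetSink"},
                },
            },
            {
                "name": "ScoreWithNotebook",
                "type": "DatabricksNotebook",
                # Runs only after the copy step succeeds; the Databricks
                # compute is provisioned when the pipeline is triggered.
                "dependsOn": [
                    {"activity": "IngestRawData", "dependencyConditions": ["Succeeded"]}
                ],
                "linkedServiceName": {
                    "referenceName": "DatabricksLinkedService",
                    "type": "LinkedServiceReference",
                },
                "typeProperties": {"notebookPath": "/Shared/score_model"},
            },
        ]
    },
}
```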

How we can help

As an experienced Microsoft partner, Datashift helps companies develop new data platforms in the cloud, and with ADF in particular. We also help companies build business cases for migrating existing architectures to ADF and Microsoft Azure. Reach out for more info!