Benchmarking AutoML: Exploring the Top AutoML Libraries

April 29, 2024

Introduction

In the field of machine learning, Automated Machine Learning (AutoML) has become instrumental in democratizing ML applications by automating complex processes such as model selection and hyperparameter tuning.

AutoML has emerged as a versatile tool, suitable for a variety of users:

Software Engineers: Lets those who need to integrate machine learning into their applications do so without the expertise to fine-tune algorithms.
Citizen Data Scientists: Facilitates the construction of ML pipelines in a low-code environment, making it accessible to non-experts.
Data Scientists and Engineers: Accelerates workflow, allowing professionals to focus on more complex aspects of their projects.

As AutoML has matured, AutoML libraries have come into wide use. These libraries automate ML tasks traditionally done by human experts, with the aim of making ML accessible to a wider range of users, including those without extensive ML knowledge.

Here, we explore a detailed comparison of various AutoML libraries based on our ongoing benchmarking study. This builds upon the 2022 "AMLB: an AutoML Benchmark" study, which compared 9 AutoML frameworks on 71 classification and 33 regression tasks. We extend this research by incorporating a comprehensive benchmarking study conducted in 2024, alongside evaluations of additional popular AutoML tools not covered in the original study.

We will focus on these key features:

Model accuracy vs inference time
Framework robustness
Flexibility through presets

An Overview of AutoML Libraries

The benchmarking study includes 15 well-known AutoML frameworks:

**Figure 1: Level of support in AutoML libraries**

Key Features Comparison

1. Model Accuracy vs. Inference Time

AutoML frameworks must balance the predictive accuracy of their models against the speed of inference.

AutoGluon: Known for its excellent balance between accuracy and inference time. It offers various presets that allow users to optimize for speed or accuracy based on their needs.
Auto-sklearn: Focuses heavily on model accuracy through extensive hyperparameter tuning but may lag in inference time when compared to simpler models.
FLAML: Strikes a good balance with its cost-effective optimization strategies, aiming for a middle ground in both accuracy and speed.
GAMA: While generally more accurate, it can suffer from longer inference times due to its exhaustive search strategies.
H2O AutoML: Offers robust performance with slightly slower inference times, suitable for applications where accuracy is prioritized.
LightAutoML: Designed for speed, making it ideal for real-time applications, though it may trade off some accuracy.
MLJAR: Offers a range of presets that cater to various needs, balancing between inference speed and accuracy.
TPOT: Excellent in terms of accuracy but often has the longest inference times due to its genetic programming-based optimization.
Auto-sklearn 2: Enhances the original Auto-sklearn by focusing on iterative algorithms and using meta-learning to improve initial configurations. It aims to provide faster models suitable for real-time applications while maintaining competitive accuracy.
NaiveAutoML: Takes a simple and straightforward approach to AutoML. It may not always produce the most optimized models, but it offers rapid deployment and reasonable performance, particularly on smaller or less complex datasets.
Ludwig: Offers flexible model building with a focus on deep learning, allowing for customizable yet sometimes slower inference.
Microsoft’s NNI: Excels in optimizing deep learning networks with an emphasis on both performance tuning and inference speed.
MLBox: Provides efficient preprocessing and hyperparameter tuning but may require more configuration for optimal inference speed.
AutoKeras: Focuses on neural architecture search, providing high accuracy at potentially higher computational costs and longer inference times.
PyCaret: An easy-to-use library that efficiently balances model accuracy and inference time, suitable for quick deployments.
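One way to make the accuracy versus inference-time trade-off concrete is to keep only the frameworks that are not beaten on both axes at once (a Pareto front). The sketch below is a minimal illustration, not the method used in the benchmark; the `Result` type, the `pareto_front` helper, and the demo numbers are all hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    framework: str
    accuracy: float         # higher is better
    rows_per_second: float  # higher is better


def pareto_front(results: list[Result]) -> list[Result]:
    """Return the results that no other result beats on both axes."""
    front = []
    for r in results:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.rows_per_second >= r.rows_per_second
            and (o.accuracy > r.accuracy or o.rows_per_second > r.rows_per_second)
            for o in results
        )
        if not dominated:
            front.append(r)
    # Order from slowest to fastest inference for easier reading.
    return sorted(front, key=lambda r: r.rows_per_second)


# Made-up numbers for illustration only; these are NOT benchmark results.
demo = [
    Result("framework_a", 0.91, 2_000),
    Result("framework_b", 0.89, 15_000),
    Result("framework_c", 0.88, 9_000),
    Result("framework_d", 0.93, 500),
]

for r in pareto_front(demo):
    print(r.framework, r.accuracy, r.rows_per_second)
```

In this toy data, `framework_c` drops out because `framework_b` is both more accurate and faster; the remaining three each represent a different point on the trade-off.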

**Figure 2: Only selected frameworks which have evaluations on all tasks**

**Figure 3: Scaled performance for each framework under different time constraints. Only frameworks which have evaluations on all tasks for both time constraints are shown. Performance generally does not improve much with more time [1]**

**Figure 4: Inference speed on unseen data in rows per second (only selected frameworks) [1]**
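Inference speed in rows per second can be estimated for any trained model by timing its prediction call on a held-out batch. The following is a rough sketch under stated assumptions: `rows_per_second` is a hypothetical helper, `predict` stands in for whatever prediction function a given framework exposes, and the measurement protocol in [1] may differ (for example in batch sizes or warm-up).

```python
import time
from typing import Callable, Sequence


def rows_per_second(
    predict: Callable[[Sequence], object],
    rows: Sequence,
    repeats: int = 3,
) -> float:
    """Estimate throughput as len(rows) divided by the fastest of `repeats` runs."""
    if len(rows) == 0:
        raise ValueError("rows must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(rows)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(rows) / max(best, 1e-9)


# Stand-in for a model's predict function (an assumption for this demo).
def dummy_predict(batch):
    return [x * 2 for x in batch]


print(f"{rows_per_second(dummy_predict, list(range(10_000))):.0f} rows/s")
```

Taking the fastest run reduces noise from other processes; reporting the median instead is an equally reasonable choice.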

2. Framework Robustness

Reliability is a cornerstone of any tool's utility, and AutoML frameworks are no exception. Robustness here covers the scenarios in which a framework may fail, whether due to data complexity, resource constraints, or algorithmic limitations. We examine the robustness of each AutoML framework to highlight the practical considerations to weigh when selecting a tool.

AutoGluon: Exhibits high reliability with minimal failures across different tasks.
Auto-sklearn: Occasionally encounters failures in more complex datasets due to intense computational demands.
FLAML: Shows moderate reliability with some issues in handling larger datasets.
GAMA: Susceptible to failures in extensive searches but offers detailed diagnostic tools to mitigate these issues.
H2O AutoML: Generally reliable but can be resource-intensive, leading to failures in constrained environments.
LightAutoML: Has a good track record with minimal failures, particularly in simpler tasks.
MLJAR: Fairly reliable, with occasional issues in very complex or large datasets.
TPOT: Because its genetic programming-based search takes time to converge, it can fail in tasks requiring quick model generation.
Ludwig, Microsoft’s NNI, MLBox, AutoKeras, and PyCaret: Each framework's robustness varies, with specific challenges noted in scalability, complex model configurations, and handling diverse data types.
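A simple way to compare robustness across frameworks is to run every task inside a wrapper that records how each run ended. The sketch below is illustrative only: the category names, `classify_failure`, and `tally_failures` are assumptions made for this example and do not reproduce the error taxonomy used in [1].

```python
from collections import Counter
from typing import Callable, Iterable


def classify_failure(exc: BaseException) -> str:
    """Map an exception to a coarse failure category (hypothetical taxonomy)."""
    if isinstance(exc, MemoryError):
        return "memory"
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, (ValueError, TypeError)):
        return "data"
    return "other"


def tally_failures(run_task: Callable[[str], object], tasks: Iterable[str]) -> Counter:
    """Run each task once and count successes and failures by category."""
    counts: Counter = Counter()
    for task in tasks:
        try:
            run_task(task)
            counts["ok"] += 1
        except Exception as exc:  # keep going so one failure does not stop the sweep
            counts[classify_failure(exc)] += 1
    return counts
```

Here `run_task` would wrap a single framework's training and prediction on one dataset; calling `tally_failures` once per framework yields counts that can be plotted side by side.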

**Figure 5: Errors by type for each framework (only selected frameworks) [1]**

3. Use of Presets

Presets in AutoML frameworks serve as ready-to-use configurations that cater to diverse needs, from those seeking the utmost accuracy to those requiring swift model deployment.

AutoGluon: Offers presets such as 'best_quality', 'high_quality', 'good_quality', and 'medium_quality', allowing users to trade predictive accuracy against training and inference speed based on their specific needs.
Auto-sklearn: Does not offer presets but allows extensive customization, which can act as a manual preset system.
FLAML: Limited presets focused on quick model delivery.
GAMA, H2O AutoML, LightAutoML, MLJAR, TPOT: Each provides various levels of preset configurations, helping users quickly align the tool's performance with their operational requirements.
Ludwig, PyCaret: Offer various built-in configurations to streamline model development and deployment processes.
Microsoft’s NNI: Provides tools for advanced tuning of deep learning models, focusing on achieving optimal performance.
MLBox, AutoKeras: Emphasize automation in their workflows, with presets that help non-experts achieve competitive results without extensive machine learning knowledge.
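As an illustration of how a preset is selected in practice, the sketch below maps a project priority to an AutoGluon preset name and passes it to `TabularPredictor.fit` through its `presets` and `time_limit` arguments. The priority labels ("accuracy", "balanced", "speed"), the one-hour default, and both helper functions are assumptions made for this example; running `train` requires the `autogluon` package to be installed.

```python
def autogluon_fit_kwargs(priority: str, time_limit_s: int = 3600) -> dict:
    """Translate a hypothetical priority label into keyword arguments for fit()."""
    presets = {
        "accuracy": "best_quality",   # strongest models, slowest to train and predict
        "balanced": "high_quality",
        "speed": "medium_quality",    # quickest to train
    }
    if priority not in presets:
        raise ValueError(f"unknown priority: {priority!r}")
    return {"presets": presets[priority], "time_limit": time_limit_s}


def train(train_df, label: str, priority: str = "balanced"):
    """Fit an AutoGluon tabular predictor on a pandas DataFrame."""
    # Imported lazily so the helper above stays usable without AutoGluon.
    from autogluon.tabular import TabularPredictor

    return TabularPredictor(label=label).fit(train_df, **autogluon_fit_kwargs(priority))


print(autogluon_fit_kwargs("speed", time_limit_s=600))
```

Keeping the preset choice in one small function makes it easy to rerun the same data under different presets and compare the resulting accuracy and inference speed.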

Conclusion

AutoGluon emerged as a consistent leader in model performance among the ten frameworks benchmarked so far; the remaining five (11th to 15th) still need to be tested on all datasets. Inference time is critical in practical applications, and on this measure the study shows considerable variance among the frameworks.

The choice of an AutoML library depends heavily on the specific requirements of the task at hand, such as the need for speed, accuracy, or robustness against failures. This comparison provides a snapshot of how each framework performs across a range of scenarios, helping you make an informed decision in your AutoML tool selection.

The evolution of AutoML libraries is evident from the emergence of new players and the advancements identified in the 2024 benchmarks. This ongoing research will continue to expand by incorporating additional datasets and evaluating even more libraries. By staying current with this dynamic field, we aim to provide the most comprehensive understanding of AutoML capabilities.