Why AutoML Is Not Enough for Scaling AI

EvoML: Scalable End-to-end Data Science Lifecycle

1. Introduction:

The practice of building AI is rapidly becoming automated. Automated Machine Learning (AutoML) is the process of automating the time-consuming, iterative tasks of machine learning model development. However, AutoML on its own cannot bring AI to life in the real business world. In this article, we give an overview of AutoML’s limitations and explain how EvoML overcomes them to make AI scalable.


2. What is AutoML?

Integrating AI into your business activities is a long-term process that requires specialised talent and considerable effort and investment. It depends on sought-after data science talent that mainly tech giants can afford. Even in 2021, hiring and retaining data science experts is very difficult. In addition, creating AI models remains time-consuming and expensive even when organisations have the talent they need.

The emergence of AutoML platforms in recent years is regarded as an opportunity to ‘democratise machine learning’, allowing both ML and non-ML experts to automatically build ML models that solve complex business tasks, without the need for domain knowledge and coding skills. This not only reduces the time and investment needed to build models, but also gives access to AI to companies that struggle to hire data science experts or that have very small data science teams.

Figure 1: Traditional ML VS AutoML. Source: https://joshjanzen.com/ml-vs-automl/

There are various AutoML solutions available on the market, from open-source libraries to commercial products with high-level code or code-free UIs. Some cover only parts of the data science pipeline, while others provide an end-to-end solution. Figure 1 compares traditional machine learning with AutoML. As it shows, these platforms usually automate four types of task:

  1. Data Ingestion & Data Exploration
  2. Data pre-processing, Feature Engineering and Feature Selection
  3. Model training and Hyperparameter optimisation
  4. Model Evaluation and Interpretation
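
To make the four stages above concrete, here is a minimal sketch of the kind of loop an AutoML tool automates. It is illustrative only: the synthetic dataset, the k-nearest-neighbours model, the candidate values of k, and every function name are assumptions made for this example, not the method of any particular AutoML product.

```python
import random


def make_dataset(n=200, seed=0):
    """Stage 1 (ingestion): synthetic two-class data, purely illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Both features are shifted for class 1, but live on very different scales.
        features = [rng.gauss(2.0 * label, 1.0), rng.gauss(100.0 * label, 60.0)]
        rows.append((features, label))
    return rows


def fit_min_max_scaler(train):
    """Stage 2 (pre-processing): learn per-feature ranges from training data only."""
    columns = list(zip(*(x for x, _ in train)))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    def scale(rows):
        return [([(v - lo) / span for v, lo, span in zip(x, lows, spans)], y)
                for x, y in rows]

    return scale


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def accuracy(train, rows, k):
    """Fraction of rows whose label the k-NN model predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)


def run_automl(rows, candidate_ks=(1, 3, 5, 7, 9)):
    train, valid, test = rows[:120], rows[120:160], rows[160:]
    scale = fit_min_max_scaler(train)
    train, valid, test = scale(train), scale(valid), scale(test)
    # Stage 3 (training and hyperparameter optimisation): pick k on validation data.
    best_k = max(candidate_ks, key=lambda k: accuracy(train, valid, k))
    # Stage 4 (evaluation): report accuracy on data not used for any decision above.
    return best_k, accuracy(train, test, best_k)


best_k, test_accuracy = run_automl(make_dataset())
print(f"selected k={best_k}, held-out accuracy={test_accuracy:.2f}")
```

Real platforms search over many model families and far larger hyperparameter spaces, but the structure is the same: ingest, pre-process, search, evaluate.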

Additionally, some AutoML tools focus specifically on automating model operationalisation and maintenance. Models need to be deployed into business operations, that is, “operationalised” in as many different systems and interfaces as necessary. Meanwhile, organisations need to monitor, manage, govern, and analyse the results of their ML models to ensure that they are meeting business needs.


3. Scaling AI Challenges that AutoML doesn’t tackle

AutoML accelerates the data science lifecycle and makes AI more accessible, but it mainly focuses on creating models and perfecting their accuracy. Several challenges remain before AI can scale.

Just as an engine is not the whole car, a model is just one part of an AI project. Even if the model achieves high accuracy, it will not be used if you have defined the wrong business problem, or if the model is biased, is a black box, is too slow, or consumes too much compute power. For the purpose of this article, we focus on three challenges that we consider essential for ML platforms: 1) Efficiency, 2) Trust and 3) Flexibility.

3.1 Efficiency: ML code needs to work well in production environments

There is a misconception that model accuracy is everything. As we have emphasised in our previous blog, AI is an iterative optimisation process to achieve multiple objectives, and efficiency is critical for scaling AI.

Unless you deploy AI models in production, you will not capture the value of AI. As stated in a recent research paper from Google, ML code is only a small fraction of the whole ecosystem. In contrast, the process of deployment makes up practically 50% of it (the right-hand side, as shown in Figure 2 below). Businesses need to plan for deployment in production environments from the start of the model building process. There are different production requirements to consider, for example:

  •   Maximum response time that is acceptable for your AI model
  •   Maximum computing resource (cost) that your AI model can consume
  •   Automatic retraining of the model when data patterns change

Figure 2: ML code is only a small fraction of the whole ecosystem. Image by Marcin Laskowski

To satisfy these specific requirements, the model code needs to be customised for each production environment. However, current AutoML platforms are incapable of multi-objective optimisation at the ML code level. AI/data engineers need to spend a lot of time understanding the code created by data scientists and optimising each part according to requirements. In addition, AutoML cannot solve the trade-off between accuracy and efficiency. Businesses often struggle with how much accuracy to sacrifice to meet production requirements or other business metrics.

Because AutoML alone cannot optimise ML code for multiple production requirements simultaneously, there is still a huge gap to fill before AI can scale to varied deployment scenarios and drive business value.
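
The trade-off between accuracy and efficiency can be made concrete with a small sketch of multi-objective model selection. It keeps only the candidates that no other candidate beats on every objective (the Pareto front), then picks the most accurate one that fits a latency and memory budget. The candidate names and all of the numbers are invented for illustration, and this is a generic technique, not a description of how any specific platform works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better
    memory_mb: float   # lower is better


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.latency_ms <= b.latency_ms
                and a.memory_mb <= b.memory_mb)
    strictly_better = (a.accuracy > b.accuracy
                       or a.latency_ms < b.latency_ms
                       or a.memory_mb < b.memory_mb)
    return no_worse and strictly_better


def pareto_front(candidates):
    """Candidates that are not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def pick(candidates, max_latency_ms, max_memory_mb):
    """Most accurate non-dominated candidate within the production budget, or None."""
    feasible = [c for c in pareto_front(candidates)
                if c.latency_ms <= max_latency_ms and c.memory_mb <= max_memory_mb]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


# Hypothetical candidates; the figures are made up for this example.
candidates = [
    Candidate("boosting_large", accuracy=0.94, latency_ms=120, memory_mb=900),
    Candidate("boosting_small", accuracy=0.92, latency_ms=35, memory_mb=200),
    Candidate("random_forest", accuracy=0.91, latency_ms=60, memory_mb=450),
    Candidate("logistic_regression", accuracy=0.88, latency_ms=2, memory_mb=10),
]

front_names = [c.name for c in pareto_front(candidates)]
cloud_choice = pick(candidates, max_latency_ms=50, max_memory_mb=512)
edge_choice = pick(candidates, max_latency_ms=10, max_memory_mb=64)
print(front_names, cloud_choice.name, edge_choice.name)
```

The same set of candidates yields different answers for different deployment budgets, which is why a single accuracy score is not enough to choose a model for production.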

3.2 Trust: Bias, Explainability and ML Code

AI is used across different sectors (healthcare, finance, criminal justice, etc.) and can make significant decisions that impact our lives. However, AI is intrinsically biased. Therefore, it is paramount that we understand and find ways to reduce bias and establish trust.

AutoML usually operates as a “black box”. The techniques and underlying mechanisms of automation are hidden from users. Therefore, users may not trust the results and can find it difficult to tailor the AutoML solution to their systems or projects. AutoML tools need to be more transparent and explainable, so that AutoML-built models are trustworthy enough to be adopted responsibly by businesses.


Figure 3: Black box AutoML. Image by MIT

After experimenting with an AutoML tool, a few data scientists from Columbia University discovered that, although some transparency was offered for the final model (some visuals and technical details), it was unclear how feature engineering was done and how the model was trained. This made it difficult, even for those with more experience, to explain the results with confidence. They therefore had to go back, carefully analyse the output and manually rerun the experiment, which required a lot of time and, to their surprise, a lot of skill.

If these tools are to really democratise AI for everyone, a high level of transparency needs to be provided for each step of the data science process.

“We find that including transparency features in an AutoML tool increased user trust and understandability in the tool; and out of all proposed features, model performance metrics and visualisations are the most important information to data scientists when establishing their trust with an AutoML tool.” - IBM Research & Rensselaer Polytechnic Institute

3.3 Flexibility: Boxed solution without ML Code is Not Practical

Building sophisticated models requires flexibility. Most AutoML tools are boxed solutions that make it easy for business analysts to build general AI solutions. However, organisations have complex business problems that require sophisticated models, and data scientists are still needed to improve those models manually to meet specific needs. For data scientists, the flexibility to use different AutoML approaches is a necessity. A boxed AutoML process that limits data scientists’ access to the process will hinder their creativity and productivity.

Integrating models in production requires flexibility and ML code. The code-free process of AutoML enables you to create solutions easily, but those solutions may not be practical to use and scale. An AI model is embedded in software so that it can function within a business process. Scaling AI applications requires that the AI platform integrates smoothly with your existing IT infrastructure. ML code is essential to enable customisation for seamless integration, yet most AutoML platforms do not provide ML code for custom integration.

Optimising models in production requires flexibility. A model lives and breathes in a dynamic production environment. To maintain optimal performance, users may need to retrain the model or improve a specific step in the data science lifecycle. Businesses also have non-AutoML models (e.g. sophisticated models built manually by data scientists) that need to be optimised from time to time. Therefore, you need a tool that provides a flexible foundation to automate and simplify the parts of data science you need.
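
One common way to decide when a model in production needs retraining is to compare the distribution of a feature at training time with what the model sees live. The sketch below uses the Population Stability Index (PSI) for this. The function names, the synthetic data, and the 0.2 threshold (a widely used rule of thumb, not a universal constant) are assumptions for illustration; a real monitoring setup would track many features and the model's outputs as well.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(expected, actual, threshold=0.2):
    """Flag retraining when drift exceeds the (assumed) rule-of-thumb threshold."""
    return population_stability_index(expected, actual) > threshold


# Synthetic example: the live data is the reference data shifted upwards.
reference = [i / 100 for i in range(1000)]
live_same = list(reference)
live_shifted = [v + 5 for v in reference]

print(should_retrain(reference, live_same))     # unchanged distribution
print(should_retrain(reference, live_shifted))  # strongly shifted distribution
```

A check like this can run on a schedule and trigger the retraining step automatically, which is one way to meet the "automatic retraining when data patterns change" requirement from Section 3.1.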

Reaching AI maturity requires flexibility. Large tech companies use a ‘lock-in’ strategy to attract new customers by compiling all their AI options into all-inclusive packages. Although this might sound appealing to some users, over time it becomes very difficult to migrate your data and findings if you ever want to switch to a different tool or integrate with other systems.

Most companies are still at the beginning of their AI journey, experimenting or taking their very first steps towards implementing a solution. Being locked in to an AutoML platform at this stage, only to find out later that you are not sure where AI can be applied across your company, or that the platform is not practical for your IT infrastructure, is not desirable.


EvoML: Scalable End-to-end Data Science Lifecycle

Although AutoML significantly reduces the time it takes to build an ML model and improves its accuracy, it leaves out important attributes in the process necessary for any business to scale AI. To tackle the challenges of scaling AI mentioned above (efficiency, trust and flexibility), a more explainable and modular automation process is necessary, and ML code optimisation is the silver bullet.

EvoML is built with the mission of making AI scalable from day one. Businesses can therefore take deployment and scaling requirements into consideration when they start building models, which saves time and avoids extra work later. In contrast, organisations that only consider these requirements after building a model will have to spend extra time on it, and may even have to rebuild it.

As discussed in Section 3.1 (Efficiency), businesses need accurate models that run efficiently in different production environments. Furthermore, the more use cases organisations embed AI into, the more hybrid their approach to AI development needs to be, which requires standalone optimisation that works well with models built in different ways. EvoML not only allows users to automatically build efficient ML models with multiple objectives for a specific deployment scenario, but also enables them to optimise existing models for improved efficiency. These existing models can be built manually by data scientists, created by other AutoML platforms or generated on EvoML. This standalone optimisation becomes critical as the AI maturity of the business grows.


Figure 4: EvoML: Scalable end-to-end data science lifecycle

The optimisation takes place at the algorithmic level and, most importantly, at the level of the model source code, by automatically identifying inefficiencies in the code and proposing a series of changes. EvoML provides the whole code of the best model (selected and tuned) for end-to-end use.

Furthermore, EvoML is a ‘glass box’, with every step of the automation process visible to the user. This high degree of transparency helps to mitigate bias and gives users the confidence to explain the results to other team members, regulators, customers, etc.

With EvoML’s scalable end-to-end data science lifecycle, businesses can build AI that can really scale across different clouds, devices, and ultimately different business units, to capture the massive business value and gain a competitive edge.

About TurinTech

TurinTech is a research-driven deep tech company founded in 2018 and based in London. TurinTech provides a platform for users with different levels of skill to automatically build, optimise and deploy scalable AI within days.

TurinTech is run by professors, data scientists and engineers from prestigious universities. We are actively collaborating with world-leading academic institutions to create breakthroughs.

Learn more about scaling AI at https://turintech.ai/



References:

https://www.bigsquid.com/automl-is-not-enough
https://towardsdatascience.com/automl-is-overhyped-1b5511ded65f
https://hbr.org/2019/10/the-risks-of-automl-and-how-to-avoid-them 
https://analyticsindiamag.com/what-are-the-limitations-of-automl/ 
https://www.linkedin.com/pulse/demystifying-data-science-part-v-automl-ian-thomas/?articleId=6650429769135599617 
https://www.rtinsights.com/automl-evolution-misconceptions-reality/ 
https://dotdata.com/how-to-evaluate-and-select-the-right-automl-platform/ 
https://news.mit.edu/2019/atmseer-machine-learning-black-box-0531 
https://medium.com/syndicai/artificial-intelligence-model-deployment-problem-or-challenge-6438ecec20a9 
https://arxiv.org/abs/2001.06509 
https://www.kdnuggets.com/2019/07/automl-full-autopilot.html 
https://towardsdatascience.com/should-you-use-a-no-code-ai-platform-limits-and-opportunities-4f39a92234f0 
https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf