Machine Learning in the age of AutoML
Learn about the basics of ML and how H2O.ai’s automatic machine learning is making these tools accessible to a wider community of people.
“Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI (Artificial Intelligence) will transform in the next several years.” : Andrew Ng
Every company can be an artificial intelligence (AI) company. This article covers the basics of machine learning (ML) and how automatic machine learning is making AI accessible to a wider community of people. It is based on the webinar “Introduction to Machine Learning for All of Us,” presented by Rafael Coss, Director of Technical Marketing at H2O.ai.
AI Fundamentals
Artificial Intelligence: A Definition
According to Wikipedia, AI is the study of “intelligent agents,” i.e., any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. In other words, artificial intelligence is a field of computer science that gives computers the ability to learn and reason in ways that resemble human thinking, using a variety of techniques.
The field of AI has evolved over the years and is making rapid progress today. Three factors largely explain its current success:
- More data, which can be mined for patterns,
- Access to state-of-the-art algorithms and techniques for finding those patterns, and
- Large amounts of affordable computing power, which those algorithms need to find patterns at scale.
The commoditization of these three ingredients is a key enabler of AI today, and it is why, in 2020, AI is spreading like wildfire through enterprises of every kind.
Machine Learning: A Definition
Machine learning is a subfield of artificial intelligence that enables machines to learn from past data or experience without being explicitly programmed.
Most recent advances in AI have been achieved by applying machine learning to very large data sets. With big data and the digitalization of the world, more and more data is becoming available to us. Machine learning algorithms detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instructions.
What kind of problems can be solved with Machine Learning?
Today, machine learning is applied to many kinds of problems, for example:
- Finding a category: Is this tweet positive or negative? Will this person default on a credit card loan? Such yes/no or category questions are called classification problems.
- Finding a number: Predicting a company’s sales or profit. Such a problem is called a regression problem.
- Finding groups (clusters): Identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar to one another. Such a problem is called a clustering problem.
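The three problem types differ mainly in what, if anything, you are asked to predict. The sketch below encodes that distinction as a hypothetical helper, `infer_problem_type`, using a simple rule of thumb that is an assumption for illustration only: no target means clustering; text, boolean, or a few distinct whole-number values mean classification; any other numeric target means regression. Real tools apply more careful checks.

```python
from typing import Optional, Sequence, Union

Value = Union[str, bool, int, float]


def infer_problem_type(target: Optional[Sequence[Value]], max_classes: int = 10) -> str:
    """Guess the ML problem type from the target column (hypothetical rule of thumb)."""
    if target is None:
        # No label to predict: look for groups in the inputs instead.
        return "clustering"
    if any(isinstance(v, (str, bool)) for v in target):
        # Text or yes/no labels are categories.
        return "classification"
    distinct = set(target)
    if len(distinct) <= max_classes and all(float(v).is_integer() for v in distinct):
        # A handful of whole-number codes, e.g. 0 = repaid, 1 = defaulted.
        return "classification"
    # Any other numeric target, such as sales or profit, is a number to predict.
    return "regression"


print(infer_problem_type(["positive", "negative", "positive"]))  # classification: tweet sentiment
print(infer_problem_type([0, 1, 1, 0]))                          # classification: loan default flag
print(infer_problem_type([1250.5, 980.0, 1410.25]))              # regression: quarterly sales
print(infer_problem_type(None))                                  # clustering: customer segments
```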
Depending on the use case, machine learning can be broadly divided into supervised learning and unsupervised learning.
Supervised Learning
In supervised learning, a computer learns by example. A supervised learning algorithm takes a known set of input data (training examples) together with the known responses to that data (labels) and trains a model to generate reasonable predictions for the response to new data (test data).
Consider a bank that wants to use machine learning to predict whether a customer will default on a loan. The training data consists of past customers’ credit card transactions over the last year, each labeled with whether that customer defaulted. Since the goal is to predict a category (will default or won’t default), this is a classic supervised classification problem.
Algorithms such as linear regression, logistic regression, decision trees, and support vector machines are used for supervised learning.
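To make “learning by example” concrete, here is a minimal logistic regression, one of the algorithms named above, trained with plain batch gradient descent on a tiny, made-up version of the loan-default problem. The two features (credit utilization and number of late payments) and all values are hypothetical stand-ins for features a bank might derive from a year of card transactions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and a bias with batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Predicted probability minus the known label (0 or 1).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient over all training examples.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up training examples: [credit utilization (0-1), late payments last year]
X_train = [[0.1, 0], [0.2, 0], [0.3, 1], [0.7, 2], [0.8, 3], [0.9, 4]]
y_train = [0, 0, 0, 1, 1, 1]  # labels: 1 = defaulted, 0 = repaid

w, b = train_logistic_regression(X_train, y_train)

# New applicants play the role of test data.
for applicant in ([0.15, 0], [0.85, 3]):
    p = predict_proba(w, b, applicant)
    print(applicant, "default" if p >= 0.5 else "no default", f"(p={p:.2f})")
```

The model never sees a rule like “three late payments means risk”; it infers the weights from the labeled examples, which is exactly what distinguishes supervised learning from explicit programming.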
Unsupervised Learning
Unsupervised learning, by contrast, is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision. Unlike supervised learning, there is no variable to predict; instead, we try to uncover hidden patterns in the data so that we can identify clusters or groups within it. For example, a company that wants to segment its customers into groups by distinct characteristics, such as age and income group, to better understand them has an unsupervised learning use case.
Algorithms like k-means clustering are used for unsupervised learning.
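Here is a minimal sketch of k-means (Lloyd’s algorithm) applied to the customer-segmentation example. The customer data (age, annual income in thousands) is made up, and the deterministic farthest-first initialization is a simplification chosen so the result is reproducible; library implementations typically use randomized k-means++ initialization instead.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with deterministic farthest-first initialization."""
    # Farthest-first init: start with the first point, then repeatedly add
    # the point that is farthest from every centroid chosen so far.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


# Made-up customers: (age, annual income in thousands). No labels are given.
customers = [(22, 25), (25, 30), (23, 28), (27, 32),
             (45, 80), (50, 90), (48, 85), (52, 95)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

In practice you would standardize the features first: k-means compares raw distances, so a column with a larger numeric range (here, income) would otherwise dominate the grouping.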
AI Use Cases
AI-backed technologies are being used across many industries, including finance, healthcare, telecom, marketing and retail, IoT, and manufacturing. AI adoption is quickly changing the business landscape, even in traditionally conservative sectors.
A Typical Machine Learning Workflow
A typical machine learning workflow can be fairly rich and complex.
It includes the following phases:
- Exploring and preparing the data
- Selecting the best models, then tuning and optimizing them
- Deploying the model
- Making predictions
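The four phases can be sketched end to end in a few lines of plain Python. Everything here is a hypothetical toy: mean imputation stands in for data preparation, two candidate models (a constant baseline and a least-squares line) stand in for model selection, and a JSON string stands in for a deployed model artifact.

```python
import json
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Model = Dict[str, float]


def prepare(xs: Sequence[Optional[float]]) -> List[float]:
    """Prepare: replace missing values with the mean of the observed ones."""
    fill = mean(x for x in xs if x is not None)
    return [fill if x is None else x for x in xs]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    # Baseline candidate: always predict the average target.
    return {"slope": 0.0, "intercept": mean(ys)}


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    # Candidate: ordinary least squares for a single feature.
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return {"slope": slope, "intercept": my - slope * mx}


def predict(model: Model, x: float) -> float:
    return model["slope"] * x + model["intercept"]


def mse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    return mean((predict(model, x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(train, valid) -> Tuple[str, Model]:
    """Select: fit each candidate on training data, keep the best on validation data."""
    candidates = {"mean": fit_mean(*train), "line": fit_line(*train)}
    best = min(candidates, key=lambda name: mse(candidates[name], *valid))
    return best, candidates[best]


# Made-up data: ad spend (feature, one value missing) and sales (target).
spend = prepare([1.0, 2.0, None, 4.0, 5.0, 6.0])
sales = [3.1, 5.0, 7.1, 8.9, 11.2, 13.0]
train, valid = (spend[:4], sales[:4]), (spend[4:], sales[4:])

name, model = select_model(train, valid)
artifact = json.dumps({"name": name, **model})  # "deploy": serialize the chosen model
deployed = json.loads(artifact)                 # e.g. loaded later by a scoring service
print(name, round(predict(deployed, 7.0), 2))   # make a prediction for new data
```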
Each of these phases can be complicated and demands considerable time and effort from skilled people. This is where automatic machine learning comes in: it makes the process simpler and less complicated for the user.
Automatic Machine Learning
Automated machine learning (AutoML) is fundamentally changing ML-based solutions today by enabling people from diverse backgrounds to use machine learning models to address complex problems.
AutoML automates the end-to-end process of applying machine learning to real-world problems. It aims to automate as many steps of an ML pipeline as possible, with minimal human effort and without compromising the model’s performance.
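Much of what AutoML automates is a search: try many candidate configurations, score each one on held-out validation data, and keep the best. The sketch below shows that loop in its simplest form, an exhaustive grid search over two k-nearest-neighbors hyperparameters. The data, search space, and helper names are hypothetical; production AutoML systems search far larger spaces with smarter strategies than trying every combination.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[Tuple[float, float], int]


def grid_search(space: Dict[str, Sequence], score: Callable[[dict], float]) -> Tuple[dict, float]:
    """Evaluate every combination in the search space; keep the best-scoring one."""
    best_cfg: dict = {}
    best_score = -math.inf
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        s = score(cfg)
        if s > best_score:  # strict '>' keeps the first of any tied configurations
            best_cfg, best_score = cfg, s
    return best_cfg, best_score


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(p - q) for p, q in zip(a, b))


def knn_predict(train: List[Example], x: Tuple[float, float], k: int, metric: str) -> int:
    """Majority vote among the k nearest training points."""
    dist = math.dist if metric == "euclidean" else manhattan
    nearest = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


# Made-up data: two groups of points, plus one mislabeled "noisy" training point.
train: List[Example] = [((1, 1), 0), ((1, 2), 0), ((2, 1), 0), ((2, 2), 0),
                        ((6, 6), 1), ((6, 7), 1), ((7, 6), 1), ((7, 7), 1),
                        ((3, 3), 1)]
valid: List[Example] = [((3.2, 3.1), 0), ((6.5, 6.5), 1), ((1.5, 1.5), 0)]


def validation_accuracy(cfg: dict) -> float:
    hits = sum(knn_predict(train, x, cfg["k"], cfg["metric"]) == y for x, y in valid)
    return hits / len(valid)


space = {"k": [1, 3, 5], "metric": ["euclidean", "manhattan"]}
best_cfg, best_score = grid_search(space, validation_accuracy)
print(best_cfg, best_score)
```

Because of the noisy point, k = 1 copies its wrong label for a nearby validation point, so the search settles on a larger k. Choosing such settings is the kind of tuning decision AutoML takes off the user’s hands.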
H2O Driverless AI: The Platform to make your own AI
H2O Driverless AI is an artificial intelligence (AI) platform for automatic machine learning. Driverless AI automates some of the most challenging data science and machine learning workflows, such as feature engineering, model validation, model tuning, model selection, and model deployment. It also provides insights and interpretability, and it can be customized and extended with a user’s own AI recipes.
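Feature engineering means deriving new, more informative inputs from raw columns. A very simple automated version, sketched below, generates pairwise products and ratios of numeric columns and ranks every candidate by its absolute correlation with the target. This filter approach, the helper names, and the loan data are assumptions for illustration; this is not a description of Driverless AI’s own feature-engineering recipes.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def engineer_features(columns: Dict[str, List[float]], target: Sequence[float],
                      top_n: int = 3) -> List[Tuple[str, float]]:
    """Generate pairwise products and ratios, then rank all features by |correlation|."""
    candidates = dict(columns)
    for a, b in itertools.combinations(columns, 2):
        candidates[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    for a, b in itertools.permutations(columns, 2):
        if all(v != 0 for v in columns[b]):  # skip ratios that would divide by zero
            candidates[f"{a}/{b}"] = [x / y for x, y in zip(columns[a], columns[b])]
    ranked = sorted(((name, abs(pearson(vals, target))) for name, vals in candidates.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up loan data: neither raw column separates defaulters as well as their ratio.
columns = {
    "debt": [10, 40, 50, 50, 160, 600],
    "income": [100, 200, 1000, 100, 200, 1000],
}
defaulted = [0, 0, 0, 1, 1, 1]

ranking = engineer_features(columns, defaulted)
for name, score in ranking:
    print(f"{name:12s} |r| = {score:.2f}")
```

Here the engineered debt-to-income ratio ranks above either raw column, the kind of feature an automated search can surface without anyone having to think of it first.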
At a very high level, here’s how Driverless AI works:
- Driverless AI is data source agnostic. It can ingest data from a wide range of sources, including Hadoop, Snowflake, S3 object storage, and Google BigQuery.
- Automatic Visualization produces plots, graphics, and charts that help you understand the shape of the data, outliers, missing values, and so on. This is where a data scientist can quickly spot issues such as bias in the data. In effect, Automatic Visualization jump-starts exploratory data analysis (EDA).
- Based on the problem type, Driverless AI uses recipes to perform advanced feature engineering automatically, then iterates across thousands of modeling choices, tuning parameters and searching for the best-fitting model.
- Finally, Driverless AI can build an automatic scoring pipeline: it generates Python and Java code for low-latency scoring of the model in production. Imagine taking that scoring pipeline and propagating it to every edge device, such as smartphones or cars, to generate value continuously.
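The idea behind an automatic scoring pipeline is that the trained model is turned into standalone code that a production system can run without the training environment. A toy sketch of that principle, assuming a simple logistic model: `generate_scoring_code` is a hypothetical helper that writes the model’s weights into the source of a plain Python function. Driverless AI’s real scoring pipelines are far more complete artifacts; this only illustrates the idea.

```python
from typing import Dict


def generate_scoring_code(weights: Dict[str, float], bias: float,
                          func_name: str = "score") -> str:
    """Emit standalone Python source that scores one record with a logistic model."""
    terms = " + ".join(f"{w!r} * row[{name!r}]" for name, w in weights.items()) or "0.0"
    return (
        "import math\n"
        "\n"
        f"def {func_name}(row):\n"
        f"    # Auto-generated logistic scorer over {len(weights)} feature(s).\n"
        f"    z = {bias!r} + {terms}\n"
        "    return 1.0 / (1.0 + math.exp(-z))\n"
    )


# Hypothetical weights, standing in for a model produced by a training run.
source = generate_scoring_code({"utilization": 2.0, "late_payments": 1.5}, bias=-4.0)
print(source)

# "Deploying" here just means loading the generated code; no training library is needed.
namespace: dict = {}
exec(source, namespace)
print(round(namespace["score"]({"utilization": 0.85, "late_payments": 3}), 3))
```

`exec` is used only to keep the demo in one file; in practice the generated source would be written to a module and shipped with the application. Because it depends only on the standard `math` module, it can run anywhere a Python interpreter does.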
Driverless AI also has a Machine Learning Interpretability feature, which gives the data scientist reason codes and insight into what model was generated and which features were used to build it. Automatic documentation provides an in-depth explanation of the entire feature engineering process. This explainability helps build trust in AI. The entire process is driven through a graphical user interface, so even a novice data scientist can be productive immediately. A short end-to-end demo of H2O Driverless AI, just over six minutes long, covers: (1) Data Visualization, (2) an AI experiment, (3) Machine Learning Interpretability, (4) one-click deployment, and (5) Bring Your Own Recipe.
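For a linear or logistic model, one simple way to produce reason codes is to multiply each feature’s weight by how far the record’s value sits from a baseline, such as the average applicant. Because the model is linear in its inputs, these contributions add up exactly to the difference in log-odds between the record and the baseline. The sketch below assumes hypothetical weights and a hypothetical baseline; it is not a description of how Driverless AI computes its reason codes, and its interpretability tooling is more general than this.

```python
from typing import Dict, List, Tuple


def reason_codes(weights: Dict[str, float], row: Dict[str, float],
                 baseline: Dict[str, float], top_n: int = 2) -> List[Tuple[str, float]]:
    """Rank features by how far each one moves this row's log-odds from the baseline's."""
    contributions = {name: w * (row[name] - baseline[name]) for name, w in weights.items()}
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Hypothetical logistic-model weights and an "average applicant" baseline.
weights = {"utilization": 2.0, "late_payments": 1.5, "years_as_customer": -0.3}
baseline = {"utilization": 0.4, "late_payments": 1.0, "years_as_customer": 5.0}
applicant = {"utilization": 0.85, "late_payments": 3.0, "years_as_customer": 1.0}

for feature, contribution in reason_codes(weights, applicant, baseline):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{feature} {direction} the default log-odds by {abs(contribution):.2f}")
```

For nonlinear models this simple decomposition no longer holds, which is why model-agnostic explanation techniques exist.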
H2O Driverless AI gives companies a data science platform that addresses a wide variety of use cases in every industry. With Driverless AI, every company can become an AI company. It is well suited to driving enterprise AI adoption because it addresses the Talent, Time, and Trust challenges and lets you make your own AI with a customizable, extensible platform.
Leveraging automatic machine learning can help accelerate and scale a company’s AI efforts on its journey to becoming an AI company. But what if a company has intellectual property (IP) in particular feature engineering or scoring methods that could help? Traditional AutoML performs a fixed set of ML optimizations. Expert data scientists want the benefits of automation without losing the ability to influence the optimization. They want to have their cake (benefit from the optimization) and eat it too (influence the optimization). An extensible AutoML platform provides both. Expert data scientists at Goldman Sachs, for example, leverage an open and extensible AutoML platform in this way.
Where Do You Go from Here?
By reflecting on the key points above and discussing them with your team, you can get a sense of where to begin your company’s journey. Identify the problems you are currently trying to solve and consider how machine learning and AI could give you leverage. Learn by doing, and explore the H2O.ai Tutorials.