Machine Learning: Algorithm Overview
January 15, 2019By Jason M. Pittman, D.Sc.
Welcome back! In this ongoing series of blog posts, my goal is to help demystify machine learning. It’s a topic of growing importance, and also one that intimidates many people. I hope to change that.
Recently, we looked at decision trees as a basic form of machine learning. As this series continues, I will walk you through a simple decision tree programming example. I also want to show you some math.
However, there’s a bit more terminology we need to pick up beforehand. For example, in our last discussion, the decision trees we looked at were examples of supervised algorithms. Not all machine learning is so; there are unsupervised algorithms too.
What is the difference? Why use one or the other? Are there other types of algorithms available? Let’s find out!
Types of Problem
The best place to start is by defining the type of problem we’re attempting to solve. Let’s recall that machine learning problems are essentially statistics problems. Okay, we know statistics are great tools for two things: making predictions based on data and organizing or describing data.
Thus, we can categorize the types of problems machine learning applies to by understanding whether we’re making a prediction as an answer or organizing data as an answer. That’s a broad take, anyway.
Alternatively, we can codify types of problems according to data. Where does the data come from? How do we consume the data? What do we want from the data? These are the kinds of questions we ask. For example, using historic data to make a prediction is different from consuming unstructured data to produce an organized structure.
Also, we can look at the scale of data insofar as the data are discrete (e.g. binary) or continuous (e.g. real numbers). Some problems can be defined through discrete inputs and outputs such as liking or not liking a course. Other problems necessitate a continuous range of data such as when we’re analyzing trends. More formally, we can say machine learning deals with specific types of problems such as predictions, organizational, descriptions, rankings, and so forth.
Types of algorithm
Fortunately for us, we have a toolbox full of algorithms to match virtually any type of problem. In fact, in many cases we have a many-to-one relationship with several algorithms potentially working with a single type of problem. Thus, selecting an appropriate algorithm is paramount because we want optimal results, not just results.
Along such lines, the two general types of algorithms -- classifiers and regression -- have a variety of specific implementations. For instance, classifiers can be instantiated from decision trees, naive Bayes classifiers, vector machines, and neural networks. As well, regression can take the form of linear, non-linear, or Bayesian regression. By the way, these are just the supervised algorithms. Keep in mind, because these are supervised, we need training data to feed them. As such, these algorithms are perfect for making predictions.
Conversely, if don’t have training data, we can turn to unsupervised algorithms. Although we don’t know what the output will look like, algorithms such as unsupervised decision trees, clustering algorithms, component (factor) analyses, and the like are well-adapted to organizing and structuring ambigious data into comprehensible models.
Moving forward, we will explore all of these algorithms in detail, including programming and datasets. Buckle up friends -- we’re headed to the example zone!