Now that the new year is underway, I’m finding it hard to get back into the swing of things. Though I did write through the holidays, I slowed down as I had to spend time with family and do the whole…
However, for a better understanding of how a supervised algorithm works, let’s define some basic terms useful to build our algorithms.
How we know the fuel of every machine learning algorithm, is the data, a supervised learning algorithm used as an input a dataset as a collection of labelled data
More formally we can describe the dataset of the labelled data in this way:
Each element of xi among N is called a feature vector. A feature vector is a vector in which each dimension, j=1,…., N contains a value that describes the example in analysis. This value is called and is denoted by the syntax.
The goal of supervised learning is to predict the data in input based on the feature present on the data and then classify the data itself. Figure 1 shows how a feature work, in the figure we see the labelled data, Dog and the different weight, for example, 18 lbs or 14 lbs. A feature is essentially the characteristic connected with the data, the feature is important when we want to classify.
When we talk of Supervised Learning, the idea is generally to solve two major types of problems:
A classification problem is a type of problem where machine learning tries to classify the data based on the previous observation.
A typical example is a spam filter, the filter learns the common rules to classify the mail as spam or not, this type of classification is called binary classification.
Another type of classification is called multiclass classification, this type of classification can be used for example to recognize different types of animals, we feed the algorithm with different pictures and the algorithm classify the animal on unseen data.
The essence of the Classification is essential to assign a categorical label to unordered and unseen data.
There is another important task connected with Supervised Learning, this is called regression.
Regression analysis is used to predict a continuous outcome variable based on a prediction, if we want to be more formal, a regression analysis is for finding a correlation between some independent variable and one dependent variable. A classical example of regression analysis is the prediction of the price of a house based for example on the square feet or the location.
We now have an idea about supervised learning, and what type of problem we can solve with this technique, for a better understanding of how to use let’s see the major algorithm used with supervised learning.
The KNN or K-Nearest Neighbor is a simples algorithm used to classify data based on the similarity of the data. In the algorithm, the ‘k’ is used to identify the value of neighbour near the data point, this is used to analyze the value near the data point and then classify the data, new data are classified by the majority of vote of the neighbour, the number of neighbours used is identified by the value k.
Naive Bayes classifier is a probabilistic algorithm, this is based on the Bayes Theorem and is used to solve a classification problem. This type of algorithm is essentially a probabilistic classifier, this means the classification is done using a probability of an event using the Bayes Theorem:
Naive Bayes is used for spam classification or text classification.
Decision Trees are used in both the problem from the supervised learning, Classification and Regression, for this reason, are sometimes called Classification And Regression Trees (CART).
In the Decision Trees, the prediction of the response is made directly by learning the feature derived from the features themselves.
In decision analysis, the decision tree can be used to visualize the data and explicitly represent decision and decision making.
Linear Regression is one of the most basic algorithms for Machine Learning, with the linear regression algorithm, the model tries to find the best linear between dependent and independent variables.
Linear regression can be split into two main types:
In both cases, the model tries to find the correlation between the independent variable and the dependent one.
Support Vector Machine is another algorithm that can be used in both cases, classification and regression.
The SVM algorithm plot each data item as a point in the n-dimensional space, where n is the number of features we have in the model, each point represent a coordinate in the space.
With the coordinate defined, we use classification to find a hyper-plane used to differentiate the data.
In this article, we introduce the supervised learning algorithm, this is the first algorithm used in Artificial Intelligence and Machine Learning.
In the next articles, we start to see how to implement some of the algorithms presented and we see how this algorithm can be used to build our Machine Learning model.
If you are interested to go deeper into the algorithm I can suggest some amazing books:
If you like the article feel free to add a comment and ask any question you can have
Fertility issues can be a difficult and emotional journey for couples who are trying to conceive. Fortunately, there are many fertility clinics that offer specialized treatments and support to help…
There is a spiritual path of silence. It is a path that has tremendous power. Silence leads us to God consciousness. Silence is central to meditation. It is only by learning to quiet the mind and…
Food insecurity rates are rising rapidly in Marion County. According to the Service Learning Assistant at the IUPUI Campus Kitchen, Ashabul Alam, “43 percent of students at IUPUI are now classified…