Maximal Margin Separators
A Maximal Margin Separator (in a 2-dimensional space) is a hyperplane (in this case a line) that completely separates 2 classes of observations while leaving as much space as possible between the line and the nearest observations. These nearest observations are the support vectors. In the plot below, the support vectors are the circled points. In this instance, the support vectors come in evenly spaced pairs, one point from each class, so the maximal margin separator is the line that falls halfway between the two points of each pair and runs parallel to the line through each class's support vectors. (In an instance where there are 3 support vectors, the separator will be parallel to the line through the 2 support vectors on the same side.)
Finding the line
Since we have a relatively simple plot and we know which points are our support vectors, we can find the equation for the hyperplane by first finding the slope-intercept equation for the line:
x2 = m*x1 + b
Compare blue cross point (2, 2) to red circle point (2, 3). Notice that a point halfway (vertically) between them would be (2, 2.5). This is point 1 on our maximal margin separator. Compare blue cross observation (4, 6) to red circle observation (4, 7). Note that a point halfway between those two points would be (4, 6.5). This is point 2 on our maximal margin separator.
We can now compute the slope by dividing the change in x2 between the two points by the change in x1. That works out to (6.5 - 2.5)/(4 - 2) = 4/2 = 2. That’s our slope and we can sub that in for m in the equation:
x2 = 2 * x1 + b
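The midpoint and slope arithmetic above can be reproduced in R. This is a minimal sketch that assumes only the four support vectors named in the text; the variable names blue, red, mid and m are our own.

```r
# The four support vectors from the plot, one row per point (x1, x2)
blue <- rbind(c(2, 2), c(4, 6))   # blue crosses
red  <- rbind(c(2, 3), c(4, 7))   # red circles

# Each blue cross sits directly below a red circle, so the midpoint of
# each vertical pair is a point on the maximal margin separator
mid <- (blue + red) / 2
print(mid)

# Slope = change in x2 divided by change in x1 between the two midpoints
m <- (mid[2, 2] - mid[1, 2]) / (mid[2, 1] - mid[1, 1])
m
# [1] 2
```

Averaging the two matrices row by row gives both midpoints at once, (2, 2.5) and (4, 6.5), which are the two points used for the slope.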
We know what our points are. We can sub in either one to find our intercept (b). Subbing in the point at (4, 6.5), we get:
6.5 = 2 * 4 + b
or
6.5 = 8 + b
We can subtract 8 from both sides to get b:
6.5 - 8 = 8 + b - 8
-1.5 = b
So now we know that our line equation is:
x2 = 2 * x1 + -1.5
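The same intercept falls out of a short R calculation, since rearranging x2 = m * x1 + b gives b = x2 - m * x1 for any point on the line. This sketch reuses the midpoints and slope from the text; the helper on_line is hypothetical, introduced only to confirm that both midpoints satisfy the finished equation.

```r
# Midpoints of the two support-vector pairs and the slope found above
p1 <- c(2, 2.5)
p2 <- c(4, 6.5)
m  <- 2

# Rearranging x2 = m * x1 + b gives b = x2 - m * x1
b <- p2[2] - m * p2[1]
b
# [1] -1.5

# Hypothetical helper: TRUE when point p satisfies x2 = m * x1 + b
on_line <- function(p, m, b) isTRUE(all.equal(p[2], m * p[1] + b))

# Both midpoints lie on the line, so either one gives the same intercept
on_line(p1, m, b)
# [1] TRUE
on_line(p2, m, b)
# [1] TRUE
```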
Equation for a hyperplane
A hyperplane equation looks like this:
beta0 + (beta1 * x1) + (beta2 * x2) = 0
with the caveat that (beta1^2 + beta2^2) = 1
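Notice that the hyperplane equation has to equal zero: the hyperplane is exactly the set of points for which the left-hand side equals zero. Any point that is not on the hyperplane gives a nonzero value, positive for points on one side of the hyperplane and negative for points on the other. Which side comes out positive depends on the signs of the betas. We can then classify our points according to whether they give a positive or a negative value.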
Let’s take this in steps:
First let’s fill in what we know from our point on the hyperplane and our linear equation. beta0 is our intercept, so fill that in.
-1.5 + beta1 x1 + beta2 x2 = 0
beta1 is our slope, so fill that in.
-1.5 + 2 x1 + beta2 x2 = 0
We know a point on our hyperplane is (4, 6.5), so we can fill in x1 and x2:
-1.5 + 2 * 4 + beta2 * 6.5 = 0
Now we have to use algebra to solve for beta2. If you struggle with algebra, meet MathPapa’s Algebra Calculator. It will be your new best friend.
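In this case, -1.5 + 8 + 6.5 * beta2 = 0 simplifies to 6.5 * beta2 = -6.5, so beta2 = -1. (As a sanity check, that is exactly what we get by moving every term of x2 = 2 * x1 + -1.5 to the left-hand side.)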
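If you would rather let R do the algebra, the equation is linear in beta2, so it can be rearranged directly to beta2 = -(beta0 + beta1 * x1) / x2. A minimal sketch using the point (4, 6.5) from the text; the rearrangement only works because x2 is not zero.

```r
beta0 <- -1.5   # intercept
beta1 <- 2      # slope
x1 <- 4         # known point on the hyperplane
x2 <- 6.5

# beta0 + beta1 * x1 + beta2 * x2 = 0  =>  beta2 = -(beta0 + beta1 * x1) / x2
beta2 <- -(beta0 + beta1 * x1) / x2
beta2
# [1] -1
```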
Ignoring the caveat for now, our hyperplane equation is:
-1.5 + 2(x1) + -1(x2) = 0
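To see the sign rule in action, we can evaluate the left-hand side of this (not yet normalized) equation at each of the four support vectors. A minimal sketch; the function name f and the data frame layout are our own.

```r
# Left-hand side of the hyperplane equation -1.5 + 2*x1 + -1*x2
f <- function(x1, x2) -1.5 + 2 * x1 + -1 * x2

# The four support vectors: blue crosses below the line, red circles above
pts <- data.frame(
  class = c("blue", "blue", "red", "red"),
  x1    = c(2, 4, 2, 4),
  x2    = c(2, 6, 3, 7)
)

# Evaluate each point and label it by the sign of the result
pts$value <- f(pts$x1, pts$x2)
pts$side  <- ifelse(pts$value > 0, "positive", "negative")
pts
```

Both blue crosses give 0.5 and both red circles give -0.5. Because beta2 is negative here, the points above the line (the red circles) are the ones that come out negative.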
Dealing with the caveat
Now we have to deal with the caveat. We can find a scaling factor for all of our betas by taking the L2 norm (the Euclidean length) of beta1 and beta2. We’ll only use beta1 and beta2 to find our normalization factor, but we’ll then divide all three coefficients by it so that our entire equation continues to equal zero. You can prove to yourself that dividing (or multiplying) all the coefficients by the same factor works by running the following code (in R).
#set up our points
x1 = 4
x2 = 6.5
#set up w - this is a weight.
#We're going to scale our equation by the weight
w = 1
#set up the initial equation
eq = -1.5/w + (2/w)*x1 + (-1/w)*x2
#look at the initial value
eq
#loop from 1 to 10.
#We'll divide our betas by the weight.
#Eq will still equal zero (when rounded):
for (w in 1:10) {
  eq = -1.5/w + (2/w)*x1 + (-1/w)*x2
  cat("Loop ", w, " eq = ", eq, "\n")
}
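The loop above only checks a point that lies on the hyperplane. A related check: dividing every coefficient by the same positive number also leaves the sign of every point off the hyperplane unchanged, so the classification does not change either. A sketch, assuming the blue cross (4, 6) and the red circle (4, 7) as test points; the scale factors 1, 2.5 and 10 are arbitrary choices.

```r
betas <- c(-1.5, 2, -1)   # beta0, beta1, beta2
blue  <- c(1, 4, 6)       # leading 1 multiplies beta0
red   <- c(1, 4, 7)

for (w in c(1, 2.5, 10)) {
  scaled <- betas / w
  # sum(scaled * point) is beta0 + beta1*x1 + beta2*x2 with the scaled betas
  cat("w =", w,
      " blue:", sign(sum(scaled * blue)),
      " red:",  sign(sum(scaled * red)), "\n")
}
```

Every line prints blue as 1 and red as -1. A negative factor would flip both signs, which is why the scaling factor has to be positive; a norm of a nonzero vector always is.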
The formula for the L2 norm is:
||beta|| = sqrt(beta1^2 + beta2^2)
In our case that is:
||beta|| = sqrt(2^2 + (-1)^2) = sqrt(5)
||beta|| = 2.236068
Be careful with the parentheses around -1. In R, ^ binds more tightly than a leading minus sign, so -1^2 evaluates to -(1^2) = -1, and typing sqrt(2^2 + -1^2) silently gives sqrt(3) = 1.732051 instead of sqrt(5). The same trap applies whenever you square a negative coefficient in a check, so keep the parentheses there too.
You can prove to yourself that the caveat is met by checking your work:
w = sqrt(5)
2/w
-1/w
(0.8944272^2) + ((-0.4472136)^2)
The final hyperplane equation
Whew! You’re almost at the end. Now that we have our weight, we need to divide all three betas by it.
w = sqrt(5)
-1.5/w
2/w
-1/w
Our final hyperplane equation is:
-0.6708204 + 0.8944272(x1) + -0.4472136(x2) = 0
If we fill in our known point (4, 6.5) again, we can check our work:
x1 = 4
x2 = 6.5
#verify that this equals 0
round(-0.6708204 + 0.8944272*x1 + -0.4472136*x2)
#verify that this equals 1
round((0.8944272^2) + ((-0.4472136)^2))
How big is the margin
We have to do one final step to determine how big the margin is. Easy. We know which points lie on the margin: they are our support vectors above. Choose one. Let’s choose the red circle at (4, 7).
All we have to do is plug this point into our final hyperplane equation. Because beta1^2 + beta2^2 = 1, the value we get is the signed perpendicular distance from the point to the hyperplane. The red circle lies above the line, and because beta2 is negative, points above the line give negative values, so we multiply the result by -1 to get the positive margin.
x1 = 4
x2 = 7
-1*(-0.6708204 + 0.8944272*x1 + -0.4472136*x2)
## [1] 0.2236068
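Putting the whole calculation together, the sketch below normalizes the coefficients with vector arithmetic (which sidesteps the -1^2 precedence trap, because the minus sign is part of the stored value), checks the caveat, and computes the margin as the smallest distance from any support vector to the hyperplane. It uses only base R; the variable names are our own, and the only inputs are the unnormalized coefficients and the four support vectors from the text.

```r
# Unnormalized coefficients from the text: beta0, beta1, beta2
betas <- c(-1.5, 2, -1)

# L2 norm of (beta1, beta2) only; the minus sign is stored in the vector,
# so squaring element-wise gives (-1)^2 = 1, not -1
w <- sqrt(sum(betas[2:3]^2))
scaled <- betas / w

# The caveat: beta1^2 + beta2^2 should now equal 1
stopifnot(isTRUE(all.equal(sum(scaled[2:3]^2), 1)))

# Support vectors, one row per point (x1, x2)
sv <- rbind(c(2, 2), c(4, 6),   # blue crosses
            c(2, 3), c(4, 7))   # red circles

# With normalized betas, |beta0 + beta1*x1 + beta2*x2| is the
# perpendicular distance from each point to the hyperplane
d <- abs(scaled[1] + as.vector(sv %*% scaled[2:3]))
d

# The margin is the smallest of these distances
margin <- min(d)
margin
# [1] 0.2236068
```

All four support vectors sit at the same distance, 0.5 / sqrt(5), which is what we expect when the separator runs exactly halfway between each pair. Taking the absolute value avoids having to track which side of the hyperplane each point is on.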