Support Vector Machine
What is SVM?
- SVM is a supervised machine learning algorithm.
- It is mostly used for classification problems.
- However, it can also be used for regression problems.
- Each data point is represented as a point in n-dimensional space, with one coordinate per feature.
- Classification is done by finding a hyperplane that separates the classes.
- For two dimensions, the decision boundary $g(x) = w^T x + b = 0$ is a straight line.
- For three dimensions, the decision boundary $g(x) = w^T x + b = 0$ is a plane.
- For more than three dimensions, the decision boundary $g(x) = w^T x + b = 0$ is a hyperplane.
- w is a vector perpendicular (normal) to the hyperplane.
- w determines the orientation of the hyperplane in d-dimensional space, where d is the dimension of the feature vector.
- b, on the other hand, determines the position (offset) of the hyperplane.
- If $g(x_1) = w^T x_1 + b > 0$, then $x_1$ belongs to class $C_1$.
- If $g(x_2) = w^T x_2 + b < 0$, then $x_2$ belongs to class $C_2$.
- During training, w and b are adjusted, which moves and rotates the hyperplane, so that the training points fall on the side of their actual class (+ve or -ve).
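The sign rule for assigning a point to $C_1$ or $C_2$ can be sketched in a few lines of plain Python. This is a minimal illustration only: the values w = [1, 1] and b = -3 (the straight line x1 + x2 = 3) are hand-picked assumptions, no training is involved, and the helper names `g` and `classify` are hypothetical.

```python
def g(w, x, b):
    """Decision function g(x) = w^T x + b for plain Python lists of numbers."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Assign x to C1 if g(x) > 0, to C2 if g(x) < 0; g(x) == 0 means x is on the hyperplane."""
    value = g(w, x, b)
    if value > 0:
        return "C1"
    if value < 0:
        return "C2"
    return "on hyperplane"


# Hand-picked 2-D example: w = [1, 1], b = -3 describes the straight line x1 + x2 = 3.
w, b = [1.0, 1.0], -3.0
print(classify(w, [4.0, 2.0], b))  # g = 4 + 2 - 3 = 3 > 0, prints C1
print(classify(w, [1.0, 0.5], b))  # g = 1 + 0.5 - 3 = -1.5 < 0, prints C2
```

The same functions work unchanged in three or more dimensions, as long as `w` and `x` have the same length.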
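- Beyond having the correct sign, we want a margin: every training point should lie at least a distance $\gamma$ from the hyperplane.
For a hyperplane $w^T x + b = 0$, the signed distance of a point $x$ from the hyperplane is
$\frac{w^T x + b}{\lVert w \rVert}$
For a point in $C_1$ we want this distance to be at least $\gamma$ (and, symmetrically, at most $-\gamma$ for a point in $C_2$):
$\frac{w^T x + b}{\lVert w \rVert} \ge \gamma \;\Rightarrow\; w^T x + b \ge \gamma \lVert w \rVert$
Multiplying $w$ and $b$ by the same positive constant does not change the hyperplane, so we can fix the scale such that $\gamma \lVert w \rVert = 1$. Therefore:
$w^T x + b \ge 1$ (if $x$ belongs to $C_1$)
$w^T x + b \le -1$ (if $x$ belongs to $C_2$)
Both conditions combine into $y_i (w^T x_i + b) \ge 1$, where $y_i \in \{+1, -1\}$ is the class label of $x_i$.
If $y_i (w^T x_i + b) = 1$, then $x_i$ lies exactly on the margin boundary and is called a support vector.
Since $\gamma = 1 / \lVert w \rVert$, maximizing the margin is equivalent to minimizing $\lVert w \rVert$ (in practice $\frac{1}{2} \lVert w \rVert^2$) subject to these constraints.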
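To make the constraint $y_i (w^T x_i + b) \ge 1$ concrete, the plain-Python sketch below checks it on four made-up 2-D points and picks out the support vectors. The hyperplane w = [0.5, 0.5], b = -2 (the line x1 + x2 = 4) is chosen by hand rather than learned; for these particular points it happens to be the maximum-margin hyperplane, because the closest pair (1, 1) and (3, 3) is exactly $2 / \lVert w \rVert = 2\sqrt{2}$ apart. The function names are hypothetical.

```python
import math


def g(w, x, b):
    """Decision function g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(sum(wi * wi for wi in w))


def signed_distance(w, x, b):
    """(w^T x + b) / ||w||: positive on the C1 side, negative on the C2 side."""
    return g(w, x, b) / norm(w)


def support_vectors(w, b, X, y, tol=1e-9):
    """Check y_i (w^T x_i + b) >= 1 for every point and return the points where it holds with equality."""
    svs = []
    for xi, yi in zip(X, y):
        functional_margin = yi * g(w, xi, b)
        if functional_margin < 1 - tol:
            raise ValueError(f"constraint violated for {xi}: {functional_margin}")
        if abs(functional_margin - 1) <= tol:
            svs.append(xi)
    return svs


# Made-up data: label +1 for C1 and -1 for C2.
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0), (0.0, 1.0)]
y = [+1, +1, -1, -1]
w, b = [0.5, 0.5], -2.0  # hand-picked canonical hyperplane x1 + x2 = 4

for xi in X:
    print(xi, round(signed_distance(w, xi, b), 3))
print("support vectors:", support_vectors(w, b, X, y))
print("margin width 2/||w||:", round(2 / norm(w), 3))
```

Running it reports (3.0, 3.0) and (1.0, 1.0) as the support vectors and a margin width of about 2.828; the other two points satisfy the constraint with room to spare.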
- Kernel: a function that takes the low-dimensional input space and (implicitly) maps it to a higher-dimensional space, so that a separating hyperplane can be found between the classes there. This can turn a problem that is not linearly separable into one that is. Examples: "linear", "poly", "sigmoid" and "radial basis" (RBF).
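A small plain-Python sketch of the idea, assuming the degree-2 polynomial kernel $K(x, z) = (x^T z)^2$ on 2-D inputs (chosen only because its feature map is easy to write out). The kernel value computed in the 2-D input space equals an ordinary dot product after the explicit map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and in that 3-D space a plane separates made-up points inside a circle from points around them, which corresponds to a circular boundary back in 2-D. The data and the names `phi` and `poly_kernel` are illustrative.

```python
import math


def dot(u, v):
    """Ordinary dot product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """K(x, z) = (x^T z)^2, computed without leaving the 2-D input space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z))    # (1*3 + 2*(-1))^2 = 1.0
print(dot(phi(x), phi(z)))  # same value (up to rounding) via the 3-D feature space

# Inner points (C2) lie inside the triangle formed by the outer points (C1),
# so no straight line separates them in 2-D.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]
outer = [(2.0, 0.0), (0.0, 2.5), (-2.0, -2.0)]

# In feature space, the plane w^T phi(x) + b = 0 with w = (1, 0, 1), b = -2
# is x1^2 + x2^2 = 2, i.e. a circle of radius sqrt(2) in the input space.
w, b = (1.0, 0.0, 1.0), -2.0
for p in inner + outer:
    side = "C1" if dot(w, phi(p)) + b > 0 else "C2"
    print(p, side)
```

Practical SVM implementations do not build `phi` explicitly (for the RBF kernel the feature space is infinite-dimensional); they only evaluate $K(x, z)$, which is what makes the kernel trick cheap.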
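- Gamma: a kernel coefficient (for example of the RBF kernel), not to be confused with the margin $\gamma$ above. It controls how far the influence of a single training example reaches. As gamma increases, the model tries to fit the training dataset exactly, which causes overfitting and a higher generalization error.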
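For the RBF kernel, commonly written as $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$, the effect of gamma can be seen directly: the similarity between two points at a fixed distance shrinks quickly as gamma grows, so each training point only influences a small neighbourhood and the decision boundary can wrap tightly around individual points. The plain-Python sketch below assumes this RBF form; the function name and the gamma values are illustrative.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (0.0, 0.0), (1.0, 0.0)  # two points at distance 1
for gamma in (0.1, 1.0, 10.0):
    # Larger gamma -> smaller similarity -> each training point has a more local influence.
    print(f"gamma={gamma:>5}: K(x, z) = {rbf_kernel(x, z, gamma):.5f}")
```

The similarity drops from about 0.905 (gamma = 0.1) to 0.368 (gamma = 1) to roughly 0.000045 (gamma = 10).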
- C: also known as the penalty parameter. It controls the trade-off between a smooth decision boundary (small C) and classifying the training points correctly (large C).
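The trade-off controlled by C is usually expressed through the soft-margin objective $\frac{1}{2} \lVert w \rVert^2 + C \sum_i \max(0,\, 1 - y_i (w^T x_i + b))$: the first term rewards a wide margin (a smooth boundary), the second (the hinge loss) penalizes points that violate $y_i (w^T x_i + b) \ge 1$. The plain-Python sketch below evaluates this objective for two hand-picked 1-D hyperplanes on made-up data containing one outlier: hyperplane A keeps a wide margin and misclassifies the outlier, hyperplane B classifies every point correctly with a much narrower margin.

```python
def g(w, x, b):
    """Decision function g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y_i (w^T x_i + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - yi * g(w, xi, b)) for xi, yi in zip(X, y))
    return regularizer + C * hinge


# Made-up 1-D data; the last point is an outlier labelled +1 among the -1 points.
X = [[-2.0], [-1.0], [1.0], [2.0], [-0.5]]
y = [-1, -1, +1, +1, +1]

A = ([1.0], 0.0)  # boundary at x = 0: wide margin, misclassifies the outlier
B = ([4.0], 3.0)  # boundary at x = -0.75: narrow margin, every point correct

for C in (1.0, 10.0):
    obj_a = soft_margin_objective(*A, X, y, C)
    obj_b = soft_margin_objective(*B, X, y, C)
    better = "A (smooth, wide margin)" if obj_a < obj_b else "B (fits every training point)"
    print(f"C={C}: objective A={obj_a}, B={obj_b} -> prefers {better}")
```

With C = 1 the objective favours A (2.0 vs 8.0); with C = 10 it favours B (15.5 vs 8.0). Between these two candidates the switch happens at C = 5, where $0.5 + 1.5\,C = 8$.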
- Cross-validation scores are the standard way to find an effective combination of these parameters (kernel, gamma, C) while avoiding overfitting.
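A minimal sketch of choosing C by k-fold cross-validation, in plain Python. To stay self-contained it assumes a hypothetical `train_linear_svm` that runs simple full-batch subgradient descent on the soft-margin objective (practical libraries use dedicated solvers instead), made-up Gaussian data and an arbitrary candidate grid for C; with a kernel SVM the same loop would also run over gamma.

```python
import random


def g(w, x, b):
    """Decision function g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X, y, C, epochs=200, lr=0.01):
    """Hypothetical trainer: subgradient descent on 0.5*||w||^2 + C * sum(hinge losses)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of 0.5*||w||^2 is w itself
        for xi, yi in zip(X, y):
            if yi * g(w, xi, b) < 1:  # hinge loss is active for this point
                for j, xij in enumerate(xi):
                    grad_w[j] -= C * yi * xij
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, C, k=5):
    """Mean accuracy over k folds: train on k-1 folds, test on the held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        w, b = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx], C)
        correct = sum(1 for i in test_idx if (1 if g(w, X[i], b) >= 0 else -1) == y[i])
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Made-up data: two overlapping Gaussian clouds in 2-D.
rng = random.Random(42)
X = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(30)]
X += [(rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(30)]
y = [+1] * 30 + [-1] * 30

candidates = [0.01, 0.1, 1.0, 10.0]
scores = {C: cross_val_accuracy(X, y, C) for C in candidates}
best_C = max(scores, key=scores.get)  # ties go to the first (smallest) C in the list
print(scores, "best C:", best_C)
```

Each candidate is scored only on data it was not trained on, which is what protects the choice of C against overfitting to the training set.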
Advantages
- It works really well when there is a clear margin of separation between the classes.
- It is effective in high-dimensional spaces.
- It is still effective in cases where the number of dimensions is greater than the number of samples.
- It uses only a subset of the training points in the decision function (the support vectors), so it is also memory efficient.
Disadvantages
- It does not perform well on large datasets, because the required training time is high.
- It also does not perform well when the dataset is noisy, i.e. when the target classes overlap.
- SVM does not directly provide probability estimates; implementations such as scikit-learn calculate them using an expensive five-fold cross-validation.
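In scikit-learn, for instance, these probability estimates come from Platt scaling: a sigmoid $P(y = 1 \mid f) = 1 / (1 + \exp(A f + B))$ is fitted to the SVM decision values $f$, with cross-validation used internally to obtain those values. The plain-Python sketch below shows only the sigmoid-fitting step, by plain gradient descent on the log loss. It assumes decision values are already available (the numbers are made up) and omits both the cross-validation and the target smoothing of Platt's original method.

```python
import math


def fit_platt(decision_values, labels, lr=0.1, iters=5000):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on the mean log loss."""
    A, B = 0.0, 0.0
    n = len(decision_values)
    for _ in range(iters):
        grad_A = grad_B = 0.0
        for f, y in zip(decision_values, labels):
            t = 1.0 if y == 1 else 0.0               # target in {0, 1}
            p = 1.0 / (1.0 + math.exp(A * f + B))    # current estimate of P(y=1 | f)
            grad_A += (t - p) * f                    # d(log loss)/dA for this sample
            grad_B += t - p                          # d(log loss)/dB for this sample
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B


def predict_proba(f, A, B):
    """Map a decision value f to a probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(A * f + B))


# Made-up decision values with some overlap between the classes.
decision_values = [-2.0, -1.2, -0.4, 0.3, -0.2, 0.6, 1.1, 2.0]
labels = [-1, -1, -1, -1, +1, +1, +1, +1]
A, B = fit_platt(decision_values, labels)
for f in (-2.0, 0.0, 2.0):
    print(f, round(predict_proba(f, A, B), 3))
```

Because larger decision values go with the positive class here, the fitted A comes out negative and the predicted probability increases with f.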