- SVM (Support Vector Machine) is a supervised machine learning algorithm.
- It is mostly used for classification problems.
- However, it can also be used for regression problems.
- Each data point is represented as a point in n-dimensional space, with each feature value being a coordinate.
- Classification is done by finding the hyperplane that best separates the classes.

- For two dimensions: g(x) = w^{T}x + b (a straight line)
- For three dimensions: g(x) = w^{T}x + b (a plane)
- For more than three dimensions: g(x) = w^{T}x + b (a hyperplane)
- w is a vector perpendicular to the hyperplane.
- w represents the orientation of the hyperplane in d-dimensional space, where d is the dimension of the feature vector.
- b represents the position of the hyperplane.
- If g(x1) = w^{T}x1 + b > 0, then x1 belongs to class C1.
- If g(x2) = w^{T}x2 + b < 0, then x2 belongs to class C2.
- During training, w and b are adjusted so that the hyperplane shifts and each training point falls on the side of its actual class (+ve or -ve).
- g(x1) = w^{T}x1 + b > γ (gamma)
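The decision rule above can be sketched in a few lines of Python; the values of w and b here are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical trained parameters for a 2-D problem
w = np.array([2.0, -1.0])   # vector perpendicular to the hyperplane
b = -3.0                    # position of the hyperplane

def classify(x):
    """Return C1 if g(x) = w.x + b > 0, else C2."""
    g = np.dot(w, x) + b
    return "C1" if g > 0 else "C2"

print(classify(np.array([3.0, 1.0])))   # g = 6 - 1 - 3 = 2 > 0, so C1
print(classify(np.array([0.0, 2.0])))   # g = 0 - 2 - 3 = -5 < 0, so C2
```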

If I have a hyperplane w.x + b = 0, then the distance of a point x from the hyperplane is given by:

distance = (w.x + b) / ||w||
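A minimal sketch of this distance computation, again using made-up values for w and b:

```python
import numpy as np

w = np.array([3.0, 4.0])    # hypothetical hyperplane normal, ||w|| = 5
b = -10.0

def distance(x):
    """Signed distance of x from the hyperplane w.x + b = 0."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

print(distance(np.array([2.0, 6.0])))   # (6 + 24 - 10) / 5 = 4.0
```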

What we want is for this distance to be at least γ (gamma):

(w.x + b) / ||w|| >= γ

i.e. (w.x + b) >= γ ||w||

After scaling so that γ ||w|| = 1:

Therefore w.x + b >= 1 (if x belongs to C1)

w.x + b <= -1 (if x belongs to C2)

Both conditions combine into yi (w.xi + b) >= 1, where yi is the class label and takes the value +1 or -1.

If yi (w.xi + b) = 1, then xi lies exactly on the margin and is a support vector.
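This support-vector property can be checked with scikit-learn's `SVC`; the toy data below is made up, and a very large C is used to approximate the hard-margin formulation above:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (made up for illustration)
X = np.array([[1.0, 0.0], [2.0, 0.0], [1.0, 1.0],
              [5.0, 0.0], [6.0, 0.0], [5.0, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
# For every support vector, yi * (w . xi + b) should be close to 1
for sv, yi in zip(clf.support_vectors_, y[clf.support_]):
    print(yi * (np.dot(w, sv) + b))
```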

**Kernel:** A function that takes a low-dimensional input space and transforms it into a higher-dimensional space, so that a separating hyperplane can be found between the classes. It converts non-linearly-separable problems into linearly separable ones. E.g. "linear", "poly", "sigmoid", "rbf" (radial basis function).

**Gamma:** As the value of gamma increases, the model tries to fit the training dataset exactly. This causes generalization error and overfitting.

**C:** Also known as the penalty parameter. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.

- Cross-validation is the most reliable way to find an effective combination of these parameters and avoid overfitting.
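Such a cross-validated search over C, gamma, and the kernel can be sketched with scikit-learn's `GridSearchCV`; the grid values below are arbitrary examples, and the Iris dataset is used just as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Search over C and gamma with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10, 100],
              "gamma": [0.001, 0.01, 0.1, 1],
              "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # best C/gamma/kernel found by cross-validation
print(search.best_score_)    # mean cross-validated accuracy of that combination
```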

- It works really well when there is a clear margin of separation.
- It is effective in high dimensional spaces.
- It is effective in cases where number of dimensions is greater than the number of samples.
- It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.

- It doesn’t perform well on large data sets, because the required training time is high.
- It also doesn’t perform very well when the data set is noisy, i.e. the target classes overlap.
- SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.