Kernel SVM#

Note

Intuition: Transforming the original space to the feature space, solve the problem in the feature space.
For SVM, we can use the kernel trick.

Recall the Linear SVM in the original space:

\[\begin{split} \begin{equation} \begin{split} \underset{\alpha}{\max}\ &\sum_{i=1}^{n}\alpha_{i} - \sum_{i,j=1}^{n}y^{(i)}y^{(j)}\alpha_{i}\alpha_{j}\left \langle x^{(i)},x^{(j)} \right \rangle \\ \mbox{s.t.}\ &0 \le \alpha_{i}\le{C},\ i=1,...,n \\ &\sum_{i=1}^{n}\alpha_{i}y^{(i)}=0 \end{split} \end{equation} \end{split}\]

Let \(\phi : \mathbb{R}^{d} \to \mathbb{R}^{p}\) be a feature map, \(\left \langle x^{(i)},x^{(j)} \right \rangle\) change to \(\left \langle \phi(x^{(i)}),\phi(x^{(j)}) \right \rangle\), so SVM in the feature space:

\[\begin{split} \begin{equation} \begin{split} \underset{\alpha}{\max}\ &\sum_{i=1}^{n}\alpha_{i} - \sum_{i,j=1}^{n}y^{(i)}y^{(j)}\alpha_{i}\alpha_{j}\left \langle \phi(x^{(i)}),\phi(x^{(j)}) \right \rangle \\ \mbox{s.t.}\ &0 \le \alpha_{i}\le{C},\ i=1,...,n \\ &\sum_{i=1}^{n}\alpha_{i}y^{(i)}=0 \end{split} \end{equation} \end{split}\]

We only need to know kernel of the feature space \(K(x, z) = \left \langle \phi(x),\phi(z) \right \rangle\) with out knowing the form of \(\phi\).

When predicting:

\[w^{T}\phi(x) + b = \left ( \sum_{i=1}^{n}\alpha_{i}y^{(i)}\phi(x^{(i)})\right )^{T}x + b = \sum_{i=1}^{n}\alpha_{i}y^{(i)}\left \langle \phi(x^{(i)}),\phi(x) \right \rangle + b\]

Kernels#

Theorem(mercer): let \( K: \mathbb{R}^{d} \times \mathbb{R}^{d} \mapsto \mathbb{R}\). then for K be a valid kernel, it is necessary and sufficient that for any \(\left \{ x^{(1)},...,x^{(n)} \right \} \), the corresponding kernel matrix is symmetric positive semi-definite.

Linear kernel:

\[K(x, z) = x^{T}z\]

Polynomial kernel:

\[K(x, z) = (\gamma{x^{T}}z + r)^{k}\]

Gaussian RBF(Radial Basis Function):

\[K(x, z) = \exp(-\gamma\left \| x - z \right \|^{2} )\]

Examples#

import numpy as np

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma='auto'))
clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('svc', SVC(gamma='auto'))])
clf.predict([[-0.8, -1]])
array([1])