# Linear Regression
**Note:** Linear Regression = Linear Model + Mean Squared Error Loss. Linear regression also has nice geometric and probabilistic interpretations.
## Model
Suppose \(x \in \mathbb{R}^{d}\), \(y \in \mathbb{R}\). The linear model is:

\[
f(x) = w^{T}x + b, \qquad w \in \mathbb{R}^{d},\ b \in \mathbb{R}.
\]

For simplicity, absorb the bias into the parameter vector by letting:

\[
\theta = \begin{pmatrix} b \\ w \end{pmatrix}, \qquad x \leftarrow \begin{pmatrix} 1 \\ x \end{pmatrix}.
\]

Then the linear model can be written as:

\[
f(x) = \theta^{T}x.
\]
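A minimal NumPy sketch of this trick, using made-up synthetic data: prepending a constant \(1\) to each input lets a single vector `theta` carry both the weights and the bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3

X = rng.normal(size=(n, d))              # raw inputs, one example per row
X1 = np.hstack([np.ones((n, 1)), X])     # prepend a column of ones for the bias
theta = rng.normal(size=d + 1)           # theta[0] plays the role of b
y_hat = X1 @ theta                       # predictions f(x) = theta^T x for every example
print(y_hat.shape)                       # (100,)
```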
## Loss
The loss function is the mean squared error over the training set \(\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}\):

\[
J(\theta) = \frac{1}{2}\sum_{i=1}^{n}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2}.
\]
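A direct NumPy translation of this loss, with a tiny hand-checkable example (the `mse_loss` helper and the data are made up for illustration):

```python
import numpy as np

def mse_loss(theta, X, y):
    """0.5 * sum of squared residuals, matching J(theta) above."""
    residual = X @ theta - y
    return 0.5 * residual @ residual

# 3 examples, 2 features (bias column already included)
X = np.array([[1.0, 2.0], [1.0, 0.0], [1.0, -1.0]])
y = np.array([3.0, 1.0, 0.0])
theta = np.array([1.0, 1.0])
print(mse_loss(theta, X, y))  # predictions are [3, 1, 0], so the loss is 0.0
```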
## Update Rule
Gradient descent updates the parameters in the direction of the negative gradient, with learning rate \(\alpha\):

\[
\theta_{j} := \theta_{j} - \alpha\frac{\partial J(\theta)}{\partial \theta_{j}}.
\]

For linear regression, the gradient with respect to a single component \(\theta_{j}\) is:

\[
\frac{\partial J(\theta)}{\partial \theta_{j}} = \sum_{i=1}^{n}\left(\theta^{T}x^{(i)} - y^{(i)}\right)x^{(i)}_{j}.
\]

Combining all dimensions into one vector:

\[
\nabla_{\theta}J(\theta) = \sum_{i=1}^{n}\left(\theta^{T}x^{(i)} - y^{(i)}\right)x^{(i)}.
\]

Written in matrix form:

\[
\nabla_{\theta}J(\theta) = X^{T}(X\theta - y),
\]

where \(X \in \mathbb{R}^{n\times{d}}\) stacks the inputs row by row and \(y \in \mathbb{R}^{n}\) stacks the targets.
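A minimal batch gradient descent sketch of this update, assuming NumPy and a small synthetic problem (the learning rate and iteration count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_theta = rng.normal(size=d)
y = X @ true_theta + 0.1 * rng.normal(size=n)    # noisy linear targets

theta = np.zeros(d)
alpha = 1e-3                                     # learning rate
for _ in range(1000):
    grad = X.T @ (X @ theta - y)                 # gradient in matrix form
    theta -= alpha * grad

print(np.allclose(theta, true_theta, atol=0.1))  # usually True for this setup
```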
## Analytic Solution
From above, setting the gradient to zero gives the normal equation:

\[
X^{T}X\theta = X^{T}y.
\]

If \(X^{T}X\) is invertible:

\[
\theta = (X^{T}X)^{-1}X^{T}y.
\]

Otherwise the normal equation still has a solution, given by the Moore–Penrose pseudo-inverse:

\[
\theta = X^{+}y.
\]
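A NumPy sketch of both forms of the closed-form solution, on made-up synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.05 * rng.normal(size=50)

theta_normal = np.linalg.solve(X.T @ X, X.T @ y)  # assumes X^T X is invertible
theta_pinv = np.linalg.pinv(X) @ y                # works even when X^T X is singular

print(np.allclose(theta_normal, theta_pinv))      # True when both are well defined
```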
## Examples
```python
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older version

X, y = load_boston(return_X_y=True)
X.shape, y.shape
```

```
((506, 13), (506,))
```

```python
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(X, y)
```

```
LinearRegression()
```

```python
from sklearn.metrics import mean_squared_error

mean_squared_error(y, reg.predict(X))
```

```
21.894831181729202
```
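As a sanity check that the library agrees with the analytic solution, the sketch below (on made-up synthetic data) fits `LinearRegression` and compares its coefficients with the pseudo-inverse solution, handling the bias by prepending a column of ones as in the Model section:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + 0.1 * rng.normal(size=200)

reg = LinearRegression().fit(X, y)

X1 = np.hstack([np.ones((len(X), 1)), X])     # bias column
theta = np.linalg.pinv(X1) @ y                # closed-form solution

print(np.allclose(theta[0], reg.intercept_))  # True
print(np.allclose(theta[1:], reg.coef_))      # True
```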
## Geometric Interpretation
Denote the linear subspace \(S = \mbox{span}\left \{\mbox{columns of } X \right \}\); every vector in \(S\) can be written as \(X\theta\) for some \(\theta\).

\(X\theta\) is the projection of \(y\) onto \(S\) \(\Leftrightarrow\) \(X\theta - y\) is orthogonal to \(S\) \(\Leftrightarrow\) \(X\theta - y\) is orthogonal to every column of \(X\) \(\Leftrightarrow\) \(X^{T}(X\theta - y) = 0\), which is exactly the normal equation.

Linear regression \(\Leftrightarrow\) finding the projection of \(y\) onto \(S\).
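A numerical illustration of the orthogonality condition on made-up data: the residual of the least-squares fit is (numerically) orthogonal to every column of `X`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)                       # arbitrary target, generally not in S

theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
residual = X @ theta - y

print(np.allclose(X.T @ residual, 0.0, atol=1e-10))  # True: residual is orthogonal to columns of X
```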
## Probabilistic Interpretation
Assume the targets and inputs are related via:

\[
y^{(i)} = \theta^{T}x^{(i)} + \epsilon^{(i)},
\]

where \(\epsilon^{(i)}\) is an error term, distributed i.i.d. according to a Gaussian with mean \(0\) and variance \(\sigma^{2}\):

\[
p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(\epsilon^{(i)})^{2}}{2\sigma^{2}}\right).
\]

This is equivalent to saying (note that \(\theta\) is a fixed parameter here, not a random variable):

\[
p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y^{(i)} - \theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right).
\]

The likelihood function is:

\[
L(\theta) = \prod_{i=1}^{n} p(y^{(i)} \mid x^{(i)}; \theta).
\]

We maximize the log likelihood:

\[
\ell(\theta) = \log L(\theta) = n\log\frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{\sigma^{2}}\cdot\frac{1}{2}\sum_{i=1}^{n}\left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2}.
\]

Hence, maximizing the log likelihood gives the same answer as minimizing:

\[
\frac{1}{2}\sum_{i=1}^{n}\left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2},
\]

which is exactly the mean squared error loss \(J(\theta)\).

Linear regression \(\Leftrightarrow\) maximum likelihood estimation under i.i.d. Gaussian errors.
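A small numerical check of this equivalence, assuming NumPy and SciPy on made-up data: minimizing the negative Gaussian log-likelihood over \(\theta\) (with \(\sigma\) held fixed) recovers the same parameters as the least-squares solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.3 * rng.normal(size=100)
sigma = 0.3                                    # assumed (fixed) noise scale

def neg_log_likelihood(theta):
    r = y - X @ theta
    n = len(y)
    return n * np.log(np.sqrt(2 * np.pi) * sigma) + (r @ r) / (2 * sigma**2)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_mle, theta_ls, atol=1e-4))  # True: MLE matches least squares
```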