Wolpert and Macready (1997): "We have dubbed the associated results NFL theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems."
In this lecture, we will use polynomial curve fitting on noisy data as a running example.
We will interchangeably use the terms "coefficients", "weights", and "parameters" for $\vec{w}$.
# Truth, without noise
xx, yy = generateData(100, False)
# Noisy observation, training samples
x, y = generateData(13, True)
# plot data
plt.plot(xx, yy, 'g--')
plt.plot(x, y, 'ro')
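The helper `generateData` is defined earlier in the notebook and is not shown here. Purely for illustration, here is a minimal sketch of what such a helper might look like, assuming a sinusoidal ground truth with additive Gaussian noise; the actual curve and noise level used in the lecture may differ.

import numpy as np

def generateData(n, noisy, seed=0):
    # Hypothetical sketch, not the notebook's actual helper.
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n)
    y = np.sin(2 * np.pi * x)                  # assumed ground-truth curve
    if noisy:
        y = y + rng.normal(scale=0.2, size=n)  # assumed noise level
    return x, y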
# Here we are going to take advantage of numpy's 'polyfit' function
# This implements a "polynomial fitting" algorithm
# coeffs are the optimal coefficients of the polynomial
coeffs = np.polyfit(x, y, 0) # 0 is the degree of the poly
# We construct poly(), the polynomial with "learned" coefficients
poly = np.poly1d(coeffs)
plt.plot(xx, yy, "g--")
plt.plot(x, y, "ro")
plt.plot(xx, poly(xx), "b-")
coeffs = np.polyfit(x, y, 1) # Now let's try degree = 1
poly = np.poly1d(coeffs)
plt.plot(xx, yy, "g--")
plt.plot(x, y, "ro")
plt.plot(xx, poly(xx), "b-")
coeffs = np.polyfit(x, y, 3) # Now degree = 3
poly = np.poly1d(coeffs)
plt.plot(xx, yy, "g--")
plt.plot(x, y, "ro")
plt.plot(xx, poly(xx), "b-")
The function $y(\vec{x}, \vec{w})$ is linear in parameters $\vec{w}$.
The basis functions $\phi_j(\vec{x})$ need not be linear.
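Written out, the model is a weighted sum of basis functions (this is the inner sum that reappears in the error function below):

$$ y(\vec{x}, \vec{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\vec{x}) = \vec{w}^T \vec{\phi}(\vec{x}), \qquad \vec{\phi}(\vec{x}) = \left( \phi_0(\vec{x}), \dots, \phi_{M-1}(\vec{x}) \right)^T $$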
r = np.linspace(-1, 1, 100)
f, axs = plt.subplots(1, 3, figsize=(12, 4))
for j in range(8):
    axs[0].plot(r, np.power(r, j))                           # polynomial basis: r**j
    axs[1].plot(r, np.exp(-(r - j/7.0 + 0.5)**2 / 2*5**2))   # Gaussian-shaped bumps at shifted centres
    axs[2].plot(r, 1 / (1 + np.exp(-(r - j/5.0 + 0.5) * 5))) # sigmoidal basis at shifted centres
set_nice_plot_labels(axs)  # I'm hiding some helper code that adds labels
Minimize the residual error over the training data.
$$ E(\vec{w}) = \frac12 \sum_{n=1}^N \left( y(\vec{x}^{(n)}, \vec{w}) - t^{(n)} \right)^2 = \frac12 \sum_{n=1}^N \left( \sum_{j=0}^{M-1} w_j \phi_j(\vec{x}^{(n)}) - t^{(n)} \right)^2 $$

The design matrix is a matrix $\Phi \in \mathbb{R}^{N \times M}$ with entries $\Phi_{nj} = \phi_j(\vec{x}^{(n)})$, obtained by applying each of the $M$ basis functions to each of the $N$ training inputs.
Goal: $\Phi \vec{w} \approx \vec{t}$
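In matrix form, with $\vec{t} = (t^{(1)}, \dots, t^{(N)})^T$, the same error can be written compactly as

$$ E(\vec{w}) = \frac12 \left\| \Phi \vec{w} - \vec{t} \right\|_2^2 $$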
The least-squares solution applies the Moore-Penrose pseudoinverse, $\Phi^\dagger = (\Phi^T \Phi)^{-1} \Phi^T$, to solve the linear system $\Phi \vec{w} \approx \vec{t}$, giving $\vec{w}^* = \Phi^\dagger \vec{t}$.
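As a sanity check, here is a minimal sketch (assumed, not the notebook's hidden code) that builds a polynomial design matrix by hand and solves the least-squares problem; for well-conditioned problems it should agree with `np.polyfit` up to numerical error.

def fit_poly_lstsq(x, t, degree):
    # Design matrix: column j holds the basis function phi_j(x) = x**j
    Phi = np.vander(x, degree + 1, increasing=True)
    # w = pinv(Phi) @ t; lstsq is the numerically stable way to apply the pseudoinverse
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

w3 = fit_poly_lstsq(x, y, 3)
print(w3)  # compare with np.polyfit(x, y, 3)[::-1] (polyfit returns highest degree first)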
stys = ['k-', 'k-.', 'm-', 'm-.']  # line styles for the four fits
plt.plot(xx, yy, "g--")
plt.plot(x, y, "ro", label="Training data")
for _i in range(4):
    # tsts[_i]: predictions on xx from the fit of order _i (computed in earlier cells)
    plt.plot(xx, tsts[_i], stys[_i], label='Order {0}'.format(_i))
plt.legend()
f, ax = plt.subplots(ncols=4, sharey=True, figsize=(16, 4))
for _i in range(4):
    ax[_i].plot(xx, yy, "g--")
    ax[_i].plot(x, y, "ro")
    ax[_i].plot(xx, tsts[_i])
    # es[_i]: RMSE of the order-_i fit (computed in earlier cells)
    ax[_i].set_title('Order {0}, RMSE={1:4.3f}'.format(_i, es[_i]))
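For reference, the root-mean-square error reported in the panel titles is (up to the exact normalization used in the hidden code)

$$ \mathrm{RMSE} = \sqrt{\frac1N \sum_{n=1}^N \left( y(\vec{x}^{(n)}, \vec{w}) - t^{(n)} \right)^2} $$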
f, ax = plt.subplots(ncols=4, sharey=True, figsize=(16, 4))
for _i in range(4):
    ax[_i].plot(y, y, "r-")         # perfect-prediction reference line
    ax[_i].plot(y, trns[_i], "bo")  # trns[_i]: training predictions of the order-_i fit (computed in earlier cells)
    ax[_i].set_title('Order {0}, R2={1:4.3f}'.format(_i, rs[_i]))
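The $R^2$ in the titles is the coefficient of determination; assuming the standard convention it is

$$ R^2 = 1 - \frac{\sum_{n} \left( t^{(n)} - y(\vec{x}^{(n)}, \vec{w}) \right)^2}{\sum_{n} \left( t^{(n)} - \bar{t} \right)^2} $$

where $\bar{t}$ is the mean of the targets.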