Last time, we introduced linear regression as the first example of a new class of learners, which we called *linear*. Let’s start with a little recap…

We considered a set of features $\mathfrak{X}$ together with labels taking values in a finite-dimensional inner product space $\mathfrak{y}$. We then fixed a finite-dimensional subspace $\mathfrak{H}\subset \mathfrak{y}^\mathfrak{X}$ of the vector space of functions $\mathfrak{X}\longrightarrow \mathfrak{y}$ as the space of possible hypotheses, as well as a dataspace $\mathfrak{D}$ consisting of finite subsets of $\mathfrak{X}\times \mathfrak{y}$ which separate the hypothesis space $\mathfrak{H}$. Finally, we associated to any dataset $\Delta \in \mathfrak{D}$ and any hypothesis $f \in \mathfrak{H}$ a cost, which was simply the norm of the vector $\big(f(x)-y\big)_{(x,y) \in \Delta}$ in the space $\mathfrak{y}^\Delta$, or more explicitly: $$c(\Delta, f) = \big\vert\big\vert (f(x)-y)_{(x,y)\in \Delta}\big\vert\big\vert_{\mathfrak{y}^\Delta} = \sqrt{\sum_{(x,y) \in \Delta}\vert \vert f(x)-y\vert \vert^2_\mathfrak{y}}$$ We then claimed (without proof) that this indeed defines a sharp learner, in the sense that for any dataset $\Delta \in \mathfrak{D}$, the learned hypothesis, which by definition is given by $$h_\Delta := \textrm{argmin}_{f\ \in \mathfrak{H}} c(\Delta,f),$$ is indeed well defined.
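To make the cost concrete, here is a minimal sketch in the simplest setting, assuming $\mathfrak{X}=\mathbb{R}$, $\mathfrak{y}=\mathbb{R}$, and the hypothesis space spanned by $x\mapsto x$ and $x\mapsto 1$, so that each hypothesis is a line $x \mapsto ax+b$ (the names `cost`, `a`, `b` are illustrative, not from the lecture):

```python
import numpy as np

# Sketch: X = R, y = R, hypotheses f(x) = a*x + b parametrized by (a, b).
def cost(dataset, a, b):
    """Norm of the residual vector (f(x) - y)_{(x,y) in Delta} in y^Delta,
    i.e. the square root of the sum of squared residuals."""
    residuals = np.array([a * x + b - y for x, y in dataset])
    return np.linalg.norm(residuals)

# A dataset lying exactly on the line y = 2x + 1:
dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
```

The true line has cost zero on this dataset, while any other hypothesis incurs a strictly positive cost.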

The proof is actually rather easy, and today we will guide you through the steps:

Let’s begin by recalling the following fact from linear algebra: if $V$ is an inner product space, $W \subset V$ a finite-dimensional subspace, $v \in V$ and $w \in W$, then the following two statements are equivalent:

- the vector $v-w$ is orthogonal to the subspace $W$
- $\vert \vert v-w\vert \vert \le \vert \vert v-u\vert \vert\quad \forall u \in W$

The above lemma tells us that the *projection* onto the subspace $W$, defined as $\pi(v)=\text{argmin}_{w \in W}\vert \vert v-w\vert \vert$, coincides with the other notion of *projection* onto the subspace $W$, obtained by considering the decomposition $V=W\oplus W^{\perp}$ and writing $v = w+(v-w)$. We will therefore call $w$ *the* projection of $v$ onto $W$, as no confusion can arise.
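A quick numerical sketch of the lemma, assuming $V=\mathbb{R}^3$ with the standard inner product and $W$ spanned by two illustrative basis vectors (the matrix `B` below is an arbitrary example, not from the lecture):

```python
import numpy as np

# W = column span of B inside V = R^3.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(B)      # orthonormal basis of W

v = np.array([1.0, 2.0, 3.0])
w = Q @ (Q.T @ v)           # projection of v onto W via V = W + W^perp

# Sanity check of the "argmin" characterization: w is at least as close
# to v as a sample of other points of W.
rng = np.random.default_rng(0)
for _ in range(100):
    u = B @ rng.normal(size=2)   # an arbitrary element of W
    assert np.linalg.norm(v - w) <= np.linalg.norm(v - u) + 1e-12
```

The orthogonality condition of the lemma can be checked directly: the residual $v-w$ pairs to zero with every basis vector of $W$.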

In fact, we’ll be interested in a slight variation of the above definition: instead of a subspace, we’ll consider a linear map $f:V\longrightarrow W$ between finite-dimensional inner product spaces, and we let $U$ denote the subspace $U=\text{im}(f)\subset W$. Now, for any vector $w \in W$ we can ask what the projection of $w$ onto $U$ looks like. There is a very nice answer to this question: for any $v \in V$, the following two statements are equivalent:

- $f(v)$ is the projection of $w$ onto $U$
- the vector $v \in V$ satisfies $(f^* \circ f)(v) = f^*(w)$, where $f^*:W\longrightarrow V$ denotes the adjoint of $f$
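In matrix form this is the familiar system of *normal equations*: a minimal sketch, assuming $V=\mathbb{R}^2$, $W=\mathbb{R}^3$ with their standard inner products, $f$ given by an illustrative matrix $A$ (so that $f^*$ is the transpose $A^T$), and an arbitrary target vector $w$:

```python
import numpy as np

# f : R^2 -> R^3 given by the matrix A; the adjoint f* is A.T.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
w = np.array([1.0, 3.0, 4.0])

# Solve the normal equations (f* o f)(v) = f*(w), i.e. (A.T A) v = A.T w.
v = np.linalg.solve(A.T @ A, A.T @ w)
proj = A @ v                # f(v), the candidate projection of w onto im(f)
```

By the lemma, $f(v)$ is the projection of $w$ onto $\text{im}(f)$ precisely when $w - f(v)$ is orthogonal to $\text{im}(f)$, and indeed solving the normal equations recovers the same answer as a standard least-squares solver.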