The exercise [1, p. 34, exercise 2.4] asks one to show that if \mathbf{F} is a full-rank m \times n matrix with m > n, and \mathbf{x} and \mathbf{y} are m \times 1 vectors, then the effect of the linear transformation
\mathbf{y}=\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\mathbf{x} (1)

is to project \mathbf{x} onto the subspace spanned by the columns of \mathbf{F}. Specifically, if \{\mathbf{f}_1,\mathbf{f}_2,...,\mathbf{f}_n\} are the columns of \mathbf{F}, the exercise asks one to show that
(\mathbf{x}-\mathbf{y})^{H}\mathbf{f}_i=0, \quad i=1,2,...,n. (2)

Furthermore, it is asked why the transform \mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H must be idempotent. Solution: Let T: \mathbb{C}^{m}\rightarrow \mathbb{W} \subseteq \mathbb{C}^{m} be the linear transform with which the matrix \mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H is associated. We are asked to show that this linear transform is surjective (onto) with respect to \mathbb{W}=\mathrm{span}\{\mathbf{f}_1,...,\mathbf{f}_n\}. The transform can be thought of as the composite of the linear transforms [2, p. 49, proposition 6.3] T_1:\mathbb{C}^{m}\rightarrow \mathbb{C}^{n}, T_2:\mathbb{C}^{n}\rightarrow \mathbb{C}^{n}, T_3:\mathbb{C}^{n}\rightarrow \mathbb{C}^{m}, with which the matrices \mathbf{F}^H, \left(\mathbf{F}^H\mathbf{F}\right)^{-1} and \mathbf{F} are associated, respectively. We note that T_1 is surjective because \mathrm{rank}(\mathbf{F})=\mathrm{rank}(\mathbf{F}^H)=n, so by definition the image of T_1 is all of \mathbb{C}^n. The transform T_2 is bijective, as an invertible matrix is associated with it; therefore T_2 is also surjective onto \mathbb{C}^n. From this we conclude that every standard basis vector \vec{v}_i, i=1,...,n (\vec{v}_i is the vector whose elements are all zero except the i^{th}, which equals one) is contained in the image of T_2 \circ T_1. Because any vector \vec{u} \in \mathbb{C}^n can be represented as a weighted sum of the standard basis, \vec{u}=\sum_{i=1}^{n}\alpha_i \vec{v}_i, applying the linear transform T_3 to this vector gives the following relation:
T_3(\vec{u})=T_3\left(\sum\limits_{i=1}^{n}\alpha_i \vec{v}_i\right)=\sum\limits_{i=1}^{n}\alpha_i T_3\left(\vec{v}_i\right)

Now let \mathbf{F} be the matrix of the transform T_3 relative to the standard bases \vec{v}_j \in \mathbb{C}^n, j=1,...,n, and \vec{w}_i \in \mathbb{C}^m, i=1,...,m. If the elements of the matrix are \{f_{ij}\}, then the transform T_3(\vec{v}_j) can be rewritten as [2, p. 47]: T_3(\vec{v}_j)=\sum_{i=1}^{m}f_{ij}\,\vec{w}_i. This is simply the vector composed of the elements of the j^{th} column of the matrix \mathbf{F}. This vector will be denoted as \vec{f}_j; using matrix notation, T_3(\vec{v}_j) can be written simply as \mathbf{F}\vec{v}_j=\vec{f}_j. Thus T_3(\vec{u})=\sum_{i=1}^{n}\alpha_i \vec{f}_i, which means that the image of T_3 consists of all linear combinations of the columns of the corresponding matrix; this is by definition \mathrm{span}\{\mathbf{f}_1,...,\mathbf{f}_n\}, and thus the composite of the transforms T=T_3 \circ T_2 \circ T_1 is surjective (onto) with respect to \mathbb{W}=\mathrm{span}\{\mathbf{f}_1,...,\mathbf{f}_n\}. A short numerical illustration of this argument is given after equation (3) below. Now let us proceed to show that
(\mathbf{x-y})^{H}\mathbf{f_i}=0, i=1,2,...,n. (3)

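As a numerical illustration of the column-space argument above (my own sketch, not part of [1]; the dimensions m=6, n=3, the random seed, and the use of NumPy are arbitrary choices), one can check that the transformed vector indeed lies in the span of the columns of \mathbf{F}:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3  # arbitrary sizes with m > n
# A random complex matrix of this shape has full rank with probability one.
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal((m, 1)) + 1j * rng.standard_normal((m, 1))

FH = F.conj().T
y = F @ np.linalg.inv(FH @ F) @ FH @ x   # equation (1)

# y lies in span{f_1, ..., f_n}: the system F c = y has an exact solution,
# so the least squares fit reproduces y without any residual.
c, *_ = np.linalg.lstsq(F, y, rcond=None)
print(np.allclose(F @ c, y))  # True
```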
First we will compute the Hermitian transpose of the column vector \mathbf{y}. Using the associative property of matrix multiplication and the rule [1, p. 23] (\mathbf{A}\mathbf{B})^H=\mathbf{B}^H\mathbf{A}^H, we derive the following relations:
\mathbf{y}^H=\left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\mathbf{x}\right)^H
\mathbf{y}^H=\mathbf{x}^H\left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right)^H
\mathbf{y}^H=\mathbf{x}^H \left(\left(\mathbf{F}^H\right)^H\left[\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\right]^H\mathbf{F}^H\right)
Since \left(\mathbf{F}^H\mathbf{F}\right)^H=\mathbf{F}^H\mathbf{F}, the matrix \mathbf{F}^H\mathbf{F} is Hermitian and so is its inverse, i.e. \left[\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\right]^H=\left(\mathbf{F}^H\mathbf{F}\right)^{-1}, so that
\mathbf{y}^H=\mathbf{x}^H\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H (4)

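The Hermitian symmetry used in this step can also be checked numerically. The following is a minimal sketch with arbitrary random data (not taken from [1]):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3  # arbitrary sizes with m > n
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
FH = F.conj().T

G_inv = np.linalg.inv(FH @ F)   # (F^H F)^{-1}
P = F @ G_inv @ FH              # F (F^H F)^{-1} F^H

# (F^H F)^{-1} equals its own Hermitian transpose, and therefore so does
# the whole transform, which is what relation (4) expresses.
print(np.allclose(G_inv, G_inv.conj().T))  # True
print(np.allclose(P, P.conj().T))          # True
```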
Now, instead of multiplying (\mathbf{x-y})^{H} by a single column \mathbf{f}_i of the matrix \mathbf{F}=[\mathbf{f}_1 \cdots \mathbf{f}_i \cdots \mathbf{f}_n], as in (3), we will compute the product with the whole matrix, effectively using all columns \mathbf{f}_i at once. Applying also (4), we obtain:
(\mathbf{x-y})^{H}\mathbf{F}=(\mathbf{x}^H-\mathbf{y}^{H})\mathbf{F}
=(\mathbf{x}^H-\mathbf{x}^H\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H)\mathbf{F}
=\mathbf{x}^H(I-\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H)\mathbf{F}
=\mathbf{x}^H(\mathbf{F}-(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H)\mathbf{F})
=\mathbf{x}^H(\mathbf{F}-(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1})(\mathbf{F}^H\mathbf{F}))
=\mathbf{x}^H(\mathbf{F}-\mathbf{F}\left(\left(\mathbf{F}^H\mathbf{F}\right)^{-1}(\mathbf{F}^H\mathbf{F})\right))
=\mathbf{x}^H(\mathbf{F}-\mathbf{F}\mathbf{I})
=\mathbf{x}^H\cdot \mathbf{0}
=\mathbf{0}\Leftrightarrow
(\mathbf{x-y})^{H}\mathbf{f_i}=0 ,\forall i=1,2,...,n.
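This orthogonality relation is easy to confirm numerically; the sketch below is my own illustration with arbitrary random data, not part of the original exercise:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3  # arbitrary sizes with m > n
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal((m, 1)) + 1j * rng.standard_normal((m, 1))

FH = F.conj().T
y = F @ np.linalg.inv(FH @ F) @ FH @ x   # equation (1)

# (x - y)^H f_i = 0 for every column f_i, i.e. (x - y)^H F = 0
print(np.allclose((x - y).conj().T @ F, 0))  # True
```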

A matrix \mathbf{A} is idempotent by definition [1, p. 21] when \mathbf{A}^2=\mathbf{A}. Multiplying \left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right) by itself and again using the associative property of matrix multiplication results in:
\left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right)^2=\left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right)\left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right)
=\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\left(\mathbf{F}^H\mathbf{F}\right)\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H
=\mathbf{F}\left[\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\left(\mathbf{F}^H\mathbf{F}\right)\right]\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H
=\mathbf{F}\mathbf{I}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H
=\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H,

which proves that the matrix is idempotent. While this finding may be used to answer the question of why the matrix has to be idempotent (it has to, because it is), we will use another approach. We note by [1, p. 30, equation 2.57] and the previous analysis that \mathbf{y} is the least squares approximation to \mathbf{x} within the space spanned by the columns of \mathbf{F}. Applying the matrix \left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right) to \mathbf{y} thus yields the least squares approximation to \mathbf{y} that lies in the space spanned by the columns of \mathbf{F}. But as \mathbf{y} already resides within that space, the least squares approximation residing in the same space has to be the vector \mathbf{y} itself. This simply means that consecutive applications of the same transform provide the same result as a single application of \left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right). Thus \left(\mathbf{F}\left(\mathbf{F}^H\mathbf{F}\right)^{-1}\mathbf{F}^H\right) is idempotent. QED.
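Both views of idempotency can be illustrated with a short numerical sketch (again my own illustration with arbitrary random data, not from [1]): the matrix squares to itself, and applying it to an already projected vector changes nothing.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 3  # arbitrary sizes with m > n
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal((m, 1)) + 1j * rng.standard_normal((m, 1))

FH = F.conj().T
P = F @ np.linalg.inv(FH @ F) @ FH   # the projection matrix
y = P @ x                            # y already lies in span{f_1, ..., f_n}

print(np.allclose(P @ P, P))  # True: algebraic idempotency, P^2 = P
print(np.allclose(P @ y, y))  # True: projecting y again leaves it unchanged
```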

[1] Steven M. Kay: “Modern Spectral Estimation – Theory and Applications”, Prentice Hall, ISBN: 0-13-598582-X.
[2] Lawrence J. Corwin and Robert H. Szczarba: “Calculus in Vector Spaces”, Marcel Dekker, Inc, 2nd edition, ISBN: 0824792793.