In [1, p. 61, exercise 3.11] we are asked to repeat problem [1, p. 61, exercise 3.10] (see also the solution [2]) for the general case in which the predictor is given by
\hat{x}[n]=-\sum\limits_{k=1}^{p}\alpha_{k}x[n-k]. (1)

Furthermore, we are asked to show that the optimal prediction coefficients \{\alpha_{1},\alpha_{2},\ldots,\alpha_{p}\} are found by solving [1, p. 157, eq. 6.4] and that the minimum prediction error power is given by [1, p. 157, eq. 6.5].
Solution: The equation for determining the optimal prediction coefficients, [1, p. 157, eq. 6.4], is given by:
r_{xx}[k]=-\sum\limits_{l=1}^{p}\alpha_{l}r_{xx}[k-l] \;, k=1,2,\ldots,p (2)

whereas the minimum MSE is given by [1, p. 157, eq. 6.5] as:
\rho_{MIN}=r_{xx}[0]+\sum\limits_{k=1}^{p}\alpha_{k}r_{xx}[-k]. (3)
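
As a quick illustration of (2) and (3) (this special case is not part of the exercise), consider the first-order predictor p=1: (2) reduces to the single equation r_{xx}[1]=-\alpha_{1}r_{xx}[0], so that \alpha_{1}=-r_{xx}[1]/r_{xx}[0], and substituting this into (3), together with r_{xx}[-1]=r^{\ast}_{xx}[1], gives \rho_{MIN}=r_{xx}[0]-|r_{xx}[1]|^{2}/r_{xx}[0].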

Using the orthogonality principle, we have to find the coefficients \alpha_{k} that make the observed data x[n-k], \; k=1,\ldots,p, orthogonal to the prediction error x[n]-\hat{x}[n], that is:
E\{x^{\ast}[n-k] (x[n]-\hat{x}[n]) \}=0
E\{ x^{\ast}[n-k] (x[n]+\sum\limits_{l=1}^{p}\alpha_{l}x[n-l])\}=0
\sum\limits_{l=1}^{p}\alpha_{l}E\{ x^{\ast}[n-k]x[n-l]\}+E\{ x^{\ast}[n-k]x[n]\}=0
\sum\limits_{l=1}^{p}\alpha_{l} r_{xx}[k-l]+r_{xx}[k]=0 \;, k=1,2,\ldots,p   (4)

We see that (4) has exactly the form of (2), and thus the first part of the exercise is solved. Relation (4) is a set of p linear equations in the variables \alpha_{i}, \; i=1,\ldots,p, and setting \mathbf{r}_{xx}=\left[\begin{array}{cccc}r_{xx}[1] & r_{xx}[2] & \cdots & r_{xx}[p]\end{array}\right]^{T}, \mathbf{\alpha}=\left[\begin{array}{cccc}\alpha_{1} & \alpha_{2}& \cdots & \alpha_{p} \end{array}\right]^{T} and
R_{xx}=\left[\begin{array}{cccc}r_{xx}[0] & r_{xx}[-1]& \cdots&r_{xx}[-(p-1)] \\r_{xx}[1] &r_{xx}[0] &\cdots&r_{xx}[-(p-2)] \\ \vdots &\vdots & \ddots& \vdots \\ r_{xx}[p-1]&r_{xx}[p-2]&\cdots &r_{xx}[0]\end{array}\right] (5)

the linear equations can be written in matrix notation as:
R_{xx}\mathbf{\alpha}=-\mathbf{r}_{xx}
\mathbf{\alpha}=-R^{-1}_{xx}\mathbf{r}_{xx}. (6)
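
To make the construction of R_{xx} in (5) and the solution (6) concrete, here is a minimal numerical sketch in Python/NumPy (my own addition, not part of [1]; the function name is arbitrary, and the data are assumed real-valued so that r_{xx}[-k]=r_{xx}[k]):

import numpy as np

def predictor_coefficients(r, p):
    # r contains the autocorrelation lags r_xx[0], r_xx[1], ..., r_xx[p]
    # of a real-valued process, for which r_xx[-k] = r_xx[k].
    # Element (k, l) of R_xx is r_xx[k-l], cf. (5); for real data this is r[|k-l|].
    R = np.array([[r[abs(k - l)] for l in range(p)] for k in range(p)])
    r_vec = np.asarray(r[1:p + 1])        # r_xx = [r_xx[1], ..., r_xx[p]]^T
    alpha = -np.linalg.solve(R, r_vec)    # eq. (6): alpha = -R_xx^{-1} r_xx
    rho_min = r[0] + alpha @ r_vec        # eq. (3), using r_xx[-k] = r_xx[k]
    return alpha, rho_min

For example, predictor_coefficients([1.0, 0.5, 0.25], 2) returns the coefficients of the optimal second-order predictor for these lags together with the corresponding prediction error power.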

Returning to the derivation: relation (6) provides the solution for the optimum prediction parameters. The MSE for these parameters is
MSE=E\left\{(x[n]+\sum\limits_{k=1}^{p}\alpha_{k}x[n-k])^{\ast}(x[n]+\sum\limits_{k=1}^{p}\alpha_{k}x[n-k])\right\}
=E\left\{ x[n]^{\ast}x[n]\right\}+ \sum\limits_{k=1}^{p}\alpha_{k}E\left\{x^{\ast}[n]x[n-k]\right\}
 + \sum\limits_{k=1}^{p}\alpha^{\ast}_{k}E\left\{x[n]x^{\ast}[n-k]\right\}
 + E\left\{  \left(\sum\limits_{k=1}^{p}\alpha^{\ast}_{k}x^{\ast}[n-k]\right)\left(\sum\limits_{k=1}^{p}\alpha_{k}x[n-k]\right)\right\} (7)

Let \mathbf{x}=\left[\begin{array}{ccc} x[n-1] & \cdots & x[n-p] \end{array}\right]^{T}; then, because R_{xx}=E\left\{ \mathbf{x}^{\ast}\mathbf{x}^{T} \right\} (the (k,l) element of this matrix is E\{x^{\ast}[n-k]x[n-l]\}=r_{xx}[k-l], in agreement with (5)), equation (7) can be written as:
MSE=E\left\{ x[n]^{\ast}x[n]\right\}+ \sum\limits_{k=1}^{p}\alpha_{k}E\left\{x^{\ast}[n]x[n-k]\right\}
 + \sum\limits_{k=1}^{p}\alpha^{\ast}_{k}E\left\{x[n]x^{\ast}[n-k]\right\}
 +  \mathbf{\alpha}^{H} E\left\{ \mathbf{x}^{\ast}\mathbf{x}^{T} \right\} \mathbf{\alpha}
=r_{xx}[0]+ \sum\limits_{k=1}^{p}\alpha_{k}r_{xx}[-k]+ \sum\limits_{k=1}^{p}\alpha^{\ast}_{k}r_{xx}[k] + \mathbf{\alpha}^{H} R_{xx}\mathbf{\alpha}
=r_{xx}[0]+ \mathbf{r}_{xx}^{H}\mathbf{\alpha}+ \mathbf{\alpha}^{H} \mathbf{r}_{xx} + \mathbf{\alpha}^{H} R_{xx}\mathbf{\alpha}
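
As a brief aside (not required by the exercise), the minimizer can also be read off directly by completing the square: since R_{xx} is Hermitian and assumed invertible, the last expression can be rewritten as
MSE=r_{xx}[0]- \mathbf{r}_{xx}^{H}R^{-1}_{xx}\mathbf{r}_{xx}+\left(\mathbf{\alpha}+R^{-1}_{xx}\mathbf{r}_{xx}\right)^{H}R_{xx}\left(\mathbf{\alpha}+R^{-1}_{xx}\mathbf{r}_{xx}\right),
and since R_{xx} is positive definite, the quadratic term is nonnegative and vanishes exactly for \mathbf{\alpha}=-R^{-1}_{xx}\mathbf{r}_{xx}, which recovers (6) and already exhibits the minimum value r_{xx}[0]-\mathbf{r}_{xx}^{H}R^{-1}_{xx}\mathbf{r}_{xx}.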

Because \mathbf{\alpha}= -R^{-1}_{xx}\mathbf{r}_{xx} the mean squared error can be reduced to:
MSE=r_{xx}[0]- \mathbf{r}_{xx}^{H}\mathbf{R}^{-1}_{xx}\mathbf{r}_{xx}- \left(\mathbf{R}^{-1}_{xx}\mathbf{r}_{xx}\right)^{H} \mathbf{r}_{xx}
 + \left(\mathbf{R}^{-1}_{xx}\mathbf{r}_{xx}\right)^{H} \mathbf{R}_{xx}\mathbf{R}^{-1}_{xx}\mathbf{r}_{xx}
=r_{xx}[0]- \mathbf{r}_{xx}^{H}\mathbf{R}^{-1}_{xx}\mathbf{r}_{xx}-\mathbf{r}_{xx}^{H}  \left(\mathbf{R}^{-1}_{xx}\right)^{H}\mathbf{r}_{xx} + \mathbf{r}_{xx}^{H}  \left(\mathbf{R}^{-1}_{xx}\right)^{H}\mathbf{r}_{xx}
=r_{xx}[0]- \mathbf{r}_{xx}^{H}\mathbf{R}^{-1}_{xx}\mathbf{r}_{xx}
=r_{xx}[0]+\sum\limits_{k=1}^{p}\alpha_{k}r_{xx}[-k]

We note that the last expression was obtained by substituting \mathbf{\alpha}=-R^{-1}_{xx}\mathbf{r}_{xx}, using r^{\ast}_{xx}[k]=r_{xx}[-k], and writing out the resulting inner product. The last equation is identical to (3), i.e. to [1, p. 157, eq. 6.5]. Using the orthogonality principle, the result can be obtained even faster, because:
MSE=E\left\{(x[n]+\sum\limits_{k=1}^{p}\alpha_{k}x[n-k])^{\ast}(x[n]+\sum\limits_{k=1}^{p}\alpha_{k}x[n-k])\right\}
=E\left\{x^{\ast}[n]\left( x[n]+\sum\limits_{k=1}^{p}\alpha_{k}x[n-k] \right)\right\}
 +\sum\limits_{l=1}^{p}\alpha^{\ast}_{l}E\left\{x^{\ast}[n-l]\left( x[n]+\sum\limits_{k=1}^{p}\alpha_{k}x[n-k] \right)\right\}

Applying the orthogonality principle to the last term of the previous equation, we obtain E\left\{x^{\ast}[n-l]\left( x[n]+\sum_{k=1}^{p}\alpha_{k}x[n-k] \right)\right\} =0 (this is exactly condition (4)), and thus the MSE is equal to:
MSE=E\left\{x^{\ast}[n]\left( x[n]+\sum\limits_{k=1}^{p}\alpha_{k}x[n-k] \right)\right\}
=r_{xx}[0]+\sum\limits_{k=1}^{p}\alpha_{k}r_{xx}[-k]

The previous equation is again identical to (3), i.e. to [1, p. 157, eq. 6.5]. We have thus derived both [1, p. 157, eq. 6.4] and [1, p. 157, eq. 6.5] using the orthogonality principle. QED.
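
As a final numerical sanity check (my own addition, not taken from [1]), the following Python/NumPy sketch simulates a real-valued AR(2) process with arbitrary illustrative parameters, estimates its autocorrelation, solves (6), and verifies that r_{xx}[0]-\mathbf{r}_{xx}^{T}R^{-1}_{xx}\mathbf{r}_{xx}, r_{xx}[0]+\sum_{k}\alpha_{k}r_{xx}[-k], and the average squared prediction error of the resulting predictor all (approximately) agree:

import numpy as np

rng = np.random.default_rng(0)

# Simulate a real-valued AR(2) process (arbitrary illustrative parameters).
N, p = 50_000, 2
x = np.zeros(N)
for n in range(2, N):
    x[n] = 0.6 * x[n - 1] - 0.3 * x[n - 2] + rng.standard_normal()

# Biased sample autocorrelation estimates r_xx[0], ..., r_xx[p].
r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])

# Normal equations (2)/(6): R_xx alpha = -r_xx, with [R_xx]_{k,l} = r_xx[k-l].
R = np.array([[r[abs(k - l)] for l in range(p)] for k in range(p)])
alpha = -np.linalg.solve(R, r[1:])

# Minimum prediction error power, computed two ways.
rho_sum = r[0] + alpha @ r[1:]                      # (3): r_xx[0] + sum_k alpha_k r_xx[-k]
rho_mat = r[0] - r[1:] @ np.linalg.solve(R, r[1:])  # r_xx[0] - r_xx^T R_xx^{-1} r_xx

# Average squared prediction error e[n] = x[n] + sum_k alpha_k x[n-k].
e = x[p:].copy()
for k in range(1, p + 1):
    e += alpha[k - 1] * x[p - k:N - k]

print(rho_sum, rho_mat, np.mean(e ** 2))  # all three values should nearly coincide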

[1] Steven M. Kay: “Modern Spectral Estimation – Theory and Applications”, Prentice Hall, ISBN: 0-13-598582-X.
[2] Chatzichrisafis: “Solution of exercise 3.10 from Kay’s Modern Spectral Estimation - Theory and Applications”.