The problem of predicting the complex WSS random process x[n] from the sample x[n-1] by using the linear predictor
\hat{x}[n]=-\alpha_{1}x[n-1] (1)

is posed in [1, p. 61, exercise 3.10]. The task is to choose \alpha_{1} so as to minimize the MSE, or prediction error power,
MSE = E\left\{\left| x[n] -\hat{x}[n] \right|^{2} \right\}. (2)

We are asked to find the optimal prediction parameter \alpha_{1} and the minimum prediction error power by using the orthogonality principle.
Solution: Applying the orthogonality principle [1, p. 51, eq. 3.38] to the estimation of x[n] translates into finding the \alpha_{1} for which the observed data x[n-1] is orthogonal to the error:
E\left\{ x[n-1]\left(x[n]+\alpha_{1}x[n-1]\right)^{\ast}\right\}=0
r_{xx}[-1]+\alpha^{\ast}_{1}r_{xx}[0]=0
\alpha_{1}=-\frac{r^{\ast}_{xx}[-1]}{r_{xx}[0]}. (3)

Considering (3), the mean squared error can be written as:
MSE=E\left\{(x[n]+\alpha_{1}x[n-1])(x[n]+\alpha_{1}x[n-1])^{\ast}\right\}
=r_{xx}[0]+\alpha_{1}r_{xx}[-1]+\alpha^{\ast}_{1}r_{xx}[1]+|\alpha_{1}|^{2}r_{xx}[0]
=r_{xx}[0]+\alpha_{1}r_{xx}[-1]+\alpha^{\ast}_{1}r^{\ast}_{xx}[-1]+|\alpha_{1}|^{2}r_{xx}[0]
=r_{xx}[0]- \frac{|r_{xx}[-1]|^{2}}{r_{xx}[0]}-\frac{|r_{xx}[-1]|^{2}}{r_{xx}[0]}+\frac{|r_{xx}[-1]|^{2}}{r_{xx}[0]}
=r_{xx}[0]- \frac{|r_{xx}[-1]|^{2}}{r_{xx}[0]} (4)
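
As a numerical sanity check of (3) and (4), the following Python sketch simulates a complex AR(1) process (an arbitrary illustrative model, not part of the exercise), estimates r_{xx}[0] and r_{xx}[-1] from the samples, forms \alpha_{1} according to (3), and compares the empirical prediction error power with (4); it also verifies the orthogonality condition. All model and parameter choices below are assumptions made only for this illustration.

import numpy as np

# Illustrative check of (3) and (4) on a simulated complex AR(1) process.
# The AR(1) model and all parameter values are assumptions for this sketch only.
rng = np.random.default_rng(0)
N = 100_000
a = 0.7 * np.exp(1j * np.pi / 5)            # assumed AR(1) coefficient, |a| < 1
w = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)  # unit-variance noise
x = np.zeros(N, dtype=complex)
for n in range(1, N):
    x[n] = a * x[n - 1] + w[n]
x = x[1000:]                                 # discard start-up transient

r0 = np.mean(np.abs(x) ** 2)                 # estimate of r_xx[0]
r_m1 = np.mean(x[:-1] * np.conj(x[1:]))      # estimate of r_xx[-1] = E{x[n-1] x*[n]}

alpha1 = -np.conj(r_m1) / r0                 # optimal coefficient, eq. (3)
e = x[1:] + alpha1 * x[:-1]                  # prediction error x[n] - xhat[n]

print("E{x[n-1] e*[n]}      ~", np.mean(x[:-1] * np.conj(e)))   # should be close to 0
print("empirical MSE        ~", np.mean(np.abs(e) ** 2))
print("r0 - |r_m1|^2 / r0   ~", r0 - np.abs(r_m1) ** 2 / r0)    # eq. (4)

For this AR(1) model the optimum is \alpha_{1}=-a and the minimum error power equals the driving-noise variance, so the last two printed values should both be close to 1.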

This result can also be obtained from [1, eq. 3.36, eq. 3.37], which give the coefficients of the linear estimator \hat{\theta}=-\sum_{i=0}^{N-1}\beta_{i}^{\ast}x_{i} that minimize the MSE, together with the resulting minimum MSE. Note that the book uses \mathbf{C}_{xx} instead of \mathbf{R}_{xx} and \sigma_{\theta}^{2},\sigma_{x}^{2} instead of r_{\theta \theta}[0] and r_{xx}[0]; this is equivalent only for a zero-mean random process x[n], as assumed in the derivation in the book. The formula for the optimal coefficients is thus:
\mathbf{\hat{\beta}}=-\mathbf{R}_{xx}^{-1}\mathbf{r}_{\theta x}.

For a general signal x[n], the minimum MSE is equal to:
MSE_{MIN}=r_{\theta\theta}[0]+\mathbf{r}_{\theta x}^{H}\mathbf{\hat{\beta}}.
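
To illustrate how these vector formulas are evaluated when more than one sample is available, the following sketch builds \mathbf{R}_{xx}, \mathbf{r}_{\theta x}, \mathbf{\hat{\beta}} and MSE_{MIN} for a hypothetical two-sample predictor of \theta=x[n] from x[n-1] and x[n-2]; the autocorrelation values are assumed purely for illustration and are not taken from [1].

import numpy as np

# Hypothetical two-sample example of beta_hat = -R_xx^{-1} r_theta_x and
# MSE_MIN = r_theta_theta[0] + r_theta_x^H beta_hat; all values are assumptions.
r = {0: 1.0, 1: 0.5 * np.exp(1j * np.pi / 4), 2: 0.2 * np.exp(1j * np.pi / 3)}
r[-1], r[-2] = np.conj(r[1]), np.conj(r[2])  # WSS property r_xx[-k] = r_xx*[k]

# Data vector x = [x[n-1], x[n-2]]^T, so (R_xx)_{ik} = E{x_i x_k^*} = r_xx[k - i]
R_xx = np.array([[r[0],  r[1]],
                 [r[-1], r[0]]])
# r_theta_x = E{x theta^*} = [r_xx[-1], r_xx[-2]]^T for theta = x[n]
r_theta_x = np.array([r[-1], r[-2]])

beta_hat = -np.linalg.solve(R_xx, r_theta_x)
# r_theta_theta[0] = r_xx[0] because theta = x[n]; the imaginary part is zero up to rounding
mse_min = (r[0] + r_theta_x.conj() @ beta_hat).real

print("beta_hat =", beta_hat)
print("MSE_MIN  =", mse_min)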

Translating the formulas to the notation of the exercise, we identify \theta =x[n], \mathbf{x}=x[n-1] and \beta_{1}=\alpha^{\ast}_{1}, so that \mathbf{r}_{\theta x}=E\left\{\mathbf{x}\theta^{\ast}\right\}=E\left\{x[n-1]x^{\ast}[n]\right\}=r_{xx}[-1], \mathbf{R}_{xx}=E\left\{ x[n-1]x^{\ast}[n-1]\right\}=r_{xx}[0] and, for a zero-mean process, r_{xx}[0]=\sigma_{\theta}^{2}=\sigma_{x}^{2}. The optimal prediction parameter \alpha_{1}=\beta^{\ast}_{1} is thus given by:
\alpha_{1}=-\frac{r^{\ast}_{xx}[-1]}{r_{xx}[0]} (5)

while the minimum MSE is given by:
MSE_{MIN}=r_{xx}[0]+r^{\ast}_{xx}[-1] \alpha^{\ast}_{1}
=r_{xx}[0]-  \frac{r^{\ast}_{xx}[-1] r_{xx}[-1]}{r_{xx}[0]}
=r_{xx}[0]-  \frac{|r_{xx}[-1]|^{2}}{r_{xx}[0]} (6)
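
The algebra leading to (6) can also be checked symbolically. The following small sympy sketch (an added check, not from [1]) substitutes (5) into MSE_{MIN}=r_{xx}[0]+r^{\ast}_{xx}[-1]\alpha^{\ast}_{1} and confirms that the result agrees with (4).

import sympy as sp

# Symbolic check that (5) substituted into MSE_MIN reproduces (4)/(6).
r0 = sp.Symbol('r0', positive=True)    # r_xx[0] is real and positive for a WSS process
rm1 = sp.Symbol('rm1')                 # r_xx[-1], complex in general

alpha1 = -sp.conjugate(rm1) / r0                           # eq. (5)
mse_min = r0 + sp.conjugate(rm1) * sp.conjugate(alpha1)    # first line of (6)
target = r0 - rm1 * sp.conjugate(rm1) / r0                 # r_xx[0] - |r_xx[-1]|^2 / r_xx[0], eq. (4)

print(sp.simplify(mse_min - target))   # prints 0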

This is equal to the solution (4) that was obtained using the orthogonality principle.

[1] Steven M. Kay: “Modern Spectral Estimation: Theory and Application”, Prentice Hall, 1988, ISBN: 0-13-598582-X.