© 2000 by CRC Press LLC

and the physical realizability (Paley–Wiener) criterion

    ∫_{-π}^{π} ln S_x(θ) dθ > -∞                                          (16.9)

there exists a unique function H(z) that is analytic together with its inverse in |z| < 1 (minimum phase factor),

    H(z) = Σ_{k=0}^{∞} b_k z^k,   |z| < 1                                 (16.10)

such that

    S_x(θ) = lim_{r→1-0} |H(re^{jθ})|² = |H(e^{jθ})|²,  a.e.

H(z) is known as the Wiener factor associated with S_x(θ) and, as Eq. (16.5) shows, when driven by white noise, it generates a stochastic process x(nT) from past samples whose power spectral density matches the given S_x(θ).

In this context, given a finite set of autocorrelations r_0, r_1, . . ., r_n, the spectral extension problem is to obtain the class of all extensions that match the given data, i.e., such an extension K(θ) must automatically satisfy K(θ) ≥ 0 and

    (1/2π) ∫_{-π}^{π} K(θ) e^{jkθ} dθ = r_k,   k = 0 → n

in addition to satisfying Eqs. (16.8) and (16.9). The solution to this problem is closely related to the trigonometric moment problem, and it has a long and continuing history through the works of Schur [1917]; Nevanlinna; Akheizer and Krein [Akheizer and Krein, 1962]; Geronimus [1954]; and Shohat and Tamarkin [1970], to name a few.

If the given autocorrelations are such that the matrix T_n in Eq. (16.4) is singular, then there exists an m ≤ n such that T_{m-1} is positive definite (T_{m-1} > 0) and T_m is singular [det T_m = 0, det(·) representing the determinant of (·)]. In that case, there exists a unique vector X = (x_0, x_1, . . ., x_m)^T such that T_m X = 0 and, further, the autocorrelations have a unique extension given by

    r_k = Σ_{i=1}^{m} P_i e^{jkθ_i},   k = 0 → ∞                          (16.11)

where e^{jθ_i}, i = 1 → m, are the m zeros of the polynomial x_0 + x_1 z + . . . + x_m z^m and P_i > 0. This gives

    T_{m-1} = A diag(P_1, P_2, . . ., P_m) A^*                            (16.12)
where A is an m × m Vandermonde matrix given by

    A = [ 1            1           . . .   1
          λ_1          λ_2         . . .   λ_m
          λ_1²         λ_2²        . . .   λ_m²
          . . .
          λ_1^{m-1}    λ_2^{m-1}   . . .   λ_m^{m-1} ],   λ_i = e^{jθ_i},  i = 1 → m

and Eq. (16.12) can be used to determine P_k > 0, k = 1 → m. The power spectrum associated with Eq. (16.11) is given by

    S(θ) = Σ_{k=1}^{m} P_k δ(θ - θ_k)

and it represents a discrete spectrum that corresponds to pure uncorrelated sinusoids with signal powers P_1, P_2, . . ., P_m.

If the given autocorrelations satisfy T_n > 0, from Eq. (16.4), every unknown r_k, k ≥ n + 1, must be selected so as to satisfy T_k > 0, and this gives

    |r_{k+1} - ζ_k|² ≤ R_k²                                               (16.13)

where ζ_k = φ_k^T T_{k-1}^{-1} b_k, with φ_k = (r_1, r_2, . . ., r_k)^T, b_k = (r_k, r_{k-1}, . . ., r_1)^T, and R_k = det T_k / det T_{k-1}. From Eq. (16.13), the unknowns could be anywhere inside a sequence of circles with center ζ_k and radius R_k and, as a result, there are an infinite number of solutions to this problem. Schur and Nevanlinna have given an analytic characterization of these solutions in terms of bounded function extensions. A bounded function ρ(z) is analytic in |z| < 1 and satisfies the inequality |ρ(z)| ≤ 1 everywhere in |z| < 1. In a network theory context, Youla [1980] has also given a closed-form parametrization of this class of solutions. In that case, given r_0, r_1, . . ., r_n, the minimum phase transfer functions satisfying Eqs. (16.8) and (16.9) are given by

    H_ρ(z) = Γ(z) / [P_n(z) - zρ(z)P̃_n(z)]                               (16.14)

where ρ(z) is an arbitrary bounded function that satisfies the inequality (Paley–Wiener criterion)

    ∫_{-π}^{π} ln[1 - |ρ(e^{jθ})|²] dθ > -∞

and Γ(z) is the minimum phase factor obtained from the factorization

    1 - |ρ(e^{jθ})|² = |Γ(e^{jθ})|²

Further, P_n(z) represents the Levinson polynomial generated from r_0 → r_n through the recursion

    √(1 - |s_n|²) P_n(z) = P_{n-1}(z) - z s_n P̃_{n-1}(z)
that starts with P_0(z) = 1/√r_0, where

    s_n = { P_{n-1}(z) Σ_{k=0}^{n} r_k z^k }_n                            (16.15)

represents the reflection coefficient at stage n. Here, {·}_n denotes the coefficient of z^n in the expansion {·}, and P̃_n(z) ≜ z^n P_n^*(1/z^*) represents the polynomial reciprocal to P_n(z). Notice that the given information r_0 → r_n enters P_n(z) through Eq. (16.15). The power spectral density K(θ) = |H_ρ(e^{jθ})|² associated with Eq. (16.14) satisfies all the interpolation properties described before. In Eq. (16.14), the solution ρ(z) ≡ 0 gives H(z) = 1/P_n(z), a pure AR(n) system that coincides with Burg's maximum entropy extension. Clearly, if H_ρ(z) is rational, then ρ(z) must be rational and, more interestingly, every rational system must follow from Eq. (16.14) for a specific rational bounded function ρ(z). Of course, the choice of ρ(z) brings in extra freedom, and this can be profitably used for system identification as well as rational and stable approximation of nonrational systems [Pillai and Shim, 1993].

Defining Terms

Autocorrelation function: The expected value of the product of two random variables generated from a random process for two time instants; it represents their interdependence.

Expected value (or mean) of a random variable: Ensemble average value of a random variable that is given by integrating the random variable, after scaling by its probability density function (weighted average), over the entire range.

Power spectrum: A nonnegative function that describes the distribution of power versus frequency. For wide sense stationary processes, the power spectrum and the autocorrelation function form a Fourier transform pair.

Probability density function: The probability of the random variable taking values between two real numbers x_1 and x_2 is given by the area under the nonnegative probability density function between those two points.

Random variable: A continuous or discrete valued variable that maps the set of all outcomes of an experiment into the real line (or complex plane).
Because the outcomes of an experiment are inherently random, the final value of the variable cannot be predetermined. Stochastic process: A real valued function of time t, which for every fixed t behaves like a random variable. Related Topics 14.1 Fourier Transforms • 40.2 Spectrum, Specifications, and Measurement Techniques • 73.3 Stochastic Processes References N.I. Akheizer and M. Krein, Some Questions in the Theory of Moments, American Math. Soc. Monogr., 2, 1962. J.B.J. Fourier, Theorie Analytique de la Chaleur (Analytical Theory of Heat), Paris, 1822. L. Y. Geronimus, Polynomials Orthogonal on a Circle and Their Applications, American Math. Soc., Translation, 104, 1954. I. Newton, Philos. Trans., vol. IV, p. 3076, 1671. S.U. Pillai and T.I. Shim, Spectrum Estimation and System Identification, New York: Springer-Verlag, 1993. I. Schur, “Uber Potenzreihen, die im Innern des Einheitzkreises Beschrankt Sind,” Journal fur Reine und Angewandte Mathematik, vol. 147, pp. 205–232, 1917. J.A. Shohat and J.D. Tamarkin, The Problem of Moments, American Math. Soc., Math. Surveys, 1, 1970. r0 s P z rz P nn k k k n n = n Ï Ì Ô Ó Ô ¸ ˝ Ô ˛ = Ô – – 1() () Â 1 1 0
N. Wiener, "Generalized harmonic analysis," Acta Math., vol. 55, pp. 117–258, 1930.
D.C. Youla, "The FEE: A New Tunable High-Resolution Spectral Estimator: Part I," Technical Note, no. 3, Dept. of Electrical Engineering, Polytechnic Institute of New York, Brooklyn, New York, 1980; also RADC Report RADC-TR-81-397, AD A114996, 1982.
G.U. Yule, "On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers," Philos. Trans. R. Soc. London, Ser. A, vol. 226, pp. 267–298, 1927.

16.2 Parameter Estimation

Stella N. Batalama and Dimitri Kazakos

Parameter estimation is the operation of assigning a value in a continuum of alternatives to an unknown parameter based on a set of observations involving some function of the parameter. The estimate is the value assigned to the parameter, and the estimator is the function of the observations that yields the estimate.

The basic elements in parameter estimation are a vector parameter θ^m, a vector space Θ^m in which θ^m takes its values, a stochastic process X(t) parameterized by θ^m, and a performance criterion or cost function. The estimate θ̂^m(x^n), based on the observation vector x^n = [x_1, x_2, . . ., x_n], is a solution of some optimization problem according to the performance criterion. In the following, the function f(x^n | θ^m) will denote the conditional joint probability density function of the random variables x_1, . . ., x_n.

There are several parameter estimation schemes. If the process X(t) is parametrically known, i.e., if its conditional joint probability density functions are known for each fixed value θ^m of the vector parameter, then the corresponding parameter estimation scheme is called parametric. If the statistics of the process X(t) are nonparametrically described, i.e., given θ^m ∈ Θ^m, any joint probability density function of the process is a member of some nonparametric class of probability density functions, then nonparametric estimation schemes arise.
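As a small illustration of these notions (all model choices are assumed for the sketch, not taken from the text), consider the sample mean as an estimator of the location parameter of a parametrically known Gaussian process: the estimator is a fixed function of the observations, while the estimate it yields varies from realization to realization.

```python
import random

# Sketch under an assumed parametric model: x_1, ..., x_n i.i.d.
# N(theta, 1) with scalar parameter theta. The ESTIMATOR is the fixed
# function x^n -> sum(x)/n; each realization x^n yields a different
# ESTIMATE, so the estimator is itself a random variable.
random.seed(2)
theta, n = 1.5, 50

def estimator(x):
    return sum(x) / len(x)       # sample mean as the estimator

estimates = []
for _ in range(3):               # three independent realizations
    x = [random.gauss(theta, 1.0) for _ in range(n)]
    estimates.append(estimator(x))
print(estimates)                 # three different values near theta
```

The three printed values differ, but each lies close to the true parameter, which previews the bias and variance properties discussed next.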
Let Gn denote the n-dimensional observation space. Then an estimator q ^ (xn) of a vector parameter qm is a function from the observation space, Gn, to the parameter space,%m. Since this is a function of random variables, it is itself a random variable (or random vector). There are certain stochastic properties of estimators that quantify somehow their quality. In this sense an estimator is said to be unbiased if its expected value is the true parameter value, i.e., if Eq{q ^m(xn)} = qm where the subscript q on the expectation symbol denotes that the expectation is taken according to the probability density function f(x n *qm). In the case where the observation space is the ¬n and the parameter is a scalar, it is The bias of the estimate is the Euclidean norm **qm – Eq{qm(xn)}**1/2. Thus, the bias measures the distance between the expected value of the estimate and the true value of the parameter. Clearly, the estimator is unbiased when the bias is zero. Usually it is of interest to know the conditional variance of an unbiased estimate. The bias of the estimate q ^ m(xn ) and the conditional variance Eq{**q ^ m(xn) – Eq{q ^ m(xn)}**2 *qm} generally represent a trade-off. Indeed, an unbiased estimate may induce relatively large variance. On the other hand, the introduction of some low-level bias may then result in a significant reduction of the induced variance. E x f x dx n R n n m n n q {q q q ˆ( )} ˆ x = ( ) ( ) Ú *
In general, the bias versus variance trade-off should be studied carefully for the correct evaluation of any given parameter estimate.

A parameter estimate is called efficient if its conditional variance equals a lower bound known as the Rao–Cramér bound, which it will be useful to present briefly. The Rao–Cramér bound gives a theoretical minimum for the variance of any estimate. More specifically, let θ̂(x^n) be the estimate of a scalar parameter θ given the observation vector x^n. Let f(x^n | θ) be given, twice continuously differentiable with respect to θ, and satisfying some other mild regularity conditions. Then,

    E_θ{[θ̂(x^n) - θ]²} ≥ ( E_θ{ [∂ log f(x^n | θ)/∂θ]² } )^{-1}

Sometimes we need to consider the case where the sample size n increases to infinity. In such a case, an estimator is said to be consistent if

    θ̂^m(x^n) → θ^m  as n → ∞

Since the estimate θ̂^m(x^n) is a random variable, we have to specify in what sense the above limit holds. Thus, if it holds w.p. 1, we say that θ̂^m(x^n) is strongly consistent, or consistent w.p. 1. In a similar way we can define a weakly consistent estimator.

As far as the asymptotic distribution of θ̂(x^n) as n → ∞ is concerned, it turns out that the central limit theorem can often be applied to θ̂(x^n) to infer that √n[θ̂(x^n) - θ] is asymptotically normal with zero mean as n → ∞.

In order to examine certain parameter estimation schemes, we first need to define some related functions. The penalty or cost function c[θ^m, θ̂^m(x^n)] is a scalar, nonnegative function whose values vary as θ^m varies in the parameter space Θ^m and as the sequence x^n takes different values in the observation space Γ^n. The conditional expected penalty c(θ^m, θ̂^m) induced by the parameter estimate and the penalty function is defined as

    c(θ^m, θ̂^m) = E_θ{c[θ^m, θ̂^m(x^n)]}

If an a priori density function p(θ^m) is available, then the expected penalty c(θ̂^m, p) can be evaluated.
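Under the squared-error penalty c[θ, θ̂(x^n)] = [θ - θ̂(x^n)]², the conditional expected penalty is the mean-squared error. For the sample mean of a Gaussian process it attains the Rao–Cramér bound discussed above, so that estimator is efficient. A Monte Carlo sketch (all values assumed for illustration):

```python
import random

# Sketch (assumed Gaussian setup): with squared-error penalty
# c[theta, est] = (theta - est)^2, the conditional expected penalty of
# the sample mean of n draws from N(theta, sigma2) is sigma2/n, which
# equals the Rao-Cramer bound here, so the sample mean is efficient.
random.seed(1)
theta, sigma2, n, trials = 2.0, 1.0, 5, 40000
acc = 0.0
for _ in range(trials):
    x = [random.gauss(theta, sigma2 ** 0.5) for _ in range(n)]
    est = sum(x) / n                 # sample-mean estimator
    acc += (est - theta) ** 2        # penalty for this realization
expected_penalty = acc / trials
crb = sigma2 / n                     # Rao-Cramer bound = 0.2
print(expected_penalty, crb)
```

The simulated expected penalty matches the bound σ²/n, consistent with the definition of efficiency given earlier.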
The various existing parameter estimation schemes evolve as the solutions of optimization problems whose objective function is either the conditional expected penalty or the conditional density function f(x^n | θ^m).

Bayesian Estimation Scheme

In the Bayesian estimation scheme the available assets are:

1. A parametrically known stochastic process parameterized by θ^m, in other words, a given conditional joint density function f(x^n | θ^m) defined on the observation space Γ^n, where θ^m is a well-defined parameter vector.
2. A realization x^n from the underlying active process, where the implied assumption is that the process remains unchanged throughout the whole observation period.
3. A density function p(θ^m) defined on the parameter space Θ^m.
4. For each data sequence x^n, parameter vector θ^m, and parameter estimate θ̂^m(x^n), a given scalar penalty function c[θ^m, θ̂^m(x^n)].
5. A performance criterion, which is the minimization of the expected penalty c(θ̂^m, p).

The estimate θ̂_0^m that minimizes the expected penalty is called the optimal Bayesian estimate at p. Under some mild conditions, the optimal Bayesian estimate θ̂_0^m(x^n) is the conditional expectation E{θ^m | x^n}.
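For a conjugate Gaussian model (an assumed illustration, not part of the text), the conditional expectation E{θ | x^n} is available in closed form as a precision-weighted average of the prior mean and the sample mean:

```python
# Bayesian sketch under an assumed conjugate-Gaussian model:
# prior theta ~ N(mu0, tau2), observations x_i | theta ~ N(theta, sigma2).
# Under the squared-error penalty the optimal Bayesian estimate is the
# posterior mean, a precision-weighted blend of prior and sample means.
mu0, tau2, sigma2 = 0.0, 4.0, 1.0
x = [1.2, 0.8, 1.0, 1.1]        # hypothetical observations
n = len(x)
post_mean = (mu0 / tau2 + sum(x) / sigma2) / (1.0 / tau2 + n / sigma2)
sample_mean = sum(x) / n
print(post_mean, sample_mean)   # posterior mean is pulled toward mu0
```

As n grows, the data term n/σ² dominates the prior term 1/τ², and the optimal Bayesian estimate approaches the sample mean.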