(Raupach, 2001), thus, $z = Hy + \mathrm{noise}$. For now, the noise is assumed to be Gaussian with zero mean and no temporal correlation, and thus completely characterised by an observation error covariance matrix $[\mathrm{Cov}\,z]$. By minimizing $J$ analytically, one obtains the expression (Tarantola, 1987, p. 196; Enting, 2002):

$\hat{y} = \breve{y} + [\mathrm{Cov}\,\hat{y}]\,H^{\mathrm{T}}[\mathrm{Cov}\,z]^{-1}(z - H\breve{y}),$   (4)

where $[\mathrm{Cov}\,\hat{y}]$, the estimated error covariance of the a posteriori estimate $\hat{y}$, is given by

$[\mathrm{Cov}\,\hat{y}]^{-1} = [\mathrm{Cov}\,\breve{y}]^{-1} + H^{\mathrm{T}}[\mathrm{Cov}\,z]^{-1}H.$   (5)

These expressions already tell us some important things. The posterior estimates are given by the prior estimates plus a term depending on the mismatch between the experimental observations and the observations as predicted by the prior estimates. This mismatch is weighted by our confidence in the observations, $[\mathrm{Cov}\,z]^{-1}$. Thus, observations with little weight hardly shift the posterior estimate from the prior, and vice versa. Furthermore, the transpose of the observation operator ($H^{\mathrm{T}}$) multiplies the weighted mismatch. If this operator is very weak, that is, if the available observations are only weakly related to the target variables, then the update to the initial estimate is also small. Finally, the posterior covariance $[\mathrm{Cov}\,\hat{y}]$ (Eqn (5)) is bounded above, in some sense, by the prior covariance $[\mathrm{Cov}\,\breve{y}]$. If the prior covariance is small (suggesting substantial confidence in the initial estimate), then the increment $\hat{y} - \breve{y}$ (the difference between the posterior and prior estimates, a measure of the information added by the observations $z$, and equal to the second term in Eqn (4) in the present case) is also small.

All the above is reasonable. More surprising is the relationship between the data, its uncertainty and the cost function. We can decompose the (positive definite) matrix $[\mathrm{Cov}\,z]^{-1}$ into a matrix product $A^{\mathrm{T}}A$, using the Cholesky decomposition for a positive definite matrix. For a diagonal covariance matrix, $\mathrm{diag}[\sigma_m^2]$, the decomposition is trivial: $A = \mathrm{diag}[1/\sigma_m]$. Likewise, we can write $[\mathrm{Cov}\,\breve{y}]^{-1} = B^{\mathrm{T}}B$. The cost function, Eqn (3), can then be rewritten as

$J(y) = (a - Ah(y))^{\mathrm{T}}(a - Ah(y)) + (b - \breve{b})^{\mathrm{T}}(b - \breve{b}),$   (6)

where $a = Az$, $b = By$ and $\breve{b} = B\breve{y}$. Thus the cost function, and thence the entire minimization, takes a form in which neither the observations nor the prior estimates appear; they are replaced by quantities $a$ and $b$ scaled by the square roots of the inverse covariance matrices, which are measures of confidence. This is no mathematical nicety; rather, it demonstrates that the data and the uncertainties are completely inseparable in the formalism. To put the point provocatively, providing data and allowing another researcher to provide the uncertainty is indistinguishable from allowing the second researcher to make up the data in the first place. This realization informs the emphasis on uncertainty throughout this paper.
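To make Eqns (4)-(6) concrete, the following minimal NumPy sketch (not from the original paper) computes the analytic posterior estimate and covariance for an invented linear toy problem, and verifies numerically that the cost function and its "whitened" form, Eqn (6), agree. The observation operator, covariances, prior and function names are all assumptions introduced purely for illustration, and the quadratic cost of Eqn (3) is assumed to take the standard weighted least-squares form (data mismatch plus prior mismatch, each weighted by its inverse covariance).

```python
import numpy as np

# Invented toy problem: 3 target variables, 5 linear observations (illustration only).
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 3))                        # linear observation operator, z = H y + noise
cov_z = np.diag([0.2, 0.2, 0.5, 0.5, 1.0]) ** 2    # observation error covariance [Cov z]
cov_y_prior = np.diag([1.0, 2.0, 0.5]) ** 2        # prior covariance [Cov y_prior]
y_prior = np.array([0.0, 1.0, -1.0])               # prior estimate
y_true = np.array([0.5, 2.0, -0.5])
z = H @ y_true + rng.multivariate_normal(np.zeros(5), cov_z)  # synthetic data

# Eqn (5): posterior covariance from the information (inverse-covariance) form.
cov_y_post = np.linalg.inv(
    np.linalg.inv(cov_y_prior) + H.T @ np.linalg.inv(cov_z) @ H
)

# Eqn (4): posterior estimate = prior + weighted correction toward the data.
y_post = y_prior + cov_y_post @ H.T @ np.linalg.inv(cov_z) @ (z - H @ y_prior)

# Eqn (6): Cholesky factors A, B of the inverse covariances ("square roots of confidence").
A = np.linalg.cholesky(np.linalg.inv(cov_z)).T        # so that A.T @ A = inv(cov_z)
B = np.linalg.cholesky(np.linalg.inv(cov_y_prior)).T  # so that B.T @ B = inv(cov_y_prior)

def J(y):
    """Quadratic cost, Eqn (3): data mismatch + prior mismatch, weighted by confidence."""
    return ((z - H @ y).T @ np.linalg.inv(cov_z) @ (z - H @ y)
            + (y - y_prior).T @ np.linalg.inv(cov_y_prior) @ (y - y_prior))

def J_whitened(y):
    """Equivalent form, Eqn (6): only the scaled quantities a and b appear."""
    a, b, b_prior = A @ z, B @ y, B @ y_prior
    return (a - A @ H @ y) @ (a - A @ H @ y) + (b - b_prior) @ (b - b_prior)

assert np.allclose(J(y_post), J_whitened(y_post))   # the two forms of the cost agree
print("posterior estimate:", y_post)
print("posterior std devs:", np.sqrt(np.diag(cov_y_post)))
```

The assertion illustrates the point made above: once the data and prior are scaled by the Cholesky factors of their inverse covariances, the original observations never appear separately from their uncertainties.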
Algorithms for nonsequential problems

The task in general is to find the target variables $y$ which minimize $J(y)$. Clearly, the shape of $J(y)$ is all important: it may have a single minimum or multiple separated local minima, only one of which is the true global minimum. Near the minimum, $J$ may be shaped like a long, narrow ellipsoidal valley. If this valley has a flat floor tracing out some line in $y$ space, then all points along that line are equally acceptable and these $y$ coordinates cannot be distinguished in terms of optimality, so such combinations of target variables cannot be resolved by model–data synthesis with the available data and model.

Diagnostic indicators about these issues are provided by the Hessian or curvature matrix $D = \partial^2 J/\partial y_j \partial y_k$, a measure of the local curvature of $J(y)$. The degree of orthogonality among columns of $D$ indicates the extent to which it is possible to find a unique local minimum of $J(y)$ in the vicinity of the point at which $D$ is evaluated. A high 'condition number' (ratio of largest to smallest eigenvalue) for $D$ indicates that some linear combination(s) of the columns of $D$ are nearly zero, that is, that the curvature is nearly zero in some direction(s), so that the minimization problem is ill-conditioned, as in the case of a valley with a flat floor.

Given these considerations, classes of method for finding the minimum in $J(y)$ include the following.

1. Analytic solution is possible when the observation operator $h(y)$ is linear ($z = Hy + \mathrm{noise}$). In this case $J(y)$ is a quadratic form shaped like a parabolic bowl, and the minimization can be carried out analytically as in the example of Eqns (4) and (5). This 'direct' or 'one-step' solution is highly efficient when applicable; however, most problems are nonlinear and require a nonlinear method.

2. Gradient-descent algorithms are the most familiar search algorithms for nonlinear optimization. They include (for example) steepest-descent, conjugate-gradient, quasi-Newton and Levenberg–Marquardt algorithms (Press et al., 1992). Gradient-descent methods are easily implemented, provided that the gradient vector $\nabla_y J = \partial J/\partial y_k$ can be calculated. The main advantages of gradient-descent algorithms are relative simplicity and low cost; the main disadvantage is that if the surface $J(y)$ has multiple minima, they tend to find local minima near the starting value of $y$ rather than the global minimum (a small numerical sketch of a gradient-based search and the condition-number diagnostic follows this list).
3. Global search methods find the global minimum in a function $J(y)$ by searching (effectively) the whole of $y$ space. They overcome the local-minimum pitfall (so to speak) of gradient-descent methods, but have the disadvantage of higher computational costs. Simulated annealing and genetic algorithms are two examples. These methods are efficient at finding the vicinity of a global minimum where there may be multiple local minima, but do not locate an exact local minimum. They may be combined with gradient-descent methods for finding an exact global minimum once in the right vicinity.
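As a minimal sketch (not from the original paper), the following Python fragment minimizes an invented two-parameter cost function with a quasi-Newton (BFGS) method from SciPy and then inspects the condition number of the Hessian at the solution. The toy cost $J$, its gradient, and all numbers are assumptions introduced only to illustrate points 2 and 3 and the flat-valley diagnostic described above.

```python
import numpy as np
from scipy.optimize import minimize

# Invented toy cost: a long, narrow "valley" in two target variables y = (y1, y2).
# The 0.01 weight makes curvature along one direction much weaker than the other,
# so the Hessian is poorly conditioned (cf. the flat-floored valley in the text).
def J(y):
    return (y[0] + y[1] - 1.0) ** 2 + 0.01 * (y[0] - y[1]) ** 2

def grad_J(y):
    # Analytic gradient vector dJ/dy_k, as required by gradient-descent methods.
    g1 = 2.0 * (y[0] + y[1] - 1.0) + 0.02 * (y[0] - y[1])
    g2 = 2.0 * (y[0] + y[1] - 1.0) - 0.02 * (y[0] - y[1])
    return np.array([g1, g2])

# Quasi-Newton (BFGS) search from a starting guess; finds the nearest local minimum.
result = minimize(J, x0=np.array([5.0, -3.0]), jac=grad_J, method="BFGS")

# Hessian (curvature matrix) D of this quadratic cost, and its condition number.
D = np.array([[2.02, 1.98],
              [1.98, 2.02]])
eigvals = np.linalg.eigvalsh(D)                 # eigenvalues in ascending order
print("minimizer:", result.x, "  J at minimum:", result.fun)
print("Hessian eigenvalues:", eigvals, "  condition number:", eigvals[-1] / eigvals[0])
```

For this toy cost the condition number is 100, signalling one well-constrained and one poorly constrained direction in $y$ space. When $J(y)$ has multiple separated minima, a global method (for example, simulated annealing via scipy.optimize.dual_annealing) would be used to locate the right vicinity before a gradient-based refinement.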
Search strategies for sequential problems

In sequential problems, the task is to solve for a set of target variables $y^n$ associated with a particular time step, usually including the state variables of the dynamic model ($x^n$). The process is then repeated sequentially to give a time history for $y^n$. Information about $y^n$ can come from two sources: evolution of the dynamic model from the previous time step, and comparison between the observations at the current time step ($z^n$) and the model predictions ($h(y^n)$).

Kalman filter. Introduced by Kalman (1960), the Kalman filter is by now a group of algorithms for the sequential combination of dynamic and observational information, using a 'prediction' step and an 'analysis' step. In the prediction step, the dynamic model is used to calculate prior estimates $\breve{y}^n$ for the target variables at time step $n$, from the best (posterior) estimates $\hat{y}^{n-1}$ at the previous step. In the analysis step, posterior estimates $\hat{y}^n$ at step $n$ are obtained by 'improving' the prior estimates with data. The model state is then ready for evolution to the next ($n+1$) time step. A key point is that the confidence in the current state, embodied in the error covariance for the target variables $y$, is also evolved with the dynamic model and improved with observations. A schematic diagram of the information flow in the Kalman filter is given in Fig. 1.

[Fig. 1. Information flow in the linear Kalman filter. Linear operators are indicated next to arrows. Operations in prediction and analysis steps are shown as dashed and solid lines, respectively.]

In the prediction step, the task of evolving $y$ is common across all implementations of the Kalman filter, since it involves only a normal forward step of the dynamic model: $\breve{y}^n = \varphi(\hat{y}^{n-1})$. The prior estimate for the covariance at time step $n$ evolves according to

$[\mathrm{Cov}\,\breve{y}^n] = \Phi\,[\mathrm{Cov}\,\hat{y}^{n-1}]\,\Phi^{\mathrm{T}} + Q,$   (7)

where $\Phi = \partial\varphi/\partial y$ is the Jacobian matrix of the dynamic model $\varphi(y)$. The first term on the right represents the propagation of the error covariance in the target variables $y$ from one time step to the next, by a linearized version of the model. The second term ($Q$) is the covariance of the noise term in the dynamic model, Eqn (1), which includes both model imperfections and stochastic variability in forcings and parameters. This term plays a crucial role in the Kalman filter: it quantifies our lack of confidence in the ability of the dynamic model to propagate the model state, and is usually referred to as model error. In most implementations of the Kalman filter the model error is assumed to be Gaussian with zero mean and no temporal correlation, and thus completely characterized by the covariance matrix $Q$.

In the analysis step, the prior estimates are refined by the inclusion of data. This is done using the prior estimate for the predicted observation vector, $\breve{z}^n = h(\breve{y}^n)$, and its covariance

$[\mathrm{Cov}\,\breve{z}^n] = H\,[\mathrm{Cov}\,\breve{y}^n]\,H^{\mathrm{T}} + R,$   (8)

where $H = \partial h/\partial y$ is the Jacobian matrix of the observation model $h(y)$, and $R$ is the data covariance matrix $[\mathrm{Cov}\,z]$, indicating lack of confidence in the data and often called the data error. Again, it is usually assumed that the data error is Gaussian with zero mean and no temporal correlation, and thus completely characterized by $R = [\mathrm{Cov}\,z]$.

The expressions for the final (posterior) estimates for $y$ and its covariance are now exactly as for the nonsequential mode, except that the operation is carried out for one time step only:

$\hat{y}^n = \breve{y}^n + [\mathrm{Cov}\,\breve{y}^n]\,H^{\mathrm{T}}[\mathrm{Cov}\,\breve{z}^n]^{-1}(z^n - h(\breve{y}^n)) = \breve{y}^n + K(z^n - h(\breve{y}^n)),$   (9)

$[\mathrm{Cov}\,\hat{y}^n]^{-1} = [\mathrm{Cov}\,\breve{y}^n]^{-1} + H^{\mathrm{T}}R^{-1}H \quad \text{or} \quad [\mathrm{Cov}\,\hat{y}^n] = (I - KH)\,[\mathrm{Cov}\,\breve{y}^n].$   (10)
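To make Eqns (7)-(10) concrete, here is a minimal Kalman-filter-style sketch in Python (not from the original paper). The function names, the one-pool dynamic model $\varphi$, the observation operator $h$, their Jacobians and all covariances are invented toy assumptions chosen only to show how the prediction and analysis steps fit together.

```python
import numpy as np

def kf_predict(y_post, cov_post, phi, Phi, Q):
    """Prediction step, Eqn (7): propagate state and covariance with the dynamic model."""
    y_prior = phi(y_post)                      # forward step of the dynamic model
    cov_prior = Phi @ cov_post @ Phi.T + Q     # linearized covariance propagation + model error
    return y_prior, cov_prior

def kf_analysis(y_prior, cov_prior, z, h, H, R):
    """Analysis step, Eqns (8)-(10): refine the prior estimate with the data z."""
    cov_z_prior = H @ cov_prior @ H.T + R                    # Eqn (8): predicted-observation covariance
    K = cov_prior @ H.T @ np.linalg.inv(cov_z_prior)         # Kalman gain
    y_post = y_prior + K @ (z - h(y_prior))                  # Eqn (9)
    cov_post = (np.eye(len(y_prior)) - K @ H) @ cov_prior    # Eqn (10)
    return y_post, cov_post

# Invented one-pool toy: a carbon stock relaxing toward a constant input (illustration only).
phi = lambda y: np.array([0.9 * y[0] + 1.0])   # dynamic model phi(y)
Phi = np.array([[0.9]])                        # Jacobian of phi
Q = np.array([[0.05]])                         # model-error covariance
h = lambda y: y                                # observations measure the state directly
H = np.array([[1.0]])                          # Jacobian of h
R = np.array([[0.2]])                          # data-error covariance

y_hat, cov_hat = np.array([5.0]), np.array([[1.0]])    # posterior estimate at step n-1
for z_n in [9.3, 9.8, 10.1]:                           # synthetic observations
    y_brv, cov_brv = kf_predict(y_hat, cov_hat, phi, Phi, Q)
    y_hat, cov_hat = kf_analysis(y_brv, cov_brv, np.array([z_n]), h, H, R)
    print(f"posterior estimate {y_hat[0]:.2f} +/- {np.sqrt(cov_hat[0, 0]):.2f}")
```

In this linear toy the Jacobians are constant; with a nonlinear $\varphi$ or $h$ they would be re-evaluated at the current estimate at each step, in the manner of the extended Kalman filter.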