1 The System Identification Problem
A Startup Identification Procedure

• Measured Validation Data output and the ARX and state-space models’ simulated outputs

If these agreements are reasonable, the problem is not so difficult, and a relatively simple linear model will do a good job. Some fine tuning of model orders and noise models has to be made, and you can then proceed to Step 4. Otherwise go to Step 3.

Step 3: Examining the Difficulties

There may be several reasons why the comparisons in Step 2 did not look good. This section discusses the most common ones, and how they can be handled.

Model Unstable
The ARX or state-space model may turn out to be unstable, but could still be useful for control purposes. In that case, change to a 5- or 10-step ahead prediction instead of simulation in the Model Output View.

Feedback in Data
If there is feedback from the output to the input, due to some regulator, then the spectral and correlation analysis estimates are not reliable. Discrepancies between these estimates and the ARX and state-space models can therefore be disregarded in this case. In the Model Residuals View of the parametric models, feedback in data can also be visible as correlation between residuals and input for negative lags.

Disturbance Model
If the state-space model is clearly better than the ARX model at reproducing the measured output, this is an indication that the disturbances have a substantial influence, and it will be necessary to model them carefully.

Model Order
If a fourth order model does not give a good Model Output plot, try eighth order. If the fit clearly improves, it follows that higher order models will be required, but that linear models could be sufficient.

Additional Inputs
If the Model Output fit has not significantly improved by the tests so far, think over the physics of the application. Are there more signals that have been, or
could be, measured that might influence the output? If so, include these among the inputs and try a fourth order ARX model again, using all the inputs. (Note that the inputs need not be control signals at all; anything measurable, including disturbances, should be treated as inputs.)

Nonlinear Effects
If the fit between measured and model output is still bad, consider the physics of the application. Are there nonlinear effects in the system? In that case, form the nonlinearities from the measured data and add those transformed measurements as extra inputs. This could be as simple as forming the product of voltage and current measurements, if you realize that it is the electrical power that is the driving stimulus in, say, a heating process, and temperature is the output. This is of course application dependent. It does not take very much work, however, to form a number of additional inputs by reasonable nonlinear transformations of the measured ones, and just test if inclusion of them improves the fit.

Still Problems?
If none of these tests leads to a model that is able to reproduce the Validation Data reasonably well, the conclusion might be that a sufficiently good model cannot be produced from the data. There may be many reasons for this. It may be that the system has some quite complicated nonlinearities that cannot be worked out on physical grounds. In such cases, nonlinear, black-box models could be a solution. Among the most used models of this character are the Artificial Neural Networks (ANN).

Another important reason is that the data simply do not contain sufficient information, e.g., due to poor signal-to-noise ratios, large and nonstationary disturbances, varying system properties, etc.

Otherwise, use the insights of which inputs to use and which model orders to expect and proceed to Step 4.
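The Nonlinear Effects advice above can be tried out numerically. The following is a minimal Python sketch, not toolbox code. It assumes a made-up heating process in which temperature follows a first order difference equation driven by electrical power, and it assumes a normalized fit measure of the form 100(1 - |y - yhat| / |y - mean(y)|). All function names, signal names, and numerical values are hypothetical. A first order ARX model is estimated by least squares with and without the product of voltage and current as an extra input, and the simulated output is compared with the measured one on a separate validation half of the data.

```python
import math
import random


def solve_least_squares(rows, targets):
    """Solve the normal equations (R'R) theta = R'y by Gaussian elimination."""
    n = len(rows[0])
    # Augmented matrix [R'R | R'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    theta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (m[i][n] - tail) / m[i][i]
    return theta


def estimate_arx(y, inputs):
    """Least-squares fit of y(t) = a*y(t-1) + sum_k b_k*u_k(t-1)."""
    rows = [[y[t - 1]] + [u[t - 1] for u in inputs] for t in range(1, len(y))]
    return solve_least_squares(rows, y[1:])


def simulate(theta, inputs, y0):
    """Simulate the model from the inputs only, starting at y0."""
    a, b = theta[0], theta[1:]
    y_sim = [y0]
    for t in range(1, len(inputs[0])):
        drive = sum(bk * u[t - 1] for bk, u in zip(b, inputs))
        y_sim.append(a * y_sim[-1] + drive)
    return y_sim


def fit_percent(measured, simulated):
    """Assumed fit measure: 100 * (1 - |y - yhat| / |y - mean(y)|)."""
    mean = sum(measured) / len(measured)
    err = math.sqrt(sum((y - s) ** 2 for y, s in zip(measured, simulated)))
    spread = math.sqrt(sum((y - mean) ** 2 for y in measured))
    return 100.0 * (1.0 - err / spread)


# Hypothetical data: temperature is driven by power = voltage * current.
random.seed(0)
N = 600
voltage = [random.uniform(0.0, 2.0) for _ in range(N)]
current = [random.uniform(0.0, 2.0) for _ in range(N)]
power = [v * i for v, i in zip(voltage, current)]  # the transformed input
temperature = [0.0]
for t in range(1, N):
    temperature.append(0.9 * temperature[-1] + 0.5 * power[t - 1])

half = N // 2  # first half for estimation, second half for validation


def validation_fit(input_signals):
    """Estimate on the first half, simulate and score on the second half."""
    est = [u[:half] for u in input_signals]
    val = [u[half:] for u in input_signals]
    theta = estimate_arx(temperature[:half], est)
    y_val = temperature[half:]
    return fit_percent(y_val, simulate(theta, val, y_val[0]))


fit_linear = validation_fit([voltage, current])
fit_power = validation_fit([voltage, current, power])
print(f"fit with voltage and current only: {fit_linear:.1f}%")
print(f"fit with power added as an input:  {fit_power:.1f}%")
```

In this constructed example the extra input is exactly the missing driving signal, so the improvement is large by design. With real data the gain from a candidate transformation is usually less clear-cut, which is why the text recommends simply testing several reasonable transformations.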
Step 4: Fine Tuning Orders and Disturbance Structures

For real data there is no such thing as a “correct model structure.” However, different structures can give quite different model quality. The only way to find this out is to try out a number of different structures and compare the
properties of the obtained models. There are a few things to look for in these comparisons.

Fit Between Simulated and Measured Output
Keep the Model Output View open and look at the fit between the model’s simulated output and the measured one for the Validation Data. Formally, you could pick the model for which this number is the highest. In practice, it is better to be more pragmatic, and also take into account the model complexity, and whether the important features of the output response are captured.

Residual Analysis Test
You should require of a good model that the cross correlation function between residuals and input does not go significantly outside the confidence region. Otherwise there is something in the residuals that originates from the input, and has not been properly taken care of by the model. A clear peak at lag k shows that the effect from input u(t-k) on y(t) is not correctly described. A rule of thumb is that a slowly varying cross correlation function outside the confidence region is an indication of too few poles, while sharper peaks indicate too few zeros or wrong delays.

Pole-Zero Cancellations
If the pole-zero plot (including confidence intervals) indicates pole-zero cancellations in the dynamics, this suggests that lower order models can be used. In particular, if it turns out that the orders of ARX models have to be increased to get a good fit, but that pole-zero cancellations are indicated, then the extra poles are just introduced to describe the noise. Then try ARMAX, OE, or BJ model structures with an A or F polynomial of an order equal to the number of noncanceled poles.

What Model Structures Should be Tested?
Well, you can spend any amount of time to check out a very large number of structures. It often takes just a few seconds to compute and evaluate a model in a certain structure, so that you should have a generous attitude to the testing.
However, experience shows that when the basic properties of the system’s behavior have been picked up, it is not much use to fine tune orders ad absurdum just to improve the fit by fractions of a percent.

Many ARX models: There is a very cheap way of testing many ARX structures simultaneously. Enter in the Orders text field many combinations of orders
using the colon (“:”) notation. You can also press the Order Selection button. When you select Estimate, models for all combinations (easily several hundreds) are computed and their (prediction error) fit to Validation Data is shown in a special plot. By clicking in this plot the best models with any chosen number of parameters will be inserted into the Model Board, and evaluated as desired.

Many State-space models: A similar feature is also available for black-box state-space models, estimated using n4sid. When a good order has been found, try the PEM estimation method, which often improves on the accuracy.

ARMAX, OE, and BJ models: Once you have a feel for suitable delays and dynamics orders, it is often useful to try out ARMAX, OE, and/or BJ with these orders and test some different orders for the disturbance transfer functions (C and D). Especially for poorly damped systems, the OE structure is suitable.

There is a quite extensive literature on order and structure selection, and anyone who would like to know more should consult the references.

Multivariable Systems

Systems with many input signals and/or many output signals are called multivariable. Such systems are often more challenging to model. In particular, systems with several outputs can be difficult. A basic reason for the difficulties is that the couplings between several inputs and outputs lead to more complex models. The structures involved are richer and more parameters will be required to obtain a good fit.

Available Models
The System Identification Toolbox as well as the GUI handle general, linear multivariable models. All earlier mentioned models are supported in the single output, multiple input case. For multiple outputs, ARX models and state-space models are covered. Multi-output ARMAX and OE models are covered via state-space representations: ARMAX corresponds to estimating the K-matrix, while OE corresponds to fixing K to zero.
(These are pop-up options in the GUI model order editor.)

Generally speaking, it is preferable to work with state-space models in the multivariable case, since the model structure complexity is easier to deal with. It is essentially just a matter of choosing the model order.
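The role of the K-matrix mentioned under Available Models can be illustrated with a small Python sketch. It is not toolbox code. It assumes the standard innovations form x(t+1) = A x(t) + B u(t) + K e(t), y(t) = C x(t) + e(t), which the text does not spell out, and all matrices and signals below are hypothetical. The sketch shows that with K fixed to zero the one-step-ahead predictor ignores past outputs and coincides with a pure simulation (the output-error case), whereas a nonzero K feeds past prediction errors back into the state.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def vec_add(*vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def predict(A, B, C, K, inputs, outputs):
    """One-step-ahead predictions from the assumed innovations form
    x(t+1) = A x(t) + B u(t) + K e(t),  y(t) = C x(t) + e(t)."""
    x = [0.0] * len(A)
    predictions = []
    for u, y in zip(inputs, outputs):
        y_hat = matvec(C, x)
        predictions.append(y_hat)
        # The innovation is the part of y(t) not explained by the prediction.
        innovation = [yi - yh for yi, yh in zip(y, y_hat)]
        x = vec_add(matvec(A, x), matvec(B, u), matvec(K, innovation))
    return predictions


def simulate(A, B, C, inputs):
    """Simulated outputs driven by the inputs only (no output feedback)."""
    x = [0.0] * len(A)
    simulated = []
    for u in inputs:
        simulated.append(matvec(C, x))
        x = vec_add(matvec(A, x), matvec(B, u))
    return simulated


# Hypothetical system with 2 states, 1 input, and 2 outputs.
A = [[0.8, 0.1], [0.0, 0.5]]
B = [[1.0], [0.5]]
C = [[1.0, 0.0], [0.0, 1.0]]
K_zero = [[0.0, 0.0], [0.0, 0.0]]  # output-error case: K fixed to zero
K_est = [[0.3, 0.0], [0.0, 0.2]]   # stands in for an estimated K

inputs = [[math.sin(0.3 * t)] for t in range(50)]
simulated = simulate(A, B, C, inputs)
# Made-up "measured" outputs: the simulation plus slow disturbances.
measured = [[y1 + 0.2 * math.cos(0.05 * t), y2 + 0.1]
            for t, (y1, y2) in enumerate(simulated)]

pred_oe = predict(A, B, C, K_zero, inputs, measured)
pred_k = predict(A, B, C, K_est, inputs, measured)

print("K = 0 predictor equals simulation:", pred_oe == simulated)
print("nonzero K predictor equals simulation:", pred_k == simulated)
```

The sketch only demonstrates the structural difference between the two cases. Whether a nonzero K actually improves prediction depends on how well it matches the true disturbance properties, which is what the estimation step has to determine.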
Working with Subsets of the Input-Output Channels
In the process of identifying good models of a system, it is often useful to select subsets of the input and output channels. Partial models of the system’s behavior will then be constructed. It might not, for example, be clear if all measured inputs have a significant influence on the outputs. That is most easily tested by removing an input channel from the data, building a model for how the output(s) depend on the remaining input channels, and checking if there is a significant deterioration in the model output’s fit to the measured one. See also the discussion under Step 3 above.

Generally speaking, the fit gets better when more inputs are included and often gets worse when more outputs are included. To understand the latter fact, you should realize that a model that has to explain the behavior of several outputs has a tougher job than one that just must account for a single output. If you have difficulties obtaining good models for a multi-output system, it might be wise to model one output at a time, to find out which are the difficult ones to handle.

Models that are just to be used for simulations could very well be built up from single-output models, for one output at a time. However, models for prediction and control will be able to produce better results if constructed for all outputs simultaneously. This follows from the fact that knowing the set of all previous output channels gives a better basis for prediction than just knowing the past outputs in one channel. Also, for systems where the different outputs reflect similar dynamics, using several outputs simultaneously will help in estimating the dynamics.

Some Practical Advice
Both the GUI and command line operation will do useful bookkeeping for you, handling different channels. You could follow the steps of this agenda:

• Import data and create a data set with all input and output channels of interest.
Do the necessary preprocessing of this set in terms of detrending, etc., and then select a Validation Data set with all channels.
• Then select a Working Data set with all channels, and estimate state-space models of different orders using n4sid for these data. Examine the resulting model primarily using the Model Output view.
• If it is difficult to get a good fit in all output channels or you would like to investigate how important the different input channels are, construct new