function approximation. Finally, we discuss issues in the choice of the data that we use to construct the approximators, discuss the incorporation of linguistic information, and provide an example of how to construct a data set for a parameter estimation problem.

3.2.1 The Function Approximation Problem

Given some function $g : \bar{X} \to \bar{Y}$, where $\bar{X} \subset \Re^n$ and $\bar{Y} \subset \Re$, we wish to construct a fuzzy system $f : X \to Y$, where $X \subset \bar{X}$ and $Y \subset \bar{Y}$ are some domain and range of interest, by choosing a parameter vector $\theta$ (which may include membership function centers, widths, etc.) so that

$$g(x) = f(x|\theta) + e(x) \tag{3.1}$$

for all $x = [x_1, x_2, \ldots, x_n]^\top \in X$, where the approximation error $e(x)$ is as small as possible. If we want to refer to the input at time $k$, we will use $x(k)$ for the vector and $x_j(k)$ for its $j$th component.

Assume that all that is available to choose the parameters $\theta$ of the fuzzy system $f(x|\theta)$ is some part of the function $g$ in the form of a finite set of input-output data pairs (i.e., the functional mapping implemented by $g$ is largely unknown). The $i$th input-output data pair from the system $g$ is denoted by $(x^i, y^i)$, where $x^i \in X$, $y^i \in Y$, and $y^i = g(x^i)$. We let $x^i = [x_1^i, x_2^i, \ldots, x_n^i]^\top$ represent the input vector for the $i$th data pair. Hence, $x_j^i$ is the $j$th element of the $i$th data vector (it has a specific value and is
not a variable). We call the set of input-output data pairs the training data set and denote it by

$$G = \{(x^1, y^1), \ldots, (x^M, y^M)\} \subset X \times Y \tag{3.2}$$

where $M$ denotes the number of input-output data pairs contained in $G$. For convenience, we will sometimes use the notation $d(i)$ for data pair $(x^i, y^i)$.

To get a graphical picture of the function approximation problem, see Figure 3.1. This clearly shows the challenge; it can certainly be hard to come up with a good function $f$ to match the mapping $g$ when we know only a little bit about the association between $X$ and $Y$ in the form of data pairs $G$. Moreover, it may be hard to know when we have a good approximation, that is, when $f$ approximates $g$ over the whole space of inputs $X$.

FIGURE 3.1 Function mapping with three known input-output data pairs.

To make the function approximation problem even more concrete, consider a simple example. Suppose that $n = 2$, $X \subset \Re^2$, $Y = [0, 10]$, and $g : X \to Y$. Let $M = 3$ and the training data set
$$G = \left\{ \left( \begin{bmatrix} 0 \\ 2 \end{bmatrix}, 1 \right), \left( \begin{bmatrix} 2 \\ 4 \end{bmatrix}, 5 \right), \left( \begin{bmatrix} 3 \\ 6 \end{bmatrix}, 6 \right) \right\} \tag{3.3}$$

which partially specifies $g$ as shown in Figure 3.2. The function approximation problem amounts to finding a function $f(x|\theta)$ by manipulating $\theta$ so that $f(x|\theta)$ approximates $g$ as closely as possible. We will use this simple data set to illustrate several of the methods we develop in this chapter.

How do we evaluate how closely a fuzzy system $f(x|\theta)$ approximates the function $g(x)$ for all $x \in X$ for a given $\theta$? Notice that

$$\sup_{x \in X} \{ |g(x) - f(x|\theta)| \} \tag{3.4}$$

is a bound on the approximation error (if it exists). Specifying such a bound, however, requires that the function $g$ be completely known, whereas, as stated above, we know only a part of $g$ given by the finite set $G$. Therefore, we are only able to evaluate the accuracy of approximation by evaluating the error between $g(x)$ and $f(x|\theta)$ at certain points $x \in X$ given by available input-output data. We call this set of input-output data the test set and denote it as $\Gamma$, where

$$\Gamma = \{(x^1, y^1), \ldots, (x^{M_\Gamma}, y^{M_\Gamma})\} \subset X \times Y \tag{3.5}$$

FIGURE 3.2 The training data G generated from the function g (axes: $x_1$, $x_2$, and $y$).
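As a minimal sketch of evaluating the error only at known data points, the Python fragment below stores the three pairs of (3.3) and computes $y^i - f(x^i|\theta)$ for each of them. The form chosen for $f$ (Gaussian membership functions combined by a weighted average), the names `f`, `pointwise_errors`, and `theta`, and all parameter values are assumptions made purely for illustration; nothing in the discussion so far fixes them.

```python
import math

# Data pairs (x^i, y^i) of (3.3), with x^i in R^2.
G = [((0.0, 2.0), 1.0), ((2.0, 4.0), 5.0), ((3.0, 6.0), 6.0)]


def f(x, theta):
    """Hypothetical approximator f(x|theta), assumed for illustration only.

    theta is a list of rules (b, centers, widths): b is the rule's output
    value, and centers/widths define one Gaussian membership function per
    input. The result is the membership-weighted average of the b values.
    """
    num = 0.0
    den = 0.0
    for b, centers, widths in theta:
        # Rule weight: product of the Gaussian membership values.
        mu = 1.0
        for xj, c, w in zip(x, centers, widths):
            mu *= math.exp(-0.5 * ((xj - c) / w) ** 2)
        num += b * mu
        den += mu
    return num / den


def pointwise_errors(data, theta):
    """Return y^i - f(x^i|theta) for every pair in the data set."""
    return [y - f(x, theta) for x, y in data]


# An assumed two-rule parameter vector (values chosen arbitrarily).
theta = [(1.0, (0.0, 2.0), (2.0, 2.0)), (5.0, (2.0, 4.0), (2.0, 2.0))]

errors = pointwise_errors(G, theta)
sum_squared_error = sum(e * e for e in errors)   # summed squared error
max_abs_error = max(abs(e) for e in errors)      # largest error magnitude
print(errors, sum_squared_error, max_abs_error)
```

Because this assumed $f$ is a weighted average of the rule outputs 1 and 5, it can never reach the value $y = 6$ of the third pair, so no choice of centers or widths alone removes that error. The numbers say nothing about how $f$ behaves between or beyond the three data points.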
Here, $M_\Gamma$ denotes the number of known input-output data pairs contained within the test set. It is important to note that the input-output data pairs $(x^i, y^i)$ contained in $\Gamma$ may not be contained in $G$, or vice versa. It also might be the case that the test set is equal to the training set ($G = \Gamma$); however, this choice is not always a good one. Most often you will want to test the system with at least some data that were not used to construct $f(x|\theta)$, since this will often provide a more realistic assessment of the quality of the approximation.

We see that evaluation of the error in approximation between $g$ and a fuzzy system $f(x|\theta)$ based on a test set $\Gamma$ may or may not be a true measure of the error between $g$ and $f$ for every $x \in X$, but it is the only evaluation we can make based on known information. Hence, you can use measures like

$$\sum_{(x^i, y^i) \in \Gamma} \left( g(x^i) - f(x^i|\theta) \right)^2 \tag{3.6}$$

or

$$\sup_{(x^i, y^i) \in \Gamma} \{ |g(x^i) - f(x^i|\theta)| \} \tag{3.7}$$

to measure the approximation error. Accurate function approximation requires that some expression of this nature be small; however, this clearly does not guarantee perfect representation of $g$ with $f$, since most often we cannot test that $f$ matches $g$ over all possible input points.

We would like to emphasize that the type of function that you
choose to adjust (i.e., $f(x|\theta)$) can have a significant impact on the ultimate accuracy of the approximator. For instance, it may be that a Takagi-Sugeno (or functional) fuzzy system will provide a better approximator than a standard fuzzy system for a particular application. We think of $f(x|\theta)$ as a structure for an approximator that is parameterized by $\theta$. In this chapter we will study the use of fuzzy systems as approximators, and use a fuzzy system as the structure for the approximator. The choice of the parameter vector $\theta$ depends on, for example, how many membership functions and rules you use. Generally, you want enough membership functions and rules to get good accuracy, but not too many, since an "overparameterized" function can actually degrade approximation accuracy. Often, it is best if the structure of the approximator is based on some physical knowledge of the system, as we explain how to do in Section 3.2.4 on page 228.

Finally, while in this book we focus primarily on fuzzy systems (if you understand neural networks, you will see that several of the methods of this chapter apply directly to them as well), at times it may be beneficial to use other approximation structures such as neural networks, polynomials, wavelets, or splines (see Section 3.10, "For Further Study," on page 287).

3.2.2 Relation to Identification, Estimation, and Prediction

Many applications exist in the control and signal processing areas that