rewriting it in terms of Lagrange multipliers, this again leads to the problem of maximizing (1.33), subject to the constraints

\[ 0 \le \alpha_i \le C, \quad i = 1, \dots, \ell, \qquad \text{and} \qquad \sum_i \alpha_i y_i = 0. \tag{1.38} \]

The only difference from the separable case is the upper bound $C$ on the Lagrange multipliers $\alpha_i$. This way, the influence of the individual patterns (which could always be outliers) gets limited. As above, the solution takes the form (1.32). The threshold $b$ can be computed by exploiting the fact that for all SVs $x_i$ with $\alpha_i < C$, the slack variable $\xi_i$ is zero (this again follows from the Karush-Kuhn-Tucker complementarity conditions), and hence

\[ \sum_j y_j \alpha_j \, k(x_i, x_j) + b = y_i. \tag{1.39} \]

If one uses an optimizer that works with the double dual (e.g. Vanderbei, 1997), one can also recover the value of the primal variable $b$ directly from the corresponding double dual variable.
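As an illustrative aside (not part of the original formulation), the computation of $b$ via (1.39) can be sketched in a few lines of Python/NumPy. The array names alpha, y and X, the tolerance, and the choice of a Gaussian RBF kernel are assumptions made only for the sake of the example; averaging over all SVs with $0 < \alpha_i < C$ is a common numerical safeguard rather than a requirement of (1.39).

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    # Gaussian RBF kernel, used here only as a stand-in for any admissible kernel k.
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def compute_threshold(alpha, y, X, C, kernel=rbf_kernel, tol=1e-8):
    """Recover b via (1.39): for every SV x_i with 0 < alpha_i < C the slack xi_i
    vanishes, hence sum_j y_j alpha_j k(x_i, x_j) + b = y_i."""
    n = len(y)
    # SVs strictly inside the box constraint 0 < alpha_i < C ("margin" SVs)
    margin_svs = [i for i in range(n) if tol < alpha[i] < C - tol]
    b_values = []
    for i in margin_svs:
        s = sum(y[j] * alpha[j] * kernel(X[i], X[j]) for j in range(n) if alpha[j] > tol)
        b_values.append(y[i] - s)
    # averaging over all margin SVs is a numerical safeguard, not required by (1.39)
    return float(np.mean(b_values))
```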
1.5 Support Vector Regression

The concept of the margin is specific to pattern recognition. To generalize the SV algorithm to regression estimation (Vapnik, 1995), an analogue of the margin is constructed in the space of the target values $y$ (note that in regression, we have $y \in \mathbb{R}$) by using Vapnik's $\varepsilon$-insensitive loss function (Figure 1.4)

\[ |y - f(x)|_\varepsilon := \max\{0, |y - f(x)| - \varepsilon\}. \tag{1.40} \]

To estimate a linear regression

\[ f(x) = (w \cdot x) + b \tag{1.41} \]

with precision $\varepsilon$, one minimizes

\[ \frac{1}{2}\|w\|^2 + C \sum_i |y_i - f(x_i)|_\varepsilon. \tag{1.42} \]

Written as a constrained optimization problem, this reads (Vapnik, 1995):

\[ \text{minimize} \quad \tau(w, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \xi_i^*) \tag{1.43} \]

subject to

\[ ((w \cdot x_i) + b) - y_i \le \varepsilon + \xi_i \tag{1.44} \]
\[ y_i - ((w \cdot x_i) + b) \le \varepsilon + \xi_i^* \tag{1.45} \]
\[ \xi_i, \xi_i^* \ge 0 \tag{1.46} \]

for all $i = 1, \dots, \ell$. Note that according to (1.44) and (1.45), any error smaller than $\varepsilon$ does not require a nonzero $\xi_i$ or $\xi_i^*$, and hence does not enter the objective function (1.43).

Figure 1.4: In SV regression, a desired accuracy $\varepsilon$ is specified a priori. One then attempts to fit a tube with radius $\varepsilon$ to the data. The trade-off between model complexity and points lying outside of the tube (with positive slack variables $\xi$) is determined by minimizing (1.43).

Generalization to nonlinear regression estimation is carried out using kernel functions, in complete analogy to the case of pattern recognition. Introducing Lagrange multipliers, one thus arrives at the following optimization problem: for $C > 0$ and $\varepsilon \ge 0$ chosen a priori,

\[ \text{maximize} \quad W(\alpha, \alpha^*) = -\varepsilon \sum_i (\alpha_i^* + \alpha_i) + \sum_i (\alpha_i^* - \alpha_i) y_i - \frac{1}{2} \sum_{i,j} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) \, k(x_i, x_j) \tag{1.47} \]

subject to

\[ 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \dots, \ell, \qquad \text{and} \qquad \sum_i (\alpha_i^* - \alpha_i) = 0. \tag{1.48} \]

The regression estimate takes the form

\[ f(x) = \sum_i (\alpha_i^* - \alpha_i) \, k(x_i, x) + b, \tag{1.49} \]

where $b$ is computed using the fact that (1.44) becomes an equality with $\xi_i = 0$ if $0 < \alpha_i < C$, and (1.45) becomes an equality with $\xi_i^* = 0$ if $0 < \alpha_i^* < C$.
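As a brief illustration, the $\varepsilon$-insensitive loss (1.40) and the regression estimate (1.49) translate directly into code. In the Python/NumPy sketch below, the names alpha, alpha_star, X_train and b are placeholders for quantities obtained from a solver for (1.47)-(1.48), and kernel stands for any admissible kernel function.

```python
import numpy as np

def eps_insensitive_loss(y, f_x, eps):
    # Vapnik's eps-insensitive loss (1.40): errors smaller than eps do not count.
    return np.maximum(0.0, np.abs(np.asarray(y) - np.asarray(f_x)) - eps)

def svr_predict(x, alpha, alpha_star, X_train, b, kernel):
    # Regression estimate (1.49): f(x) = sum_i (alpha*_i - alpha_i) k(x_i, x) + b.
    return sum((alpha_star[i] - alpha[i]) * kernel(X_train[i], x)
               for i in range(len(X_train))) + b
```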
Several extensions of this algorithm are possible. From an abstract point of view, we just need some target function which depends on the vector $w$ (cf. (1.43)). There are multiple degrees of freedom for constructing it, including some freedom in how to penalize, or regularize, different parts of the vector, and some freedom in how to use the kernel trick. For instance, more general loss functions can be used for $\xi$, leading to problems that can still be solved efficiently (Smola and Schölkopf, 1998b; Smola et al., 1998a). Moreover, norms other than the 2-norm $\|\cdot\|$ can be used to regularize the solution (cf. chapters 18 and 19). Yet another example is that polynomial kernels can be incorporated which consist of multiple layers, such that the first layer only computes products within certain specified subsets of the entries of $w$ (Schölkopf et al., 1998d).

Finally, the algorithm can be modified such that $\varepsilon$ need not be specified a priori. Instead, one specifies an upper bound $0 \le \nu \le 1$ on the fraction of points allowed to lie outside the tube (asymptotically, the number of SVs), and the corresponding $\varepsilon$ is computed automatically. This is achieved by using as primal objective function

\[ \frac{1}{2}\|w\|^2 + C \Big( \nu \ell \varepsilon + \sum_i |y_i - f(x_i)|_\varepsilon \Big) \tag{1.50} \]

instead of (1.42), and treating $\varepsilon \ge 0$ as a parameter that we minimize over (Schölkopf et al., 1998a).

Figure 1.5: Architecture of SV machines. The input $x$ and the Support Vectors $x_i$ are nonlinearly mapped (by $\Phi$) into a feature space $F$, where dot products are computed. By the use of the kernel $k$, these two layers are in practice computed in one single step. The results are linearly combined by weights $\upsilon_i$, found by solving a quadratic program (in pattern recognition, $\upsilon_i = y_i \alpha_i$; in regression estimation, $\upsilon_i = \alpha_i^* - \alpha_i$). The linear combination is fed into the function $\sigma$ (in pattern recognition, $\sigma(x) = \mathrm{sgn}(x + b)$; in regression estimation, $\sigma(x) = x + b$).
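The two-layer structure of Figure 1.5 can be spelled out as a short evaluation routine. The Python/NumPy sketch below is only illustrative (the function and argument names are not from the original text): the first layer computes the kernel values $k(x, x_i)$, the second forms the weighted sum with the weights $\upsilon_i$, and $\sigma$ is then applied as described in the caption.

```python
import numpy as np

def sv_machine_output(x, support_vectors, weights, b, kernel, task="classification"):
    """Evaluate the two-layer SV machine of Figure 1.5 for a test vector x."""
    # First layer: kernel values k(x, x_i), i.e. dot products in the feature space F.
    hidden = np.array([kernel(sv, x) for sv in support_vectors])
    # Second layer: linear combination with weights v_i (v_i = y_i alpha_i in pattern
    # recognition, v_i = alpha*_i - alpha_i in regression estimation).
    t = float(np.dot(weights, hidden))
    if task == "classification":
        return np.sign(t + b)   # sigma(t) = sgn(t + b)
    return t + b                # sigma(t) = t + b
```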
1.6 Empirical Results, Implementations, and Further Developments

Having described the basics of SV machines, we now summarize empirical findings and theoretical developments which were to follow. We cannot report all contributions that have advanced the state of the art in SV learning since the time the algorithm was first proposed. Not even the present book can do this job, let alone a single section. Presently, we merely give a concise overview.

By the use of kernels, the optimal margin classifier was turned into a classifier which became a serious competitor of high-performance classifiers. Surprisingly, it was noticed that when different kernel functions are used in SV machines (specifically, polynomial, Gaussian RBF, and sigmoid kernels), they lead to very similar classification accuracies and SV sets (Schölkopf et al., 1995). In this sense, the SV set seems to characterize (or compress) the given task in a manner which up to a certain degree is independent of the type of kernel (i.e. the type of classifier) used.

Initial work at AT&T Bell Labs focused on OCR (optical character recognition), a problem where the two main issues are classification accuracy and classification speed. Consequently, some effort went into the improvement of SV machines on these issues, leading to the Virtual SV method for incorporating prior knowledge about transformation invariances by transforming SVs, and the Reduced Set method for speeding up classification. This way, SV machines became competitive with the best available classifiers on both OCR and object recognition tasks (Schölkopf et al., 1996a; Burges, 1996; Burges and Schölkopf, 1997; Schölkopf, 1997). Two years later, the above are still topics of ongoing research, as shown by chapter 16 and (Schölkopf et al., 1998b), proposing alternative Reduced Set methods, as well as by chapter 7 and (Schölkopf et al., 1998d), constructing kernel functions which incorporate prior knowledge about a given problem.

Another initial weakness of SV machines, less apparent in OCR applications which are characterized by low noise levels, was that the size of the quadratic programming problem scaled with the number of Support Vectors. This was due to the fact that in (1.33), the quadratic part contained at least all SVs; the common practice was to extract the SVs by going through the training data in chunks while regularly testing for the possibility that some of the patterns that were initially not identified as SVs turn out to become SVs at a later stage (note that without chunking, the size of the matrix would be $\ell \times \ell$, where $\ell$ is the number of all training examples). What happens if we have a high-noise problem? In this case, many of the slack variables $\xi_i$ will become nonzero, and all the corresponding examples will become SVs. For this case, a decomposition algorithm was proposed (Osuna et al., 1997a), which is based on the observation that not only can we leave out the non-SV examples (i.e. the $x_i$ with $\alpha_i = 0$) from the current chunk, but also some of the SVs, especially those that hit the upper boundary (i.e. $\alpha_i = C$). In fact, one can use chunks which do not even contain all SVs, and maximize over the corresponding subproblems. Chapter 12 explores an extreme case, where the subproblems are chosen so small that one can solve them analytically.
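To make the working-set idea concrete, the following Python sketch (illustrative only; the names, the tolerance, and the availability of a separate QP solver for the subproblems are assumptions) checks which training patterns currently violate the Karush-Kuhn-Tucker conditions of the soft-margin dual. In a chunking or decomposition scheme, violators found among the patterns left out of the current subproblem are the ones brought into the next working set.

```python
def kkt_violations(alpha, y, X, b, C, kernel, tol=1e-3):
    """Indices of patterns violating the KKT conditions of the soft-margin dual:
    alpha_i = 0      requires  y_i f(x_i) >= 1,
    0 < alpha_i < C  requires  y_i f(x_i) == 1,
    alpha_i = C      requires  y_i f(x_i) <= 1."""
    n = len(y)
    violators = []
    for i in range(n):
        # f(x_i) = sum_j y_j alpha_j k(x_j, x_i) + b, cf. the expansion (1.32)
        f_i = sum(y[j] * alpha[j] * kernel(X[j], X[i])
                  for j in range(n) if alpha[j] > tol) + b
        margin = y[i] * f_i
        if alpha[i] <= tol and margin < 1.0 - tol:
            violators.append(i)
        elif alpha[i] >= C - tol and margin > 1.0 + tol:
            violators.append(i)
        elif tol < alpha[i] < C - tol and abs(margin - 1.0) > tol:
            violators.append(i)
    return violators
```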