Preface

The World Wide Web (WWW) has changed the way people communicate with each other, how information is disseminated and retrieved, and how business is conducted. The term Semantic Web comprises techniques that promise to dramatically improve the current WWW and its use. This book is about this emerging technology.

The success of each book should be judged against the authors’ aims. This is an introductory textbook about the Semantic Web. Its main use will be to serve as the basis for university courses about the Semantic Web. It can also be used for self-study by anyone who wishes to learn about Semantic Web technologies.

The question arises whether there is a need for a textbook, given that all information is available online. We think there is a need because on the Web there are too many sources of varying quality and too much information. Some information is valid, some outdated, some wrong, and most sources talk about obscure details. Anyone who is a newcomer and wishes to learn something about the Semantic Web, or who wishes to set up a course on the Semantic Web, is faced with these problems. This book is meant to help out.

A textbook must be selective in the topics it covers. Particularly in a field as fast-developing as this, a textbook should concentrate on fundamental aspects that can reasonably be expected to remain relevant some time into the future. But, of course, authors always have their personal bias.

Even for the topics covered, this book is not meant to be a reference work that describes every small detail. Long books have already been written on certain topics, such as XML. And there is no need for a reference work in the Semantic Web area because all definitions and manuals are available online. Instead, we concentrate on the main ideas and techniques and provide enough detail to enable readers to engage with the material constructively and to build applications of their own.
This way readers will be equipped with sufficient knowledge to easily get the remaining details from other sources. In fact, an annotated list of references is found at the end of each chapter.

Acknowledgments

We thank Jeen Broekstra, Michel Klein, and Marta Sabou for pioneering much of this material in our course on Web-based knowledge representation at the Free University in Amsterdam, and Annette ten Teije, Zharko Aleksovski and Wouter Jansweijer for critically reading early versions of the manuscript.

We thank Christoph Grimmer and Peter Koenig for proofreading parts of the book and assisting with the creation of the figures and with LaTeX processing.

Also, we wish to thank the MIT Press people for their professional assistance with the final preparation of the manuscript, and Christopher Manning for his LaTeX 2ε macros.
1 The Semantic Web Vision

1.1 Today’s Web

The World Wide Web has changed the way people communicate with each other and the way business is conducted. It lies at the heart of a revolution that is currently transforming the developed world toward a knowledge economy and, more broadly speaking, to a knowledge society.

This development has also changed the way we think of computers. Originally they were used for computing numerical calculations. Currently their predominant use is for information processing, typical applications being databases, text processing, and games. At present there is a transition of focus towards the view of computers as entry points to the information highways.

Most of today’s Web content is suitable for human consumption. Even Web content that is generated automatically from databases is usually presented without the original structural information found in databases. Typical uses of the Web today involve people’s seeking and making use of information, searching for and getting in touch with other people, reviewing catalogs of online stores and ordering products by filling out forms, and viewing adult material.

These activities are not particularly well supported by software tools. Apart from the existence of links that establish connections between documents, the main valuable, indeed indispensable, tools are search engines.

Keyword-based search engines, such as AltaVista, Yahoo, and Google, are the main tools for using today’s Web. It is clear that the Web would not have been the huge success it was, were it not for search engines. However, there are serious problems associated with their use:
• High recall, low precision. Even if the main relevant pages are retrieved, they are of little use if another 28,758 mildly relevant or irrelevant documents were also retrieved. Too much can easily become as bad as too little.

• Low or no recall. Often it happens that we don’t get any answer for our request, or that important and relevant pages are not retrieved. Although low recall is a less frequent problem with current search engines, it does occur.

• Results are highly sensitive to vocabulary. Often our initial keywords do not get the results we want; in these cases the relevant documents use different terminology from the original query. This is unsatisfactory because semantically similar queries should return similar results.

• Results are single Web pages. If we need information that is spread over various documents, we must initiate several queries to collect the relevant documents, and then we must manually extract the partial information and put it together.

Interestingly, despite improvements in search engine technology, the difficulties remain essentially the same. It seems that the amount of Web content outpaces technological progress.

But even if a search is successful, it is the person who must browse selected documents to extract the information he is looking for. That is, there is not much support for retrieving the information, a very time-consuming activity. Therefore, the term information retrieval, used in association with search engines, is somewhat misleading; location finder might be a more appropriate term. Also, results of Web searches are not readily accessible by other software tools; search engines are often isolated applications.

The main obstacle to providing better support to Web users is that, at present, the meaning of Web content is not machine-accessible. Of course, there are tools that can retrieve texts, split them into parts, check the spelling, count their words. But when it comes to interpreting sentences and extracting useful information for users, the capabilities of current software are still very limited. It is simply difficult to distinguish the meaning of

    I am a professor of computer science.

from

    I am a professor of computer science, you may think. Well, ...
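As a minimal sketch of why keyword matching cannot tell these two sentences apart, and of how the "high recall, low precision" problem arises, consider the following Python fragment. The query string, the helper names (tokens, keyword_match, precision_recall), and the figure of 10 relevant pages are illustrative assumptions and do not describe any real search engine; only the count of 28,758 unwanted documents comes from the text.

```python
import re


def tokens(text):
    """Lowercase a text and return the set of its word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_match(query, document):
    """Return True if every keyword of the query occurs in the document."""
    return tokens(query) <= tokens(document)


def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The two sentences from the text: one asserts a fact, the other does not.
claim = "I am a professor of computer science."
non_claim = "I am a professor of computer science, you may think. Well, ..."
query = "professor computer science"  # hypothetical user query

# A purely keyword-based matcher accepts both sentences alike.
print(keyword_match(query, claim))
print(keyword_match(query, non_claim))

# Assumption: 10 relevant pages exist and all are retrieved,
# together with 28,758 mildly relevant or irrelevant documents.
relevant = range(10)
retrieved = range(10 + 28758)
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.6f} recall={recall:.1f}")
```

Both sentences contain every query keyword, so the matcher treats them identically, and in the second part perfect recall coexists with a precision well below one in a thousand.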
Using text processing, how can the current situation be improved? One solution is to use the content as it is represented today and to develop increasingly sophisticated techniques based on artificial intelligence and computational linguistics. This approach has been followed for some time now, but despite some advances the task still appears too ambitious.

An alternative approach is to represent Web content in a form that is more easily machine-processable¹ and to use intelligent techniques to take advantage of these representations. We refer to this plan of revolutionizing the Web as the Semantic Web initiative. It is important to understand that the Semantic Web will not be a new global information highway parallel to the existing World Wide Web; instead it will gradually evolve out of the existing Web.

The Semantic Web is propagated by the World Wide Web Consortium (W3C), an international standardization body for the Web. The driving force of the Semantic Web initiative is Tim Berners-Lee, the very person who invented the WWW in the late 1980s. He expects from this initiative the realization of his original vision of the Web, a vision where the meaning of information played a far more important role than it does in today’s Web.

The development of the Semantic Web has a lot of industry momentum, and governments are investing heavily. The U.S. government has established the DARPA Agent Markup Language (DAML) Project, and the Semantic Web is among the key action lines of the European Union’s Sixth Framework Programme.

1.2 From Today’s Web to the Semantic Web: Examples

1.2.1 Knowledge Management

Knowledge management concerns itself with acquiring, accessing, and maintaining knowledge within an organization. It has emerged as a key activity of large businesses because they view internal knowledge as an intellectual asset from which they can draw greater productivity, create new value, and increase their competitiveness. Knowledge management is particularly important for international organizations with geographically dispersed departments.

1. In the literature the term machine understandable is used quite often. We believe it is the wrong word because it gives the wrong impression. It is not necessary for intelligent agents to understand information; it is sufficient for them to process information effectively, which sometimes causes people to think the machine really understands.