Word Sense Disambiguation Zhang Yu zhangyu@ir.hit.edu.cn
2021/2/5 Natural Language Processing -- Word Sense Disambiguation
Overview of the Problem
◼ Problem: many words have different meanings or senses, i.e., there is ambiguity about how they are to be specifically interpreted (e.g., differentiate).
◼ Task: to determine which of the senses of an ambiguous word is invoked in a particular use of the word by looking at the context of its use.
◼ Note: more often than not, the different senses of a word are closely related.
Ambiguity Resolution
◼ Bank
  ◼ The rising ground bordering a lake, river, or sea
  ◼ An establishment for the custody, loan, exchange, or issue of money, for the extension of credit, and for facilitating the transmission of funds
◼ Title
  ◼ Name/heading of a book, statue, work of art or music, etc.
  ◼ Material at the start of a film
  ◼ The right of legal ownership (of land)
  ◼ The document that is evidence of the right
  ◼ An appellation of respect attached to a person's name
  ◼ A written work (synecdoche: part stands for the whole)
Overview of our Discussion
◼ Methodology
  ◼ Supervised Disambiguation: based on a labeled training set.
  ◼ Dictionary-Based Disambiguation: based on lexical resources such as dictionaries and thesauri.
  ◼ Unsupervised Disambiguation: based on unlabeled corpora.
Methodological Preliminaries
◼ Supervised versus Unsupervised Learning: in supervised learning (classification), the sense label of each word occurrence is provided in the training set, whereas in unsupervised learning (clustering) it is not provided.
◼ Pseudowords: used to generate artificial evaluation data for comparing and improving text-processing algorithms, e.g., replace every occurrence of two words (e.g., bell and book) with a single pseudoword (e.g., bell-book); the original word then serves as the correct "sense" of each occurrence.
◼ Upper and Lower Bounds on Performance: used to find out how well an algorithm performs relative to the difficulty of the task.
  ◼ Upper: human performance
  ◼ Lower: baseline that always picks the highest-frequency alternative (how strong this baseline is depends on the number of alternatives, e.g., best of 2 versus best of 10)
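The pseudoword idea and the lower-bound baseline can be illustrated together. The following is only a minimal sketch under stated assumptions: the function names `make_pseudoword_data` and `most_frequent_baseline`, the toy sentences, and the choice to score the baseline on the same toy data are all hypothetical and not part of the original slides.

```python
import re
from collections import Counter


def make_pseudoword_data(sentences, word_a, word_b):
    """Build artificial disambiguation examples (hypothetical helper).

    Each occurrence of word_a or word_b is replaced by the pseudoword
    'word_a-word_b'; the word that was replaced is kept as the gold label.
    """
    pseudoword = f"{word_a}-{word_b}"
    pattern = re.compile(rf"\b({re.escape(word_a)}|{re.escape(word_b)})\b")
    examples = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            # Replace only this occurrence so each example has one target.
            context = sentence[:match.start()] + pseudoword + sentence[match.end():]
            examples.append((context, match.group(1)))
    return examples


def most_frequent_baseline(train_labels, test_labels):
    """Lower bound: always predict the most frequent label seen in training."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return majority, correct / len(test_labels)


# Toy corpus (made up for illustration only).
sentences = [
    "the bell rang twice",
    "she read the book",
    "he rang the bell again",
    "the church bell is old",
]

examples = make_pseudoword_data(sentences, "bell", "book")
labels = [label for _, label in examples]

# For simplicity the baseline is scored on the same toy data it was fit on;
# a real evaluation would use a held-out test set.
majority, accuracy = most_frequent_baseline(labels, labels)

for context, label in examples:
    print(f"{context!r} -> {label}")
print(f"baseline always predicts {majority!r}, accuracy = {accuracy:.2f}")
```

A disambiguation algorithm evaluated on such data should be compared against this baseline accuracy (the lower bound) and against human performance on the same task (the upper bound).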