Emy S Nst: April 2017

Building A Lexical Knowledge-Base of Near-Synonym Differences
By Diana Inkpen

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto

Background
Current natural language generation and machine translation systems cannot distinguish among near-synonyms: words that share the same core meaning but vary in their lexical nuances. This is due to a lack of knowledge about the differences between near-synonyms in existing computational lexical resources.
The goal of this thesis is to automatically acquire a lexical knowledge-base of near-synonym differences from multiple sources, and to show how it can be used in a practical natural language processing system.

Findings or Contributions
In this thesis, she shows that it is possible to automatically acquire knowledge about the differences between near-synonyms. Denotational, attitudinal, and stylistic differences are extracted from a special dictionary of synonyms and from machine-readable dictionaries. Knowledge about the collocational behaviour of the near-synonyms is acquired from free text. The resulting lexical knowledge-base of near-synonym differences is useful in many natural language processing applications. She shows how a natural language generation system can use it to choose the near-synonym that best matches a set of input preferences. If the preferences are lexical nuances extracted by the analysis component of a machine translation system, the translation quality would be higher: the translation would preserve not only the meaning of the text, but also its nuances of meaning.
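The idea of choosing a near-synonym against a set of input preferences can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy knowledge-base `TOY_LKB`, its nuance names and values, and the simple match-counting score are hypothetical and are not the representation or the scoring used in the thesis.

```python
# Hypothetical toy knowledge-base: each near-synonym maps to a few nuances.
# These entries are illustrative assumptions, not data from the thesis.
TOY_LKB = {
    "error": {"formality": "neutral", "blameworthiness": "medium"},
    "blunder": {"formality": "informal", "blameworthiness": "high"},
    "slip": {"formality": "informal", "blameworthiness": "low"},
}


def choose_near_synonym(lkb, preferences):
    """Return the near-synonym whose nuances match the most preferences."""

    def score(word):
        nuances = lkb[word]
        # Count how many preferred nuance values this word satisfies.
        return sum(1 for key, value in preferences.items()
                   if nuances.get(key) == value)

    # Iterating in sorted order makes ties resolve alphabetically.
    return max(sorted(lkb), key=score)


best = choose_near_synonym(
    TOY_LKB, {"formality": "informal", "blameworthiness": "high"}
)
print(best)
```

Counting exact matches is the simplest possible scoring rule; a real system would presumably weight the preferences and allow partial matches.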

Method
She designed a method to automatically acquire knowledge from dictionaries of near-synonym discrimination written for human readers. An unsupervised decision list algorithm learns patterns and words for classes of distinctions. The patterns are learned automatically, followed by a manual validation step. The extraction of distinctions between near-synonyms is entirely automatic.
She enriched the initial lexical knowledge-base of near-synonyms with information extracted from other sources. First, information about the senses of the near-synonyms was added (WordNet senses). Second, knowledge about the collocational behaviour of the near-synonyms was acquired from free text. Third, knowledge about distinctions between near-synonyms was acquired from machine-readable dictionaries (the General Inquirer and the Macquarie Dictionary).

Strengths and Weaknesses
This study used a modified version of the SynLex dictionary to estimate the level of synonymy between word pairs. The maximum level of synonymy did not guarantee that substitutions would be correct; rather, the precision at this level was still relatively poor.

Conclusion
Lexical simplification is a topic that requires more attention in research on automatic text simplification. A common assumption is that frequency alone is a sufficient criterion for estimating the difficulty of words. Although this is naturally not always the case, word frequencies are usually a good estimate of word difficulty. Based on this assumption, it is easy to compare the difficulty of two words by simply referring to word frequency information. Many researchers apply this reasoning to lexical simplification but do not give appropriate attention to many of the related questions.
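The frequency assumption described above can be sketched in a few lines. This is only an illustration: the function names, the tiny corpus, and the tie-breaking rule are assumptions, and a real comparison would rely on frequency counts from a large reference corpus.

```python
from collections import Counter


def word_frequencies(text):
    """Count lowercase, whitespace-separated tokens in a text."""
    return Counter(text.lower().split())


def simpler_word(freqs, word_a, word_b):
    """Return the more frequent word; on a tie, return word_a."""
    return word_a if freqs[word_a] >= freqs[word_b] else word_b


# Tiny illustrative corpus (an assumption, not data from any study).
corpus = "the big dog saw the big cat near the enormous tree"
freqs = word_frequencies(corpus)
print(simpler_word(freqs, "enormous", "big"))
```

Under the frequency assumption, the more frequent word is treated as the easier one, which is exactly the shortcut the conclusion cautions against relying on without further checks.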

Monday, 17 April 2017

Reviewing Article (synonym)
