
1 Translation Equivalence and Synonymy: Preserving the Synsets in Cross-lingual Wordnets
Olivia O.Y. Kwong, The Chinese University of Hong Kong
oykwong@arts.cuhk.edu.hk

2 Infrastructure of Princeton WordNet
- Synsets as building blocks: unordered sets of words that “denote the same concept and are interchangeable in many contexts”
- Criterion: synonymy, i.e. mutual substitutability
- Parts of speech covered: nouns, verbs, adjectives, adverbs
- Adjectives are not hierarchically ordered; they are considered polysemous, yet of limited use in conveying information
GWC 2018, NTU, Singapore 10 Jan 2018

3 Wordnets in other languages
- Merge Model: select vocabulary and develop synsets separately and locally, then generate equivalence relations to Princeton WordNet (PWN)
- Expand Model: start with the PWN vocabulary and synsets, then translate the synsets into the target language using bilingual dictionaries
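The translation step of the Expand Model can be sketched in a few lines of Python. This is a minimal illustration, not how any particular Chinese wordnet was built. The synset labels (`nice_a`, `kind_a`) and the dictionary entries are hypothetical. The Chinese words are borrowed from the member lists on slides 5 and 6.

```python
from typing import Dict, List

# Hypothetical source synsets: label -> English members.
SOURCE_SYNSETS: Dict[str, List[str]] = {
    "nice_a": ["nice"],
    "kind_a": ["kind"],
}

# Hypothetical bilingual dictionary: English headword -> Chinese translations.
BILINGUAL_DICT: Dict[str, List[str]] = {
    "nice": ["美好", "和蔼", "友好"],
    "kind": ["善良", "和蔼", "友好", "体贴"],
}


def expand(source_synsets: Dict[str, List[str]],
           bilingual_dict: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build target-language synsets by translating every source member."""
    target: Dict[str, List[str]] = {}
    for label, members in source_synsets.items():
        candidates: List[str] = []
        for word in members:
            for translation in bilingual_dict.get(word, []):
                if translation not in candidates:  # keep first-seen order, no duplicates
                    candidates.append(translation)
        target[label] = candidates
    return target


zh_synsets = expand(SOURCE_SYNSETS, BILINGUAL_DICT)
print(zh_synsets)
# {'nice_a': ['美好', '和蔼', '友好'], 'kind_a': ['善良', '和蔼', '友好', '体贴']}
```

Nothing in this step checks whether the collected words are mutually substitutable. 和蔼 and 友好 land in both synsets only because the dictionary lists them under both headwords.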

4 Chinese Wordnets
- Various attempts (Huang et al., 2004; Xu et al., 2008; Huang et al., 2010; Wang and Bond, 2013)
- (Semi-)automatic identification of translation equivalents, with human verification
- Some limited the number of translation equivalents per synset, while others intentionally added more entries
- Chinese Open Wordnet (COW) (Wang and Bond, 2013):
  - Follows the Expand Model, with detailed guidelines for checking the Chinese translations
  - Translations obtained by merging existing data and checked manually; new translations added from authoritative bilingual dictionaries
  - High coverage but possibly lower accuracy
  - Adjectives: 13.8% of the 4,960 core synsets

5 Potential Blind Spots: Generalness of the Concept
好 (good)
nice (pleasant or pleasing or agreeable in nature or appearance)
体贴(的), 合意(的), 美好(的), 和蔼(的), 友好(的), 令人愉快(的), 令人快乐(的), 讨人喜欢(的)
- pleasant / pleasing / agreeable + nature / appearance ==> applies to ANYTHING!
- 和蔼 (amiable) --> used of persons
- 美好 (fine, beautiful) --> used of inanimate objects

6 Potential Blind Spots: 和蔼 Exists in Both Synsets
kind (having or showing a tender and considerate and helpful nature; used especially of persons and their behavior)
体谅(的), 体贴(的), 善良(的), 仁慈(的), 和善(的), 宽厚(的), 友善(的), 好心(的), 好心肠(的), 亲切(的), 温和(的), 和蔼(的), 宽宏大量(的), 友好(的), 乐于助人(的)
Facets: considerate, friendly, helpful
和蔼 (amiable) appears in both the "nice" and the "kind" synsets:
--> Are "nice" and "kind" synonymous?
--> Does 和蔼 have multiple senses in most dictionaries?
--> Is it legitimate to treat it as a translation equivalent for both synsets?
--> Are 和蔼 and 体贴 (considerate) synonymous?
--> Does the set still qualify as a synset?
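Overlaps like the one for 和蔼 can be found mechanically: index each word against the synsets that contain it, then keep the words that occur in more than one. The sketch below uses the member lists from slides 5 and 6 with the "(的)" suffix dropped. The labels "nice" and "kind" are just dictionary keys chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Members of the "nice" and "kind" synsets on slides 5 and 6, "(的)" dropped.
NICE = ["体贴", "合意", "美好", "和蔼", "友好", "令人愉快", "令人快乐", "讨人喜欢"]
KIND = ["体谅", "体贴", "善良", "仁慈", "和善", "宽厚", "友善", "好心", "好心肠",
        "亲切", "温和", "和蔼", "宽宏大量", "友好", "乐于助人"]


def shared_members(synsets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each word found in two or more synsets to the labels of those synsets."""
    index = defaultdict(set)
    for label, members in synsets.items():
        for word in members:
            index[word].add(label)
    return {word: sorted(labels) for word, labels in index.items() if len(labels) > 1}


overlaps = shared_members({"nice": NICE, "kind": KIND})
print(overlaps)
# {'体贴': ['kind', 'nice'], '和蔼': ['kind', 'nice'], '友好': ['kind', 'nice']}
```

On this data the check also reports 体贴 and 友好, not just 和蔼. The overlap between the two synsets is therefore wider than the single word highlighted on the slide.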

7 Two Issues
1. Seriousness of the problem across different parts of speech
   - Nouns and verbs may have more distinct references
   - Adjectives involve fuzziness and subjectivity
   - The problem is expected to be more pronounced among adjectives
2. When coverage of the meanings by the translation equivalents comes at the expense of violating the requirements for synsets, are there better ways to handle such cases?

8 Nouns < Adjs < Verbs
- Synset sizes: nouns (1-39 items), adjectives (1-15 items), verbs (1-13 items)
- Overall tendency: Nouns < Adjs < Verbs

9 Examples (Nouns)
n black nightshade, common nightshade, poison-berry, poisonberry, Solanum nigrum (Eurasian herb naturalized in America having white flowers and poisonous hairy foliage and bearing black berries that are sometimes poisonous but sometimes edible)
老鸦酸浆草, 乌归菜, 野葡萄, 酸浆草, 救儿草, 黑姑娘, 天泡果, 地戎草, 七粒扣, 山海椒, 黑茄, 野茄子, 天泡草, 地泡子, 天天茄, 天茄子, 野辣角, 野海椒, 后红子, 天茄苗儿, 老鸦眼睛草, 水茄, 水苦菜, 野伞子, 天茄菜, 山辣椒, 狗钮子, 苦葵, 苦菜, 野茄菜, 飞天龙, 龙葵, 耳坠菜, 乌疔草, 野辣椒
n aunt, auntie, aunty (the sister of your father or mother; the wife of your uncle)
妗, 姑母, 伯母, 姑姑, 老大妈, 阿姨, 妗母, 叔母, 姑妈, 舅母, 姑, 姨妈, 姨, 舅妈, 婶子, 婶婶, 姨母, 婶母

10 Examples (Adjectives)
hot (extended meanings; especially of psychological heat; marked by intensity or vehemence especially of passion or enthusiasm)
流行(的), 热切(的), 激烈(的), 热门(的), 才发行(的), 急躁(的), 销路好(的), 刚出版(的), 轰动一时(的), 最新(的), 紧缺(的), 激动(的), 狂热(的), 热烈(的), 时新(的)
Context-specific senses: popular, impatient; hot topic, temper, new book, love affair, argument

11 Examples (Verbs)
v arrest, pick up, nail, apprehend, nab, collar, cop (take into custody)
捕捉, 捉到, 捕获, 逮捕, 拘留, 拘押, 拘捕, 抓住, 抓获, 当场逮捕, 擒获, 逮住
Some equivalents are too general; others are over-specific.

12 Adjectives and Non-synsets
- Examined the 200 largest adjective synsets in COW
- At most 27 of the 200 contain no phrasal members, i.e. at least 173 contain one or more phrasal members
- This suggests that bilingual dictionaries tend to provide translated definitions or paraphrases instead of, or in addition to, translation equivalents
- Compatibility with the WordNet structure is questionable
- Possible causes of the non-synsets?
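The count above can be sketched as follows. The sketch assumes that a member counts as phrasal when it is absent from a list of lexicalised words. `LEXICON` is a small hypothetical stand-in; a real study would consult a dictionary or corpus. The first synset is the "docile" member list from slide 14, and the second is invented purely for illustration.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a list of lexicalised Chinese words.
LEXICON: Set[str] = {"驯服", "温顺", "听话", "驯良", "顺从"}


def has_phrasal_member(members: List[str], lexicon: Set[str]) -> bool:
    """True if any member is missing from the lexicon (treated as phrasal)."""
    return any(word not in lexicon for word in members)


def count_without_phrasal(synsets: Dict[str, List[str]], lexicon: Set[str], top_k: int) -> int:
    """Among the top_k largest synsets, count those with no phrasal member."""
    largest = sorted(synsets.values(), key=len, reverse=True)[:top_k]
    return sum(1 for members in largest if not has_phrasal_member(members, lexicon))


SYNSETS: Dict[str, List[str]] = {
    # "docile" member list from slide 14, "(的)" dropped
    "docile": ["易管教", "驯服", "易教育", "易驾驭", "可教导", "容易教", "听话",
               "驯良", "愿学习", "易训练", "温顺", "顺从", "易控制"],
    # invented synset containing only lexicalised words
    "toy": ["驯服", "温顺"],
}

print(count_without_phrasal(SYNSETS, LEXICON, top_k=1))  # 0
print(count_without_phrasal(SYNSETS, LEXICON, top_k=2))  # 1
```

The result depends entirely on the word list used. A larger or stricter lexicon changes which members are treated as phrasal.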

13 Different Sense Distinctions
a civilized, civilised (having a high state of culture and development both social and technological)
文明化(的), 有礼貌(的), 有教养(的), 开化(的), 文明(的), 文雅(的)
--> More collective sense
a cultured, polite, civilized, civilised, cultivated, genteel (marked by refinement in taste and manners)
文雅(的), 有礼貌(的), 优雅(的), 有教养(的), 有礼(的), 文明(的), 有先进文化(的), 有修养(的)
--> More personal and individual behaviour
文雅 (elegant), 有礼貌 (polite) and 有教养 (cultivated) appear in both synsets, although the two English synsets draw this sense distinction.

14 Over-interpretation of Concepts
docile (willing to be taught or led or supervised or directed)
易管教(的), 驯服(的), 易教育(的), 易驾驭(的), 可教导(的), 容易教(的), 听话(的), 驯良(的), 愿学习(的), 易训练(的), 温顺(的), 顺从(的), 易控制(的)
- Lexicalised: 驯服 (tame), 温顺 (meek), 听话 (obedient)
- Phrasal: 易管教 (easy to teach), 易驾驭 (easy to control)
- But is 愿学习 (willing to learn) the same as "willing to be taught" or "easy to control"?

15 Multiple Facets of Concepts
Chinese (of or pertaining to China or its peoples or cultures)
中国文化(的), 汉, 华, 中文(的), 中国人(的), 汉语(的), 中国话(的), 中国(的), 中
- The members pertain to various aspects of China (culture, people, language), but is 中国人 (Chinese people) equivalent to 中国话 (Chinese language)?

16 Related but Subtly Different Words
brown, brownish, dark-brown, chocolate-brown (of a color similar to that of wood or earth)
咖啡色(的), 呈褐色(的), 黑褐色(的), 茶褐色(的), 棕色(的), 褐色(的)
- The members express different hues and intensities of "brownness"

17 Contradictory Connotation
sharp, shrewd, astute (marked by practical hardheaded intelligence)
狡黠(的), 锐利(的), 精明(的), 狡猾(的), 机敏(的), 诡计多端(的), 锋利(的)
- Connotation: 狡黠 (sly) -, 精明 (shrewd) +, 狡猾 (cunning) -, 机敏 (quick-witted) +, 诡计多端 (scheming) -

18 Handling Extra-synset Information
- Conceptual and lexical gaps exist across languages
- Such information is useful for language learning and translation, by humans and machines alike
- Multiple forms and renditions in a target language are important and potentially useful
- Accommodating them in wordnets in some way adds value
- But the basic synset structure should be maintained

19 1. Lexicalised Items Only
- Include only lexicalised items, unless no lexicalised translation equivalent is available in the target language
- Avoid over-interpretation
a cold (having a low or inadequate temperature or feeling a sensation of coldness or having been made cold by e.g. ice or refrigeration)
冰, 冻, 冷, 寒, 冰冻, 冰冷, 寒冷, 气温低, 温度不足, 温度没有达到要求
- The last three are phrasal: 气温低 (low air temperature), 温度不足 (insufficient temperature), 温度没有达到要求 (temperature not up to requirement)
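The "lexicalised items only" rule, with its fallback, fits in a small filter. This is a sketch under one assumption: a candidate is lexicalised when it appears in `LEXICON`, a hypothetical word list. Deciding which words belong in such a list is the hard part in practice.

```python
from typing import Iterable, List, Set

# Hypothetical lexicon: which candidates count as lexicalised is an assumption.
LEXICON: Set[str] = {"冰", "冻", "冷", "寒", "冰冻", "冰冷", "寒冷"}


def select_equivalents(candidates: Iterable[str], lexicon: Set[str]) -> List[str]:
    """Keep lexicalised candidates; fall back to all candidates only if none is lexicalised."""
    candidates = list(candidates)
    lexicalised = [w for w in candidates if w in lexicon]
    return lexicalised if lexicalised else candidates


# Candidate equivalents for "cold" as listed on slide 19
COLD = ["冰", "冻", "冷", "寒", "冰冻", "冰冷", "寒冷", "气温低", "温度不足", "温度没有达到要求"]

print(select_equivalents(COLD, LEXICON))
# ['冰', '冻', '冷', '寒', '冰冻', '冰冷', '寒冷']
print(select_equivalents(["温度不足"], LEXICON))
# ['温度不足']
```

The second call shows the fallback. When no lexicalised equivalent exists, the phrasal candidates are kept rather than leaving the synset empty.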

20 2. Language-specific Extensions
- A separate layer or class to store non-lexicalised expressions whose meaning is close enough to the original synset
- It should be a language-specific structure, not part of the core wordnet structure or the Inter-Lingual-Index (ILI)
- Linked to base concepts
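One way to realise such a layer is sketched below with Python dataclasses. The class names, fields and the ILI ID are hypothetical; no existing wordnet schema is implied. The point is the separation: core synsets keep only lexicalised members, while close-enough expressions live in a per-language store keyed by the concept they attach to.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """Core wordnet synset: lexicalised members only."""
    ili: str                       # hypothetical Inter-Lingual-Index ID
    gloss: str
    members: List[str] = field(default_factory=list)


@dataclass
class LanguageExtension:
    """Language-specific layer for non-lexicalised expressions.

    Kept outside the core synsets and the ILI; entries are keyed by the
    ILI ID of the base concept they are close to.
    """
    lang: str
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, ili: str, expression: str) -> None:
        self.entries.setdefault(ili, []).append(expression)

    def for_concept(self, ili: str) -> List[str]:
        return list(self.entries.get(ili, []))


cold = Synset(
    ili="i-cold-example",  # made-up ID
    gloss="having a low or inadequate temperature",  # abbreviated from slide 19
    members=["冰", "冻", "冷", "寒", "冰冻", "冰冷", "寒冷"],
)
zh_ext = LanguageExtension(lang="cmn")
zh_ext.add(cold.ili, "气温低")
zh_ext.add(cold.ili, "温度不足")

print(cold.members)                  # core synset stays lexicalised
print(zh_ext.for_concept(cold.ili))  # ['气温低', '温度不足']
```

Because the extension is only linked by ID, a tool that ignores it still sees well-formed synsets. A translation or learning tool can opt in to the extra renditions.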

21 3. Comparable Specificity
- For very general or highly polysemous adjectives, include similarly general equivalents in the corresponding synset
- Capture collocation-specific equivalents, which indicate different facets or senses, at a subsuming level
- If PWN has no synset for a specific meaning, add an extra synset to the target-language wordnet and link it to the general synset
- Link specific meanings to their corresponding PWN synsets with similar_to
Example:
General: wise (聪明, 聪颖) <--similar_to-- Specific: sagacious, perspicacious, sapient (睿智)
General: smart (聪明, 聪颖) <--similar_to-- Specific: sharp, shrewd, astute (精明, 机敏)
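The example can be encoded as a small graph. In this sketch, general synsets carry only general equivalents, and a lookup pulls in the more specific ones through `similar_to`. Which specific synset links to which general one is one reading of the slide's diagram. `SYNSETS`, `SIMILAR_TO` and `equivalents_with_specific` are names chosen here for illustration.

```python
from typing import Dict, List

# Synsets from the slide's example; "zh" holds Chinese equivalents.
SYNSETS: Dict[str, Dict[str, List[str]]] = {
    "wise": {"en": ["wise"], "zh": ["聪明", "聪颖"]},
    "smart": {"en": ["smart"], "zh": ["聪明", "聪颖"]},
    "sagacious": {"en": ["sagacious", "perspicacious", "sapient"], "zh": ["睿智"]},
    "sharp": {"en": ["sharp", "shrewd", "astute"], "zh": ["精明", "机敏"]},
}

# Specific synset -> general synset (this pairing is an assumed reading of the diagram)
SIMILAR_TO: Dict[str, str] = {"sagacious": "wise", "sharp": "smart"}


def equivalents_with_specific(general: str) -> List[str]:
    """General synset's own equivalents, then those of specific synsets linked to it."""
    result = list(SYNSETS[general]["zh"])
    for specific, target in SIMILAR_TO.items():
        if target == general:
            result.extend(SYNSETS[specific]["zh"])
    return result


print(equivalents_with_specific("smart"))  # ['聪明', '聪颖', '精明', '机敏']
print(equivalents_with_specific("wise"))   # ['聪明', '聪颖', '睿智']
```

精明 never enters the general "smart" synset, so that synset keeps comparably general members. A lookup from "smart" still reaches 精明 via the link.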

22 4. Utilisation of Pertainym Relation
clever, wise, smart, intelligent, sharp, sagacious, canny …
聪明, 聪颖, 聪敏, 机智, 睿智, 英明, 精明 …
- Senses range from general, to "mentally quick", to "able to make wise decisions"
- The members are not equally synonymous
- The same word ends up in too many synsets, giving a distorted picture of polysemy
- Pertain to: Human / Decision

23 5. Ensure Logical Validity
- Avoid words with contradictory connotations in a synset
- Handle phrasal expressions prudently:
  - 喝醉 (drink+drunk) vs 烂醉 (very+drunk)
  - 贫困 (impoverished) vs 极度贫困 (extremely+impoverished)
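Both checks can be partly automated. In this sketch, the polarity labels in `POLARITY` are hypothetical annotations that mirror the "-/+" marks on the "sharp, shrewd, astute" slide. The substring test for intensified variants is a heuristic chosen here, not the author's method. It flags 极度贫困 against 贫困 but misses 烂醉 against 喝醉, which share only 醉.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical polarity annotations (-1 negative, +1 positive) for the
# "sharp, shrewd, astute" candidates on slide 17.
POLARITY: Dict[str, int] = {"狡黠": -1, "精明": +1, "狡猾": -1, "机敏": +1, "诡计多端": -1}


def has_contradictory_connotation(members: Iterable[str], polarity: Dict[str, int]) -> bool:
    """True if the synset mixes positively and negatively annotated members."""
    signs = {polarity[w] for w in members if w in polarity}
    return {-1, +1} <= signs


def intensified_variants(members: List[str]) -> List[Tuple[str, str]]:
    """(base, longer) pairs where one member contains another, e.g. 贫困 / 极度贫困."""
    return [(a, b) for a in members for b in members if a != b and a in b]


SHARP = ["狡黠", "锐利", "精明", "狡猾", "机敏", "诡计多端", "锋利"]
print(has_contradictory_connotation(SHARP, POLARITY))            # True
print(has_contradictory_connotation(["精明", "机敏"], POLARITY))  # False
print(intensified_variants(["贫困", "极度贫困"]))                # [('贫困', '极度贫困')]
print(intensified_variants(["喝醉", "烂醉"]))                    # []
```

Flagged synsets would still need a human decision. The checks only surface candidates for review.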

24 Conclusion
- Translation equivalents are not necessarily synonymous
- This could be a problem for building cross-lingual wordnets
- Adjectives are vulnerable, especially the general ones
- Context-dependent equivalents should be linked separately
- It is important to keep the theoretical foundation intact

