Translation Equivalence and Synonymy: Preserving the Synsets in Cross-lingual Wordnets
Olivia O.Y. Kwong
The Chinese University of Hong Kong
oykwong@arts.cuhk.edu.hk


Infrastructure of Princeton WordNet
- Synsets as building blocks
- Unordered sets of words that “denote the same concept and are interchangeable in many contexts”
- Synonymy / mutual substitutability
- Nouns, verbs, adjectives, adverbs
- Adjectives not hierarchically ordered, considered polysemous but of limited use in conveying info
GWC 2018, NTU, Singapore 10 Jan 2018

Wordnets in other languages
Two routes from Princeton WordNet (PWN) to wordnets in other languages:
- Merge Model
  - Select vocabulary and develop synsets separately and locally
  - Generate equivalence relations to PWN
- Expand Model
  - Start with PWN vocab and synsets
  - Translate synsets into target language using bilingual dictionaries
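
The Expand Model described above can be sketched in a few lines. This is only an illustration, not the procedure used by any particular Chinese wordnet: the synset ID and English lemmas come from the verb example later in these slides, while `toy_bilingual_dict`, its entries, and the function name `expand` are invented for the sketch.

```python
# Toy data. The synset ID and lemmas appear in the slides; the per-lemma
# dictionary entries below are hypothetical and only serve the illustration.
pwn_synsets = {
    "01215137-v": ["arrest", "pick up", "nail", "apprehend", "nab", "collar", "cop"],
}
toy_bilingual_dict = {
    "arrest": ["逮捕", "拘留"],
    "apprehend": ["逮捕", "拘捕"],
    "nab": ["逮住", "抓住"],
}


def expand(synsets, bilingual_dict):
    """Build target-language synsets by translating each PWN lemma.

    Translations are pooled per synset, duplicates are dropped, and
    first-seen order is kept (dicts preserve insertion order).
    """
    target = {}
    for synset_id, lemmas in synsets.items():
        pooled = {}
        for lemma in lemmas:
            for translation in bilingual_dict.get(lemma, []):
                pooled.setdefault(translation, None)
        target[synset_id] = list(pooled)
    return target


chinese_synsets = expand(pwn_synsets, toy_bilingual_dict)
print(chinese_synsets["01215137-v"])
```

Pooling every translation of every lemma is what gives the Expand Model its coverage, and also why the resulting set is not guaranteed to be a set of synonyms.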

Chinese Wordnets
- Various attempts (Huang et al., 2004; Xu et al., 2008; Huang et al., 2010; Wang and Bond, 2013)
- (Semi-)automatic identification of translation equivalents with human verification
- Some limited the number of translation equivalents for a synset, while others intentionally added more entries
- Chinese Open Wordnet (Wang and Bond, 2013)
  - Follow Expand Model, with detailed guidelines for checking
  - Chinese translations obtained by merging existing data, checked manually, adding new translations from authoritative bilingual dictionaries
  - High coverage but possibly lower accuracy
  - Adjectives: 13.8% of 4,960 core synsets

Potential Blind Spots
nice (pleasant or pleasing or agreeable in nature or appearance)
体贴(的),合意(的),美好(的),和蔼(的),友好(的),令人愉快(的),令人快乐(的),讨人喜欢(的)
- 好: generalness of the concept
- pleasant / pleasing / agreeable; nature / appearance ==> ANYTHING!
- 和蔼 --> person
- 美好 --> inanimate object

Potential Blind Spots
kind (having or showing a tender and considerate and helpful nature; used especially of persons and their behavior)
体谅(的),体贴(的),善良(的),仁慈(的),和善(的),宽厚(的),友善(的),好心(的),好心肠(的),亲切(的),温和(的),和蔼(的),宽宏大量(的),友好(的),乐于助人(的)
- Labels on the slide: considerate, friendly, helpful
- 和蔼 exists in both the “nice” and “kind” synsets
  --> “nice” and “kind” synonymous?
  --> Multiple senses of 和蔼 in most dictionaries?
  --> Legitimate to treat it as a translation equivalent for both synsets?
  --> 和蔼 and 体贴 synonymous?
  --> Still qualify as a synset?
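
The overlap noted above can be checked mechanically. The following minimal sketch takes the two Chinese member lists from the slides (with the optional 的 suffix dropped) and reports every word that occurs in more than one synset. The head words "nice" and "kind" are used as stand-in keys because the slides give no synset IDs for them, and the function name `shared_equivalents` is hypothetical.

```python
from collections import defaultdict

# Member lists copied from the "nice" and "kind" slides, without the (的) suffix.
entries = {
    "nice": ["体贴", "合意", "美好", "和蔼", "友好", "令人愉快", "令人快乐", "讨人喜欢"],
    "kind": ["体谅", "体贴", "善良", "仁慈", "和善", "宽厚", "友善", "好心", "好心肠",
             "亲切", "温和", "和蔼", "宽宏大量", "友好", "乐于助人"],
}


def shared_equivalents(synset_members):
    """Map each word found in two or more synsets to the synsets containing it."""
    index = defaultdict(list)
    for synset, words in synset_members.items():
        for word in words:
            index[word].append(synset)
    return {word: synsets for word, synsets in index.items() if len(synsets) > 1}


shared = shared_equivalents(entries)
for word, synsets in shared.items():
    print(word, synsets)
```

A word flagged this way is not necessarily an error. It is a candidate for the questions on the slide: is the word polysemous, or are the two synsets being blurred?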

Two Issues
- Seriousness of the problem across different parts of speech
  - Nouns and verbs may have more distinct references
  - Fuzziness and subjectivity involved in adjectives
  - Problem expected to be more pronounced among adjectives
- When the coverage of the meanings by the translation equivalents is at the expense of violating the requirements for synsets, are there better ways to handle such cases?

Nouns < Adjs < Verbs
Synset sizes:
- Nouns: 1-39 items
- Adjs: 1-15 items
- Verbs: 1-13 items
Overall tendency: Nouns < Adjs < Verbs

Examples (Nouns)
12896307-n black nightshade, common nightshade, poison-berry, poisonberry, Solanum nigrum
(Eurasian herb naturalized in America having white flowers and poisonous hairy foliage and bearing black berries that are sometimes poisonous but sometimes edible)
老鸦酸浆草, 乌归菜, 野葡萄, 酸浆草, 救儿草, 黑姑娘, 天泡果, 地戎草, 七粒扣, 山海椒, 黑茄, 野茄子, 天泡草, 地泡子, 天天茄, 天茄子, 野辣角, 野海椒, 后红子, 天茄苗儿, 老鸦眼睛草, 水茄, 水苦菜, 野伞子, 天茄菜, 山辣椒, 狗钮子, 苦葵, 苦菜, 野茄菜, 飞天龙, 龙葵, 耳坠菜, 乌疔草, 野辣椒
09823502-n aunt, auntie, aunty
(the sister of your father or mother; the wife of your uncle)
妗, 姑母, 伯母, 姑姑, 老大妈, 阿姨, 妗母, 叔母, 姑妈, 舅母, 姑, 姨妈, 姨, 舅妈, 婶子, 婶婶, 姨母, 婶母

Examples (Adjectives)
hot (extended meanings; especially of psychological heat; marked by intensity or vehemence especially of passion or enthusiasm)
流行(的), 热切(的), 激烈(的), 热门(的), 才发行(的), 急躁(的), 销路好(的), 刚出版(的), 轰动一时(的), 最新(的), 紧缺(的), 激动(的), 狂热(的), 热烈(的), 时新(的)
Labels on the slide: popular; impatient; hot topic; temper; new book; love affair; argument …

Examples (Verbs)
01215137-v arrest, pick up, nail, apprehend, nab, collar, cop (take into custody)
捕捉, 捉到, 捕获, 逮捕, 拘留, 拘押, 拘捕, 抓住, 抓获, 当场逮捕, 擒获, 逮住
- Too general
- Over-specific

Adjectives and Non-synsets
- Examined 200 top-sized adjective synsets from COW
- At most 27 out of 200 do not contain phrasal members
- Show that bilingual dictionaries tend to provide translated definitions or paraphrase instead of or in addition to translation equivalents
- Compatibility with WordNet structure is questionable
- Possible causes of the non-synsets?

Different Sense Distinctions
00411886-a civilized, civilised (having a high state of culture and development both social and technological)
文明化(的), 有礼貌(的), 有教养(的), 开化(的), 文明(的), 文雅(的)
01947741-a cultured, polite, civilized, civilised, cultivated, genteel (marked by refinement in taste and manners)
文雅(的), 有礼貌(的), 优雅(的), 有教养(的), 有礼(的), 文明(的), 有先进文化(的), 有修养(的)
Labels on the slide (layout lost): More collective sense; ? ? ?; ✗ ✗ ✓; elegant, polite, cultivated; More personal and individual behaviour

Over-interpretation of Concepts
docile (willing to be taught or led or supervised or directed)
易管教(的), 驯服(的), 易教育(的), 易驾驭(的), 可教导(的), 容易教(的), 听话(的), 驯良(的), 愿学习(的), 易训练(的), 温顺(的), 顺从(的), 易控制(的)
- Lexicalised: 驯服,温顺,听话 ✓
- Phrasal: 易管教 (easy to teach),易驾驭 (easy to control) ✓
- But 愿学习 (willing to learn) == willing to be taught / easy to control ??

Multiple Facets of Concepts
Chinese (of or pertaining to China or its peoples or cultures)
中国文化(的), 汉, 华, 中文(的), 中国人(的), 汉语(的), 中国话(的), 中国(的), 中
- Pertains to various aspects relating to China, but 中国人 == 中国话 ??

Related but Subtly Different Words
brown, brownish, dark-brown, chocolate-brown (of a color similar to that of wood or earth)
咖啡色(的), 呈褐色(的), 黑褐色(的), 茶褐色(的), 棕色(的), 褐色(的)
- Different hues and intensities of “brownness”

Contradictory Connotation
sharp, shrewd, astute (marked by practical hardheaded intelligence)
狡黠(的), 锐利(的), 精明(的), 狡猾(的), 机敏(的), 诡计多端(的), 锋利(的)
Connotation marks on the slide: - + - + -

Handling Extra-synset Information
- Conceptual and lexical gaps across languages
- Useful info for language learning and translation by humans and machines alike
- Importance and potential use of multiple forms and renditions in a target language
- Value-adding to accommodate them in wordnets in some way
- Basic synset structure should be maintained

1. Lexicalised Items Only
- Unless no lexicalised translation equivalent is available in target language
- Avoid over-interpretation
01251128-a cold (having a low or inadequate temperature or feeling a sensation of coldness or having been made cold by e.g. ice or refrigeration)
冰,冻,冷,寒,冰冻,冰冷,寒冷,气温低,温度不足,温度没有达到要求

2. Language-specific Extensions
- Separate layer of class to store non-lexicalised expressions conveying meaning close enough to the original synset
- Should be a language-specific structure, not the core wordnet structure or the Inter-Lingual-Index
- Linked to base concepts
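
Proposals 1 and 2 can be read together as a routing rule: lexicalised equivalents go into the core synset, phrasal ones into a separate language-specific layer, with a fallback when no lexicalised equivalent exists. The sketch below is one possible reading, not a format defined in the slides. The class `TargetSynset`, the function `split_members`, and the use of a plain set as the lexicon are all assumptions. The five candidates are the items the "docile" slide explicitly labels as lexicalised or phrasal.

```python
from dataclasses import dataclass, field


@dataclass
class TargetSynset:
    pwn_lemmas: list
    members: list = field(default_factory=list)     # core synset (lexicalised items)
    extensions: list = field(default_factory=list)  # language-specific layer (phrasal items)


def split_members(pwn_lemmas, candidates, lexicon):
    """Route candidates to the core synset or the extension layer.

    `lexicon` is a hypothetical set of words treated as lexicalised;
    anything outside it is treated as a phrasal rendition.
    """
    lexicalised = [c for c in candidates if c in lexicon]
    phrasal = [c for c in candidates if c not in lexicon]
    if not lexicalised:
        # No lexicalised equivalent is available: keep the phrasal ones in the core.
        return TargetSynset(pwn_lemmas, members=phrasal)
    return TargetSynset(pwn_lemmas, members=lexicalised, extensions=phrasal)


lexicon = {"驯服", "温顺", "听话"}  # labelled "Lexicalised" on the docile slide
docile = split_members(["docile"], ["易管教", "驯服", "易驾驭", "听话", "温顺"], lexicon)
print(docile.members, docile.extensions)
```

Keeping the extension layer on the target-language side leaves the core synset and the Inter-Lingual-Index untouched, which is the constraint the slide states.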

3. Comparable Specificity
- For very general or highly polysemous adjectives, similarly general equivalents should be included in corresponding synset
- Collocation-specific equivalents indicating different facets or senses should be captured at a subsuming level
- If no corresponding synset for specific meaning in PWN, add extra synset in target language wordnet linked to general synset
- Link specific meanings with corresponding synsets in PWN with similar-to
Diagram (layout lost):
- General: wise 聪明,聪颖; smart 聪明,聪颖
- similar_to, similar_to
- Specific: sagacious, perspicacious, sapient 睿智; sharp, shrewd, astute 精明,机敏
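
The general/specific arrangement above can be sketched as a small graph. The lemmas and Chinese equivalents are taken from the slide. The pairing of each general synset with a specific one (wise with sagacious, smart with sharp) is our reading of the flattened diagram and should be treated as an assumption, as should the dictionary layout and the function name `equivalents`.

```python
# Synsets from the slide, keyed by a head word for readability.
synsets = {
    "wise": {"lemmas": ["wise"], "zh": ["聪明", "聪颖"]},
    "smart": {"lemmas": ["smart"], "zh": ["聪明", "聪颖"]},
    "sagacious": {"lemmas": ["sagacious", "perspicacious", "sapient"], "zh": ["睿智"]},
    "sharp": {"lemmas": ["sharp", "shrewd", "astute"], "zh": ["精明", "机敏"]},
}

# Assumed pairing of general -> specific synsets via similar_to.
similar_to = {"wise": ["sagacious"], "smart": ["sharp"]}


def equivalents(key, include_specific=False):
    """Return the Chinese equivalents of a synset.

    With include_specific=True, equivalents of synsets reached through
    similar_to are appended, without duplicates.
    """
    words = list(synsets[key]["zh"])
    if include_specific:
        for linked in similar_to.get(key, []):
            words += [w for w in synsets[linked]["zh"] if w not in words]
    return words


print(equivalents("wise"))
print(equivalents("smart", include_specific=True))
```

The general synset keeps only similarly general equivalents, and the more specific renditions remain reachable through the link instead of being merged into it.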

4. Utilisation of Pertainym Relation
clever, wise, smart, intelligent, sharp, sagacious, canny …
聪明,聪颖,聪敏,机智,睿智,英明,精明 …
General / Mentally quick / Able to make wise decisions
- Not equally synonymous
- Same word in too many synsets
- Distorted picture of polysemy
- Pertain to: Human, Decision

5. Ensure logical validity
- Avoid words with contradictory connotation in a synset
- Prudently handle phrasal expressions
  - 喝醉 (drink+drunk) vs 烂醉 (very+drunk)
  - 贫困 (impoverished) vs 极度贫困 (extremely+impoverished)
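
The first point above lends itself to a simple automatic check. The sketch below flags a candidate synset whose members carry both positive and negative connotation. The member list is the Chinese "sharp, shrewd, astute" entry from the earlier slide (without the 的 suffix). The polarity values are our own assumption for illustration and do not come from the slides or from any published sentiment lexicon, and the function name is hypothetical.

```python
# Assumed polarity labels: +1 positive connotation, -1 negative connotation.
# Words without an entry are treated as unmarked.
polarity = {
    "精明": +1,
    "机敏": +1,
    "狡猾": -1,
    "诡计多端": -1,
}


def has_contradictory_connotation(members, polarity_lexicon):
    """Return True if the members include both positive and negative words."""
    signs = {polarity_lexicon[m] for m in members if m in polarity_lexicon}
    return {+1, -1} <= signs


sharp_members = ["狡黠", "锐利", "精明", "狡猾", "机敏", "诡计多端", "锋利"]
print(has_contradictory_connotation(sharp_members, polarity))
print(has_contradictory_connotation(["精明", "机敏"], polarity))
```

A flagged set still needs human review. The check only shows that the members cannot all be substituted for one another without changing the speaker's attitude.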

Conclusion
- Translation equivalents not necessarily synonymous
- Could be a problem for building cross-lingual wordnets
- Vulnerability of adjectives, esp. the general ones
- Context-dependent equivalents separately linked
- Importance of keeping the theoretical foundation intact