Spoken Language Structure

Presentation transcript:

Spoken Language Structure. Berlin Chen, Department of Computer Science & Information Engineering, National Taiwan Normal University. Spoken language is used to communicate information from a speaker to a listener. Speech production and perception are both important components of the speech chain. References: - X. Huang et al., Spoken Language Processing, Chapter 2 - Prof. 王小川, 語音訊號處理 (Speech Signal Processing), Chapters 2~3

Introduction. Take a bottom-up approach to introduce the basic concepts, from sound to phonetics (語音學) and phonology (音韻學). Syllables (音節) and words (詞) are followed by syntax (語法) and semantics (語意), which together form the structure of spoken language processing. Topics covered here: Speech Production; Speech Perception; Phonetics and Phonology; Structural Features of the Chinese Language.

Determinants of Speech Communication Spoken language is used to communicate information from a speaker to a listener. Speech production and perception are both important components of the speech chain Speech signals are composed of analog sound patterns that serve as the basis for a discrete, symbolic representation of the spoken language – phonemes, syllables and words The production and interpretation of these sounds are governed by the syntax and semantics of the language spoken

Determinants of Speech Communication (cont.) [Figure: the speech chain. Speech generation: Message Formulation, Language System, Neuromuscular Mapping, Vocal Tract System. Speech understanding: Cochlea Motion, Neural Transduction, Language System, Message Comprehension. Other labels in the figure: Application Semantics, Actions; Phone, Word, Prosody; Articulatory Parameter; Feature Extraction; Speech Analysis.]

Computer Counterpart The Speech Production Process Message formulation: create the concept (message) to be expressed Language system: convert the message into a sequence of words and find the pronunciation of the words (or the phoneme sequence). Apply the prosodic pattern: duration of phoneme, intonation (語調) of the sentence, and the loudness of the sounds Neuromuscular (神經肌肉) Mapping: perform articulatory (發聲的) mapping to control the vocal cords, lips, jaw, tongue, etc., to produce the sound sequence

Computer Counterpart (cont.) The Speech Understanding Process Cochlea (耳蝸) motion: the signal is passed to the cochlea in the inner ear, which performs the frequency analysis as a filter bank Neural transduction: converts the spectral signal into activity signals on the auditory nerve, corresponding to a feature extraction component It’s unclear how neural activity is mapped into the language system and how message comprehension (理解) is achieved in the brain

Sound Sound is a longitudinal (縱向的) pressure wave formed of compressions (壓縮) and rarefactions (稀疏) of air molecules (微粒), in a direction parallel to that of the application of energy Compressions are zones where air molecules have been forced by the application of energy into a tighter-than-usual configuration Rarefactions are zones where air molecules are less tightly packed

Sound (cont.) The alternating configurations of compression and rarefaction of air molecules along the path of an energy source are sometimes described by the graph of a sine wave. The use of the sine graph is only a notational convenience for charting (plotting) local pressure variations over time.

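Measures of Sounds. Amplitude is related to the degree of displacement of the molecules from their resting position. It is measured on a logarithmic scale in decibels (dB, 分貝). A decibel is a means for comparing the intensity (強度) of two sounds: 10 log10(I/I0), where I and I0 are the two intensities. The intensity is proportional to the square of the sound pressure P, so the Sound Pressure Level (SPL), a measure of the absolute sound pressure P in dB, is SPL = 20 log10(P/P0). The reference 0 dB corresponds to the threshold of hearing, which is P0 = 0.0002 μbar for a tone of 1 kHz (1 μbar = 10^-6 bar). E.g., conversational speech at 3 feet is about 60 dB SPL; a jackhammer (手提鑽) is about 120 dB SPL. Note on loudness: loudness refers to the vibration amplitude of the sound source, also called its intensity, and it shows up in the height of the waveform: the larger the amplitude of the sound wave, the greater the loudness. It is measured with respect to amplitude rather than time, in decibels (dB). To give sounds a common basis for comparison, 0 dB is set to the loudness that a normal listener can just detect 50% of the time, equivalent to a pressure of 0.0002 microbar (μbar). This reference is extremely small, roughly comparable to the work done by a small peanut falling one centimeter. Decibels are computed as a sound pressure level (SPL), where sound pressure corresponds to amplitude, so the result is also written dB SPL: L = 20 log10(P/P0), where P is the sound pressure (or amplitude), P0 is the reference pressure, and L is the level in decibels.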
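The SPL relation above can be tried out numerically. The following is a minimal Python sketch, assuming pressures are given in μbar and using the 0.0002 μbar reference from the slide; the function names are illustrative and not taken from the lecture.

```python
import math

# Reference pressure in microbar (threshold of hearing for a 1 kHz tone).
P0_MICROBAR = 0.0002


def spl_db(pressure_microbar, reference=P0_MICROBAR):
    """Sound pressure level in dB: L = 20 * log10(P / P0)."""
    if pressure_microbar <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_microbar / reference)


def pressure_from_spl(level_db, reference=P0_MICROBAR):
    """Inverse mapping: sound pressure (microbar) for a level in dB SPL."""
    return reference * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    for level in (0.0, 60.0, 120.0):
        print(level, "dB SPL ->", pressure_from_spl(level), "microbar")
```

Because the scale is logarithmic, the 60 dB gap between conversational speech and a jackhammer corresponds to a factor of 1,000 in sound pressure (and 1,000,000 in intensity).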

Measures of Sounds (cont.) The absolute threshold of hearing is the maximum amount of energy of a pure tone that cannot be detected by a listener in a noise-free environment (equivalently, the level at which the tone just becomes audible), expressed in dB sound pressure level.
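The slide's own expression for the threshold did not survive in this transcript. As a stand-in, the sketch below uses a widely quoted closed-form approximation of the threshold in quiet (usually attributed to Terhardt). Treat the formula and the function name as assumptions, not necessarily what the slide showed.

```python
import math


def threshold_in_quiet_db_spl(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL.

    Assumed approximation (Terhardt-style), with f in kHz:
        3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 1e-3 f^4
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


if __name__ == "__main__":
    for f in (100, 500, 1000, 3300, 10000):
        print(f, "Hz:", round(threshold_in_quiet_db_spl(f), 2), "dB SPL")
```

Under this approximation the threshold is lowest around 3 to 4 kHz and rises steeply toward both very low and very high frequencies.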

Speech Production – Articulation. Speech is produced by air-pressure waves emanating (發出) from the mouth and the nostrils (鼻孔). The phonemes (音素) in a language's inventory are the basic units of speech and split into two classes. Consonants (子音/輔音) are articulated (發音) with constrictions (壓縮) in the throat or obstructions (阻塞) in the mouth. Vowels (母音/元音) are articulated without major constrictions or obstructions.

Speech Production – Articulation (cont.) Human speech production apparatus. Lungs (肺): source of air during speech. Larynx (喉頭) and vocal cords: when the vocal folds (聲帶) are held close together and oscillate against one another during a speech sound, the sound is said to be voiced (otherwise unvoiced). Soft palate (velum, 軟顎): allows passage of air through the nasal cavity. Hard palate (硬顎): the tongue is placed on it to produce certain consonants. Tongue (舌): flexible articulator, shaped away from the palate for vowels, close to or on the palate or other hard surfaces for consonants. Teeth: brace (支撐) the tongue for certain consonants. Lips (嘴唇): rounded or spread to affect vowel quality, closed completely to stop the oral air flow for certain consonants (p, b, m).

Speech Production – Articulation (cont.) [Figure with regions labeled: unvoiced consonant, vowel, voiced consonant]

Speech Production - The Voicing Mechanisms. Voiced sounds, including vowels, have a more regular pattern in both time and frequency structure than voiceless sounds, and typically have more energy (vowels are voiced throughout their duration). The vocal folds vibrate during phoneme articulation (otherwise the sound is unvoiced). The vocal folds vibrate at about 60 Hz ~ 300 Hz (cycles per second); the range is lower for male speakers and higher for female speakers, owing to the greater mass and length of adult male vocal folds. In psychoacoustics, the distinct vowel timbres (timbre: the tone quality of a sound or instrument, 音質/音色) are determined by how the tongue and lips shape the oral resonance (共鳴/共振) cavity.

Speech Production - The Voicing Mechanisms (cont.) Voiced sounds (cont.) The rate of cycling (opening and closing) of the vocal folds in the larynx during phonation of voiced sounds is called the fundamental frequency (基頻). The fundamental frequency contributes more than any other single factor to the perception of pitch in speech. It is a prosodic feature used in recognition of tonal languages (e.g., Chinese) or as a measure of speaker identity or authenticity. The fundamental frequency sets the periodic baseline for all higher-frequency harmonics contributed by the pharyngeal and oral resonance cavities.
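One common way to estimate the fundamental frequency of a voiced frame is to look for the strongest peak of its autocorrelation within the plausible pitch range. The sketch below is a minimal pure-Python illustration on a synthetic harmonic signal. The function names, the 16 kHz sampling rate and the 60 to 300 Hz search range are assumptions chosen to match the range quoted earlier, not a method given in the lecture.

```python
import math


def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at one lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=300.0):
    """Return the F0 estimate (Hz) from the best lag in [1/f_max, 1/f_min]."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    return sample_rate / best_lag


def synth_voiced(f0, sample_rate=16000, duration=0.064, n_harmonics=5):
    """Toy voiced frame: a fundamental plus a few weaker harmonics."""
    n_samples = round(sample_rate * duration)
    return [sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) / k
                for k in range(1, n_harmonics + 1))
            for n in range(n_samples)]


if __name__ == "__main__":
    frame = synth_voiced(125.0)
    print("estimated F0 (Hz):", estimate_f0(frame, 16000))
```

Real speech needs extra care (voicing decisions, octave errors, smoothing across frames); this only shows the core idea that the pitch period appears as the lag of maximum self-similarity.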

Speech Production - Pitch [Figure not reproduced in this transcript]

Speech Production - Formants The resonances (共振/共鳴) of the cavities that are typical of particular articulator configurations (e.g. the different vowel timbres) are called formants (共振峰)

Speech Production - Formants (cont.)

Speech Production - Formants (cont.) [Figure: spectrum (頻譜) and spectrogram (聲譜圖)]

Speech Production - Formants (cont.) Narrowband spectrogram: both pitch harmonics and formant information can be observed. Figure settings: 100 ms/frame with a 50 ms frame shift; 1024-point FFT, 400 ms/frame with a 200 ms frame shift (Name: 朱惠銘). What is a spectrogram? A sound spectrogram (or sonogram) is a visual representation of an acoustic signal. To make a spectrogram, a Fourier transform is applied to an acoustic wave (or more technically, its electronic analog), deriving the frequencies and amplitudes of its component simple waves. Depending on the size of the Fourier analysis window, different levels of resolution are achieved. A long window resolves frequency at the expense of time; the result is a narrowband spectrogram, which reveals individual harmonics (component frequencies). If a small analysis window is used, adjacent harmonics are smeared together, but with better time resolution. The result is a wideband spectrogram in which individual pitch periods appear as vertical lines (or striations), with formant structure. Generally, wideband spectrograms are used in spectrogram reading because they give more information about what is going on in the vocal tract. Wideband spectrograms: shorter windows (<10 ms), good time resolution. Narrowband spectrograms: longer windows (>20 ms), the harmonics can be clearly seen.
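The wideband/narrowband distinction can be related to how many pitch periods fit inside the analysis window. The sketch below encodes a rule of thumb: under one pitch period gives wideband behaviour, two or more lets harmonics be resolved. The thresholds and function names are assumptions for illustration; for a 100 Hz voice they reproduce the <10 ms and >20 ms figures quoted above.

```python
def pitch_periods_in_window(window_ms, f0_hz):
    """Number of pitch periods covered by an analysis window."""
    return window_ms * f0_hz / 1000.0


def spectrogram_type(window_ms, f0_hz):
    """Classify a window length relative to the speaker's pitch period.

    Rule of thumb (an assumption, not an exact criterion):
      < 1 period   -> individual glottal pulses visible ("wideband")
      >= 2 periods -> individual harmonics resolved ("narrowband")
    """
    periods = pitch_periods_in_window(window_ms, f0_hz)
    if periods < 1.0:
        return "wideband"
    if periods >= 2.0:
        return "narrowband"
    return "intermediate"


if __name__ == "__main__":
    for window_ms in (5, 15, 25):
        print(window_ms, "ms at 100 Hz:", spectrogram_type(window_ms, 100.0))
```

The same window length can therefore behave differently for different speakers: a 10 ms window is wideband for a 60 Hz voice but already spans three periods of a 300 Hz voice.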

Speech Perception: Physiology (生理機能) of the Ear. The ear processes an acoustic pressure signal by first transforming it into a mechanical vibration pattern on the basilar membrane (基底膜), and then representing the pattern by a series of pulses to be transmitted by the auditory nerve. When air pressure variations reach the eardrum from the outside, it vibrates and transmits the vibrations to the bones adjacent to its opposite side. The energy is then transferred by the mechanical action of the stapes into an impression on the membrane stretching over the oval window (卵圓窗). The cochlea (耳蝸) can be roughly regarded as a set of filter banks whose outputs are ordered by location: a frequency-to-place transformation.

Speech Perception Physiology of the Ear (cont.) The cochlea communicates directly with the auditory nerve, conducting a representation of sound to the brain.

Speech Perception Physiology of the Ear (cont.)

Speech Perception Physical vs. Perceptual Attributes. Non-uniform equal loudness perception of tones of varying frequencies: tones of different pitch have different perceived loudness. The sensitivity of the ear varies with the frequency and the quality of the sound. Hearing sensitivity reaches a maximum around 4000 Hz.

Speech Perception Physical vs. Perceptual Attributes. Non-uniform equal loudness perception. [Figure annotations: 4 kHz is the first resonance of the outer ear canal; 13 kHz is the second resonance of the outer ear canal]

Speech Perception Physical vs. Perceptual Attributes (cont.) Masking: when the ear is exposed to two or more different tones, it is a common experience that one tone may mask the others. Masking is an upward shift in the hearing threshold of the weaker tone caused by the louder tone. A pure tone masks tones of higher frequency more effectively than those of lower frequency. The greater the intensity of the masking tone, the broader the range of frequencies it can mask. Figure 2.15 shows both the threshold of hearing and the masked threshold for a masking tone at 1 kHz with 69 dB SPL.

Speech Perception Physical vs. Perceptual Attributes (cont.) The sense of sound localization (lateralization): binaural listening greatly enhances our ability to sense the direction of the sound source. Time and intensity cues matter for low and high frequencies, respectively: low-frequency sounds are lateralized mainly on the basis of interaural time differences, and high-frequency sounds mainly on the basis of interaural intensity differences. The question of distinct voice quality: speech from different people sounds different, e.g., because of different fundamental frequencies and different vocal-tract lengths. Timbre (音質) is defined as the attribute of auditory sensation by which a subject can judge that two sounds, similarly presented and having the same loudness and pitch, are dissimilar.

Speech Perception Frequency Analysis. Researchers undertook psychoacoustic (心理聲學) experimental work to derive frequency scales that attempt to model the natural response of the human perceptual system (the cochlea acts as a spectrum analyzer). The perceptual attributes of sounds at different frequencies may not be entirely simple or linear in nature. Bark Scale: Fletcher's work (1940) pointed to the existence of critical bands in the cochlear response. The cochlea acts as if it were made up of overlapping filters with bandwidths equal to the critical bandwidth. One class of critical band scales is called the Bark frequency scale (24 critical bands).

Speech Perception Frequency Analysis (cont.) Bark Scale: (cont.) When spectral energy is treated over the Bark scale, a more natural fit with spectral information processing in the ear can be achieved. The perceptual resolution (解析度) is finer at the lower frequencies. The critical bands are continuous, such that a tone of any audible frequency always finds a critical band centered on it.

Speech Perception Frequency Analysis (cont.) Bark Scale: (cont.) The perceptual resolution is finer at the lower frequencies.
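The slides give no closed form for the Bark scale. A commonly used analytic approximation, usually attributed to Zwicker and Terhardt, maps frequency in Hz to a critical-band rate in Bark. The sketch below uses that approximation as an assumption, to illustrate why resolution is finer at low frequencies.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumed approximation:
        13 * atan(0.00076 f) + 3.5 * atan((f / 7500)^2)
    """
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def hz_per_bark(freq_hz, step_hz=1.0):
    """Local width in Hz of one Bark around freq_hz (numerical slope)."""
    slope = (hz_to_bark(freq_hz + step_hz) - hz_to_bark(freq_hz)) / step_hz
    return 1.0 / slope


if __name__ == "__main__":
    for f in (200, 1000, 4000, 8000):
        print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark,",
              round(hz_per_bark(f)), "Hz per Bark")
```

One Bark covers on the order of a hundred Hz at low frequencies but well over a thousand Hz near the top of the audible range, which is the sense in which perceptual resolution is finer at low frequencies.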

Speech Perception Frequency Analysis (cont.) Mel Frequency Scale (Mel): linear below 1 kHz and logarithmic above; it models the sensitivity of the human ear. Mel: a unit of measure of the perceived pitch or frequency of a tone. Stevens and Volkmann (1940) arbitrarily chose the frequency 1,000 Hz as "1,000 mels". Listeners were then asked to change the physical frequency until the pitch they perceived was twice the reference, then 10 times, and so on; and then half the reference, 1/10, and so on. These pitches were labeled 2,000 and 10,000 mels, and so on; and 500 and 100 mels, and so on. This determines a mapping between the real frequency scale (Hz) and the perceptual frequency scale (Mel). The mel scale has been widely used in modern speech recognition systems.

Speech Perception Frequency Analysis (cont.) Mel Frequency Scale (cont.)
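The formula on this slide is not preserved in the transcript. A frequently used analytic fit to the mel scale is mel = 2595 log10(1 + f/700); other texts use slightly different constants. The sketch below assumes that fit, and the function names are illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """Map frequency in Hz to mels (assumed fit: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(f_low_hz, f_high_hz, count):
    """Frequencies (Hz) equally spaced on the mel scale, e.g. for filter-bank centers."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (count - 1)
    return [mel_to_hz(m_low + i * step) for i in range(count)]


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0), 1))
    print([round(f) for f in mel_spaced_frequencies(0.0, 8000.0, 10)])
```

With this fit, 1,000 Hz maps to approximately 1,000 mels, and points spaced equally in mels are packed closely below 1 kHz and spread out above it.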

Speech Perception Frequency Analysis (cont.)

Phoneme and Phone. In speech science, the term phoneme (音素/音位) is used to denote any of the minimal units of speech sound in a language that can serve to distinguish one word from another, e.g., mean /iy/ and man /ae/. The term phone is used to denote a phoneme's acoustic realization. E.g., the phoneme /t/ has two very different acoustic realizations in the words sat and meter; we had better treat them as two different phones when building a spoken language system. Another example is the phoneme /l/ in like and sail. Analogy: a phoneme relates to its phones as a word relates to its renderings in different fonts and sizes.

Phoneme and Phone. The terms phoneme and phone are often used interchangeably to refer to the speaker-independent and context-independent units of meaningful sound contrast. The set of phonemes will differ in realization across individual speakers.

Vowels. The tongue shape and positioning in the oral cavity do not form a major constriction (壓縮) of air flow during vowel articulation. Variations of tongue placement give each vowel its distinct character by changing the resonances (the positions of the formants), just as different sizes and shapes of bottles give rise to different acoustic effects when struck. The linguistically important dimensions of the tongue movements are generally the ranges [front <-> back] and [high <-> low]. F1 and F2: the primary energy entering the pharyngeal (咽) and oral (口腔) cavities in vowel production vibrates at the fundamental frequency. The major resonances of the oral and pharyngeal cavities for vowels are called F1 and F2.

Vowels (cont.) F1 and F2 (cont.) The major resonances of these two cavities for vowels are called F1 and F2, the first and second formants. They are determined by the tongue placement and oral tract shape, and they determine the characteristic timbre or quality of the vowel. English vowels can be described by the relationship of F1 and F2 to one another. F2 is determined by the size and shape of the oral portion, forward of the major tongue extrusion (擠壓). F1 corresponds to the back or pharyngeal portion of the cavity (the cavity from the glottis (聲門) to the tongue extrusion), which is longer than the forward part, so its resonance is lower. Rounding the lips has the effect of extending the front-of-tongue cavity, thus lowering F2 (the tongue is at the back-low position, so the front part becomes longer). Take the vowel "i" in the word "see" for example: the tongue extrusion is far forward in the mouth, creating an exceptionally long rear cavity and a correspondingly low F1. The forward part of the oral cavity, at the same time, is extremely short, contributing to a higher F2.

Vowels (cont.) The characteristic F1 and F2 values are ideal locations for perception. [Figure annotation: the lips become more rounded or more open]

Vowels (cont.) The tongue hump (彎曲、隆起) is the major actor in vowel articulation. The most important secondary vowel mechanism for English and many other languages is lip rounding, e.g., /iy/ (see) and /uw/ (blue). When you say /iy/, your tongue will be in the high/front position and your lips will be flat, slightly open, and somewhat spread: lower F1 and higher F2. When you say /uw/, your tongue will be in the high/back position and your lips begin to round out, ending in a more puckered (縮攏的) position: higher F1 and lower F2.
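To make the F1/F2 description concrete, the sketch below labels a measured (F1, F2) pair with the nearest reference vowel. The reference values are approximate adult-male averages as commonly quoted from Peterson and Barney (1952). They are not from these slides, so treat the table, the plain Euclidean distance and the function name as illustrative assumptions.

```python
import math

# Approximate average (F1, F2) in Hz for adult male speakers, as commonly
# quoted from Peterson and Barney (1952). Illustrative values only.
REFERENCE_FORMANTS = {
    "iy": (270, 2290),  # see
    "ih": (390, 1990),  # fill
    "eh": (530, 1840),
    "ae": (660, 1720),
    "aa": (730, 1090),  # father
    "uw": (300, 870),   # blue
}


def nearest_vowel(f1_hz, f2_hz, reference=REFERENCE_FORMANTS):
    """Return the reference vowel closest to (f1_hz, f2_hz) in the F1/F2 plane."""
    def distance(vowel):
        ref_f1, ref_f2 = reference[vowel]
        return math.hypot(f1_hz - ref_f1, f2_hz - ref_f2)
    return min(reference, key=distance)


if __name__ == "__main__":
    print(nearest_vowel(280, 2250))
    print(nearest_vowel(320, 900))
```

In this table /uw/ has a slightly higher F1 and a much lower F2 than /iy/, consistent with the description above. A practical classifier would normalize for speaker and use a perceptual frequency scale rather than raw Hz.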

Vowels (cont.) [Figure: vowel chart with example words "see", "blue", "fill", "dog", "gas", "father"] When pronouncing "see father", the tongue moves easily from the front-high position (/iy/, "see") to the back-low position (/aa/, "father"). Diphthong: /ay/ in "tie" glides from /aa/ to /iy/.

Vowels (cont.) Diphthongs (雙母音): a special class of vowels that combine two distinct sets of F1/F2 values. Vocabulary note for the figure: foul (dirty, filthy).

Vowels (cont.) Note: the tongue hump (彎曲、隆起) and lip rounding are the two major actors in vowel articulation for most languages.

Vowels (cont.) Neutral Speech (German) Affective speech (German) B. Vlasenko et al., “Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions,” Interspeech2011.

Consonants. Characterized by significant constriction (壓縮) or obstruction (阻塞) in the pharyngeal and/or oral cavities. Some consonants are voiced; others are not. Many consonants occur in pairs, i.e., they share the same configuration of articulators, and one member of the pair additionally has voicing while the other lacks it (e.g., /z, s/). Note on tenseness: in phonology, tenseness is a particular vowel quality that is phonemically contrastive in many languages, including English. It has also occasionally been used to describe contrasts in consonants. Unlike most distinctive features, the feature [tense] can be interpreted only relatively; that is, in a language like English that contrasts [i] (e.g. beat) and [ɪ] (e.g. bit), the former can be described as a tense vowel while the latter is a lax vowel. Another example is Vietnamese, where the letters ă and â represent lax vowels, and the letters a and ơ the corresponding tense vowels. Some languages like Spanish are often considered as having only tense vowels, but since tenseness is not a phonemic feature in this language, it cannot be applied to describe its vowels in any meaningful way. Consonant classes: plosives (破裂音), nasals (鼻音), fricatives (摩擦音), retroflex liquids (捲舌音), lateral liquids (舌邊音), glides (滑音).

Consonants (cont.) Plosives (破裂音): e.g., /b, p/, /d, t/, /g, k/; consonants that involve complete blockage of the oral cavity. Fricatives (摩擦音): e.g., /z, s/; consonants that involve nearly complete blockage of the oral cavity. Nasals (鼻音): e.g., /m, n, ng/; consonants in which the oral cavity is significantly constricted (narrowed) and the velum (軟顎) is open, so that voicing and air pass through the nasal cavity. Retroflex liquids (捲舌音): e.g., /r/; the tip of the tongue is curled back slightly.

Consonants (cont.) Lateral liquids (舌邊音): e.g., /l/; the air stream flows around the sides of the tongue. Glides (滑音): e.g., /y, w/; a little shortened and lacking the ability to be stressed, usually at the initial position within a syllable (e.g., yes, well).

Consonants (cont.) Semi-vowels: have voicing without complete constriction or obstruction of the vocal tract; they include the liquid group /r, l/ and the glide group /y, w/. {vowels, semi-vowels}: sonorants (響音). Non-sonorant consonants: maintain some voicing before or during the obstruction until the pressure differential across the glottis (聲門) disappears due to the closure; e.g., the voiced consonants /b, d, g, z, zh, v/ and their unvoiced counterparts /p, t, k, s, sh, f/.

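Consonants (cont.) [Figure annotations, in order: closure at the lips; closure between the tongue tip and the back of the teeth; closure between the tongue root and the hard palate; constriction between the tongue tip and the back of the teeth; constriction between the tongue tip and the front of the hard palate; constriction between the tongue surface and the hard palate; velum lowered so that the nasal and oral cavities are connected; closure between the tongue tip and the back of the teeth; closure at the lips; closure between the tongue root and the hard palate]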

Phonetic Typology (語音的類型). Length: Japanese vowels have a characteristic length distinction that can be hard for non-natives to perceive and use when learning the language. The words kado (corner) and kaado (card) are spectrally identical, differing only in their durations. Length is phonemically distinctive in Japanese. Pitch: the primary dimension lacking in English. Many Asian and African languages are tonal, e.g., Chinese. Tonal languages have lexical meaning contrasts cued by pitch, e.g., Mandarin Chinese has four primary tones.

Phonetic Typology (cont.) Pitch: (cont.) Though English doesn't make systematic use of pitch in its inventory of word contrasts, pitch still has phonetic effects. It is used systematically in English to signal a speaker's emotions, intentions and attitudes, and it has some linguistic function in signaling grammatical structure as well.

Phonetic Typology (cont.) 語(1) 音(2) 實(3) 驗(4) 室(5)

The Allophone: Sound and Context. Phonetic units should be correlated with potential meaning distinctions, e.g., mean /m iy n/ and men /m eh n/. However, the fundamental meaning-distinguishing sound is often modified in some systematic way by its phonetic neighbors. Coarticulation: the process by which neighboring sounds influence one another. Allophone: when the variations resulting from coarticulatory processes can be consciously perceived, the modified phonemes are called allophones. E.g., p in pin (/p ih n/) produces a noticeable puff (噴出) of air, called aspiration (送氣), but loses its aspiration in spin (/s p ih n/). A vowel before a voiced consonant, e.g., /d/ in bad, seems typically longer than the same vowel before the unvoiced counterpart, in this case /t/ in bat.

The Allophone: Sound and Context (cont.)

Structural Features of Chinese Language. Not alphabetic (字母的). At least 10,000 commonly used characters (字): almost all are morphemes (詞素) with their own meaning, and all are monosyllabic. Unlimited number of words (詞), at least 100,000 commonly used, each composed of one to several characters (字). The meaning of a word can be directly or partly related, or even completely unrelated, to the meaning of its component characters, e.g., 書店 (bookstore), 大學 (university), 和尚 (monk), 光棍 (bachelor). Chinese is a tonal language: 4 lexical tones and 1 neutral tone (the number is for Mandarin). Adapted from Prof. Lin-shan Lee

Structural Features of Chinese Language (cont.) About 1,335 Syllables Only (the number is for Mandarin) About 408 base-syllables if differences in tone disregarded (the number is for Mandarin) Large Number of Homonym Characters (同音字) Sharing the Same Syllable Monosyllabic Structure of Chinese Language Each syllable stands for many characters with different meaning Combination of syllables (characters) gives unlimited number of words Small number of syllables carries plurality (多重性) of linguistic information Almost Each Character with Its Own Meaning, thus Playing Some Linguistic Role Independently Adapted from Prof. Lin-shan Lee

Structural Features of Chinese Language (cont.) No natural word boundaries in a Chinese sentence, e.g., 電腦科技的進步改變了人類的生活和工作方式 ("Advances in computer technology have changed the way people live and work"). Word segmentation is not unique; words are not well defined; no commonly accepted lexicon exists. Open vocabulary nature with flexible wording structure. New words are easily created every day: 電 (electricity) + 腦 (brain) → 電腦 (computer). Long words are arbitrarily abbreviated: 臺灣大學 (Taiwan University) → 臺大. Names/titles: 李登輝總統 (President T.H. Lee) → 李總統登輝. Unlimited number of compound words: 高 (high) + 速 (speed) + 公路 (highway) → 高速公路 (freeway). Adapted from Prof. Lin-shan Lee
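The non-uniqueness of word segmentation can be seen by enumerating every way a character string can be split into lexicon entries. The sketch below is a minimal Python illustration with a tiny hypothetical lexicon made up for this example; it is not a real dictionary or the method used in any particular system.

```python
# Hypothetical toy lexicon: single characters (each a morpheme) plus a few
# multi-character words built from them.
TOY_LEXICON = {"電", "腦", "電腦", "科", "技", "科技", "的", "進", "步", "進步"}


def segmentations(text, lexicon):
    """Return every segmentation of text into words found in lexicon."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


if __name__ == "__main__":
    for words in segmentations("電腦科技的進步", TOY_LEXICON):
        print(" / ".join(words))
```

Even this seven-character phrase has several valid splits under the toy lexicon. With a realistic lexicon the number grows quickly, which is why practical systems rank candidate segmentations (for example with a language model) instead of assuming a single correct one.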

Structural Features of Chinese Language (cont.) Difficult for Word-based Approaches Popularly Used in Alphabetic Languages – Serious out of vocabulary (OOV) problem Considering Phonetic Structure of Mandarin Syllables INITIAL / FINAL’s Phone-like-units / phonemes Different Degrees of Context Dependency Intra-syllable only Intra-syllable plus inter-syllable Right context dependent only Both right and left context dependent

Structural Features of Chinese Language (cont.) Examples 22 INITIAL’s extended to 113 right-context-dependent INITIAL’s 33 phone-like-units extended to 145 intra-syllable right-context-dependent phone-like-units, or 481 with both intra/inter-syllable context dependency 4,606 triphones with intra/inter-syllable context dependency

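Explanations. First, the speaker organizes his or her thoughts and decides on the message to be conveyed. The message is then put into a suitable linguistic form: appropriate words are chosen and arranged into phrases and sentences according to the rules of the language, so as to express the intended message (wording and sentence construction). In the form of neural impulses, the commands travel along the motor nerves to the muscles of the vocal cords, tongue, lips and other organs, and drive these muscles to move. The air pressure changes that result are shaped by the vocal tract, producing the familiar speech sound wave.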

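Explanations for Speech Production. The human speech organs can be divided into three main parts. Power source: the respiratory organs, such as the lungs and trachea. We breathe about once every five seconds, and speech takes place during exhalation; the airflow exhaled from the lungs serves as the power that excites the vocal cords into vibration. Phonation organs: the vocal cords, the larynx, and some cartilage tissue. The steady airflow from the lungs is modified by the opening and closing (valving) action of the larynx and becomes an audible, buzz-like sound. This valving action is accomplished mainly by the vocal cords. The vocal cords are the sound generator itself and provide the main sound source for speech. Their vibration produces a series of impulses, a periodic wave whose spectrum contains a large number of harmonics at integer multiples of the fundamental frequency.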

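Explanations for Speech Production (cont.) Resonance (articulation) organs: the oral, nasal and pharyngeal cavities (collectively the vocal tract). The vocal tract is an air-filled tube with certain natural frequencies. When a harmonic of the impulses from the vocal cords is equal or close to one of the natural frequencies of the vocal tract, resonance occurs and that harmonic component is strengthened. As a result, the spectrum of the speech radiated from the mouth has peaks (formants) at the natural frequencies of the vocal tract; their frequencies are called formant frequencies. Articulation mechanism: the resonance and shaping effect of the vocal tract on the sound produced by the vocal cords; it is very closely related to the timbre of speech. Changes in the vocal tract are caused mainly by the height and the front-back position of the tongue, as in the vowel tongue-position chart commonly used in phonetics. The lips and teeth are the only articulators visible from the outside, and they can provide a good deal of additional information in spoken communication.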

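Explanations for Speech Production (cont.) Behavior of the vocal tract for vowels and consonants. For vowels there is no obstruction in the vocal tract. For consonants, two parts of the vocal tract must form a blockage or obstruction, after which the blocked air is released: the airflow leaks out through a narrow slit or bursts out suddenly, producing noise. The timbre of a consonant is directly related to where the vocal tract is obstructed and how the obstruction is released.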

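Explanations for Speech Perception. How hearing is formed: 1. Sound is received by the pinna and passed through the outer ear canal to the eardrum. 2. The eardrum receives the sound energy and converts it into mechanical energy, so the first energy conversion begins at the eardrum. 3. The eardrum passes the mechanical energy on to the ossicular chain. 4. The footplate of the stapes is attached to the oval window; it converts the mechanical energy into fluid energy, which is the second energy conversion. 5. The energy in the scala vestibuli is passed to the scala media, and the movement of fluid in the scala media moves the hair cells on the organ of Corti. 6. The scala media then converts the fluid energy into electrical energy, which is the third energy conversion. 7. The hair cells stimulate the nerve cells at the base of the organ of Corti, and these neural signals are sent to the brain through the auditory nerve. 8. Summary of the energy conversions: outer ear (acoustic energy) → middle ear (mechanical energy) → inner ear (fluid and electrical energy).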

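Speech Perception Physiology of the Ear (cont.) Outer ear: the ear canal is an air-filled tube that acts as a resonator; when frequencies in the incoming sound wave are close to its natural frequencies, they are amplified by about two to four times. Middle ear: three ossicles, the malleus, incus and stapes. The malleus is attached to the eardrum, and the stapes to the membrane covering the oval window. Two main functions: amplification, to raise the sound energy passed to the inner ear (lever principle), and protection of the inner ear from damage by extremely loud sounds. Inner ear: the cochlea is filled with lymph fluid whose viscosity is almost twice that of water. The cochlear partition separates two regions, and lymph flows freely between them through the helicotrema. Inside the cochlear partition is the cochlear duct, filled with endolymph. The basilar membrane is narrow, thin and taut near the oval window, and widest and slackest near the helicotrema; this property lets it respond to the different frequencies of the incoming sound wave. Main function: converting external mechanical energy into neural impulses.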

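Consonants (cont.) Finally, consider how the lips, tongue and oral cavity relate to consonants. Lips closed (labial): /p/, /b/, /m/, /w/. Tongue held between the teeth, or teeth against the lip (dental or labio-dental consonants): /f/, /v/, /th/, /dh/. Front of the tongue touching the alveolar ridge (alveolar consonants): /t/, /d/, /n/, /s/, /z/, /r/, /l/. Front of the tongue touching the palate (palatal consonants): /sh, zh, y/. Back of the tongue touching the velum (velar consonants): /k/, /g/, /ng/.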