Chapter 6 Basics of Digital Audio. Topics: the sampling theorem; frequency-domain transforms (used to prove the theorem and to perform filtering); using and testing filters; formats and methods for transmission and storage. 6.1 Digitization of Sound 6.2 MIDI: Musical Instrument Digital Interface 6.3 Quantization and Transmission of Audio 6.4 Further Exploration
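Issues (modified outline) Digitization; format; 1. Sampling; 2. Quantization; 3. Synthesis; 4. Filtering; 5. Recording; 6. Transmission; F.T. (Fourier transform); recognition; detection; processing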
What is Sound? A wave phenomenon, like light. Molecules of air are compressed and expanded under the action of some physical device, producing a pressure wave. The wave takes continuous values (before it is digitized). Like other waves, it shows reflection, refraction, and diffraction.
Interesting Titbits. Typical sampling rates range from 8 kHz to 48 kHz. The human voice reaches up to about 4 kHz. The human ear can hear roughly 20 Hz to 20 kHz. Nyquist sampling rate (later). Musicology, octaves, and harmonics: the note "A" (La) above middle C is 440 Hz. One octave above is another A note at double the frequency, i.e., 880 Hz. Harmonics are any series of musical tones whose frequencies are integral multiples of the frequency of a fundamental tone.
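Orthogonality. [Figure: vectors labelled W1 to W5, W, F, G, and the angle θ] Example: x = v0 cos(θ) t − (1/2)(W/m) t², y = v0 sin(θ) t − (1/2) g t². Two components are orthogonal when their inner product (multiply element by element, then sum) is zero; neither can then be decomposed further into a coefficient projected onto the other. Orthogonal components can be used for projection and observation, or to sum several objects and find their balance.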
Signal Decomposition Signals can be decomposed into a sum of sinusoids
Orthogonality of Trigonometric Functions
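The orthogonality of sines and cosines can be checked numerically. The sketch below is an illustration only: the helper names (`inner_product`, `cos_k`, `sin_k`) and the choice of frequencies 2 and 3 are assumptions, and the integral over [-π, π] is approximated with a midpoint sum.

```python
import math

def inner_product(f, g, n=10000):
    """Approximate the integral of f(x) * g(x) over [-pi, pi] with a midpoint sum."""
    dx = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        x = -math.pi + (i + 0.5) * dx
        total += f(x) * g(x)
    return total * dx

def cos_k(k):
    return lambda x: math.cos(k * x)

def sin_k(k):
    return lambda x: math.sin(k * x)

different = inner_product(cos_k(2), cos_k(3))  # two distinct frequencies
mixed = inner_product(sin_k(2), cos_k(2))      # sine against cosine
same = inner_product(cos_k(2), cos_k(2))       # a component with itself
print(different, mixed, same)
```

The first two values come out close to zero, while the last is close to π, which is why a coefficient can be isolated by multiplying with one basis function and integrating.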
Euler-Fourier Formula [Proof for a_k]: multiply both sides by cos(kx), then integrate term by term over [-π, π]. Meaning: the strength of the signal at each frequency.
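The proof step can be written out as follows. This is a sketch under one common convention for the series (a constant term written as a_0/2, period 2π); other normalizations move the constant factors around.

```latex
\begin{align*}
f(x) &= \frac{a_0}{2} + \sum_{n=1}^{\infty}\bigl(a_n\cos nx + b_n\sin nx\bigr) \\
\int_{-\pi}^{\pi} f(x)\cos kx\,dx
  &= \frac{a_0}{2}\int_{-\pi}^{\pi}\cos kx\,dx
   + \sum_{n=1}^{\infty}\Bigl(a_n\int_{-\pi}^{\pi}\cos nx\,\cos kx\,dx
   + b_n\int_{-\pi}^{\pi}\sin nx\,\cos kx\,dx\Bigr) \\
  &= a_k\,\pi \qquad (k \ge 1) \\
a_k &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos kx\,dx
\end{align*}
```

Every integral on the right vanishes by orthogonality except the one with n = k, which equals π.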
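Fourier Series (complex form): expand the sum over k and drop the terms whose product integrates to zero over the time domain. Here a_k is a complex coefficient (it has a real part and an imaginary part).
a_k and b_k (without the minus sign and the imaginary unit j). Conclusion: the two sets of transform formulas are the same.
Fourier Transform (rad): pull the coefficients out. There is no need to dwell on expanding the equation, as long as the forward and inverse transforms can be carried out.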
Fourier Transform (Hz). ω: how many radians does the phase angle turn per second? u: how many oscillations (full turns) per second?
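For reference, the two forms can be sketched side by side. The placement of the 1/(2π) factor is a convention and is assumed here; the symbols f, F, ω, and u follow the slide.

```latex
\begin{align*}
\text{(rad)}\quad F(\omega) &= \int_{-\infty}^{\infty} f(t)\,e^{-j\omega t}\,dt,
 & f(t) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\,e^{j\omega t}\,d\omega \\
\text{(Hz)}\quad F(u) &= \int_{-\infty}^{\infty} f(t)\,e^{-j2\pi u t}\,dt,
 & f(t) &= \int_{-\infty}^{\infty} F(u)\,e^{j2\pi u t}\,du \\
\omega &= 2\pi u
\end{align*}
```

With frequency measured in Hz, the forward and inverse transforms differ only in the sign of the exponent.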
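Basic Properties (Time Domain ↔ Frequency Domain)
f(t) ↔ F(u)
g(t) + h(t) ↔ G(u) + H(u)
g(t) * h(t) ↔ G(u) · H(u) (convolution in time becomes a product in frequency)
g(t) · h(t) ↔ G(u) * H(u) (a product in time becomes convolution in frequency)
impulse train with spacing T, Σ δ(t − nT) ↔ impulse train with spacing 1/T, Σ δ(u − n/T), up to a scale factor
This shows how hard band-pass filtering is to handle in the time domain. Demo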
Issues for Digital Audio Data What is the sampling rate? How finely is the data to be quantized, and is quantization uniform? How is audio data formatted? (file format)
Digitization: Sampling (in time) and Quantization (in amplitude)
Nyquist Theorem (1924), Harry Nyquist (1889-1976). If a signal is band-limited, i.e., its frequency components lie between a lower limit f1 and an upper limit f2, then the sampling rate should be at least 2(f2 − f1). Usually f1 is 0, so the sampling rate should be at least 2 f2.
Time Domain Observation
Alias Frequency: sampling at 1.5 times per cycle produces an alias, a false lower frequency that is perceived in place of the true one.
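A small sketch of the aliasing arithmetic follows. The function name `alias_frequency` and the 1000 Hz tone are assumptions chosen for illustration; the folding rule maps any frequency into the range from 0 to half the sampling rate.

```python
import math

def alias_frequency(f_true, f_sampling):
    """Frequency perceived after sampling, folded into [0, f_sampling / 2]."""
    f_mod = f_true % f_sampling
    return min(f_mod, f_sampling - f_mod)

f_true = 1000.0             # hypothetical tone, in Hz
f_sampling = 1.5 * f_true   # 1.5 samples per cycle
f_alias = alias_frequency(f_true, f_sampling)
print(f_alias)

# The samples of the true tone and of its alias coincide.
samples_true = [math.cos(2 * math.pi * f_true * n / f_sampling) for n in range(12)]
samples_alias = [math.cos(2 * math.pi * f_alias * n / f_sampling) for n in range(12)]
```

Under these assumptions the alias is half the true frequency, and the two sample lists agree up to rounding, so the sampled data cannot tell the two tones apart.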
Nyquist Rate
Fourier Transform (example)
Basic Properties (Cont.) Time Domain ↔ Frequency Domain: g(t) · h(t) ↔ G(u) * H(u), where * denotes convolution; an impulse train with spacing T, Σ δ(t − nT) ↔ an impulse train with spacing 1/T, Σ δ(u − n/T), where δ is the impulse function.
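The link between convolution and multiplication can be checked on short sequences. This is a discrete sketch only: it uses a naive DFT and circular convolution, and the sample values in `g` and `h` are made up. It verifies the direction "convolution in time, product in frequency"; the dual statement follows the same pattern.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform, O(n^2), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n))
            for u in range(n)]

def circular_convolution(g, h):
    """Circular convolution of two equal-length sequences."""
    n = len(g)
    return [sum(g[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]

g = [1.0, 2.0, 0.0, -1.0]   # made-up sample values
h = [0.5, 0.0, 0.25, 0.0]

lhs = dft(circular_convolution(g, h))           # transform of the convolution
rhs = [a * b for a, b in zip(dft(g), dft(h))]   # product of the transforms
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)
```

The printed error is at the level of floating-point rounding, which is the discrete counterpart of the property listed on the slide.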
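Sampling Rate (Time Domain ↔ Frequency Domain): g(t) + h(t) ↔ G(u) + H(u); an impulse train with spacing T ↔ an impulse train with spacing 1/T.
Fourier Spectrum: f(t) has spectrum |F(u)|, which extends up to u_max. The sampled signal is f_s(t) = f(t) · s(t), and its spectrum F_s(u) = F(u) * S(u) repeats |F(u)| at multiples of u_sampling = 1/T. Question: what happens as T → 0?
Nyquist Theorem (frequency domain): when the sampling frequency u_sampling = 1/T is less than twice u_max, the copies of the spectrum are not spaced widely enough apart. [Figure: spectrum copies at −1/T, 1/T, 2/T, with u_max marked]
Nyquist Theorem (frequency domain): the original spectrum has two red peaks, but sampling replicates them as green peaks, which are then mistaken for real components. If the spectrum is a continuous band, as the triangle shows, the overlapping copies interfere and produce aliasing. [Figure: axis marks −1/T, u_max, 1/T = u_sampling, 2/T]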
Signal to Noise Ratio (SNR): a measure of the quality of the signal, in units of dB (decibels; 10 dB = 1 bel). It is 10 times the base-10 logarithm of the ratio of the power of the correct signal to the power of the noise: SNR = 10 log10(P_signal / P_noise) = 20 log10(V_signal / V_noise). Note: P = V²/R. The higher the better.
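The definition translates directly into code. In this sketch the function names are assumptions; the factor 20 in the voltage form comes from P = V²/R with the same resistance for signal and noise.

```python
import math

def snr_db_from_power(p_signal, p_noise):
    """SNR in dB from signal and noise power."""
    return 10 * math.log10(p_signal / p_noise)

def snr_db_from_voltage(v_signal, v_noise):
    """SNR in dB from signal and noise voltage (same resistance assumed)."""
    return 20 * math.log10(v_signal / v_noise)

print(snr_db_from_power(100.0, 1.0))    # power ratio of 100
print(snr_db_from_voltage(10.0, 1.0))   # voltage ratio of 10, same power ratio
```

A voltage ratio of 10 corresponds to a power ratio of 100, so both calls describe the same situation.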
dB Applied to Common Sounds: the level of a sound is expressed as a ratio to the quietest sound a human can hear, i.e., the just-audible sound at a frequency of 1 kHz, defined as a pressure of about 2 × 10^-5 N/m². For noise, the lower the better.
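Noise Control Standards of the Environmental Protection Administration (amendment No. 1020065143)
Microsoft's anechoic chamber at −20.3 dB, the quietest place in the world (World Journal, 2015-10-18). Located in Building 87 at Microsoft headquarters in Redmond, Washington, USA, it was recognized by Guinness World Records in 2015 at −20.3 dB. This is close to the quietest level that may be achievable on Earth, −23 dB, which is the noise produced by air molecules colliding with one another. Such rooms are used to train astronauts to adapt to the "quiet environment" of space. The silence can cause hallucinations and loss of the sense of direction, and can even make it hard to stand steadily. It is so quiet that people cannot bear it: the longest anyone has stayed is 45 minutes. Visitors hear their own heartbeat, and even their lungs and the movement of the contents of their stomach, so that they themselves become the source of noise.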
Signal to Quantization Noise Ratio (SQNR): quantization noise is the round-off error. Let the quantization accuracy be N bits per sample. Worst case: SQNR = 6.02 N (dB). If the input signal is sinusoidal and the quantization error is statistically independent: SQNR = 6.02 N + 1.76 (dB). An SNR (SQNR) above 70 dB is generally acceptable, i.e., we need N ≥ 12.
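The bit-depth requirement can be worked out in a few lines. This sketch uses the exact value 20 log10(2) for the 6.02 dB-per-bit factor; the function names are assumptions.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB per bit

def sqnr_db(n_bits, sinusoidal=False):
    """Worst-case SQNR, or the sinusoidal-input estimate with the 1.76 dB term."""
    return DB_PER_BIT * n_bits + (1.76 if sinusoidal else 0.0)

def min_bits_for(target_db):
    """Smallest whole number of bits whose worst-case SQNR reaches target_db."""
    return math.ceil(target_db / DB_PER_BIT)

print(min_bits_for(70))   # bits needed to exceed 70 dB
print(sqnr_db(16))        # worst-case SQNR of 16-bit samples
```

With 11 bits the worst-case figure is about 66 dB, which falls short of 70 dB; 12 bits give about 72 dB.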
Linear and Non-linear Quantization. Linear format: samples are typically stored as uniformly quantized values. Non-uniform quantization: set up more finely spaced levels where humans hear with the most acuity. Weber's Law, stated formally, says that equally perceived differences have values proportional to absolute levels: ΔResponse ∝ ΔStimulus / Stimulus (6.5)
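Weber's Law leads to a logarithmic response when it is treated as a differential equation. The derivation below is a sketch: the constant k, the integration constant C, and the reference stimulus s_0 (the level at which the response is taken to be zero) are symbols introduced here for illustration.

```latex
\begin{align*}
dr &= k\,\frac{ds}{s} \\
r &= k\ln s + C \\
r &= k\ln\frac{s}{s_0} \qquad \text{with } C = -k\ln s_0
\end{align*}
```

Equal steps in the response r therefore correspond to equal ratios of the stimulus s, which motivates quantizing a logarithmically transformed signal.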
Nonlinear Quantization: transform the analog signal from the raw s space into the theoretical r space, then uniformly quantize the resulting values. Uniform quantization of r gives finer resolution in s at the quiet end. This is called μ-law encoding (also written u-law). A very similar rule, called A-law, is used in telephony in Europe.
Equations of μ-law and A-law (6.9) (6.10)
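For reference, the commonly published forms of the two rules are sketched below. It is an assumption that these correspond to the equations numbered (6.9) and (6.10); s_p denotes the peak signal value, and typical parameter choices are μ = 100 or 255 and A = 87.6.

```latex
\begin{align*}
\mu\text{-law:}\quad
r &= \frac{\operatorname{sgn}(s)}{\ln(1+\mu)}\,
     \ln\!\Bigl(1 + \mu\,\Bigl|\frac{s}{s_p}\Bigr|\Bigr),
  \qquad \Bigl|\frac{s}{s_p}\Bigr| \le 1 \\[4pt]
A\text{-law:}\quad
r &= \begin{cases}
  \dfrac{A}{1+\ln A}\,\dfrac{s}{s_p},
    & \Bigl|\dfrac{s}{s_p}\Bigr| \le \dfrac{1}{A} \\[10pt]
  \dfrac{\operatorname{sgn}(s)}{1+\ln A}\,
    \Bigl[1 + \ln\!\Bigl(A\,\Bigl|\dfrac{s}{s_p}\Bigr|\Bigr)\Bigr],
    & \dfrac{1}{A} \le \Bigl|\dfrac{s}{s_p}\Bigr| \le 1
\end{cases}
\end{align*}
```

Both curves are steep near s = 0 and flatten toward the peak, so a uniform grid in r is dense for quiet signals.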
Nonlinear Transform for audio signals (Fig 6.6): signals at low volume are "magnified" during quantization.
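The "magnifying" effect can be seen by quantizing a quiet sample with and without companding. This is a minimal sketch: it assumes μ = 255, input already divided by the peak value, and a simple 8-bit rounding quantizer (`quantize_uniform`) that is not taken from any standard.

```python
import math

MU = 255  # a commonly used value; an assumption here

def mu_law_compress(s, mu=MU):
    """Map s in [-1, 1] (already divided by the peak value) to r in [-1, 1]."""
    return math.copysign(math.log(1 + mu * abs(s)) / math.log(1 + mu), s)

def mu_law_expand(r, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(r) - 1) / mu, r)

def quantize_uniform(x, n_bits=8):
    """Round x to the nearest multiple of a uniform step (illustrative quantizer)."""
    step = 2.0 / (2 ** n_bits)
    return round(x / step) * step

quiet = 0.01  # a low-volume sample, as a fraction of the peak
error_linear = abs(quantize_uniform(quiet) - quiet)
error_mu = abs(mu_law_expand(quantize_uniform(mu_law_compress(quiet))) - quiet)
print(error_linear, error_mu)
```

For this quiet sample the companded path gives a noticeably smaller error than direct uniform quantization with the same number of bits.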
Data rate and bandwidth in sample audio applications (Table 6.2). Slide annotations on the table: Bytes × 1/8; [1, 2, 6]; 1/2; "≥".
AM vs FM
Synthetic Sounds. 1. FM (Frequency Modulation): one approach to generating synthetic sound, x(t) = A(t) cos[ M(t) ], where A(t) sets the amplitude and M(t) is the modulated phase. 2. Wave table, or wave sound: a more accurate way of generating sounds from digital signals.
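A minimal FM sketch of x(t) = A(t) cos[M(t)] is shown below. The particular envelope (an exponential decay), the sinusoidal phase modulation, and all parameter values are assumptions chosen for illustration, not the formula from the slide.

```python
import math

def fm_tone(duration=0.01, rate=8000, carrier_hz=440.0, modulator_hz=110.0, index=2.0):
    """Return samples of A(t) * cos(M(t)) for a simple FM voice."""
    samples = []
    for n in range(round(duration * rate)):
        t = n / rate
        amplitude = math.exp(-3.0 * t)                        # A(t): decaying envelope
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * modulator_hz * t))  # M(t)
        samples.append(amplitude * math.cos(phase))
    return samples

tone = fm_tone()
print(len(tone), tone[:3])
```

Changing `index` alters how strongly the modulator bends the carrier phase, which changes the harmonic content of the tone.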
Digital Filter DEMO Homework? DFT/DCT (see DFTDCT.ppt)
WAV File Format
'RIFF' (4 bytes): RIFF file identification (Resource Interchange File Format)
<length> (4 bytes): length of the rest of the file after this field
'WAVE' (4 bytes): WAVE chunk identification
'fmt ' (4 bytes): format sub-chunk identification
flength (4 bytes): length of the rest of the format sub-chunk after this field
format (2 bytes): format specifier (linear-quantization PCM = 1)
chans (2 bytes): number of channels
sampsRate (4 bytes): sampling rate in Hz
Bpsec (4 bytes): bytes per second = sampsRate × Bpsample
Bpsample (2 bytes): bytes per sample = chans × bpchan / 8
bpchan (2 bytes): bits per channel
'data' (4 bytes): data sub-chunk identification
dlength (4 bytes): length of the data sub-chunk after this field
values (dlength bytes): digital audio data
… other chunks may follow at the tail
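The 44-byte header layout can be packed and unpacked with Python's `struct` module. This is a sketch under stated assumptions: it handles only the plain PCM layout listed above (a 16-byte format sub-chunk followed directly by the data sub-chunk, no trailing chunks), multi-byte fields are little-endian as in RIFF files, and the helper names `build_header` and `parse_header` are hypothetical.

```python
import struct

HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"   # little-endian, 44 bytes in total
FIELD_NAMES = ("riff", "length", "wave", "fmt", "flength", "format", "chans",
               "sampsRate", "Bpsec", "Bpsample", "bpchan", "data", "dlength")

def build_header(chans, samps_rate, bpchan, dlength):
    """Pack a minimal PCM header from the basic audio parameters."""
    bpsample = chans * bpchan // 8      # bytes per sample (all channels)
    bpsec = samps_rate * bpsample       # bytes per second
    length = 36 + dlength               # bytes after the length field, no tail chunk
    return struct.pack(HEADER_FORMAT, b"RIFF", length, b"WAVE", b"fmt ", 16, 1,
                       chans, samps_rate, bpsec, bpsample, bpchan, b"data", dlength)

def parse_header(raw):
    """Unpack the first 44 bytes into a dict keyed by the slide's field names."""
    return dict(zip(FIELD_NAMES, struct.unpack(HEADER_FORMAT, raw[:44])))

header = parse_header(build_header(chans=2, samps_rate=44100, bpchan=16, dlength=1000))
print(header["Bpsample"], header["Bpsec"])
```

Real files may carry extra chunks or a longer format sub-chunk, so a robust reader would walk the chunks by their length fields instead of assuming fixed offsets.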
Binary Code (Sec1.wav), file size 1730904 bytes: <length> = (001A6950)h = 1730896 = 1730904 − 8; flength = (00000010)h = 16; format = (0001)h = 1 (PCM); chans = (0001)h = 1; sampsRate = (0000AC44)h = 44100; Bpsec = (0000AC44)h = 44100; Bpsample = (0001)h = 1; bpchan = (0008)h = 8; dlength = (001A6904)h = 1730820 = 1730904 − 44 − 40. The header, up to the end of the dlength field, is 44 bytes; the tail of the file is 40 bytes.
Binary Code (Sec2.wav), file size 5893040 bytes: <length> = (0059EBA8)h = 5893032 = 5893040 − 8; flength = (00000010)h = 16; format = (0001)h = 1 (PCM); chans = (0002)h = 2; sampsRate = (0000AC44)h = 44100; Bpsec = (0002B110)h = 176400; Bpsample = (0004)h = 4; bpchan = (0010)h = 16; dlength = (0059EB5C)h = 5892956 = 5893040 − 44 − 40. The header, up to the end of the dlength field, is 44 bytes; the tail of the file is 40 bytes.
(break)
Coding of Audio. Pulse Code Modulation (PCM): the basic coding method, producing quantized sampled output for audio. The differential version: DPCM (Differential PCM). A crude but efficient variant: DM (Delta Modulation). The adaptive version: ADPCM. Examples: WAV is a PCM encoding; Skype uses ADPCM at 32 kbps.
Pulse Code Modulation: PCM (a) Original analog signal and its corresponding PCM signals. (b) Decoded staircase signal. (c) Reconstructed signal after low-pass filtering. Fig 6.13
PCM in Telephony System: what is called "compression" here actually refers to nonlinear quantization. 8 bits per sample at 8 kHz gives 64 kbps.
Three-Stage Compression. Every compression scheme has three stages: (A) Transformation. The input data is transformed to a new representation that is easier or more efficient to compress. (B) Loss. We may introduce loss of information. Quantization is the main lossy step: we use a limited number of reconstruction levels, fewer than in the original signal. (C) Coding. Assign a codeword (thus forming a binary bitstream) to each output level or symbol. This could be a fixed-length code, or a variable-length code such as Huffman coding (Chap. 7). Example: DPCM (next slide).
Example: DPCM codec module [Figure: block diagram with the stages A, B, and C marked]
Huffman Code (Lossless Compression)
Symbol: @, #, $, &
Frequency: 1/8, 1/4, 1/2, 1/8
Original encoding: 00, 01, 10, 11 (2 bits each)
Huffman encoding: 110, 10, 0, 111 (3 bits, 2 bits, 1 bit, 3 bits)
Expected length, original: 1/8 × 2 + 1/4 × 2 + 1/2 × 2 + 1/8 × 2 = 2 bits / symbol
Expected length, Huffman: 1/8 × 3 + 1/4 × 2 + 1/2 × 1 + 1/8 × 3 = 1.75 bits / symbol
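Huffman Tree Construction 1: start with the leaves and their frequencies: A = 3, B = 2, C = 5, D = 8, E = 7.
Huffman Tree Construction 2: merge the two smallest, A (3) and B (2), into a node of weight 5. Remaining: AB = 5, C = 5, D = 8, E = 7.
Huffman Tree Construction 3: merge AB (5) and C (5) into a node of weight 10. Remaining: ABC = 10, D = 8, E = 7.
Huffman Tree Construction 4: merge D (8) and E (7) into a node of weight 15. Remaining: ABC = 10, DE = 15.
Huffman Tree Construction 5: merge the last two nodes into the root of weight 25. Codes: E = 00, D = 01, C = 10, B = 110, A = 111. Example: 010001110101110001 = DEDBCAED. Average length: 3 × 3/25 + 3 × 2/25 + 2 × 5/25 + 2 × 8/25 + 2 × 7/25 = 2.2 bits.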
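The construction can be sketched with a priority queue. The function names and the tie-breaking counter are assumptions of this sketch; the 0/1 labels it assigns may differ from the slide's, but the code lengths, and therefore the average length, come out the same. The decoder is run on the slide's own code table.

```python
import heapq

def huffman_codes(freqs):
    """Return {symbol: bitstring} by repeatedly merging the two lightest nodes."""
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

def decode(bits, codes):
    """Decode a bitstring with a prefix-free code table."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

freqs = {"A": 3, "B": 2, "C": 5, "D": 8, "E": 7}
codes = huffman_codes(freqs)
average_length = sum(freqs[s] * len(codes[s]) for s in freqs) / sum(freqs.values())

slide_codes = {"E": "00", "D": "01", "C": "10", "B": "110", "A": "111"}
message = decode("010001110101110001", slide_codes)
print(codes, average_length, message)
```

Because no codeword is a prefix of another, the decoder can emit a symbol as soon as the bits read so far match a table entry.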
Differential Coding of Audio: audio is often stored not in simple PCM but in a form that exploits differences, which are generally smaller numbers and so offer the possibility of using fewer bits to store. Equation (6.12) is the simplest prediction formula.
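A lossless sketch of the idea follows. It assumes the simplest possible predictor, "the next sample equals the previous one", so each stored value is a difference between neighbours; the function names and the sample values are made up for illustration.

```python
def encode_differences(samples):
    """Keep the first sample, then store each sample minus its predecessor."""
    errors = [samples[0]]
    for n in range(1, len(samples)):
        errors.append(samples[n] - samples[n - 1])
    return errors

def decode_differences(errors):
    """Rebuild the samples by accumulating the stored differences."""
    samples = [errors[0]]
    for e in errors[1:]:
        samples.append(samples[-1] + e)
    return samples

signal = [21, 22, 27, 25, 22]          # made-up sample values
errors = encode_differences(signal)
restored = decode_differences(errors)
print(errors, restored)
```

After the first entry, the stored values are much smaller than the samples themselves, which is what lets a later entropy coder spend fewer bits on them.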
Histogram of digital speech signal: signal values vs. signal differences. Fig 6.15
Predictive Coding: f_0 = f_1, e_0 = 0
Problem in Predictive Coding: f_0 = f_1, e_0 = 0 ?!
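DPCM codec module: once quantization is introduced, the scheme is no longer lossless, so the prediction must be made from the reconstructed signal, not from the true signal. [Figure labels: true signal, predicted signal, reconstructed signal]
DPCM Formulae (6.16). Notation: a hat "^" marks a predicted value; a tilde "~" marks a reconstructed value.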
Example (DPCM, formulae): let the quantization levels be { …, −24, −8, 8, 24, 40, 56, … }
Example (DPCM, results) [Table: the first value is 130; the encoder carries out steps (1), (2), (3); the decoder carries out steps (1), (3)]
DM (Delta Modulation) Formulae (6.21)
Example (DM, results): k = 4, f̃_1 = f_1 = 10
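Delta modulation reduces the quantizer to a single bit per sample: step up by k or step down by k. In the sketch below, k = 4 and the first sample 10 follow the slide; the other input samples and the function names are assumptions.

```python
def dm_encode(samples, k):
    """Send the first sample, then one bit per sample: 1 = step up, 0 = step down."""
    recon = samples[0]
    bits = []
    for f in samples[1:]:
        bit = 1 if f - recon > 0 else 0   # sign of the prediction error
        bits.append(bit)
        recon += k if bit else -k         # track what the decoder reconstructs
    return samples[0], bits

def dm_decode(first, bits, k):
    recon = [first]
    for bit in bits:
        recon.append(recon[-1] + (k if bit else -k))
    return recon

samples = [10, 11, 13, 15]   # first value from the slide; the rest are assumed
first, bits = dm_encode(samples, k=4)
decoded = dm_decode(first, bits, k=4)
print(bits, decoded)
```

With a step as large as k = 4 the reconstruction oscillates around a slowly rising input, which illustrates why the method is described as crude.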
ADPCM codec module
End of Chap #6