Streaming Video, Part 2-3: Compression of Digital Image/Video

Lossless compression
Image compression
Video compression

Video Compression
Video compression was developed in the late 1980s and the 1990s.
Image compression: JPEG, JPEG2000
Video compression: H.261, H.263, H.26L, H.264, H.265, MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21
Applications
Video storage: VCD, DVD
Video transmission: DTV, Video on Demand (VOD), satellite, video conferencing, video phone…

Video Sequence
A video consists of a time-ordered sequence of frames (images).

Why Video Is Compressed
Video must be compressed for transmission and storage.
Uncompressed video at 720x480, 30 frames/s: 166 Mb/s
Digital TV: 4-6 Mb/s, which requires a compression ratio of about 41
CD-ROM: 1.5 Mb/s, which requires a compression ratio of about 110
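
The figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the slide does not state the pixel depth, so the 16 bits per pixel used here (8-bit luminance plus 4:2:2 chroma) is an assumption, chosen because it reproduces the quoted 166 Mb/s. The function names are illustrative only.

```python
# Raw bit rate of uncompressed 720x480 video at 30 frames/s.
# ASSUMPTION: 16 bits per pixel on average (8-bit Y plus 4:2:2 chroma);
# the slide does not give the pixel depth, but this value yields ~166 Mb/s.
WIDTH, HEIGHT, FPS = 720, 480, 30
BITS_PER_PIXEL = 16  # assumed, not stated in the slides


def raw_bit_rate(width, height, fps, bits_per_pixel):
    """Return the uncompressed bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


def compression_ratio(raw_bps, channel_bps):
    """Return how many times the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps


raw = raw_bit_rate(WIDTH, HEIGHT, FPS, BITS_PER_PIXEL)
print(f"raw video: {raw / 1e6:.1f} Mb/s")
print(f"digital TV at 4 Mb/s: ratio {compression_ratio(raw, 4e6):.1f}")
print(f"CD-ROM at 1.5 Mb/s: ratio {compression_ratio(raw, 1.5e6):.1f}")
```

The ratios come out slightly above 41 and 110, which matches the rounded values on the slide.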

Compression - the basic concepts
Compression techniques: lossless compression, lossy compression
Redundancies
Spatial redundancy: similarity between adjacent pixels in flat areas of a picture
Statistical redundancy: the same symbols occur frequently
Temporal redundancy: similarity between consecutive pictures

Temporal Redundancy
Temporal redundancy: consecutive frames in a video scene are similar.
It is not necessary to encode each frame of a video independently.
Instead, encode the difference between the current frame and other frames in the sequence.
The differences are small values with low entropy.
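
The claim that frame differences have low entropy can be illustrated with a toy example. This is only a sketch: the two one-dimensional "frames" below are hypothetical data made up for the illustration, and Shannon entropy is used as the measure.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits per symbol, of a sequence of integers."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical 1-D "frames": 64 distinct grey levels, and the same
# frame with a single pixel changed (nearly static scene).
prev_frame = list(range(64))
curr_frame = list(prev_frame)
curr_frame[10] += 3

# Difference between the current frame and the previous one.
diff = [c - p for c, p in zip(curr_frame, prev_frame)]

print("entropy of current frame:", entropy(curr_frame))
print("entropy of difference:   ", entropy(diff))
```

The current frame needs close to 6 bits per pixel, while the difference signal is almost all zeros and needs only a small fraction of a bit per pixel.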

Removing Redundancies
Removing spatial redundancy: JPEG-like coding scheme
Removing temporal redundancy: motion-compensation (MC) based coding
Motion estimation: search for the motion vector that gives the minimum prediction error

Motion-Compensated Predictive Coding: Block Diagram

Motion Estimation Algorithms: Full Search

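Full Search
Let the pixels of the macroblock in the current frame be C(x+k, y+l) and the pixels in the reference frame be R(x+i+k, y+j+l), where the macroblock is M×N pixels. We define
$$\mathrm{MAE}(i, j) = \frac{1}{MN} \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} \left| C(x+k,\, y+l) - R(x+i+k,\, y+j+l) \right|$$
The motion vector (u, v) is the displacement (i, j) that minimizes MAE(i, j), where -p ≤ u, v ≤ p.
This error criterion is commonly called the mean absolute error (MAE) or the mean absolute difference (MAD).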

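Full Search
The overall computational complexity for each macroblock is (2p+1)^2 × MN × 3: there are (2p+1)^2 candidate positions, and each of the MN pixel comparisons takes three operations (a subtraction, an absolute value, and an addition).
Let the frame rate of the video be F and the resolution of each frame be I×J. The overall computational complexity is then
$$(2p+1)^2 \times MN \times 3 \times \frac{I \times J}{M \times N} \times F \quad \text{operations per second}$$
Full search is very time-consuming, but it is guaranteed to find the minimum MAE value.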
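
The full search described above can be sketched as follows. This is an illustrative implementation, not the one used in any particular encoder: the function names, the frame layout (a list of rows, indexed `frame[row][column]`), the rule of skipping candidates that fall outside the reference frame, and the test data are all assumptions made for the example. The operation count follows the per-macroblock cost (2p+1)^2 × MN × 3, multiplied by the number of macroblocks per second.

```python
def mae(cur, ref, x, y, i, j, M, N):
    """MAE between the MxN macroblock of `cur` at (x, y) and the block of
    `ref` displaced by (i, j). Frames are lists of rows: frame[row][col]."""
    total = 0
    for k in range(M):        # horizontal offset inside the macroblock
        for l in range(N):    # vertical offset inside the macroblock
            total += abs(cur[y + l][x + k] - ref[y + j + l][x + i + k])
    return total / (M * N)


def full_search(cur, ref, x, y, M, N, p):
    """Try every displacement in [-p, p] x [-p, p]; return (u, v, min MAE).
    ASSUMPTION: candidates outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            inside = (0 <= x + i and x + i + M <= width and
                      0 <= y + j and y + j + N <= height)
            if not inside:
                continue
            err = mae(cur, ref, x, y, i, j, M, N)
            if best is None or err < best[2]:
                best = (i, j, err)
    return best


def full_search_ops_per_second(p, M, N, I, J, F):
    """(2p+1)^2 * MN * 3 operations per macroblock, times (I*J)/(M*N)
    macroblocks per frame, times F frames per second."""
    return (2 * p + 1) ** 2 * M * N * 3 * (I * J) // (M * N) * F


# Hypothetical test data: a 12x12 reference frame with a non-repeating
# pattern, and a current frame equal to the reference shifted by (2, 1).
ref = [[(31 * r + 17 * c) % 251 for c in range(12)] for r in range(12)]
cur = [[ref[min(r + 1, 11)][min(c + 2, 11)] for c in range(12)]
       for r in range(12)]

print(full_search(cur, ref, x=4, y=4, M=4, N=4, p=2))
print(full_search_ops_per_second(p=15, M=16, N=16, I=720, J=480, F=30))
```

With the example parameters p = 15, 16×16 macroblocks, 720×480 frames, and 30 frames/s (values chosen here for illustration), the count is close to 3 × 10^10 operations per second, which shows why faster search strategies are of interest.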

Computational Complexity of Motion Estimation
DCT and motion estimation place a heavy computational load on the video encoder.
Lowering the computational complexity
DCT: simplified computation architecture; data analysis so that the DCT is calculated for only some of the coefficients
Motion estimation: full search obtains the minimum error; fast search methods give a non-optimal result (3-step search, diamond search, …)

Two-Dimensional Logarithmic Search

Three-Step Search
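
The slide for this search shows only a figure, so the following is a sketch of the three-step search as it is commonly described, not a transcription of the slide: start at the center, test the eight neighbours at the current step size, move to the best position, halve the step, and stop after the step of one pixel. The initial step (p+1)//2, the function names, and the test frames are assumptions for the illustration.

```python
def mae(cur, ref, x, y, i, j, M, N):
    """Mean absolute error between the MxN block of `cur` at (x, y) and the
    block of `ref` displaced by (i, j); frames are indexed frame[row][col]."""
    total = 0
    for k in range(M):
        for l in range(N):
            total += abs(cur[y + l][x + k] - ref[y + j + l][x + i + k])
    return total / (M * N)


def three_step_search(cur, ref, x, y, M, N, p=7):
    """Coarse-to-fine search; returns (u, v, MAE). The result is not
    guaranteed to be the global minimum that full search would find."""
    height, width = len(ref), len(ref[0])

    def inside(i, j):
        return (abs(i) <= p and abs(j) <= p and
                0 <= x + i and x + i + M <= width and
                0 <= y + j and y + j + N <= height)

    ci, cj = 0, 0                       # current search center
    best_err = mae(cur, ref, x, y, 0, 0, M, N)
    step = (p + 1) // 2                 # ASSUMPTION: 4, 2, 1 for p = 7
    while step >= 1:
        best = (ci, cj, best_err)
        for dj in (-step, 0, step):
            for di in (-step, 0, step):
                i, j = ci + di, cj + dj
                if (di, dj) == (0, 0) or not inside(i, j):
                    continue
                err = mae(cur, ref, x, y, i, j, M, N)
                if err < best[2]:
                    best = (i, j, err)
        ci, cj, best_err = best         # move the center to the best point
        step //= 2
    return ci, cj, best_err


# Hypothetical frames: a ramp image, and the same ramp displaced by (6, -4).
ref = [[10 * c + 1000 * r for c in range(24)] for r in range(24)]
cur = [[10 * (c + 6) + 1000 * (r - 4) for c in range(24)] for r in range(24)]

print(three_step_search(cur, ref, x=8, y=8, M=4, N=4, p=7))
```

For p = 7 this evaluates at most 1 + 8 × 3 = 25 candidate positions instead of the 225 that full search would test, at the price of possibly stopping at a non-optimal vector.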

Historic Standards of Video Compression
H.261, H.263, H.26L, H.264
MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21
Labels: object-based coding scheme; multimedia description; user interface
H.264 (AVC) / MPEG-4 Part 10: enhancing the coding efficiency

MPEG Video
Moving Picture Experts Group (MPEG): ISO/IEC JTC1/SC29/WG11, established in 1988
MPEG-1 (ISO/IEC 11172, 11/92)
Compression standard for progressive frame-based video in SIF (360×240), targeted at 1.5 Mbits/s
Video ~1.2 Mbps, audio ~250 Kbps
Applications: VCD, MP3

MPEG-2
MPEG-2 (ISO/IEC 13818, 11/94)
Compression standard for interlaced frame-based video in CCIR-601 (720×480) and high-definition format (1920×1088), over a wide range of bit rates from 4 to 80 Mbits/s
Optimized at around 4 Mbps
Applications: DVD, HDTV, etc.

MPEG-4
MPEG-4 (ISO/IEC 14496, 10/98)
Multimedia standard for object-based video from natural or synthetic sources
Coding for different bandwidths (5 Kbps to 270 Mbps)
Applications: Internet, cable TV, 3G wireless communications, etc.

MPEG-7 & MPEG-21
MPEG-7 (ISO/IEC 15938, Sept 2001)
Multimedia content description interface
Applications: Internet, video search engines, digital libraries
MPEG-21
E-commerce
Only the bitstream syntax and the decoder are specified

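MPEG-1
Application areas: interactive multimedia applications, CD-ROM storage, movies (VCD), KTV, shopping, etc.
The input video is typically Y (720×480), CbCr (360×480).
Y, Cb, and Cr are first subsampled to the SIF format, Y (360×240) and CbCr (180×120), and only then encoded.
The input video is typically a signal of 30 frames per second.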

Parts of MPEG-1
ISO/IEC 11172-1: Systems
ISO/IEC 11172-2: Video

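Parameters
General parameters
Picture resolution can be as high as 4096 × 4096; it is typically 360 × 240
Aspect ratio: 14 choices
Frame rate: …, 24, 25, 29.97, 30, 50, 59.94, 60
4:2:0 (same as H.26x)
Constrained parameters
This is the minimum required for MPEG-1 compatibility: every MPEG-1-compatible decoder must at least be able to decode bitstreams that conform to the constrained parameter set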

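The Three Frame Types of MPEG-1
MPEG-1 compresses a frame in one of three ways: I-frame, P-frame, B-frame
An I-frame is coded in a way similar to JPEG's DCT-based processing. It does not consider relationships with other frames; what is stored is a complete picture.
A P-frame uses the preceding I- or P-frame as the reference frame for forward motion-compensated coding. Parts of the picture that do not move are not stored; only the parts that differ are stored.
A B-frame works on the same principle as a P-frame, except that it can reference a preceding frame as well as a following frame.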

Example for Bidirectional Prediction
I-frame B-frame P-frame

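Frame Types and Macroblock (MB) Types
I-frame (intra-coded frame): all macroblocks are I-macroblocks; supports random access and FF/FR
P-frame (predictive-coded frame): P-macroblocks, I-macroblocks, skipped macroblocks
B-frame (bidirectionally predictive-coded frame): B-macroblocks (forward prediction, backward prediction, bidirectional prediction), I-macroblocks, skipped macroblocks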

Encoder Block Diagram
Removing spatial redundancy, statistical redundancy, and temporal redundancy

I-frame (Intra-coding frame)
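Macroblock: four 8×8 Y blocks, one 8×8 Cb block, and one 8×8 Cr block
The coding method is similar to JPEG.
One difference from JPEG: the Huffman tables used by MPEG-1 are fixed, whereas JPEG offers several Huffman tables to choose from.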

P-frame (Predictive coding frame)
Forward motion-compensated coding

B-frame (Bidirectional predictive coding frame)
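Motion estimation is performed twice, once against a past frame and once against a future frame, producing two motion vectors
Bidirectional motion compensation
Advantages: high compression efficiency; no error propagation
Disadvantages: memory; delay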

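Frame Order and Bit-Rate Distribution
Typically: I-frame 156 Kb per picture; P-frame 62 Kb per picture; B-frame 15 Kb per picture
If the frame rate is 30 and the GOP is arranged as IBBPBBPBBPBBPBBIBB…, then each second contains two I-frames, eight P-frames, and twenty B-frames.
The video bit rate of the whole MPEG-1 system is 156×2 + 62×8 + 15×20 = 1,108 Kbps ≈ 1.1 Mbps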
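
The bit-rate sum above can be reproduced directly from the GOP pattern. This is a small sketch using the per-picture sizes quoted on the slide; the function and variable names are illustrative.

```python
from collections import Counter

GOP = "IBBPBBPBBPBBPBB"                          # one 15-frame GOP
KBITS_PER_PICTURE = {"I": 156, "P": 62, "B": 15}  # typical sizes from the slide


def video_bit_rate_kbps(gop, frame_rate, kbits_per_picture):
    """Average video bit rate for a repeating GOP pattern, in Kbps."""
    per_gop = Counter(gop)                   # pictures of each type per GOP
    gops_per_second = frame_rate / len(gop)  # 30 / 15 = 2 GOPs per second
    return sum(per_gop[t] * gops_per_second * kbits_per_picture[t]
               for t in kbits_per_picture)


print(video_bit_rate_kbps(GOP, 30, KBITS_PER_PICTURE))
```

Changing the GOP string (for example to a pattern with fewer B-frames) shows how strongly the frame-type mix drives the average rate.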

Decoder
The reverse of the encoder's procedure
Coding error arises in the dequantization process
Motion compensation

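Bit-Rate Control
Adjust the quantization step size of the DCT coefficients
The MPEG-1 standard lets the encoder choose a different quantization step size for each coded macroblock
Bits can be allocated appropriately according to the complexity and visual importance of each macroblock
We can choose between constant bit rate (CBR) and variable bit rate (VBR)
Buffer (frame buffer): sits between CBR and VBR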

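Bit-Rate Control
A sufficiently large buffer avoids overflow
However, cost considerations weigh against using a large buffer
A larger buffer means a longer delay
MPEG defines a minimum buffer capacity that every decoder implementation must support. This capacity equals the largest buffer value that an encoder may use to produce a bitstream.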

MPEG-1 Video Bitstream
The bitstream syntax defines six layers: sequence, GOP, picture, slice, MB, block

MPEG-2
ISO/IEC 13818-2 (or ITU-T H.262)
Broadcast TV, cable/satellite TV, HDTV, etc.
4 to 9 Mbits/s, interlaced video, and scalable coding
Parts of MPEG-2: Systems, Video, Audio, Conformance testing, Software, DSM-CC, NBC audio, Real-time interface, DSM-CC conformance

Parts of MPEG-2
ISO/IEC 13818-1: Systems
ISO/IEC 13818-2: Video
Other parts: DSM-CC, NBC audio, real-time interface, DSM-CC conformance

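MPEG-2
MPEG-2 compressed bitstream formats fall into two classes:
The scalable format, whose compressed bitstream lets the decoder choose among different signal quality levels for playback
Purely from an algorithmic point of view, MPEG-1 and MPEG-2 are essentially the same
One of the requirements: MPEG-2 must have maximum interoperability and compatibility with MPEG-1
Most of the differences stem, directly or indirectly, from the difference in input formats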

Coding Parameters of MPEG-1 and MPEG-2

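Interlaced Frames
MPEG-2 input can be interlaced or non-interlaced frames, whereas MPEG-1 accepts only non-interlaced frames
An interlaced frame is sent in two passes, one field per pass
Because MPEG-1 does not accept interlaced frames, the TV signal must be converted to non-interlaced frames before encoding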

Progressive and Interlaced Scanning
Progressive scanning: displays moving images with the lines of each frame drawn in sequence
Interlaced scanning: displays a frame by tracing the odd-numbered lines first and then the even-numbered lines
Odd field and even field
Two fields make up one frame
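
The relation between a frame and its two fields can be sketched in a few lines. This is an illustration only: the frame is modeled as a list of scan lines, lines are numbered from 1 as in the text (so odd-numbered lines have even list indices), and the function names are made up for the example.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into its odd and even fields.
    Lines are numbered from 1, so the odd field holds indices 0, 2, 4, ..."""
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Merge two fields back into one frame by alternating their lines."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    if len(odd_field) > len(even_field):   # frame with an odd line count
        frame.append(odd_field[-1])
    return frame


# Hypothetical 6-line frame; each line is filled with its own line index.
frame = [[r] * 4 for r in range(6)]
odd, even = split_fields(frame)
print(weave(odd, even) == frame)
```

Weaving reconstructs the frame exactly only when both fields come from the same instant; with real interlaced video the two fields are captured at different times, so moving edges do not line up.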

De-interlacing Example
The purpose of interlaced scanning is to reduce the bandwidth required for video transmission
Aliasing artifacts occur when the two fields are merged into one frame

Input Video Formats
The main MPEG-2 input video format is CCIR 601, which is further divided into three subsampling formats: 4:2:0, 4:2:2, and 4:4:4

Prediction Modes and Motion Compensation

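Prediction Modes and Motion Compensation
All motion vectors use half-pixel accuracy
Two additional motion compensation modes are supported: 16×8 motion compensation and dual-prime motion compensation
16×8 mode: a 16×16 macroblock is treated as an upper and a lower 16×8 rectangle, and each 16×8 rectangle is motion-compensated independently
16×8 motion compensation can be used only in field pictures
Dual-prime motion compensation can be used only in P-pictures, and only in GOPs that have no B-pictures between the P-picture and its reference picture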

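Dual-Prime Motion Compensation
Two field macroblocks are predicted using two motion vectors, and the average of these two field macroblocks is used as the final field macroblock prediction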

Profiles and Levels

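Other Differences Between MPEG-2 and MPEG-1
MPEG-2 accepts input video with different aspect ratios
For CCIR 601 interlaced input, the best video quality is usually obtained at bit rates of 4 to 9 Mbps
Half-pixel motion compensation is always used
New DCT coefficient quantization options and an alternative zigzag scan improve picture quality
The compressed bitstream uses a scalable format
MPEG-2 has been adopted as the compression algorithm for high-definition TV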

MPEG-1/2 Coding Result Statistics

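MPEG-4
What earlier standards can do:
MPEG-1: video coding based on non-interlaced frames (1.5 Mbps)
MPEG-2: video coding based on interlaced or non-interlaced frames (4 Mbps to 270 Mbps)
H.261: low bit-rate video-conferencing coding (p×64 Kbps)
H.263: very low bit-rate video-conferencing coding (10 Kbps)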

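MPEG-4
What earlier standards cannot do:
Code video objects using the content information (metadata) of the video
Code multimedia information to suit different bandwidths and media (5 Kbps to 270 Mbps)
Interactivity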

MPEG-4: Main Functionalities
Content-based interactivity
Universal access
Compression

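MPEG-4: Main Functionalities
Content-based interactivity
A frame is regarded as a composition of objects, not a composition of pixels or moving blocks
An object can be a car, a piece of music, or text
An object can be rectangular or of arbitrary shape
It can be natural or synthetic
It may be two-dimensional or three-dimensional
Different objects can be compressed with different coding methods
At the decoder, a compositor reassembles all the objects into the reconstructed frame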

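MPEG-4: Main Functionalities
Universal access
Suitable for all kinds of applications
Covers both wired and wireless networks, so severe errors may occur
It also implies content-based scalability: picture content, quality, and complexity can be adjusted flexibly according to the situation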

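MPEG-4: Main Functionalities
Compression
At the same bit rate, MPEG-4 achieves better visual quality than any earlier video coding standard
Its bit rate can be high or low: as low as 5 to 64 Kbps for mobile communications, or as high as 20 Mbps for TV and film
It can code multiple synchronized videos, such as stereoscopic video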

Video Object Plane (VOP)

Video Object (VO)
The set of consecutive VOPs in a film that belong to the same physical object is called a video object (VO)

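Segmentation Methods: Separated and Overlapped

Segmentation Methods
Online (real-time) versus offline (non-real-time) segmentation
Automatic versus semi-automatic segmentation (with human intervention)
Video conferencing: real-time and automatic segmentation
General video: offline and semi-automatic segmentation
Offline and semi-automatic segmentation usually gives better results than real-time automatic segmentation, but takes more time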

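Individual Processing of Objects
After segmentation, each VO can be coded separately and can also be processed separately. For example, the position of a VOP can be changed, and so on.

Example of Processing VOs Individually

Description of a VOP
Binary alpha plane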

MPEG-4 Video Decoder

MPEG-4 Video Decoder
Shape coding
Motion vector estimation and texture coding

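VOP Coding
The shape of a VOP is first enclosed by the smallest rectangle whose width and height are both integer multiples of 16, and this rectangle is divided into 16×16 blocks called binary alpha blocks (BABs)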
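
The bounding step above can be sketched as follows. This is a simplified illustration, not the normative MPEG-4 procedure: the binary mask format (a list of rows of 0/1 values), the choice to anchor the rectangle at the top-left corner of the tight bounding box, and the function name are assumptions made for the example.

```python
def vop_bounding_box(mask, block=16):
    """Return (left, top, width, height) of the smallest rectangle around
    the non-zero pixels of `mask` whose width and height are multiples of
    `block`. ASSUMPTION: the rectangle is anchored at the top-left corner
    of the tight bounding box and may extend past the mask on the right
    and bottom."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    if not rows:
        return None                       # empty VOP: nothing to enclose
    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1
    # Round both dimensions up to the next multiple of the block size.
    width = -(-width // block) * block
    height = -(-height // block) * block
    return left, top, width, height


# Hypothetical 40x40 mask with an object covering rows 5..24, columns 3..35.
mask = [[1 if 5 <= r <= 24 and 3 <= c <= 35 else 0 for c in range(40)]
        for r in range(40)]

left, top, width, height = vop_bounding_box(mask)
bab_count = (width // 16) * (height // 16)
print((left, top, width, height), bab_count)
```

The 33×20-pixel object ends up in a 48×32 rectangle, that is, six 16×16 BABs.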

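Shape Coding
Binary alpha plane: gives the decoder the shape of the VOP at a given time instant; usually represented as a bitmap
Grey-scale alpha plane: a generalization of the binary alpha plane; provides transparency information in addition to shape information; uses eight bits for each pixel value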

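Motion Vector Estimation and Compensation Tools
A VOP can be coded without reference to any other VOP: I-VOP
A VOP can be predicted from another VOP that has just been decoded: P-VOP
A VOP can be predicted from both past and future VOPs: B-VOP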

Motion Vector Estimation and Compensation Tools

Padding Tools
Extrapolation

Padding Tools

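Texture Coding Tools
I-VOPs and the residuals produced by motion-compensated coding are all coded with an 8×8 DCT
The coding method is similar to that used in MPEG-1, MPEG-2, H.261, and H.263
If an 8×8 block straddles the VOP boundary, it must be padded first
Residual blocks produced by motion-compensated coding: padded with zeros
I-VOP blocks: padded by low-pass extrapolation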

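Texture Coding Tools

Static Panorama (Sprite) Coding Tool
A panorama is the union of all the background swept by the camera within one scene
Once the panorama is available, the background at any time can be obtained with the warping and cropping techniques of image processing

Static Panorama (Sprite) Coding Tool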

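Synthetic Object Coding in MPEG-4
Produced by computer graphics and animation software
Synthetic objects are combined with natural objects or natural scenes for playback
2D mesh coding
3D model-based coding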

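2D Mesh Object Coding

2D Mesh Object Coding
2D mesh geometry coding
Uniform mesh
Delaunay mesh

2D Mesh Object Coding
2D mesh motion coding
For every triangle of a MOP, whether in a uniform or a Delaunay mesh, the motion is described by the motion vectors of its three vertices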

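2D Mesh Object Coding
2D object animation
Mesh texture mapping: the texture inside the corresponding reference mesh triangle is warped to produce the texture inside the target mesh triangle
Affine transform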
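
Because three vertex correspondences determine an affine transform, the warp of one mesh triangle can be derived from its three vertex motion vectors. The sketch below solves for the six affine parameters with Cramer's rule. It is an illustration under stated assumptions: the parameter naming (a1 to a6), the convention that a motion vector moves a reference vertex to its target position, and the function names are all choices made for this example.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(A, rhs):
    """Solve the 3x3 linear system A * p = rhs with Cramer's rule."""
    d = det3(A)
    if d == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    solution = []
    for col in range(3):
        m = [list(row) for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    return solution


def affine_from_triangle(vertices, motion_vectors):
    """Affine parameters (a1..a6) with x' = a1*x + a2*y + a3 and
    y' = a4*x + a5*y + a6, fitted to the three moved vertices.
    ASSUMPTION: target vertex = reference vertex + motion vector."""
    A = [[x, y, 1] for x, y in vertices]
    targets = [(x + dx, y + dy)
               for (x, y), (dx, dy) in zip(vertices, motion_vectors)]
    a1, a2, a3 = solve3(A, [t[0] for t in targets])
    a4, a5, a6 = solve3(A, [t[1] for t in targets])
    return a1, a2, a3, a4, a5, a6


def warp_point(params, x, y):
    """Map a point of the reference triangle into the target triangle."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 * x + a2 * y + a3, a4 * x + a5 * y + a6


# Hypothetical triangle whose three vertices all move by (2, 1).
triangle = [(0, 0), (16, 0), (0, 16)]
params = affine_from_triangle(triangle, [(2, 1), (2, 1), (2, 1)])
print(params, warp_point(params, 4, 4))
```

When all three motion vectors are equal, the transform reduces to a pure translation; unequal vectors add scaling, rotation, or shear inside the triangle.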

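2D Mesh Object Coding
Encoder/decoder for 2D mesh objects, in which the video encoder provides the texture of the mesh object

2D Mesh Object Coding
Mesh representation of a VOP, and the MOP obtained by mesh coding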

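3D Model-Based Coding
3D model-based coding of face objects
3D model-based coding of body objects (Ver. 2)

3D Model-Based Coding of Face Objects
Neutral face

3D Model-Based Coding of Face Objects
Facial feature points

3D Model-Based Coding of Face Objects
Six basic facial expressions are defined: joy, anger, sadness, fear, surprise, and disgust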

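MPEG-4 Profiles and Levels
Simple profile
This profile can decode H.263 bitstreams that use no annex options
I- and P-VOPs
AC/DC prediction
Four motion vectors
Unrestricted motion vectors
Slice resynchronization
Data partitioning
RVLC (reversible VLC)

MPEG-4 Profiles and Levels
Simple scalable profile
Advanced real-time simple profile
Advanced simple profile: adds 1/4-pixel motion compensation, global motion compensation, and B-VOPs to the simple profile

MPEG-4 Profiles and Levels
Fine granularity scalability profile
Core profile
Main profile
N-bit profile
Core scalable profile
Advanced coding efficiency profile
Simple studio profile
Core studio profile
Many further profiles are defined for face, body, and mesh animation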

MPEG-7 and MPEG-21
MPEG-7
Its official name is Multimedia Content Description Interface
An interface standard for describing multimedia content
Content-based search engines
Covered in detail in Chapter 12

H.261
ITU-T Study Group 15
An early digital video compression standard
Uses motion-compensation-based compression, which was largely adopted in all later video compression standards
Designed for videophone, video conferencing, and other audiovisual services over ISDN telephone lines
Videophone and video conferencing (failed as an application)
Low bit rates and low delay
Originally for m×384 kbits/s (m = 1...5), changed to p×64 kbits/s (p = …) in 1988
Also called "p×64"
40 kbits/s to 2 Mbits/s

Video Format
Supports the QCIF and CIF formats
Color components: luminance Y; chrominance Cb and Cr
Chroma subsampling 4:2:0
MacroBlock (MB): 16×16 luminance (Y) samples with 8×8 Cb and 8×8 Cr samples

Types for Encoding Frames
Two types of image frames are defined
Intra-frame (I-frame)
Treated as an independent image
Uses only information within the frame for encoding
Applies a transform coding similar to JPEG
Performs only spatial redundancy removal
Inter-frame (P-frame)
Uses information from the current frame and from frames that have already been encoded
Encoded by forward predictive coding, in which the current MBs are predicted from similar MBs in the preceding I- or P-frame
Removes temporal redundancy

Frame Sequence
The interval between pairs of I-frames is variable and is determined by the encoder

Encoding Block Diagram
Block diagram for general H.261

I-frame Coding
MacroBlocks (MBs): 16×16 pixels for the Y frame and 8×8 pixels for the Cb and Cr frames, since 4:2:0 chroma subsampling is used
An MB consists of four Y blocks, one Cb block, and one Cr block, each 8×8 (six 8×8-pixel blocks in total)
DCT, quantization, and entropy coding are applied to each 8×8 block
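
The first two stages applied to each 8×8 block, the DCT and quantization, can be sketched as follows. This is a direct, unoptimized evaluation of the 8×8 DCT-II formula, meant only as an illustration: the single uniform quantizer step of 8 and the flat test block are assumptions, and entropy coding is left out.

```python
import math


def dct_8x8(block):
    """2-D DCT-II of an 8x8 block, evaluated directly from the definition:
    F(u,v) = 1/4 C(u) C(v) sum_x sum_y f(x,y)
             cos((2x+1)u*pi/16) cos((2y+1)v*pi/16), C(0) = 1/sqrt(2)."""
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * s
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round.
    ASSUMPTION: one step size for every coefficient."""
    return [[round(value / step) for value in row] for row in coeffs]


# Hypothetical flat block: every pixel has the value 100.
flat_block = [[100] * 8 for _ in range(8)]
levels = quantize(dct_8x8(flat_block), step=8)
print(levels[0])
```

A flat block has all of its energy in the DC coefficient, so after quantization only one non-zero level remains, which is what makes the subsequent entropy coding effective.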

P-frame (Predictive) Coding
After prediction, a difference MB is derived to measure the prediction error
The motion vector is also coded

Motion Estimation
The difference between two MBs is measured by their Mean Absolute Difference (MAD) or Sum of Absolute Differences (SAD)
The goal is to find the vector (i, j) for which MAD(i, j) is minimum; this vector is the motion vector MV = (u, v)


H.261 Decoder

H.263
An improved video coding standard for video conferencing and other audiovisual services transmitted over Public Switched Telephone Networks (PSTN)
Aims at low bit-rate communication at bit rates below 64 kbps
Similar to H.261: reduces temporal redundancy by predictive coding of inter-frames, and reduces spatial redundancy by transform coding of the residual signal

Functional Block of B-frame Coding

Group of Pictures (GOP)
Used to avoid the propagation of prediction and transmission errors

Encoding Vs Display Order
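
The slide for this topic shows only a figure. The reordering it refers to can be sketched as follows, on the understanding that a B-frame needs both of its reference frames before it can be coded, so each I- or P-frame is moved ahead of the B-frames that precede it in display order. The picture labels and the function name are illustrative assumptions.

```python
def display_to_coding_order(display):
    """Reorder pictures from display order to coding (transmission) order.
    Each picture is a label such as "I1", "B2", or "P4"; the first
    character gives its type."""
    coding, pending_b = [], []
    for picture in display:
        if picture[0] == "B":
            pending_b.append(picture)     # wait for the next I- or P-frame
        else:
            coding.append(picture)        # the reference frame goes first
            coding.extend(pending_b)      # then the B-frames that use it
            pending_b = []
    # Trailing B-frames with no later reference in this list are kept at
    # the end (in a real stream they would wait for the next GOP).
    coding.extend(pending_b)
    return coding


display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(display_to_coding_order(display))
```

The decoder performs the inverse reordering, which is one source of the memory and delay cost of B-frames.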

Slices
Instead of the GOBs of H.261, an MPEG-1 picture can be divided into one or more slices
Slices may contain variable numbers of MBs in a single picture
Each slice is encoded independently for error recovery

Rate Control
A tool for controlling the bit allocation of the encoded frames
P- and B-frames use fewer bits than an I-frame
The encoder produces a variable-rate stream that goes into a buffer, and a constant transmission rate empties the buffer
Buffer underflow and overflow must be avoided
A measure of buffer fullness controls the quantization scale factor, which adjusts the size of the encoded stream
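
The feedback loop between buffer fullness and the quantization scale factor can be sketched with a toy simulation. Everything specific here is an assumption made for the illustration: the linear mapping from fullness to a scale factor in the range 1 to 31, the rate model "bits = complexity / scale", and the numbers in the example. Real encoders use more elaborate rate models.

```python
def quantizer_scale(fullness, buffer_size, q_min=1, q_max=31):
    """Map buffer fullness to a quantization scale factor.
    ASSUMPTION: a linear mapping; a fuller buffer gives coarser quantization."""
    ratio = min(max(fullness / buffer_size, 0.0), 1.0)
    return round(q_min + ratio * (q_max - q_min))


def simulate(frame_complexities, channel_bits_per_frame, buffer_size):
    """Toy encoder loop: returns a list of (scale factor, buffer fullness).
    ASSUMPTION: a frame produces complexity / scale bits (hypothetical model)."""
    fullness = 0.0
    history = []
    for complexity in frame_complexities:
        q = quantizer_scale(fullness, buffer_size)
        produced = complexity / q
        # The channel drains a constant amount per frame; the buffer
        # cannot hold a negative number of bits (underflow is clamped).
        fullness = max(fullness + produced - channel_bits_per_frame, 0.0)
        history.append((q, fullness))
    return history


# Hypothetical sequence: one complex I-like frame followed by simpler frames.
history = simulate([400_000, 60_000, 60_000, 150_000, 60_000],
                   channel_bits_per_frame=40_000, buffer_size=500_000)
for q, fullness in history:
    print(q, round(fullness))
```

After the large first frame fills the buffer, the scale factor rises, so the following frames are quantized more coarsely until the buffer drains.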

Scalable Coding
Scalable coding (also known as layered coding)
Defines a base layer and one or more enhancement layers
Basic video quality is obtained by encoding and decoding the base layer; the enhancement layers are encoded and decoded on top of the base layer
Applications
Networks with very different bit rates
Networks with noisy connections
Networks with variable bit rate (VBR) channels

SNR Scalability
Refers to enhancement (refinement) over the base layer to improve the signal-to-noise ratio (SNR)
The base layer applies coarse quantization to the DCT coefficients, resulting in fewer bits and low-quality video
The enhancement layer quantizes the DCT coefficients finely
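
The two-layer idea can be sketched on a handful of coefficients. This is one common way to realize it, shown as an assumption rather than as the scheme of any specific standard: the base layer quantizes coarsely, and the enhancement layer finely quantizes the error left by the base layer. The step sizes and coefficient values are hypothetical.

```python
def encode_snr_layers(coeffs, base_step, enh_step):
    """Return (base levels, enhancement levels) for a list of DCT coefficients.
    ASSUMPTION: the enhancement layer codes the base-layer quantization error."""
    base = [round(c / base_step) for c in coeffs]
    base_recon = [level * base_step for level in base]
    enh = [round((c - r) / enh_step) for c, r in zip(coeffs, base_recon)]
    return base, enh


def decode_base(base, base_step):
    """Reconstruction available to a receiver that gets only the base layer."""
    return [level * base_step for level in base]


def decode_full(base, enh, base_step, enh_step):
    """Reconstruction when the enhancement layer is also received."""
    return [b * base_step + e * enh_step for b, e in zip(base, enh)]


coeffs = [803, -47, 12, 5]                      # hypothetical DCT coefficients
base, enh = encode_snr_layers(coeffs, base_step=32, enh_step=4)
print(decode_base(base, 32))
print(decode_full(base, enh, 32, 4))
```

A receiver on a slow link decodes only the base levels and gets a rough approximation; adding the enhancement levels brings every coefficient to within half of the fine step size.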

Spatial Scalability
The base layer generates a bitstream of reduced-resolution pictures
Pictures at the original resolution are produced by adding the enhancement layer

Temporal Scalability

Applications
The standards do not specify how to implement the encoder and decoder
Architecture simplification
Computation reduction
Applications: error resilience, data embedding (watermarking), transcoding

Error Resilience
Applied in multimedia communications to combat bit errors and packet loss
Two categories:
Error concealment: minimizes the effect of errors in the bitstream
Resynchronization and data recovery: localizes the error and recovers as much of the lost data as possible
With/without error resilience

Digital Watermarking
Embeds a signal into digital data (audio, video, images, and text) that can be detected or extracted later
Applications: copyright protection, fingerprinting

Transcoding
Converts a previously compressed video signal into another one with a different format, such as a different bit rate, frame rate, frame size, or even compression standard
[Figure: video source; encoder_01 to encoder_0n; user group_01 to user group_0n; transcoder (decoder, encoder, parameter setting)]

MPEG-4 Content-Based Video Coding

Applying MPEG-4
Weather reporting
Digital learning
…
[Figure: video objects; segmentation and object mask; layered encoding of VOP1, VOP2, and VOP3; bitstream; content-based bitstream access and manipulation; scalability; separate decoding]
Dying of MPEG-4 Content-Based Video Coding…

H.264/AVC (MPEG-4 Part 10)
4x4-pixel integer transform
Intra prediction
Inter prediction
Multiple reference frames
Deblocking filter
Good coding efficiency at the cost of high computational power

Memory Reduction for Storing Multi-frames in H.264
Multiple reference frames are adopted for motion estimation/compensation
More bits are needed to represent MVs: compensated by effective coding
Computational complexity of motion estimation increases: overcome by high-speed processors and simplified algorithms
Memory cost of storing multiple frames increases: addressed by reducing the memory requirement

Storage-Type Determination in H.264 Decoder

What We Do to Image and Video Compressions
An example of MPEG-21
