Ten Years of Pedestrian Detection, What Have We Learned?

Ten Years of Pedestrian Detection, What Have We Learned? Rodrigo Benenson, Mohamed Omran, Jan Hosang, Bernt Schiele. Max Planck Institute for Informatics, Saarbrücken, Germany. 2014, Computer Vision for Road Scene Understanding and Autonomous Driving (CVRSUAD, ECCV workshop)

Agenda: Abstract; Introduction; Datasets; Main approaches to improve pedestrian detection; Experiments; Conclusion

Abstract: We discuss the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We observe that there exist three families of approaches, all currently reaching similar detection quality. Based on our analysis, we study the complementarity of the most promising ideas by combining multiple published strategies. This new decision forest detector achieves the current best known performance on the challenging Caltech-USA dataset.

Introduction: Pedestrian detection has direct applications in car safety, surveillance, and robotics. It has attracted much attention in the last years: it is a well-defined problem with established benchmarks and evaluation metrics, and it has served as a playground to explore different ideas for object detection. The main paradigms for object detection, "Viola & Jones variants", HOG+SVM rigid templates, deformable part detectors (DPM), and convolutional neural networks (ConvNets), have all been explored for this task. The aim of this paper is to review progress over the last decade of pedestrian detection (40+ methods), identify the main ideas explored, and try to quantify which ideas had the most impact on final detection quality. We do not aim to introduce a novel technique; nevertheless, by putting together existing methods we report the best known detection results on the challenging Caltech-USA dataset. Notes: pedestrian detection has an extremely wide range of applications, including intelligent driver assistance, intelligent surveillance, pedestrian analysis, and intelligent robotics.

Introduction: The last decade has shown tremendous progress on pedestrian detection. What have we learned from the 40+ proposed methods?

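Datasets: Multiple public pedestrian datasets have been collected over the years; INRIA, ETH, TUD-Brussels, Daimler, Caltech-USA, and KITTI are the most commonly used ones. Note: this figure uses the SquaresChnFtrs features (the input image is converted into a set of feature maps, and the final feature vector is obtained by sum-pooling over a large set of rectangular regions).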
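
The note above describes SquaresChnFtrs as sum-pooling feature maps over many rectangles. A minimal NumPy sketch of that pooling step, assuming integral images (summed-area tables) so that each box sum costs four lookups; the function names, the square sizes, and the stride-1 placement are illustrative assumptions, not the detector's actual configuration.

```python
import numpy as np

def integral_image(channel):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of channel[y:y+h, x:x+w] from four lookups in the summed-area table."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def square_channel_features(channels, sizes=(2, 4, 8)):
    """Sum-pool each (H, W) channel over every square of the given sizes.

    channels: array of shape (C, H, W), e.g. the feature maps of one detection window.
    Returns a 1-D feature vector with one entry per (channel, square).
    The sizes and the stride-1 placement are illustrative assumptions.
    """
    feats = []
    for ch in channels:
        ii = integral_image(ch)
        height, width = ch.shape
        for s in sizes:
            for y in range(height - s + 1):
                for x in range(width - s + 1):
                    feats.append(box_sum(ii, y, x, s, s))
    return np.asarray(feats)
```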

Datasets: INRIA is the oldest and as such has comparatively few images. It benefits, however, from high-quality annotations of pedestrians in diverse settings. ETH and TUD-Brussels are mid-sized video datasets. Daimler is not considered by all methods because it lacks colour channels. Daimler stereo, ETH, and KITTI provide stereo information. All datasets but INRIA are obtained from video, and thus enable the use of optical flow as an additional cue. Notes: INRIA has comparatively rich background environments (e.g. city, beach, mountains), so it is used quite often. Daimler lacks colour information. Optical flow: imagine sitting on a train and looking out of the window; trees, the ground, buildings and so on all appear to move backwards. This apparent motion is optical flow.
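
To make the optical-flow cue concrete, here is a minimal sketch of the textbook Lucas-Kanade estimate of a single translation (u, v) for one image patch, assuming two grayscale frames of equal size. It is not the flow encoding used by the detectors discussed here (e.g. ACF+SDt); it only shows how brightness constancy, I_x*u + I_y*v + I_t = 0, is solved in the least-squares sense.

```python
import numpy as np

def lucas_kanade_translation(frame0, frame1):
    """Estimate one (u, v) displacement (in pixels) for a whole patch.

    Linearises brightness constancy, I_x*u + I_y*v + I_t = 0, at every pixel
    and solves the resulting over-determined system by least squares.
    """
    f0 = frame0.astype(np.float64)
    f1 = frame1.astype(np.float64)
    avg = 0.5 * (f0 + f1)           # gradients of the mean frame are more accurate
    Iy, Ix = np.gradient(avg)       # np.gradient returns d/drow, then d/dcol
    It = f1 - f0                    # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

Applied per window (rather than to a whole frame), the same least-squares step yields a dense flow field.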

Datasets: Caltech-USA and KITTI are the predominant benchmarks for pedestrian detection. Both are comparatively large and challenging. Caltech-USA stands out for the large number of methods that have been evaluated side by side. KITTI's test set is slightly more diverse, but it is not yet used as frequently. Notes: Caltech-USA has results for a large number of methods and also provides a Matlab toolkit, which makes comparisons convenient. This paper mainly uses the Caltech dataset as the benchmark, with INRIA and KITTI as secondary benchmarks.

Datasets: In this paper we primarily use Caltech-USA for comparing methods, and INRIA and KITTI secondarily. Caltech-USA and INRIA results are measured in log-average miss rate (MR, lower is better), while KITTI uses the area under the precision-recall curve (AUC, higher is better). Note: the miss rate is sampled at several false-positives-per-image (FPPI) values (on Caltech-USA, nine values evenly spaced in log space between 10^-2 and 10^0), and these miss rates are averaged in log space to obtain the log-average miss rate.
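
A minimal sketch of the log-average miss rate, assuming the miss-rate-vs-FPPI curve is given as points sorted by increasing FPPI. The sampling rule (take the last curve point whose FPPI does not exceed each reference value, or a miss rate of 1.0 if the curve starts above it) is an assumption; the official Caltech evaluation code may handle edge cases differently.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at FPPI values log-spaced in [1e-2, 1e0].

    fppi, miss_rate: points of the miss-rate-vs-FPPI curve, sorted by increasing FPPI.
    Assumed sampling rule: for each reference FPPI, use the miss rate of the last
    curve point whose FPPI does not exceed it, or 1.0 if no such point exists.
    """
    fppi = np.asarray(fppi, dtype=np.float64)
    miss_rate = np.asarray(miss_rate, dtype=np.float64)
    refs = np.logspace(-2.0, 0.0, num_points)
    sampled = []
    for r in refs:
        idx = np.nonzero(fppi <= r)[0]
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)
    sampled = np.maximum(np.asarray(sampled), 1e-10)  # avoid log(0)
    return float(np.exp(np.mean(np.log(sampled))))
```

Averaging in log space means that halving the miss rate at low FPPI counts as much as halving it at high FPPI.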

Main approaches to improve pedestrian detection: “Training” data column: I → INRIA, C → Caltech, I+/C+ → INRIA/Caltech and additional data, P → Pascal, T → TUD-Motion, I&C → both INRIA and Caltech. Listing of methods considered on Caltech-USA, sorted by log-average miss-rate (lower is better). Consult sections 3.1 to 3.9 for details of each column. See also matching figure 3. “HOG” indicates HOG-like [1]. Ticks indicate salient aspects of each method.


Training data: Methods trained on Caltech-USA systematically perform better than methods that generalize from INRIA. High-performing methods with “other training” use extended versions of Caltech-USA.

Main approaches to improve pedestrian detection: Solution families. We discern three families: (1) DPM variants (DPM), (2) deep networks (DN), and (3) decision forests (DF). Based on raw numbers alone, boosted decision trees (DF) seem particularly suited for pedestrian detection. Better classifiers: since the original proposal of HOG+SVM, both linear and non-linear kernels have been considered, and the distinction between features and classifiers is not clear-cut anymore. There is no conclusive empirical evidence on whether non-linear kernels provide meaningful gains over linear kernels, or whether one particular type of classifier (e.g. SVM or decision forests) is better suited for pedestrian detection than another. Note: the best-performing methods in the figure above are all DF.
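
Since the best-performing family is boosted decision trees, a minimal sketch of the boosting mechanism may help: discrete AdaBoost over decision stumps (depth-1 trees), assuming labels in {-1, +1}. Detectors in this family typically boost somewhat deeper trees over channel features, so this illustrates only how weak learners are reweighted and combined, not any specific published detector.

```python
import numpy as np

def stump_predict(X, j, t, polarity):
    """Depth-1 tree: threshold feature j at t; polarity flips the direction."""
    return np.where(polarity * (X[:, j] - t) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = w[stump_predict(X, j, t, polarity) != y].sum()
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, t, p = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep alpha finite
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * stump_predict(X, j, t, p))  # up-weight mistakes
        w /= w.sum()
        model.append((alpha, j, t, p))
    return model

def boosted_score(model, X):
    """Real-valued detection score; its sign is the predicted class."""
    return sum(a * stump_predict(X, j, t, p) for a, j, t, p in model)
```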

Main approaches to improve pedestrian detection: Additional data. The core problem of pedestrian detection focuses on individual monocular colour image frames. Some methods explore leveraging additional information at training and test time to improve detections, e.g. stereo images [45], optical flow (using previous frames, e.g. MultiFtr+Motion [22] and ACF+SDt [42]), tracking [46], or data from other sensors. Note: with current techniques, methods based only on monocular colour images are already on par with methods enhanced by additional information.

Main approaches to improve pedestrian detection: Using additional data provides meaningful improvements. Exploiting context: the figure shows the performance improvement for methods incorporating context. For AFS+Geo, the evaluation metric changed from per-window (FPPW) to per-image (FPPI). Context provides consistent improvements for pedestrian detection, although the scale of improvement is lower compared to additional test data. Note: the gains from exploiting surrounding context are less pronounced than those from increasing the number of training samples or from deep architectures.
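
The FPPW-to-FPPI change matters because the two rates use different denominators, so their numbers are not comparable. A minimal worked sketch of both definitions; the counts in the example below (1000 images, 100,000 scanned windows per image) are hypothetical.

```python
def fppi(num_false_positives, num_images):
    """False positives per image: the per-image metric used on Caltech-USA."""
    return num_false_positives / num_images

def fppw(num_false_positives, num_windows):
    """False positives per window: the older per-window metric."""
    return num_false_positives / num_windows
```

With 100 false positives over 1000 images, FPPI is 0.1, while dividing by 1000 × 100,000 scanned windows gives an FPPW of 1e-6. A detector can look excellent per window yet mediocre per image.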

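Main approaches to improve pedestrian detection: Deformable parts. The DPM detector [19] was originally motivated for pedestrian detection. For pedestrian detection the results are competitive, but not salient (LatSvm [50,12], MultiResC [33], MT-DPM [39]). For pedestrian detection there is still no clear evidence for the necessity of components and parts, beyond the case of occlusion handling. Multi-scale models provide a simple and generic extension to existing detectors and improve performance by 1 to 2 MR percentage points; despite consistent improvements, their contribution to the final quality is rather minor. Notes: the HOG paper trained a single human-shape model. It works very well for upright frontal and rear views of people and was a major breakthrough over earlier work. HOG was also the best feature up to that point (recently surpassed by the CVPR 2013 paper "Histograms of Sparse Codes for Object Detection"). But what about side views? Naturally, one thinks of using multiple models (components): DPM uses 2 components, and the latest release on its homepage (Version 5) uses 12. Multiple components can handle viewpoint variation. However, the gains from deformable parts have been systematically surpassed by single-component detectors, so there is currently no clear evidence that this approach is necessary. Multi-resolution models: the resolution of the input before feature extraction has an effect on detection.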
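
A minimal sketch of how a deformable part model scores one root placement, assuming the root and part filter response maps are already computed. The score is the root score plus, for each part, the best response minus a quadratic deformation cost w1*dx + w2*dy + w3*dx² + w4*dy², plus a bias. Real DPM implementations compute the max efficiently with generalised distance transforms and evaluate parts at twice the root resolution; the brute-force search radius here is a hypothetical simplification.

```python
import numpy as np

def part_score(response, anchor, deform, radius=3):
    """Max over displacements (dy, dx) of response[anchor + d] - deformation cost."""
    w1, w2, w3, w4 = deform
    ay, ax = anchor
    best = -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < response.shape[0] and 0 <= x < response.shape[1]:
                cost = w1 * dx + w2 * dy + w3 * dx ** 2 + w4 * dy ** 2
                best = max(best, response[y, x] - cost)
    return best

def dpm_score(root_score, part_responses, anchors, deforms, bias=0.0):
    """Root filter score + deformation-penalised part scores + bias."""
    return root_score + bias + sum(
        part_score(r, a, d) for r, a, d in zip(part_responses, anchors, deforms)
    )
```

A part whose best response sits two pixels from its anchor pays the quadratic penalty for that displacement, which is how DPM trades appearance evidence against shape.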

Main approaches to improve pedestrian detection: Deep architectures (reference shown on the slide: ImageNet Classification with Deep CNN). ConvNet used a mix of unsupervised and supervised training to create a convolutional neural network trained on INRIA. Another line of work focuses on using deep architectures to jointly model parts and occlusions; these works use edge and colour features [40,34,28], or initialise network weights to edge-sensitive filters. Despite the common narrative, there is still no clear evidence that deep networks are good at learning features for pedestrian detection. Most successful methods use such architectures to model higher-level aspects of parts, occlusions, and context. Notes: with growing amounts of data and computing power, deep networks (especially CNNs) have become popular in computer vision, including pedestrian detection. ConvNet mixes supervised and unsupervised training to build a convolutional neural network, while other architectures (DBN, JointDeep, SDN) combine part models and occlusion handling inside a deep structure; they use edge and colour features, or initialise the network weights as edge detectors.
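
A minimal sketch of what "initialise network weights to edge-sensitive filters" can mean for a first convolutional layer, assuming Sobel kernels as the edge-sensitive initialisation (a stand-in, not the filters used by the cited works) and a single convolution + ReLU written in plain NumPy.

```python
import numpy as np

def edge_filter_bank():
    """Hypothetical initial weights: Sobel filters for vertical and horizontal edges."""
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    return np.stack([sobel_x, sobel_x.T])  # shape (2, 3, 3)

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, as computed by a convolutional layer."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def first_layer(image, filters):
    """Apply each initial filter followed by a ReLU: one response map per filter."""
    return np.stack([np.maximum(conv2d_valid(image, k), 0.0) for k in filters])
```

Training would then update these weights by backpropagation; the initialisation only sets a sensible starting point.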

Main approaches to improve pedestrian detection: Better features. This is the most popular approach (about 30% of the considered methods). A large set of feature types have been explored: edge information [1,26,58,41], colour information [26,22], texture information [17], local shape information [38], and covariance features [24], amongst others. More and more diverse features have been shown to systematically improve performance, and some papers have considered up to an order of magnitude more channels [16,58,24,30,38]. Despite the improvements from adding many channels, top performance is still reached with only 10 channels (6 gradient orientations, 1 gradient magnitude, and 3 LUV colour channels, known as HOG+LUV). The next scientific step will be to develop a more profound understanding of what makes good features good, and how to design even better ones. Notes: among efforts to improve pedestrian detection, the most common is adding to or diversifying the features of the input image. Many decision forest methods use 10 feature channels, and better feature representations improve performance.
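
A minimal sketch of the 10 HOG+LUV channels (3 LUV colour channels, 1 gradient magnitude, 6 orientation channels), assuming an sRGB input in [0, 1] and the D65 white point. Computing the gradient on the L channel only, and skipping the smoothing and cell aggregation that real implementations apply, are simplifying assumptions.

```python
import numpy as np

def rgb_to_luv(rgb):
    """sRGB in [0, 1], shape (H, W, 3) -> CIE L*u*v* with a D65 white point."""
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    X, Y, Z = np.moveaxis(lin @ M.T, -1, 0)
    un, vn = 0.1978, 0.4683                      # u', v' of the D65 white (approx.)
    denom = X + 15 * Y + 3 * Z + 1e-12
    up, vp = 4 * X / denom, 9 * Y / denom
    L = np.where(Y > (6 / 29) ** 3, 116 * np.cbrt(Y) - 16, (29 / 3) ** 3 * Y)
    return np.stack([L, 13 * L * (up - un), 13 * L * (vp - vn)], axis=-1)

def hog_luv_channels(rgb, n_orient=6):
    """Channels: L, U, V, gradient magnitude, and n_orient orientation channels."""
    luv = rgb_to_luv(rgb)
    gy, gx = np.gradient(luv[..., 0])            # simplification: gradient of L only
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)
    bins = np.minimum((theta / np.pi * n_orient).astype(int), n_orient - 1)
    orient = [np.where(bins == b, mag, 0.0) for b in range(n_orient)]
    return np.stack([luv[..., 0], luv[..., 1], luv[..., 2], mag, *orient])
```

Each pixel's gradient magnitude lands in exactly one orientation channel, so the six orientation channels sum to the magnitude channel.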

Experiment: From the preceding discussion, three aspects seem to be the most promising in terms of impact on detection quality: better features (§3.9), additional data (§3.4), and context information (§3.5). We thus conduct experiments on the complementarity of these aspects. We choose the Integral Channel Features framework [26] (a decision forest) for conducting our experiments: methods from this family have shown good performance, train in minutes to hours, and lend themselves to the analyses we aim for.

Experiment: Reviewing the effect of features. In this section, we evaluate the impact of increasing feature complexity. For example, we expand the 10 HOG+LUV channels into 40 channels by convolving each channel with three DCT (discrete cosine transform) basis functions (of 7 × 7 pixels), and storing the absolute value of the filter responses as additional feature channels. We name this variant SquaresChnFtrs+DCT. Setup: INRIA training set, Caltech-USA reasonable test set. Notes: since Viola & Jones (VJ), most performance improvements can be attributed to better features, such as gradient orientation and colour information. Even simple tweaks to these well-known features can produce significant improvements (e.g. adding DCT filtering to SquaresChnFtrs).
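
A minimal sketch of the 10 → 40 channel expansion described above: each channel is filtered with three 7 × 7 DCT-II basis functions and the absolute responses are appended. Which three basis functions the paper used is not stated here, so the choice of the lowest non-constant frequencies (0,1), (1,0), (1,1) is an assumption, as are the zero padding and 'same' output size.

```python
import numpy as np

def dct_basis(n=7):
    """All n x n 2-D DCT-II basis functions, indexed [k, l, y, x]."""
    i = np.arange(n)
    c = np.cos(np.pi * (2 * i[None, :] + 1) * i[:, None] / (2 * n))  # c[k, y]
    return np.einsum('ky,lx->klyx', c, c)

def filter_same(channel, kernel):
    """'Same'-size 2-D cross-correlation with zero padding."""
    k = kernel.shape[0]
    padded = np.pad(channel, k // 2)
    h, w = channel.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def add_dct_channels(channels, basis_ids=((0, 1), (1, 0), (1, 1)), n=7):
    """Append |channel filtered by each chosen basis|: C channels -> C * (1 + len(basis_ids))."""
    B = dct_basis(n)
    extra = [np.abs(filter_same(ch, B[k, l])) for ch in channels for (k, l) in basis_ids]
    return np.concatenate([channels, np.stack(extra)])
```

Because every non-constant DCT basis sums to zero, these extra channels respond to local structure and vanish on flat regions.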

Experiment: Complementarity of approaches. We consider the complementarity of better features (HOG+LUV+DCT), additional data (via optical flow), and context (via person-to-person interactions). We encode the optical flow using the same SDt features as ACF+SDt, and inject the context information using the +2Ped re-weighting strategy. The combination SquaresChnFtrs+DCT+SDt+2Ped is called Katamari-v1. The results show that adding extra features, flow, and context information is largely complementary (a 12% gain, compared with 3 + 7 + 5 = 15% if the individual gains added up fully), even when starting from a strong detector. Note: as shown in Figure 7, Katamari-v1 achieves the best result on Caltech; Figure 7 also shows the best results obtained by other methods.

Experiment: How much model capacity is needed? We consider a necessary condition for high-quality detection: does the learned model perform well on the training set? (Figure: Caltech-USA training set performance.) None of these methods performs perfectly on the training set, and we do not yet observe symptoms of over-fitting. Our results indicate that research on increasing the discriminative power of detectors is likely to further improve detection quality. More discriminative power can originate from more and better features or from more complex classifiers.

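Experiment: Generalisation across datasets. For real-world applications beyond a specific benchmark, the generalisation capability of a model is key. The table shows the performance of SquaresChnFtrs over Caltech-USA when using different training sets (MR for INRIA/Caltech/ETH, AUC for KITTI). While detectors learned on one dataset may not necessarily transfer well to others, their ranking is stable across datasets, suggesting that insights can be learned from well-performing methods regardless of the benchmark. Notes: the generalisation error is a function describing the gap between a student machine, after it has learned from sample data, and the teacher machine. The name reflects that this function measures a machine's inference ability, i.e. how well rules derived from sample data apply to new data. Models are trained on different training sets (INRIA, Caltech, KITTI) and tested on different test sets (INRIA, Caltech, KITTI, ETH). The model trained on INRIA performs well on every test set: first place twice (INRIA, ETH) and second place twice (Caltech, KITTI). As long as a method is good, its performance across different benchmarks is stable.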
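
The slide states that rankings are stable across datasets. One way to quantify that (an assumption, not the paper's procedure) is Spearman's rank correlation between the scores of the same methods on two benchmarks. The sketch below assumes no ties and lower-is-better scores such as MR; negate AUC values (higher is better) before comparing them with MR.

```python
import numpy as np

def ranks(scores):
    """Rank positions (0 = best) for lower-is-better scores; assumes no ties."""
    order = np.argsort(scores)
    r = np.empty(len(scores), dtype=int)
    r[order] = np.arange(len(scores))
    return r

def spearman_rho(a, b):
    """Spearman rank correlation between two score lists over the same methods."""
    ra, rb = ranks(np.asarray(a)), ranks(np.asarray(b))
    n = len(ra)
    d = ra - rb
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

A value near 1 means the methods keep their relative order on both benchmarks even if their absolute scores shift.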

Conclusion: Much of the progress in pedestrian detection can be attributed to the improvement in features alone. Our experiment combining the detector ingredients that our retrospective analysis found to work well (better features, optical flow, and context) shows that these ingredients are mostly complementary. The main challenge ahead seems to be developing a deeper understanding of what makes good features good, so as to enable the design of even better ones. Note: moreover, most of these features were obtained manually through repeated experimentation (hand-crafted with trial and error).