Shape (Structure) from X addresses the problem of recovering 2.5D surface shape (scene depth) from 2D images: shape from motion, shape from stereo, shape from monocular cues (shading, vanishing point, defocus, texture, …)
Chapter 7: Scene Reconstruction from Visual Motion
3D Motion Estimation: estimating an object's 3D motion parameters and 3D structure from a 2D image sequence. SFM (Structure From Motion)
3D Rigid-Body Motion
Small-Angle Rotation: the small-angle rotation matrix
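For reference, the small-angle rotation matrix referred to above can be written out explicitly (the standard first-order approximation; the angle symbols below are the usual notation and an assumption on my part):

```latex
R \approx
\begin{bmatrix}
1 & -\theta_z & \theta_y \\
\theta_z & 1 & -\theta_x \\
-\theta_y & \theta_x & 1
\end{bmatrix}
= I + [\boldsymbol{\theta}]_\times ,
\qquad \theta_x,\theta_y,\theta_z \ll 1
```

Because the matrix is linear in the three angles, substituting it into the motion equations yields linear constraints on the unknowns.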
1. 3D Motion Estimation under Orthographic Projection: with the small-angle rotation matrix there are 6 unknowns, so 3 point correspondences suffice
3D Motion Estimation under Orthographic Projection (Aizawa, 1989): 1. estimate the motion parameters from the point correspondences and the current depth estimates; 2. re-estimate the depths from the motion parameters and the correspondences; alternate until the solution stabilizes
3D Motion Estimation under Orthographic Projection (Bozdagi, 1994): use random perturbations of the depth estimates to escape local optima. 1. estimate the motion parameters from the correspondences and the depth estimates; 2. estimate the correspondence coordinates from the motion parameters and the depth estimates; 3. compute the estimation error
3D Motion Estimation under Orthographic Projection: 4. randomly perturb the depth estimates; 5. repeat the steps above. Experiments show that this improved iterative algorithm converges to the correct motion parameters even when the initial depth values are off by 50%.
2. 3D Motion Estimation under Perspective Projection: normalize the focal length to F = 1 and divide numerator and denominator by Z_k
3. Epipolar-Line-Based 3D Motion Estimation: geometric meaning of the epipolar equation
Epipolar-Line-Based 3D Motion Estimation: the epipolar equation for 3D rigid-body motion. Introduce a skew-symmetric matrix:
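The skew-symmetric matrix mentioned above replaces the cross product with the translation vector T, which leads directly to the epipolar (essential-matrix) equation; this is the standard construction:

```latex
[\mathbf{T}]_\times =
\begin{bmatrix}
0 & -T_z & T_y \\
T_z & 0 & -T_x \\
-T_y & T_x & 0
\end{bmatrix},
\qquad
\mathbf{x}'^{\top} E\, \mathbf{x} = 0,
\quad E = [\mathbf{T}]_\times R
```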
Epipolar-Line-Based 3D Motion Estimation: the essential matrix. Multiplying the translation vector by any nonzero factor does not affect the epipolar equation, so the recovered motion parameters are determined only up to a scale factor.
Applications of the essential matrix: it can be used to simplify the matching problem and to detect false matches
Epipolar-Line-Based 3D Motion Estimation: the epipolar equation
Epipolar-Line-Based 3D Motion Estimation: properties of the essential matrix. The epipolar equation has 5 unknown independent parameters, which matches the number of motion degrees of freedom: three rotational and two translational (or three translational degrees of freedom up to a scale factor).
(1) Motion Estimation from the Essential Matrix. 1. Compute the essential matrix: use 8 or more point correspondences for a stable solution (in practice the RANSAC algorithm is commonly used)
(1) Motion Estimation from the Essential Matrix. 1. Compute the essential matrix. In reality, instead of solving Ae = 0 exactly, we seek the e that minimizes ||Ae||^2 subject to ||e|| = 1: the least eigenvector of A^T A.
8-point algorithm. To enforce that E is of rank 2, E is replaced by the E' that minimizes ||E - E'||_F subject to det(E') = 0. This is achieved by SVD: let E = U diag(s1, s2, s3) V^T with s1 >= s2 >= s3; then E' = U diag(s1, s2, 0) V^T is the solution.
8-point algorithm
% Build the constraint matrix
A = [x2(1,:)'.*x1(1,:)'  x2(1,:)'.*x1(2,:)'  x2(1,:)' ...
     x2(2,:)'.*x1(1,:)'  x2(2,:)'.*x1(2,:)'  x2(2,:)' ...
     x1(1,:)'            x1(2,:)'            ones(npts,1)];
[U,D,V] = svd(A);
% Extract the matrix from the column of V
% corresponding to the smallest singular value.
E = reshape(V(:,9),3,3)';
% Enforce rank-2 constraint
[U,D,V] = svd(E);
E = U*diag([D(1,1) D(2,2) 0])*V';
Problem with 8-point algorithm: a row of the data matrix looks like [~10000 ~10000 ~100 ~10000 ~10000 ~100 ~100 ~100 1]. With such orders-of-magnitude differences between the columns of the data matrix, least squares yields poor results!
Normalized 8-point algorithm: normalized least squares yields good results. Transform the image coordinates to roughly [-1,1] x [-1,1], e.g. the corners (0,0), (700,0), (0,500), (700,500) map to (-1,-1), (1,-1), (-1,1), (1,1).
Normalized 8-point algorithm
[x1, T1] = normalise2dpts(x1);
[x2, T2] = normalise2dpts(x2);
A = [x2(1,:)'.*x1(1,:)'  x2(1,:)'.*x1(2,:)'  x2(1,:)' ...
     x2(2,:)'.*x1(1,:)'  x2(2,:)'.*x1(2,:)'  x2(2,:)' ...
     x1(1,:)'            x1(2,:)'            ones(npts,1)];
[U,D,V] = svd(A);
E = reshape(V(:,9),3,3)';
[U,D,V] = svd(E);
E = U*diag([D(1,1) D(2,2) 0])*V';
% Denormalise
E = T2'*E*T1;
Normalization
function [newpts, T] = normalise2dpts(pts)
c = mean(pts(1:2,:)')';            % Centroid
newp(1,:) = pts(1,:)-c(1);         % Shift origin to centroid.
newp(2,:) = pts(2,:)-c(2);
meandist = mean(sqrt(newp(1,:).^2 + newp(2,:).^2));
scale = sqrt(2)/meandist;
T = [scale   0      -scale*c(1)
     0       scale  -scale*c(2)
     0       0       1          ];
newpts = T*pts;
RANSAC
repeat
    select a minimal sample (8 matches)
    compute solution(s) for E
    determine the inliers
until the confidence based on (#inliers, #samples) reaches 95%, or too many iterations
compute E based on all inliers
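The loop above can be sketched in code. This is a minimal illustration, not the course's implementation: it assumes noise-free normalized image coordinates stored as 3xN homogeneous arrays x1, x2, scores inliers by the plain algebraic epipolar residual x2^T E x1, and uses a fixed iteration count instead of the 95% confidence test.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate E from >= 8 normalized correspondences (3xN homogeneous)."""
    A = np.column_stack([
        x2[0]*x1[0], x2[0]*x1[1], x2[0],
        x2[1]*x1[0], x2[1]*x1[1], x2[1],
        x1[0], x1[1], np.ones(x1.shape[1])])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)           # least right-singular vector of A
    U, S, Vt = np.linalg.svd(E)        # enforce the rank-2 constraint
    return U @ np.diag([S[0], S[1], 0]) @ Vt

def ransac_E(x1, x2, iters=200, thresh=1e-3, seed=0):
    """RANSAC loop: sample 8 matches, fit E, keep the largest inlier set."""
    rng = np.random.default_rng(seed)
    n = x1.shape[1]
    best_inl = np.zeros(n, bool)
    for _ in range(iters):
        idx = rng.choice(n, 8, replace=False)
        E = eight_point(x1[:, idx], x2[:, idx])
        resid = np.abs(np.sum(x2 * (E @ x1), axis=0))  # |x2^T E x1| per match
        inl = resid < thresh
        if inl.sum() > best_inl.sum():
            best_inl = inl
    # refit E on all inliers of the best sample
    return eight_point(x1[:, best_inl], x2[:, best_inl]), best_inl
```

In a real pipeline the residual would be a geometric distance (e.g. Sampson distance) and the iteration count adapted from the inlier ratio.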
Motion Estimation from the Essential Matrix. 2. Estimate the motion parameters. T: from the properties of the essential matrix. R: from the relation E = [T]x R
(2) Motion Estimation Directly from the Epipolar Equation. Ideally the epipolar equation holds exactly for every correspondence; because of noise, we instead minimize the sum of squared epipolar residuals over all correspondences.
Structure from motion
Structure from motion: automatic recovery of camera motion and scene structure from two or more images, taken from unknown camera viewpoints. It is a self-calibration technique, also called automatic camera tracking or matchmoving.
Coordinate Transformation: the model-view transformation maps the World Coordinate System to the Camera Coordinate System
Camera Projection Matrix: from the world coordinate system to the camera coordinate system. Camera parameters: intrinsic and extrinsic; together they form the camera projection matrix.
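In the usual pinhole notation (the intrinsic/extrinsic factorization assumed here is the standard one), the projection reads:

```latex
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \underbrace{K}_{\text{intrinsic}}\;
  \underbrace{[\,R \mid t\,]}_{\text{extrinsic}}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```

Here [R | t] moves the point into camera coordinates and K maps camera coordinates to pixels.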
Take one image of a given scene point
Take two images of the same scene point with the same camera settings
Take three images of the same scene point with the same camera settings
(figure: Image 1, Image 2, Image 3 with camera poses R1,t1; R2,t2; R3,t3)
Same camera, same settings = same intrinsic parameters across Image 1, Image 2 and Image 3 (each observing Point 1, Point 2, Point 3)
Triangulation (figure: Image 1, Image 2, Image 3 with camera poses R1,t1; R2,t2; R3,t3)
Camera intrinsic parameter matrix: principal point offset; skew (especially when images are cropped, e.g. Internet images); radial distortion (due to the optics of the lens)
Steps
Images → Points: Structure from Motion
Points → More points: Multiple View Stereo
Points → Meshes: Model Fitting
Meshes → Models: Texture Mapping
Images → Models: Image-based Modeling
Pipeline Structure from Motion (SFM) Multi-view Stereo (MVS)
Two-view Reconstruction
Two-view Reconstruction: keypoints → match → fundamental matrix → essential → [R|t] → triangulation
Keypoints Detection
Descriptor for each point: SIFT descriptor
Same for the other images: SIFT descriptors
Point match for correspondences
Fundamental Matrix (figure: Image 1 with pose R1,t1; Image 2 with pose R2,t2)
Estimating Fundamental Matrix. Given a correspondence x ↔ x', the basic incidence relation is x'^T F x = 0. Since F is 3x3 and defined up to scale, 8 point correspondences are needed.
Estimating Fundamental Matrix: for 8 point correspondences, stack one linear constraint per correspondence and solve by the Direct Linear Transformation (DLT)
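Concretely, each correspondence (x, y) ↔ (x', y') contributes one row to the DLT system; this follows by expanding x'^T F x = 0:

```latex
\begin{bmatrix}
x'x & x'y & x' & y'x & y'y & y' & x & y & 1
\end{bmatrix}\,\mathbf{f} = 0,
\qquad
\mathbf{f} = (F_{11},F_{12},F_{13},F_{21},F_{22},F_{23},F_{31},F_{32},F_{33})^{\top}
```

Eight such rows determine f up to scale; with more rows, f is the least-squares null vector.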
RANSAC to Estimate Fundamental Matrix
For many iterations:
    pick 8 point correspondences
    compute a solution for F using these 8 points
    count the number of inliers
Pick the F with the largest number of inliers
Fundamental Matrix → Essential Matrix: with known intrinsics K (same camera settings in both views), E = K^T F K (figure: Image 1 with pose R1,t1; Image 2 with pose R2,t2)
Essential Matrix. For a given essential matrix E = U diag(1,1,0) V^T and the first camera matrix P = [I | 0], there are four possible choices for the second camera matrix P':
P' = [U W V^T | +u3],  [U W V^T | -u3],  [U W^T V^T | +u3],  [U W^T V^T | -u3]
where u3 is the third column of U and W = [0 -1 0; 1 0 0; 0 0 1].
Four Possible Solutions
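The four solutions above come from the standard SVD-based decomposition. Below is an illustrative helper, not code from the course; in practice the correct pair is then selected by the cheirality test (the reconstructed points must lie in front of both cameras).

```python
import numpy as np

def decompose_E(E):
    """Return the four candidate (R, t) pairs for an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # Ensure proper rotations (determinant +1)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
    t = U[:, 2]                       # translation direction, up to sign
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]
```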
Triangulation (figure: Image 1 with pose R1,t1; Image 2 with pose R2,t2)
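The triangulation step can be sketched with the standard linear (DLT) method. This is an illustrative helper, assuming known 3x4 projection matrices for the two views:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: (u, v) image observations."""
    # Each view contributes two rows: u * P[2] - P[0] = 0, v * P[2] - P[1] = 0
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                       # homogeneous 4-vector in the null space
    return X[:3] / X[3]              # inhomogeneous 3D point
```

With noisy observations this minimizes an algebraic error; a nonlinear refinement of the reprojection error usually follows.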
Two-view Reconstruction (recap): keypoints → match → fundamental matrix → essential → [R|t] → triangulation
Pipeline Structure from Motion (SFM) Multi-view Stereo (MVS)
Pipeline: Structure from Motion covered so far; Multi-view Stereo is next
Merge Two Point Clouds
There can be only one
Oops! Seen from a different angle, the merged result does not align
Bundle Adjustment
(figure: Point 1, Point 2, Point 3 observed in Image 1, Image 2, Image 3)
Bundle Adjustment: a valid solution for the cameras and points must make the re-projection of each point close to its observation, i.e. minimize the reprojection error
    min over camera parameters C_i and 3D points X_j of  sum_{i,j} || x_ij - P(C_i, X_j) ||^2
where x_ij is the observation of point j in image i and P(C_i, X_j) is its re-projection.
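The objective can be written as a small function. This sketches only the reprojection-error computation (the actual minimization over all cameras and points is typically done with Levenberg-Marquardt); the data layout here is an assumption for illustration:

```python
import numpy as np

def reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors.
    cameras: list of (K, R, t) per image; points: (N, 3) array of 3D points;
    observations: dict mapping (camera_i, point_j) -> observed (u, v)."""
    err = 0.0
    for (i, j), (u, v) in observations.items():
        K, R, t = cameras[i]
        h = K @ (R @ points[j] + t)          # project point j into image i
        err += (h[0] / h[2] - u) ** 2 + (h[1] / h[2] - v) ** 2
    return err
```

Bundle adjustment searches for the cameras and points that drive this value to a minimum; the sparsity of the objective (each observation touches one camera and one point) is what makes large problems tractable.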
Initialization Matters
Input: observed 2D image positions
Output: unknown camera parameters (with some initial guess); unknown 3D point coordinates (with some initial guess)
Descriptor: ZNCC (Zero-mean Normalized Cross-Correlation)
Invariant to linear radiometric changes
More conservative than others such as sum of absolute or squared differences in uniform regions
More tolerant in textured areas where noise might be important
http://www.lasmea.univ-bpclermont.fr/Personnel/Maxime.Lhuillier/Papers/Eccv02.pdf
http://www.lasmea.univ-bpclermont.fr/Personnel/Maxime.Lhuillier/Papers/Icpr00.pdf
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1641048
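The invariance claim is easy to verify in code. A minimal ZNCC sketch (not the implementation from the linked papers):

```python
import numpy as np

def zncc(patch1, patch2):
    """Zero-mean normalized cross-correlation of two equal-size patches.
    Result lies in [-1, 1]; 1 means a perfect match up to gain and offset."""
    a = patch1 - patch1.mean()
    b = patch2 - patch2.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0
```

Subtracting the mean removes an additive offset and the normalization removes a multiplicative gain, which is exactly the "linear radiometric change" invariance.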
Seed for propagation
Matching Propagation (propagate.m)
Maintain a priority queue Q
Initialize: put all seeds into Q with their ZNCC values as scores
For each iteration:
    pop the match with the best ZNCC score from Q
    add new potential matches in its immediate spatial neighborhood into Q
Safety: enforce uniqueness, and propagate only over the matchable area
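The best-first loop above can be sketched with a heap. This is an illustrative skeleton, not propagate.m itself: zncc_at is a stand-in scoring helper, and the neighborhood search is simplified to a small window around each new match.

```python
import heapq

def propagate(seeds, zncc_at, max_disp=1, thresh=0.5):
    """Best-first match propagation.
    seeds: list of (score, (x1, y1), (x2, y2)) initial matches.
    zncc_at(p, q): returns the ZNCC score of matching pixel p to pixel q."""
    heap = [(-s, p, q) for s, p, q in seeds]   # max-heap via negated scores
    heapq.heapify(heap)
    matched1, matched2, matches = set(), set(), []
    while heap:
        neg_s, p, q = heapq.heappop(heap)
        if p in matched1 or q in matched2:     # uniqueness constraint
            continue
        matched1.add(p); matched2.add(q)
        matches.append((p, q, -neg_s))
        # Propagate into the immediate spatial neighborhood of (p, q)
        for dx in range(-1, 2):
            for dy in range(-1, 2):
                for du in range(-max_disp, max_disp + 1):
                    for dv in range(-max_disp, max_disp + 1):
                        cp = (p[0] + dx, p[1] + dy)
                        cq = (q[0] + dx + du, q[1] + dy + dv)
                        if cp in matched1 or cq in matched2:
                            continue
                        s = zncc_at(cp, cq)
                        if s > thresh:         # matchable-area / score gate
                            heapq.heappush(heap, (-s, cp, cq))
    return matches
```

In the real algorithm zncc_at would evaluate ZNCC on image patches and return a low score outside the matchable area, which is what stops the propagation.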
Matchable Area: the area where the maximal gradient exceeds a threshold
Result (denseMath/run.m)
Triangulation (figure: Image 1, Image 2, Image 3 with camera poses R1,t1; R2,t2; R3,t3)
Final Result
Colorize the Point Cloud