
2019, 1(5): 500-510. Published Date: 2019-10-20

DOI: 10.1016/j.vrih.2019.09.004

Bags of tricks for learning depth and camera motion from monocular videos


Abstract:

Background
Building on the seminal work of Zhou et al., much of the recent progress in learning monocular visual odometry, i.e., estimating depth and camera motion from monocular videos, can be attributed to tricks in the training procedure, such as data augmentation and the choice of learning objectives.
Methods
Herein, we categorize a collection of such tricks and examine, both theoretically and empirically, their effects on the final accuracy of visual odometry.
Results/Conclusions
By combining these tricks, we significantly improved a baseline model adapted from SfMLearner without additional inference cost. We also analyzed the principles behind the tricks and the reasons for their success, and we present practical guidelines for future research.
Keywords: Unsupervised learning; Monocular visual odometry

Cite this article:

Bowen DONG, Lu SHENG. Bags of tricks for learning depth and camera motion from monocular videos. Virtual Reality & Intelligent Hardware, 2019, 1(5): 500-510 DOI:10.1016/j.vrih.2019.09.004

1. Zhou T H, Brown M, Snavely N, Lowe D G. Unsupervised learning of depth and ego-motion from video. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, IEEE, 2017 DOI:10.1109/cvpr.2017.700

2. Wang C Y, Buenaposada J M, Zhu R, Lucey S. Learning depth from monocular videos using direct methods. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, IEEE, 2018 DOI:10.1109/cvpr.2018.00216

3. Tang C, Tan P. BA-Net: Dense Bundle Adjustment Networks. In: International Conference on Learning Representations (ICLR). 2019

4. Geiger A, Lenz P, Stiller C, Urtasun R. Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 2013, 32(11): 1231–1237 DOI:10.1177/0278364913491297

5. Eigen D, Puhrsch C, Fergus R. Depth Map Prediction from a Single Image using a Multi-scale Deep Network. Advances in Neural Information Processing Systems (NIPS), 2014, 2366–2374

6. Wang Z, Bovik A C, Sheikh H R, Simoncelli E P. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004, 13(4): 600–612 DOI:10.1109/tip.2003.819861

7. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014

8. Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution//Computer Vision–ECCV 2016. Cham: Springer International Publishing, 2016, 694–711 DOI:10.1007/978-3-319-46475-6_43

9. Casser V, Pirk S, Mahjourian R, Angelova A. Depth prediction without the sensors: leveraging structure for unsupervised learning from monocular videos. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33, 8001–8008 DOI:10.1609/aaai.v33i01.33018001

10. Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, IEEE, 2017 DOI:10.1109/cvpr.2017.699

11. Yin Z C, Shi J P. GeoNet: unsupervised learning of dense depth, optical flow and camera pose. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, IEEE, 2018 DOI:10.1109/cvpr.2018.00212

12. Meister S, Hur J, Roth S. UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss. In: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018

13. Cao Z, Kar A, Hane C, Malik J. Learning Independent Object Motion from Unlabelled Stereoscopic Videos. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, 5594–5603

14. Lv Z, Kim K, Troccoli A, Sun D Q, Rehg J M, Kautz J. Learning rigidity in dynamic scenes with a moving camera for 3D motion field estimation//Computer Vision–ECCV 2018. Cham: Springer International Publishing, 2018, 484–501 DOI:10.1007/978-3-030-01228-1_29

15. Yang Z H, Wang P, Wang Y, Xu W, Nevatia R. Every pixel counts: unsupervised geometry learning with holistic 3D motion understanding//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019, 691–709 DOI:10.1007/978-3-030-11021-5_43

16. Ranjan A, Jampani V, Balles L, Kim K, Sun D, Wulff J, Black M J. Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, 12240–12249

17. Zou Y L, Luo Z L, Huang J B. DF-net: unsupervised joint learning of depth and flow using cross-task consistency//Computer Vision–ECCV 2018. Cham: Springer International Publishing, 2018, 38–55 DOI:10.1007/978-3-030-01228-1_3

18. Sun D Q, Yang X D, Liu M Y, Kautz J. PWC-net: CNNs for optical flow using pyramid, warping, and cost volume. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, IEEE, 2018 DOI:10.1109/cvpr.2018.00931

19. Liu F Y, Shen C H, Lin G S, Reid I. Learning Depth from Single Monocular Images using Deep Convolutional Neural Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2016, 38(10), 2024–2039

20. Garg R, Vijay K B G, Carneiro G, Reid I. Unsupervised CNN for single view depth estimation: geometry to the rescue//Computer Vision–ECCV 2016. Cham: Springer International Publishing, 2016, 740–756 DOI:10.1007/978-3-319-46484-8_45

21. Luo C, Yang Z, Wang P, Wang Y, Xu W, Nevatia R, Yuille A L. Every Pixel Counts++: Joint Learning of Geometry and Motion with 3D Holistic Understanding. arXiv preprint arXiv:1810.06125, 2018
