2019, 1(5): 511-524 Published Date:2019-10-20

DOI: 10.1016/j.vrih.2019.08.002

Real-time human segmentation by BowtieNet and a SLAM-based human AR system

Abstract:

Background
It is generally difficult to obtain accurate pose and depth for a non-rigid moving object from a single RGB camera for augmented reality (AR). In this study, we build an AR system for a non-rigid moving human from a single RGB camera by accurately computing pose and depth. The two key tasks are segmentation and monocular simultaneous localization and mapping (SLAM). Most existing monocular SLAM systems are designed for static scenes, whereas in this AR system the human body is always moving and non-rigid.
Methods
To make the SLAM system suitable for a moving human, we first segment the rigid part of the human in each frame. A segmented moving body part can be regarded as a static object, and the relative motion between each moving body part and the camera can be treated as camera motion. Typical SLAM systems designed for static scenes can then be applied. In the segmentation step of this AR system, we first employ the proposed BowtieNet, which inserts the atrous spatial pyramid pooling (ASPP) module of DeepLab between the encoder and decoder of SegNet, to segment the human in the original frame. We then use color information to extract the face from the segmented human area.
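The claim that a moving rigid body part can be treated as static can be checked numerically. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it uses planar (2D) rigid transforms instead of full 3D poses, and every name and pose value in it (`se2`, `camera_in_part_frame`, `motion`, and so on) is hypothetical. It shows that the camera pose expressed in the part's frame is the same whether the part moves in front of a static camera or the camera moves by the inverse motion around a static part.

```python
import math

def se2(theta, tx, ty):
    """Build a 3x3 homogeneous matrix for a planar rigid motion."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse(t):
    """Invert a rigid transform: rotation R^T, translation -R^T t."""
    c, s = t[0][0], t[1][0]
    tx, ty = t[0][2], t[1][2]
    return [[c, s, -(c * tx + s * ty)],
            [-s, c, -(-s * tx + c * ty)],
            [0.0, 0.0, 1.0]]

def camera_in_part_frame(world_T_part, world_T_cam):
    """Pose of the camera expressed in the body part's own frame."""
    return matmul(inverse(world_T_part), world_T_cam)

def close(a, b, tol=1e-9):
    """Element-wise comparison of two 3x3 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(3) for j in range(3))

# Hypothetical poses, chosen only for illustration.
world_T_cam = se2(0.1, 2.0, 0.5)
world_T_part = se2(-0.3, 1.0, 1.0)
motion = se2(0.4, 0.2, -0.1)

# Scenario A: the body part moves by `motion`, the camera is static.
moving_part = camera_in_part_frame(matmul(motion, world_T_part), world_T_cam)

# Scenario B: the body part is static, the camera moves by the inverse motion.
moving_cam = camera_in_part_frame(world_T_part,
                                  matmul(inverse(motion), world_T_cam))

# Both scenarios give the same camera pose relative to the part.
same_relative_pose = close(moving_part, moving_cam)
```

Because only the relative pose is observable from the segmented part, a SLAM pipeline that assumes a static scene can, under this assumption, track the part as if the camera alone were moving.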
Results
Based on the human segmentation results and a monocular SLAM system, the proposed system can change the video background and add a virtual object to the human.
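Changing the background from a segmentation result amounts to per-pixel compositing. The following is a minimal sketch, assuming a hard binary mask (1 for human, 0 for background) and images stored as nested lists of RGB tuples. The function name `replace_background` and the sample pixel values are hypothetical, and the abstract does not specify how the paper blends mask boundaries.

```python
def replace_background(frame, mask, background):
    """Keep frame pixels where mask is 1 (human); use background elsewhere."""
    return [
        [f if m else b for f, m, b in zip(frow, mrow, brow)]
        for frow, mrow, brow in zip(frame, mask, background)
    ]

# Hypothetical 2x2 frame: left column is the human, right column is the scene.
frame = [[(200, 150, 120), (10, 10, 10)],
         [(190, 140, 110), (12, 12, 12)]]
mask = [[1, 0],
        [1, 0]]
# New background: solid blue.
background = [[(0, 0, 255)] * 2 for _ in range(2)]

composited = replace_background(frame, mask, background)
```

A soft mask with values between 0 and 1 would instead blend the two images per channel, which tends to hide segmentation errors along the silhouette.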
Conclusions
Experiments on human image segmentation datasets show that BowtieNet achieves state-of-the-art human segmentation performance and runs fast enough for real-time segmentation. Experiments on videos show that the proposed AR system can robustly add a virtual object to a human and accurately change the video background.
Keywords: Augmented reality; Moving object; Reconstruction and tracking; Camera pose; Human segmentation

Cite this article:

Xiaomei ZHAO, Fulin TANG, Yihong WU. Real-time human segmentation by BowtieNet and a SLAM-based human AR system. Virtual Reality & Intelligent Hardware, 2019, 1(5): 511-524 DOI:10.1016/j.vrih.2019.08.002

