
Virtual Reality & Intelligent Hardware

Current Issue

2019 Vol. 1 No. 5

Editorial

3D vision for virtual reality and augmented reality

DOI:10.3724/SP.J.2096-5796.2019.00002

2019, 1(5) : 1-2

Review

Flow-based SLAM: From geometry computation to learning

DOI:10.1016/j.vrih.2019.09.001

2019, 1(5) : 435-460

Simultaneous localization and mapping (SLAM) has attracted considerable research interest from the robotics and computer-vision communities for more than 30 years. With steady and progressive efforts being made, modern SLAM systems allow robust and online applications in real-world scenes. We examined the evolution of this powerful perception tool in detail and noticed that insights concerning incremental computation and temporal guidance are persistently retained. Herein, we denote this temporal continuity as a flow basis and present, for the first time, a survey that specifically focuses on this flow-based nature, ranging from geometric computation to the emerging learning techniques. We start by reviewing the two essential stages of geometric computation, presenting the de facto standard pipeline and problem formulation, along with the use of temporal cues. The recently emerging techniques are then summarized, covering a wide range of areas such as learning techniques, sensor fusion, and continuous-time trajectory modeling. This survey aims to draw attention to how robust SLAM systems benefit from their continuously observing nature, and to highlight topics worthy of further investigation for better use of temporal cues.

Collaborative visual SLAM for multiple agents: A brief survey

DOI:10.1016/j.vrih.2019.09.002

2019, 1(5) : 461-482

This article presents a brief survey of visual simultaneous localization and mapping (SLAM) systems applied to multiple independently moving agents, such as a team of ground or aerial vehicles or a group of users holding augmented or virtual reality devices. Such a visual SLAM system, named collaborative visual SLAM, differs from typical visual SLAM deployed on a single agent in that information is exchanged or shared among different agents to achieve better robustness, efficiency, and accuracy. We review the representative works on this topic proposed in the past ten years and describe the key components involved in designing such a system, including collaborative pose estimation and mapping tasks, as well as the emerging topic of decentralized architecture. We believe this brief survey will be helpful to those working on this topic or developing multi-agent applications, particularly micro-aerial vehicle swarms or collaborative augmented/virtual reality.

Survey of 3D modeling using depth cameras

DOI:10.1016/j.vrih.2019.09.003

2019, 1(5) : 483-499

Three-dimensional (3D) modeling is an important topic in computer graphics and computer vision. In recent years, the introduction of consumer-grade depth cameras has resulted in profound advances in 3D modeling. Starting with the basic data structures, this survey reviews the latest developments in 3D modeling based on depth cameras, including research on camera tracking, 3D object and scene reconstruction, and high-quality texture reconstruction. We also discuss future work and possible solutions for 3D modeling based on depth cameras.

Article

Bags of tricks for learning depth and camera motion from monocular videos

DOI:10.1016/j.vrih.2019.09.004

2019, 1(5) : 500-510

Background
Based on the seminal work proposed by Zhou et al., much of the recent progress in learning monocular visual odometry, i.e., depth and camera motion from monocular videos, can be attributed to tricks in the training procedure, such as data augmentation and the choice of learning objectives.
Methods
Herein, we categorize a collection of such tricks through theoretical examination and empirical evaluation of their effects on the final accuracy of visual odometry.
Results/Conclusions
By combining the aforementioned tricks, we were able to significantly improve a baseline model adapted from SfMLearner without additional inference costs. Furthermore, we analyzed the principles behind these tricks and the reasons for their success. Practical guidelines for future research are also presented.
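
Editor's note: the baseline these tricks build on follows the SfMLearner recipe of supervising depth and pose networks with a view-synthesis objective. The sketch below is a minimal, illustrative PyTorch rendition of such an objective, not the authors' code; the function names, the plain L1 photometric term, and the omission of refinements such as SSIM, multi-scale losses, and auto-masking are assumptions made here for brevity.

```python
# Minimal sketch (assumption, not the article's code) of an SfMLearner-style
# view-synthesis objective for self-supervised depth and ego-motion learning.
import torch
import torch.nn.functional as F

def inverse_warp(src_img, depth, pose, K):
    """Warp a source frame into the target view.

    src_img: (B, 3, H, W) source frame
    depth:   (B, 1, H, W) predicted depth of the *target* frame
    pose:    (B, 3, 4)    relative pose mapping target coords to source coords, [R | t]
    K:       (B, 3, 3)    camera intrinsics
    """
    B, _, H, W = depth.shape
    device = depth.device

    # Pixel grid in homogeneous coordinates: (B, 3, H*W)
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project target pixels to 3D points using the predicted depth
    cam_points = torch.inverse(K) @ pix * depth.view(B, 1, -1)

    # Transform the points into the source camera and project with intrinsics
    cam_points_h = torch.cat(
        [cam_points, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_pix = K @ (pose @ cam_points_h)               # (B, 3, H*W)
    src_pix = src_pix[:, :2] / (src_pix[:, 2:3] + 1e-7)

    # Normalize coordinates to [-1, 1] for grid_sample
    grid = src_pix.view(B, 2, H, W).permute(0, 2, 3, 1).clone()
    grid[..., 0] = 2.0 * grid[..., 0] / (W - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (H - 1) - 1.0
    return F.grid_sample(src_img, grid, padding_mode="border",
                         align_corners=True)

def photometric_loss(tgt_img, src_img, depth, pose, K):
    """Plain L1 view-synthesis loss; real systems typically add SSIM,
    multi-scale terms, smoothness regularization, and masking on top."""
    warped = inverse_warp(src_img, depth, pose, K)
    return (tgt_img - warped).abs().mean()
```

The training tricks discussed in the article would then be layered on top of a loss of roughly this form, for example by augmenting the input frames or reweighting the individual loss terms.
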
Real-time human segmentation by BowtieNet and a SLAM-based human AR system

DOI:10.1016/j.vrih.2019.08.002

2019, 1(5) : 511-524

Background
Generally, it is difficult to obtain accurate pose and depth for a non-rigid moving object from a single RGB camera to create augmented reality (AR). In this study, we build an augmented reality system from a single RGB camera for a non-rigid moving human by accurately computing pose and depth, for which the two key tasks are segmentation and monocular simultaneous localization and mapping (SLAM). Most existing monocular SLAM systems are designed for static scenes, whereas in this AR system the human body is always moving and non-rigid.
Methods
To make the SLAM system suitable for a moving human, we first segment the rigid parts of the human in each frame. A segmented moving body part can be regarded as a static object, and the relative motion between each moving body part and the camera can be treated as camera motion. Typical SLAM systems designed for static scenes can then be applied. In the segmentation step of this AR system, we first employ the proposed BowtieNet, which inserts the atrous spatial pyramid pooling (ASPP) module of DeepLab between the encoder and decoder of SegNet, to segment the human in the original frame; we then use color information to extract the face from the segmented human area.
Results
Based on the human segmentation results and monocular SLAM, this system can change the video background and add virtual objects to humans.
Conclusions
Experiments on human image segmentation datasets show that BowtieNet achieves state-of-the-art human segmentation performance at a speed sufficient for real-time segmentation. Experiments on videos show that the proposed AR system can robustly add virtual objects to humans and accurately change the video background.
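
Editor's note: to make the architectural description above concrete, the following is a minimal sketch of a DeepLab-style ASPP block placed between an encoder and a decoder, which is how the abstract characterizes BowtieNet. This is an illustrative assumption, not the authors' released code: the class names, dilation rates, channel sizes, and the simplified encoder/decoder placeholders (a real SegNet decoder also consumes max-pooling indices from its encoder) are choices made here.

```python
# Minimal sketch (assumption) of a SegNet-like encoder -> ASPP -> decoder layout.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions plus
    image-level pooling, fused by a 1x1 projection."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3 if r > 1 else 1,
                          padding=r if r > 1 else 0, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates])
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

class BowtieNetLike(nn.Module):
    """Encoder -> ASPP -> decoder, echoing the SegNet+ASPP layout described
    in the abstract; `encoder` and `decoder` are placeholders for the
    corresponding SegNet parts."""
    def __init__(self, encoder, decoder, enc_ch=512, aspp_ch=256):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        self.aspp = ASPP(enc_ch, aspp_ch)

    def forward(self, x):
        feat = self.encoder(x)      # SegNet-style encoder features
        feat = self.aspp(feat)      # multi-scale context via dilated branches
        return self.decoder(feat)   # SegNet-style decoder -> segmentation logits
```

The design intent, as described in the abstract, is that the ASPP bottleneck adds multi-scale context between the downsampling and upsampling halves of the network without changing either half.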

Case Report

Multi-source data-based 3D digital preservation of large-scale ancient Chinese architecture: A case report

DOI:10.1016/j.vrih.2019.08.003

2019, 1(5) : 525-541

The 3D digitization and documentation of ancient Chinese architecture is challenging because of architectural complexity and structural delicacy. To generate complete and detailed models of such architecture, it is better to acquire, process, and fuse multi-source data rather than single-source data. In this paper, we describe our work on the 3D digital preservation of ancient Chinese architecture based on multi-source data. We first briefly introduce the two surveyed ancient Chinese temples, Foguang Temple and Nanchan Temple. We then report the data acquisition equipment used and the multi-source data we acquired. Finally, we provide an overview of several applications we conducted based on the acquired data, including ground and aerial image fusion, image and LiDAR (light detection and ranging) data fusion, and architectural scene surface reconstruction and semantic modeling. We believe that it is necessary to involve multi-source data in the 3D digital preservation of ancient Chinese architecture, and that the work described in this paper will serve as a heuristic guideline for the related research communities.