Virtual Reality & Intelligent Hardware


Review of dynamic gesture recognition

DOI：10.1016/j.vrih.2021.05.001

2021, 3(3) : 183-206

In recent years, gesture recognition has been widely used in intelligent driving, virtual reality, and human-computer interaction. With the development of artificial intelligence, deep learning has achieved remarkable success in computer vision. To help researchers better understand the development status of gesture recognition in video, this article provides a detailed survey of the latest developments in deep-learning-based gesture recognition technology for videos. The reviewed methods are broadly categorized into three groups based on the type of neural network used for recognition: two-stream convolutional neural networks, 3D convolutional neural networks, and Long Short-Term Memory (LSTM) networks. In this review, we discuss the advantages and limitations of existing technologies, focusing on feature extraction of the spatiotemporal structure information in a video sequence, and consider future research directions.
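The three families differ mainly in how they expose a clip's spatiotemporal structure to the network. A minimal sketch of the three input layouts (toy tensor sizes; frame differences stand in for the real optical flow that two-stream methods typically use):

```python
import numpy as np

# A toy video clip: 16 frames of 32x32 RGB (illustrative sizes only).
T, H, W, C = 16, 32, 32, 3
video = np.random.rand(T, H, W, C)

# Two-stream idea: a spatial stream sees a single RGB frame, while a
# temporal stream sees stacked motion fields (frame differences here,
# as a stand-in for real optical flow).
spatial_input = video[0]                    # (H, W, C)
temporal_input = np.diff(video, axis=0)     # (T-1, H, W, C)

# 3D CNN idea: one network convolves jointly over time and space,
# so its input keeps the full (T, H, W, C) layout.
c3d_input = video

# LSTM idea: a 2D CNN maps each frame to a feature vector and an LSTM
# consumes the sequence; sketched here as per-frame global average
# pooling, yielding a (T, feature_dim) sequence.
frame_features = video.mean(axis=(1, 2))    # (T, C)

print(spatial_input.shape, temporal_input.shape,
      c3d_input.shape, frame_features.shape)
```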
Data-driven simulation in fluids animation: A survey

DOI：10.1016/j.vrih.2021.02.002

2021, 3(2) : 87-104

The field of fluid simulation is developing rapidly, and data-driven methods provide many frameworks and techniques for fluid simulation. This paper presents a survey of data-driven methods used in fluid simulation in computer graphics in recent years. First, we provide a brief introduction of physics-based fluid simulation methods based on their spatial discretization, including Lagrangian, Eulerian, and hybrid methods. The characteristics of these underlying structures and their inherent connection with data-driven methodologies are then analyzed. Subsequently, we review studies pertaining to a wide range of applications, including data-driven solvers, detail enhancement, animation synthesis, fluid control, and differentiable simulation. Finally, we discuss some related issues and potential directions in data-driven fluid simulation. We conclude that fluid simulation combined with data-driven methods offers advantages over traditional methods under the same parameters, such as higher simulation efficiency, richer details, and varied pattern styles. Data-driven fluid simulation is thus feasible and has broad prospects.
Review of micro-expression spotting and recognition in video sequences

DOI：10.1016/j.vrih.2020.10.003

2021, 3(1) : 1-17

Facial micro-expressions are short and imperceptible expressions that involuntarily reveal the true emotions a person may be attempting to suppress, hide, disguise, or conceal. Such expressions can reflect a person's real emotions and have a wide range of applications in public safety and clinical diagnosis. The analysis of facial micro-expressions in video sequences through computer vision is still relatively recent. In this research, we conduct a comprehensive review of the databases and methods used for micro-expression spotting and recognition, and summarize the advanced technologies in this area. In addition, we discuss challenges that remain unresolved, alongside future work to be completed in the field of micro-expression analysis.
A survey on monocular 3D human pose estimation

DOI：10.1016/j.vrih.2020.04.005

2020, 2(6) : 471-500

Recovering human pose from RGB images and videos has drawn increasing attention in recent years owing to minimum sensor requirements and applicability in diverse fields such as human-computer interaction, robotics, video analytics, and augmented reality. Although a large amount of work has been devoted to this field, 3D human pose estimation based on monocular images or videos remains a very challenging task due to a variety of difficulties such as depth ambiguities, occlusion, background clutter, and lack of training data. In this survey, we summarize recent advances in monocular 3D human pose estimation. We provide a general taxonomy to cover existing approaches and analyze their capabilities and limitations. We also present a summary of extensively used datasets and metrics, and provide a quantitative comparison of some representative methods. Finally, we conclude with a discussion on realistic challenges and open problems for future research directions.
VR and AR in human performance research: An NUS experience

DOI：10.1016/j.vrih.2020.07.009

2020, 2(5) : 381-393

With the mindset of constantly improving efficiency and safety in the workspace and in training in Singapore, there is a need to explore various technologies and their capabilities to fulfil this need. The ability of Virtual Reality (VR) and Augmented Reality (AR) to create an immersive experience tying together the virtual and physical environments, coupled with information filtering capabilities, makes it possible to introduce this technology into the training process and workspace. This paper surveys current research trends, findings, and limitations of VR and AR regarding their effects on human performance, specifically in Singapore, and our experience at the National University of Singapore (NUS).

Multimodal interaction design and application in augmented reality for chemical experiment

DOI：10.1016/j.vrih.2020.07.005

2020, 2(4) : 291-304

Background
Augmented reality classrooms have become an interesting research topic in the field of education, but there are some limitations. Firstly, most researchers use cards to operate experiments, and a large number of cards cause difficulty and inconvenience for users. Secondly, most users conduct experiments only in the visual modality, and such single-modal interaction greatly reduces the users' real sense of interaction. In order to solve these problems, we propose the Multimodal Interaction Algorithm based on Augmented Reality (ARGEV), which is based on visual and tactile feedback in Augmented Reality. In addition, we design a Virtual and Real Fusion Interactive Tool Suite (VRFITS) with gesture recognition and intelligent equipment.
Methods
The ARGEV method fuses gesture, intelligent equipment, and virtual models. We use a gesture recognition model trained by a convolutional neural network to recognize the gestures in AR, and to trigger vibration feedback after recognizing a five-finger grasp gesture. We establish a coordinate mapping relationship between real hands and the virtual model to achieve the fusion of gestures and the virtual model.
Results
The average accuracy rate of gesture recognition was 99.04%. We verify and apply VRFITS in the Augmented Reality Chemistry Lab (ARCL), and the overall operation load of ARCL is thus reduced by 29.42% in comparison to traditional virtual simulation experiments.
Conclusions
We achieve real-time fusion of the gesture, virtual model, and intelligent equipment in ARCL. Compared with the NOBOOK virtual simulation experiment, ARCL improves the users’ real sense of operation and interaction efficiency.
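The hand-to-model coordinate mapping described in the Methods can be sketched as a rigid transform carrying a tracked hand keypoint into the virtual model's frame; the rotation, offset, and keypoint below are illustrative values, not from the paper:

```python
import numpy as np

def hand_to_model(p_hand, R, t):
    """Map a 3D hand keypoint into virtual-model coordinates
    via a rigid transform (rotation R, translation t)."""
    return R @ p_hand + t

# Illustrative calibration: 90-degree rotation about z plus an offset.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.1, 0.0, 0.5])

p = np.array([0.2, 0.0, 0.0])      # fingertip in camera coordinates
print(hand_to_model(p, R, t))      # same point in model coordinates
```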

Urban 3D modeling using mobile laser scanning: a review

DOI：10.1016/j.vrih.2020.05.003

2020, 2(3) : 175-212

Mobile laser scanning (MLS) systems mainly comprise laser scanners and mobile mapping platforms. Typical MLS systems can acquire three-dimensional point clouds with 1-10 cm point spacing at a normal driving or walking speed in streets or indoor environments. The efficiency and stability of these systems make them extremely useful for three-dimensional urban modeling. This paper reviews the latest advances in 3D modeling based on point clouds from LiDAR mobile mapping systems (MMS), including LiDAR simultaneous localization and mapping, point cloud registration, feature extraction, object extraction, semantic segmentation, and deep-learning-based processing. Furthermore, typical urban modeling applications based on MMS are also discussed.

Intelligent virtualization of crane lifting using laser scanning technology

DOI：10.1016/j.vrih.2020.04.003

2020, 2(2) : 87-103

Background
This paper presents an intelligent path planner for lifting tasks by tower cranes in highly complex environments, such as old industrial plants that were built many decades ago and sites used as tentative storage spaces. Generally, these environments do not have workable digital models and 3D representations are impractical.
Methods
The current investigation introduces cutting-edge laser scanning technology to convert real environments into virtualized versions of the construction sites or plants in the form of point clouds. The challenge lies in dealing with the large point cloud datasets from the multiple scans needed to produce a complete virtualized model. The tower crane is also virtualized for the purpose of path planning. A parallelized genetic algorithm is employed to achieve intelligent path planning for the lifting tasks performed by tower cranes in complicated environments, taking advantage of graphics processing unit (GPU) technology, which offers high computing performance at low cost.
Results
Optimal lifting paths are generated in several seconds.
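As a rough illustration of GA-based path planning (a toy 2D stand-in, not the paper's GPU-parallelized planner over point-cloud environments), one can evolve intermediate waypoints between a fixed start and goal while penalizing proximity to an obstacle:

```python
import numpy as np

rng = np.random.default_rng(0)

start, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])
obstacle, radius = np.array([5.0, 0.0]), 2.0   # circular no-go zone

def cost(waypoints):
    path = np.vstack([start, waypoints, goal])
    length = np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))
    # Heavy penalty for any path point inside the obstacle's clearance.
    penalty = 100.0 * np.sum(np.linalg.norm(path - obstacle, axis=1) < radius)
    return length + penalty

# Initial population: 60 candidate paths of 3 waypoints, seeded along
# the straight line and randomly perturbed.
init = np.zeros((60, 3, 2))
init[:, :, 0] = np.linspace(2.5, 7.5, 3)
pop = init + rng.normal(scale=3.0, size=init.shape)

for _ in range(200):
    scores = np.array([cost(ind) for ind in pop])
    elite = pop[np.argsort(scores)[:20]]                 # selection
    children = elite[rng.integers(0, 20, size=40)]       # reproduction
    children = children + rng.normal(scale=0.3, size=children.shape)  # mutation
    pop = np.vstack([elite, children])

best = min(pop, key=cost)
print(round(cost(best), 2))
```

Elitism keeps the best paths unchanged each generation, so the reported cost is non-increasing; a penalty-free result means the evolved path clears the obstacle.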
View synthesis from multi-view RGB data using multi-layered representation and volumetric estimation

DOI：10.1016/j.vrih.2019.12.001

2020, 2(1) : 43-55

Background
Aiming at free-view exploration of complicated scenes, this paper presents a method for interpolating views among multiple RGB cameras.
Methods
In this study, we combine the idea of cost volume, which represents 3D information, with 2D semantic segmentation of the scene to accomplish view synthesis of complicated scenes. We use the cost volume to estimate the depth and confidence map of the scene, and use a multi-layer representation and resolution of the data to optimize the view synthesis of the main object.
Results/Conclusions
By applying different treatment methods to different layers of the volume, we can handle complicated scenes containing multiple persons and plentiful occlusions. We also propose the view interpolation → multi-view reconstruction → view interpolation pipeline to iteratively optimize the result. We test our method on varying data of multi-view scenes and generate decent results.
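The cost-volume idea used here for depth and confidence estimation can be sketched in one dimension, treating candidate depth planes as horizontal disparities; the image size and shift below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy plane-sweep cost volume: warp the source view by each candidate
# disparity, take the per-pixel photometric error, and read off the
# depth (disparity) map as the argmin; the minimum cost doubles as a
# confidence cue.
ref = rng.random(64)                  # textured 1D "reference image"
src = np.roll(ref, 4)                 # source view: scene shifted 4 px

disparities = np.arange(8)            # candidate depth planes
cost_volume = np.stack([np.abs(ref - np.roll(src, -d)) for d in disparities])

disparity_map = cost_volume.argmin(axis=0)           # per-pixel best plane
confidence = 1.0 / (1e-6 + cost_volume.min(axis=0))  # low cost = high confidence
print(disparity_map.min(), disparity_map.max())
```

At the true disparity the warped source matches the reference exactly, so the cost there is zero and the argmin recovers the shift everywhere.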
Three-dimensional virtual-real mapping of aircraft automatic spray operation and online simulation monitoring

DOI：10.1016/j.vrih.2019.10.003

2019, 1(6) : 611-621

Background
This study addresses the lack of a closed-loop feedback optimization tool in aircraft automatic spraying systems by systematically analyzing a three-dimensional (3D) virtual-real mapping technique, namely the digital twin technique, used by the automatic spraying system.
Methods
With the sensors installed in the spraying system, the spraying working parameters are collected online and are used for driving the 3D virtual spraying system to realize the total-factor monitoring of the spraying operation. Furthermore, the operation-evaluation model is applied for analyzing and managing the key indexes of the spraying quality; once the data value of the key indexes exceeds a threshold, the operation shall be optimized automatically.
Results
This approach can effectively support the high-efficiency analysis, evaluation, and optimization of the spraying process.
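The threshold-triggered closed loop described in the Methods can be sketched as follows; the index names and threshold values are hypothetical, not from the paper:

```python
# Monitor-evaluate-optimize loop: streamed spraying parameters drive the
# virtual model, and when a key quality index crosses its threshold an
# optimization step is triggered automatically.

THRESHOLDS = {"film_thickness_um": 120.0, "nozzle_pressure_kpa": 310.0}

def evaluate(sample):
    """Return the key indexes that exceed their thresholds."""
    return [k for k, limit in THRESHOLDS.items() if sample.get(k, 0.0) > limit]

def monitor(stream):
    actions = []
    for sample in stream:        # each sample: one set of sensor readings
        exceeded = evaluate(sample)
        if exceeded:             # closed-loop trigger: adjust the operation
            actions.append(("optimize", exceeded))
        else:
            actions.append(("ok", []))
    return actions

readings = [
    {"film_thickness_um": 95.0, "nozzle_pressure_kpa": 300.0},
    {"film_thickness_um": 130.0, "nozzle_pressure_kpa": 305.0},
]
print(monitor(readings))
```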

Flow-based SLAM: From geometry computation to learning

DOI：10.1016/j.vrih.2019.09.001

2019, 1(5) : 435-460

Simultaneous localization and mapping (SLAM) has attracted considerable research interest from the robotics and computer-vision communities for over 30 years. With steady and progressive efforts being made, modern SLAM systems allow robust and online applications in real-world scenes. We examined the evolution of this powerful perception tool in detail and noticed that the insights concerning incremental computation and temporal guidance are persistently retained. Herein, we denote this temporal continuity as a flow basis and present for the first time a survey that specifically focuses on the flow-based nature, ranging from geometric computation to the emerging learning techniques. We start by reviewing two essential stages of geometric computation, presenting the de facto standard pipeline and problem formulation, along with the utilization of temporal cues. The recently emerging techniques are then summarized, covering a wide range of areas, such as learning techniques, sensor fusion, and continuous-time trajectory modeling. This survey aims to draw attention to how robust SLAM systems benefit from a continuously observing nature, as well as to the topics worthy of further investigation for better utilizing temporal cues.
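A minimal illustration of the temporal guidance the survey highlights: rather than solving each pose from scratch, a constant-velocity motion model extrapolates the previous motion to seed the next estimate (2D positions stand in for full SE(3) poses here):

```python
import numpy as np

def predict_next(pose_prev, pose_curr):
    """Constant-velocity prediction: extrapolate the last motion step."""
    velocity = pose_curr - pose_prev
    return pose_curr + velocity

poses = [np.array([0.0, 0.0]), np.array([1.0, 0.2])]
for _ in range(3):
    guess = predict_next(poses[-2], poses[-1])
    # In a real front end, `guess` would initialize e.g. ICP or direct
    # alignment against the new observation; here we simply accept the
    # prediction to show the incremental chain of estimates.
    poses.append(guess)
print(poses[-1])
```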

Effect of haptic feedback on a virtual lab about friction

DOI：10.1016/j.vrih.2019.07.001

2019, 1(4) : 428-434

Background
With the increased utilization of multimedia devices in education in recent years, new haptic devices for education have gradually been adopted and developed. Compared with the visual and auditory channels, the development of applications with a haptic channel is still in its initial stages. For example, it is unclear how force feedback influences the instructional effect of an educational application and the subjective feelings of users.
Methods
In this study, we designed an educational application with a haptic device (Haply) to explore the effects of force feedback on self-learning. Subjects in an experiment group used a designed application to study friction by themselves using force feedback, whereas subjects in a control group studied the same knowledge without force feedback. A post-test and questionnaire were designed to assess the learning outcomes.
Results/Conclusions
The experimental result indicates that force feedback is beneficial to an educational application, and using a haptic device can improve the effect of the application and motivate students.

Review of studies on target acquisition in virtual reality based on the crossing paradigm

DOI：10.3724/SP.J.2096-5796.2019.0006

2019, 1(3) : 251-264

Crossing is a fundamental paradigm for target selection in human-computer interaction systems. This paradigm was first introduced to virtual reality (VR) interactions by Tu et al., who investigated its performance in comparison to pointing, and concluded that crossing is generally no less effective than pointing and has unique advantages. However, owing to the characteristics of VR interactions, there are still many factors to consider when applying crossing to a VR environment. Thus, this review summarizes the main techniques for object selection in VR and crossing-related studies. Then, factors that may affect crossing interactions are analyzed from the perspectives of the input space and visual space. The aim of this study is to provide a reference for future studies on target selection based on the crossing paradigm in virtual reality.

Haptic interface using tendon electrical stimulation with consideration of multimodal presentation

DOI：10.3724/SP.J.2096-5796.2019.0011

2019, 1(2) : 163-175

Background
Our previous studies have shown that electrical stimulation applied from the skin surface to the tendon region (tendon electrical stimulation, TES) can elicit a force sensation, and that adjusting the current parameters can control the amount of that sensation. TES is thought to present a proprioceptive force sensation by stimulating the receptors or sensory nerves inside the tendon responsible for sensing the magnitude of muscle contraction, so it could serve as the proprioceptive module of a small, low-cost force-feedback device. However, there is also a suspicion that TES presents only a strong, noisy skin sensation, and previous studies found that TES has some limitations in varying the sensations it produces.
Methods
In this study, in addition to characterizing the proprioceptive sensation induced by TES, we constructed a multimodal presentation system that reproduces a situation in which force is applied to the hand, so as to investigate whether TES contributes to the reproduction of haptics in cooperation with other modalities, rather than disturbing them. Specifically, we used vibration to present a cutaneous sensation and a visual head-mounted display (HMD) system to present simultaneous images. Using this system, we also evaluated the efficacy of TES itself and that of the multimodal system involving TES.
Results
We found that TES, along with visual and vibration stimulation, contributed to the perception of a certain force.
Conclusions
Thus, TES appears to be an effective component of multimodal force sense presentation systems.