
2020,  2 (4):   291 - 304   Published Date:2020-8-20

DOI: 10.1016/j.vrih.2020.07.005


Augmented reality classrooms have become an interesting research topic in the field of education, but there are some limitations. Firstly, most researchers use cards to operate experiments, and a large number of cards cause difficulty and inconvenience for users. Secondly, most users conduct experiments only in the visual modal, and such single-modal interaction greatly reduces the users’ real sense of interaction. In order to solve these problems, we propose the Multimodal Interaction Algorithm based on Augmented Reality (ARGEV), which is based on visual and tactile feedback in Augmented Reality. In addition, we design a Virtual and Real Fusion Interactive Tool Suite (VRFITS) with gesture recognition and intelligent equipment.
The ARGEV method fuses gesture, intelligent equipment, and virtual models. We use a gesture recognition model trained by a convolutional neural network to recognize the gestures in AR, and to trigger vibration feedback after recognizing a five-finger grasp gesture. We establish a coordinate mapping relationship between real hands and the virtual model to achieve the fusion of gestures and the virtual model.
The average accuracy rate of gesture recognition was 99.04%. We verify and apply VRFITS in the Augmented Reality Chemistry Lab (ARCL), and the overall operation load of ARCL is thus reduced by 29.42%, in comparison to traditional simulation virtual experiments.
We achieve real-time fusion of the gesture, virtual model, and intelligent equipment in ARCL. Compared with the NOBOOK virtual simulation experiment, ARCL improves the users’ real sense of operation and interaction efficiency.


1 Introduction
Virtual experiments are important in the field of information intelligence[1]. They are also an important research area in Human-Computer Interaction (HCI). Virtual teaching methods adopt Augmented Reality (AR) simulations that realize transformations from two-dimensional (2D) space to three-dimensional (3D) space. Such methods help improve students’ interest, and they also enrich the learning experience[2]. AR technologies include Simultaneous Localization and Mapping (SLAM)[3], card mark recognition[4], and gesture interaction technologies[5]. In the field of education, most AR studies are based on mobile phones, iPads or computers, and do not need any Head Mounted Displays (HMDs). This is conducive to user-friendly operation. Sun used the Vuforia Software Development Kit (SDK) to identify marked cards, and constructed an AR learning environment with a sound animation system, gesture interaction, particle effects, real-time color mapping, and game interaction functions[6]. That study showed that learning in AR stimulates users’ imaginations. Fidan et al. developed marker-based AR software called FenAR to support learning activities in the classroom[7]. Studies have shown that incorporating AR into learning activities can improve the academic performance of students. However, users operate multiple cards in experiments, which makes the operation complicated, and interaction with the virtual model is limited to the use of external objects. Interactivity is ignored, especially in the study and understanding of middle school chemistry experiments, which cannot be learned without actual hands-on operation.
AR methods based on gesture recognition have been proposed to enhance hands-on operation. Dave et al. combined AR and gestures, and used adaptive hand segmentation and pose estimation methods to recognize gestures in virtual experiments, where users could simulate experiments with their own hands[8]. İbili et al. developed an AR geometry teaching system with gesture operations, and verified the individual differences in 3D thinking skills of middle school students[9]. They pointed out that personalized, dynamic, and intelligent learning environments are particularly important. Rani et al. proposed an interactive AR system that enabled students to naturally manipulate 3D objects directly using gestures[10]. Studies have shown that systems based on gesture and AR interaction are easy to use and attractive. However, these studies focus only on single modal gesture interaction with heavy operation loads, and the interaction efficiencies are also low. For example, if only gestures are used in simulated chemistry experiments, complex and similar gestures might be misrecognized, thus increasing the operation load. This leads to a reduction in the interaction efficiency and operation.
Gesture recognition is an essential part of achieving AR effects. Gesture recognition methods include deep learning based methods[11,12], Hidden Markov Models (HMM)[13,14], and geometric features[15,16]. These methods are used to transfer information between gestures and vision. Wu et al. fused a Deep Belief Network (DBN) and a 3D Convolutional Neural Network (3DCNN), and provided the gesture classification probability as input to an HMM to realize gesture recognition[17]. Elmezain et al. used an HMM to recognize the dynamic trajectories of gestures[18]. Priyal et al. used matrix feature normalization to identify the geometry of gestures[19]. Liang et al. used the random forest method to classify gestures, and applied gestures to 3D virtual objects to achieve a seamless fusion of real scenes and virtual objects[20]. According to this research, deep learning methods perform well in gesture recognition problems owing to their strong fitting ability, and are advantageous over other methods. However, the efficiency of gesture recognition is relatively low in practical applications.
In virtual experiments, the virtual effect presented by AR has strong immersion. The fusion of physical and virtual objects and the fusion interaction of multiple modalities can further improve user interactivity. However, the experiments used in previous studies are not easy to operate, and gesture recognition is inaccurate with heavy operation loads. To address these challenges, we present the following contributions of this study:
(1) We use gesture recognition instead of card mark recognition in AR, and propose the Multimodal Interaction Algorithm based on Augmented Reality (ARGEV), which is based on gestures and sensors. Using Microsoft Kinect, ARGEV turns complex AR tasks into simple coordinate transformation problems, achieving the fusion of users’ hands, virtual scenes, and sensors. It enables real-time interaction between gestures and virtual objects, and improves users’ interaction efficiency.
(2) We combine gestures and intelligent equipment to design a virtual and real fusion interaction tool suite (VRFITS), which solves the interaction between physical objects and virtual models, and then triggers perceptual feedback when gestures interact with the virtual model, enhancing the user’s sense of real operation.
(3) An Augmented Reality Chemistry Lab (ARCL) is designed, which is operated through the interaction of users’ hands, intelligent equipment, and a virtual model.
2 Multimodal interaction method based on AR
VRFITS includes intelligent equipment and multimodal interaction. With visual and tactile feedback, the intelligent equipment can enhance the realism of user operations, and the gesture interaction method is used to identify the users’ gestures. Then, a vibrational feedback is triggered by the intelligent equipment.
2.1 The framework of VRFITS
The framework consists of a model establishment stage, a gesture recognition and interaction stage, and a system application stage (Figure 1). In the model establishment stage, we process the gesture depth map and train the gesture recognition model with a Convolutional Neural Network (CNN). In the gesture recognition and interaction stage, we input the gesture depth map to the gesture recognition model to recognize gestures. Then, we achieve consistent gesture recognition results through coordinate binding in virtual scenes and real scenes. Fusion of the gestures and virtual model triggers vibration in the intelligent equipment. In the system application stage, we implement ARCL.
2.2 Design and interaction method for intelligent equipment
Intelligent sensing equipment such as somatosensory devices, Google Glass, Microsoft HoloLens, and Kinect has become abundant in recent times. Users can use such sensing devices for learning or home entertainment. Researchers can also develop new research based on sensing devices, but the price is very high, which is unrealistic for the large number of teaching classrooms in central and western China. In addition, most equipment cannot be reused in real experiments, which results in wastage. Our design addresses this by using intelligent equipment with inexpensive sensors.
VRFITS uses sensors in the intelligent equipment (Figure 2) to detect changes in external signals. It connects to the I/O port of the STM32103 main control chip through the signal output port, and uses the serial transmission mode to transmit signals to a computer through the EIA-RS232 module. Finally, the serial data are read in the Unity3D platform. In the processing of the vibrator, the serial port sends the data from the Unity3D platform, and the information is transmitted to the STM32103 main control chip through the serial port to control the vibrator.
The intelligent equipment includes an intelligent ring and two touch sensors, and the approximate cost to make it is 30 Yuan. We set up a vibrator on the intelligent ring, and during the experiment, we put the ring on the little thumb finger of the user’s right hand. When the user grabs a virtual object, the sensor induces vibration. The specific perceptual processes (in no particular order) are as follows:
(1) If all five fingers of a user grab a virtual object, the gesture is recognized. The system sends the data “00” to the serial port, and the information is transmitted to the STM32103 main control chip through the serial port. The vibrator is set to vibrate for 5 seconds. No vibration is induced if the five fingers do not grab the virtual object.
(2) When the user’s hand touches the tactile sensor, the perceived touch intensity is calculated, and the average touch intensity is obtained through repeated measurements. If a touch intensity is greater than the average value, the system receives the signal Noe: Noe=1 indicates the button that starts the experiment, and Noe=2 indicates the button that ends the experiment. If no signal is received, no semantics are expressed.
The intelligent equipment is suitable for AR and Virtual Reality (VR) experimental scenes with gesture recognition, and combines gesture behavior with the vibrator to trigger tactile feedback.
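The two perceptual processes above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' firmware or Unity3D code: the function names `ring_command` and `touch_signal`, and the idea of passing calibration measurements as a list, are assumptions made for the example. Only the “00” payload, the 5 second vibration, and the Noe values 1 and 2 come from the text.

```python
from statistics import mean
from typing import Optional, Sequence

GRAB_COMMAND = b"00"     # serial payload named in the text for a five-finger grab
VIBRATION_SECONDS = 5    # vibration duration stated in the text

START_BUTTON = 1         # Noe = 1: start the experiment
END_BUTTON = 2           # Noe = 2: end the experiment


def ring_command(five_finger_grab: bool) -> Optional[bytes]:
    """Return the payload to send to the ring, or None when no vibration is due."""
    return GRAB_COMMAND if five_finger_grab else None


def touch_signal(intensity: float,
                 calibration: Sequence[float],
                 button: int) -> Optional[int]:
    """Return Noe for a touch on `button`, or None if the touch is too weak.

    `calibration` (hypothetical) holds repeated intensity measurements;
    their mean is used as the threshold, as described in step (2).
    """
    if button not in (START_BUTTON, END_BUTTON):
        raise ValueError("button must be 1 (start) or 2 (end)")
    threshold = mean(calibration)
    return button if intensity > threshold else None
```

For example, `touch_signal(0.9, [0.4, 0.6], START_BUTTON)` yields Noe = 1, while a touch weaker than the mean of the calibration values yields no signal at all.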
2.3 Gesture recognition method
2.3.1   Gesture data preprocessing
We take a chemistry experiment as an example of the virtual experiment system, and count the types of natural gestures used by students or teachers in conducting the experiments. We investigate and design six gestures commonly used in the experiment. First, we use Kinect to collect depth maps of the human body, and collect 10000 depth maps of the six gesture types. Then, we use the depth information and the coordinates of the centroid position of the human hand obtained by Kinect to segment the gesture depth map from the collected body image. A distance of 3 cm from the centroid position is used as the threshold value; points farther from the centroid than the threshold are considered to lie outside the human hand area. Then, the human hand area is cropped to obtain a gesture depth map of 200 × 200 pixels. For simplicity, we consider similar gestures such as a two-finger spread and a three-finger stretch as representing the same semantics. The pre-processed six gesture depth maps and their definitions are shown in Table 1.
Table 1 Definition of six static gestures
| Number | Name | Depth map | Gesture state | Presentation semantics |
| 1 | Five-finger grab | (image) | The five fingers are clenched into a fist | Grab |
| 2 | One-finger stretch | (image) | The forefinger is extended | Click |
| 3 | Two-finger spread | (image) | Thumb and index finger are open | Amplify |
| 4 | Two-finger stretch | (image) | Forefinger and middle finger are opened like scissors | Whirl |
| 5 | Three-finger stretch | (image) | Thumb, index, and middle fingers are open | Amplify |
| 6 | Five-finger spread | (image) | Five fingers are open | Put down |
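The hand segmentation step of Section 2.3.1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the depth map is a plain list of rows with depth values in millimetres, the 3 cm threshold is interpreted as a depth difference from the centroid (the text does not say whether the distance is measured in depth or in full 3D), and the name `segment_hand` is hypothetical.

```python
from typing import List, Tuple

DepthMap = List[List[float]]  # depth in millimetres, row-major


def segment_hand(depth: DepthMap,
                 centroid: Tuple[int, int],
                 threshold_mm: float = 30.0,
                 size: int = 200) -> DepthMap:
    """Crop a size x size window centred on the hand centroid.

    Pixels whose depth differs from the centroid depth by more than
    `threshold_mm` (assumed reading of the 3 cm rule), or that fall
    outside the image, are set to 0 so that only the hand remains.
    """
    rows, cols = len(depth), len(depth[0])
    cy, cx = centroid
    centre_depth = depth[cy][cx]
    top, left = cy - size // 2, cx - size // 2
    window: DepthMap = []
    for r in range(top, top + size):
        row = []
        for c in range(left, left + size):
            inside = 0 <= r < rows and 0 <= c < cols
            value = depth[r][c] if inside else 0.0
            keep = inside and abs(value - centre_depth) <= threshold_mm
            row.append(value if keep else 0.0)
        window.append(row)
    return window
```

With the default `size=200` the output matches the 200 × 200 gesture depth maps described above; a smaller `size` is convenient for quick checks on toy data.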
2.3.2   Gesture recognition method based on CNN
Then, we build the AlexNet network structure model. AlexNet is an advantageous CNN structure, as it can learn richer and higher-dimensional image features. AlexNet uses dropout (random inactivation) and data augmentation to effectively suppress overfitting, and uses the Rectified Linear Unit (ReLU) function instead of the sigmoid function as the activation function. Therefore, for the six kinds of static gesture depth maps, we train a gesture recognition model based on AlexNet. The network structure includes five convolutional layers, three pooling layers, three fully connected layers, and a Softmax classification function. We select 10000 depth maps for each gesture, and each gesture depth map is initially set to a resolution of 200 × 200 pixels. Then, we obtain the gesture recognition model through the AlexNet network. The network structure is shown in Figure 3.
In AlexNet, the optimal number of iterations (epochs) is set to 20000, the batch size for each training or test step is 20, the padding type is SAME, the dropout value is 0.8, and the generalization performance is evaluated every 20 depth maps. The stride and convolution kernel size of each layer are shown in Figure 3. The process of training the gesture recognition model is as follows:
(1) Using Kinect, we obtain the depth information and depth maps of human bone nodes, and evaluate the human hand area according to the threshold value to generate a gesture depth map.
(2) We divide the dataset into a training set and a test set in a ratio of 7:3.
(3) The training set is input into AlexNet, and the gesture depth features are extracted by continuously updating the weights according to the following equation:
$$x_i^m = f\left(\sum_{j=1}^{n} x_j^{m-1} w_{ij}^m + b_i^m\right)$$
where $m$ is the number of the current layer, $n$ is the number of neurons in the previous layer, $w_{ij}^m$ is the connection weight between neuron $i$ in layer $m$ and neuron $j$ in the previous layer, and $b_i^m$ is the $i$-th feature bias after $m$ convolution layers.
(4) After calculating the Softmax layer, the vector v is obtained. Each dimension of the vector v represents the probability of the prediction type. The prediction probability is given by:
$$P_l = \frac{e^{v_l}}{\sum_{k} e^{v_k}}$$
where $v_l$ represents the $l$-th element of the vector $v$, $P_l$ is the predicted probability of the $l$-th element in the vector $v$, and $i \in (0, 6]$.
Finally, we obtain a trained gesture recognition model (AlexNet_gesture), and encapsulate and import it into ARCL.
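The numerical core of training steps (2) to (4) can be illustrated with a small, dependency-free sketch. It is not the AlexNet_gesture model itself: it shows a 7:3 split, one fully connected layer evaluated with the update rule $x_i^m = f(\sum_j x_j^{m-1} w_{ij}^m + b_i^m)$, and the Softmax probability $P_l$. The function names, the fixed random seed, and the toy weights are assumptions for the example.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_dataset(samples: Sequence, train_ratio: float = 0.7,
                  seed: int = 0) -> Tuple[list, list]:
    """Shuffle the samples and split them into training and test sets (7:3 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def relu(z: float) -> float:
    """Rectified Linear Unit, the activation function f used by AlexNet."""
    return max(0.0, z)


def layer_forward(x_prev: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> List[float]:
    """Compute x_i^m = f(sum_j x_j^(m-1) * w_ij^m + b_i^m) for every neuron i."""
    return [relu(sum(x * w for x, w in zip(x_prev, w_i)) + b_i)
            for w_i, b_i in zip(weights, biases)]


def softmax(v: Sequence[float]) -> List[float]:
    """Compute P_l = exp(v_l) / sum_k exp(v_k), shifted by max(v) for stability."""
    shift = max(v)
    exps = [math.exp(x - shift) for x in v]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted gesture is the index of the largest Softmax probability; for a six-element vector $v$ this index corresponds to one of the six gestures in Table 1.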
2.4 Multimodal interaction method
We use the Kinect RGB sensor to capture the real environment. The Kinect depth camera captures the depth map of the hand. We realize AR by establishing a coordinate calibration of the real space and virtual space in Unity3D. The ARGEV method achieves tactile and visual fusion interaction. While conducting the experiment, the user directly observes the experimental phenomena and scenes in AR through a computer screen, without wearing any HMDs. If the user makes a five-finger grab gesture, the vibrational feedback is triggered. The process is shown in Figure 4.
First, we encapsulate the AlexNet_gesture model and denote the right-hand gesture as $Ges\_R$ and the left-hand gesture as $Ges\_L$, which are given by:
$$Ges\_R \in \{R_1, R_2, R_3, R_4, R_5, R_6\}$$
$$Ges\_L \in \{L_1, L_2, L_3, L_4, L_5, L_6\}$$
where $R_1$-$R_6$ and $L_1$-$L_6$ represent Number 1-6 in Table 1. Then, we call AlexNet_gesture in ARCL to establish consistent hand coordinates and virtual coordinates, given by:
$$\theta = (k_x, k_y, k_z)$$
where $\theta$ is the depth 3D coordinate under Kinect. According to the mapping of the coordinates of the hand joints in real space and depth 3D coordinates, the mapping relationship between the joint point coordinates and the virtual scene is determined as:
$$\begin{pmatrix} k_x \\ k_y \\ k_z \end{pmatrix} = t \begin{pmatrix} U_x \\ U_y \\ U_z \end{pmatrix} + \begin{pmatrix} d_x \\ d_y \\ d_z \end{pmatrix}$$
where $(U_x, U_y, U_z)$ is the virtual scene coordinate in Unity3D, $t$ is the ratio corresponding to the 3D coordinates of the real-world scene and the virtual scene, and $(d_x, d_y, d_z)$ is the intercept value at the virtual scene coordinates.
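The linear mapping between Kinect coordinates and Unity3D coordinates can be written as a pair of small functions. This is a sketch of the relationship $k = tU + d$ stated above and of its inverse $U = (k - d)/t$; the function names and the numeric values of $t$ and $d$ in the usage note are hypothetical, since the text does not report the calibrated values.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def unity_to_kinect(u: Vec3, t: float, d: Vec3) -> Vec3:
    """Apply k = t * U + d component-wise."""
    return tuple(t * ui + di for ui, di in zip(u, d))


def kinect_to_unity(k: Vec3, t: float, d: Vec3) -> Vec3:
    """Invert the mapping: U = (k - d) / t, used to place the hand in the virtual scene."""
    if t == 0:
        raise ValueError("scale ratio t must be non-zero")
    return tuple((ki - di) / t for ki, di in zip(k, d))
```

For instance, with the made-up calibration `t = 2.0` and `d = (0.5, 0.0, -1.0)`, the Unity3D point `(1.0, 2.0, 3.0)` maps to the Kinect point `(2.5, 4.0, 5.0)`, and `kinect_to_unity` maps it back.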
In the ARGEV algorithm, we input gesture depth maps and sensor signals, perform multimodal interaction, and output vibrational feedback and visual effects. Visual effects include designed animations, particle effects, and dumping effects of virtual beakers in Unity3D. The specific gesture interaction algorithm is as follows:

Algorithm 1: Multimodal Interaction Algorithm based on AR (ARGEV algorithm)
Input: gesture depth map, sensor signal Noe;
Output: vibration feedback, visual effects;
1. Use the Kinect depth camera to obtain the (n-1)-th frame gesture depth map and input it into the AlexNet_gesture model for gesture recognition;
2. Obtain the n-th frame gesture depth map, and record the joint point coordinates $S_{n-1}(\theta_{n-1})$ and $S_n(\theta_n)$ of frames (n-1) and n, respectively;
3. if (Noe = 1) then the experiment begins, and virtual equipment appears in the scene.
4. if (Noe = 2) then the experiment ends.
5. if ($S_{n-1}(\theta_{n-1}) = S_n(\theta_n)$) then
   while ($Ges\_R \lor Ges\_L$) do
     if (R1) then set $Pis\_v$ as the three-dimensional coordinate of the virtual model, $S_n(\theta_n) = Pis\_v$, and send the “00” data to the microcontroller to trigger the intelligent ring vibration end if
     if (L1) then the effect of the prompt box of the selected experimental equipment appears on the system interface end if
     if (R4) then the current virtual equipment is dumping end if
     if (R6) then drop the current virtual equipment end if
   end while
   end if
6. if ($S_{n-1}(\theta_{n-1}) \neq S_n(\theta_n)$) then return 1 end if
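The control flow of Algorithm 1 can be sketched as a single decision function evaluated once per pair of consecutive frames. This is an illustrative reading, not the authors' Unity3D implementation: the action names are placeholders, the gestures are passed as the numbers 1-6 from Table 1 (or `None`), and the tolerance `tol` used to compare joint coordinates is an assumption, since the algorithm as written tests exact equality.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def argev_step(prev_joint: Vec3,
               curr_joint: Vec3,
               noe: Optional[int],
               right: Optional[int],
               left: Optional[int],
               tol: float = 1e-3) -> List[str]:
    """Return the actions triggered for frames (n-1) and n.

    `right` and `left` are gesture numbers from Table 1 for each hand.
    """
    actions: List[str] = []
    if noe == 1:                      # step 3: start key touched
        actions.append("start_experiment")
    if noe == 2:                      # step 4: end key touched
        actions.append("end_experiment")

    stable = all(abs(a - b) <= tol for a, b in zip(prev_joint, curr_joint))
    if not stable:                    # step 6: coordinates differ, wait for next frame
        return actions

    if right == 1:                    # R1: five-finger grab with the right hand
        actions.append("bind_model_to_hand")
        actions.append("send_00_to_ring")
    if left == 1:                     # L1: five-finger grab with the left hand
        actions.append("show_prompt_box")
    if right == 4:                    # R4: two-finger stretch
        actions.append("dump_equipment")
    if right == 6:                    # R6: five-finger spread
        actions.append("drop_equipment")
    return actions
```

Calling the function once per frame replaces the `while` loop of the pseudocode; the loop body is the same set of gesture checks.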

3 Experimental results and analysis
3.1 The AlexNet_gesture model training results
During the training process, we record the changes in accuracy and loss every 20 iterations, and visualize them with TensorBoard. The accuracy and loss curves of the AlexNet_gesture model are shown in Figure 5.
In Figure 5, the accuracy in the training process gradually tends to 1, and the loss decreases from large values to stable values, and then tends to 0. This indicates that the model is continuously and effectively optimized during training.
3.2 Comparative experiment
We design two sets of comparative experiments. The first set is the comparison of the accuracy of each gesture before and after model optimization, and the second set is the comparison of the training results of AlexNet, GoogleNet and VGG16Net models. We use the pre-processed gesture depth map and 3000 test pictures in the comparative sets. The experimental results are shown in Figure 6.
It can be seen that the average accuracy of gesture recognition after optimization is 99.04%, which is about 2% higher than that before optimization. The recognition effect for similar gestures 2 and 3 is also better than that before optimization. The accuracy of the gesture recognition model obtained by the AlexNet model training is better than the other two network models before or after optimization, and the optimized model is improved by about 1%-3%.
3.3 Effectiveness of the ARGEV algorithm
We verify the effectiveness of the ARGEV algorithm by evaluating whether the coordinates of the gesture and the coordinates of the virtual model are consistent. When the user makes a five-finger grab gesture, we record the three-dimensional coordinates of the hand and the virtual model simultaneously.
In Figure 7, we label the 3D coordinates of the gesture trajectories with different colors. The 3D coordinates in Figures 7a and 7b are identical over the same time period, which indicates that the algorithm is effective.
3.4 Application of VRFITS in ARCL
We build an interactive virtual and real fusion environment that combines the intelligent equipment, real hands, and virtual models. We use an Intel® Core™ i7-8550 CPU, Kinect 3.0, and the intelligent equipment. We set up experimental scenarios in Unity3D, and use C# as the programming language. Finally, we apply the ARGEV algorithm and intelligent equipment in ARCL.
VRFITS can be used repeatedly, and it can also help students avoid the dangers of conducting experiments unsupervised. We aim to help students focus on conducting chemistry experiments, to enhance observation and learning, and to solve problems such as difficulties and fears associated with experiments and a lack of experimental reagents. We validate ARGEV through an example sodium (natrium) and water reaction experiment in ARCL. In the sodium and water experiment, the user is expected to make the following gestures: grabbing, putting down, and dumping. Therefore, in order to facilitate operation in ARCL, we choose these three gestures for experimental verification. The flowchart of ARCL is shown in Figure 8.
Sodium and water experiments are main chemistry experiments in middle school chemistry classes. However, an appropriate amount of sodium reacts with water to generate gas, whereas a large amount of sodium reacts with water explosively, which makes the experiment difficult to observe and operate. In order to allow students to better experience the experimental process, this paper takes the sodium and water reaction as an example to present the experimental mechanism in ARCL. In the virtual scene, we add prompt windows, virtual experimental equipment, particle effects, and animation effects to enhance the immersiveness during the experiment. The effect of the experimental operation is shown below.
In Figure 9, the red box indicates operation prompts and scene effects, and the yellow box indicates user operation behavior. After the user touches the start key to begin the experiment, the system presents the AR scene. In the prompt box, the user selects the experimental equipment with the five-finger grab gesture, and the intelligent ring vibrates. (a) The user dumps the virtual beaker; (b) the user then takes out the virtual equipment needed for the experiment.
In Figure 10, (a) the user uses a virtual knife to cut the sodium block; (b) the user takes a small piece of sodium with tweezers and puts it into a beaker filled with water; (c) the user can observe five phenomena of the sodium and water reaction, and a real video is added to verify authenticity; (d) the user selects the beaker again and (e) puts it on the table, then puts a large piece of sodium into the beaker with tweezers, and an explosion scene can be seen; (f) finally, the user ends the experiment by pressing the end key.
3.5 User evaluation comparison
We choose the sodium and water experiments of the NOBOOK virtual experiment platform[21] and ARCL to compare performance according to user evaluation (Figure 11). The sodium and water experiment of NOBOOK is a VR simulation experiment, and the system uses a mouse as the input device.
We invited ten teachers and thirty students to complete the evaluation for NOBOOK and ARCL. In order to verify that the experimental system conforms to teaching applications, we set the following seven comparative aspects, collectively denoted VET_P: “teaching evaluation,” “experimental interest,” “experimental interaction,” “learning effect,” “system stability,” “experimental hints,” and “ease of operation,” and label them VET_P1-VET_P7 in order. After the operation, ten teachers compared the two systems using VET_P; each aspect was scored on 5 levels, with the comparative score increasing in order (Figure 12). We denote the NOBOOK virtual experiment platform as A and ARCL as B. We use ANOVA to assess the significance of each factor (Table 2). The significance level is 0.05.
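Table 2 ANOVA statistical result
| VET_P | Experiment platform | Average value | Variance | SS | MS | F | P-value | F crit |
| VET_P1 | A | 2.2 | 0.844 | 18.05 | 18.05 | 25.992 | 7.51 | 4.419 |
| | B | 4.1 | 0.544 | | | | | |
| VET_P2 | A | 2.8 | 0.622 | 11.25 | 11.25 | 26.299 | 7.04 | 4.419 |
| | B | 4.3 | 0.233 | | | | | |
| VET_P3 | A | 2.7 | 0.456 | 8.45 | 8.45 | 10.787 | 0.004 | 4.419 |
| | B | 4 | 1.111 | | | | | |
| VET_P4 | A | 1.9 | 0.767 | 20 | 20 | 26.087 | 7.36 | 4.419 |
| | B | 3.9 | 0.767 | | | | | |
| VET_P5 | A | 4.2 | 0.711 | 0.2 | 0.2 | 0.409 | 0.530 | 4.419 |
| | B | 4.6 | 0.267 | | | | | |
| VET_P6 | A | 1.2 | 0.177 | 28.8 | 28.8 | 51.84 | 1.06 | 4.419 |
| | B | 3.6 | 0.933 | | | | | |
| VET_P7 | A | 2.7 | 0.455 | 5 | 5 | 14.516 | 0.012 | 4.419 |
| | B | 3.7 | 0.233 | | | | | |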
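The SS, MS, and F columns of the ANOVA table can be reproduced approximately from the reported averages and variances. The sketch below computes a one-way ANOVA for two groups from summary statistics; it assumes that each platform was rated by the same ten teachers' worth of scores (n = 10 per group), which the text implies but does not state explicitly, and the function name is hypothetical. Small differences from the table are expected because the published variances are rounded.

```python
from typing import Tuple


def anova_two_groups(mean_a: float, var_a: float,
                     mean_b: float, var_b: float,
                     n: int = 10) -> Tuple[float, float, float]:
    """Return (SS_between, MS_between, F) for two groups of n ratings each.

    var_a and var_b are sample variances (n - 1 in the denominator).
    """
    grand_mean = (mean_a + mean_b) / 2
    ss_between = n * ((mean_a - grand_mean) ** 2 + (mean_b - grand_mean) ** 2)
    ms_between = ss_between / 1                 # two groups: 1 degree of freedom
    ss_within = (n - 1) * (var_a + var_b)
    ms_within = ss_within / (2 * (n - 1))       # 2(n - 1) degrees of freedom
    return ss_between, ms_between, ms_between / ms_within


# VET_P1: platform A (2.2, 0.844) versus platform B (4.1, 0.544)
ss, ms, f = anova_two_groups(2.2, 0.844, 4.1, 0.544)
print(round(ss, 2), round(f, 2))
```

For VET_P1 this gives SS of about 18.05 and F of about 26.0, in line with the tabulated 18.05 and 25.992; an F value above F crit indicates a significant difference at the 0.05 level.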
The teachers were of the opinion that the results of both methods were very satisfactory, and expressed that they could correctly operate the experiments and observe the experimental phenomena. However, the teachers’ evaluations of the two experimental modes differ considerably: the average value for ARCL is 29.42% higher than that for the NOBOOK experiment. The teachers’ evaluations on VET_P5 do not differ significantly: Table 2 shows that F is less than F crit, which indicates that the two systems are relatively stable. For VET_P1, VET_P3, and VET_P7, the overall evaluation of ARCL was 28% higher than that for NOBOOK, which shows that ARCL is simpler to operate and more convenient for teaching than NOBOOK. In addition, in the evaluation of VET_P4 and VET_P6, ARCL is 40% and 50% higher than NOBOOK, respectively. This further illustrates that in ARCL with multimodal interactions, users are more immersed in the experiment, with better learning effects. Regarding VET_P6, the NOBOOK experiment requires users to explore on their own, whereas ARCL can be operated by following the prompt box, even if the user is not familiar with the virtual experimental environment. The user does not need to waste much time during the experiment, which improves the efficiency of experimental interaction.
In addition, we also evaluate the operating load of this system based on the National Aeronautics and Space Administration Task Load Index (NASA-TLX)[22] cognitive load assessment. We use the NASA-TLX evaluation criteria “mental demand (MD),” “physical demand (PD),” “temporal demand (TD),” “effort (E),” “performance (P),” and “frustration level (FD)” to evaluate the scores of the two systems. According to the NASA-TLX evaluation, all users experienced NOBOOK and ARCL separately, and we computed the average scores. As in the VET_P assessment, the comparative score uses a five-point scale. The NASA-TLX model evaluation of the two systems is shown in Figure 13.
It can be seen that the evaluations of MD in the two experiments are different, which means that the user expends less mental effort in the operation experiment. However, in the other five indicators, the evaluation score of ARCL is significantly lower than that of NOBOOK, and the overall cognitive operation load of ARCL is reduced by 27.42%. This indicates that the VRFITS interaction efficiency is improved. Thus, virtual and real fusion interaction improves the users’ interaction with the virtual model, and the immersion of the operation experiment.
4 Conclusion and discussion
We propose VRFITS, which contains intelligent equipment and a gesture interaction method. The suite is suitable for any AR experiment with gesture operation. We achieve the combination of gestures, sensors, and virtual models in AR. The intelligent equipment and the gesture interaction method assist each other, and the gesture can trigger vibrational feedback. In addition, we design and implement a prototype system called ARCL. According to user evaluation, ARCL increases the interactivity and real sense of operation in virtual experiments, compared to NOBOOK, while reducing the user operation load, and improving the user interaction efficiency. In addition, compared with AR card recognition of the Vuforia SDK, ARCL discards multiple card operations, and uses different gesture commands to trigger different virtual models, which makes the operation more convenient and effective.
However, our work has certain limitations. On the one hand, there are relatively few types in gesture recognition, so there is a lack of gesture types in the interactive process of users in virtual experiments. On the other hand, in the virtual chemistry experiment system, the particle effect, animation effect, and virtual model rendering effect in the experimental scene are not prominent, and the interface effect of the system should be improved in the future.



References
[1] Collazos C A, Merchan L. Human-computer interaction in Colombia: bridging the gap between education and industry. IT Professional, 2015, 17(1): 5–9 DOI:10.1109/mitp.2015.8
[2] Desai K, Belmonte U H H, Jin R, Prabhakaran B, Diehl P, Ramirez V A, Johnson V, Gans M. Experiences with multi-modal collaborative virtual laboratory (MMCVL). In: 2017 IEEE Third International Conference on Multimedia Big Data (BigMM). Laguna Hills, CA, USA, IEEE, 2017, 376–383 DOI:10.1109/bigmm.2017.62
[3] Chen L, Tang W, John N W, Wan T R, Zhang J J. SLAM-based dense surface reconstruction in monocular Minimally Invasive Surgery and its application to Augmented Reality. Computer Methods and Programs in Biomedicine, 2018, 158: 135–146 DOI:10.1016/j.cmpb.2018.02.006
[4] Huynh B, Orlosky J, Höllerer T. In-situ labeling for augmented reality language learning. In: 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR). Osaka, Japan, IEEE, 2019, 1606–1611 DOI:10.1109/vr.2019.8798358
[5] Karambakhsh A, Kamel A, Sheng B, Li P, Yang P, Feng D D. Deep gesture interaction for augmented anatomy learning. International Journal of Information Management, 2019, 45: 328–336 DOI:10.1016/j.ijinfomgt.2018.03.004
[6] Sun C X. The design and implementation of children's education system based on augmented reality. Shandong University, 2017
[7] Fidan M, Tuncel M. Integrating augmented reality into problem based learning: the effects on learning achievement and attitude in physics education. Computers & Education, 2019, 142: 103635 DOI:10.1016/j.compedu.2019.103635
[8] Dave I R, Chaudhary V, Upla K P. Simulation of analytical chemistry experiments on augmented reality platform. In: Advances in Intelligent Systems and Computing. Singapore: Springer Singapore, 2018, 393–403 DOI:10.1007/978-981-13-0224-4_35
[9] İbili E, Çat M, Resnyansky D, Şahin S, Billinghurst M. An assessment of geometry teaching supported with augmented reality teaching materials to enhance students' 3D geometry thinking skills. International Journal of Mathematical Education in Science and Technology, 2020, 51(2): 224–246 DOI:10.1080/0020739x.2019.1583382
[10] Rani S S, Dhrisya K J, Ahalyadas M. Hand gesture control of virtual object in augmented reality. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). Udupi, India, IEEE, 2017, 1500–1505 DOI:10.1109/icacci.2017.8126053
[11] Skaria S, Al-Hourani A, Lech M, Evans R J. Hand-gesture recognition using two-antenna Doppler radar with deep convolutional neural networks. IEEE Sensors Journal, 2019, 19(8): 3041–3048 DOI:10.1109/jsen.2019.2892073
[12] Côté-Allard U, Fall C L, Drouin A, Campeau-Lecours A, Gosselin C, Glette K, Laviolette F, Gosselin B. Deep learning for electromyographic hand gesture signal classification using transfer learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2019, 27(4): 760–771 DOI:10.1109/tnsre.2019.2896269
[13] Sinha K, Kumari R, Priya A, Paul P. A computer vision-based gesture recognition using hidden Markov model. In: Innovations in Soft Computing and Information Technology. Singapore: Springer Singapore, 2019, 55–67 DOI:10.1007/978-981-13-3185-5_6
[14] Zhang L Z, Zhang Y R, Niu L D, Zhao Z J, Han X W. HMM static hand gesture recognition based on combination of shape features and wavelet texture features. Wireless and Satellite Systems, 2019, 187–197 DOI:10.1007/978-3-030-19156-6_18
[15] Ahmad S U D, Akhter S. Real time rotation invariant static hand gesture recognition using an orientation based hash code. 2013 International Conference on Informatics, Electronics and Vision (ICIEV), 2013, 1–6 DOI:10.1109/iciev.2013.6572620
[16] Saba T, Rehman A, Harouni M. Cursive multilingual characters recognition based on hard geometric features. International Journal of Computational Vision and Robotics, 2020, 10(3): 213 DOI:10.1504/ijcvr.2020.10029034
[17] Wu D, Pigou L, Kindermans P J, Le N D H, Shao L, Dambre J, Odobez J M. Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(8): 1583–1597 DOI:10.1109/tpami.2016.2537340
[18] Elmezain M, Al-Hamadi A, Michaelis B. Hand trajectory-based gesture spotting and recognition using HMM. In: 2009 16th IEEE International Conference on Image Processing (ICIP). Cairo, Egypt, IEEE, 2009, 3577–3580 DOI:10.1109/icip.2009.5414322
[19] Padam Priyal S, Bora P K. A robust static hand gesture recognition system using geometry based normalizations and Krawtchouk moments. Pattern Recognition, 2013, 46(8): 2202–2219 DOI:10.1016/j.patcog.2013.01.033
[20] Liang H, Yuan J S, Thalmann D, Thalmann N M. AR in hand: egocentric palm pose tracking and gesture recognition for augmented reality applications. In: Proceedings of the 23rd ACM International Conference on Multimedia-MM'15. Brisbane, Australia, New York, ACM Press, 2015, 743–744 DOI:10.1145/2733373.2807972
[21] Wang J. Research on the application of virtual simulation experiment in physics experiment teaching of senior high school. 2018
[22] Law K E, Lowndes B R, Kelley S R, Blocker R C, Larson D W, Hallbeck M S, Nelson H. NASA-task load index differentiates surgical approach: opportunities for improvement in colon and rectal surgery. Annals of Surgery, 2020, 271(5): 906–912 DOI:10.1097/sla.0000000000003173