Review of dynamic gesture recognition
Department of Computer and Technology, University of Xidian, Xi'an 710071, China
Abstract
Keywords: Video-based gesture recognition; Deep learning; Convolutional neural networks; Human-computer interaction
Content

Dataset | Year | Acquisition device | Modality | #Classes | #Subjects | #Samples | #Scenes | Metric |
---|---|---|---|---|---|---|---|---|
20BN-jester dataset | 2019 | Laptop camera or webcam | RGB | 27 | 1376 | 148092 | - | Accuracy |
Montalbano dataset (V2) | 2014 | Kinect v1 | RGB, D, S, UM | 20 | 27 | 13858 | - | Jaccard index |
ChaLearn LAP IsoGD | 2016 | Kinect v1 | RGB, D | 249 | 21 | 47933 | - | Accuracy |
DVS128 Gesture dataset | 2017 | DVS128 and webcam | RGB | 11 | 29 | 1342 | 1 | Accuracy |
SKIG | 2013 | Kinect v1 | RGB, D | 10 | 6 | 2160 | 3 | Accuracy |
EgoGesture dataset | 2018 | Intel RealSense | RGB, D | 83 | 50 | 24161 | 6 | Accuracy |

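The Metric column lists accuracy for most datasets and the Jaccard index for Montalbano. As a minimal sketch, assuming each gesture instance is represented by the set of frame indices it covers, the two metrics could be computed as follows. The function names and toy labels are illustrative only and are not taken from any benchmark toolkit.

```python
def accuracy(predicted, actual):
    """Fraction of clips whose predicted class equals the ground-truth class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def jaccard_index(predicted_frames, actual_frames):
    """Overlap of predicted and ground-truth frame sets: |A & B| / |A | B|."""
    predicted_frames = set(predicted_frames)
    actual_frames = set(actual_frames)
    union = predicted_frames | actual_frames
    if not union:
        return 0.0  # assumption: an empty comparison scores zero
    return len(predicted_frames & actual_frames) / len(union)


# Toy example: a gesture annotated on frames 10..29 but predicted on frames 15..34.
truth = range(10, 30)
guess = range(15, 35)
print(jaccard_index(guess, truth))  # 15 shared frames / 25 frames in the union = 0.6

# Toy example: two of three clips are classified correctly.
print(accuracy(["wave", "swipe", "stop"], ["wave", "swipe", "push"]))
```

Accuracy only asks whether a pre-segmented clip received the right label, whereas the Jaccard index also penalizes predictions that start or end at the wrong frame, which is why it suits continuous recordings.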

Fusion level | Literature | Strategy | Dataset | Accuracy / Jaccard index (%) |
---|---|---|---|---|
Data-level | [54] | Motion fused | 20BN Jester dataset V1 | 96.28 |
Data-level | [54] | Motion fused | ChaLearn LAP IsoGD | 57.40 |
Feature-level | [57] | AvgFusion | SKIG | 99.5 |
Feature-level | [57] | AvgFusion | ChaLearn LAP IsoGD | 58.65 |
Feature-level | [58] | Concatenation | ChaLearn LAP IsoGD | 64.40 |
Feature-level | [59] | CCA | ChaLearn LAP IsoGD | 68.14 |
Feature-level | [60] | Concatenation | ChaLearn LAP ConGD | 26.55 |
Feature-level | [61] | Concatenation | Montalbano | 88.90 |
Decision-level | [62] | Sparse fusion | ChaLearn LAP IsoGD | 80.96 |
Decision-level | [63] | ModDrop fusion | Montalbano | 85.00 |

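The three fusion levels in the table differ in where the modalities are merged: on the raw input, on intermediate features, or on the final class scores. The sketch below uses plain Python lists as stand-ins for real tensors. The function names, toy vectors, and equal default weights are assumptions for illustration and do not reproduce any of the cited methods.

```python
def data_level_fusion(rgb_frame, flow_frame):
    """Data level: append the channels of a second modality to each pixel
    before any network processes the input."""
    if len(rgb_frame) != len(flow_frame):
        raise ValueError("both frames must have the same number of pixels")
    return [rgb_px + flow_px for rgb_px, flow_px in zip(rgb_frame, flow_frame)]


def feature_concat(feat_a, feat_b):
    """Feature level (concatenation): join feature vectors from two networks."""
    return feat_a + feat_b


def feature_average(feat_a, feat_b):
    """Feature level (averaging): element-wise mean of equally sized vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return [(a + b) / 2 for a, b in zip(feat_a, feat_b)]


def decision_level_fusion(scores_per_modality, weights=None):
    """Decision level: weighted average of per-class scores, then argmax."""
    n_modalities = len(scores_per_modality)
    if weights is None:
        weights = [1.0 / n_modalities] * n_modalities  # assumption: equal weights
    n_classes = len(scores_per_modality[0])
    fused = [
        sum(w * scores[c] for w, scores in zip(weights, scores_per_modality))
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=fused.__getitem__)
    return best_class, fused


# One RGB pixel (3 channels) fused with one optical-flow pixel (2 channels).
print(data_level_fusion([[1, 2, 3]], [[4, 5]]))

# Class scores from an RGB network and a depth network that disagree.
rgb_scores = [0.6, 0.3, 0.1]
depth_scores = [0.2, 0.7, 0.1]
label, fused_scores = decision_level_fusion([rgb_scores, depth_scores])
print(label, fused_scores)
```

Data-level fusion needs only one network but requires pixel-aligned inputs, whereas decision-level fusion keeps the per-modality networks independent, so a missing modality can simply be dropped from the average.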

Literature | Strategy | Dataset | Accuracy / Jaccard index (%) |
---|---|---|---|
[85] | CNN | Montalbano | 78.90 |
[92] | DNN+DCNN | Montalbano | 81.62 |
[86] | Two-stream+RNN | Montalbano | 91.70 |
[34] | Two-stream | 20BN Jester dataset V1 | 96.28 |
[34] | Two-stream | ChaLearn LAP IsoGD | 57.40 |
[35] | C3D | ChaLearn LAP IsoGD | 49.20 |
[40] | C3D+Pyramid | ChaLearn LAP IsoGD | 50.93 |
[38] | ResC3D | ChaLearn LAP IsoGD | 67.71 |
[90] | RNN | Montalbano | 90.6 |
[39] | ResC3D+Attention | SKIG | 100.0 |
[39] | ResC3D+Attention | ChaLearn LAP IsoGD | 68.14 |
[36] | R3DCNN+RNN | SKIG | 98.60 |
[36] | R3DCNN+RNN | Montalbano | 97.40 |
[33] | C3D+LSTM | SKIG | 98.89 |
[33] | C3D+LSTM | ChaLearn LAP IsoGD | 51.02 |
[37] | C3D+LSTM | SKIG | 98.89 |
[37] | C3D+LSTM | ChaLearn LAP IsoGD | 51.02 |

References
Card S K, Moran T P, Newell A. The psychology of human-computer interaction. Hillsdale, New Jersey, Lawrence Erlbaum Associates, 1983
Pollick A S, de Waal F B M. Ape gestures and language evolution. PNAS, 2007, 104(19): 8184–8189 DOI:10.1073/pnas.0702624104
Chen L, Ma N, Wang P, Li J H, Wang P F, Pang G L, Shi X J. Survey of pedestrian action recognition techniques for autonomous driving. Tsinghua Science and Technology, 2020, 25(4): 458–470 DOI:10.26599/tst.2019.9010018
D'Sa A G, Prasad B G. A survey on vision based activity recognition, its applications and challenges. In: 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP). Gangtok, India, IEEE, 2019, 1–8 DOI:10.1109/icaccp.2019.8882896
Devi M, Saharia S, Bhattacharyya D K. Dance gesture recognition: a survey. International Journal of Computer Applications, 2015, 122(5): 19–26 DOI:10.5120/21696-4803
Wang Z J, Hou Y S, Jiang K K, Dou W W, Zhang C M, Huang Z H, Guo Y J. Hand gesture recognition based on active ultrasonic sensing of smartphone: a survey. IEEE Access, 2019, 7: 111897–111922 DOI:10.1109/access.2019.2933987
Xia Z W, Lei Q J, Yang Y, Zhang H D, He Y, Wang W J, Huang M H. Vision-based hand gesture recognition for human-robot collaboration: a survey. In: 2019 5th International Conference on Control, Automation and Robotics (ICCAR). Beijing, China, IEEE, 2019, 198–205 DOI:10.1109/iccar.2019.8813509
Martínez B M, Modolo D, Xiong Y J, Tighe J. Action recognition with spatial-temporal discriminative filter banks. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South), IEEE, 2019, 5481–5490 DOI:10.1109/iccv.2019.00558
Diba A, Sharma V, Van Gool L, Stiefelhagen R. DynamoNet: dynamic action and motion network. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South), IEEE, 2019, 6191–6200 DOI:10.1109/iccv.2019.00629
Feichtenhofer C, Fan H Q, Malik J, He K M. SlowFast networks for video recognition. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South), IEEE, 2019, 6201–6210 DOI:10.1109/iccv.2019.00630
Bhowmick S, Talukdar A K, Sarma K K. Continuous hand gesture recognition for English alphabets. In: 2015 2nd International Conference on Signal Processing and Integrated Networks (SPIN). Noida, India, IEEE, 2015, 443–446 DOI:10.1109/spin.2015.7095264
Lu Z Y, Chen X, Li Q, Zhang X, Zhou P. A hand gesture recognition framework and wearable gesture-based interaction prototype for mobile devices. IEEE Transactions on Human-Machine Systems, 2014, 44(2): 293–299 DOI:10.1109/thms.2014.2302794
Zhang Y, Harrison C. Tomo: wearable, low-cost electrical impedance tomography for hand gesture recognition. UIST'15: Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology. 2015, 167–173 DOI:10.1145/2807442.2807480
Bobick A F, Davis J W. The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(3): 257–267 DOI:10.1109/34.910878
Konečný J, Hagara M. One-shot-learning gesture recognition using HOG-HOF features. In: Gesture Recognition. Cham: Springer International Publishing. 2017, 365–385 DOI:10.1007/978-3-319-57021-1_12
Donahue J, Hendricks L A, Guadarrama S, Rohrbach M, Venugopalan S, Darrell T, Saenko K. Long-term recurrent convolutional networks for visual recognition and description. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA, IEEE, 2015, 2625–2634 DOI:10.1109/cvpr.2015.7298878
Huang Y J, Yang X C, Li Y F, Zhou D L, He K S, Liu H H. Ultrasound-based sensing models for finger motion classification. IEEE Journal of Biomedical and Health Informatics, 2018, 22(5): 1395–1405 DOI:10.1109/jbhi.2017.2766249
Yang X C, Sun X L, Zhou D L, Li Y F, Liu H H. Towards wearable A-mode ultrasound sensing for real-time finger motion recognition. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2018, 26(6): 1199–1208 DOI:10.1109/tnsre.2018.2829913
Manawadu U E, Kamezaki M, Ishikawa M, Kawano T, Sugano S. A hand gesture based driver-vehicle interface to control lateral and longitudinal motions of an autonomous vehicle. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Budapest, Hungary, IEEE, 2016, 001785–001790 DOI:10.1109/smc.2016.7844497
Kim J, Jung H, Kang M, Chung K. 3D human-gesture interface for fighting games using motion recognition sensor. Wireless Personal Communications, 2016, 89(3): 927–940 DOI:10.1007/s11277-016-3294-9
Yuan X, Dai S, Fang Y Y. A natural immersive closed-loop interaction method for human-robot “rock-paper-scissors” game. Recent Trends in Intelligent Computing, Communication and Devices, 2020, 103–111 DOI:10.1007/978-981-13-9406-5_14
Lichtenauer J F, Hendriks E A, Reinders M J T. Sign language recognition by combining statistical DTW and independent classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(11): 2040–2046 DOI:10.1109/tpami.2008.123
Cooper H, Ong E, Pugeault N, Bowden R. Sign language recognition using sub-units. Journal of Machine Learning Research, 2012, 2205–2231
Yang D, Lim J K, Choi Y. Early childhood education by hand gesture recognition using a smartphone based robot. In: The 23rd IEEE International Symposium on Robot and Human Interactive Communication. Edinburgh, UK, IEEE, 2014, 987–992 DOI:10.1109/roman.2014.6926381
Ismail Fawaz H, Forestier G, Weber J, Petitjean F, Idoumghar L, Muller P A. Automatic alignment of surgical videos using kinematic data. In: Artificial Intelligence in Medicine. Cham: Springer International Publishing, 2019, 104–113 DOI:10.1007/978-3-030-21642-9_14
Lu X Z, Shen J, Perugini S, Yang J J. An immersive telepresence system using RGB-D sensors and head mounted display. In: 2015 IEEE International Symposium on Multimedia (ISM). Miami, FL, USA, IEEE, 2015, 453–458 DOI:10.1109/ism.2015.108
Cheng K, Ye N, Malekian R, Wang R C. In-air gesture interaction: real time hand posture recognition using passive RFID tags. IEEE Access, 2019, 7: 94460–94472 DOI:10.1109/access.2019.2928318
Trong K N, Bui H, Pham C. Recognizing hand gestures for controlling home appliances with mobile sensors. In: 2019 11th International Conference on Knowledge and Systems Engineering (KSE). Da Nang, Vietnam, IEEE, 2019, 1–7 DOI:10.1109/kse.2019.8919419
Escalera S, Athitsos V, Guyon I. Challenges in multimodal gesture recognition. Gesture recognition, 2017, 1–60
D’Orazio T, Marani R, Renò V, Cicirelli G. Recent trends in gesture recognition: how depth data has improved classical approaches. Image and Vision Computing, 2016, 52: 56–72 DOI:10.1016/j.imavis.2016.05.007
Nyaga C, Wario R. A review of sign language hand gesture recognition algorithms. In: Advances in Artificial Intelligence, Software and Systems Engineering. Cham, Springer International Publishing, 2021, 207–216
Rautaray S S, Agrawal A. Vision based hand gesture recognition for human computer interaction: a survey. Artificial Intelligence Review, 2015, 43(1): 1–54 DOI:10.1007/s10462-012-9356-9
Cheng H, Yang L, Liu Z C. Survey on 3D hand gesture recognition. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26(9): 1659–1673 DOI:10.1109/tcsvt.2015.2469551
Khan R Z, Ibraheem N A. Survey on gesture recognition for hand image postures. Computer and Information Science, 2012, 5(3): 110–121 DOI:10.5539/cis.v5n3p110
Devi M, Saharia S, Bhattacharyya D K. Dance gesture recognition: a survey. International Journal of Computer Applications, 2015, 122(5): 19–26 DOI:10.5120/21696-4803
Gao Z M, Wang P C, Wang H G, Xu M L, Li W Q. A review of dynamic maps for 3D human motion recognition using ConvNets and its improvement. Neural Processing Letters, 2020, 52(2): 1501–1515 DOI:10.1007/s11063-020-10320-w
Sun J H, Ji T T, Zhang S B, Yang J K, Ji G R. Research on the hand gesture recognition based on deep learning. In: 2018 12th International Symposium on Antennas, Propagation and EM Theory (ISAPE). Hangzhou, China, IEEE, 2018, 1–4 DOI:10.1109/isape.2018.8634348
Jiang D, Li G F, Sun Y, Kong J Y, Tao B, Chen D S. Grip strength forecast and rehabilitative guidance based on adaptive neural fuzzy inference system using sEMG. Personal and Ubiquitous Computing, 2019, 1–10 DOI:10.1007/s00779-019-01268-3
Guo X, Xu W, Tang W Q, Wen C. Research on optimization of static gesture recognition based on convolution neural network. In: 2019 4th International Conference on Mechanical, Control and Computer Engineering (ICMCCE). Hohhot, China, IEEE, 2019, 398–3982 DOI:10.1109/icmcce48743.2019.00095
Sharma P, Anand R S. Depth data and fusion of feature descriptors for static gesture recognition. IET Image Processing, 2020, 14(5): 909–920 DOI:10.1049/iet-ipr.2019.0230
Jiang D, Li G F, Sun Y, Kong J Y, Tao B. Gesture recognition based on skeletonization algorithm and CNN with ASL database. Multimedia Tools and Applications, 2019, 78(21): 29953–29970 DOI:10.1007/s11042-018-6748-0
Lai K, Yanushkevich S N. CNN+RNN depth and skeleton based dynamic hand gesture recognition. In: 2018 24th International Conference on Pattern Recognition (ICPR). Beijing, China, IEEE, 2018, 3451–3456 DOI:10.1109/icpr.2018.8545718
Kajan S, Goga J, Zsíros O. Comparison of algorithms for dynamic hand gesture recognition. In: 2020 Cybernetics & Informatics (K&I). Velke Karlovice, Czech Republic, IEEE, 2020, 1–5 DOI:10.1109/ki48306.2020.9039850
Li G F, Wu H, Jiang G Z, Xu S, Liu H H. Dynamic gesture recognition in the Internet of Things. IEEE Access, 2018, 7: 23713–23724 DOI:10.1109/access.2018.2887223
Materzynska J, Berger G, Bax I, Memisevic R. The jester dataset: a large-scale video dataset of human gestures. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South), IEEE, 2019, 2874–2882 DOI:10.1109/iccvw.2019.00349
Escalera S, Baró X, Gonzàlez J, Bautista M A, Madadi M, Reyes M, Ponce-López V, Escalante H J, Shotton J, Guyon I. ChaLearn looking at people challenge 2014: dataset and results. In: Computer Vision-ECCV 2014 Workshops. Cham, Springer International Publishing, 2015, 45–47
Wan J, Li S Z, Zhao Y B, Zhou S, Guyon I, Escalera S. ChaLearn looking at people RGB-D isolated and continuous datasets for gesture recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Las Vegas, NV, USA, IEEE, 2016, 761–769 DOI:10.1109/cvprw.2016.100
Amir A, Taba B, Berg D, Melano T, McKinstry J, Di Nolfo C, Nayak T, Andreopoulos A, Garreau G, Mendoza M, Kusnitz J, Debole M, Esser S, Delbruck T, Flickner M, Modha D. A low power, fully event-based gesture recognition system. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, IEEE, 2017, 7388–7397 DOI:10.1109/cvpr.2017.781
Liu L, Shao L. Learning discriminative representations from RGB-D video data. In: Proceedings of the Twenty-Third international joint conference on Artificial Intelligence. Beijing, China, AAAI Press, 2013, 1493–1500
Zhang Y F, Cao C Q, Cheng J, Lu H Q. EgoGesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Transactions on Multimedia, 2018, 20(5): 1038–1050 DOI:10.1109/tmm.2018.2808769
Jiang D, Zheng Z J, Li G F, Sun Y, Kong J Y, Jiang G Z, Xiong H G, Tao B, Xu S, Yu H, Liu H H, Ju Z J. Gesture recognition based on binocular vision. Cluster Computing, 2019, 22(6): 13261–13271 DOI:10.1007/s10586-018-1844-5
Wang H G, Wang P C, Song Z J, Li W Q. Large-scale multimodal gesture recognition using heterogeneous networks. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Venice, Italy, IEEE, 2017, 3129–3137 DOI:10.1109/iccvw.2017.370
Zhu G M, Zhang L, Shen P Y, Song J. Multimodal gesture recognition using 3D convolution and convolutional LSTM. IEEE Access, 2017, 5: 4517–4524 DOI:10.1109/access.2017.2684186
Köpüklü O, Köse N, Rigoll G. Motion fused frames: data level fusion strategy for hand gesture recognition. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Salt Lake City, UT, USA, IEEE, 2018, 2184–21848 DOI:10.1109/cvprw.2018.00284
Li Y N, Miao Q G, Tian K, Fan Y Y, Xu X, Li R, Song J F. Large-scale gesture recognition with a fusion of RGB-D data based on the C3D model. In: 2016 23rd International Conference on Pattern Recognition (ICPR). Cancun, Mexico, IEEE, 2016, 25–30 DOI:10.1109/icpr.2016.7899602
Molchanov P, Yang X D, Gupta S, Kim K, Tyree S, Kautz J. Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA, IEEE, 2016, 4207–4215 DOI:10.1109/cvpr.2016.456
Zhang L, Zhu G M, Shen P Y, Song J, Shah S A, Bennamoun M. Learning spatiotemporal features using 3DCNN and convolutional LSTM for gesture recognition. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Venice, Italy, IEEE, 2017, 3120–3128 DOI:10.1109/iccvw.2017.369
Miao Q G, Li Y N, Ouyang W L, Ma Z X, Xu X, Shi W K, Cao X C. Multimodal gesture recognition based on the ResC3D network. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Venice, Italy, IEEE, 2017, 3047–3055 DOI:10.1109/iccvw.2017.360
Li Y N, Miao Q G, Qi X D, Ma Z X, Ouyang W L. A spatiotemporal attention-based ResC3D model for large-scale gesture recognition. Machine Vision and Applications, 2019, 30(5): 875–888 DOI:10.1007/s00138-018-0996-x
Chai X J, Liu Z P, Yin F, Liu Z, Chen X L. Two streams recurrent neural networks for large-scale continuous gesture recognition. In: 2016 23rd International Conference on Pattern Recognition (ICPR). Cancun, Mexico, IEEE, 2016, 31–36 DOI:10.1109/icpr.2016.7899603
Sydney, Australia, New York, ACM Press, 2013 DOI:10.1145/2522848.2532589
Narayana P, Beveridge J R, Draper B A. Gesture recognition: focus on the hands. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, IEEE, 2018, 5235–5244 DOI:10.1109/cvpr.2018.00549
Neverova N, Wolf C, Taylor G, Nebout F. ModDrop: adaptive multi-modal gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(8): 1692–1706 DOI:10.1109/tpami.2015.2461544
Zhu G M, Zhang L, Mei L, Shao J, Song J, Shen P Y. Large-scale isolated gesture recognition using pyramidal 3D convolutional networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR). Cancun, Mexico, IEEE, 2016, 19–24 DOI:10.1109/icpr.2016.7899601
Wan J, Li S Z, Zhao Y B, Zhou S, Guyon I, Escalera S. ChaLearn looking at people RGB-D isolated and continuous datasets for gesture recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Las Vegas, NV, USA, IEEE, 2016, 761–769 DOI:10.1109/cvprw.2016.100
Wang P C, Li W Q, Liu S, Gao Z M, Tang C, Ogunbona P. Large-scale isolated gesture recognition using convolutional neural networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR). Cancun, Mexico, IEEE, 2016, 7–12 DOI:10.1109/icpr.2016.7899599
Zhan F. Hand gesture recognition with convolution neural networks. In: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). Los Angeles, CA, USA, IEEE, 2019, 295–298 DOI:10.1109/iri.2019.00054
Du T, Ren X M, Li H C. Gesture recognition method based on deep learning. In: 2018 33rd Youth Academic Annual Conference of Chinese Association of Automation (YAC). Nanjing, China, IEEE, 2018, 782–787 DOI:10.1109/yac.2018.8406477
Hong J Y, Park S H, Baek J G. Segmented dynamic time warping based signal pattern classification. In: 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). New York, NY, USA, IEEE, 2019, 263–265 DOI:10.1109/cse/euc.2019.00058
Plouffe G, Cretu A M. Static and dynamic hand gesture recognition in depth data using dynamic time warping. IEEE Transactions on Instrumentation and Measurement, 2016, 65(2): 305–316 DOI:10.1109/tim.2015.2498560
Fine S, Singer Y, Tishby N. The hierarchical hidden Markov model: analysis and applications. Machine Learning, 1998, 32(1): 41–62 DOI:10.1023/a:1007469218079
Haid M, Budaker B, Geiger M, Husfeldt D, Hartmann M, Berezowski N. Inertial-based gesture recognition for artificial intelligent cockpit control using hidden Markov models. In: 2019 IEEE International Conference on Consumer Electronics (ICCE). Las Vegas, NV, USA, IEEE, 2019, 1–4 DOI:10.1109/icce.2019.8662036
Corradini A. Dynamic time warping for off-line recognition of a small gesture vocabulary. In: Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. Vancouver, BC, Canada, IEEE, 2001, 82–89 DOI:10.1109/ratfg.2001.938914
Saha S, Lahiri R, Konar A, Banerjee B, Nagar A K. HMM-based gesture recognition system using kinect sensor for improvised human-computer interaction. In: 2017 International Joint Conference on Neural Networks (IJCNN). Anchorage, AK, USA, IEEE, 2017, 2776–2783 DOI:10.1109/ijcnn.2017.7966198
Yang Z, Li Y, Chen W D, Zheng Y. Dynamic hand gesture recognition using hidden Markov models. In: 2012 7th International Conference on Computer Science & Education (ICCSE). Melbourne, VIC, Australia, IEEE, 2012, 360–365 DOI:10.1109/iccse.2012.6295092
Murphy K P. Machine learning: a probabilistic perspective. MIT Press, 2012
Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos. 2014
Wang L M, Xiong Y J, Wang Z, Qiao Y, Lin D H, Tang X O, van Gool L. Temporal segment networks: towards good practices for deep action recognition. In: Computer Vision–ECCV 2016. Cham: Springer International Publishing, 2016, 20–36 DOI:10.1007/978-3-319-46484-8_2
Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA, IEEE, 2016, 1933–1941 DOI:10.1109/cvpr.2016.213
Zhu Y, Lan Z Z, Newsam S, Hauptmann A. Hidden two-stream convolutional networks for action recognition. In: Computer Vision–ACCV 2018. Cham: Springer International Publishing, 2019, 363–378 DOI:10.1007/978-3-030-20893-6_23
Wu D, Pigou L, Kindermans P J, Le N D H, Shao L, Dambre J, Odobez J M. Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(8): 1583–1597 DOI:10.1109/tpami.2016.2537340
Xu P. A real-time hand gesture recognition and human-computer interaction system. 2017
Pigou L, Dieleman S, Kindermans P J, Schrauwen B. Sign language recognition using convolutional neural networks. In: Computer Vision-ECCV 2014 Workshops. Cham: Springer International Publishing, 2015, 572–578 DOI:10.1007/978-3-319-16178-5_40
Soomro K, Zamir A R, Shah M. UCF101: a dataset of 101 human actions classes from videos in the wild. 2012
Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T. HMDB: a large video database for human motion recognition. In: 2011 International Conference on Computer Vision. Barcelona, Spain, IEEE, 2011, 2556–2563 DOI:10.1109/iccv.2011.6126543
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA, IEEE, 2016, 2818–2826 DOI:10.1109/cvpr.2016.308
Wang H S, Wang L. Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, IEEE, 2017, 3633–3642 DOI:10.1109/cvpr.2017.387
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning spatiotemporal features with 3D convolutional networks. In: 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile, IEEE, 2015, 4489–4497 DOI:10.1109/iccv.2015.510
Tran D, Ray J, Shou Z, Chang S F, Paluri M. ConvNet architecture search for spatiotemporal feature learning. 2017
Pigou L, Oord A, Dieleman S, Herreweghe M, Dambre J. Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. International Journal of Computer Vision, 2018, 126(2/3/4): 430–439 DOI:10.1007/s11263-016-0957-7
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735–1780 DOI:10.1162/neco.1997.9.8.1735
Zhao S Y, Yang W K, Wang Y G. A new hand segmentation method based on fully convolutional network. In: 2018 Chinese Control And Decision Conference (CCDC). Shenyang, China, IEEE, 2018, 5966–5970 DOI:10.1109/ccdc.2018.8408176