
2019, 1(1): 21-38 Published Date: 2019-02-20

DOI: 10.3724/SP.J.2096-5796.2018.0010

Data fusion methods in multimodal human computer dialog



Abstract:

In multimodal human-computer dialog, non-verbal channels such as facial expression, posture, and gesture, combined with spoken information, also play an important role in the dialog process. Although single-channel analysis of user behavior now achieves high performance, accurately understanding users' intentions from their multimodal behaviors remains a great challenge. One reason is that multimodal information fusion still needs improvement in theory, methodology, and practical systems. This paper reviews data fusion methods in multimodal human-computer dialog. We first introduce the cognitive assumptions underlying single-channel processing and then discuss how they are implemented in human-computer dialog; for the task of multimodal information fusion, several computational models are presented after an introduction to the basic principles of multimodal data fusion. Finally, practical examples of multimodal information fusion methods are introduced, and possible important breakthroughs for data fusion methods in future multimodal human-computer interaction applications are discussed.
Keywords: Intention understanding; Multimodal human-computer dialog
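To make the fusion distinction discussed in the abstract concrete, the sketch below contrasts the two strategies most commonly named in this literature: feature-level (early) fusion, which concatenates per-modality features before classification, and decision-level (late) fusion, which combines per-modality posteriors afterward. This is an illustrative sketch, not code from the paper; the function names, the example intent posteriors, and the fixed weights are hypothetical.

```python
# Illustrative sketch (hypothetical names/values): early vs. late fusion
# of two modalities, e.g. speech and gesture, for intent classification.

def early_fusion(speech_feats, gesture_feats):
    """Feature-level (early) fusion: concatenate per-modality feature
    vectors into one vector, to be fed to a single classifier."""
    return speech_feats + gesture_feats  # list concatenation

def late_fusion(speech_probs, gesture_probs, w_speech=0.6, w_gesture=0.4):
    """Decision-level (late) fusion: weighted average of per-modality
    class posteriors; weights are assumed fixed here."""
    assert abs(w_speech + w_gesture - 1.0) < 1e-9
    return [w_speech * s + w_gesture * g
            for s, g in zip(speech_probs, gesture_probs)]

# Example: posteriors over three candidate user intents.
speech = [0.7, 0.2, 0.1]   # from the speech channel
gesture = [0.3, 0.5, 0.2]  # from the gesture channel
fused = late_fusion(speech, gesture)        # [0.54, 0.32, 0.14]
best_intent = max(range(len(fused)), key=fused.__getitem__)  # index 0
```

Early fusion lets a classifier model cross-modal correlations directly but requires temporally aligned features; late fusion is simpler and tolerates asynchronous channels, at the cost of ignoring those correlations.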

Cite this article:

Ming-Hao YANG, Jian-Hua TAO. Data fusion methods in multimodal human computer dialog. Virtual Reality & Intelligent Hardware, 2019, 1(1): 21-38 DOI:10.3724/SP.J.2096-5796.2018.0010

