Title: Single-Channel Mixed Speech Recognition Using Deep Neural Networks
Speaker: Dr. Dong Yu
Time: 13:30-14:30, Nov. 10, 2014
Address: Conference Room 215, Department of Information and Electronic Engineering
Abstract:
While significant progress has been made in improving the noise robustness of speech recognition systems, recognizing speech in the presence of a competing talker remains one of the most challenging unsolved problems in the field. In this talk, I will present our first attempt at attacking this problem using deep neural networks (DNNs). Our approach adopts a multi-style training strategy using artificially mixed speech data. I will discuss the strengths and weaknesses of the several setups that we have investigated, including a WFST-based two-talker decoder that works with the trained DNNs. Experiments on the 2006 speech separation and recognition challenge task demonstrate that the proposed DNN-based system is remarkably robust to the interference of a competing speaker. The best setup of our proposed systems achieves an overall WER of 18.8%, which improves upon the results obtained by the state-of-the-art IBM superhuman system by 2.8% absolute, with fewer assumptions.
Biography:
Dr. Dong Yu is a principal researcher at Microsoft Research. His research interests include speech processing, robust speech recognition, discriminative training, and machine learning. He has published over 140 papers in these areas and is the inventor/coinventor of more than 50 granted/pending patents. His work on the context-dependent deep neural network hidden Markov model (CD-DNN-HMM) has helped shape the new direction of large vocabulary speech recognition research and was recognized with the IEEE SPS 2013 Best Paper Award. Most recently, he has focused on applying computational networks, a generalization of many neural network models, to speech recognition.
Dr. Dong Yu is currently serving as a member of the IEEE Speech and Language Processing Technical Committee (2013-) and an associate editor of IEEE Transactions on Audio, Speech, and Language Processing (2011-). He has served as an associate editor of IEEE Signal Processing Magazine (2008-2011) and as the lead guest editor of the IEEE Transactions on Audio, Speech, and Language Processing special issue on deep learning for speech and language processing (2010-2011).