Data Science Research Center Seminar | New challenges and recent progress on speech processing in a cocktail party | Yanmin Qian, associate professor, Department of Computer Science and Engineering, Shanghai Jiao Tong University
This talk will be in English.
This event is open to DKU community members only.
Registration deadline: Sept. 10, 12 p.m.
For questions: Ivy Xu, firstname.lastname@example.org
Abstract: Although intelligent speech processing has advanced greatly in research and is widely used in many real-life applications, a large performance gap remains between controlled environments and real-life scenarios. One of the core problems in real-world conditions is known as the cocktail party problem: a complicated scenario in which multiple talkers speak simultaneously in the presence of background noise and reverberation. Humans can easily attend to a target source of interest and recognize its speech in such conditions, but the mechanism behind this strong capability has not been well studied. Over the past few decades, researchers have tried to develop algorithms that allow machines to mimic this human capability in the cocktail party scenario, but performance is still far from satisfactory. In this talk, we will summarize recent progress and present our efforts on speech processing for the cocktail party problem, especially new techniques for speech separation and automatic speech recognition developed at Shanghai Jiao Tong University. Finally, we will discuss remaining challenges and potential directions for solving the cocktail party problem.
Bio: Yanmin Qian received his Ph.D. from the Department of Electronic Engineering, Tsinghua University, in 2012. He joined the Department of Computer Science and Engineering, Shanghai Jiao Tong University, in 2013. In 2015 and 2016, he also worked as an associate researcher in the Speech Group at the Cambridge University Engineering Department, where he was one of the key members who designed and implemented the Cambridge Multi-Genre Broadcast Speech Processing system, which won all four tasks of the first MGB Challenge in 2015. He is a senior member of IEEE, a member of ISCA, and one of the founding members of the Kaldi Speech Recognition Toolkit. He has published more than 140 papers on speech and language processing, with over 8,000 citations, in venues including T-ASLP, Speech Communication, ICASSP, INTERSPEECH, and ASRU. His current research interests include acoustic and language modeling for speech recognition, speaker and language recognition, speech separation and enhancement, natural language understanding, deep learning, and multimedia signal processing. Learn more at https://speechlab.sjtu.edu.cn/members/yanmin-qian