On-the-fly data loader with utterance-level aggregation for speaker, language recognition | Duke Kunshan University

Working with four different datasets, Ming Li and fellow researchers directly modelled utterance-level aggregation for end-to-end speaker and language recognition. Their data loader generates mini-batch samples on the fly, which allows batch-wise variable-length training and online data augmentation. IEEE/ACM Transactions on Audio, Speech, and Language Processing published their findings on the effectiveness of this training approach.
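
To make the idea concrete, the following is a minimal sketch of an on-the-fly loader, not the researchers' implementation. It assumes each utterance is a non-empty list of floats standing in for acoustic frames, that short utterances are tiled to reach the target length, and that augmentation is a random gain plus Gaussian noise. The names `random_crop`, `augment`, and `on_the_fly_batches` are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Utterance = List[float]  # stand-in for a sequence of acoustic frames


def random_crop(utt: Sequence[float], length: int, rng: random.Random) -> Utterance:
    """Return a random segment of `length` items from a non-empty utterance."""
    if len(utt) < length:
        # Assumed padding policy: tile short utterances until they are long enough.
        repeats = -(-length // len(utt))  # ceiling division
        utt = list(utt) * repeats
    start = rng.randint(0, len(utt) - length)
    return list(utt[start:start + length])


def augment(segment: Utterance, rng: random.Random, noise_std: float = 0.01) -> Utterance:
    """Toy online augmentation (an assumption): random gain plus Gaussian noise."""
    gain = rng.uniform(0.8, 1.2)
    return [gain * x + rng.gauss(0.0, noise_std) for x in segment]


def on_the_fly_batches(
    utterances: Sequence[Sequence[float]],
    labels: Sequence[int],
    batch_size: int,
    min_len: int,
    max_len: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[Tuple[List[Utterance], List[int]]]:
    """Yield mini-batches built at request time rather than precomputed on disk."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        # One length is drawn per batch, so segments within a batch match
        # while the length varies from batch to batch.
        length = rng.randint(min_len, max_len)
        idxs = [rng.randrange(len(utterances)) for _ in range(batch_size)]
        batch = [augment(random_crop(utterances[i], length, rng), rng) for i in idxs]
        yield batch, [labels[i] for i in idxs]


# Usage: three synthetic utterances of different durations.
demo_utts = [[float(i)] * n for i, n in enumerate([50, 120, 300])]
demo_labels = [0, 1, 2]
for segs, ys in on_the_fly_batches(demo_utts, demo_labels, 4, 100, 200, 3):
    print(len(segs), "segments of length", len(segs[0]), "labels", ys)
```

Because every segment is cropped and augmented at the moment the batch is requested, the same utterance can contribute a different segment each time it is sampled. In a real system the lists would be replaced by feature tensors and the loop by a framework's data-loading utilities.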