Program Overview
Date and time: Saturday, January 23, 2021, 12:00 PM UTC
12:00 Welcome
12:05 Oral session 1
13:30 Break
13:40 Keynote 1
14:40 Break
14:50 Oral session 2
16:10 Break
16:20 Keynote 2
17:20 Break
17:30 Oral session 3
18:30 Closing
Welcome
Oral Session 1
Oral Session 2
Oral Session 3
Keynote 1
Abstract
Speaker diarization based on a Bayesian HMM (BHMM) and variational Bayes inference was part of the winning system in the two previous DIHARD challenges. In the first DIHARD challenge, the winning system used BHMM diarization with eigenvoice priors, operating on frame-by-frame cepstral features and incorporating an i-vector-like model to robustly model speaker distributions. In the second DIHARD challenge, the winning system was based on VBx: a BHMM used directly to cluster x-vectors, with PLDA modeling the speaker distributions. In this talk we present these models from a historical perspective. We show how the models evolved from, and how they compare to, earlier approaches such as i-vector clustering and diarization using factor analysis. Unlike the emerging end-to-end approaches, BHMM diarization does not handle overlapped speech. Despite this drawback, we show that our method also provides state-of-the-art performance on other datasets such as CALLHOME and AMI. Since we have released our recipes to train and evaluate VBx on these datasets, we believe this method can serve as a strong baseline for future diarization research.
Bio
Dr. Mireia Diez Sánchez is a researcher in the speech@FIT group at Brno University of Technology. Mireia received her Electronic Engineering degree in 2009 and her PhD in 2015, both from the University of the Basque Country, Spain. Her thesis focused on features for language and speaker recognition. In 2016 she obtained an individual Marie Curie fellowship for the SpeakerDICE project, which deals with diarization tasks. She has attended several international workshops on speaker recognition and diarization: Bosaris (Brno, 2012), ASRWIS (South Africa, 2016), and SCALE (Baltimore, 2017). Recently, she has successfully coordinated the BUT team in the DIHARD challenges. Her research interests are mainly speaker diarization, speaker and language recognition, and Bayesian inference.
Keynote 2
Abstract
Answering the question “Who spoke when?” in a recording has become crucial in speech technology research. The answer can improve the performance of downstream applications such as automatic speech recognition, speaker recognition, and meeting transcription. Diarization systems seek to answer this question. However, automatic speaker labeling is challenging, in part because of noisy conditions, speakers talking simultaneously, and imbalanced participation among speakers. Advances in diarization involve the pursuit of creative new solutions to these problems. In this presentation, we will walk down the yellow brick road of diarization, going briefly from state-of-the-art techniques up to novel neural approaches. We will discuss the challenges of the embedding-clustering method and how neural diarization aims to solve them by formulating diarization as a multi-label classification problem, and we will explore how neural diarization can elegantly handle overlapping speech. We will detail the benefits of both embedding-clustering and neural diarization and give some hints on how to attain the best of both. Finally, we will discuss the performance of these approaches in real-world conditions: day-long recordings featuring child-centered speech.
Bio
Dr. Leibny Paola Garcia Perera (PhD 2014, University of Zaragoza, Spain) joined Johns Hopkins University after extensive research experience in academia and industry, including at the highly regarded laboratories of Agnitio and Nuance Communications. She led a team of 20+ researchers from four of the best laboratories worldwide in far-field speech diarization and speaker recognition, under the auspices of the 2019 JHU summer workshop in Montreal, Canada. She was also a researcher at Tec de Monterrey, Campus Monterrey, Mexico, for 10 years. She was a Marie Curie researcher for the Iris project during 2015, exploring assistive technology for children with autism in Zaragoza, Spain. She was a visiting scholar at the Georgia Institute of Technology (2009) and Carnegie Mellon University (2011). Recently, she has been working on children’s speech, including child speech recognition and diarization in day-long recordings. She is also part of the JHU CHiME-5, CHiME-6, SRE18, and SRE19 teams. Her interests include diarization, speech recognition, speaker recognition, machine learning, and language processing.