Program Overview

Date/Time: Saturday, January 23, 2021, 12:00 UTC

12:00 Welcome
12:05 Oral session 1
13:30 Break
13:40 Keynote 1
14:40 Break
14:50 Oral session 2
16:10 Break
16:20 Keynote 2
17:20 Break
17:30 Oral session 3
18:30 Closing

Welcome

12:00   Welcome to the Third DIHARD Speech Diarization Challenge Workshop

Oral Session 1

12:05   Overview of the Third DIHARD Speech Diarization Challenge [Slides] [Video]
Neville Ryant1, Kenneth Church2, Christopher Cieri1, Jun Du3, Sriram Ganapathy4, Mark Liberman1
(1University of Pennsylvania; 2Baidu Research; 3University of Science and Technology of China; 4Indian Institute of Science)

12:30   NAVER Clova Submission to the Third DIHARD Challenge [Slides] [Video]
Heesoo Heo1, Jee-weon Jung1, Youngki Kwon1, You Jin Kim1, Jaesung Huh2, Joon Son Chung1, Bong-Jin Lee1 (1Naver Corporation; 2University of Oxford)

12:50   End-to-End Speaker Diarization System for the Third DIHARD Challenge [Slides] [Video]
Tsun Yat Leung, Lahiru T Samarakoon (Fano Labs)

13:10   Hitachi-JHU System for the Third DIHARD Speech Diarization Challenge [Slides] [Video]
Shota Horiguchi1, Nelson Yalta1, Paola Garcia2, Yuki Takashima1, Yawen Xue1, Desh Raj2, Zili Huang2, Yusuke Fujita1, Shinji Watanabe2, Sanjeev Khudanpur2 (1Hitachi, Ltd.; 2Johns Hopkins University)

Oral Session 2

14:50   System Description for Team DKU-Duke-Lenovo [Slides] [Video]
Weiqing Wang1, Qingjian Lin3, Danwei Cai1, Lin Yang3, Ming Li2 (1Duke University; 2Duke Kunshan University; 3Lenovo)

15:10   The USTC-NELSLIP Systems for DIHARD III Challenge [Slides] [Video]
Mao-Kui He1, Yuxuan Wang1, Shutong Niu1, Lei Sun1, Tian Gao2, Xin Fang2, Jia Pan2, Jun Du1, Chin-Hui Lee3 (1University of Science and Technology of China; 2iFlytek Research; 3Georgia Institute of Technology)

15:30   LEAP Submission for Third DIHARD Diarization Challenge [Slides] [Video]
Prachi Singh, Rajat Varma, Venkat Krishnamohan, Srikanth Raj Chetupalli, Sriram Ganapathy (Indian Institute of Science)

15:50   Domain-Dependent Speaker Diarization for the Third DIHARD Challenge [Slides] [Video]
A Kishore Kumar1, Shefali Waldekar1, Goutam Saha1, Md Sahidullah2 (1Indian Institute of Technology Kharagpur; 2Inria)

Oral Session 3

17:30   BUT System for The Third DIHARD Speech Diarization Challenge [Slides] [Video]
Federico Landini1, Alicia Lozano-Diez1, Lukáš Burget1, Mireia Diez1, Anna Silnova1, Kateřina Žmolíková1, Ondřej Glembek1, Pavel Matějka1, Themos Stafylakis2, Niko Brümmer2 (1Brno University of Technology; 2Omilia - Conversational Intelligence)

17:50   Diaboliic@DIHARD3 [Slides] [Video]
Wenda Chen1, Sangeeta Ghangam2 (1Intel Labs; 2Intel Corporation)

18:10   DIHARD 3 Diarization Challenge [Slides] [Video]
TaeJin Park, Raghuveer Peri, Arindam Jati, Shrikanth Narayanan (University of Southern California)

Keynote 1

Dr. Mireia Diez Sánchez
Brno University of Technology
[Slides]  [Video]

Variants of Bayesian HMMs for speaker diarization

Abstract
Speaker diarization based on a Bayesian HMM and variational Bayes inference was part of the winning system in the two previous DIHARD challenges. In the first DIHARD challenge, the winning system used Bayesian HMM diarization with eigenvoice priors, operating on frame-by-frame cepstral features and incorporating an i-vector-like model to robustly model speaker distributions. In the second DIHARD challenge, the winning system was based on VBx: a BHMM used directly to cluster x-vectors, with PLDA modeling the speaker distributions. In this talk we present these models from a historical perspective, showing how they evolved from and compare to earlier approaches such as clustering of i-vectors and diarization based on factor analysis. Unlike the emerging end-to-end approaches, BHMM diarization does not handle overlapped speech. Despite this drawback, we show that our method provides state-of-the-art performance on other datasets as well, such as CALLHOME and AMI. Since we have provided our recipes to train and evaluate VBx on these datasets, we believe this method can serve as a strong baseline for future diarization research.
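
The VBx approach described above refines an initial clustering of x-vectors with a Bayesian HMM and PLDA. As a rough illustration of that starting point only, the sketch below clusters segment-level x-vectors with average-linkage agglomerative clustering on cosine distances; the BHMM/PLDA refinement from the talk is not shown, and the function names, threshold, and dimensions are illustrative assumptions, not taken from the BUT recipe.

```python
# Minimal sketch: agglomerative clustering of x-vectors with cosine distance,
# the kind of first-pass clustering that VBx-style systems typically refine
# with a Bayesian HMM and PLDA. Thresholds and shapes are illustrative only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def ahc_cluster_xvectors(xvectors: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Cluster (N, D) x-vectors; returns an integer speaker label per segment."""
    # Pairwise cosine distances between segment-level embeddings.
    dists = pdist(xvectors, metric="cosine")
    # Average-linkage agglomerative hierarchical clustering.
    tree = linkage(dists, method="average")
    # Cut the dendrogram at a distance threshold (tuned on a dev set in practice).
    return fcluster(tree, t=threshold, criterion="distance")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic "speakers" as well-separated Gaussian clouds in embedding space.
    xvecs = np.vstack([rng.normal(0, 1, (20, 128)), rng.normal(5, 1, (20, 128))])
    print(ahc_cluster_xvectors(xvecs))
```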

Bio
Dr. Mireia Diez Sánchez is a researcher at the speech@FIT group at Brno University of Technology. Mireia received her Electronic Engineering degree in 2009, and her PhD in 2015, both from the University of the Basque Country, Spain. Her thesis focused on the study of features for Language and Speaker recognition. In 2016 she obtained an individual Marie Curie fellowship for the SpeakerDICE project dealing with diarization tasks. She has attended several international workshops dedicated to the field of speaker recognition and diarization: Bosaris (Brno, 2012), ASRWIS (South Africa, 2016) and SCALE (Baltimore, 2017). Recently, she has successfully coordinated the BUT team for the DIHARD challenges. Her research interests are mainly speaker diarization, speaker and language recognition and Bayesian inference.

Keynote 2

Dr. Leibny Paola Garcia Perera
Johns Hopkins University
[Slides]  [Video]

The yellow brick road of diarization, challenges and other neural paths

Abstract
The question “Who spoke when?” in a recording has become crucial in speech technology research. Answering it can improve the performance of downstream applications such as automatic speech recognition, speaker recognition, meeting transcription, and other related fields. Diarization systems seek to answer this question. However, automatic speaker labeling can be challenging, in part due to noisy conditions, speakers talking simultaneously, and imbalanced participation among speakers. Advances in diarization involve the pursuit of creative new solutions to these problems. In this presentation, we will walk down the yellow brick road of diarization, moving briefly from established state-of-the-art techniques to novel neural approaches. We will discuss the challenges of the embedding-clustering method and how neural diarization aims to solve them by recasting diarization as a multi-label classification problem, and we will explore how neural diarization can elegantly handle overlapping speech. We will detail the benefits of both embedding clustering and neural diarization and give some hints on how to attain the best of both. Finally, we will discuss the performance of these approaches in real conditions: day-long recordings featuring child-centered speech.
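
Since the abstract frames neural diarization as a multi-label classification problem that naturally covers overlapping speech, the sketch below illustrates that formulation: a recurrent encoder emits one sigmoid speech-activity output per speaker per frame, trained with a permutation-invariant binary cross-entropy. The model size, feature dimension, two-speaker cap, and loss are illustrative assumptions, not the configuration of any system presented at the workshop.

```python
# Minimal sketch of end-to-end neural diarization as per-frame multi-label
# classification: each output unit is one speaker's speech activity, so frames
# with several active units represent overlapping speech.
from itertools import permutations

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralDiarizerSketch(nn.Module):
    def __init__(self, feat_dim: int = 23, hidden: int = 128, max_speakers: int = 2):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, max_speakers)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim) -> (batch, frames, max_speakers)
        encoded, _ = self.encoder(feats)
        return torch.sigmoid(self.head(encoded))  # per-frame speaker activities

def permutation_invariant_bce(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Speaker labels have no fixed order, so take the best BCE over permutations.
    n_spk = pred.shape[-1]
    losses = [F.binary_cross_entropy(pred[..., list(p)], target)
              for p in permutations(range(n_spk))]
    return torch.stack(losses).min()

if __name__ == "__main__":
    model = NeuralDiarizerSketch()
    feats = torch.randn(4, 500, 23)                       # 4 utterances, 500 frames
    labels = torch.randint(0, 2, (4, 500, 2)).float()     # per-frame speaker activity
    loss = permutation_invariant_bce(model(feats), labels)
    loss.backward()
    print(float(loss))
```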

Bio
Dr. Leibny Paola Garcia Perera (PhD 2014, University of Zaragoza, Spain) joined Johns Hopkins University after extensive research experience in academia and industry, including highly regarded laboratories at Agnitio and Nuance Communications. She led a team of 20+ researchers from four of the best laboratories worldwide in far-field speech diarization and speaker recognition under the auspices of the JHU summer workshop 2019 in Montreal, Canada. She was also a researcher at Tec de Monterrey, Campus Monterrey, Mexico for 10 years. She was a Marie Curie researcher for the Iris project in 2015, exploring assistive technology for children with autism in Zaragoza, Spain. She was a visiting scholar at the Georgia Institute of Technology (2009) and Carnegie Mellon University (2011). Recently, she has been working on children’s speech, including child speech recognition and diarization in day-long recordings. She is also part of the JHU CHiME-5, CHiME-6, SRE18 and SRE19 teams. Her interests include diarization, speech recognition, speaker recognition, machine learning, and language processing.