The First DIHARD Speech Diarization Challenge
DIHARD is a new annual challenge focusing on “hard” diarization; that is, speech diarization for challenging corpora where the current state of the art is expected to fare poorly, including, but not limited to:
- clinical interviews
- extended child language acquisition recordings
- YouTube videos
- “speech in the wild” (e.g., recordings in restaurants)
Because the performance of a diarization system is highly dependent on the quality of the speech activity detection (SAD) system used, the challenge will have two tracks:
- Track 1: diarization beginning from gold speech segmentation
- Track 2: diarization from scratch
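To make the difference between the two tracks concrete, here is a minimal, hypothetical sketch in Python (assuming numpy and scikit-learn are available). Nothing in it is specified by the challenge: the per-segment features, the agglomerative clustering, and the energy-threshold SAD are illustrative stand-ins for the speaker embeddings and detectors a real system would use.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def segment_features(wav, sr, segments):
    # Crude per-segment features (mean log-energy, zero-crossing rate);
    # a real entry would use speaker embeddings such as i-vectors.
    feats = []
    for start, end in segments:
        s = wav[int(start * sr):int(end * sr)]
        log_energy = np.log(np.mean(s ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(s)))) / 2.0
        feats.append([log_energy, zcr])
    return np.array(feats)

def diarize_track1(wav, sr, gold_segments, n_speakers=2):
    # Track 1: gold speech segments are given, so the system only has to
    # assign a speaker label to each segment.
    feats = segment_features(wav, sr, gold_segments)
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(feats)
    return [(start, end, "SPK%d" % lab)
            for (start, end), lab in zip(gold_segments, labels)]

def naive_sad(wav, sr, frame_dur=0.5, energy_thresh=1e-4):
    # Toy energy-threshold SAD; a stand-in for a real speech detector.
    hop = int(frame_dur * sr)
    return [(i / sr, (i + hop) / sr)
            for i in range(0, len(wav) - hop + 1, hop)
            if np.mean(wav[i:i + hop] ** 2) > energy_thresh]

def diarize_track2(wav, sr, n_speakers=2):
    # Track 2: no segmentation is given, so the system must first find
    # the speech regions itself; SAD errors then propagate downstream.
    return diarize_track1(wav, sr, naive_sad(wav, sr), n_speakers)

In this framing, Track 1 isolates the speaker clustering problem, while Track 2 additionally measures how SAD errors propagate into the final diarization output, which is the dependence the two-track design is meant to expose.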
The results of this initial challenge will be presented at a special session at Interspeech 2018 in Hyderabad.
For questions not answered in this document or to join the DIHARD mailing list, please contact dihardchallenge@gmail.com
Important dates
- Registration period – January 30 through February 23, 2018
- Dev set release – February 1, 2018
- Eval set release – February 26, 2018
- Interspeech abstract submission – March 16, 2018
- Interspeech paper submission/final system outputs – March 23, 2018
- Final system descriptions – March 31, 2018
- Interspeech 2018 special session – September 2018
The deadline for submission of final system outputs coincides with the Interspeech paper submission deadline (midnight GMT, March 23, 2018).
Organizers
Kenneth Church
Kenneth Church has worked on many topics in computational linguistics including: web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography, compression, speech (recognition, synthesis & diarization), and OCR, as well as applications that go well beyond computational linguistics, such as revenue assurance and virtual integration (using screen scraping and web crawling to integrate systems that traditionally don’t talk together as well as they could, such as billing and customer care). He enjoys working with large corpora such as the Associated Press newswire (1 million words per week) and even larger datasets such as telephone call detail (1-10 billion records per month) and web logs. He earned his undergraduate and graduate degrees from MIT, and has worked at AT&T, Microsoft, Hopkins, and IBM. He was president of the ACL in 2012, and of SIGDAT (the group that organizes EMNLP) from 1993 until 2011. He became an AT&T Fellow in 2001.
Christopher Cieri
(Linguistic Data Consortium)
Christopher Cieri was trained as a linguist working principally on language contact and variation in phonology. More recently, he has worked at the intersection of corpus building, linguistic analysis, and human language technology development and evaluation. In 1998, he became Executive Director of the Linguistic Data Consortium (LDC), where he oversees LDC operations including the creation and distribution of hundreds of databases. His recent work focuses on social dimensions of linguistic variation, corpus building for clinical applications, and the science of human linguistic annotation, especially the impact of incentives and workflows.
Alejandrina Cristia
(Laboratoire de Sciences Cognitives et Psycholinguistique)
Alejandrina Cristia received her PhD in Linguistics from Purdue University and did post-doctoral work on neuroimaging at the Max Planck Institute for Psycholinguistics before joining the French CNRS as a Researcher in 2013. Her publications include 45 journal articles (e.g., Psychological Science; Child Development; average impact factor 2.8) and 16 conference proceedings articles (including a best short paper award from ACL 2017; Google Scholar h-index: 18). She is the French PI for the “Analyzing Children’s Language Environments across the World” project (sites.google.com/site/aclewdid), which is developing cross-linguistically valid annotations and automated analysis routines for daylong audio recordings gathered from young children.
Jun Du
(University of Science and Technology of China)
Jun Du received the B.Eng. and Ph.D. degrees from the Department of Electronic Engineering and Information Science, University of Science and Technology of China (USTC), Hefei, China, in 2004 and 2009, respectively. From July 2009 to June 2010, he worked with iFlytek Research. From July 2010 to January 2013, he was an associate researcher at Microsoft Research Asia. Since February 2013, he has been with USTC as an associate professor. His research interests include speech signal processing and pattern recognition. He has published more than 80 conference and journal papers with 1000+ citations on Google Scholar.
Sriram Ganapathy
(Indian Institute of Science)
Sriram Ganapathy is an Assistant Professor in the Electrical Engineering Department at the Indian Institute of Science, Bangalore, where he leads the Learning and Extraction of Acoustic Patterns (LEAP) laboratory. Before joining as a faculty member in early 2016, he spent 4 years as a Research Staff Member at the IBM T.J. Watson Research Center in Yorktown Heights, NY, USA. He obtained his PhD from the Center for Language and Speech Processing (CLSP), Johns Hopkins University, USA. Over the past 10 years, Dr. Ganapathy has published over 60 articles in leading international journals and conferences, along with a number of patents. He won the best tutorial speaker award at Interspeech 2014 and the Pratiksha Young Investigator award in 2017. He is a member of the International Speech Communication Association (ISCA) and a senior member of the IEEE Signal Processing Society. His research interests are in signal processing, machine learning, deep learning, and neuroscience, with applications to robust speech recognition, speech enhancement, and audio analytics including biometrics.
Mark Liberman
(Linguistic Data Consortium)
Mark Liberman trained as a phonetician and has worked in many areas including: corpus-based phonetics; speech and language technology; the phonology and phonetics of lexical tone, and its relationship to intonation; gestural, prosodic, morphological and syntactic ways of marking focus, and their use in discourse; formal models for linguistic annotation; information retrieval and information extraction from text. He was an undergraduate at Harvard and earned a PhD at MIT, before moving on to AT&T Bell Labs from 1975 to 1990. Since 1990 he has served as the Trustee Professor of Phonetics at the University of Pennsylvania. In 1992 he helped found the Linguistic Data Consortium (LDC), whose efforts have fueled the development and advancement of human language technology (HLT), including speech and speaker recognition, machine translation, and semantic analyses. Today, the LDC is the largest developer of shared language resources in the world, distributing more than 120,000 copies of over 2,000 databases covering 91 different languages to more than 3,600 organizations in over 70 countries.
Neville Ryant
(Linguistic Data Consortium)
Neville Ryant is a researcher at the Linguistic Data Consortium (LDC) at the University of Pennsylvania, where he has worked on many topics in speech recognition including: forced alignment, speech activity detection, large scale corpus linguistics, computational paralinguistics, and automated analysis of tone. He has also supported LDC’s annotation efforts through the development of new tools for named entity recognition, sentence segmentation, and comparable corpora construction for low resource languages. He did his undergraduate and graduate studies at the University of Pennsylvania, where he focused on formal semantics and the neural basis of natural language quantifiers.
If you are interested in participating or just want to learn more, please email dihardchallenge@gmail.com with the subject DIHARD, and you will be added to the mailing list for updates.