
The Second DIHARD Speech Diarization Challenge

DIHARD II is the second in a series of diarization challenges focusing on "hard" diarization; that is, speaker diarization for challenging recordings where there is an expectation that the current state-of-the-art will fare poorly. As with other evaluations in this series, DIHARD II is intended to both:

  • support speaker diarization research through the creation and distribution of novel data sets
  • measure and calibrate the performance of systems on these data sets.


Following the success of the First DIHARD Challenge,
we are pleased to announce the Second DIHARD Challenge (DIHARD II).

The task evaluated in the challenge is speaker diarization; that is, the task of determining "who spoke when" in a multispeaker environment based only on audio recordings. As with DIHARD I, development and evaluation sets are provided by the organizers, but there is no fixed training set; participants are free to train their systems on any proprietary and/or public data. Once again, these development and evaluation sets are drawn from a diverse sampling of sources including monologues, map task dialogues, broadcast interviews, sociolinguistic interviews, meeting speech, speech in restaurants, clinical recordings, extended child language acquisition recordings from LENA vests, and YouTube videos. However, there are several key differences from DIHARD I:

  • two tracks evaluating diarization of multi-channel recordings have been added; these tracks use recordings of dinner parties provided by the organizers of CHiME-5
  • the evaluation period has been lengthened (from 4 weeks to 16 weeks)
  • Jaccard Error Rate replaces mutual information as the secondary metric
  • baseline systems and results will be provided to participants

The challenge will run from February 14th, 2019 through July 1, 2019, and results will be presented at a special session at Interspeech 2019 in Graz, Austria. Participation in the evaluation is open to all who are interested and willing to comply with the rules laid out in the evaluation plan. There is no cost to participate, though participants are encouraged to submit a paper to the corresponding Interspeech 2019 special session.


For questions not answered in this document, or to join the DIHARD mailing list, please contact dihardchallenge@gmail.com.

Evaluation plan

For all details concerning the overall challenge design, tasks, scoring metrics, datasets, rules, and data formats, please consult the latest version of the official evaluation plan.

Important dates

  • Registration period: January 30 through March 15, 2019
  • Launch (release of DIHARD II development and evaluation sets + scoring code): February 28, 2019
  • Scoring server opens: March 12, 2019
  • Baselines released: week of March 11, 2019
  • Interspeech paper registration deadline: March 29, 2019
  • Interspeech submission deadline: April 5, 2019
  • End of challenge / final Interspeech deadline: July 1, 2019
  • System descriptions due: August 16, 2019
  • Interspeech 2019 special session: September 15-19, 2019

    The deadline for submission of final system outputs is midnight on July 1, 2019.

    Organizers

    Kenneth Church
    (Baidu Research, Sunnyvale, CA, USA)
    Christopher Cieri
    (Linguistic Data Consortium)
    Alejandrina Cristia
    (Laboratoire de Sciences Cognitives et Psycholinguistique, ENS, Paris, France)
    Jun Du
    (University of Science and Technology of China, Hefei, China)
    Sriram Ganapathy
    (Electrical Engineering Department, Indian Institute of Science, Bangalore, India)
    Mark Liberman
    (Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA)
    Neville Ryant
    (Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA)

    In collaboration with

    the organizers of the CHiME-5 Challenge



    Communications Team

    Sunghye Cho
    (Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA)
    Rachid Riad
    (Laboratoire de Sciences Cognitives et Psycholinguistique, ENS, Paris, France)
    Lei Sun
    (University of Science and Technology of China, Hefei, China)

    Software

    Scoring

    The official scoring tool is maintained as a github repo (v1.1.0). To score a set of system output RTTMs sys1.rttm, sys2.rttm, ... against corresponding reference RTTMs ref1.rttm, ref2.rttm, ... using the un-partitioned evaluation map (UEM) all.uem, the command line would be:

        $ python score.py -u all.uem -r ref1.rttm ref2.rttm ... -s sys1.rttm sys2.rttm ...

    The overall and per-file results for DER and JER (and many other metrics) will be printed to STDOUT as a table. For additional details about scoring tool usage, please consult the documentation for the github repo.
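
    The primary metric, DER, is the total of missed speech, false alarm, and speaker confusion time divided by total scored speech time; JER is derived from a per-speaker Jaccard index (see the evaluation plan for its exact definition). As a rough numerical illustration of the DER arithmetic only (the durations below are hypothetical and are not produced by score.py):

        # Hypothetical durations (in seconds) for one recording; score.py derives
        # these quantities from the reference/system RTTMs and the UEM.
        scored_speech = 600.0   # total reference speech time within scored regions
        missed = 30.0           # reference speech assigned to no system speaker
        false_alarm = 12.0      # system speech where the reference has none
        confusion = 18.0        # speech attributed to the wrong speaker

        der = 100.0 * (missed + false_alarm + confusion) / scored_speech
        print(f"DER = {der:.2f}%")  # -> DER = 10.00%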

    Baseline systems

    We provide three software baselines for speech enhancement, speech activity detection, and diarization:

    • Speech enhancement
      The speech enhancement baseline was prepared by Lei Sun and is based on the system used by USTC and iFLYTEK in their submission to DIHARD I:
        Sun, Lei, et al. (2018). "Speaker diarization with enhancing speech for the First DIHARD Challenge." Proceedings of INTERSPEECH 2018. 2793-2797. (paper)
      It is available on github.
    • Speech activity detection
      The speech activity detection baseline applies the WebRTC voice activity detector to audio processed by the speech enhancement baseline and is maintained as part of that github repo (a brief usage sketch follows this list).
    • Diarization
      The diarization baseline was prepared by Sriram Ganapathy, Harshah Vardhan MA, and Prachi Singh and is based on the system used by JHU in their submission to DIHARD I with the exception that it omits the Variational-Bayes refinement step:
        Sell, Gregory, et al. (2018). "Diarization is Hard: Some experiences and lessons learned for the JHU team in the Inaugural DIHARD Challenge." Proceedings of INTERSPEECH 2018. 2808-2812. (paper)
      The x-vector extractor and PLDA parameters were trained on VoxCeleb I and II using data augmentation (additive noise and reverberation), while the whitening transformation was learned from the DIHARD II development set.

      The trained system, as well as recipes to produce the baseline results for each track, is available on github.
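
    For the speech activity detection baseline above, the frame-level WebRTC decision can be sketched as follows. This is a minimal sketch using the py-webrtcvad package, not the baseline recipe itself; the frame size, aggressiveness mode, and the absence of any smoothing are assumptions, and the actual wrapper lives in the baseline's github repo.

        # Minimal sketch: frame-level speech/non-speech decisions with py-webrtcvad.
        # Parameters are illustrative; the baseline's own wrapper and post-processing
        # are documented in its github repo.
        import webrtcvad

        SAMPLE_RATE = 16000   # WebRTC VAD accepts 8/16/32/48 kHz, 16-bit mono PCM
        FRAME_MS = 30         # frames must be 10, 20, or 30 ms long
        FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per sample

        vad = webrtcvad.Vad(2)  # aggressiveness: 0 (least) to 3 (most aggressive)

        def frame_decisions(pcm: bytes):
            """Yield (start_time_in_seconds, is_speech) for each complete frame."""
            for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
                frame = pcm[i:i + FRAME_BYTES]
                yield i / (2 * SAMPLE_RATE), vad.is_speech(frame, SAMPLE_RATE)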
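
    For the diarization baseline above, the core clustering stage can be sketched as agglomerative hierarchical clustering (AHC) over per-segment embeddings. The snippet below is a simplified stand-in rather than the baseline itself: it clusters hypothetical precomputed x-vectors using cosine distance and an arbitrary threshold, whereas the actual recipe scores segment pairs with PLDA.

        # Simplified sketch of the clustering stage: average-linkage AHC over
        # per-segment embeddings. Cosine distance stands in for PLDA scoring and
        # the stopping threshold is hypothetical.
        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import pdist

        def cluster_segments(xvectors: np.ndarray, threshold: float = 0.5) -> np.ndarray:
            """Return an integer speaker label for each row (segment) of xvectors."""
            dists = pdist(xvectors, metric="cosine")  # condensed pairwise distances
            tree = linkage(dists, method="average")   # average-linkage AHC
            return fcluster(tree, t=threshold, criterion="distance")

        # Toy usage with random vectors standing in for real x-vectors.
        rng = np.random.default_rng(0)
        print(cluster_segments(rng.standard_normal((20, 128))))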

    Instructions

    Registration

    To register for the evaluation, participants should email dihardchallenge@gmail.com with the subject line "REGISTRATION" and the following details:

    • Organization – the organization competing (e.g., NIST, BBN, SRI)
    • Team name – the name to be displayed on the leaderboard; use this same team name when registering for the competition on CodaLab (see Results submission below)
    • Tracks – the tracks the team will be competing in

    Data license agreement

    One participant from each site must sign the data license agreement and return it to LDC: (1) by email to ldc@ldc.upenn.edu or (2) by facsimile, Attention: Membership Office, fax number (+1) 215-573-2175. They will also need to create an LDC Online user account, which will be used to download the dev and eval releases.

    Once this process is complete, you will have access to all annotations plus the non-CHiME audio.

    Participants in tracks 3 and 4 must apply separately to the University of Sheffield for the CHiME-5 data, regardless of whether they participated in CHiME-5. To apply for the multi-channel data, follow the data download instructions on the CHiME-5 website.

    Non-profit organizations should sign the non-commercial license. Everyone else, regardless of use case (even if they are only using the data for non-commercial research), should apply for the commercial license.

    Results submission

    Account creation

    • For system submission and scoring, this year we are using an instance of CodaLab hosted for the challenge.
    • Each team should create one (and only one) account, which will then be used for submitting ALL of that team’s results for scoring. In CodaLab, the daily and lifetime submission limits are tied to user accounts, so it is imperative that each team use a SINGLE account to make ALL submissions.
    • To create an account, navigate to the CodaLab sign-up page and fill out the following fields:
      • username -- username you wish to use; this will be displayed on the leaderboard
      • email -- the contact email address you provided when registering for DIHARD; if you use a different email, when you later attempt to register for a track your request will not be approved
      • password -- password you wish to use for the competition
    • Accept the terms and conditions, and click Sign Up. A confirmation email will then be sent to the email address that you entered. To activate your account, click on the confirmation link in this email.

    Troubleshooting

    • If you do not see a confirmation email, check that it has not been caught by your email provider’s spam filter. You may find it by searching for the subject line “[CodaLab] Confirm email address for your CodaLab account”
    • If you still do not see a confirmation email, try prompting CodaLab to resend it.
    • If you still are unable to get a confirmation email, try using a different email address. Please then let us know at dihardchallenge@gmail.com which address you are using so that we may make a note of this on your registration. This will ensure that when you later register for the tracks, your requests are not denied.
    • Finally, if none of the above work, contact us by email and we will attempt to resolve your issue.

    Setting up your team name

    • In order for your team name to appear next to each submission on the leaderboard, you will need to add it to your CodaLab user profile. Please use the same name you used when registering for the challenge.
    • Access the User Settings page by selecting Settings from your user menu (always found in the top right of the page with your username).
    • Scroll down to the Competition settings section and look for the box titled Team name. Enter your team name into this box.
    • Click Save Changes.

    Registering for tracks

    Results zip archive format
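
    The full layout required of the submission archive is specified in the evaluation plan. As a rough, non-authoritative illustration consistent with FAQ 9 below (a submission consists solely of your system's RTTM files), an archive for one of the single-channel tracks might be assembled as follows; the file and directory names here are hypothetical:

        # Hypothetical sketch: bundle system RTTMs into a flat zip archive for upload.
        # Consult the evaluation plan for the authoritative archive layout.
        import glob
        import os
        import zipfile

        with zipfile.ZipFile("track1_submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
            for rttm in sorted(glob.glob("sys_output/*.rttm")):
                # Store each RTTM at the archive root (no enclosing directory).
                zf.write(rttm, arcname=os.path.basename(rttm))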

    Submitting results via CodaLab

    • Navigate to the competition page for the track you are submitting to and click on the Participate tab. This will bring up a page that allows you to make new submissions and see previous submissions.
    • In the Method name field, enter the name of the system that you are submitting results for.
    • Click Submit and select the zip file you wish to submit. This will upload the zip file for processing.

    • Below the Submit button you will see a table listing all submissions you have made up to the current date with the following information for each:
      • # -- ordinal number of submission in system; your first submission will be listed as 1
      • SCORE -- DER for the submission; if the scoring is in progress or failed, this will read "---"
      • METHOD NAME -- the name of the system that produced the submission
      • FILENAME -- name of the zip file you submitted
      • SUBMISSION DATE -- date and time of submission in MM/DD/YYYY HH:MM:SS format (all times are UTC)
      • STATUS -- the current status of your submission, which may be one of
        • Submitting -- zip file is being uploaded
        • Running -- upload is successful and scoring script is running
        • Finished -- scoring script finished successfully and results posted to leaderboard
        • Failed -- scoring script failed
      • checkmark -- indicates whether or not submission is on the leaderboard
    • If scoring failed for your submission, click the + symbol to the right of its entry in the table. This will display the following, which may be used for debugging purposes:
      • Method name -- the method name you entered into the form
      • Download your submission -- a download link for the zip file submitted
      • View scoring output log -- the scoring program’s output to STDOUT
      • View scoring error -- the scoring program’s output to STDERR
      • Download output from scoring step -- ignore; downloads a zip file containing files used by CodaLab internally

    Leaderboard

    • After your submission finishes scoring (status “Finished”) it will post to the leaderboard, which is viewable from the Results tab.
    • The leaderboard lists the most recent submission for each system by each team, ranked in ascending order by DER.
    • For each submission on the leaderboard, the following fields are displayed:
      • # -- ranking of system
      • User -- the username for the account that submitted the result
      • Entries -- total number of entries by account that submitted result
      • Date of Last Entry -- date of last entry by user that submitted result in MM/DD/YY format
      • Team Name -- name of team associated with user that submitted result; this is taken from the Team listed on the user’s profile
      • Method Name -- the method name entered at submission time
      • DER -- diarization error rate (in percent) of submission; ranking of this result is indicated in parentheses
      • JER -- Jaccard error rate (in percent) of submission; ranking of this result is indicated in parentheses

    Rules

    • Each team MUST use a SINGLE account to submit all results.
    • The team name listed in that user’s profile must be identical to the one you registered with.
    • Each team is limited to 6 submissions per day.
    • Submissions that are not scored (status shows as “Failed”) do not count against this limit.

    Paper submission

    For challenge participants contributing papers to the Interspeech special session, the deadlines for abstract submission and final paper submission are:
    • Abstract submission -- March 29, 2019, midnight Anywhere on Earth
    • Paper submission -- April 5, 2019, midnight Anywhere on Earth
    • Updates to accepted papers -- July 1, 2019, midnight Anywhere on Earth
    • Please follow the submission instructions on the Interspeech 2019 website. As topic, you should choose ONLY the special session:
      13.13 The Second DIHARD Speech Diarization Challenge (DIHARD II)
    • IMPORTANT: Papers must be registered in the Interspeech submission system by March 29 (midnight Anywhere on Earth). While the title, abstract, authors list, and pdf may all be changed after this date, a version MUST be submitted to the system with the correct topic by midnight on March 29.
    • Papers should not repeat the descriptions of the tasks, metrics, datasets, or baseline systems, but should cite the challenge paper using the following citation:
        Ryant et al. (2019). The Second DIHARD Diarization Challenge: Dataset, task, and baselines. Proceedings of INTERSPEECH 2019. ISCA. Graz, Austria.
    • All papers MUST cite the DIHARD II and SEEDLingS corpora using the following citations:
      • Bergelson, E. (2016). Bergelson Seedlings HomeBank Corpus. doi: 10.21415/T5PK6D.
      • Ryant et al. (2019). DIHARD Corpus. Linguistic Data Consortium.
    • Papers may report additional results on other corpora.
    • Accepted papers may update their results on the development and evaluation sets during the paper revision period.

    System descriptions

    • At the end of the evaluation, all participating teams must submit a full description of their system with sufficient detail for a fellow researcher to understand the approach and data/computational requirements. System descriptions should adhere to the format described in Appendix F of the evaluation plan.
    • System descriptions should be submitted by email to dihardchallenge@gmail.com. Please include the text "SYSTEM DESCRIPTION" in the subject of the email.
    • The deadline for submitted system descriptions is August 16, 2019, midnight Anywhere on Earth.

    Final results

    At the conclusion of the evaluation, all final system outputs will be archived by the organizers on Zenodo. This archive will contain RTTM outputs for all systems appearing on the final leaderboard as well as scoring output and associated metadata.

    Results

    During the evaluation, all results will be displayed on the CodaLab competition leaderboards. For each track we maintain two leaderboards:
    • one consisting of results submitted prior to the Interspeech paper deadline on April 5th
    • one consisting of all results
    Results from the baseline system are posted to the leaderboard under team name DIHARD. These results are also available from the challenge paper and the baseline github repo.

    FAQ (Frequently Asked Questions)

    1. Must I participate in all tracks in the challenge?
      No, researchers may choose to participate in a subset of the tracks. All participants MUST register for at least one of track 1 or track 3 (diarization from reference SAD). Participation in tracks 2 and 4 is optional. For example, you may participate only in track 1; only in track 3; or in tracks 3 and 4. (Other combinations are possible.)

    2. Must I submit a paper to the Interspeech special session?
      No, you are not required to submit to the special session in order to participate. Submission to the session is strongly encouraged, but not mandatory.

    3. My team wishes to submit a paper to the Interspeech special session. What should we include?
      Papers submitted to the special session should include preliminary results on the development and evaluation sets; these results may be updated during the paper revision period. Papers may also report results on other corpora if the authors choose. Papers should not repeat descriptions of the tasks, metrics, datasets, or baselines, but instead cite the challenge paper. For more details, please consult the paper submission instructions.

    4. Are there any restrictions on the training data?
      Participants are free to choose their own training data, whether publicly available or not. The only restriction is that you must not use data that overlaps with the evaluation set. See the rules section of the evaluation plan for a listing of these sources. Please also note that clear descriptions of the data used are required in the final system descriptions document.

    5. My team previously has acquired access to the full SEEDLingS corpus. Can we use this data for training or development?
      No, the SEEDLingS data, whether acquired via HomeBank or some other route, is off limits for all purposes. This includes training and tuning, but also acoustic adaptation.

    6. My team participated in DIHARD I. Can we use the DIHARD I development and evaluation sets for training or development?
      The DIHARD I evaluation set is off limits for ALL PURPOSES. The DIHARD I development set may be used however you wish, though given that it is a subset of the DIHARD II development set, we expect it to have limited utility.

    7. Can I use the DIHARD II development set to do data simulation and augmentation?
      Yes, development data is free to be used in any way you see fit, including for tuning your current diarization system or augmenting training data.
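
      As a concrete illustration of one common form of augmentation, the sketch below mixes a noise signal into a speech signal at a chosen signal-to-noise ratio. It is not a prescribed recipe; the arrays and target SNR are hypothetical, and the baselines' own augmentation (additive noise and reverberation) follows their published recipes.

        # Hypothetical sketch of additive-noise augmentation at a target SNR (dB).
        # Assumes float-valued waveforms sampled at the same rate.
        import numpy as np

        def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
            """Add noise to speech, scaled so the mixture has the requested SNR."""
            noise = np.resize(noise, speech.shape)  # crudely match lengths by tiling/truncating
            p_speech = np.mean(speech ** 2)
            p_noise = np.mean(noise ** 2) + 1e-12
            scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
            return speech + scale * noise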

    8. How can I upload the results?
      Please see the results submission instructions.

    9. Which files should I submit?
      All submissions should consist exclusively of RTTMs output by your system. For tracks 1 and 2 there should be one RTTM per FLAC file in the single channel evaluation set. For tracks 3 and 4, there should be one RTTM per Kinect array in each CHiME-5 evaluation set session. For full details about what to submit and formatting of your submission, please consult the results submission instructions.
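
      For reference, RTTM is a plain-text, space-delimited format with one line per speech segment. An illustrative line (the file ID, times, and speaker label here are hypothetical) looks like:

        SPEAKER DH_0001 1 10.52 3.04 <NA> <NA> spk2 <NA> <NA>

      The fields are: segment type, file ID, channel, onset in seconds, duration in seconds, and the speaker label in the eighth field, with the remaining fields set to <NA>. The scoring tool's github repo documents the format in full.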

    10. For the multichannel tracks (tracks 3 and 4), should we produce one RTTM per Kinect array or one for the entire session?
      Please refer to the previous question.

    11. For the multichannel tracks (tracks 3 and 4), can we use multiple Kinect arrays to produce each RTTM? That is, could we opt to use audio from arrays U01, U02, and U03 to produce the RTTM for array U01?
      Participants should produce ONE RTTM per Kinect array, each the output of the system when considering ONLY the channels from that array. For instance, for session S21 they should produce the following RTTMs:
      • S21_U01.rttm -- produced using only the channels from array U01
      • S21_U02.rttm -- produced using only the channels from array U02
      • S21_U03.rttm -- produced using only the channels from array U03
      • S21_U04.rttm -- produced using only the channels from array U04
      • S21_U05.rttm -- produced using only the channels from array U05
      • S21_U06.rttm -- produced using only the channels from array U06

    12. What should I report in the system descriptions document?
      Clear documentation of each system on the final leaderboard is required, providing sufficient detail for a fellow researcher to understand the approach and data/computational requirements. This includes, as mentioned above, explanation of any training data used. For further details, consult the system descriptions instructions.

    13. Are teams with members from multiple organizations allowed?
      Yes, teams spanning multiple organizations are allowed, though one person from each organization within the team must sign and return the LDC Data License Agreement. One individual should serve as the team's point of contact for DIHARD, but every organization with access to the data must sign the evaluation agreement.

    14. I attempted to register an account with CodaLab, but am unable to get a confirmation email. What should I do?
      Please consult our registration troubleshooting tips.

    15. Is it possible to use the information about the number of speakers being 4 in tracks 3 and 4? This information is available in the CHiME-5 website and other CHiME-5 related publications.
      In order to maintain consistency with the single channel tracks, where domain/number of speakers is not known for the evaluation set, using the oracle number of speakers for the CHiME sessions is not allowed.

    Contact Us


    For more information, join our mailing list or email us at dihardchallenge@gmail.com.