University of Washington/Northwestern University Corpora

The University of Washington/Northwestern University (UW/NU) Corpus contains recordings and TextGrids of Pacific Northwest and Northern Cities speakers reading a subset of the IEEE "Harvard" sentences. Version 1.0 of the UW/NU Corpus has been used to study the effects of dialectal variation on speech intelligibility, while Version 2.0 is being used in ongoing research on speech intelligibility and gender interaction. Development is supported by the National Institutes of Health, National Institute on Deafness and Other Communication Disorders grant R01-DC006014. The UW/NU Corpus is well suited for both clinical and research studies where high-fidelity recordings and regional accent control are desirable.

University of Washington/Northwestern University (UW/NU) Corpus 1.0

This corpus contains recordings and TextGrids of 20 speakers (10 Northern Cities; 10 Pacific Northwest) reading 180 IEEE "Harvard" sentences (3600 audio files; 436 MB). All information found here is also contained in the README file included with the corpus. You can download the entire corpus (in compressed .tar.gz format, 436 MB) by request. Please email Richard Wright at rawright@uw.edu for more information.

Citation

Please cite as:
McCloy, D. R., Souza, P. E., Wright, R. A., Haywood, J., Gehani, N., & Rudolph, S. (2013). The UW/NU Corpus. Version 1.0. https://depts.washington.edu/phonlab/projects/uwnu.php

Relevant Publications

McCloy, D.R., Wright, R.A., & Souza, P.E. (2014). Talker versus dialect effects on speech intelligibility: A symmetrical study. Language and Speech, 1-16. doi:10.1177/0023830914559234. [manuscript]

Souza, P.E., Gehani, N., Wright, R.A., & McCloy, D.R. (2013). The advantage of knowing the talker. Journal of the American Academy of Audiology, 24, 689–700. doi:10.3766/jaaa.24.8.6. [manuscript]

Audio Files

The corpus includes 3600 audio files in WAV format, sampled at 44.1 kHz with 16-bit depth. Files are readings of 180 sentences by 20 different talkers (5 males and 5 females from each of two dialect regions of American English: the Pacific Northwest and the Northern Cities). The set of audio files has been RMS-normalized to equate intensity across all recordings in the corpus.
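As a quick sanity check after unpacking, the stated format (44.1 kHz, 16-bit) can be verified with Python's standard `wave` module. This is a minimal sketch; the function name `check_wav_format` is hypothetical and not part of the corpus.

```python
import wave

EXPECTED_RATE = 44100  # 44.1 kHz, as stated above
EXPECTED_WIDTH = 2     # 16-bit depth = 2 bytes per sample


def check_wav_format(path):
    """Return True if the WAV file at `path` is 44.1 kHz with 16-bit samples."""
    with wave.open(str(path), "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE
                and wav.getsampwidth() == EXPECTED_WIDTH)
```

For example, `check_wav_format("PNM02_01-07.wav")` should return True for a file that matches the description above.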

TextGrids

A set of 3600 time-aligned transcriptions is included in the corpus. These are TextGrids for use with the Praat software[1] that have been automatically generated by the Penn Phonetics Lab forced aligner software[2] and are known to contain misalignments. They have NOT been checked or corrected by humans (much less by well-trained phoneticians or speech scientists). Use at your own risk.

Sentences

The sentence texts are drawn from the IEEE “Harvard” set.[3] Transcripts of the 180 sentences (along with their identification numbers) are included in the corpus in tab-delimited format. Individual transcript files for each sentence are also included. Sentence identification numbers are derived from the “list-sentence” notation of the original IEEE sentence lists: for example, sentence 01-07 corresponds to sentence #7 from list #1 of the original numbering scheme.
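The "list-sentence" notation and the tab-delimited transcript format could be handled along these lines. This is a sketch under assumptions: the two-column layout (ID, then text) is a guess at the file's structure, the function names are hypothetical, and the sample rows are placeholders rather than real corpus content.

```python
import csv
import io


def parse_sentence_id(sentence_id):
    """Split a 'list-sentence' ID such as '01-07' into (list, sentence) integers."""
    list_part, sentence_part = sentence_id.split("-")
    return int(list_part), int(sentence_part)


def read_transcripts(handle):
    """Map sentence IDs to sentence text from a tab-delimited file handle.

    Assumes each row holds the ID in column 1 and the text in column 2.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Placeholder rows standing in for the real transcript file.
sample = io.StringIO(
    "01-07\t<text of sentence 01-07>\n"
    "01-08\t<text of sentence 01-08>\n"
)
transcripts = read_transcripts(sample)
print(parse_sentence_id("01-07"))  # (1, 7)
```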

Filename Conventions

The first two characters in the filenames reflect the dialect region of the talker (PN = Pacific Northwest, NC = Northern Cities). The third character indicates talker gender, and the fourth and fifth characters are meaningless digits, serially assigned to talkers during corpus creation. After an underscore, the sentence identification number comprises the remainder of the filename. For example, file PNM02_01-07.wav is a recording of Pacific Northwest Male #02 reading sentence number 01-07.
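The convention above can be decoded with a short regular expression. This is an illustrative sketch, not a tool shipped with the corpus: the names `V1_NAME`, `Recording`, and `parse_v1_filename` are hypothetical, and the pattern assumes gender is coded as M or F and that sentence IDs always have two digits on each side of the hyphen.

```python
import re
from typing import NamedTuple

# v1.0 layout: dialect (2 letters), gender (1 letter), talker digits (2),
# underscore, sentence ID, extension. "F" for female talkers is an assumption.
V1_NAME = re.compile(r"^(PN|NC)([MF])(\d{2})_(\d{2}-\d{2})\.(?:wav|TextGrid)$")

DIALECTS = {"PN": "Pacific Northwest", "NC": "Northern Cities"}


class Recording(NamedTuple):
    dialect: str
    gender: str
    talker: str
    sentence_id: str


def parse_v1_filename(name):
    """Decode a v1.0 filename into dialect, gender, talker ID, and sentence ID."""
    match = V1_NAME.match(name)
    if match is None:
        raise ValueError(f"not a UW/NU v1.0 filename: {name!r}")
    region, gender, digits, sentence_id = match.groups()
    return Recording(DIALECTS[region], gender, region + gender + digits, sentence_id)


print(parse_v1_filename("PNM02_01-07.wav"))
# Recording(dialect='Pacific Northwest', gender='M', talker='PNM02', sentence_id='01-07')
```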

References

[1] Boersma, P., & Weenink, D. (2013). Praat: Doing phonetics by computer. http://www.praat.org/

[2] Yuan, J., & Liberman, M. (2008). The Penn Phonetics Lab forced aligner. http://www.ling.upenn.edu/phonetics/p2fa/

[3] Rothauser, E. H., Chapman, W. D., Guttman, N., Hecker, M. H. L., Nordby, K. S., Silbiger, H. R., Urbanek, G. E., & Weinstock, M. (1969). IEEE recommended practice for speech quality measurements. IEEE Transactions on Audio and Electroacoustics, 17, 225–246. DOI: 10.1109/TAU.1969.1162058

University of Washington/Northwestern University (UW/NU) Corpus 2.0

This is the second release of the UW/NU corpus, UW/NU 2.0. All information found here is also contained in the README file included with the corpus. You can download the entire corpus (in compressed .tar.gz format, 436 MB) by request. Please email Richard Wright at rawright@uw.edu for more information. Note that this corpus does NOT contain recordings from UW/NU v.1.0[1].

Citation

Please cite as:
Panfili, L. M., Haywood, J., McCloy, D. R., Souza, P. E., and Wright, R. A. (2017). The UW/NU Corpus, Version 2.0. https://depts.washington.edu/phonlab/projects/uwnu.php

Relevant Publication

McCloy, D., Panfili, L., John, C., Winn, M., & Wright, R. (2018). Gender, the individual, and intelligibility. Poster presented at the 176th Meeting of the Acoustical Society of America, Victoria, BC. [abstract]

Audio Files

The corpus includes 22,460 audio files in WAV format, sampled at 44.1 kHz with 16-bit depth, band-pass filtered from 60 to 22,000 Hz with 100 Hz smoothing. Files are readings of the IEEE “Harvard” sentences by 33 different talkers from two dialect regions of American English: the Pacific Northwest (11 males, 9 females) and the Northern Cities (7 males, 6 females). Pacific Northwest speakers read the full set of 720 sentences, while Northern Cities speakers read a subset of 620 sentences (20 × 720 + 13 × 620 = 22,460 files). Unlike the original UW/NU corpus, this set of audio files has not been RMS-normalized.
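Because these files are not RMS-normalized, users who need equated intensity may want to normalize them on their own. The following is a generic sketch of RMS normalization on a list of samples. It is not the procedure used for v1.0, which this page does not describe. The function names and the target level are hypothetical, and the sketch does not guard against clipping.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_normalize(samples, target_rms):
    """Scale samples by a single gain so that their RMS equals target_rms."""
    current = rms(samples)
    if current == 0:
        return list(samples)  # a silent signal cannot be rescaled
    gain = target_rms / current
    return [s * gain for s in samples]


# Toy signal with RMS 1.0, rescaled to a hypothetical target of 0.5.
scaled = rms_normalize([1.0, -1.0, 1.0, -1.0], target_rms=0.5)
print(scaled)  # [0.5, -0.5, 0.5, -0.5]
```

Applying the same `target_rms` to every file equates overall level across recordings. After scaling, check that no sample exceeds the 16-bit range before writing the result back to WAV.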

TextGrids

A set of 22,460 time-aligned transcriptions is included in the corpus. These are TextGrids for use with the Praat software[2] that have been automatically generated by the Penn Phonetics Lab forced aligner software[3] and are known to contain misalignments. They have NOT been checked or corrected by humans (much less by well-trained phoneticians or speech scientists). Use at your own risk.

Sentences

The sentence texts are drawn from the IEEE “Harvard” set.[4] Transcripts of the 720 sentences (along with their identification numbers) are included in the corpus in tab-delimited format. Individual transcript files for each sentence are also included. Sentence identification numbers are derived from the “list-sentence” notation of the original IEEE sentence lists: for example, sentence 01-07 corresponds to sentence #7 from list #1 of the original numbering scheme.

Filename Conventions

The first two characters in the filenames reflect the dialect region of the talker (PN = Pacific Northwest, NC = Northern Cities). The third character indicates talker gender, and the fourth through sixth characters are meaningless digits, serially assigned to talkers during corpus creation. (Note that due to the increased number of subjects, these identifiers have grown from two digits in UW/NU v.1.0 to three.) After an underscore, the sentence identification number comprises the remainder of the filename. For example, file PNM002_01-07.wav is a recording of Pacific Northwest Male #002 reading sentence number 01-07.
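With three-digit talker numbers, a file listing can be summarized by dialect and gender as sketched below. The names `V2_NAME` and `count_talkers` are hypothetical, the gender code F is an assumption, and the filenames in the example are made up for illustration.

```python
import re
from collections import Counter

# v2.0 layout: dialect (2 letters), gender (1 letter), talker digits (3),
# underscore, sentence ID, extension. "F" for female talkers is an assumption.
V2_NAME = re.compile(r"^(PN|NC)([MF])(\d{3})_(\d{2}-\d{2})\.(?:wav|TextGrid)$")


def count_talkers(filenames):
    """Count distinct talkers per (dialect, gender) pair in a list of filenames."""
    talkers = set()
    for name in filenames:
        match = V2_NAME.match(name)
        if match:  # silently skip names that do not follow the convention
            region, gender, digits, _sentence_id = match.groups()
            talkers.add((region, gender, digits))
    return Counter((region, gender) for region, gender, _digits in talkers)


# Made-up filenames for illustration only.
names = ["PNM002_01-07.wav", "PNM002_01-08.wav", "PNF001_01-07.wav", "NCM010_01-07.wav"]
counts = count_talkers(names)
print(counts[("PN", "M")], counts[("PN", "F")], counts[("NC", "M")])  # 1 1 1
```

Run over the full corpus, the counts should match the talker numbers given under Audio Files.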

Speech Errors

Some of the recordings contain speech errors, such as non-standard pronunciations, unnatural delivery, or pauses. These recordings (.wav and .TextGrid files) are placed in a separate subdirectory for each speaker, called [speakername]-err. If this directory is not present for a given speaker, none of that speaker's recordings contain speech errors.
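To exclude or inspect the flagged recordings, the -err subdirectories can be collected with `pathlib`. This is a sketch under assumptions: it searches recursively because the exact location of the subdirectories within the unpacked corpus is not stated here, and the function name `flagged_recordings` is hypothetical.

```python
from pathlib import Path

ERR_SUFFIX = "-err"


def flagged_recordings(corpus_root):
    """Map each speaker name to the sorted .wav filenames in that speaker's -err directory."""
    flagged = {}
    for err_dir in sorted(Path(corpus_root).rglob("*" + ERR_SUFFIX)):
        if err_dir.is_dir():
            speaker = err_dir.name[: -len(ERR_SUFFIX)]
            flagged[speaker] = sorted(p.name for p in err_dir.glob("*.wav"))
    return flagged
```

Speakers missing from the returned dictionary have no -err directory, which by the rule above means none of their recordings were flagged.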

References

[1] McCloy, D. R., Souza, P. E., Wright, R. A., Haywood, J., Gehani, N., & Rudolph, S. (2013). The UW/NU corpus. Version 1.0. https://depts.washington.edu/phonlab/projects/uwnu.php

[2] Boersma, P., & Weenink, D. (2013). Praat: Doing phonetics by computer. http://www.praat.org/

[3] Yuan, J., & Liberman, M. (2008). The Penn Phonetics Lab forced aligner. http://www.ling.upenn.edu/phonetics/p2fa/

[4] Rothauser, E. H., Chapman, W. D., Guttman, N., Hecker, M. H. L., Nordby, K. S., Silbiger, H. R., Urbanek, G. E., & Weinstock, M. (1969). IEEE recommended practice for speech quality measurements. IEEE Transactions on Audio and Electroacoustics, 17, 225–246. DOI: 10.1109/TAU.1969.1162058