Paused Transcription Test (Lange & Matthews, 2020)

Published: 08-09-2020 | Version 1 | DOI: 10.17632/g278w62zpg.1
Kriss Lange,
Joshua Matthews


This dataset is a listening test for English language learners. It was designed to measure lexical segmentation, the ability to identify word boundaries in connected speech, as well as aural decoding, the ability to identify and recognize words in speech.

In a paused transcription test, the test-taker listens to an audio recording into which pauses have been inserted at irregular points corresponding to the target items selected for the test. During each brief pause, the test-taker tries to transcribe the last phrase of three to five words that immediately preceded the pause. Playback then resumes, and the test-taker continues listening and transcribing the phrase heard before each pause.

One feature of the paused transcription format that is difficult to achieve with other tests of lexical segmentation is that it allows test-takers to apply their understanding of the aural co-text, as well as their own background knowledge, to the task of transcribing the target phrase (Field, 2008). By contrast, standard dictation tests or partial dictation tests usually require the listener to provide the target items without the benefit of hearing a significant amount of the target words’ surrounding co-text.

The audio for each section of the paused transcription test lasted between 10 and 12 minutes. Each section contained 12 target phrases of three words each, for a total of 180 target words across the 60 target phrases in the test. A 15-second pause was inserted in the audio text after the intonation unit containing each target phrase. All pauses were located in the speech of the native speaker in an effort to standardize the acoustic features of the target phrases. High-frequency vocabulary was used almost exclusively in order to minimize potential errors in lexical segmentation due to inadequate vocabulary knowledge.
The vocabulary used in the test was analyzed for frequency in the combined COCA/BNC 1-25K corpus using the online computer program Compleat Web VP (Cobb, 2018). Results showed that 94.8% of the 5,278 tokens used in the test were within the first 1,000-word frequency band, 3.30% were in the second, 0.60% in the third, 0.30% in the fourth, 0.50% in the fifth, and 0.10% in the sixth 1,000-word frequency band, with the remaining 0.44% of words not included in the corpora (i.e., offlist). A separate frequency analysis of the 60 target phrases showed that 97.2% of the 180 target words were within the first 1,000-word frequency band, 1.70% were in the second, and 0.60% in the third. Only five target words were beyond the first 1,000-word frequency band.

Cobb, T. Compleat Web VP v.2 [computer program]. Retrieved 01 Nov 2018 from

Field, J. (2008). Bricks or mortar: Which parts of the input does a second language listener rely on? TESOL Quarterly, 42(3), 411–432.
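
As a rough illustration of how band percentages like those reported in the frequency analysis are derived, the following Python sketch tallies the share of tokens falling in each 1,000-word frequency band. It is not the Compleat Web VP implementation. The function name `band_profile`, the `band_of` lookup, and the toy word list with its band numbers are all hypothetical and were invented for this example.

```python
from collections import Counter


def band_profile(tokens, band_of):
    """Return the percentage of tokens in each frequency band.

    band_of is a hypothetical lookup from a lowercased word to its
    1,000-word band number (1 = most frequent band). Words missing
    from the lookup are counted under the key "offlist".
    """
    counts = Counter(band_of.get(tok.lower(), "offlist") for tok in tokens)
    total = sum(counts.values())
    return {band: 100 * n / total for band, n in counts.items()}


# Toy data for illustration only; neither the words nor the band
# numbers come from the test or from the COCA/BNC lists.
toy_bands = {"the": 1, "cat": 1, "sat": 1, "on": 1, "veranda": 9}
toy_tokens = "The cat sat on the veranda quietly".split()
profile = band_profile(toy_tokens, toy_bands)

# Consistency check on the reported figures: five of the 180 target
# words lie beyond the first band, so 175 of 180 lie within it.
first_band_share = round(100 * (180 - 5) / 180, 1)  # 97.2
```

In the toy example, five of the seven tokens fall in band 1, one in the invented band 9, and one is offlist. The final line shows that the reported 97.2% first-band share for the target words matches the statement that five of the 180 target words lie beyond the first band.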