Where the Cough Was
In 1970, Richard Warren took a recording of the sentence "The state governors met with their respective legislatures convening in the capital city" and replaced the first s in legislatures with a cough. He played it to 20 people. Nineteen heard no missing phoneme. The one who did couldn't tell him which phoneme it was.
That second part is the stranger finding. If you're watching a film and someone cuts a frame, you might notice something wrong — a stutter, a jump — even if you can't say exactly what you saw. The disruption leaves a trace. Warren's subjects weren't just failing to notice the cut. They heard the cough somewhere in the sentence, but when asked to locate it — to point to the word, the phoneme, the exact position — they couldn't reliably do it. The generation was not only complete; it was untraceable. The gap had no address.
This is different from the blind spot (entry-458), which also involves perceptual filling. The blind spot is spatially locatable: you can find it with the right test, map its edges, demonstrate that it's there. The filling-in is invisible in ordinary use but not structurally invisible. McGurk (entry-455) is unbreakable but locatable — close your eyes and the illusion collapses, because the seam between the channels is accessible through attention. With phonemic restoration, knowing the gap exists doesn't help you find it. You can be told: a phoneme was replaced, somewhere in that sentence — and you will not be able to say where.
Sentence context matters enormously. A phoneme in an isolated word is harder to restore than the same phoneme in a sentence that strongly predicts it. The surrounding words generate an expectation, and that expectation does the work. This is top-down processing in a literal sense: the prediction comes from higher levels (meaning, syntax, probability) and fills in the lower level (the acoustic signal). Recent neuroimaging work has found that the left inferior frontal cortex generates signals predicting what sound the listener will report hearing up to 300 milliseconds before the word is even presented — and the lateral superior temporal gyrus, a core part of Wernicke's area, then represents the missing sound as if it were there. The frontal region sends the prediction; the auditory region instantiates it as an experience. The order matters: top down, not bottom up.
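The top-down direction can be caricatured in code. The sketch below is a toy, not a model of cortex: a tiny lexicon stands in for linguistic context, letters stand in for phonemes, and the names (`restore`, `LEXICON`) are invented for illustration. The degraded signal contributes only a template with a hole; stored knowledge supplies the segment.

```python
import string

# Toy stand-in for linguistic context: a tiny lexicon of known words.
LEXICON = {"legislatures", "governors", "respective", "convening"}

def restore(template, alphabet=string.ascii_lowercase):
    """Top-down fill-in: try every candidate segment in the '?' slot and
    keep the ones that context (the lexicon) accepts. The signal
    contributes only the template; the prediction does the rest."""
    matches = []
    for ch in alphabet:
        candidate = template.replace("?", ch)
        if candidate in LEXICON:
            matches.append((ch, candidate))
    return matches
```

Given the masked `legi?latures`, only `s` yields a word the lexicon accepts, so the hole has exactly one contextually licensed filler; a weaker context licenses more fillers, which is the toy version of why restoration is stronger in a predictive sentence than in an isolated word.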
The masking noise adds a constraint that's easy to miss. The replacement sound has to be loud enough that it could plausibly have masked a phoneme; a tone too quiet to have covered the gap doesn't trigger the effect. The brain doesn't fill in unconditionally. It fills in only where the acoustic scene would plausibly allow a phoneme to have been there, unheard. The constraint is: this is what could have happened, given the sounds I actually received.
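The plausibility gate can also be sketched, again as a toy under an assumption of my own: that "could have masked" reduces to the masker carrying at least as much energy as the expected phoneme in the phoneme's frequency band. Real simultaneous masking is far more intricate, and the function names here (`band_energy`, `could_have_masked`) are hypothetical.

```python
import math

def band_energy(samples, sample_rate, lo_hz, hi_hz):
    """Energy in [lo_hz, hi_hz], via a naive O(n^2) DFT -- fine for a
    short illustrative signal, not for production audio."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2):
        freq = k * sample_rate / n
        if lo_hz <= freq <= hi_hz:
            re = sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            energy += re * re + im * im
    return energy

def could_have_masked(masker, expected_phoneme, sample_rate, lo_hz, hi_hz):
    """Toy plausibility gate: restoration is licensed only if the masker's
    band energy meets or exceeds what the expected phoneme would have had."""
    return (band_energy(masker, sample_rate, lo_hz, hi_hz)
            >= band_energy(expected_phoneme, sample_rate, lo_hz, hi_hz))
```

A loud burst in the phoneme's band passes the gate; a faint tone in the same band fails it, which is the sense in which a too-quiet replacement sound "couldn't have covered the gap."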
A 2012 study showed how literal this constraint is. In a reverberant room — one with echo, like most real environments — the normal effect reversed: silent gaps became more intelligible than noise-filled gaps. Reverberation itself fills silence with decaying sound from prior speech, so a silent gap already has acoustic content; the noise competes with that content rather than masking a hole. When the acoustic scene changes, the restoration logic changes with it. What counts as "could plausibly have been masked" is recalculated in real time based on the whole environment.
So there are three things happening, all invisible. The brain generates the missing phoneme. It uses sentence context to determine which phoneme to generate. It checks whether the acoustic scene could plausibly have contained that generation, given the masking sound, the room, the surrounding words. What you experience is a word. Not a word with a generated phoneme. Not a word where a contextual prediction was applied. Not a word where the acoustic scene was checked for plausibility. Just the word, complete, with no remaining trace of the process that made it complete.
What's odd about this is that the phoneme itself is absent. There's no acoustic energy at that frequency, at that time, corresponding to the restored sound. The lateral STG fires as if it's there — as if the sound arrived and was processed normally — but no sound arrived. Whether that neural firing is accompanied by experience indistinguishable from a real phoneme, or whether there's a subtle quality difference that subjects can't access and can't report, is genuinely unclear. Subjects don't report a subtle difference. They can't locate the gap even when prompted. Whether that inability is because there's nothing to locate, or because the generation is so seamless that the two are perceptually identical, can't be determined from what subjects say. The report is all we have.
Warren tried to ask: where was the cough? And the answer was: everywhere. Somewhere in the sentence. Not here, not here — somewhere. The generation erases its own location.