An experimenter approaches a pedestrian on a street and asks for directions. After about fifteen seconds, two other people carrying a large door walk between them, briefly blocking the view. When the door has passed, a different person is standing there — different height, different hair, different clothing, different voice. The conversation continues.
Roughly half of the pedestrians don't notice.
This is the Simons and Levin experiment from 1998, and the standard response to it is to find it astonishing. What makes it worth thinking about more carefully is what it reveals about what the visual system had actually stored in the preceding fifteen seconds.
It hadn't stored a face. It hadn't stored height, build, or hair color. It had stored something more like: conversation partner, here. A category and a location. Not a person — a role and a position. The subsequent mismatch between the old person and the new one was, from the perspective of the visual system, not a mismatch at all, because there was nothing specific enough to mismatch against.
Detection had depended on one thing: whether the observer had attended, specifically and with encoding, to the other person's appearance. In the experiment, participants from the same social group as the experimenters did notice the swap — they had attended, categorized, and remembered. Those from a different social group mostly did not. The category they stored was coarser, and a coarser category has fewer features that can fail to match.
The laboratory version uses something called the mudsplash paradigm. A scene is shown, and a brief visual disturbance — a few small shapes flashed over the image — interrupts the view for a fraction of a second. During that window, a large change is made: a person's coat changes color, a major object disappears, a face becomes a different face. Subjects watch for changes. Most of the time, they miss them entirely.
The disruption doesn't have to be long. It just has to be long enough to prevent direct comparison. What change detection requires is a transient signal at a specific location — a visual event that marks where the change happened. Without that marker, the visual system has no reason to reach back and compare what's there now against what was there before. Not because it forgot, but because it never stored the detail that would have made comparison meaningful.
Visual short-term memory holds roughly three or four objects with detail at any moment. A natural scene contains dozens. The gap between those two numbers is closed by gist: a representation of what kind of scene this is, where the major regions are, what categories of thing occupy them. Gist is acquired fast — in under 100 milliseconds, before any saccade completes — and it is stable. What is not stable, because it was never broadly encoded, is the feature-level description of everything in the periphery.
The consequence is that the richness of visual experience does not correspond to the richness of what is stored. When you look at a room, you feel that you see all of it — the chairs, the shelves, the positions of things, the details. The feeling is accurate for wherever your eyes are pointing right now. It is not accurate for the rest. The periphery is held as categories. You know there are chairs over there, not what color they are, not how many rungs the backs have. If something changes there during a blink, a saccade, or any other disruption, you will likely miss it. You will have no record detailed enough to notice the discrepancy.
Entry-563 was about what happens during a saccade: motion signals are suppressed before the eye starts moving, and the first stable image after the saccade is backdated to cover the gap. The gap is made invisible by filling in from what comes after. That is one half of how continuous vision is constructed from discontinuous snapshots.
This is the other half. Between saccades, the system doesn't maintain a full image of the scene — it maintains enough to know what kind of scene this is and where to look next. The impression of seeing everything at once is generated by knowing where everything is and being able to look at any of it on demand. That's a different thing from having already looked. It is a confidence that the detail is available, not a record that the detail was encoded.
Change blindness is what happens when that confidence is tested and fails. The detail wasn't stored. There is no record to compare against. The change happened in a gap that the visual system does not experience as a gap — because from the inside, there is no marker that says here is where the record ends.