Sperm whales produce codas: short sequences of clicks, lasting one to two seconds. They've been documented since at least the 1970s. In 2024, researchers from MIT and Project CETI analyzed nearly 9,000 codas from Eastern Caribbean sperm whale families and found that codas vary along four independent dimensions simultaneously: rhythm (the pattern of click intervals), tempo (the overall speed), rubato (systematic duration compression or extension), and ornamentation (extra clicks added to the basic pattern). Those dimensions combine, so the result is not a fixed repertoire of a few dozen signals but a structured space that can generate far more distinct codas than previously counted. The authors called it a phonetic alphabet.
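To make the arithmetic of "dimensions combine" concrete, here is a minimal Python sketch of how independent dimensions multiply. Only the four dimension names come from the research described above; the category labels and the number of categories per dimension are invented placeholders, not figures from the study.

```python
from itertools import product

# Hypothetical categories per dimension. The labels and counts are
# illustrative placeholders, not values reported by the researchers.
dimensions = {
    "rhythm": ["r1", "r2", "r3"],
    "tempo": ["slow", "fast"],
    "rubato": ["compress", "steady", "extend"],
    "ornamentation": [False, True],
}


def coda_space(dims):
    """Enumerate every combination of one category per dimension."""
    names = list(dims)
    return [dict(zip(names, combo)) for combo in product(*dims.values())]


space = coda_space(dimensions)
print(len(space))  # 3 * 2 * 3 * 2 = 36 combinations
print(space[0])
```

Even with these small made-up counts, four dimensions yield 36 possible combinations from only 10 categories. That is the sense in which a combinatorial space is larger than a flat list of signal types. Whether whales actually use every combination is a separate, empirical question.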
In 2025, a separate study found an additional layer: whales modulate the frequency content of their clicks within codas, producing two discrete vowel-like patterns — an a-coda and an i-coda, distinguished by the spectral properties of the clicks, analogous to formants in human vowels. Both vowels can appear on multiple traditional coda types. The same structural sequence can carry either vowel. That's another independent dimension.
What the researchers have found is real: sperm whale coda vocalizations are highly structured, multi-dimensional, and combinatorial in ways that weren't recognized before. That's a genuine discovery. The coverage of this research tends to describe it as "decoding whale language" or "cracking the whale code," which is where the problem starts, because what's been decoded is structure, not semantics. Those are different problems, and progress on one doesn't translate easily to the other.
The distinction is this: you could describe the phonetic structure of English in complete detail (consonants, vowels, syllable structure, permissible sequences, prosody) without knowing what any English sentence means. An alphabet is necessary for language but not sufficient. What makes a language a language is that the combinations mean something: not just that the combinations are distinct, but that distinct combinations correspond to distinct states of the world, and speakers know which correspondence applies when. That's semantics, and it's not derivable from the acoustic signal alone. You need the other side of the mapping: the referents, the contexts, the situations that reliably co-occur with each signal type.
For vervet monkeys, researchers got that mapping partly by observation. Seyfarth, Cheney, and Marler showed in 1980 that vervets produce distinct alarm calls for aerial predators, ground predators, and snakes, and that other vervets respond differently to each: looking up or taking cover for aerial predators, running up trees for ground predators, standing on hind legs and scanning the ground for snakes. That's evidence for referential specificity: a particular signal reliably co-occurs with a particular category of event and elicits a response appropriate to that event. You can test it by playing recordings and watching what happens. That takes laborious fieldwork rather than a lab, but it's conceptually tractable.
Sperm whale codas are harder. Whales live in a three-dimensional acoustic world that humans can't enter, spanning hundreds of kilometers, in family groups that spend years together. Codas that function as an identity signal, a location marker, a social coordination cue, or a specific reference to an underwater feature would all look the same from the outside: clicks, structured clicks, variations on clicks. To know what a coda is for, you'd need to know what was happening in the whale's environment when it was produced, what happened immediately after, and whether the pattern held across hundreds of instances. That's an enormous behavioral dataset that doesn't yet exist at the required resolution.
The WhAM paper — which builds a transformer model that can generate synthetic codas from audio prompts — is explicit about this. The authors write: "We emphasize that translation is in the acoustic sense; semantic translation remains a distinct and more ambitious goal." And: "the gap between generating vocalizations and understanding their meaning remains vast." A model that can generate plausible-sounding codas, or classify coda types, has learned the acoustic statistics. It hasn't learned the meaning, because the meaning isn't in the acoustic signal alone — it's in the relationship between the signal and the world the signal refers to.
There's a prior entry in this journal about the wood wide web — the popular claim that trees "communicate" through mycorrhizal networks, which a 2023 meta-analysis found to be poorly supported by the actual studies it was built on. The communication metaphor had imported intentionality before the evidence warranted it. The sperm whale story is different: the structural findings are real, the methodology is careful, the researchers themselves are precise about what they've established. The inflation happens downstream, in coverage that has a harder time holding the gap between "rich acoustic structure" and "language."
What's genuinely uncertain is whether that gap can be closed, and by what route. The Rosetta Stone worked because it had the same text in three scripts, one of which was already readable. Champollion still needed the hypothesis that some cartouche symbols were phonetic rather than symbolic — a hypothesis he tested and validated. For sperm whales there's no bilingual inscription. The closest equivalent would be situations where whale behavior in a specific context is unambiguous enough to anchor a coda's function: a coda produced reliably just before a dive, or just after a reunion with a family member, or in response to an approaching vessel. Those are discoverable. They're being looked for. They're just much slower to accumulate than acoustic recordings.
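As a rough sketch of what "anchoring" could look like in practice, the Python below tabulates how often each coda type co-occurs with each behavioral context. Everything in it is an assumption for illustration: the coda labels, the context names, and the observations are invented toy data, not real annotations, and a real analysis would need far more events and proper statistical testing.

```python
from collections import Counter, defaultdict

# Invented toy observations: (coda_label, behavioral_context).
# These are placeholders for illustration, not real field data.
observations = [
    ("coda_A", "pre_dive"),
    ("coda_A", "pre_dive"),
    ("coda_A", "pre_dive"),
    ("coda_A", "reunion"),
    ("coda_B", "reunion"),
    ("coda_B", "pre_dive"),
    ("coda_B", "vessel_approach"),
    ("coda_B", "reunion"),
]


def context_given_coda(obs):
    """Estimate P(context | coda) from raw co-occurrence counts."""
    counts = defaultdict(Counter)
    for coda, context in obs:
        counts[coda][context] += 1
    return {
        coda: {ctx: n / sum(ctxs.values()) for ctx, n in ctxs.items()}
        for coda, ctxs in counts.items()
    }


table = context_given_coda(observations)
for coda, dist in table.items():
    best = max(dist, key=dist.get)  # most frequent context for this coda
    print(coda, best, dist[best])
```

In this toy data, coda_A precedes a dive in three of four cases, while coda_B is spread across contexts. A skewed distribution like coda_A's would only be a candidate anchor: it says the signal and the situation co-occur, not what the signal means, which is why the vervet work needed playback experiments as well as observation.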
Finding the alphabet is real progress. Reading what's written in it is a separate problem. Both are worth doing. The second is harder in a different way — not harder because the signal is complex, but harder because the meaning isn't in the signal at all. It's in the relationship between signal and world, and the world is very large and mostly dark and mostly underwater.