zompist wrote: Does anyone have access to the actual article? I'm curious how much of an effect they're finding.
You can access the article here.
I haven't read the article fully, I've just skimmed it, but I think I can help answer some of your questions.
If the effect is very large it would be a bit surprising that no one's noticed. Plus aren't there lots of obvious exceptions? (E.g. for 'nose' there's Mandarin bízi, Hungarian orr, Hebrew af, Swahili pua.) And if the effect is very small, it's hard to figure out what the causal mechanism would be. A weak constraint seems harder to explain than a strong one.
The supplemental information is available here. Tables S2 and S3 are the most relevant. They list RR, the ratio between the frequency of the phoneme in words for the concept and its frequency in other words. So, for "ash", the symbol "u" is listed with RR=1.91, meaning they found that "u" is 1.91 times as likely to appear in a word for "ash" as in some other word.
The highest RR value reported is 5.12 for small:[tS]. Next highest is tongue:[l], RR=2.77, and then sand:[s], RR=2.58.
Then there are the negative associations, with RR<1. The lowest is (1st person pronoun):[p], RR=0.18, and joint second are (1st person pronoun):[l] and (2nd person pronoun):[l], both with RR=0.19.
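To make the RR definition concrete, here is a minimal sketch of the ratio as I described it above. The function names and the word forms are invented for illustration, they are not data from the paper, and the paper's actual computation may well differ in its details.

```python
def share_with_phoneme(words, phoneme):
    """Fraction of words that contain the phoneme at least once."""
    return sum(phoneme in word for word in words) / len(words)


def risk_ratio(concept_words, other_words, phoneme):
    """RR = share of concept words with the phoneme / share of other words with it."""
    return share_with_phoneme(concept_words, phoneme) / share_with_phoneme(other_words, phoneme)


# Invented toy forms, not data from the paper.
ash_words = ["tul", "ulo", "pat", "ku"]
other_words = ["mak", "tun", "sil", "po", "ra", "kut", "ne", "li"]

rr = risk_ratio(ash_words, other_words, "u")
print(f"RR = {rr:.2f}")
```

In this toy set "u" shows up in 3 of 4 "ash" words and 2 of 8 other words, so the ratio comes out above 1. A value above 1 means the sound is over-represented in words for the concept, and a value below 1 means it is under-represented.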
Interesting findings, though I'm not quite sure what to make of them.
Also, how well did they test against chance? Some of these things are pretty broad— e.g. 'leaf' has b, p, or l. As a rough estimate, let's say an average language has 20 consonants and an average root has 2 consonants; then in a completely random language you'd get a word following the rule 30% of the time.
They used Monte Carlo simulations. I don't quite grok the method, but it appears to account for chance resemblances: they compare their results with those from randomly generated sound-meaning correspondences.
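As a sanity check on the back-of-envelope figure in the quote, here is a small sketch that works out the chance rate exactly and then by simulation. It assumes the quote's own setup (20 equally likely consonants, 2 independent consonants per root, 3 target consonants such as b, p, l). It is not the paper's procedure, and the function names are mine.

```python
import random


def chance_hit_exact(n_consonants=20, n_targets=3, root_len=2):
    """P(at least one of root_len independent, uniform consonants is a target)."""
    return 1 - (1 - n_targets / n_consonants) ** root_len


def chance_hit_simulated(n_consonants=20, n_targets=3, root_len=2,
                         trials=100_000, seed=0):
    """Estimate the same probability by drawing random roots."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Consonants 0..n_targets-1 stand in for the target set (e.g. b, p, l).
        if any(rng.randrange(n_consonants) < n_targets for _ in range(root_len)):
            hits += 1
    return hits / trials


exact = chance_hit_exact()
simulated = chance_hit_simulated()
print(f"exact: {exact:.4f}, simulated: {simulated:.4f}")
```

Under those assumptions the exact value is 1 - (17/20)^2 = 0.2775, so the rough 30% is about right. Real consonant inventories are far from uniform, which is presumably why a simulation against the actual data is preferable to a calculation like this.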
At the same time, I could see there being something deictic about, say, noses and tongues. A nasal sound for 'nose' is like a phonetic pointer. I wonder if words for 'lip' tend to have labials. An L for 'tongue' is also kind of satisfying: it's a sound that kind of draws attention to the tongue.
(Can't think of a reason why 'sand' would have S, or 'leaf' L. Again, it would be interesting to know how strong these correlations were.)
Yeah. It's interesting that they don't offer any explanations for these (dis)associations; they simply note that they exist. It's easy to come up with just-so stories for "lip" and "breast" and "nose", but the others are harder. Still, no one complains when non-arbitrary lexemes are reported in signed languages, so why not in spoken languages too? The causal mechanisms are the tough part, though.
Finally, one more worry: they didn't just look directly at their 3600 languages, did they? If they were going for languages rather than families, they could severely distort the effects. E.g., there's a thousand Austronesian languages, and nearly that number of Niger-Congo languages. That's a lot of opportunity to create a pseudo-effect based on cognates.
They classified their word lists into dialects within languages within lineages, and computed their statistics on a per-lineage basis.
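To illustrate why the per-lineage step matters for the cognate worry, here is a rough sketch comparing a pooled share with a share averaged over lineages. I don't know exactly how the paper aggregates, so this is only an assumption about the general idea, with invented records and function names.

```python
from collections import defaultdict


def pooled_share(records, phoneme):
    """Share of all word lists whose form contains the phoneme."""
    return sum(phoneme in form for _, form in records) / len(records)


def per_lineage_share(records, phoneme):
    """Average the within-lineage share, so each lineage counts once."""
    by_lineage = defaultdict(list)
    for lineage, form in records:
        by_lineage[lineage].append(phoneme in form)
    shares = [sum(flags) / len(flags) for flags in by_lineage.values()]
    return sum(shares) / len(shares)


# Invented records: (lineage, word form). Lineage "A" has eight cognate forms.
records = [("A", "lat")] * 8 + [("B", "tok"), ("C", "pes"), ("D", "wan")]

print(f"pooled:      {pooled_share(records, 'l'):.2f}")
print(f"per lineage: {per_lineage_share(records, 'l'):.2f}")
```

In this toy case the one big family pushes the pooled share to 8 out of 11, while the per-lineage average is 1 out of 4, which is the kind of pseudo-effect the quote is worried about.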
Vijay wrote: Why does this article seem to have become so popular all of a sudden? I'm positive it's not new, and I doubt very much that any actual linguists contributed anything to the research beyond maybe data samples, yet I've seen people referring to it three times by now, twice on this forum.
Søren Wichmann and Harald Hammarström are among the authors. They are definitely "actual linguists", having active research agendas in language documentation. The other authors are Peter Stadler, a bioinformatician; Morten Christiansen, a psychologist of language; and Damian Blasi, a graduate student in evolutionary anthropology who focuses on linguistic issues. Also, the article was edited by Anne Cutler, perhaps the world's foremost psycholinguist.
I'm not surprised you've seen this article mentioned on this forum: it is a widely publicized paper, and this is a forum of language enthusiasts.