Miekko wrote: One thing we could do here is:
- reconstruct Vulgar Latin from modern Romance languages
- reconstruct Latin from intermediate reconstructions
Compare the result with the real thing.
Hasn't at least one of those been done?
Anyway, I thought the discussion was about determining relationships rather than determining the protolanguage. I could well imagine that errors in the latter do grow exponentially, but the situation is different in the former, because we aren't interested in the absolute strength of the hypothesis (after all, it's well recognized that all reconstructions are approximations) but in its strength relative to other relationship hypotheses. This relative strength doesn't even decrease monotonically with time depth! For example, take Lithuanian:
- We're pretty sure Latvian is its closest relative.
- We're also somewhat sure that the Slavic languages are the next-closest relatives.
- We're not sure at all what's the next closest… Indo-Iranian? Greco-Aryan? Albanian? Germanic?
- However, we are again pretty sure that all of those are more closely related to Lithuanian than, say, Turkic.
And even though I have little idea whether Lithuanian (or IE in general) is more closely related to Turkic than to Dravidian or Sino-Tibetan, I'm also reasonably sure it is more closely related to all of those than it is to !Xóõ.
I did not use the comparative method to reach that last conclusion though, but as far as that goes, the point is that the cutoff point where we can no longer use its results to say much about the protolanguage is not the cutoff point where we can no longer say much about the cladogram. Suppose we're only 80% sure there is a good reconstruction of Proto-Penutian; if we're simultaneously 80% sure there isn't a good reconstruction of Proto-Algic-Utian, then that means we're 96% sure that Algic is not a branch of Penutian. Similarly, even a pretty crappy reconstruction of Proto-Nostratic will have implications in the absence of any hint of, say, a Proto-Indo-Austronesian relationship.
It is actually a required assumption here that the strength of a reconstruction does decrease monotonically with age. Otherwise, the 16% odds for both the Proto-Penutian and Proto-Algic-Utian reconstructions being on the right track couldn't be counted as a case where we have evidence against the inclusion of Algic in Penutian. (Well, unless the reconstructions were sufficiently close for this to become a case similar to the one of IE subgroups.)
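To spell out the arithmetic behind the 96% and 16% figures: one reading that reproduces both numbers treats the two reconstructions as independent events. That independence is an assumption of this sketch, not something the argument establishes, and the variable names are made up for illustration.

```python
from itertools import product

# Confidence levels taken from the example in the text.
p_penutian = 0.8      # P(Proto-Penutian reconstruction is good)
p_algic_utian = 0.2   # P(Proto-Algic-Utian reconstruction is good) = 1 - 0.8

# Joint probabilities of the four outcomes, ASSUMING independence.
joint = {}
for pen_good, au_good in product([True, False], repeat=2):
    p_pen = p_penutian if pen_good else 1 - p_penutian
    p_au = p_algic_utian if au_good else 1 - p_algic_utian
    joint[(pen_good, au_good)] = p_pen * p_au

# Both reconstructions on the right track.
both_good = joint[(True, True)]

# Everything except "Penutian fails while Algic-Utian holds".
not_worst = 1 - joint[(False, True)]

for (pen_good, au_good), p in joint.items():
    print(f"Penutian good={pen_good!s:5}  Algic-Utian good={au_good!s:5}  p={p:.2f}")
print(f"both good: {both_good:.2f}")
print(f"1 - P(Penutian bad, Algic-Utian good): {not_worst:.2f}")
```

Under this reading, the 16% is the cell where both reconstructions hold, and the 96% is what remains after removing the 4% cell where the Penutian reconstruction fails but the Algic-Utian one holds. If the two judgments are correlated, neither product applies as written.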
Now if we want to put numbers on how fast exactly a reconstruction's usefulness tends to zero, we'll first want an idea of how to quantify the error in a reconstruction. Do words that were there but we didn't manage to reconstruct count as errors? (Perhaps, but that might require weighting by frequency.) Does reconstructing *sudɪ rather than *tsudi count as less of an error than reconstructing *hudə? (I think it should: the first is only off by two features/sound changes, the latter by about four.) What about the semantics?
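As a rough illustration of counting error in features rather than in whole segments, here is a minimal sketch. The feature table is a hypothetical, heavily simplified one invented for this example, the forms are segmented by hand, and segments are compared position by position with no alignment step, so it only handles forms of equal length.

```python
# Hypothetical toy feature table; a real system would use a fuller inventory.
FEATURES = {
    "ts": {"place": "alveolar", "manner": "affricate", "voice": False},
    "s":  {"place": "alveolar", "manner": "fricative", "voice": False},
    "h":  {"place": "glottal",  "manner": "fricative", "voice": False},
    "d":  {"place": "alveolar", "manner": "stop",      "voice": True},
    "u":  {"height": "high", "backness": "back",    "tense": True},
    "i":  {"height": "high", "backness": "front",   "tense": True},
    "ɪ":  {"height": "high", "backness": "front",   "tense": False},
    "ə":  {"height": "mid",  "backness": "central", "tense": False},
}


def segment_distance(a, b):
    """Count the features on which two segments disagree."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(fa.get(k) != fb.get(k) for k in fa.keys() | fb.keys())


def form_distance(x, y):
    """Sum feature mismatches over position-aligned segments."""
    if len(x) != len(y):
        raise ValueError("this sketch only compares forms of equal length")
    return sum(segment_distance(a, b) for a, b in zip(x, y))


actual = ["ts", "u", "d", "i"]   # *tsudi, taken as the true form
near   = ["s",  "u", "d", "ɪ"]   # *sudɪ
far    = ["h",  "u", "d", "ə"]   # *hudə

print(form_distance(actual, near))
print(form_distance(actual, far))
```

With this particular table, *sudɪ comes out 2 features away from *tsudi and *hudə comes out 5 away. The exact count for the second depends on how the vowels are analyzed, but the ordering is the point. Missing words, frequency weighting and semantics are not covered by this sketch.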