Efficiency of languages and conlangs

Plusquamperfekt · Post by **Plusquamperfekt** » Fri Nov 15, 2013 2:38 pm

Hey people,
I'm addressing a question to you which has been bothering me for a long time. Whenever I notice that translations in different languages differ in length, I wonder whether some languages are better at conveying information than others, or in other words, whether some languages need significantly more or less "speaking" to express exactly the same idea. I'm pretty convinced this is the case, but what's the best method to compare the efficiency? My first idea would be comparing the number of syllables, but since the complexity of syllables can vary a lot, I'm not sure whether it would be better to count the phonemes instead. Or simply the length of the whole orthographic output. Or the time you need to pronounce the text...
Any ideas? I think it would be pretty cool if we had a sample text in English and could compare our translations with it, for example with a result like "1 syllable in English = 1.3 syllables in ..." What do you think?

linguoboy · Post by **linguoboy** » Fri Nov 15, 2013 2:40 pm

I think the whole endeavour founders on the impossibility of objectively defining what constitutes "exactly the same idea".

Plusquamperfekt · Post by **Plusquamperfekt** » Fri Nov 15, 2013 2:56 pm

Good point. OK, then let's change our criterion... Instead of trying to measure how much text we need to obtain "exactly the same information", we could try to determine how much text we need to express at least the same information as we find in the English translation. I hope the difference is clear.

Let's say the English text contains the information {A, B, C}. Then we would not try to investigate how long the closed set {A, B, C} would be in our conlangs, but instead how long the shortest possible text would be that contains at least {A}, {B} and {C}. The question of whether {D} and {E} are in our conlang translation as well would no longer be relevant.
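A rough sketch of that criterion, in case it helps: each candidate translation is tagged by hand with the bits of information it carries, and we pick the shortest one that covers everything required. The function name `shortest_covering`, the candidate texts and the tags are all made up for illustration, and length is measured in characters only because that is the easiest thing to count.

```python
from typing import Dict, Optional, Set


def shortest_covering(candidates: Dict[str, Set[str]],
                      required: Set[str]) -> Optional[str]:
    """Return the shortest text whose tagged information includes `required`.

    Extra information (D, E, ...) in a candidate is ignored, as proposed above.
    Returns None if no candidate covers everything required.
    """
    covering = [text for text, info in candidates.items() if required <= info]
    return min(covering, key=len, default=None)


# Hypothetical candidates with hand-assigned information tags.
candidates = {
    "short text": {"A", "B"},
    "a medium text": {"A", "B", "C"},
    "a much longer text": {"A", "B", "C", "D", "E"},
}

print(shortest_covering(candidates, {"A", "B", "C"}))
```

The hard part, of course, is the tagging itself, which this sketch simply takes as given.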

Radius Solis · Post by **Radius Solis** » Fri Nov 15, 2013 3:46 pm

I think that rescue could be adequate for making the results meaningful, provided you tag the pragmatic status of each bit of information - at minimum, which bits are the part you're trying to convey and which bits serve other functions. Because it's important to make certain you're comparing apples to apples.

"My house is red" and "The red house is mine" are as close as we can reasonably get to 'expressing the same ideas' - but they aren't put together quite the same way, and the difference is important. In the one, you are identifying a house to the listener by who lives there and the main thing you're trying to express is its color. In the other you're identifying the house by its color and the main thing you're trying to convey is who lives there. And as you can see, even within the same language this results in a difference in the minimum number of syllables/words/phonemes required to express it.

zompist · Post by **zompist** » Fri Nov 15, 2013 4:36 pm

You can avoid the whole issue of similar content by simply measuring the ratio of morphemes to syllables. This is usually done to estimate fusion vs. agglutination vs. polysynthesis, but small differences would suggest differences in information content.
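To make the ratio concrete, here is a minimal sketch. It assumes the words have already been segmented by hand, once into morphemes and once into syllables, with hyphens as separators; the function name `morphemes_per_syllable` and the three English sample words (with my own segmentations) are only illustrative.

```python
def morphemes_per_syllable(words):
    """Compute total morphemes divided by total syllables.

    `words` is a list of (morpheme_segmentation, syllable_segmentation)
    pairs, each a hyphen-delimited string prepared by hand.
    """
    morphemes = sum(len(m.split("-")) for m, _ in words)
    syllables = sum(len(s.split("-")) for _, s in words)
    return morphemes / syllables


# Hypothetical hand-segmented sample: 7 morphemes over 6 syllables.
english_sample = [
    ("un-happi-ness", "un-hap-pi-ness"),
    ("cat-s", "cats"),
    ("walk-ed", "walked"),
]

print(round(morphemes_per_syllable(english_sample), 3))
```

Any real comparison would need a much larger sample and a consistent segmentation policy across the languages compared.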

That said, I'd question the whole concept of "efficiency" here. As was shown long ago by Claude Shannon (this is briefly discussed in the LCK), English has a good deal of redundancy; its information content is about one bit per letter, far lower than the 4-5 bits per letter you'd get in random text. But redundancy is not a bug, it's a feature. We don't hear all the sounds that are produced, and we can understand degraded or obscured speech.
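For the curious, the "4-5 bits per letter" ceiling is just the base-2 logarithm of the alphabet size. The sketch below computes that, plus a single-letter frequency entropy for a sample string. Note the assumption: letter frequencies alone ignore context, so this only gives a loose upper bound and will come out far above the roughly one bit per letter cited above. The function names are mine.

```python
import math
from collections import Counter


def uniform_bits(alphabet_size):
    """Bits per symbol if every symbol were equally likely (random text)."""
    return math.log2(alphabet_size)


def unigram_entropy(text):
    """Entropy in bits per letter from single-letter frequencies only.

    This ignores context between letters, so it overestimates the true
    information content of running text.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


sample = ("We don't hear all the sounds that are produced, "
          "and we can understand degraded or obscured speech.")

print(round(uniform_bits(26), 2))   # ceiling for a 26-letter alphabet
print(round(unigram_entropy(sample), 2))
```

Getting down from the unigram figure to the much lower estimate requires modelling longer context, which is where the redundancy lives.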

Terra · Post by **Terra** » Fri Nov 15, 2013 4:49 pm

1) Read http://en.wikipedia.org/wiki/Information_theory . It has information that's very relevant to what you want to do.
2) Keep in mind also that "efficient" doesn't necessarily mean fewer syllables/phonemes. Redundancy lets one mishear some bits and still interpret/construct the correct message.
3) Also, you'd need a large corpus in each/every language to properly compare them. You'd probably have your best luck with books that are popular enough to have translations into multiple languages, but old enough to have an expired copyright. (The Bible works too.)

Edit: I got ninjad by Zomp.

zompist wrote:We don't hear all the sounds that are produced, and we can understand degraded or obscured speech.

Also dialects.

http://www.ling.upenn.edu/phono_atlas/ICSLP4.html wrote:The greatest difficulties for speech recognition are posed not by mergers but by chain shifts of vowels.

Salmoneus · Post by **Salmoneus** » Fri Nov 15, 2013 5:31 pm

linguoboy wrote:I think the whole endeavour founders on the impossibility of objectively defining what constitutes "exactly the same idea".

Also, it would be necessary to compare utterances made in the same (physical and social) contexts. Otherwise, you might be comparing one register with another - and it should be no surprise that if you compare an utterance in ultra-formal court language it may well be more circumlocutious than the same 'idea' conveyed in a bawdyhouse.

So all you have to do then is cross-socially define which contexts equate - exactly which social scenario in imperial Thailand equates to which social scenario among the Pirahã, and so forth.

linguoboy · Post by **linguoboy** » Fri Nov 15, 2013 5:36 pm

Salmoneus wrote:Also, it would be necessary to compare utterances made in the same (physical and social) contexts. Otherwise, you might be comparing one register with another - and it should be no surprise that if you compare an utterance in ultra-formal court language it may well be more circumlocutious than the same 'idea' conveyed in a bawdyhouse.

I don't think register is nearly as important as the issue of pragmatics raised earlier. The more common knowledge the participants share, the less that needs to be expressed explicitly. Consider for a moment how verbose explanations of "nooblang" or "Eddying" would have to be to convey even a fraction of what those terms would express to a longtime member of the board.

Soap · Post by **Soap** » Fri Nov 15, 2013 7:09 pm

I'd say syllables are meaningful but you probably shouldn't base a measurement on that alone. A syllable in Japanese gets pronounced a lot more quickly than a syllable in English or Polish, for example.

I'd say there are definitely meaningful measurements to be made though; between closely related languages like Portuguese and Spanish, the Portuguese translation is almost always faster to pronounce even though the spelling looks similar. European languages seem to get slower the further east you go, with English among the fastest and Greek probably the slowest of all.

Herr Dunkel · Post by **Herr Dunkel** » Fri Nov 15, 2013 7:11 pm

I'd disagree. A Greek I hang out with regularly speaks some rapid-fire Greek, as did the Greeks in Greece proper.

Rhetorica · Post by **Rhetorica** » Fri Nov 15, 2013 7:41 pm

I think that has more to do with the phonology—spoken Greek is like burbling water with how many plosives have fricativized (or gone even further towards vowelhood.) Strangely, Korean is also spoken extremely rapidly, despite being rich in very forceful plosives—but the accent is simple and very easy to pick up, which I guess gives it an advantage, and there are a lot of sometimes-repetitive syllables to grind through due to the Altaic (think Mongolian) heritage/contact.

Salmoneus · Post by **Salmoneus** » Fri Nov 15, 2013 7:55 pm

Herr Dunkel wrote:I'd disagree. A Greek I hang out with regularly speaks some rapid-fire Greek, as did the Greeks in Greece proper.

Merc is talking about meaning-per-second, not syllable-per-second or phoneme-per-second, I think. English may well be quite a concise language, after all, at times, but it's not a rapidly spoken one, I don't think (yes, massive reduction of unstressed syllables, but also diphthongs, triphthongs for some of us, and complicated consonant clusters to slow us down).

Herr Dunkel · Post by **Herr Dunkel** » Fri Nov 15, 2013 8:04 pm

Ah, the other kind of slow.
Well, I have very little to say on the meaning-per-second speed of Greek, so, sorry for a bit of a distraction.

Rhetorica · Post by **Rhetorica** » Fri Nov 15, 2013 8:50 pm

Perhaps the most important context is how fast you can rap in a given language?

linguoboy · Post by **linguoboy** » Fri Nov 15, 2013 9:26 pm

Salmoneus wrote:
Herr Dunkel wrote:I'd disagree. A Greek I hang out with regularly speaks some rapid-fire Greek, as did the Greeks in Greece proper.
Merc is talking about meaning-per-second, not syllable-per-second or phoneme-per-second, I think. English may well be quite a concise language, after all, at times, but it's not a rapidly spoken one, I don't think (yes, massive reduction of unstressed syllables, but also diphthongs, triphthongs for some of us, and complicated consonant clusters to slow us down).

Moreover, I thought empirical research into relative speed of languages had demonstrated that these sorts of claims are bunkum. The variation between individual speakers is always greater than the variation between the averages for particular languages, which is generally not statistically significant.

I thought I'd seen this in a Language Log post, but googling brought up instead this highly relevant post.

Tanni · Post by **Tanni** » Sat Nov 16, 2013 7:27 am

Plusquamperfekt wrote:Hey people,
I'm addressing a question to you which has been bothering me for a long time. Whenever I notice that translations in different languages differ in length, I wonder whether some languages are better at conveying information than others, or in other words, whether some languages need significantly more or less "speaking" to express exactly the same idea. I'm pretty convinced this is the case, but what's the best method to compare the efficiency? My first idea would be comparing the number of syllables, but since the complexity of syllables can vary a lot, I'm not sure whether it would be better to count the phonemes instead. Or simply the length of the whole orthographic output. Or the time you need to pronounce the text...
Any ideas? I think it would be pretty cool if we had a sample text in English and could compare our translations with it, for example with a result like "1 syllable in English = 1.3 syllables in ..." What do you think?

Go to a public transportation vehicle. You'll find some messages (e.g. that fare dodging will cost you a lot of money if caught in the act) translated into different languages. The information content of these messages is quite unique. You'll find that English and Turkish are the most ''efficient'' in most cases. French is the least ''efficient'' in most cases. There are also instances where even the otherwise least efficient language is the most efficient. Maybe counting phonetic features? What about the number of phonetic/grammatical processes employed to constitute the message in the languages of the sample?

gmalivuk · Post by **gmalivuk** » Mon Nov 18, 2013 5:54 pm

The problem with most public signs in my experience is that they were usually translated by people more familiar with one of the languages than the other. I'm pretty sure that Spanish, like English, has a more concise way of expressing the idea that the system of intercommunication for passengers is located at the extremes of the train than to say exactly that. The people who came up with the Spanish version of the sign, however, likely didn't know it or couldn't rely on dialect uniformity among Boston's Hispanophone community the same way they can rely on most everyone here knowing the General American English way of saying things.

linguoboy · Post by **linguoboy** » Mon Nov 18, 2013 6:05 pm

A fairer comparison would be to note how similar instructions are expressed on the public transport vehicles of different countries, but even here you would get into the register issues Sal noted because some societies are more formal in their public notices than others. I wager you'd find wide variation in wording just within the Anglosphere.

For instance, on the NYC subway the announcement is, "Stand clear of the doors." On Chicago trains, it used to be, "Watch the doors, doors are closing." Now it's simply "Doors closing." On the people mover at the Atlanta airport it was, "Caution! Doors do not rebound or spring back." Same basic warning, three totally different ways of expressing it.

Hydroeccentricity · Post by **Hydroeccentricity** » Tue Nov 19, 2013 12:53 pm

I think the thing to look at would be disambiguation.

Let's say you have a sentence like "Ufbu." The root ufb is a verb meaning to feel the tingling sensation you get when your muscles are really tired, and the suffix u is an evidential meaning the act is known indirectly from hearsay. How would you translate this in English? You could say "He tingles, apparently." But this is problematic. You've added a pronoun because English grammar requires it, and you've sacrificed half the meaning of the verb. What are you to do? Make a much longer English sentence that better conveys the exact meaning? Admit that English is "less efficient" than Ufbunese? No. Because when Ufbunesians speak, they are not conveying English meanings. They only need to explain to you which sentence they are saying out of the possible Ufbunese alternatives.

Maybe there are only two evidential suffixes. Maybe there are hundreds. Maybe there are twenty seven homophones all pronounced "ufbu," so a full sentence will require more information or more context to avoid ambiguity. Maybe omitted pronouns always imply a second person subject. Maybe Ufbunesians just don't seem to care about pronouns at all. The amount of information conveyed per second is entirely dependent on internal rules of disambiguation. Even if we measure the length of a typical Ufbunese sentence, we have no way of knowing if they are communicating more complex or less complex ideas in the first place, because those ideas, as expressed in English and Ufbunese, will be dependent on the relationship between words and grammatical rules that are peculiar to each language. We could measure how many seconds of utterance are exchanged in the course of getting something done, like explaining a task or interrogating a prisoner, but the amount of effort needed to do those things will be highly culturally dependent.

So where does that leave us? With no etic standard of semantic, grammatic, or cultural language use, we cannot create a universal standard of "efficiency." The efficiency of languages can therefore not be compared. Perhaps the efficiency of an entire system, where language is not separated from other concerns, might be measurable for some definition of efficiency, but not if only particular utterances are used as the measurements.

EDIT: let's put it another way. Imagine the sentence "I forgot the bag." Translating this sentence into Korean or Japanese would require you to decide if the bag was an accessory or a thin plastic bag for groceries or trash. But you wouldn't have to include any first person pronouns. Which one is more efficient? Can we say the two words for bag double the semantic load of the noun, while including a pronoun multiplies the semantic load of the verb by N, where N is the number of available pronouns? No. Because we don't know yet what other options are available. We don't know what sentences are easily confused for the original one. We don't know what cultural concerns speakers are tuned into. Were they already talking about a bag? Are forgotten bags a bigger problem in Canada than in Japan? It is impossible to compare the amount of information conveyed in the one sentence with the amount of information conveyed in the other as long as the rules for conveying information are language-specific.

Ser · Post by **Ser** » Wed Nov 20, 2013 3:15 pm

Plusquamperfekt wrote:Any ideas? I think it would be pretty cool if we had a sample text in English and could compare our translations with it, for example with a result like "1 syllable in English = 1.3 syllables in ..." What do you think?

I find it very weird you're saying "if we had" (an unreal condition), considering you've been on UniLang for years and there's thousands of threads in the Translations forum where, yes, this phenomenon can be observed. loqu (a (former) UniLanger from Spain) and I (another former UniLanger) have long noticed that, no matter what, Spanish translations did have a strong tendency to be much longer than the English original sentence in the threads. Maybe you didn't use to come around the Translations forum often, but both of us would marvel every time the Spanish translation turned out shorter than the English (and that happened very few times). Shorter both in terms of phones and in terms of spelling. The tendency of the language to say the same things in more phonemes/graphemes than English was very real, and me and loqu are equally uncertain of what to make of that.

Tanni wrote:Go to a public transportation vehicle. You'll find some messages (e.g. that fare dodging will cost you a lot of money if caught in the act) translated into different languages. The information content of these messages is quite unique. You'll find that English and Turkish are the most ''efficient'' in most cases. French is the least ''efficient'' in most cases. There are also instances where even the otherwise least efficient language is the most efficient. Maybe counting phonetic features? What about the number of phonetic/grammatical processes employed to constitute the message in the languages of the sample?

gmalivuk wrote:The problem with most public signs in my experience is that they were usually translated by people more familiar with one of the languages than the other. I'm pretty sure that Spanish, like English, has a more concise way of expressing the idea that the system of intercommunication for passengers is located at the extremes of the train than to say exactly that. The people who came up with the Spanish version of the sign, however, likely didn't know it or couldn't rely on dialect uniformity among Boston's Hispanophone community the same way they can rely on most everyone here knowing the General American English way of saying things.

...Though in terms of public signs, I'd say Spanish, and French, are simply that long-winded in style in contrast with English. English public signs are often written in a telegraphic style similar to newspaper headlines, with compound nouns and all articles dropped, while this is much less conventional in Spanish and French.

PUSH WINDOW LOWER HALF TO OPEN.
EMPUJE LA MITAD INFERIOR DE LA VENTANA PARA ABRIRLA.
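Just to put rough numbers on that pair, here is a quick sketch that counts words, letters and orthographic vowel groups in each sign. The vowel-group count is only a crude stand-in for syllables (it knows nothing about silent letters, hiatus or diphthongs), and the helper name `measures` is my own.

```python
import re

# Runs of vowel letters, used as a very rough syllable proxy.
VOWEL_GROUP = re.compile(r"[aeiouáéíóúü]+")


def measures(text):
    """Count words, letters and vowel groups in a short text."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return {
        "words": len(words),
        "letters": sum(len(w) for w in words),
        "vowel_groups": sum(len(VOWEL_GROUP.findall(w)) for w in words),
    }


english = measures("PUSH WINDOW LOWER HALF TO OPEN.")
spanish = measures("EMPUJE LA MITAD INFERIOR DE LA VENTANA PARA ABRIRLA.")

for key in english:
    ratio = spanish[key] / english[key]
    print(f"{key}: English {english[key]}, Spanish {spanish[key]}, ratio {ratio:.2f}")
```

One pair of signs proves nothing by itself, but the same counts could be run over a larger set of parallel texts.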

Miekko · Post by **Miekko** » Wed Nov 20, 2013 3:48 pm

Serafín wrote:
Plusquamperfekt wrote:Any ideas? I think it would be pretty cool if we had a sample text in English and could compare our translations with it, for example with a result like "1 syllable in English = 1.3 syllables in ..." What do you think?
I find it very weird you're saying "if we had" (an unreal condition), considering you've been on UniLang for years and there's thousands of threads in the Translations forum where, yes, this phenomenon can be observed. loqu (a (former) UniLanger from Spain) and I (another former UniLanger) have long noticed that, no matter what, Spanish translations did have a strong tendency to be much longer than the English original sentence in the threads. Maybe you didn't use to come around the Translations forum often, but both of us would marvel every time the Spanish translation turned out shorter than the English (and that happened very few times). Shorter both in terms of phones and in terms of spelling. The tendency of the language to say the same things in more phonemes/graphemes than English was very real, and me and loqu are equally uncertain of what to make of that.

How often have the materials being translated been English -> Spanish, rather than
entirely different language -> English & Spanish
or
Spanish -> English

This can have an impact, e.g. the translation uses unnatural pragmatics to convey the same point, and thus adheres too closely to the structure it has in English.

gmalivuk · Post by **gmalivuk** » Wed Nov 20, 2013 4:56 pm

I'd have to find the pictures I took to be sure, but I vaguely remember Spanish consistently being longer than English translations on the informational signs at Teotihuacán. (Nahuatl was longer than both, by a large margin.)

The thing about English's telegraphic headline and signage style is a good point, though. It still means signs might be a bad example to pick for efficiency, I think, but for a different reason. Maybe the Spanish translations I see really are the simplest reasonable ones, but their English source material is not very representative of how English is actually used most of the time. Large corpora of translations in both directions (and into both languages from a third one), by people fluent in both languages, would be necessary.

However, a still interesting but more manageable comparison would be to replicate Shannon's study within several other languages (maybe this has been done?). As I recall, the 1.1 bits/character figure was the result of cutting out parts of English text at random points and seeing how good native speakers were at guessing what was missing. It might not be a way to directly compare "efficiency" of transmitting the same information, but it would tell you the efficiency of transmitting the information typically communicated in that language.

zompist · Post by **zompist** » Wed Nov 20, 2013 6:45 pm

I don't think the Spanish-English thing is that mysterious. English words tend to be shorter, though this is offset by more complicated phonotactics. And Spanish tends to require more syntactic glue.

But Miekko has a good point too... translations can be awkward, and it's not really a fair test of 'efficiency' to compare an original and a translated text.

FWIW, I thought of one counter-example-- Borges's famous invented classification of animals. The original is generally shorter and smoother than the English translation. This is partly because of a syntactic quirk-- Spanish is more hospitable to lists of adjectival phrases-- and also perhaps because after all he is one of the finest writers in the language.

Terra · Post by **Terra** » Thu Nov 21, 2013 6:09 pm

Looking at an instant-meal that I ate today that had Spanish instructions on the back, I notice the following things:
1) Spanish doesn't drop articles, unlike English.
2) Spanish doesn't turn nouns into verbs, unlike English.
-- ("Microwave for 5 minutes." becomes "Cocine en el microondas durante 5 minutos.")
3) Spanish doesn't let nouns modify other nouns, (and instead uses a prep phrase, which obviously requires a prep to be added), unlike English.
-- ("For Food Safety and Quality" becomes "Para Conservar la Calidad y Seguridad de los Alimentos.".)

Also, is the "para conservar" really needed? Would "por" alone not work?

Salmoneus · Post by **Salmoneus** » Thu Nov 21, 2013 6:44 pm

Terra wrote:Looking at an instant-meal that I ate today that had Spanish instructions on the back, I notice the following things:
1) Spanish doesn't drop articles, unlike English.
2) Spanish doesn't turn nouns into verbs, unlike English.
-- ("Microwave for 5 minutes." becomes "Cocine en el microondas durante 5 minutos.")
3) Spanish doesn't let nouns modify other nouns, (and instead uses a prep phrase, which obviously requires a prep to be added), unlike English.
-- ("For Food Safety and Quality" becomes "Para Conservar la Calidad y Seguridad de los Alimentos.".)

Also, is the "para conservar" really needed? Would "por" alone not work?

A lot of that is register effects, though.
Yes, some English ready-meals have "microwave for 5 minutes" on the back, or even just "microwave 5 minutes" or a symbolic version of that. But others have "Cook in the microwave for five minutes" instead. Hell, some have stuff like "heat gently in the microwave for five minutes until ready to eat" and the like. Maybe the Spanish spend more on ready-meals, or just like their instructions to speak to them in a more civil and extravagant manner - after all, if somebody HANDED me a package and barked out "microwave five minutes!", I'd be a bit annoyed with them (unless, of course, it were my job to microwave stuff - but even then, registers in the workplace vary with culture). But for whatever reason we're expected to take it from a piece of cardboard.

In other words: what everybody else said (including me, before).
