CALS vs WALS: Part 2 - Nouns

PTSnoop
Niš
Posts: 8
Joined: Sun Sep 09, 2007 10:10 pm
Location: Somewhere in the North of England

CALS vs WALS: Part 2 - Nouns

Post by PTSnoop »

PART 1: PHONOLOGY
PART 2: MORPHOLOGY, NOMINAL CATEGORIES, NOMINAL SYNTAX

---

I'd posted this over at the CBB, but I had a feeling I should probably crosspost it here as well. So here you go.

---

So I recently found out about CALS, the conlang world's answer to WALS. And as I noticed that all the categories were the same, and that the numbers of catalogued languages on each side were pretty similar, I started thinking that someone should go through and compare things.

And, by the great tradition of "someone should" => "I should", here we are.

For each of the features in both the WALS and CALS databases, I've converted the counts to percentages, then subtracted the WALS number from the CALS number. Nothing too mathematically profound, but it should still give us some interesting data. In effect, a value of +10% means that 10% of conlangs have that feature and "shouldn't" have it, while -15% means that 15% of conlangs don't have that feature and "should". (Pretty heavy inverted commas there - I'm not trying to be prescriptive - but it's still a good way of picturing things.)
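
As a rough illustration of the calculation described above, here is a minimal Python sketch. The function names and the counts are made up for the example (they are not the real WALS or CALS figures), and it assumes the sign convention from the +10% / -15% reading above, where a positive value means the feature is more common in conlangs.

```python
def to_percentages(counts):
    """Convert raw language counts per feature value into percentages."""
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


def cals_minus_wals(cals_counts, wals_counts):
    """Percentage difference (conlangs minus natlangs) for each feature value."""
    cals_pct = to_percentages(cals_counts)
    wals_pct = to_percentages(wals_counts)
    values = sorted(set(cals_pct) | set(wals_pct))
    # A value missing from one database is treated as 0% there.
    return {v: cals_pct.get(v, 0.0) - wals_pct.get(v, 0.0) for v in values}


# Made-up counts for illustration only (NOT the real WALS or CALS numbers).
wals_example = {"No tones": 300, "Simple tone system": 130, "Complex tone system": 70}
cals_example = {"No tones": 160, "Simple tone system": 30, "Complex tone system": 10}

differences = cals_minus_wals(cals_example, wals_example)
for value, diff in differences.items():
    print(f"{value}: {diff:+.1f}%")
```

Because each side is normalised to 100% before subtracting, the differences for one feature always sum to zero: every bar above the axis is balanced by bars below it.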

And because it gives us a *lot* of data, much of it interesting, I've decided to split things up into a few posts so people don't get bogged down in numbers. In general, I've chosen the features with the most extreme positive or negative values, plus a few that I just find interesting. If anyone's curious about any features I've missed off, let me know and I'll throw together another graph.

Just comparing percentages doesn't give you the full picture by any means - an extra 10% on a feature that 75% of natlangs have will show up the same as an extra 10% on a feature that pretty much never happens. But it's not a bad start. Maybe I'll delve into some more complex statistical stuff at a later date.
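
To make that caveat concrete, here is a small Python sketch comparing the plain difference with a relative ratio. The percentages are hypothetical and the helper name is invented for the example; it is only one possible way of looking at the same numbers.

```python
def absolute_and_relative(natlang_pct, conlang_pct):
    """Return (percentage difference, conlang/natlang ratio)."""
    difference = conlang_pct - natlang_pct
    # The ratio is undefined when no natlangs have the feature at all.
    ratio = conlang_pct / natlang_pct if natlang_pct > 0 else None
    return difference, ratio


# Hypothetical percentages, chosen only to show the effect.
common = absolute_and_relative(75.0, 85.0)  # a feature most natlangs have
rare = absolute_and_relative(1.0, 11.0)     # a feature almost no natlangs have

for label, (difference, ratio) in (("common", common), ("rare", rare)):
    print(f"{label}: {difference:+.1f}%, {ratio:.2f}x as frequent in conlangs")
```

Both cases show the same +10% on a difference graph, but the rare feature is many times more frequent in conlangs than in natlangs, while the common one has barely moved.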

PART 1: PHONOLOGY

Consonant Inventories

Image

It seems that there's a tendency towards average-sized (19-25) consonant inventories here.

People seem to be shying away more from the very small (6-14) than the very large (34+) inventories - maybe small ones are seen as less interesting. ("Just one more phoneme...")

Vowel Inventories

Image

But for vowels, unlike consonants, there's a tendency away from the average. Possibly some of the huge interesting Indo-European vowel systems are pulling people away from the mean, towards larger (7+) inventories.

Voicing in Plosives and Fricatives

Image

There's a strong tendency here - 20% more conlangs have a voicing contrast throughout.

Front Rounded Vowels

Image

And again, possibly fueled by the tendency towards larger vowel systems, we see more conlangers going for the most "interesting" options.

Tone

Image

As people generally assume, lots of conlangs don't have tones. But what surprised me here was how well the languages with tone matched the natlang distribution of complexity of tone system - I'd expected the tonal-conlangers to have gone much more for big dramatic contours-and-sandhi systems over simple two-way contrasts. Maybe there are more pitch-accent langs than I thought...

Stress

Image

Fixed stress seems unpopular. But though I'd have expected to see unpredictable stress as popular, I wouldn't have expected "Right-oriented: one of the last three" to have shown up quite so strongly.

Uncommon Consonants

Image

English rears its ugly head again. Non-sibilant dental fricatives are pretty rare in natlangs, being less common than co-articulated /kp/ - but because English has them (and, if I'm honest, because they're quite a nice-sounding sound) they show up in 18.5% more conlangs than natlangs. Though, to be honest, I was expecting a larger number - the tendency away from tone was larger than this one.

COMING SOON: MORPHOSYNTAX
Last edited by PTSnoop on Fri Jul 12, 2013 5:56 am, edited 1 time in total.

2+3 clusivity
Avisaru
Posts: 454
Joined: Fri Mar 16, 2012 5:34 pm

Re: CALS vs WALS: A Comparison

Post by 2+3 clusivity »

This is great, I've been interested to see something like this for a while. Keep it up!
linguoboy wrote:So that's what it looks like when the master satirist is moistened by his own moutarde.

hwhatting
Smeric
Posts: 2315
Joined: Fri Sep 13, 2002 2:49 am
Location: Bonn, Germany

Re: CALS vs WALS: A Comparison

Post by hwhatting »

I agree, nice work.
Tautisca has fixed stress (first syllable) - I wouldn't have thought that this is so (relatively) unusual in conlangs!

Curlyjimsam
Lebom
Posts: 205
Joined: Wed Dec 29, 2004 11:57 am
Location: Elsewhere

Re: CALS vs WALS: A Comparison

Post by Curlyjimsam »

I wonder how much of a difference there'd be if you only compared conlangs with European languages: the strongest differences from the worldwide patterns do seem to tend toward common European features.

Interesting about stress - most of my conlangs have fixed stress, so it seems odd to me that other people go for weight-sensitive stress so often.

Morrígan
Avisaru
Posts: 396
Joined: Thu Sep 09, 2004 9:33 am
Location: Wizard Tower

Re: CALS vs WALS: A Comparison

Post by Morrígan »

Oh good, and we can download the data as well.

I might have to amuse myself by looking at it in R later. When I studied typology with Matthew Dryer, we spent tons of time looking at correlations between morphological and syntactic patterns with respect to geographic region and language family.

I wonder what we might see if we look at authorship.

PTSnoop
Niš
Posts: 8
Joined: Sun Sep 09, 2007 10:10 pm
Location: Somewhere in the North of England

Re: CALS vs WALS: A Comparison

Post by PTSnoop »

PART 2: MORPHOLOGY, NOMINAL CATEGORIES, NOMINAL SYNTAX

Morphology was quite a short section, so I've included all the noun stuff as well. And to fit things on the graph, I've increased the y axis from ±30% to ±40%.

Head Or Dependent Marking

Image

General tendency here towards dependent-marking. But interestingly, the trend's away from "Inconsistent or other" rather than "Head marking". Clearly, we need more people to think of crazy inconsistent systems.

Reduplication

Image

This one's the main reason for my change to 40%. There's a *very* strong tendency here away from partial reduplication, scraping my limits at -39.7%.

Number Of Genders

Image

This is one of those places where we're not doing so badly. I'd have expected more of a bias towards no genders (I tend to avoid the things, myself), but if anything, we've got more than we need.

Associative Plurals

Image

Another 30%-breaker: apparently we don't like associative plurals. Or possibly (like me) we'd not really heard of them before...

Definite Articles

Image

Another place where we're not doing too badly. There's a bias towards no articles at all - plausibly to get further away from Standard Indo-European - but not as strong as I'd have thought. Maybe it's time to start reintroducing the things.

Indefinite Pronouns

Image

A strong trend away from interrogative-based indefinite pronouns. (Which is a shame, I like questions like "He ate something?" for "What did he eat?".)

Number of Cases

Image

Vague bell curve here, centered at around three or four cases, and then another big peak for the 10+ case systems. And again, it looks like we need more minimal systems and more inconsistent-borderline systems here.

Ordinal Numerals

Image

The tendency here seems to be towards the regular and consistent "one two three" and "oneth twoth threeth" systems - possibly "first twoth threeth" feels arbitrary and inconsistent. But again, natlangs prove more arbitrary and inconsistent than the average conlang...

Distributive Numerals

Image

This is consistent with what we saw earlier about people not really using reduplication.

Conjunctions and Quantifiers

Image

Another 30%-breaker. As with indefinite pronouns, we're seeing conlangers more likely to create separate categories instead of just reusing categories we've already got.

Adjectives Without Nouns

Image

Would it be simplistic of me to assume that the "Not without noun" bar slots neatly into the "Without marking" bar, and the "marked by suffix" into the "marked by preceding word" bar? Maybe people who would otherwise have allowed unmarked adjectives-as-nouns decided against them for ambiguity reasons, while preceding-word people decided on suffixes instead? Maybe not, but I can dream.

And and With

Image

And to finish, a nice simple graph, again matching the tendency for conlangers to create multiple categories rather than reusing existing things.

COMING SOON: VERBAL CATEGORIES

Legion
Avisaru
Posts: 522
Joined: Sat Mar 05, 2005 9:56 pm

Re: CALS vs WALS: Part 2 - Nouns

Post by Legion »

One area where I expect the eurocentrism of conlangers will dramatically show will be feature 121 (comparative construction) and feature 122 (relativisation on subjects).

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: CALS vs WALS: A Comparison

Post by Cedh »

PTSnoop wrote:Reduplication
Image
This one's the main reason for my change to 40%. There's a *very* strong tendency here away from partial reduplication, scraping my limits at -39.7%.
I've always had mixed feelings about this category, because to me the most attractive way of using reduplication would be "partial reduplication only". Which is apparently rare enough in natlangs that it's not even considered an option in WALS (although it's reconstructed for Proto-Indo-European, unless there's either a full reduplication construction in PIE that I don't know of, or else PIE reduplication in verb stems is treated as non-productive).

Incidentally, I've listed my own conlang Tmaśareʔ with "No productive reduplication", which definitely isn't the full story. Its nominal plurals, which were formed by productive partial reduplication in an earlier stage of the language, are starting to exhibit effects of sound change, so reduplication is not always obvious anymore (e.g. mera 'dog' ~ mǫra 'dogs'), but new loanwords still regularly form their plural with reduplication of the first syllable. There is no full reduplication in the language though, unless you count the plural of the rare type of nouns whose stem consists of only a single CV syllable (the only two examples in the current lexicon of ~700 words are la 'man' and ha 'eye').

Qwynegold
Smeric
Posts: 1606
Joined: Thu May 24, 2007 11:34 pm
Location: Stockholm

Re: CALS vs WALS: Part 2 - Nouns

Post by Qwynegold »

I hope you have remembered to remove the natlangs from CALS before counting. :P
Image
My most recent quiz:
Eurovision Song Contest 2018

Ser
Smeric
Posts: 1542
Joined: Sat Jul 19, 2008 1:55 am
Location: Vancouver, British Columbia / Colombie Britannique, Canada

Re: CALS vs WALS: Part 2 - Nouns

Post by Ser »

Qwynegold wrote:I hope you have remembered to remove the natlangs from CALS before counting. :P
I just checked a couple of features and, yeah, he forgot to exclude the natlangs stupidly listed at CALS. So his numbers are actually WRONG, and differences between natlang and conlang tendencies ARE LIKELY TO BE EVEN GREATER, since the listed natlangs pull the numbers back towards WALS's.

For example, as for the Reduplication feature, PTSnoop says the difference in full&partial and no-redup is almost 40%. In reality, it's pretty much 48% for both (48.1% for full&partial reduplication, and 47.7% for no-redup).
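
The dilution effect described here can be sketched in a few lines of Python. All the counts below are hypothetical (they are not the real CALS or WALS figures), and the sketch simply assumes you can get, for one feature value, both the count over everything listed and the count over the natlang entries alone.

```python
def percentage(count, total):
    """Share of languages with the feature value, as a percentage."""
    return 100.0 * count / total


# Hypothetical numbers for a single feature value (illustration only).
wals_pct = 60.0                               # share of natlangs in WALS
mixed_total, mixed_with_feature = 250, 50     # everything listed on CALS
natlang_total, natlang_with_feature = 50, 35  # natlang entries within that list

# Remove the natlang entries to get the conlang-only counts.
conlang_total = mixed_total - natlang_total
conlang_with_feature = mixed_with_feature - natlang_with_feature

mixed_diff = percentage(mixed_with_feature, mixed_total) - wals_pct
conlang_diff = percentage(conlang_with_feature, conlang_total) - wals_pct

print(f"with natlangs left in: {mixed_diff:+.1f}%")
print(f"conlangs only:         {conlang_diff:+.1f}%")
```

With these made-up counts the natlang entries resemble WALS more than the conlangs do, so taking them out moves the difference further from zero, which is the direction of the correction reported above.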

Whimemsz
Avisaru
Posts: 690
Joined: Fri Jun 20, 2003 4:56 pm
Location: Gimaamaa onibaaganing

Re: CALS vs WALS: Part 2 - Nouns

Post by Whimemsz »

This is very interesting, thanks for doing this. (I am happy to see that Hikóómayíi seems to be much more WALSy than CALSy).

Why on earth are natlangs listed on CALS? That doesn't make any sense :\

Ser
Smeric
Posts: 1542
Joined: Sat Jul 19, 2008 1:55 am
Location: Vancouver, British Columbia / Colombie Britannique, Canada

Re: CALS vs WALS: Part 2 - Nouns

Post by Ser »

Hahahaha, I just looked at the page of Spanish translations:

http://cals.conlang.org/translation/language/spanish/

...They're chock-full of errors typical of an English-speaking learner. :P

But man, the guy even consciously made some crap up: *bisabisabisabuelo?? (it should be tatarabuelo), *en Flander campo!?!? (Spanish is not a Germanic language and cannot do noun-noun compounds like that, it should be en los campos de Flanders).

It might be on the CALS, but Spanish is not a conlang!

Salmoneus
Sanno
Posts: 3197
Joined: Thu Jan 15, 2004 5:00 pm
Location: One of the dark places of the world

Re: CALS vs WALS: Part 2 - Nouns

Post by Salmoneus »

Serafín wrote:Hahahaha, I just looked at the page of Spanish translations:

http://cals.conlang.org/translation/language/spanish/

...They're chock-full of errors typical of an English-speaking learner. :P

But man, the guy even consciously made some crap up: *bisabisabisabuelo?? (it should be tatarabuelo), *en Flander campo!?!? (Spanish is not a Germanic language and cannot do noun-noun compounds like that, it should be en los campos de Flanders).

It might be on the CALS, but Spanish is not a conlang!
Well, clearly THIS version of Spanish is!
Blog: http://vacuouswastrel.wordpress.com/

But the river tripped on her by and by, lapping
as though her heart was brook: Why, why, why! Weh, O weh
I'se so silly to be flowing but I no canna stay!

PTSnoop
Niš
Posts: 8
Joined: Sun Sep 09, 2007 10:10 pm
Location: Somewhere in the North of England

Re: CALS vs WALS: Part 2 - Nouns

Post by PTSnoop »

Wait, there are *natlangs* on CALS? Whose idea was that??

Hmm, that sets things back a bit. I'll go through and retcon the earlier graphs to the conlang-only numbers once I have time.

Qwynegold
Smeric
Posts: 1606
Joined: Thu May 24, 2007 11:34 pm
Location: Stockholm

Re: CALS vs WALS: Part 2 - Nouns

Post by Qwynegold »

Oh noes! D: You actually overlooked that?

cromulant
Avisaru
Posts: 402
Joined: Tue Jul 25, 2006 10:12 pm

Re: CALS vs WALS: Part 2 - Nouns

Post by cromulant »

I did this already. Someone else started to as well.

EDIT: I take that back; you are clearly doing something different.

cromulant
Avisaru
Posts: 402
Joined: Tue Jul 25, 2006 10:12 pm

Re: CALS vs WALS: Part 2 - Nouns

Post by cromulant »

PTSnoop wrote:Wait, there are *natlangs* on CALS? Whose idea was that??

Hmm, that sets things back a bit. I'll go through and retcon the earlier graphs to the conlang-only numbers once I have time.
You're going to have to click on each feature individually, where you'll see a conlang graph next to a natlang graph. You may know this already.

cromulant
Avisaru
Posts: 402
Joined: Tue Jul 25, 2006 10:12 pm

Re: CALS vs WALS: Part 2 - Nouns

Post by cromulant »

Whimemsz wrote:Why on earth are natlangs listed on CALS? That doesn't make any sense :\
It allows you to compare your conlang to various natlangs (or other conlangs), focusing on shared features, different features, or all features (see the "compare" feature in the right-hand toolbar).

It also allows you to compare, at a glance, typological tendencies in conlangs vs natlangs. Example.

The inclusion of natlangs is pretty useful.

WechtleinUns
Sanci
Posts: 59
Joined: Thu Oct 16, 2008 10:45 pm

Re: CALS vs WALS: Part 2 - Nouns

Post by WechtleinUns »

Do you think there might be a correlation between the deviations presented here and the L1s spoken by the conlanging community? My theory is that the native languages of the community would have a weighted effect on the types of conlangs that we create. For a purely hypothetical example, if the majority of the conlanging community speak an L1 where voicing is distinguished in both plosives and fricatives, then there might be a marked pull towards or away from that kind of feature in constructed languages.

Might be interesting to find out.
