Phonotactics and language identification
Posted: Mon Aug 23, 2010 9:43 am
by Kai_DaiGoji
I remember reading once somewhere that one of the ways we identify words as English or not involves phonotactics - like the whole gostak thing, we know that 'distims' and 'doshes' are both potential English words, even though we may not know them specifically. We also recognize a word like 'vlim' is definitely not English.
But my question is this - when faced with words that are clearly possible in English phonotactics, we can still place them fairly accurately in terms of origin. So, for example, the name Tunde Adebimpe is clearly African (probably Niger-Congo) even though with a bit of respelling, it fits the English paradigm. Similarly, 'doroshke' is clearly Slavic, though again, with a bit of respelling, it fits English too. What else is going on here?
Posted: Mon Aug 23, 2010 11:26 am
by Echobeats
Sound sequences aren't simply evaluated on a binary choice of "allowed" or "not allowed", but also on probability. Anyone familiar with African names knows that certain combinations are much more probable in Niger-Congo languages than they are in English, even if they fall within the realms of the possible in English.
Kai_DaiGoji wrote:So, for example, the name Tunde Adebimpe is clearly African (probably Niger-Congo) even though with a bit of respelling, it fits the English paradigm.
Since when does English allow word-final short /E/?
Posted: Mon Aug 23, 2010 2:16 pm
by Kai_DaiGoji
Echobeats wrote:Sound sequences aren't simply evaluated on a binary choice of "allowed" or "not allowed", but also on probability. Anyone familiar with African names knows that certain combinations are much more probable in Niger-Congo languages than they are in English, even if they fall within the realms of the possible in English.
Kai_DaiGoji wrote:So, for example, the name Tunde Adebimpe is clearly African (probably Niger-Congo) even though with a bit of respelling, it fits the English paradigm.
Since when does English allow word-final short /E/?
I was reading it as /eɪ/ which is allowed all the time.
Posted: Mon Aug 23, 2010 2:38 pm
by Silk
Kai_DaiGoji wrote:Echobeats wrote:Sound sequences aren't simply evaluated on a binary choice of "allowed" or "not allowed", but also on probability. Anyone familiar with African names knows that certain combinations are much more probable in Niger-Congo languages than they are in English, even if they fall within the realms of the possible in English.
Kai_DaiGoji wrote:So, for example, the name Tunde Adebimpe is clearly African (probably Niger-Congo) even though with a bit of respelling, it fits the English paradigm.
Since when does English allow word-final short /E/?
I was reading it as /eɪ/ which is allowed all the time.
Final <e> is usually silent though in English. It's generally ethnic words/names that we pronounce with /eI/ or /i/. Like Enrique, karaoke, etc.
Also, there are certain names that can be misleading. The name Sam Rainsy at first glance looks like it could be a typical English or American name, but it actually is the name of a Cambodian politician.
Posted: Mon Aug 23, 2010 2:40 pm
by Morrígan
Echobeats makes an important point about probability and frequency.
The human brain seems to be able to recognize statistical patterns which can be formalized using Markov chains and similar probabilistic models, but beefed up by our knowledge of phonetic classes.
Given any sequence of letters, and any set of Markov models, there is a model which is more likely to generate that sequence than any other model. If the models are taken to be "what a given language looks like" then we will probably think the sequence "looks like a word in language X".
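To make that concrete, here is a minimal sketch of the idea, not anyone's actual method. It assumes character-bigram Markov models over spelling (no phonetic classes), add-one smoothing, and two tiny made-up training lists. The names `train_bigram_model`, `log_likelihood`, `most_likely_language`, `MODELS`, and the `alphabet_size` value are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def train_bigram_model(words):
    """Count character bigrams; '^' and '$' mark the word boundaries."""
    bigrams = Counter()   # (previous char, next char) -> count
    contexts = Counter()  # previous char -> count
    for word in words:
        padded = "^" + word.lower() + "$"
        for prev, nxt in zip(padded, padded[1:]):
            bigrams[(prev, nxt)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def log_likelihood(word, model, alphabet_size=30):
    """Log-probability of the word under one model, with add-one smoothing.

    alphabet_size is an assumed symbol inventory (26 letters plus boundary
    markers, rounded up); it only keeps unseen bigrams from scoring zero.
    """
    bigrams, contexts = model
    padded = "^" + word.lower() + "$"
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        probability = (bigrams[(prev, nxt)] + 1) / (contexts[prev] + alphabet_size)
        total += math.log(probability)
    return total


def most_likely_language(word, models):
    """Return the name of the model that assigns the word the highest score."""
    return max(models, key=lambda name: log_likelihood(word, models[name]))


# Toy training data: far too small for real use, and the second list is
# invented rather than taken from any actual language.
MODELS = {
    "english_sample": train_bigram_model(
        ["the", "this", "that", "then", "there", "think", "thing"]
    ),
    "invented_sample": train_bigram_model(
        ["kala", "kalo", "kali", "kiki", "koko"]
    ),
}

if __name__ == "__main__":
    for candidate in ["thin", "koka", "vlim"]:
        print(candidate, "->", most_likely_language(candidate, MODELS))
```

With realistic training lists (thousands of words per language, ideally in phonemic transcription rather than spelling), the same comparison is what would let a sequence "look like" one language rather than another even when both technically allow it.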
Posted: Mon Aug 23, 2010 5:01 pm
by Ulrike Meinhof
Kai_DaiGoji wrote:We also recognize a word like 'vlim' is definitely not English.
Though 'vlog' is.
Posted: Mon Aug 23, 2010 5:30 pm
by finlay
I've read that it's probably just that there aren't any native words with 'vl', not that 'vl' is a disallowed sequence of letters – hence we're actually fairly alright with neologisms like 'vlog' and names like 'Vlad'.
Speaking of which, what happened to Vlad? Did he just decamp to IRC and never look back; is he even still around on IRC, come to that?
Posted: Mon Aug 23, 2010 5:41 pm
by Guitarplayer II
finlay wrote:is he even still around on IRC, come to that?
Yes.
Posted: Mon Aug 23, 2010 8:17 pm
by Silk
Dingbats wrote:Kai_DaiGoji wrote:We also recognize a word like 'vlim' is definitely not English.
Though 'vlog' is.
A lot of people prefer to pronounce it vee-log, though.
Posted: Mon Aug 23, 2010 9:07 pm
by Morrígan
Silk wrote:Dingbats wrote:Kai_DaiGoji wrote:We also recognize a word like 'vlim' is definitely not English.
Though 'vlog' is.
A lot of people prefer to pronounce it vee-log, though.
What?? Really???? That's horrible.
It's like 'flog' or 'blog' but with a 'v'. I hate people.
Posted: Tue Aug 24, 2010 3:50 am
by Radius Solis
There's also a commercial pickle brand, Vlasic, which like "Vlad", nobody seems to have any trouble pronouncing. (In fact these two names are my main go-to evidence for when I argue with people about whether there's a difference between a phonology disallowing something and merely having a gap for it - some people conflate these ideas.)
Posted: Tue Aug 24, 2010 5:02 am
by Acid Badger
Radius Solis wrote:(...) "Vlad" (...)
Thinking of people pronouncing this like vee-lad made me laugh.
Posted: Tue Aug 24, 2010 11:34 am
by LinguistCat
Fanu wrote:Radius Solis wrote:(...) "Vlad" (...)
Thinking of people pronouncing this like vee-lad made me laugh.
I have heard people put a very short schwa between the v and the l... That or the v becomes syllabic. Either way, the <vl> is not pronounced as a consonant cluster...
Posted: Tue Aug 24, 2010 11:57 am
by Magb
Radius Solis wrote:There's also a commercial pickle brand, Vlasic, which like "Vlad", nobody seems to have any trouble pronouncing. (In fact these two names are my main go-to evidence for when I argue with people about whether there's a difference between a phonology disallowing something and merely having a gap for it - some people conflate these ideas.)
It strikes me that proper names can sometimes behave a bit like interjections in terms of their phonotactics. For instance consider the fact that many people pronounce "LaTeX" with an [x] at the end despite having no such sound in their native language.
I agree with you that /vl/ is probably an accidental gap of sorts in English, but I think it's worth making a distinction between:
1. Phonotactically impossible clusters like, say, initial /kb/
2. Sequences of sounds that don't appear natively, but which most people have little trouble with -- /vl/ being a good example of this in English
3. Eminently plausible sequences of sounds that just so happen not to exist as words, like feg or brud (now someone's gonna tell me that both of these do exist in some obscure dialect)
The difference between (2) and (3) would be that while some speakers might give pause when asked to read the word vlim, and possibly pronounce it something like [v@"lIm], no native English speaker would bat an eyelid at feg. As far as accidental gaps go, feg is "more accidental" than vlim. It might be best to imagine it as a continuum of phonotactical acceptability.
Posted: Tue Aug 24, 2010 2:26 pm
by faiuwle
I'd actually be tempted to say vee-log for vlog, just because there's a significant probability that it would be mis-heard as "blog" in context.
Magb wrote:It strikes me that proper names can sometimes behave a bit like interjections in terms of their phonotactics. For instance consider the fact that many people pronounce "LaTeX" with an [x] at the end despite having no such sound in their native language.
Really? Everyone I've met who insists on pronouncing LaTeX the "correct" way says something like [lAtEk]. I don't think I've ever heard a monolingual speaker of AmE who wasn't also a language geek pronounce [x] correctly (i.e. not as [k] or [h]).
Posted: Tue Aug 24, 2010 3:42 pm
by finlay
It looks too much like a pun to be pronounced anything other than like latex with /ks/. Also I've never heard /A/, or [x].
Posted: Tue Aug 24, 2010 5:18 pm
by makvas
finlay wrote:It looks too much like a pun to be pronounced anything other than like latex with /ks/. Also I've never heard /A/, or [x].
I've only heard it called /"leI.tEk/, and of course /"leI.tEks/ but only from those who don't know any better.
Posted: Tue Aug 24, 2010 5:48 pm
by Magb
I didn't mean to start a debate about the pronunciation of LaTeX. I probably shouldn't have used the phrase "many people", but I have heard people use [x] in it. Apparently the pronunciation with [x] is Donald Knuth's pet project. I should've known.
The LaTeX thing was a bad example anyway. I take it back.