I'll try this again - essential binary features

alice · Post by **alice** » Sun Oct 04, 2015 1:39 pm

This thread supersedes, and renders obsolete, the thread I started recently in C&C.

The Big Question is:

"Which binary features are necessary to cover all of the segmental symbols on the IPA chart?"

I've yet to find a satisfactory answer; maybe there isn't one, but there's no harm in asking. From what I've gathered after looking at various sites, the answer seems to be at least the following:

For sonority: syllabic, consonantal/vocalic, approximant, sonorant, continuant
For place of articulation: round, anterior, distributed, high, low, back, tense, ATR, voice, spread glottis, closed glottis
For manner of articulation: continuant again, strident, lateral, delayed release, dental
For vowels: front and back, high and low, round, tense
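
One way to test whether a candidate feature set "covers" a set of symbols is to check that no two symbols end up with identical feature bundles. The following is a minimal sketch of that check, not a real feature chart: the five-consonant table and its values are illustrative, and `coronal` is an assumed extra feature that is not in the list above.

```python
from itertools import combinations

# Illustrative values only; a real table would need every feature in the
# list above and a row for every IPA segment.
FEATURES = {
    "p": {"sonorant": False, "continuant": False, "voice": False},
    "b": {"sonorant": False, "continuant": False, "voice": True},
    "t": {"sonorant": False, "continuant": False, "voice": False},
    "s": {"sonorant": False, "continuant": True, "voice": False},
    "z": {"sonorant": False, "continuant": True, "voice": True},
}


def undistinguished_pairs(table):
    """Return every pair of symbols whose feature bundles are identical."""
    return [
        (a, b)
        for a, b in combinations(sorted(table), 2)
        if table[a] == table[b]
    ]


# Assumption: add a hypothetical place feature to separate [p] from [t].
WITH_CORONAL = {
    sym: {**feats, "coronal": sym in {"t", "s", "z"}}
    for sym, feats in FEATURES.items()
}

print(undistinguished_pairs(FEATURES))      # the p/t collision
print(undistinguished_pairs(WITH_CORONAL))  # no collisions left
```

Run over a full chart, any pair this reports is a place where the feature set is still too small.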

This leads to at least three Lesser Questions:

1. What have I overlooked?
2. How does [ə] (schwa) differ from [ɜ] or [ɘ] (mid central unrounded vowels)?
3. How does [kʲ] differ from [c]?

More will doubtless follow!

Suprasegmentals aren't so difficult, but any comments would be appreciated here too.

Travis B. · Post by **Travis B.** » Sun Oct 04, 2015 9:47 pm

This is indeed a good question, and from looking at the usages of binary features I have seen, I myself have wondered how they can cover the full range of possible phones, especially in the case of vowels.

Post by **zompist** » Sun Oct 04, 2015 10:13 pm

There's a good discussion of this in Roger Lass's Phonology, with his own proposed set of binary features (an adaptation of Chomsky & Halle).

Having established the system in chapter 5, he then casts doubt on it in chapter 6. He points out that it's rather arbitrary, that there is no particular reason that a set of features should be binary or even integral. He plays a bit with integral systems (these go well with vowel openness or backness).

For vowels, the obvious problem is that neither openness nor backness is binary. There are vowel systems with five degrees of height, and ones with four degrees of backness. One could simply add binary features— though you'll need 3 of them to handle those five-height systems— but what's the advantage over simply numbering the heights?

Similarly, what do you do with Estonian's three vowel lengths? Again, yes, you can just add another binary feature, but why bother?
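
The counting behind "you'll need 3 of them" can be sketched as follows. This assumes an arbitrary binary encoding of numbered levels, which is not how any particular feature theory assigns [high] and [low]; the function names are made up for illustration.

```python
def features_needed(levels: int) -> int:
    """Minimum number of binary features that can tell `levels` values apart."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return (levels - 1).bit_length()


def encode(level: int, levels: int) -> str:
    """Write a 0-based `level` as a string of + and - feature values."""
    if not 0 <= level < levels:
        raise ValueError("level out of range")
    width = features_needed(levels)
    bits = format(level, f"0{width}b") if width else ""
    return bits.replace("1", "+").replace("0", "-")


print(features_needed(5))  # five vowel heights
print(features_needed(3))  # three vowel lengths
print([encode(h, 5) for h in range(5)])
```

Five heights need three features, but three features allow eight combinations, so three of them go unused. That waste is part of the case for simply numbering the heights.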

When you look at acoustic phonetics, or the detailed description of ongoing sound changes (e.g. in Labov), then it's normal to use real numbers instead— e.g. the F1 and F2 formants. In Principles of Linguistic Change: Internal Factors Labov talks about some vocalic sound changes that are easy to explain in terms of two-dimensional formant space, but hard to explain in terms of binary features.

Anyway, I suspect that the whole process is more "analyzing the IPA" than analyzing human language. Not that respected linguists haven't spent a lot of effort trying.

BTW, without looking too closely at your list, you've left out non-pulmonic airstreams, and nasal vowels, and I don't know how you handle co-articulation.

Sumelic · Post by **Sumelic** » Mon Oct 05, 2015 5:14 am

Wikipedia suggests that high-low and front-back are not as phonetically natural as a system with three directions: front, raised, and retracted. Then again, that seems to somewhat disregard the common phonological similarities in the behavior of the high vowels /i/ and /u/.

alice · Post by **alice** » Mon Oct 05, 2015 5:26 am

Much of this was prompted by this chart, which doesn't mention any special features for clicks. The intention is that I can collect a set of binary features large enough to be useful for an SCA, which is why I mentioned the IPA; if the features can describe (most of) the IPA, there are probably enough of them. Of course, this doesn't address the question of whether it's necessary to do everything with binary features in the first place; but integral values can be represented in binary anyway.

Perhaps I'm still asking the wrong question, and it should be "what is the best way to represent phonemes/phones in an SCA which can handle both IPA text and featural analysis?". It's very unlikely that I'd need to worry about more than three degrees of vowel backness, for example.
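
One possible shape for this, as a rough sketch only: keep a table from IPA symbols to feature bundles, and let a rule be stated in features while the words stay in IPA. The four-consonant inventory, its values, and the function names are all assumptions made for illustration. The sketch also walks the word one character at a time, so multi-character symbols such as t͡s would need a proper tokeniser.

```python
# Hypothetical mini-inventory with illustrative feature values.
SEGMENTS = {
    "p": {"voice": False, "continuant": False},
    "b": {"voice": True, "continuant": False},
    "f": {"voice": False, "continuant": True},
    "v": {"voice": True, "continuant": True},
}


def to_symbol(features):
    """Reverse lookup: the symbol whose bundle equals `features`, or None."""
    for symbol, bundle in SEGMENTS.items():
        if bundle == features:
            return symbol
    return None


def apply_change(word, condition, change):
    """Rewrite each segment matching `condition` with the values in `change`."""
    out = []
    for symbol in word:
        bundle = SEGMENTS.get(symbol)
        matches = bundle is not None and all(
            bundle.get(name) == value for name, value in condition.items()
        )
        if matches:
            new_symbol = to_symbol({**bundle, **change})
            # Keep the original symbol if the result has no IPA symbol here.
            out.append(new_symbol if new_symbol is not None else symbol)
        else:
            out.append(symbol)
    return "".join(out)


# Voicing stated in features, applied to IPA text.
print(apply_change("apafa", {"voice": False}, {"voice": True}))
```

Symbols missing from the table, such as the vowels in this example, pass through unchanged.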

And I forgot to mention a "nasal" feature, yes. Presumably the features of - say - coarticulated [kp] aren't simply the features of [k] plus those of [p]?

Pole, the · Post by **Pole, the** » Mon Oct 05, 2015 9:42 am

alice wrote:2. How does [ə] (schwa) differ from [ɜ] or [ɘ] (mid central unrounded vowels)?

I think in practice [ə] is often used as the lax counterpart of either [ɜ] or [ɘ].

3. How does [kʲ] differ from [c]?

Normatively, in both cases the central part of the tongue is raised towards the palate. However, for [c] this raising causes plosion, but for [kʲ] the plosion occurs farther back, with the back of the tongue touching the velum.

In practice, these two are often used interchangeably.

Sumelic · Post by **Sumelic** » Mon Oct 05, 2015 11:29 pm

alice wrote: 3. How does [kʲ] differ from [c]?

Also, Wikipedia says that [c] may also be used to represent an alveolo-palatal [t̠ʲ] (for example, in Hungarian, where it is spelled "ty", and where it is historically derived in some cases from the cluster /tj/, but not as far as I know from /kj/). There's a similar ambiguity with the symbol ɲ, which is often used to represent an alveolo-palatal.
(Personally, I find it unnecessary to use the distinct letter [c] just to represent the fronted allophone of /k/ in languages like French and Italian, but the letter has been used this way.)

Richard W · Post by **Richard W** » Sun Oct 25, 2015 3:36 pm

alice wrote:Presumably the features of - say - coarticulated [kp] aren't simply the features of [k] plus those of [p]?

Given that chart, I think features([kp]) = features([k]) + features([p]) works. Have you noticed that feature cont has 3 values - '+', '-' and '±'? The affricates are a bit more awkward - some of the features are taken from the fricative part rather than combined. Consider t͡s v. t͡θ! How would you handle presigmatised stops (e.g. [ˢt])? Would you use the same encoding as t͡s, but with a fourth value of cont, namely '∓'?
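
One possible reading of "features([k]) + features([p])", sketched under assumptions: merge the two bundles, write '±' wherever the halves disagree, and let a caller-chosen set of features come from the second half only. The `combine` function and its `take_from_second` parameter are hypothetical and are not taken from the chart; the values for [t], [s] and [θ] are the usual ones for these three features.

```python
T = {"continuant": "-", "strident": "-", "voice": "-"}
S = {"continuant": "+", "strident": "+", "voice": "-"}
THETA = {"continuant": "+", "strident": "-", "voice": "-"}


def combine(first, second, take_from_second=()):
    """Merge two feature bundles into one complex segment.

    Agreeing values are kept, conflicting values become '±', and any
    feature named in `take_from_second` copies the second bundle's value.
    """
    merged = {}
    for name in first.keys() | second.keys():
        a, b = first.get(name), second.get(name)
        if name in take_from_second and b is not None:
            merged[name] = b
        elif a is None or b is None or a == b:
            merged[name] = a if a is not None else b
        else:
            merged[name] = "±"
    return merged


# Plain merging gives t͡s an ambiguous [±strident] ...
print(combine(T, S))
# ... while taking stridency from the fricative keeps t͡s and t͡θ apart.
print(combine(T, S, take_from_second={"strident"}))
print(combine(T, THETA, take_from_second={"strident"}))
```

This sketch ignores ordering, so it cannot tell t͡s from [ˢt]; that distinction is what a fourth value such as '∓' would have to carry.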

Are you planning to handle tones just as strings of pitches? (I presume tone features such as length and glottalisation will be incorporated in the segmentals, along with creakiness.)

I presume you propose to attempt to support the phonemic domain as well as the phonetic. For that you will need things like syllable boundaries - Thai final stop + syllable-boundary + liquid is not the same as any of the three similar combinations of syllable-boundary + Thai stop + liquid. (The minimal pairs for the orthography and for phonetics are different, so I haven't any examples to hand.) The liquid disappears from the latter but not the former in excited speech.

Richard W · Post by **Richard W** » Sun Oct 25, 2015 5:13 pm

You may wish to consider supporting some abstract tone or pitch designations. For example, some diachronic descriptions will start with tones A, B and C, and some descriptions report generic tones 1 to 8, leaving the phonetics for the language-specific notes. If I were playing with Slavic accent developments, I might want to start with abstract 'H' and 'L' and come up with the standard accent symbols rather than worry about their precise realisations.

Have you enough flexibility to handle a rule like, "Move stress from a prefix to the first syllable after the prefix"? It's a rule one will need if generating a Romance language from a form of Latin close to Classical Latin. One wouldn't have to mark the Romance stress oneself.

Do you have a set of tricky sound changes ready for the next stage? Doing West Germanic consonant gemination without a brute force list of changes and doing the Sanskrit conditional change n > ɳ would be a good test.
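
To show why the Sanskrit change is a useful test, here is a simplified sketch of it as a left-to-right scan, following the usual textbook statement: n becomes ṇ after r, ṛ, ṝ or ṣ in the same word, provided no palatal (other than y), retroflex or dental intervenes and the n is followed by a vowel or n, m, y, v. It works on IAST-style transliteration and treats each character as a segment. What happens after a changed ṇ, and the handling of nn, are assumptions; the function name is made up.

```python
import unicodedata


def _nfc(text):
    """Normalise so that letters like ṇ are single code points."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(_nfc("aāiīuūṛṝḷeo"))
TRIGGERS = set(_nfc("rṛṝṣ"))
# Palatals (except y), retroflexes and dentals stop the spreading.
BLOCKERS = set(_nfc("cjñśṭḍṇtdnsl"))
FOLLOWERS = VOWELS | set("nmyv")


def retroflex_n(word):
    """Apply the n > ṇ rule to one word (simplified approximation)."""
    chars = list(_nfc(word))
    active = False  # True while a trigger is still "in force"
    for i, ch in enumerate(chars):
        if ch in TRIGGERS:
            active = True
        elif ch == "n":
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            if active and nxt in FOLLOWERS:
                chars[i] = _nfc("ṇ")
            # Assumption: any n, changed or not, ends the trigger's reach.
            active = False
        elif ch in BLOCKERS:
            active = False
    return "".join(chars)


for form in ["rāmena", "rathena", "karman"]:
    print(form, "->", retroflex_n(form))
```

The rule needs memory of an earlier segment across any number of neutral ones, which a purely local context of the form A > B / C_D cannot express without listing every possible intervening string.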

alice · Post by **alice** » Mon Oct 26, 2015 6:22 am

Since you ask...

Richard W wrote:Have you noticed that feature cont has 3 values - '+', '-' and '±'?

Yes, and I worked out how to get by with just 2.

Richard W wrote:The affricates are a bit more awkward - some of the features are taken from the fricative part rather than combined. Consider t͡s v. t͡θ!

There's a tradeoff here between doing everything possible and doing (almost) everything useful.

Richard W wrote:How would you handle presigmatised stops (e.g. [ˢt])? Would you use the same encoding as t͡s, but with a fourth value of cont, namely '∓'?

Unless there's good reason not to, this can be treated as /st/.

Richard W wrote:Are you planning to handle tones just as strings of pitches? (I presume tone features such as length and glottalisation will be incorporated in the segmentals, along with creakiness.)

No.

Richard W wrote:I presume you propose to attempt to support the phonemic domain as well as the phonetic. For that you will need things like syllable boundaries - Thai final stop + syllable-boundary + liquid is not the same as any of the three similar combinations of syllable-boundary + Thai stop + liquid. (The minimal pairs for the orthography and for phonetics are different, so I haven't any examples to hand.) The liquid disappears from the latter but not the former in excited speech.

"attempt" is probably correct

I'm trying to do something more detailed than phonemic, but not going so far into the phonetic that my head explodes. (Which is not very far.)

Richard W wrote:You may wish to consider supporting some abstract tone or pitch designations. For example, some diachronic descriptions will start with tones A, B and C, and some descriptions report generic tones 1 to 8, leaving the phonetics for the language-specific notes. If I were playing with Slavic accent developments, I might want to start with abstract 'H' and 'L' and come up with the standard accent symbols rather than worry about their precise realisations.

I've already considered something very like this.

Richard W wrote:Have you enough flexibility to handle a rule like, "Move stress from a prefix to the first syllable after the prefix"? It's a rule one will need if generating a Romance language from a form of Latin close to Classical Latin. One wouldn't have to mark the Romance stress oneself.

That depends on how an unstressable prefix is identified, and I'd be very interested to know if any other SCAs can do this.

Richard W wrote:Do you have a set of tricky sound changes ready for the next stage? Doing West Germanic consonant gemination without a brute force list of changes and doing the Sanskrit conditional change n > ɳ would be a good test.

Very much yes. The first of these is easy; the second might not be too difficult.

Richard W · Post by **Richard W** » Mon Oct 26, 2015 2:51 pm

alice wrote:
Richard W wrote:Have you enough flexibility to handle a rule like, "Move stress from a prefix to the first syllable after the prefix"? It's a rule one will need if generating a Romance language from a form of Latin close to Classical Latin. One wouldn't have to mark the Romance stress oneself.
That depends on how an unstressable prefix is identified, and I'd be very interested to know if any other SCAs can do this.

For a tool that is purely a specialised string editor, one just adds a boundary marker '#' between the prefix and the rest of the word. Of course, handling junctures systematically gets a bit more complicated.
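
As a minimal sketch of the string-editor approach, assuming the word is written with 'ˈ' in front of the stressed syllable, '.' between syllables and '#' after the prefix (the notation, the example word and the function name are all assumptions for illustration):

```python
STRESS = "ˈ"
BOUNDARY = "#"


def shift_stress_off_prefix(word):
    """Move a stress mark inside the prefix to the first syllable after '#'."""
    prefix, sep, stem = word.partition(BOUNDARY)
    if not sep or STRESS not in prefix:
        # No marked prefix, or the stress is already outside it.
        return word
    return prefix.replace(STRESS, "") + sep + STRESS + stem


print(shift_stress_off_prefix("ˈre#fi.kit"))  # stress leaves the prefix
print(shift_stress_off_prefix("re#ˈfi.kit"))  # already on the stem
print(shift_stress_off_prefix("ˈfa.kit"))     # no prefix boundary
```

The rule itself is trivial once the boundary is in the input; the work lies in deciding which words get a '#' and where.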

zompist bboard
