IPA pronouncer program

DePaw
Lebom
Posts: 125
Joined: Wed Mar 18, 2009 5:46 pm

IPA pronouncer program

Post by DePaw »

There are a lot of different text-to-speech programs out there for different languages and such, but surely someone's made one where you put in the IPA and it says the word out loud? Surely that'd be simpler than programming for a specific language even! A program like that would be invaluable for a conlanger...

din
Avisaru
Posts: 779
Joined: Wed Jan 10, 2007 10:02 pm
Location: Brussels

Re: IPA pronouncer program

Post by din »

The problem with text to speech programs is that we don't speak in chains of individual phonemes. Our mouths have to transition from one sound to the other. And then you have suprasegmental things like stress and intonation. You couldn't just record a bunch of sounds, assign them to an IPA symbol, and have the program play the recordings. Every phoneme would probably sound syllabic and the intonation would be all over the place (varying from letter to letter), or be incredibly flat and robotic.

This is why text-to-speech technology is still relatively shitty, even for major languages like English. It's not something you can program in a weekend, unfortunately.


The reason why they have to re-do it for every language specifically is that /e/ doesn't sound the same in two languages, our intonation patterns vary a lot, stress is different, etc...
— o noth sidiritt Tormiott

Pole, the
Smeric
Posts: 1606
Joined: Sat Feb 11, 2012 9:50 am

Re: IPA pronouncer program

Post by Pole, the »

Dude, such a program would be a sangraal (or even sa'angreal) of conlanging!
The conlanger formerly known as “the conlanger formerly known as Pole, the”.

If we don't study the mistakes of the future we're doomed to repeat them for the first time.

Tanni
Niš
Posts: 13
Joined: Fri Jan 07, 2011 11:08 am

Re: IPA pronouncer program

Post by Tanni »

DePaw wrote: There are a lot of different text-to-speech programs out there for different languages and such, but surely someone's made one where you put in the IPA and it says the word out loud? Surely that'd be simpler than programming for a specific language even! A program like that would be invaluable for a conlanger...
We've had threads like this already.

What I don't understand is why people just say that it doesn't work, instead of searching for ways to make it work, e.g. by extending the IPA (or X-SAMPA) input by additional mark-up, e.g. the overall time for a given sequence of sounds (constituting a word, as we are on word level). Or providing some kind of intonation/stress/whatever-envelope for a given word. The software then could select appropriate sound samples to achieve the given shape.
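One way to picture the kind of mark-up suggested here is a small record that carries the IPA string together with an overall duration and a pitch envelope. This is only a sketch: the class name `AnnotatedWord`, its fields, and the choice of linear interpolation between envelope points are all assumptions made for illustration, not an existing format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AnnotatedWord:
    """Hypothetical container: an IPA word plus timing and pitch mark-up."""
    ipa: str                      # the transcription itself
    duration_ms: int              # overall time for the whole word
    # Envelope points as (position, Hz), with position running from 0.0 to 1.0.
    pitch_envelope: List[Tuple[float, float]] = field(default_factory=list)

    def pitch_at(self, position: float) -> Optional[float]:
        """Linearly interpolate the envelope at a relative position in the word."""
        points = sorted(self.pitch_envelope)
        if not points:
            return None
        if position <= points[0][0]:
            return points[0][1]
        for (p0, f0), (p1, f1) in zip(points, points[1:]):
            if p0 <= position <= p1:
                return f0 + (f1 - f0) * (position - p0) / (p1 - p0)
        return points[-1][1]


# A 400 ms word whose pitch falls from 120 Hz to 100 Hz.
word = AnnotatedWord("kæts", 400, [(0.0, 120.0), (1.0, 100.0)])
print(word.pitch_at(0.5))
```

A synthesizer would still have to turn such a record into sound; the record only fixes what the user is asking for.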
An extended and updated version of Mentors and Students concept is available here.

Gulliver
Avisaru
Posts: 433
Joined: Mon May 05, 2003 2:58 pm
Location: The West Country

Re: IPA pronouncer program

Post by Gulliver »

Tanni wrote:
DePaw wrote: There are a lot of different text-to-speech programs out there for different languages and such, but surely someone's made one where you put in the IPA and it says the word out loud? Surely that'd be simpler than programming for a specific language even! A program like that would be invaluable for a conlanger...
We've had threads like this already.

What I don't understand is why people just say that it doesn't work, instead of searching for ways to make it work, e.g. by extending the IPA (or X-SAMPA) input by additional mark-up, e.g. the overall time for a given sequence of sounds (constituting a word, as we are on word level). Or providing some kind of intonation/stress/whatever-envelope for a given word. The software then could select appropriate sound samples to achieve the given shape.
I think, at this stage, it would be more sensible to use Google's/Apple's/whoever's text-to-speech module, setting up IPA as a dummy language and using very strict phonetic input. Maybe. I don't think Google have released a TTS API that would fit the criteria, though. There are probably some Linuxy ones out there.

As other people have said, IPA is generally only used for "near enough" phonemic transcriptions. It is used for phonetic transcription as well, but that tends to get quite unwieldy: [ˈkʰæʔt͡s] (8 symbols) vs /kæts/ (4 symbols), and that's a relatively broad phonetic transcription.
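The symbol counts can be checked by counting Unicode code points, which is one reasonable (but not the only) way to define a "symbol". A minimal sketch, with the helper name `count_symbols` made up for illustration and the transcriptions spelled out as escapes so the count does not depend on how the page was encoded:

```python
import unicodedata

# [ˈkʰæʔt͡s]: stress mark, k, aspiration, æ, glottal stop, t, tie bar, s
narrow = "\u02c8k\u02b0\u00e6\u0294t\u0361s"
# /kæts/
broad = "k\u00e6ts"


def count_symbols(transcription, skip=""):
    """Count code points, optionally ignoring the characters listed in `skip`."""
    return sum(1 for ch in transcription if ch not in skip)


print(count_symbols(narrow))                  # every code point, stress mark included
print(count_symbols(narrow, skip="\u02c8"))   # the same word without the stress mark
print(count_symbols(broad))

# Listing the Unicode names shows which code points are diacritics or modifiers.
for ch in narrow:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Counting code points treats the tie bar and the aspiration mark as symbols in their own right; counting segments instead would give a smaller number.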

Herr Dunkel
Smeric
Posts: 1088
Joined: Mon Jun 21, 2010 3:21 pm
Location: In this multiverse or another

Re: IPA pronouncer program

Post by Herr Dunkel »

Gulliver wrote: As other people have said, IPA is generally only used for "near enough" phonemic transcriptions. It is used for phonetic transcription as well, but that tends to get quite unwieldy: [ˈkʰæʔt͡s] (8 symbols) vs /kæts/ (4 symbols), and that's a relatively broad phonetic transcription.
Make that seven, since you don't need to indicate stress on monosyllables
sano wrote:
To my dearest Darkgamma,
http://www.dazzlejunction.com/greetings/thanks/thank-you-bear.gif
Sincerely,
sano

Boşkoventi
Lebom
Posts: 157
Joined: Mon Aug 14, 2006 4:22 pm
Location: Somewhere north of Dixieland

Re: IPA pronouncer program

Post by Boşkoventi »

Tanni wrote: What I don't understand is why people just say that it doesn't work, instead of searching for ways to make it work, e.g. by extending the IPA (or X-SAMPA) input by additional mark-up ...
People don't "just" anything. They're telling you why it's harder than you think. And ...

Extending the IPA would really only make the problem worse. What people don't understand about text-to-speech software is this:
din wrote:The problem with text to speech programs is that we don't speak in chains of individual phonemes. Our mouths have to transition from one sound to the other.
As an example, the [a] in [ka] is not exactly the same as the [a] in [ta], which is not the same as the [a] in [b̤ʲa]. (see: Formants) You basically have two options: 1) synthesize the sounds from scratch using some sort of complex model (which is far more computationally intensive than you might think), or 2) record every combination of two segments and combine them. Also harder than you might think, for reasons I will explain shortly. (There may be other approaches, but they'd have to boil down to some combination of these two.)

There's a program called MBROLA which takes the latter approach, and it works fairly well. But ... it has to have a recording of every combination of two segments for each language. And a whole new set for each speaker voice (male, female, young, old, etc.). Now, you could do this for the IPA (ignoring the fact that French /e/ may not be the same as German /e/, etc.), but the IPA has far more symbols, and therefore far more possible combinations, than any language. By my estimates (and these are very rough estimates -- e.g. not every diacritic that applies to consonants applies to every consonant), the IPA has:

82 basic consonant symbols
-- with 20 possible diacritics (21 possibilities counting no diacritic)
... for a total of 1722 consonants

28 basic vowel symbols
-- with 15 possible diacritics (16 possibilities counting no diacritic)
... for a total of 420 vowels
(Not counting length or tone / pitch, since they seem to be relatively easy to manipulate.)

(And this isn't even counting coarticulated consonants or diphthongs, both of which are a-whole-nother can of worms. Or hell, even affricates.)

This gives us a total of 2142 segments (consonant or vowel). If we want this thing to be useful, we have to record every pair of segments, of whatever kind ... CV, VC, CC, VV ... and there are 4,588,164 of them! That's four and a half million individual sound samples. That's an awful lot of data, and an awful lot of time spent recording it (we can't even use existing recordings of people talking because for this project we need neutral accents, since we're ignoring differences between languages).
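The estimate above can be reproduced in a few lines. This is a sketch that takes the post's symbol and diacritic counts at face value; the function names are made up for illustration. One thing it makes visible: 28 vowel symbols times 16 possibilities works out to 448 rather than the 420 quoted, so both figures are carried through.

```python
def inventory(base_symbols, diacritics):
    """Segments obtainable from base symbols, each bare or with one diacritic."""
    return base_symbols * (diacritics + 1)


def pair_count(segments):
    """Ordered pairs of segments (CV, VC, CC, VV), i.e. segments squared."""
    return segments ** 2


consonants = inventory(82, 20)         # 82 * 21
vowels_as_quoted = 420                 # figure used in the post
vowels_recomputed = inventory(28, 15)  # 28 * 16

for label, vowels in (("as quoted", vowels_as_quoted),
                      ("recomputed", vowels_recomputed)):
    total = consonants + vowels
    print(f"{label}: {total} segments, {pair_count(total):,} pairs")
```

Either way the number of pairs lands between four and a half and five million, so the conclusion about the recording effort does not change.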

And then ... for a particular utterance, you have to combine/merge all the necessary samples ([fənɛɾɪks] = [fə] + [ən] + [nɛ] + [ɛɾ] + [ɾɪ] + [ɪk] + [ks] ... easier said than done), and then adjust for length of segments, and for tone / pitch / intonation. And oh by the way, build in a way for the user to set those things (MBROLA can do a pretty good job of this, but you have to define the pitch contour for each syllable -- in terms of Hz and milliseconds -- to get anything remotely natural sounding).
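The splitting step on its own is the easy part, and can be sketched as below. The function name `diphones` is hypothetical, and the sketch assumes every segment is a single character, which stops being true as soon as diacritics, affricates, or tie bars are involved. Merging the audio samples and adjusting length and pitch are not attempted here.

```python
def diphones(segments):
    """Return every overlapping pair of adjacent segments."""
    return [a + b for a, b in zip(segments, segments[1:])]


# Assumption: one character per segment, so list() is enough to segment the word.
segments = list("fənɛɾɪks")
pairs = diphones(segments)

print(" + ".join(f"[{p}]" for p in pairs))
print(len(segments), "segments give", len(pairs), "pairs")
```

A word of n segments always yields n - 1 pairs, each of which would need its own recording in the sample set.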



TL;DR: It's much harder than you think.
Radius Solis wrote:The scientific method! It works, bitches.
Είναι όλα Ελληνικά για μένα.

Gulliver
Avisaru
Posts: 433
Joined: Mon May 05, 2003 2:58 pm
Location: The West Country

Re: IPA pronouncer program

Post by Gulliver »

Herr Dunkel wrote:
Gulliver wrote: As other people have said, IPA is generally only used for "near enough" phonemic transcriptions. It is used for phonetic transcription as well, but that tends to get quite unwieldy: [ˈkʰæʔt͡s] (8 symbols) vs /kæts/ (4 symbols), and that's a relatively broad phonetic transcription.
Make that seven, since you don't need to indicate stress on monosyllables
I copied and pasted that from something else. Sentence stress, possibly?

On a more general note, why would you actually want one? I really don't see how this would be a holy grail (as someone indicated earlier using Old French for some reason). Yes, you could have the robot voice read your conlangs, but it would still be a robot voice. Unless your speakers have human speech organs apart from their robot voice boxes, it would sound goofy.

Melteor
Lebom
Posts: 229
Joined: Sat Dec 27, 2008 3:26 pm

Re: IPA pronouncer program

Post by Melteor »

The main reason is that text-to-speech is low on people's priorities. I'm not really aware of any software that incorporates intonological work (though I'm not sure anyone would advertise whether or not they do). If you want, check out GNUspeech, which does model the whole vocal tract. There's also plenty of free stuff like http://www.abair.tcd.ie/?page=synthesis&lang=eng, which has some pretty ambitious researchers behind it.

I think if you wanted to, you could try for some sort of "whispering" voice that uses as many consonants as possible. I could see some benefit, because whispering is hard on the voice and I haven't been able to find a synthesizer that does it, and because I think it could be done fairly convincingly.

Drydic
Smeric
Posts: 1652
Joined: Tue Oct 08, 2002 12:23 pm
Location: I am a prisoner in my own mind.

Re: IPA pronouncer program

Post by Drydic »

Gulliver wrote:
Herr Dunkel wrote:
Gulliver wrote: As other people have said, IPA is generally only used for "near enough" phonemic transcriptions. It is used for phonetic transcription as well, but that tends to get quite unwieldy: [ˈkʰæʔt͡s] (8 symbols) vs /kæts/ (4 symbols), and that's a relatively broad phonetic transcription.
Make that seven, since you don't need to indicate stress on monosyllables
I copied and pasted that from something else. Sentence stress, possibly?

On a more general note, why would you actually want one? I really don't see how this would be a holy grail (as someone indicated earlier using Old French for some reason).
YOUR FATHER SMELT OF ELDERBERRIES!
Common Zein Scratchpad & other Stuffs! OMG AN ACTUAL CONPOST WTFBBQ

Formerly known as Drydic.

R.Rusanov
Avisaru
Posts: 393
Joined: Sat Jan 05, 2013 1:59 pm
Location: Novo-je Orĭlovo

Re: IPA pronouncer program

Post by R.Rusanov »

Boşkoventi wrote: ... 420 vowels ...
Blaze erryday
Slava, čĭstŭ, hrabrostĭ!
