zompist bboard

Posted: **Thu Dec 30, 2010 8:32 pm**

I was wondering if there was a good IPA-to-sound converter available somewhere, whether online or as a download. Just something where I can type "ˌfoʊnəˈtɪʃən" and get some semi-reasonable approximation of the word.

I know of systems (e.g., MBROLA) that let you control output with very fine controls (number of milliseconds, pitch, and all that) but this is more than I'm looking for -- I want something that is simple to use. (In a pinch maybe I could use such a system if I wrote a conversion layer first, but that seems like something that has probably already been done better.)
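For what it's worth, the conversion layer itself could be quite small. Here is a minimal sketch, assuming the usual MBROLA `.pho` layout: one phoneme per line, giving the symbol, the duration in ms, then optional position-percent and pitch-Hz pairs, with `_` as silence. The `IPA_TO_VOICE` table, the fixed 90 ms duration and the flat 120 Hz pitch are all made up for illustration; each real voice database defines its own symbol set.

```python
# Hypothetical IPA -> voice-symbol table. Real MBROLA voices each define
# their own phoneme inventory, so this mapping is an assumption.
IPA_TO_VOICE = {
    "f": "f", "oʊ": "@U", "n": "n", "ə": "@",
    "t": "t", "ɪ": "I", "ʃ": "S",
}


def to_pho(segments, duration_ms=90, pitch_hz=120):
    """Turn a list of IPA segments into .pho-style text.

    Every segment gets the same duration and a single pitch point at 50%
    of its length, which is crude but needs no per-phone tuning.
    """
    lines = ["_ 100"]  # leading silence
    for seg in segments:
        symbol = IPA_TO_VOICE[seg]  # raises KeyError for unsupported sounds
        lines.append(f"{symbol} {duration_ms} 50 {pitch_hz}")
    lines.append("_ 100")  # trailing silence
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_pho(["f", "oʊ", "n", "ə", "t", "ɪ", "ʃ", "ə", "n"]), end="")
```

The result would still sound flat and mechanical, since nothing here models stress or intonation.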

I know of
http://www2.research.att.com/~ttsweb/tts/demo.php
which can be used the way I want by typing something like

<phoneme alphabet="ipa" ph="ˌfoʊnəˈtɪʃən"> </phoneme>

but it's pretty flaky -- I had to try this one four times to make it work. (Also, I suspect it will fail with some of the more interesting sounds, but I'll take what I can get in that regard; even a system that only does basic English and Romance sounds would be useful to me.)
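To avoid typing the tag by hand each time, the wrapping can be scripted. This is only a sketch: `ipa_to_ssml` is a name invented here, and the only thing it assumes is the `<phoneme alphabet="ipa" ph="...">` form shown above, with the element's text content used as fallback text.

```python
from xml.sax.saxutils import escape, quoteattr


def ipa_to_ssml(ipa: str, fallback_text: str = "") -> str:
    """Wrap an IPA string in an SSML <phoneme> element.

    quoteattr() adds the surrounding quotes and escapes &, < and >,
    so the IPA string cannot break the markup.
    """
    return (
        f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>"
        f"{escape(fallback_text)}</phoneme>"
    )


if __name__ == "__main__":
    print(ipa_to_ssml("ˌfoʊnəˈtɪʃən", "phonetician"))
```

The fallback text is what a synthesizer would read if it ignored the `ph` attribute, so it is worth filling in when there is a sensible spelling.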

And yes, I know of Paul Meier's IPA chart, but since that doesn't let you string sounds together it wouldn't be useful for me.

Bonus points for a system that lets you pass a string in a URL (or in an HTTP POST variable). Further bonus points for a system with liberal licensing.

Posted: **Fri Dec 31, 2010 10:23 am**

Doesn't exist – what you've posted is closer than anything I've ever seen before, even though it clearly only works for English.

The main problem is the sheer size of such a project – without even getting started on the diacritics, the offglides and transitions from consonant to vowel are very subtle and could sound very wrong if you combined them wrongly. Basically, imagine you have to record [t] differently for each vowel it could occur next to...

Posted: **Fri Dec 31, 2010 10:28 am**

I don't think it would be that difficult. Sure, stops would have to be recorded separately for different vowels, but for something basic, just to get the flavor of a new language, a one-to-one regurgitation of sounds would suffice.

Posted: **Fri Dec 31, 2010 11:05 am**

treskro3 wrote: I don't think it would be that difficult. Sure, stops would have to be recorded separately for different vowels, but for something basic, just to get the flavor of a new language, a one-to-one regurgitation of sounds would suffice.

Stops would have to be separately recorded for every secondary articulation, plus vowels, consonant clusters, etc.

Posted: **Fri Dec 31, 2010 11:28 am**

Eh, why would we need to go into that level of detail? The text-to-speech software that already exists doesn't do that, and it may not be perfectly normal sounding but it gets us by.

Posted: **Fri Dec 31, 2010 3:26 pm**

Soap wrote: Eh, why would we need to go into that level of detail? The text-to-speech software that already exists doesn't do that, and it may not be perfectly normal sounding but it gets us by.

Why are you asking why? I mean, wouldn't it be cool to be able to produce sounds from arbitrary languages just using IPA input? This would mean we could generate sounds from "unsupported" languages just by knowing the IPA. It's certainly interesting and possible, but yes, a large undertaking.

Posted: **Fri Dec 31, 2010 3:42 pm**

finlay wrote: Doesn't exist – what you've posted is closer than anything I've ever seen before, even though it clearly only works for English.

and Spanish, Italian, German and French.

I use that site to make my sound samples for Kala because the phonology fits with Spanish.

Posted: **Fri Dec 31, 2010 5:00 pm**

If not actual strings of segments, it would be neat to have something that could combine features: what if one wants to hear a nasalized /s/, instead of just a nasalized /t/? Or maybe something like one of PIE's aspirated labiovelars, /gʷʰ/?

Posted: **Fri Dec 31, 2010 6:24 pm**

An issue with IPA-to-speech is that to accurately convert IPA for a given language into audio, one would need a lot of detailed phonetic information that most transcriptions, even my infamous transcriptions of English, simply do not provide. IPA has a significant range of ambiguity in how it is actually used. Human readers normally know what a given transcription means and do not necessarily care about the phonetic details it omits, but to generate accurate audio from it one would need to know all of those details.

Posted: **Sat Jan 01, 2011 4:12 pm**

He's not asking for perfectly precise renderings.

All we need is a system that will take basic IPA (or X-SAMPA, or some other phonetic notation) and convert the segments to audio without bias toward a specific language. But you don't need to use separate recordings for every permutation of features, and really the amount of information in even broad transcriptions should be enough to give a reasonable rendering, even when interpreted literally. It won't be perfect, but it would be nice to have a language-independent system.
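Just splitting the input into segments is the easy part. Here is a rough sketch, assuming that combining diacritics, a handful of modifier letters and length marks attach to the preceding symbol, and that a tie bar also pulls in the following one. The name `segment_ipa` and the particular `MODIFIERS` set are my own choices, not any standard.

```python
import unicodedata

# Stress marks, syllable dots and spaces are dropped rather than segmented.
SKIP = set("\u02c8\u02cc. ")
# Modifier letters and length marks that attach to the preceding symbol:
# aspiration, labialization, palatalization, velarization,
# pharyngealization, long, half-long.
MODIFIERS = set("\u02b0\u02b7\u02b2\u02e0\u02e4\u02d0\u02d1")
# Tie bars (above and below) also glue the NEXT symbol onto the segment.
TIE_BARS = {"\u0361", "\u035c"}


def segment_ipa(text):
    """Split an IPA string into segments, keeping diacritics attached."""
    segments = []
    join_next = False
    # NFD exposes precomposed letters as base symbol + combining mark.
    for ch in unicodedata.normalize("NFD", text):
        if ch in SKIP:
            continue
        attaches = unicodedata.combining(ch) or ch in MODIFIERS or join_next
        if segments and attaches:
            segments[-1] += ch
            join_next = ch in TIE_BARS
        else:
            segments.append(ch)
            join_next = False
    # Recompose so that e.g. a + combining tilde comes back as one code point.
    return [unicodedata.normalize("NFC", seg) for seg in segments]


if __name__ == "__main__":
    print(segment_ipa("ˌfoʊnəˈtɪʃən"))
    print(segment_ipa("gʷʰ"))
```

Note that a diphthong written without a tie bar, like the "oʊ" in the example word, comes out as two separate vowels; treating it as one unit would need either a tie bar in the input or language-specific knowledge.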

Actually a better idea would be to use the IPA symbols as shorthand for the parameters of an articulatory synthesis engine, or something, instead of using prerecorded audio. Yes, you would still need data for each symbol, but in this case you're using CPU time instead of RAM.

Imagine, they could get away with so much more by using a phonetic alphabet instead of dedicating hundreds of lines of code to parsing a particular language's orthography.

Posted: **Mon Jan 03, 2011 4:06 am**

Bedelato wrote: He's not asking for perfectly precise renderings.

All we need is a system that will take basic IPA (or X-SAMPA, or some other phonetic notation) and convert the segments to audio without bias toward a specific language.

This is still based on the assumption that individual phone sounds pronounced in sequence by a computer will sound anything remotely like human speech, an assumption that is probably false. The major reason becomes clear when you start looking at spectrograms: the target waveforms for each phone take up a minority of speech production time, the majority being spent in transition between targets. Delete all that transitional time and the result is not likely to sound much like human speech.

So an IPA-to-speech program really does need to know how to transition from each phone's target position to that of each other phone; you can't realistically ignore that and still have a useful program. And it's a great deal easier said than done: if N is the number of phones your program knows about, it has to handle N^2 transition patterns. So if we ignore diacritics and just assume there are 135 basic IPA symbols (I counted up the charts but can't find a listed figure), there are 135*135 = 18,225 possible sequences of one IPA symbol with another. And that is ignoring the fact that languages differ substantially on how they transition from one phone to the next! You'd have to either pick one (which would result in a language-biased program) or else examine how each transition is handled in multiple languages and average the patterns somehow.
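The arithmetic is easy to check, and it is also easy to count which transitions a particular word list actually uses. This is a small sketch with names of my own (`transition_count`, `diphones_needed`); it takes words that are already split into phones, and the sample word is just an example.

```python
def transition_count(n_phones):
    """Number of ordered phone-to-phone pairs, counting a phone followed by itself."""
    return n_phones ** 2


def diphones_needed(words):
    """Collect the distinct adjacent phone pairs that occur in a word list.

    Each word is a sequence of phones, e.g. ["k", "a", "l", "a"].
    """
    needed = set()
    for phones in words:
        # zip(phones, phones[1:]) pairs each phone with its successor.
        needed.update(zip(phones, phones[1:]))
    return needed


if __name__ == "__main__":
    print(transition_count(135))
    print(sorted(diphones_needed([["k", "a", "l", "a"]])))
```

Counting only the pairs that occur says nothing about how each transition should sound, which is the hard part described above.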

Posted: **Wed Jan 05, 2011 5:18 pm**

But couldn't you use, like, articulatory synthesis or something?

I swear I said this above

Posted: **Sat Jan 08, 2011 10:49 pm**

Bedelato wrote: He's not asking for perfectly precise renderings.

Exactly. If I needed precise renderings of a language into audio, I'd ask a native speaker... or the creator, if it was a conlang.

But something that could do basic phonetics would be really nice. 50 common phones + 100 common diphones would be great.

IPA-to-speech?
