ConlangDictionary 0.3 - now phonology parsing is faster

David McCann · Post by **David McCann** » Tue Feb 23, 2010 10:42 am

faiuwle wrote:Right, so, I've refreshed my pathetic HTML skills, and one serious issue with this is converting various unicode characters (á ë ñ î etc. as well as 90% of the IPA) into HTML entities. One thing I am certain of is that I am not going to do it, because I am not a walking database of HTML entities codes.

HTML supports unicode these days; just specify it in <head></head> with

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

The only entities you need, for obvious reasons, are &lt; and &gt; (plus &amp; for a literal ampersand).
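A minimal sketch of that advice, in Python rather than whatever the exporter itself is written in. `render_page` is a hypothetical helper, not part of ConlangDictionary; the only real API used is the standard library's `html.escape`. It escapes the markup-significant characters and leaves accented letters and IPA alone, relying on the UTF-8 declaration in the head:

```python
import html

def render_page(title: str, body_text: str) -> str:
    """Hypothetical helper: wrap plain text in a minimal UTF-8 HTML page."""
    # quote=False escapes only &, < and >; all other characters pass through.
    safe_title = html.escape(title, quote=False)
    safe_body = html.escape(body_text, quote=False)
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n'
        f"<title>{safe_title}</title>\n"
        "</head><body>\n"
        f"<p>{safe_body}</p>\n"
        "</body></html>\n"
    )

page = render_page("Lexicon", "nostrø /nɔs.tɾœ/ <noun> & yeoman")
```

The file still has to be written out as UTF-8 (for example `open(path, "w", encoding="utf-8")`) for the declaration to be true.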

faiuwle · Post by **faiuwle** » Tue Feb 23, 2010 11:41 am

Ah, awesome. That makes it lots easier, then.

Trajan · Post by **Trajan** » Tue Feb 23, 2010 11:57 am

I haven't been following this forum, so I don't know if someone has already mentioned this. Please be patient with me.

What is the "alphabetic/parsing order"-function for?
Also, I seem to have Daquarious P. McFizzle's problem: the program won't accept a load of IPA characters: ɸ, ʃ, ɹ, ʋ, ʉ, ʌ, and ɛ, to name a few. I don't like X-SAMPA. Any ideas what else I could do?

faiuwle · Post by **faiuwle** » Tue Feb 23, 2010 12:24 pm

Trajan wrote:What is the "alphabetic/parsing order"-function for?

It allows you to specify two different orders for the phonemes. "Parsing order" is the order that the parser will consider them in when trying to decode your orthography, which basically means that you should put your digraphs first if they could also be parsed as two separate phonemes. "Alphabetic order" will affect the order in which the words are displayed on the word tab. Use the up/down arrows to change the orders when you have the correct radio button ticked.
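To see why the digraphs need to come first, here is a rough sketch of an order-sensitive parser. It is an assumption about the general idea only: the function name and the spellings are made up, and the real parser also has to deal with phonotactics and suprasegmentals.

```python
def parse_spelling(word, parsing_order):
    """Split `word` into spellings, trying candidates in the given order."""
    result = []
    pos = 0
    while pos < len(word):
        for spelling in parsing_order:
            if word.startswith(spelling, pos):
                result.append(spelling)
                pos += len(spelling)
                break
        else:
            # Nothing in the list matches at this position.
            return None
    return result

digraphs_first = ["sh", "th", "s", "h", "t", "a"]
singles_first = ["s", "h", "t", "a", "sh", "th"]

with_digraphs = parse_spelling("shath", digraphs_first)    # ['sh', 'a', 'th']
without_digraphs = parse_spelling("shath", singles_first)  # ['s', 'h', 'a', 't', 'h']
```

With the single letters listed first, "sh" never gets a chance to match, because "s" has already consumed its first character.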

Also, I seem to have Daquarious P. McFizzle's problem: the program won't accept a load of IPA characters: ɸ, ʃ, ɹ, ʋ, ʉ, ʌ, and ɛ, to name a few. I don't like X-SAMPA. Any ideas what else I could do?

I still don't know why this doesn't work for some people. You can, however, use any text (as long as it doesn't have spaces in it) for the names of your phonemes, so you're not just restricted to XSAMPA. I don't know if that actually helps at all, though.

(Sorry)

Trajan · Post by **Trajan** » Tue Feb 23, 2010 12:41 pm

faiuwle wrote:It allows you to specify two different orders for the phonemes. "Parsing order" is the order that the parser will consider them in when trying to decode your orthography, which basically means that you should put your digraphs first if they could also be parsed as two separate phonemes. "Alphabetic order" will affect the order in which the words are displayed on the word tab. Use the up/down arrows to change the orders when you have the correct radio button ticked.

Ah, right! Thanks!

faiuwle wrote:I still don't know why this doesn't work for some people. You can, however, use any text (as long as it doesn't have spaces in it) for the names of your phonemes, so you're not just restricted to XSAMPA. I don't know if that actually helps at all, though. (Sorry)

Oh, well. I'll just have to see what I can do. But thanks anyway!

Torco · Post by **Torco** » Wed Feb 24, 2010 3:36 am

I hate to be a bitch about this, but do you think we can expect the ability to export dictionary files to, say, a spreadsheet, or some other sound-change-friendly format?

faiuwle · Post by **faiuwle** » Wed Feb 24, 2010 6:52 pm

Yeah - the next version is really just waiting on the phonotacticless parsing, but I'm busy and uninspired (but mostly busy). I think I might just upload the latest version anyway, and if you disable phonotactics you'll just have to manually set the phonologies for all your words.

I'll do that tomorrow, and then hopefully figure out what I want to do with the parser over the weekend maybe.

CaesarVincens · Post by **CaesarVincens** » Thu Feb 25, 2010 7:51 pm

First, I want to thank you for this great tool.
Second, I want to report a bug: if one deletes a phoneme but hasn't already deleted that phoneme from the phonotactics section, the program will crash when trying to remove it.
Third, I want to know how to import and export to text (if that is available in 0.2), as I can't figure it out.

faiuwle · Post by **faiuwle** » Thu Feb 25, 2010 9:00 pm

CaesarVincens wrote:First, I want to thank you for this great tool.
Second, I want to report a bug: if one deletes a phoneme but hasn't already deleted that phoneme from the phonotactics section, the program will crash when trying to remove it.

Thanks - I'll fix that before I upload tomorrow.

Third, I want to know how to import and export to text (if that is available in 0.2), as I can't figure it out.

You can't yet, but you will be able to in the 0.2 I'm uploading tomorrow. I'll post more detailed documentation then.

(The number just refers to the savefile - if you want an unambiguous version reference, you also need to note the date it was uploaded, which should be the last-edited date on the executable. This is how it will be referenced in the changelog.)

CaesarVincens · Post by **CaesarVincens** » Thu Feb 25, 2010 9:12 pm

Thanks for the quick reply and the explanations.

Oh, and if you do get to making a sound change portion or separate program, I will certainly have some recommendations.

faiuwle · Post by **faiuwle** » Fri Feb 26, 2010 2:46 pm

Ok, a new version is up now.

changelog wrote:2/26/10
- Wordlists can now be loaded from or saved to a text or CSV file.
- Partial wordlists can now be loaded from existing dictionary files.
- Language name is now stored in savefile (set it on the last tab)
- It is now possible to opt out of phonotactical restrictions on the phonotactics
page.
- Sanity check: Attempting to generate more than 1000 clusters on the phonotactics
page now results in an error message (if you need more than about 500, you're probably
better off just opting out of phonotactics restrictions anyway).
- The dialog for choosing sequences of phonemes has been optimized for displaying
larger numbers of sequences (YMMV based on your screen size).
- Deleting phonemes and suprasegmentals that are referenced in many places will now result
in all references being properly removed rather than a crash.

NB: The last line is not quite true - I noticed, after I had finished compiling in Windows and packaging everything up, that I'd forgotten to fix a file. All it means is that for now, after you delete phonemes, you should go re-save the phonotactics before doing anything else (in particular, regenning word phonologies, though loading wordlists will probably also crash it). I'd upload the really really fixed version instead, but since I don't have access to Windows right now, it'd just be in linux, and I want to avoid too much version confusion, if possible.

Anyway, new features:
1. You can turn off phonotactics. Note that this will cause all of your words to be phonology-less (for now). Also note that after turning off the phonotactics on the phonotactics tab, you will have to actually click "save phonotactics" for it to take effect.

2. You can save to text files. CD will write one word entry per line of the file (though if you have newlines in your descriptions, those will be preserved). The formula entered in the formula box determines what is written - /w means the spelling of the word, /t means the type, /s means the subtype, /p means the phonology (the text that's displayed on the word tab) and /d means the description. Any other character will be written literally. Using the formula "/w","/p","/t","/s","/d" (yes, with all the quotes and commas) will produce a CSV file that can be read by Excel (though you may have to change the extension to .csv).

3. You can read wordlists from text files. The formula you enter for that is essentially the same as the above (ETA: except for /p - phonology can't be directly read from .txt files), but characters other than the forward-slash-escaped characters and parentheses will be interpreted as a regular expression. Non-punctuation characters will still be interpreted literally, though, so that's not really relevant unless you actually want to make use of regexp functionality. Note, however, that quantifiers don't seem to work with the forward-slash-terms (e.g. "/s?" will match the subtype followed by a question mark, not an optional subtype). There might be other quirks to it too, especially in Windows. CD will spawn a pop-up telling you how many lines of the file were actually matched, and if it turns out to be 0 or something, you can cancel.

4. You can import wordlists from other .cdic files. The phonologies of these words will be ignored, and they will be reinterpreted based on their spelling (this is because there's no way to guarantee that the phonologies correspond, at the moment). You should be able to select consecutive words simultaneously by pressing shift and either scrolling with arrow keys, or clicking, and non-consecutive words with ctrl-click.

5. You can name your language on the last tab.
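For anyone trying to picture how the export formula in item 2 works, here is a small sketch in Python. It is not CD's actual code; `render_entry` and the field names in the dictionary are assumptions for illustration. Each slash term is replaced by the matching field and everything else is copied literally:

```python
import re

# Assumed field names for a word record; only the /w /t /s /p /d codes
# come from the description above.
FIELDS = {
    "w": "spelling",
    "t": "type",
    "s": "subtype",
    "p": "phonology",
    "d": "description",
}

def render_entry(formula, entry):
    """Expand /w, /t, /s, /p and /d in `formula`; keep other text as-is."""
    return re.sub(r"/([wtspd])", lambda m: entry[FIELDS[m.group(1)]], formula)

entry = {
    "spelling": "nostrø",
    "type": "noun",
    "subtype": "",
    "phonology": "nɔs.tɾœ",
    "description": "free one",
}

csv_line = render_entry('"/w","/p","/t","/s","/d"', entry)
plain_line = render_entry("*** /w (/p) : [/d]", entry)
```

Note that this sketch does nothing about quote characters inside a description, which a real CSV writer would have to double.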

Let me know how all of the new GUI stuff looks.

CaesarVincens · Post by **CaesarVincens** » Fri Feb 26, 2010 3:20 pm

Hi faiuwle,
I'm having trouble importing from text or CSV.
I tried using the exact same formula as I used for export (which works perfectly), but it reads 0 out of x lines every time, even if I try to import from the file I just exported.

Edit:
Also, it would be nice if one diacritic could be used for two suprasegmentals.
So that a circumflex might mean both long and stressed for example.

faiuwle · Post by **faiuwle** » Fri Feb 26, 2010 3:22 pm

Well, if you have newlines in the descriptions, then they're not going to be readable back in because the entries are no longer on only one line. This is why I use XML for the actual savefiles. What formula are you using? Does the txt file look right when you open it in an editor?

CaesarVincens · Post by **CaesarVincens** » Fri Feb 26, 2010 3:30 pm

No new lines.

I made an exported wordlist with the formula you gave as an example: "/w","/p","/t","/s","/d"

Then when I tried to import the same file with the same formula the program read 0 out of 31 lines.

faiuwle · Post by **faiuwle** » Fri Feb 26, 2010 3:32 pm

Oh, that's because I forgot to mention - "/p" can't actually be read in, because it's just a string of text and not actually a full phonological description.

It should work just fine if you substitute the "/p" with .* .
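A rough Python sketch of how an import formula might turn into a regular expression, for illustration only. The thread confirms just a few things: /p can't be read, other characters are treated as regexp, and /w is matched as \w+. The patterns used here for /t, /s and /d, and the choice to treat parentheses literally, are guesses.

```python
import re

# Only the \w+ for /w is taken from the thread; the rest are assumptions.
TERM_PATTERNS = {"w": r"\w+", "t": r".*?", "s": r".*?", "d": r".*?"}

def formula_to_regex(formula):
    """Build a compiled regex with one named group per slash term."""
    parts = []
    i = 0
    while i < len(formula):
        ch = formula[i]
        if ch == "/" and i + 1 < len(formula) and formula[i + 1] in "wtspd":
            name = formula[i + 1]
            if name == "p":
                raise ValueError("/p cannot be read on import; use .* instead")
            parts.append(f"(?P<{name}>{TERM_PATTERNS[name]})")
            i += 2
        elif ch in "()":
            parts.append(re.escape(ch))  # assumed: parentheses are literal
            i += 1
        else:
            parts.append(ch)  # everything else is passed through as regexp
            i += 1
    return re.compile("".join(parts))

pattern = formula_to_regex('"/w",".*","/t","/s","/d"')
match = pattern.fullmatch('"nostrø","nɔs.tɾœ","noun","","free one"')
fields = match.groupdict()
```

The `.*` in the phonology slot matches that column without capturing it, which is why the substitution makes the exported file readable again.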

CaesarVincens · Post by **CaesarVincens** » Fri Feb 26, 2010 3:35 pm

Works great. Thanks.

EDIT:

However, it doesn't seem to like importing words with a colon in them...

CaesarVincens · Post by **CaesarVincens** » Sat Feb 27, 2010 1:50 am

Also, if one tries to save after deleting a phoneme from the phonemes without removing it from phonotactics, it causes a crash.

Trajan · Post by **Trajan** » Sun Feb 28, 2010 10:59 am

Is there going to be some complete documentation at some point?

It's just rather tiring to search this thread for the appropriate bit of information (though I do understand that it's probably even more tiring for you to write a complete documentation).

Torco · Post by **Torco** » Sun Feb 28, 2010 1:13 pm

I've played around with this version and it looks awesome. Lotsa kudos to you, faiuwle, and a big thank you for such a useful tool.

YAAY now I can export my 300-word lexicon... damn, I suck at lexiconmaking... how do people get all the way up to 5000 word lexicons is beyond me =(

Torco · Post by **Torco** » Sun Feb 28, 2010 1:17 pm

Trajan wrote:Is there going to be some complete documentation at some point?

It's just rather tiring to search this thread for the appropriate bit of information (though I do understand that it's probably even more tiring for you to write a complete documentation).

I was thinking about putting together a manual for it... it's the least I can do, really.

BTW, it's not necessary to use the "quotes" on formula terms: this

*** /w (/p) : [/d]

produces entries such as this

*** nostrø (nɔs.tɾœ) : [free one, unbound one.

also yeoman.]

faiuwle · Post by **faiuwle** » Sun Feb 28, 2010 8:21 pm

I didn't get to recompile anything this weekend - we had emergency lease-breaking to do. You know how it is.

CaesarVincens wrote:However, it doesn't seem to like importing words with a colon in them...

Yeah, I've got it set to be interpreted as \w+ in the regexp. Might work better as .+ if you're going to be using weird punctuation in your words.
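The difference is easy to check with Python's `re` module (the word "ka:ni" is a made-up example, and the character-class variant is just one more option, not something CD does):

```python
import re

word = "ka:ni"  # hypothetical word using a colon for a long vowel

strict = re.fullmatch(r"\w+", word)          # None: ':' is not a word character
loose = re.fullmatch(r".+", word)            # matches the whole word
with_colon = re.fullmatch(r"[\w:]+", word)   # word characters plus ':' only
prefix = re.match(r"\w+", word).group()      # stops just before the colon
```

The catch with .+ is that it will also swallow separators such as commas or quotes if the formula gives it room to, so a narrower class like [\w:]+ may be the safer choice.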

Also, if one tries to save after deleting a phoneme from the phonemes without removing it from phonotactics, it causes a crash.

But not if you save the phonotactics first, right?

Is there going to be some complete documentation at some point?

That's the plan.

Torco wrote:I was thinking about putting together a manual for it... it's the least I can do, really.

That would be awesome, actually. If you decide to do that, let me know if you need to know anything, and I'll link or host it from my wiki.

BTW, it's not necessary to use the "quotes" on formula terms: this

*** /w (/p) : [/d]

produces entries such as this

*** nostrø (nɔs.tɾœ) : [free one, unbound one.

also yeoman.]

Yes, but without the quotes it's not strict CSV and won't be interpreted correctly by Excel.
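To illustrate what the quotes buy you, here is a short sketch using Python's standard `csv` module (not anything CD does internally; the sample row is made up from the entry above). With every field quoted, commas and even newlines inside a description stay within their own column when the file is read back:

```python
import csv
import io

rows = [
    ["nostrø", "nɔs.tɾœ", "noun", "", "free one, unbound one.\nalso yeoman."],
]

# Write with every field quoted, like the "/w","/p","/t","/s","/d" formula.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()

# Read it back: the quoted comma and newline do not split the field.
parsed = list(csv.reader(io.StringIO(text)))
```

A line-by-line importer, by contrast, sees the description's newline as the start of a new entry.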

CaesarVincens · Post by **CaesarVincens** » Sun Feb 28, 2010 8:49 pm

faiuwle wrote:I didn't get to recompile anything this weekend - we had emergency lease-breaking to do. You know how it is.

CaesarVincens wrote:However, it doesn't seem to like importing words with a colon in them...
Yeah, I've got it set to be interpreted as \w+ in the regexp. Might work better as .+ if you're going to be using weird punctuation in your words.

Are you saying for the .exe code or for the import expression that I put in?

Also, if one tries to save after deleting a phoneme from the phonemes without removing it from phonotactics, it causes a crash.
But not if you save the phonotactics first, right?

I'm not sure, I don't think I tried it that way.

Edit:

Partial success. I used "/w.+" and it matched the line up to the colon but didn't include the colon or anything after it in the word part. It did get the other sections.

I'm using colons to represent long vowels as that is more aesthetically pleasing to me than double vowels and more practical than a four way distinction with plain, acute, diaeresis, and circumflex (short, short stressed, long, and long stressed respectively).

faiuwle · Post by **faiuwle** » Sun Feb 28, 2010 8:56 pm

CaesarVincens wrote:Are you saying for the .exe code or for the import expression that I put in?

The .exe code - I'm just thinking aloud, here.

CaesarVincens · Post by **CaesarVincens** » Sun Feb 28, 2010 8:59 pm

However, do see my edit to my last post.

Edit:
Also, would it be possible to have a diacritic be used for more than one thing? As it is right now, any one diacritic can be used only one way. I'm not sure how difficult it would be to prevent crash-causing conflicts, but I don't see why a single diacritic couldn't be used at least four ways (as there are four phonetic options).

faiuwle · Post by **faiuwle** » Sun Feb 28, 2010 9:58 pm

What do you mean? You mean you want to use a circumflex to mark simultaneous high tone and stress, or something? That would be kind of a pain. Parsing problems shouldn't cause crashes though - it just means your word won't be parsed correctly, and might wind up as /???/ at which point you'll just have to input the phonology by hand.
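For what it's worth, the request can be sketched outside CD. This Python example is purely hypothetical: it follows the four-way scheme described earlier (plain, acute, diaeresis, circumflex) and says nothing about how CD stores suprasegmentals. It decomposes each letter with `unicodedata.normalize` and lets one diacritic carry a set of features:

```python
import unicodedata

# Hypothetical mapping: one combining mark may imply several features.
DIACRITIC_FEATURES = {
    "\u0301": {"stressed"},          # combining acute
    "\u0308": {"long"},              # combining diaeresis
    "\u0302": {"long", "stressed"},  # combining circumflex
}

def analyse(word):
    """Return (base letter, feature set) pairs, one per letter."""
    result = []
    # NFD splits precomposed letters such as 'â' into 'a' + combining mark.
    for ch in unicodedata.normalize("NFD", word):
        if ch in DIACRITIC_FEATURES and result:
            base, features = result[-1]
            result[-1] = (base, features | DIACRITIC_FEATURES[ch])
        else:
            result.append((ch, set()))
    return result

analysis = analyse("kâni")
```

Here the circumflexed vowel comes out as both long and stressed, while the other letters get empty feature sets.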