ConlangDictionary 0.3 - now phonology parsing is faster

David McCann
Sanci
Posts: 25
Joined: Thu Mar 16, 2006 12:27 pm
Location: London

Post by David McCann »

faiuwle wrote:Right, so, I've refreshed my pathetic HTML skills, and one serious issue with this is converting various unicode characters (á ë ñ î etc. as well as 90% of the IPA) into HTML entities. One thing I am certain of is that I am not going to do it, because I am not a walking database of HTML entities codes.
HTML supports unicode these days; just specify it in <head></head> with

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

The only entities you need, for obvious reasons, are &lt; and &gt; (for < and >), plus &amp; for a literal ampersand.
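
A minimal sketch of that advice in Python, for anyone scripting their own HTML export: the page is written as UTF-8, so accented letters and IPA pass through untouched, and only the markup characters are escaped. The function name `render_page` and the sample word are made up for illustration; this is not how ConlangDictionary itself produces output.

```python
import html


def render_page(title: str, body_text: str) -> str:
    """Return a minimal HTML page declared as UTF-8.

    Only &, < and > are escaped; everything else is left as-is.
    """
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
        f"<title>{html.escape(title, quote=False)}</title>"
        "</head><body>"
        f"<p>{html.escape(body_text, quote=False)}</p>"
        "</body></html>"
    )


# Hypothetical dictionary entry containing IPA and angle brackets.
page = render_page("nostrø", "/nɔs.tɾœ/ <free one>")

# The bytes written to disk must really be UTF-8 for the meta tag to be true.
data = page.encode("utf-8")
```

The important part is that the declared charset and the actual file encoding agree; the meta tag alone does not convert anything.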

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

Ah, awesome. That makes it lots easier, then. :)

Trajan
Niš
Posts: 13
Joined: Tue Nov 18, 2008 3:39 pm
Location: Freising (Germany) and Bath (UK)

Post by Trajan »

I haven't been following this forum, so I don't know if someone has already mentioned this. Please be patient with me.
  • What is the "alphabetic/parsing order"-function for?
  • Also, I seem to have Daquarious P. McFizzle's problem: the program won't accept a load of IPA characters: ɸ, ʃ, ɹ, ʋ, ʉ, ʌ, and ɛ, to name a few. I don't like X-SAMPA. Any ideas what else I could do?
[color=red]Economic Left/Right: -5.12
Social Libertarian/Authoritarian: -4.87[/color]
[img]http://www.cwnc.net/my/images/cwnc-mod.png[/img]

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

Trajan wrote:What is the "alphabetic/parsing order"-function for?
It allows you to specify two different orders for the phonemes. "Parsing order" is the order that the parser will consider them in when trying to decode your orthography, which basically means that you should put your digraphs first if they could also be parsed as two separate phonemes. "Alphabetic order" will affect the order that the words are displayed in on the word tab. Use the up/down arrows to change the orders when you have the correct radio button ticked.
Also, I seem to have Daquarious P. McFizzle's problem: the program won't accept a load of IPA characters: ɸ, ʃ, ɹ, ʋ, ʉ, ʌ, and ɛ, to name a few. I don't like X-SAMPA. Any ideas what else I could do?
I still don't know why this doesn't work for some people. You can, however, use any text (as long as it doesn't have spaces in it) for the names of your phonemes, so you're not just restricted to XSAMPA. I don't know if that actually helps at all, though. :| (Sorry)
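
To illustrate why parsing order matters for digraphs, here is a rough sketch of a first-match parser in Python. It is only an assumption about the general idea (try each spelling in the given order at the current position); the real parser in ConlangDictionary may well work differently, and the function name and phoneme lists are invented for the example.

```python
def parse_orthography(word, parsing_order):
    """Split `word` into phoneme spellings.

    At each position the spellings are tried in `parsing_order`, and the
    first one that matches is taken. Returns None if nothing matches.
    Spellings are assumed to be non-empty strings.
    """
    result = []
    i = 0
    while i < len(word):
        for spelling in parsing_order:
            if word.startswith(spelling, i):
                result.append(spelling)
                i += len(spelling)
                break
        else:
            return None  # no phoneme spelling matches at position i
    return result


# Hypothetical inventories: same phonemes, different parsing order.
digraphs_first = ["sh", "th", "s", "h", "t", "a"]
singles_first = ["s", "h", "t", "a", "sh", "th"]

with_digraph = parse_orthography("shat", digraphs_first)  # ['sh', 'a', 't']
without_digraph = parse_orthography("shat", singles_first)  # ['s', 'h', 'a', 't']
```

With the single letters listed first, "sh" is never reached, which is the situation the advice about putting digraphs first is meant to avoid.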

Trajan
Niš
Posts: 13
Joined: Tue Nov 18, 2008 3:39 pm
Location: Freising (Germany) and Bath (UK)

Post by Trajan »

faiuwle wrote:It allows you to specify two different orders for the phonemes. "Parsing order" is the order that the parser will consider them in when trying to decode your orthography, which basically means that you should put your digraphs first if they could also be parsed as two separate phonemes. "Alphabetic order" will affect the order that the words are displayed in on the word tab. Use the up/down arrows to change the orders when you have the correct radio button ticked.
Ah, right! Thanks!
faiuwle wrote:I still don't know why this doesn't work for some people. You can, however, use any text (as long as it doesn't have spaces in it) for the names of your phonemes, so you're not just restricted to XSAMPA. I don't know if that actually helps at all, though. :| (Sorry)
Oh, well. I'll just have to see what I can do. But thanks anyway!

Torco
Smeric
Posts: 2372
Joined: Thu Aug 30, 2007 10:45 pm
Location: Santiago de Chile

Post by Torco »

I hate to be a bitch about this, but do you think we can expect the ability to export dictionary files to, say, a spreadsheet, or some other sound-change-friendly format? :D :D :D :D :D

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

Yeah - the next version is really just waiting on the phonotacticless parsing, but I'm busy and uninspired (but mostly busy). I think I might just upload the latest version anyway, and if you disable phonotactics you'll just have to manually set the phonologies for all your words. :P I'll do that tomorrow, and then hopefully figure out what I want to do with the parser over the weekend maybe.

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

First, I want to thank you for this great tool.
Second, I want to report a bug: if one deletes a phoneme but hasn't already deleted that phoneme from the phonotactics section, the program will crash when trying to remove it.
Third, I want to know how to import and export to text (if that is available in 0.2), as I can't figure it out.

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

CaesarVincens wrote:First, I want to thank you for this great tool.
Second, I want to report a bug: if one deletes a phoneme but hasn't already deleted that phoneme from the phonotactics section, the program will crash when trying to remove it.
Thanks - I'll fix that before I upload tomorrow.
Third, I want to know how to import and export to text (if that is available in 0.2), as I can't figure it out.
You can't yet, but you will be able to in the 0.2 I'm uploading tomorrow. I'll post more detailed documentation then.

(The number just refers to the savefile - if you want an unambiguous version reference, you also need to note the date it was uploaded, which should be the last-edited date on the executable. This is how it will be referenced in the changelog.)

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

Thanks for the quick reply and the explanations.

Oh, and if you do get to making a sound change portion or separate program, I will certainly have some recommendations.

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

Ok, a new version is up now.
changelog wrote:2/26/10
- Wordlists can now be loaded from or saved to a text or CSV file.
- Partial wordlists can now be loaded from existing dictionary files.
- Language name is now stored in savefile (set it on the last tab)
- It is now possible to opt out of phonotactical restrictions on the phonotactics page.
- Sanity check: Attempting to generate more than 1000 clusters on the phonotactics page now results in an error message (if you need more than about 500, you're probably better off just opting out of phonotactics restrictions anyway).
- The dialog for choosing sequences of phonemes has been optimized for displaying larger numbers of sequences (YMMV based on your screen size).
- Deleting phonemes and suprasegmentals that are referenced in many places will now result in all references being properly removed rather than a crash.
NB: The last line is not quite true - I noticed, after I had finished compiling in Windows and packaging everything up, that I'd forgotten to fix a file. All it means is that for now, after you delete phonemes, you should go re-save the phonotactics before doing anything else (in particular, regenning word phonologies, though loading wordlists will probably also crash it). I'd upload the really really fixed version instead, but since I don't have access to Windows right now, it'd just be in Linux, and I want to avoid too much version confusion, if possible.

Anyway, new features:
1. You can turn off phonotactics. Note that this will cause all of your words to be phonology-less (for now). Also note that after turning off the phonotactics on the phonotactics tab, you will have to actually click "save phonotactics" for it to take effect.

2. You can save to text files. CD will write one word entry per line of the file (though if you have newlines in your descriptions, those will be preserved). The formula entered in the formula box determines what is written - /w means the spelling of the word, /t means the type, /s means the subtype, /p means the phonology (the text that's displayed on the word tab) and /d means the description. Any other character will be written literally. Using the formula "/w","/p","/t","/s","/d" (yes, with all the quotes and commas) will produce a CSV file that can be read by Excel (though you may have to change the extension to .csv).

3. You can read wordlists from text files. The formula you enter for that is essentially the same as the above (ETA: except for /p - phonology can't be directly read from .txt files), but characters other than the forward-slash-escaped characters and parentheses will be interpreted as a regular expression. Non-punctuation characters will still be interpreted literally, though, so that's not really relevant unless you actually want to make use of regexp functionality. Note, however, that quantifiers don't seem to work with the forward-slash-terms (e.g. "/s?" will match the subtype followed by a question mark, not an optional subtype). There might be other quirks to it too, especially in Windows. CD will spawn a pop-up telling you how many lines of the file were actually matched, and if it turns out to be 0 or something, you can cancel.

4. You can import wordlists from other .cdic files. The phonologies of these words will be ignored, and they will be reinterpreted based on their spelling (this is because there's no way to guarantee that the phonologies correspond, at the moment). You should be able to select consecutive words simultaneously by pressing shift and either scrolling with arrow keys, or clicking, and non-consecutive words with ctrl-click.

5. You can name your language on the last tab.
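
The export formula from point 2 can be sketched roughly as follows. This Python snippet only imitates the described behaviour (slash terms replaced by fields, everything else copied literally); the dictionary keys, the sample entry and the function name are assumptions for illustration, not CD's actual code or savefile format.

```python
import re

# Map each formula term to a (hypothetical) field name of a word entry.
FIELDS = {
    "w": "spelling",
    "t": "type",
    "s": "subtype",
    "p": "phonology",
    "d": "description",
}


def expand_formula(formula, entry):
    """Replace /w, /t, /s, /p and /d with the entry's fields.

    Any other character in the formula is copied through literally.
    """
    def substitute(match):
        return entry[FIELDS[match.group(1)]]

    return re.sub(r"/([wtspd])", substitute, formula)


# Made-up entry; the type and subtype values are placeholders.
entry = {
    "spelling": "nostrø",
    "type": "noun",
    "subtype": "animate",
    "phonology": "nɔs.tɾœ",
    "description": "free one",
}

csv_line = expand_formula('"/w","/p","/t","/s","/d"', entry)
plain_line = expand_formula("*** /w (/p) : [/d]", entry)
```

The first formula gives a quoted, comma-separated line; the second gives a free-form listing, since nothing about the formula is specific to CSV.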

Let me know how all of the new GUI stuff looks.
Last edited by faiuwle on Fri Feb 26, 2010 3:34 pm, edited 1 time in total.

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

Hi faiuwle,
I'm having trouble importing from text or CSV.
I tried using the exact same formula as I used for export (which works perfectly :D), but it reads 0 out of x lines every time. Even if I try to import from the file I just exported. :?


Edit:
Also, it would be nice if one diacritic could be used for two suprasegmentals.
So that a circumflex might mean both long and stressed for example.
Last edited by CaesarVincens on Fri Feb 26, 2010 3:27 pm, edited 1 time in total.

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

Well, if you have newlines in the descriptions, then they're not going to be readable back in because the entries are no longer on only one line. This is why I use XML for the actual savefiles. What formula are you using? Does the txt file look right when you open it in an editor?

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

No new lines.

I made an exported wordlist with the formula you gave as an example: "/w","/p","/t","/s","/d"

Then when I tried to import the same file with the same formula the program read 0 out of 31 lines.

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

Oh, that's because I forgot to mention - "/p" can't actually be read in, because it's just a string of text and not actually a full phonological description.

It should work just fine if you substitute the "/p" with .* .
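
For the curious, the import side can be approximated like this in Python. It assumes that each slash term is turned into a capture group (with /w as \w+, as mentioned later in the thread, and the other terms as lazy wildcards, which is a guess), and that the rest of the formula is used as a regular expression. In this sketch an unsupported /p is simply left as literal text, which reproduces the "0 lines matched" symptom, but whether CD handles it the same way is not confirmed.

```python
import re

# Assumed translations of the readable formula terms into regex groups.
TERM_PATTERNS = {
    "w": r"(?P<w>\w+)",
    "t": r"(?P<t>.*?)",
    "s": r"(?P<s>.*?)",
    "d": r"(?P<d>.*?)",
}


def formula_to_regex(formula):
    """Turn an import formula into a compiled regex.

    /w, /t, /s and /d become named groups; /p is not readable and is
    left untouched, so it only matches the two literal characters "/p".
    """
    pattern = re.sub(
        r"/([wtsd])", lambda m: TERM_PATTERNS[m.group(1)], formula
    )
    return re.compile(pattern)


# A line as it might have been written by the CSV export formula.
line = '"nostrø","nɔs.tɾœ","noun","animate","free one"'

# Same formula as for export: the literal "/p" never matches the phonology.
failed = formula_to_regex('"/w","/p","/t","/s","/d"').fullmatch(line)

# With .* in place of /p the phonology column is skipped over.
matched = formula_to_regex('"/w",".*","/t","/s","/d"').fullmatch(line)
```

Counting how many lines of a file give a non-None match would correspond to the number shown in the pop-up.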

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

Works great. Thanks.


EDIT:

However, it doesn't seem to like importing words with a colon in them...

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

Also, if one tries to save after deleting a phoneme from the phonemes without removing it from phonotactics, it causes a crash.

Trajan
Niš
Posts: 13
Joined: Tue Nov 18, 2008 3:39 pm
Location: Freising (Germany) and Bath (UK)

Post by Trajan »

Is there going to be some complete documentation at some point?

It's just rather tiring to search this thread for the appropriate bit of information (though I do understand that it's probably even more tiring for you to write a complete documentation).

Torco
Smeric
Posts: 2372
Joined: Thu Aug 30, 2007 10:45 pm
Location: Santiago de Chile

Post by Torco »

I've played around with this version and it looks awesome. Lotsa kudos to you, faiuwle, and a big thank you for such a useful tool.

YAAY now I can export my 300-word lexicon... damn, I suck at lexiconmaking... how people get all the way up to 5000-word lexicons is beyond me =(

Torco
Smeric
Posts: 2372
Joined: Thu Aug 30, 2007 10:45 pm
Location: Santiago de Chile

Post by Torco »

Trajan wrote:Is there going to be some complete documentation at some point?

It's just rather tiring to search this thread for the appropriate bit of information (though I do understand that it's probably even more tiring for you to write a complete documentation).
I was thinking about putting together a manual for it... it's the least I can do, really.

BTW, it's not necessary to use the "quotes" on formula terms: this

*** /w (/p) : [/d]

produces entries such as this

*** nostrø (nɔs.tɾœ) : [free one, unbound one.

also yeoman.]

User avatar
faiuwle
Avisaru
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

I didn't get to recompile anything this weekend - we had emergency lease-breaking to do. You know how it is.
CaesarVincens wrote:However, it doesn't seem to like importing words with a colon in them...
Yeah, I've got it set to be interpreted as \w+ in the regexp. Might work better as .+ if you're going to be using weird punctuation in your words.
Also, if one tries to save after deleting a phoneme from the phonemes without removing it from phonotactics, it causes a crash.
But not if you save the phonotactics first, right?
Is there going to be some complete documentation at some point?
That's the plan.
Torco wrote:I was thinking about putting together a manual for it... it's the least I can do, really.
That would be awesome, actually. If you decide to do that, let me know if you need to know anything, and I'll link or host it from my wiki.
BTW, it's not necessary to use the "quotes" on formula terms: this

*** /w (/p) : [/d]

produces entries such as this

*** nostrø (nɔs.tɾœ) : [free one, unbound one.

also yeoman.]
Yes, but without the quotes it's not strict CSV and won't be interpreted correctly by Excel.
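
The practical effect of the quotes can be checked with Python's standard `csv` module, used here as a stand-in for a spreadsheet program. The sample fields are taken from the entry above, with a made-up type and an empty subtype; how Excel itself behaves may differ in details.

```python
import csv
import io

# Entry whose description contains a comma and newlines.
fields = [
    "nostrø",
    "nɔs.tɾœ",
    "noun",  # placeholder type
    "",      # empty subtype
    "free one, unbound one.\n\nalso yeoman.",
]

# Quoted, as produced by the formula "/w","/p","/t","/s","/d".
quoted = ",".join('"' + f.replace('"', '""') + '"' for f in fields)

# Unquoted, as produced by the formula /w,/p,/t,/s,/d.
unquoted = ",".join(fields)

rows_quoted = list(csv.reader(io.StringIO(quoted, newline="")))
rows_unquoted = list(csv.reader(io.StringIO(unquoted, newline="")))
```

The quoted text comes back as a single row of five fields with the description intact, while the unquoted text is split at the comma and at the newlines, so the description ends up spread over extra columns and rows.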

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

faiuwle wrote:I didn't get to recompile anything this weekend - we had emergency lease-breaking to do. You know how it is.
CaesarVincens wrote:However, it doesn't seem to like importing words with a colon in them...
Yeah, I've got it set to be interpreted as \w+ in the regexp. Might work better as .+ if you're going to be using weird punctuation in your words.
Are you saying for the .exe code or for the import expression that I put in?
Also, if one tries to save after deleting a phoneme from the phonemes without removing it from phonotactics, it causes a crash.
But not if you save the phonotactics first, right?
I'm not sure, I don't think I tried it that way.

Edit:

Partial success. I used "/w.+" and it matched the line up to the colon but didn't include the colon or anything after it in the word part. It did get the other sections.

I'm using colons to represent long vowels as that is more aesthetically pleasing to me than double vowels and more practical than a four way distinction with plain, acute, diaeresis, and circumflex (short, short stressed, long, and long stressed respectively).
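
The partial match described above is what one would expect if /w is matched as \w+, since a colon is not a word character. Here is a small check using Python's `re` module; CD's own regex engine may differ in details, and the sample word and the two-column line are invented for the example.

```python
import re

# Hypothetical two-column line with a colon marking a long vowel.
line = '"ka:ta","noun"'

# /w as \w+ : the colon stops the word, so the whole line fails to match.
strict = re.fullmatch(r'"(\w+)","(.*?)"', line)

# /w.+ : the line matches, but the captured word stops before the colon.
partial = re.fullmatch(r'"(\w+).+","(.*?)"', line)

# A possible alternative: accept anything except a double quote as the word.
relaxed = re.fullmatch(r'"([^"]+)","(.*?)"', line)
```

Here `strict` is None, `partial` captures only "ka", and `relaxed` captures "ka:ta", which is the kind of change being considered for the program's handling of /w.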
Last edited by CaesarVincens on Sun Feb 28, 2010 8:58 pm, edited 1 time in total.

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

CaesarVincens wrote:Are you saying for the .exe code or for the import expression that I put in?
The .exe code - I'm just thinking aloud, here.

CaesarVincens
Lebom
Posts: 204
Joined: Thu Feb 25, 2010 7:26 pm

Post by CaesarVincens »

However, do see my edit to my last post.

Edit:
Also, would it be possible to have a diacritic be used for more than one thing? As it is right now, any one diacritic can be used only one way. I'm not sure how difficult it would be to prevent crash-causing conflicts, but I don't see why a single diacritic couldn't be used at least four ways (as there are four phonetic options).

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

What do you mean? You mean you want to use a circumflex to mark simultaneous high tone and stress, or something? That would be kind of a pain. Parsing problems shouldn't cause crashes though - it just means your word won't be parsed correctly, and might wind up as /???/ at which point you'll just have to input the phonology by hand.
