ConlangDictionary 0.3 - now phonology parsing is faster

faiuwle · Post by **faiuwle** » Thu Mar 19, 2009 2:18 pm

Sorry for leaving this for so long - I rewrote the backend to use SQLite, and then got stuck and took an extended break from working on it. I'm back, though! Since I've been working on this by myself for a long time with only me to test it, it's probably full of bugs. That's why it's version 0. If you find one, let me know about it here and I'll fix it up. (For the record, this is just over 7000 lines of code, now.)

Ahem... ConlangDictionary is a program for keeping track of dictionaries and (eventually) grammar offline. The current version is 0.3 (March 19, 2014). It is released under the GPL.
Changelog

Latest updates:
- Fixed bug with word tab for imported XML databases.
- Made phonology parsing much faster on the phonotactics tab.

Download
Windows
Linux
Mac (Thanks, Skomakar'n!) (Version 0.2, requires installation of Qt 4.8 to run)
Source Code

Updating the Mac version depends on someone else downloading the sources to their Mac and compiling them with Qt, since I don't have a Mac.

If this is your first time downloading 0.3, you also need:
Qt Libraries
for Windows (5.1 MB): Unzip the dlls into the same directory where you put the executable.
for Linux (4.9 MB): If you have root privileges, unzip the files into /usr/lib and delete executablename.sh. If you don't, unzip them into the directory where the executable is, rename executablename.sh to ConlangDictionary.sh and run the shell script instead of the executable.

Documentation

Screenshots

In order to import your dictionary from the last version, create a new dictionary and then go to File -> Import -> Load from XML.

Major Changes From the Previous Version
1. You no longer need to save manually. In general, if there is a button that says "Submit", you need to click it in order to save your changes on that panel/dialog/etc., but if it says "Done" it only exists to hide the current dialog. Additionally, in order to start working, you'll have to create a new dictionary first, even if you're just importing.

2. The phonology parsing was rather slow when 0.3 first came out, apparently because of the SQL queries. The latest update (see the changelog above) makes it much faster on the phonotactics tab.

3. There's now a menu system!

4. As I mentioned, the savefile is now an SQLite3 database. The schema is here in case you want to browse it using SQLite3.

5. There are features! You can create natural classes of phonemes or words and define your phonotactics using them! This is how they work: First, you click on one of the "Manage features" buttons, which will bring you to a dialog like this:

You can define univalent features (which have no subfeatures), binary features (which have + and - as subfeatures) and feature groups, which can have whatever you want as subfeatures. (If you add subfeatures to a different type of feature, it just becomes a feature group.) So you can have +/-consonantal, or you can have Type: consonant. The subfeature display format combo box on the upper right determines how subfeatures are displayed for that feature. You can also add a parent feature (which will be required for that feature to be allowed on a phoneme or word). Since the feature system replaces the previous type system for words, your word types will be imported as feature groups, kind of like this:

Next, go to "Manage natural classes" and get a dialog like this:

You can create classes and add features to them. For phonemes, those classes will then show up on the Phonotactics page. For words, those classes will be listed next to them in the list. For both, what classes a phoneme/word falls into will be listed in bold above the feature list.
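To make the feature and natural class idea a bit more concrete, here is a minimal Python sketch. It is only an illustration of the concept, not how the program stores things (the real thing is C++/Qt on top of SQLite), and every name in it (`Phoneme`, `NaturalClass`, `matches`, `members`) is made up for the example. It assumes a univalent feature is stored with no subfeature value, a binary feature with "+" or "-", and a feature group with any string.

```python
from dataclasses import dataclass, field

# Hypothetical model: each feature name maps to a subfeature value.
# Univalent features have no subfeature, so they are stored with the value None.
# Binary features take "+" or "-"; feature groups take any string.

@dataclass
class Phoneme:
    symbol: str
    features: dict = field(default_factory=dict)

@dataclass
class NaturalClass:
    name: str
    required: dict  # feature name -> required subfeature value

    def matches(self, phoneme: Phoneme) -> bool:
        # A phoneme belongs to the class if it carries every required
        # feature with exactly the required subfeature value.
        return all(
            name in phoneme.features and phoneme.features[name] == value
            for name, value in self.required.items()
        )

phonemes = [
    Phoneme("p", {"consonantal": "+", "voice": "-", "Place": "labial"}),
    Phoneme("b", {"consonantal": "+", "voice": "+", "Place": "labial"}),
    Phoneme("a", {"consonantal": "-", "voice": "+", "syllabic": None}),
]

voiced_consonants = NaturalClass("voiced consonant", {"consonantal": "+", "voice": "+"})
vowels = NaturalClass("vowel", {"syllabic": None})

def members(natural_class, inventory):
    # List the symbols of all phonemes that fall into the class.
    return [p.symbol for p in inventory if natural_class.matches(p)]
```

With this toy inventory, `members(voiced_consonants, phonemes)` picks out only "b", and `members(vowels, phonemes)` picks out only "a". Parent features are left out of the sketch.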

6. There's also a new edit phonology dialog, which looks like this:

The spaces are separating Onset from Peak from Coda. Let me know how it works!

finlay · Post by **finlay** » Thu Mar 19, 2009 2:21 pm

Cool, would it work on macs?

Suggestions would include the obvious, like allophony/sound changes, checking phonotactics to make sure invalid combinations of phonemes aren't used. What's the difference between ‹u› and ‹ú› in the example above; is one stressed or long while the other isn't? Can the program implement things like that?

faiuwle · Post by **faiuwle** » Thu Mar 19, 2009 2:24 pm

It is written using Qt, which is a cross-platform API, and they do in fact have a Mac version. So in theory, if someone using a Mac were to download my source code, and then download the Mac version of Qt and install it, and then compile my source code using said Mac version of Qt, the result would be a Mac executable. I do actually plan to make a Windoze executable by exactly that process, but I don't have a Mac to compile the source code on, so someone else would have to do that.

Qwynegold · Post by **Qwynegold** » Fri Mar 20, 2009 12:31 pm

There's gonna be a Windows version which can be downloaded without having to install anything else? In that case, do you know about SIL Lexique Pro? Could you make it possible to convert that program's files into your program? Phonotactics check like finlay wrote would be great! And the possibility to apply sound changes to generate words into other languages would be awesome! Oh, and being able to view several conlangs at once would also be great!

Sorry for giving tons of suggestions which would take a lot of work to realize.

vec · Post by **vec** » Fri Mar 20, 2009 12:41 pm

Would you be able to add further information on the word? Such as etymology notes, case alignment rules and so forth. In that case, it would be extremely helpful for me.

faiuwle · Post by **faiuwle** » Fri Mar 20, 2009 1:05 pm

I somehow managed to read only half of Finlay's post. Oops. ::oops::

finlay wrote:Suggestions would include the obvious, like allophony/sound changes, checking phonotactics to make sure invalid combinations of phonemes aren't used.

I like the idea, but that's actually kind of far down on my list right now, as I want to focus on morphology so that I don't have to keep extensive notes on how everything inflects. No doubt this will require a more complex representation of the phonology, which will be added to as needed. If there's enough interest, though, I'll see if I can put something together.

What's the difference between ‹u› and ‹ú› in the example above; is one stressed or long while the other isn't? Can the program implement things like that?

Yes, it's stress. I definitely want to involve stress/tone in the program - that may actually be the next major thing I add to it.

There's gonna be a Windows version which can be downloaded without having to install anything else?

There will be some hefty dlls in the zipfile (previous programs I've done this way resulted in ~5MB of dlls, zipped) but other than extracting everything into the same directory, no, you won't have to install anything else.

In that case, do you know about SIL Lexique Pro?

Not really, other than what a quick google shows me. I'll check it out.

Could you make it possible to convert that program's files into your program?

Unless the savefile format is something very simple (like human-readable CSV or something) I don't think I could convert that, but if you can coerce it to output your wordlist in some semi-organized way (e.g. a text file with lines that go word, (POS), "Definition" or something along those lines) then I can definitely import something like that.
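For illustration, a semi-organized text file of that kind could be read with something like the Python sketch below. The exact line shape (word, then the part of speech in parentheses, then the definition in double quotes, separated by commas) is an assumption based on the example just given, and `parse_wordlist` is a made-up name, not part of the program.

```python
import re

# Assumed line shape: word, (POS), "Definition"
LINE = re.compile(
    r'^\s*(?P<word>[^,]+?)\s*,\s*\((?P<pos>[^)]*)\)\s*,\s*"(?P<definition>.*)"\s*$'
)

def parse_wordlist(text):
    # Return a list of (word, part_of_speech, definition) tuples.
    entries = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # blank or malformed lines are silently skipped
            entries.append((match["word"], match["pos"], match["definition"]))
    return entries
```

A real importer would probably want to report the skipped lines rather than drop them quietly.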

And the possibility to apply sound changes to generate words into other languages would be awesome! Oh, and being able to view several conlangs at once would also be great!

This is also part of the Grand Master plan, although IIRC there is already a sound change applier program, isn't there? I definitely want to make it so that you can link entries in different dictionaries together to trace etymologies, but that is way far in the future right now.

Would you be able to add further information on the word? Such as etymology notes, case alignment rules and so forth.

Etymology is still to come; for case alignment rules, I am going to add a "Subtype" field in addition to "Type", where you can specify things like that, e.g. "this is a verb (Type) that takes a certain configuration of arguments (Subtype)". Type and Subtype will later be used to make rules about how compounds and inflections and whatnot are formed too, so you can add Subtypes for different noun classes and create different rules for each declension. Types and Subtypes are essentially just strings that you enter and that the program remembers, so you're not confined to the traditional "noun/verb/adjective" set.

Until I get the cross-dictionary-linking madness done, you can always put additional information in the definition space.

Wycoval · Post by **Wycoval** » Fri Mar 20, 2009 1:11 pm

Suprasegmental contrasts: timing, tone, stress, length, etc.

faiuwle · Post by **faiuwle** » Fri Mar 20, 2009 1:14 pm

What would you want to specify for timing, specifically?

finlay · Post by **finlay** » Fri Mar 20, 2009 1:19 pm

faiuwle wrote:I somehow managed to read only half of Finlay's post. Oops. ::oops::

That's my fault really; I edited it without realising that you'd replied.

faiuwle · Post by **faiuwle** » Fri Mar 20, 2009 1:26 pm

Ahh - I forgot where the "this post was edited at XX:XX:XX" line showed up on this forum. I've clearly been away too long.

Qwynegold · Post by **Qwynegold** » Fri Mar 20, 2009 1:37 pm

faiuwle wrote:
And the possibility to apply sound changes to generate words into other languages would be awesome! Oh, and being able to view several conlangs at once would also be great!
This is also part of the Grand Master plan, although IIRC there is already a sound change applier program, isn't there? I definitely want to make it so that you can link entries in different dictionaries together to trace etymologies, but that is way far in the future right now.

I know there are some such programs, but this way you wouldn't have to copy a word from your program, paste it to the sound change program, make it process the word, then copy the output and paste it back to your program.

faiuwle wrote:Unless the savefile format is something very simple (like human-readable CSV or something) I don't think I could convert that, but if you can coerce it to output your wordlist in some semi-organized way (e.g. a text file with lines that go word, (POS), "Definition" or something along those lines) then I can definitely import something like that.

Hmm, I'll check what the files look like.....

Wycoval · Post by **Wycoval** » Fri Mar 20, 2009 1:59 pm

faiuwle wrote:What would you want to specify for timing, specifically?

http://en.wikipedia.org/wiki/Isochrony

Surely there are conlangs that have contrastive syllable or mora timing - having both long vowels and double vowels but with differing ablaut patterns or tone sandhi, for example. Maybe not.

Anyway, yeah. Suprasegmentals.

faiuwle · Post by **faiuwle** » Fri Mar 20, 2009 2:03 pm

I'm not quite sure what the difference between long vowels and double vowels is in this case, but I'd assume that this could just be expressed as a suprasegmental length feature in the one case and as /aa/ or something in the second case, right?

But yeah, length/tone/stress will all get done.

Cedh · Post by **Cedh** » Fri Mar 20, 2009 4:02 pm

Qwynegold wrote:
faiuwle wrote:
And the possibility to apply sound changes to generate words into other languages would be awesome! Oh, and being able to view several conlangs at once would also be great!
This is also part of the Grand Master plan, although IIRC there is already a sound change applier program, isn't there? I definitely want to make it so that you can link entries in different dictionaries together to trace etymologies, but that is way far in the future right now.
I know there are some such programs, but this way you wouldn't have to copy a word from your program, paste it to the sound change program, make it process the word, then copy the output and paste it back to your program.

bricka once made a vocabulary manager (called "qclover") which implemented his own Sound Change Applier (of which he just released an improved version). Unfortunately, the qclover page is not online anymore (and the program didn't work on my computer either, although his SCA does work). Maybe you could contact him to ask whether you could build on his code, or something like that. (But since he's not on the board anymore, try e-mail rather than PM.)

faiuwle wrote:
In that case, do you know about SIL Lexique Pro?
Not really, other than what a quick google shows me. I'll check it out.

Also, have a look at SIL Toolbox, which operates similarly to Lexique Pro, but is much more powerful. It's a bit clumsy to handle, but I think it is capable of most of what you're trying to implement (it can even do semi-automated glosses if you've entered the relevant morphology info).

Ambessalion · Post by **Ambessalion** » Fri Mar 20, 2009 4:25 pm

is this program customizable for any language? or you built it specifically for your own?

faiuwle · Post by **faiuwle** » Fri Mar 20, 2009 7:01 pm

Yes, Ambessalion, the idea is that it should be able to deal with any conlang.

Thanks for those links, cedh - I'll check out that SIL program, now that I'm running Windows.

faiuwle · Post by **faiuwle** » Sat Mar 21, 2009 3:45 pm

Alright, I uploaded the program. It doesn't have suprasegmentals yet, but hopefully it is still useful. This could probably be considered version 0.1. Let me know if there are problems.

kelsavasi · Post by **kelsavasi** » Sun Mar 22, 2009 9:02 am

faiuwle wrote:I thought I would post here and ask if anyone else would be interested in using this program, and if so, what formats of word-lists they would like to be able to import into it, etc.

You probably should create at least a CSV export (one file for phonology, another for the word list) because it's easy to transform into another format. Personally, I'd also like an HTML export (XHTML compatible: e.g. no writing "<HEAD>" instead of "<head>") into a definition list - maybe like this:

Code: Select all

<dl>
   <dt>fácia /fatʃia/</dt>
   <dd>(noun) snow</dd>
   <dd>(verb, intransitive) it snows</dd>
</dl>

Also, could you make it so that the words are sorted in custom order instead of the order of the latin alphabet? Since you already have the option to change the order of phonemes, it probably would be easiest to sort them in the order the phonemes are given.

While seconding phonotactics, it should be noted that there also are orthographic constraints, e.g. how in English /k/ can be written <c> or <k> and /s/ can be written <s> or <c>, but /ke/ cannot be written <ce> which usually means /se/, and /s/ can be written <c> only in this context.
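The custom sort order asked for above could work roughly like the following Python sketch. The alphabet here is invented for the example, and it assumes every letter is a single character; digraphs would have to be split into letters first. `sort_key` and `custom_sorted` are hypothetical names.

```python
# Hypothetical custom alphabet: accented vowels sort right after their plain vowels.
ALPHABET = ["a", "á", "c", "e", "f", "i", "o", "s", "u", "ú"]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}

def sort_key(word):
    # Translate each letter into its position in the custom alphabet.
    # Letters missing from the alphabet sort after all known letters.
    return [RANK.get(letter, len(ALPHABET)) for letter in word]

def custom_sorted(words):
    return sorted(words, key=sort_key)
```

Sorting by a list of ranks instead of by the raw strings is what lets the user's own letter order override the Latin alphabet's.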

Qwynegold · Post by **Qwynegold** » Sun Mar 22, 2009 11:14 am

Qwynegold wrote:
faiuwle wrote:Unless the savefile format is something very simple (like human-readable CSV or something) I don't think I could convert that, but if you can coerce it to output your wordlist in some semi-organized way (e.g. a text file with lines that go word, (POS), "Definition" or something along those lines) then I can definitely import something like that.
Hmm, I'll check what the files look like.....

Lexique Pro makes several kinds of files, but I think this is the only one needed for this purpose. I don't know what file format it is (even right-clicking and choosing "Properties" won't tell what it is, except that it's a "database file"). The icon has a green and an orange gearwheel in it.

Lexique Pro wrote:
\_sh v3.0 400 MDF
\_DateStampHasFourDigitYear

\lx 'da
\ps Pron
\ge what
\gn M; F
\ag Y
\sd Round; Long; Warm; Cold; Shiny; Thin; Book; Equipment; Vital; Tree; Plant; People; Bird; Fish; Reptile; Bug; Carnivore; Herbivore

\xv
\xe

\dt 12-mar-2009

\lx 'dabho
\ps Adv
\ge how

\xv
\xe

\dt 15-feb-2009

\lx 'daga
\ps Pron
\ge what
\de what (round object)
\gn F
\ag Y
\sd Round

\xv
\xe

\dt 12-mar-2009

\lx 'da-gba
\ps Pron
\ge where
\gn M; F
\ag O; Y
\sd Round; Long; Warm; Cold; Shiny; Thin; Book; Equipment; Vital; Tree; Plant; People; Bird; Fish; Reptile; Bug; Carnivore; Herbivore
\ev Vital by default.

\xv
\xe

\dt 15-feb-2009

This is a sample from one of my conlangs. I only took the first little part of it because it was really long. There are a lot of different "Paradigm markers"; the ones in this sample are:
\lx = the word in the conlang
\ps = part of speech
\ge = English gloss
\de = English definition. The difference between this and \ge is that \ge is what's shown in the list of all words, but when you click it \de is displayed. For example, for one of the words I have "\ge what" and "\de what (round object)", because otherwise the user would have to type in all that, including the things in the parentheses, when searching for a word, for this particular word to show up.
\gn and \ag = These are things that I've defined myself (gender and age), so these are not paradigm markers found in the program by default.
\sd = semantic domain. This is used for creating lists of words that belong together, for example words for different plants, words for different animals, words for different types of clothing, words for kin-terms, etc. But in this case I used it for defining noun classes.
\xv = example sentence
\xe = English translation of the example sentence
\dt = date when the word was added to the lexicon
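Judging only from the sample above, each entry starts at an \lx line, and every other line is a backslash marker followed by an optional value. Under that assumption, reading such a file could be sketched in Python like this. The function name `parse_sfm` and the decision to skip the `\_` header lines and empty markers are choices made for the example, not anything Lexique Pro prescribes.

```python
def parse_sfm(text):
    # Return a list of dicts, one per entry, keyed by marker name (without the backslash).
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # blank line or anything that is not a marker line
        marker, _, value = line[1:].partition(" ")
        if marker.startswith("_"):
            continue  # file header lines such as \_sh
        if marker == "lx":
            # A new \lx line starts a new entry.
            current = {"lx": value.strip()}
            entries.append(current)
        elif current is not None and value:
            # Markers with no value (e.g. an empty \xv) are dropped.
            # A repeated marker would overwrite the earlier value in this sketch.
            current[marker] = value.strip()
    return entries

sample = "\n".join([
    r"\_sh v3.0 400 MDF",
    r"",
    r"\lx 'daga",
    r"\ps Pron",
    r"\ge what",
    r"\de what (round object)",
    r"\sd Round",
    r"\xv",
    r"\dt 12-mar-2009",
])
```

Calling `parse_sfm(sample)` gives a single entry whose "lx" is 'daga and whose "de" is "what (round object)". Splitting multi-valued fields like \sd on "; " would be a natural next step.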

Qwynegold · Post by **Qwynegold** » Sun Mar 22, 2009 11:18 am

cedh audmanh wrote:
Qwynegold wrote:
faiuwle wrote:
And the possibility to apply sound changes to generate words into other languages would be awesome! Oh, and being able to view several conlangs at once would also be great!
This is also part of the Grand Master plan, although IIRC there is already a sound change applier program, isn't there? I definitely want to make it so that you can link entries in different dictionaries together to trace etymologies, but that is way far in the future right now.
I know there are some such programs, but this way you wouldn't have to copy a word from your program, paste it to the sound change program, make it process the word, then copy the output and paste it back to your program.
bricka once made a vocabulary manager (called "qclover") which implemented his own Sound Change Applier (of which he just released an improved version). Unfortunately, the qclover page is not online anymore (and the program didn't work on my computer either, although his SCA does work). Maybe you could contact him to ask whether you could build on his code, or something like that. (But since he's not on the board anymore, try e-mail rather than PM.)

Ah, I don't know anything about programming.

cedh audmanh wrote:
faiuwle wrote:
In that case, do you know about SIL Lexique Pro?
Not really, other than what a quick google shows me. I'll check it out.
Also, have a look at SIL Toolbox, which operates similarly to Lexique Pro, but is much more powerful. It's a bit clumsy to handle, but I think it is capable of most of what you're trying to implement (it can even do semi-automated glosses if you've entered the relevant morphology info).

Huh, I've never understood what those Toolbox or Shoebox programs are, the description on SIL's webpage is rather diffuse. So Toolbox is essentially like Lexique Pro except it has more functions?

Cedh · Post by **Cedh** » Sun Mar 22, 2009 6:05 pm

Qwynegold wrote:Ah, I don't know anything about programming.

My suggestion regarding the code was primarily directed at faiuwle. Sorry for this confusion.

Qwynegold wrote:Huh, I've never understood what those Toolbox or Shoebox programs are, the description on SIL's webpage is rather diffuse. So Toolbox is essentially like Lexique Pro except it has more functions?

Toolbox is software intended to help fieldworkers in documenting indigenous languages. It can create and manage customized vocabulary databases, store text samples, and assist in creating interlinear glosses for these texts through a specialized morphology analysis tool. There are some additional functions which I didn't try when I was testing the program, such as creating an automatic rough grammar sketch (not that such a sketch would be terribly useful: the example file looks a bit strange, but hey, AI just isn't quite there yet).

Lexique Pro is basically the vocabulary management portion of Toolbox with a different user interface, and without some of the more sophisticated functions and customization options. The file format (UTF-8 plaintext file with .db suffix, Windows-style line feeds, no BOM) and database settings are more or less identical though.

vampireshark · Post by **vampireshark** » Sun Mar 22, 2009 6:28 pm

faiuwle: When I insert IPA and accented characters into the program, save it, close it, and bring it back up, some of the characters are replaced by question marks. Bummer.

Qwynegold · Post by **Qwynegold** » Mon Mar 23, 2009 12:52 pm

cedh audmanh wrote:
Qwynegold wrote:Ah, I don't know anything about programming.
My suggestion regarding the code was primarily directed at faiuwle. Sorry for this confusion.

Oh, OK.

cedh audmanh wrote:
Qwynegold wrote:Huh, I've never understood what those Toolbox or Shoebox programs are, the description on SIL's webpage is rather diffuse. So Toolbox is essentially like Lexique Pro except it has more functions?
Toolbox is software intended to help fieldworkers in documenting indigenous languages. It can create and manage customized vocabulary databases, store text samples, and assist in creating interlinear glosses for these texts through a specialized morphology analysis tool. There are some additional functions which I didn't try when I was testing the program, such as creating an automatic rough grammar sketch (not that such a sketch would be terribly useful: the example file looks a bit strange, but hey, AI just isn't quite there yet).

Lexique Pro is basically the vocabulary management portion of Toolbox with a different user interface, and without some of the more sophisticated functions and customization options. The file format (UTF-8 plaintext file with .db suffix, Windows-style line feeds, no BOM) and database settings are more or less identical though.

Oh, so I don't need to download that program then, because all I need is a dictionary program?

faiuwle · Post by **faiuwle** » Mon Mar 23, 2009 2:46 pm

kelsavasi wrote:You probably should create at least a CSV export (one file for phonology, another for the word list) because it's easy to transform into another format. Personally, I'd also like an HTML export (XHTML compatible: e.g. no writing "<HEAD>" instead of "<head>") into a definition list - maybe like this:
Code: Select all
<dl>
   <dt>fácia /fatʃia/</dt>
   <dd>(noun) snow</dd>
   <dd>(verb, intransitive) it snows</dd>
</dl>

That's probably a good idea. I haven't done a lot with (X)HTML, but Qt actually has an XML module, so it shouldn't be that hard to figure out. I do have an open-source CSV library that I've used for other things, so that should be easy enough.
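As a rough idea of what the definition-list export amounts to, here is a small Python sketch that produces the markup quoted above. It is not the program's code (that would be C++/Qt); the function name `to_definition_list` and the shape of the `entries` argument are assumptions made for the example. The standard `html.escape` function takes care of `&`, `<`, `>` and quotes.

```python
from html import escape

def to_definition_list(entries):
    # entries: list of (headword, pronunciation, [(part_of_speech, gloss), ...])
    lines = ["<dl>"]
    for headword, pronunciation, senses in entries:
        lines.append(f"   <dt>{escape(headword)} /{escape(pronunciation)}/</dt>")
        for part_of_speech, gloss in senses:
            lines.append(f"   <dd>({escape(part_of_speech)}) {escape(gloss)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)
```

Lowercase tag names and properly closed elements keep the output XHTML compatible, as requested.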

Also, could you make it so that the words are sorted in custom order instead of the order of the latin alphabet? Since you already have the option to change the order of phonemes, it probably would be easiest to sort them in the order the phonemes are given.

Sure, although the ordering of the phonemes actually serves the specific purpose of disambiguation at the moment - that is, if you have /s S h/ written <s sh h> and you want <sh> to be interpreted as /S/ and not /sh/, you move the phonemes around so that /S/ comes before /s/ in the list. I can put in a way for you to specify alphabetical order of the phonemes, though, and then alphabetize by that.
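The disambiguation by phoneme order described above can be pictured as a first-match-wins scan over the word. The Python sketch below is a guess at that behaviour, not the program's actual parser; the phoneme list and the name `parse` are hypothetical.

```python
# Hypothetical phoneme list: (spelling, phoneme), in priority order.
# /S/ is listed before /s/ so that <sh> is tried before <s>.
PHONEMES = [("sh", "S"), ("s", "s"), ("h", "h"), ("a", "a")]

def parse(word, phonemes=PHONEMES):
    # At each position, take the first phoneme in the list whose spelling fits.
    result = []
    position = 0
    while position < len(word):
        for spelling, phoneme in phonemes:
            if word.startswith(spelling, position):
                result.append(phoneme)
                position += len(spelling)
                break
        else:
            raise ValueError(f"no phoneme matches at position {position} in {word!r}")
    return result
```

With the order above, `parse("asha")` reads the word as /a S a/. Moving ("s", "s") to the front of the list makes the same spelling come out as /a s h a/ instead, which is why the list order matters.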

While seconding phonotactics, it should be noted that there also are orthographic constraints, e.g. how in English /k/ can be written <c> or <k> and /s/ can be written <s> or <c>, but /ke/ cannot be written <ce> which usually means /se/, and /s/ can be written <c> only in this context.

I'll see what I can do for that too.

Qwynegold wrote:Lexique Pro output

Thanks - that seems straightforward enough.

vampireshark wrote:faiuwle: When I insert IPA and accented characters into the program, save it, close it, and bring it back up, some of the characters are replaced by question marks. Bummer.

Where are you inserting the IPA, and what IPA characters are you having problems with? I haven't used many in my dictionary, but they all seem to load/save properly. It might have something to do with the default font for displaying with applications - does changing that help? There is actually a bug right now that I just discovered where I forgot to actually load custom-specified phonologies (whoops); that will be fixed next version.

faiuwle · Post by **faiuwle** » Mon Mar 23, 2009 9:13 pm

kelsavasi wrote:You probably should create at least a CSV export (one file for phonology, another for the word list) because it's easy to transform into another format. Personally, I'd also like an HTML export (XHTML compatible: e.g. no writing "<HEAD>" instead of "<head>") into a definition list - maybe like this:
Code: Select all
<dl>
   <dt>fácia /fatʃia/</dt>
   <dd>(noun) snow</dd>
   <dd>(verb, intransitive) it snows</dd>
</dl>

Right, so, I've refreshed my pathetic HTML skills, and one serious issue with this is converting various Unicode characters (á ë ñ î etc. as well as 90% of the IPA) into HTML entities. One thing I am certain of is that I am not going to do it by hand, because I am not a walking database of HTML entity codes. Do you know of a C/C++ library for converting Unicode into HTML? I know there are several online apps written in JavaScript or PHP that do it, but that wouldn't really help me. Maybe I will just recommend that the end-user copy and paste the raw HTML into one of those apps.
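One way around the entity-table problem is to use numeric character references, which need no lookup table: every non-ASCII character becomes `&#N;`, where N is its code point. The Python sketch below shows the rule using only the standard library; it is an illustration in Python rather than the C/C++ library asked about, and `to_ascii_html` is a made-up name.

```python
from html import escape

def to_ascii_html(text):
    # Escape &, < and > first, then replace every non-ASCII character
    # with a decimal numeric character reference, e.g. &#225; for U+00E1.
    return escape(text, quote=False).encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The same rule is easy to port: loop over the code points and print anything above 127 as `&#` plus its decimal value plus `;`. If the exported page declares UTF-8 as its encoding, the characters can also be written out as they are, with no conversion at all.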