Automatic Conlang Inflector - Poll for opinions
Posted: Mon Oct 05, 2015 6:33 pm
It's been a long time since I've posted here, mainly due to real life stuff (moved, got a 9-5, trying to finish my master's degree), but I still want to keep working on [this](viewtopic.php?f=4&t=30786) - a software tool to assist conlanging by keeping track of your phonology, vocab, grammar, etc. I still kind of want to do it just for myself, but I'd like to hear whether people here are still interested, and to get some opinions on the next stage of development. I'm trying to work out the nitty gritty, and I feel the tool would be more useful if I based my decisions on other people's ideas as well as my own.
Basically, this is the part where I want to introduce morphemes separately from words, and rules to combine morphemes together to produce words. What I've come up with so far would work like this:
You have a list of words, which can have arbitrarily complex classifications. They also have a phonological form and a set of morphemes that make them up.
You have a list of morphemes, which can have arbitrarily complex classifications. Like words, they have a "phonological" form, although unlike words, this form is very loose and isn't constrained by your phonological constraints. (The inflector will sort the phonology out when it does its thing.)
You have paradigms (think a set of rules for a conjugation or a declension), each of which contains the class of morpheme or word it can apply to and a set of inflectional rules.
You have inflectional rules, each of which contains: two classes of morphemes or words that it combines (one of them will be redundant if the rule is part of a paradigm, but it is necessary if you want to apply the rule outside of the paradigm - will get to this later); two expressions with regular-expression-like syntax that describe the form of the inputs; one class of word or morpheme that specifies the class of the output; and one regular-expression-like expression that specifies the form of the output. The expressions can share references, so the input expressions can have parenthesized segments and the output can refer to them - e.g. \m1 and \m2 refer to the entire morphemes, \m1.1 refers to the first parenthesized segment of morpheme 1, or similar. Suprasegmentals could be applied to elements of the expressions. For a rule to apply, both the input classes and the input expressions need to match.
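To make the four pieces above concrete, here is a rough sketch of how they might be laid out as data. Every name in it (`Morpheme`, `Word`, `InflectionRule`, `Paradigm`, and the field names) is made up for illustration and is not what the tool actually uses; classes are simplified to plain strings.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Morpheme:
    form: str                         # loose "phonological" form, not checked against the phonology
    classes: frozenset = frozenset()  # classifications, simplified here to a set of labels

@dataclass(frozen=True)
class Word:
    form: str                         # surface form, expected to obey the phonology
    classes: frozenset = frozenset()
    morphemes: tuple = ()             # the morphemes that make the word up

@dataclass(frozen=True)
class InflectionRule:
    input_classes: tuple              # (class of input 1, class of input 2)
    input_exprs: tuple                # (expression for input 1, expression for input 2)
    output_class: str                 # class of the resulting word or morpheme
    output_expr: str                  # may reference \m1, \m2, \m1.1, ...

@dataclass
class Paradigm:
    applies_to: str                   # class of word or morpheme the paradigm applies to
    rules: list = field(default_factory=list)
```

Keeping both input classes on the rule itself, even though one is redundant inside a paradigm, is what would let the same rule be applied outside of any paradigm.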
You could use the rules to generate words and morphemes from other morphemes, or you could simply bring up a word and see the results of all relevant inflectional rules in each paradigm. There'd be some sanity checks, since if you weren't careful classifying things you could wind up with huge numbers of rule applications. Ideally, you use paradigms to define a relatively small set of possible rule applications.
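One possible shape for such a sanity check is to count the candidate applications before generating anything, and refuse to go on past some limit. This is only a sketch under simplifying assumptions: rules are reduced to a pair of class labels, items are `(form, classes)` pairs, and the function names and the default limit of 1000 are invented for the example.

```python
def count_applications(rules, items):
    """Count (rule, first input, second input) combinations whose classes match."""
    total = 0
    for first_class, second_class in rules:
        firsts = sum(1 for _, classes in items if first_class in classes)
        seconds = sum(1 for _, classes in items if second_class in classes)
        total += firsts * seconds
    return total

def check_applications(rules, items, limit=1000):
    """Return the candidate count, or raise if it is larger than the limit."""
    n = count_applications(rules, items)
    if n > limit:
        raise ValueError(f"{n} candidate rule applications exceeds the limit of {limit}")
    return n
```

This counts by class only, so it is an upper bound: the input expressions would rule out some of these combinations once they are actually matched.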
To give some examples of how this would work:
Case 1: Simple compounding or agglutination
xxx + yyy = xxxyyy
You define an inflectional rule that takes the proper word/morpheme classes and outputs the proper class, with the expressions for the inputs being "*" and the expression for the output being "\m1\m2". You would only add this to a paradigm if one or both of the input classes were very constrained.
Case 2: Not-so-simple compounding or agglutination
xxxh + yyy = xxxyyy
You define an inflectional rule that has the proper input/output classes, with the expressions for the inputs being "(*)h" and "*", and the expression for the output being "\m1.1\m2".
xxxv + vyyy = xxxvhvyyy
You define an inflectional rule that has the proper input/output classes, with the expressions for the inputs being "*v" and "v*" and the expression for the output being "\m1h\m2".
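Cases 1 and 2 can be sketched by translating each expression into a Python regular expression and resolving the references in the output. This is a guess at the semantics rather than the tool's actual matcher: it assumes "*" means "any run of segments", parentheses mark referenceable segments, and every other character (including "v") is taken literally. The helper names `to_regex` and `apply_rule` are hypothetical.

```python
import re

def to_regex(expr):
    """Translate a rule expression into a Python regex (sketch: only *, parentheses, literals)."""
    parts = []
    for ch in expr:
        if ch == "*":
            parts.append(".*")           # "*" stands for any run of segments
        elif ch in "()":
            parts.append(ch)             # parentheses mark referenceable segments
        else:
            parts.append(re.escape(ch))  # everything else is a literal segment
    return "".join(parts)

# Matches \m1, \m2, \m1.1, \m2.3, ...
REFERENCE = re.compile(r"\\m([12])(?:\.(\d+))?")

def apply_rule(in1, in2, out, m1, m2):
    """Return the combined form, or None if either input expression fails to match."""
    matches = (re.fullmatch(to_regex(in1), m1), re.fullmatch(to_regex(in2), m2))
    if matches[0] is None or matches[1] is None:
        return None

    def resolve(ref):
        match = matches[int(ref.group(1)) - 1]
        # \m1 is the whole morpheme (group 0); \m1.1 is its first parenthesized segment
        return match.group(int(ref.group(2) or 0))

    return REFERENCE.sub(resolve, out)

print(apply_rule("*", "*", r"\m1\m2", "xxx", "yyy"))         # Case 1
print(apply_rule("(*)h", "*", r"\m1.1\m2", "xxxh", "yyy"))   # Case 2, dropping the final h
print(apply_rule("*v", "v*", r"\m1h\m2", "xxxv", "vyyy"))    # Case 2, inserting h
```

If "v" is meant as "any vowel" rather than a literal segment, the translation step would need a table of segment classes on top of this.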
Case 3: Inflection
xxxm + y = xxxnz (or vice versa)
Inputs: "(*)m", "*", Output: "\m1.1nz". Probably you would have this be part of a paradigm.
ccc + yz = cyczc
Inputs: "(c)(c)(c)", "(v)(v)", Output: "\m1.1\m2.1\m1.2\m2.2\m1.3"
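For the root-and-pattern example, the expressions need "c" and "v" to stand for classes of segments instead of literal letters. Here is a small sketch of that, assuming a made-up consonant and vowel inventory (a real tool would presumably pull these from the language's phonology) and with the output expression written out by hand. `CLASSES`, `to_regex` and `interdigitate` are names invented for the example.

```python
import re

# Hypothetical inventory, for illustration only.
CLASSES = {"c": "[ptkbdgmnszrl]", "v": "[aeiou]"}

def to_regex(expr):
    """Translate an expression to a Python regex, expanding c and v into segment classes."""
    parts = []
    for ch in expr:
        if ch == "*":
            parts.append(".*")
        elif ch in "()":
            parts.append(ch)
        else:
            parts.append(CLASSES.get(ch, re.escape(ch)))
    return "".join(parts)

def interdigitate(root, vowels):
    """Weave a two-vowel pattern into a three-consonant root, or return None on a mismatch."""
    m1 = re.fullmatch(to_regex("(c)(c)(c)"), root)
    m2 = re.fullmatch(to_regex("(v)(v)"), vowels)
    if m1 is None or m2 is None:
        return None
    # Same result as the output expression \m1.1\m2.1\m1.2\m2.2\m1.3
    return m1.group(1) + m2.group(1) + m1.group(2) + m2.group(2) + m1.group(3)

print(interdigitate("ktb", "ai"))  # katib
```

One consequence of this choice is that a literal "c" or "v" segment can no longer be written in an expression, so the real syntax would probably want some way to escape them.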
Case 4: Suprasegmentals
xxx + y = 'xxx
Inputs: "*", Output: "'\m1"
It would also be possible to not specify a second morpheme in some cases, if you didn't want to model all of your case endings as morphemes, for example, but still wanted to define inflectional rules that only apply to certain kinds of words.
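A rule without a second morpheme might look something like the sketch below, using the stress-mark rule from Case 4. The dictionary layout, the class labels and `apply_unary` are all assumptions made for the example, and only the catch-all expression "*" is handled.

```python
def apply_unary(rule, form, classes):
    """Apply a one-input rule; return None if the word is not in the rule's input class."""
    if rule["input_class"] not in classes:
        return None
    if rule["input_expr"] != "*":
        raise NotImplementedError("this sketch only handles the catch-all expression")
    return rule["output_expr"].replace(r"\m1", form)

# Hypothetical rule: mark stress on nouns, with no second morpheme involved.
stress_rule = {
    "input_class": "noun",
    "input_expr": "*",
    "output_class": "stressed-noun",
    "output_expr": r"'\m1",
}

print(apply_unary(stress_rule, "xxx", {"noun"}))  # 'xxx
print(apply_unary(stress_rule, "xxx", {"verb"}))  # None
```

The class check is what keeps the rule limited to certain kinds of words even though there is no second input to constrain it.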
Does that about cover it, or is there something else I'm missing that you would want to do with something like this?