ASCA v0.1.6 - NEW
Posted: Thu Jul 01, 2010 12:23 am
As a few of you may have heard, I'm working on a new sound change applier, written in Java. This is partly for my own edification, since I don't have a huge amount of practical programming experience. I'd like to release this as an executable and JAR package, run through the command line and with debug support.
I'm working on some basic features right now, namely Optionals, Variables, and Sets, which I'll explain presently.
The basic rule format is intended to look more like sound-change rules, as they are written normally:
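Code: Select all
1. a b c > x y z / P_S
You probably recognize how this works if you've ever used an SCA before. One feature I've already added is the handling of unconditioned rules:
Code: Select all
2. a b c > x y z / _
3. a b c > x y z
4. **a b c > z y z /
Rules 2 and 3 do the same thing; rule 4 will result in an error. The Condition can be omitted, but cannot be empty. So far, this stuff works.
Actually, this still doesn't support word boundaries. However, it does allow comment lines starting with # and even inline comments in ##.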
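For anyone curious how a line in this format might be taken apart, here is a minimal sketch in Java. It is not ASCA's actual code: the class name `RuleLine` and its fields are made up for illustration, and it assumes the rule numbers and the ** marker in the listings are just labels for this post. Inline ## comments and word boundaries are left out, since their exact syntax isn't pinned down here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one parsed rule; not ASCA's actual class.
final class RuleLine {
    final List<String> initial;  // segments left of '>'
    final List<String> result;   // segments right of '>' (the "Final"; `final` is reserved in Java)
    final String condition;      // text after '/', or "_" when the Condition is omitted

    private RuleLine(List<String> initial, List<String> result, String condition) {
        this.initial = initial;
        this.result = result;
        this.condition = condition;
    }

    /** Parses one line; returns null for blank lines and whole-line # comments. */
    static RuleLine parse(String raw) {
        String line = raw.trim();
        if (line.isEmpty() || line.startsWith("#")) {
            return null;
        }
        String condition = "_"; // an omitted Condition behaves like "/ _"
        int slash = line.indexOf('/');
        if (slash >= 0) {
            condition = line.substring(slash + 1).trim();
            if (condition.isEmpty()) {
                throw new IllegalArgumentException(
                        "Condition may be omitted but not left empty: " + raw);
            }
            line = line.substring(0, slash);
        }
        String[] sides = line.split(">", -1);
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected exactly one '>': " + raw);
        }
        return new RuleLine(segments(sides[0], raw), segments(sides[1], raw), condition);
    }

    // Splits one side of the rule on whitespace and rejects an empty side.
    private static List<String> segments(String side, String raw) {
        List<String> out = new ArrayList<>();
        for (String token : side.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        if (out.isEmpty()) {
            throw new IllegalArgumentException("Empty side in rule: " + raw);
        }
        return out;
    }

    public static void main(String[] args) {
        RuleLine rule = parse("a b c > x y z");
        System.out.println(rule.initial + " > " + rule.result + " / " + rule.condition);
    }
}
```

With this sketch, rules 2 and 3 parse to the same Condition, and rule 4 is rejected with an exception rather than being treated as unconditioned.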
My current task is to correctly implement Variables; I did a sloppy job yesterday, and only the first variable in a rule actually got processed. Variables are capital letters and/or numbers preceded by @.
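A rough sketch of one way to avoid the "only the first variable" problem, assuming definitions take the form `@NAME = member member ...`: scan the whole rule with a regular expression instead of stopping at the first hit. The class `VariableTable` and its methods are hypothetical names, not ASCA's. The sketch also expands references inside a definition at the moment it is read, so a later definition can overwrite or extend an earlier one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variable store; not ASCA's actual implementation.
final class VariableTable {
    // A reference is '@' followed by capital letters and/or digits.
    private static final Pattern REFERENCE = Pattern.compile("@[A-Z0-9]+");

    private final Map<String, List<String>> variables = new LinkedHashMap<>();

    /** Reads one definition line, expanding any variables used on the right-hand side. */
    void define(String line) {
        String[] halves = line.split("=", 2);
        if (halves.length != 2 || !REFERENCE.matcher(halves[0].trim()).matches()) {
            throw new IllegalArgumentException("Not a variable definition: " + line);
        }
        List<String> members = new ArrayList<>();
        for (String token : halves[1].trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (REFERENCE.matcher(token).matches()) {
                members.addAll(lookup(token)); // nested variable: copy its current members
            } else {
                members.add(token);
            }
        }
        variables.put(halves[0].trim(), members); // a redefinition simply overwrites
    }

    /** Returns the members of a variable, or fails if it has not been defined yet. */
    List<String> lookup(String name) {
        List<String> members = variables.get(name);
        if (members == null) {
            throw new IllegalArgumentException("Undefined variable: " + name);
        }
        return Collections.unmodifiableList(members);
    }

    /** Lists every variable reference in a rule, in order, not just the first one. */
    List<String> referencesIn(String rule) {
        List<String> found = new ArrayList<>();
        Matcher matcher = REFERENCE.matcher(rule);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        VariableTable table = new VariableTable();
        table.define("@A = p t k");
        table.define("@B = @A b d g");
        System.out.println(table.lookup("@B"));
        System.out.println(table.referencesIn("@A > @B / @A_"));
    }
}
```

Because references are expanded when the definition is read, redefining a variable later in the file does not change variables that were already built from it. Whether that is the behaviour you want is a design choice; the alternative is to expand lazily when a rule is applied.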
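Code: Select all
5. @C = p t k b d g r l m n
6. @V = a e i o u
7. @PLOSIVE = p t k b d g
All of these definitions should be valid; if all goes according to plan, you should also be able to put variables into variable definitions:
Code: Select all
8. @NASAL = m n
9. @LIQUID = r l
10. @SPIRANT = s z
11. @CONSONANT = @PLOSIVE @SPIRANT @LIQUID @NASAL
There is another feature that I think will be both interesting and useful, but will require a lot of debugging: the way things are written now, Rules and Variables are read concurrently, in a single pass of the Rules file. In principle, you should be able to define and redefine Variables on the fly throughout the file, which might be good if you need to redefine @CONSONANT to add /θ ð/ from the lenition of /t̪ d̪/, especially because these have no need to be in the file from the beginning. If this works, I may also permit you to delete variables once you are done with them.
Optionals are strings enclosed by parens (...) indicating that the enclosed string may be present or omitted, as is the standard practice. Unlike in some other SCAs, these can be included in both the Initial and Condition, not just the Condition. An Optional in the Final will produce an error.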
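To make the Optional idea concrete, here is a small sketch that expands a pattern containing (...) groups into every concrete string it stands for. It is only an illustration under a few assumptions: `OptionalExpander` is a made-up name, patterns are plain strings, and nested Optionals are rejected rather than handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; expands "a(b)c" into "abc" and "ac".
final class OptionalExpander {
    static List<String> expand(String pattern) {
        int open = pattern.indexOf('(');
        int close = pattern.indexOf(')');
        if (open < 0 && close < 0) {
            return Collections.singletonList(pattern); // nothing optional left
        }
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Unbalanced parentheses: " + pattern);
        }
        String before = pattern.substring(0, open);
        String inside = pattern.substring(open + 1, close);
        if (inside.isEmpty() || inside.indexOf('(') >= 0) {
            throw new IllegalArgumentException(
                    "Empty or nested Optionals are not handled in this sketch: " + pattern);
        }
        List<String> variants = new ArrayList<>();
        // Expand the remainder first, then emit each variant with and without the Optional.
        for (String rest : expand(pattern.substring(close + 1))) {
            variants.add(before + inside + rest);
            variants.add(before + rest);
        }
        return variants;
    }

    public static void main(String[] args) {
        System.out.println(expand("a(b)c"));
        System.out.println(expand("(R)_V(N)").size());
    }
}
```

Each Optional doubles the number of concrete strings, so a pattern with n Optionals yields 2^n variants. That growth is worth keeping in mind before allowing them everywhere.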
Finally, I'd like to allow ad hoc Sets, which basically work like Variables, but are not stored in memory. These are useful where you would otherwise use a Variable, but perhaps only the once.
I would like to allow Optionals and Sets in the Initials, but this is inviting bugs.
Some other features I'd like to add are related to some of these others. One is Many-to-One replacement:
Code: Select all
12. a e o > ə
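One possible way to do Many-to-One replacement in a single left-to-right pass is to build one alternation out of all the initial segments. The sketch below assumes exactly that and nothing more; `ManyToOne` is a hypothetical name, and the non-ASCII output segment is written as a Unicode escape so the source stays plain ASCII.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; replaces any of several segments with one result.
final class ManyToOne {
    static String apply(String word, List<String> initials, String result) {
        if (initials.isEmpty()) {
            throw new IllegalArgumentException("At least one initial segment is required");
        }
        // Longest segments first, so a digraph wins over its own first letter.
        List<String> sorted = new ArrayList<>(initials);
        sorted.sort(Comparator.comparingInt(String::length).reversed());

        StringBuilder alternation = new StringBuilder();
        for (String segment : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(segment)); // treat segments literally
        }
        return Pattern.compile(alternation.toString())
                .matcher(word)
                .replaceAll(Matcher.quoteReplacement(result));
    }

    public static void main(String[] args) {
        // Rule 12 above: a e o > schwa (U+0259).
        String schwa = "\u0259";
        System.out.println(apply("kateo", java.util.Arrays.asList("a", "e", "o"), schwa).length());
    }
}
```

Doing it in one pass, rather than as three separate one-to-one rules in sequence, means the output of one replacement can never be picked up by the next.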
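I might also add an Audit feature, which inspects your input lexicon and rules and finds rules which are never used (this usually happens with two or more variables in a digraph or in the Condition). I might also include a Stats mode to do cluster counts. I already have the software written, but I'd need to add it to the command line.
Also, some real-world rules are hard to run using the standard find-and-replace method that this and other SCAs seem to use. Among these are things like vowel syncopation (I know Jeff was having trouble with this in MUBA's VSCA) because they require large strings, which means a huge search space. Another such problem is Grassmann's law in eastern Indo-European, where an aspirated stop in a root is de-aspirated if another aspirated stop occurs later in the root. Distance assimilation/dissimilation will probably always be a problem. Here are the rules from my VSCA code for doing Grassmann's law in Kuma-Koban: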
Code: Select all
VS=aeiou[ə]
VL=āēīōū[ə̄]
V=<VS><VL>
N=mn
R=rl
IU=iu
CH=[pʰ][tʰ][cʰ][kʰ]
[pʰ][tʰ][cʰ][kʰ]/bdɟg/_V(N)<CH> 576
[pʰ][tʰ][cʰ][kʰ]/bdɟg/(R)_V<IU><CH> 1152
[pʰ][tʰ][cʰ][kʰ]/bdɟg/_VR(s)<CH> 768
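2496
The number next to each rule is the count of individual rules generated by that search, and 2496 is the total. The first search, for example, covers 4 aspirates × 12 vowels × 3 options for (N) × 4 following aspirates = 576 strings. I actually trimmed these down substantially, because it was such a runtime hog. It means the program has to search each word for 2496 different strings.
I think I know how I could potentially solve this using regular expressions (built in, of course; I want this to be relatively user-friendly), but this will require a totally different processing mechanism than the other rules and will have to wait at least until I finish getting the basics done.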
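As a rough illustration of the regular-expression idea, a lookahead can express "followed later by another aspirate" in one pattern, with no need to enumerate every intervening string. This is only a sketch under stated assumptions, not the planned ASCA mechanism: `GrassmannSketch` is a hypothetical name, the whole word is treated as the root, the distance is unbounded (the VSCA rules above restrict what may intervene), and the outputs are the voiced stops b d ɟ g from the listing. Non-ASCII characters are written as Unicode escapes to keep the source plain ASCII.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Grassmann's law via a regex lookahead.
final class GrassmannSketch {
    private static final String H = "\u02B0"; // superscript h, the aspiration mark

    // An aspirated stop (p, t, c or k plus H) that is followed, at any
    // distance in the same word, by another aspirated stop.
    private static final Pattern EARLIER_ASPIRATE =
            Pattern.compile("([ptck])" + H + "(?=.*[ptck]" + H + ")");

    private static final Map<String, String> OUTPUT = new HashMap<>();
    static {
        OUTPUT.put("p", "b");
        OUTPUT.put("t", "d");
        OUTPUT.put("c", "\u025F"); // voiced palatal stop
        OUTPUT.put("k", "g");
    }

    static String apply(String word) {
        Matcher matcher = EARLIER_ASPIRATE.matcher(word);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Replace the stop and its aspiration mark with the mapped plain stop.
            String replacement = OUTPUT.get(matcher.group(1));
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "p" + H + "at" + H;
        System.out.println(apply(input).equals("bat" + H));
    }
}
```

The lookahead does not consume the later aspirate, so in a word with three aspirates the first two both change and only the last keeps its aspiration.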
I'll try to make an alpha of some kind available as soon as I can.