This is the successor to my old ASCA program that nobody asked for. It is, however, much better tested - I've been working professionally as a computational linguist for the last year and have learned a lot about software design.
I don't have a clever name for it, but it's an SCA and it's part of a set of tools I'm developing for future dissertation research. Let's just call it HTS for now. I'm interested in people's feedback on usability, documentation, and any problems - as stated above, I needed to develop this as part of another project, but I wanted to make it available for public use.
Download Here (2014.01.24) - contains jar, an example batch file, and example PIE to Kuma-Koban lexicon and rules. If you have an ASCA rules file, it will be compatible with this, though because HTS handles combining characters differently, the intent might be slightly off.
I'll try to be as concise as possible here, but there is one really critical thing to understand before using this tool: HTS does not manipulate strings.
It is designed to operate on phonetic segments, and handles combining diacritics and modifier letters intelligently. A rule targeting p will not affect pʰ, ever, under any circumstances. Symbols like <ʰ>, and others like <ˣ>, <ʶ>, combining diacritics, alphanumeric sub- and superscripts, are understood to not be characters on their own, but modifiers on another (preceding) character.
Important Caveat: all diacritics currently combine with the segment they follow, so pre-aspiration written as ʰp is not possible at the moment. It will be once support for phonetic features is added, though it will need to be pre-defined.
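To make the segment idea concrete, here is a rough Python sketch of how a word might be split into segments. This is not HTS's actual code (HTS is Java and its internals aren't shown here); the function names `is_modifier`, `segment`, and `replace_segment` are made up for illustration, and the test for what counts as a modifier is an assumption based on Unicode character properties.

```python
import unicodedata

def is_modifier(ch):
    """Guess whether ch attaches to the preceding base character (assumed heuristic)."""
    if unicodedata.combining(ch) or unicodedata.category(ch).startswith("M"):
        return True   # combining diacritics, e.g. the syllabic mark in r̩
    if unicodedata.category(ch) == "Lm":
        return True   # modifier letters, e.g. ʰ ʷ ˣ ʶ
    # sub- and superscript characters, e.g. ₁ ₂
    return unicodedata.decomposition(ch).startswith(("<sub>", "<super>"))

def segment(word):
    """Split a word into segments, gluing each modifier onto the segment before it."""
    segments = []
    for ch in unicodedata.normalize("NFC", word):
        if segments and is_modifier(ch):
            segments[-1] += ch
        else:
            segments.append(ch)
    return segments

def replace_segment(segments, source, target):
    """Replace whole segments only, so 'p' never touches 'pʰ'."""
    return [target if s == source else s for s in segments]
```

With this, `segment("pʰapa")` gives `['pʰ', 'a', 'p', 'a']`, and replacing `p` with `b` changes only the last consonant. A leading modifier such as the ʰ in ʰp has nothing to attach to and ends up as its own segment, which mirrors the caveat above.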
Also, rules and lexicon entries are normalized using Unicode canonical composition (NFC), so you don't have to worry about manually converting between forms with combining accents and their precomposed equivalents.
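If you want to see what canonical composition does to your own data before feeding it to HTS, Python's standard `unicodedata` module applies the same Unicode normalization form. The helper name `normalize_entry` is just for illustration; how HTS calls the equivalent Java routine is not shown here.

```python
import unicodedata

def normalize_entry(text):
    """Return text in canonical composed form (NFC)."""
    return unicodedata.normalize("NFC", text)

# g + combining acute (two code points) and the precomposed ǵ (one code point)
decomposed = "g\u0301"
precomposed = "\u01f5"
same_after_nfc = normalize_entry(decomposed) == normalize_entry(precomposed)
```

Both spellings of ǵ compare equal after normalization, so a rule written with one form will still match a lexicon entry typed with the other.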
Running HTS
This is a stand-alone Java application. You will need the JRE installed for now, but there are no external dependencies. Run it with a script or from the command line. HTS needs three arguments: lexicon, rules, and output, in that order.
Code:
java -jar toolboxSCA.jar PIE_lex.txt PIE-PKK.rules PKK_lex.txt
The rule format is supposed to be pretty intuitive. You can add comments using % anywhere in a line - anything after % will be ignored. The following characters are reserved and should only be used in commands:
Code:
# % _ / > ( ) { } * + ? =
Variables are simple - unlike in ASCA, there are no restrictions on naming. The following block demonstrates some of what can be done:
Code:
Q = kʷʰ kʷ gʷ
K = kʰ k g
KY = cʰ c ɟ
P = pʰ p b
T = tʰ t d
[PLOSIVE] = P T KY K Q
[OBSTRUENT] = [PLOSIVE] s
You can also modify variables by just redefining them, or you can add a symbol to a variable like this:
Code:
C = C s
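The way variables build on each other can be sketched in a few lines of Python. This is only a guess at the semantics described above, not HTS's parser: it assumes definitions are read top to bottom, that tokens are separated by whitespace, and that a token naming an already-defined variable is replaced by that variable's contents. The names `strip_comment` and `parse_variables` are hypothetical.

```python
def strip_comment(line):
    """Drop everything after the first % and trim whitespace."""
    return line.split("%", 1)[0].strip()

def parse_variables(lines):
    """Read 'NAME = a b c' lines in order, expanding earlier variables."""
    variables = {}
    for raw in lines:
        line = strip_comment(raw)
        if "=" not in line:
            continue
        name, _, body = line.partition("=")
        expanded = []
        for token in body.split():
            # A token that names an existing variable contributes its members;
            # anything else is treated as a literal segment.
            expanded.extend(variables.get(token, [token]))
        variables[name.strip()] = expanded
    return variables
```

Because the right-hand side is expanded before the assignment happens, a line like `C = C s` ends up appending `s` to whatever `C` held before, and `[OBSTRUENT] = [PLOSIVE] s` becomes the fifteen plosives followed by `s`.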
Transformation rules are a bit more complicated. Unlike programs that use the // notation, HTS uses > for transformation and / to denote the condition:
Code:
h₁ h₂ h₃ h₄ > ʔ x ɣ ʕ
bʰ dʰ ǵʰ gʰ gʷʰ > pʰ tʰ ḱʰ kʰ kʷʰ
% GRASSMANN'S LAW
CH = pʰ tʰ cʰ kʰ
J = b d ɟ g
CH > J / _R?VV?C*CH
y w > i u / _{C X #}
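Reading the examples above, the sources and targets on either side of > appear to pair up position by position (h₁ with ʔ, h₂ with x, and so on). The Python sketch below shows that reading for the simplest case. It is an assumption about the semantics, not HTS's implementation: `parse_rule` and `apply_unconditioned` are hypothetical names, the condition is returned as raw text without being interpreted, variables are not expanded, and treating a single target as applying to every source is a guess.

```python
def parse_rule(rule):
    """Split 'sources > targets / condition' into a mapping and the raw condition."""
    rule = rule.split("%", 1)[0]              # drop any trailing comment
    change, _, condition = rule.partition("/")
    left, _, right = change.partition(">")
    sources, targets = left.split(), right.split()
    if len(targets) == 1:
        targets = targets * len(sources)      # assumption: one target serves all sources
    if len(sources) != len(targets):
        raise ValueError("source and target lists differ in length")
    return dict(zip(sources, targets)), condition.strip()

def apply_unconditioned(segments, mapping):
    """Replace each whole segment that appears in the mapping."""
    return [mapping.get(s, s) for s in segments]
```

For the laryngeal rule, a pre-segmented word like `['h₂', 'e', 'g']` comes out as `['x', 'e', 'g']`. Sources made of several segments would need sequence matching rather than a per-segment lookup, which this sketch leaves out.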
You can delete characters using 0:
Code:
xa xə > 0 / [LongV]L_
X > 0 / _{C #}
X > 0 / C_
ʔ > 0 / #_
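For the simple one-neighbour conditions above (`#_`, `C_`, `_{C #}`), deletion can be pictured like this in Python. This is an illustrative sketch under stated assumptions, not how HTS evaluates conditions: `delete_in_context` is a hypothetical helper, contexts are limited to a single segment on each side, variables must already be expanded into sets, and the condition is checked against the original word rather than a partially rewritten one.

```python
BOUNDARY = "#"

def delete_in_context(segments, targets, before=None, after=None):
    """Drop segments in `targets` whose neighbours satisfy the condition.

    before / after: set of allowed neighbouring segments ("#" = word edge),
    or None when that side is unconstrained.
    """
    padded = [BOUNDARY] + list(segments) + [BOUNDARY]
    kept = []
    for i in range(1, len(padded) - 1):
        seg = padded[i]
        matches = (
            seg in targets
            and (before is None or padded[i - 1] in before)
            and (after is None or padded[i + 1] in after)
        )
        if not matches:
            kept.append(seg)
    return kept
```

For example, `ʔ > 0 / #_` corresponds to `delete_in_context(word, {"ʔ"}, before={"#"})`, which removes a word-initial ʔ and leaves any later ones alone.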
As in the following examples, you can combine segments and variables:
Code:
r̩X l̩X > ə̄r ə̄l / _{C #}
r̩X l̩X > ər əl / _V
Code:
NX > Nə / #_C
Regular Expressions
The first thing to remember is that in HTS, parentheses do not mean "optional". I'll probably make this configurable in a later version, but for now parentheses are used when you need to apply ?, *, or + to a series of segments or variables, or to separate potentially-conflicting variable names like these: (K)(Y), K(Y), (K)Y.
If you are not familiar with regular expressions, the semantics here should be fairly straightforward. ? indicates that the preceding expression (group, segment, or variable) can be matched zero or one times - this is equivalent to the use of parentheses to make something optional. * will match the preceding expression zero or more times - using variables from our examples, C* indicates any number of consonants. + is similar, matching an expression one or more times.
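If it helps, the three quantifiers behave the same way as in ordinary regular expressions, and you can try them out with Python's `re` module. The sketch below is only an analogy, not HTS's matcher (which builds its own state machines): it assumes hypothetical consonant and vowel inventories, and it encodes each segment as a token followed by a space so that `p` cannot match the start of `pʰ`.

```python
import re

def seg_class(segments):
    """Regex fragment matching exactly one segment from the list."""
    # Longest alternatives first; every token ends in a space.
    ordered = sorted(segments, key=len, reverse=True)
    alts = "|".join(re.escape(s) for s in ordered)
    return f"(?:(?:{alts}) )"

def encode(segments):
    """Turn a segment list into the space-terminated token string."""
    return "".join(s + " " for s in segments)

def matches(pattern, segments):
    """True if the whole segment sequence matches the pattern."""
    return re.fullmatch(pattern, encode(segments)) is not None

C = seg_class(["p", "t", "s"])   # hypothetical consonant inventory
V = seg_class(["a", "e", "i"])   # hypothetical vowel inventory
```

With these, `C + "?" + V` accepts `['a']` and `['t', 'a']` but not `['s', 't', 'a']`; `C + "*" + V` accepts all three; and `C + "+" + V` rejects the bare vowel.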
Not that you'd be likely to need to, but I built HTS to parse expressions into state machines, so you have a lot of power in writing conditions (provided you don't need back-references). For example, you can write conditions like the following:
Code:
_{ab* (cd?)+ ((ae)*f)+}tr
_{ab {cd xy} ef}tr
Future Development
I've been developing HTS with hooks in the code for future additions, like being able to load additional rules or variable definitions from inside another rule file, or reading and writing lexicons in the same way.
Also, I'd like to add support for compound conditions, like
Code:
X > 0 / _{C #} OR C_
The next step is integrating features into the rule system. The segment-based approach I've taken already supports this and is used by sequence alignment code that I might release soon as well. That code uses dynamic programming and a hybrid articulatory-perceptual feature system to align sequences as a first step in researching semi-automated reconstruction.