An Extended Sound Change Applier

bradrn
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

An Extended Sound Change Applier

Post by bradrn »

EDIT: The link given is outdated; you can get the latest version from here

First of all, I know that there are many, many sound changers online already. The goal of this project is not to make the best sound change applier, but to make a sound change applier that supports lots of features and is also compatible with zompist's SCA2. As mentioned above, there are many very good sound changers already, but their format is quite different to the SCA's (e.g. going from P/B/_ *reaction -> reagdion to * <ustop> <vstop> _ ! reaction -> reagdion), and it could be hard to completely change how you write sound change rules.

Since I had some time, I decided to create a sound change applier that does exactly that. Thus, I present the exSCA (Extended Sound Change Applier), which you can get from here (no mac version yet, though I'm working on it - sorry!), supporting:
  • Everything that the SCA2 does (except the wildcard)
  • Syllabification using regexes
  • Automatic affixer
  • Categories within categories
  • Syntax highlighting
  • Opening and saving sound changes and lexicon
  • Writing custom sound changes in Python
  • Everything is typable using an ordinary computer keyboard - ² is replaced by > (I haven't added glosses or the wildcard yet)
And some things it does not support:
  • The wildcard (in the meantime you can implement it using Python - at the bottom of the post is some code you can copy into the program)
  • Backtracking (so a/b/_(C)C would not apply to the word 'ap' because it would interpret the 'p' as part of the (C))
  • The Edit menu (in the meantime, use Ctrl-C, Ctrl-X and Ctrl-V).
Screenshots: [four images, not preserved]

New Features
ASCII
The SCA2 uses ² to represent duplication. The exSCA replaces this with the character >, which can be typed much more easily.

Syllabification
Many people have complained on this forum about the lack of syllabification in the SCA and SCA2. In the exSCA, I have included regex syllabification, accessed by putting an 'x' in front of the rule, separated from it by a single space, which adds a . at each syllable boundary. Its main purpose would probably be to stop sound changes from working across syllable boundaries (e.g. x st/ss/_, when applied to a word like aste, would syllabify it as as.te and not match anything, because the sequence of characters is s.t, not st).
Syllabification regexes
The syllabification algorithm operates using regexes: the program finds all the parts of the word that match the regex, then takes them and sticks a . between them. (Note: One consequence of the way the algorithm is implemented is that if you provide a regex that doesn't match all of the word, the word will get scrambled!) The regexes themselves are ordinary .NET Framework regexes, with one exception: you can use category names in them (e.g. using the definitions C=ptkbdg and V=aeiou, the regex C?V will be turned into [ptkbdg]?[aeiou]). These will have a beige background.
The default regex is an expression that matches (C)V(C(C)) words, leveraging the regex capabilities of the .NET Framework. Below I provide some regexes for several different syllable structures:

(C)V: C?V

(C)V(C): C?V(((?=CC))C|(((?=C$))C|))

Onset-Rime-Coda (as in Chinese): There are two variations of this one, depending on how you want to parse a VCVC word (e.g. anaŋ).
If you want it to parse as VC-VC (e.g. an-aŋ): O?RC?
If you want it to parse as V-CVC (e.g. a-naŋ): O?R(((?=O))|(((?=C))C|))

(C)(C)V(C): (CC?)?V(((?=CC))C|(((?=C$))C|))
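
The procedure described above (expand category names, collect the matches, join them with the separator) can be sketched roughly as follows. This is only an illustration using Python's re module rather than the .NET regex engine the program actually uses. The names expand_categories and syllabify are made up for the example, and the sketch simply drops any part of the word that the regex fails to match.

```python
import re

def expand_categories(pattern, cats):
    # Replace every category name with a character class, e.g. C -> [ptkbdgs]
    return "".join("[" + cats[ch] + "]" if ch in cats else ch for ch in pattern)

def syllabify(word, pattern, cats, separator="."):
    regex = re.compile(expand_categories(pattern, cats))
    # Collect the non-empty matches and join them with the separator
    syllables = [m.group(0) for m in regex.finditer(word) if m.group(0)]
    return separator.join(syllables)

cats = {"C": "ptkbdgs", "V": "aeiou"}  # example categories (assumed)
print(syllabify("pataka", "C?V", cats))                        # pa.ta.ka
print(syllabify("aste", "C?V(((?=CC))C|(((?=C$))C|))", cats))  # as.te
```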

Affixer (Decliner/Conjugator)
The automatic affixer (called the Decliner/Conjugator in the program, and accessed via Tools->Decliner/Conjugator or Ctrl-J) is a utility I created to help run declensions or conjugations through the exSCA quickly. When you open it up there are three textboxes: a prefixes one, a stems one and a suffixes one. (No infix support yet, sorry!)
The affixer works by using slots. Each line you put into each textbox counts as one slot. Each slot is composed of several affixes, each separated by one space. When you press OK, the words textbox in the main window is filled with all combinations of the different slots, with exactly one affix from each slot in each word. You can use an * (asterisk) to denote that the slot is optional - or, to be precise, the empty affix. Each textbox (except the stems textbox) by default contains one * (asterisk). If you want to leave a textbox unused, you shouldn't remove its asterisk. (You can, but then the affixer won't work.) Putting this all together, here is some sample input to the affixer and its output:

Code: Select all

Prefixes  Stems  Suffixes

*         apa    ne * in
          higal  ye go * de
          ša     n nne y yne z zne

Produces an output of:

apaneyen
apaneyenne
apaneyey
apaneyeyne
apaneyez
apaneyezne
apanegon
apanegonne
apanegoy
apanegoyne
apanegoz
apanegozne
apanen
apanenne
apaney
apaneyne
apanez
.
.
.
šaindenne
šaindey
šaindeyne
šaindez
šaindezne
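
The slot mechanism amounts to a Cartesian product over the slots. Here is a rough sketch of that idea in Python, not the program's actual code: parse_slots and affix_all are names invented for the example, and I'm assuming the stems vary slowest, which is what the sample output above suggests.

```python
from itertools import product

def parse_slots(text):
    """One slot per non-empty line; affixes within a slot are separated by spaces."""
    slots = []
    for line in text.splitlines():
        if line.strip():
            # "*" stands for the empty affix
            slots.append(["" if affix == "*" else affix for affix in line.split()])
    return slots

def affix_all(prefix_slots, stems, suffix_slots):
    words = []
    for stem in stems:
        # Pick exactly one affix from every slot, in every possible combination
        for combo in product(*prefix_slots, [stem], *suffix_slots):
            words.append("".join(combo))
    return words

prefixes = parse_slots("*")
suffixes = parse_slots("ne * in\nye go * de\nn nne y yne z zne")
words = affix_all(prefixes, ["apa", "higal", "ša"], suffixes)
print(len(words))  # 216
print(words[:3])   # ['apaneyen', 'apaneyenne', 'apaneyey']
```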
Categories within Categories
You can define categories that use other categories inside them - for instance:

Code: Select all

P=ptk
B=bdg
F=fsx
V=vzG
N=mnŋ
X=rlw
C=PBFVNX
You can also use pre-existing categories inside inline categories, for instance [szX]. However, if you try and do, for instance, [VN]/X/_C, it won't work.
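
For the curious, nested categories can be resolved by expanding each name recursively. The following is just a sketch of the idea (expand_category is a made-up name, and there is no protection against categories that refer to each other in a loop); it uses the six categories defined above.

```python
def expand_category(definition, cats):
    out = []
    for ch in definition:
        if ch in cats:
            # A category name: substitute its (recursively expanded) members
            out.append(expand_category(cats[ch], cats))
        else:
            # An ordinary character, e.g. the G in V=vzG
            out.append(ch)
    return "".join(out)

cats = {"P": "ptk", "B": "bdg", "F": "fsx", "V": "vzG", "N": "mnŋ", "X": "rlw"}
print(expand_category("PBFVNX", cats))  # ptkbdgfsxvzGmnŋrlw
```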

Python Sound Changes
EDIT: After version 2.0.0, exSCA does not support Python sound changes anymore. You can use regex rules instead, described in this post.

In my opinion, the most innovative feature of this sound change applier is that you can define your own sound changes using the Python (specifically, IronPython, which supports the clr or .NET Framework as well as everything in ordinary Python) programming language. You can open the Python editor from Tools -> Python Editor, or Ctrl-P. Note that you don't have to save your changes for them to work - you just switch between windows and your changes are applied automatically.
Each python sound change is a method of the form

Code: Select all

def <soundchangename>(word, categories):
    <code>
    return changedWord
In the above, word is the word to be changed and categories (which in my sound changes I usually abbreviate as cats) is a clr Dictionary (to change it to a Python dict, use pycats = dict(cats)). Note that import clr is performed automatically, so you don't have to put it in yourself.
To execute this sound change from the list of sound changes, write -<my_python_sound_change>. For instance, to execute the Python method called remove_consonant_clusters, write -remove_consonant_clusters.
One More Thing - a Replacement for the Wildcard
I didn't include the wildcard in the exSCA for three reasons:
  • In most situations where the wildcard was used in the SCA2, it needed recursion, which the SCA2 didn't have, so in my opinion it was useless
  • It is very hard to implement
  • You can implement it in Python!
Which is exactly what I'm going to do here:

Code: Select all

# Executes TEXT_AFTER/REPLACEMENT_TEXT/TEXT_BEFORE..._
def python_wildcard(word, cats):
    before = "TEXT_BEFORE"
    after = "TEXT_AFTER"
    replacement = "REPLACEMENT_TEXT"
    # The rule only applies once TEXT_BEFORE has been seen somewhere in the word
    start = word.find(before)
    if start == -1:
        return word
    # Leave everything up to the end of TEXT_BEFORE untouched...
    start += len(before)
    # ...and replace every TEXT_AFTER in the rest of the word. Slicing first
    # avoids the index drift that occurs when the replacement and the target
    # have different lengths.
    return word[:start] + word[start:].replace(after, replacement)
This code executes the SCA2 statement "TEXT_AFTER/REPLACEMENT_TEXT/TEXT_BEFORE..._". To use it, copy this code into the Python Editor (Tools->Python Editor or Ctrl-P) and replace TEXT_BEFORE, TEXT_AFTER, and REPLACEMENT_TEXT with your replacements. Then, in your sound changes, put -python_wildcard.

If you find any bugs (and there are probably lots), post them here and I'll try my best to fix them. Also, if you have a suggestion for a new feature, post it here too and I'll do my best to implement it.
Last edited by bradrn on Thu Apr 13, 2017 3:42 am, edited 3 times in total.

Richard W
Avisaru
Posts: 363
Joined: Sat Oct 16, 2010 8:28 pm

Re: An Extended Sound Change Applier

Post by Richard W »

One possible enhancement, perhaps more appropriate for a philological tool rather than for an artistic tool, would be optional rules. They do seem to be real rather than artificial (cf. English food, good and blood), and they can be used to handle the indeterminate order of idempotent rules, which sometimes happens, e.g. /ˈhɪstri/ v. /ˈhɪstʃri/ for history in my idiolect. The way I've handled indeterminate ordering is to have:

Optional Rule A
Rule B
Rule C # Identical to Rule A.

Aili Meilani
Lebom
Posts: 144
Joined: Sun Feb 24, 2013 3:21 pm

Re: An Extended Sound Change Applier

Post by Aili Meilani »

Please add a license. Without it, it's illegal to download and use the program.

Ser
Smeric
Posts: 1542
Joined: Sat Jul 19, 2008 1:55 am
Location: Vancouver, British Columbia / Colombie Britannique, Canada

Re: An Extended Sound Change Applier

Post by Ser »

bradrn:

http://choosealicense.com/no-license/
For users

If you find software that doesn’t have a license, that generally means you have no permission from the creators of the software to use, modify, or share the software. Although a code host such as GitHub may allow you to view and fork the code, this does not imply that you are permitted to use, modify, or share the software for any purpose.
Please add a license.

Chagen
Avisaru
Posts: 707
Joined: Thu Sep 22, 2011 11:54 pm

Re: An Extended Sound Change Applier

Post by Chagen »

Oh my god, that decliner/conjugator is so cool. Does it support morphophonology? For instance in Pazmat -ar stem nouns in -yarā have reduced forms with ṣ (y becomes ṣ in front of any consonant): uyarā "nothing", ūṣrāva "in nothing" (cf. viśarā "the forge", viśrāva "in the forge")
Nūdhrēmnāva naraśva, dṛk śraṣrāsit nūdhrēmanīṣṣ iźdatīyyīm woḥīm madhēyyaṣṣi.
satisfaction-DEF.SG-LOC live.PERFECTIVE-1P.INCL but work-DEF.SG-PRIV satisfaction-DEF.PL.NOM weakeness-DEF.PL-DAT only lead-FUT-3P

bradrn
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

Re: An Extended Sound Change Applier

Post by bradrn »

Aili Meilani and Serafín, I've added a licence (MIT), so now you should be able to download the program.

Richard W, I'm not quite sure what you're talking about. I assume that by optional rules you mean rules that are only applied sometimes? This is the sort of scenario I added the Python integration for - you can open up the Python editor (Tools > Python Editor) and add the following code:

Code: Select all

import re
import System

regex = re.compile(r"TEXT")  # replace this with your own regex; details
                             # on IronPython regexes can be found at
                             # https://ironpython-test.readthedocs.io/en/latest/howto/regex.html
                             # (Regular Expression HOWTO)

rnd = System.Random()        # random number generator
percentage = 0.5             # % of times optional rule is applied, expressed
                             # as a decimal

def optional_soundchange(word, cats):
    nextdouble = rnd.NextDouble()
    if nextdouble < percentage:
        return regex.sub("REPLACEMENT", word)   # apply the sound change
    else:
        return word                             # do not apply the sound change
I have no idea what you mean by the 'indeterminate order of idempotent rules'. I looked up 'idempotent'; by 'idempotent rules', do you mean rules that have no further effect if applied more than once? In that case, why would they have an 'indeterminate order'?

Chagen, I can't understand the particular example you're giving me; would a similar rule be the rule that 'if an underlying /k/ appears after a vowel, it is changed to [h] within the word' in Yurakaré (http://www.comparativelinguistics.uzh.c ... Class4.pdf)? These sorts of rules can be modeled by using a combination of the decliner/conjugator and sound changes:

Code: Select all

In the decliner/conjugator:
Prefixes: | Stems: | Suffixes:
* tiŋ_ a_ | kama   | *
Note the underscores at the end of the prefixes; these serve as morpheme boundary markers to disambiguate sequences such as /ak/ inside a word and /a_k/ on a morpheme boundary. The above produces an output of:

Code: Select all

kama
tiŋ_kama
a_kama
We can now use the sound changes

Code: Select all

a_k/a_h/_
_//_
to a) change a sequence of /a_k/ to /a_h/ and b) remove all morpheme boundary markers. These sound changes result in an output of

Code: Select all

kama
tiŋkama
ahama
which is exactly what we wanted. (I would have liked to use a hyphen as the morpheme boundary marker. However, this would have made the second sound change -//_, but since the program interprets rules starting with a hyphen as being Python sound changes, this would be interpreted as invoking a Python sound change //_. You do not have to use an underscore; however, this was the closest character to a hyphen that I could find.)
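
If it helps to see the two steps spelled out, here is the same pair of changes written as plain Python string replacements. It is only an illustration of what the two rules do to these three words, not how the program applies rules internally; apply_rules is a name invented for the example.

```python
def apply_rules(word):
    word = word.replace("a_k", "a_h")  # a_k/a_h/_ : /k/ becomes [h] after a prefix ending in a
    return word.replace("_", "")       # _//_      : remove the morpheme boundary markers

for word in ["kama", "tiŋ_kama", "a_kama"]:
    print(word, "->", apply_rules(word))
# kama -> kama
# tiŋ_kama -> tiŋkama
# a_kama -> ahama
```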

bradrn
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

Re: An Extended Sound Change Applier

Post by bradrn »

I have now released Version 2.0.3 of exSCA. I've rewritten it from scratch using C++ and Qt, because:
  1. I wanted to make it cross platform, and
  2. The original code, I discovered, is just about unmaintainable.
You can get an installer for Windows here.

Note: I will use the terminology 'target', 'replacement', 'environment' and 'exception(s)' to refer to the following parts of a sound change: target/replacement/environment/exception(s). I will use the term 'category' to mean the sets of characters that you define e.g. if you define P=ptk, this would be referred to as the category P.

New features, in no particular order:
  • One-to-one correspondence of categories in the target and replacement, unlike in the SCA2 (e.g. CC/CC/_ would change e.g. 'apte' to 'appe' in the SCA2, whereas in exSCA it wouldn't do anything). This was also in the previous version, but I didn't explicitly mention it.
  • Can combine metathesis with other characters
  • Nonce categories can contain exceptions (e.g. [C~b] is all consonants except 'b', and [~b] is anything except 'b')
  • Nonce categories can appear in the replacement, so you can do things like [pt]/[fs]/_ (see 'misfeature' 7 below)
  • Backreferences: in any part of a rule, you can use the syntax @<number> to refer back to the appropriate category (e.g. to take an example from here, the rule VSë/VS@1/_ changes e.g. 'anë' to 'ana', and 'komëni' to 'komoni')
  • A lot faster than exSCA 1.0
  • Changeable separator for syllabification: in exSCA 1.0, the character for syllabification was set as a period/full stop, whereas now you can change it
  • Glosses, which weren't supported in exSCA 1.0
  • The Decliner/Conjugator (now renamed to the Affixer, since that's shorter) now supports infixes and can live preview the results
  • Slightly improved saving/opening interface
  • Disappearing categories: a special character (~) in the replacement makes the corresponding category disappear
  • Rules that may or may not apply (e.g. ?50 a/b/_c only applies 50% of the time)
  • Comments: a dedicated comment character (*), unlike in the SCA2
  • More different output formats
  • Unlike in SCA2, the option 'show changed words' now bolds words different to their ancestors, not the last run
And some things which could arguably be called 'misfeatures' (mostly since they might trip up users of SCA2), again in no particular order:
  • There are no Python rules; they've been replaced with simpler 'regex rules'
  • The metathesis character has been changed from '\\' to '\'
  • Rewrite rule syntax has been changed from 'ng|ŋ' to 'ng>ŋ'; I find the former much harder to understand than the latter
  • The character for glosses has been changed from the hard-to-type character ‣ to the easier-to-type >
  • The character for (de)gemination has also been changed to >
  • (Partly) as a result of changing all the characters to ASCII, you can't use any of the following characters as letters: / \ _ # > ~ [ ] ( ) @
  • Nonce categories and backreferences in the target count as 'categories'; this is to enable things like PV@1/PVF/_ (where P=ptk, F=fsx and V=aeiou) and [pt]/[fs]/_
There's probably more that I've forgotten; if you see something weird, post something and I'll try to either clarify or fix it.

Now, the details.

One-to-one Correspondence of Categories
In the SCA2, the value of the first category in the replacement is 'saved' and reused for the rest of the categories mentioned in the replacement. For instance, in the SCA2, the rule CC/CC/_ would have the same effect as the rule CC/C²/_, since the value of the first C in the replacement is reused for the second C. However, in exSCA, each category mentioned in the target is paired up with the corresponding category in the replacement. In exSCA, the rule CC/CC/_ would have no effect, since the first C in the target is paired with the first C in the replacement, and the second C in the target with the second C in the replacement. Graphically:

Code: Select all

CC/CC/_
│└─┼┘
└──┘
where the lines indicate the pairs mentioned above.
This feature can be utilised to create complex sound changes, such as e.g. PP/BV/_ (where the categories are assumed to be P=ptk, B=bdg and V=vzɣ), which changes e.g. 'apta' to 'abza'. Graphically:

Code: Select all

PP/BV/_
│└─┼┘
└──┘
The same rule in the SCA2 would change 'apta' to 'abva'.
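
To make the pairing concrete, here is a small Python sketch of the idea. It is not taken from the exSCA source: apply_paired is an invented name, and the sketch only handles rules whose target and replacement are equally long strings of category names with an empty environment.

```python
import re

def apply_paired(word, target, replacement, cats):
    # One capture group per category in the target
    pattern = "".join("([" + cats[name] + "])" for name in target)

    def substitute(match):
        out = []
        for i, name in enumerate(replacement):
            # Position of the matched sound within its own target category...
            index = cats[target[i]].index(match.group(i + 1))
            # ...selects the sound at the same position in the paired category
            out.append(cats[name][index])
        return "".join(out)

    return re.sub(pattern, substitute, word)

cats = {"P": "ptk", "B": "bdg", "V": "vzɣ", "C": "ptkbdg"}
print(apply_paired("apta", "PP", "BV", cats))  # abza
print(apply_paired("apte", "CC", "CC", cats))  # apte (no change)
```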

This feature is not new in exSCA 2.0.1, but I forgot to mention it in my first post.

Backreferences
Backreferences are a new feature that allow you to do changes that are very hard, or even impossible, to encode in the SCA2. They take the form '@<digit>'.

Backreferences are probably best introduced by example. Say that you want to replace an 'e' by the previous vowel, so that e.g. 'kane' turns into 'kana', and 'elunela' turns into 'elunula'. For simplicity, we will assume that syllables are of the form CV. We can implement this using backreferences as follows, where @1 is the backreference: VCe/VC@1/_. It refers to the first category mentioned - V. Run through the exSCA, it gives the following output:

Code: Select all

kane → kana
elunela → elunula
Similarly, @2 will refer to the second category mentioned, @3 to the third and so on. Backreferences can be used in any of the target, replacement, environment or exception.
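
If you already know regexes, @1 behaves much like a regex backreference. As a rough comparison only (this is ordinary Python, not exSCA syntax, and the consonant list is just an example), the rule VCe/VC@1/_ corresponds to something like:

```python
import re

V = "aeiou"
C = "ptkbdgmnlrs"  # example consonant inventory (assumed)

# ([V])([C])e  ->  vowel, consonant, then a copy of the vowel
rule = re.compile("([%s])([%s])e" % (V, C))

def copy_previous_vowel(word):
    return rule.sub(r"\1\2\1", word)

print(copy_previous_vowel("kane"))     # kana
print(copy_previous_vowel("elunela"))  # elunula
```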

Backreferences and Nonce Categories in the Target

I mentioned above that there is a one-to-one correspondence of categories in the target and replacement. This also applies to nonce categories and backreferences. Using the same sort of diagram as above, we have something like this:

Code: Select all

PV@1/PVF/_
││└──┼┼┘
│└───┼┘
└────┘
The rule above changes e.g. 'ateta' to 'atesa'.

The main reason I created this rule was so you can do things like [pt]/[fs]/V_V, without having to define named categories first. This is extremely useful, since it means you don't have to create nearly as many categories as in the SCA2.

Removal of Categories
Categories can be removed in the replacement by using the special character ~. For instance, in a language with CV syllables, the rule CV@1@2/CV~~/_ deletes duplicate syllables. Some example output:

Code: Select all

letutun → letun
kare → kare [no change]
renana → rena
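
As a cross-check, the same duplicate-syllable deletion can be written as an ordinary Python regex, where \1 and \2 play the role of @1 and @2 and the deleted copies are simply left out of the replacement. This is only a comparison, not how exSCA implements ~, and the category contents are assumed for the example.

```python
import re

C = "ptkbdgmnlrs"  # example categories (assumed)
V = "aeiou"

# CV followed by an identical CV; keep only the first copy
rule = re.compile(r"([%s])([%s])\1\2" % (C, V))

def delete_duplicate_syllables(word):
    return rule.sub(r"\1\2", word)

for word in ["letutun", "kare", "renana"]:
    print(word, "->", delete_duplicate_syllables(word))
# letutun -> letun
# kare -> kare
# renana -> rena
```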
Removal of Python Editor
I've removed the Python Editor, since IronPython (the Python implementation I was using) and C++ don't go well together. In addition, the Python rules were quite cumbersome. I've replaced them with simpler regex rules, which have the syntax _regex/substitution. It replaces all occurrences of regex with substitution.

Note for the non-technical: Regexes, or regular expressions, are quite similar to sound changes. If you're familiar with the syntax of SCA2, it should be easy to learn regexes with tutorials such as https://regexone.com/ or http://www.regular-expressions.info/tutorial.html.

Prefixes
exSCA 2 introduces a new part that can be added to rules, called the 'prefix'. Prefixes are added to the beginning of rules by separating them from the rest of the rule with a space. For instance, x N//M_ would be the rule N//M_ with a prefix of x. Prefixes change the operation of their rule in some way. There are two prefixes in exSCA: x, for syllabification, and ?, for optional rules.

x - Syllabification
The prefix x was actually introduced in the previous version of exSCA. However, a new feature has been added: the character for syllabification can be chosen manually. By default it is '-', but it can be changed.

? - Optional Rules
The prefix ?<number> will mean that the rule will only be executed <number>% of times. For instance, the rule ?80 C//C_, when applied to several copies of the word 'alda', will produce an output something like this:

Code: Select all

alda → ala
alda → ala
alda → ala
alda → alda
alda → ala
alda → ala
alda → ala
alda → ala
alda → alda
alda → ala
Request for comment: The rule given above (?80 C//C_), when applied to a word with two instances of the pattern CC, for instance 'alfande', may result in any of 'alfande', 'alfane', 'alande', 'alane'. I'm not sure if this behaviour is entirely naturalistic; in particular, I'm not sure whether the results 'alfane' and 'alande' should be possible. So, my question is this: If an optional sound change applies more than once in a word, can it both apply and not apply in the same word, or do all instances where the change applies have to behave the same? (I do hope that was clear enough...)
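
To make the two alternatives in the question concrete, here is a Python sketch of both behaviours: rolling the dice once per match versus once per word. The function name optional_rule and the per_match flag are invented for this illustration, the replacement is treated as a literal string, and the consonant list is only an example.

```python
import random
import re

def optional_rule(word, pattern, repl, percent, rng, per_match=True):
    def chance():
        return rng.random() * 100 < percent

    if per_match:
        # Each match is decided independently, so mixed results are possible
        return re.sub(pattern, lambda m: repl if chance() else m.group(0), word)
    # One decision for the whole word: all matches change or none do
    return re.sub(pattern, repl, word) if chance() else word

C = "ptkbdgfslmnr"                     # example consonants (assumed)
delete_c_after_c = "(?<=[%s])[%s]" % (C, C)  # rough equivalent of C//C_

rng = random.Random()
print(optional_rule("alfande", delete_c_after_c, "", 80, rng))
print(optional_rule("alfande", delete_c_after_c, "", 80, rng, per_match=False))
```

With per_match=True the result can be any of 'alfande', 'alfane', 'alande' or 'alane'; with per_match=False it can only be 'alfande' or 'alane'.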

Affixer (previously Decliner/Conjugator)
The Decliner/Conjugator has been renamed to the Affixer. Other than that, there have been several new features introduced to the Affixer:

Live Preview
The Affixer now contains a 'Preview' box, which updates as text is typed to show the result. For instance, if the Roots box contains 'a', and the Suffixes box contains '* b c', then the Preview box will contain 'a ab ac'.

Infixes
The Affixer now supports infixes. The notation is similar to that of prefixes and suffixes, but is slightly different, since the Affixer cannot place infixes on its own. Each root can contain several digits inside it e.g. for Tagalog, you could have '1b2asa'. Each infix consists of a digit followed by an infix e.g. for Tagalog, '2um 1nag 1na'. Each infix is substituted in at the digit, resulting in:

Code: Select all

bumasa
nagbasa
nabasa
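
The substitution itself is simple enough to sketch in a few lines of Python. This is an illustration only: apply_infix is an invented name, and I'm assuming that one infix is applied per word and that any slot digits left unfilled are simply deleted.

```python
import re

def apply_infix(root, infix):
    slot, affix = infix[0], infix[1:]   # e.g. "2um" -> slot "2", affix "um"
    word = root.replace(slot, affix)    # put the affix at its slot
    return re.sub(r"\d", "", word)      # drop the remaining slot markers

for infix in "2um 1nag 1na".split():
    print(apply_infix("1b2asa", infix))
# bumasa
# nagbasa
# nabasa
```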
Addition in Different Places
You can now add the words generated by the Affixer to any of the start, the end or the current cursor position. You can also choose to replace the current wordlist entirely.

Miscellaneous

Metathesis
The character for metathesis has been changed to \. It can also now be combined with other characters. An example of where this could be useful is if you want to break up vowel sequences by inserting the previous consonant e.g. changing 'alaen' to 'alalen'. This could be done using the rule CV/C\/_V, with the following output:

Code: Select all

alaen → alalen
agen → agen [no change]
faisal → fafisal
Exceptions to Nonce Categories
Nonce categories can now contain exceptions, separated by a tilde (~). For instance, [L~l] is all of L except for l, and [~x] is all characters except for x.

Glosses
Glosses are separated from words using the character >. For instance, 'focus > fire' would be the word 'focus' with a gloss of 'fire'. Unlike in the SCA2, glosses are not affected by rewrite rules.

(De)gemination
The character for gemination and degemination has been changed from ² to >.

Rewrite rules
Personally, I find the notation A|B for rewrite rules very hard to understand. In exSCA 2.0.3, this notation is changed to A>B.

Changed words
In the SCA2, when the option 'Show Changed Words' is checked, words that are different to the previous run are bolded. However, in exSCA 2.0.3, words that are different to their etymon/ancestor are bolded, which I think should be a bit more useful. If you want the old version, please post and I'll implement it.

Reporting rules
The 'report which rules apply' checkbox shows a dialog box containing a log of rules applied. However, if there are many words or many changes, then the dialog box can run off the bottom of the screen. To see the rest of the text, you can press Ctrl-A to select all the text and then Ctrl-C to copy. You can then close the dialog and paste the text somewhere else for viewing.

Future directions
The two features I'm thinking about for the exSCA are a) implementing a way to do analogy and b) making a way to reverse sound changes (like the RSCA).
If you have any ideas, please post here. I'm more than happy to implement them!

Bugs
You can post bugs here or at the github repository. If you do post here, I'll create an issue on github, just so I have all the issues in one place.
Last edited by bradrn on Mon Sep 18, 2017 9:50 pm, edited 1 time in total.

bradrn
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

Re: An Extended Sound Change Applier

Post by bradrn »

I wrote: Future directions
The two features I'm thinking about for the exSCA are a) implementing a way to do analogy and b) making a way to reverse sound changes (like the RSCA).
I've released a new version, exSCA 2.1.0. It contains mostly bugfixes, like the previous few versions, but I've also added a new feature: reversing sound changes.

Basic Usage
The new functionality should work right away: write down some rules (I find zompist's Latin->Portuguese rules provided by default with the SCA2 work well), add a word (each word can result in thousands of possible ancestors, so it's a good idea to try only one word at a time), toggle the 'Reverse changes' checkbox, and press 'Apply'. If we try this with the Latin->Portuguese rules using distrito as our word, this is what we get:

Code: Select all

dīvīstērīvīptūs dīvīstērīvīptūm dīvīstērīvīptū dīvīstērīvīptus dīvīstērīvīptum dīvīstērīvīptu dīvīstērīviptūs dīvīstērīviptūm dīvīstērīviptū dīvīstērīviptus dīvīstērīviptum ... [skipped 3477 words] ... districtū districtus districtum districtu dīstrīctōs dīstrīctōm dīstrīctō dīstrīctos dīstrīctom dīstrīcto dīstrictōs dīstrictōm dīstrictō dīstrictos dīstrictom dīstricto distrīctōs distrīctōm distrīctō distrīctos distrīctom distrīcto districtōs districtōm districtō districtos districtom districto dīstrīptūs dīstrīptūm dīstrīptū dīstrīptus dīstrīptum dīstrīptu dīstriptūs dīstriptūm dīstriptū dīstriptus dīstriptum dīstriptu distrīptūs distrīptūm distrīptū distrīptus distrīptum distrīptu distriptūs distriptūm distriptū distriptus distriptum distriptu dīstrīptōs dīstrīptōm dīstrīptō dīstrīptos dīstrīptom dīstrīpto dīstriptōs dīstriptōm dīstriptō dīstriptos dīstriptom dīstripto distrīptōs distrīptōm distrīptō distrīptos distrīptom distrīpto distriptōs distriptōm distriptō distriptos distriptom distripto
We can verify that these are indeed possible ancestors of distrito by running the rules forward with the words above as input.

Filtration
We can filter the results shown using the textbox marked 'Filters:'. Each line corresponds to a regular expression (see previous post if you don't know what that is) that matches one or more words. If a word is matched by the regular expression, then it is not shown in the output.
Note: The regular expressions used here have an important difference to normal PCRE regular expressions: you can use categories within them. For instance, if the category C was defined as C=ptkbdg, then the regular expression C{3,} would match three or more of ptkbdg in a row. Note that this applies to all regular expressions within exSCA, not just these.
Let's try filtering the results above: we'll filter the pattern [īi][iī], matching any of īi, īī, ii and iī. Adding that to the list of filters and pressing 'Apply' still gives us lots of results, but none containing any of the four patterns specified through the list of filters.

Below the textbox is a button with the text 'Filter Current'. This button filters the current list of words without reapplying all the sound changes. It is useful when trying to quickly narrow down a long list of results: alternately typing a new filter and pressing 'Filter Current' is quicker than doing the same process using the Apply button.
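
In case the filtering logic isn't clear, here is roughly what it does, sketched in Python with the re module (exSCA itself is not written in Python, and the function names are invented). Note that this naive category expansion would mangle a filter that puts a category name inside square brackets; the sketch doesn't try to handle that.

```python
import re

def expand_categories(pattern, cats):
    # Replace each category name with a character class, e.g. C -> [ptkbdg]
    return "".join("[" + cats[ch] + "]" if ch in cats else ch for ch in pattern)

def apply_filters(words, filters, cats):
    compiled = [re.compile(expand_categories(f, cats)) for f in filters]
    # A word is hidden as soon as any filter matches anywhere inside it
    return [w for w in words if not any(r.search(w) for r in compiled)]

cats = {"C": "ptkbdg"}
candidates = ["districtus", "diistrictus", "dīstrīctus", "dīīstrictus"]
print(apply_filters(candidates, ["[īi][iī]"], cats))
print(apply_filters(["apta", "aptka"], ["C{3,}"], cats))  # ['apta']
```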

Efficiency
Even though I have tried to make the reversing process as quick as possible, by its very nature reversing a series of sound changes takes a long time. Even short series of rules can take a long time to reverse, as is shown by the following example:

Code: Select all

C=ptkbdgfsvzmnŋrlwy
V=aeiou
V//V_
C//C_
When applied to even a simple word (e.g. fasuz), this can take a very long time to reverse. This is because of the very large number of possible ancestors; fasuz, for instance, has more than 200000.

To counteract this problem, I created two prefixes (see last post for details on what a prefix is), f and a, to control the execution of rules.

f
The f prefix, when applied to a rule, has the effect of only executing the rule in forward mode; when reversing sound changes, this rule will be skipped. This prefix is particularly useful for changes which would take up an extremely long time if executed backwards (e.g. h//_, which when reversed would try to insert an h between every two letters of the word).

a
To understand this prefix, we need to know some details of the reversal algorithm. The reversal algorithm creates two copies of each word. One is kept unmodified (the 'unchanged version'), while the other is run through the sound change we are reversing (the 'changed version'). The effect of this can be seen by reversing a very simple change e.g. i/j/_: when reversed on the word ja, the ancestors suggested are ia and ja. The reason exSCA does this is because the j in ja could have come from either a previous i or been unchanged from a previous j.

But what if we don't want an unchanged version of a word? An example of when this could happen is a rule to introduce palatalization: C/Cʲ/_[ei]. We can assume that the language previously did not have palatalization, and so words like *dʲeb would not occur. But let's have a look at what happens if we reverse this change on dʲeb: we get the two results deb and dʲeb. But we stipulated that the language did not have palatalization, and so *dʲeb would not in fact be a possible ancestor. However, by adding the prefix a to the rule, forming a C/Cʲ/_[ei], we stipulate that palatalization is always reversed, which correctly gives only deb as a possible ancestor.

Another use of the a prefix is with shifts in pronunciation. For instance, if we have a shift ħ/x/_, where x did not exist before the shift occurred, it would be a good idea to add the a prefix at the beginning so e.g. xe is not listed as a possible ancestor of xe.
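
Here is a toy Python version of the idea, to show why the number of ancestors grows so quickly and what the a prefix changes. It is a simplification based on the description above, not the real algorithm: reverse_simple is an invented name, and it only handles unconditional rules that turn one character into another.

```python
from itertools import product

def reverse_simple(word, target, replacement, always=False):
    """Possible ancestors of word under the rule target/replacement/_ ."""
    options = []
    for ch in word:
        if ch == replacement:
            # Either it came from the target, or (unless the rule has the
            # 'a' prefix) it was already there before the change
            options.append([target] if always else [target, replacement])
        else:
            options.append([ch])
    # Every combination of choices is a candidate ancestor
    return ["".join(combo) for combo in product(*options)]

print(reverse_simple("ja", "i", "j"))               # ['ia', 'ja']
print(reverse_simple("xe", "ħ", "x", always=True))  # ['ħe']
```

Each affected character doubles the number of candidates, which is why skipping the unchanged copy early saves so much work.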

Prefixes vs Filtering
But wait! Couldn't we have just filtered out the characters ʲ and x, instead of adding a prefix? Yes, we could have done that. The real strength of the a prefix is in reducing the time taken for reversal. To illustrate, let's expand on the example given above:

Code: Select all

C=ptkbdgfsxvzɣrlwymn
V=aeiou
C//C_
C/Cʲ/_[ei]
When reversed on the word dʲebʲit with a filter ʲ, exSCA takes about 3½ seconds to generate 6859 possible ancestors. When a is added as a prefix to the rule C/Cʲ/_[ei] and the now-redundant filter removed, it takes less than 1 second to generate the same words. Why such a large difference in time? The a filters out three-quarters of the words at an early stage, meaning that the computationally expensive operation of reversing C//C_ only has to reverse a quarter as many words.
In general, an a or f near the end of a list of changes can mean that the reversal works much more quickly, so try and use a and f as much as possible!

What Not to Expect
There are several things that I have not implemented yet:

Pausing
Currently you cannot pause or stop an ongoing reversal operation. I advise you to always save your changes before you do a reversal operation, so that if it's taking a very long time and you need to stop the program your changes will still be there. Hopefully this will be implemented in the future.

Limitations on Reversible Rules
Currently there are several types of rules which exSCA cannot reverse:
  • Regex rules (e.g. _a(.*)a/e\1a)
  • Metathesis (e.g. CV/C\/_V)
  • Tilde rules (e.g. ViT/~iD/C_)
I will be adding a feature in the next version of exSCA which should mitigate this somewhat.

Other Features
I have added a few other features unrelated to reversing sound changes:
  • Syntax highlighting has been improved.
  • As tildes can only occur in the replacement, they are now ignored when in the target or the environment.
  • In a rule with categories like A=abc D=de A/D/_, where A has three members and D has only two, the third element of A will not be processed, so that e.g. abc will turn into dec.
  • The handling of nonce categories has been improved. This is explained in more detail below.
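The unequal-length category behaviour from the list above can be sketched in Python as follows. This is only a model, with a made-up function name: members are paired up by position, and anything left unpaired is passed through untouched.

```python
def apply_category_rule(word, source, target):
    """Map the i-th member of `source` to the i-th member of `target`."""
    # zip stops at the shorter category, so unpaired members get no mapping.
    mapping = dict(zip(source, target))
    return "".join(mapping.get(ch, ch) for ch in word)

print(apply_category_rule("abc", "abc", "de"))  # dec
```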
Handling of Nonce Categories
Complex nonce categories now follow two simple rules:
  1. If a nonce category contains other categories, those categories are replaced by their expansions, preserving order.
  2. If a nonce category contains exceptions, these exceptions are then removed, again preserving order.
For instance, consider the following:

Code: Select all

A=abcde
C=cef
F=fg
X=pqrs

[AF~C]/X/_
[AF~C] is expanded by Rule 1 to [abcdefg~cef], which is then simplified by Rule 2 to [abdg], meaning that a, b, d and g are changed to p, q, r and s respectively.
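The two rules can be sketched in a few lines of Python. This is an illustration only: it assumes single-letter category names and a single ~ separating the included part from the exceptions, and the names expand_nonce and CATEGORIES are invented for the example.

```python
CATEGORIES = {"A": "abcde", "C": "cef", "F": "fg", "X": "pqrs"}

def expand_nonce(body, categories):
    """Expand a nonce category body such as 'AF~C' into its ordered members."""
    included, _, excluded = body.partition("~")

    def expand(part):
        # Rule 1: replace each category name by its expansion, preserving order.
        return "".join(categories.get(ch, ch) for ch in part)

    removed = set(expand(excluded))
    # Rule 2: drop the exceptions, again preserving order.
    return "".join(ch for ch in expand(included) if ch not in removed)

members = expand_nonce("AF~C", CATEGORIES)
print(members)                               # abdg
print(dict(zip(members, CATEGORIES["X"])))   # {'a': 'p', 'b': 'q', 'd': 'r', 'g': 's'}
```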

Documentation
As some of the information from earlier blog posts has become outdated, I will be progressively moving all the documentation to the GitHub wiki.

Uruwi
Sanci
Sanci
Posts: 17
Joined: Sat Sep 09, 2017 4:22 pm

Re: An Extended Sound Change Applier

Post by Uruwi »

I built a Linux version (with some modifications to the source). Make sure Qt5 is installed.

I'd like to see a command-line mode, though.

bradrn
Niš
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

Re: An Extended Sound Change Applier

Post by bradrn »

Uruwi wrote:I built a Linux version (with some modifications to the source). Make sure Qt5 is installed.

I'd like to see a command-line mode, though.
Thank you! Can I include this in the releases?

Also, how did you install Qt5 on Linux? I tried and I can't figure it out.

Edit: About the command-line mode: I've been thinking that it might be a good idea to split it into a GUI front-end and a command-line backend. I've also been thinking that it might be a good idea to implement the backend in Haskell, as the C++ code is rapidly becoming unmaintainable.

gestaltist
Lebom
Lebom
Posts: 125
Joined: Fri Feb 06, 2015 5:21 am

Re: An Extended Sound Change Applier

Post by gestaltist »

Any plans for a Mac version?

Uruwi
Sanci
Sanci
Posts: 17
Joined: Sat Sep 09, 2017 4:22 pm

Re: An Extended Sound Change Applier

Post by Uruwi »

bradrn wrote: Thank you! Can I include this in the releases?
Feel free.

Travis B.
Sumerul
Sumerul
Posts: 3570
Joined: Mon Jun 20, 2005 12:47 pm
Location: Milwaukee, US

Re: An Extended Sound Change Applier

Post by Travis B. »

As for reimplementing it in Haskell - do eet! And to those out there, if you don't know Haskell already, you won't regret learning it... except that from that point on, you'll see the limitations in every other language out there that you had never noticed before.
Dibotahamdn duthma jallni agaynni ra hgitn lakrhmi.
Amuhawr jalla vowa vta hlakrhi hdm duthmi xaja.
Irdro. Irdro. Irdro. Irdro. Irdro. Irdro. Irdro.

bradrn
Niš
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

Re: An Extended Sound Change Applier

Post by bradrn »

gestaltist wrote:Any plans for a Mac version?
Unfortunately no, unless someone builds it on a Mac. I'd like to see that happen, though.

gestaltist
Lebom
Lebom
Posts: 125
Joined: Fri Feb 06, 2015 5:21 am

Re: An Extended Sound Change Applier

Post by gestaltist »

bradrn wrote:
gestaltist wrote:Any plans for a Mac version?
Unfortunately no, unless someone builds it on a Mac. I'd like to see that happen, though.
I might try to build it. What language/libraries/etc. have you used?

bradrn
Niš
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

Re: An Extended Sound Change Applier

Post by bradrn »

gestaltist wrote:
bradrn wrote:
gestaltist wrote:Any plans for a Mac version?
Unfortunately no, unless someone builds it on a Mac. I'd like to see that happen, though.
I might try to build it. What language/libraries/etc. have you used?
I used C++ and Qt 4.7. It might be worth waiting a few more days before you build it, though, as I intend to release a new version some time in the next few days.

Uruwi wrote:I built a Linux version (with some modifications to the source). Make sure Qt5 is installed.

I'd like to see a command-line mode, though.
I can't get it to work. It prints the following message and then it exits:

Code: Select all

./exSCA: /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5: version `Qt_5' not found (required by ./exSCA)
./exSCA: /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5: version `Qt_5' not found (required by ./exSCA)
./exSCA: /usr/lib/x86_64-linux-gnu/libQt5Core.so.5: version `Qt_5.7' not found (required by ./exSCA)
./exSCA: /usr/lib/x86_64-linux-gnu/libQt5Core.so.5: version `Qt_5' not found (required by ./exSCA)

Artaxes
Sanci
Sanci
Posts: 27
Joined: Tue Dec 21, 2010 3:31 pm
Location: Anshan Imparatorlugu

Re: An Extended Sound Change Applier

Post by Artaxes »

I have 0xc00007b :roll:

I po ptåkach, as we say it shortly in Eastern Poland. :)

bradrn
Niš
Niš
Posts: 10
Joined: Tue Mar 08, 2016 2:13 am

Re: An Extended Sound Change Applier

Post by bradrn »

exSCA 2.1.2 has been released. You can download it here. It has a few bugfixes, an improved user interface and a few additions to the reversal capabilities.

IMPORTANT: This release uses a new installer, so please uninstall any old versions (exSCA 2.1.0 and earlier) before you install this one. Future releases will run the uninstaller automatically.

(Oops... I released version 2.1.1, then wrote this post and discovered several bugs. That's why the latest version is 2.1.2 and not 2.1.1.)
bradrn wrote: Limitations on Reversible Rules
Currently there are several types of rules which exSCA cannot reverse:
  • Regex rules (e.g. _a(.*)a/e\1a)
  • Metathesis (e.g. CV/C\/_V)
  • Tilde rules (e.g. ViT/~iD/C_)
I will be adding a feature in the next version of exSCA which should mitigate this somewhat.
This release adds the 'new feature' - or rather, several new features, most of which relate to prefixes.

b prefix
The f prefix already allows running rules in forwards mode only. Now you can use the b prefix to do the reverse - that is, run a rule in backwards mode only. In conjunction with f, this could be useful for reversing rules which can't currently be reversed. Some examples:

Code: Select all

* Forward rule
f _a(.*)a/e\1a
* Backwards rule
b _e(.*)a/a\1a

f CV/C\/_V
b CV@1/CV/_V
Now what about reversing ViT/~iD/C_? This transforms CViT into CiD, deleting the V in the process. But this means that each word has more than one possible ancestor - CiD could come from CaiD, CeiD or any other similar word. How do we make a rule with multiple outputs? This leads us to the next new feature:

Backticks (`)

By placing a backtick (`) before a category or nonce category, all values in the category are used in the output. A simple example: the rule a/`[aeiou]/_, when applied to the word a, produces as output the words (note the plural) a, e, i, o and u. So now we can reverse the rule ViT/~iD/C_:

Code: Select all

f ViT/~iD/C_
b iD/`ViT/C_
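For anyone wondering what 'all values are used' means in practice, here is a small Python model of the a/`[aeiou]/_ example. The function name is made up, and treating several matches in one word as a Cartesian product is my assumption for the sketch, not something documented above.

```python
from itertools import product

def apply_backtick(word, target, members):
    """Replace each `target` in `word` by every member of `members`, returning all results."""
    # Each matching position becomes a slot with one option per category member.
    slots = [list(members) if ch == target else [ch] for ch in word]
    return ["".join(combo) for combo in product(*slots)]

print(apply_backtick("a", "a", "aeiou"))  # ['a', 'e', 'i', 'o', 'u']
```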
However, there is still a more subtle problem: the word itself may also be a valid ancestor! For example, let's look at this rule:

Code: Select all

f CV/C\/_V
b CV@1/CV/_V
Now let's consider the ancestors of the word pape. If we run this through exSCA, we get pae - and this is indeed a valid ancestor. However, the word pape itself is also an ancestor. How can we accommodate this in exSCA?

s prefix
The s (standing for sporadic) prefix allows a rule to both run and not run at the same time. This may be clearer with an example: applying s a/b/_ to the word a results in the two words a and b. It should be pretty clear how we apply this to our backwards rule: s CV@1/CV/_V. But if we want to use s, we have to get rid of b. Can we fix this? Yes, it's yet another new feature:

Combining Multiple Prefixes
In exSCA 2.1.1, we can combine multiple prefixes by separating them with a space, so we can do s b or b s to get the full range of words:

Code: Select all

f CV/C\/_V
b s CV@1/CV/_V
This can be useful outside of reversing changes as well. For instance, if we want to change short vowels to long vowels in open syllables 25% of the time, we can do x ?25 V/L/_[-#] (assuming that the default settings haven't been changed and your syllable separator is set to -).
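The way f, b and s interact can be modelled with a short Python sketch. Everything here is hypothetical (the run_rule function, the way a rule is passed in as a plain function, and the direction argument); it only mirrors the behaviour described in this post and is not how exSCA is implemented.

```python
def run_rule(word, rule, prefixes, direction):
    """Apply `rule` (a function str -> str) honouring hypothetical f, b and s prefixes."""
    # f: forwards only; b: backwards only. In the other direction the rule is skipped.
    if "f" in prefixes and direction != "forward":
        return [word]
    if "b" in prefixes and direction != "backward":
        return [word]
    changed = rule(word)
    # s: the rule both runs and does not run, so both words are kept.
    if "s" in prefixes and changed != word:
        return [word, changed]
    return [changed]

a_to_b = lambda w: w.replace("a", "b")  # stands in for the rule a/b/_
print(run_rule("a", a_to_b, {"s"}, "forward"))        # ['a', 'b']
print(run_rule("a", a_to_b, {"b", "s"}, "forward"))   # ['a']
print(run_rule("a", a_to_b, {"b", "s"}, "backward"))  # ['a', 'b']
```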

Other changes
The user interface has been improved: the textboxes now have descriptive labels, there is now a 'Save' button (as opposed to the 'Save As' button), and keyboard shortcuts have been added.

What Next?
If anyone has a feature request, you can add it as an issue on github if you have a github account, or you can post it here. This also applies to bugs: if you've found one, please tell me!

If you want to build exSCA for Mac or Linux, the source can be found on github. It's written using C++ and Qt.
