zompist bboard

Posted: **Mon May 20, 2013 4:45 pm**

I have done some JavaScript coding recently. The results might be of interest to other conlangers, so I’m posting them here:

The Derivizer.
A simple tool that you can use while building or expanding your conlang’s lexicon. You can enter (some of) your language’s root words and derivational affixes, and use this script to suggest a few random derivatives and/or compounds, for which you can then try to come up with nice idiomatic meanings. I started writing this tool back in October 2012, and I’ve found it quite useful already. Of course, the exact degree of usefulness depends on the data you enter – I find it works best when you limit the input to a certain domain, e.g. only noun roots from a single semantic field, and only derivational affixes that can be attached to these nouns.
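For readers who want a feel for the idea, here is a minimal sketch of random derivation and compounding. It is not the Derivizer's actual code: the function names (`pick`, `derive`, `suggest`), the hyphen notation for affixes, and the 50/50 split between compounds and affixed forms are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the Derivizer's actual code.
// Assumed notation: "-ar" is a suffix, "an-" is a prefix.

// Pick one random element; rng is injectable so results can be reproduced.
function pick(list, rng) {
  return list[Math.floor(rng() * list.length)];
}

// Attach a single affix to a base according to where its hyphen sits.
function derive(base, affix) {
  if (affix.startsWith("-")) return base + affix.slice(1);   // suffix
  if (affix.endsWith("-")) return affix.slice(0, -1) + base; // prefix
  return base + affix;                                       // unmarked: treat as suffix
}

// Suggest `count` random forms: roughly half compounds, half affixed bases.
function suggest(bases, affixes, count, rng = Math.random) {
  const out = [];
  for (let i = 0; i < count; i++) {
    if (rng() < 0.5 && bases.length > 1) {
      out.push(pick(bases, rng) + pick(bases, rng));
    } else {
      out.push(derive(pick(bases, rng), pick(affixes, rng)));
    }
  }
  return out;
}

// Example call with made-up roots and affixes; output differs on every run.
console.log(suggest(["kal", "miru", "tosh"], ["-ar", "an-", "-ike"], 5));
```

Restricting `bases` and `affixes` to one semantic field, as suggested above, is simply a matter of passing shorter lists.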

The Frequentizer.
I wanted to do a corpus-based phoneme frequency analysis for one of my conlangs today, but I couldn’t find a suitable online tool for this task in a quick round of googling, so I decided to write one myself. Even in the very first version, it can already (a) provide separate figures for vowels and consonants (and you can even define what counts as a vowel in your conlang), (b) handle user-defined di- and trigraphs correctly by treating them as single segments, and (c) arbitrarily combine different letters into a single phoneme, for instance accented and unaccented vowels. In a future version, the Frequentizer may also be able to assign the same grapheme to different phonemes depending on an orthographically predictable environment, but don’t hold your breath…
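The three features (a) to (c) can be illustrated with a short sketch. This is an assumption about how such a counter might work, not the Frequentizer's real implementation; `segment`, `countPhonemes`, and the option names `multigraphs`, `merge`, and `vowels` are invented for this example.

```javascript
// Hypothetical sketch, not the Frequentizer's actual code.

// Split a word into segments, matching the longest listed multigraph first.
function segment(word, multigraphs) {
  const sorted = multigraphs
    .filter((g) => g.length > 0)             // an empty entry would never advance
    .sort((a, b) => b.length - a.length);    // longest first: "tsh" before "ts"
  const segments = [];
  let i = 0;
  while (i < word.length) {
    const match = sorted.find((g) => word.startsWith(g, i));
    const seg = match ?? word[i];            // fall back to a single character
    segments.push(seg);
    i += seg.length;
  }
  return segments;
}

// Count phonemes, keeping vowels and consonants in separate tallies.
function countPhonemes(corpus, { multigraphs = [], merge = {}, vowels = [] } = {}) {
  const counts = { vowels: {}, consonants: {} };
  // Words are runs of letters and combining marks; everything else separates them.
  for (const word of corpus.toLowerCase().split(/[^\p{L}\p{M}]+/u)) {
    for (const seg of segment(word, multigraphs)) {
      const phoneme = merge[seg] ?? seg;     // e.g. fold accented "á" into "a"
      const bucket = vowels.includes(phoneme) ? counts.vowels : counts.consonants;
      bucket[phoneme] = (bucket[phoneme] ?? 0) + 1;
    }
  }
  return counts;
}

const result = countPhonemes("sh\u00e1 sha tash", {
  multigraphs: ["sh"],
  merge: { "\u00e1": "a" },
  vowels: ["a"],
});
console.log(result); // { vowels: { a: 3 }, consonants: { sh: 3, t: 1 } }
```

The sketch assumes precomposed accented letters; a corpus that uses separate combining accents would need normalizing first (for example with `String.prototype.normalize("NFC")`).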

I hope these tools will prove useful to some of you. There’s not much documentation for either at this point, but I recommend taking a look at the example data and testing some of the different settings in order to get a feeling for how they work. Have fun!

(Both of these are beta versions, so I can't guarantee that the scripts always work as they should...)

Posted: **Mon May 20, 2013 5:07 pm**

Um, your Frequentizer doesn't want to work for my con-lang. It only seems to recognise what you put in the special graphemes box, and even then, it won't count anything.

Posted: **Mon May 20, 2013 5:30 pm**

I forgot to provide a default value for one internal variable, to be used when the Special Graphemes box is empty. Thanks for the error report! Should be fixed now.

Posted: **Mon May 20, 2013 6:35 pm**

cedh audmanh wrote:The Derivizer.

You're a beautiful person for doing this. Thank you.

Posted: **Mon May 20, 2013 10:53 pm**

cedh audmanh wrote:I forgot to provide a default value for one internal variable, to be used when the Special Graphemes box is empty. Thanks for the error report! Should be fixed now.

Works great now!

Posted: **Mon May 20, 2013 11:46 pm**

Super cool!

Posted: **Tue May 21, 2013 4:17 am**

masako wrote:
cedh audmanh wrote:The Derivizer.
You're a beautiful person for doing this. Thank you.

It is very lovely.

I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?

Posted: **Tue May 21, 2013 6:43 am**

These look amazing. Thank you.

Posted: **Tue May 21, 2013 7:59 am**

These two are awesome.

Posted: **Tue May 21, 2013 8:23 am**

Bristel wrote:
cedh audmanh wrote:The Derivizer.
I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?

Not directly. But of course you can do it in several passes: First, enter only affixes of the innermost layer in the dependent morphemes box and use them to derive some complex stems. Second, add these stems to the lexical bases box, and put outer-layer affixes in the dependent morphemes box.

(Doing something like this directly would be an interesting feature for a future version, and there's the added motivational bonus that I'd have a use for it in several of my own languages. However, I don't have an idea offhand how to approach this without having to define separate input boxes for each slot...)
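The several-pass workaround can be sketched in a few lines. The helper `attachAll`, the hyphen notation for affixes, and the sample morphemes are all hypothetical; the real tool picks random combinations, while this sketch produces every combination so that the result is easy to follow.

```javascript
// Hypothetical helper: attach every affix to every base.
// Assumed notation: "da-" is a prefix, "-esh" is a suffix.
function attachAll(bases, affixes) {
  const out = [];
  for (const base of bases) {
    for (const affix of affixes) {
      out.push(
        affix.endsWith("-")
          ? affix.slice(0, -1) + base        // prefix goes before the base
          : base + affix.replace(/^-/, "")   // suffix goes after the base
      );
    }
  }
  return out;
}

const roots = ["kal", "mir"];
const innerAffixes = ["-u"];            // slot directly next to the root
const outerAffixes = ["da-", "-esh"];   // slots further out

// Pass 1: derive complex stems with the innermost layer only.
const stems = attachAll(roots, innerAffixes);

// Pass 2: feed those stems back in as bases for the outer layer.
const words = attachAll(stems, outerAffixes);

console.log(words); // [ 'dakalu', 'kaluesh', 'damiru', 'miruesh' ]
```

Each additional slot is one more pass, with the previous output used as the new list of bases.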

Posted: **Wed May 22, 2013 1:41 am**

Cool!

Posted: **Wed May 22, 2013 4:39 pm**

cedh audmanh wrote:
Bristel wrote:
cedh audmanh wrote:The Derivizer.
I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?
Not directly. But of course you can do it in several passes: First, enter only affixes of the innermost layer in the dependent morphemes box and use them to derive some complex stems. Second, add these stems to the lexical bases box, and put outer-layer affixes in the dependent morphemes box.

(Doing something like this directly would be an interesting feature for a future version, and there's the added motivational bonus that I'd have a use for it in several of my own languages. However, I don't have an idea offhand how to approach this without having to define separate input boxes for each slot...)

I'd love that, really. I'm having a hard time quickly deriving words for Proto-Takayo, which is heavily polysynthetic, and even waiting a while for this to come about would be absolutely fine.

The affix slots can probably be filled up to 9 times, 4 before a root, and 5 after, and that's just the verb with a possible noun root incorporated. Not sure how I'm going to handle nouns, which is probably what this tool would be mostly useful for.

Posted: **Wed May 22, 2013 9:46 pm**

These are really helpful. Do you mind if I incorporate the Frequentizer into my online dictionary (with attribution, of course)? I'd like to pass the input in from the name and pronunciation fields in the database and pass them to your javascript, making it easy to maintain frequency statistics for the words in my lexicon.

Posted: **Thu May 23, 2013 1:57 pm**

Sevly wrote:These are really helpful. Do you mind if I incorporate the Frequentizer into my online dictionary (with attribution, of course)? I'd like to pass the input in from the name and pronunciation fields in the database and pass them to your javascript, making it easy to maintain frequency statistics for the words in my lexicon.

I like this idea, just go ahead!

In other news, I have just uploaded a new version of the Frequentizer. There are now separate grapheme fields for consonants and vowels which both work like the old Special Graphemes field. You can also choose what to do with characters that don't appear in those fields. And of course, the most significant change: The tool now has a basic understanding of syllable structure, so it can give separate statistics for onset and coda consonants. By default, it will treat the first consonant of every cluster plus all word-final consonants as belonging to a syllable coda; if you need different rules, you can add a syllable divider character (by default, the MIDDLE DOT ·) to your text corpus in places where the built-in rules do not give the intended syllabification.
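As a rough illustration of the default onset/coda rule, here is a sketch for a single word. It is not taken from the Frequentizer. It assumes one character per segment, and it fills in details the description above leaves open: word-initial consonants are treated as onsets, a single consonant between vowels is an onset, and an explicit divider sends everything before it in the cluster to the coda. The name `classifyConsonants` is invented.

```javascript
// Hypothetical sketch of the default rule; one character per segment assumed.
function classifyConsonants(word, vowels, divider = "\u00b7") {
  const segs = [...word];
  const isVowel = (s) => vowels.includes(s);
  const result = { onset: [], coda: [] };
  let seenVowel = false;
  let i = 0;
  while (i < segs.length) {
    if (isVowel(segs[i])) {
      seenVowel = true;
      i++;
      continue;
    }
    // Collect the whole run of consonants (and dividers) up to the next vowel.
    let j = i;
    while (j < segs.length && !isVowel(segs[j])) j++;
    const run = segs.slice(i, j);
    const cut = run.indexOf(divider);
    const cons = run.filter((s) => s !== divider);

    let codaCount;
    if (cut !== -1) codaCount = cut;                      // explicit divider wins
    else if (j === segs.length) codaCount = cons.length;  // word-final: all coda
    else if (!seenVowel) codaCount = 0;                   // word-initial: all onset
    else codaCount = cons.length > 1 ? 1 : 0;             // medial: first of a cluster

    result.coda.push(...cons.slice(0, codaCount));
    result.onset.push(...cons.slice(codaCount));
    i = j;
  }
  return result;
}

console.log(classifyConsonants("astrak", ["a"]));
// { onset: [ 't', 'r' ], coda: [ 's', 'k' ] }
console.log(classifyConsonants("a\u00b7strak", ["a"]));
// { onset: [ 's', 't', 'r' ], coda: [ 'k' ] }
```

The second call shows the divider overriding the built-in rule so that the whole cluster counts as an onset.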

Posted: **Thu May 23, 2013 4:00 pm**

Nice.

The Frequentizer would be even nicer if it reported rhyme frequencies,

and also if it could count the absence of codas as null codas.

Posted: **Fri May 24, 2013 11:53 am**

Man!!!! Thank you!!!!
F*** it's awesome!!!

Posted: **Mon May 27, 2013 4:17 pm**

Cool!

Thank you, Cedh, especially for the Frequentizer!

Also, I suspect the Derivizer can be used to generate paradigms, with only minor modifications?

Corundum wrote:The Frequentizer would be even nicer if it reported rhyme frequencies,

and also if it could count the absence of codas as null codas.

And zero onsets?

Posted: **Mon May 27, 2013 7:42 pm**

the frequentizer is fucking awesome

awesome i tell you !!

i used it to check if phonemes in a corpus follow zipf's law, using graphemes in spanish as a proxy, and looks like they don't.

Posted: **Thu May 30, 2013 4:39 am**

Basilius wrote:
Corundum wrote:The Frequentizer would be even nicer if it reported rhyme frequencies,
and also if it could count the absence of codas as null codas.
And zero onsets?

A more sophisticated syllable model is definitely on my to-do-list. Also, consonant cluster frequencies.

Basilius wrote:Also, I suspect the Derivizer can be used to generate paradigms, with only minor modifications?

I'm not so sure about this one. It might work for a purely agglutinative language, but once you have sandhi effects at the stem-affix boundary you'd need software in which you can specify (morpho)phonological rules. A sound change applier such as GSCA, which supports output for multiple dialects that you can repurpose for different morphological forms of the same word, is probably a much better starting point for a paradigm generator.

Posted: **Tue Jun 04, 2013 2:56 pm**

cedh audmanh wrote:
Basilius wrote:
Corundum wrote:The Frequentizer would be even nicer if it reported rhyme frequencies,
and also if it could count the absence of codas as null codas.
And zero onsets?
A more sophisticated syllable model is definitely on my to-do-list. Also, consonant cluster frequencies.

Done.

I've just uploaded version 0.3 of the Frequentizer, which has an improved syllable model that supports full and null onsets or codas, intervocalic consonants or consonant clusters, and can restrict the analysis to syllables in a certain position in the word. This means you can now ask for things like “vowels in word-medial syllables” or “syllable onsets in non-final syllables”. The program also presents the results in a shiny visual diagram now (made with the Chart.js library), and it has a proper open source license (FreeBSD).

Posted: **Tue Jun 04, 2013 5:29 pm**

Is there any way to get the Frequentizer to recognise a consonant as syllabic without using another character? For example, my conlang has syllabic r as an allophone of consonantal r. This is a pain when dealing with syllables, as the word for 'brother' is niptr.

Posted: **Tue Jun 04, 2013 8:55 pm**

The Frequentizer is looking really good. A simple request that would be helpful to me—could we have comments in the text corpus, something like having the parser ignore text from // to the end of the line?

Posted: **Wed Jun 05, 2013 7:58 am**

Comments in the corpus should be easy to implement; that's definitely something for the next version.

Syllabic consonants are more difficult, (a) because the program currently doesn't really know what to do with segments that are defined as both consonants and vowels (it currently treats them as vowels by accident), (b) because consonant syllabicity is very much context-dependent, and (c) because not all consonants can become syllabic, so I would probably need to teach the program something about the sonority hierarchy. It's a good idea for a future feature though.
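To make points (b) and (c) concrete, here is a very rough sketch of one possible heuristic: treat a sonorant as a syllable nucleus when no vowel stands next to it. This is only an assumption about how the feature could be approached, not something the Frequentizer does. The function name `findNuclei` and the default sonorant list are invented, and a real solution would need a fuller sonority hierarchy.

```javascript
// Hypothetical heuristic: a sonorant with no adjacent vowel counts as syllabic.
// One character per segment is assumed; the sonorant list is a placeholder.
function findNuclei(word, vowels, sonorants = ["r", "l", "m", "n"]) {
  const segs = [...word];
  const isVowel = (s) => s !== undefined && vowels.includes(s);
  return segs.map((seg, i) => {
    if (isVowel(seg)) return { seg, nucleus: true };
    const syllabic =
      sonorants.includes(seg) &&
      !isVowel(segs[i - 1]) &&   // no vowel before (or word-initial)
      !isVowel(segs[i + 1]);     // no vowel after (or word-final)
    return { seg, nucleus: syllabic };
  });
}

const nuclei = findNuclei("niptr", ["a", "e", "i", "o", "u"])
  .filter((s) => s.nucleus)
  .map((s) => s.seg);
console.log(nuclei); // [ 'i', 'r' ]
```

In niptr the initial n is followed by a vowel and stays consonantal, while the final r has no vowel beside it and is counted as a nucleus. Two adjacent sonorants without a vowel would both be flagged, which is one reason a sonority ranking would still be needed.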

Posted: **Fri Jun 07, 2013 2:08 pm**

Version 0.4 of the Frequentizer is up. It can now give some word-level statistics, restrict the analysis to words of a certain length, determine the most commonly used bi- and trigrams within words, and report the frequency of syllable shapes of the type CV, CCV, CVC etc. And it supports comments in the corpus, as Sevly suggested. Everything from // to the next linebreak will be ignored.
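A small sketch of three of these features, for illustration only: stripping `//` comments, listing the bigrams within a word, and tallying syllable shapes. None of this is the Frequentizer's own code. The function names are invented, one character per segment is assumed, and the shape tally assumes the word has already been split into syllables with the MIDDLE DOT divider.

```javascript
// Hypothetical sketch, not the Frequentizer's actual code.

// Remove everything from "//" up to (but not including) the line break.
function stripComments(corpus) {
  return corpus.replace(/\/\/.*/g, "");
}

// List the overlapping two-character sequences within one word.
function bigrams(word) {
  const chars = [...word];
  const out = [];
  for (let i = 0; i + 1 < chars.length; i++) out.push(chars[i] + chars[i + 1]);
  return out;
}

// Turn one syllable into a shape string such as "CV" or "CVC".
function shapeOf(syllable, vowels) {
  return [...syllable].map((s) => (vowels.includes(s) ? "V" : "C")).join("");
}

// Tally the shapes of all syllables in a word that is already divided.
function shapeCounts(word, vowels, divider = "\u00b7") {
  const counts = {};
  for (const syl of word.split(divider)) {
    if (syl === "") continue;
    const shape = shapeOf(syl, vowels);
    counts[shape] = (counts[shape] ?? 0) + 1;
  }
  return counts;
}

console.log(stripComments("kan // a note\ntra")); // "kan " and "tra" on two lines
console.log(bigrams("kan"));                       // [ 'ka', 'an' ]
console.log(shapeCounts("kan\u00b7tra\u00b7sa", ["a"])); // { CVC: 1, CCV: 1, CV: 1 }
```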

Two tools for conlangers
