Two tools for conlangers
I have done some JavaScript coding recently. The results might be of interest to other conlangers, so I’m posting them here:
The Derivizer.
A simple tool that you can use while building or expanding your conlang’s lexicon. You can enter (some of) your language’s root words and derivational affixes, and the script will suggest a few random derivatives and/or compounds, for which you can then try to come up with nice idiomatic meanings. I started writing this tool back in October 2012, and I’ve found it quite useful already. Of course, how useful it is depends on the data you enter. I find it works best when you limit the input to a certain domain, e.g. only noun roots from a single semantic field, and only derivational affixes that can be attached to these nouns.
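The basic idea can be illustrated with a minimal sketch. This is not the Derivizer's actual code: the function names `attachMorpheme` and `suggestDerivatives`, the hyphen convention for marking prefixes and suffixes, and the sample morphemes are all assumptions made for this example, and both input lists are assumed to be non-empty.

```javascript
// Hypothetical sketch, not the Derivizer's actual code.
// Convention assumed here: "mu-" is a prefix, "-ri" is a suffix,
// and an affix without a hyphen is simply appended (as in a compound).
function attachMorpheme(base, affix) {
  if (affix.endsWith("-")) return affix.slice(0, -1) + base;
  if (affix.startsWith("-")) return base + affix.slice(1);
  return base + affix;
}

// Suggest `count` random base + affix combinations.
// `random` can be swapped for a seeded generator to get repeatable output.
function suggestDerivatives(bases, affixes, count, random = Math.random) {
  const pick = (list) => list[Math.floor(random() * list.length)];
  const suggestions = [];
  for (let i = 0; i < count; i++) {
    suggestions.push(attachMorpheme(pick(bases), pick(affixes)));
  }
  return suggestions;
}

// Made-up sample data.
console.log(suggestDerivatives(["kama", "tolu", "nesi"], ["-ri", "mu-", "sen"], 5));
```

Restricting `bases` and `affixes` to one semantic domain, as recommended above, simply means passing shorter and more coherent lists.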
The Frequentizer.
I wanted to do a corpus-based phoneme frequency analysis for one of my conlangs today, but I couldn’t find a suitable online tool for this task in a quick round of googling, so I decided to write one myself. Even in the very first version, it can already (a) provide separate figures for vowels and consonants (and you can even define what counts as a vowel in your conlang), (b) handle user-defined di- and trigraphs correctly by treating them as single segments, and (c) arbitrarily combine different letters into a single phoneme, for instance accented and unaccented vowels. In a future version, the Frequentizer may also be able to assign the same grapheme to different phonemes depending on an orthographically predictable environment, but don’t hold your breath…
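Features (a) to (c) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Frequentizer's own implementation: the function name `countSegments` and its option names `vowels`, `multigraphs` and `merge` are invented here, and the sketch indexes text by UTF-16 code unit, so letters written with combining diacritics would need extra handling.

```javascript
// Hypothetical sketch: count segment frequencies in a corpus.
// `multigraphs` lists di-/trigraphs to treat as single segments;
// `merge` maps variant letters onto one phoneme (e.g. accented onto plain vowels).
function countSegments(text, { vowels, multigraphs = [], merge = {} }) {
  // Try longer graphemes first so that a trigraph wins over a digraph.
  const graphs = [...multigraphs].sort((a, b) => b.length - a.length);
  const counts = { vowels: {}, consonants: {} };
  const lower = text.toLowerCase();
  let i = 0;
  while (i < lower.length) {
    let seg = graphs.find((g) => lower.startsWith(g, i)) || lower[i];
    i += seg.length;
    if (!/\p{L}/u.test(seg)) continue; // skip spaces, digits, punctuation
    seg = merge[seg] || seg;
    const bucket = vowels.includes(seg) ? counts.vowels : counts.consonants;
    bucket[seg] = (bucket[seg] || 0) + 1;
  }
  return counts;
}

// Made-up corpus: "sh" is one consonant, and "á" counts as "a".
console.log(countSegments("shasha tá", {
  vowels: ["a", "e", "i", "o", "u"],
  multigraphs: ["sh"],
  merge: { "á": "a" },
}));
```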
I hope these tools will prove useful to some of you. There’s not much documentation for either at this point, but I recommend taking a look at the example data and testing some of the different settings in order to get a feeling for how they work. Have fun!
(Both of these are beta versions, so I can't guarantee that the scripts always work as they should... )
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
- KathTheDragon
- Smeric
- Posts: 2139
- Joined: Thu Apr 25, 2013 4:48 am
- Location: Brittania
Re: Two tools for conlangers
Um, your Frequentizer doesn't want to work for my conlang. It only seems to recognise what you put in the special graphemes box, and even then, it won't count anything.
Re: Two tools for conlangers
I forgot to provide a default value for one internal variable, to be used when the Special Graphemes box is empty. Thanks for the error report! Should be fixed now.
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
Re: Two tools for conlangers
cedh audmanh wrote: The Derivizer.
You're a beautiful person for doing this. Thank you.
- KathTheDragon
- Smeric
- Posts: 2139
- Joined: Thu Apr 25, 2013 4:48 am
- Location: Brittania
Re: Two tools for conlangers
cedh audmanh wrote: I forgot to provide a default value for one internal variable, to be used when the Special Graphemes box is empty. Thanks for the error report! Should be fixed now.
Works great now!
-
- Lebom
- Posts: 80
- Joined: Sun Jan 03, 2010 6:11 pm
- Location: Austin, TX
Re: Two tools for conlangers
Super cool!
-
- Smeric
- Posts: 1258
- Joined: Mon Jun 01, 2009 3:07 pm
- Location: Miracle, Inc. Headquarters
- Contact:
Re: Two tools for conlangers
cedh audmanh wrote: The Derivizer.
masako wrote: You're a beautiful person for doing this. Thank you.
It is very lovely.
I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?
[bɹ̠ˤʷɪs.təɫ]
Nōn quālibet inīquā cupiditāte illectus hoc agō
Yo te pongo en tu lugar...
Taisc mach Daró
- Curlyjimsam
- Lebom
- Posts: 205
- Joined: Wed Dec 29, 2004 11:57 am
- Location: Elsewhere
- Contact:
Re: Two tools for conlangers
These two are awesome.
Re: Two tools for conlangers
cedh audmanh wrote: The Derivizer.
Bristel wrote: I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?
Not directly. But of course you can do it in several passes: First, enter only affixes of the innermost layer in the dependent morphemes box and use them to derive some complex stems. Second, add these stems to the lexical bases box, and put outer-layer affixes in the dependent morphemes box.
(Doing something like this directly would be an interesting feature for a future version, and there's the added motivational bonus that I'd have a use for it in several of my own languages. However, I don't have an idea offhand how to approach this without having to define separate input boxes for each slot...)
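The several-pass workaround can be sketched in code as follows. This is only an illustration of the layering idea, not a planned Derivizer feature: the names `applyAffix` and `deriveLayers`, the hyphen convention for prefixes and suffixes, and the sample morphemes are assumptions. Unlike the Derivizer, which suggests random combinations, the sketch lists every combination so that the passes are easy to follow.

```javascript
// Hypothetical sketch of the multi-pass workaround: stems derived with one
// layer of affixes become the lexical bases for the next layer.
// Convention assumed here: "mu-" is a prefix, "-ri" is a suffix.
function applyAffix(base, affix) {
  if (affix.endsWith("-")) return affix.slice(0, -1) + base;
  if (affix.startsWith("-")) return base + affix.slice(1);
  return base + affix;
}

// `layers` is an array of affix lists, innermost layer first.
function deriveLayers(bases, layers) {
  let stems = [...bases];
  for (const affixes of layers) {
    const derived = [];
    for (const stem of stems) {
      for (const affix of affixes) derived.push(applyAffix(stem, affix));
    }
    // Keep the underived stems too, since a slot may stay empty.
    stems = stems.concat(derived);
  }
  return stems;
}

// Made-up data: one inner suffix layer, one outer prefix layer.
console.log(deriveLayers(["kama"], [["-ri"], ["mu-"]]));
```

The number of forms grows quickly with every layer, so for a language with many slots it is probably more practical to sample from the result than to read all of it.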
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
-
- Lebom
- Posts: 91
- Joined: Wed Oct 10, 2007 9:26 pm
-
- Smeric
- Posts: 1258
- Joined: Mon Jun 01, 2009 3:07 pm
- Location: Miracle, Inc. Headquarters
- Contact:
Re: Two tools for conlangers
cedh audmanh wrote: The Derivizer.
Bristel wrote: I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?
cedh audmanh wrote: Not directly. But of course you can do it in several passes: First, enter only affixes of the innermost layer in the dependent morphemes box and use them to derive some complex stems. Second, add these stems to the lexical bases box, and put outer-layer affixes in the dependent morphemes box. (Doing something like this directly would be an interesting feature for a future version, and there's the added motivational bonus that I'd have a use for it in several of my own languages. However, I don't have an idea offhand how to approach this without having to define separate input boxes for each slot...)
I'd love that, really. I'm having a hard time quickly deriving words for Proto-Takayo, which is heavily polysynthetic, and even waiting a while for this to come about would be absolutely fine.
The affix slots can probably be filled up to 9 times, 4 before a root, and 5 after, and that's just the verb with a possible noun root incorporated. Not sure how I'm going to handle nouns, which is probably what this tool would be mostly useful for.
[bɹ̠ˤʷɪs.təɫ]
Nōn quālibet inīquā cupiditāte illectus hoc agō
Yo te pongo en tu lugar...
Taisc mach Daró
Re: Two tools for conlangers
These are really helpful. Do you mind if I incorporate the Frequentizer into my online dictionary (with attribution, of course)? I'd like to pass the input in from the name and pronunciation fields in the database and pass them to your javascript, making it easy to maintain frequency statistics for the words in my lexicon.
Re: Two tools for conlangers
Sevly wrote: These are really helpful. Do you mind if I incorporate the Frequentizer into my online dictionary (with attribution, of course)? I'd like to pass the input in from the name and pronunciation fields in the database and pass them to your javascript, making it easy to maintain frequency statistics for the words in my lexicon.
I like this idea, just go ahead!
In other news, I have just uploaded a new version of the Frequentizer. There are now separate grapheme fields for consonants and vowels which both work like the old Special Graphemes field. You can also choose what to do with characters that don't appear in those fields. And of course, the most significant change: The tool now has a basic understanding of syllable structure, so it can give separate statistics for onset and coda consonants. By default, it will treat the first consonant of every cluster plus all word-final consonants as belonging to a syllable coda; if you need different rules, you can add a syllable divider character (by default, the MIDDLE DOT ·) to your text corpus in places where the built-in rules do not give the intended syllabification.
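The default rule can be sketched like this. It is one reading of the description above, not the Frequentizer's actual code. The function name `classifyConsonants` is invented, every letter is treated as one segment, word-initial consonants are assumed to be onsets, and each stretch between syllable dividers is assumed to be handled like a word of its own.

```javascript
// Hypothetical sketch of the default onset/coda rule for a single word.
// Assumptions: one letter = one segment, word-initial consonants are onsets,
// and a divider-delimited chunk is treated like a separate word.
function classifyConsonants(word, vowels, divider = "\u00b7") {
  const onsets = [];
  const codas = [];
  const isVowel = (s) => vowels.includes(s);
  for (const chunk of word.split(divider)) {
    const segs = [...chunk];
    let i = 0;
    while (i < segs.length) {
      if (isVowel(segs[i])) { i++; continue; }
      // Collect a run of consecutive consonants.
      let j = i;
      while (j < segs.length && !isVowel(segs[j])) j++;
      const cluster = segs.slice(i, j);
      if (j === segs.length && i > 0) {
        codas.push(...cluster);            // final consonants: all coda
      } else if (i === 0 || cluster.length === 1) {
        onsets.push(...cluster);           // initial, or single intervocalic
      } else {
        codas.push(cluster[0]);            // first consonant of a cluster
        onsets.push(...cluster.slice(1));  // the rest starts the next syllable
      }
      i = j;
    }
  }
  return { onsets, codas };
}

const v = ["a", "e", "i", "o", "u"];
console.log(classifyConsonants("astra", v));       // built-in rule
console.log(classifyConsonants("a\u00b7stra", v)); // divider overrides it
```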
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
Re: Two tools for conlangers
Nice.
The Frequentizer would be even nicer if it reported rhyme frequencies,
and also if it could count the absence of codas as null codas.
-
- Sanci
- Posts: 70
- Joined: Fri May 04, 2012 4:27 am
- Location: Caernarfon, Gwynedd, Wales
Re: Two tools for conlangers
Man!!!! Thank you!!!!
F*** it's awesome!!!
languages I speak Hebrew, English, Welsh, Russian
languages I learn Latin, Arabic
Re: Two tools for conlangers
Cool!
Thank you, Cedh, especially for the Frequentizer!
Also, I suspect the Derivizer can be used to generate paradigms, with only minor modifications?
Corundum wrote: The Frequentizer would be even nicer if it reported rhyme frequencies, and also if it could count the absence of codas as null codas.
And zero onsets?
Basilius
Re: Two tools for conlangers
the frequentizer is fucking awesome
awesome i tell you !!
i used it to check if phonemes in a corpus follow zipf's law, using graphemes in spanish as a proxy, and looks like they don't.
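One common way to run such a check is to fit a straight line to log(frequency) against log(rank): under Zipf's law the slope should be close to -1. The sketch below is only an assumption about how such a test could be done, not a description of what was actually done here. The function name `zipfSlope` is invented, the counts are made up, and at least two distinct segments are needed for the slope to be defined.

```javascript
// Hypothetical sketch: least-squares slope of log(frequency) against log(rank).
// `counts` is an object such as { a: 120, e: 60, ... }.
function zipfSlope(counts) {
  const freqs = Object.values(counts)
    .filter((n) => n > 0)
    .sort((a, b) => b - a); // rank 1 = most frequent
  const xs = freqs.map((_, i) => Math.log(i + 1));
  const ys = freqs.map((f) => Math.log(f));
  const mean = (values) => values.reduce((a, b) => a + b, 0) / values.length;
  const mx = mean(xs);
  const my = mean(ys);
  let num = 0;
  let den = 0;
  for (let i = 0; i < xs.length; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  return num / den;
}

// Made-up counts that follow 120 / rank exactly.
console.log(zipfSlope({ a: 120, e: 60, o: 40, i: 30 }));
```

A slope far from -1, or points that clearly bend away from the fitted line, would suggest that the inventory does not follow the law.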
Re: Two tools for conlangers
Corundum wrote: The Frequentizer would be even nicer if it reported rhyme frequencies, and also if it could count the absence of codas as null codas.
Basilius wrote: And zero onsets?
A more sophisticated syllable model is definitely on my to-do list. Also, consonant cluster frequencies.
Basilius wrote: Also, I suspect the Derivizer can be used to generate paradigms, with only minor modifications?
I'm not so sure about this one. It might work for a purely agglutinative language, but once you have sandhi effects at the stem-affix boundary you'd need software in which you can specify (morpho)phonological rules. A sound change applier such as GSCA, which supports output for multiple dialects that you can repurpose for different morphological forms of the same word, is probably a much better starting point for a paradigm generator.
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
Re: Two tools for conlangers
Corundum wrote: The Frequentizer would be even nicer if it reported rhyme frequencies, and also if it could count the absence of codas as null codas.
Basilius wrote: And zero onsets?
cedh audmanh wrote: A more sophisticated syllable model is definitely on my to-do list. Also, consonant cluster frequencies.
Done.
I've just uploaded version 0.3 of the Frequentizer, which has an improved syllable model that supports full and null onsets or codas as well as intervocalic consonants or consonant clusters, and can restrict the analysis to syllables in a certain position in the word. This means you can now ask for things like “vowels in word-medial syllables” or “syllable onsets in non-final syllables”. The program also presents the results in a shiny visual diagram now (made with the Chart.js library), and it has a proper open source license (FreeBSD).
Last edited by Cedh on Fri Jun 07, 2013 2:05 pm, edited 1 time in total.
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
- KathTheDragon
- Smeric
- Posts: 2139
- Joined: Thu Apr 25, 2013 4:48 am
- Location: Brittania
Re: Two tools for conlangers
Is there any way to get the Frequentizer to recognise a consonant as syllabic without using another character? For example, my conlang has syllabic r as an allophone of consonantal r. This is a pain when dealing with syllables, as the word for 'brother' is niptr.
Re: Two tools for conlangers
The Frequentizer is looking really good. A simple request that would be helpful to me—could we have comments in the text corpus, something like having the parser ignore text from // to the end of the line?
Re: Two tools for conlangers
Comments in the corpus should be easy to implement; that's definitely something for the next version.
Syllabic consonants are more difficult, (a) because the program currently doesn't really know what to do with segments that are defined as both consonants and vowels (it currently treats them as vowels by accident), (b) because consonant syllabicity is very much context-dependent, and (c) because not all consonants can become syllabic, so I would probably need to teach the program something about the sonority hierarchy. It's a good idea for a future feature though.
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
Re: Two tools for conlangers
Version 0.4 of the Frequentizer is up. It can now give some word-level statistics, restrict the analysis to words of a certain length, determine the most commonly used bi- and trigrams within words, and report the frequency of syllable shapes of the type CV, CCV, CVC etc. And it supports comments in the corpus, as Sevly suggested. Everything from // to the next linebreak will be ignored.
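Two of these features are easy to illustrate. The sketch below is a guess at the general technique, not the code of version 0.4: the function names `stripComments` and `countBigrams` are invented, the sample words are made up, and each letter is counted as one segment with di- and trigraphs ignored.

```javascript
// Hypothetical sketch: drop everything from "//" to the end of each line.
function stripComments(corpus) {
  return corpus.replace(/\/\/.*$/gm, "");
}

// Hypothetical sketch: count two-letter sequences inside words.
// Bigrams never span a word boundary, because each word is scanned separately.
function countBigrams(text) {
  const counts = {};
  for (const word of text.toLowerCase().match(/\p{L}+/gu) || []) {
    const segs = [...word];
    for (let i = 0; i + 1 < segs.length; i++) {
      const pair = segs[i] + segs[i + 1];
      counts[pair] = (counts[pair] || 0) + 1;
    }
  }
  return counts;
}

// Made-up corpus with a comment on the first line.
const sampleCorpus = "tala kanta // a greeting\nkana";
console.log(countBigrams(stripComments(sampleCorpus)));
```

Stripping the comments first matters: otherwise the words inside the comment would be counted as part of the corpus.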
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ