Two tools for conlangers

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Two tools for conlangers

Post by Cedh »

I have done some JavaScript coding recently. The results might be of interest to other conlangers, so I’m posting them here:

The Derivizer.
A simple tool that you can use while building/expanding your conlang’s lexicon. You can enter (some of) your language’s root words and derivational affixes, and use this script to suggest a few random derivatives and/or compounds, for which you can then try to come up with nice idiomatic meanings. I started writing this tool back in October 2012, and I’ve found it quite useful already. Of course, the exact degree of usefulness depends on the data you enter – I find it works best when you limit the input to a certain domain, e.g. only noun roots from a single semantic field, and only derivational affixes that can be attached to these nouns.
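
For anyone curious what the basic idea might look like in code, here is a minimal sketch. It is not the Derivizer's actual source: the function names and the hyphen convention for marking prefixes and suffixes are assumptions made up for illustration, and compounds are left out.

```javascript
// Hypothetical sketch; none of these names come from the real Derivizer.
// Assumed convention: a hyphen marks the side of the affix that attaches
// to the base, so "-ar" is a suffix and "ke-" is a prefix.
function attach(base, affix) {
  if (affix.startsWith("-")) return base + affix.slice(1);   // suffix
  if (affix.endsWith("-")) return affix.slice(0, -1) + base; // prefix
  throw new Error("Affix must start or end with a hyphen: " + affix);
}

// Pick a random element. `rng` defaults to Math.random but can be swapped
// for a deterministic function when testing.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Suggest `count` random derivatives, each built from one root and one affix.
function suggestDerivatives(roots, affixes, count, rng = Math.random) {
  const suggestions = [];
  for (let i = 0; i < count; i++) {
    suggestions.push(attach(pick(roots, rng), pick(affixes, rng)));
  }
  return suggestions;
}
```

Calling `suggestDerivatives(["tam", "kel"], ["-ar", "ke-"], 5)` would return five random forms to think up meanings for. Restricting `roots` and `affixes` to one semantic domain corresponds to the advice above.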

The Frequentizer.
I wanted to do a corpus-based phoneme frequency analysis for one of my conlangs today, but I couldn’t find a suitable online tool for this task in a quick round of googling, so I decided to write one myself. Even in the very first version, it can already (a) provide separate figures for vowels and consonants (and you can even define what counts as a vowel in your conlang), (b) handle user-defined di- and trigraphs correctly by treating them as single segments, and (c) arbitrarily combine different letters into a single phoneme, for instance accented and unaccented vowels. In a future version, the Frequentizer may also be able to assign the same grapheme to different phonemes depending on an orthographically predictable environment, but don’t hold your breath…
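
The three features (a) to (c) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Frequentizer's real code: the function names and the shape of the options object are hypothetical, and the corpus is assumed to contain nothing but letters and whitespace.

```javascript
// Hypothetical sketch, not the Frequentizer's source.
// Split a word into segments, preferring the longest matching multigraph.
function segment(word, multigraphs) {
  // Try longer graphemes first so that "tsh" wins over "ts" and "sh".
  const sorted = [...multigraphs].sort((a, b) => b.length - a.length);
  const segments = [];
  let i = 0;
  while (i < word.length) {
    const match = sorted.find((g) => word.startsWith(g, i));
    const seg = match || word[i]; // fall back to a single character
    segments.push(seg);
    i += seg.length;
  }
  return segments;
}

// Count segments, merging variants (e.g. "á" into "a") and sorting them
// into vowels and consonants according to a user-supplied vowel list.
function countPhonemes(text, { multigraphs = [], merge = {}, vowels = [] } = {}) {
  const counts = { vowels: {}, consonants: {} };
  for (const word of text.toLowerCase().split(/\s+/)) {
    for (const raw of segment(word, multigraphs)) {
      const seg = merge[raw] || raw;
      const bucket = vowels.includes(seg) ? counts.vowels : counts.consonants;
      bucket[seg] = (bucket[seg] || 0) + 1;
    }
  }
  return counts;
}
```

With `multigraphs: ["sh"]`, `merge: { "á": "a" }` and `vowels: ["a"]`, the digraph is counted as one consonant and the accented vowel is folded into its plain counterpart.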

I hope these tools will prove useful to some of you. There’s not much documentation for either at this point, but I recommend taking a look at the example data and testing some of the different settings in order to get a feeling for how they work. Have fun!

(Both of these are beta versions, so I can't guarantee that the scripts always work as they should... ;) )

KathTheDragon
Smeric
Posts: 2139
Joined: Thu Apr 25, 2013 4:48 am
Location: Brittania

Re: Two tools for conlangers

Post by KathTheDragon »

Um, your Frequentizer doesn't want to work for my con-lang. It only seems to recognise what you put in the special graphemes box, and even then, it won't count anything.

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Two tools for conlangers

Post by Cedh »

I forgot to provide a default value for one internal variable, to be used when the Special Graphemes box is empty. Thanks for the error report! Should be fixed now.

masako
Smeric
Posts: 1731
Joined: Sat Nov 06, 2004 4:31 pm
Location: 가매

Re: Two tools for conlangers

Post by masako »

cedh audmanh wrote:The Derivizer.
You're a beautiful person for doing this. Thank you.

KathTheDragon
Smeric
Posts: 2139
Joined: Thu Apr 25, 2013 4:48 am
Location: Brittania

Re: Two tools for conlangers

Post by KathTheDragon »

cedh audmanh wrote:I forgot to provide a default value for one internal variable, to be used when the Special Graphemes box is empty. Thanks for the error report! Should be fixed now.
Works great now!

Gray Richardson
Lebom
Posts: 80
Joined: Sun Jan 03, 2010 6:11 pm
Location: Austin, TX

Re: Two tools for conlangers

Post by Gray Richardson »

Super cool!

Bristel
Smeric
Posts: 1258
Joined: Mon Jun 01, 2009 3:07 pm
Location: Miracle, Inc. Headquarters

Re: Two tools for conlangers

Post by Bristel »

masako wrote:
cedh audmanh wrote:The Derivizer.
You're a beautiful person for doing this. Thank you.
It is very lovely.

I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?
[bɹ̠ˤʷɪs.təɫ]
Nōn quālibet inīquā cupiditāte illectus hoc agō
Yo te pongo en tu lugar...
Taisc mach Daró

Curlyjimsam
Lebom
Posts: 205
Joined: Wed Dec 29, 2004 11:57 am
Location: Elsewhere

Re: Two tools for conlangers

Post by Curlyjimsam »

These look amazing. Thank you.

Click
Avisaru
Posts: 620
Joined: Sun Mar 04, 2012 11:53 am

Re: Two tools for conlangers

Post by Click »

These two are awesome.

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Two tools for conlangers

Post by Cedh »

Bristel wrote:
cedh audmanh wrote:The Derivizer.
I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?
Not directly. But of course you can do it in several passes: First, enter only affixes of the innermost layer in the dependent morphemes box and use them to derive some complex stems. Second, add these stems to the lexical bases box, and put outer-layer affixes in the dependent morphemes box.

(Doing something like this directly would be an interesting feature for a future version, and there's the added motivational bonus that I'd have a use for it in several of my own languages. However, I don't have an idea offhand how to approach this without having to define separate input boxes for each slot...)
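
The several-passes workaround can be sketched in code like this. It is a hypothetical illustration only: the helper names and the hyphen convention for affixes are assumptions, and unlike the Derivizer it builds every combination instead of picking a few at random.

```javascript
// Hypothetical sketch of the multi-pass idea; not part of the Derivizer.
// Assumed convention: "-eb" is a suffix, "ga-" is a prefix.
function deriveAll(bases, affixes) {
  const out = [];
  for (const base of bases) {
    for (const affix of affixes) {
      out.push(
        affix.startsWith("-")
          ? base + affix.slice(1)      // suffix
          : affix.slice(0, -1) + base  // prefix
      );
    }
  }
  return out;
}

// Apply affix layers from the innermost outwards. After each pass the new
// stems are added to the pool of bases, just like moving them into the
// lexical bases box by hand.
function deriveInPasses(roots, layers) {
  let bases = [...roots];
  for (const layer of layers) {
    bases = bases.concat(deriveAll(bases, layer));
  }
  return bases;
}
```

For example, `deriveInPasses(["ker"], [["-eb"], ["ga-"]])` first builds `kereb` and then prefixes both `ker` and `kereb` in the second pass. One list per layer is exactly the "separate input for each slot" problem mentioned above, so this does not solve the interface question.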

Linguist Wannabe
Lebom
Posts: 91
Joined: Wed Oct 10, 2007 9:26 pm

Re: Two tools for conlangers

Post by Linguist Wannabe »

Cool!

Bristel
Smeric
Posts: 1258
Joined: Mon Jun 01, 2009 3:07 pm
Location: Miracle, Inc. Headquarters

Re: Two tools for conlangers

Post by Bristel »

cedh audmanh wrote:
Bristel wrote:
cedh audmanh wrote:The Derivizer.
I wonder if I can use it for a language with multiple prefix and suffix slots (as in Georgian), cedh?
Not directly. But of course you can do it in several passes: First, enter only affixes of the innermost layer in the dependent morphemes box and use them to derive some complex stems. Second, add these stems to the lexical bases box, and put outer-layer affixes in the dependent morphemes box.

(Doing something like this directly would be an interesting feature for a future version, and there's the added motivational bonus that I'd have a use for it in several of my own languages. However, I don't have an idea offhand how to approach this without having to define separate input boxes for each slot...)
I'd love that, really. I'm having a hard time quickly deriving words for Proto-Takayo, which is heavily polysynthetic, and even waiting a while for this to come about would be absolutely fine.

The affix slots can probably be filled up to 9 times, 4 before a root, and 5 after, and that's just the verb with a possible noun root incorporated. Not sure how I'm going to handle nouns, which is probably what this tool would be mostly useful for.
[bɹ̠ˤʷɪs.təɫ]
Nōn quālibet inīquā cupiditāte illectus hoc agō
Yo te pongo en tu lugar...
Taisc mach Daró

Sevly
Lebom
Posts: 214
Joined: Sat Mar 31, 2007 10:50 pm
Location: (x, y, z, t)

Re: Two tools for conlangers

Post by Sevly »

These are really helpful. Do you mind if I incorporate the Frequentizer into my online dictionary (with attribution, of course)? I'd like to take the input from the name and pronunciation fields in the database and pass it to your JavaScript, making it easy to maintain frequency statistics for the words in my lexicon.

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Two tools for conlangers

Post by Cedh »

Sevly wrote:These are really helpful. Do you mind if I incorporate the Frequentizer into my online dictionary (with attribution, of course)? I'd like to take the input from the name and pronunciation fields in the database and pass it to your JavaScript, making it easy to maintain frequency statistics for the words in my lexicon.
I like this idea, just go ahead!

In other news, I have just uploaded a new version of the Frequentizer. There are now separate grapheme fields for consonants and vowels which both work like the old Special Graphemes field. You can also choose what to do with characters that don't appear in those fields. And of course, the most significant change: The tool now has a basic understanding of syllable structure, so it can give separate statistics for onset and coda consonants. By default, it will treat the first consonant of every cluster plus all word-final consonants as belonging to a syllable coda; if you need different rules, you can add a syllable divider character (by default, the MIDDLE DOT ·) to your text corpus in places where the built-in rules do not give the intended syllabification.
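
A rough sketch of the default rule and the divider override, as I read the description above. This is not the Frequentizer's code: the function name is hypothetical, every character is assumed to be one segment (no digraphs), and word-initial consonants are assumed to be onsets, which the description does not state explicitly.

```javascript
// Hypothetical sketch of the default syllabification rule described above.
const DIVIDER = "\u00B7"; // MIDDLE DOT

function classifyConsonants(word, vowels) {
  const chars = [...word];
  const isVowel = (ch) => vowels.includes(ch);
  const onsets = [];
  const codas = [];
  let i = 0;
  while (i < chars.length) {
    if (isVowel(chars[i])) { i++; continue; }
    // Collect a maximal run of non-vowels (consonants and dividers).
    let j = i;
    while (j < chars.length && !isVowel(chars[j])) j++;
    const run = chars.slice(i, j);
    const cut = run.indexOf(DIVIDER);
    if (cut !== -1) {
      // An explicit divider overrides the built-in rule.
      codas.push(...run.slice(0, cut));
      onsets.push(...run.slice(cut + 1).filter((ch) => ch !== DIVIDER));
    } else if (i === 0) {
      onsets.push(...run);          // assumption: word-initial = onset
    } else if (j === chars.length) {
      codas.push(...run);           // word-final consonants = coda
    } else if (run.length === 1) {
      onsets.push(...run);          // single intervocalic consonant
    } else {
      codas.push(run[0]);           // first consonant of a cluster = coda
      onsets.push(...run.slice(1));
    }
    i = j;
  }
  return { onsets, codas };
}
```

With `vowels = ["a"]`, the word `strakta` yields `k` as the only coda, whereas `a·kta` puts both `k` and `t` in the onset because the divider comes before the cluster.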

Corundum
Sanci
Posts: 24
Joined: Thu Sep 20, 2007 10:37 pm
Location: at deictic center

Re: Two tools for conlangers

Post by Corundum »

Nice.

The Frequentizer would be even nicer if it reported rhyme frequencies,

and also if it could count the absence of codas as null codas.

legolasean
Sanci
Posts: 70
Joined: Fri May 04, 2012 4:27 am
Location: Caernarfon, Gwynedd, Wales

Re: Two tools for conlangers

Post by legolasean »

Man!!!! Thank you!!!!
F*** it's awesome!!!
languages I speak Hebrew, English, Welsh, Russian
languages I learn Latin, Arabic

Basilius
Avisaru
Posts: 398
Joined: Thu Nov 16, 2006 5:43 am
Location: Moscow, Russia

Re: Two tools for conlangers

Post by Basilius »

Cool!

Thank you, Cedh, especially for the Frequentizer!

Also, I suspect the Derivizer can be used to generate paradigms, with only minor modifications?
Corundum wrote:The Frequentizer would be even nicer if it reported rhyme frequencies,

and also if it could count the absence of codas as null codas.
And zero onsets?
Basilius

Torco
Smeric
Posts: 2372
Joined: Thu Aug 30, 2007 10:45 pm
Location: Santiago de Chile

Re: Two tools for conlangers

Post by Torco »

the frequentizer is fucking awesome

awesome i tell you !!

i used it to check if phonemes in a corpus follow zipf's law, using graphemes in spanish as a proxy, and looks like they don't.
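
For reference, a check like that can be sketched as follows. In its simplest form Zipf's law predicts that the frequency of the item at rank r is proportional to 1/r, so the sketch compares each observed share with that prediction. The function name and the output format are hypothetical, and the input is assumed to be a plain object of counts such as one copied out of the Frequentizer's results.

```javascript
// Hypothetical sketch: compare observed shares with the 1/rank prediction.
function compareWithZipf(counts) {
  // Sort segments from most to least frequent.
  const observed = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  const total = observed.reduce((sum, [, n]) => sum + n, 0);
  // The harmonic number normalises the 1/rank weights so they sum to 1.
  const harmonic = observed.reduce((sum, _, i) => sum + 1 / (i + 1), 0);
  return observed.map(([segment, n], i) => ({
    segment,
    rank: i + 1,
    observedShare: n / total,
    zipfShare: 1 / ((i + 1) * harmonic),
  }));
}
```

If the observed shares of the top-ranked segments fall well below the `zipfShare` column and the distribution is flatter overall, the data does not follow the simple 1/rank form.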

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Two tools for conlangers

Post by Cedh »

Basilius wrote:
Corundum wrote:The Frequentizer would be even nicer if it reported rhyme frequencies,
and also if it could count the absence of codas as null codas.
And zero onsets?
A more sophisticated syllable model is definitely on my to-do-list. Also, consonant cluster frequencies.
Basilius wrote:Also, I suspect the Derivizer can be used to generate paradigms, with only minor modifications?
I'm not so sure about this one. It might work for a purely agglutinative language, but once you have sandhi effects at the stem-affix boundary you'd need software in which you can specify (morpho)phonological rules. A sound change applier such as GSCA, which supports output for multiple dialects that you can repurpose for different morphological forms of the same word, is probably a much better starting point for a paradigm generator.

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Two tools for conlangers

Post by Cedh »

cedh audmanh wrote:
Basilius wrote:
Corundum wrote:The Frequentizer would be even nicer if it reported rhyme frequencies,
and also if it could count the absence of codas as null codas.
And zero onsets?
A more sophisticated syllable model is definitely on my to-do-list. Also, consonant cluster frequencies.
Done.

I've just uploaded version 0.3 of the Frequentizer, which has an improved syllable model that supports full and null onsets or codas, intervocalic consonants or consonant clusters, and can restrict the analysis to syllables in a certain position in the word. This means you can now ask for things like “vowels in word-medial syllables” or “syllable onsets in non-final syllables”. The program also presents the results in a shiny visual diagram now (made with the Chart.js library), and it has a proper open source license (FreeBSD).
Last edited by Cedh on Fri Jun 07, 2013 2:05 pm, edited 1 time in total.

KathTheDragon
Smeric
Posts: 2139
Joined: Thu Apr 25, 2013 4:48 am
Location: Brittania

Re: Two tools for conlangers

Post by KathTheDragon »

Is there any way to get the Frequentizer to recognise a consonant as syllabic without using another character? For example, my conlang has syllabic r as an allophone of consonantal r. This is a pain when dealing with syllables, as the word for 'brother' is niptr.

Sevly
Lebom
Posts: 214
Joined: Sat Mar 31, 2007 10:50 pm
Location: (x, y, z, t)

Re: Two tools for conlangers

Post by Sevly »

The Frequentizer is looking really good. A simple request that would be helpful to me—could we have comments in the text corpus, something like having the parser ignore text from // to the end of the line?

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Two tools for conlangers

Post by Cedh »

Comments in the corpus should be easy to implement; that's definitely something for the next version.

Syllabic consonants are more difficult, (a) because the program currently doesn't really know what to do with segments that are defined as both consonants and vowels (it currently treats them as vowels by accident), (b) because consonant syllabicity is very much context-dependent, and (c) because not all consonants can become syllabic, so I would probably need to teach the program something about the sonority hierarchy. It's a good idea for a future feature though.
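
One possible way to approach point (c), sketched here purely as an illustration: treat a consonant as syllabic when it is a local sonority peak, i.e. more sonorous than both of its neighbours. None of this exists in the Frequentizer. The four-step sonority scale, the class lists and the function names are all assumptions, and each character is taken to be one segment.

```javascript
// Hypothetical sketch of a sonority-peak heuristic; not in the Frequentizer.
// Assumed coarse scale: vowels > liquids > nasals > all other consonants.
const SONORITY = { vowel: 4, liquid: 3, nasal: 2, obstruent: 1 };

function sonority(ch, classes) {
  if (ch === undefined) return 0; // word boundary
  if (classes.vowels.includes(ch)) return SONORITY.vowel;
  if (classes.liquids.includes(ch)) return SONORITY.liquid;
  if (classes.nasals.includes(ch)) return SONORITY.nasal;
  return SONORITY.obstruent;
}

// Return the consonants that are strictly more sonorous than both
// neighbours. Vowels are skipped because they are always nuclei.
function syllabicConsonants(word, classes) {
  const chars = [...word];
  const result = [];
  chars.forEach((ch, i) => {
    if (classes.vowels.includes(ch)) return;
    const here = sonority(ch, classes);
    const before = sonority(chars[i - 1], classes);
    const after = sonority(chars[i + 1], classes);
    if (here > before && here > after) {
      result.push({ index: i, segment: ch });
    }
  });
  return result;
}
```

With `vowels: ["a", "i"]`, `liquids: ["r", "l"]` and `nasals: ["m", "n"]`, the final `r` of `niptr` is flagged as syllabic, while nothing is flagged in a word like `karn`, where `r` and `n` simply follow the vowel. This only addresses the context-dependence in (b) and (c); the ambiguity in (a) would still need handling.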

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Two tools for conlangers

Post by Cedh »

Version 0.4 of the Frequentizer is up. It can now give some word-level statistics, restrict the analysis to words of a certain length, determine the most commonly used bi- and trigrams within words, and report the frequency of syllable shapes of the type CV, CCV, CVC etc. And it supports comments in the corpus, as Sevly suggested. Everything from // to the next linebreak will be ignored.
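
Three of these features are simple enough to sketch in a few lines. Again, this is an illustration and not the Frequentizer's source: the function names are hypothetical, each character is treated as one segment, and the syllable is assumed to have been split off already.

```javascript
// Hypothetical sketches; not the Frequentizer's actual code.

// Remove everything from "//" to the end of each line.
function stripComments(corpus) {
  return corpus
    .split("\n")
    .map((line) => {
      const at = line.indexOf("//");
      return at === -1 ? line : line.slice(0, at);
    })
    .join("\n");
}

// Count n-grams within words, never across word boundaries.
function countNgrams(corpus, n) {
  const counts = {};
  for (const word of corpus.split(/\s+/)) {
    const chars = [...word];
    for (let i = 0; i + n <= chars.length; i++) {
      const gram = chars.slice(i, i + n).join("");
      counts[gram] = (counts[gram] || 0) + 1;
    }
  }
  return counts;
}

// Reduce a syllable to its shape, e.g. "tra" becomes "CCV".
function syllableShape(syllable, vowels) {
  return [...syllable].map((ch) => (vowels.includes(ch) ? "V" : "C")).join("");
}
```

Running `stripComments` before any counting keeps notes in the corpus from leaking into the statistics, which is the point of the comment feature.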
