Methods of generating lexicon
Posted: Wed Oct 19, 2011 2:25 pm
by Dothraki_physicist
What are your favorite ways of inventing words for your conlangs? Do you prefer to work with texts and invent words as needed, or come up with them randomly? I want to see the varying range of opinions.
Re: Methods of generating lexicon
Posted: Wed Oct 19, 2011 2:33 pm
by Qwynegold
I use lists, come up with them randomly and invent them when I need them.
Re: Methods of generating lexicon
Posted: Wed Oct 19, 2011 3:45 pm
by blank stare II
I often look up word lists in other languages, for instance baby name lists, then put them through sound changes to make them fit my phonology and phonotactics. For instance, if the baby name list gave me these three names:
HINTO: means "blue hair" in Dakota Sioux.
HONOVI: claimed to mean "strong" or "strong deer" in Hopi.
HOTAH: means "gray" or "brown" in Sioux.
They would become:
Ciŋkø "blue"
Cøŋobhi "Strong"
Cota? "brown"
I often come up with a long list of meaningless words, then when I need a word I scan the list to find one that seems like it fits the meaning.
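The borrow-and-adapt step described in this post can be sketched as an ordered list of sound changes applied to each borrowed name. The rules below are hypothetical placeholders, not the poster's actual changes, so the outputs differ from the forms given above (the vowels are left alone here).

```python
import re

# Hypothetical ordered rules (regex pattern, replacement).
# These are illustrative assumptions, not the rules used in the post.
RULES = [
    (r"^h", "c"),    # word-initial h becomes c
    (r"h$", "ʔ"),    # word-final h becomes a glottal stop
    (r"nt", "ŋk"),   # the cluster nt becomes ŋk
    (r"v", "bh"),    # v becomes bh
]

def adapt(word, rules=RULES):
    """Lower-case a borrowed word and apply each sound change in order."""
    word = word.lower()
    for pattern, repl in rules:
        word = re.sub(pattern, repl, word)
    return word

borrowed = ["HINTO", "HONOVI", "HOTAH"]
adapted = [adapt(w) for w in borrowed]
print(adapted)  # ['ciŋko', 'conobhi', 'cotaʔ']
```

Rule order matters: the word-final h rule runs before v becomes bh, so the h introduced by that later rule is never touched.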
Re: Methods of generating lexicon
Posted: Wed Oct 19, 2011 5:37 pm
by Okuno
I generally come up with words as needed, though it can be a real pain. The plan is eventually (after the specific conlang has been tested well enough) that I'll figure out NSM in the lang, then move on to stuff like the Swadesh list, kinship lists, names and food lists. So, it's pretty much a hybrid system, with steady as-I-go building along with a relatively short period of from-list building.
I never really make up one word at a time, though. Like, I'll figure out that I need these dozen words for the next section of text, then I'll go and craft a bunch of phoneme sequences. Then I'll try to associate the sounds with the definitions, generally ending up with extra sounds and glosses left over for the next iteration.
Re: Methods of generating lexicon
Posted: Wed Oct 19, 2011 7:09 pm
by Psykie
I have a few methods:
Firstly, I use a very simple list. Like a naming language level of simple.
Then, I start thinking about actual people's names in the language, and then assign the syllables meaning.
Then I combine these roots from both methods and just find out what I can make (in a "Doodle-God" style, just adding concepts together and thinking about what they could mean.)
Then finally I write lists of associated words. E.g. If I have the word for "building", I can make the words for "to build", "builder" etc.
I also come up with words as I need them when translating, but I don't count this as a different method, as I use the same derivation method as I do for the previous systems.
Oh, and sometimes when I am away from my files, I will write a bunch of random sounds down and then ask people who know about my language (and the culture it is based around) what they think they would mean. For instance, I got the word "get" (timber for carpentry) from the word "ga" (wood) in this way.
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 12:50 am
by Ollock
I used to come up with words on the fly, but more recently I have used awkwords to generate roots. I will plug in my phonotactics and generate 3000 roots, which I keep in a Google Doc to be assigned later. When I assign them, I italicize them in the doc (keeping them there in case I catch a whim to deliberately create a homophone).
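A workflow like this one (generate a large pool of roots up front, then mark them as they get assigned) could look roughly like the sketch below. The phoneme inventory, the (C)V(C) syllable shape, the probabilities, and the function names are all assumptions made for illustration; Awkwords itself is not involved.

```python
import random

# Hypothetical phonotactics: (C)V(C) syllables, roots of one or two syllables.
CONSONANTS = "ptkmnsl"
VOWELS = "aeiou"

def syllable(rng):
    """Build one (C)V(C) syllable; onset and coda are optional."""
    onset = rng.choice(CONSONANTS) if rng.random() < 0.8 else ""
    coda = rng.choice(CONSONANTS) if rng.random() < 0.3 else ""
    return onset + rng.choice(VOWELS) + coda

def generate_roots(count, rng):
    """Generate `count` distinct roots."""
    roots = set()
    while len(roots) < count:
        roots.add("".join(syllable(rng) for _ in range(rng.randint(1, 2))))
    return sorted(roots)

def assign(pool, gloss, rng):
    """Give a random unassigned root a meaning; it stays in the pool, marked."""
    free = [root for root, meaning in pool.items() if meaning is None]
    root = rng.choice(free)
    pool[root] = gloss
    return root

rng = random.Random(2011)
pool = {root: None for root in generate_roots(3000, rng)}  # None = unassigned
water = assign(pool, "water", rng)
print(water, sum(meaning is None for meaning in pool.values()))
```

Assigned roots stay in the pool with their gloss, which mirrors keeping them in the document in case a deliberate homophone is wanted later.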
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 12:52 am
by Nannalu
I sometimes use awkwords but that's quite rare.
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 12:57 am
by Bristel
I use Awkwords and remove the roots that are disallowed.
I haven't gotten used to the syntax rules for phoneme weights and such, so most of the roots in my conlangs have an inordinate number of phonemes that should be rarer.
Awkwords is down right now; apparently someone didn't pay their hosting fee.
Does anyone know the person who created Awkwords?
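On the phoneme-weight problem mentioned in this post: the general idea behind weighting is just a weighted random choice, so rare phonemes get small weights. Below is a minimal sketch using Python's `random.choices`; the inventory and the weights are made up for illustration and have nothing to do with Awkwords' own syntax.

```python
import random
from collections import Counter

# Hypothetical inventory; a larger weight means a more frequent phoneme.
CONSONANT_WEIGHTS = {"t": 30, "k": 25, "n": 20, "s": 15, "q": 2}

def pick_consonant(rng):
    """Draw one consonant in proportion to its weight."""
    phonemes = list(CONSONANT_WEIGHTS)
    weights = list(CONSONANT_WEIGHTS.values())
    return rng.choices(phonemes, weights=weights, k=1)[0]

rng = random.Random(0)
counts = Counter(pick_consonant(rng) for _ in range(10_000))
for phoneme, n in counts.most_common():
    print(phoneme, n)
```

With these weights, q should turn up in roughly 2 out of every 92 draws, so it stays rare without being removed from the inventory.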
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 12:59 am
by Bob Johnson
I used awkwords for about 5 seconds before spending 5 minutes to write a better version of it in Python.
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 1:00 am
by Nannalu
It's still here for the time being:
http://bprhad.wz.cz/awkwords/
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 1:23 am
by Ollock
Bob Johnson wrote:I used awkwords for about 5 seconds before spending 5 minutes to write a better version of it in Python.
Would you care to share your version with us?
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 2:15 am
by Bob Johnson
It ain't pretty. Feel free to use it for whatever.
Code:
#!/usr/bin/env python3
import sys
import random
import re

# One spec line: SYMBOL, an optional "(weight)", "=", then the expansion(s).
rx_line = re.compile(r"^\s*([^(=]+?)\s*(\((\s*[0-9]+\s*)\))?\s*=(.*?)\s*$")

main_list = []  # [symbol, weight, expansion] triples, in file order
main_dict = {}  # symbol -> list of [weight, expansion]
main_tots = {}  # symbol -> sum of its weights


def load(fname):
    """Read a spec file into main_list."""
    sep = "/"
    # UTF-8 is assumed so that phonemes such as ŋ load on any platform.
    with open(fname, encoding="utf-8") as f:
        for line in f:
            if line[0] in "#\n\r":
                continue  # comment or blank line
            mat = rx_line.match(line)
            if not mat:
                print("# Could not parse line: " + line, end="")
                continue
            key = mat.group(1)
            if key == "/":
                sep = mat.group(4)  # a "/=x" line changes the separator
                continue
            freq = int(mat.group(3) or "10")  # default weight is 10
            for x in mat.group(4).split(sep):
                main_list.append([key, freq, x])


def build():
    """Index main_list by symbol and total up the weights."""
    global main_dict, main_tots
    main_dict = {}
    main_tots = {}
    for key, freq, exp in main_list:
        main_tots[key] = main_tots.get(key, 0) + freq
        main_dict[key] = main_dict.get(key, []) + [[freq, exp]]


def expand_one(key, depth):
    """Pick one expansion of a symbol, weighted, and expand it further."""
    r = random.randint(1, main_tots[key])
    for freq, exp in main_dict[key]:
        if r <= freq:
            return expand(exp, depth + 1)
        r -= freq
    raise Exception("internal inconsistency: randint out of range")


def expand(pat, depth):
    """Expand every symbol in a pattern; other characters pass through."""
    if pat == "":
        return ""
    if depth > 25:
        return "[...]"
    ret = ""
    while pat:
        for x in main_dict:
            if pat.startswith(x):
                ret += expand_one(x, depth)
                pat = pat[len(x):]
                break
        else:
            # No symbol matches here: copy one literal character.
            ret += pat[0]
            pat = pat[1:]
    return ret


if len(sys.argv) < 4:
    print("Usage: wordgen <spec file> <count> <word pattern>")
    sys.exit(1)
load(sys.argv[1])
build()
for x in range(int(sys.argv[2])):
    print(expand(sys.argv[3], 0) + "\t\t")
Then (in case you don't have enough wizardry to decode the regex) the spec file looks like this:
Code:
C(128)=t
C(112)=p
C(101)=k
C( 91)=n
C( 86)=s
C( 83)=d
C( 82)=r
C( 74)=g
C( 59)=m
C( 46)=b
C( 34)=x
C( 25)=ŋ
C( 20)=v
C( 16)=z
C( 15)=f
C( 12)=h
C(  9)=w
C(  7)=j
V=i/e/a/o/u
F=C
I(100)=
I(900)=C
T(200)=
T(800)=F
S=IVT
W(  1)=V
W( 30)=CV
W( 19)=VF
W(100)=CVF
W(500)=SS
W(300)=SSS
W( 50)=SSSS
L1=S
L2=SS
L3=SSS
L4=SSSS
A # at the beginning of a line marks a comment, blank lines are ignored, and right-hand sides can't contain a /. Deeply recursive symbols will stop after 25 expansions.
Symbols are case sensitive; they don't have to be 1 character, but remember to flip the switch to More Magic if you try longer ones.
The numbers in parens are weights out of the total for that symbol (a line with no number gets the default weight of 10); you can also put x/y/z on the right side, which will all have the same weight, so switching from awkwords should be easy enough.
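To see what the weights mean in practice, here is a small sketch that turns the W weights from the spec file above into shares of the output. The helper name `shares` is made up for this example; the numbers are copied from the spec.

```python
# Weights copied from the W lines of the spec file above.
W_WEIGHTS = {"V": 1, "CV": 30, "VF": 19, "CVF": 100, "SS": 500, "SSS": 300, "SSSS": 50}

def shares(weights):
    """Convert per-symbol weights into fractions of the symbol's total."""
    total = sum(weights.values())
    return {pattern: weight / total for pattern, weight in weights.items()}

w_shares = shares(W_WEIGHTS)
for pattern, share in w_shares.items():
    print(f"{pattern:5s} {share:.1%}")
```

The W weights add up to 1000, so each number reads directly as a per-mille share: two-syllable words (SS) make up half the output, and a bare vowel turns up about once per thousand words.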
You then run it as ./gen.py specfile.txt 1000 W
to get 1000 'words' of various lengths as defined by the W symbol.
(Hopefully this doesn't crash the board this time)
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 9:04 am
by Izambri
lordofthestrings wrote:What are your favorite ways of inventing words for your conlangs? Do you prefer to work with texts and invent words as needed, or come up with them randomly? I want to see the varying range of opinions.
That comes in different ways:
1. Translating texts or sentences. One of my favorites, especially if the piece I translate is interesting. I search for texts (my conworld stuff, LOTR, etc.) or I take on some of the translation challenges here on the ZBB.
2. Expanding lexicons from a single word/root. If I create a new word I can trace the way back to its original root, and from that root I can create many more words. I really like that too, since etymology is one of my favourite parts of linguistics, so I put a lot of care into it when making my conlangs. For Hellesan I have its mother tongue, Peran, and the mother tongue of that one, Sate, which is the ultimate main source for Hellesan roots.
3. Asking myself what the word for X would be in my conlang. Many words in natural languages have odd etymologies or evolutions, and that's an important source of inspiration in my word creation.
About my methods to create new words:
1. Creating words from nothing. One of the main sources for word creation. Put together some vowels and consonants in a nice-sounding fashion and it's almost done; then I modify the word to fit the conlang's syllable structure. It's merely an aesthetic thing.
2. Creating words from natural languages. In my conworld there's an alternative PIE, so some roots are taken from it and then slightly modified. So if I steal from natural languages, I take roots (adapted afterwards), not words.
3. Creating words randomly. Basically using word generators, for toponymy.
4. Isolating words or roots from existing names. That is, from personal names or placenames in a certain conlang I isolate parts (generally roots) and assign them meanings. Sometimes I find that a segment of a proper name already coincides with an existing root, so the meaning of that name (or part of its meaning) is already determined.
5. Creating words from existing words. This is making compound words, or deriving words from other words (postverbal nouns, etc.).
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 11:59 am
by Alces
For an a priori language, I use a generator (usually hand-coded for the specific language in Python) to get some basic roots, often tweaking them from the generated form to make them sound better, then usually start by assigning meanings for Swadesh list words and whatever other meanings I think of. Then when making daughter languages I try to derive as many words as I can out of this small basic lexicon, though still adding words to the original language if needed.
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 12:02 pm
by din
I've worked on word derivation extensively, because it's my favourite part, so I generally try to combine elements of the things I have and see what they could mean, and how that meaning could have changed over time.
I sometimes use thematic word lists. I also often just create the words I need to form an interesting example sentence for the grammar document, and get carried away and derive a bunch of related words, or words that use the same morphemes.
All in all it's a long, slow process that involves a lot of love and care ;)
Re: Methods of generating lexicon
Posted: Thu Oct 20, 2011 7:59 pm
by Soap
For me there are several methods:
1) Word generators. I tend to use these only for conlangs that are mostly CV, which works because the great Ursprache of all my conlangs, Eʔoqaaniam, was almost entirely CV (except for a few syllabic consonants like the final -m in the name). Pabappa started out that way, but not much of the original word-gen content remains since I've completely recast the language with a parent language of its own.
2) "Babble". I look at something and try to give it a name without thinking about what I'm saying. Usually I come up with something that sounds like baby talk. I did this mostly when I was younger, and most of my words seemed to begin with vowels and have at most one consonant. This is how I started working with tonal conlangs.
3) Tinkering with existing words to try to mend together word families ... e.g. like the time I discovered the word for "apron" was only one letter off of something that could be derived from "tied in back", so I changed it. More extreme examples of this are when I turned "ambo" into "sanala" in a notebook somewhere.
4) Mathematically adding words to other words in an alphabet in which all the letters have numeric values. I haven't touched this since about 10 years ago because I've gotten away from loglangs. But it was great when I had it. In the original Moonshine, *EVERY* word could be tied to every other word with math.
5) Loans, of course, and deriving from people's names into common terms, best if buried by long chains of sound changes, e.g. Magdalene > maudlin, Bethlehem > bedlam is great since half of English speakers don't even realize they used to be proper names.
6) Borrowing from old dead conlangs like what I used to work on in the 90s and which can't be integrated into my new conworld, e.g. the name "Lypelpyp" comes from a list of names from 2005 or so, much older than the language it now belongs to. Not much left of that though.
7) Also, dreams. Like the time I dreamt about a wonderful land called Qoqendoq and after sound changes it ended up as a word spelled kakancak.
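Method 4 above doesn't say how the arithmetic actually worked. As one possible reading only, the sketch below gives each letter its position in the alphabet as a value and adds two words letter by letter, wrapping around at the end of the alphabet. The alphabet, the padding rule, and the function name are all assumptions, not a reconstruction of the original system.

```python
# Hypothetical alphabet; any ordered set of letters would do.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
VALUE = {letter: index for index, letter in enumerate(ALPHABET)}

def add_words(a, b):
    """Add two words letter by letter, wrapping around the alphabet.

    The shorter word is padded with the zero-valued letter, so the
    extra letters of the longer word pass through unchanged.
    """
    width = max(len(a), len(b))
    a = a.ljust(width, ALPHABET[0])
    b = b.ljust(width, ALPHABET[0])
    return "".join(
        ALPHABET[(VALUE[x] + VALUE[y]) % len(ALPHABET)] for x, y in zip(a, b)
    )

print(add_words("cat", "b"))  # dat
print(add_words("z", "b"))    # a
```

Under this reading any two words are related by some third word (their letter-wise difference), which is one way every word could be "tied to every other word with math".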
Re: Methods of generating lexicon
Posted: Thu Nov 03, 2011 5:32 pm
by Bedelato
Just a couple of weeks ago I discovered the esoteric programming language Thue. It's basically just string rewriting; the program is an initial string and a set of rewrite rules, which are applied at random.
I've used it to make a Bengedian word generator. Hasn't seen much use yet, but yeah.
Here's a link to a nice Thue interpreter in Java.
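For anyone who wants to try the idea without a Thue interpreter, the same kind of random string rewriting can be sketched in a few lines of Python. This is not Thue syntax and not the Bengedian generator; the rules below are a made-up CV(C) toy grammar.

```python
import random
import re

# Hypothetical rewrite rules (left-hand side, right-hand side).
# Uppercase letters are placeholders; lowercase letters are final sounds.
RULES = [
    ("W", "CV"), ("W", "CVC"),
    ("C", "p"), ("C", "t"), ("C", "k"),
    ("V", "a"), ("V", "i"),
]

def rewrite(start, rules, rng):
    """Apply randomly chosen rules at random positions until none applies."""
    s = start
    while True:
        # Every (position, rule) pair that could fire on the current string.
        options = [
            (m.start(), lhs, rhs)
            for lhs, rhs in rules
            for m in re.finditer(re.escape(lhs), s)
        ]
        if not options:
            return s
        pos, lhs, rhs = rng.choice(options)
        s = s[:pos] + rhs + s[pos + len(lhs):]

rng = random.Random(42)
words = [rewrite("W", RULES, rng) for _ in range(10)]
print(words)
```

Because the choice is uniform over every rule and position that matches, repeating a rule in the list is a crude way to make its outcome more frequent.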
Re: Methods of generating lexicon
Posted: Mon Aug 31, 2015 2:52 pm
by Pabappa
Just curious if anyone has the "better" version of Awkwords online somewhere.
http://bprhad.wz.cz/awkwords/ still exists, but the site it says it has moved to is gone. Therefore I don't know if I'm looking at the old version or the new, or something entirely different.
Re: Methods of generating lexicon
Posted: Mon Aug 31, 2015 2:58 pm
by Cedh
SoapBubbles wrote:Just curious if anyone has the "better" version of Awkwords online somewhere.
http://bprhad.wz.cz/awkwords/ still exists, but the site it says it has moved to is gone. Therefore I don't know if I'm looking at the old version or the new, or something entirely different.
http://akana.conlang.org/tools/awkwords/
Re: Methods of generating lexicon
Posted: Mon Aug 31, 2015 4:53 pm
by CatDoom
When I was coming up with roots for a recent a priori conlang idea I had, I pulled out The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World, which has a nice thematic dictionary that takes up most of the book. Similar to the method blank stare II described, I took roots I liked and altered them to conform to my language's phonology and phonotactics. I didn't apply any consistent sound changes, since I wasn't actually trying to make anything like an Indo-European language; I just sort of substituted sounds on the fly that were vaguely similar, and made changes when I didn't like the results.
The method's not a whole lot different from just taking a Buck List or something similar and making up words for each entry, but having the Indo-European roots was useful to me as inspiration, if that makes sense. The method does have some obvious weaknesses: for one, it only works if you use a lexicon from a language that's structurally similar to the one you're making (I knew I wanted monosyllabic roots but no tones, which was the main reason I went with Indo-European). Furthermore, unless you deliberately choose to combine or split vocabulary terms here or there, you're going to wind up with a language that assigns meanings to words in exactly the same way that the model you're using does, which is kind of boring.
Re: Methods of generating lexicon
Posted: Mon Aug 31, 2015 7:15 pm
by احمکي ارش-ھجن
I often make up words, but I constantly and carefully consider the etymology those words might have.