Methods of generating lexicon

Dothraki_physicist
Sanci
Posts: 15
Joined: Fri Oct 22, 2010 11:02 am
Location: Republic of Cascadia

Methods of generating lexicon

Post by Dothraki_physicist »

What are your favorite ways of inventing words for your conlangs? Do you prefer to work with texts and invent words as needed, or come up with them randomly? I want to see the varying range of opinions.
Sheogorath wrote:You know, I was there for that whole sordid affair. Marvelous times! Butterflies, blood, a Fox and a severed head... Oh, and the cheese! To die for.

Qwynegold
Smeric
Posts: 1606
Joined: Thu May 24, 2007 11:34 pm
Location: Stockholm

Re: Methods of generating lexicon

Post by Qwynegold »

I use lists, come up with them randomly and invent them when I need them.
My most recent quiz:
Eurovision Song Contest 2018

blank stare II
Lebom
Posts: 127
Joined: Sat Jun 18, 2005 4:34 pm
Location: second to the right and straight on till morning

Re: Methods of generating lexicon

Post by blank stare II »

I often look up word lists in other languages, such as baby name lists, then put them through sound changes to make them fit my phonology and phonotactics. For instance, if the baby name list gave me these three names:

HINTO: means "blue hair" in Dakota Sioux.
HONOVI: claimed to mean "strong" or "strong deer" in Hopi.
HOTAH: means "gray" or "brown" in Sioux.

they would become:
Ciŋkø "blue"
Cøŋobhi "Strong"
Cota? "brown"
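
A minimal sketch of this kind of adaptation in Python, assuming the changes can be written as ordered find-and-replace rules. The rule table and the function name `adapt` are invented for illustration; they are not blank stare II's actual sound changes.

```python
import re

# Hypothetical ordered rules (regex pattern, replacement); invented for
# illustration, not the poster's actual sound changes.
RULES = [
    (r"h$", "?"),    # word-final h becomes a glottal stop, written "?"
    (r"o$", "ø"),    # word-final o is fronted
    (r"nt", "ŋk"),   # the cluster nt becomes ŋk
    (r"h", "c"),     # any remaining h becomes c
    (r"v", "bh"),    # v becomes bh
]

def adapt(word, rules=RULES):
    """Lowercase a borrowed word and apply each rule once, in order."""
    result = word.lower()
    for pattern, replacement in rules:
        result = re.sub(pattern, replacement, result)
    return result

for name in ["HINTO", "HONOVI", "HOTAH"]:
    print(name, "->", adapt(name))
```

With these made-up rules HINTO and HOTAH come out as ciŋkø and cota?, but HONOVI comes out as conobhi rather than Cøŋobhi, so the real adaptation presumably involved more judgment than a fixed rule list.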

I often come up with a long list of meaningless words, then when I need a word I scan the list to find one that seems like it fits the meaning.
I get a big kick out of playing my own language game–it’s a unique thrill only conlangers know.
- J Burke

Okuno
Sanci
Posts: 31
Joined: Fri Aug 05, 2011 12:29 pm

Re: Methods of generating lexicon

Post by Okuno »

I generally come up with words as needed, though it can be a real pain. The plan is eventually (after the specific conlang has been tested well enough) that I'll figure out NSM in the lang, then move on to stuff like the Swadesh list, kinship lists, names and food lists. So, it's pretty much a hybrid system, with steady as-I-go building along with a relatively short period of from-list building.

I never really make up one word at a time, though. Like, I'll figure out that I need these dozen words for the next section of text, I'll go and craft a bunch of phoneme-sequences. Then, I'll try to associate the sound with the definition, generally ending up with extra sounds and glosses left over for the next iteration.
My Conlang Site which pretty much only has Tayéin.
Still under construction, but at least I did some photoshop.

Psykie
Niš
Posts: 8
Joined: Sat Dec 29, 2007 1:01 pm
Location: Balwyn, VIC, Australia

Re: Methods of generating lexicon

Post by Psykie »

I have a few methods:

Firstly, I use a very simple list. Like a naming language level of simple.

Then, I start thinking about actual people's names in the language, and then assign the syllables meaning.

Then I combine these roots from both methods and just find out what I can make (in a "Doodle-God" style, just adding concepts together and thinking about what they could mean.)

Then finally I write lists of associated words. E.g. If I have the word for "building", I can make the words for "to build", "builder" etc.

I also come up with words as I need them when translating, but I don't count this as a different method, as I use the same derivation method as I do for the previous systems.

Oh, and sometimes when I am away from my files, I will write a bunch of random sounds down and then ask people who know about my language (and the culture it is based around) what they think it would mean. For instance, I got the word "get" (timber for carpentry) from the word "ga" (wood) in this way.

Ollock
Avisaru
Posts: 523
Joined: Sun Apr 30, 2006 3:04 pm

Re: Methods of generating lexicon

Post by Ollock »

I used to come up with words on the fly, but more recently I have used awkwords to generate roots. I will plug in my phonotactics and generate 3000 roots, which I keep in a Google Doc to be assigned later. When I assign them, I italicize them in the doc (keeping them there in case I catch a whim to deliberately create a homophone).
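
For anyone without awkwords to hand, the generate-a-pool-and-assign-later workflow can be roughed out in Python. This is only a sketch: the inventory, the (C)V(C) syllable shape, and the names `make_roots` and `assign` are assumptions for illustration, not awkwords syntax and not Ollock's actual phonotactics.

```python
import random

# Assumed toy inventory and syllable shape (C)V(C); purely illustrative.
CONSONANTS = "ptkmnsl"
VOWELS = "aeiou"

def make_syllable(rng):
    """Build one syllable with an optional onset and an optional coda."""
    onset = rng.choice(CONSONANTS) if rng.random() < 0.8 else ""
    coda = rng.choice(CONSONANTS) if rng.random() < 0.3 else ""
    return onset + rng.choice(VOWELS) + coda

def make_roots(count, rng, max_syllables=2):
    """Return `count` distinct roots of one to `max_syllables` syllables."""
    roots = set()
    while len(roots) < count:
        n = rng.randint(1, max_syllables)
        roots.add("".join(make_syllable(rng) for _ in range(n)))
    return sorted(roots)

def assign(pool, lexicon, root, gloss):
    """Record a meaning for a root; the root stays in the pool."""
    if root not in pool:
        raise KeyError(root)
    lexicon.setdefault(root, []).append(gloss)  # a list allows homophones

rng = random.Random(0)  # fixed seed so the run is repeatable
pool = make_roots(3000, rng)
lexicon = {}
assign(pool, lexicon, pool[0], "water")
print(len(pool), "roots;", len(lexicon), "assigned")
```

Keeping assigned roots in the pool and tracking glosses in a separate dictionary plays the same role as italicizing them in the doc: you can see what is taken and still reuse a form deliberately.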
George Corley
Producer and Moderating Host, Conlangery Podcast

Nannalu
Avisaru
Posts: 698
Joined: Sun Aug 15, 2010 5:00 pm
Location: United Kingdom

Re: Methods of generating lexicon

Post by Nannalu »

I sometimes use awkwords but that's quite rare.
næn:älʉː

Bristel
Smeric
Posts: 1258
Joined: Mon Jun 01, 2009 3:07 pm
Location: Miracle, Inc. Headquarters

Re: Methods of generating lexicon

Post by Bristel »

I use Awkwords and remove the roots that are disallowed.

I haven't gotten used to the syntax rules for phoneme weights and such, so most of the roots in my conlangs have an inordinate number of phonemes that should be rarer.

Awkwords is down right now; apparently someone didn't pay their hosting fee.

Does anyone know the person who created Awkwords?
[bɹ̠ˤʷɪs.təɫ]
Nōn quālibet inīquā cupiditāte illectus hoc agō
Yo te pongo en tu lugar...
Taisc mach Daró

Bob Johnson
Avisaru
Posts: 704
Joined: Fri Dec 03, 2010 9:41 am
Location: NY, USA

Re: Methods of generating lexicon

Post by Bob Johnson »

I used awkwords for about 5 seconds before spending 5 minutes to write a better version of it in Python.

Nannalu
Avisaru
Posts: 698
Joined: Sun Aug 15, 2010 5:00 pm
Location: United Kingdom

Re: Methods of generating lexicon

Post by Nannalu »

It's still here for the time being: http://bprhad.wz.cz/awkwords/
næn:älʉː

Ollock
Avisaru
Posts: 523
Joined: Sun Apr 30, 2006 3:04 pm

Re: Methods of generating lexicon

Post by Ollock »

Bob Johnson wrote: I used awkwords for about 5 seconds before spending 5 minutes to write a better version of it in Python.
Would you care to share your version with us?
George Corley
Producer and Moderating Host, Conlangery Podcast

Bob Johnson
Avisaru
Posts: 704
Joined: Fri Dec 03, 2010 9:41 am
Location: NY, USA

Re: Methods of generating lexicon

Post by Bob Johnson »

It ain't pretty. Feel free to use it for whatever.

Code:

#!/usr/bin/env python3

import sys
import random
import re

# One spec line: SYMBOL, optional "(weight)", "=", then the expansion(s).
rx_line = re.compile(r"^\s*([^(=]+?)\s*(\((\s*[0-9]+\s*)\))?\s*=(.*?)\s*$")
main_list = []  # [symbol, weight, expansion] triples in file order
main_dict = {}  # symbol -> list of [weight, expansion]
main_tots = {}  # symbol -> sum of weights

def load(fname):
    sep = "/"

    # The spec file is assumed to be UTF-8 (it may contain letters like ŋ).
    with open(fname, 'r', encoding="utf-8") as f:
        for line in f:
            # Skip comments and blank lines.
            if line[0] in "#\n\r": continue

            mat = rx_line.match(line)
            if not mat:
                print("# Could not parse line: " + line, end="")
                continue

            key = mat.group(1)
            if key == "/":
                # A line defining "/" changes the separator between alternatives.
                sep = mat.group(4)
                continue
            freq = int(mat.group(3) or "10")  # weight defaults to 10
            exp = mat.group(4).split(sep)

            for x in exp:
                main_list.append([key, freq, x])

def build():
    # Rebuild the per-symbol tables from the flat list.
    main_dict.clear()
    main_tots.clear()

    for key, freq, exp in main_list:
        main_tots[key] = main_tots.get(key, 0) + freq
        main_dict.setdefault(key, []).append([freq, exp])

def expand_one(key, depth):
    # Weighted pick: draw r in 1..total, then walk the alternatives,
    # subtracting each weight until r falls inside one of them.
    r = random.randint(1, main_tots[key])

    for weight, exp in main_dict[key]:
        if r <= weight:
            return expand(exp, depth+1)
        r -= weight

    raise Exception("internal inconsistency: randint out of range")

def expand(pat, depth):
    if pat == "": return ""
    if depth > 25: return "[...]"

    ret = ""

    while pat:
        # Symbols are tried in the order they first appear in the spec
        # (dicts keep insertion order in Python 3.7+); first match wins.
        for x in main_dict:
            if pat.startswith(x):
                ret += expand_one(x, depth)
                pat = pat[len(x):]
                break
        else:
            # No symbol matches here: copy one literal character through.
            ret += pat[0]
            pat = pat[1:]

    return ret

if len(sys.argv) < 4:
    print("Usage: wordgen <spec file> <count> <word pattern>")
    sys.exit(1)

load(sys.argv[1])
build()

for x in range(int(sys.argv[2])):
    print(expand(sys.argv[3], 0) + "\t\t")
Then (in case you don't have enough wizardry to decode the regex) the spec file looks like this:

Code:

C(128)=t
C(112)=p
C(101)=k
C( 91)=n
C( 86)=s
C( 83)=d
C( 82)=r
C( 74)=g
C( 59)=m
C( 46)=b
C( 34)=x
C( 25)=ŋ
C( 20)=v
C( 16)=z
C( 15)=f
C( 12)=h
C(  9)=w
C(  7)=j

V=i/e/a/o/u

F=C

I(100)=
I(900)=C
T(200)=
T(800)=F
S=IVT

W(  1)=V
W( 30)=CV
W( 19)=VF
W(100)=CVF
W(500)=SS
W(300)=SSS
W( 50)=SSSS

L1=S
L2=SS
L3=SSS
L4=SSSS

A # at the beginning of a line marks a comment, blank lines are ignored, and right-hand sides can't contain a / (it separates alternatives). Deeply recursive symbols stop expanding after 25 levels and leave a [...] in the output.

Symbols are case sensitive; they don't have to be 1 character but remember to flip the switch to More Magic if you try longer ones.
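
One likely reason for the caveat about longer symbols: expand() takes the first defined symbol that matches at the current position, so if one symbol name is a prefix of another (say L and L1), the shorter one can win depending on the order in which they are tried. A possible fix, sketched here as a standalone helper rather than a patch to the script, is to try longer names first. The function name `tokenize` and the example symbols are assumptions for illustration.

```python
def tokenize(pattern, symbols):
    """Split pattern into symbol names and literal characters, longest name first."""
    by_length = sorted(symbols, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(pattern):
        for name in by_length:
            if pattern.startswith(name, i):
                tokens.append(name)
                i += len(name)
                break
        else:
            tokens.append(pattern[i])  # not a symbol: keep as a literal
            i += 1
    return tokens

print(tokenize("L1-L", ["L", "L1", "S"]))  # ['L1', '-', 'L']
```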

The numbers in parens are weights: each alternative's probability is its number out of the total for that symbol, and a line with no number gets a weight of 10. You can also put x/y/z on the right side, which gives each alternative the same weight, so switching from awkwords should be easy enough.
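
To make the weighting concrete: the C weights in the example spec add up to 1000, so t should come up about 12.8% of the time. A quick check in Python, using random.choices from the standard library instead of the script's own loop; the names `C_WEIGHTS` and `probability` exist only in this sketch.

```python
import random
from collections import Counter

# Consonant weights copied from the example spec.
C_WEIGHTS = {
    "t": 128, "p": 112, "k": 101, "n": 91, "s": 86, "d": 83,
    "r": 82, "g": 74, "m": 59, "b": 46, "x": 34, "ŋ": 25,
    "v": 20, "z": 16, "f": 15, "h": 12, "w": 9, "j": 7,
}

def probability(symbol, weights):
    """Chance of one alternative: its weight over the symbol's total weight."""
    return weights[symbol] / sum(weights.values())

rng = random.Random(1)  # fixed seed so the sample is repeatable
sample = rng.choices(list(C_WEIGHTS), weights=list(C_WEIGHTS.values()), k=10000)
counts = Counter(sample)

print("total weight:", sum(C_WEIGHTS.values()))
print("expected share of t:", probability("t", C_WEIGHTS))
print("observed share of t:", counts["t"] / len(sample))
```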

You then run it as ./gen.py specfile.txt 1000 W
to get 1000 'words' of various lengths as defined by the W symbol.

(Hopefully this doesn't crash the board this time)

Izambri
Smeric
Posts: 1556
Joined: Sun Apr 04, 2004 4:27 pm
Location: Catalonia

Re: Methods of generating lexicon

Post by Izambri »

lordofthestrings wrote: What are your favorite ways of inventing words for your conlangs? Do you prefer to work with texts and invent words as needed, or come up with them randomly? I want to see the varying range of opinions.
That comes in different ways:

1. Translating texts or sentences: One of my favorites, especially if the piece I translate is interesting. I search for texts (my conworld stuff, LOTR, etc.) or I take some translating challenges here, in the ZBB.

2. Expanding lexicons from a single word/root: If I create a new word I can trace the way to its original root and, from that root, I can create many more words. I really like that too, since etymology is one of my favourite parts of linguistics. So I put a lot of care into that when making my conlangs. For Hellesan I have its mother tongue, Peran, and the mother tongue of this one, Sate, which is the ultimate main source for Hellesan roots.

3. Asking myself what the word for X would be in my conlang: Many words in natural languages have odd etymologies or evolutions, and that's an important source of inspiration in my word creation.

About my methods to create new words:

1. Creating words from nothing: One of the main sources for word creation. Put together some vowels and consonants in a nice-sounding fashion and it's almost done; then I modify the word to fit the conlang's syllabic structure. It's merely an aesthetic thing.

2. Creating words from natural languages: In my conworld there's an alternative PIE, so some roots are taken from it, and then slightly modified. So if I steal from natural languages, I take roots (adapted afterwards), not words.

3. Creating words randomly: Basically using word generators, for toponymy.

4. Isolating words or roots from existing names: That is, from personal names or placenames in a certain conlang I isolate parts (generally roots) and assign them meanings. Sometimes I find that a segment of a proper name already coincides with an existing root, so the meaning of that name (or part of its meaning) is already determined.

5. Creating words from existing words: This is making compound words, or deriving words from other words (postverbal nouns, etc.).
Un llapis mai dibuixa sense una mà.

Alces
Lebom
Posts: 87
Joined: Sun Jan 03, 2010 6:09 pm
Location: Merseyside, England, UK

Re: Methods of generating lexicon

Post by Alces »

For an a priori language, I use a generator (usually hand-coded for the specific language in Python) to get some basic roots, often tweaking them from the generated form to make them sound better, then usually start by assigning meanings for Swadesh list words and whatever other meanings I think of. Then when making daughter languages I try to derive as many words as I can out of this small basic lexicon, though still adding words to the original language if needed.

din
Avisaru
Posts: 779
Joined: Wed Jan 10, 2007 10:02 pm
Location: Brussels

Re: Methods of generating lexicon

Post by din »

I've worked on word derivation extensively, because it's my favourite part, so I generally try to combine elements of the things I have and see what they could mean, and how that meaning could have changed over time.

I sometimes use thematic word lists. I also often just create the words I need to form an interesting example sentence for the grammar document, and get carried away and derive a bunch of related words, or words that use the same morphemes.

All in all it's a long, slow process that involves a lot of love and care ;)
— o noth sidiritt Tormiott

Soap
Smeric
Posts: 1228
Joined: Sun Feb 16, 2003 2:57 pm
Location: Scattered disc

Re: Methods of generating lexicon

Post by Soap »

for me there are several methods:

1) Word generators. I tend to use these only for conlangs that are mostly CV, which works because the great Ursprache of all my conlangs, Eʔoqaaniam, was almost entirely CV (except for a few syllabic consonants like the final -m in the name). Pabappa started out that way, but not much of the original word-gen content remains since I've completely recast the language with a parent language of its own.

2) "babble". I look at something and try to give it a name without thinking about what I'm saying. Usually I come up with something that sounds like baby talk. I did this mostly when I was younger and most of my words seemed to begin with vowels and have at most one consonant. This is how I started working with tonal conlangs.

3) Tinkering with existing words to try to mend together word families ... e.g. like the time I discovered the word for "apron" was only one letter off of something that could be derived from "tied in back", so I changed it. More extreme examples of this are when I turned "ambo" into "sanala" in a notebook somewhere.

4) Mathematically adding words to other words in an alphabet in which all the letters have numeric values. I haven't touched this since about 10 years ago because I've gotten away from loglangs. But it was great when I had it. In the original Moonshine, *EVERY* word could be tied to every other word with math.

5) Loans, of course, and deriving from people's names into common terms, best if buried by long chains of sound changes, e.g. Magdalene > maudlin, Bethlehem > bedlam is great since half of the English speakers don't even realize they used to be proper names.

6) Borrowing from old dead conlangs like what I used to work on in the 90s and which can't be integrated into my new conworld. e.g. the name "Lypelpyp" comes from a list of names from 2005 or so, much older than the language it now belongs to. Not much left of that though.

7) Also, dreams. Like the time I dreamt about a wonderful land called Qoqendoq and after sound changes it ended up as a word spelled kakancak.

Bedelato
Lebom
Posts: 193
Joined: Sat Oct 30, 2010 1:13 pm
Location: Another place

Re: Methods of generating lexicon

Post by Bedelato »

Just a couple of weeks ago I discovered the esoteric programming language Thue. It's basically just string rewriting; the program is an initial string and a set of rewrite rules, which are applied at random.

I've used it to make a Bengedian word generator. Hasn't seen much use yet, but yeah.
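
For anyone who hasn't seen Thue, the idea can be imitated in a few lines of Python: start from a string and keep applying a randomly chosen applicable rewrite rule until none applies. This is a sketch of the technique only, not Thue syntax and not Bedelato's Bengedian generator; the rules and names below are invented.

```python
import random

# Invented rules: uppercase letters are placeholders, lowercase is output.
RULES = [
    ("W", "S"), ("W", "SS"),          # a word is one or two syllables
    ("S", "CV"), ("S", "CVC"),        # a syllable is CV or CVC
    ("C", "p"), ("C", "t"), ("C", "k"), ("C", "n"),
    ("V", "a"), ("V", "i"), ("V", "u"),
]

def rewrite(text, rules, rng, max_steps=1000):
    """Apply random applicable rules until none matches or max_steps is hit."""
    for _ in range(max_steps):
        applicable = [(lhs, rhs) for lhs, rhs in rules if lhs in text]
        if not applicable:
            break
        lhs, rhs = rng.choice(applicable)
        # Rewrite one occurrence at a time; pick which one at random.
        positions = [i for i in range(len(text)) if text.startswith(lhs, i)]
        i = rng.choice(positions)
        text = text[:i] + rhs + text[i + len(lhs):]
    return text

rng = random.Random(42)  # fixed seed so the run is repeatable
words = [rewrite("W", RULES, rng) for _ in range(10)]
print(words)
```

Because every rule here replaces a placeholder with something closer to plain letters, the rewriting always stops; a rule set with loops would rely on the max_steps cap instead.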

Here's a link to a nice Thue interpreter in Java.
At, casteda dus des ometh coisen at tusta o diédem thum čisbugan. Ai, thiosa če sane búem mos sil, ne?
Also, I broke all your metal ropes and used them to feed the cheeseburgers. Yes, today just keeps getting better, doesn't it?

Pabappa
Lebom
Posts: 210
Joined: Tue Sep 14, 2004 5:51 pm
Location: the Peyron Apartments

Re: Methods of generating lexicon

Post by Pabappa »

Just curious if anyone has the "better" version of Awkwords online somewhere. http://bprhad.wz.cz/awkwords/ still exists, but the site it says it has moved to is gone. Therefore I don't know if I'm looking at the old version or the new, or something entirely different.

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: Methods of generating lexicon

Post by Cedh »

SoapBubbles wrote: Just curious if anyone has the "better" version of Awkwords online somewhere. http://bprhad.wz.cz/awkwords/ still exists, but the site it says it has moved to is gone. Therefore I don't know if I'm looking at the old version or the new, or something entirely different.
http://akana.conlang.org/tools/awkwords/

CatDoom
Avisaru
Posts: 739
Joined: Fri Sep 20, 2013 1:12 am

Re: Methods of generating lexicon

Post by CatDoom »

When I was coming up with roots for a recent a priori conlang idea I had, I pulled out The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World, which has a nice thematic dictionary that takes up most of the book. Similar to the method blank stare II described, I took roots I liked and altered them to conform to my language's phonology and phonotactics. I didn't apply any consistent sound changes, since I wasn't actually trying to make anything like an Indo-European language; I just sort of substituted sounds on the fly that were vaguely similar, and made changes when I didn't like the results.

The method's not a whole lot different from just taking a Buck List or something similar and making up words for each entry, but having the Indo-European roots was useful to me as inspiration, if that makes sense. The method does have some obvious weaknesses: for one, it only works if you use a lexicon from a language that's structurally similar to the one you're making (I knew I wanted monosyllabic roots but no tones, which was the main reason I went with Indo-European). Furthermore, unless you deliberately choose to combine or split vocabulary terms here or there, you're going to wind up with a language that assigns meanings to words in exactly the same way that the model you're using does, which is kind of boring.

احمکي ارش-ھجن
Avisaru
Posts: 516
Joined: Mon Dec 02, 2013 12:45 pm

Re: Methods of generating lexicon

Post by احمکي ارش-ھجن »

I often make up words, but I constantly and carefully consider the etymology those words might have.
ʾAšol ḵavad pulqam ʾifbižen lav ʾifšimeḻ lit maseḡrad lav lit n͛ubad. ʾUpulasim ṗal sa-panžun lav sa-ḥadṇ lav ṗal šarmaḵeš lit ʾaẏṭ waẏyadanun wižqanam.
- Article 1 of the Universal Declaration of Human Rights.
