Methods of generating lexicon
- Dothraki_physicist
- Sanci
- Posts: 15
- Joined: Fri Oct 22, 2010 11:02 am
- Location: Republic of Cascadia
Methods of generating lexicon
What are your favorite ways of inventing words for your conlangs? Do you prefer to work with texts and invent words as needed, or come up with them randomly? I want to see the varying range of opinions.
Sheogorath wrote:You know, I was there for that whole sordid affair. Marvelous times! Butterflies, blood, a Fox and a severed head... Oh, and the cheese! To die for.
Re: Methods of generating lexicon
I use lists, come up with them randomly and invent them when I need them.
- blank stare II
- Lebom
- Posts: 127
- Joined: Sat Jun 18, 2005 4:34 pm
- Location: second to the right and straight on till morning
Re: Methods of generating lexicon
I often look up word lists in other languages, for instance baby name lists, then put them through sound changes to make them fit my phonology and phonotactics. For instance, if the baby name list gave me these three names
HINTO: means "blue hair" in Dakota Sioux.
HONOVI: claimed to mean "strong" or "strong deer" in Hopi.
HOTAH: means "gray" or "brown." in Sioux.
They would become
Ciŋkø "blue"
Cøŋobhi "Strong"
Cota? "brown"
I often come up with a long list of meaningless words, then when I need a word I scan the list to find one that seems like it fits the meaning.
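A rough Python sketch of the borrow-and-adapt approach described above. The rules are invented placeholders (they are not the sound changes actually used for the three examples in this post), and `adapt` and `RULES` are hypothetical names.

```python
import re

# Hypothetical ordered sound changes (regex pattern, replacement).
# These are placeholders, not the poster's actual rules.
RULES = [
    (r"^h", "c"),    # word-initial h becomes c
    (r"nt", "ŋk"),   # the cluster nt becomes ŋk
    (r"v", "bh"),    # v becomes bh
    (r"h$", "ʔ"),    # word-final h becomes a glottal stop
]

def adapt(word, rules=RULES):
    """Lower-case a borrowed word and apply each rule once, in order."""
    form = word.lower()
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form

for name in ["HINTO", "HONOVI", "HOTAH"]:
    print(name, "->", adapt(name))
```

Because the rules run in a fixed order, an earlier change can feed or block a later one, so the order is worth experimenting with.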
I get a big kick out of playing my own language game–it’s a unique thrill only conlangers know.
- J Burke
Re: Methods of generating lexicon
I generally come up with words as needed, though it can be a real pain. The plan is eventually (after the specific conlang has been tested well enough) that I'll figure out NSM in the lang, then move on to stuff like the Swadesh list, kinship lists, names and food lists. So, it's pretty much a hybrid system, with steady as-I-go building along with a relatively short period of from-list building.
I never really make up one word at a time, though. Like, I'll figure out that I need these dozen words for the next section of text, I'll go and craft a bunch of phoneme-sequences. Then, I'll try to associate the sound with the definition, generally ending up with extra sounds and glosses left over for the next iteration.
My Conlang Site which pretty much only has Tayéin.
Still under construction, but at least I did some photoshop.
Re: Methods of generating lexicon
I have a few methods:
Firstly, I use a very simple list. Like a naming language level of simple.
Then, I start thinking about actual people's names in the language, and then assign the syllables meaning.
Then I combine these roots from both methods and just find out what I can make (in a "Doodle-God" style, just adding concepts together and thinking about what they could mean.)
Then finally I write lists of associated words. E.g. If I have the word for "building", I can make the words for "to build", "builder" etc.
I also come up with words as I need them when translating, but I don't count this as a different method, as I use the same derivation method as I do for the previous systems.
Oh, and sometimes when I am away from my files, I will write a bunch of random sounds down and then ask people who know about my language (and the culture it is based around) what they think it would mean. For instance, I got the word "get" (timber for carpentry) from the word "ga" (wood) in this way.
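The "Doodle-God" step of combining roots and seeing what falls out could be sketched in Python roughly like this. Only "ga" (wood) comes from the post; the other roots, and the names `ROOTS` and `candidate_compounds`, are made up for illustration.

```python
from itertools import permutations

# Root glosses. "ga" (wood) is from the post; the others are invented
# placeholders for illustration.
ROOTS = {
    "ga": "wood",
    "mi": "water",
    "tor": "house",
}

def candidate_compounds(roots):
    """Return every ordered pair of distinct roots as (form, gloss hint)."""
    return [
        (a + b, roots[a] + " + " + roots[b])
        for a, b in permutations(roots, 2)
    ]

for form, hint in candidate_compounds(ROOTS):
    print(form, "=", hint, "-> ?")
```

The script only lists candidates; deciding what "wood + water" might actually mean is still the manual, fun part.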
Re: Methods of generating lexicon
I used to come up with words on the fly, but more recently I have used awkwords to generate roots. I will plug in my phonotactics and generate 3000 roots, which I keep in a Google Doc to be assigned later. When I assign them, I italicize them in the doc (keeping them there in case I catch a whim to deliberately create a homophone).
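A minimal Python sketch of that workflow: generate a pool of roots up front, then mark roots as assigned instead of deleting them. The CVC generator is only a stand-in (it is not Awkwords and does not use its syntax), and `generate_roots` and `RootPool` are hypothetical names.

```python
import random

def generate_roots(count, consonants, vowels, rng):
    """Stand-in generator: unique CVC roots, not Awkwords itself."""
    limit = len(consonants) ** 2 * len(vowels)
    if count > limit:
        raise ValueError("inventory too small for that many unique roots")
    roots = set()
    while len(roots) < count:
        roots.add(rng.choice(consonants) + rng.choice(vowels) + rng.choice(consonants))
    return sorted(roots)

class RootPool:
    """Pre-generated roots; assigned ones stay listed with their gloss."""

    def __init__(self, roots):
        self.gloss = {root: None for root in roots}

    def unassigned(self):
        return [r for r, g in self.gloss.items() if g is None]

    def assign(self, root, meaning):
        # Keep the root in the pool, as with italicizing it in the doc.
        self.gloss[root] = meaning

rng = random.Random(0)  # fixed seed so a run can be repeated
pool = RootPool(generate_roots(20, "ptkmnsl", "aeiou", rng))
first = pool.unassigned()[0]
pool.assign(first, "water")
print(len(pool.unassigned()), "roots still free;", first, "= water")
```

Keeping assigned roots in the pool is what makes deliberate homophones possible later, since the taken forms are still visible.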
George Corley
Producer and Moderating Host, Conlangery Podcast
-
- Smeric
- Posts: 1258
- Joined: Mon Jun 01, 2009 3:07 pm
- Location: Miracle, Inc. Headquarters
Re: Methods of generating lexicon
I use Awkwords and remove the roots that are disallowed.
I haven't gotten used to the syntax rules of phoneme weight and stuff, so most of the roots in my conlangs have an inordinate amount of phonemes that should be more rare.
Awkwords is down right now, apparently someone didn't pay their hosting fee.
Does anyone know the person who created Awkwords?
[bɹ̠ˤʷɪs.təɫ]
Nōn quālibet inīquā cupiditāte illectus hoc agō
Yo te pongo en tu lugar...
Taisc mach Daró
-
- Avisaru
- Posts: 704
- Joined: Fri Dec 03, 2010 9:41 am
- Location: NY, USA
Re: Methods of generating lexicon
I used awkwords for about 5 seconds before spending 5 minutes to write a better version of it in Python.
Re: Methods of generating lexicon
It's still here for the time being: http://bprhad.wz.cz/awkwords/
næn:älʉː
Re: Methods of generating lexicon
Bob Johnson wrote:I used awkwords for about 5 seconds before spending 5 minutes to write a better version of it in Python.
Would you care to share your version with us?
George Corley
Producer and Moderating Host, Conlangery Podcast
-
- Avisaru
- Posts: 704
- Joined: Fri Dec 03, 2010 9:41 am
- Location: NY, USA
Re: Methods of generating lexicon
It ain't pretty. Feel free to use it for whatever.
Then (in case you don't have enough wizardry to decode the regex) the spec file looks like this:
# at beginning of line is a comment, blank lines ignored, no / in right-hand sides. Deeply recursive symbols will stop after 25 expansions.
Symbols are case sensitive; they don't have to be 1 character but remember to flip the switch to More Magic if you try longer ones.
The numbers in parens are probabilities out of the total for that symbol; you can also put x/y/z on the right side which will all have the same probability, so switching from awkwords should be easy enough.
You then run it as ./gen.py specfile.txt 1000 W
to get 1000 'words' of various lengths as defined by the W symbol.
(Hopefully this doesn't crash the board this time)
Code: Select all
#!/usr/bin/env python3
import random
import re
import sys

# One spec line: SYMBOL, an optional (weight), then "=" and the right-hand side.
# Trailing whitespace is optional so a final line without a newline still parses.
rx_line = re.compile(r"^\s*([^(=]+?)\s*(\((\s*[0-9]+\s*)\))?\s*=(.*?)\s*$")

main_list = []   # flat [symbol, weight, expansion] entries, in file order
main_dict = {}   # symbol -> list of [weight, expansion]
main_tots = {}   # symbol -> sum of its weights


def load(fname):
    """Read the spec file into main_list."""
    sep = "/"
    with open(fname, encoding="utf-8") as f:
        for line in f:
            # Skip comments and blank lines.
            if line[0] in "#\n\r":
                continue
            mat = rx_line.match(line)
            if not mat:
                print("# Could not parse line: " + line, end="")
                continue
            key = mat.group(1)
            # A line whose symbol is "/" changes the alternative separator.
            if key == "/":
                sep = mat.group(4)
                continue
            # Alternatives without an explicit weight all get 10.
            freq = int(mat.group(3) or "10")
            for alt in mat.group(4).split(sep):
                main_list.append([key, freq, alt])


def build():
    """Group main_list by symbol and total the weights."""
    main_dict.clear()
    main_tots.clear()
    for key, freq, alt in main_list:
        main_tots[key] = main_tots.get(key, 0) + freq
        main_dict.setdefault(key, []).append([freq, alt])


def expand_one(key, depth):
    """Pick one weighted alternative for a symbol and expand it."""
    r = random.randint(1, main_tots[key])
    for freq, alt in main_dict[key]:
        if r <= freq:
            return expand(alt, depth + 1)
        r -= freq
    raise Exception("internal inconsistency: randint out of range")


def expand(pat, depth):
    """Expand every symbol in pat; other characters are copied literally."""
    if pat == "":
        return ""
    if depth > 25:
        return "[...]"
    ret = ""
    while pat:
        for key in main_dict:
            if pat.startswith(key):
                ret += expand_one(key, depth)
                pat = pat[len(key):]
                break
        else:
            # No symbol matches here, so emit the character as-is.
            ret += pat[0]
            pat = pat[1:]
    return ret


if len(sys.argv) < 4:
    print("Usage: wordgen <spec file> <count> <word pattern>")
    sys.exit(1)
load(sys.argv[1])
build()
for _ in range(int(sys.argv[2])):
    print(expand(sys.argv[3], 0) + "\t\t")
Code: Select all
C(128)=t
C(112)=p
C(101)=k
C( 91)=n
C( 86)=s
C( 83)=d
C( 82)=r
C( 74)=g
C( 59)=m
C( 46)=b
C( 34)=x
C( 25)=ŋ
C( 20)=v
C( 16)=z
C( 15)=f
C( 12)=h
C( 9)=w
C( 7)=j
V=i/e/a/o/u
F=C
I(100)=
I(900)=C
T(200)=
T(800)=F
S=IVT
W( 1)=V
W( 30)=CV
W( 19)=VF
W(100)=CVF
W(500)=SS
W(300)=SSS
W( 50)=SSSS
L1=S
L2=SS
L3=SSS
L4=SSSS
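For anyone reading the weights: a small sketch, independent of gen.py, of how the numbers in parentheses turn into probabilities. The dictionary copies the W lines from the spec above; `W_WEIGHTS` and `probabilities` are names made up for this example.

```python
# Weights copied from the W lines of the spec file.
W_WEIGHTS = {
    "V": 1, "CV": 30, "VF": 19, "CVF": 100,
    "SS": 500, "SSS": 300, "SSSS": 50,
}

def probabilities(weights):
    """Divide each weight by the total for that symbol."""
    total = sum(weights.values())
    return {alt: w / total for alt, w in weights.items()}

for alt, p in probabilities(W_WEIGHTS).items():
    print(f"W -> {alt}: {p:.1%}")
```

The W weights happen to add up to 1000, so each number reads directly as a per-mille share; any other total works the same way because only the ratios matter.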
Re: Methods of generating lexicon
lordofthestrings wrote:What are your favorite ways of inventing words for your conlangs? Do you prefer to work with texts and invent words as needed, or come up with them randomly? I want to see the varying range of opinions.
That comes in different ways:
1. Translating texts or sentences: One of my favorites, especially if the piece I translate is interesting. I search for texts (my conworld stuff, LOTR, etc.) or I take some translating challenges here, in the ZBB.
2. Expanding lexicons from a single word/root: If I create a new word I can trace the way to its original root and, from that root, I can create many more words. I really like that too, since etymology is one of my favourite parts of linguistics, so I put a lot of care into that when making my conlangs. For Hellesan I have its mother tongue, Peran, and the mother tongue of that one, Sate, which is the ultimate main source for Hellesan roots.
3. Asking myself what the word for X would be in my conlang: Many words in natural languages have odd etymologies or evolutions, and that's an important source of inspiration in my word creation.
About my methods to create new words:
1. Creating words from nothing: One of the main sources for word creation. Put together some vowels and consonants in a nice-sounding fashion and it's almost done; then I modify the word to fit the conlang's syllabic structure. It's merely an aesthetic thing.
2. Creating words from natural languages: In my conworld there's an alternative PIE, so some roots are taken from it and then slightly modified. So if I steal from natural languages, I take roots (adapted afterwards), not words.
3. Creating words randomly: Basically using word generators, for toponymy.
4. Isolating words or roots from existing names: That is, from personal names or placenames in a certain conlang I isolate parts (generally roots) and assign them meanings. Sometimes I find that a segment of a proper name already coincides with an existing root, so the meaning of that name (or part of its meaning) is already determined.
5. Creating words from existing words: This is making compound words, or deriving words from other words (postverbal nouns, etc.).
Un llapis mai dibuixa sense una mà.
Re: Methods of generating lexicon
For an a priori language, I use a generator (usually hand-coded for the specific language in Python) to get some basic roots, often tweaking them from the generated form to make them sound better, then usually start by assigning meanings for Swadesh list words and whatever other meanings I think of. Then when making daughter languages I try to derive as many words as I can out of this small basic lexicon, though still adding words to the original language if needed.
Re: Methods of generating lexicon
I've worked on word derivation extensively, because it's my favourite part, so I generally try to combine elements of the things I have and see what they could mean, and how that meaning could have changed over time.
I sometimes use thematic word lists. I also often just create the words I need to form an interesting example sentence for the grammar document, and get carried away and derive a bunch of related words, or words that use the same morphemes.
All in all it's a long, slow process that involves a lot of love and care ;)
— o noth sidiritt Tormiott
Re: Methods of generating lexicon
for me there are several methods:
1) Word generators. I tend to use these only for conlangs that are mostly CV, which works because the great Ursprache of all my conlangs, Eʔoqaaniam, was almost entirely CV (except for a few syllabic consonants like the final -m in the name). Pabappa started out that way, but not much of the original word-gen content remains since Ive completely recast the language with a parent language of its own.
2) "babble". I look at something and try to give it a name without thinking about what Im saying. Usually I come up with something that sounds like baby talk. I did this mostly when I was younger and most of my words seemed to begin with vowels and have at most one consonant. This is how I started working with tonal conlangs.
3) Tinkering with existing words to try to mend together word families ... e.g. like the time I discovered the word for "apron" was only one letter off of something that could be derived from "tied in back", so I changed it. More extreme examples of this are when I turned "ambo" into "sanala" in a notebook somewhere.
4) Mathematically adding words to other words in an alphabet in which all the letters have numeric values. I havent touched this since about 10 years ago because Ive gotten away from loglangs. But it was great when I had it. In the original Moonshine, *EVERY* word could be tied to every other word with math.
5) Loans, of course, and deriving from people's names into common terms, best if buried by long chains of sound changes, e.g. Magdalene > maudlin, Bethlehem > bedlam is great since half of the English speakers dont even realize they used to be proper names.
6) Borrowing from old dead conlangs like what I used to work on in the 90s and which cant be integrated into my new conworld. e.g. the name "Lypelpyp" comes from a list of names from 2005 or so, much older than the language it now belongs to. Not much left of that though.
7) Also, dreams. Like the time I dreamt about a wonderful land called Qoqendoq and after sound changes it ended up as a word spelled kakancak.
Sunàqʷa the Sea Lamprey says:
Re: Methods of generating lexicon
Just a couple of weeks ago I discovered the esoteric programming language Thue. It's basically just string rewriting; the program is an initial string and a set of rewrite rules, which are applied at random.
I've used it to make a Bengedian word generator. Hasn't seen much use yet, but yeah.
Here's a link to a nice Thue interpreter in Java.
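For readers without a Thue interpreter handy, the same idea (a start string plus rewrite rules applied at random until nothing matches) can be approximated in Python. This is only a Thue-like sketch: the rules are invented for illustration and are not the Bengedian generator, and `rewrite` and `RULES` are hypothetical names.

```python
import random

# Hypothetical rewrite rules in the spirit of Thue (lhs ::= rhs).
# Upper-case letters are placeholders that get rewritten; these rules
# are invented here and are not the Bengedian generator.
RULES = [
    ("W", "S"), ("W", "SS"),
    ("S", "CV"), ("S", "CVC"),
    ("C", "p"), ("C", "t"), ("C", "k"), ("C", "s"),
    ("V", "a"), ("V", "i"), ("V", "u"),
]

def rewrite(start, rules, rng):
    """Apply randomly chosen applicable rules until none matches."""
    text = start
    while True:
        applicable = [(lhs, rhs) for lhs, rhs in rules if lhs in text]
        if not applicable:
            return text
        lhs, rhs = rng.choice(applicable)
        # Rewrite one randomly chosen occurrence of the left-hand side.
        positions = [i for i in range(len(text)) if text.startswith(lhs, i)]
        i = rng.choice(positions)
        text = text[:i] + rhs + text[i + len(lhs):]

rng = random.Random()
print([rewrite("W", RULES, rng) for _ in range(5)])
```

This rule set always terminates because every right-hand side moves toward lower-case letters that no rule matches; a rule set with cycles would need a step limit.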
At, casteda dus des ometh coisen at tusta o diédem thum čisbugan. Ai, thiosa če sane búem mos sil, ne?
Also, I broke all your metal ropes and used them to feed the cheeseburgers. Yes, today just keeps getting better, doesn't it?
Re: Methods of generating lexicon
Just curious if anyone has the "better" version of Awkwords online somewhere. http://bprhad.wz.cz/awkwords/ still exists, but the site it says it has moved to is gone. Therefore I dont know if Im looking at the old version or the new, or something entirely different.
And now Sunàqʷa the Sea Lamprey with our weather report:
Re: Methods of generating lexicon
SoapBubbles wrote:Just curious if anyone has the "better" version of Awkwords online somewhere. http://bprhad.wz.cz/awkwords/ still exists, but the site it says it has moved to is gone. Therefore I dont know if Im looking at the old version or the new, or something entirely different.
http://akana.conlang.org/tools/awkwords/
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ
Re: Methods of generating lexicon
When I was coming up with roots for a recent a priori conlang idea I had, I pulled out The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World, which has a nice thematic dictionary that takes up most of the book. Similar to the method blank stare II described, I took roots I liked and altered them to conform to my language's phonology and phonotactics. I didn't apply any consistent sound changes, since I wasn't actually trying to make anything like an Indo-European language; I just sort of substituted sounds on the fly that were vaguely similar, and made changes when I didn't like the results.
The method's not a whole lot different from just taking a Buck List or something similar and making up words for each entry, but having the Indo-European roots was useful to me as inspiration, if that makes sense. The method does have some obvious weaknesses: for one, it only works if you use a lexicon from a language that's structurally similar to the one you're making (I knew I wanted monosyllabic roots but no tones, which was the main reason I went with Indo-European). Furthermore, unless you deliberately choose to combine or split vocabulary terms here or there, you're going to wind up with a language that assigns meanings to words in exactly the same way that the model you're using does, which is kind of boring.
- احمکي ارش-ھجن
- Avisaru
- Posts: 516
- Joined: Mon Dec 02, 2013 12:45 pm
Re: Methods of generating lexicon
I often make up words, but I constantly and carefully consider the etymology those words might have.
ʾAšol ḵavad pulqam ʾifbižen lav ʾifšimeḻ lit maseḡrad lav lit n͛ubad. ʾUpulasim ṗal sa-panžun lav sa-ḥadṇ lav ṗal šarmaḵeš lit ʾaẏṭ waẏyadanun wižqanam.
- Article 1 of the Universal Declaration of Human Rights.