What should I store in a Lexicon Database?

Fixsme
Sanci
Posts: 32
Joined: Fri Jan 02, 2015 1:45 pm
Location: Paris, France

What should I store in a Lexicon Database?

Post by Fixsme »

Hi everyone.

I have a question about lexicon databases. I recently saw this post, viewtopic.php?f=7&t=44484, about building a database for comparing Romance languages. So I have a question: what should one store in a database for linguistic comparison?

Should I keep only roots? All the patterns? Should I add sandhi?

For instance, in Parisian French: "tout" (all), is pronounced [tu], but the feminine version is "toute", [tut], plural is "tous" [tus], plural feminine is "toutes" [tut].
But if I say "Tout animal doit boire" ("Every animal must drink"), one would say [tut animal dwa bwaʁ]. Should I keep this pronunciation of "tout"?
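
One possible way to keep both the inflected forms and the sandhi variant is to store every surface form together with the context in which it occurs. This is only a minimal sketch: the class names `Form` and `Lexeme`, the field names, and the context label "before vowel" are hypothetical, and the data are just the forms quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One surface form of a lexeme, with the context in which it occurs."""
    spelling: str
    ipa: str
    features: tuple = ()      # e.g. ("masc", "sg")
    context: str = "default"  # e.g. "before vowel" for a sandhi variant


@dataclass
class Lexeme:
    """A dictionary entry: one citation form plus all recorded surface forms."""
    language: str
    citation: str
    gloss: str
    forms: list = field(default_factory=list)

    def pronunciations(self, spelling):
        """Return every recorded (context, IPA) pair for a given spelling."""
        return [(f.context, f.ipa) for f in self.forms if f.spelling == spelling]


# Forms and transcriptions exactly as given in the post above.
tout = Lexeme("French (Paris)", "tout", "all, every", [
    Form("tout", "tu", ("masc", "sg")),
    Form("tout", "tut", ("masc", "sg"), context="before vowel"),
    Form("toute", "tut", ("fem", "sg")),
    Form("tous", "tus", ("masc", "pl")),
    Form("toutes", "tut", ("fem", "pl")),
])

print(tout.pronunciations("tout"))
```

Storing the context as a separate field means a comparison tool can either ignore sandhi variants (keep only "default") or use them, depending on what the database is for.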

Thank you.

mèþru
Smeric
Posts: 1984
Joined: Thu Oct 29, 2015 6:44 am
Location: suburbs of Mrin

Re: What should I store in a Lexicon Database?

Post by mèþru »

Whenever you make a database, the first thing you must know is what you're making it for. This will affect what type of information you put in it.
ìtsanso, God In The Mountain, may our names inspire the deepest feelings of fear in urkos and all his ilk, for we have saved another man from his lies! I welcome back to the feast hall kal, who will never gamble again! May the eleven gods bless him!
kårroť

Radagast the Third
Sanci
Posts: 17
Joined: Sun Oct 16, 2016 10:08 am

Re: What should I store in a Lexicon Database?

Post by Radagast the Third »

The databases I am building are meant to be used for automated language classification and lexicostatistical dating. For this reason I am using a standardized wordlist (Sarah Gudschinsky's 200-word list) with some minor cultural modifications for the specific families. The vocabulary is meant to be basic, but also chosen randomly enough that it represents rates of language change comparably across different language families.
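
As a rough illustration of what such a wordlist database feeds into, the basic lexicostatistical measure is the share of wordlist meanings for which two languages use cognate words. The sketch below assumes each meaning is stored with a word and a hand-assigned cognate class label; the function name `shared_cognate_ratio`, the labels, and the four-item toy list are all hypothetical and are not the actual database layout.

```python
def shared_cognate_ratio(list_a, list_b):
    """Fraction of meanings, attested in both lists, whose words share a cognate class.

    Each list maps a meaning (wordlist slot) to a (word, cognate_class) pair.
    """
    common = [m for m in list_a if m in list_b]
    if not common:
        raise ValueError("the two lists share no meanings")
    same = sum(1 for m in common if list_a[m][1] == list_b[m][1])
    return same / len(common)


# Toy data: cognate class labels are hand-assigned for illustration only.
english = {
    "water": ("water", "W1"),
    "dog": ("dog", "D1"),
    "hand": ("hand", "H1"),
    "tree": ("tree", "T1"),
}
german = {
    "water": ("Wasser", "W1"),
    "dog": ("Hund", "D2"),
    "hand": ("Hand", "H1"),
    "tree": ("Baum", "T2"),
}

print(shared_cognate_ratio(english, german))
```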

KathTheDragon
Smeric
Posts: 2139
Joined: Thu Apr 25, 2013 4:48 am
Location: Brittania

Re: What should I store in a Lexicon Database?

Post by KathTheDragon »

You do know that lexicostatistical dating and really any kind of glottochronology is (mostly) bogus, right? I mean, compare modern English and Icelandic with their 13th century ancestors. Hint: the Icelanders can still kinda understand written Old Icelandic.

Radagast the Third
Sanci
Posts: 17
Joined: Sun Oct 16, 2016 10:08 am

Re: What should I store in a Lexicon Database?

Post by Radagast the Third »

Ten years ago I would have said the same myself and scoffed at the idea (if you search for old threads in this forum you can probably find me doing so). But lexicostatistics is currently experiencing a revival in historical linguistics, with the current fad for Bayesian methods across the historical sciences. We do of course know that different languages have replaced their vocabulary at different rates, but there is something to be said for estimating historical depth through lexical replacement for languages that are known to be related and that have existed in comparable linguistic ecologies. In any case it is interesting to compare rates of lexical replacement across languages with known histories such as Romance and English, which is the reason I am trying to build those two databases to serve as a baseline. It might be interesting to make one for the Norse languages as well.
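
To make the baseline idea concrete, here is a small sketch of the classic constant-rate arithmetic: if two sister languages share a fraction c of the wordlist after t millennia, the implied per-millennium retention rate r satisfies c = r^(2t), and conversely t = ln(c) / (2 ln(r)). This is the textbook glottochronological model, not the Bayesian methods mentioned above, and the constant-rate assumption is exactly the contested part. The function names and the input numbers are illustrative assumptions, not measurements.

```python
import math


def retention_rate(shared, millennia):
    """Per-millennium retention rate r implied by shared = r ** (2 * millennia)."""
    if not 0 < shared <= 1 or millennia <= 0:
        raise ValueError("need 0 < shared <= 1 and millennia > 0")
    return shared ** (1 / (2 * millennia))


def divergence_time(shared, rate):
    """Millennia since the split, t = ln(shared) / (2 * ln(rate))."""
    if not 0 < shared <= 1 or not 0 < rate < 1:
        raise ValueError("need 0 < shared <= 1 and 0 < rate < 1")
    return math.log(shared) / (2 * math.log(rate))


# Arbitrary illustrative numbers, not measurements from any real language pair.
r = retention_rate(shared=0.64, millennia=1.0)
print(round(r, 3))
print(round(divergence_time(0.64, r), 3))
```

With a database of languages whose split dates are known, the first function gives one rate per language pair, and the spread of those rates shows how far the constant-rate assumption can be trusted.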

Fixsme
Sanci
Posts: 32
Joined: Fri Jan 02, 2015 1:45 pm
Location: Paris, France

Re: What should I store in a Lexicon Database?

Post by Fixsme »

That's interesting. I was trying to do basically the same thing as Radagast.
Except I was going first for a cognate alignment system. You give it two lists of words and the machine tells you which parts of the words correspond, and you can even compare the result to a random model.

Grzegorz Kondrak has done some interesting things: http://webdocs.cs.ualberta.ca/~kondrak/ ... aline.html
ALINE is still a deterministic algorithm, but it gives some good results. Its main limitation is basically the alphabet. Computers are much better at crunching numbers than at managing alphabets.
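
For readers who have not seen word alignment before, here is a minimal sketch of the general idea. It is not ALINE: it is plain Needleman-Wunsch global alignment that treats every segment as an opaque symbol, with no phonetic features, and the `match`, `mismatch` and `gap` scores are arbitrary assumptions. It also leaves out the comparison against a random model.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences of segments; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Segments are passed as lists so that multi-character IPA symbols stay together.
print(align(["t", "u", "t"], ["t", "u"]))
```

Because segments only count as equal or different here, [t] and [d] are as far apart as [t] and [a]. Replacing the equality test with a feature-based similarity is what a phonetically informed aligner adds.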

Others are using Markov chain models, which are commonly used in Bayesian approaches.
