What should I store in a Lexicon Database?

Posted: Sun Oct 23, 2016 5:03 am
by Fixsme
Hi everyone.

I have a question about lexicon databases. I recently saw this post, viewtopic.php?f=7&t=44484, about building a database for comparing Romance languages. It made me wonder: what should one store in a database meant for linguistic comparison?

Should I keep only roots? All the patterns? Should I include sandhi?

For instance, in Parisian French, "tout" (all) is pronounced [tu], the feminine "toute" is [tut], the plural "tous" is [tus], and the feminine plural "toutes" is [tut].
But in "Tout animal doit boire" ("Every animal must drink"), one would say [tut animal dwa bwaʁ]. Should I also keep this pronunciation of "tout"?
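
To make the options concrete, here is a minimal sketch of one possible layout: each inflected form keeps its pronunciation in isolation, and sandhi variants are stored next to it, keyed by the context that triggers them. The class names, field names and the "before_vowel" key are all hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One surface form of a lexeme (hypothetical schema, for illustration)."""
    spelling: str
    ipa: str                  # pronunciation in isolation
    features: tuple = ()      # e.g. ("masc", "sg")
    # Sandhi variants, keyed by the context that triggers them.
    sandhi: dict = field(default_factory=dict)


@dataclass
class Lexeme:
    gloss: str
    forms: list = field(default_factory=list)

    def pronounce(self, features, before_vowel=False):
        """Return the IPA of the matching form, using its
        pre-vocalic variant when one is stored and applies."""
        for form in self.forms:
            if form.features == tuple(features):
                if before_vowel and "before_vowel" in form.sandhi:
                    return form.sandhi["before_vowel"]
                return form.ipa
        raise KeyError(features)


# The forms and transcriptions are the ones given in this post.
tout = Lexeme(
    gloss="all",
    forms=[
        Form("tout", "tu", ("masc", "sg"), sandhi={"before_vowel": "tut"}),
        Form("toute", "tut", ("fem", "sg")),
        Form("tous", "tus", ("masc", "pl")),
        Form("toutes", "tut", ("fem", "pl")),
    ],
)

print(tout.pronounce(("masc", "sg")))                     # isolation form
print(tout.pronounce(("masc", "sg"), before_vowel=True))  # liaison form
```

With a layout like this, a comparison tool can ignore the sandhi variants entirely or pull them in when needed, so storing them does not force a decision up front.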

Thank you.

Re: What should I store in a Lexicon Database?

Posted: Sun Oct 23, 2016 7:10 am
by mèþru
Whenever making a database, the first thing you must know is what you're making it for. This will affect what type of information you put.

Re: What should I store in a Lexicon Database?

Posted: Sun Oct 23, 2016 3:00 pm
by Radagast the Third
The databases I am building are meant for automated language classification and lexicostatistical dating. For this reason I am using a standardized wordlist (Sarah Gudschinsky's 200-word list) with some minor cultural modifications for the specific families. The vocabulary is meant to be basic, but also sufficiently randomly chosen that it represents rates of language change comparably across different language families.
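
As a rough sketch of what such a wordlist database feeds into: the basic lexicostatistical measure is the fraction of wordlist meanings for which two languages use cognate words. The function name and the toy data below are assumptions made for this example, not the actual database; each language is a mapping from a meaning to a cognate-class label.

```python
def shared_cognate_fraction(list_a, list_b):
    """Fraction of meanings present in both lists whose
    cognate-class labels match. Meanings missing from either
    list are skipped rather than counted as mismatches."""
    common = [meaning for meaning in list_a if meaning in list_b]
    if not common:
        raise ValueError("the two wordlists share no meanings")
    matches = sum(1 for meaning in common if list_a[meaning] == list_b[meaning])
    return matches / len(common)


# Toy four-item lists; a real list would have about 200 meanings.
# The cognate-class labels are illustrative only.
french = {"water": "AQUA", "dog": "CANIS", "head": "TESTA", "eat": "MANDUCARE"}
spanish = {"water": "AQUA", "dog": "PERRO", "head": "CAPITIA", "eat": "COMEDERE"}

print(shared_cognate_fraction(french, spanish))
```

The practical consequence for the database is that every entry needs a meaning slot and a cognate-class judgement, not just the word form.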

Re: What should I store in a Lexicon Database?

Posted: Sun Oct 23, 2016 7:27 pm
by KathTheDragon
You do know that lexicostatistical dating and really any kind of glottochronology is (mostly) bogus, right? I mean, compare modern English and Icelandic with their 13th century ancestors. Hint: the Icelanders can still kinda understand written Old Icelandic.

Re: What should I store in a Lexicon Database?

Posted: Mon Oct 24, 2016 2:31 am
by Radagast the Third
Ten years ago I would have said the same myself and scoffed at the idea (if you search for old threads in this forum you can probably find me doing so). But lexicostatistics is currently experiencing a revival in historical linguistics, with the current fad for Bayesian methods across the historical sciences. Of course, we know that different languages have undergone lexical replacement at different rates, but there is something to be said for estimating historical depths through lexical replacement for languages that are known to be related and that have existed in comparable linguistic ecologies. In any case it is interesting to compare rates of lexical replacement across languages with known histories such as Romance and English, which is why I am trying to build those two databases to serve as a baseline. It might be interesting to make one for the Norse languages as well.
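
For reference, the classical glottochronological estimate can be sketched in a few lines. It assumes a single constant retention rate r per millennium, which is exactly the assumption criticised above, and gives t = ln(c) / (2 ln(r)) for a shared-cognate fraction c. The function name is made up for this example, and the default r = 0.805 is a conventional figure for 200-item lists used here as an assumption, not a value from these databases.

```python
import math


def divergence_time(shared_fraction, retention_rate=0.805):
    """Classical estimate of time since separation, in millennia:
    t = ln(c) / (2 * ln(r)).

    shared_fraction: fraction c of wordlist meanings still cognate.
    retention_rate:  assumed fraction r retained per millennium
                     (0.805 is a conventional value, not a measurement).
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_fraction) / (2 * math.log(retention_rate))


# Two languages that each kept 80.5% per millennium share about
# 0.805 ** 2 of the list after one millennium apart.
print(round(divergence_time(0.805 ** 2), 3))
```

Running the same function on language pairs with documented histories is one way to see how far the observed retention rates drift from any single constant.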

Re: What should I store in a Lexicon Database?

Posted: Sun Oct 30, 2016 4:44 am
by Fixsme
That's interesting. I was trying to do basically the same thing as Radagast.
Except I was going first for a cognate alignment system. You give it two lists of words, and the machine tells you which parts of the words correspond; you can even compare the result to a random model.

Grzegorz Kondrak has done some interesting things: http://webdocs.cs.ualberta.ca/~kondrak/ ... aline.html
ALINE is still a deterministic algorithm, but it gives some good results. Its main limitation is basically the alphabet: computers are much better at crunching numbers than at managing alphabets.
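
To show the general idea of aligning two words and checking the result against a random model, here is a minimal sketch. It is not ALINE: it is a plain Needleman-Wunsch global alignment that only scores identical versus different segments, and the baseline simply shuffles the second word. The function names and the scoring values are assumptions made for this example.

```python
import random


def align(a, b, gap=-1, match=1, mismatch=-1):
    """Global alignment of two segment sequences.

    Returns (score, pairs); None in a pair marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair the two segments
                score[i - 1][j] + gap,      # segment of a against a gap
                score[i][j - 1] + gap,      # segment of b against a gap
            )
    # Trace back from the bottom-right corner to recover the pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return score[n][m], pairs[::-1]


def random_baseline(a, b, trials=200, seed=0):
    """Mean alignment score when the segments of b are shuffled."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        shuffled = list(b)
        rng.shuffle(shuffled)
        total += align(a, shuffled)[0]
    return total / trials


score, pairs = align("tutto", "todo")
print(score, pairs)
print(random_baseline("tutto", "todo"))
```

A real score well above the shuffled baseline suggests the match is not just an accident of the segment inventory. The identity-only scoring is also where the alphabet problem shows up: [t] and [d] count as completely different here unless the segments are turned into something numeric, such as feature values.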

Others are using Markov chain models, which are probabilistic rather than deterministic and fit well with Bayesian methods.