Hi everyone.
I have a question about lexicon databases. I recently saw this post, viewtopic.php?f=7&t=44484, about building a database for comparing Romance languages. So my question is: what should one store in a database for linguistic comparison?
Should I keep only roots? All the patterns? Should I add sandhi?
For instance, in Parisian French: "tout" (all), is pronounced [tu], but the feminine version is "toute", [tut], plural is "tous" [tus], plural feminine is "toutes" [tut].
But if I say "Tout animal doit boire" ("Every animal must drink"), one would say [tut animal dwa bwaʁ]. Should I keep this pronunciation of "tout"?
Thank you.
What should I store in a Lexicon Database?
Re: What should I store in a Lexicon Database?
Whenever making a database, the first thing you must know is what you're making it for. This will affect what type of information you put.
kårroť
- Sanci
- Posts: 17
- Joined: Sun Oct 16, 2016 10:08 am
Re: What should I store in a Lexicon Database?
The databases I am building are meant to be used for automated language classification and lexicostatistical dating. For this reason I am using a standardized wordlist (Sarah Gudschinsky's 200-word list) with some minor cultural modifications for the specific families. The vocabulary is meant to be basic, but also chosen randomly enough that it represents rates of language change comparably across different language families.
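As a rough illustration of what one row of such a wordlist database might hold, here is a minimal sketch in Python. The field names (concept, language, form, cognate_set), the function name, and the three-language sample are hypothetical and chosen only for the example; cognate judgments are assumed to be entered by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    concept: str      # wordlist slot, e.g. "water"
    language: str
    form: str         # orthographic or phonetic form
    cognate_set: int  # hand-assigned id; same id within a concept = cognate


def shared_cognate_proportion(entries, lang_a, lang_b):
    """Fraction of concepts attested in both languages that share a cognate set."""
    by_concept = {}
    for e in entries:
        sets = by_concept.setdefault(e.concept, {})
        sets.setdefault(e.language, set()).add(e.cognate_set)
    shared = compared = 0
    for langs in by_concept.values():
        if lang_a in langs and lang_b in langs:
            compared += 1
            if langs[lang_a] & langs[lang_b]:
                shared += 1
    return shared / compared if compared else None


# Tiny illustrative sample, not a real dataset.
ENTRIES = [
    Entry("water", "French", "eau", 1),
    Entry("water", "Spanish", "agua", 1),
    Entry("water", "Italian", "acqua", 1),
    Entry("dog", "French", "chien", 1),
    Entry("dog", "Spanish", "perro", 2),
    Entry("dog", "Italian", "cane", 1),
]

print(shared_cognate_proportion(ENTRIES, "French", "Spanish"))
print(shared_cognate_proportion(ENTRIES, "French", "Italian"))
```

Storing a set of cognate ids per concept and language leaves room for synonyms, which real wordlists tend to need.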
- KathTheDragon
- Smeric
- Posts: 2139
- Joined: Thu Apr 25, 2013 4:48 am
- Location: Brittania
Re: What should I store in a Lexicon Database?
You do know that lexicostatistical dating and really any kind of glottochronology is (mostly) bogus, right? I mean, compare modern English and Icelandic with their 13th century ancestors. Hint: the Icelanders can still kinda understand written Old Icelandic.
- Sanci
- Posts: 17
- Joined: Sun Oct 16, 2016 10:08 am
Re: What should I store in a Lexicon Database?
Ten years ago I would have said the same myself and scoffed at the idea (if you search for old threads in this forum you can probably find me doing so). But lexicostatistics is currently experiencing a revival in historical linguistics, with the current fad for Bayesian methods across the historical sciences. It is of course true that different languages are known to have undergone lexical replacement at different rates, but there is something to be said for estimating historical depths through lexical replacement for languages that are known to be related and which have existed in comparable linguistic ecologies. In any case it is interesting to compare rates of lexical replacement across languages with known histories such as Romance and English, which is the reason I am trying to build those two databases to serve as a baseline. It might be interesting to make one for Norse languages as well.
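For reference, the classical glottochronological estimate (not the Bayesian approach mentioned above) can be sketched in a few lines of Python: t = ln(c) / (2 ln(r)), where c is the proportion of shared cognates and r the assumed retention rate per millennium. It rests on a constant retention rate, which is exactly the assumption under dispute; the function name and the rate of 0.8 in the example call are placeholders for illustration, not calibrated values.

```python
import math


def glottochronology_depth(shared: float, retention: float) -> float:
    """Estimated separation time in millennia: ln(c) / (2 * ln(r))."""
    if not 0 < shared <= 1:
        raise ValueError("shared must be in (0, 1]")
    if not 0 < retention < 1:
        raise ValueError("retention must be in (0, 1)")
    return math.log(shared) / (2 * math.log(retention))


# Placeholder inputs: 64% shared cognates, assumed retention of 0.8 per millennium.
print(glottochronology_depth(0.64, 0.8))
```

Comparing this kind of estimate against languages with documented histories is one way to see how far the constant-rate assumption holds.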
Re: What should I store in a Lexicon Database?
That's interesting. I was trying to do basically the same thing as Radagast.
Except I was going first for a cognate alignment system. You give it two lists of words and the machine tells you which parts of the words correspond, and you can even compare the result to a random model.
Grzegorz Kondrak has done some interesting things: http://webdocs.cs.ualberta.ca/~kondrak/ ... aline.html
Aline is still a deterministic algorithm, but it has some good results. Its main limitation is basically the alphabet. Computers are much better at crunching numbers than at managing alphabets.
Others are using Markov chains, which are probabilistic models often used in Bayesian methods.
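To give an idea of what deterministic alignment means here, below is a minimal sketch in Python of plain Needleman-Wunsch global alignment with flat scores. This is not ALINE, which scores segments by phonetic features; the function name and the match/mismatch/gap values are assumptions picked for the example.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two strings; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, total = align("agua", "acqua")
print(aligned_a)
print(aligned_b)
print(total)
```

Because every symbol pair is either equal or not, this sketch shows the alphabet problem directly: "g" and "q" count as just as different as "g" and "a". A feature-based scoring function would replace the flat match/mismatch values.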