zompist bboard

THIS IS AN ARCHIVE ONLY - see Ephemera

All times are UTC - 6 hours [ DST ]




Posted: Sun Oct 23, 2016 5:03 am
Sanci

Joined: Fri Jan 02, 2015 1:45 pm
Posts: 32
Location: Paris, France
Hi everyone.

I have a question about lexicon databases. I recently saw this post, http://www.incatena.org/viewtopic.php?f=7&t=44484, about building a database for comparing Romance languages. It made me wonder: what should one store in a database for linguistic comparison?

Should I keep only roots? All the inflectional patterns? Should I add sandhi?

For instance, in Parisian French, "tout" (all) is pronounced [tu], but the feminine "toute" is [tut], the masculine plural "tous" is [tu] as a determiner and [tus] as a pronoun, and the feminine plural "toutes" is [tut].
But in "Tout animal doit boire" ("Every animal must drink"), liaison gives [tut animal dwa bwaʁ]. Should I also store this pronunciation of "tout"?

Thank you.
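
One way to picture the choice raised above is a record that keeps the citation form and lists every other form, including context-dependent ones such as liaison, as optional variants. This is only a sketch: the class names, field names and the "context" labels are invented for illustration, and the forms are the ones quoted in the post.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One surface form of a lexeme, with the context it occurs in."""
    spelling: str
    ipa: str
    features: str          # e.g. "masc.sg"
    context: str = "any"   # hypothetical free-text label, e.g. a liaison context


@dataclass
class LexicalEntry:
    """A lexeme plus whichever forms you decide are worth comparing."""
    language: str
    gloss: str
    citation: Form
    variants: list[Form] = field(default_factory=list)

    def pronunciations(self) -> set[str]:
        """All distinct IPA strings recorded for this lexeme."""
        return {self.citation.ipa} | {v.ipa for v in self.variants}


# Forms taken from the post above; storing the liaison form is optional.
tout = LexicalEntry(
    language="French (Paris)",
    gloss="all, every",
    citation=Form("tout", "tu", "masc.sg"),
    variants=[
        Form("tout", "tut", "masc.sg", context="before vowel (liaison)"),
        Form("toute", "tut", "fem.sg"),
        Form("toutes", "tut", "fem.pl"),
    ],
)

print(sorted(tout.pronunciations()))
```

With this layout the question "should I keep the sandhi form?" becomes a question of whether a given comparison reads only `citation` or also walks `variants`, so the decision can be postponed until the purpose of the database is clear.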


Posted: Sun Oct 23, 2016 7:10 am
Smeric

Joined: Thu Oct 29, 2015 6:44 am
Posts: 1998
Location: suburbs of Mrin
Whenever you make a database, the first thing you must know is what you're making it for. That determines what type of information you put in it.



Posted: Sun Oct 23, 2016 3:00 pm
Sanci

Joined: Sun Oct 16, 2016 10:08 am
Posts: 17
The databases I am building are meant for automated language classification and lexicostatistical dating. For this reason I am using a standardized wordlist (Sarah Gudschinsky's 200-word list) with some minor cultural modifications for the specific families. The vocabulary is meant to be basic, but also chosen randomly enough that the rates of language change it reflects should be comparable across different language families.
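
As a rough illustration of the lexicostatistical step, the share of cognates between two wordlists can be computed once each meaning has been given a cognate-class label. The sketch below assumes a plain dict from meaning to label; the function name, the four meanings and the hand-entered labels (here simply the Latin etymon, or a placeholder where the origin is unclear) are illustrative and far smaller than a real 200-item list.

```python
def shared_cognate_fraction(list_a, list_b):
    """Fraction of meanings present in both lists whose cognate labels match."""
    shared = [meaning for meaning in list_a if meaning in list_b]
    if not shared:
        raise ValueError("the two wordlists have no meanings in common")
    matches = sum(1 for meaning in shared if list_a[meaning] == list_b[meaning])
    return matches / len(shared)


# meaning -> cognate-class label, entered by hand for this toy example
french = {"water": "aqua", "hand": "manus", "head": "testa", "dog": "canis"}
spanish = {"water": "aqua", "hand": "manus", "head": "capitia", "dog": "perro?"}

print(shared_cognate_fraction(french, spanish))
```

Keeping the cognate judgement as a separate label, rather than comparing spellings directly, means the same wordlist table can be re-scored when a judgement is revised.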


Posted: Sun Oct 23, 2016 7:27 pm
Smeric

Joined: Thu Apr 25, 2013 4:48 am
Posts: 2144
Location: Britannia
You do know that lexicostatistical dating, and really any kind of glottochronology, is (mostly) bogus, right? I mean, compare modern English and Icelandic with their 13th-century ancestors. Hint: the Icelanders can still kinda understand written Old Icelandic.


Posted: Mon Oct 24, 2016 2:31 am
Sanci

Joined: Sun Oct 16, 2016 10:08 am
Posts: 17
Ten years ago I would have said the same myself and scoffed at the idea (if you search for old threads in this forum you can probably find me doing so). But lexicostatistics is currently experiencing a revival in historical linguistics, with the current fad for Bayesian methods across the historical sciences. It is of course true that different languages are known to have replaced their vocabulary at different rates, but there is something to be said for estimating historical depth through lexical replacement for languages that are known to be related and that have existed in comparable linguistic ecologies. In any case it is interesting to compare rates of lexical replacement across languages with known histories, such as Romance and English, which is why I am trying to build those two databases to serve as a baseline. It might be interesting to make one for the Norse languages as well.
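
For a baseline with a known history, the classical (pre-Bayesian) glottochronological relation can be turned around to measure a retention rate instead of assuming one. The sketch below uses the textbook formulas c = r^t for a language against its own ancestor and t = ln(c) / (2 ln(r)) for two sister languages, with t in millennia. The function names and the numbers in the example are made up for illustration and are not measurements.

```python
import math


def retention_rate(shared_fraction, millennia):
    """Per-millennium retention r, from c = r ** t (language vs. its ancestor)."""
    if not 0.0 < shared_fraction < 1.0 or millennia <= 0:
        raise ValueError("need 0 < shared_fraction < 1 and millennia > 0")
    return shared_fraction ** (1.0 / millennia)


def divergence_time(shared_fraction, rate):
    """Millennia since two sister languages split: t = ln(c) / (2 ln(r))."""
    if not 0.0 < shared_fraction < 1.0 or not 0.0 < rate < 1.0:
        raise ValueError("need 0 < shared_fraction < 1 and 0 < rate < 1")
    return math.log(shared_fraction) / (2.0 * math.log(rate))


# Hypothetical input: a language keeps 64% of the list after 2 millennia.
r = retention_rate(0.64, 2.0)
print(r)

# Two sisters sharing 64% of the list, dated with that same rate.
print(divergence_time(0.64, r))
```

Comparing the `retention_rate` values obtained from several families with documented histories is one simple way to see how far the constant-rate assumption is off before trusting any date derived from it.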


Posted: Sun Oct 30, 2016 4:44 am
Sanci

Joined: Fri Jan 02, 2015 1:45 pm
Posts: 32
Location: Paris, France
That's interesting. I was trying to do basically the same thing as Radagast.
Except I was going first for a cognate alignment system. You give it two lists of words, and the machine tells you which parts of the words correspond; you can even compare the result to a random model.

Grzegorz Kondrak has done some interesting things: http://webdocs.cs.ualberta.ca/~kondrak/cgi-bin/demo/aline/aline.html
ALINE is still a deterministic algorithm, but it gives some good results. Its main limitation is basically the alphabet. Computers are much better at crunching numbers than at managing alphabets.
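
To make the idea of alignment plus a random model concrete, here is a minimal sketch. It is not ALINE: it is a plain global alignment (Needleman-Wunsch) with arbitrary scores for match, mismatch and gap, no phonetic features at all, and a shuffle test standing in for the random model. All names and score values are assumptions chosen for illustration.

```python
import random

MATCH, MISMATCH, GAP = 2, -1, -1  # arbitrary illustrative scores


def align(a, b):
    """Globally align two segment lists; return (score, aligned pairs)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell; "-" marks a gap.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


def random_baseline(a, b, trials=1000, seed=0):
    """Share of shuffles of b that align with a at least as well as b does."""
    rng = random.Random(seed)
    observed, _ = align(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if align(a, shuffled)[0] >= observed:
            hits += 1
    return hits / trials


# Toy pair in broad transcription: Italian "tutto" and French "toute".
score, pairs = align(list("tutto"), list("tut"))
print(score, pairs)
print(random_baseline(list("tutto"), list("tut")))
```

Replacing the equality test with a numeric distance between feature vectors for each segment would move this closer to what the post describes, and is one way around the "alphabet" limitation, since the computer then compares numbers rather than symbols.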

Others are using Markov chain methods (for example Markov chain Monte Carlo), which are commonly used for Bayesian inference.

