Language data format

Discussion of natural languages, or language in general.
bob2356
Niš
Posts: 5
Joined: Thu Feb 09, 2012 2:45 pm
Location: Singapore

Language data format

Post by bob2356

Hello all!

I am currently investigating existing data formats for storing information about languages (phonology, phonotactics, morphology, syntax, etc.). I was hoping to find something I could use to build a couple of tools of my own while remaining compatible with other tools available online.
It turns out I couldn't find such a standard in the conlanging community. I looked into the Computerized Conlang Creator Project, but there is very little material, and what there is on the website focuses more on file format than on data format (the data-format part is more a first-draft proposal than a usable format).
I then ran into the Lexical Markup Framework, an academic effort that seems very extensive and complete. However, as with most academic work, no properly usable tool is available for it (I know about COLDIC, but man, academics have no sense of practicality and usability).

Do you know of any other data formats for describing languages? Do you use one?
I'd be curious to know whether anyone has tried LMF. If so, please share your experience. It seems very complete, but possibly too focused on natural languages; for example, I couldn't find a descriptor for how sounds are produced.

Any insights or shared experience on the matter would be appreciated.

Terra
Avisaru
Posts: 571
Joined: Tue May 24, 2005 10:01 am

Re: Language data format

Post by Terra

bob2356 wrote:I then ran into the Lexical Markup Framework, an academic effort that seems very extensive and complete. However, as with most academic work, no properly usable tool is available for it (I know about COLDIC, but man, academics have no sense of practicality and usability).
What's so bad about LMF? ( http://www.lexicalmarkupframework.org/ ) It seems to be just XML. There are tons of editors, tools, and hooks for XML.
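As a rough illustration of the point that LMF is plain XML, here is a minimal sketch that reads lexical entries with Python's standard `xml.etree.ElementTree` module. The fragment is hypothetical and heavily simplified, modeled loosely on LMF's `LexicalResource` / `Lexicon` / `LexicalEntry` / `Lemma` elements and its `<feat att="..." val="..."/>` attribute-value pairs. A real file following the ISO 24613 DTD carries far more structure, and the language code `x-myconlang` and the entries are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified LMF-style fragment (not a complete ISO 24613 document).
LMF_SAMPLE = """<LexicalResource>
  <Lexicon>
    <feat att="language" val="x-myconlang"/>
    <LexicalEntry id="e1">
      <feat att="partOfSpeech" val="noun"/>
      <Lemma><feat att="writtenForm" val="kala"/></Lemma>
    </LexicalEntry>
    <LexicalEntry id="e2">
      <feat att="partOfSpeech" val="verb"/>
      <Lemma><feat att="writtenForm" val="miru"/></Lemma>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>"""


def feats(element):
    """Collect the direct <feat att=... val=.../> children of an element into a dict."""
    return {f.get("att"): f.get("val") for f in element.findall("feat")}


def list_entries(xml_text):
    """Return (writtenForm, partOfSpeech) pairs for every LexicalEntry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        form = feats(lemma).get("writtenForm") if lemma is not None else None
        entries.append((form, feats(entry).get("partOfSpeech")))
    return entries


if __name__ == "__main__":
    for form, pos in list_entries(LMF_SAMPLE):
        print(f"{form}\t{pos}")
```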

vec
Avisaru
Posts: 639
Joined: Tue Sep 16, 2003 10:42 am
Location: Reykjavík, Iceland

Re: Language data format

Post by vec

The use of wikitables on Wikipedia, organized à la IPA horizontally from labial to glottal and vertically from stop to approximant, seems to be the most standardized way.
vec
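A chart like that is a presentation of the data rather than a data model, but it can be generated mechanically from one. A minimal sketch, assuming a hypothetical inventory stored as (symbol, place, manner) records, that renders MediaWiki wikitable markup with places as columns and manners as rows:

```python
# Hypothetical inventory: (IPA symbol, place of articulation, manner).
PLACES = ["labial", "alveolar", "velar", "glottal"]
MANNERS = ["stop", "fricative", "nasal", "approximant"]

INVENTORY = [
    ("p", "labial", "stop"), ("b", "labial", "stop"),
    ("t", "alveolar", "stop"), ("d", "alveolar", "stop"),
    ("k", "velar", "stop"), ("ʔ", "glottal", "stop"),
    ("f", "labial", "fricative"), ("s", "alveolar", "fricative"),
    ("h", "glottal", "fricative"),
    ("m", "labial", "nasal"), ("n", "alveolar", "nasal"),
    ("l", "alveolar", "approximant"),
]


def consonant_wikitable(inventory, places=PLACES, manners=MANNERS):
    """Render MediaWiki table markup: places as columns, manners as rows.

    Symbols sharing a place and manner (e.g. voicing pairs) share one cell,
    in the order they appear in the inventory.
    """
    cells = {}
    for symbol, place, manner in inventory:
        cells.setdefault((manner, place), []).append(symbol)

    lines = ['{| class="wikitable"', "! !! " + " !! ".join(places)]
    for manner in manners:
        row = [" ".join(cells.get((manner, place), [])) for place in places]
        lines.append("|-")
        lines.append(f"! {manner}")
        lines.append("| " + " || ".join(row))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(consonant_wikitable(INVENTORY))
```

Keeping the inventory as records and deriving the table from them means the chart cannot drift out of sync with the underlying data.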

bob2356
Niš
Posts: 5
Joined: Thu Feb 09, 2012 2:45 pm
Location: Singapore

Re: Language data format

Post by bob2356

I think I may not have been clear enough because of a poor choice of words. Sorry about that. I am actually looking for a data model, not a format. Whether the model (or data structure, if you will) is serialized as XML, YAML, JSON, or binary is irrelevant to me (converters between these formats exist).
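To make the model/format distinction concrete, here is a minimal sketch assuming Python 3.9+ and a deliberately tiny hypothetical model (`Phoneme`, `Lexeme`, and `Language` are illustrative names, not part of LMF or any existing standard). The model lives in the dataclasses; JSON is only one serialization, and swapping the `json` calls for a YAML or XML writer would leave the model untouched.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Phoneme:
    symbol: str
    features: dict[str, str] = field(default_factory=dict)


@dataclass
class Lexeme:
    form: str
    gloss: str
    part_of_speech: str


@dataclass
class Language:
    name: str
    phonemes: list[Phoneme] = field(default_factory=list)
    lexicon: list[Lexeme] = field(default_factory=list)


def to_json(lang: Language) -> str:
    """Serialize the model; JSON is only one possible on-disk format."""
    return json.dumps(asdict(lang), ensure_ascii=False, indent=2)


def from_json(text: str) -> Language:
    """Rebuild the model from the JSON produced by to_json."""
    data = json.loads(text)
    return Language(
        name=data["name"],
        phonemes=[Phoneme(**p) for p in data["phonemes"]],
        lexicon=[Lexeme(**x) for x in data["lexicon"]],
    )


# Hypothetical sample data.
EXAMPLE = Language(
    name="Example",
    phonemes=[Phoneme("p", {"place": "labial", "manner": "stop"})],
    lexicon=[Lexeme("kala", "fish", "noun")],
)

if __name__ == "__main__":
    text = to_json(EXAMPLE)
    print(text)
    assert from_json(text) == EXAMPLE  # the round trip preserves the model
```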
Terra wrote:What's so bad about LMF? ( http://www.lexicalmarkupframework.org/ ) It seems to be just XML. There are tons of editors, tools, and hooks for XML.
Nothing is wrong with LMF! I find it very extensive and complete, as stated in my original post. I realize it wasn't very clear in my post, but whether this data model is serialized as XML or anything else doesn't matter to me.
I expressed concern that, however complete it is, it is aimed at representing natural languages, so it lacks some features that could be useful for more "exotic" conlangs (such as a representation of how sounds are produced for sounds outside the realm of the IPA; and even here I'm being restrictive, since I'm only mentioning languages that use sound to carry information). Also, my initial goal was to find a data model that I could use with my own tools but that would still be compatible with existing tools from the conlanging community, which LMF doesn't offer. Then again, I now understand that there is no such unified data model in the community (despite the individual efforts to create one that I found in my research).
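One way to leave room for sounds outside the IPA is to describe articulation as open-ended key/value features and treat the IPA symbol as optional. A minimal sketch under that assumption; the `Sound` class, the `select` helper, and the "beak" and "throat sac" articulators are hypothetical examples for a non-human conlang, not taken from LMF or any existing tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sound:
    """A sound whose production is described by free-form articulatory features.

    Keys are not limited to IPA categories such as place, manner, or voicing,
    so non-human mechanisms (hypothetical here) can be recorded as well.
    """
    label: str                               # identifier used by the tool
    ipa: Optional[str] = None                # None when no IPA symbol applies
    articulation: dict[str, str] = field(default_factory=dict)

    def display(self) -> str:
        """Show the IPA symbol if there is one, else a bracketed description."""
        if self.ipa is not None:
            return self.ipa
        desc = ", ".join(f"{k}={v}" for k, v in sorted(self.articulation.items()))
        return f"[{self.label}: {desc}]"


def select(sounds: list[Sound], **criteria: str) -> list[Sound]:
    """Return the sounds whose features match every given key=value pair."""
    return [s for s in sounds
            if all(s.articulation.get(k) == v for k, v in criteria.items())]


# Hypothetical inventory mixing an IPA consonant with two non-IPA sounds.
SOUNDS = [
    Sound("p", "p", {"place": "labial", "manner": "stop", "voicing": "voiceless"}),
    Sound("clack", None, {"articulator": "beak", "manner": "percussive"}),
    Sound("hum", None, {"articulator": "throat sac", "manner": "resonant",
                        "voicing": "voiced"}),
]

if __name__ == "__main__":
    for sound in SOUNDS:
        print(sound.display())
```

The trade-off of free-form keys is that tools must agree on a vocabulary by convention rather than by schema.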
vec wrote:The use of wikitables on Wikipedia, organized à la IPA horizontally from labial to glottal and vertically from stop to approximant, seems to be the most standardized way.
Here again, it was just me not being clear about what I'm looking for. I hope my restatement above has cleared it up.

I think I'll stick with LMF for now. Does anyone know of other data models for representing languages, or have experience working with LMF?
