zompist bboard

All times are UTC - 6 hours [ DST ]




Posted: Tue Mar 08, 2011 2:35 pm
Smeric

Joined: Sat Apr 12, 2003 1:48 pm
Posts: 1128
Location: Litareng, Keynami
Hi,

I just want to ask whether any of you knows of, or has written, a program that generates interlinears from plain-text input using a given template. To illustrate what I'd like to be able to do:

Enter:

    Ø edareng telbyamanas.
    Ø eda-reng telb-yam-an-as
    COP that-A.INAN show-PTCP-NMLZ-P
    This is an example.

Get out:

Code:
<example src="John Doe (2011)">
    <gloss>
        <word>Ø</word>
        <morphemes>Ø</morphemes>
        <analysis>COP</analysis>
        <translation>be</translation>
    </gloss>
    <gloss>
        <word>edareng</word>
        <morphemes>eda-reng</morphemes>
        <analysis>that-A.INAN</analysis>
        <translation>this</translation>
    </gloss>
    <gloss>
        <word>telbyamanas</word>
        <morphemes>telb-yam-an-as</morphemes>
        <analysis>show-PTCP-NMLZ-P</analysis>
        <translation>example</translation>
    </gloss>
    <whole>This is an example.</whole>
</example>


given the template you defined.

This would be incredibly helpful for longer texts, because hand-writing interlinears that can be styled with HTML and CSS always takes me ages.

I guess I could write such a program myself (I suppose it'd be fairly trivial in either Python or Perl, but I don't know either), but before I reinvent the wheel I wanted to ask whether someone has already done this work.
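
The conversion asked for above could be sketched roughly as follows, here in Python. The function name interlinear_to_xml and the choice to strip sentence-final punctuation from the surface words are assumptions. Since the four input lines carry no word-for-word translations, the <translation> element is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def interlinear_to_xml(block, src=""):
    """Convert one four-line interlinear block into the XML layout shown above."""
    lines = [line.strip() for line in block.strip().splitlines()]
    if len(lines) != 4:
        raise ValueError("expected 4 lines: text, morphemes, analysis, translation")
    text, morphemes, analysis, whole = lines

    # Assumption: punctuation at the end of a surface word is not part of the word.
    words = [w.rstrip(".,;:!?") for w in text.split()]
    morphs = morphemes.split()
    glosses = analysis.split()
    if not (len(words) == len(morphs) == len(glosses)):
        raise ValueError("the first three lines must have the same number of words")

    out = [f"<example src={quoteattr(src)}>"]
    for word, morph, gloss in zip(words, morphs, glosses):
        out += [
            "    <gloss>",
            f"        <word>{escape(word)}</word>",
            f"        <morphemes>{escape(morph)}</morphemes>",
            f"        <analysis>{escape(gloss)}</analysis>",
            "    </gloss>",
        ]
    out.append(f"    <whole>{escape(whole)}</whole>")
    out.append("</example>")
    return "\n".join(out)


sample = """Ø edareng telbyamanas.
Ø eda-reng telb-yam-an-as
COP that-A.INAN show-PTCP-NMLZ-P
This is an example."""
print(interlinear_to_xml(sample, src="John Doe (2011)"))
```

Escaping the text keeps the output well-formed even if a gloss happens to contain characters such as & or <.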


Posted: Tue Mar 08, 2011 3:48 pm
Sanno

Joined: Tue Nov 14, 2006 10:30 am
Posts: 939
Location: Tübingen, Germany
I've made something similar as a template for MediaWiki. See http://www.frathwiki.com/Template:Gl; it also requires http://www.frathwiki.com/Template:Glend and some CSS. For formatting the input, I paste the plain-text interlinear into an empty OpenOffice Calc spreadsheet with {space} as a separator, copy&paste the spreadsheet contents into the second tab with the Transpose option checked, rearrange the sentences so they're all below each other, copy&paste the result into my plain text editor, do three or four rounds of search&replace to create the wikicode, and finally copy&paste the source into the wiki. This works for me because I usually publish my conlanging stuff in a wiki context, but of course it's still not quite automatic.
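
The spreadsheet step above (split on spaces, then transpose) can be sketched without a spreadsheet. This is a Python illustration only; the function name transpose_interlinear is made up here, and it stops at the transposed columns rather than guessing at the wikicode the template expects.

```python
def transpose_interlinear(lines):
    """Turn row-wise interlinear lines into one tuple per word (column)."""
    rows = [line.split() for line in lines]
    # Every line must have the same number of space-separated words.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all lines must contain the same number of words")
    return list(zip(*rows))


columns = transpose_interlinear([
    "Ø eda-reng telb-yam-an-as",
    "COP that-A.INAN show-PTCP-NMLZ-P",
])
for column in columns:
    # One word per output line, with its morphemes and gloss tab-separated.
    print("\t".join(column))
```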

The closest thing to what you're asking for that I know of is Arthaey's Interlinearizer, which is written in Perl and generates HTML output. It could probably be adapted to an XML format like the one in your example quite easily.

_________________
Blog: audmanh.wordpress.com
Conlangs: Ronc Tyu | Buruya Nzaysa | Doayâu | Tmaśareʔ


Posted: Sun Apr 10, 2011 3:38 pm
Avisaru

Joined: Sat Oct 28, 2006 9:01 pm
Posts: 387
Location: Hyperborea
Didn't we do something similar in another thread a while ago? Nah, I can't find it either.
Anyway, I can make you a Perl program, if you like. Shouldn't be very hard.

I just need to know:
- how you intend to get those word-for-word translations; maybe there should be another line for those - or do you intend to use a dictionary file?
- what should be in the src part
- how you want to separate multiple sentences, since presumably you don't want the program to handle just one sentence; the easiest from programming perspective would probably be to just write the next one directly after, but maybe an empty line in between would make it easier to read
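
If the word-for-word translations were to come from a dictionary file, the lookup might look something like this Python sketch. The file layout (one "word<TAB>translation" pair per line), the function names and the "???" placeholder are all assumptions made for illustration.

```python
def load_dictionary(lines):
    """Parse 'word<TAB>translation' lines into a dict, skipping malformed lines."""
    dictionary = {}
    for line in lines:
        word, sep, translation = line.rstrip("\n").partition("\t")
        if sep and word:
            dictionary[word] = translation
    return dictionary


def word_translations(words, dictionary, missing="???"):
    """Look up each word; unknown words get a visible placeholder."""
    return [dictionary.get(word, missing) for word in words]


entries = ["edareng\tthis\n", "telbyamanas\texample\n", "a line without a tab\n"]
dictionary = load_dictionary(entries)
print(word_translations(["edareng", "telbyamanas", "Ø"], dictionary))
```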

_________________
My conlang lessons
My problem blog


Posted: Sun Apr 10, 2011 4:20 pm
Smeric

Joined: Sat Apr 12, 2003 1:48 pm
Posts: 1128
Location: Litareng, Keynami
Well, ideally it would get the word-by-word translation from my MySQL dictionary, but I wouldn't want to have the program analyze the morphemes for me automatically. That'd probably make it too complicated. An input as in the example in my OP would be just as good, I guess. You'll probably have an input of several lines anyway, so I guess it might be best to delimit sentences with something like \n\n or \n--\n:

Code:
Zees eez sentence wun.
zees ee-z sentence wun-Ø
this is-BLAH sentence one-FOO
This is sentence one.
(--)
Zees eez sentence tooh.
zees ee-z sentence too-h
this is-BLAH sentence two-BAR
This is sentence two.
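
Splitting such input into sentences on either delimiter could be sketched like this in Python. The function name split_sentences is an assumption, and the separator is taken to be a blank line or a line holding only -- (without the parentheses).

```python
import re


def split_sentences(text):
    """Split input into sentence blocks on blank lines or a line holding only '--'."""
    blocks = re.split(r"\n\s*(?:--)?\s*\n", text.strip())
    # Each block becomes a list of its lines (normally four per sentence).
    return [block.splitlines() for block in blocks if block.strip()]


text = """Zees eez sentence wun.
zees ee-z sentence wun-Ø
this is-BLAH sentence one-FOO
This is sentence one.
--
Zees eez sentence tooh.
zees ee-z sentence too-h
this is-BLAH sentence two-BAR
This is sentence two.
"""
for sentence in split_sentences(text):
    print(sentence)
```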


Posted: Sun Apr 10, 2011 5:10 pm
Sumerul

Joined: Mon Dec 22, 2003 12:35 pm
Posts: 3600
Location: Tokyo
the other thing i think you'd want is instead of
Code:
        <morphemes>telb-yam-an-as</morphemes>

you should have something more like
Code:
        <morphemebreakdown>
         <morpheme>telb</morpheme>
         <morpheme>yam</morpheme>
         <morpheme>an</morpheme>
         <morpheme>as</morpheme>
        </morphemebreakdown>


I don't honestly see how having it in XML format does you any good, though. I don't really get what it's for.
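
For what it's worth, the nested breakdown suggested above is easy to generate from the hyphenated form. A rough Python sketch using the standard library's ElementTree follows; the function name morpheme_breakdown is made up, and treating every hyphen as a morpheme boundary is an assumption.

```python
import xml.etree.ElementTree as ET


def morpheme_breakdown(segmented_word):
    """Build a <morphemebreakdown> element with one <morpheme> child per segment."""
    breakdown = ET.Element("morphemebreakdown")
    for segment in segmented_word.split("-"):
        ET.SubElement(breakdown, "morpheme").text = segment
    return breakdown


element = morpheme_breakdown("telb-yam-an-as")
# Serialise to a string; ElementTree writes it on a single line.
print(ET.tostring(element, encoding="unicode"))
```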


Posted: Sun Apr 10, 2011 5:48 pm
Avisaru

Joined: Sat Oct 28, 2006 9:01 pm
Posts: 387
Location: Hyperborea
Finlay: Yep, I thought about that too. It makes sense if he wants an XML database, but it seems he's just looking for a neat HTML file for web use, or something.

How about this?
I've skipped the word-for-word translations for now. The rest seems to work.
Code:
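#!/usr/bin/perl
use strict;
use warnings;

# Usage: progname inputfile outputfile
die "Usage: $0 inputfile outputfile\n" unless @ARGV == 2;
my ($infile, $outfile) = @ARGV;
open my $in,  '<', $infile  or die "Cannot read $infile: $!\n";
open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";

print $out qq{<example src="">\n};
# Each pass of the loop condition consumes one separator line.
while (<$in>) {
    # Then come four lines: words, morphemes, analysis, free translation.
    my ($words, $morphs, $anals, $trans) = map { scalar <$in> } 1 .. 4;
    last unless defined $trans;    # incomplete block at end of file
    chomp($words, $morphs, $anals, $trans);
    my @word  = split ' ', $words;
    my @morph = split ' ', $morphs;
    my @anal  = split ' ', $anals;
    for my $word (@word) {
        # Pair each word with the next morpheme string and analysis.
        my $morph = shift(@morph) // '';
        my $anal  = shift(@anal)  // '';
        print $out "\t<gloss>\n\t\t<word>$word</word>\n",
                   "\t\t<morphemes>$morph</morphemes>\n",
                   "\t\t<analysis>$anal</analysis>\n\t</gloss>\n";
    }
    print $out "\t<whole>$trans</whole>\n";
}
print $out "</example>\n";
close $out or die "Cannot close $outfile: $!\n";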


Usage: progname inputfile outputfile
The infile should contain the input in the format you gave. Each sentence needs a separator line in front of it; any single line will do, since its content is ignored. That means the first line of the file should also be a separator, but there should be none after the last sentence.
The outfile is created if it doesn't exist and overwritten if it does.
Give it a try.



Posted: Sun Apr 10, 2011 5:54 pm
Smeric

Joined: Sat Apr 12, 2003 1:48 pm
Posts: 1128
Location: Litareng, Keynami
It could easily be transformed into HTML, and with right-floating divs you get perfectly aligned glosses without having to use tables, which are rather inconvenient in HTML, not least because they don't wrap at the page edge.
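
To make the idea concrete, here is a rough Python sketch that renders one example as floated divs. The class names, the function name gloss_to_html and the stylesheet are assumptions for illustration, not the markup actually used on any site; the sketch floats the columns left, and the float direction is a matter of taste.

```python
from html import escape


def gloss_to_html(words, morphemes, analyses, translation):
    """Render one interlinear example as floated <div> columns instead of a table."""
    parts = ['<div class="example">']
    for word, morph, analysis in zip(words, morphemes, analyses):
        # One floated column per word, holding its three aligned rows.
        parts.append(
            '<div class="gloss">'
            f'<div class="word">{escape(word)}</div>'
            f'<div class="morphemes">{escape(morph)}</div>'
            f'<div class="analysis">{escape(analysis)}</div>'
            "</div>"
        )
    parts.append(f'<div class="whole">{escape(translation)}</div>')
    parts.append("</div>")
    return "\n".join(parts)


# Assumed stylesheet: floated columns wrap at the page edge,
# and the free translation clears the floats so it starts on its own line.
CSS = """
.gloss { float: left; margin-right: 1em; }
.whole { clear: both; }
"""

html = gloss_to_html(
    ["Ø", "edareng", "telbyamanas"],
    ["Ø", "eda-reng", "telb-yam-an-as"],
    ["COP", "that-A.INAN", "show-PTCP-NMLZ-P"],
    "This is an example.",
)
print(html)
```

Because each word is its own floated box, a long sentence wraps word by word when the window narrows, which a table row cannot do.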


Posted: Sun Apr 10, 2011 8:02 pm
Avisaru

Joined: Wed Oct 24, 2007 7:39 pm
Posts: 859
Location: The Eastern Establishment
Morphophonemics might present problems.


Posted: Mon Apr 11, 2011 2:08 am
Smeric

Joined: Sat Apr 12, 2003 1:48 pm
Posts: 1128
Location: Litareng, Keynami
Why would I run into trouble with morphophonemics? I said that for my needs, it doesn't need to automatically fetch my words from the database and break them down. Chuma's script looks like it does what I want (Thanks a ton! But how can I get text into the program?), i.e. produce the glossing format used here¹ from entering plain text like in my example in the first post in this thread. Strictly speaking, I don't even need the "src=" thing, but when I wrote the OP I thought giving a generalized example in XML would make things more easily comprehensible. Also, XML is quite adaptable to all kinds of stuff.

¹) No, those are not tables, look at the code and resize the width of the window, and you'll see why I prefer this. And you'll also see then why it's kind of tedious to hand-write that all. On the other hand, yes, tables would be friendlier because they don't break when you switch off CSS. But tables are just as tedious to make in HTML.


Posted: Tue Apr 12, 2011 4:23 am
Avisaru

Joined: Sat Oct 28, 2006 9:01 pm
Posts: 387
Location: Hyperborea
Guitarplayer wrote:
how can I get text into the program?

You provide an input file and an output file, as arguments to the program.


