
A program to generate interlinears from a given template?

Posted: Tue Mar 08, 2011 2:35 pm
by Jipí
Hi,

I'd like to ask whether any of you knows of, or has written, a program that generates interlinears from text input using a given template. Here is what I'd like to be able to do:

Input:

    Ø edareng telbyamanas.
    Ø eda-reng telb-yam-an-as
    COP that-A.INAN show-PTCP-NMLZ-P
    This is an example.

Output:

Code:

<example src="John Doe (2011)">
    <gloss>
        <word>Ø</word>
        <morphemes>Ø</morphemes>
        <analysis>COP</analysis>
        <translation>be</translation>
    </gloss>
    <gloss>
        <word>edareng</word>
        <morphemes>eda-reng</morphemes>
        <analysis>that-A.INAN</analysis>
        <translation>this</translation>
    </gloss>
    <gloss>
        <word>telbyamanas</word>
        <morphemes>telb-yam-an-as</morphemes>
        <analysis>show-PTCP-NMLZ-P</analysis>
        <translation>example</translation>
    </gloss>
    <whole>This is an example.</whole>
</example>
based on a template that I define.

This would be incredibly helpful for longer texts, because making interlinears of them that can be styled with HTML and CSS always takes me ages.

I guess I could write such a program myself (I suppose it would be fairly trivial in either Python or Perl, but I don't know either), but before I reinvent the wheel, I wanted to ask whether someone has already done this.
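To make the "given template" idea concrete, here is a minimal Perl sketch, assuming a hypothetical template syntax in which `{{name}}` marks a placeholder. It zips the three word-aligned lines together and fills the per-word template once per word. The placeholder syntax, the names `fill_template` and `glosses_from_tiers`, and the stripping of sentence-final punctuation from the word line are all illustrative assumptions, not an existing tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-word template; {{name}} marks a placeholder.
my $GLOSS_TEMPLATE = <<'END';
    <gloss>
        <word>{{word}}</word>
        <morphemes>{{morphemes}}</morphemes>
        <analysis>{{analysis}}</analysis>
    </gloss>
END

# Replace every {{name}} with the value given for that name ('' if missing).
sub fill_template {
    my ($template, %fields) = @_;
    (my $out = $template) =~ s/\{\{(\w+)\}\}/defined $fields{$1} ? $fields{$1} : ''/ge;
    return $out;
}

# Zip the three aligned lines word by word and fill the template for each word.
sub glosses_from_tiers {
    my ($words, $morphs, $glosses) = @_;
    my @w = split ' ', $words;
    my @m = split ' ', $morphs;
    my @g = split ' ', $glosses;
    s/[.,;:!?]+$// for @w;    # drop sentence punctuation from the word line
    return join '', map {
        my $i = $_;
        fill_template($GLOSS_TEMPLATE,
            word => $w[$i], morphemes => $m[$i], analysis => $g[$i]);
    } 0 .. $#w;
}

unless (caller) {
    print glosses_from_tiers('Ø edareng telbyamanas.',
                             'Ø eda-reng telb-yam-an-as',
                             'COP that-A.INAN show-PTCP-NMLZ-P');
}
```

Swapping in a different template string (for example an HTML one) changes the output format without touching the parsing code.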

Re: A program to generate interlinears from a given template

Posted: Tue Mar 08, 2011 3:48 pm
by Cedh
I've made something similar as a template for MediaWiki. See http://www.frathwiki.com/Template:Gl; it also requires http://www.frathwiki.com/Template:Glend and some CSS. For formatting the input, I paste the plain-text interlinear into an empty OpenOffice Calc spreadsheet with {space} as a separator, copy&paste the spreadsheet contents into the second tab with the Transpose option checked, rearrange the sentences so they're all below each other, copy&paste the result into my plain text editor, do three or four rounds of search&replace to create the wikicode, and finally copy&paste the source into the wiki. This works for me because I usually publish my conlanging stuff in a wiki context, but of course it's still not quite automatic.
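The spreadsheet step above (splitting on spaces, then pasting with Transpose) can also be done in a few lines of Perl. This is a sketch under the assumption that only the word-aligned lines are passed in, since the free translation is not aligned word by word; it prints one tab-separated row per word, which could then be fed to the search&replace stage. The function name `transpose_tiers` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split each line (tier) on whitespace and turn the columns into rows,
# one row per word: the same effect as Calc's "Transpose" paste option.
# Missing cells in shorter tiers are filled with ''.
sub transpose_tiers {
    my @tiers = map { my @cols = split ' ', $_; \@cols } @_;
    my $width = 0;
    for my $t (@tiers) { $width = @$t if @$t > $width }
    return map {
        my $i = $_;
        [ map { defined $_->[$i] ? $_->[$i] : '' } @tiers ];
    } 0 .. $width - 1;
}

# Usage: perl transpose.pl aligned-lines.txt
if (!caller && @ARGV) {
    chomp(my @tiers = grep { /\S/ } <>);
    print join("\t", @$_), "\n" for transpose_tiers(@tiers);
}
```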

The closest thing to what you're asking for that I know of is Arthaey's Interlinearizer, which is written in Perl and generates HTML output. It could probably be adapted to an XML format like the one in your example quite easily.

Re: A program to generate interlinears from a given template

Posted: Sun Apr 10, 2011 3:38 pm
by Chuma
Didn't we do something similar in another thread a while ago? Nah, I can't find it either.
Anyway, I can make you a Perl program, if you like. Shouldn't be very hard.

I just need to know:
- how you intend to get those word-for-word translations; maybe there should be another line for those - or do you intend to use a dictionary file?
- what should be in the src part
- how you want to separate multiple sentences, since presumably you want the program to handle more than one; from a programming perspective, the easiest would probably be to write the next one directly after, but an empty line in between might make it easier to read
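For the dictionary-file option, a minimal sketch might look like this. It assumes a hypothetical plain-text format with one `word<TAB>translation` pair per line (lines starting with `#` and blank lines are skipped), looks words up case-insensitively with trailing punctuation removed, and marks unknown words with `?`. A real MySQL dictionary would need a database query in place of `load_dictionary`; that part is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a hypothetical dictionary file: one "word<TAB>translation" per line.
sub load_dictionary {
    my ($fh) = @_;
    my %dict;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*#/ or $line !~ /\S/;    # comments, blank lines
        my ($word, $translation) = split /\t/, $line, 2;
        $dict{lc $word} = $translation if defined $translation;
    }
    return \%dict;
}

# Translate each word of a word line; unknown words get '?'.
sub word_translations {
    my ($dict, $words) = @_;
    return map {
        my $key = lc $_;
        $key =~ s/[[:punct:]]+$//;    # "telbyamanas." -> "telbyamanas"
        exists $dict->{$key} ? $dict->{$key} : '?';
    } split ' ', $words;
}

# Usage: perl lookup.pl dictionary.txt word1 word2 ...
if (!caller && @ARGV >= 2) {
    open my $fh, '<', $ARGV[0] or die "Cannot read $ARGV[0]: $!\n";
    my $dict = load_dictionary($fh);
    print join(' ', word_translations($dict, "@ARGV[1 .. $#ARGV]")), "\n";
}
```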

Re: A program to generate interlinears from a given template

Posted: Sun Apr 10, 2011 4:20 pm
by Jipí
Well, ideally it would get the word-by-word translation from my MySQL dictionary, but I wouldn't want the program to analyze the morphemes automatically; that would probably make it too complicated. Input like the example in my OP would be just as good. The input will span several lines anyway, so it might be best to delimit sentences with something like \n\n or \n--\n:

Code:

Zees eez sentence wun.
zees ee-z sentence wun-Ø
this is-BLAH sentence one-FOO
This is sentence one.
(--)
Zees eez sentence tooh.
zees ee-z sentence too-h
this is-BLAH sentence two-BAR
This is sentence two.
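Splitting input like this into sentences is straightforward. The sketch below treats an empty line, a line containing only `--`, or a line containing only `(--)` as a separator (accepting both spellings is an assumption based on the example above) and returns one four-line record per sentence: words, morphemes, glosses, free translation. The function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A separator is an empty line or a line containing only "--" or "(--)".
sub is_separator { return $_[0] =~ /^\s*(?:--|\(--\))?\s*$/ }

# Group input lines into records: [words, morphemes, glosses, translation].
sub parse_records {
    my (@records, @current);
    for my $line (@_, '') {        # the extra '' flushes the last record
        chomp(my $l = $line);
        if (is_separator($l)) {
            push @records, [@current] if @current;
            @current = ();
        } else {
            push @current, $l;
        }
    }
    return @records;
}

# Usage: perl records.pl input.txt   (prints each free translation)
if (!caller && @ARGV) {
    print "$_->[3]\n" for parse_records(<>);
}
```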

Re: A program to generate interlinears from a given template

Posted: Sun Apr 10, 2011 5:10 pm
by finlay
The other thing I think you'd want is, instead of

Code:

        <morphemes>telb-yam-an-as</morphemes>
you should have something more like

Code:

        <morphemebreakdown>
         <morpheme>telb</morpheme>
         <morpheme>yam</morpheme>
         <morpheme>an</morpheme>
         <morpheme>as</morpheme>
        </morphemebreakdown>
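Producing that nested form only takes splitting the morpheme line on hyphens. A small sketch follows; it assumes hyphens are the only morpheme boundary (other boundary symbols such as `=` for clitics are not handled), and the function name is hypothetical. The same function could be applied to the analysis line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "telb-yam-an-as" into a <morphemebreakdown> element,
# one <morpheme> per hyphen-separated segment.
sub morpheme_breakdown {
    my ($morphemes, $indent) = @_;
    $indent = "\t\t" unless defined $indent;
    my @parts = split /-/, $morphemes;
    return join '',
        "$indent<morphemebreakdown>\n",
        (map { "$indent\t<morpheme>$_</morpheme>\n" } @parts),
        "$indent</morphemebreakdown>\n";
}

unless (caller) {
    print morpheme_breakdown('telb-yam-an-as');
}
```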
I don't honestly see how having it in XML format does you any good, though. I don't really get what it's for.

Re: A program to generate interlinears from a given template

Posted: Sun Apr 10, 2011 5:48 pm
by Chuma
Finlay: Yep, I thought about that too; that makes sense if he wants an XML database, but it seems he's just looking for a neat HTML file for web use, or something.

How about this?
I've skipped the word-for-word translations for now. The rest seems to work.

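Code:

#!/usr/bin/perl
use strict;
use warnings;

# Usage: progname inputfile outputfile
my ($infile, $outfile) = @ARGV;
die "Usage: $0 inputfile outputfile\n" unless defined $outfile;
open my $in,  '<', $infile  or die "Cannot read $infile: $!\n";
open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";

# Each sentence is a separator line followed by four lines: the words,
# the morpheme breakdown, the glosses and the free translation.
while (<$in>) {                      # $_ holds the separator; it is ignored
    my @tier = map { my $l = <$in>; defined $l ? $l : '' } 1 .. 4;
    last if $tier[0] eq '';          # end of input after a trailing separator
    chomp @tier;
    my @word  = split ' ', $tier[0];
    my @morph = split ' ', $tier[1];
    my @anal  = split ' ', $tier[2];
    print $out qq{<example src="">\n};
    for my $i (0 .. $#word) {
        my $morph = defined $morph[$i] ? $morph[$i] : '';
        my $anal  = defined $anal[$i]  ? $anal[$i]  : '';
        print $out "\t<gloss>\n\t\t<word>$word[$i]</word>\n",
                   "\t\t<morphemes>$morph</morphemes>\n",
                   "\t\t<analysis>$anal</analysis>\n\t</gloss>\n";
    }
    print $out "\t<whole>$tier[3]</whole>\n</example>\n";
}
close $out or die "Cannot close $outfile: $!\n";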
Usage: progname inputfile outputfile
The input file should contain the text in the format you gave, with a separator line before each sentence (any non-empty line will do). That means the first line must also be a separator, but the last line should not be.
The output file doesn't have to exist: it is created if necessary, and overwritten if it already exists.
Give it a try.
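One thing the script above does not do is escape characters that are special in XML, so a translation containing `&` or `<` would produce invalid markup. A small helper like the following (a sketch; the name `xml_escape` is hypothetical) could be applied to each value before it is printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;     # must come first, or the other entities get mangled
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

unless (caller) {
    print xml_escape('Tom & "Jerry" <x>'), "\n";
}
```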

Re: A program to generate interlinears from a given template

Posted: Sun Apr 10, 2011 5:54 pm
by Jipí
It could easily be transformed into HTML. With floating divs you get perfectly aligned glosses and don't have to use tables, which are rather inconvenient in HTML, not least because they don't wrap at the page border.
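The floating-div layout described above could be generated directly. In this sketch each word becomes a block that floats left (which keeps the words in left-to-right order), with one inner div per line, and the free translation clears the floats. The class names `example`, `gloss` and `whole` and the CSS values are illustrative assumptions, not the markup of any particular site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical CSS: word blocks float left, so they line up like table
# columns but still wrap at the edge of the window.
my $CSS = <<'END';
.gloss { float: left; margin-right: 1em; }
.gloss div { white-space: nowrap; }
.whole { clear: left; font-style: italic; }
END

# Build one interlinear as nested divs from parallel word/morpheme/gloss lists.
sub interlinear_html {
    my ($words, $morphs, $glosses, $translation) = @_;
    my $html = qq{<div class="example">\n};
    for my $i (0 .. $#$words) {
        # One floating block per word, with one inner div per line.
        my $cells = join '', map {
            my $cell = defined $_->[$i] ? $_->[$i] : '';
            "<div>$cell</div>";
        } $words, $morphs, $glosses;
        $html .= qq{  <div class="gloss">$cells</div>\n};
    }
    $html .= qq{  <div class="whole">$translation</div>\n</div>\n};
    return $html;
}

unless (caller) {
    print "<style>\n$CSS</style>\n",
          interlinear_html([qw(Ø edareng telbyamanas)],
                           [qw(Ø eda-reng telb-yam-an-as)],
                           [qw(COP that-A.INAN show-PTCP-NMLZ-P)],
                           'This is an example.');
}
```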

Re: A program to generate interlinears from a given template

Posted: Sun Apr 10, 2011 8:02 pm
by dhok
Morphophonemics might present problems.

Re: A program to generate interlinears from a given template

Posted: Mon Apr 11, 2011 2:08 am
by Jipí
Why would I run into trouble with morphophonemics? As I said, for my needs the program doesn't have to fetch my words from the database and break them down automatically. Chuma's script looks like it does what I want (thanks a ton! But how do I get text into the program?), i.e. it produces the glossing format used here¹ from plain-text input like the example in the first post of this thread. Strictly speaking, I don't even need the "src=" attribute, but when I wrote the OP I thought a generalized example in XML would make things easier to understand. Besides, XML is quite adaptable to all kinds of uses.

¹) No, those are not tables: look at the source code and change the width of the window, and you'll see why I prefer this. You'll also see why it's rather tedious to write all of it by hand. Admittedly, tables would be friendlier, since they don't break when CSS is switched off, but tables are just as tedious to write in HTML.

Re: A program to generate interlinears from a given template

Posted: Tue Apr 12, 2011 4:23 am
by Chuma
Guitarplayer wrote: how can I get text into the program?
You provide an input file and an output file, as arguments to the program.
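If passing file names feels awkward, one option (a sketch, not part of Chuma's script) is to fall back to standard input and output when an argument is missing or given as `-`, so the program can also be used in a pipe. The function name `open_streams` is hypothetical; it could replace the two `open` lines at the top of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open input and output: file names from the arguments, or STDIN/STDOUT
# when an argument is missing or "-".
sub open_streams {
    my ($infile, $outfile) = @_;
    my ($in, $out);
    if (!defined $infile || $infile eq '-') { $in = \*STDIN }
    else { open $in, '<', $infile or die "Cannot read $infile: $!\n" }
    if (!defined $outfile || $outfile eq '-') { $out = \*STDOUT }
    else { open $out, '>', $outfile or die "Cannot write $outfile: $!\n" }
    return ($in, $out);
}

# Demo: copy the input to the output unchanged.
if (!caller && @ARGV) {
    my ($in, $out) = open_streams(@ARGV);
    print {$out} $_ while <$in>;
}
```

With this, both `perl progname input.txt output.xml` and `perl progname - - < input.txt > output.xml` would work.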