A program to generate interlinears from a given template?

Jipí
Smeric
Posts: 1128
Joined: Sat Apr 12, 2003 1:48 pm
Location: Litareng, Keynami

Post by Jipí »

Hi,

I just want to ask whether any of you knows of, or has written, a program to generate interlinears from text input using a given template. To illustrate what I'd like to be able to do:

Enter:
  • Ø edareng telbyamanas.
    Ø eda-reng telb-yam-an-as
    COP that-A.INAN show-PTCP-NMLZ-P
    This is an example.
Get out:

<example src="John Doe (2011)">
    <gloss>
        <word>Ø</word>
        <morphemes>Ø</morphemes>
        <analysis>COP</analysis>
        <translation>be</translation>
    </gloss>
    <gloss>
        <word>edareng</word>
        <morphemes>eda-reng</morphemes>
        <analysis>that-A.INAN</analysis>
        <translation>this</translation>
    </gloss>
    <gloss>
        <word>telbyamanas</word>
        <morphemes>telb-yam-an-as</morphemes>
        <analysis>show-PTCP-NMLZ-P</analysis>
        <translation>example</translation>
    </gloss>
    <whole>This is an example.</whole>
</example>
given the template you defined.

This'd be incredibly helpful for longer texts, because to make interlinears of them that can be manipulated with HTML and CSS always takes me ages.

I guess I could write such a program myself (I suppose it'd be rather trivial in either Python or Perl, but I don't know either), but before I reinvent the wheel I wanted to ask whether someone has already done this work.

Cedh
Sanno
Posts: 938
Joined: Tue Nov 14, 2006 10:30 am
Location: Tübingen, Germany

Re: A program to generate interlinears from a given template

Post by Cedh »

I've made something similar as a template for MediaWiki. See http://www.frathwiki.com/Template:Gl; it also requires http://www.frathwiki.com/Template:Glend and some CSS. For formatting the input, I paste the plain-text interlinear into an empty OpenOffice Calc spreadsheet with {space} as a separator, copy&paste the spreadsheet contents into the second tab with the Transpose option checked, rearrange the sentences so they're all below each other, copy&paste the result into my plain text editor, do three or four rounds of search&replace to create the wikicode, and finally copy&paste the source into the wiki. This works for me because I usually publish my conlanging stuff in a wiki context, but of course it's still not quite automatic.
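
The spreadsheet "Transpose" step described above can also be done in a few lines of code. The following is only a rough sketch in Python, not part of the workflow described in this post; the function name `transpose_interlinear` is made up for illustration, and it assumes that words are separated by whitespace and that every line has the same number of words.

```python
def transpose_interlinear(lines):
    """Turn parallel interlinear lines into one tuple per word (column)."""
    rows = [line.split() for line in lines]
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(f"lines have different word counts: {sorted(lengths)}")
    # zip(*rows) swaps rows and columns, like the spreadsheet's Transpose option.
    return list(zip(*rows))


columns = transpose_interlinear([
    "Ø eda-reng telb-yam-an-as",
    "COP that-A.INAN show-PTCP-NMLZ-P",
])
for column in columns:
    print(" | ".join(column))
```

Each resulting tuple holds everything that belongs to one word, which is the shape the search-and-replace rounds then turn into wikicode.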

The closest thing to what you're asking for that I know of is Arthaey's Interlinearizer, which is written in Perl and generates HTML output. It could probably be adapted to an XML format like the one in your example quite easily.

Chuma
Avisaru
Posts: 387
Joined: Sat Oct 28, 2006 9:01 pm
Location: Hyperborea

Re: A program to generate interlinears from a given template

Post by Chuma »

Didn't we do something similar in another thread a while ago? Nah, I can't find it either.
Anyway, I can make you a Perl program, if you like. Shouldn't be very hard.

I just need to know:
- how you intend to get those word-for-word translations; maybe there should be another line for those - or do you intend to use a dictionary file?
- what should be in the src part
- how you want to separate multiple sentences, since presumably you don't want the program to handle just one sentence; the easiest from a programming perspective would probably be to just write the next one directly after, but maybe an empty line in between would make it easier to read

Re: A program to generate interlinears from a given template

Post by Jipí »

Well, ideally it would get the word-by-word translation from my MySQL dictionary, but I wouldn't want to have the program analyze the morphemes for me automatically. That'd probably make it too complicated. An input as in the example in my OP would be just as good, I guess. You'll probably have an input of several lines anyway, so I guess it might be best to delimit sentences with something like \n\n or \n--\n:

Zees eez sentence wun.
zees ee-z sentence wun-Ø
this is-BLAH sentence one-FOO
This is sentence one.
(--)
Zees eez sentence tooh.
zees ee-z sentence too-h
this is-BLAH sentence two-BAR
This is sentence two.
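
For illustration, here is a minimal Python sketch of how input in this shape could be split into sentences and rendered in the XML layout from the first post. It is not the script anyone in this thread wrote. The function names are hypothetical, it assumes that either a blank line or a line containing only `--` ends a sentence, and it leaves out the per-word `<translation>` element, since that would need a dictionary lookup.

```python
from xml.sax.saxutils import escape


def parse_sentences(text, separator="--"):
    """Group non-empty lines into sentences; a blank line or the separator ends one."""
    sentences, block = [], []
    for line in text.splitlines():
        line = line.strip()
        if line and line != separator:
            block.append(line)
        elif block:
            sentences.append(block)
            block = []
    if block:
        sentences.append(block)
    return sentences


def to_xml(block):
    """Render one four-line sentence in the layout of the template from the first post."""
    if len(block) != 4:
        raise ValueError(f"expected 4 lines per sentence, got {len(block)}")
    surface, morphemes, glosses, translation = block
    # Surrounding punctuation belongs to the sentence, not to the glossed word.
    columns = [[w.strip(".,;:!?") for w in surface.split()],
               morphemes.split(), glosses.split()]
    if len({len(c) for c in columns}) != 1:
        raise ValueError("word, morpheme and gloss lines differ in length")
    out = ['<example src="">']
    for word, morph, gloss in zip(*columns):
        out += ["    <gloss>",
                f"        <word>{escape(word)}</word>",
                f"        <morphemes>{escape(morph)}</morphemes>",
                f"        <analysis>{escape(gloss)}</analysis>",
                "    </gloss>"]
    out += [f"    <whole>{escape(translation)}</whole>", "</example>"]
    return "\n".join(out)


sample = """Zees eez sentence wun.
zees ee-z sentence wun-Ø
this is-BLAH sentence one-FOO
This is sentence one.
--
Zees eez sentence tooh.
zees ee-z sentence too-h
this is-BLAH sentence two-BAR
This is sentence two.
"""
for block in parse_sentences(sample):
    print(to_xml(block))
```

Raising an error on mismatched line lengths is a deliberate choice here: a missing gloss would otherwise shift every following column without any warning.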

finlay
Sumerul
Posts: 3600
Joined: Mon Dec 22, 2003 12:35 pm
Location: Tokyo

Re: A program to generate interlinears from a given template

Post by finlay »

the other thing i think you'd want is instead of

        <morphemes>telb-yam-an-as</morphemes>
you should have something more like

        <morphemebreakdown>
         <morpheme>telb</morpheme>
         <morpheme>yam</morpheme>
         <morpheme>an</morpheme>
         <morpheme>as</morpheme>
        </morphemebreakdown>
I don't honestly see how having it in XML format does you any good, though. I don't really get what it's for.
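
As a rough sketch of the per-morpheme markup suggested above, the hyphenated string can simply be split on `-`. This is Python, the function name `morpheme_breakdown` is invented for the example, and it assumes that hyphens are the only morpheme boundary markers in the input.

```python
from xml.sax.saxutils import escape


def morpheme_breakdown(morphemes, indent="        "):
    """Wrap each hyphen-separated morpheme in its own <morpheme> element."""
    lines = [f"{indent}<morphemebreakdown>"]
    for morpheme in morphemes.split("-"):
        lines.append(f"{indent} <morpheme>{escape(morpheme)}</morpheme>")
    lines.append(f"{indent}</morphemebreakdown>")
    return "\n".join(lines)


print(morpheme_breakdown("telb-yam-an-as"))
```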

Re: A program to generate interlinears from a given template

Post by Chuma »

Finlay: Yep, I thought about that too - that makes sense if he wants an XML database, but it seems he's just looking for a neat HTML file for web use, or something.

How about this?
I've skipped the word-for-word translations for now. The rest seems to work.

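#!/usr/bin/perl
use strict;
use warnings;

# Usage: progname inputfile outputfile
my ($infile, $outfile) = @ARGV;
open my $in,  '<', $infile  or die "Cannot read $infile: $!";
open my $out, '>', $outfile or die "Cannot write $outfile: $!";

print $out qq{<example src="">\n};
# Each pass reads (and discards) a separator line, then the four lines of one sentence.
while (<$in>) {
    my @word  = split ' ', scalar(<$in>) // '';   # surface words
    my @morph = split ' ', scalar(<$in>) // '';   # morpheme breakdown
    my @anal  = split ' ', scalar(<$in>) // '';   # glosses
    my $trans = scalar(<$in>) // '';              # free translation
    chomp $trans;
    for my $word (@word) {
        # Take the matching morpheme string and gloss; empty if the line is too short.
        my $morph = shift(@morph) // '';
        my $anal  = shift(@anal)  // '';
        print $out "\t<gloss>\n\t\t<word>$word</word>\n\t\t<morphemes>$morph</morphemes>\n\t\t<analysis>$anal</analysis>\n\t</gloss>\n";
    }
    print $out "\t<whole>$trans</whole>\n";
}
print $out "</example>\n";
close $out or die "Cannot close $outfile: $!";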
Usage: progname inputfile outputfile
The infile should contain the input in the format you gave. There should be a separator between sentences; any line will do, but not empty. The first line should also be a separator, but the last line should not.
The outfile is created if it doesn't exist and overwritten if it does.
Give it a try.

Re: A program to generate interlinears from a given template

Post by Jipí »

It could be transformed into HTML easily, and by having divs right-floating you get perfectly aligned glosses without having to use tables, which are rather inconvenient in HTML, also because they don't wrap at page borders.
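
To make the floated-div idea concrete, here is a small Python sketch that emits one floated `div` per word. The class names, the CSS rules and the float direction (`left`, which suits left-to-right text) are all guesses made for this example, not the markup actually used on any site mentioned in this thread.

```python
from html import escape

# Hypothetical stylesheet: each gloss column floats, so columns wrap like words.
CSS = """<style>
.gloss { float: left; margin: 0 1em 1em 0; }
.gloss div { white-space: nowrap; }
.whole { clear: both; }
</style>"""


def gloss_html(words, morphemes, glosses, translation):
    """Build one floated div per word, followed by the free translation."""
    parts = ['<div class="example">']
    for word, morph, gloss in zip(words, morphemes, glosses):
        parts.append(
            '<div class="gloss">'
            f'<div class="word">{escape(word)}</div>'
            f'<div class="morphemes">{escape(morph)}</div>'
            f'<div class="analysis">{escape(gloss)}</div>'
            '</div>'
        )
    # clear: both on the translation pushes it below the floated columns.
    parts.append(f'<div class="whole">{escape(translation)}</div>')
    parts.append('</div>')
    return "\n".join(parts)


markup = gloss_html(
    ["Ø", "edareng", "telbyamanas"],
    ["Ø", "eda-reng", "telb-yam-an-as"],
    ["COP", "that-A.INAN", "show-PTCP-NMLZ-P"],
    "This is an example.",
)
print(CSS)
print(markup)
```

Because each column is an independent floated box, a long sentence wraps onto the next line when the window gets narrower, which a table would not do.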

dhok
Avisaru
Posts: 859
Joined: Wed Oct 24, 2007 7:39 pm
Location: The Eastern Establishment

Re: A program to generate interlinears from a given template

Post by dhok »

Morphophonemics might present problems.

Re: A program to generate interlinears from a given template

Post by Jipí »

Why would I run into trouble with morphophonemics? I said that for my needs, it doesn't need to automatically fetch my words from the database and break them down. Chuma's script looks like it does what I want (Thanks a ton! But how can I get text into the program?), i.e. produce the glossing format used here¹ from entering plain text like in my example in the first post in this thread. Strictly speaking, I don't even need the "src=" thing, but when I wrote the OP I thought giving a generalized example in XML would make things more easily comprehensible. Also, XML is quite adaptable to all kinds of stuff.

¹) No, those are not tables; look at the code and resize the window, and you'll see why I prefer this. You'll also see why it's kind of tedious to hand-write all of that. On the other hand, yes, tables would be friendlier because they don't break when you switch off CSS. But tables are just as tedious to make in HTML.

Re: A program to generate interlinears from a given template

Post by Chuma »

Guitarplayer wrote: how can I get text into the program?
You provide an input file and an output file, as arguments to the program.
