My MSc thesis: Semi-natural language processing

Chuma
Avisaru
Posts: 387
Joined: Sat Oct 28, 2006 9:01 pm
Location: Hyperborea

Post by Chuma »

As some of you may recall, I'm working on an MSc thesis in computer science, which is sort of about conlanging. I'm getting close to the deadline, and the other people in the department don't really know an awful lot about linguistics, so I turn to you for feedback, dear ZBB.

(Maybe this should be in C&C. Mods, do as you please.)

The idea

Way back in the seventies, computer interpretation of natural languages relied on grammatical rules given by the programmer. As it turned out, that didn't work very well; real human language had so many exceptions that the rules failed most of the time. So instead, they started using machine learning. They would annotate a huge text, manually analysing how the words related to each other, and then the computer would compile statistics from it and use them to guess the analysis of other sentences. This is the method usually used today, and it often analyses 80-90% of the words correctly. That's not entirely satisfying, particularly since it might mean that only 50% of the sentences are analysed completely correctly, and there is a great deal of research being done on increasing this accuracy.
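
To see how word-level accuracy in that range can leave only about half of the sentences fully correct, here is a back-of-the-envelope sketch. It assumes that errors on different words are independent, and the sentence lengths are picked purely for illustration; neither assumption comes from the thesis.

```python
def sentence_accuracy(word_accuracy: float, words_per_sentence: int) -> float:
    """Chance that every word of a sentence is analysed correctly,
    assuming (as a simplification) that word errors are independent."""
    return word_accuracy ** words_per_sentence


# Illustrative values only: two word accuracies, three sentence lengths.
for p in (0.80, 0.90):
    for n in (3, 7, 15):
        acc = sentence_accuracy(p, n)
        print(f"word accuracy {p:.0%}, {n:2d} words: sentence accuracy {acc:.0%}")
```

Under these assumptions, 90% per-word accuracy on a seven-word sentence gives about 48% fully correct sentences, which is in the same ballpark as the 50% mentioned above.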

So I thought: What if we could simplify human language so that it becomes possible to analyse with formal rules? Would it necessarily be a far too limited language, or could we actually get it to look quite similar to natural language?

You might wonder where such a language would be used. In some applications we have to use normal human language; we can't expect people to just go and change their grammar. In other applications we are happy to use a completely unnatural language, such as a programming language. There might be situations where a compromise could come in handy, maybe with future household robots, robot toys, or voice-controlled operating systems for disabled people.
Anyway, that's all speculation, and hopefully I don't need to justify conlanging to you guys.

I'm also making a little 3D-program which demonstrates the language. It's not very impressive, but if anyone wants I can send it. It requires Java and Perl.

Formal grammar

Unlike most languages, this one can be neatly written on a single page. This is only the most basic version of the grammar. I'm working on extending it.

clause {
phrase*
}

phrase {
noun_phrase
| pronoun_phrase
| verb_phrase
}

noun_phrase {
noun_article attribute attribute*
}

pronoun_phrase {
pronoun attribute*
}

verb_phrase {
copula attribute*
}

attribute {
[para_article] <property> meta_attribute*
}

meta_attribute {
adverbial
| secondary_predicate
}

adverbial {
BEGINADV attribute* ENDADV
}

secondary_predicate {
BEGINSP clause ENDSP
}

noun_article {
ART.DEF.PLU.ABS
| ART.DEF.PLU.ERG
| ART.DEF.SING.ABS
| ART.DEF.SING.ERG
| ART.IND.PLU.ABS
| ART.IND.PLU.ERG
| ART.IND.SING.ABS
| ART.IND.SING.ERG
}

pronoun {
PRON.1P.PLU.ABS
| PRON.1P.PLU.ERG
| PRON.1P.SING.ABS
| PRON.1P.SING.ERG
| PRON.2P.PLU.ABS
| PRON.2P.PLU.ERG
| PRON.2P.SING.ABS
| PRON.2P.SING.ERG
| PRON.3P.PLU.ABS
| PRON.3P.PLU.ERG
| PRON.3P.NEUT.ABS
| PRON.3P.NEUT.ERG
| PRON.3P.MASC.ABS
| PRON.3P.MASC.ERG
| PRON.3P.FEM.ABS
| PRON.3P.FEM.ERG
}

copula {
COP.PRES
| COP.PAST
| COP.IMP
}

para_article {
PARA.ABS
| PARA.ERG
}
(Capital letters mean a literal word (i.e. a "terminal symbol"); lowercase means the word is defined elsewhere in the grammar. Square brackets mean optional. A star means repeated any number of times, including zero. Angle brackets mean "other word", i.e. there is no fixed set of possibilities.)
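
For readers who prefer code to grammar notation, the rules above can be sketched as a small recursive-descent parser. This is only an illustration and not the thesis program: the `Parser` class, `kind` and `parse` are hypothetical names, and function words are recognised by their prefix (`ART.`, `PRON.`, `COP.`, `PARA.`) rather than checked against the full lists, which is a simplification.

```python
MARKERS = {"BEGINADV", "ENDADV", "BEGINSP", "ENDSP"}
PREFIXES = (("ART.", "article"), ("PRON.", "pronoun"),
            ("COP.", "copula"), ("PARA.", "para"))


def kind(tok):
    """Classify a token; anything that is not a form word is a <property>."""
    for prefix, name in PREFIXES:
        if tok.startswith(prefix):
            return name
    return tok if tok in MARKERS else "property"


class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = list(tokens), 0

    def peek(self):
        return kind(self.toks[self.pos]) if self.pos < len(self.toks) else None

    def take(self, expected=None):
        if expected is not None and self.peek() != expected:
            raise SyntaxError(f"expected {expected} at position {self.pos}")
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def clause(self):                      # clause: phrase*
        phrases = []
        while self.peek() in ("article", "pronoun", "copula"):
            phrases.append(self.phrase())
        return phrases

    def phrase(self):                      # head word, then its attributes
        head_kind, head = self.peek(), self.take()
        attrs = self.attributes()
        if head_kind == "article" and not attrs:
            raise SyntaxError(f"{head!r} needs at least one attribute")
        return {"head": head, "attributes": attrs}

    def attributes(self):                  # attribute*
        attrs = []
        while self.peek() in ("para", "property"):
            attrs.append(self.attribute())
        return attrs

    def attribute(self):                   # [para_article] <property> meta*
        para = self.take() if self.peek() == "para" else None
        word = self.take("property")
        metas = []
        while self.peek() in ("BEGINADV", "BEGINSP"):
            if self.take() == "BEGINADV":
                metas.append(("adverbial", self.attributes()))
                self.take("ENDADV")
            else:
                metas.append(("secondary_predicate", self.clause()))
                self.take("ENDSP")
        return {"para": para, "property": word, "metas": metas}


def parse(text):
    parser = Parser(text.split())
    tree = parser.clause()
    if parser.pos != len(parser.toks):
        raise SyntaxError(f"unexpected token {parser.toks[parser.pos]!r}")
    return tree


tree = parse("ART.DEF.SING.ERG mouse COP.PRES eat ART.DEF.SING.ABS cheese")
print([phrase["head"] for phrase in tree])
```

Because every unit starts with its head word, one token of lookahead is enough to decide which rule applies, which is what keeps the parser this short.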


Explanations

The language is rigidly head-initial, with free word order whenever possible. It can be thought of as units made up of a head word and a number of dependant units, with the head first and the dependants in any order.

A clause can be a whole sentence, but some sentences are written as several clauses. The phrases which make up a clause can be put in any order. The final version will have special types of clauses, marked with a clause head.

A normal clause has exactly one verb phrase, and one or two (or in rare cases zero) pronoun or noun phrases. The language technically allows multiples, so you can choose whether the program should use them.

All content words are formally the same part of speech, "properties". Add a noun article and you have a noun phrase, add a copula and you have a verb phrase, add nothing and you have an "attribute", similar to an adjective or adverb.

A noun phrase needs at least one attribute, but they can be of any type and in any order. You can say "the red" rather than "the red one". Saying "the giant German" is exactly the same as "the German giant".
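
A tiny sketch of that last point: if a noun phrase without metas is stored as its article plus an unordered set of properties, the two word orders become the same object. The function name `noun_phrase_key` is made up for this illustration.

```python
def noun_phrase_key(tokens):
    """Reduce a meta-free noun phrase to (article, unordered set of properties)."""
    article, *properties = tokens
    if not properties:
        raise ValueError("a noun phrase needs at least one attribute")
    return article, frozenset(properties)


a = noun_phrase_key("ART.DEF.SING.ABS giant German".split())
b = noun_phrase_key("ART.DEF.SING.ABS German giant".split())
print(a == b)  # True
```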

A meta-attribute (or just "meta" for short) is simply the attribute of an attribute, as opposed to the attribute of a noun or verb. In "the very tall man runs quite fast", "very", "quite" and "fast" are metas, but "tall", "man", and "runs" are plain attributes.

Adverbials are just ordinary attributes which apply to another attribute, whereas a secondary predicate is a whole (partial) clause specifying in which way the object relates to the predicate which defines it. In this simplified version they can only give the other argument to a defining transitive predicate, such as "bus" in "bus driver".

Cases, as usual, describe the relation of the noun to the current predicate. Absolutive is the patient/experiencer, either the single noun relating to an intransitive verb, or the object of a transitive verb. Ergative is the agent/instigator, the subject of a transitive verb (but can be used on its own, which would give something like an antipassive voice).

Para-articles describe the case relation of the noun to the defining predicate. A para-article is normally left out if it is absolutive. They work a lot like the "-er" and "-ee" suffixes in English.


Examples

ART.SING.DEF.ERG mouse COP.PRES eat ART.SING.DEF.ABS cheese
"The mouse eats the cheese."

ART.SING.DEF.ERG mouse COP.PRES eat
"The mouse eats."

COP.PRES eat ART.SING.DEF.ABS cheese
"The cheese is eaten."

ART.SING.DEF.(?) cheese eat
"the cheese which is eaten"

ART.SING.DEF.(?) mouse PARA.ERG eat
"the mouse which eats"

ART.SING.DEF.(?) mouse run
"the mouse which runs"

ART.SING.DEF.(?) mouse run fast
"the fast mouse which runs"

ART.SING.DEF.(?) mouse run BEGINADV fast ENDADV
"the mouse which runs fast"

ART.SING.DEF.(?) mouse PARA.ERG eat BEGINSP ART.SING.DEF.ABS cheese ENDSP
"the mouse which eats the cheese"

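The three full clauses at the top of the examples show why case marking makes the word order free: agent and patient can be read off the articles alone. A minimal sketch of that idea follows, restricted to clauses without metas or para-articles. The function name `roles` and the agent/patient labels are chosen for the illustration, and the case is assumed to be the last dotted part of the form word.

```python
ROLE_BY_CASE = {"ERG": "agent", "ABS": "patient"}


def roles(clause: str) -> dict:
    """Read predicate, agent and patient off a clause with no metas or paras."""
    result = {}
    current = None  # the slot that the next content word will fill
    for tok in clause.split():
        parts = tok.split(".")
        if parts[0] == "COP":
            current = "predicate"
        elif parts[0] in ("ART", "PRON"):
            current = ROLE_BY_CASE[parts[-1]]  # case is the last dotted part
            if parts[0] == "PRON":
                result[current] = tok          # a pronoun needs no content word
        elif current is not None and current not in result:
            result[current] = tok              # first content word after the head
    return result


print(roles("ART.SING.DEF.ERG mouse COP.PRES eat ART.SING.DEF.ABS cheese"))
print(roles("COP.PRES eat ART.SING.DEF.ABS cheese ART.SING.DEF.ERG mouse"))
```

Both calls give the same roles, even though the phrases come in a different order.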

What I would like to know

- General thoughts. (Altho you don't need to tell me it's a bad idea for a thesis - I've been working on it all year already.)

- What grammatical things would be reasonable to add next? Perhaps I don't need the entire complexity of natural language, but it should probably be a bit more advanced than this.

- Specifically, how should those metas be handled? The whole begin-end deal seems horribly unnatural. I have some ideas for changing them.

- Also, what about prepositions? How could they fit into the system? I consider them to be content words, but apart from that I don't know. Locative case? Clause-level prepositional phrases, or prepositions as attributes?

- Ideas for an example text. It should be short and not too grammatically complex. I tried "the north wind and the sun", but it has some slightly tricky grammar. It would also be good if it can easily be represented graphically, so I can show it in my little 3D-program.

- Reasonable translations for the form words, so that I can write it like almost-normal English. Some of those words have more or less obvious English equivalents, such as ART.DEF.SING.ABS = "the", but others don't. For example for the para-articles - what would be suitable English words for those?


Any comments are greatly appreciated.

nebula wind phone
Sanci
Posts: 67
Joined: Wed Mar 02, 2005 10:58 am
Location: Austin, Texas, USA

Post by nebula wind phone »

Sounds like a cool project.

As for your questions: it might help if you had a specific domain in mind.

It wouldn't have to be a terribly complicated one. I mean, one of the classic examples of the rule-based AI you're talking about is Winograd's SHRDLU, and there the domain was a little toy world with a few colored blocks in it. But if you picked some domain, then that could help you figure out what to add next.

F'rinstance, if you were to decide "Okay, I want a language you could give driving directions in," then the next step would be spatial language: prepositions, spatial adverbs like "left" and "straight," things like that. If you said "I want a language that you could use for online shopping," then the next step might be numbers and measurements and whatnot, plus maybe comparatives and superlatives. (How would you translate "I want the largest hard drive that costs less than $150"?) If you said "I want a language you could hold math classes in," then you'd need numbers and also logical connectives. Etcetera.

(One thing you'll need in almost any domain: A way to ask questions, and a way to issue requests or commands. So maybe that oughta be the next step regardless. But still — I think sooner or later, settling on a domain will help. If nothing else, a well-delimited task makes it easier to convince your professors that you're finished. :mrgreen: :mrgreen: )
"When I was about 16 it occurred to me that conlanging might be a sin, but I changed my mind when I realized Adam and Eve were doing it before the Fall." —Mercator

faiuwle
Avisaru
Posts: 512
Joined: Mon Feb 12, 2007 12:26 am
Location: MA north shore

Post by faiuwle »

What nebula wind phone said. You're never going to design an uber-language that will work for all contexts anyway.

Also, is this supposed to be a generalized syntax that could work with vocabulary in any language, or is it just meant for English speakers? If it's just for English speakers, I'm not sure why you'd want it ergative-absolutive when you could just use English's nominative/accusative as it already applies to existing verbs. Then you don't have to worry about roles in sentences like "the mouse runs" etc.

On grammar notes, I don't see why you need the para_article, and it makes no sense the way you described it, anyway.

Why aren't these the forms?

ART.SING.DEF.ERG mouse eat
"the mouse which eats"

"eat" is still an attribute here, not a verb, so it contrasts with

ART.SING.DEF.ERG mouse COP.PRES eat
"The mouse eats."

Also

ART.SING.DEF.ABS cheese eat
"the cheese which is eaten"

(clearly)

and why is the "eat" not included in the secondary predicate of the last one? It just divides up the sentence in an odd place that no one is used to.

Why not

ART.SING.DEF.ERG mouse BEGINSP COP.PRES eat ART.SING.DEF.ABS cheese ENDSP
"the mouse which eats the cheese"

?

As for making the secondary expression stuff more natural-sounding, just have BEGINSP be "that" (or "which") and ENDSP be ",":

the.SING.ERG mouse that does eat the.SING.ABS cheese,

(and your copula could be "does/did" since people are used to seeing that before verbs).

As far as the "fast mouse which runs" versus "mouse that runs fast" issue, you really just lack a way to include attributes as a part of the verb phrase. Then you could use secondary predicates and say

the mouse that does run fast,

as opposed to

the mouse fast run
(which would just be short for "the mouse fast that does run," anyway)
It's (broadly) [faɪ.ˈjuw.lɛ]
#define FEMALE

ConlangDictionary 0.3 3/15/14 (ZBB thread)

Quis vult in terra stare,
Cum possit volitare?

Post by Chuma »

Thank you for your feedback.
nebula wind phone wrote:it might help if you had a specific domain in mind.
I sort of do. The idea is, there is a robot with some physical things which it can move around. For now, I have an example world with a few shapes which can be moved, resized and coloured.
But I would like the language to be more general. After all, I'm not going to analyse the contents of the sentences, just the syntax. If it's supposed to be convincing as a near-human language, I think it should be reasonably generally applicable.
nebula wind phone wrote:One thing you'll need in almost any domain: A way to ask questions, and a way to issue requests or commands.
That is true. There is a way to issue commands, with the imperative, but I haven't got questions yet (altho you might be able to manage a "tell me", or something). That should probably be added.
faiuwle wrote:Also, is this supposed to be a generalized syntax that could work with vocabulary in any language, or is it just meant for English speakers?
It should be possible to use in any language. Yes, the abs/erg looks weird, but it makes more sense particularly when involving adjectives as predicates. In the semantic analysis there is the agent and the patient, so it's convenient to have the same division in the syntactical analysis.
faiuwle wrote:Why aren't these the forms?

ART.SING.DEF.ERG mouse eat
"the mouse which eats"

"eat" is still an attribute here, not a verb, so it contrasts with

ART.SING.DEF.ERG mouse COP.PRES eat
"The mouse eats."
Yes, but
ART.SING.DEF.(?) mouse eat
means "the mouse which is eaten", regardless of which case it is in.
faiuwle wrote:Also

ART.SING.DEF.ABS cheese eat
"the cheese which is eaten"
Oh, you're wondering why I didn't write out the case there? Maybe I should have said. Since those aren't sentences, just loose noun phrases, it's impossible to say which case they should be in. That depends on their relation to the main verb.

Consider:

ART.SING.DEF.ERG PARA.ERG drive COP.PRES eat
"the driver eats"

ART.SING.DEF.ERG drive COP.PRES eat
"the driven (i.e. the passenger) eats"

ART.SING.DEF.ABS PARA.ERG drive COP.PRES eat
"the driver is eaten"

ART.SING.DEF.ABS drive COP.PRES eat
"the driven is eaten"

The noun article describes the relation to the predicate of the current sentence (here, "eat"), and the para describes the relation to the predicate which defines the noun (here, "drive").
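
The four sentences above can be summarised in a few lines of code: the noun article gives the outer role (towards the main predicate), and the para-article, or its absence, gives the inner role (towards the defining predicate). This is just a sketch; `describe` and the role labels are names made up for the illustration.

```python
ROLE = {"ERG": "agent", "ABS": "patient"}


def describe(noun_phrase: str, main_predicate: str) -> str:
    """Spell out the two relations carried by a one-property noun phrase."""
    tokens = noun_phrase.split()
    article, rest = tokens[0], tokens[1:]
    outer = ROLE[article.split(".")[-1]]      # relation to the main predicate
    if rest[0].startswith("PARA."):
        inner = ROLE[rest[0].split(".")[-1]]  # relation to the defining predicate
        word = rest[1]
    else:
        inner = ROLE["ABS"]                   # a missing para counts as absolutive
        word = rest[0]
    return f"{inner} of {word!r}, acting as {outer} of {main_predicate!r}"


print(describe("ART.SING.DEF.ABS PARA.ERG drive", "eat"))  # "the driver is eaten"
print(describe("ART.SING.DEF.ERG drive", "eat"))           # "the driven eats"
```
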
faiuwle wrote:Why not

ART.SING.DEF.ERG mouse BEGINSP COP.PRES eat ART.SING.DEF.ABS cheese ENDSP
"the mouse which eats the cheese"
Maybe that would be more intuitive. The reason I have it like that is so that the head word of the attribute is always a property (except when it's a para), with the secondary predicate being a description of sorts of the property.
I think I might change the position of the para, tho, and use the English "er", "en", "ee". That should please the English speakers, and it might actually make more sense. Thus

ART.SING.DEF.ABS PARA.ERG drive COP.PRES eat
becomes
ART.SING.DEF.ABS drive PARA.ERG COP.PRES eat
faiuwle wrote:As for making the secondary expression stuff more natural-sounding, just have BEGINSP be "that" (or "which") and ENDSP be ","
Yes, perhaps, either "that" or "which". The comma might seem a little odd in spoken language.
faiuwle wrote:and your copula could be "does/did" since people are used to seeing that before verbs.
Maybe. I was thinking "is", which works much better with adjectives.
faiuwle wrote:As far as the "fast mouse which runs" versus "mouse that runs fast" issue, you really just lack a way to include attributes as a part of the verb phrase. Then you could use secondary predicates and say

the mouse that does run fast,

as opposed to

the mouse fast run
Hm. That would be longer, tho. I need to think about this one.

Post by faiuwle »

Ahh, I see. It looked like with the (?) you were saying you weren't sure, or something.

I still don't think the PARA stuff really works, though. It would feel more natural to simply stick a case-marking on the BEGINSP/that/which/whatever, e.g.

ART.SING.DEF.* mouse BEGINSP.ERG eat ... ENDSP
"the mouse that ate [the cheese, or whatever]"

vs.

ART.SING.DEF.* mouse BEGINSP.ABS eat ... ENDSP
"the mouse that was eaten"

You'd need multiple variations of "that" then, but you already need case-number variants of "the", so you could just model after those. Then your BEGINSP actually works a bit like a relative pronoun, and people are used to it. As far as parsing, it would just be a matter of identifying what the SP is modifying, and replacing BEGINSP with the word/semantic token of the referent.

For the copula - why not allow both "does" and "is"? Then all of these would be correct:

the.SING.ABS mouse is furry
the.SING.ABS mouse does furry

the.SING.ERG mouse is run
the.SING.ERG mouse does run

Then there is an idiomatic way to say both.
Chuma wrote:Yes, perhaps, either "that" or "which". The comma might seem a little odd in spoken language.
But interpreting spoken language is a whole other kettle of fish... You should be able to tell when the clause is over anyway, right?
Chuma wrote:Hm. That would be longer, tho. I need to think about this one.
IMO, longer is better if it's going to make it more intuitive to use. No one's ever going to give you a sentence that's embedded so deeply the computer has problems with it, as humans tend to have problems with more than about three levels anyway ("this is the malt the rat the cat the dog worried chased ate...").

Post by Chuma »

This is turning into a surprisingly useful discussion. Wonderful thoughts of grammar are filling my head. :)
faiuwle wrote:It would feel more natural to simply stick a case-marking on the BEGINSP/that/which/whatever, e.g.
That would work, but only when there is a BEGINSP. I'm assuming that simpler cases are always far more common, so I optimise for them; in this case, "the one which eats" should be much more common than "the one which eats cheese".
faiuwle wrote:For the copula - why not allow both "does" and "is"?
Yes, any real life application would have multiple English words for each of the forms, so that you can say "is running" rather than "is run", as well as both "I" and "me", etc. This is a good suggestion, I might use it. Only problem is if I want to add a question word; that might also be "do"/"does".
faiuwle wrote:But interpreting spoken language is a whole other kettle of fish...
Yes, I won't actually use it on spoken language. It's still good to be able to at least read the stuff that has been written, so too much punctuation would feel weird, but maybe this one would be okay. Can't say I have any better idea.
faiuwle wrote:humans tend to have problems with more than about three levels anyway ("this is the malt the rat the cat the dog worried chased ate...").
True indeed. And a good example, I think I'll use that.


In a more advanced version, it would be convenient to have metas without end markers. Presumably most metas have only a single content word, such as "…who runs fast". For example

adverbial {
BEGINADV attribute* ENDADV
| ONEADV attribute
}
It's also possible to join up ADV and SP, because they can be distinguished anyway by looking at the next word. But then I would lose the nifty principle that every unit can be identified by its head.

Again about
faiuwle wrote:the mouse that does run fast,

as opposed to

the mouse fast run
I have thought about it.

It seems like a good idea to express all sorts of metas just like ordinary clauses. We could do something like this:

...
attribute {
<property>
| secondary_predicate
}

secondary_predicate {
beginsp clause ENDSP
}

beginsp {
BEGINSP_ABS
| BEGINSP_ERG
}
...
But it doesn't work. Consider again "the mouse which runs fast". Here, "fast" is a meta. Could we say "the mouse that…" and then stuff the whole "runs fast" into a secondary predicate? No, because "fast" would be a meta in that clause as well. We would fall into a black hole of endless SPs.

What if we change the secondary clause around a bit?

...
attribute {
<property>
| secondary_predicate
}

secondary_predicate {
beginsp <property> attribute* (noun_phrase | pronoun_phrase)* ENDSP
}

beginsp {
BEGINSP_ABS
| BEGINSP_ERG
}
...
This way, we could say
the mouse that run fast,
We would lose the meaningless copula, and also get out of the black hole thing.

But there are two problems. First, I would like to maintain the principle that each unit consists of a head, which comes first, and a number of dependants, which can come in any order. This suggestion would ruin that, because the dependants of the beginsp would need a specific order.

Second, it feels a little odd together with the new one-word substructure; you would get a word which means "here comes a word which has a single meta after it" rather than "here comes a meta".


I suppose one possibility might be not to allow multiple-attribute predicates, but use multiple predicates instead:
"The dog is big and brown."
ART.DEF.SING.ABS dog COP.PRES big brown
becomes
ART.DEF.SING.ABS dog COP.PRES big COP.PRES brown

In that sentence, it's not so great. But in this one it's better:
"The man runs very fast."
ART.DEF.SING.ABS man COP.PRES run BEGINADV fast BEGINADV very ENDADV ENDADV
becomes
ART.DEF.SING.ABS man COP.PRES run fast very

We don't completely get rid of the substructures (I'm fairly sure that's impossible). For example:

"The man runs fast and well"
ART.DEF.SING.ABS man COP.PRES run BEGINX fast well ENDX

It does sort of make the above problem easier, but it doesn't feel as nice and symmetric.


On a side note,
faiuwle wrote:the mouse fast run
(which would just be short for "the mouse fast that does run," anyway)
It would even be short for
the that does mouse, that does fast, that does run,
actually. Not that it makes much difference.

Post by faiuwle »

Chuma wrote:That would work, but only when there is a BEGINSP. I'm assuming that simpler cases are always far more common, so I optimise for them; in this case, "the one which eats" should be much more common than "the one which eats cheese".
Well, this is possibly an argument for using Nominative/Accusative, as then the unmarked case (with no BEGINSP) would default to nominative. Or, since this is just a syntax-based interface, you could just leave it as Subject/Object/Oblique/whatever; the internal logic could have some leeway in how it interpreted subjects and objects semantically, possibly with hardcoded or user-configurable exceptions for particular predicates.
Chuma wrote:Consider again "the mouse which runs fast". Here, "fast" is a meta. Could we say "the mouse that…" and then stuff the whole "runs fast" into a secondary predicate? No, because "fast" would be a meta in that clause as well. We would fall into a black hole of endless SPs.
Turtles all the way down, yes.

I was thinking actually changing your verb phrase:

verb_phrase {
copula <property> attribute*
}
which is the same thing as you said, really, but down a level from the SP, so you don't have to have an SP just to have an adverb. It's got the same problem where it doesn't work with the head-tail system anymore, but I'm sure there will be other things that break it too.
Chuma wrote:I suppose one possibility might be not to allow multiple-attribute predicates, but use multiple predicates instead:
"The dog is big and brown."
ART.DEF.SING.ABS dog COP.PRES big brown
becomes
ART.DEF.SING.ABS dog COP.PRES big COP.PRES brown

In that sentence, it's not so great. But in this one it's better:
"The man runs very fast."
ART.DEF.SING.ABS man COP.PRES run BEGINADV fast BEGINADV very ENDADV ENDADV
becomes
ART.DEF.SING.ABS man COP.PRES run fast very
And have an implied structure where everything is considered subordinate to what came directly before it (rather than on the same level) if it's in the same marked "clause"? Sure. And you could employ a comma (or something else) as a general-purpose indication that the current implied subclause is over and the parser should go back up a level, so:

"The dog is big and brown."
ART.DEF.SING.ABS dog COP.PRES big, brown

"The man runs very fast."
ART.DEF.SING.ABS man COP.PRES run fast very

"The man runs fast and well"
ART.DEF.SING.ABS man COP.PRES run fast, well

Maybe it's not as attractive, though.
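
For what it's worth, the implied structure is easy to build with a stack. The sketch below is one possible reading of the comma rule, applied only to the words after the copula: each word hangs off the word before it, and a trailing comma closes the current level so the next word attaches one level up. The names `nest` and `show` are invented for the illustration.

```python
def nest(words: str):
    """Each word depends on the previous one; a trailing comma goes up a level."""
    root = ("ROOT", [])
    stack = [root]
    for raw in words.split():
        word = raw.rstrip(",")
        commas = len(raw) - len(word)   # number of levels to close after this word
        node = (word, [])
        stack[-1][1].append(node)       # attach to the most recent open word
        stack.append(node)
        for _ in range(commas):
            if len(stack) > 1:          # never pop the artificial root
                stack.pop()
    return root[1]


def show(nodes):
    """Render the tree in bracket notation, e.g. run(fast(very))."""
    return " ".join(w + (f"({show(kids)})" if kids else "") for w, kids in nodes)


print(show(nest("big, brown")))      # big brown
print(show(nest("run fast very")))   # run(fast(very))
print(show(nest("run fast, well")))  # run(fast well)
```

So "fast" and "well" both end up under "run", while "very" ends up under "fast".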

Post by Chuma »

faiuwle wrote:Well, this is possibly an argument for using Nominative/Accusative, as then the unmarked case (with no BEGINSP) would default to nominative.
But that would break a lot of other things. First, I don't want to have separate transitive and intransitive words; a sentence like "the one that burns" could be either or. Second, it's convenient that words like "red" are transitive, so that we can say "make something red" without the extra verb ("make"), but we don't want that to be the default.
faiuwle wrote:Or, since this is just a syntax-based interface, you could just leave it as Subject/Object/Oblique/whatever
So basically the interpreter should know whether it's a transitive verb or not? That would sort of defeat the purpose. The idea is that it should be able to interpret any sentence without knowing the meaning of the content words. Otherwise it would need a huge wordlist and still fail for unknown words.
faiuwle wrote:I was thinking actually changing your verb phrase:

verb_phrase {
copula <property> attribute*
}
which is the same thing as you said, really, but down a level from the SP, so you don't have to have an SP just to have an adverb. It's got the same problem where it doesn't work with the head-tail system anymore, but I'm sure there will be other things that break it too.
Actually, if that's how you mean it, I don't think it would break the system; you would just consider the property to be the head of the attributes. It would be another version of the multiple-predicate system, only instead of the levels going 1 2 3 4... they would go 1 2 2 2... if you see what I mean. It would sort of work, but it's not really pretty.

Post by Chuma »

All right, I've put together an "advanced" version. There are still a few things missing - for one thing, there are no logical operators ("and", "or" etc.). But I have to stop somewhere.

clause {
primer phrase* ENDCL
}

phrase {
noun_phrase
| pronoun_phrase
| verb_phrase
}

noun_phrase {
noun_article attribute attribute*
}

pronoun_phrase {
pronoun attribute*
}

verb_phrase {
copula attribute*
}

attribute {
<property> meta_attribute*
}

meta_attribute {
para_article
| META.ONEADV <property>
| META.BEGINADV attribute* META.END
| nounlike_meta_article <property>
| META.BEGINSP phrase* META.END
}


noun_article {
ART.DEF.PLU.ABS
| ART.DEF.PLU.ERG
| ART.DEF.PLU.DAT
| ART.DEF.PLU.LOC
| ART.DEF.SING.ABS
| ART.DEF.SING.ERG
| ART.DEF.SING.DAT
| ART.DEF.SING.LOC
| ART.IND.PLU.ABS
| ART.IND.PLU.ERG
| ART.IND.PLU.DAT
| ART.IND.PLU.LOC
| ART.IND.SING.ABS
| ART.IND.SING.ERG
| ART.IND.SING.DAT
| ART.IND.SING.LOC
}

nounlike_meta_article {
META.DEF.PLU.ABS
| META.DEF.PLU.ERG
| META.DEF.PLU.DAT
| META.DEF.PLU.LOC
| META.DEF.SING.ABS
| META.DEF.SING.ERG
| META.DEF.SING.DAT
| META.DEF.SING.LOC
| META.IND.PLU.ABS
| META.IND.PLU.ERG
| META.IND.PLU.DAT
| META.IND.PLU.LOC
| META.IND.SING.ABS
| META.IND.SING.ERG
| META.IND.SING.DAT
| META.IND.SING.LOC
}

pronoun {
PRON.1P.PLU.ABS
| PRON.1P.PLU.ERG
| PRON.1P.PLU.DAT
| PRON.1P.PLU.LOC
| PRON.1P.SING.ABS
| PRON.1P.SING.ERG
| PRON.1P.SING.DAT
| PRON.1P.SING.LOC
| PRON.2P.PLU.ABS
| PRON.2P.PLU.ERG
| PRON.2P.PLU.DAT
| PRON.2P.PLU.LOC
| PRON.2P.SING.ABS
| PRON.2P.SING.ERG
| PRON.2P.SING.DAT
| PRON.2P.SING.LOC
| PRON.3P.PLU.ABS
| PRON.3P.PLU.ERG
| PRON.3P.PLU.DAT
| PRON.3P.PLU.LOC
| PRON.3P.NEUT.ABS
| PRON.3P.NEUT.ERG
| PRON.3P.NEUT.DAT
| PRON.3P.NEUT.LOC
| PRON.3P.MASC.ABS
| PRON.3P.MASC.ERG
| PRON.3P.MASC.DAT
| PRON.3P.MASC.LOC
| PRON.3P.FEM.ABS
| PRON.3P.FEM.ERG
| PRON.3P.FEM.DAT
| PRON.3P.FEM.LOC
| PRON.INTG.ABS
| PRON.INTG.ERG
| PRON.INTG.DAT
| PRON.INTG.LOC
}

copula {
COP.PRES
| COP.PERF
| COP.IMP
}

para_article {
PARA.ABS.PRES
| PARA.ERG.PRES
| PARA.DAT.PRES
| PARA.ABS.PERF
| PARA.ERG.PERF
| PARA.DAT.PERF
}

primer {
CL.DEF
| CL.QUE
| CL.OBJ
| CL.QUOTE
| CL.IF
| CL.THEN
| CL.ELSE
| CL.TOPIC
}
What's new?
First, two more cases. I like cases. The dative should be enough to cover most verbs, and it seemed reasonable to include locative as well. Without it, it would have been necessary to have locations as verb metas, which would have been a mess. You might wonder, why place but not manner and time? Well, manner seems pretty reasonable to have as verb metas. Time usually doesn't get involved in as complicated things - you might say "he stands before the bright red door" but expressions like "he stands before the bright red morning" are, altho possible, rarer, I think. But I suppose "locative" could be used for time as well. It's an option.

Second, interrogative pronouns. Pretty simple - replace the word you're wondering about with such a pronoun.

Third, single-word metas. Long awaited. The para is now also considered a kind of meta.

Oh, and present/perfect aspect for paras. (I know, "present" isn't the name of the aspect. But "IMP" would be very confusing.)

Last but most, "primers", for lack of a better word (unless you have a suggestion?). These act as the head of the clause, and define the role of the whole clause in relation to other clauses. They are:
CL.DEF - default, same as before
CL.QUE - yes/no question
CL.OBJ - the whole clause acts as patient to the previous clause
CL.QUOTE - same, but the "clause" is just a string
CL.IF - for conditional constructions
CL.THEN - same
CL.ELSE - same
CL.TOPIC - the clause has only a single phrase, which defines "it" for the following clauses
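
Since every clause now starts with a primer and ends with ENDCL, cutting a token stream into clauses needs no lookahead at all. A minimal sketch, working on form words (where CL.DEF is written out explicitly); `split_clauses` is a hypothetical helper and the if/then sentence is invented for the example. It does not treat the string inside a CL.QUOTE clause specially, which a real implementation would have to do.

```python
PRIMERS = {"CL.DEF", "CL.QUE", "CL.OBJ", "CL.QUOTE",
           "CL.IF", "CL.THEN", "CL.ELSE", "CL.TOPIC"}


def split_clauses(tokens):
    """Cut a token stream into (primer, phrase tokens) pairs."""
    clauses = []
    current = None                      # the clause being read, if any
    for tok in tokens:
        if current is None:
            if tok not in PRIMERS:
                raise SyntaxError(f"clause must start with a primer, got {tok!r}")
            current = (tok, [])
        elif tok == "ENDCL":
            clauses.append(current)
            current = None
        else:
            current[1].append(tok)
    if current is not None:
        raise SyntaxError("missing ENDCL")
    return clauses


# Invented example, roughly "If the dog is big, then run."
example = ("CL.IF ART.DEF.SING.ABS dog COP.PRES big ENDCL "
           "CL.THEN PRON.2P.SING.ABS COP.IMP run ENDCL").split()
for primer, phrase_tokens in split_clauses(example):
    print(primer, phrase_tokens)
```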

I've also (just to make this post even longer) put together a list of lexicalisations for the most common words.

COP.PRES            is
COP.IMP             be

ART.DEF.PLU.ABS     the
ART.DEF.PLU.ERG     bythe
ART.DEF.PLU.DAT     tothe
ART.DEF.PLU.LOC     atthe

PRON.1P.SING.ABS    me
PRON.1P.SING.ERG    byme
PRON.1P.SING.DAT    tome
PRON.1P.SING.LOC    atme

PRON.2P.SING.ABS    you
PRON.2P.SING.ERG    byyou
PRON.2P.SING.DAT    toyou
PRON.2P.SING.LOC    atyou

PRON.3P.NEUT.ABS    it
PRON.3P.NEUT.ERG    byit
PRON.3P.NEUT.DAT    toit
PRON.3P.NEUT.LOC    atit

PRON.INTG.ABS       what
PRON.INTG.ERG       bywhat
PRON.INTG.DAT       towhat
PRON.INTG.LOC       atwhat

PARA.ABS            en
PARA.ERG            er
PARA.DAT            ee
PARA.LOC            ???

META.ADV            as
META.DEF.PLU.ABS    ofthe
META.DEF.PLU.ERG    ofbythe
META.DEF.PLU.DAT    oftothe
META.DEF.PLU.LOC    ofatthe

CL.DEF              (no word)
CL.QUE              do
CL.OBJ              that
CL.QUOTE            quote
CL.IF               if
CL.THEN             then
CL.ELSE             else
CL.TOPIC            about
ENDCL               .
And some examples:

"The man runs fast."
the man is run as fast

"the man who runs fast"
the man run as fast

"The man is very big."
the man is big as very

"the very big man"
the man big as very


"the mouse eats the cheese"
bythe mouse is eat the cheese

"the mouse which eats the cheese"
the mouse eat er ofthe cheese

"the cheese which the mouse eats"
the cheese eat [en] ofbythe mouse


"The man is at the table."
the man is atthe table
(This is an empty predicate. Locatives don't form attributes.)

"The man is on the chair."
the man is on atthe chair
(Here we do have a predicate, "on".)

"The man sits on the chair."
the man is sit on atthe chair
(This is sort of cheating - we're not saying that he is sitting on the chair, but rather that he is sitting, and he is on, and that those two things take place at a chair. The rest should be obvious at least to a human, but it's possible to be more specific:)
the man is sit on ofatthe chair
(Now we are saying that it is the chair he is on, that he is not just being on something while being close to the chair. We can go further:)
the man is sit META.BEGINADV on META.DEF.SING.LOC chair META.END
(Hopefully this isn't necessary.)

"the man at the table"
the man ofatthe table
(Now it's not an empty predicate, because we can't form a noun from an empty predicate. Instead, it's the act of being a man which is taking place at the table.)

"the man on the chair"
the man on ofatthe chair

"the man sitting on the chair"
the man sit on ofatthe chair


I know some of those lexicalised words are rather awkward, but that's the best I could come up with. As mentioned before, it's possible to accept multiple options when speaking to the robot, but the robot will always answer with the standard word.
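
The "several words in, one standard word out" idea could look something like the sketch below. The `STANDARD` entries are copied from the list above; the extra input word "does" (from the earlier "does"/"is" discussion) and the names `ACCEPTED`, `understand` and `answer` are assumptions made for the illustration. Content words simply pass through unchanged.

```python
# Standard lexicalisations, copied from the list above (only a few entries).
STANDARD = {
    "COP.PRES": "is",
    "COP.IMP": "be",
    "PRON.2P.SING.ERG": "byyou",
    "PRON.3P.NEUT.ABS": "it",
}

# Every standard word is accepted as input ...
ACCEPTED = {word: form for form, word in STANDARD.items()}
# ... plus any extra alternatives (hypothetical example).
ACCEPTED["does"] = "COP.PRES"


def understand(sentence: str) -> list:
    """Map input words to form words; unknown words are content words."""
    return [ACCEPTED.get(word, word) for word in sentence.split()]


def answer(forms: list) -> str:
    """Write form words back out, always using the standard word."""
    return " ".join(STANDARD.get(form, form) for form in forms)


forms = understand("byyou does eat it")
print(forms)
print(answer(forms))  # byyou is eat it
```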

And I just realised there are no interrogative metas. Might have to add that somehow.

Post by Chuma »

Just to let you know, I've started uploading the stuff I write about the project. It can be found at
http://cybish.blogspot.com/

I'm working on what will hopefully be the final version of the language (within the project, anyway). Will post the grammar on the blog soon. Also finishing the demonstration program, which will now involve cute puppies instead of cubes. It's a Java program - PM me if you want to see it.
