zompist bboard

Posted: **Thu Apr 22, 2010 7:02 am**

I'm finally getting started on my MSc thesis in computer science, and I have a new subject - something like "The limits of formal language in emulating natural language", or in the popular science version, "Learning to speak robot".

The idea is, there are two common ways of "talking" to machines; either you use machine languages (or programming languages), which humans have difficulties understanding, or you use human languages, which machines have difficulties understanding. My plan is to find a sort of middle ground.

When computers analyse natural language, they mostly use statistical methods. It has turned out that using exact rules is too complicated: there are too many little exceptions and subtleties. Formal languages (such as programming languages), on the other hand, have no ambiguities or exceptions, and can be analysed with simple rules. I want to see how close you can get to a natural language without needing to use statistical methods.

The hypothetical language (and I probably won't make an actual language, even tho it would be nice) should be somewhat focused on physical worlds, rather than abstract mathematical things; thus a "robot language" rather than a programming language.
I probably won't deal with phonetics; we assume that the machine has heard what you said, or that the communication is written.

I haven't got as far as answers yet; I'm still dealing with questions, and that is what I would like some help with. What issues might arise? What do you need to deal with in order to make a formal language suitable for human communication? I've come up with a couple of loose ideas:
- feedback, confirming that you understand each other
- ellipsis, assuming things that are omitted
- emotions, making up for the lack of body language

I'm also looking for literature suggestions. I'm planning to read up on Lojban, and maybe even good old Esperanto will have something to add. I don't know much else to read.

Posted: **Thu Apr 22, 2010 12:00 pm**

Well, apropos of the RPG discussion in Ephemera, you might look at IF engines in general and Inform 7 in particular for the computers-parsing-natural-language stuff.

I'm not really familiar with the parsers outside of the Inform series, but the way those work is that the text of the first word is matched to an entry representing a (semantic) verb, which then has rules about what arguments that verb can take, where to find them in relation to the verb word, and what should be done based on the nature of the objects. The object names are mapped to actual objects in the game world, of course. Naturally, this is geared at responding to imperatives, but I believe people have written things that respond to questions as well. I haven't played around a lot with the more complicated structures, but IIRC it is actually able to parse things like "put the lamp on the table on the floor" successfully. There are undoubtedly people in rec.arts.int-fiction who will be willing to talk to you in depth about this stuff.
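To make the verb-first idea concrete, here is a minimal sketch in Python. It is not how Inform actually does it; the names `WORLD`, `VERBS`, `resolve` and `parse` are all made up for illustration, and the grammar is far cruder than a real IF parser. It only shows the general shape: look up the first word as a verb, use that verb's rule to find its arguments, and map the noun phrases to objects in the world.

```python
from dataclasses import dataclass

# Hypothetical game world: noun -> object identifier (illustrative only).
WORLD = {"lamp": "obj_lamp", "table": "obj_table", "floor": "obj_floor"}

# Hypothetical verb table: first word -> (action name, prepositions that
# may separate a first argument from a second one).
VERBS = {
    "take": ("taking", ()),
    "put": ("putting", ("on", "in")),
}

ARTICLES = {"the", "a", "an"}


@dataclass
class Command:
    action: str
    objects: list


def resolve(words):
    """Map a simple noun phrase to a world object, ignoring articles."""
    nouns = [w for w in words if w not in ARTICLES]
    if len(nouns) != 1 or nouns[0] not in WORLD:
        raise ValueError(f"cannot resolve noun phrase: {' '.join(words)!r}")
    return WORLD[nouns[0]]


def parse(text):
    """Parse an imperative: the first word picks the verb and its rule."""
    words = text.lower().split()
    if not words or words[0] not in VERBS:
        raise ValueError("unknown verb")
    action, preps = VERBS[words[0]]
    rest = words[1:]
    # Split the remaining words at the first preposition this verb allows.
    for i, word in enumerate(rest):
        if word in preps:
            return Command(action, [resolve(rest[:i]), resolve(rest[i + 1:])])
    return Command(action, [resolve(rest)])
```

With this sketch, `parse("put the lamp on the table")` gives a `putting` command with the lamp and the table as its two objects. It deliberately gives up on "put the lamp on the table on the floor": deciding which "on" attaches where needs knowledge of the world state (is the lamp currently on the table?), which this toy version does not model.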

From a slightly different angle, Inform 7 source code itself actually reads like natural language (although, of course, it isn't, and there are more restrictive rules on what you can write and expect to be understood; IMO it's still quite amazing what it can compile).

Posted: **Thu Apr 22, 2010 12:22 pm**

There is a computer system called Cyc which appears to be one of the more successful common-sense knowledge base ontology type deals you're probably sick of by now. Apparently, though, it makes, or made, adorable errors in the style of Data. I forget the precise details, but one example of the chain of reasoning was:

I know a toothbrush is a thing that brushes teeth. I know people brush their teeth. Does that mean humans are toothbrushes?

Posted: **Sat Apr 24, 2010 5:01 am**

No animacy check.

Semi-natural language - a conlanging MSc thesis