Semi-natural language - a conlanging MSc thesis
Posted: Thu Apr 22, 2010 7:02 am
I'm finally getting started on my MSc thesis in computer science, and I have a new subject - something like "The limits of formal language in emulating natural language", or in the popular science version, "Learning to speak robot".
The idea is that there are two common ways of "talking" to machines: either you use machine languages (or programming languages), which humans have difficulty understanding, or you use human languages, which machines have difficulty understanding. My plan is to find a sort of middle ground.
When computers analyse natural language, they mostly use statistical methods. It has turned out that using exact rules is too complicated, because there are too many little exceptions and subtleties. Formal languages (such as programming languages), on the other hand, have no ambiguities or exceptions, and can be analysed with simple rules. I want to see how close you can get to a natural language without needing to use statistical methods.
The hypothetical language (and I probably won't make an actual language, even though it would be nice) should be somewhat focused on physical worlds, rather than abstract mathematical things; thus a "robot language" rather than a programming language.
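To make "analysed with simple rules" concrete, here is a minimal sketch in Python of a toy command language about physical objects. Everything in it is an assumption made up for illustration: the vocabulary, the grammar (verb, object, then an optional preposition and second object) and the function names. It is not a proposal for the actual language, only an example of a parser that needs no statistics.

```python
# Hypothetical toy vocabulary; a real language would need far more.
VERBS = {"take", "put", "push"}
DETERMINERS = {"the", "a"}
ADJECTIVES = {"red", "blue", "big", "small"}
NOUNS = {"box", "ball", "table"}
PREPOSITIONS = {"on", "under", "beside"}


def parse_object(tokens, pos):
    """Parse [determiner] [adjective] noun starting at tokens[pos].

    Returns the parsed object and the position after it.
    """
    if pos < len(tokens) and tokens[pos] in DETERMINERS:
        pos += 1  # determiners carry no meaning in this sketch
    adjective = None
    if pos < len(tokens) and tokens[pos] in ADJECTIVES:
        adjective = tokens[pos]
        pos += 1
    if pos >= len(tokens) or tokens[pos] not in NOUNS:
        raise ValueError(f"expected a noun at position {pos}")
    return {"noun": tokens[pos], "adjective": adjective}, pos + 1


def parse_command(text):
    """Parse: verb object [preposition object]. Reject anything else."""
    tokens = text.lower().split()
    if not tokens or tokens[0] not in VERBS:
        raise ValueError("expected a verb at position 0")
    result = {"verb": tokens[0]}
    result["object"], pos = parse_object(tokens, 1)
    if pos < len(tokens):
        if tokens[pos] not in PREPOSITIONS:
            raise ValueError(f"expected a preposition at position {pos}")
        result["relation"] = tokens[pos]
        result["target"], pos = parse_object(tokens, pos + 1)
    if pos != len(tokens):
        raise ValueError(f"unexpected word at position {pos}")
    return result
```

Calling parse_command("put the red box on the table") gives one unambiguous structure, while a sentence outside the grammar, such as "put box the", is rejected with an error instead of being guessed at. That strictness is the trade-off in question: every sentence has exactly one reading, but the speaker has to stay inside the rules.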
I probably won't deal with phonetics; we assume that the machine has heard what you said, or that the communication is written.
I haven't got as far as answers yet; I'm still dealing with questions, and that is what I would like some help with. What issues might arise? What do you need to deal with in order to make a formal language suitable for human communication? I've come up with a couple of loose ideas:
- feedback, confirming that you understand each other
- ellipsis, assuming things that are omitted
- emotions, making up for the lack of body language
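As a rough illustration of the first two ideas, here is a small Python sketch. It assumes a command is a plain dictionary with a "verb" and an "object", and that an omitted object is simply the last one mentioned; both the representation and the function names are hypothetical, and real ellipsis is surely harder than this. Emotions are not covered.

```python
def resolve_ellipsis(command, context):
    """Fill an omitted object from the most recently mentioned one.

    `context` is a dictionary that remembers the last object between calls.
    """
    resolved = dict(command)
    if resolved.get("object") is None:
        if context.get("last_object") is None:
            raise ValueError("object omitted and nothing to fall back on")
        resolved["object"] = context["last_object"]
    # Remember the object so a later command may omit it.
    context["last_object"] = resolved["object"]
    return resolved


def confirm(command):
    """Feedback: echo the interpretation back so the speaker can reject it."""
    return f"I will {command['verb']} the {command['object']}. Correct?"


context = {}
first = resolve_ellipsis({"verb": "take", "object": "box"}, context)
second = resolve_ellipsis({"verb": "push", "object": None}, context)
print(confirm(second))
```

The point of the confirmation step is that the guess made by the ellipsis rule is visible to the speaker before anything happens, so a wrong assumption can be corrected without any statistical guessing on the machine's side.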
I'm also looking for literature suggestions. I'm planning to read up on Lojban, and maybe even good old Esperanto will have something to add. I don't know much else to read.