Sunday, April 11, 2010

Debugging Natural Language

My officemate Ryan is currently in the midst of writing a research paper. A few days ago, he remarked that his first drafts are often clunky and lack a smooth logical flow.

For example, a common mistake he (and basically everyone) makes when writing a first draft of a technical report is to start using terms before they're properly defined. These kinds of logical discontinuities are often most easily spotted by someone else, as it can be hard for the writer to tease apart what was actually written from the idea that was meant to be conveyed.

If you stop to think about it, many aspects of proofreading are basically just debugging natural language. Sure, we already have some pretty good spell checkers and grammar checkers these days. Those are akin to the syntax checkers we have in our compilers and interpreters for programming languages. But we don't have anything that checks for higher-level problems, such as gaps in the logical flow.

One could argue that all aspects of proofreading can be compared to debugging at some level of abstraction, but the generality of that argument makes it in some ways less interesting. Certainly some aspects of proofreading are easier to design automated or semi-automated services for -- such as detecting whether certain terms are defined before they are first used. Come to think of it, it now seems kind of barbaric that we still rely so much on manual proofreading for such relatively low-level editing assistance.
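
To make the "defined before first use" check a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the list of cue phrases, the naive sentence splitter, and the function names are all made up, and the writer is expected to supply the list of terms to check. A real tool would need something much smarter than substring matching to decide what counts as a definition.

```python
import re

# Hypothetical cue phrases that hint a sentence is defining a term.
DEFINITION_CUES = ("is defined as", "we define", "refers to", "is called", "denotes")


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., !, or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def used_before_defined(text, terms):
    """Map each offending term to (first_use, first_definition).

    Both values are sentence indices; first_definition is None when no
    sentence mentioning the term contains a definition cue.
    """
    sentences = split_sentences(text)
    problems = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        first_use = None
        first_def = None
        for i, sentence in enumerate(sentences):
            if not pattern.search(sentence):
                continue
            if first_use is None:
                first_use = i
            lowered = sentence.lower()
            if first_def is None and any(cue in lowered for cue in DEFINITION_CUES):
                first_def = i
        # Flag terms that appear before (or entirely without) a definition.
        if first_use is not None and (first_def is None or first_use < first_def):
            problems[term] = (first_use, first_def)
    return problems


draft = (
    "We train the model with a flux loss. "
    "The flux loss is defined as the squared error on gradients. "
    "A widget refers to any small tool. "
    "Each widget is cheap."
)
report = used_before_defined(draft, ["flux loss", "widget", "gizmo"])
```

In this made-up draft, "flux loss" gets flagged because it shows up one sentence before its definition, "widget" passes because its first mention is also its definition, and "gizmo" is ignored because it never appears. This is the semi-automated flavor of the idea: the tool only points at suspicious spots, and a human still decides whether they are real problems.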
