The Winograd schema is a test designed to stretch computer language processing to its limit. If a machine can determine an intended referent in a sentence based only on clues from context, then the machine is using reasoning to parse the language, and, perhaps, it should be called intelligent.
Humans are pretty good at working out referents based on context. For example: "The trophy would not fit in the brown suitcase because itₓ was too big." What was too big? Linguists will sometimes denote referents in text by using subscript numbers or letters, like that “x”. If you want to solve for that “x”, and work out what it refers to in that sentence, then you have to know that a suitcase usually contains things, and that a trophy usually doesn’t. And you have to know how containers work: larger things can’t fit inside smaller things. It seems simple to us, not only because we understand how syntax works and whether a sentence is grammatically valid, but also because we’ve got experience of suitcases and trophies and reality itself. Solving that “x” requires comprehension.
Terry Winograd started this line of thinking in 1972 by proposing sentences that required that sort of context: "The city councilmen refused the demonstrators a permit because they feared violence," or "...because they advocated revolution." It was the ’70s; there weren’t many councilwomen, and there was a lot of revolution. Winograd argued that solving this sentence requires knowing not only who can issue permits and what a demonstrator is, but also the political interests of a council and the ways in which it would view political change. The referent changes based purely on the context and the meaning of those last two words.
There are plenty of sentences where the “x” is ambiguous, where even humans don't have enough context to work out which pronoun refers to which person. If you’re writing a gay romance story, and both your main characters use the same pronouns and appear all the time in the same sentences, then it’s going to get confusing fast.
In more amateur work, like slash fiction, inexperienced writers often try to clear that up by using synecdoche in place of pronouns, but that’s awkward. More experienced writers tend to use context and careful sentence construction, and readers never notice it. Winograd schemas, by contrast, are designed to be unambiguous for humans, to the point where we don’t even have to think about it, yet difficult for computers.
Natural Language Processing, the ability to "understand" language, is in huge demand right now: it’s what powers Siri, Alexa, and the Google Assistant. Lots of companies are trying to write code that can understand human requests, and let’s be honest, most of that code sucks right now. Much of it relies on collections of sentences with each word tagged by its part of speech, and it just picks up on keywords like “call” or “how much is” and extracts what it hopes are the right bits around them. That doesn’t work for a Winograd schema.
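To make that keyword approach concrete, here is a minimal Python sketch, assuming a hypothetical assistant whose intents are matched with regular expressions. The intent names, the patterns, and the `extract_intent` function are illustrative inventions, not any real product's API. All it does is find a trigger phrase and grab the text after it, so it has nothing to say about which noun a pronoun refers to.

```python
import re
from typing import Optional, Tuple

# Hypothetical intent table: each entry pairs an intent name with a regex that
# spots a trigger phrase and captures whatever text follows it.
INTENT_PATTERNS = [
    ("call", re.compile(r"\bcall\s+(?P<target>.+)", re.IGNORECASE)),
    ("price", re.compile(r"\bhow much is\s+(?P<target>.+?)\??$", re.IGNORECASE)),
]


def extract_intent(utterance: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (intent, captured text) for the first pattern that matches, else (None, None)."""
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(utterance)
        if match:
            return intent, match.group("target").strip()
    return None, None


print(extract_intent("Call Mum"))                 # finds the keyword and the name
print(extract_intent("How much is a trophy?"))    # drops the trailing question mark
print(extract_intent("Don't call me after ten"))  # matches "call" but grabs the wrong bits
print(extract_intent(
    "The trophy would not fit in the brown suitcase because it was too big."
))                                                # no keyword, so no idea what "it" means
```

The third call shows the failure mode: the pattern fires on "call" and returns "me after ten" as if it were someone to phone. The last one returns `(None, None)`, because nothing in a keyword table can resolve "it".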
A team of researchers at the computer science department of New York University have assembled a list of 150 of them, as a test. So: “I spread the cloth on the table in order to protect it.” To protect what? Obvious if you understand what a cloth and a table are, and why you might spread a cloth on a table, but if those are just tagged as nouns with no context, then that’s completely ambiguous.
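Winograd schemas come in pairs: swapping one word or phrase ("big" for "small", "feared violence" for "advocated revolution") flips the correct answer. The Python sketch below represents a schema that way and scores a meaning-blind baseline that simply picks whichever candidate is mentioned closest before the pronoun. The `WinogradSchema` class, its field names, and the `nearest_candidate` heuristic are hypothetical, not the format of the NYU collection. Because the baseline never looks at the swapped word, it gives the same answer to both halves of a pair, so it can get at most one of the two right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical schema record: a sentence with a slot, two candidates, two swappable words."""
    template: str     # contains "{word}" where the swappable word goes
    candidates: tuple  # the two possible referents
    pronoun: str
    words: tuple       # the two alternative words for the slot
    answers: tuple     # correct referent for each word, in the same order

    def variants(self):
        """Yield (sentence, correct answer) for both versions of the schema."""
        for word, answer in zip(self.words, self.answers):
            yield self.template.format(word=word), answer


TROPHY = WinogradSchema(
    template="The trophy would not fit in the brown suitcase because it was too {word}.",
    candidates=("trophy", "suitcase"),
    pronoun="it",
    words=("big", "small"),
    answers=("trophy", "suitcase"),
)

COUNCIL = WinogradSchema(
    template="The city councilmen refused the demonstrators a permit because they {word}.",
    candidates=("councilmen", "demonstrators"),
    pronoun="they",
    words=("feared violence", "advocated revolution"),
    answers=("councilmen", "demonstrators"),
)


def nearest_candidate(sentence, candidates, pronoun):
    """Meaning-blind baseline: pick the candidate mentioned last before the pronoun."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    pronoun_index = words.index(pronoun)
    best = None
    for w in words[:pronoun_index]:
        if w in candidates:
            best = w
    return best


def score(schema, resolver):
    """Count how many of the schema's two variants the resolver gets right."""
    return sum(
        resolver(sentence, schema.candidates, schema.pronoun) == answer
        for sentence, answer in schema.variants()
    )


print(score(TROPHY, nearest_candidate))
print(score(COUNCIL, nearest_candidate))
```

Both scores come out as 1 out of 2: the heuristic picks "suitcase" and "demonstrators" every time, which is right for one variant and wrong for the other. Guessing would do just as well.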
I found out that someone had published a new version of GPT-2, which is the leading machine-learning text-generation system, and I figured that if I didn't address that here, a load of people in the comments would be all, "Well, what about machine learning? What about AI?" So I got a version of GPT-2 that's set up kind of like a Dungeons & Dragons adventure system, and I told it to create a suitcase and a trophy and to put the suitcase in the trophy, to see what would happen.
"The moment you think about it, you know what to do, so you do it with your eyes closed: you place your hand on top of it, then with your other hand you take out your gun from inside your jacket pocket, take aim at it, pull back one inch…"
Yeah, if you think machine learning's going to save us, not yet!
And of course, sometimes words can switch parts of speech. "Main" is usually an adjective, but in video-game slang it can be a verb ("I main Zelda", meaning I mostly play as that character) or a noun ("Zelda is my main"). Humans who’ve not heard that use before won’t understand it.
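The "main" problem is easy to reproduce with a lookup-based tagger that gives every word exactly one part of speech. This Python sketch uses a tiny hypothetical lexicon, with tag names borrowed from the Universal Dependencies tag set; the `LEXICON` contents and the `tag` function are illustrative, not taken from any real tagging library.

```python
# Hypothetical one-tag-per-word lexicon; tag names follow Universal Dependencies.
LEXICON = {
    "the": "DET",
    "main": "ADJ",  # only the everyday adjective sense is listed
    "road": "NOUN",
    "i": "PRON",
    "zelda": "PROPN",
}


def tag(sentence: str) -> list:
    """Tag each word by dictionary lookup; words not in the lexicon get "UNK"."""
    words = sentence.rstrip(".!?").split()
    return [(word, LEXICON.get(word.lower(), "UNK")) for word in words]


print(tag("The main road"))  # "main" is correctly an adjective here
print(tag("I main Zelda"))   # "main" is a verb here, but is still tagged ADJ
```

Because the lexicon stores a single tag per word, the tagger is right for "The main road" and wrong for the gaming sense in "I main Zelda", where "main" is a verb. Fixing that means using the surrounding words, which is exactly the context a fixed dictionary throws away.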
The solution for now? Accept that Siri’s not going to be great at complicated questions, and hire a team of underpaid contract workers to manually tag data, in order to help programs "learn" patterns. But those methods will continue to fall short, because computers are missing the breadth of knowledge that humans have access to. At least, for now. Real machine understanding of language remains ten years away, just as it has for the last few decades.