Tuesday, May 18, 2010

Recognizing Sarcasm with Computer Algorithms




The pursuit of machine intelligence means we have to find ways of communicating with our computers that both sides can understand. But while computers process verbal commands in a straightforward fashion, humans tend to use more sophisticated speech forms, employing slang or symbols to convey an idea. So an Israeli research team has developed a machine algorithm that can recognize sarcasm.

SASI, a Semi-supervised Algorithm for Sarcasm Identification, can recognize sarcastic sentences in online product reviews with a pretty astounding 77 percent precision. To create such an algorithm, the team scanned 66,000 Amazon.com product reviews, with three different human annotators tagging sentences for sarcasm. The team then identified certain sarcastic patterns that emerged in the reviews and created a classification algorithm that puts each statement into a sarcastic class.

The algorithm was then trained on a seed set of 80 sentences from the collection of reviews. These annotated sentences helped the algorithm learn what sorts of words and patterns distinguish sarcastic remarks – those that mean the opposite of what they literally convey, or that convey a sentiment inconsistent with the literal reading.
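To make the idea of learning from a small labeled seed set a little more concrete, here is a toy sketch in Python. It is not the SASI implementation: the patterns, the seed sentences, and the nearest-neighbor vote are all assumptions made up for illustration, and the real system is semi-supervised and derives its patterns from the review data rather than from a hand-written list.

```python
import re
from collections import Counter

# Hypothetical surface patterns, hand-written for illustration only.
PATTERNS = [
    re.compile(r"\b[A-Z]{3,}\b"),                                # shouted word
    re.compile(r"\.\.\."),                                       # trailing dots
    re.compile(r"!"),                                            # exclamation
    re.compile(r'"(great|best|love)"', re.I),                    # scare quotes
    re.compile(r"\b(yeah right|oh great|thanks a lot)\b", re.I), # stock phrases
]

# Hypothetical seed set: (sentence, label) pairs invented for this sketch.
SEED = [
    ("Oh great, it broke after one day...", "sarcastic"),
    ('The "best" purchase EVER!', "sarcastic"),
    ("Thanks a lot for the dead battery!", "sarcastic"),
    ("The battery lasts about ten hours.", "literal"),
    ("Setup took five minutes and it works well.", "literal"),
    ("Shipping was fast and the case fits.", "literal"),
]

def features(sentence):
    """Return a 0/1 vector: one entry per pattern that matches the sentence."""
    return tuple(1 if p.search(sentence) else 0 for p in PATTERNS)

def distance(a, b):
    """Count the positions where two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(sentence, seed=SEED, k=3):
    """Label a sentence by majority vote among its k closest seed sentences."""
    vec = features(sentence)
    nearest = sorted(seed, key=lambda item: distance(vec, features(item[0])))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify("Yeah right, works 'perfectly'..."))
print(classify("The screen is bright and sharp."))
```

A handful of hand-picked patterns like these would miss most real sarcasm, which is presumably why the researchers started from tens of thousands of reviews instead.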

They then turned the algorithm loose on an evaluation set. The pattern evaluation step was accurate 81 percent of the time, while the overall precision of the pattern-recognition and sarcasm-categorizing algorithm was 77 percent. Not bad for a computer’s first shot at interpreting the human sense of humor.
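For readers unsure what "precision" means here: in the standard definition, it is the share of sentences the algorithm flagged as sarcastic that really were sarcastic. The short Python sketch below shows the arithmetic. The counts are hypothetical, picked only so the result matches the 77 percent figure; they are not the study's actual numbers.

```python
def precision(true_positives, false_positives):
    """Fraction of flagged sentences that were correctly flagged."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no sentences were flagged as sarcastic")
    return true_positives / flagged

# Hypothetical counts: 100 sentences flagged, 77 of them truly sarcastic.
print(precision(true_positives=77, false_positives=23))
```

Note that precision says nothing about sarcastic sentences the algorithm failed to flag; that is measured separately, as recall.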

This isn’t all just so your Roomba gets the joke when you tell it it sucks. Computer programs that can recognize sarcastic statements could generate better personalized content and make better recommendations to human users by not mistaking a product review titled “keep your receipt” for a sound piece of online shopping advice. The same capability could also benefit opinion-mining systems that troll the Web trying to measure public sentiment about a product or idea.
