Seminar on Crowdsourcing Word–Emotion Associations

  • Posted on: 11 May 2014
  • By: hadmin

Title: Crowdsourcing Word–Emotion Associations
(work by Dr. Saif Mohammad & others at the National Research Council of Canada)

Speaker: Dr. Roland Kuhn
                Principal Research Officer
                National Research Council Canada 

Date: 12 December 2013 (Thursday)
Time: 10:00 am - 11:00 am
Venue: Rm 513, William M. W. Mong Engineering Building, CUHK 

Words have associations with emotions. For example, “delightful” and “yummy” indicate joy, “gloomy” and “cry” indicate sadness, and “shout” and “boiling” indicate anger. Identifying such associations is of substantial benefit for sentiment analysis and information visualization, which in turn have applications in commerce, education, intelligence, and health.

Crowdsourcing is a fast and inexpensive means to obtain massive amounts of annotations; however, unique challenges arise when annotating ostensibly subjective questions. I will present two approaches that leverage the combined strength and wisdom of the crowds to create large word-emotion, word-sentiment, and word-colour association lexicons: the first approach uses Amazon’s Mechanical Turk, and the second capitalizes on hashtags in tweets. I will enumerate the challenges in both approaches and propose solutions to address them. I will also present the use of the generated lexicons for emotion analysis and personality trait classification. These lexicons gave us a competitive advantage in the recent SemEval-2013 competition on the sentiment analysis of tweets and SMS messages, where our submissions ranked first in both tasks.


About the speaker:
After studying mathematical biology at the University of Toronto and the University of Chicago (where he explored computer simulation as a tool for studying the evolution of DNA), Roland developed an interest in natural language. In 1993, he received his Ph.D. in Computer Science from McGill University, with a thesis on applying decision trees to the understanding of spoken phrases.

In the course of his research career, Roland has studied a diverse set of problems in natural language processing, including automatic speech recognition, machine dialogue, speaker verification/identification, speech understanding, letter-to-sound systems, phoneme-based topic spotting, and most recently, machine translation. He has contributed new ideas to several of these areas, including the cache language model for speech recognition and eigenvoices for speaker adaptation and speaker verification/identification.

After working at the Centre de recherche informatique de Montréal (CRIM) as both a researcher and a senior researcher between 1992 and 1996, Roland held research and development positions with the Panasonic Speech Technology Laboratory in Santa Barbara, California (October 1996 to June 2004). He joined the National Research Council of Canada (NRC) in 2004. A citizen of Canada and Germany, Roland holds 30 US patents. He was a member of the IEEE Speech Technical Committee from 2002 to 2004, and he is a frequent reviewer and occasional editor for journal and conference articles in the areas of machine translation and speech recognition.

