Seminar on Introduction to Phrase-Based Statistical Machine Translation

  • Posted on: 11 May 2014
Title: Introduction to Phrase-Based Statistical Machine Translation

Speaker: Dr. Roland Kuhn
              Principal Research Officer
              National Research Council Canada 

Date: 9 December 2013 (Monday)
Time: 11:00 am – 12:00 noon
Venue: Rm 906, William M. W. Mong Engineering Building, CUHK 

This talk will introduce the basic concepts of phrase-based SMT (statistical machine translation):

  • Phrase-based decoding
  • The main information sources of an SMT system: the phrase table, the N-gram language model, and the reordering model
  • The history of MT: rule-based MT, the introduction of statistical MT by the IBM speech recognition group in 1990-1992, and phrase-based SMT
  • Metrics for MT quality (especially BLEU)
  • Loglinear model combination
  • Error-driven algorithms for learning weights on models in a loglinear combination for SMT (MERT and MIRA).

The talk will draw heavily on experience with the National Research Council of Canada’s Portage system, one of the top SMT systems in the world.


About the speaker:
After studying mathematical biology at the University of Toronto and the University of Chicago (where he explored computer simulation as a tool for studying the evolution of DNA), Roland developed an interest in natural language. In 1993, he received his Ph.D. in Computer Science from McGill University, with a thesis on applying decision trees to the understanding of spoken phrases.

In the course of his research career, Roland has studied a diverse set of problems in natural language processing, including automatic speech recognition, machine dialogue, speaker verification/identification, speech understanding, letter-to-sound systems, phoneme-based topic spotting, and most recently, machine translation. He has contributed new ideas to several of these areas, including the cache language model for speech recognition and eigenvoices for speaker adaptation and speaker verification/identification.

After working at the Centre de recherche informatique de Montréal (CRIM) as a researcher and then a senior researcher between 1992 and 1996, Roland held research and development positions with the Panasonic Speech Technology Laboratory in Santa Barbara, California (October 1996 to June 2004). He joined the National Research Council of Canada (NRC) in 2004. A citizen of Canada and Germany, Roland holds 30 US patents. He was a member of the IEEE Speech Technical Committee from 2002 to 2004, and he is a frequent reviewer and occasional editor for journal and conference articles in the areas of machine translation and speech recognition.

