Conference program for EMNLP 2011
Wednesday, July 27, 2011
Opening remarks and Invited Talk
Location: Pentland — Chair: Paola Merlo
9:00–10:00 Object Detection Grammars. David McAllester
Session 1: Plenary session
Location: Pentland — Chair: Jason Eisner
11:00–11:25 Fast and Robust Joint Models for Biomedical Event Extraction. Sebastian Riedel and Andrew McCallum
11:25–11:50 Predicting Thread Discourse Structure over Technical Web Forums. Li Wang, Marco Lui, Su Nam Kim, Joakim Nivre and Timothy Baldwin
11:50–12:15 Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation. Yin-Wen Chang and Michael Collins
12:15–12:40 Optimal Search for Minimum Error Rate Training. Michel Galley and Chris Quirk
Session 2A: Syntax and Parsing
Location: Pentland East — Chair: Stephen Clark
14:10–14:35 Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance. Shay B. Cohen, Dipanjan Das and Noah A. Smith
14:35–15:00 Multi-Source Transfer of Delexicalized Dependency Parsers. Ryan McDonald, Slav Petrov and Keith Hall
15:00–15:25 SMT Helps Bitext Dependency Parsing. Wenliang Chen, Jun'ichi Kazama, Min Zhang, Yoshimasa Tsuruoka, Yujie Zhang, Yiou Wang, Kentaro Torisawa and Haizhou Li
15:25–15:50 Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP. Federico Sangati and Willem Zuidema
Session 2B: Semantics
Location: Prestonfield — Chair: Mirella Lapata
14:10–14:35 A Generate and Rank Approach to Sentence Paraphrasing. Prodromos Malakasiotis and Ion Androutsopoulos
14:35–15:00 Correcting Semantic Collocation Errors with L1-induced Paraphrases. Daniel Dahlmeier and Hwee Tou Ng
15:00–15:25 Class Label Enhancement via Related Instances. Zornitsa Kozareva, Konstantin Voevodski and Shanghua Teng
15:25–15:50 A Joint Model for Extended Semantic Role Labeling. Vivek Srikumar and Dan Roth
Session 2C: Sentiment Analysis and Opinion Mining
Location: Pentland West — Chair: Bo Pang
14:10–14:35 Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews. Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang and Tat-Seng Chua
14:35–15:00 Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng and Christopher D. Manning
15:00–15:25 Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities. Lanjun Zhou, Binyang Li, Wei Gao, Zhongyu Wei and Kam-Fai Wong
15:25–15:50 Compositional Matrix-Space Models for Sentiment Analysis. Ainur Yessenalina and Claire Cardie
Session 3A: Machine Translation
Location: Pentland East — Chair: Phil Blunsom
16:20–16:45 Training a Parser for Machine Translation Reordering. Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno and Hideto Kazawa
16:45–17:10 Inducing Sentence Structure from Parallel Corpora for Reordering. John DeNero and Jakob Uszkoreit
17:10–17:35 Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax. Jiajun Zhang, Feifei Zhai and Chengqing Zong
17:35–18:00 A novel dependency-to-string model for statistical machine translation. Jun Xie, Haitao Mi and Qun Liu
Session 3B: NLP-related Machine Learning
Location: Prestonfield — Chair: David Smith
16:20–16:45 Bayesian Checking for Topic Models. David Mimno and David Blei
16:45–17:10 Dual Decomposition with Many Overlapping Components. Andre Martins, Noah Smith, Mario Figueiredo and Pedro Aguiar
17:10–17:35 Approximate Scalable Bounded Space Sketch for Large Data NLP. Amit Goyal and Hal Daume III
17:35–18:00 Optimizing Semantic Coherence in Topic Models. David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders and Andrew McCallum
Session 3C: Discourse, Dialogue and Pragmatics
Location: Pentland West — Chair: Oliver Lemon
16:20–16:45 A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents. Yufan Guo, Anna Korhonen and Thierry Poibeau
16:45–17:10 Linear Text Segmentation Using Affinity Propagation. Anna Kazantseva and Stan Szpakowicz
17:10–17:35 Minimally Supervised Event Causality Identification. Quang Do, Yee Seng Chan and Dan Roth
17:35–18:00 A Model of Discourse Predictions in Human Sentence Processing. Amit Dubey, Frank Keller and Patrick Sturt
Thursday, July 28, 2011
Session 4: Plenary session
Location: Pentland — Chair: Michael Collins
9:05–9:30 Simple Effective Decipherment via Combinatorial Optimization. Taylor Berg-Kirkpatrick and Dan Klein
We present a simple objective function that, when optimized, yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information.
9:30–9:55 Universal Morphological Analysis using Structured Nearest Neighbor Prediction. Young-Bum Kim, João Graça and Benjamin Snyder
In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
9:55–10:20 Training a Log-Linear Parser with Loss Functions via Softmax-Margin. Michael Auli and Adam Lopez
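The selection rule in the Kim, Graça and Snyder abstract (pick the analysis of an unlabeled language that lies closest in feature space to some training language) can be illustrated with a small sketch. This is not the authors' method: it assumes the candidate analyses are already enumerated and embedded as short numeric vectors, whereas the paper searches a structured space of analyses. The function name `structured_nearest_neighbor` and all feature values below are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def structured_nearest_neighbor(
    candidates: Dict[str, Sequence[float]],
    training: Dict[str, Sequence[float]],
) -> Tuple[str, str, float]:
    """Return (candidate analysis, closest training language, distance).

    Hypothetical simplification: every candidate analysis of the unlabeled
    language is given as a fixed feature vector, and we pick the candidate
    whose Euclidean distance to *any* training language is smallest.
    """
    best = None
    for analysis, vec in candidates.items():
        for language, train_vec in training.items():
            d = math.dist(vec, train_vec)  # Euclidean distance (Python 3.8+)
            if best is None or d < best[2]:
                best = (analysis, language, d)
    if best is None:
        raise ValueError("need at least one candidate and one training language")
    return best


# Hypothetical 2-D "morphological feature" vectors, for illustration only.
training_langs = {"lang_A": [1.0, 0.0], "lang_B": [0.0, 1.0]}
candidate_analyses = {"analysis_1": [0.9, 0.2], "analysis_2": [0.5, 0.5]}

name, lang, dist = structured_nearest_neighbor(candidate_analyses, training_langs)
print(name, lang)  # analysis_1 lang_A
```

Enumerating candidates keeps the sketch short. A realistic version would replace the outer loop with a search procedure over analyses.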
Session 5A: Machine Translation
Location: Pentland East — Chair: Philipp Koehn
11:00–11:25 Large-Scale Cognate Recovery. David Hall and Dan Klein
We present a system for the large-scale induction of cognate groups. Our model explains the evolution of cognates as a sequence of mutations and innovations along a phylogeny. On the task of identifying cognates from over 21,000 words in 218 different languages from the Oceanic language family, our model achieves a cluster purity score over 91%, while maintaining pairwise recall over 62%.
11:25–11:50 Domain Adaptation via Pseudo In-Domain Data Selection. Amittai Axelrod, Xiaodong He and Jianfeng Gao
11:50–12:15 Language Models for Machine Translation: Original vs. Translated Texts. Gennadi Lembersky, Noam Ordan and Shuly Wintner
12:15–12:40 Better Evaluation Metrics Lead to Better Machine Translation. Chang Liu, Daniel Dahlmeier and Hwee Tou Ng
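The Hall and Klein abstract reports two clustering scores, cluster purity and pairwise recall. The sketch below uses the standard textbook definitions of these scores. It assumes they match the paper's evaluation, but this is not the authors' scoring script. Purity credits each predicted cluster with its most frequent gold group. Pairwise recall is the fraction of gold same-group pairs that also share a predicted cluster. The toy data is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable


def cluster_purity(pred: Dict[str, Hashable], gold: Dict[str, Hashable]) -> float:
    """Fraction of items that belong to the majority gold group of their predicted cluster."""
    members = defaultdict(list)
    for item, cluster in pred.items():
        members[cluster].append(gold[item])
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(pred)


def pairwise_recall(pred: Dict[str, Hashable], gold: Dict[str, Hashable]) -> float:
    """Fraction of gold same-group pairs that are also placed in the same predicted cluster.

    Counts pairs per (gold group, predicted cluster) cell with C(n, 2), which
    avoids enumerating all item pairs explicitly.
    """
    gold_sizes = Counter(gold.values())
    joint_sizes = Counter((gold[item], pred[item]) for item in gold)
    gold_pairs = sum(math.comb(n, 2) for n in gold_sizes.values())
    recovered = sum(math.comb(n, 2) for n in joint_sizes.values())
    return recovered / gold_pairs if gold_pairs else 1.0


# Hypothetical toy data: five words, two gold cognate groups, two predicted clusters.
gold = {"a": "G1", "b": "G1", "c": "G1", "d": "G2", "e": "G2"}
pred = {"a": "P1", "b": "P1", "c": "P2", "d": "P2", "e": "P2"}
print(cluster_purity(pred, gold))   # 0.8
print(pairwise_recall(pred, gold))  # 0.5
```

Both scores assume `pred` and `gold` cover the same items. The two measures pull in opposite directions: merging everything into one cluster maximizes pairwise recall but hurts purity, which is presumably why the abstract reports them together.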
Session 5B: Syntax and Parsing
Location: Prestonfield — Chair: Ryan McDonald
11:00–11:25 Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation. Reut Tsarfaty, Joakim Nivre and Evelina Andersson
11:25–11:50 Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus. Emily M. Bender, Dan Flickinger, Stephan Oepen and Yi Zhang
11:50–12:15 Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming. Kristian Woodsend and Mirella Lapata
12:15–12:40 Bootstrapping Semantic Parsers from Conversations. Yoav Artzi and Luke Zettlemoyer
Session 5C: Summarization and Generation
Location: Pentland West — Chair: Johanna Moore
11:00–11:25 Timeline Generation through Evolutionary Trans-Temporal Summarization. Rui Yan, Liang Kong, Congrui Huang, Xiaojun Wan, Xiaoming Li and Yan Zhang
11:25–11:50 Corpus-Guided Sentence Generation of Natural Images. Yezhou Yang, Ching Teo, Hal Daume III and Yiannis Aloimonos
11:50–12:15 Corroborating Text Evaluation Results with Heterogeneous Measures. Enrique Amigó, Julio Gonzalo, Jesus Gimenez and Felisa Verdejo
12:15–12:40 Ranking Human and Machine Summarization Systems. Peter Rankel, John Conroy, Eric Slud and Dianne O'Leary
Session 6A: Machine Translation
Location: Pentland East — Chair: Stefan Riezler
14:10–14:35 Quasi-Synchronous Phrase Dependency Grammars for Machine Translation. Kevin Gimpel and Noah A. Smith
14:35–15:00 A Word Reordering Model for Improved Machine Translation. Karthik Visweswariah, Rajakrishnan Rajkumar, Ankur Gandhe, Ananthakrishnan Ramanathan and Jiri Navratil
15:00–15:25 Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation. Jason Riesa, Ann Irvine and Daniel Marcu
15:25–15:50 Efficient retrieval of tree translation examples for Syntax-Based Machine Translation. Fabien Cromieres and Sadao Kurohashi
Session 6B: Semantics
Location: Prestonfield — Chair: Hwee Tou Ng
14:10–14:35 A generative model for unsupervised discovery of relations and argument classes from clinical texts. Bryan Rink and Sanda Harabagiu
14:35–15:00 Random Walk Inference and Learning in A Large Scale Knowledge Base. Ni Lao, Tom Mitchell and William W. Cohen
15:00–15:25 Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases. Matthias Hartung and Anette Frank
15:25–15:50 Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions. Weiwei Guo and Mona Diab
Session 6C: Sentiment Analysis and Opinion Mining
Location: Pentland West — Chair: Benjamin Snyder
14:10–14:35 Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs. Samuel Brody and Nicholas Diakopoulos
We present an automatic method which leverages word lengthening to adapt a sentiment lexicon specifically for Twitter and similar social messaging networks. The contributions of the paper are as follows. First, we call attention to lengthening as a widespread phenomenon in microblogs and social messaging, and demonstrate the importance of handling it correctly. We then show that lengthening is strongly associated with subjectivity and sentiment. Finally, we present an automatic method which leverages this association to detect domain-specific sentiment- and emotion-bearing words. We evaluate our method by comparison to human judgments, and analyze its strengths and weaknesses. Our results are of interest to anyone analyzing sentiment in microblogs and social networks, whether for research or commercial purposes.
14:35–15:00 Personalized Recommendation of User Comments via Factor Models. Deepak Agarwal, Bee-Chung Chen and Bo Pang
15:00–15:25 Data-Driven Response Generation in Social Media. Alan Ritter, Colin Cherry and William B. Dolan
15:25–15:50 Predicting a Scientific Community's Response to an Article. Dani Yogatama, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge and Noah A. Smith
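The Brody and Diakopoulos abstract describes detecting word lengthening and using how often a word is lengthened as a signal of sentiment. The minimal sketch below is not the authors' method. It makes three hypothetical assumptions: a run of three or more identical word characters counts as lengthening; such runs are collapsed to two characters to form a canonical key; and the per-word "lengthening rate" stands in for the association the paper measures. Tokens are assumed to be already stripped of punctuation.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed definition: a word character followed by 2+ copies of itself (run length >= 3).
_LENGTHENED = re.compile(r"(\w)\1{2,}")


def is_lengthened(token: str) -> bool:
    """True if the token contains a run of three or more identical word characters."""
    return _LENGTHENED.search(token) is not None


def canonical_form(token: str) -> str:
    """Lowercase and collapse every run of 3+ identical characters to two (assumed normalization)."""
    return _LENGTHENED.sub(r"\1\1", token.lower())


def lengthening_rates(tokens: Iterable[str], min_count: int = 2) -> Dict[str, float]:
    """For each canonical word seen at least min_count times, the fraction of lengthened occurrences."""
    total: Counter = Counter()
    lengthened: Counter = Counter()
    for tok in tokens:
        key = canonical_form(tok)
        total[key] += 1
        if is_lengthened(tok):
            lengthened[key] += 1
    return {w: lengthened[w] / n for w, n in total.items() if n >= min_count}


# Hypothetical toy tokens; words that are often lengthened become candidate sentiment words.
toks = ["cool", "cooool", "coooool", "the", "the", "table"]
print(canonical_form("HELLLLO"))  # hello
print(lengthening_rates(toks))
```

Collapsing runs to two characters maps "cooool" onto the plain spelling "cool". It does not do the same for words such as "so", which shows why the choice of canonical form matters.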
Session 7A: Phonology, Morphology, Tagging, Chunking and Segmentation
Location: Pentland East — Chair: Noah Smith
16:20–16:45 Non-parametric Bayesian Segmentation of Japanese Noun Phrases. Yugo Murawaki and Sadao Kurohashi
A key factor in high-quality word segmentation for Japanese is a high-coverage dictionary, but it is costly to manually build such a lexical resource. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
16:45–17:10 Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model. Markus Dreyer and Jason Eisner
We present an inference algorithm that organizes observed words (tokens) into structured inflectional paradigms (types). It also naturally predicts the spelling of unobserved forms that are missing from these paradigms, and discovers inflectional principles (grammar) that generalize to wholly unobserved words.
Our Bayesian generative model of the data explicitly represents tokens, types, inflections, paradigms, and locally conditioned string edits. It assumes that inflected word tokens are generated from an infinite mixture of inflectional paradigms (string tuples). Each paradigm is sampled all at once from a graphical model, whose potential functions are weighted finite-state transducers with language-specific parameters to be learned. These assumptions naturally lead to an elegant empirical Bayes inference procedure that exploits Monte Carlo EM, belief propagation, and dynamic programming. Given 50–100 seed paradigms, adding a 10-million-word corpus reduces prediction error for morphological inflections by up to 10%.
17:10–17:35 Multilayer Sequence Labeling. Ai Azuma and Yuji Matsumoto
17:35–18:00 A Bayesian Mixture Model for PoS Induction Using Multiple Features. Christos Christodoulopoulos, Sharon Goldwater and Mark Steedman
In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to state-of-the-art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
Session 7B: Semantics
Location: Prestonfield — Chair: Mark Stevenson
16:20–16:45 Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus. Su Nam Kim and Preslav Nakov
16:45–17:10 Linguistic Redundancy in Twitter. Fabio Massimo Zanzotto, Marco Pennacchiotti and Kostas Tsioutsiouliklis
17:10–17:35 Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora. Matteo Negri, Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo and Alessandro Marchetti
17:35–18:00 Literal and Metaphorical Sense Identification through Concrete and Abstract Context. Peter Turney, Yair Neuman, Dan Assaf and Yohai Cohen
Session 7C: Spoken Language and IR/QA
Location: Pentland West — Chair: Steve Renals
16:20–16:45 Syntactic Decision Tree LMs: Random Selection or Intelligent Design? Denis Filimonov and Mary Harper
16:45–17:10 The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources. Keith Vertanen and Per Ola Kristensson
17:10–17:35 Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy! Alessandro Moschitti, Jennifer Chu-Carroll, Siddharth Patwardhan, James Fan and Giuseppe Riccardi
17:35–18:00 Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French. Spence Green, Marie-Catherine de Marneffe, John Bauer and Christopher D. Manning
Friday, July 29, 2011
Session 8: Plenary session
Location: Pentland — Chair: Shuly Wintner
9:05–9:30 Unsupervised Semantic Role Induction with Graph Partitioning. Joel Lang and Mirella Lapata
9:30–9:55 Structural Opinion Mining for Graph-based Sentiment Representation. Yuanbin Wu, Qi Zhang, Xuanjing Huang and Lide Wu
9:55–10:20 Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization. Rui Yan, Jian-Yun Nie and Xiaoming Li
Session 9A: Machine Translation
Location: Pentland — Chair: John DeNero
10:50–11:15 Tuning as Ranking. Mark Hopkins and Jonathan May
11:15–11:40 Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation. Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Och and Juri Ganitkevitch
11:40–12:05 Hierarchical Phrase-based Translation Representations. Gonzalo Iglesias, Cyril Allauzen, William Byrne, Adrià de Gispert and Michael Riley
12:05–12:30 Improved Transliteration Mining Using Graph Reinforcement. Ali El Kahki, Kareem Darwish, Ahmed Saad El Din, Mohamed Abd El-Wahab, Ahmed Hefny and Waleed Ammar
Mining of transliterations from comparable or parallel text can enhance natural language processing applications such as machine translation and cross-language information retrieval. This paper presents an enhanced transliteration mining technique that uses a generative graph reinforcement model to infer mappings between source and target character sequences. An initial set of mappings is learned through automatic alignment of transliteration pairs at the character sequence level. Then, these mappings are modeled using a bipartite graph. A graph reinforcement algorithm is then used to enrich the graph by inferring additional mappings. During graph reinforcement, appropriate link reweighting is used to promote good mappings and to demote bad ones. The enhanced transliteration mining technique is tested in the context of mining transliterations from parallel Wikipedia titles in four alphabet-based language pairs, namely English-Arabic, English-Russian, English-Hindi, and English-Tamil. The improvements in F1-measure over the baseline system were 18.7, 1.0, 4.5, and 32.5 basis points for the four language pairs, respectively. The results herein outperform the best reported results in the literature by 2.6, 4.8, 0.8, and 4.1 basis points for the four language pairs, respectively.
Session 9B: Semantics
Location: Prestonfield — Chair: Peter Turney
10:50–11:15 Experimental Support for a Categorical Compositional Distributional Model of Meaning. Edward Grefenstette and Mehrnoosh Sadrzadeh
11:15–11:40 Cross-Cutting Models of Lexical Semantics. Joseph Reisinger and Raymond Mooney
11:40–12:05 Reducing Grounded Learning Tasks To Grammatical Inference. Benjamin Börschinger, Bevan K. Jones and Mark Johnson
12:05–12:30 Relation Extraction with Relation Topics. Chang Wang, James Fan, Aditya Kalyanpur and David Gondek
Session 9C: Information Extraction
Location: Kirkland — Chair: Alessandro Moschitti
10:50–11:15 Extreme Extraction – Machine Reading in a Week. Marjorie Freedman, Lance Ramshaw, Elizabeth Boschee, Ryan Gabbard, Gary Kratkiewicz, Nicolas Ward and Ralph Weischedel
11:15–11:40 Discovering Relations between Noun Categories. Thahir Mohamed, Estevam Hruschka and Tom Mitchell
11:40–12:05 Structured Relation Discovery using Generative Models. Limin Yao, Aria Haghighi, Sebastian Riedel and Andrew McCallum
12:05–12:30 Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances. Burr Settles
Session 10A: Syntax and Parsing
Location: Pentland — Chair: Mark Steedman
14:10–14:35 Third-order Variational Reranking on Packed-Shared Dependency Forests. Katsuhiko Hayashi, Taro Watanabe, Masayuki Asahara and Yuji Matsumoto
14:35–15:00 Training dependency parsers by jointly optimizing multiple objectives. Keith Hall, Ryan McDonald, Jason Katz-Brown and Michael Ringgaard
15:00–15:25 Structured Sparsity in Structured Prediction. Andre Martins, Noah Smith, Mario Figueiredo and Pedro Aguiar
15:25–15:50 Lexical Generalization in CCG Grammar Induction for Semantic Parsing. Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater and Mark Steedman
Session 10B: Information Extraction
Location: Prestonfield — Chair: Sebastian Riedel
14:10–14:35 Named Entity Recognition in Tweets: An Experimental Study. Alan Ritter, Sam Clark, Mausam and Oren Etzioni
14:35–15:00 Identifying Relations for Open Information Extraction. Anthony Fader, Stephen Soderland and Oren Etzioni
15:00–15:25 Active Learning with Amazon Mechanical Turk. Florian Laws, Christian Scheible and Hinrich Schütze
15:25–15:50 Bootstrapped Named Entity Recognition for Product Attribute Extraction. Duangmanee Putthividhya and Junling Hu
Session 10C: Text Mining and NLP Applications
Location: Kirkland — Chair: Alexandre Klementiev
14:10–14:35 Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter. Eiji Aramaki, Sachiko Maskawa and Mizuki Morita
14:35–15:00 A Simple Word Trigger Method for Social Tag Suggestion. Zhiyuan Liu, Xinxiong Chen and Maosong Sun
15:00–15:25 Rumor has it: Identifying Misinformation in Microblogs. Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev and Qiaozhu Mei
15:25–15:50 Exploiting Parse Structures for Native Language Identification. Sze-Meng Jojo Wong and Mark Dras
Best Paper Award and Closing
Location: Pentland — Chair: Mark Johnson
16:20–17:10 A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions. Wei Lu and Hwee Tou Ng