Guest Talk: Nathan Schneider

27 April 2015
Begin time: 
End time: 

Bridging the Gap: Integrated Development of Linguistic Resources and Analyzers for NLP

When building datasets and analyzers for NLP, the path from linguistic description to computational implementation need not be disjointed: the goals and methodologies of each can inform the other. This talk advocates a trajectory that tightly integrates linguistic data preparation with computational modeling. I will discuss analyzers built with a process encompassing 1) representation, 2) annotation, and 3) automation.

An example of this trajectory is a new framework for broad-coverage lexical semantic analysis, illustrated in the web reviews domain. First, we take a comprehensive approach to multiword expressions—such as "high school" and "put up with"—in sentence context (Schneider et al., 2014 in LREC and TACL). Our approach spans human annotation as well as statistical sequence modeling, with algorithmic enhancements to accommodate multiword expressions containing gaps. Second, we enrich the lexical segments with semantic supersense classes for noun, verb, and preposition expressions (Schneider et al., 2015 in NAACL-HLT and the Linguistic Annotation Workshop). This is joint work with others at CMU, CU-Boulder, and the University of Utah.

Also on the menu of the talk: an appetizer of recent NLP advances for syntax (and its interface with semantics). Dessert will preview ongoing and future work.