Title: The Statistical Problem of Language Acquisition
Speaker: Mark Steedman (Informatics, University of Edinburgh)
When: Thursday 15.45
Where: Aula Seminari, Via Salaria, 113 (third floor)
The talk reports recent work with Tom Kwiatkowski, Sharon Goldwater,
and Luke Zettlemoyer on semantic parser induction by machine from a
number of corpora pairing sentences with logical forms, including
GeoQuery, ATIS, and a corpus consisting of real child-directed utterances from
the CHILDES corpus.
The problems of semantic parser induction and child language acquisition
are both similar to the problem of inducing a grammar and a
parsing model from a treebank such as the Penn treebank, except that
the trees are unordered logical forms, in which the preterminals
are not aligned with words in the target language, and there may be
noise and spurious distracting logical forms supported by the context
but irrelevant to the utterance.
The talk shows that this class of problem can be solved if the child
or machine initially parses with the entire space of possibilities
that universal grammar allows under the assumptions of the Combinatory
Categorial theory of grammar (CCG), and learns a statistical
parsing model for that space using EM-related methods such
as Variational Bayes learning.
This can be done without all-or-none "parameter-setting" or attendant
"triggers", and without invoking any "subset principle" of the kind
proposed in linguistic theory, provided the system is presented with a
representative sample of reasonably short string-meaning pairs from
the target language.
Bio: Mark Steedman is Professor of Cognitive Science in the School of
Informatics at the University of Edinburgh, to which he moved in 1998
from the University of Pennsylvania, where he taught as
Professor in the Department of Computer and Information Science. He
is a Fellow of the British Academy, the Royal Society of Edinburgh,
the American Association for Artificial Intelligence, and the European
His research covers a range of problems in computational linguistics,
artificial intelligence, computer science, and cognitive science,
including syntax and semantics of natural language, and parsing and
comprehension of natural language discourse by humans and by machine
using Combinatory Categorial Grammar (CCG). Much of his current NLP
research concerns wide-coverage parsing for robust semantic
interpretation and natural language inference, and the problem of
inducing such grammars from data and grounded meaning representations,
including those arising in robotics domains. Some of his research
concerns the analysis of music using robust NLP methods.
Thursday, May 31, 2012
Saturday, May 26, 2012
A class on the BabelNet APIs for querying BabelNet and Wikipedia and for performing multilingual Word Sense Disambiguation, plus other useful APIs for Wiktionary and other resources.
Knowledge-based Word Sense Disambiguation. The Lesk and Extended Lesk algorithms. Structural approaches: similarity measures and graph algorithms. Conceptual density. Structural Semantic Interconnections. Evaluation: precision, recall, F1, accuracy. Baselines. The Senseval and SemEval evaluation competitions. Applications of Word Sense Disambiguation. Issues: representation of word senses, domain WSD, the knowledge acquisition bottleneck.
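As a rough illustration of two of the topics above, here is a minimal sketch of the simplified Lesk idea (pick the sense whose gloss shares the most words with the context) together with the usual precision, recall and F1 computation for a system that may leave some instances unanswered. The sense inventory, sense labels and stopword list are hypothetical toy data invented for this example, not WordNet or BabelNet entries, and the function names are assumptions of this sketch.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy sense inventory (sense id -> gloss); not taken from WordNet.
TOY_GLOSSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water such as a river",
    }
}

# Small illustrative stopword list (an assumption of this sketch).
STOPWORDS: Set[str] = {"a", "an", "the", "of", "and", "that", "as", "such",
                       "to", "in", "on", "at", "by"}


def content_words(text: str) -> Set[str]:
    """Lowercase, split on whitespace and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> Optional[str]:
    """Return the sense whose gloss overlaps most with the context.

    Returns None (no answer) when no gloss shares a word with the context.
    """
    context_set = content_words(context) - {word}
    best_sense, best_overlap = None, 0
    for sense, gloss in TOY_GLOSSES.get(word, {}).items():
        overlap = len(context_set & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


def precision_recall_f1(predictions: Dict[str, Optional[str]],
                        gold: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision = correct / attempted, recall = correct / all gold instances."""
    attempted = [k for k, v in predictions.items() if v is not None]
    correct = sum(1 for k in attempted if predictions[k] == gold.get(k))
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(simplified_lesk("bank", "she sat on the bank of the river"))
    print(simplified_lesk("bank", "he went to the bank to deposit money"))
```

Because the system can abstain, precision and recall differ: precision is computed over the attempted instances only, recall over all gold instances. When every instance is answered, both coincide with accuracy.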
Wednesday, May 16, 2012
Supervised Word Sense Disambiguation: pros and cons. Vector representation of context. Main supervised disambiguation paradigms: decision trees, neural networks, instance-based learning, Support Vector Machines. Unsupervised Word Sense Disambiguation: Word Sense Induction. Context-based clustering. Co-occurrence graphs: curvature clustering, HyperLex.
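To make the vector representation of context and the instance-based paradigm concrete, the following sketch represents each context as a bag-of-words count vector and labels a new occurrence with the majority sense among its k most similar training contexts under cosine similarity. The training sentences and sense labels are hypothetical toy data, and the function names are assumptions of this example rather than part of any course material or library.

```python
import math
from collections import Counter
from typing import List, Tuple


def context_vector(context: str) -> Counter:
    """Bag-of-words vector: word -> count, after lowercasing."""
    return Counter(context.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_sense(context: str, training: List[Tuple[str, str]], k: int = 1) -> str:
    """Majority sense among the k training contexts closest to `context`."""
    query = context_vector(context)
    ranked = sorted(training,
                    key=lambda ex: cosine(query, context_vector(ex[0])),
                    reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical sense-tagged examples for the target word "bank".
TRAINING: List[Tuple[str, str]] = [
    ("deposit money at the bank", "bank#finance"),
    ("the bank approved the loan", "bank#finance"),
    ("fishing on the bank of the river", "bank#river"),
    ("the river bank was muddy", "bank#river"),
]

if __name__ == "__main__":
    print(knn_sense("she opened an account to deposit money at the bank", TRAINING))
    print(knn_sense("we walked along the river bank", TRAINING, k=3))
```

A realistic system would add features such as part-of-speech tags, collocations and syntactic relations, and would need far more sense-tagged data, which is the main drawback of the supervised setting.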
Friday, May 11, 2012
The NLP projects are online (2012_project.pdf). Please start your discussion! Topics today: introduction to Word Sense Disambiguation (WSD). Motivation. The typical WSD framework. Lexical sample vs. all-words. WSD viewed as lexical substitution and cross-lingual lexical substitution. Knowledge resources. Representation of context: flat and structured representations. Main approaches to WSD: supervised, unsupervised and knowledge-based WSD. Two important dimensions: supervision and knowledge.
Friday, May 4, 2012
Lexemes, lexicon, lemmas and word forms. Word senses: monosemy vs. polysemy. Special kinds of polysemy. Computational sense representations: enumeration vs. generation. Graded word sense assignment. Encoding word senses: paper dictionaries, thesauri, machine-readable dictionaries, computational lexicons. WordNet. Wordnets in other languages. BabelNet.