Home Page and Blog of the Multilingual NLP course @ Sapienza University of Rome

Thursday, April 11, 2013

Lecture 5: syntax and parsing

Introduction to syntax. Context-free grammars and languages. Treebanks. Normal forms. Dependency grammars. Syntactic parsing: top-down and bottom-up. Structural ambiguity. Backtracking vs. dynamic programming for parsing. The CKY algorithm. The Earley algorithm. Probabilistic CFGs (PCFGs). PCFGs for disambiguation: the probabilistic CKY algorithm. PCFGs for language modeling.
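As a concrete illustration of the bottom-up, dynamic-programming approach, here is a minimal sketch of CKY recognition for a toy grammar in Chomsky Normal Form; the grammar and example sentence are invented for illustration and are not from the lecture.

```python
# A minimal sketch of CKY recognition for a toy grammar in
# Chomsky Normal Form (CNF): only A -> B C and A -> w rules.
from itertools import product

binary_rules = {           # (B, C) -> set of left-hand sides A
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexical_rules = {          # word -> set of preterminals
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"},
}

def cky_recognize(words):
    n = len(words)
    # table[i][j] = set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical_rules.get(w, ()))
    for span in range(2, n + 1):            # span width
        for i in range(n - span + 1):       # start of span
            j = i + span
            for k in range(i + 1, j):      # split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary_rules.get((B, C), set())
    return "S" in table[0][n]

print(cky_recognize("the dog chased the cat".split()))  # -> True
```

Extending each table cell to store rule probabilities and back-pointers turns this recognizer into the probabilistic CKY parser used for disambiguation.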
Friday, April 5, 2013
Lecture 4: part-of-speech tagging
Introduction to part-of-speech (POS) tagging. POS tagsets: the Penn Treebank tagset and the Google Universal Tagset. Rule-based POS tagging. Stochastic part-of-speech tagging. Hidden Markov models. Deleted interpolation. Linear and logistic regression: Maximum Entropy models. Transformation-based POS tagging. Handling out-of-vocabulary words.
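To give a feel for stochastic tagging with a hidden Markov model, here is a minimal sketch of Viterbi decoding, the standard way to find the most likely tag sequence; the tagset and all probabilities below are made-up toy numbers.

```python
import math

# Toy HMM for POS tagging; all probabilities are invented for illustration.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {  # P(tag_i | tag_{i-1})
    "DET":  {"DET": 0.01, "NOUN": 0.89, "VERB": 0.10},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.30, "VERB": 0.20},
}
emit_p = {   # P(word | tag); unseen words get a tiny floor probability
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "barks": 0.1, "cat": 0.5},
    "VERB": {"barks": 0.6, "dog": 0.1, "chases": 0.3},
}

def viterbi(words):
    # delta[t][tag] = (log-prob of best path ending in tag, backpointer)
    delta = [{t: (math.log(start_p[t] * emit_p[t].get(words[0], 1e-12)), None)
              for t in tags}]
    for w in words[1:]:
        prev_col, col = delta[-1], {}
        for t in tags:
            col[t] = max(
                (prev_col[p][0]
                 + math.log(trans_p[p][t] * emit_p[t].get(w, 1e-12)), p)
                for p in tags)
        delta.append(col)
    # Follow backpointers from the best final tag.
    best = max(tags, key=lambda t: delta[-1][t][0])
    path = [best]
    for col in reversed(delta[1:]):
        path.append(col[path[-1]][1])
    return path[::-1]

print(viterbi(["the", "dog", "barks"]))  # -> ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow when sentences get long, which is why the sketch sums logarithms instead of multiplying probabilities.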
Lecture 3: language models and smoothing
The third lecture was about language models. You discovered how important language models are and how we can approximate real language with them. We discussed n-gram models (unigrams, bigrams, trigrams), together with how their probabilities are estimated and the issues that arise. We also covered perplexity and its close relationship with entropy, and introduced smoothing and interpolation techniques to deal with data sparsity.
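As a small worked example of these ideas, here is a sketch of a bigram model with add-one (Laplace) smoothing and a perplexity computation; the two-sentence corpus is invented for illustration, and add-one is just the simplest smoothing method, not necessarily the one emphasized in the lecture.

```python
import math
from collections import Counter

# Tiny toy corpus, invented for illustration.
corpus = [
    "<s> the dog barks </s>".split(),
    "<s> the cat meows </s>".split(),
]
unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(b for sent in corpus for b in zip(sent, sent[1:]))
V = len(unigram_counts)  # vocabulary size, including <s> and </s>

def bigram_prob(prev, w):
    # Add-one smoothing: P(w | prev) = (c(prev, w) + 1) / (c(prev) + V)
    return (bigram_counts[(prev, w)] + 1) / (unigram_counts[prev] + V)

def perplexity(sent):
    # Perplexity = 2 ** (-average log2-probability per predicted word)
    log_prob = sum(math.log2(bigram_prob(a, b))
                   for a, b in zip(sent, sent[1:]))
    return 2 ** (-log_prob / (len(sent) - 1))

# The bigram (dog, meows) never occurs in the corpus, yet it still
# receives nonzero probability thanks to smoothing.
print(perplexity("<s> the dog meows </s>".split()))
```

Without smoothing, the unseen bigram (dog, meows) would get zero probability and the perplexity of the test sentence would be infinite; this is exactly the data-sparsity problem smoothing addresses.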