Home Page and Blog of the Multilingual NLP course @ Sapienza University of Rome

Saturday, March 19, 2016

Lecture 4: part-of-speech tagging

Introduction to part-of-speech (POS) tagging. POS tagsets: the Penn Treebank tagset and the Google Universal Tagset. Rule-based POS tagging. Stochastic POS tagging. Hidden Markov models. Deleted interpolation. Linear and logistic regression: Maximum Entropy models. Transformation-based POS tagging. Handling out-of-vocabulary words. The Stanford POS tagger.
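Among the POS-tagging topics above, stochastic tagging with hidden Markov models can be sketched in a few lines. The tiny tagset, toy sentence, and hand-picked probabilities below are illustrative assumptions, not material from the lecture; the decoding itself is standard Viterbi.

```python
# A minimal sketch of HMM-based POS tagging with Viterbi decoding.
# Tagset and probabilities are toy values chosen for illustration.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.01, "NOUN": 0.9,  "VERB": 0.09},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.3,  "VERB": 0.2},
}
emit_p = {
    "DET":  {"the": 0.9, "dog": 0.0, "barks": 0.0},
    "NOUN": {"the": 0.0, "dog": 0.8, "barks": 0.2},
    "VERB": {"the": 0.0, "dog": 0.1, "barks": 0.9},
}

def viterbi(words):
    # V[t][s]: probability of the best tag sequence ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][words[0]] for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, V[t - 1][r] * trans_p[r][s] * emit_p[s][words[t]])
                 for r in states),
                key=lambda x: x[1])
            V[t][s], back[t][s] = p, prev
    # Recover the best tag sequence by following the back-pointers
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(["the", "dog", "barks"]))  # → ['DET', 'NOUN', 'VERB']
```

In a real tagger the transition and emission probabilities would be estimated from a tagged corpus (e.g. the Penn Treebank) rather than set by hand.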
Friday, March 11, 2016
Lecture 3: language modeling
We introduced N-gram models (unigrams, bigrams, trigrams), together with their probability modeling and issues. We discussed perplexity and its close relationship with entropy, and introduced smoothing and interpolation techniques to deal with the issue of data sparsity. We also presented the Kyoto and Berkeley language model toolkits.
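To make the smoothing and perplexity ideas concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, one of the simplest smoothing schemes touched on in the lecture; the two-sentence toy corpus is an illustrative assumption.

```python
import math
from collections import Counter

# Toy corpus with sentence-boundary markers (illustrative assumption)
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size, used by add-one smoothing

def prob(prev, word):
    # Add-one (Laplace) smoothed bigram probability P(word | prev):
    # every bigram count is incremented by 1, so unseen pairs get
    # nonzero probability and data sparsity is mitigated.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def perplexity(sent):
    # Perplexity = exp of the average negative log-probability per bigram;
    # lower perplexity means the model finds the sentence less surprising.
    logp = sum(math.log(prob(a, b)) for a, b in zip(sent, sent[1:]))
    return math.exp(-logp / (len(sent) - 1))

print(round(perplexity(["<s>", "the", "cat", "sat", "</s>"]), 2))  # ≈ 3.16
```

Note that smoothing makes `prob` well-defined even for bigrams never seen in the corpus, which is exactly what an unsmoothed maximum-likelihood estimate fails to do.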
We also discussed homework 1 in more detail (see the slides on the class group).
Monday, March 7, 2016
Lecture 2: morphological analysis
We introduced words and morphemes. Before delving into morphology and morphological analysis, we introduced regular expressions as a powerful tool to deal with different forms of a word. We then introduced recent work on morphological analysis based on pattern generalization. We assigned homework 1 for Wiktionary-based morphological analysis.
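As a small illustration of using regular expressions to handle different forms of a word, the pattern below matches a few inflected forms of the verb "walk"; the pattern and word list are illustrative assumptions, not taken from the lecture.

```python
import re

# Match "walk" plus the inflectional suffixes -s, -ed, -ing
# (the suffix group is optional, so the bare stem matches too).
pattern = re.compile(r"\bwalk(s|ed|ing)?\b")

words = ["walk", "walks", "walked", "walking", "sidewalk"]
# fullmatch requires the whole token to match, so "sidewalk" is rejected
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # → ['walk', 'walks', 'walked', 'walking']
```

A single pattern like this stands in for four dictionary lookups, which is the kind of generalization over word forms that motivates regular expressions in morphological analysis.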