Home Page and Blog of the Multilingual NLP course @ Sapienza University of Rome

Saturday, March 19, 2016

Lecture 4: part-of-speech tagging

Introduction to part-of-speech (POS) tagging. POS tagsets: the Penn Treebank tagset and the Google Universal Tagset. Rule-based POS tagging. Stochastic POS tagging. Hidden Markov models. Deleted interpolation. Linear and logistic regression: Maximum Entropy models. Transformation-based POS tagging. Handling out-of-vocabulary words. The Stanford POS tagger.
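Among the POS-tagging topics above, stochastic tagging with hidden Markov models can be sketched in a few lines. The tiny tagset, toy sentence, and hand-picked probabilities below are illustrative assumptions, not material from the lecture; the decoding itself is standard Viterbi.

```python
# A minimal sketch of HMM-based POS tagging with Viterbi decoding.
# Tagset and probabilities are toy values chosen for illustration.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.01, "NOUN": 0.9,  "VERB": 0.09},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.3,  "VERB": 0.2},
}
emit_p = {
    "DET":  {"the": 0.9, "dog": 0.0, "barks": 0.0},
    "NOUN": {"the": 0.0, "dog": 0.8, "barks": 0.2},
    "VERB": {"the": 0.0, "dog": 0.1, "barks": 0.9},
}

def viterbi(words):
    # V[t][s]: probability of the best tag sequence ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][words[0]] for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, V[t - 1][r] * trans_p[r][s] * emit_p[s][words[t]])
                 for r in states),
                key=lambda x: x[1])
            V[t][s], back[t][s] = p, prev
    # Recover the best tag sequence by following the back-pointers
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(["the", "dog", "barks"]))  # → ['DET', 'NOUN', 'VERB']
```

In a real tagger the transition and emission probabilities would be estimated from a tagged corpus (e.g. the Penn Treebank) rather than set by hand.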
Friday, March 11, 2016
Lecture 3: language modeling
We introduced N-gram models (unigrams, bigrams, trigrams), together with their probability modeling and issues. We discussed perplexity and its close relationship with entropy, and introduced smoothing and interpolation techniques to deal with the issue of data sparsity. We also presented the Kyoto and Berkeley language model toolkits.
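To make the smoothing and perplexity ideas concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, one of the simplest smoothing schemes touched on in the lecture; the two-sentence toy corpus is an illustrative assumption.

```python
import math
from collections import Counter

# Toy corpus with sentence-boundary markers (illustrative assumption)
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size, used by add-one smoothing

def prob(prev, word):
    # Add-one (Laplace) smoothed bigram probability P(word | prev):
    # every bigram count is incremented by 1, so unseen pairs get
    # nonzero probability and data sparsity is mitigated.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def perplexity(sent):
    # Perplexity = exp of the average negative log-probability per bigram;
    # lower perplexity means the model finds the sentence less surprising.
    logp = sum(math.log(prob(a, b)) for a, b in zip(sent, sent[1:]))
    return math.exp(-logp / (len(sent) - 1))

print(round(perplexity(["<s>", "the", "cat", "sat", "</s>"]), 2))  # ≈ 3.16
```

Note that smoothing makes `prob` well-defined even for bigrams never seen in the corpus, which is exactly what an unsmoothed maximum-likelihood estimate fails to do.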
We also discussed homework 1 in more detail (see the slides on the class group).
Monday, March 7, 2016
Lecture 2: morphological analysis
We introduced words and morphemes. Before delving into morphology and morphological analysis, we introduced regular expressions as a powerful tool to deal with different forms of a word. We then introduced recent work on morphological analysis based on pattern generalization. We assigned homework 1 for Wiktionary-based morphological analysis.
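As a small illustration of using regular expressions to handle different forms of a word, the pattern below matches a few inflected forms of the verb "walk"; the pattern and word list are illustrative assumptions, not taken from the lecture.

```python
import re

# Match "walk" plus the inflectional suffixes -s, -ed, -ing
# (the suffix group is optional, so the bare stem matches too).
pattern = re.compile(r"\bwalk(s|ed|ing)?\b")

words = ["walk", "walks", "walked", "walking", "sidewalk"]
# fullmatch requires the whole token to match, so "sidewalk" is rejected
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # → ['walk', 'walks', 'walked', 'walking']
```

A single pattern like this stands in for four dictionary lookups, which is the kind of generalization over word forms that motivates regular expressions in morphological analysis.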