Multilingual Natural Language Processing @ Sapienza: March 2014

Saturday, March 29, 2014

Lecture 4: Part-of-Speech Tagging

Introduction to part-of-speech (POS) tagging. POS tagsets: the Penn Treebank tagset and the Google Universal Tagset. Rule-based POS tagging. Stochastic part-of-speech tagging. Hidden Markov models. Deleted interpolation. Linear and logistic regression: Maximum Entropy models. Transformation-based POS tagging. Handling out-of-vocabulary words.
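
To make the hidden Markov model part more concrete, here is a minimal sketch of Viterbi decoding for a bigram HMM tagger. It is not the code used in class: the function name `viterbi`, the toy tagset, and all the probabilities are made up for illustration, and the `unk` constant is only a crude stand-in for proper out-of-vocabulary handling. It also assumes a non-empty sentence and multiplies raw probabilities, whereas a real tagger would typically work with log-probabilities.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    `unk` is a hypothetical fallback emission probability for unseen words.
    """
    # best[i][t] = probability of the best tag path ending in tag t at position i
    best = [{t: start_p[t] * emit_p[t].get(words[0], unk) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # choose the previous tag that maximizes the path probability
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], unk)
            back[i][t] = prev
    # follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy parameters (invented for this example, not estimated from a corpus).
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
emit_p = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.4, "barks": 0.1},
    "VB": {"barks": 0.5, "dog": 0.05},
}

tagged = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
print(tagged)
```

With these toy numbers the ambiguous word "barks" is tagged as a verb because the transition from NN to VB is more likely than from NN to NN.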

Monday, March 24, 2014

Lecture 3: language modeling (2)

The third lecture was about language models. You discovered how important language models are and how we can approximate real language with them. N-gram models (unigrams, bigrams, trigrams) were discussed, together with their probability modeling and issues. We discussed perplexity and its close relationship with entropy, and we introduced smoothing and interpolation techniques to deal with data sparsity.
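
As a rough illustration of these ideas, the sketch below trains a bigram model on a tiny made-up corpus, applies add-one (Laplace) smoothing, and computes perplexity. The function names and the two training sentences are invented for this example. Add-one is used here only because it is the simplest smoothing scheme, and the sketch does not reserve probability mass for unknown words.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(sentences, unigrams, bigrams):
    """Perplexity = 2 ** (average negative log2 probability per predicted token)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log2(bigram_prob(prev, word, unigrams, bigrams))
            n += 1
    return 2 ** (-log_sum / n)


# Tiny illustrative corpus (not the homework data).
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(train)

pp_seen = perplexity(train, unigrams, bigrams)
pp_unseen = perplexity([["the", "sat", "cat"]], unigrams, bigrams)
print(pp_seen, pp_unseen)
```

The word order that never occurred in training gets a higher perplexity, which is the behavior we expect: the lower the probability the model assigns to a text, the higher its perplexity on it.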

In the second part of the class, we discussed the first homework in more detail.

Tuesday, March 18, 2014

Lecture 2: morphology and language modeling (1)

We introduced words and morphemes. Before delving into morphology and morphological analysis, we introduced regular expressions as a powerful tool to deal with different forms of a word. We also introduced finite-state transducers for encoding the lexicon and orthographic rules. We then moved on to language models: we discussed their importance and how we can approximate real language with them. We also introduced N-gram models (unigrams, bigrams, trigrams), together with their probability modeling and issues.
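
As a small example of using a regular expression to capture different forms of a word, the sketch below matches the inflected forms of "walk" with Python's `re` module. The pattern and the test sentence are made up for illustration. A single pattern like this only covers regular suffixes, which is one reason to move on to finite-state transducers.

```python
import re

# Match "walk" optionally followed by a regular inflectional suffix.
# The \b word boundaries prevent matches inside longer words such as "sidewalk".
WALK_FORMS = re.compile(r"\bwalk(?:s|ed|ing)?\b")

text = "She walks, walked, and is walking; a sidewalk walk."
forms = WALK_FORMS.findall(text)
print(forms)
```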

In the last part, I presented the first part of homework 1 (deadline: April 30th)! Make sure you know all the details by following the discussions on the Google group. Don't miss the next class on Friday 21st!

Friday, March 7, 2014

Lecture 1: introduction

We gave an introduction to the course and the field it is focused on, i.e., Natural Language Processing, with a focus on the Turing Test as a tool to understand whether "machines can think". We also discussed the pitfalls of the test, including Searle's Chinese Room argument. We then provided examples of tasks in desperate need of accurate NLP: machine translation, summarization, machine reading, question answering, information retrieval.

First class: Today at 4pm!

Dear students,

the first class will be this afternoon at 4pm, in Viale Regina Elena, 295, yellow building (informatica e statistica), third floor, room G50. I have already invited all the students who signed up to the Google discussion group. See you later!
