Monday, May 25, 2015

Lecture 10: Semantic similarity and relatedness / Natural Language Generation

What is semantic relatedness? String-based similarity measures: longest common substring/subsequence; n-gram overlap. Knowledge-based approaches: Lesk; Leacock & Chodorow; Wu & Palmer. Corpus-based approaches: vector-space models; Explicit Semantic Analysis (ESA). Align, Disambiguate and Walk. Cross-level semantic similarity.
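
As an illustration of the string-based measures above, here are minimal Python sketches of two of them: character n-gram overlap scored with the Dice coefficient, and longest common subsequence length via dynamic programming. The function names and parameters are mine, not from the lecture materials.

```python
def char_ngrams(s, n=3):
    """Set of character n-grams of a string."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_dice(s1, s2, n=3):
    """n-gram overlap as a Dice coefficient: 2*|A & B| / (|A| + |B|)."""
    a, b = char_ngrams(s1, n), char_ngrams(s2, n)
    return 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0

def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]
```

For example, `ngram_dice("relatedness", "relativeness")` returns a score close to 1 for the two near-identical strings, while `lcs_length` counts shared characters in order without requiring contiguity.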

Introduction to Natural Language Generation, by Prof. Michael Zock.

Friday, May 15, 2015

Lecture 9: Statistical machine translation

Introduction to Machine Translation. Rule-based vs. Statistical MT. Statistical MT: the noisy channel model. The language model and the translation model. The phrase-based translation model. Learning the model from training data. Phrase translation tables. Parallel corpora. Extracting phrases from word alignments. Word alignments.
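
For reference, the decision rule behind the noisy channel model, written in standard textbook notation rather than copied from the slides: given a foreign sentence f, the decoder searches for the English sentence that maximizes the product of the translation model and the language model.

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e} P(e \mid f)
        \;=\; \operatorname*{arg\,max}_{e} \underbrace{P(f \mid e)}_{\text{translation model}}\,\underbrace{P(e)}_{\text{language model}}
```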

IBM models for word alignment. Many-to-one and many-to-many alignments. IBM model 1 and the HMM alignment model. Training the alignment models: the Expectation Maximization (EM) algorithm. Symmetrizing alignments for phrase-based MT: symmetrizing by intersection; the growing heuristic. Calculating the phrase translation table. Decoding: stack decoding. Evaluation of MT systems. BLEU.
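
A minimal sketch of Expectation Maximization training for IBM Model 1, assuming a toy parallel corpus of tokenized sentence pairs; the names `train_ibm_model1`, `corpus`, and `t` are illustrative, not from the course materials.

```python
from collections import defaultdict

def train_ibm_model1(corpus, n_iters=10):
    """EM training of IBM Model 1 translation probabilities t(f|e).
    `corpus`: list of (foreign_tokens, english_tokens) pairs; a NULL
    token is prepended to each English sentence during training."""
    # Uniform initialisation over the foreign vocabulary
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # t[(f, e)]
    for _ in range(n_iters):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: collect expected alignment counts
        for fs, es in corpus:
            es = ['NULL'] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalisation for f
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f|e) from the expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```

Run on a parallel corpus, the probability mass of `t` concentrates on the true word translations after a few iterations, which is exactly what the phrase extraction step then builds on.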

Presentation of the NLP projects.

Friday, May 8, 2015

Lecture 8: Neural networks, word embeddings and deep learning

Motivation. The perceptron. Input encoding, sum and activation functions; the objective function. Linearity of the perceptron. Neural networks. Training. Backpropagation. Connection to Maximum Entropy. Connection to language: vector representations. NN for the bigram language model. Word2vec: CBOW and skip-gram. Word embeddings. Deep learning. Language modeling with neural networks. The big picture.
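
A minimal sketch of the perceptron learning rule covered above, using NumPy; the interface (labels in {-1, +1}, a learning rate `lr`) is a common convention assumed here, not taken from the slides.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Binary perceptron. X: (n, d) array of inputs; y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update weights only when the current example is misclassified
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b
```

The misclassification test makes the linearity of the model explicit: the decision boundary is the hyperplane w·x + b = 0, which is why a single perceptron cannot represent non-linearly separable functions and multi-layer networks with backpropagation are needed.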