Thursday, May 28, 2020

Lecture 25 (28/05/2020, Google meet, 3 hours): machine translation and homework 3

Introduction to machine translation (MT) and its history. Overview of statistical MT. The EM algorithm for word alignment in SMT. Beam search for decoding. Introduction to neural machine translation (NMT): the encoder-decoder architecture, its advantages and results; back-translation; byte pair encoding. The BLEU evaluation score. Performance and recent improvements. Attention in NMT. Unsupervised machine translation. MASS.
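
To give the flavour of byte pair encoding, here is a minimal sketch of the BPE merge-learning loop on a toy vocabulary (illustrative only, not a production tokenizer): repeatedly merge the most frequent adjacent symbol pair.

    from collections import Counter

    def learn_bpe(vocab, num_merges):
        # vocab maps each word (a tuple of symbols, initially characters)
        # to its corpus frequency; repeatedly merge the most frequent pair
        for _ in range(num_merges):
            pairs = Counter()
            for word, freq in vocab.items():
                for pair in zip(word, word[1:]):
                    pairs[pair] += freq
            if not pairs:
                break
            a, b = max(pairs, key=pairs.get)       # best adjacent symbol pair
            new_vocab = {}
            for word, freq in vocab.items():
                merged, i = [], 0
                while i < len(word):
                    if i + 1 < len(word) and (word[i], word[i + 1]) == (a, b):
                        merged.append(a + b)       # apply the merge
                        i += 2
                    else:
                        merged.append(word[i])
                        i += 1
                new_vocab[tuple(merged)] = freq
            vocab = new_vocab
        return vocab

    print(learn_bpe({tuple('lower'): 5, tuple('lowest'): 2}, 4))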


End of the course!


Lecture 24 (26/05/2020, Google meet, 2 hours): semantic parsing

Semantic parsing: definition, comparison with Semantic Role Labeling, overview of approaches, and a recent approach in detail. The Abstract Meaning Representation (AMR) and Universal Conceptual Cognitive Annotation (UCCA) formalisms.


Thursday, May 21, 2020

Lecture 23 (21/05/2020, Google meet, 3 hours): WSD and Semantic Role Labeling

Issues with WSD: the knowledge acquisition bottleneck and silver data generation. From word to sentence representations. Semantic roles. Resources: PropBank, VerbNet, FrameNet. Semantic Role Labeling (SRL): traditional features. State-of-the-art neural approaches.

Thursday, May 14, 2020

Lecture 21 (14/05/2020, Google meet, 3 hours): Word Sense Disambiguation

Introduction to Word Sense Disambiguation. Elements necessary for performing WSD. Supervised vs. unsupervised vs. knowledge-based WSD. Supervised WSD techniques. Neural WSD: LSTM and BERT-based approaches. Integration of knowledge and supervision.
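
A quick taste of knowledge-based WSD with NLTK's simplified Lesk, which picks the sense whose gloss overlaps most with the context (the example sentence is illustrative):

    # Requires: pip install nltk, then nltk.download('wordnet')
    from nltk.corpus import wordnet as wn
    from nltk.wsd import lesk

    context = "I went to the bank to deposit my money".split()
    sense = lesk(context, 'bank', pos=wn.NOUN)
    print(sense.name(), '-', sense.definition())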

Tuesday, May 12, 2020

Lecture 20 (12/05/2020, Google meet, 2 hours): XLM, XLNet, RoBERTa; Natural Language Understanding: Semantic Role Labeling; homework

XLNet, RoBERTa, XLM, XLM-R. The GLUE and SuperGLUE benchmarks. Introduction to Natural Language Understanding (NLU): Word Sense Disambiguation, Semantic Role Labeling, Semantic Parsing.


Thursday, May 7, 2020

Lecture 19 (07/05/2020, Google meet, 3 hours): Transformer (2/2) and BERT

The Transformer's encoder and decoder. Positional embeddings. BERT. Notebooks on BERT. Sense embeddings with WordNet and SemCor.
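
For reference, the sinusoidal positional encodings from the original Transformer paper, in a minimal NumPy sketch (note that BERT instead learns its positional embeddings):

    import numpy as np

    def sinusoidal_positional_encoding(max_len, d_model):
        # PE(pos, 2i) = sin(pos / 10000^(2i/d)); PE(pos, 2i+1) = cos(same angle)
        pos = np.arange(max_len)[:, None]
        i = np.arange(d_model)[None, :]
        angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
        return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

    print(sinusoidal_positional_encoding(50, 512).shape)  # (50, 512)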

Wednesday, May 6, 2020

Lecture 18 (05/05/2020, Google meet, 2 hours): bilingual embeddings, contextualized word embeddings, ELMo, the Transformer

More on semantic vector representations. Bilingual and multilingual embeddings. Contextualized word embeddings. ELMo. The Transformer architecture.
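
The heart of the Transformer is scaled dot-product attention; a minimal PyTorch sketch (shapes are illustrative):

    import math
    import torch

    def scaled_dot_product_attention(Q, K, V, mask=None):
        # softmax(Q K^T / sqrt(d_k)) V -- the core operation of the Transformer
        scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        return torch.softmax(scores, dim=-1) @ V

    Q = K = V = torch.randn(1, 10, 64)     # self-attention: all three coincide
    print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 10, 64)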

Lecture 17 (30/04/2020, Google meet, 3 hours): BabelNet and sense embeddings

More on BabelNet. Introduction to semantic vector representations: motivation and examples. Semantic vector representations: the importance of multilinguality; linkage to BabelNet; latent vs. explicit representations; monolingual vs. multilingual representations. The NASARI lexical, unified and embedded representations.

Tuesday, April 28, 2020

Lecture 16 (28/04/2020, Google meet, 2 hours): lexical semantics

Introduction to lexical semantics. Lexicon: lemmas and word forms. Word senses: monosemy vs. polysemy. Special kinds of polysemy. Computational sense representations: enumeration vs. generation. Graded word sense assignment. Lexical knowledge resources: WordNet.
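
With NLTK one can browse WordNet's enumeration of senses directly; 'bank' is the classic polysemous example:

    # Requires: pip install nltk, then nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    # One polysemous lemma, many synsets, each with its own gloss
    for synset in wn.synsets('bank', pos=wn.NOUN):
        print(synset.name(), '-', synset.definition())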

Thursday, April 23, 2020

Lecture 15 (23/04/2020, Google meet, 3 hours): neural syntactic parsing

Graph-based and transition-based neural syntactic parsing. Deep biaffine attention for neural syntactic parsing. 
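
A sketch of the biaffine scorer at the core of deep biaffine parsing (after Dozat & Manning; names and shapes here are illustrative, not the lecture's exact code):

    import torch
    import torch.nn as nn

    class Biaffine(nn.Module):
        # Scores every (dependent, head) pair of tokens in a sentence
        def __init__(self, dim):
            super().__init__()
            # the +1 row accommodates a bias column appended to the dependents
            self.U = nn.Parameter(torch.empty(dim + 1, dim))
            nn.init.xavier_uniform_(self.U)

        def forward(self, h_dep, h_head):
            # h_dep, h_head: (batch, seq_len, dim), from two separate MLP heads
            ones = h_dep.new_ones(*h_dep.shape[:-1], 1)
            h_dep = torch.cat([h_dep, ones], dim=-1)       # (batch, n, dim+1)
            # scores[b, i, j] = plausibility of token j heading token i
            return h_dep @ self.U @ h_head.transpose(-2, -1)

    scores = Biaffine(128)(torch.randn(2, 10, 128), torch.randn(2, 10, 128))
    print(scores.shape)  # (2, 10, 10)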


Lecture 14 (21/04/2020, Google meet, 2 hours): more on syntactic parsing

Syntactic parsing: top-down and bottom-up. Structural ambiguity. Backtracking vs. dynamic programming for parsing. The CKY algorithm.
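
A minimal CKY recognizer sketch for a grammar in Chomsky normal form (the toy grammar and names are illustrative):

    from collections import defaultdict

    def cky(words, lexical, binary, start='S'):
        # lexical: word -> set of nonterminals A with rule A -> word
        # binary: (B, C) -> set of nonterminals A with rule A -> B C
        n = len(words)
        table = defaultdict(set)
        for i, w in enumerate(words):
            table[i, i + 1] = set(lexical.get(w, ()))
        for span in range(2, n + 1):                 # widen spans bottom-up
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):            # try every split point
                    for B in table[i, k]:
                        for C in table[k, j]:
                            table[i, j] |= binary.get((B, C), set())
        return start in table[0, n]

    lexical = {'people': {'N'}, 'fish': {'N', 'V'}}
    binary = {('N', 'V'): {'S'}}
    print(cky(['people', 'fish'], lexical, binary))  # True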

Friday, April 17, 2020

Lecture 13 (16/04/2020, Google meet, 3 hours): language modeling hands-on and intro to syntactic parsing

Neural language modeling with LSTMs. Neural network tricks: sentence packing; weight tying; custom dropout. Introduction to syntax. Context-free grammars and languages. Treebanks. Normal forms. Dependency grammars.
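
A minimal sketch of weight tying in a PyTorch language model (illustrative, not the notebook code; tying requires the hidden size to equal the embedding size):

    import torch
    import torch.nn as nn

    class TiedLM(nn.Module):
        # Minimal LSTM language model with tied input/output embeddings
        def __init__(self, vocab_size, dim):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.lstm = nn.LSTM(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab_size, bias=False)
            self.out.weight = self.embed.weight    # the actual weight tying

        def forward(self, x):                      # x: (batch, seq_len) of ids
            h, _ = self.lstm(self.embed(x))
            return self.out(h)                     # (batch, seq_len, vocab)

    logits = TiedLM(10000, 256)(torch.randint(0, 10000, (4, 12)))
    print(logits.shape)  # (4, 12, 10000)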

Thursday, April 9, 2020

Lecture 12 (07/04/2020, Google meet, 2 hrs): LSTMs and character embeddings

Gated architectures: Long Short-Term Memory networks (LSTMs). Bidirectional LSTMs and stacked LSTMs. How to create an LSTM with PyTorch. Part-of-speech tagging with LSTMs. Character embeddings.
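
The basic nn.LSTM call in PyTorch, with stacking and bidirectionality (toy sizes, illustrative only):

    import torch
    import torch.nn as nn

    # A stacked, bidirectional LSTM over a batch of already-embedded sentences
    lstm = nn.LSTM(input_size=100, hidden_size=128,
                   num_layers=2, bidirectional=True, batch_first=True)
    x = torch.randn(8, 20, 100)            # (batch, seq_len, embedding_dim)
    output, (h_n, c_n) = lstm(x)
    print(output.shape)                    # (8, 20, 256): 2 directions x 128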

Friday, April 3, 2020

Lecture 11 (02/04/2020, Google meet, 3 hrs): RNNs, Part-of-speech tagging (3/3)

Word2vec and word embedding properties and regularities. Identifying and managing multiword expressions. Working with sequences: Recurrent Neural Networks (RNNs). More on POS tagging with RNNs. The CoNLL format. Homework assignment: Named Entity Recognition.
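
For reference, CoNLL-2003-style NER data looks like this: one token per line, columns for token, POS tag, chunk tag and NER tag, with a blank line between sentences.

    U.N.      NNP  I-NP  I-ORG
    official  NN   I-NP  O
    Ekeus     NNP  I-NP  I-PER
    heads     VBZ  I-VP  O
    for       IN   I-PP  O
    Baghdad   NNP  I-NP  I-LOC
    .         .    O     O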


Tuesday, March 31, 2020

Lecture 10 (31/03/2020, Google meet, 2 hrs): Part-of-speech tagging (2/3)

Hidden Markov models. Deleted interpolation. Linear and logistic regression: Maximum Entropy models; the logit and logistic functions; relationship to sigmoid and softmax. Transformation-based POS tagging. Handling out-of-vocabulary words.
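
A quick illustration of how the logistic function relates to softmax: the sigmoid is exactly a two-class softmax with the second logit fixed at zero.

    import math

    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    def softmax(zs):
        exps = [math.exp(z - max(zs)) for z in zs]   # shift for stability
        return [e / sum(exps) for e in exps]

    print(sigmoid(1.5))            # 0.8175...
    print(softmax([1.5, 0.0])[0])  # the same value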

Thursday, March 26, 2020

Lecture 9 (26/03/2020, Google meet, 3 hrs): Part-of-speech tagging (1/3)

Data preparation. Introduction to part-of-speech tagging. Universal POS tags. Stochastic part-of-speech tagging. Intro to Hidden Markov models. More on word2vec: hierarchical softmax and negative sampling.

Tuesday, March 24, 2020

Lecture 8 (24/03/2020, Google meet, 2 hrs): perplexity, smoothing, interpolation

Chain rule and n-gram estimation. Perplexity and its close relationship with entropy. Smoothing and interpolation.
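
A minimal sketch of perplexity as the exponentiated average negative log-likelihood (natural logs here; with base-2 logs the same quantity is 2 raised to the cross-entropy):

    import math

    def perplexity(probs):
        # probs: the probability the model assigns to each test word in turn
        return math.exp(-sum(math.log(p) for p in probs) / len(probs))

    print(perplexity([0.25, 0.25, 0.25]))  # 4.0: like guessing among 4 options
    print(perplexity([0.5, 0.5, 0.5]))     # 2.0: a better, less "perplexed" model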

Thursday, March 19, 2020

Lecture 7 (19/03/2020, Google meet, 3 hrs): word2vec in PyTorch + language modeling

Word2vec in PyTorch. Introduction to N-gram models (unigrams, bigrams, trigrams), their probability estimation, and their issues.
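
A toy illustration of maximum-likelihood bigram estimation (the corpus is made up, purely illustrative):

    from collections import Counter

    corpus = "the cat sat on the mat . the cat ran .".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    print(bigrams['the', 'cat'] / unigrams['the'])  # 2/3: 'cat' follows 'the' twice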


Tuesday, March 17, 2020

Lecture 6 (17/03/2020, Google meet, 2 hrs): word2vec and its implementation

One-hot encodings and word embeddings. Introduction to word2vec. Differences between CBOW and skip-gram. The loss function in word2vec. Implementation with PyTorch.
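
A minimal skip-gram sketch in PyTorch (full-softmax variant with toy sizes; class and variable names are illustrative, not the notebook code):

    import torch
    import torch.nn as nn

    class SkipGram(nn.Module):
        # Full-softmax skip-gram: predict a context word from the center word
        def __init__(self, vocab_size, dim):
            super().__init__()
            self.in_embed = nn.Embedding(vocab_size, dim)      # the word vectors
            self.out = nn.Linear(dim, vocab_size, bias=False)  # context scores

        def forward(self, center):                  # center: (batch,) of word ids
            return self.out(self.in_embed(center))  # logits over the vocabulary

    model = SkipGram(vocab_size=10000, dim=100)
    center, context = torch.tensor([42]), torch.tensor([7])
    loss = nn.CrossEntropyLoss()(model(center), context)
    loss.backward()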



Tuesday, March 10, 2020

Lecture 4 (10/03/2020 - 2.30 hrs): the Perceptron

Introduction to the Perceptron. Activation functions. Loss functions: mean squared error (MSE) and categorical cross-entropy (CCE). Colab notebook. Language classification with the perceptron.
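
A minimal perceptron-style classifier in PyTorch (illustrative only, not the exact Colab code; here with binary cross-entropy as the loss):

    import torch
    import torch.nn as nn

    # A single-layer perceptron with sigmoid activation for binary classification
    model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
    x = torch.randn(4, 10)                      # 4 examples, 10 features each
    y = torch.tensor([[1.], [0.], [1.], [0.]])  # gold labels
    loss = nn.BCELoss()(model(x), y)            # binary cross-entropy
    loss.backward()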



Tuesday, March 3, 2020

Lecture 3 (03/03/2020): PyTorch

Introduction to PyTorch. Introduction to deep learning for NLP. The perceptron. Colab notebook with PyTorch basics.


Friday, February 28, 2020

Lecture 2 (27/02/2020): more on NLP + introduction to machine learning and deep learning (1)

More on Natural Language Processing and its applications. Introduction to Machine Learning for Natural Language Processing: supervised vs. unsupervised vs. reinforcement learning. Features, feature vector representations.

Lecture 1 (25/02/2020): introduction to NLP

We gave an introduction to the course and the field it is focused on, i.e., Natural Language Processing and its challenges.

Saturday, February 1, 2020

Ready, steady, go!

Welcome to the Sapienza NLP course blog! This year there will be important changes:

  1. The course will contain lots of new content on deep learning and neural networks!
  2. For attending students, there will be only TWO homeworks (and no additional duty); one of them, due by the end of September, will replace the project. Non-attending students, instead, will have to work on a full-fledged project.
  3. There will be cool challenges throughout the whole course, including the possibility of writing and publishing papers.

IMPORTANT: The 2020 class schedule is Tuesday 14:00-16:00 and Thursday 14:00-17:00, in Aula 1 - Aule L ingegneria, via del Castro Laurenziano.

Please sign up for the NLP class!