Natural Language Processing and Text Mining
This course covers the main topics in computational Natural Language Processing, with an emphasis on written text. We will study a broad range of techniques from computational linguistics, statistical language analysis, text mining, and machine learning, applied to problems such as word sense disambiguation, syntactic analysis, machine translation, text classification and clustering, sentiment analysis, and authorship attribution, among others.
Instructors
- Manuel Montes-y-Gómez, PhD - INAOE - Mexico
- Thamar Solorio, PhD - University of Houston - USA
- Sergio Jiménez, PhD - Universidad Nacional de Colombia - Colombia
- Fabio A. González, PhD - Universidad Nacional de Colombia - Colombia
Classroom
103 - 401
Course topics
1 Introduction to NLP (Sergio Jiménez)
Linguistics background, basic text processing, language models, collocations, textual and lexical similarity, word sense disambiguation
2 Parsing and translation (Thamar Solorio)
POS tagging & formal grammars of English, syntactic and statistical parsing, semantic role labeling, machine translation, code switching
3 Text mining (Manuel Montes-y-Gómez)
Text classification, text clustering, distributional semantics, distributed representations, authorship attribution, author profiling
4 Advanced machine learning models for NLP (Fabio González)
Neural networks, deep learning, word embeddings, recurrent neural networks
Evaluation and grading policy
4 credits in 64 hours.
During the first days of the course, you can decide whether to receive a grade, a certificate, both, or neither.
Course resources
References and resources
- D. Jurafsky and J. H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”, 3rd Ed.
- Bird S., Klein E., Loper E., “Natural language processing with Python”, O’Reilly Media, Inc., 2009.
- Feldman R., Sanger J., “The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data”, Cambridge University Press, 2006.
- Srivastava A., Sahami M., “Text Mining: Classification, Clustering, and Applications”, Chapman and Hall, 2009.
Course schedule
Date | Topic | Material | Assignments |
---|---|---|---|
Jun 13 | 1. Introduction to NLP | ||
Jun 14 | 1.1 NLP goals. 1.2 Introduction to lexical similarity Slides | LexSim V1 Words DataSet | |
Jun 15 | 1.3 Lexical similarity functions implementation | LexSim V2 Results lexical similarity | |
Jun 16 | 1.4 Workshop text similarity | LexSim V3 Text similarity DataSet Must read: Computational Linguistics and Deep Learning word2vec Google News Pre-trained Model | Assignment 1 Files |
Jun 17 | 2. Parsing and translation 2.1 Introduction Slides 2.2 Pre-processing Slides | Must read: Computational Linguistics and Deep Learning | |
Jun 20 | 2.3 Language Models Slides 2.4 Hidden Markov Models Example | New notebook tutorial | |
Jun 21 | 2.5 Word classes and part of speech tagging Slides | POS Tagging Exercise V1 Notebook (download) Language Models with KenLM Mac OS X Notebook (download) | |
Jun 22 | 2.6 Parsing Slides 2.7 Statistical parsing Slides | Example CKY Example Prob CKY | Assignment 2 Notebook (download) |
Jun 23 | 3. Text mining 3.1 Introduction to text classification Slides | ||
Jun 24 | 3.2 Beyond the Bag-of-Words representation Slides | ||
Jun 27 | 3.3 Non Conventional text classification techniques Slides | ||
Jun 28 | 3.4 Authorship Analysis Slides | Assignment 3 Poems | |
Jun 29 | 4. Advanced machine learning models for NLP Slides 4.1 Introduction | Introduction video | Assignment 4.1 (download) |
Jun 30 | 4.2 Recurrent Neural Networks Notebook | Neural Networks video Softmax video | Assignment 4.2 (download) Bible model - biblia.txt Alternative model - reg1.txt |
Jul 1 | 4.3 Applications | ||
Contact
Academic coordination
Fabio A. González, Engineer
Email: fagonzalezo@unal.edu.co
Phone: 3165000 ext. 14077/14011
Teaching assistant
Lina F. Rosales
Email: lfrosalesc@unal.edu.co