Natural Language Processing and Text Mining
This course covers the main topics in computational Natural Language Processing, with an emphasis on written text. We will study a broad range of techniques from computational linguistics, statistical language analysis, text mining, and machine learning, applied to problems such as word sense disambiguation, syntactic analysis, machine translation, text classification and clustering, sentiment analysis, and authorship attribution, among others.
1 Introduction to NLP (Sergio Jiménez)
Linguistics background, basic text processing, language models, collocations, textual and lexical similarity, word sense disambiguation
2 Parsing and translation (Thamar Solorio)
POS tagging and formal grammars of English, syntactic and statistical parsing, semantic role labeling, machine translation, code switching
3 Text mining (Manuel Montes-y-Gómez)
Text classification, text clustering, distributional semantics, distributed representations, authorship attribution, author profiling
4 Advanced machine learning models for NLP (Fabio González)
Neural networks, deep learning, word embeddings, recurrent neural networks
Evaluation and grading policy
4 credits in 64 hours.
During the first days of the course, you can decide whether to receive a grade, a certificate, both, or neither.
References and resources
- Jurafsky D., Martin J. H., “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”, 3rd Ed.
- Bird S., Klein E., Loper E., “Natural language processing with Python”, O’Reilly Media, Inc., 2009.
- Feldman R., Sanger J., “The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data”, Cambridge University Press, 2006.
- Srivastava A., Sahami M., “Text Mining: Classification, Clustering, and Applications”, Chapman and Hall, 2009.
|Date||Topic||Materials||Assignments|
|Jun 13||1. Introduction to NLP|| || |
|Jun 14||1.1 NLP goals; 1.2 Introduction to lexical similarity (Slides)||LexSim V1|| |
|Jun 15||1.3 Lexical similarity functions implementation||LexSim V2; Lexical similarity results|| |
|Jun 16||1.4 Workshop: text similarity||LexSim V3; Text similarity dataset; Must read: Computational Linguistics and Deep Learning; word2vec Google News pre-trained model||Assignment 1|
|Jun 17||2. Parsing and translation; 2.1 Introduction (Slides); 2.2 Pre-processing (Slides)||Must read: Computational Linguistics and Deep Learning|| |
|Jun 20||2.3 Language Models (Slides); 2.4 Hidden Markov Models Example||New notebook tutorial|| |
|Jun 21||2.5 Word classes and part-of-speech tagging (Slides)||POS Tagging Exercise V1 Notebook (download); Language Models with KenLM Mac OS X Notebook (download)|| |
|Jun 22||2.6 Parsing (Slides); 2.7 Statistical parsing (Slides)||Example CKY; Example Prob CKY||Assignment 2|
|Jun 23||3. Text mining; 3.1 Introduction to text classification (Slides)|| || |
|Jun 24||3.2 Beyond the Bag-of-Words representation (Slides)|| || |
|Jun 27||3.3 Non-conventional text classification techniques (Slides)|| || |
|Jun 28||3.4 Authorship Analysis (Slides)|| ||Assignment 3|
|Jun 29||4. Advanced machine learning models for NLP (Slides)||Introduction video||Assignment 4.1 (download)|
|Jun 30||4.2 Recurrent Neural Networks (Notebook)||Neural Networks video||Assignment 4.2 (download); Bible model - biblia.txt; Alternative model - reg1.txt|
|Jul 1||4.3 Applications|| || |
Eng. Fabio A. González. Email: firstname.lastname@example.org. Phone: 3165000 ext. 14077/14011
Lina F. Rosales. Email: email@example.com