Statistical Natural Language Processing: CSED523

Gary Geunbae Lee, Eng 2-211, 279-2254

1. Course objectives

This course introduces recent statistical methods in natural language processing. We will cover basic statistical tools for computational linguistics and their application to part-of-speech tagging, statistical parsing, word sense disambiguation, sentiment analysis, text categorization, machine translation, information retrieval, and statistical language modeling. We will also briefly touch on statistical language models for speech recognition and text-to-speech systems, and on recent deep learning models for natural language processing.


2. Course prerequisites

No prerequisites are required.


3. Grading

midterm 40%
final 40%
presentation 20%


4. Texts or references

Manning, C. D., Schütze, H.: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Jurafsky, D., Martin, J. H.: Speech and Language Processing. 2nd edition. Prentice-Hall. 2009. (3rd edition draft, 2019.)

Goldberg, Y.: A Primer on Neural Network Models for Natural Language Processing.


5. Others

instruction language: English
presentation: recent papers from ACL, NAACL, COLING, SIGIR, and NeurIPS on statistical/neural NLP applications (choose your own topic and discuss it with the instructor)


6. Course schedule


Mathematical foundation

Linguistic essentials

Text processing / Collocations

Statistical inference: n-gram language modeling


Hidden Markov models (HMM) / Maximum entropy

Deep learning for NLP 1 / Deep learning for NLP 2

POS tagging / Probabilistic parsing (PCFG)

Semantic Processing / Spoken language understanding

Statistical machine translation / Neural machine translation

Information extraction / Applications: IR, QA, summarization

Automatic speech recognition / Text-to-speech