Computational Linguistics I
CMSC 723 Fall 2010
Computational linguistics (CL) is the science of doing what linguists
do with language, but using computers. Natural language processing
(NLP) is getting computers to do what people do with language.
Despite the title, most of this course is actually about NLP, not CL!
But we'll hit a lot of linguistics along the way.
This course is intended to be a broad introduction to what is a very
broad field. We will cover both rule-based and statistical approaches
to a wide variety of challenging problems in natural language
processing. We will discover that language ambiguity is the
rabid wolf of NLP, and develop techniques to try to tame it. Along
the way, we will see some linguistic theories developed specifically
for computational linguistics, which sheds some light on what sort of
linguistic models make sense, computationally.
Prerequisites: You must be able to program. You must find
language interesting. If you cannot write breadth- and depth-first
search in your programming language of choice in under an hour, you
will struggle in this class. If you cannot find humor in the sentence
pair "I ate spaghetti with meatballs / I ate spaghetti with a fork"
then you might not enjoy the class. Linguistic background is not
necessary, though of course it never hurts.
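As a self-check on the programming prerequisite, the following sketch shows roughly what is meant: a breadth-first search over an adjacency-list graph, in Python. (The graph and function names here are an illustration for this page, not course material; depth-first search is the same idea with a stack in place of the queue.)

```python
from collections import deque

def bfs(graph, start):
    """Return the nodes reachable from `start`, in breadth-first order."""
    visited = [start]          # nodes seen so far, in visit order
    queue = deque([start])     # frontier of nodes awaiting expansion
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited

# A toy graph: 'a' points to 'b' and 'c', both of which point to 'd'.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a"))  # -> ['a', 'b', 'c', 'd']
```

If writing something like this (and its depth-first cousin) takes you well over an hour, the programming projects will be rough going.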
The purpose of grading (in my mind) is to provide extra incentive for
you to keep up with the material and to ensure that you exit the class
as a computational linguistics genius. If everyone gets an A, that
would make me happy (sadly, it hasn't happened yet). The components
of grading are:

| Weight | Component | Description |
| --- | --- | --- |
| 40% | Programming projects | There are four programming projects, each worth 10% of your final grade. You will be graded on both code correctness and your analysis of the results. These must be completed in teams of two or three students, with cross-department (CS/linguistics) teams highly encouraged. |
| 30% | Written homeworks | There are eleven written homeworks (one per week), each worth 3% of your final grade. They will be graded on a high-pass (100%), low-pass (50%), or fail (0%) basis. These are to be completed individually. Your lowest-scoring homework will be dropped. (The initial homework, HW00, is not graded, but is required if you do not want to fail.) |
| 25% | Final project | Everyone is to complete a final project, in teams of up to three. We will discuss the scope of the project later in class. |
| 5% | Class participation | You will be graded on your in-class presentations of homework questions and other general participation, including participation in the comments on the blog. This is mostly subjective. |
The textbook is the new-ish book by Jurafsky and Martin, Speech and Language Processing
(Second Edition) (ISBN 978-0-13-605234-0).
Other recommended (but not required) books:
| Date | Topics | Readings | Due | Notes |
| --- | --- | --- | --- | --- |
| 31 Aug | Welcome to Computational Linguistics: what is this class about, linguistic phenomena | - | - | blog |
| 02 Sep | History and Approaches: initial attempts, ALPAC, statistics and data | 1-1.6 | HW00 | blog |
| 07 Sep | Regular Languages: finite state machines and morphology | 2-2.2, 3-3.3 | - | blog |
| 09 Sep | Probability and Statistics: a refresher, with a language focus | 4-4.3, 4.10-4.11 | HW01 | blog |
| 14 Sep | N-gram Models: language modeling and smoothing | 4.4-4.6 | - | blog |
| 16 Sep | Part-of-Speech Tagging: rule-based approaches | 5.1-5.4 | HW02 | blog |
| 21 Sep | Part-of-Speech Tagging II: hidden Markov models and the Viterbi algorithm | 5.5, 5.8 | - | - |
| 23 Sep | Context-Free Grammars: expressivity, X-bar theory, and parsing as search | 12-12.3, 12.5, 13-13.3 | HW03 | blog |
| 28 Sep | Context-Free Grammars II: dynamic programming and the CKY algorithm | 13.4, X-bar theory | - | blog |
| 30 Sep | No class: finish up P1 (deadline extended 2 hours) | - | P1 | - |
| 05 Oct | No class: Hal sick (again) :( | - | HW04 | - |
| 07 Oct | Statistical Parsing: from treebanks to grammars, and Markovization | 12.4, 14-14.4 | - | blog |
| 12 Oct | Incorporating Context: feature-based grammars, unification | 15-15.4 | - | blog |
| 14 Oct | Representing Meaning: first-order logic | 17-17.3, 18.4 | HW05 | blog |
| 19 Oct | Interpreting Text: interpretation as abduction | abduct (sec 1-3) | P2 | blog |
| 21 Oct | Linguistic Challenges: metaphor, metonymy, time, scope, quantifiers, etc.; plus final project info | 17.4, 18.3, 18.6, 19.6 | HW06 | blog |
| 26 Oct | Computational Lexical Semantics: word sense disambiguation; plus midterm | 19-19.3 | - | blog |
| 28 Oct | Computational Semantics: semantic roles and frames | 19.4-19.5 | HW07 | blog |
| 02 Nov | Classification with Decision Trees: learning, generalization, and features | dt | - | blog |
| 04 Nov | Linear Models for Learning: perceptron learning for sentiment | Perceptron | HW08 | blog |
| 09 Nov | Sequential Learning: named entity recognition | 22.1 | - | blog |
| 11 Nov | Using World Knowledge: bootstrapping knowledge from text | 20.5, boot | HW09 | blog |
| 16 Nov | Local Discourse Context: anaphors, antecedents, and coreference | 21.3-21.7 | - | blog |
| 18 Nov | Document Coherence: TextTiling and argumentative zoning | 21-21.1, zone | HW10 | blog |
| 23 Nov | Hierarchical Text Structure: rhetorical structure theory and the discourse treebank | 21.2, discourse | P3 | blog |
| 30 Nov | Information Extraction | 22.2, 22.4 | HW11 | blog |
| 02 Dec | Mapping Text to Actions | mapping | - | blog |
| 07 Dec | Machine Translation | 25-25.5 | - | - |
| 09 Dec | Automatic Document Summarization | 23.3-23.6 | P4 | - |
| 17 Dec | Final exam; final projects due | - | - | - |