Instructor: Prof. Noah Smith
History: Taught in Fall 2009 (Tuesday/Thursday 12-1:20 pm, Wean 5304)
Prerequisite: permission of instructor
Course Description
Text-driven forecasting is an emerging collection of problems in which
text documents or document collections are automatically analyzed to
make specific, testable predictions about the future. Well-known
examples include predictions about stock or market behavior, product sales
patterns, government elections, legislative activities, or public opinion polls.
While a research community focusing on these problems has yet to form,
this course is based on the following observations:
Forecasting provides a new driving force for research
in natural language processing. What level of "understanding" is needed for predictions to be accurate?
Forecasting is a unique machine
learning problem involving discrete non-IID data, time series, and
very natural evaluation against real-world events (i.e., did the model
correctly predict what would happen today?).
The rise of social media
(and non-news text more generally) and their availability on the web
will inspire many new forecasting problems and datasets.
Focusing on tangible real-world predictions will provide a nexus for computer scientists to come together with domain experts to reason about language use and how it should be modeled.
Because people can never be expected to read all of the content relevant to a particular question about the future, intelligent text processing methods may be the only way such content can be fully exploited.
This twelve-credit seminar-project hybrid course aims to begin identifying
challenge problems and testing some solutions to them.
Format
The time and location are TBD; please contact the instructor if you are interested in participating.
The course will meet twice a week for the first month or so, operating like a seminar with discussion of two or three papers per week and brainstorming. The remainder of the semester will focus on team projects, which will account for the bulk of the grade. Each team of approximately three students will build a system that uses a text database to make testable predictions about the future.
A student wishing to audit the course will be expected to
attend the course meetings,
serve as an informal consultant to one of the teams, and write a short
"lessons learned" paper at the end of the semester.
This course counts as a "lab" for LTI students.
Grading
Grades will be assigned based on participation in class discussions (40%) and the course project (60%).
Course Plan and Readings
Part 1: Seminar (roughly 1/3 of the semester)
Date | Readings to discuss | Notes
Tu 8-25 | None; introductions, administrivia, and high-level discussion about the course. |
Note that the classification techniques in this paper are very simplistic, from the point of view of machine learning as well as computational linguistics. Brendan's notes.
After deciding on project topics and forming teams, we will usually meet as a class once a week to discuss issues that come up in the projects and hear interim reports from each team. There may be some additional readings as well.