This course is designed around the challenge problem of making computers aware of the everyday visual world, i.e., processing images or video to recognize categories such as cars, buses, tigers, zebras, rooms, doors, telephones, faces, arms, and hands, as well as actions such as running, jumping, and kicking. Topics will include a survey of human visual recognition (perception and physiology), recognition in the presence of transformations, local matching techniques, global matching techniques, segmentation as a front end, motion descriptors for action recognition, and case studies of recognition in different domains. I have a specific list of about 300 visual categories to focus our thoughts.
Lecture Topics
Introduction: Characteristics of visual recognition. Prototypes
and affordances. Basic, superordinate, and subordinate categories
(reference: Palmer, Chapter 9)
Multiple view approaches to 3D objects - aspects, k-medoids
Perceptual Organization - Grouping, figure/ground
The Human Body
Human Movement
Scenes
Project presentations
There is no required text for this course. Steve Palmer's
Vision Science and Forsyth and Ponce's Computer Vision: A
Modern Approach have useful source material.
We will use a scribe system to make course notes available throughout
the semester. For each lecture, one or two students will take notes
and type them up. I'll edit the notes and make them available on
the web.
The grade will be determined by a combination of home assignments,
scribe notes, and a final project. The project could be a
mathematical/statistical analysis of a visual task, an
implementation of an interesting algorithm, or a psychophysical
experiment.
You'll be encouraged to work in teams for the projects and for the
home assignments.