Embodied Vision

CSIE5421 • Spring 2026 • National Taiwan University

Computer vision has achieved great success since the resurgence of deep neural networks. However, this success has been limited to passive perception, where the recognizer receives and parses visual information without interacting with the environment. This setup oversimplifies perception by separating it from other functionalities. Humans are embodied agents in a physical world. Our actions and physics change the environment, and our perception facilitates the adaptation of actions to the changing environment. In other words, perception should take actions and physics into consideration.

In this class, we aim to study embodied (action- and physics-aware) perception. We provide a technical overview of: (1) 4D modeling, (2) dynamic simulation, (3) multi-sensory perception, and (4) action-centric perception.

Please fill in this Google Form if you are interested in extra enrollment!


Course Goals

The goal of this class is to guide you through the essential components of embodied perception that facilitate various downstream tasks, such as VR, AR, and robotics. Covering each component exhaustively is not our goal.

Upon completion of the course, students should know:

  • How to model the world in 4D
  • How to model the world physically and dynamically
  • How to incorporate multi-sensory inputs
  • What the essential components of action-centric perception are

Upon completion of the course, students should be able to:

  • Write down the formulation of a physics simulation
  • Implement a physics- and action-aware visual system

Prerequisite Knowledge

Students should have a solid understanding of the following areas:

  • Machine learning: stochastic gradient descent, loss functions, optimization, neural networks
  • Linear algebra: matrices, vectors, norms, scalar/vector products, orthogonality, singular value decomposition…
  • Probability: expectation, independence, Bayes’ theorem…
  • Linux: setting up the required environment, familiarity with bash scripts
  • Python programming: creating Python projects, importing required packages, visualizing your results
  • PyTorch programming: creating NNs, setting up training & evaluation pipelines

Prerequisite Hardware

Grading in this course depends heavily on the coding assignments and the final project. You should prepare your own Linux machine with GPUs for both. Otherwise, you can use cloud platforms to access GPUs.
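
If you are unsure whether your machine meets this requirement, a quick self-check along the following lines may help. This is only a minimal sketch, not an official course script: the function name `describe_gpu_setup` is hypothetical, and it assumes an NVIDIA GPU (so that the `nvidia-smi` tool is present) and a CUDA build of PyTorch.

```python
import importlib.util
import shutil


def describe_gpu_setup():
    """Report whether this machine looks ready for GPU work (hypothetical helper)."""
    report = {
        # Assumption: an NVIDIA GPU, whose driver ships the nvidia-smi tool.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # Check that PyTorch is installed without importing it yet.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        # True only if PyTorch was built with CUDA and can see a usable GPU.
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in describe_gpu_setup().items():
        print(f"{name}: {value}")
```

If `cuda_available` is reported as False, the usual causes are a missing driver or a CPU-only PyTorch build; a cloud GPU instance is the fallback mentioned above.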

Other Information

  • Lectures: Monday 9:10 AM - 12:10 PM
  • Lecture Location: CSIE Building Room 104
  • Discussion: Discord
  • HW submission: GitHub Classroom