Course Description
This course introduces the fundamental techniques used in computer vision, that is, the analysis of patterns in visual images to reconstruct and understand the objects and scenes that generated them. Topics covered include image processing basics, Hough Transforms, feature detection, feature descriptors, image representations, image classification and object detection. We will also cover camera geometry, multi-view geometry, stereo, 3D reconstruction from images, optical flow, motion analysis and tracking.
Version
Version B of 16-720 is intended for students with prior knowledge of computer vision and prior exposure to machine learning. Undergraduate students should take 16-385, the undergraduate version of the class. Those with no exposure to computer vision or machine learning should take the A version of the class. Those with advanced experience in computer vision should take the 800-level computer vision courses.
Prerequisites (self-evaluation)
Linear Algebra, Multivariate Calculus, Probability Theory, Programming
Educational Outcomes
- Implement the Hough Transform to detect lines in an image
- Extract SIFT features to build a Bag-of-Words representation of an image for classification
- Perform object recognition using a convolutional neural network
- Detect Harris corners and implement the RANSAC algorithm to estimate the homography between two images
- Perform 3D reconstruction from two images, and apply stereo rectification to implement stereo block matching
- Implement a gradient descent based image alignment algorithm to track objects in a video
- Learn to use Python and PyTorch through the programming assignments
Course Staff
Grading
Programming assignments account for 100% of the grade (6 assignments total). Grades are determined on an absolute scale: typically, 90% and above is an A, 80-89% is a B, 70-79% is a C, 60-69% is a D, and 59% or below is an R. There will be extra-credit opportunities for students who want to go deeper into the material.
- Hough Transform (10%)
- Bag of Visual Words (18%)
- Neural Networks (18%)
- Homography (18%)
- 3D Reconstruction (18%)
- LK Image Alignment and Tracking (18%)
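The weights above sum to 100% (10% + 5 × 18%), so a final score is simply a weighted average of the six assignment percentages. The sketch below illustrates that arithmetic under stated assumptions: each assignment is scored out of 100, extra credit is ignored, and the "typical" cutoffs are treated as inclusive lower bounds (so a score of 59.5 counts as below 60). The function names are hypothetical and not part of any course tooling.

```python
# Hypothetical sketch (not official course tooling): combine per-assignment
# percentages using the listed weights, then map the total onto the typical
# letter-grade cutoffs. Extra credit is ignored here by assumption.
WEIGHTS = {
    "Hough Transform": 0.10,
    "Bag of Visual Words": 0.18,
    "Neural Networks": 0.18,
    "Homography": 0.18,
    "3D Reconstruction": 0.18,
    "LK Image Alignment and Tracking": 0.18,
}

# Assumed inclusive lower bound for each letter, checked from highest to lowest.
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def final_score(scores: dict[str, float]) -> float:
    """Weighted average of assignment percentages; a missing assignment counts as 0."""
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def letter_grade(score: float) -> str:
    """Map a final percentage to a letter; anything below 60 is an R."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "R"


if __name__ == "__main__":
    example = {
        "Hough Transform": 100,
        "Bag of Visual Words": 85,
        "Neural Networks": 90,
        "Homography": 80,
        "3D Reconstruction": 95,
        "LK Image Alignment and Tracking": 88,
    }
    total = final_score(example)
    # 0.10 * 100 + 0.18 * (85 + 90 + 80 + 95 + 88) = 10 + 78.84
    print(round(total, 2), letter_grade(total))  # → 88.84 B
```

Because the five larger assignments share the same weight, each percentage point on one of them moves the final score by 0.18 points, versus 0.10 for the Hough Transform assignment.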