Authors:
Dan B. Goldman and David Salesin (Adobe Systems, Inc.); Chris Gonterman, Brian Curless, and Steven M. Seitz (University of Washington)
Summary:
Objects in a video are exactly what they sound like: characters, props, cars, animals, and so on. Most video editing software is organized around timelines and frames, even though objects are what people actually care about. Being able to tag an object and have it tracked across frames would greatly speed up the video editing process (no more splicing together stills to get your point across), and that's just what the authors of this paper are working on. They focus on the annotation, navigation, and composition of videos in an object-focused way. To support these tasks, videos are preprocessed with low-level motion tracking so that objects can later be selected and followed across frames.
Annotation deals with attaching graphics (such as text bubbles, highlights, outlines, etc.) to moving objects. Uses include sports broadcasting, surveillance video, and post-production notes for professionals. The five annotation types they implemented were graffiti, scribbles, speech balloons, path arrows, and hyperlinks.
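The core idea is that an annotation stays glued to a tracked object rather than to a fixed screen position. A minimal sketch of that anchoring, assuming the preprocessing step yields per-frame (x, y) positions for each object (the function and data names here are hypothetical, not the paper's API):

```python
# Hypothetical sketch: anchoring an annotation to a tracked object.
# "track" maps frame index -> the object's (x, y) position in that frame.

def anchor_annotation(track, offset):
    """Return per-frame annotation positions: the object's tracked
    position plus a fixed offset (e.g., where a speech balloon sits)."""
    return {frame: (x + offset[0], y + offset[1])
            for frame, (x, y) in track.items()}

# Toy track: an object drifting to the right over three frames.
track = {0: (100, 50), 1: (110, 52), 2: (121, 55)}
balloon = anchor_annotation(track, offset=(20, -30))
# balloon[2] == (141, 25): the balloon follows the object.
```

The annotation is drawn once by the user and then re-placed every frame from the track, which is what makes sports-broadcast-style overlays cheap to author.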
For navigation, the new system allows a user to select an object and drag the mouse to a new location on the screen. Once released, the video will move to a time when that object is close to that release point, thus computing video cuts for the user. The system visualizes ranges of motion for an object by placing a "starburst widget" on it which uses vectors to indicate the length and direction of motion that the object undergoes forward and backward in time.
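Under the hood, drag-to-time navigation amounts to searching the object's track for the frame where it passes nearest the release point. A minimal sketch of that search, assuming per-frame tracked positions are available (names are illustrative, not the paper's implementation):

```python
# Hypothetical sketch of drag-to-time navigation: jump to the frame
# where the tracked object is nearest the mouse-release point.

def frame_nearest(track, release_point):
    """Return the frame whose tracked (x, y) position is closest,
    in squared Euclidean distance, to the release point."""
    rx, ry = release_point
    return min(track,
               key=lambda f: (track[f][0] - rx) ** 2 + (track[f][1] - ry) ** 2)

# Toy track sampled at three frames.
track = {0: (100, 50), 10: (200, 60), 20: (300, 80)}
frame_nearest(track, (290, 75))  # -> 20: seek the video to this frame
```

Seeking to the returned frame is what makes the interaction feel like dragging the object itself through time.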
Video-to-still composition is all about splicing together images from the video to create a single composition. The authors use a drag-and-drop system to move selected objects forward or backward through frames until the object is where it is wanted. All other objects in the frame remain frozen in place until they are directly selected and subsequently manipulated. In this way, a composite image can be created that has each object exactly where the user wants it to be.
In the paper's example, the black car is pulled forward in time so that it appears to be right in front of the white one.
Discussion:
Awesome stuff... except preprocessing takes 5 minutes PER FRAME, and that's at just 720 x 480 resolution! That's an epic amount of time. If they can speed that up, then they are golden. You should check out the paper for yourself!