Authors:
Dan B. Goldman and David Salesin (Adobe Systems, Inc.); Chris Gonterman, Brian Curless, and Steven M. Seitz (University of Washington)
Summary:
Objects in a video are exactly what they sound like: characters, props, cars, animals, and so on. Most video editing software is organized around timelines and frames, even though objects are what people actually care about. Being able to tag an object and have it tracked across frames would greatly speed up the video editing process (no more splicing together stills to get your point across), and that's just what the authors of this paper are working on. They focus on the annotation, navigation, and composition of videos in an object-focused way. To support these tasks, videos are preprocessed with low-level motion tracking so that objects can later be selected and followed across frames.
Annotation deals with attaching graphics (such as text bubbles, highlights, outlines, etc.) to moving objects. Uses include sports broadcasting, surveillance video, and post-production notes for professionals. The five annotation types they implemented were graffiti, scribbles, speech balloons, path arrows, and hyperlinks.
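The core idea is that an annotation stays glued to a tracked object rather than to a fixed screen position. A minimal sketch of that anchoring, assuming the preprocessing step yields per-frame (x, y) positions for each object (the function and data names here are hypothetical, not the paper's API):

```python
# Hypothetical sketch: anchoring an annotation to a tracked object.
# "track" maps frame index -> the object's (x, y) position in that frame.

def anchor_annotation(track, offset):
    """Return per-frame annotation positions: the object's tracked
    position plus a fixed offset (e.g., where a speech balloon sits)."""
    return {frame: (x + offset[0], y + offset[1])
            for frame, (x, y) in track.items()}

# Toy track: an object drifting to the right over three frames.
track = {0: (100, 50), 1: (110, 52), 2: (121, 55)}
balloon = anchor_annotation(track, offset=(20, -30))
# balloon[2] == (141, 25): the balloon follows the object.
```

The annotation is drawn once by the user and then re-placed every frame from the track, which is what makes sports-broadcast-style overlays cheap to author.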
For navigation, the new system allows a user to select an object and drag the mouse to a new location on the screen. Once released, the video will move to a time when that object is close to that release point, thus computing video cuts for the user. The system visualizes ranges of motion for an object by placing a "starburst widget" on it which uses vectors to indicate the length and direction of motion that the object undergoes forward and backward in time.
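Under the hood, drag-to-time navigation amounts to searching the object's track for the frame where it passes nearest the release point. A minimal sketch of that search, assuming per-frame tracked positions are available (names are illustrative, not the paper's implementation):

```python
# Hypothetical sketch of drag-to-time navigation: jump to the frame
# where the tracked object is nearest the mouse-release point.

def frame_nearest(track, release_point):
    """Return the frame whose tracked (x, y) position is closest,
    in squared Euclidean distance, to the release point."""
    rx, ry = release_point
    return min(track,
               key=lambda f: (track[f][0] - rx) ** 2 + (track[f][1] - ry) ** 2)

# Toy track sampled at three frames.
track = {0: (100, 50), 10: (200, 60), 20: (300, 80)}
frame_nearest(track, (290, 75))  # -> 20: seek the video to this frame
```

Seeking to the returned frame is what makes the interaction feel like dragging the object itself through time.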
Video-to-still composition is all about splicing together images from the video to create a single composition. The authors use a drag-and-drop system to move selected objects forward or backward through frames until the object is where it is wanted. All other objects in the frame remain frozen in place until they are directly selected and subsequently manipulated. In this way, a composite image can be created that has each object exactly where the user wants it to be.
In the paper's example, the black car is pulled forward in time so that it appears to be right in front of the white one.
Discussion:
Awesome stuff... except preprocessing takes 5 minutes PER FRAME, and that's at just 720 x 480 resolution! That's an epic amount of time. If they can speed that up, then they are golden. You should check out the paper for yourself!