Wednesday, January 20, 2010

User Guided Audio Selection from Complex Sound

Authors:
Paris Smaragdis (Adobe Systems, Inc.)
Summary:
When someone wishes to manipulate a photo or video, they are presented with a wide variety of tools and applications. Changing colors, deleting objects, merging scenes, and many other tasks that were once impossible are now commonplace. Audio processing, however, remains difficult. Users cannot simply point to a section of an audio waveform and isolate one instrument from the mix. Because of this difficulty, Paris Smaragdis developed a novel interface for selecting sounds. Most audio editors concern themselves with two main points: visualization and sound separation. The most widely used visualization is the waveform, which essentially shows air pressure over time. Sound separation involves breaking audio down into its acoustic energy across frequency and time, which can be displayed as a time-frequency graph. Both approaches provide information, but neither offers object-based interaction.

A time-frequency sound representation
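For readers who haven't seen one, a time-frequency representation like the one in the figure is commonly computed as a short-time Fourier transform magnitude: chop the signal into overlapping, windowed frames and take the spectrum of each frame. Below is a minimal pure-Python sketch, assuming a Hann window and a naive DFT (real tools use an FFT). The function name `spectrogram` and its `frame_len`/`hop` defaults are my own choices, not anything from the paper.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram as a list of rows: one row per frequency bin
    (0 .. frame_len // 2), one column per frame."""
    # Periodic Hann window to reduce leakage between neighboring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    starts = range(0, len(signal) - frame_len + 1, hop)
    S = [[0.0] * len(starts) for _ in range(n_bins)]
    for j, start in enumerate(starts):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            # Naive DFT of one frame at bin k (O(N^2) overall, fine for a demo).
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            S[k][j] = abs(X)
    return S
```

For example, a 1000 Hz tone sampled at 8000 Hz lands in bin 8 (each bin spans 8000 / 64 = 125 Hz), so that row lights up in every frame while the rest stay near zero, which is exactly the kind of horizontal stripe you see in the figure.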


Smaragdis uses audio guidance to select sounds from mixed audio. The approach builds on the Probabilistic Latent Component Analysis (PLCA) model. Simply put, PLCA estimates which parts of an audio mixture belong to which instrument or sound, based on three things: the kind of spectrum each sound is expected to have, when a given sound is present, and the overall contribution each sound makes to the mixture. The user then sings, hums, or plays an approximation of the sound they want to extract or edit, and this recording is used as a prior. The PLCA model then tries to match the prior to parts of the audio mixture.
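To make the three ingredients above concrete: PLCA treats a normalized magnitude spectrogram as a distribution and factors it as P(f,t) ≈ Σ_z P(z) P(f|z) P(t|z), where P(f|z) is each sound's spectral shape, P(t|z) is when it is active, and P(z) is its overall share of the mix. The sketch below fits that model with plain EM. To imitate user guidance it uses a simplification I'm swapping in, not the paper's method: instead of treating the guide recording as a prior, the spectral shapes learned from the guide are simply held fixed ("supervised PLCA"), extra free components soak up everything else, and a soft mask keeps the part of the mixture the fixed components explain. The names `plca`, `extract`, and `fixed_bases` are hypothetical.

```python
import random

EPS = 1e-12


def _normalize(vec):
    """Scale a nonnegative list so it sums to 1."""
    s = sum(vec) + EPS
    return [x / s for x in vec]


def plca(V, n_components, n_iter=200, fixed_bases=(), seed=0):
    """Fit P(f,t) ~ sum_z P(z) P(f|z) P(t|z) to a nonnegative F x T matrix V with EM.

    The first len(fixed_bases) spectral shapes P(f|z) are held fixed (here: learned
    from a guide recording); the remaining ones are learned from V.
    Returns (Pz, Pf, Pt) with Pf[z][f] = P(f|z) and Pt[z][t] = P(t|z).
    """
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    total = sum(map(sum, V)) + EPS
    Vn = [[v / total for v in row] for row in V]  # treat V as a distribution
    n_fixed = len(fixed_bases)
    Pz = [1.0 / n_components] * n_components
    Pf = [_normalize(list(b)) for b in fixed_bases]
    Pf += [_normalize([rng.random() + 0.1 for _ in range(F)]) for _ in range(n_components - n_fixed)]
    Pt = [_normalize([rng.random() + 0.1 for _ in range(T)]) for _ in range(n_components)]

    for _ in range(n_iter):
        acc_f = [[0.0] * F for _ in range(n_components)]
        acc_t = [[0.0] * T for _ in range(n_components)]
        acc_z = [0.0] * n_components
        for f in range(F):
            for t in range(T):
                if Vn[f][t] == 0.0:
                    continue  # empty cells contribute nothing
                joint = [Pz[z] * Pf[z][f] * Pt[z][t] for z in range(n_components)]
                denom = sum(joint) + EPS
                for z in range(n_components):
                    # E-step: share of cell (f, t) attributed to component z.
                    c = Vn[f][t] * joint[z] / denom
                    acc_f[z][f] += c
                    acc_t[z][t] += c
                    acc_z[z] += c
        # M-step: renormalize the expected counts into distributions.
        Pz = _normalize(acc_z)
        for z in range(n_components):
            if z >= n_fixed:
                Pf[z] = _normalize(acc_f[z])
            Pt[z] = _normalize(acc_t[z])
    return Pz, Pf, Pt


def extract(V, Pz, Pf, Pt, target):
    """Soft-mask V by the fraction of the model explained by the components in `target`."""
    out = []
    for f in range(len(V)):
        row = []
        for t in range(len(V[0])):
            parts = [Pz[z] * Pf[z][f] * Pt[z][t] for z in range(len(Pz))]
            share = sum(parts[z] for z in target) / (sum(parts) + EPS)
            row.append(V[f][t] * share)
        out.append(row)
    return out
```

Usage on a toy case: fit the guide's spectrogram with `plca(guide, 1)`, pass the returned `Pf` as `fixed_bases` when fitting the mixture with one extra free component, then call `extract(mix, Pz, Pf, Pt, target=[0])` to keep the guided sound. Holding the shapes fixed is the bluntest form of guidance; a prior, as described in the paper, lets the model adjust them toward the mixture instead of trusting the user's imitation exactly.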

To test this approach, Smaragdis tried to extract a speech recording after mixing it with background music. When the original speech itself was played back as the guide, extraction was perfect. When someone else spoke the same words as the guide, the results were poorer, but Smaragdis states that they still "...rival modern sound separation algorithms".

Discussion:
Obviously, the usefulness of this software is bounded by how accurately a user can reproduce a sound. Playing back the premixed track would give near-perfect extraction, but if a tone-deaf person tried to edit an insane guitar solo, I can only imagine the result would be an epic failure. Accuracy of user input aside, this audio selection tool sounds awesome. Anyone familiar with Audacity or other sound editing and mixing tools knows how frustrating it can be to edit a single instrument in a mix.

I can see this work being taken in a different direction. If software can extract a track by matching it to my input, could it not also take my input and convert it into music? It would be revolutionary to simply sing or hum the parts you want in a song and have the computer match them to pitches and note lengths. Then you could skin each part with the desired synthesized effects, or match it with recorded instrumental parts. If I knew the first thing about making that a reality, I would be hard at work on it now. But until then, I'm going to claim the idea as my own intellectual property!
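As a rough sketch of just the first step of that idea: assume some pitch tracker (not shown, and not part of Smaragdis's work) has already turned a hummed recording into one frequency estimate per frame. Each estimate can then be snapped to the nearest equal-tempered note (A4 = 440 Hz = MIDI note 69, twelve notes per octave), and runs of identical notes merged into note lengths. The function names, the 50 Hz silence threshold, and the frame duration are hypothetical choices for illustration.

```python
import math


def hz_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def frames_to_notes(pitches_hz, frame_sec=0.01, min_hz=50.0):
    """Collapse per-frame pitch estimates into (midi_note, duration_sec) events.

    Frames below min_hz are treated as silence and reported with note None.
    """
    runs = []  # [note, frame_count] pairs
    for freq in pitches_hz:
        note = hz_to_midi(freq) if freq >= min_hz else None
        if runs and runs[-1][0] == note:
            runs[-1][1] += 1  # same note continues: lengthen it
        else:
            runs.append([note, 1])  # a new note (or a rest) starts
    return [(note, count * frame_sec) for note, count in runs]
```

For example, frames hovering around 440 Hz, a silent frame, then frames near 262 Hz collapse into an A4, a short rest, and a middle C (note 60). Rounding to the nearest note is what would forgive a slightly off-key hum, though it would not save the tone-deaf guitar soloist.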

2 comments:

  1. My mind = blown. We need Brandon in here or someone else who knows things about music (besides you, of course). There are so many cool things you could do with this! Your suggestion reminds me of an actually useful version of Microsoft Songsmith. If you don't know what that is, look up the YouTube videos. It's hilariously bad.

  2. I agree, this research sounds cool; however, I do see drawbacks. The first is: why would this software be useful commercially? If the studio that recorded the music is trying to edit one track that was already mixed into a song, aren't they going about this the wrong way to begin with? If they recorded the track in the first place, would it not be easier to simply edit that one original track and then mix all the tracks back together? Obviously this would require a smarter mixer, but I'm sure that already exists.

    I can see how it's useful for people who are not the original artists of the music, but then again, isn't it illegal to steal songs (even if it is just parts of them) anyway?

    On another note, your idea would be awesome if it were real. However, it would also take a lot of work, especially to make it accurate and easy to use.
