With a breakthrough that sounds more like the plot of the latest James Bond installment than an academic research paper, engineers at MIT have managed to recover speech by analyzing the tiny vibrations of a potato chip bag from 15 feet away — with a video camera watching through soundproof glass. In other words, if you're having a private conversation in a room, but a spy can see a chip bag (or any other object) through the window, they could work out what you're saying. There are some obvious security and forensics repercussions from this work, which is being presented at Siggraph 2014 next week, but other interesting uses will surely emerge (such as recovering audio from silent film, perhaps?).
Before we discuss how MIT recovers sound from a silent video feed, you should first watch the video below. It does a good job of showing how effective MIT's passive recovery technique is across a variety of scenarios.
This technique, which MIT calls “the visual microphone,” works by analyzing how an object vibrates when it’s hit by sound waves. While it might not be entirely obvious, sound waves traveling through air are just regions of high and low pressure — and when these waves hit an object, the object is buffeted and vibrates in much the same way as your own eardrum.
The problem is, unless you're six feet away from the speaker stack at a Pantera concert, these vibrations are really, really small. Earlier this morning I spent a good five minutes talking to an empty potato chip bag to see if I could spot the vibrations, but alas I could not. To get around this problem, MIT uses two tricks. First, it borrows a technique developed by another group at MIT that massively amplifies the tiniest of movements and variations in a video feed (this technique can monitor your pulse by watching for the tiniest variations in skin color caused by blood being pumped around your body). Second, the effectiveness of the visual microphone is significantly boosted by using a high-speed camera — basically, to see high-frequency vibrations, you ideally need a camera that captures at thousands of frames per second (by the Nyquist criterion, you need to sample at at least twice the highest frequency you want to recover, so faithfully reconstructing a 300 Hz tone requires 600 fps or higher).
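To see why frame rate matters so much, here's a quick numpy illustration of the sampling constraint (my own sketch, not MIT's code): a 300 Hz tone sampled at a high-speed-camera rate is recovered cleanly, while a consumer-camera rate folds it down to a false, lower frequency (aliasing).

```python
import numpy as np

def dominant_frequency(fs, f_signal, duration=1.0):
    """Sample a pure tone at frame rate fs and return the strongest FFT bin (Hz)."""
    t = np.arange(0, duration, 1.0 / fs)
    x = np.sin(2 * np.pi * f_signal * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return freqs[np.argmax(spectrum)]

# A 2000 fps high-speed camera comfortably resolves a 300 Hz tone...
print(dominant_frequency(fs=2000, f_signal=300))   # 300.0
# ...but a 240 fps camera aliases it down to 60 Hz (300 mod 240).
print(dominant_frequency(fs=240, f_signal=300))    # 60.0
```

This is why the researchers' best results come from cameras shooting in the thousands of frames per second: anything above half the frame rate simply can't be told apart from a lower-frequency impostor.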
The researchers found that, for normal-amplitude sounds (speech, music), the pressure waves caused objects to move/vibrate around one-tenth of a micrometer. This is about five-thousandths of a pixel in a close-up image, apparently. To spot the vibrations, the MIT team looks at the minute changes in a pixel’s color. For example, imagine a white chip packet in a blue-painted room. The edge of the chip packet, while it wouldn’t visibly move, would vary between shades of white and blue depending on the vibrations — and these are shades that can easily be detected by software. [DOI: 10.1145/2601097.2601119 - "The visual microphone: passive recovery of sound from video"]
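A toy simulation (my own sketch, with made-up color values, not the paper's method) shows why a sub-pixel wiggle is still detectable: a pixel straddling the edge blends the two colors in proportion to where the edge sits, so its brightness varies linearly with the vibration — and that brightness trace reproduces the sound waveform.

```python
import numpy as np

# Hypothetical setup: one pixel on the edge between a white bag (value 1.0)
# and a blue wall (value 0.3). The edge normally sits 50% of the way across
# the pixel; a 440 Hz tone shifts it by +/- 0.005 pixel, roughly the
# five-thousandths-of-a-pixel scale the researchers report.
fps = 2200
t = np.arange(0, 0.5, 1.0 / fps)
sound = np.sin(2 * np.pi * 440 * t)              # the tone hitting the bag
edge_pos = 0.5 + 0.005 * sound                   # edge position within the pixel
pixel = edge_pos * 1.0 + (1 - edge_pos) * 0.3    # blend of bag and wall colors

# Subtracting the mean isolates the tiny intensity wiggle, which tracks
# the sound exactly (correlation of 1.0 in this noiseless toy model).
recovered = pixel - pixel.mean()
corr = np.corrcoef(recovered, sound)[0, 1]
print(round(corr, 3))   # 1.0
```

Real footage, of course, adds sensor noise and 8-bit quantization on top of that tiny signal, which is why the team averages over many pixels and leans on the motion-magnification technique rather than reading a single pixel.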
The visual microphone has obvious applications in the realms of law enforcement, intelligence gathering, and forensics. While laser microphones (which detect vibrations on a pane of glass) are fairly old hat by this point, the visual microphone can be used after the fact on recorded footage. If the technique can be improved so that high-frame-rate cameras or rolling shutters aren't required, then we might even be able to recover sound from silent films, such as those starring Charlie Chaplin.
Link: extremetech.com