Using Sound to Help Blind People See

The world is a jumble of sights, sounds, and smells. While these signals may seem distinct and independent, they actually interact and integrate within the brain’s network of sensory neurons.

A new assistive device for blind people taps into this sensory network. It translates images into sounds, allowing visually impaired people to detect their environment without the need for hours of training or intense concentration.

The work is described in a paper published in Scientific Reports.

“Many neuroscience textbooks really only devote a few pages to multisensory interaction,” says Shinsuke Shimojo, a professor of experimental psychology at the California Institute of Technology (Caltech) and principal investigator on the study. “But 99 percent of our daily life depends on multisensory—also called multimodal—processing.”

As an example, he says, if you are talking on the phone with someone you know very well, and they are crying, you will not just hear the sound but will visualize their face in tears. “This is an example of the way sensory causality is not unidirectional—vision can influence sound, and sound can influence vision.”

Shimojo and postdoctoral scholar Noelle Stiles have exploited these crossmodal mappings to stimulate the visual cortex with auditory signals that encode information about the environment.

They explain that crossmodal mappings are ubiquitous; everyone already has them. Mappings include the intuitive matching of high pitch to elevated locations in space or the matching of noisy sounds with bright lights. Multimodal processing, like these mappings, may be the key to making sensory substitution devices more automatic.


The researchers conducted trials with both sighted and blind people using a sensory substitution device, called a vOICe device, that translates images into sound.

The vOICe device is made up of a small computer connected to a camera that is attached to darkened glasses, allowing it to “see” what a human eye would. A computer algorithm scans each camera image from left to right and, for every column of pixels, generates an associated sound whose frequency and volume depend on the vertical location and brightness of the pixels.

A column with many bright pixels near the top would translate into a loud, high-frequency sound, whereas a column with many dark pixels lower down would produce a quieter, lower-pitched sound. A blind person wearing this camera on a pair of glasses could then associate different sounds with features of their environment.
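
The column-by-column scan described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual vOICe implementation: the function names, the 500 to 5000 Hz frequency range, the exponential pitch spacing, the 10 ms per column, and the 16 kHz sample rate are all assumptions chosen for the example. The image is assumed to be a list of rows (top row first) with brightness values between 0 and 1.

```python
import math


def row_frequency(row, n_rows, f_low=500.0, f_high=5000.0):
    """Map a row index (0 = top) to a pitch; higher rows get higher pitch.

    The frequency range and exponential spacing are assumptions.
    """
    if n_rows == 1:
        return f_high
    frac = 1.0 - row / (n_rows - 1)  # 1.0 at the top row, 0.0 at the bottom
    return f_low * (f_high / f_low) ** frac


def image_to_samples(image, column_seconds=0.01, sample_rate=16000):
    """Scan an image left to right and return a list of audio samples.

    Each pixel contributes a sine tone whose pitch depends on its row
    and whose loudness depends on its brightness (0.0 to 1.0).
    """
    n_rows = len(image)
    n_cols = len(image[0])
    samples_per_col = round(column_seconds * sample_rate)
    freqs = [row_frequency(r, n_rows) for r in range(n_rows)]

    samples = []
    for col in range(n_cols):  # left-to-right scan
        for i in range(samples_per_col):
            t = (col * samples_per_col + i) / sample_rate
            value = sum(
                image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                for r in range(n_rows)
            )
            samples.append(value / n_rows)  # keep the result within [-1, 1]
    return samples


# A tiny 3x2 image: the left column is bright at the top,
# the right column is dimly lit at the bottom.
example = [
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.2],
]
audio = image_to_samples(example)
```

In this sketch, the first half of `audio` holds a loud, high-pitched tone and the second half a quiet, low-pitched one, mirroring the mapping the article describes.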


In the trials, sighted people with no training or instruction were asked to match images to sounds, while the blind subjects were asked to feel textures and match them to sounds. Tactile textures can be related to visual textures (patterns) in the way a topographic map works: bright regions of an image translate to greater tactile height relative to the page, while dark regions are flatter.

Both groups showed an intuitive ability to identify textures and images from their associated sounds. Surprisingly, the untrained (also called “naive”) group’s performance was significantly above chance and not very different from that of the trained group.

The intuitively identified textures used in the experiments exploited the crossmodal mappings already within the vOICe encoding algorithm.

“When we reverse the crossmodal mappings in the vOICe auditory-to-visual translation, the naive performance significantly decreased, showing that the mappings are important to the intuitive interpretation of the sound,” explains Stiles.

“We found that using this device to look at textures—patterns of light and dark—illustrated ‘intuitive’ neural connections between textures and sounds, implying that there is some preexisting crossmodality,” says Shimojo.

One common example of crossmodality is a condition called synesthesia, in which the activation of one sense leads to a different involuntary sensory experience, such as seeing a certain color when hearing a specific sound. “Now, we have discovered that crossmodal connections, preexisting in everyone, can be used to make sensory substitution intuitive with no instruction or training.”

The researchers do not yet know exactly what each sensory region of the brain is doing when processing these various signals, but they have a rough idea.
