The 'cocktail party problem' refers to the experience of standing in a room full of people, maybe with background music, drink in hand, trying to cling on to what the hell the person next to you is saying.
The human brain is remarkably good at solving this problem, focusing a person’s attention on a particular stimulus, and excluding a range of other stimuli from conscious awareness.
This allows you to hang on every word of that attractive stranger or, sadly, to hear more about the reasons they're single, or about drones, from the person who has had you stuck in the corner for the last forty minutes.
However, just as with drawing hands, technology has long struggled to replicate this human ability.
This can be a particular problem when trying to analyze audio evidence in court cases. Overlapping voices and background noise can make it difficult to determine who is saying what, potentially rendering key evidence unusable.
Keith McElveen, an electrical engineer and the founder and chief technology officer of Wave Sciences, had his interest in this problem piqued while working for the US government on a war crimes case.
It was the late stages of the Cold War, and McElveen was sifting through audio recording evidence of alleged orders to massacre civilians. These recordings had multiple overlapping voices, rendering them unintelligible.
Speaking to the BBC, he said:
"What we were trying to figure out was who ordered the massacre of civilians. Some of the evidence included recordings with a bunch of voices all talking at once - and that's when I learned what the 'cocktail party problem' [was]."
"I had been successful in removing noise like automobile sounds or air conditioners or fans from speech, but when I started trying to remove speech from speech, it turned out not only to be a very difficult problem, it was one of the classic hard problems in acoustics.
"Sounds are bouncing round a room, and it is mathematically horrible to solve."
In early 2010, a year after Wave Sciences was launched, McElveen and other researchers figured out that a mathematical technique used in SONAR to locate enemy submarines in 3D offered some clues about how the 'cocktail party problem' might be solved.
They came up with a plan to use AI to pinpoint and single out each competing sound, focusing on where in the room it originally came from. That meant accounting not only for other speakers but also for other sources of noise, such as sound-reflective surfaces.
One issue was that the approach needed a separate microphone for each sound, which cost a huge amount of money and put off potential commercial partners.
He adds: "We knew there had to be a solution, because you can do it with just two ears."
After ten years of internally funded research, the team solved the problem in 2019 and filed a patent that September.
The solution: an AI Glimpse engine that can analyze and understand exactly how sound bounces around a room before reaching an ear or a microphone.
"We catch the sound as it arrives at each microphone, backtrack to figure out where it came from, and then, in essence, we suppress any sound that couldn't have come from where the person is sitting," says Mr McElveen.
"The results don't sound crystal clear when you can only use a very noisy recording to learn from, but they're still stunning." You can hear how the technology works through a simulation on the company's website.
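To get a feel for the idea McElveen describes (catch the sound at each microphone, work back to where it came from, and suppress what could not have come from that spot), here is a toy sketch of classic delay-and-sum beamforming in Python. It is not Wave Sciences' patented method, which is not described in detail here. Everything in it is an assumption made for illustration: two microphones, a room with no echoes, two pure tones standing in for voices, and arrival delays hand-picked so that the unwanted tone cancels exactly. With real speech and only two microphones, the unwanted voice would be reduced rather than removed.

```python
import math

def delay(signal, n):
    """Delay a signal by n whole samples, padding the start with zeros."""
    return [0.0] * n + list(signal[:len(signal) - n])

def delay_and_sum(channels, steering_delays):
    """Undo the assumed arrival delay on each channel, then average the channels.

    Sound from the steered position lines up and adds together; sound from
    elsewhere arrives misaligned and partly cancels.
    """
    length = len(channels[0])
    output = []
    for t in range(length):
        total = 0.0
        for channel, d in zip(channels, steering_delays):
            i = t + d  # look ahead d samples to undo the arrival delay
            if i < length:
                total += channel[i]
        output.append(total / len(channels))
    return output

def mean_power(signal):
    return sum(x * x for x in signal) / len(signal)

# Hypothetical scene: two tones stand in for a wanted and an unwanted voice.
N = 800
target = [math.sin(2 * math.pi * t / 50) for t in range(N)]
interferer = [math.sin(2 * math.pi * t / 16) for t in range(N)]

# Assumed arrival delays (in samples) of each source at microphones 1 and 2.
target_delays = [0, 3]
interferer_delays = [0, 11]

# Each microphone hears both sources, each with its own delay.
mics = [
    [a + b for a, b in zip(delay(target, dt), delay(interferer, di))]
    for dt, di in zip(target_delays, interferer_delays)
]

# Focus on the position the target's delays correspond to.
focused = delay_and_sum(mics, target_delays)

# Compare leftover interference before and after, ignoring the padded edges.
inner = slice(20, N - 20)
before = mean_power([m - s for m, s in zip(mics[0][inner], target[inner])])
after = mean_power([f - s for f, s in zip(focused[inner], target[inner])])
print(f"interference power before: {before:.3f}, after: {after:.3e}")
```

The real problem is far harder than this sketch suggests, because in an actual room every voice also arrives as a tangle of reflections, which is the part McElveen calls "mathematically horrible".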
According to the BBC, the results are akin to when a camera focuses on one subject, and blurs out both the foreground and background.
The team's technology has since been instrumental in a 2022 US murder case, providing evidence that was essential to the convictions.
As the BBC reports, after two hitmen were arrested for murdering a man, the FBI used the AI technology to try and prove that they had been hired by a family going through a child custody dispute. The plan was to trick the family into believing that someone was blackmailing them over their involvement, and listen in.
The use of Wave Sciences' technology was authorized on recordings of meetings made in busy restaurants, taking the audio from inadmissible to pivotal.
The US military, the UK government, and other organizations are very interested in the technology. Despite the macabre inspiration, it’s good to hear AI being used for good, for a change.