A deep-learning audio analysis system can crunch through audio data to identify music, say researchers at IBM.
In a paper to be presented this week at the International Conference on Computer Vision and Pattern Recognition (ICCVPR), the researchers describe how the system can identify patterns, music and human voices.
The system was built to analyse the volume of an audio signal, then find the highest and lowest frequencies present in it.
For example, the researchers used a deep-learning sound-analysis algorithm to identify a song from a sound sample of a recording.
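The volume-and-frequency analysis described above can be sketched with nothing more than the standard library. The helper names and the 8 kHz sample rate here are illustrative assumptions, not details from the paper:

```python
import math

def rms_volume(samples):
    """Root-mean-square amplitude: a simple loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_freq(samples, sample_rate):
    """Rough dominant-frequency estimate from sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    # A periodic tone crosses zero twice per cycle.
    return crossings * sample_rate / (2 * len(samples))

# A one-second 440 Hz test tone at an assumed 8 kHz sample rate.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
volume = rms_volume(tone)               # ~0.707 for a full-scale sine
pitch = zero_crossing_freq(tone, rate)  # ~440 Hz
```

A real pipeline would use a proper spectral transform rather than zero crossings, but the idea — reduce raw samples to a handful of summary features — is the same.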
The researchers said they can also identify a pattern that could be used in the construction of new artificial neural networks (ANNs), or computer systems that can learn from past experience.
For the first time, the system was able to recognize human voices from a recording, the study found.
“When the recording was played back, it was very clear that the person was speaking, but it was not clear what they were saying,” said Richard Trewin, one of the study’s co-authors and a professor of computer science at Cornell University.
The researchers said their research is a “step towards creating deep neural networks for music recognition and natural language processing,” which could be integrated into other AI systems, such as speech recognition.
They added that the work could also be applied to natural language understanding and other language tasks.
In their paper, the IBM researchers represent audio using PCM (pulse-code modulation), the standard format for uncompressed digital audio.
The company’s research team uses PCM-encoded samples as the input to the system.
Analysing this PCM data can produce a number of different results.
Some of these are highly reliable.
For instance, the AI system can reliably distinguish between a sample of human speech, machine-generated audio and music.
Other results can be quite ambiguous, and may depend on whether the audio was made by a human, a machine or a third-party source.
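The PCM representation underlying these analyses simply stores each audio sample as a fixed-width integer at a fixed rate. A minimal sketch of 16-bit PCM encoding and decoding — the 8 kHz rate and helper names are assumptions for illustration, not IBM’s actual pipeline:

```python
import math
import struct

RATE = 8000  # samples per second (assumed)

def encode_pcm16(samples):
    """Pack floats in [-1.0, 1.0] as little-endian signed 16-bit PCM."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)

def decode_pcm16(data):
    """Unpack 16-bit PCM bytes back to floats in [-1.0, 1.0]."""
    ints = struct.unpack("<%dh" % (len(data) // 2), data)
    return [i / 32767 for i in ints]

tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
raw = encode_pcm16(tone)       # 2 bytes per sample
decoded = decode_pcm16(raw)    # round-trips within quantization error
```

This is the same byte layout a WAV file carries in its data chunk, which is why PCM makes a convenient, lossless input format for analysis systems.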
IBM said the AI software can “play back a sample from a recorded speech, as opposed to an unrecorded voice.”
In other words, the artificial intelligence system can recognise that it is hearing a recording of a person rather than a live voice, and can identify the source of the audio.
To perform this kind of analysis, the team used a “bundle of sound data” to create a virtual recording of a song that they played back.
This was then processed using PCL, a deep-learning method that can perform “sparse, deep learning tasks” with relatively few resources.
The results are then used to “evaluate the signal’s quality,” the researchers said.
Using PCM, the computer can identify music from audio samples by analyzing the volume, the timbre, and other features.
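Timbre features like the ones just mentioned are commonly summarised by the spectral centroid, a magnitude-weighted mean frequency that tracks how “bright” a sound is. This sketch uses a naive DFT over a short window — an illustration of the feature, not the paper’s method:

```python
import cmath
import math

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency: a common timbre (brightness) proxy."""
    n = len(samples)
    mags, freqs = [], []
    for k in range(1, n // 2):
        # Naive DFT of bin k -- fine for a short illustrative window.
        x = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(abs(x))
        freqs.append(k * sample_rate / n)
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

rate, n = 8192, 256
low = [math.sin(2 * math.pi * 256 * t / rate) for t in range(n)]    # dull tone
high = [math.sin(2 * math.pi * 2048 * t / rate) for t in range(n)]  # bright tone
lo_c = spectral_centroid(low, rate)   # ~256 Hz
hi_c = spectral_centroid(high, rate)  # ~2048 Hz
```

For a pure tone the centroid sits at the tone’s frequency; for real music it summarises where the spectral energy is concentrated, which is one way volume-independent timbre can be compared.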
The sound samples are then “sorted” into “top 10” and “bottom 10” bins based on the sound characteristics of the music.
In other instances, the sound quality can be “detected and annotated” using a computer program, and “re-processed” in a similar manner.
After finding candidate “matches,” the algorithm selects the best match in the data.
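The sort-into-bins-then-pick-the-best step can be sketched generically. The two-number feature vectors and Euclidean distance below are stand-ins for whatever fingerprint the system actually compares:

```python
def best_match(query, library, distance):
    """Rank library entries by distance to the query; lower is better."""
    ordered = sorted(library, key=lambda item: distance(query, item))
    top10, bottom10 = ordered[:10], ordered[-10:]
    return ordered[0], top10, bottom10

def euclid(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Stand-in features: (volume, spectral centroid) pairs for 25 entries.
library = [(v / 10, c * 100) for v in range(5) for c in range(5)]
query = (0.21, 310)
match, top10, bottom10 = best_match(query, library, euclid)
# nearest entry: (0.2, 300)
```

Keeping both a “top 10” and “bottom 10” bin, as the article describes, gives the system the closest candidates for matching and the farthest ones as a rejection baseline.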
This is a key area of interest, as it could potentially improve AI applications, said Dr. David Hinton, the senior director of the Deep Learning Lab at IBM Research.
In addition to music, the authors said they have used the PCM-based approach for other tasks, including natural-language recognition.
They are also working on “sensor tagging,” which would “automatically identify the best possible candidate for a tagged document” based on its own training data.
This could be useful in identifying text for text-to-speech translation, for instance.
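A toy version of such candidate tagging can be built on keyword overlap. The tags, keywords and scoring here are invented for illustration — the actual system would learn these from training data:

```python
def tag_score(document, tag_keywords):
    """Fraction of a tag's keywords that appear in the document."""
    words = set(document.lower().split())
    return sum(1 for kw in tag_keywords if kw in words) / len(tag_keywords)

def best_tag(document, tags):
    """Pick the tag whose keyword set best matches the document."""
    return max(tags, key=lambda t: tag_score(document, tags[t]))

# Hypothetical tag vocabulary standing in for learned associations.
tags = {
    "music": ["song", "melody", "audio"],
    "speech": ["voice", "speaker", "talk"],
}
doc = "The audio clip contains a song with a simple melody"
best = best_tag(doc, tags)  # "music"
```

A learned tagger would replace the hand-written keyword lists with weights fitted to labelled examples, but the select-the-highest-scoring-candidate structure is the same.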
The researchers wrote that they are now working on a software platform to create an AI system that “is capable of identifying the human voice.”
The system could “play out music recognition” on a regular basis, so it could have a “presence in your everyday life,” they wrote.
Another key area for the researchers is “tasking,” a concept in machine learning that refers to the act of “trying” a problem to see if it is possible to solve it.
This involves a process of searching for problems that can be solved, then working on one of them, said Hinton.
The team is also developing tools to help automate task completion, evaluation, categorization, and execution.
At the end of the day, the research could help “optimize algorithms for natural language detection, speech recognition and language understanding,” said Trewin.
It is also an important step towards making artificial intelligence more efficient, he added.
“We are excited to be collaborating with IBM on the next generation of deep learning for natural speech recognition,” said Trewin.