A subjective benchmarking study shows that Music AI's source separation model excels not only in objective metrics but also in human perception of audio quality. This evaluation, based on musician ratings and read alongside the Signal to Distortion Ratio (SDR) benchmark study, supports Music AI’s lead in delivering clear, high-quality audio stems compared with leading competitors. The study was conducted in October 2024 with the models available at that time.
Benchmark Overview: Human Perception of Audio Quality
Unlike SDR, which objectively measures separation performance, this study focuses on subjective ratings provided by trained musicians. The goal was twofold:
- To benchmark Music AI’s model against competitors.
- To explore the relationship between SDR scores and human perception of quality.
Test Configuration:
- Stems Analyzed: Vocals, Drums, Bass, Other
- Companies Compared: Music AI, Lalal, AudioShake, AudioStrip
- Dataset: 160 audio snippets from MUSDBHQ, including control audio recorded directly from the instruments. For each of the four stems, ten songs with varying SDRs were chosen from our test set, the open dataset MUSDBHQ. From each song, four random 6-bar snippets were extracted in which the stem was active. Each stem was therefore assessed across 40 snippets (10 songs × 4 snippets), for a total of 160 snippets across the whole test.
- Annotators: 5 musician interns (2 guitarists, 1 drummer, 1 bassist, 1 pianist)
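The snippet selection described in the dataset entry above can be sketched roughly as follows. This is a minimal illustration, not the study's actual tooling: the function names, the bar-level activity list, and the rule that a stem must be active in every bar of a window are all assumptions made for the example.

```python
import random

STEMS = ("vocals", "drums", "bass", "other")
SONGS_PER_STEM = 10
SNIPPETS_PER_SONG = 4
BARS_PER_SNIPPET = 6


def active_windows(active_bars, length=BARS_PER_SNIPPET):
    """Return start bars of windows in which the stem is active in every bar.

    Assumption: "active" is tracked per bar as a list of booleans.
    """
    return [
        start
        for start in range(len(active_bars) - length + 1)
        if all(active_bars[start:start + length])
    ]


def sample_snippets(active_bars, rng, count=SNIPPETS_PER_SONG):
    """Pick `count` distinct window start bars at random (windows may overlap)."""
    candidates = active_windows(active_bars)
    if len(candidates) < count:
        raise ValueError("not enough active windows in this song")
    return sorted(rng.sample(candidates, count))


def total_snippets():
    """Stems x songs per stem x snippets per song."""
    return len(STEMS) * SONGS_PER_STEM * SNIPPETS_PER_SONG


# Hypothetical 20-bar song: the stem drops out for bars 8 and 9.
toy_activity = [True] * 8 + [False] * 2 + [True] * 10
starts = sample_snippets(toy_activity, random.Random(0))
```

With the counts from the test configuration, `total_snippets()` reproduces the 160 snippets reported for the study.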
Key Findings:
1. Overall Performance: Music AI achieved the highest average rating (66.07), outperforming AudioShake (61.95), AudioStrip (51.33), and Lalal (48.58). The original control audio scored 92.78, serving as the quality reference.
2. Instrument-Specific Insights: Music AI outperformed AudioShake on every instrument except bass, where results were comparable. AudioStrip performed strongly in vocal separation, trailing Music AI only slightly.
3. Annotator Agreement: All five annotators rated Music AI higher than AudioShake, demonstrating consistent preference across evaluators.
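As a rough sketch of how averages and annotator agreement like those above could be computed, the snippet below groups rating records and compares two systems per annotator. The record layout, function names, and toy scores are hypothetical and are not the study's data. It also assumes every annotator rated both systems.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by(ratings, *fields):
    """Average the `score` of rating records grouped by the given fields."""
    groups = defaultdict(list)
    for record in ratings:
        key = tuple(record[field] for field in fields)
        groups[key].append(record["score"])
    return {key: mean(scores) for key, scores in groups.items()}


def annotators_preferring(ratings, system_a, system_b):
    """List annotators whose mean score for system_a exceeds that for system_b."""
    per_pair = mean_score_by(ratings, "annotator", "system")
    annotators = sorted({annotator for annotator, _ in per_pair})
    return [
        a for a in annotators
        if per_pair[(a, system_a)] > per_pair[(a, system_b)]
    ]


# Hypothetical toy ratings, NOT the study's data.
toy_ratings = [
    {"annotator": "ann1", "system": "A", "score": 70},
    {"annotator": "ann1", "system": "A", "score": 80},
    {"annotator": "ann1", "system": "B", "score": 60},
    {"annotator": "ann2", "system": "A", "score": 50},
    {"annotator": "ann2", "system": "B", "score": 40},
]

overall = mean_score_by(toy_ratings, "system")
agree = annotators_preferring(toy_ratings, "A", "B")
```

Grouping by `"system"` alone gives the overall averages, while grouping by `"annotator"` and `"system"` gives the per-annotator comparison behind an agreement claim.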
Correlation Between SDR and Perceived Quality
Over the full SDR range, SDR and subjective ratings are only weakly correlated. Within narrower SDR ranges, however, a clearer pattern emerges: ratings rise almost linearly from SDR 0 to 11.
Weak Correlation Observed:
Pearson correlation coefficients between SDR and subjective ratings were near zero for Music AI and AudioShake and somewhat higher for AudioStrip and Lalal:
- Music AI: -0.0682
- AudioShake: 0.0217
- AudioStrip: 0.3721
- Lalal: 0.5660
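For readers who want to reproduce this kind of figure on their own data, a Pearson coefficient between per-snippet SDR values and ratings can be computed as in the sketch below. The function name and the toy inputs are illustrative assumptions. The study's raw SDR and rating pairs are not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy inputs, NOT the study's data.
perfect = pearson([1, 2, 3], [2, 4, 6])          # ratings rise with SDR
inverse = pearson([1, 2, 3], [3, 2, 1])          # ratings fall with SDR
unrelated = pearson([1, 2, 3, 4], [1, -1, -1, 1])  # no linear relationship
```

A coefficient near 0 only rules out a linear relationship over the whole range. It does not rule out structure within sub-ranges, which is why a range-by-range view is also useful.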
Perception Patterns:
- From SDR 0 to 11, perceived quality improved almost linearly.
- A plateau occurred between SDR 11 and 15, suggesting diminishing perceptual returns despite further gains in SDR.
- Surprisingly, a dip in ratings was noted between SDR 15 and 19, hinting at potential perceptual biases or audio artifacts.
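One simple way to look for patterns like these is to average ratings within SDR bands. The sketch below uses the band edges mentioned above (0, 11, 15, 19). Treating each band as closed on the left and open on the right is an assumption, and the toy pairs are made up for illustration.

```python
from statistics import mean

# Band edges taken from the text; inclusive lower edge, exclusive upper edge (assumed).
SDR_BANDS = [(0, 11), (11, 15), (15, 19)]


def mean_rating_per_band(pairs, bands=SDR_BANDS):
    """Average rating of (sdr, rating) pairs falling in each [low, high) band."""
    result = {}
    for low, high in bands:
        scores = [rating for sdr, rating in pairs if low <= sdr < high]
        # None marks a band that received no snippets.
        result[(low, high)] = mean(scores) if scores else None
    return result


# Hypothetical (SDR, rating) pairs, NOT the study's data.
toy_pairs = [(2, 40), (8, 60), (12, 70), (14, 72), (16, 65), (18, 67)]
band_means = mean_rating_per_band(toy_pairs)
```

Comparing the band means side by side makes a rise, plateau, or dip visible even when the correlation over the full range is weak.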
Conclusion: Music AI's Superior Subjective Performance
Music AI not only leads in objective SDR metrics but also delivers superior audio quality as perceived by human listeners. This dual advantage underscores Music AI’s position as the premier solution for source separation, offering both technical excellence and listener satisfaction. For audio professionals seeking the best in source separation technology, Music AI provides unparalleled clarity and precision, validated by both data and human perception.