Music Matching (part deux)
After a comment on my previous blog entry (about creating a Shazam clone) I started tinkering again.
Somebody asked: could this be used to detect duplicate songs in my MP3 collection?
That is exactly what I just tried!
The results
Here are some examples:
Duplicate found: 01 – everything in its right place.mp3.song matches with D:\data\v2\01-radiohead-everything_in_it’s_wrong_place-h8me.mp3.song and score: 134
(note: This is a remix of the original, with some rap mixed in)
Duplicate found: 01 – Joy Division – Exercise One.mp3.song matches with D:\data\v2\114 – Joy Division – Exercise One (From Still).mp3.song and score: 255
(note: Yes, duplicate!)
Duplicate found: 01 The District Sleeps Alone Tonight.mp3.song matches with D:\data\v2\The Postal Service – The District Sleeps Alone Tonight.mp3.song and score: 636
(note: Yes, duplicate!)
Duplicate found: 01-AudioTrack 01.mp3.song matches with D:\data\v2\06.Richard cheese -Id like a virgin- Butterfly.mp3.song and score: 144
(note: Yes, duplicate, the second was a radio snippet with jingle in the song)
Duplicate found: 01-editors-papillon_edit.mp3.song matches with D:\data\v2\02-editors-papillon_instrumental_edit.mp3.song and score: 382
(note: Almost a duplicate, the second is an instrumental version of the first)
Duplicate found: 01-the_strokes-you_only_live_once-ser.mp3.song matches with D:\data\v2\01-the_strokes-you_only_live_once.mp3.song and score: 450
(note: Yes, duplicate!)
Duplicate found: 01-the_wombats-backfire_at_the_disco.mp3.song matches with D:\data\v2\Backfire @ The Disco [Promo Version].mp3.song and score: 493
(note: Yes, almost duplicate, the second is a radio-promo announcement with jingle in it)
Conclusion
With a bit of tinkering this algorithm could be used to make a tool that detects duplicate songs, even if the MP3 files aren’t identical. Even live versions and instrumental versions are detected if you lower the score threshold.
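
To make the threshold idea concrete, here is a minimal sketch in Python. It is not the code that produced the results above. It assumes each song's fingerprint is a list of (hash, time) pairs and that the score is the number of hashes that line up at one consistent time offset, which is a common way to score this kind of matcher. The names match_score and find_duplicates and the toy data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def match_score(fp_a, fp_b):
    """Count hashes shared by both fingerprints at one consistent time offset.

    Assumption: a fingerprint is a list of (hash, time) pairs.
    """
    # Index the second fingerprint: hash -> all times at which it occurs.
    times_by_hash = {}
    for h, t in fp_b:
        times_by_hash.setdefault(h, []).append(t)

    # Every shared hash votes for the time offset between the two songs.
    offsets = Counter()
    for h, t_a in fp_a:
        for t_b in times_by_hash.get(h, ()):
            offsets[t_b - t_a] += 1

    # The score is the size of the biggest vote (0 if nothing is shared).
    return max(offsets.values(), default=0)


def find_duplicates(fingerprints, threshold):
    """Return (name_a, name_b, score) for each pair scoring >= threshold."""
    found = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        score = match_score(fp_a, fp_b)
        if score >= threshold:
            found.append((name_a, name_b, score))
    # Strongest matches first.
    return sorted(found, key=lambda row: row[2], reverse=True)


# Toy data (hypothetical): the "radio edit" is the original with a
# two-step jingle in front, so every hash is shifted by the same offset.
library = {
    "original.mp3": [(11, 0), (22, 1), (33, 2), (44, 3)],
    "radio_edit.mp3": [(99, 0), (98, 1), (11, 2), (22, 3), (33, 4), (44, 5)],
    "other.mp3": [(55, 0), (66, 1), (11, 7)],
}

print(find_duplicates(library, threshold=3))
# prints [('original.mp3', 'radio_edit.mp3', 4)]
```

Lowering the threshold lets weaker matches through (remixes, instrumentals, live versions), at the cost of more false positives. Comparing every pair gets slow for a big collection; an index from hash to songs would avoid comparing songs that share nothing.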
8 Responses to Music Matching (part deux)
How about putting the sourcecode on github, and seeing where it goes?
(I so much prefer github to sourceforge these days)
Or google code?
This stuff just rockzzz.
Can we use this algorithm to search for a song using our own voice as microphone input? I mean singing a song and searching for its match in the database?
Have you got any plans to share more of the code from your Shazam in Java blog post? I’m attempting a Ruby clone but am failing so far :-(
I had plans to clean up my code and release it to the open source community. But I’ve run into patent infringement claims…!
More on that later, I’m preparing a (long) blogpost about the current situation. But basically it seems I won’t be able to release any of this code. :-(
Mr. Roy, can I have your email? I’d like to know more and talk more… Thanks for the consideration.
Hey Man,
I’m working on a graduation project that recognizes voices to control a wheelchair. I’m using wavelets, but I was wondering whether the algorithm you posted would work for it!
Do you have any contact details so I can talk with you?
Thanks man!!!
Congratulations on your blog!!!
Hi there, I’ve got a working fingerprint implementation, but I wonder how you compare two fingerprints with each other… The problem is I don’t know whether my approach is fast enough for big databases, as I’m struggling a bit with efficiently aligning the fingerprints so they match or something xD