After a comment on my previous blog entry (about creating a Shazam clone), I started tinkering again.

Somebody asked: could this be used to detect duplicate songs in my MP3 collection?

That is exactly what I just tried!

The results

Here are some examples:

Duplicate found: 01 – everything in its right place.mp3.song matches with D:\data\v2\01-radiohead-everything_in_it’s_wrong_place-h8me.mp3.song and score: 134
(note: This is a remix of the original, with some rap mixed in)


Duplicate found: 01 – Joy Division – Exercise One.mp3.song matches with D:\data\v2\114 – Joy Division – Exercise One (From Still).mp3.song and score: 255
(note: Yes, duplicate!)


Duplicate found: 01 The District Sleeps Alone Tonight.mp3.song matches with D:\data\v2\The Postal Service – The District Sleeps Alone Tonight.mp3.song and score: 636
(note: Yes, duplicate!)


Duplicate found: 01-AudioTrack 01.mp3.song matches with D:\data\v2\06.Richard cheese -Id like a virgin- Butterfly.mp3.song and score: 144
(note: Yes, duplicate; the second is a radio snippet with a jingle in the song)


Duplicate found: 01-editors-papillon_edit.mp3.song matches with D:\data\v2\02-editors-papillon_instrumental_edit.mp3.song and score: 382
(note: Almost a duplicate; the second is an instrumental version of the first)


Duplicate found: 01-the_strokes-you_only_live_once-ser.mp3.song matches with D:\data\v2\01-the_strokes-you_only_live_once.mp3.song and score: 450
(note: Yes, duplicate!)


Duplicate found: 01-the_wombats-backfire_at_the_disco.mp3.song matches with D:\data\v2\Backfire @ The Disco [Promo Version].mp3.song and score: 493
(note: Yes, almost a duplicate; the second is a radio promo with an announcement and a jingle in it)

Conclusion

With a bit of tinkering, this algorithm could be turned into a tool for detecting duplicate songs, even when the MP3 files aren’t identical. Even live versions and instrumental versions are detected if you lower the threshold.
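To make the idea of a score and a threshold concrete, here is a minimal sketch in Java. It is not the code behind the results above. It assumes each song has already been reduced to one hash per time frame (a plain long[]), and it assumes the score is the largest number of matching hashes that share the same time offset between the two songs. The names DuplicateScore, score and isDuplicate are made up for this example, and the threshold is whatever cut-off you choose.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical, and the
// one-hash-per-frame fingerprint format is an assumption, not the post's.
class DuplicateScore {

    // Returns the largest number of hash matches that agree on one
    // time offset (position in b minus position in a).
    static int score(long[] a, long[] b) {
        // Index song b: hash -> every frame where that hash occurs.
        Map<Long, List<Integer>> positionsInB = new HashMap<>();
        for (int j = 0; j < b.length; j++) {
            positionsInB.computeIfAbsent(b[j], k -> new ArrayList<>()).add(j);
        }

        // Count matches per offset and remember the best count seen.
        Map<Integer, Integer> hitsPerOffset = new HashMap<>();
        int best = 0;
        for (int i = 0; i < a.length; i++) {
            List<Integer> positions = positionsInB.get(a[i]);
            if (positions == null) {
                continue; // this hash never occurs in b
            }
            for (int j : positions) {
                int hits = hitsPerOffset.merge(j - i, 1, Integer::sum);
                best = Math.max(best, hits);
            }
        }
        return best;
    }

    // A pair counts as a duplicate when the score reaches the chosen threshold.
    static boolean isDuplicate(long[] a, long[] b, int threshold) {
        return score(a, b) >= threshold;
    }

    public static void main(String[] args) {
        long[] original = {11, 22, 33, 44, 55};
        // Same hashes, preceded by two unrelated frames (think: a jingle).
        long[] withIntro = {99, 98, 11, 22, 33, 44, 55};
        System.out.println(score(original, withIntro)); // prints 5
    }
}
```

Because matches are counted per offset, extra material at the start of one file (a jingle or an announcement) only shifts the offset and does not lower the score. Lowering the threshold passed to isDuplicate lets weaker matches through, at the cost of more false positives.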


8 Responses to Music Matching (part deux)

  1. Will says:

    How about putting the sourcecode on github, and seeing where it goes?

    (I so much prefer github to sourceforge these days)

  2. ivo says:

    Or google code?
    This stuff just rockzzz.

  3. Abdul Jamil says:

    Can we use this algorithm for searching a song by giving an input from microphone of our voice? I mean singing a song with our voice and searching its matched in database??

  4. Ben says:

    Have you got any plans to share more of the code from your Shazam in Java blog post? I’m attempting a Ruby clone but am failing so far :-(

  5. royvanrijn says:

    I had plans to clean up my code and release it to the open source community. But I’ve run into patent infringement claims…!

    More on that later, I’m preparing a (long) blogpost about the current situation. But basically it seems I won’t be able to release any of this code. :-(

  6. k! says:

    mr. roy can i have your email?? i like to know more n talk more… thanx for the consideration

  7. Douglas says:

    Hey Man,

    I’m making a graduation project to recognize voices to control a wheelchair. Im using Wavelets but i was thinking if this algorithm that you posted will works with it!
    Do you have any contact that i can talk with you???

    Thanks man!!!
    congratulations for you blog!!!

  8. fukurokujo says:

    hi there i got a working fingerprint implementation but i wonder how you compare 2 fingerprints with each other…the problem is i don’t know whether my approach is fast enough for big databases as i’m struggling a bit with efficiently aligning the fingerprints so they match or something xD
