A couple of days ago I encountered this article: How Shazam Works

This got me interested in how a program like Shazam works… And more importantly, how hard is it to program something similar in Java?

About Shazam

Shazam is an application which you can use to analyse/match music. When you install it on your phone, and hold the microphone to some music for about 20 to 30 seconds, it will tell you which song it is.

When I first used it it gave me a magical feeling. “How did it do that!?”. And even today, after using it a lot, it still has a bit of magical feel to it.
Wouldn’t it be great if we can program something of our own that gives that same feeling? That was my goal for the past weekend.

Listen up..!

First things first, get the music sample to analyse we first need to listen to the microphone in our Java application…! This is something I hadn’t done yet in Java, so I had no idea how hard this was going to be.

But it turned out it was very easy:


final AudioFormat format = getFormat(); //Fill AudioFormat with the wanted settings
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
final TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
line.open(format);
line.start();

Now we can read the data from the TargetDataLine just like a normal InputStream:

// In another thread I start:

OutputStream out = new ByteArrayOutputStream();
running = true;

try {
    while (running) {
        int count = line.read(buffer, 0, buffer.length);
        if (count > 0) {
            out.write(buffer, 0, count);
        }
    }
    out.close();
} catch (IOException e) {
    System.err.println("I/O problems: " + e);
    System.exit(-1);
}

Using this method it is easy to open the microphone and record all the sounds! The AudioFormat I’m currently using is:


private AudioFormat getFormat() {
    float sampleRate = 44100;
    int sampleSizeInBits = 8;
    int channels = 1; //mono
    boolean signed = true;
    boolean bigEndian = true;
    return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
}

So, now we have the recorded data in a ByteArrayOutputStream, great! Step 1 complete.

Microphone data

The next challenge is analyzing the data, when I outputted the data I received in my byte array I got a long list of numbers, like this:

0
0
1
2
4
7
6
3
-1
-2
-4
-2
-5
-7
-8
(etc)

Erhm… yes? This is sound?

To see if the data could be visualized I took the output and placed it in Open Office to generate a line graph:

Ah yes! This kind of looks like ‘sound’. It looks like what you see when using for example Windows Sound Recorder.

This data is actually known as time domain. But these numbers are currently basically useless to us… if you read the above article on how Shazam works you’ll read that they use a spectrum analysis instead of direct time domain data.
So the next big question is: How do we transform the current data into a spectrum analysis?

Discrete Fourier transform

To turn our data into usable data we need to apply the so called Discrete Fourier Transformation. This turns the data from time domain into frequency domain.
There is just one problem, if you transform the data into the frequency domain you loose every bit of information regarding time. So you’ll know what the magnitude of all the frequencies are, but you have no idea when they appear.

To solve this we need a sliding window. We take chunks of data (in my case 4096 bytes of data) and transform just this bit of information. Then we know the magnitude of all frequencies that occur during just these 4096 bytes.

Implementing this

Instead of worrying about the Fourier Transformation I googled a bit and found code for the so called FFT (Fast Fourier Transformation). I’m calling this code with the chunks:


byte audio[] = out.toByteArray();

final int totalSize = audio.length;

int amountPossible = totalSize/Harvester.CHUNK_SIZE;

//When turning into frequency domain we'll need complex numbers:
Complex[][] results = new Complex[amountPossible][];

//For all the chunks:
for(int times = 0;times < amountPossible; times++) {
    Complex[] complex = new Complex[Harvester.CHUNK_SIZE];
    for(int i = 0;i < Harvester.CHUNK_SIZE;i++) {
        //Put the time domain data into a complex number with imaginary part as 0:
        complex[i] = new Complex(audio[(times*Harvester.CHUNK_SIZE)+i], 0);
    }
    //Perform FFT analysis on the chunk:
    results[times] = FFT.fft(complex);
}

//Done!

Now we have a double array containing all chunks as Complex[]. This array contains data about all frequencies. To visualize this data I decided to implement a full spectrum analyzer (just to make sure I got the math right).
To show the data I hacked this together:


for(int i = 0; i < results.length; i++) {
    int freq = 1;
    for(int line = 1; line < size; line++) {
        // To get the magnitude of the sound at a given frequency slice
        // get the abs() from the complex number.
        // In this case I use Math.log to get a more managable number (used for color)
        double magnitude = Math.log(results[i][freq].abs()+1);

        // The more blue in the color the more intensity for a given frequency point:
        g2d.setColor(new Color(0,(int)magnitude*10,(int)magnitude*20));
        // Fill:
        g2d.fillRect(i*blockSizeX, (size-line)*blockSizeY,blockSizeX,blockSizeY);
		
        // I used a improviced logarithmic scale and normal scale:
        if (logModeEnabled && (Math.log10(line) * Math.log10(line)) > 1) {
            freq += (int) (Math.log10(line) * Math.log10(line));
        } else {
            freq++;
        }
    }
}

Introducing, Aphex Twin

This seems a bit of OT (off-topic), but I’d like to tell you about a electronic musician called Aphex Twin (Richard David James). He makes crazy electronic music… but some songs have an interesting feature. His biggest hit for example, Windowlicker has a spectrogram image in it.
If you look at the song as spectral image it shows a nice spiral. Another song, called ‘Mathematical Equation’ shows the face of Twin! More information can be found here: Bastwood – Aphex Twin’s face.

When running this song against my spectral analyzer I get the following result:

Not perfect, but it seems to be Twin’s face!

Determining the key music points

The next step in Shazam’s algorithm is to determine some key points in the song, save those points as a hash and then try to match on them against their database of over 8 million songs. This is done for speed, the lookup of a hash is O(1) speed. That explains a lot of the awesome performance of Shazam!

Because I wanted to have everything working in one weekend (this is my maximum attention span sadly enough, then I need a new project to work on) I kept my algorithm as simple as possible. And to my surprise it worked.

For each line the in spectrum analysis I take the points with the highest magnitude from certain ranges. In my case: 40-80, 80-120, 120-180, 180-300.


//For every line of data:

for (int freq = LOWER_LIMIT; freq < UPPER_LIMIT-1; freq++) {
    //Get the magnitude:
    double mag = Math.log(results[freq].abs() + 1);

    //Find out which range we are in:
    int index = getIndex(freq);

    //Save the highest magnitude and corresponding frequency:
    if (mag > highscores[index]) {
        highscores[index] = mag;
        recordPoints[index] = freq;
    }
}

//Write the points to a file:
for (int i = 0; i < AMOUNT_OF_POINTS; i++) {
    fw.append(recordPoints[i] + "\t");
}
fw.append("\n");

// ... snip ...


public static final int[] RANGE = new int[] {40,80,120,180, UPPER_LIMIT+1};

//Find out in which range
public static int getIndex(int freq) {
    int i = 0;
    while(RANGE[i] < freq) i++;
        return i;
    }
}

When we record a song now, we get a list of numbers such as:

33	56	99	121	195	
30	41	84	146	199	
33	51	99	133	183	
33	47	94	137	193	
32	41	106	161	191	
33	76	95	123	185	
40	68	110	134	232	
30	62	88	125	194	
34	57	83	121	182	
34	42	89	123	182	
33	56	99	121	195	
30	41	84	146	199	
33	51	99	133	183	
33	47	94	137	193	
32	41	106	161	191	
33	76	95	123	185	

If I record a song and look at it visually it looks like this:


(all the red dots are ‘important points’)

Indexing my own music

With this algorithm in place I decided to index all my 3000 songs. Instead of using the microphone you can just open mp3 files, convert them to the correct format, and read them the same way we did with the microphone, using an AudioInputStream. Converting stereo music into mono-channel audio was a bit trickier then I hoped. Examples can be found online (requires a bit too much code to paste here) have to change the sampling a bit.

Matching!

The most important part of the program is the matching process. Reading Shazams paper they use hashing to get matches and the decide which song was the best match.

Instead of using difficult point-groupings in time I decided to use a line of our data (for example “33, 47, 94, 137″) as one hash: 1370944733
(in my tests using 3 or 4 points works best, but tweaking is difficult, I need to re-index my mp3 every time!)

Example hash-code using 4 points per line:


//Using a little bit of error-correction, damping
private static final int FUZ_FACTOR = 2;

private long hash(String line) {
    String[] p = line.split("\t");
    long p1 = Long.parseLong(p[0]);
    long p2 = Long.parseLong(p[1]);
    long p3 = Long.parseLong(p[2]);
    long p4 = Long.parseLong(p[3]);
    return  (p4-(p4%FUZ_FACTOR)) * 100000000 + (p3-(p3%FUZ_FACTOR)) * 100000 + (p2-(p2%FUZ_FACTOR)) * 100 + (p1-(p1%FUZ_FACTOR));
}

Now I create two data sets:

- A list of songs, List<String> (List index is Song-ID, String is songname)
- Database of hashes: Map<Long, List<DataPoint>>

The long in the database of hashes represents the hash itself, and it has a bucket of DataPoints.

A DataPoint looks like:

private class DataPoint {

    private int time;
    private int songId;

    public DataPoint(int songId, int time) {
        this.songId = songId;
        this.time = time;
    }
	
    public int getTime() {
        return time;
    }
    public int getSongId() {
        return songId;
    }
}

Now we already have everything in place to do a lookup. First I read all the songs and generate hashes for each point of data. This is put into the hash-database.
The second step is reading the data of the song we need to match. These hashes are retrieved and we look at the matching datapoints.

There is just one problem, for each hash there are some hits, but how do we determine which song is the correct song..? Looking at the amount of matches? No, this doesn’t work…
The most important thing is timing. We must overlap the timing…! But how can we do this if we don’t know where we are in the song? After all, we could just as easily have recorded the final chords of the song.

By looking at the data I discovered something interesting, because we have the following data:

- A hash of the recording
- A matching hash of the possible match
- A song ID of the possible match
- The current time in our own recording
- The time of the hash in the possible match

Now we can substract the current time in our recording (for example, line 34) with the time of the hash-match (for example, line 1352). This difference is stored together with the song ID. Because this offset, this difference, tells us where we possibly could be in the song.
When we have gone through all the hashes from our recording we are left with a lot of song id’s and offsets. The cool thing is, if you have a lot of hashes with matching offsets, you’ve found your song.

The results

For example, when listening to The Kooks – Match Box for just 20 seconds, this is the output of my program:

Done loading: 2921 songs

Start matching song...

Top 20 matches:

01: 08_the_kooks_-_match_box.mp3 with 16 matches.
02: 04 Racoon - Smoothly.mp3 with 8 matches.
03: 05 Röyksopp - Poor Leno.mp3 with 7 matches.
04: 07_athlete_-_yesterday_threw_everyting_a_me.mp3 with 7 matches.
05: Flogging Molly - WMH - Dont Let Me Dia Still Wonderin.mp3 with 7 matches.
06: coldplay - 04 - sparks.mp3 with 7 matches.
07: Coldplay - Help Is Round The Corner (yellow b-side).mp3 with 7 matches.
08: the arcade fire - 09 - rebellion (lies).mp3 with 7 matches.
09: 01-coldplay-_clocks.mp3 with 6 matches.
10: 02 Scared Tonight.mp3 with 6 matches.
11: 02-radiohead-pyramid_song-ksi.mp3 with 6 matches.
12: 03 Shadows Fall.mp3 with 6 matches.
13: 04 Röyksopp - In Space.mp3 with 6 matches.
14: 04 Track04.mp3 with 6 matches.
15: 05 - Dress Up In You.mp3 with 6 matches.
16: 05 Supergrass - Can't Get Up.mp3 with 6 matches.
17: 05 Track05.mp3 with 6 matches.
18: 05The Fox In The Snow.mp3 with 6 matches.
19: 05_athlete_-_wires.mp3 with 6 matches.
20: 06 Racoon - Feel Like Flying.mp3 with 6 matches.

Matching took: 259 ms

Final prediction: 08_the_kooks_-_match_box.mp3.song with 16 matches.

It works!!

Listening for 20 seconds it can match almost all the songs I have. And even this live recording of the Editors could be matched to the correct song after listening 40 seconds!

Again it feels like magic! :-)

Currently, the code isn’t in a releasable state and it doesn’t work perfectly. It has been a pure weekend-hack, more like a proof-of-concept / algorithm exploration.

Maybe, if enough people ask about it, I’ll clean it up and release it somewhere.

Update:

The Shazam patent holders lawyers are sending me emails to stop me from releasing the code and removing this blogpost, read the story here.

Tagged with:
 
  • http://about.me/dacostarepublic Conal Da Costa

    Very nice, but what is buffer? The Variable you write out to the ByteArrayOutputStream?

  • kairo

    Well done!!!

    Can I get a copy of the code?

    Thanks,

    Kairo.

  • avaziyi

    So amazing.
    Moved by your hard working and fascinated by your creation.
    Could you send code to me? lzy_8921@163.com
    Thank you so much

  • Khushboo

    Very well explained.
    Can I get a copy of your code?
    Me email is khushboo810@gmail.com

  • Damla

    Can i get a copy of your code too please

  • lifan

    Creating Shazam in Java,Very well explained.
    Can I get a copy of your code,please?
    My email is 670181443@qq.com
    thank you!

  • Dhanush

    My project mates and me are doing a similar project and are halfway through it.Thanks to the chunks of code in your website.However we would be glad if you send us the complete code.
    Email- dhanushdiwakar@yahoo.co.in

  • Larcus94

    Thanks for this great post. However, would you mind explaining me the matching process a bit further? I don’t get how you determine a match by shifting the time. So I don’t grasp how you get matching offsets? Thanks in advance!

  • Deepak

    I have implemented this and then found that HOW BAD it works . Definitely this is excellent the starting point for everyone who wants to do this.

  • James

    Thank you for your terrific post.
    Can I get a copy of your code?
    Me email is levoru3@gmail.com
    Thank you in advance.

  • Mike

    I, too, would like to see the whole code released :)

  • Ericson

    Thank you for your solution.
    I would like to have the source code for tests.
    Could you please send it to me or someone else who has it?
    Here is my e-mail ericsonduarte@hotmail.com.
    Would be a huge favour.
    Regards,

    Ericson Duarte

  • Brian Kibet

    Such a nice application. I love it. Please send me a copy of source code i comprehend fully to my mail bryankibet@gmail.com

  • Greatije

    what this code means ? “double mag = Math.log(results[freq].abs() + 1);”
    i still cant get it , thanks

  • Vitaliy

    Very interesting, thank you!

    May I have a copy of code to mrpd2013@mail.ru?

    Vitaliy

  • Mario

    Hi, I am developing an student project and I would really like to have the code,

    because I have already developed the full algorithm, I can extract the fingerprint, but something it’s not clear, when I compare a full song and a crop of it, it doesn’t match at all. I would be really interested on it. Thanks a lot.

    mariottf@hotmail.com

    Regards!

  • Ericson

    After much effort analyzing this pseudocode and development classes to study signals obtained:

    ————————————– Example 1(Best) ——————————–
    File: D:\WarAndCrimeMono.wav
    Channels: 1, Frames: 12410496
    IO State: READING
    Sample Rate: 44100, Block Align: 2
    Valid Bits: 16, Bytes per sample: 2
    Frames Original: 12410496
    Fs Original: 44100
    Nr Channel Original: 1
    ——– Done Original in 9393,000000ms

    File: D:\Recording.wav
    Channels: 1, Frames: 879236
    IO State: READING
    Sample Rate: 44100, Block Align: 2
    Valid Bits: 16, Bytes per sample: 2
    Frames Recording: 879236
    Fs Recording: 44100
    Nr Channel Recording: 1
    ——– Done Recording in 633,000000ms

    Time Original -> 16,346848 | Time Recording -> 1,486077 | OffSet -> 14,860771
    Time Original -> 241,487528 | Time Recording -> 1,857596 | OffSet -> 239,629932
    Time Original -> 186,502676 | Time Recording -> 1,857596 | OffSet -> 184,645079
    Time Original -> 200,991927 | Time Recording -> 1,857596 | OffSet -> 199,134331
    Time Original -> 16,718367 | Time Recording -> 1,857596 | OffSet -> 14,860771
    Time Original -> 16,718367 | Time Recording -> 1,857596 | OffSet -> 14,860771
    Time Original -> 16,718367 | Time Recording -> 1,857596 | OffSet -> 14,860771
    Time Original -> 189,474830 | Time Recording -> 1,857596 | OffSet -> 187,617234
    Time Original -> 17,089887 | Time Recording -> 2,229116 | OffSet -> 14,860771
    Time Original -> 52,755737 | Time Recording -> 2,600635 | OffSet -> 50,155102
    Time Original -> 243,716644 | Time Recording -> 2,600635 | OffSet -> 241,116009
    Time Original -> 4,829751 | Time Recording -> 2,600635 | OffSet -> 2,229116
    Time Original -> 18,575964 | Time Recording -> 3,715193 | OffSet -> 14,860771
    Time Original -> 7,430385 | Time Recording -> 4,086712 | OffSet -> 3,343673
    Time Original -> 11,145578 | Time Recording -> 4,086712 | OffSet -> 7,058866
    Time Original -> 98,452608 | Time Recording -> 4,086712 | OffSet -> 94,365896
    Time Original -> 108,855147 | Time Recording -> 4,086712 | OffSet -> 104,768435
    Time Original -> 18,947483 | Time Recording -> 4,086712 | OffSet -> 14,860771
    Time Original -> 136,347574 | Time Recording -> 4,458231 | OffSet -> 131,889342
    Time Original -> 19,690522 | Time Recording -> 4,829751 | OffSet -> 14,860771
    Time Original -> 19,690522 | Time Recording -> 4,829751 | OffSet -> 14,860771
    Time Original -> 20,062041 | Time Recording -> 5,201270 | OffSet -> 14,860771
    Time Original -> 154,552018 | Time Recording -> 5,201270 | OffSet -> 149,350748
    Time Original -> 21,176599 | Time Recording -> 6,315828 | OffSet -> 14,860771
    Time Original -> 214,366621 | Time Recording -> 6,315828 | OffSet -> 208,050794
    Time Original -> 195,790658 | Time Recording -> 6,687347 | OffSet -> 189,103311
    Time Original -> 26,377868 | Time Recording -> 7,058866 | OffSet -> 19,319002
    Time Original -> 7,430385 | Time Recording -> 7,430385 | OffSet -> 0,000000
    Time Original -> 11,145578 | Time Recording -> 7,430385 | OffSet -> 3,715193
    Time Original -> 98,452608 | Time Recording -> 7,430385 | OffSet -> 91,022222
    Time Original -> 108,855147 | Time Recording -> 7,430385 | OffSet -> 101,424762
    Time Original -> 22,291156 | Time Recording -> 7,430385 | OffSet -> 14,860771
    Time Original -> 25,263311 | Time Recording -> 7,430385 | OffSet -> 17,832925
    Time Original -> 78,762086 | Time Recording -> 7,801905 | OffSet -> 70,960181
    Time Original -> 23,034195 | Time Recording -> 8,173424 | OffSet -> 14,860771
    Time Original -> 24,891791 | Time Recording -> 8,916463 | OffSet -> 15,975329
    Time Original -> 38,266485 | Time Recording -> 8,916463 | OffSet -> 29,350023
    Time Original -> 163,468481 | Time Recording -> 8,916463 | OffSet -> 154,552018
    Time Original -> 23,777234 | Time Recording -> 8,916463 | OffSet -> 14,860771
    Time Original -> 23,777234 | Time Recording -> 8,916463 | OffSet -> 14,860771
    Time Original -> 227,741315 | Time Recording -> 9,287982 | OffSet -> 218,453333
    Time Original -> 10,774059 | Time Recording -> 9,287982 | OffSet -> 1,486077
    Time Original -> 53,870295 | Time Recording -> 9,659501 | OffSet -> 44,210794
    Time Original -> 145,635556 | Time Recording -> 9,659501 | OffSet -> 135,976054
    Time Original -> 24,520272 | Time Recording -> 9,659501 | OffSet -> 14,860771
    Time Original -> 241,116009 | Time Recording -> 9,659501 | OffSet -> 231,456508
    Time Original -> 24,891791 | Time Recording -> 10,031020 | OffSet -> 14,860771
    Time Original -> 38,266485 | Time Recording -> 10,031020 | OffSet -> 28,235465
    Time Original -> 24,891791 | Time Recording -> 10,031020 | OffSet -> 14,860771
    Time Original -> 206,936236 | Time Recording -> 10,031020 | OffSet -> 196,905215
    Time Original -> 34,922812 | Time Recording -> 10,402540 | OffSet -> 24,520272
    Time Original -> 164,954558 | Time Recording -> 10,402540 | OffSet -> 154,552018
    Time Original -> 25,634830 | Time Recording -> 10,774059 | OffSet -> 14,860771
    Time Original -> 23,034195 | Time Recording -> 11,145578 | OffSet -> 11,888617
    Time Original -> 26,006349 | Time Recording -> 11,145578 | OffSet -> 14,860771
    Time Original -> 26,006349 | Time Recording -> 11,145578 | OffSet -> 14,860771
    Time Original -> 26,377868 | Time Recording -> 11,517098 | OffSet -> 14,860771
    Time Original -> 32,693696 | Time Recording -> 11,888617 | OffSet -> 20,805079
    Time Original -> 104,396916 | Time Recording -> 12,260136 | OffSet -> 92,136780
    Time Original -> 27,492426 | Time Recording -> 12,631655 | OffSet -> 14,860771
    Time Original -> 27,492426 | Time Recording -> 12,631655 | OffSet -> 14,860771
    Time Original -> 27,492426 | Time Recording -> 12,631655 | OffSet -> 14,860771
    Time Original -> 27,492426 | Time Recording -> 12,631655 | OffSet -> 14,860771
    Time Original -> 27,492426 | Time Recording -> 12,631655 | OffSet -> 14,860771
    Time Original -> 27,863946 | Time Recording -> 13,003175 | OffSet -> 14,860771
    Time Original -> 27,863946 | Time Recording -> 13,003175 | OffSet -> 14,860771
    Time Original -> 27,863946 | Time Recording -> 13,003175 | OffSet -> 14,860771
    Time Original -> 28,235465 | Time Recording -> 13,374694 | OffSet -> 14,860771
    Time Original -> 278,267937 | Time Recording -> 13,374694 | OffSet -> 264,893243
    Time Original -> 28,606984 | Time Recording -> 13,746213 | OffSet -> 14,860771
    Time Original -> 28,978503 | Time Recording -> 14,117732 | OffSet -> 14,860771
    Time Original -> 29,350023 | Time Recording -> 14,489252 | OffSet -> 14,860771
    Time Original -> 16,346848 | Time Recording -> 14,489252 | OffSet -> 1,857596
    Time Original -> 58,700045 | Time Recording -> 15,232290 | OffSet -> 43,467755
    Time Original -> 50,155102 | Time Recording -> 15,232290 | OffSet -> 34,922812
    Time Original -> 30,464580 | Time Recording -> 15,603810 | OffSet -> 14,860771
    Time Original -> 16,346848 | Time Recording -> 15,975329 | OffSet -> 0,371519
    Time Original -> 30,836100 | Time Recording -> 15,975329 | OffSet -> 14,860771
    Time Original -> 39,381043 | Time Recording -> 16,346848 | OffSet -> 23,034195
    Time Original -> 201,734966 | Time Recording -> 16,718367 | OffSet -> 185,016599
    Time Original -> 238,886893 | Time Recording -> 16,718367 | OffSet -> 222,168526
    Time Original -> 31,950658 | Time Recording -> 17,089887 | OffSet -> 14,860771
    Time Original -> 251,890068 | Time Recording -> 17,089887 | OffSet -> 234,800181
    Time Original -> 237,029297 | Time Recording -> 17,461406 | OffSet -> 219,567891
    Time Original -> 252,261587 | Time Recording -> 17,461406 | OffSet -> 234,800181
    Time Original -> 32,322177 | Time Recording -> 17,461406 | OffSet -> 14,860771
    Time Original -> 32,693696 | Time Recording -> 17,832925 | OffSet -> 14,860771
    Time Original -> 33,436735 | Time Recording -> 18,575964 | OffSet -> 14,860771
    Time Original -> 33,436735 | Time Recording -> 18,575964 | OffSet -> 14,860771
    Time Original -> 33,808254 | Time Recording -> 18,947483 | OffSet -> 14,860771
    Time Original -> 37,894966 | Time Recording -> 18,947483 | OffSet -> 18,947483
    Time Original -> 163,840000 | Time Recording -> 18,947483 | OffSet -> 144,892517
    Time Original -> 176,843175 | Time Recording -> 18,947483 | OffSet -> 157,895692
    Time Original -> 33,808254 | Time Recording -> 18,947483 | OffSet -> 14,860771
    Time Original -> 37,523447 | Time Recording -> 18,947483 | OffSet -> 18,575964
    Time Original -> 37,894966 | Time Recording -> 18,947483 | OffSet -> 18,947483
    Time Original -> 50,155102 | Time Recording -> 18,947483 | OffSet -> 31,207619
    Time Original -> 163,840000 | Time Recording -> 18,947483 | OffSet -> 144,892517
    Time Original -> 176,843175 | Time Recording -> 18,947483 | OffSet -> 157,895692
    Time Original -> 33,808254 | Time Recording -> 18,947483 | OffSet -> 14,860771
    Time Original -> 61,672200 | Time Recording -> 18,947483 | OffSet -> 42,724717
    Time Original -> 73,560816 | Time Recording -> 18,947483 | OffSet -> 54,613333
    Time Original -> 163,840000 | Time Recording -> 18,947483 | OffSet -> 144,892517
    Time Original -> 33,808254 | Time Recording -> 18,947483 | OffSet -> 14,860771
    Time Original -> 14,117732 | Time Recording -> 18,947483 | OffSet -> -4,829751
    Time Original -> 22,291156 | Time Recording -> 18,947483 | OffSet -> 3,343673
    Time Original -> 33,808254 | Time Recording -> 19,319002 | OffSet -> 14,489252
    Time Original -> 37,894966 | Time Recording -> 19,319002 | OffSet -> 18,575964
    Time Original -> 163,840000 | Time Recording -> 19,319002 | OffSet -> 144,520998
    Time Original -> 176,843175 | Time Recording -> 19,319002 | OffSet -> 157,524172
    Time Original -> 34,179773 | Time Recording -> 19,319002 | OffSet -> 14,860771

    Found equality Hashes -> 111
    Matches -> 46
    ——– Done Analysis in 738,000000ms

    ————————————– Example 2 ————————————–

    File: D:\Eminem_Loose_YourselfMono.wav
    Channels: 1, Frames: 14625792
    IO State: READING
    Sample Rate: 44100, Block Align: 2
    Valid Bits: 16, Bytes per sample: 2
    Frames Original: 14625792
    Fs Original: 44100
    Nr Channel Original: 1
    ——– Done Original in 11397,000000ms

    File: D:\Recording.wav
    Channels: 1, Frames: 879236
    IO State: READING
    Sample Rate: 44100, Block Align: 2
    Valid Bits: 16, Bytes per sample: 2
    Frames Recording: 879236
    Fs Recording: 44100
    Nr Channel Recording: 1
    ——– Done Recording in 642,000000ms

    Time Original -> 9,659501 | Time Recording -> 1,486077 | OffSet -> 8,173424
    Time Original -> 65,015873 | Time Recording -> 1,486077 | OffSet -> 63,529796
    Time Original -> 306,874921 | Time Recording -> 2,972154 | OffSet -> 303,902766
    Time Original -> 148,979229 | Time Recording -> 3,343673 | OffSet -> 145,635556
    Time Original -> 55,727891 | Time Recording -> 4,458231 | OffSet -> 51,269660
    Time Original -> 193,561542 | Time Recording -> 5,201270 | OffSet -> 188,360272
    Time Original -> 248,174875 | Time Recording -> 6,315828 | OffSet -> 241,859048
    Time Original -> 305,760363 | Time Recording -> 7,801905 | OffSet -> 297,958458
    Time Original -> 62,786757 | Time Recording -> 8,173424 | OffSet -> 54,613333
    Time Original -> 180,186848 | Time Recording -> 11,888617 | OffSet -> 168,298231
    Time Original -> 191,332426 | Time Recording -> 13,746213 | OffSet -> 177,586213
    Time Original -> 94,365896 | Time Recording -> 14,860771 | OffSet -> 79,505125
    Time Original -> 210,279909 | Time Recording -> 14,860771 | OffSet -> 195,419138
    Time Original -> 155,295057 | Time Recording -> 15,232290 | OffSet -> 140,062766
    Time Original -> 116,657052 | Time Recording -> 16,718367 | OffSet -> 99,938685
    Time Original -> 20,805079 | Time Recording -> 18,947483 | OffSet -> 1,857596
    Time Original -> 86,935510 | Time Recording -> 18,947483 | OffSet -> 67,988027
    Time Original -> 95,851973 | Time Recording -> 18,947483 | OffSet -> 76,904490
    Time Original -> 123,715918 | Time Recording -> 18,947483 | OffSet -> 104,768435
    Time Original -> 235,914739 | Time Recording -> 18,947483 | OffSet -> 216,967256
    Time Original -> 237,029297 | Time Recording -> 18,947483 | OffSet -> 218,081814
    Time Original -> 241,116009 | Time Recording -> 18,947483 | OffSet -> 222,168526
    Time Original -> 241,487528 | Time Recording -> 18,947483 | OffSet -> 222,540045
    Time Original -> 242,230567 | Time Recording -> 18,947483 | OffSet -> 223,283084
    Time Original -> 242,602086 | Time Recording -> 18,947483 | OffSet -> 223,654603
    Time Original -> 242,973605 | Time Recording -> 18,947483 | OffSet -> 224,026122
    Time Original -> 247,060317 | Time Recording -> 18,947483 | OffSet -> 228,112834
    Time Original -> 248,174875 | Time Recording -> 18,947483 | OffSet -> 229,227392
    Time Original -> 248,546395 | Time Recording -> 18,947483 | OffSet -> 229,598912
    Time Original -> 248,917914 | Time Recording -> 18,947483 | OffSet -> 229,970431
    Time Original -> 252,261587 | Time Recording -> 18,947483 | OffSet -> 233,314104
    Time Original -> 265,264762 | Time Recording -> 18,947483 | OffSet -> 246,317279
    Time Original -> 280,868571 | Time Recording -> 18,947483 | OffSet -> 261,921088
    Time Original -> 286,441361 | Time Recording -> 18,947483 | OffSet -> 267,493878
    Time Original -> 298,329977 | Time Recording -> 18,947483 | OffSet -> 279,382494
    Time Original -> 303,159728 | Time Recording -> 18,947483 | OffSet -> 284,212245
    Time Original -> 319,878095 | Time Recording -> 18,947483 | OffSet -> 300,930612
    Time Original -> 20,433560 | Time Recording -> 18,947483 | OffSet -> 1,486077
    Time Original -> 20,805079 | Time Recording -> 18,947483 | OffSet -> 1,857596
    Time Original -> 73,189297 | Time Recording -> 18,947483 | OffSet -> 54,241814
    Time Original -> 96,223492 | Time Recording -> 18,947483 | OffSet -> 77,276009
    Time Original -> 98,452608 | Time Recording -> 18,947483 | OffSet -> 79,505125
    Time Original -> 98,824127 | Time Recording -> 18,947483 | OffSet -> 79,876644
    Time Original -> 220,310930 | Time Recording -> 18,947483 | OffSet -> 201,363447
    Time Original -> 226,998277 | Time Recording -> 18,947483 | OffSet -> 208,050794
    Time Original -> 248,174875 | Time Recording -> 18,947483 | OffSet -> 229,227392
    Time Original -> 258,205896 | Time Recording -> 18,947483 | OffSet -> 239,258413
    Time Original -> 266,750839 | Time Recording -> 18,947483 | OffSet -> 247,803356
    Time Original -> 287,927438 | Time Recording -> 18,947483 | OffSet -> 268,979955
    Time Original -> 298,329977 | Time Recording -> 18,947483 | OffSet -> 279,382494
    Time Original -> 309,475556 | Time Recording -> 18,947483 | OffSet -> 290,528073
    Time Original -> 309,847075 | Time Recording -> 18,947483 | OffSet -> 290,899592
    Time Original -> 310,218594 | Time Recording -> 18,947483 | OffSet -> 291,271111
    Time Original -> 310,590113 | Time Recording -> 18,947483 | OffSet -> 291,642630
    Time Original -> 315,791383 | Time Recording -> 18,947483 | OffSet -> 296,843900
    Time Original -> 318,763537 | Time Recording -> 18,947483 | OffSet -> 299,816054
    Time Original -> 320,621134 | Time Recording -> 18,947483 | OffSet -> 301,673651
    Time Original -> 325,079365 | Time Recording -> 18,947483 | OffSet -> 306,131882
    Time Original -> 192,075465 | Time Recording -> 18,947483 | OffSet -> 173,127982
    Time Original -> 177,586213 | Time Recording -> 18,947483 | OffSet -> 158,638730
    Time Original -> 94,365896 | Time Recording -> 18,947483 | OffSet -> 75,418413
    Time Original -> 251,890068 | Time Recording -> 18,947483 | OffSet -> 232,942585
    Time Original -> 20,805079 | Time Recording -> 19,319002 | OffSet -> 1,486077
    Time Original -> 86,935510 | Time Recording -> 19,319002 | OffSet -> 67,616508
    Time Original -> 95,851973 | Time Recording -> 19,319002 | OffSet -> 76,532971
    Time Original -> 123,715918 | Time Recording -> 19,319002 | OffSet -> 104,396916
    Time Original -> 235,914739 | Time Recording -> 19,319002 | OffSet -> 216,595737
    Time Original -> 237,029297 | Time Recording -> 19,319002 | OffSet -> 217,710295
    Time Original -> 241,116009 | Time Recording -> 19,319002 | OffSet -> 221,797007
    Time Original -> 241,487528 | Time Recording -> 19,319002 | OffSet -> 222,168526
    Time Original -> 242,230567 | Time Recording -> 19,319002 | OffSet -> 222,911565
    Time Original -> 242,602086 | Time Recording -> 19,319002 | OffSet -> 223,283084
    Time Original -> 242,973605 | Time Recording -> 19,319002 | OffSet -> 223,654603
    Time Original -> 247,060317 | Time Recording -> 19,319002 | OffSet -> 227,741315
    Time Original -> 248,174875 | Time Recording -> 19,319002 | OffSet -> 228,855873
    Time Original -> 248,546395 | Time Recording -> 19,319002 | OffSet -> 229,227392
    Time Original -> 248,917914 | Time Recording -> 19,319002 | OffSet -> 229,598912
    Time Original -> 252,261587 | Time Recording -> 19,319002 | OffSet -> 232,942585
    Time Original -> 265,264762 | Time Recording -> 19,319002 | OffSet -> 245,945760
    Time Original -> 280,868571 | Time Recording -> 19,319002 | OffSet -> 261,549569
    Time Original -> 286,441361 | Time Recording -> 19,319002 | OffSet -> 267,122358
    Time Original -> 298,329977 | Time Recording -> 19,319002 | OffSet -> 279,010975
    Time Original -> 303,159728 | Time Recording -> 19,319002 | OffSet -> 283,840726
    Time Original -> 319,878095 | Time Recording -> 19,319002 | OffSet -> 300,559093
    Found equality Hashes -> 84
    Matches -> 7
    ——– Done Analysis in 826,000000ms

    ————————————– Example 3 ————————————–

    File: D:\GentMono.wav
    Channels: 1, Frames: 8574336
    IO State: READING
    Sample Rate: 44100, Block Align: 2
    Valid Bits: 16, Bytes per sample: 2
    Frames Original: 8574336
    Fs Original: 44100
    Nr Channel Original: 1
    ——– Done Original in 6783,000000ms

    File: D:\Recording.wav
    Channels: 1, Frames: 879236
    IO State: READING
    Sample Rate: 44100, Block Align: 2
    Valid Bits: 16, Bytes per sample: 2
    Frames Recording: 879236
    Fs Recording: 44100
    Nr Channel Recording: 1
    ——– Done Recording in 679,000000ms

    Time Original -> 24,891791 | Time Recording -> 0,743039 | OffSet -> 24,148753
    Time Original -> 98,452608 | Time Recording -> 2,972154 | OffSet -> 95,480454
    Time Original -> 28,606984 | Time Recording -> 3,715193 | OffSet -> 24,891791
    Time Original -> 117,028571 | Time Recording -> 4,086712 | OffSet -> 112,941859
    Time Original -> 30,464580 | Time Recording -> 4,458231 | OffSet -> 26,006349
    Time Original -> 84,706395 | Time Recording -> 6,315828 | OffSet -> 78,390567
    Time Original -> 98,081088 | Time Recording -> 9,287982 | OffSet -> 88,793107
    Time Original -> 112,198821 | Time Recording -> 10,774059 | OffSet -> 101,424762
    Time Original -> 32,322177 | Time Recording -> 13,746213 | OffSet -> 18,575964
    Time Original -> 148,979229 | Time Recording -> 14,117732 | OffSet -> 134,861497
    Time Original -> 109,969705 | Time Recording -> 14,489252 | OffSet -> 95,480454
    Time Original -> 130,774785 | Time Recording -> 16,346848 | OffSet -> 114,427937
    Time Original -> 59,814603 | Time Recording -> 18,947483 | OffSet -> 40,867120
    Time Original -> 122,601361 | Time Recording -> 18,947483 | OffSet -> 103,653878
    Found equality Hashes -> 14
    Matches -> 0
    ——– Done Analysis in 579,000000ms

  • Ericson
  • Annie

    Hi, this is awesome! Helped me a lot so far…but I don’t really get the matching part at all. I’m writing a BA thesis on this topic, so could you pls send the the code?
    phuong.anh@hotmail.de

  • Nheo

    Hi, i have read your article and it helped me a lot. I’m studing about querying audio file for my student project, so could you please send me the code? Here is my email: nhe0na@gmail.com.

  • Avi Gvinot

    Hi,great job!!
    there is any way I can get in touch with you, I’m working in a project and your code could be very useful for us.
    Please get in touch

    gvinot1@gmail.com
    Thanks.

  • Juntao Xie

    hi!
    I really appreciate your works. Awesome! I’m a college student from China and I’m trying to implement this in python. There are some questions confusing me:
    1. What’s the data struct of database of hashes in Java? Is it able to support multimap?
    2. While matching and calculating each time offset of datapoints, how to pick up the real time offset? I found that the real time offset was of the highest frequency.
    Hope you can understand me :). Thanks a lot.

  • Dragontamer

    http://en.wikipedia.org/wiki/Cross-correlation

    In signal processing, cross-correlation is a measure of similarity of two waveforms as a function of a time-lag applied to one of them.

    Signal Processing gurus have had to solve this problem over and over again in the past 100 years. Cross-correlation is what is used in radar towers and sonar to “prove” that the waveform matches what you sent out. (You send a unique radio signal out there… it bounces back… and then use cross-correlation on the signals that are coming back).

    Cross-correlation is similar to the convolution function… but it is different as well. Bonus points: every major signals-design textbook has cross-correlation in it. Good luck if someone tries to sue you for patent infringement, people have been using cross-correlation for years.

  • http://www.ssrad.org Sam

    What the f**k are they patenting in particular, everything is straight old math.
    Are they patenting DFT or boolean operator or what, good lord.

    Who cares we will implement this in million of slightly changed ways and open source it on github and they can throw they patents in to garbitch!

  • Steve Dao

    Hi!
    It is really great post. It would be a great help for my project if you can share your code.
    I am looking forward to seeing your respond at binhdna@gmail.com
    Thanks so much :D

  • Carl Jokl

    Well I guess I can’t comment on the Shazam lawyer situation. If you are really interested in Shazam technology, they are hiring right now and it is proving surprisingly difficult to find good people.

  • Carl Jokl

    I cannot comment on the situation with the lawyers however if you are interested in Shazam technology, Shazam are hiring right now and it can be difficult to find good candidates. Albeit you are probably in the wrong location unless you felt like working in London.

    Carl

  • Carlos

    Congrats!!

    Violento!! bacanisimo!!

  • Carlos

    This is awesome, congratulations, would you mind
    to share your code? my email is carlosortizur@gmail.com

  • sama

    Its really interesting, we can actually tag the complete data base of (local language) songs and then create our own data base and using this data we can categorize songs based on the pattern which is going to be my next weekend project, thanks, this looks amazing, it would be awesome if you can share the code on github. and for the patent infringement i think the code doesn’t break any of the copyright infringements, google has its own music tagging service(google now) and sound hound etc,… has their own way of identifying music i think you have implemented a simple algorithm in your own code and implementing algorithms cant come under copyright infringements (at-least for the one you have used)