Compare two spectrograms to find the offset where they match (algorithm) - C#

I record a daily 2-minute radio broadcast from the Internet. There's always the same starting and ending jingle. Since the broadcast's exact start time may vary by plus or minus 6 minutes, I have to record around 15 minutes of radio.
I wish to identify the exact times where those jingles occur in the 15-minute recording, so I can extract the portion of audio I want.
I have already started a C# application where I decode an MP3 to PCM data and convert the PCM data to a spectrogram, based on http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx
I tried to use a cross-correlation algorithm on the PCM data, but the algorithm is very slow (around 6 minutes with a step of 10 ms) and on some occasions it fails to find the jingle start time.
Any ideas for algorithms to compare two spectrograms for a match? Or a better way to find that jingle start time?
Thanks,
Update (sorry for the delay)
First, thanks for all the answers; most of them were relevant and/or interesting ideas.
I tried to implement the Shazam algorithm proposed by fonzo, but failed to detect the peaks in the spectrogram. Here are three spectrograms of the starting jingle from three different recordings. I tried AForge.NET with the blob filter (but it failed to identify peaks), blurring the image and checking for differences in height, the Laplace convolution, slope analysis, and detecting series of vertical bars (but there were too many false positives)...
In the meantime, I tried the Hough algorithm proposed by Dave Aaron Smith, where I calculate the RMS of each column. Yes, every column; it's O(N*M), but M << N (note that a column is around 8k samples). So overall it's not that bad; still, the algorithm takes about 3 minutes, but it has never failed.
I could go with that solution, but if possible I would prefer the Shazam approach, because it's O(N) and probably much faster (and cooler too). So if any of you have an idea for an algorithm that always detects the same points in those spectrograms (they don't have to be peaks), please add a comment.
New Update
Finally, I went with the algorithm explained above. I tried to implement the Shazam algorithm but failed to find proper peaks in the spectrogram; the identified points were not consistent from one sound file to another. In theory, the Shazam algorithm is the solution for that kind of problem. The Hough algorithm proposed by Dave Aaron Smith was more stable and effective. I split around 400 files, and only 20 of them failed to split properly. Disk space went from 8 GB to 1 GB.
Thanks for your help.

There's a description of the algorithm used by the Shazam service (which identifies a song given a short, possibly noisy, sample) here: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
From what I understood, the first step is to isolate peaks in the spectrogram (with some tweaks to ensure uniform coverage), which gives a "constellation" of (time; frequency) pairs from the initial spectrogram. Once that is done, the sample constellation is compared to the constellation of the full track by translating a window of the sample's length from the beginning to the end and counting the number of correlated points.
The paper then describes the technical solution they found to be able to do the comparison fast even with a huge collection of tracks.
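A minimal sketch of the constellation idea (Python used for brevity, though the question is C#; the naive 8-neighbour peak picker and same-bin matching here are my simplifications, not Shazam's actual implementation):

```python
from collections import Counter

def find_peaks(spec, thresh=0.0):
    """Very naive peak picker: keep points louder than all 8 neighbours."""
    pts = []
    for t in range(1, len(spec) - 1):
        for f in range(1, len(spec[0]) - 1):
            v = spec[t][f]
            if v > thresh and all(
                v > spec[t + dt][f + df]
                for dt in (-1, 0, 1) for df in (-1, 0, 1)
                if (dt, df) != (0, 0)
            ):
                pts.append((t, f))
    return pts

def best_offset(jingle_peaks, record_peaks):
    """Count, for every candidate offset, how many jingle peaks line up
    with record peaks in the same frequency bin; return the winner."""
    votes = Counter(
        rt - jt
        for jt, jf in jingle_peaks
        for rt, rf in record_peaks
        if rf == jf
    )
    return votes.most_common(1)[0]  # (offset_in_frames, matching_points)
```

For example, with jingle peaks [(0, 3), (2, 5), (4, 3)] and record peaks [(7, 3), (9, 5), (11, 3), (1, 2), (3, 8)], offset 7 wins with 3 aligned points, even though stray peaks also cast votes.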

I wonder if you could use a Hough transform. You would start by cataloging each step of the opening sequence. Let's say you use 10 ms steps and the opening sequence is 50 ms long. You compute some metric on each step and get
1 10 1 17 5
Now go through your audio and analyze each 10 ms step for the same metric. Call this array have_audio
8 10 8 7 5 1 10 1 17 6 2 10...
Now create a new empty array that's the same length as have_audio. Call it start_votes. It will contain "votes" for the start of the opening sequence. If you see a 1, you may be in the 1st or 3rd step of the opening sequence, so you have 1 vote for the opening sequence starting 1 step ago and 1 vote for it starting 3 steps ago. If you see a 10, you have 1 vote for the opening sequence starting 2 steps ago; if you see a 17, you have 1 vote for it starting 4 steps ago, and so on.
So for that example have_audio, your votes will look like
2 0 0 1 0 4 0 1 0 0 1 0 ...
You have a lot of votes at position 6, so there's a good chance the opening sequence starts there.
You could improve performance by not bothering to analyze the entire opening sequence. If the opening sequence is 10 seconds long, you could just search for the first 5 seconds.
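The voting scheme above can be sketched in a few lines (Python for brevity, though the question is C#; indices here are 0-based, so "position 6" in the answer's counting shows up as index 5):

```python
from collections import defaultdict

def vote_for_start(opening, audio):
    """Every audio step whose metric equals step k of the opening
    sequence casts one vote for the sequence having started k steps earlier."""
    votes = defaultdict(int)
    for i, value in enumerate(audio):
        for k, step in enumerate(opening):
            if value == step and i - k >= 0:
                votes[i - k] += 1
    return votes

opening = [1, 10, 1, 17, 5]
audio = [8, 10, 8, 7, 5, 1, 10, 1, 17, 6, 2, 10]
votes = vote_for_start(opening, audio)
best = max(votes, key=votes.get)  # index 5, i.e. the sixth step
```

Each audio step only ever casts as many votes as it has matches in the opening sequence, so the work is proportional to the audio length times the (short) opening length.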

Here is a good python package that does just this:
https://code.google.com/p/py-astm/
If you are looking for a specific algorithm, good search terms to use are "acoustic fingerprinting" or "perceptual hashing".
Here's another python package that could also be used:
http://rudd-o.com/new-projects/python-audioprocessing/documentation/manuals/algorithms/butterscotch-signatures

If you already know the jingle sequence, you could analyse the correlation with that sequence instead of the cross-correlation between the full 15-minute tracks.
To quickly calculate the correlation against the (short) sequence, I would suggest using a Wiener filter.
Edit: a Wiener filter is a way to locate a signal in a noisy sequence. In this application, we are considering anything that is "not jingle" as noise (question for the reader: can we still assume that the noise is white and not correlated?).
(I found the reference I was looking for! The formulas I remembered were a little off, so I have removed them.)
The relevant page is Wiener deconvolution. The idea is that we can define a system whose impulse response h(t) has the same waveform as the jingle, and we have to locate the point in a noisy sequence where the system has received an impulse (i.e. emitted a jingle).
Since the jingle is known, we can calculate its spectrum H(f), and since we can assume that a single jingle appears in a recorded sequence, we can say that the unknown input x(t) has the shape of a pulse, whose power density S(f) is constant at each frequency.
Given the above, you can use the formula to obtain a "jingle-pass" filter (as in, only signals shaped like the jingle can pass) whose output is highest when the jingle is played.
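As a baseline, correlating against the known jingle can be sketched as a plain sliding dot product (Python for brevity; the full Wiener/matched-filter version additionally whitens by the jingle's spectrum, and in practice you would compute this in the frequency domain via FFT for O(N log N) instead of this O(N·M) loop):

```python
def cross_correlate(record, jingle):
    """Score every lag by the dot product between the jingle and the
    record; the highest score marks the most likely jingle start."""
    n, m = len(record), len(jingle)
    return [
        sum(record[lag + i] * jingle[i] for i in range(m))
        for lag in range(n - m + 1)
    ]

# Toy signals: the "jingle" is embedded in the "record" at sample 3.
jingle = [1.0, -2.0, 3.0, -1.0]
record = [0.1, -0.2, 0.0] + jingle + [0.3, -0.1, 0.2]
scores = cross_correlate(record, jingle)
start = max(range(len(scores)), key=scores.__getitem__)  # 3
```

The peak at lag 3 is the jingle's energy correlating with itself; every other lag only picks up partial overlaps and noise.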


Can I search memory for a specific object structure?

Last night I took CheatEngine for a ride, and I found a structure in the game I currently play. The game has 4 characters, each with "health" and "mana", both of which are 4 bytes (int). Is there a way I can scan the application to find the first occurrence?
What I found was that the health of player 1 was located at 2DC2E72C; I'm going to shorten it to "72C", since the other players' health comes right after that.
Player 1 health: 72C
Player 2 health: 81C
Player 3 health: 90C
Player 4 health: 9FC
After some handiwork with my trusted Microsoft calculator, I found that there are 240 bytes between each player's health. A player's mana is 4 bytes, placed right after health, so the structure is:
000 Player 1 health
004 Player 1 mana
240 Player 2 health
244 Player 2 mana
and so on.
So my question is, could I search for this pattern in the application's memory? The pattern would be something along the lines of: 2x4 bytes, 240 bytes, 2x4 bytes, 240 bytes...
If you have a text file containing the memory contents you could use regular expressions to search for the pattern you want. Boost has a good library for regular expressions.
I don't think there will be a managed way to do that.
However, CheatEngine is an open source program that does the same:
http://www.cheatengine.org/
Maybe you could check out the source code and figure out which API calls you need to achieve the same with C#.
Update: I see you already mentioned CheatEngine; I overlooked it the first time.
I found this article on CodeProject: http://www.codeproject.com/Articles/15680/How-to-write-a-Memory-Scanner-using-C
It looks simple.
There was a cheat program back in the day called FreeCheese, if I remember correctly. The way it worked was something like this:
1. Take a search value from user input.
2. Scan the game's process memory for any values that match the user's input and store the found addresses.
3. Ask the user to change the value in-game and take the new value as input.
4. Rescan the addresses from step 2 and discard those that don't match the new value.
5. Repeat steps 3 and 4 until an address (or small number of addresses) is found that always reflects the changes specified by the user.
6. Ask the user for a new value and write that value to the addresses found.
Step 6 is tricky, since you will need to do type/size checks to make sure you can actually apply the new value.
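The narrowing loop (steps 2-4) can be sketched over a toy memory array (Python for brevity; a real C# scanner would read the target process with OpenProcess/ReadProcessMemory rather than a list):

```python
def initial_scan(memory, value):
    """Step 2: record every address currently holding the searched value."""
    return [addr for addr, v in enumerate(memory) if v == value]

def rescan(memory, candidates, new_value):
    """Step 4: keep only the addresses that now hold the changed value."""
    return [addr for addr in candidates if memory[addr] == new_value]

# Toy "process memory": several cells happen to contain 100 at first.
memory = [0, 100, 7, 100, 100, 3]
candidates = initial_scan(memory, 100)        # addresses 1, 3 and 4 match
memory[3] = 95                                # the player's health drops in-game
candidates = rescan(memory, candidates, 95)   # only address 3 survives
```

Each in-game change cuts the candidate set down until a single stable address remains, which is exactly why step 5 usually converges after only a few iterations.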
Happy cheating :)

The output value is expected to be from 0 to 1 but sometimes it produces more than 1

I have the following neural network, which uses RPROP (resilient backpropagation):
NetCore = new BasicNetwork();
NetCore.AddLayer(new BasicLayer(null, false, 32));
NetCore.AddLayer(new BasicLayer(new ActivationTANH(), true, 65));
NetCore.AddLayer(new BasicLayer(new ActivationTANH(), true, 65));
NetCore.AddLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
NetCore.Structure.FinalizeStructure();
NetCore.Reset();
(I've posted the code just to be sure that I am doing it right; if not, someone will point it out, I hope.)
After training the network, the error rate is minimized to around 1%. I pass in the test data, and most of the time the output produced is something like "5,07080020755566E-10", where I expect numbers from 0 to 1. It should also be noted that when such cases occur, the outputs are always positive numbers (I haven't encountered negative outputs yet).
The second question I wanted to ask is as follows: the neural network is meant to predict soccer matches, so I have 32 inputs. 16 inputs are for team 1's performance data and the other 16 are for team 2's.
The training sets are prepared like so: say we have 1000 matches, and all of those training sets' output is 1.
During the preparation of the training sets, reversed matches are additionally added, where the output is 0 and, of course, the team 1 and team 2 inputs are swapped accordingly.
When testing, I get the following results for the same match:
Output 0,0125940938512236 Desired 1 direct
Output 0,0386960820583483 Desired 0 reversed
The question is, why? :)
I will appreciate any help.
Shedding some light on this issue would point me in the direction I should dig.
Thanks in advance.
After training the network, the error rate is minimized to around 1%. I pass in the test data, and most of the time the output produced is something like "5,07080020755566E-10", where I expect numbers from 0 to 1. It should also be noted that when such cases occur, the outputs are always positive numbers (I haven't encountered negative outputs yet).
5,07080020755566E-10 is a number between 0 and 1. It's a very small number - only just a tiny bit more than 0. (I'm assuming your culture uses comma as a decimal separator.) It's 0,00000000050708(...) - the E-10 means "shifted 10 decimal places to the right".
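To make that concrete (parsing with a dot as the decimal separator):

```python
# The reported sigmoid output, parsed as a float:
x = float("5.07080020755566E-10")
# E-10 means "shift the decimal point ten places to the left":
assert x == 0.000000000507080020755566
assert 0 < x < 1  # well inside the sigmoid's (0, 1) output range
```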
I didn't really follow your second question - I suggest you ask it separately, and with more detail - assuming it's really a programming question at all. (It's hard to tell at the moment.)

Non-Repetitive Random Alphanumeric Code

One of my clients wants to use a unique code for his items (long story...) and he asked me for a solution. The code will consist of 4 parts: the first is the zip code the item is sent from, the second is the supplier registration number, the third is the year when the item is sent, and the last part is a three-character alphanumeric unique identifier.
As you can see, the first three parts are static fields which will never change for the same sender in the same year. So we can say that the last part is the identifier for that year. This part is three alphanumeric characters, meaning it ranges from 000 to ZZZ.
The problem is that my client, for some reasonable reasons, wants this part to be not sequential. For example this is not what he wants:
06450-05-2012-000
06450-05-2012-001
06450-05-2012-002
...
06450-05-2012-ZZY
06450-05-2012-ZZZ
The last part should be produced randomly, like:
06450-05-2012-A17
06450-05-2012-0BF
06450-05-2012-002
...
06450-05-2012-T7W
06450-05-2012-22C
But it should also be non-repetitive: once a possible ID is generated, it should be discarded from the selection pool.
I am looking for an effective way to do this.
If I only record the selected possibilities and check each newly created one against them, there is always a worst-case possibility that it keeps producing already-selected ones, especially near the end.
If I create all the possibilities at once and record them in a table or a file, it may take a while after every item creation, because it will have to look up a non-selected record. By the way, 26 letters + 10 digits means 46,656 possible combinations, and there is a chance that a 4th character may be added, which would mean 1,679,616 possible combinations.
Is there a more effective way you can suggest? I will use C# for coding and MS SQL for the database.
If it doesn't have to be random, you could maybe simply choose a fixed but "unpredictable" addend which is relatively prime to 26 + 10 == 36 == 2²·3². This means, just choose a fixed addend divisible by neither 2 nor 3.
Then keep adding this fixed number to your previous serial number every time you need a new serial number. This is to be done modulo 46656 (or 1679616) of course.
Mathematics guarantees you won't get the same number twice (before no more "free" numbers are left).
As the addend, you could use const int addend = 26075 since it's 5 modulo 6.
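A sketch of that scheme (Python for brevity, and the names are mine): keep the last serial, add the fixed addend modulo 36³, and render the result in base 36:

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # the 36 symbols 0-9, A-Z
MOD = 36 ** 3                                      # 46,656 possible codes
ADDEND = 26075                                     # 5 mod 6, so coprime to 36

def next_serial(previous):
    """Advance to the next serial; visits all 46,656 values before repeating."""
    return (previous + ADDEND) % MOD

def encode(n):
    """Render n as a fixed-width base-36 string, e.g. 0 -> '000'."""
    hi, low = divmod(n, 36)
    top, mid = divmod(hi, 36)
    return ALPHABET[top] + ALPHABET[mid] + ALPHABET[low]
```

Each item's code is then the static zip-supplier-year prefix plus encode(serial); because the addend shares no prime factor with the modulus, the sequence is a full cycle and no lookup table is needed.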
If you expect to create far less than 36^3 entries for each zip-supplier-year tuple, you should probably just pick a random value for the last field and then check to see if it exists, repeating if it does.
Even if you create half of the maximum number of possible entries, new entries still have an expected value of only one failure. Assuming your database is indexed on the overall identifier, this isn't too great a price to pay.
That said, if you expect to use all but a few of the possible identifiers, then you should probably create all the possible records in advance. It may sound like a high cost, but each space in memory storing an unused record will eventually store a real record.
I'd expect the first situation is more likely, but if not, or if there's some other combination of the two, please add a comment with some more information and I'll revise my answer.
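A sketch of the draw-and-check approach (Python for brevity; in production the "used" check would be an insert against the unique index on the full identifier rather than an in-memory set):

```python
import random

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def random_code(used, width=3):
    """Draw random codes until a free one turns up; this stays fast
    while the pool of 36**width values is mostly unused."""
    while True:
        code = "".join(random.choice(ALPHABET) for _ in range(width))
        if code not in used:
            used.add(code)
            return code

used = set()
codes = [random_code(used) for _ in range(100)]  # 100 distinct codes
```

With only 100 of 46,656 values taken, a collision on any single draw has probability well under 1%, which is why the expected number of retries stays negligible until the pool fills up.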
I think options depend on the amount of the codes that are going to be used:
If you expect to use most of them within a year, then it is better to pre-generate. If done right, lookup should be really fast. And you are going to have up to 1,679,616 items per year in your DB anyway, so you will have to do such things right.
On the other hand, is it good that you are expecting to use most of them? It may leave you without codes if there are suddenly more items than expected.
If you expect to use only a small amount, then random generation plus an existence check might be the way to go; however, it is unclear what amount that should be for it to be best (I am pretty sure it is possible to calculate that, though).

Calculate the score based only on documents that have more occurrences of the term in Lucene

I have started working on a resume (document) retrieval component based on the Lucene.net engine. It works great: it fetches the document and scores it based on the following:
"the idea behind the VSM is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query."
Lucene's Practical Scoring Function is derived from the below.
score(q,d) = coord(q,d) · queryNorm(q) · Σ_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
In this formula:
tf(t in d) correlates to the term's frequency, defined as the number of times term t appears in the currently scored document d. Documents that have more occurrences of a given term receive a higher score
idf(t) stands for Inverse Document Frequency. This value correlates to the inverse of docFreq (the number of documents in which the term t appears). This means rarer terms give higher contribution to the total score.
This is indeed great in most situations, but due to the fieldnorm calculation the results are not accurate for us.
The fieldnorm, aka "field length norm", value represents the length of that field in that document (so shorter fields are automatically boosted up).
Because of this, we don't get accurate results.
Say, for example, I have 10,000 documents, of which 3,000 contain the keywords "java" and "oracle", and the number of times they appear varies in each document.
Assume doc A has 10 "java" and 20 "oracle" among 1,000 words, and doc B has 2 "java" and 2 "oracle" among 50 words.
If I search for the query "java and oracle", Lucene returns doc B with a higher score, due to the length normalization.
Due to the nature of the business, we need documents with more occurrences of the search keywords to come first; we don't really care about the length of the document.
Because of this, a candidate with a big resume containing lots of keywords is moved down in the results, and some small resumes come up.
To avoid that, I need to disable length normalization. Can someone help me with this?
I have attached the Luke result image for your reference.
In this image, a document with "java" 50 times and "oracle" 6 times has moved down to 11th position, but a document with "java" 24 times and "oracle" 5 times is the top scorer, due to the fieldnorm.
I hope I've conveyed the info clearly... if not, please ask me and I'll give more info.
You can disable length normalization with Field.setOmitNorms(true).

Algorithm for Team Scheduling - Quiz design

I've got a weird problem to solve. This is to be used in designing a quiz, but it's easiest to explain using teams.
There are 16 teams and 24 matches. 4 teams play in every match. Each team has to appear once against 12 of the 16 teams and twice against the remaining 3, and has to appear exactly 6 times. Any ideas on how to do this? If there's software that can do this, that'd be great as well.
UPDATE:
I'm not sure if the above is even possible. Here is the minimum we're trying to accomplish:
Number of games is not set.
Each Game has 4 teams.
Each team gets an equal number of games.
Is this possible?
Check this:
http://en.wikipedia.org/wiki/Round-robin_tournament
I think someone could generalize the algorithm so that it applies to more than 2 teams.
I know this doesn't answer the question, but it provides a tip.
This may also help a little:
http://en.wikipedia.org/wiki/Tournament_(graph_theory)
Note that each team plays 3 others per match, so it takes at least 5 matches to play all 15 other teams. We hope, then, that there is a solution for 20 matches where each team plays 5 matches and plays each team exactly once.
With 16 teams it's possible to construct a solution by hand in the following way...
- Divide the 20 matches into 5 rounds
- Number the teams 1 to 16
- For each match in turn, for each of the 4 places in that match, allocate the first team which:
  - is still available to play in that round
  - has not yet played any of the teams already allocated to that match
You can narrow the search for available teams somewhat by noting that each match must contain exactly one team from each match of the previous round, so for place n you need only consider the teams which played match n in the previous round.
If we want 24 matches then any random choice of matches will suffice in the sixth round to fit the original requirements. However, to also ensure that no exact matches are repeated we can switch pairs of teams between the matches in some previous round. That is, if {1,2,3,4} and {5,6,7,8} were matches in some round then in round 6 we'll have {1,2,7,8} and {3,4,5,6}. Since 1 and 2 played each other exactly once in rounds 1-5, in the match {1,2,3,4}, we certainly haven't played match {1,2,7,8} yet.
The choice of data structures to implement that efficiently is left as an exercise for the reader.
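One concrete way to realize this construction (my sketch, in Python for brevity, not necessarily the answerer's exact procedure) is to take the 16 teams as points of the affine plane AG(2, 4) over GF(4): the 5 rounds are the plane's 5 parallel classes of lines, so every pair of teams shares exactly one line, and the sixth round uses the pair-swapping trick described above:

```python
# GF(4) multiplication table (polynomials over GF(2) modulo x^2 + x + 1);
# addition in GF(4) is bitwise XOR.
MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]

def five_rounds():
    """5 rounds x 4 matches x 4 teams; every pair of the 16 teams
    (numbered 4*x + y for a point (x, y) over GF(4)) meets exactly once."""
    rounds = [[[4 * c + y for y in range(4)] for c in range(4)]]  # vertical lines x = c
    for m in range(4):                                            # lines y = m*x + b
        rounds.append([[4 * x + (MUL[m][x] ^ b) for x in range(4)]
                       for b in range(4)])
    return rounds

def sixth_round(earlier_round):
    """Swap pairs between matches of an earlier round, so {1,2,3,4},{5,6,7,8}
    become {1,2,7,8},{3,4,5,6} and no exact match is repeated."""
    extra = []
    for a, b in zip(earlier_round[0::2], earlier_round[1::2]):
        extra.append(a[:2] + b[2:])
        extra.append(a[2:] + b[:2])
    return extra

rounds = five_rounds()
matches = [match for rnd in rounds for match in rnd] + sixth_round(rounds[0])
# 24 matches: each team appears 6 times, meeting 12 teams once and 3 teams twice.
```

The field structure is what guarantees the "exactly once" property in rounds 1-5: two points with different x-coordinates determine a unique slope and intercept, hence a unique line.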
Pull out your combinatorics book; I remember these kinds of questions being in its scope.
"Combinatorial Designs and Tournaments" was a textbook I had for a course on combinatorial designs that covered this type of problem. One of my majors back in university was Combinatorics & Optimization, so I remember a little about this kind of thing.
A little more clarity identifying the problem would be helpful. What type of sport are you trying to schedule? It sounds like you're running a 16-person tennis league where each week 4 individual players show up on each of four courts to play a doubles match (players A&B vs C&D on one court, and likewise on the other three courts with players E through P). Is this what you're looking for? If so, the answer is easy. If not, I still don't understand what you're looking for.
