I have a landing page that plays a video. The client would like one video for half the users, and a different video for the other half.
I was thinking about generating a random number: if it's even, show video 1, else video 2. I figure over time this should end up being roughly 50/50. Another approach was setting an Application variable within Application_Start and using that. I was also thinking I could perhaps configure something within IIS that I could evaluate when the page is requested.
The site is simple and will be served from a single source.
Is there a way to do this, before I start wasting time throwing things at the wall to see what sticks? I am not sure what to even search for.
You could generate a random number where the options are limited to two possible outcomes, then use that:
Random random = new Random();
var randomNumber = random.Next(0, 2);
The above will give you either a 0 or a 1 as output which can be used to determine one of your two paths.
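Applied to the two-video case, a minimal sketch might look like this (the variable name and video paths are my assumptions):

// Sketch only: pick one of the two videos per request.
Random random = new Random();
string videoUrl = random.Next(0, 2) == 0 ? "~/videos/video1.mp4" : "~/videos/video2.mp4";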
I ran this 10,000 times in a little program and it came out pretty even (4948/5052).
As long as your boss isn't worried about EXACTLY even numbers I think this should be OK.
Edit: In my test program I created a new Random in each iteration of the loop, because I felt this better simulated the use-case of an ASP.NET page.
Related
I'm writing a program which basically processes data and outputs many files. It will never produce more than 10-20 files per use. I just wanted to know if using this method to generate unique filenames is a good idea? Is it possible that rand will choose, let's say, x and then within 10 instances choose x again? Is using Random a good idea? Any input will be appreciated!
Random rand = new Random();
int randNo = rand.Next(100000, 999999);

using (var write = new StreamWriter("C:\\test" + randNo + ".txt"))
{
    // Stuff
}
I just wanted to know if using this method to generate unique filenames is a good idea?
No. Uniqueness isn't a property of randomness. Random means that the resulting value is not in any way dependent upon previous state. Which means repeats are possible. You could get the same number many times in a row (though it's unlikely).
If you want values which are unique, use a GUID:
Guid.NewGuid();
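For the file-naming case in the question, a minimal sketch (the path prefix is carried over from the question's code):

// Sketch: GUID-based filename; "N" formats the GUID as 32 hex digits without hyphens.
string filename = "C:\\test" + Guid.NewGuid().ToString("N") + ".txt";
using (var write = new StreamWriter(filename))
{
    // Stuff
}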
As pointed out in the comments below, this solution isn't perfect. But I contend that it's good enough for the problem at hand. The idea is that Random is designed to be random, and Guid is designed to be unique. Mathematically, "random" and "unique" are non-trivial problems to solve.
Neither of these implementations is 100% perfect at what it does. But the point is to simply use the correct one of the two for the intended functionality.
Or, to use an analogy... If you want to hammer a nail into a piece of wood, is it 100% guaranteed that the hammer will succeed in doing that? No. There exists a non-zero chance that the hammer will shatter upon contacting the nail. But I'd still reach for the hammer rather than jury-rigging something with a screwdriver.
No, this is not the correct method to create temporary file names in .NET.
The right way is to use either Path.GetTempFileName (creates the file immediately) or Path.GetRandomFileName (generates a high-quality random name).
Note that there is not much wrong with Random, Guid.NewGuid(), or DateTime.Now for generating a small number of file names, as covered in other answers, but using functions that are designed for a particular purpose leads to code whose correctness is easier to read and prove.
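A minimal sketch of both calls (the target directory in the second case is my assumption):

// Creates an empty file in the user's temp directory and returns its full path.
string tempFile = Path.GetTempFileName();

// Returns a cryptographically strong random 8.3-style name; no file is created.
string randomName = Path.GetRandomFileName();
string target = Path.Combine("C:\\test", randomName);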
If you want to generate a unique value, there's a tool specifically designed for generating unique identifying values: a Globally Unique IDentifier (GUID).
var guid = Guid.NewGuid();
Leave the problem of figuring out the best way of creating such a unique value to others.
There is what is called the Birthday Paradox: if you generate some random numbers (any number > 1), the probability of encountering a "collision" increases. If you generate sqrt(numberofpossiblevalues) values, the probability of a collision is around 50%. Here you have 899,999 possible values (rand.Next(100000, 999999) returns values from 100,000 to 999,998), and sqrt(899999) is about 949. That is quite low: with roughly 47-95 runs of your program (at 10-20 files each) you would have around a 50% chance of a collision.
Note that, random being random, if you generate just two random numbers there is a non-zero probability of a collision, and if you generate numberofpossiblevalues + 1 random numbers, the probability of a collision is 1.
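To check these figures, here is a small sketch using the standard birthday approximation p ≈ 1 − e^(−n(n−1)/2N); the helper is mine, not from the original answer:

// Sketch: approximate probability that n random draws from N equally likely
// values contain at least one repeat (birthday approximation).
static double CollisionProbability(long n, long N)
{
    return 1.0 - Math.Exp(-(double)n * (n - 1) / (2.0 * N));
}

// rand.Next(100000, 999999) has 899,999 possible outcomes.
Console.WriteLine(CollisionProbability(949, 899999));   // ~0.39
Console.WriteLine(CollisionProbability(1118, 899999));  // ~0.50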
Now... Someone will tell you that Guid.NewGuid will always generate unique values. They are sellers of very good snake oil. As written on MSDN, on the Guid.NewGuid page...
The chance that the value of the new Guid will be all zeros or equal to any other Guid is very low.
The chance isn't 0, it is very (very, very, I'll add) low! Here the Birthday Paradox kicks in... Now... Microsoft Guids have 122 bits of "random" part and 6 bits of "fixed" part, so the 50% chance of a collision happens at around 2.3x10^18 generated Guids. That is a big number! The 1% chance of collision comes after 3.27x10^17... still a big number!
Note that Microsoft generates these 122 bits with a strong random number generator: https://msdn.microsoft.com/en-us/library/bb417a2c-7a58-404f-84dd-6b494ecf0d13#id9
Windows uses the cryptographic PRNG from the Cryptographic API (CAPI) and the Cryptographic API Next Generation (CNG) for generation of Version 4 GUIDs.
So while the whole Guid generated by Guid.NewGuid isn't totally random (because 6 bits are fixed), it is still quite random.
I would think it would be a good idea to add the date & time the file was created to the file name, in order to make sure it is not duplicated. You could also append random numbers if you want to make it even more unique (for the case where your 10 files are saved at the exact same time).
So the file name might be file06182015112300.txt (showing the month, day, year, hour, minute & second).
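A minimal sketch of that naming scheme (the format string and suffix range are my assumptions):

// Sketch: timestamp-based name as in the example, plus a random suffix
// in case two files are saved within the same second.
Random rand = new Random();
string stamp = DateTime.Now.ToString("MMddyyyyHHmmss");  // e.g. 06182015112300
string filename = "file" + stamp + "-" + rand.Next(100, 1000) + ".txt";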
If you want to use file names of that format, and you know you won't run out of unused numbers, it's safer to check that the random number you generate isn't already in use, as follows:
Random rand = new Random();
string filename = "";
do
{
    int randNo = rand.Next(100000, 999999);
    filename = "C:\\test" + randNo + ".txt";
} while (File.Exists(filename));

using (var write = new StreamWriter(filename))
{
    // Stuff
}
I have N files with various file sizes and also M users.
What I want to do is to use an algorithm in C#, C++ or pseudocode that will equally distribute the files to the users.
If file sizes were not part of the problem, it would be something like N/M files per user. So I could randomly select N/M files for each user (some users might not take part if M > N and no more files were left). But now the file sizes are part of the problem, and I want to auto-assign the files to users with the file sizes in mind.
A file can be related to only one user. So, once a file is related to a user it cannot be used again.
A user can be related to many files.
If there are fewer files than users (N < M), some users may not take part at all.
Also, the cases N < M, N > M, and N = M are all possible, and the algorithm should distribute the files equally in each.
If anyone can help me, I would appreciate it.
Thank you.
If this is homework, it's a stinker!
It's the optimization version of the partition problem, and it's NP-hard (i.e., you're not going to be able to solve it efficiently) even when you have only two users.
There is a greedy algorithm which gives a decent approximation to the optimal arrangement, and does it in O(n log n) time. That is what I would go with if I were you, unless you have a very clear need for perfect optimality. This is the pseudocode, taken from the Wikipedia page I linked to above. It is for two sets (i.e., M = 2), but easily generalises; a C# sketch of the generalisation follows the pseudocode. The basic idea is that at each stage, you assign the current file to the user who has the smallest total.
INPUT: A list of integers S
OUTPUT: An attempt at a partition of S into two sets of equal sum

function find_partition(S):
    A ← {}
    B ← {}
    sort S in descending order
    for i in S:
        if sum(A) <= sum(B)
            add element i to set A
        else
            add element i to set B
    return {A, B}
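A hedged C# sketch of that generalisation to M users (method and variable names are my own):

using System.Collections.Generic;
using System.Linq;

// Sketch: greedy assignment - largest files first, each going to the
// user with the smallest running total.
static List<long>[] Distribute(long[] fileSizes, int userCount)
{
    var files = new List<long>[userCount];
    var totals = new long[userCount];
    for (int u = 0; u < userCount; u++) files[u] = new List<long>();

    foreach (long size in fileSizes.OrderByDescending(s => s))
    {
        int target = 0;
        for (int u = 1; u < userCount; u++)
            if (totals[u] < totals[target]) target = u;
        files[target].Add(size);
        totals[target] += size;
    }
    return files;
}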
Perfect optimality is certainly achievable in principle, but there are two issues to think about.
If nothing else, you could try every possible assignment of files to users. That would be very inefficient, but it's known to be an NP-hard problem, which means that whatever you do, you're going to end up with something with an exponential running time.
It's not absolutely clear what optimal means in a case with more than two users. (It's clear for two, which is why the partition problem is expressed in terms of two.) For instance, suppose you have eight users. Which is the better allocation: [8,4,4,4,4,4,4,0] or [5,5,5,5,3,3,3,3]? You need some well-defined metric that determines the "badness" of an allocation before you can try to minimise it.
One of my clients wants to use a unique code for his items (long story...) and he asked me for a solution. The code will consist of 4 parts: the first is the zip code the item is sent from, the second is the supplier registration number, the third is the year the item is sent, and the last part is a three-character alphanumeric unique identifier.
As you can see, the first three parts are static fields which will never change for the same sender in the same year, so we can say that the last part is the identifying part for that year. This part is three alphanumeric characters, starting from 000 and ending with ZZZ.
The problem is that my client, for some reasonable reasons, wants this part to be not sequential. For example this is not what he wants:
06450-05-2012-000
06450-05-2012-001
06450-05-2012-002
...
06450-05-2012-ZZY
06450-05-2012-ZZZ
The last part should be produced randomly, like:
06450-05-2012-A17
06450-05-2012-0BF
06450-05-2012-002
...
06450-05-2012-T7W
06450-05-2012-22C
But it should also be non-repetitive: once a possible ID is generated, it should be discarded from the selection pool.
I am looking for an effective way to do this.
If I only record the selected possibilities and check each newly created one against them, there is always a worst case where it keeps producing already-selected ones, especially near the end.
If I create all possibilities at once and record them in a table or a file, it may take a while after every item creation because it will look up a non-selected record. By the way, 26 letters + 10 digits means 46,656 possible combinations, and there is a chance a 4th character will be added, which means 1,679,616 possible combinations.
Is there a more effective way you can suggest? I will use C# for coding and MS SQL for the database.
If it doesn't have to be random, you could simply choose a fixed but "unpredictable" addend which is relatively prime to 26 + 10 == 36 == 2²·3². That means: just choose a fixed addend divisible by neither 2 nor 3.
Then keep adding this fixed number to your previous serial number every time you need a new serial number, modulo 46,656 (or 1,679,616) of course.
Mathematics guarantees you won't get the same number twice (until no more "free" numbers are left).
As the addend, you could use const int addend = 26075, since it's 5 modulo 6 and therefore divisible by neither 2 nor 3.
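A minimal sketch of the whole scheme (the helper names and base-36 encoding are mine):

// Sketch: visits all 46,656 codes in a scrambled but repeat-free order.
const int Modulus = 36 * 36 * 36;   // 46,656 three-character codes
const int Addend = 26075;           // divisible by neither 2 nor 3
const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

static int NextSerial(int previous) => (previous + Addend) % Modulus;

// Converts 0..46655 into a fixed-width base-36 code like "0BF" or "T7W".
static string Encode(int serial) => new string(new[]
{
    Digits[serial / (36 * 36)],
    Digits[(serial / 36) % 36],
    Digits[serial % 36]
});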
If you expect to create far less than 36^3 entries for each zip-supplier-year tuple, you should probably just pick a random value for the last field and then check to see if it exists, repeating if it does.
Even if you create half of the maximum number of possible entries, new entries still have an expected value of only one failure. Assuming your database is indexed on the overall identifier, this isn't too great a price to pay.
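A minimal sketch of that pick-and-retry idea (isCodeTaken stands in for your indexed SQL lookup and is an assumption):

using System;

// Sketch: draw random three-character suffixes until a free one is found;
// isCodeTaken is a placeholder for a SELECT against the indexed identifier column.
static string NewSuffix(Func<string, bool> isCodeTaken, Random rng)
{
    const string digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    while (true)
    {
        var suffix = new string(new[]
        {
            digits[rng.Next(36)], digits[rng.Next(36)], digits[rng.Next(36)]
        });
        if (!isCodeTaken(suffix)) return suffix;
    }
}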
That said, if you expect to use all but a few possible identifiers, then you should probably create all the possible records in advance. It may sound like a high cost, but each space in storage holding an unused record will eventually hold a real record.
I'd expect the first situation is more likely, but if not, or if there's some other combination of the two, please add a comment with some more information and I'll revise my answer.
I think options depend on the amount of the codes that are going to be used:
If you expect to use most of them within a year, then it is better to pre-generate. If done right, lookup should be really fast, and you are going to have up to 1,679,616 items per year in your DB anyway, so you will have to get such things right.
On the other hand, is it good that you are expecting to use most of them? It may leave you without codes if there are suddenly more items than expected.
If you expect to use only a small number of them, then random + existence check might be the way to go; however, it is unclear what that number should be for this to be best (I am pretty sure it is possible to calculate it, though).
I'm trying to create a percentage-based probability for a game. E.g. if an item has a 45% chance of a critical hit, that must mean that 45 out of every 100 hits should be critical.
First, I tried to use a simple solution:
Random R = new Random();
int C = R.Next(1, 101);
if (C <= ProbabilityPercent) DoSomething();
But over 100 iterations with a chance of e.g. 48%, it gives 40-52 critical hits out of 100.
Same goes for 49, 50, 51.
So, there is no difference between these "percents".
The question is: how can I set a percentage of e.g. 50 and get strictly 50 out of 100 with random?
This is very important for the probability of finding rare items, where an item can buff the chance to find them; a buff of 1% should be noticeable, but right now it is not.
Sorry for my bad English.
You need to think only in terms of uniform distribution over repeated rolls.
You can't look at just 100 rolls, because forcing that to yield exactly 45 would not be random. Usually, such rolls should exhibit "lack of memory". For example, if you roll a die looking for a 6, you have a 1-in-6 chance. If you roll it 5 times and don't get a six, then the chance of getting a 6 on the next roll is not 1; it is still 1 in 6. As such, you can only look at how well it meets your expectation when amortized over a statistically large number of events... 100,000, say.
Basically: your current code is fine. If the user knows (because they've hit 55 times without a critical) that the next 45 hits must be critical, then it is no longer random and they can game the system.
Also; 45% chance of critical hit seems a bit high ;p
I'm trying to create a percentage-based probability for a game. E.g. if an item has a 45% chance of a critical hit, that must mean that 45 out of every 100 hits should be critical.
No, that's not true. You've completely misunderstood the concept of probability. You don't want a "percentage-based probability"; you want a "percentage-based random distribution over 100 samples".
What you need is a "bag of events": 45 of them "critical" and 55 of them "non-critical". Once you pick an event from the bag, you only have the remaining events to pick from the next time.
You can model it this way (a C# sketch follows the steps):
Initialize two integer variables, Critical and NonCritical, so that they sum to exactly 100 according to the desired percentages.
Get a random value from 1 to Critical + NonCritical.
If the random value is less than or equal to Critical, let your hit be critical and:
    Critical = Critical - 1
Else, let your hit be non-critical and:
    NonCritical = NonCritical - 1
End If
Repeat until Critical + NonCritical = 0.
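A minimal C# sketch of those steps (variable names are mine):

// Sketch: exactly 45 critical hits in every 100, in random order.
int critical = 45, nonCritical = 55;
var rng = new Random();

while (critical + nonCritical > 0)
{
    // Pick uniformly among the events still left in the bag.
    if (rng.Next(critical + nonCritical) < critical)
    {
        critical--;      // critical hit
    }
    else
    {
        nonCritical--;   // non-critical hit
    }
}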
Since I am no expert in C#, I will use a C++ snippet for ease, but the idea is applicable to any language.
rand() - the C standard library's random number generator.
// 33.34%
double probability = 0.3334;
// divide by RAND_MAX to get a number between 0 and 1
double result = (double)rand() / RAND_MAX;

if (result < probability)
    // do something
I have used this method to create very large percolated grids, and it works excellently for precise values.
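For reference, a sketch of the same check in C# (Random.NextDouble() already returns a value in [0.0, 1.0)):

// Sketch: C# equivalent of the C++ snippet above.
var rng = new Random();
double probability = 0.3334;  // 33.34%
if (rng.NextDouble() < probability)
{
    // do something
}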
The thing is, with Random you want to initialize the class only once.
This is because Random uses the system time as a seed for generating random numbers.
If your loop is very fast, multiple Random instances could end up using the same seed and thus generate the same numbers.
Check the generated numbers if you suspect this is happening.
Besides this, it is inherent to randomness that it won't give you exact results.
This means that even with a 50/50 chance, it could happen that a sequence of 100 flips gives "heads" 100 times.
The only thing you can do is create the Random instance once and live with the results; otherwise you shouldn't use Random.
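A minimal sketch of that single-instance pattern (the wrapping method is my own):

// Sketch: one shared Random instance instead of one per call.
// Note: Random is not thread-safe; add locking if called from multiple threads.
private static readonly Random Rng = new Random();

public static bool RollPercent(int chancePercent)
{
    return Rng.Next(1, 101) <= chancePercent;  // Next(1, 101) yields 1..100
}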
I record a daily 2-minute radio broadcast from the Internet. There's always the same starting and ending jingle. Since the broadcast's exact time may vary by plus or minus 6 minutes, I have to record around 15 minutes of radio.
I wish to identify the exact times where those jingles are in the 15-minute recording, so I can extract the portion of audio I want.
I already started a C# application where I decode an MP3 to PCM data and convert the PCM data to a spectrogram, based on http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx
I tried to use a cross-correlation algorithm on the PCM data, but the algorithm is very slow (around 6 minutes with a step of 10 ms) and on some occasions it fails to find the jingle start time.
Any ideas for algorithms to compare two spectrograms for a match? Or a better way to find the jingle start time?
Thanks,
Update, sorry for the delay
First, thanks for all the answers; most of them were relevant and/or interesting ideas.
I tried to implement the Shazam algorithm proposed by fonzo, but failed to detect the peaks in the spectrogram. Here are three spectrograms of the starting jingle from three different recordings. I tried AForge.NET with the blob filter (but it failed to identify peaks), blurring the image and checking for differences in height, the Laplace convolution, slope analysis, detecting the series of vertical bars (but there were too many false positives)...
In the meantime, I tried the Hough algorithm proposed by Dave Aaron Smith, where I calculate the RMS of each column. Yes, each column; it's O(N*M), but M << N (note that a column is around 8k samples). So overall it's not that bad. The algorithm still takes about 3 minutes, but it has never failed.
I could go with that solution, but if possible I would prefer the Shazam approach, since it's O(N) and probably much faster (and cooler, too). So if any of you have an idea for an algorithm that always detects the same points in those spectrograms (they don't have to be peaks), please add a comment.
New Update
Finally, I went with the algorithm explained above. I tried to implement the Shazam algorithm but failed to find proper peaks in the spectrogram; the identified points were not consistent from one sound file to another. In theory, the Shazam algorithm is the solution for that kind of problem. The Hough algorithm proposed by Dave Aaron Smith was more stable and effective. I split around 400 files, and only 20 of them failed to split properly. Disk space went from 8 GB to 1 GB.
Thanks for your help.
There's a description of the algorithm used by the Shazam service (which identifies a piece of music from a short, possibly noisy, sample) here: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
From what I understood, the first thing done is to isolate peaks in the spectrogram (with some tweaks to ensure uniform coverage), which gives a "constellation" of (time, frequency) pairs from the initial spectrogram. Once that's done, the sample constellation is compared to the constellation of the full track by translating a window of the sample's length from the beginning to the end and counting the number of correlated points.
The paper then describes the technical solution they found to be able to do the comparison fast even with a huge collection of tracks.
I wonder if you could use a Hough transform. You would start by cataloging each step of the opening sequence. Let's say you use 10 ms steps and the opening sequence is 50 ms long. You compute some metric on each step and get
1 10 1 17 5
Now go through your audio and analyze each 10 ms step for the same metric. Call this array have_audio
8 10 8 7 5 1 10 1 17 6 2 10...
Now create a new empty array that's the same length as have_audio. Call it start_votes; it will contain "votes" for the start of the opening sequence. If you see a 1, you may be in the 1st or 3rd step of the opening sequence, so you have 1 vote for the opening sequence starting 1 step ago and 1 vote for it starting 3 steps ago. If you see a 10, you have 1 vote for the opening sequence starting 2 steps ago; if you see a 17, 1 vote for it starting 4 steps ago, and so on.
So for that example have_audio, your votes will look like
2 0 0 1 0 4 0 0 0 0 0 1 ...
You have a lot of votes at position 6, so there's a good chance the opening sequence starts there.
You could improve performance by not bothering to analyze the entire opening sequence. If the opening sequence is 10 seconds long, you could just search for the first 5 seconds.
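A hedged C# sketch of the voting loop (the per-step metric extraction is left abstract, and the tolerance comparison is my assumption):

// Sketch: accumulate votes for candidate start positions of the opening sequence.
// 'jingle' and 'audio' hold one metric value per 10 ms step.
static int FindStart(double[] jingle, double[] audio, double tolerance)
{
    var votes = new int[audio.Length];
    for (int t = 0; t < audio.Length; t++)
        for (int i = 0; i < jingle.Length; i++)
        {
            int start = t - i;  // if step t matches jingle step i, the jingle started here
            if (start >= 0 && Math.Abs(audio[t] - jingle[i]) <= tolerance)
                votes[start]++;
        }

    int best = 0;  // the position with the most votes is the best guess
    for (int p = 1; p < votes.Length; p++)
        if (votes[p] > votes[best]) best = p;
    return best;
}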
Here is a good python package that does just this:
https://code.google.com/p/py-astm/
If you are looking for a specific algorithm, good search terms to use are "acoustic fingerprinting" or "perceptual hashing".
Here's another python package that could also be used:
http://rudd-o.com/new-projects/python-audioprocessing/documentation/manuals/algorithms/butterscotch-signatures
If you already know the jingle sequence, you could analyse the correlation with that sequence instead of the cross-correlation between the full 15-minute tracks.
To quickly calculate the correlation against the (short) sequence, I would suggest using a Wiener filter.
Edit: a Wiener filter is a way to locate a signal in a noisy sequence. In this application, we are considering anything that is "not jingle" as noise (question for the reader: can we still assume that the noise is white and not correlated?).
(I found the reference I was looking for! The formulas I remembered were a little off, and I've removed them now.)
The relevant page is Wiener deconvolution. The idea is that we can define a system whose impulse response h(t) has the same waveform as the jingle, and we have to locate the point in a noisy sequence where the system has received an impulse (i.e., emitted a jingle).
Since the jingle is known, we can calculate its spectrum H(f), and since we can assume that a single jingle appears in a recorded sequence, we can say that the unknown input x(t) has the shape of a pulse, whose power density S(f) is constant at each frequency.
Given the knowledge above, you can use the formula to obtain a "jingle-pass" filter (as in: only signals shaped like the jingle can pass) whose output is highest when the jingle is played.
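For reference, the filter given on the linked Wiener deconvolution page (with N(f) the noise power spectrum, and Y(f) the spectrum of the recorded 15-minute sequence) is:

G(f) = H*(f)·S(f) / (|H(f)|²·S(f) + N(f))

The estimate of the input is then X̂(f) = G(f)·Y(f); the peak of the corresponding time-domain signal marks the jingle position.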