'Normalizing' Probability Array - C#

I have an array of doubles representing the probabilities of certain events happening, [25, 25, 25, 10, 15] for events A, B, ..., E. The numbers add up to 100.
Through my analysis I want to be able to rule out certain events.
So, if I find that event E is impossible, then I set that index to 0.
How do I re-normalize the array so that the total adds up to 100 and the relative probability of each event is maintained?
I will use C# or Java.

You can re-distribute the probability of the removed event in many ways, for example:
Proportionally - Replace each remaining probability p with p/sum(p) (scaled by 100 if you want the total to stay at 100), where sum(p) is the sum of the remaining probabilities.
Equally - Divide the removed value by the number of remaining items, and add the result to each of the remaining probabilities.
The "correct" answer depends on the specifics of your problem.

I'm assuming the relative probabilities should stay the same (i.e. the first double's new probability should be 25/85 of the total in your example). Then what you need to do is save the double you've removed, subtract it from 100, divide each remaining double by that difference, and multiply by 100.
Java code:
double[] doubleArray = new double[] { 25, 25, 25, 10, 15 }; //or any other array
double temp = 100 - doubleArray[4]; //or any other specific value in the array
doubleArray[4] = 0;
for (int i = 0; i < doubleArray.length; i++) {
    doubleArray[i] = doubleArray[i] / temp * 100;
}

sum = 100 - P(E)
for each event e:
    if e is E:
        P(E) = 0
    else:
        P(e) = P(e) * 100 / sum

A more general solution is to normalise an array to give it a specific total.
public static void setTotal(double total, double[] values) {
    double sum = 0;
    for (double value : values) sum += value;
    double factor = total / sum;
    for (int i = 0; i < values.length; i++)
        values[i] *= factor;
}
This allows you to alter any value and ensure the total is whatever you want (provided the sum is not zero).
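Since the question also allows C#, a direct C# translation might look like this (a sketch; SetTotal is just an illustrative name):

public static void SetTotal(double total, double[] values)
{
    double sum = 0;
    foreach (double value in values) sum += value;
    double factor = total / sum;                   //assumes the sum is not zero
    for (int i = 0; i < values.Length; i++)
        values[i] *= factor;
}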

How to count the total number of digits in a byte[] if you were to sum the array

I want to make a function that counts the number of digits once the values are summed up.
Let's say I have this array:
byte[] array = new byte[] { 200, 100, 200, 250, 150, 100, 200 };
Once this is summed up you'll have a value of 1200.
You can get the number of digits with this expression:
Math.Floor(Math.Log10(1200) + 1) // 4
But if I sum it up and there are too many values in the array I get an integer overflow.
public decimal countDigits(byte[] array)
{
    decimal count = array[0];
    for (int i = 1; i < array.Length; i++)
    {
        count = Math.Log10(Math.Pow(count, 10) + array[i]);
    }
    return count;
}
This does give the output I want, but it causes an overflow if the count is greater than 28.898879583742193 (log10(decimal.MaxValue)).
Let's ask a simple question: how many bytes would we have to sum in order to get an integer overflow with long? The answer is simple: in the worst case (all bytes have the maximum possible value) it requires
long.MaxValue / byte.MaxValue + 1 = 36170086419038337 (~3.61e16) bytes
How long would that take to sum? Even if it takes just 1 CPU tick (~0.1 ns) to get an item from the array and add it, we would need
~3.6e6 seconds, which is about 41 days (or 82 days in the case of ulong). If that's not your case (note that an array in C# can't have more than about 2.1e9 items, while we would need at least 3.6e16), then you can just sum as a long (or ulong):
public static int countNumbers(byte[] array) {
    ulong sum = 0;
    foreach (byte item in array)
        sum += item;
    // How many digits do we have?
    return sum.ToString().Length;
}
I want to make a function that counts the number of digits once the values are summed up
public decimal countNumbers(byte[] array)
{
    // sum the values...
    decimal sum = 0;
    foreach (byte value in array)
    {
        sum += value;
    }
    // ... then "count" the digits.
    return Math.Floor(Math.Log10(sum) + 1);
}
This would be the code then. However, your question, the provided example, and the naming of your method all imply different things, so see if that helps, and if it does, please work on your naming.
There's a BigInteger data type that can sum up arbitrarily large (integer) values. It even has a Log10 method, so it works very similarly to a standard integer variable. The only limitation is that the result of BigInteger.Log10 must be smaller than Double.MaxValue, but that sounds like a reasonable limitation. (10^1E308 = 10^10^308 is a really huge number.)
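To make that concrete, a minimal sketch of the BigInteger approach could look like this (CountDigits is an illustrative name, not an existing API):

//requires: using System.Numerics;
public static int CountDigits(byte[] array)
{
    BigInteger sum = 0;
    foreach (byte value in array)
        sum += value;                              //BigInteger never overflows
    if (sum == 0) return 1;                        //"0" still has one digit
    return (int)Math.Floor(BigInteger.Log10(sum)) + 1;
}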

Adding int numbers?

public static int addIntNumbers()
{
    int input = int.Parse(Console.ReadLine());
    int sum = 0;
    while (input != 0)
    {
        sum += input % 10;
        input /= 10;
        Console.WriteLine(sum);
    }
    return sum;
}
I don't understand this syntax. After the while condition there is sum += input % 10, which basically means sum = sum (which is 0) + input % 10, so let's say I input 24; the sum after this should be 4, I think?
And then there's the second line, which I have no idea what it is even doing.
Any suggestions?
The best way might be to add comments. However, I can already tell that whoever wrote this did not know what they were doing. For starters, there were no comments, the naming is abysmal, and the I/O is actually handled inside the function.
//The name is not right. This is not a proper sum function.
//I think it is getting the sum of all digits in the input.
public static int addIntNumbers()
{
    //Get an input from the user, parse it to int.
    //That really belongs outside. Just take the int in as an argument.
    int input = int.Parse(Console.ReadLine());
    //Initialize sum to 0
    int sum = 0;
    //Input is also used as a sort of "running variable".
    //The loop will break when input reaches 0.
    while (input != 0)
    {
        //sum = sum + input % 10
        //It divides input by 10, takes the remainder, then adds that remainder to sum.
        sum += input % 10;
        //Divide input by 10. Note that all decimal parts will be dropped.
        //That means it will reach 0 invariably.
        input /= 10;
        //Output the current sum for debugging
        Console.WriteLine(sum);
    }
    //Return the final digit sum
    return sum;
}
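Following the comment about the I/O belonging outside the function, a hedged refactor might take the number as a parameter instead of reading it inside the method (DigitSum is my own name for it):

public static int DigitSum(int input)
{
    int sum = 0;
    while (input != 0)
    {
        sum += input % 10;  //peel off the least significant digit
        input /= 10;        //drop that digit
    }
    return sum;
}

//DigitSum(24)  -> 6   (2 + 4)
//DigitSum(183) -> 12  (1 + 8 + 3)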
Your code calculates the digit-by-digit sum of an integer (the sum is positive if the input is positive, negative if the input is negative).
If you are a C# beginner, this might help you:
while (input != 0)
{
    sum = sum + (input % 10); //sum = sum + remainder of division by ten (separation of the least significant digit)
    input = input / 10;       //input is integer-divided by ten, which discards the least significant digit
    Console.WriteLine(sum);
}
If you don't understand, get familiar with the difference between
4/6
and 4.0/6.
The first is integer division; the second is floating-point division.
Some things to help you understand what's going on here:
First, assuming you're in Visual Studio, you can set a break point in your code by clicking to the left of the line number, in the margin. A red dot will show up and when your code hits that point, it will pause. While paused, you can look at the "Locals" tab or hover over variable names in your code to see what values are at that point in time. You can then use F10 to step forward one line at a time and see how things change.
Second, the /= operator is similar to the += operator, except with division. So, "x /= 10" is exactly the same as "x = x / 10".
This program is adding up each digit of the number you type in by getting the ones digit, adding it to sum, then dividing the number by 10 to get rid of the old ones digit.
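As a quick illustration, here is my own trace of the loop for the 24 from the question:

//input = 24, sum = 0
//iteration 1: sum += 24 % 10  -> sum = 4;  input = 24 / 10 -> 2
//iteration 2: sum += 2 % 10   -> sum = 6;  input = 2 / 10  -> 0
//input is now 0, so the loop ends and the method returns 6, the digit sum of 24.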

Calculate number of distinct values between two numbers at a given precision

Context: I am building a random-number generating user interface where a user can enter values for the following:
lowerLimit: the lower limit for each randomly generated number
upperLimit: the upper limit for each randomly generated number
maxPrecision: the maximum precision of each randomly generated number
quantity: the maximum number of random values to be generated
The question is: how can I ensure that at a given lowerLimit/upperLimit range and at a given precision, that the user does not request a greater quantity than is possible?
Example:
lowerLimit: 1
upperLimit: 1.01
maxPrecision: 3
Quantity: 50
At this precision level (3), there are 11 possible values between 1 and 1.01: 1.000, 1.001, 1.002, 1.003, 1.004, 1.005, 1.006, 1.007, 1.008, 1.009, 1.010, yet the user is asking for 50.
In one version of the function that returns only distinct values matching the user criteria, I use a dictionary to store already-generated values; if a value already exists, I try another random number until I have found X distinct random values, where X is the user-desired quantity. The problem is, my logic allows for a never-ending loop if the number of possible values is less than the user-entered quantity.
While I could probably employ logic to detect a runaway condition, I thought it would be a nicer approach to calculate the quantity of possible return values in advance to make sure the request is possible. But that logic is eluding me. (I haven't tried anything because I can't think of how to do it.)
Please note: I did see the question Generating random, unique values C# but it does not address the specifics of my question relating to the number of possible values at a given precision and the subsequent runaway condition.
private Random RandomSeed = new Random();

public double GetRandomDouble(double lowerBounds, double upperBounds, int maxPrecision)
{
    //Return a randomly-generated double between lowerBounds and upperBounds
    //with maximum precision of maxPrecision
    double x = (RandomSeed.NextDouble() * (upperBounds - lowerBounds)) + lowerBounds;
    return Math.Round(x, maxPrecision);
}
public double[] GetRandomDoublesUnique(double lowerBounds, double upperBounds, int maxPrecision, int quantity)
{
    //This method returns an array of doubles containing randomly-generated numbers
    //between user-entered lowerBounds and upperBounds with a maximum precision of
    //maxPrecision. The array size is capped at user-entered quantity.
    //Create Dictionary to store number values already generated so we can ensure
    //we don't have duplicates
    Dictionary<double, int> myDoubles = new Dictionary<double, int>();
    double[] returnValues = new double[quantity];
    double nextValue;
    for (int i = 0; i < quantity; i++)
    {
        nextValue = GetRandomDouble(lowerBounds, upperBounds, maxPrecision);
        if (!myDoubles.ContainsKey(nextValue))
        {
            myDoubles.Add(nextValue, i);
            returnValues[i] = nextValue;
        }
        else
        {
            i -= 1;
        }
    }
    return returnValues;
}
The number of items can be computed by just subtracting the "position" of the first from the last (pseudo-code below; use Math.Pow to compute 10^x):
(int)(last * 10 ^ precision) - (int)(first * 10 ^ precision)
This may need to be adjusted depending on whether you want to include the boundaries and whether you take decimal (precise) or float/double as input - some +/-1 and Math.Round may need to be sprinkled in to get the desired results for all expected values.
After you get the number of items there are essentially two cases:
there are significantly more choices than desired results (e.g. 1 to 100, take 5 random numbers) - use the code you have to filter out duplicates.
the number of choices is close to or less than the desired number of results (e.g. 1 to 10, return 11 random numbers) - pre-generate the list of all values and shuffle, as sketched below.
Experiment with the boundary between "significantly more" and "close" - I'd use 25% as the boundary (e.g. 1 to 100, take 76 - use shuffling) to avoid excessive retries close to the end (which is the exact reason for the slowness/infinite retries of the basic approach).
A correct implementation of the shuffle is in Randomize a List<T> (check out similar posts like Generating random, unique values C# for more discussion).
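Here is a hedged sketch of the pre-generate-and-shuffle case using a standard Fisher-Yates shuffle (the method name and parameters are mine, not from the linked posts):

public static double[] ShuffledValues(double lower, double upper, int precision, int quantity, Random rng)
{
    double step = Math.Pow(10, -precision);
    int count = (int)Math.Round((upper - lower) / step) + 1;   //number of representable values in the range
    double[] all = new double[count];
    for (int i = 0; i < count; i++)
        all[i] = Math.Round(lower + i * step, precision);      //pre-generate every possible value

    //Fisher-Yates shuffle
    for (int i = count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);                               //0 <= j <= i
        double tmp = all[i]; all[i] = all[j]; all[j] = tmp;
    }

    double[] result = new double[Math.Min(quantity, count)];
    Array.Copy(all, result, result.Length);                    //take the first 'quantity' values
    return result;
}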
The easiest way would probably be to convert the values to integers by multiplying them by 10^precision and then subtracting:
int lowerInt = (int)(lower * (decimal)Math.Pow(10, precision));
int higherInt = (int)(higher * (decimal)Math.Pow(10, precision));
int possibleValues = higherInt - lowerInt + 1;
I feel like it would defeat the purpose of your project to require the user to know how many possible values there are in advance, since it seems like that's what they are hitting this function for in the first place. I'm assuming that requirement was just to alleviate the technical issues you were having. You can just change your loop to this now:
for (int i = 0; i < possibleValues; i++)
This is what worked based on Josh Williard's answer.
public double[] GetRandomDoublesUnique(double lowerBounds, double upperBounds, int maxPrecision, int quantity)
{
    if (lowerBounds >= upperBounds)
    {
        throw new Exception("Error in GetRandomDoublesUnique: LowerBounds is greater than or equal to UpperBounds!");
    }
    //These next few lines determine the maximum possible number of return values.
    //possibleValues is populated to prevent a runaway condition that could occur if the
    //max possible values--at the given precision level--is less than the user-selected quantity.
    //i.e. if the user selects 1 to 1.01, precision of 3, and quantity of 50, there would be a problem
    //if we didn't limit the loop to the 11 possible values at a precision of 3:
    //1.000, 1.001, 1.002, 1.003, 1.004, 1.005, 1.006, 1.007, 1.008, 1.009, 1.010
    int lowerInt = (int)(lowerBounds * (double)Math.Pow(10, maxPrecision));
    int higherInt = (int)(upperBounds * (double)Math.Pow(10, maxPrecision));
    int possibleValues = higherInt - lowerInt + 1;
    //Create a Dictionary to store number values already generated so we can ensure
    //we don't have duplicates
    Dictionary<double, int> myDoubles = new Dictionary<double, int>();
    double[] returnValues = new double[quantity > possibleValues ? possibleValues : quantity];
    double NextValue;
    //Iterate through and generate values--limiting to both the user-selected quantity and the # of possible values
    for (int i = 0; (i < quantity) && (i < possibleValues); i++)
    {
        NextValue = GetRandomDouble(lowerBounds, upperBounds, maxPrecision);
        if (!myDoubles.ContainsKey(NextValue))
        {
            myDoubles.Add(NextValue, i);
            returnValues[i] = NextValue;
        }
        else
        {
            i -= 1;
        }
    }
    return returnValues;
}

How do I generate a random number between 0 and 1 in C#?

I want to get a random number between 0 and 1. However, I'm getting 0 every single time. Can someone explain to me why I am getting 0 all the time?
This is the code I have tried.
Random random = new Random();
int test = random.Next(0, 1);
Console.WriteLine(test);
Console.ReadKey();
According to the documentation, Next returns an integer random number between the (inclusive) minimum and the (exclusive) maximum:
Return Value
A 32-bit signed integer greater than or equal to minValue and less than maxValue; that is, the range of return values includes minValue but not maxValue. If minValue equals maxValue, minValue is returned.
The only integer number which fulfills
0 <= x < 1
is 0, hence you always get the value 0. In other words, 0 is the only integer that is within the half-closed interval [0, 1).
So, if you are actually interested in the integer values 0 or 1, then use 2 as upper bound:
var n = random.Next(0, 2);
If instead you want to get a decimal between 0 and 1, try:
var n = random.NextDouble();
You could, but you should do it this way:
double test = random.NextDouble();
If you wanted to get a random integer (0 or 1), you should set the upper bound to 2, because it is exclusive:
int test = random.Next(0, 2);
Every single answer on this page regarding doubles is wrong, which is sort of hilarious because everyone is quoting the documentation. If you generate a double using NextDouble(), you will not get a number between 0 and 1 inclusive of 1, you will get a number from 0 to 1 exclusive of 1.
To get a double, you would have to do some trickery like this:
public double NextRandomRange(double minimum, double maximum)
{
    Random rand = new Random();
    return rand.NextDouble() * (maximum - minimum) + minimum;
}
and then call
NextRandomRange(0,1 + Double.Epsilon);
Seems like that would work, doesn't it? 1 + Double.Epsilon should be the next biggest number after 1 when working with doubles, right? This is how you would solve the problem with ints.
Wellllllllllllllll.........
I suspect that this will not work correctly, since the underlying code will be generating a few bytes of randomness and then doing some math tricks to fit it into the expected range. The short answer is that logic that applies to ints doesn't quite work the same when working with floating-point numbers.
Lets look, shall we? (https://referencesource.microsoft.com/#mscorlib/system/random.cs,e137873446fcef75)
/*=====================================Next=====================================
**Returns: A double [0..1)
**Arguments: None
**Exceptions: None
==============================================================================*/
public virtual double NextDouble() {
    return Sample();
}
What the hell is Sample()?
/*====================================Sample====================================
**Action: Return a new random number [0..1) and reSeed the Seed array.
**Returns: A double [0..1)
**Arguments: None
**Exceptions: None
==============================================================================*/
protected virtual double Sample() {
    //Including this division at the end gives us significantly improved
    //random number distribution.
    return (InternalSample()*(1.0/MBIG));
}
Ok, starting to get somewhere. MBIG, by the way, is Int32.MaxValue (2147483647, or 2^31-1), making the division work out to:
InternalSample()*0.0000000004656612873077392578125;
Ok, what the hell is InternalSample()?
private int InternalSample() {
    int retVal;
    int locINext = inext;
    int locINextp = inextp;
    if (++locINext >= 56) locINext = 1;
    if (++locINextp >= 56) locINextp = 1;
    retVal = SeedArray[locINext] - SeedArray[locINextp];
    if (retVal == MBIG) retVal--;
    if (retVal < 0) retVal += MBIG;
    SeedArray[locINext] = retVal;
    inext = locINext;
    inextp = locINextp;
    return retVal;
}
Well...that is something. But what is this SeedArray and inext crap all about?
private int inext;
private int inextp;
private int[] SeedArray = new int[56];
So things start to fall together. SeedArray is an array of ints used to generate values from. If you look at the constructor below, you see that there is a whole lot of addition and trickery being done to fill an array of 55 values with initial quasi-random values.
public Random(int Seed) {
    int ii;
    int mj, mk;
    //Initialize our Seed array.
    //This algorithm comes from Numerical Recipes in C (2nd Ed.)
    int subtraction = (Seed == Int32.MinValue) ? Int32.MaxValue : Math.Abs(Seed);
    mj = MSEED - subtraction;
    SeedArray[55] = mj;
    mk = 1;
    for (int i = 1; i < 55; i++) {  //Apparently the range [1..55] is special (All hail Knuth!) and so we're skipping over the 0th position.
        ii = (21 * i) % 55;
        SeedArray[ii] = mk;
        mk = mj - mk;
        if (mk < 0) mk += MBIG;
        mj = SeedArray[ii];
    }
    for (int k = 1; k < 5; k++) {
        for (int i = 1; i < 56; i++) {
            SeedArray[i] -= SeedArray[1 + (i + 30) % 55];
            if (SeedArray[i] < 0) SeedArray[i] += MBIG;
        }
    }
    inext = 0;
    inextp = 21;
    Seed = 1;
}
Ok, going back to InternalSample(), we can now see that random doubles are generated by taking the difference of two scrambled up 32 bit ints, clamping the result into the range of 0 to 2147483647 - 1 and then multiplying the result by 1/2147483647. More trickery is done to scramble up the list of seed values as it uses values, but that is essentially it.
(It is interesting to note that the chance of getting any number in the range is roughly 1/r EXCEPT for 2^31-2, which is 2 * (1/r)! So if you think some dumbass coder is using Random.Next() to generate numbers on a video poker machine, you should always bet on 2^31-2! This is one reason why we don't use Random for anything important...)
So, if the output of InternalSample() is 0, we multiply it by 0.0000000004656612873077392578125 and get 0, the bottom end of our range. If we get 2147483646, we end up with 0.9999999995343387126922607421875, so the claim that NextDouble produces a result in [0,1) is...sort of right? It would be more accurate to say it is in the range [0, 0.9999999995343387126922607421875].
My solution suggested above would fall on its face, since double.Epsilon = 4.94065645841247E-324, which is WAY smaller than 0.0000000004656612873077392578125 (the amount you would have to add to our above result to get 1).
Ironically, if it were not for the subtraction of one in the InternalSample() method:
if (retVal == MBIG) retVal--;
we could get to 1 in the return values that come back. So either you copy all the code in the Random class and omit the retVal-- line, or multiply the NextDouble() output by something like 1.0000000004656612875245796924106 to slightly stretch the output to include 1 in the range. Actually testing that value gets us really close, but I don't know whether the few hundred million tests I ran just didn't produce 2147483646 (quite likely) or whether a floating point error is creeping into the equation. I suspect the former. Millions of tests are unlikely to yield a result that has 1-in-2-billion odds.
NextRandomRange(0,1.0000000004656612875245796924106); // try to explain where you got that number during the code review...
TLDR? Inclusive ranges with random doubles are tricky...
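If all you really need is an inclusive [0, 1] range and a fixed step size is acceptable, a much simpler workaround (my own sketch, not from the answer above) is to generate an integer and divide:

public static double NextInclusiveUnit(Random rng, int steps = 1000000)
{
    //Next's upper bound is exclusive, so steps + 1 makes 'steps' itself reachable,
    //and steps / steps == 1.0 is therefore a possible return value.
    return (double)rng.Next(0, steps + 1) / steps;
}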
You are getting zero because Random.Next(a, b) returns a number in the range [a, b), i.e. greater than or equal to a and less than b.
If you want to get one of the {0, 1}, you should use:
var random = new Random();
var test = random.Next(0, 2);
Because you asked for a number less than 1.
The documentation says:
Return Value
A 32-bit signed integer greater than or equal to minValue and less than maxValue; that is, the range of return values
includes minValue but not maxValue. If minValue equals maxValue,
minValue is returned.
Rewrite the code like this if you are targeting 0.0 to 1.0:
Random random = new Random();
double test = random.NextDouble();
Console.WriteLine(test);
Console.ReadKey();

Series calculation

I have some random integers like
99 20 30 1 100 400 5 10
I have to find a sum from any combination of these integers that is closest (equal or more, but not less) to a given number like
183
What is the fastest and most accurate way of doing this?
If your numbers are small, you can use a simple Dynamic Programming (DP) technique. Don't let the name scare you; the technique is fairly understandable: basically, you break the larger problem into subproblems.
Here we define the problem as can[number]: if number can be constructed from the integers in your input, then can[number] is true, otherwise it is false. Obviously 0 is constructible by not using any numbers at all, so can[0] is true. Now you try to use every number from the input. We try to see if the sum j is achievable: if (an already achieved sum) + (the current number we try) == j, then j is clearly achievable. If you want to keep track of which numbers made a particular sum, use an additional prev array, which stores the last number used to make each sum. See the code below for an implementation of this idea:
//numbers[] holds the input integers, N = numbers.Length, SUM is the target (183 in the example)
int UPPER_BOUND = 0;                              //The largest sum you can construct
foreach (int n in numbers) UPPER_BOUND += n;
bool[] can = new bool[UPPER_BOUND + 1];           //can[number] is true if number can be constructed
can[0] = true;                                    //0 is always achievable by not using any number
int[] prev = new int[UPPER_BOUND + 1];            //prev[number] is the last number used to achieve sum "number"
for (int i = 0; i < N; i++)                       //Try to use every number (numbers[i]) from the input
{
    for (int j = UPPER_BOUND; j >= 1; j--)        //Try to see if j is an achievable sum
    {
        if (can[j]) continue;                     //It is already an achieved sum, so go to the next j
        if (j - numbers[i] >= 0 && can[j - numbers[i]]) //If (an already achievable sum) + numbers[i] == j, then j is achievable
        {
            can[j] = true;
            prev[j] = numbers[i];                 //To achieve j we used numbers[i]
        }
    }
}
int CLOSEST_SUM = -1;
for (int i = SUM; i <= UPPER_BOUND; i++)
    if (can[i])
    {
        //The closest achievable sum to SUM (equal to or larger than SUM) is i
        CLOSEST_SUM = i;
        break;
    }
int currentSum = CLOSEST_SUM;
do
{
    int usedNumber = prev[currentSum];
    Console.WriteLine(usedNumber);
    currentSum -= usedNumber;
} while (currentSum > 0);
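For the inputs from the question, the surrounding setup would be something like this (a sketch; the variable names match the snippet above):

int[] numbers = { 99, 20, 30, 1, 100, 400, 5, 10 };
int N = numbers.Length;
int SUM = 183;   //the target from the question
//Running the DP above then prints the numbers making up the closest achievable sum >= 183,
//which for these inputs is 199 (99 + 100).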
This seems to be a Knapsack-like problem, where the value of your integers would be the "weight" of each item, the "profit" of each item is 1, and you are looking for the least number of items to exactly sum to the maximum allowable weight of the knapsack.
This is a variant of the SUBSET-SUM problem, and is also NP-Hard like SUBSET-SUM.
But if the numbers involved are small, pseudo-polynomial time algorithms exist. Check out:
http://en.wikipedia.org/wiki/Subset_sum_problem
OK, more details.
The following problem is NP-Hard:
Given an array of integers and integers a, b, is there some subset whose sum lies in the interval [a, b]?
This is so because we can solve SUBSET-SUM by choosing a = b = 0.
Now this problem easily reduces to your problem, and so your problem is NP-Hard too.
Now you can use the polynomial time approximation algorithm mentioned in the wiki link above:
Given an array of N integers, a target S and an approximation threshold c, there is a polynomial time approximation algorithm (with running time involving 1/c) which tells you whether there is a subset sum in the interval [(1-c)S, S].
You can use this repeatedly (by some form of binary search) to find the best approximation to S that you need. Note that you can also use this on intervals of the form [S, (1+c)S], while the knapsack formulation will only give you a solution <= S.
Of course there might be better algorithms, in fact I can bet on it. There should be plenty of literature on the web. Some search terms you can use: approximation algorithms for subset-sum, pseudo-polynomial time algorithms, dynamic programming algorithm etc.
A simple brute-force method would be to read the text in, parse it into numbers, and then go through all combinations until you find the required sum.
A quicker solution would be to sort the numbers, then...
Add the largest number to your sum. Is it too big? If so, take it off and try the next smallest.
If the sum is too small, add the next largest number and repeat.
Continue adding numbers without letting the sum exceed the target. Finish when you hit the target.
Note that when you backtrack, you may need to backtrack more than one level. Sounds like a good case for recursion, sketched below...
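A hedged sketch of that recursive backtracking idea (my own code, assuming all inputs are positive; it returns the smallest achievable sum that is >= the target, or int.MaxValue if no subset reaches it):

public static int ClosestAtLeast(int[] nums, int target)
{
    Array.Sort(nums);
    Array.Reverse(nums);                           //largest first, as described above
    return Search(nums, target, 0, 0, int.MaxValue);
}

private static int Search(int[] nums, int target, int index, int sum, int best)
{
    if (sum >= target)
        return Math.Min(best, sum);                //candidate answer; adding more only makes it bigger
    if (index == nums.Length)
        return best;                               //ran out of numbers while still below the target
    best = Search(nums, target, index + 1, sum + nums[index], best);  //include nums[index]
    return Search(nums, target, index + 1, sum, best);                //exclude nums[index]
}

For the question's numbers and a target of 183 this returns 199 (99 + 100).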
If the numbers are large you can turn this into an Integer Program. Using Mathematica's solver, it might look something like this:
nums = {99, 20, 30, 1, 100, 400, 5, 10};
vars = a /@ Range@Length@nums;
Minimize[(vars.nums - 183)^2, vars, Integers]
You can sort the list of values, find the first value that's greater than the target, and start concentrating on the values that are less than the target. Find the sum that's closest to the target without going over, then compare that to the first value greater than the target. If the difference between the closest sum and the target is less than the difference between the first value greater than the target and the target, then you have the sum that's closest.
Kinda hokey, but I think the logic hangs together.
