Given a set of items, each with a value, determine which items to include in a collection so that the total value is less than or equal to a given limit and is as large as possible.
Example:
Product A = 4
Product B = 3
Product C = 2
Product D = 5
If Total Capacity = 10.5, then the combination of B, C, D will be selected.
If Total Capacity = 12.5, then the combination of A, B, D will be selected.
If Total Capacity = 17, then the combination of A, B, C, D will be selected.
I am looking for an algorithm (like knapsack or bin packing) to determine the combination. Any help appreciated.
You say that this is "like knapsack". As far as I can see it is the special case of the bounded knapsack problem called the 0-1 knapsack problem.
It is NP-complete.
There are lots of ways you could attempt to solve it. See this related question for one approach:
Writing Simulated Annealing algorithm for 0-1 knapsack in C#
If you only have four items then just testing all possibilities should be fast enough for most purposes.
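For instance, a minimal brute-force sketch in C# (using the values from the example above; variable names are illustrative) that enumerates all 2^4 subsets with a bitmask and keeps the best total that fits:

// Brute force: try every subset of the 4 products via a bitmask.
int[] values = { 4, 3, 2, 5 };      // products A, B, C, D
double capacity = 12.5;

int bestMask = 0, bestSum = 0;
for (int mask = 0; mask < (1 << values.Length); mask++)
{
    int sum = 0;
    for (int i = 0; i < values.Length; i++)
        if ((mask & (1 << i)) != 0)   // is product i in this subset?
            sum += values[i];

    if (sum <= capacity && sum > bestSum)
    {
        bestSum = sum;
        bestMask = mask;              // the set bits mark the chosen products
    }
}
// For capacity 12.5 this finds bestSum = 12 (A, B, D), as in the example.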
I have 9 numbers that I want to divide into two lists, and both lists need to reach a certain amount when summed up. For example, I have this list of ints:
List<int> test = new List<int>
{
    1963000, 1963000, 393000, 86000,
    393000, 393000, 176000, 420000,
    3193000
};
And I want to have 2 lists of numbers that both reach over 4 million when you sum them up.
It doesn't matter if the two lists don't contain the same amount of numbers. If one list reaches 4 million with only 2 numbers and the other reaches 7 million with the remaining 7 numbers, that's fine.
As long as both lists summed up are equal to 4 million or higher.
Is this certain sum low enough to be reached easily?
If yes, then your algorithm may be as simple as: iterate i from 1 to the number of items, summing up the first i numbers; if the sum is higher than your certain sum (e.g. 4 million), then you are finished, else increment i.
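A minimal C# sketch of that idea, assuming the 4-million target from the question (the second-list check is my addition):

// Greedy sketch: fill the first list until it reaches the target,
// everything else goes to the second list; then verify both sums.
List<int> test = new List<int>
{
    1963000, 1963000, 393000, 86000,
    393000, 393000, 176000, 420000,
    3193000
};
long target = 4000000;

var first = new List<int>();
var second = new List<int>();
long firstSum = 0, secondSum = 0;

foreach (int x in test)
{
    if (firstSum < target) { first.Add(x); firstSum += x; }
    else { second.Add(x); secondSum += x; }
}

bool bothReached = firstSum >= target && secondSum >= target;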
BUT: if your target sums are high and the partition is not so trivial to find, then you have the famous Partition Problem (https://en.wikipedia.org/wiki/Partition_problem). This is not that simple, but there are algorithms for it. Read the Wikipedia article or try googling "Partition problem solution" or similar.
I have a list of entities, and for the purpose of analysis, an entity can be in one of three states. Of course I wish it was only two states, then I could represent that with a bool.
In most cases there will be a list of entities where the size of the list is usually 100 < n < 500.
I am working on analyzing the effects of the combinations of the entities and the states.
So if I have 1 entity, then I can have 3 combinations. If I have two entities, I can have nine combinations (3^2), and so on.
Because of the amount of combinations, brute forcing this will be impractical (it needs to run on a single system). My task is to find good-but-not-necessarily-optimal solutions that could work. I don't need to test all possible permutations, I just need to find one that works. That is an implementation detail.
What I do need to do is to register the combinations possible for my current data set - this is basically to avoid duplicating the work of analyzing each combination. Every time a process arrives at a certain configuration of combinations, it needs to check if that combo is already being worked on or was resolved in the past.
So given x tri-state values, what is an efficient way of storing and comparing them in memory? I realize there will be limitations here; I'm just trying to be as efficient as possible.
I can't think of a more effective unit of storage than two bits, where one of the four "bit states" is not used. But I don't know how to make this efficient. Do I need to choose between optimizing for storage size and optimizing for performance?
How can something like this be modeled in C# in a way that wastes the least amount of resources and still performs relatively well when a process needs to ask "Has this particular combination of tri-state values already been tested?"?
Edit: As an example, say I have just 3 entities, and the state is represented by a simple integer, 1, 2 or 3. We would then have this list of combinations:
111
112
113
121
122
123
131
132
133
211
212
213
221
222
223
231
232
233
311
312
313
321
322
323
331
332
333
I think you can break this down as follows:
You have a set of N entities, each of which can have one of three different states.
Given one particular permutation of states for those N entities, you want to remember that you have processed that permutation.
It therefore seems that you can treat the N entities as a base-3 number with 3 digits.
When considering one particular set of states for the N entities, you can store that as an array of N bytes where each byte can have the value 0, 1 or 2, corresponding to the three possible states.
That isn't a memory-efficient way of storing the states for one particular permutation, but that's OK because you don't need to store that array. You just need to store a single bit somewhere corresponding to that permutation.
So what you can do is to convert the byte array into a base 10 number that you can use as an index into a BitArray. You then use the BitArray to remember whether a particular permutation of states has been processed.
To convert a byte array representing a base three number to a decimal number, you can use this code:
public static int ToBase10(byte[] entityStates) // Each state can be 0, 1 or 2.
{
    int result = 0;

    for (int i = 0, n = 1; i < entityStates.Length; n *= 3, ++i)
        result += n * entityStates[i];

    return result;
}
Given that you have numEntities different entities, you can then create a BitArray like so:
int numEntities = 4;
int numPerms = (int)Math.Pow(3, numEntities); // 3^N permutations, not N^3
BitArray states = new BitArray(numPerms);
Then states can store a bit for each possible permutation of states for all the entities.
Let's suppose that you have 4 entities A, B, C and D, and you have a permutation of states (which will be 0, 1 or 2) as follows: A2 B1 C0 D1. That is, entity A has state 2, B has state 1, C has state 0 and D has state 1.
You would represent that as a byte array like so:
byte[] permutation = { 2, 1, 0, 1 };
Then you can convert that to a base 10 number like so:
int asBase10 = ToBase10(permutation);
Then you can check if that permutation has been processed like so:
if (!states[asBase10])
{
    // Not processed, so process it.
    process(permutation);
    states[asBase10] = true; // Remember that we processed it.
}
Without getting overly fancy with algorithms and data structures, and assuming your tri-state values can be represented as strings and don't have an easily determined fixed maximum count, i.e. "111", "112", etc. (or even "1:1:1", "1:1:2"), a simple SortedSet may end up being fairly efficient.
As a bonus, it doesn't care about the number of values in your set.
SortedSet<string> alreadyTried = new SortedSet<string>();

if (!HasSetBeenTried("1:1:1"))
{
    // do whatever
}

if (!HasSetBeenTried("500:212:100"))
{
    // do whatever
}

public bool HasSetBeenTried(string set)
{
    if (alreadyTried.Contains(set)) return true; // seen before
    alreadyTried.Add(set);
    return false; // first time we see this combination
}
Simple math says:
3 entities in 3 states make 27 combinations.
So you need exactly log(27)/log(2) ≈ 4.75 bits to store that information.
Because a PC can only use whole bits, you have to "waste" ~0.25 bits and use 5 bits per combination.
The more data you gather, the better you can pack that information, but in the end, maybe a compression algorithm could help even more.
Again: you only asked for memory efficiency, not performance.
In general you can calculate the bits you need with Math.Ceiling(Math.Log(noCombinations, 2)).
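In C#, that general formula could look like this (a sketch; noCombinations is assumed to fit in a double):

// Bits needed to distinguish all combinations of numEntities tri-state values.
int numEntities = 3;
double noCombinations = Math.Pow(3, numEntities);                 // 27
int bitsNeeded = (int)Math.Ceiling(Math.Log(noCombinations, 2));  // 5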
I have a problem that I can't solve. There is this table:
I have to optimally allocate 1 million dollars among the five products. I think it looks like a knapsack problem, but I am not sure. If I want to solve this, what should I look for? If it is knapsack, how should I change an original knapsack solution to fit my problem?
To my understanding, the problem described can be solved via dynamic programming in a way very similar to the 0/1 Knapsack Problem. However, the recurrence relation has to be adapted. Instead of considering 2 cases at each stage (namely discarding or taking the respective item), 11 cases have to be considered, which correspond to discarding the item (not investing in the product) and taking choice 1 (using investment 1) through choice 10 (using investment 10). Although each item will have 11 profit values (one per choice, counting "no investment" as zero profit), the state space remains two-dimensional: one axis for the capacity, i.e. the invested amount, and one axis for the enumeration of items. In more detail, a formulation in pseudocode could be as follows. For ease of presentation, access outside of the state space is supposed to yield a value of minus infinity, so that such terms never win the maximization.
// Input:
// Values (stored in arrays v_1, ..., v_10)
// Weights (stored in arrays w_1, ..., w_10)
// Number of distinct items (n)
// Knapsack capacity (W)

for j from 0 to W do:
    m[0, j] := 0

for i from 1 to n do:
    for j from 0 to W do:
        m[i, j] := max(m[i-1, j],
                       m[i-1, j - w_1[i-1]] + v_1[i-1],
                       ...,
                       m[i-1, j - w_10[i-1]] + v_10[i-1])
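For illustration, here is a C# sketch of that recurrence. Instead of infinities it simply skips choices that don't fit; weights[i][c] and values[i][c] are assumed to hold the cost and profit of choice c for product i (both names are mine, not from the question):

// DP over (item, capacity), considering "discard" plus one case per choice.
public static int MaxProfit(int[][] weights, int[][] values, int W)
{
    int n = weights.Length;
    int[,] m = new int[n + 1, W + 1]; // m[i, j]: best profit, first i items, capacity j

    for (int i = 1; i <= n; i++)
    {
        for (int j = 0; j <= W; j++)
        {
            m[i, j] = m[i - 1, j]; // discard item i (no investment)

            for (int c = 0; c < weights[i - 1].Length; c++)
            {
                if (weights[i - 1][c] <= j) // choice c fits the remaining capacity
                    m[i, j] = Math.Max(m[i, j],
                                       m[i - 1, j - weights[i - 1][c]] + values[i - 1][c]);
            }
        }
    }
    return m[n, W];
}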
I've been thinking about how to implement something that, frankly, is beyond my mathematical skills. So here goes; feel free to point me in the right direction rather than give complete code solutions. Any help would be appreciated.
So, imagine I've done an analysis of text and generated a table of the frequencies of different two-character combinations. I've stored these in a 26x26 array.
eg.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A 1 15 (frequency of AA, then frequency of AB etc.)
B 12 0 (freq of BA, BB etc..)
... etc.
So I want to randomly choose these two-character combinations, but I'd like to 'weight' my choice based on the frequency, i.e. the AB above should be 15 times 'more likely' than AA. And, obviously, the selection should never return something with a frequency of 0, like BB (in this example only; obviously BB does occur in words like Bubble!! :-) ). For the 0 case I realise I could loop until I get a non-0 frequency, but that's just not elegant, because I have a feeling/intuition that there is a way to skew my average.
I was thinking that to choose the first char of my pair, i.e. the row (I'm ultimately generating a 4-pair sequence), I could just use the system random function (Random.Next), then use the 'weighted' random algorithm to pick the second char.
Any ideas?
Given your sample data, I would first create a cumulative series of all of the numbers (1, 15, 12, 0 => 1, 16, 28, 28).
Then I would produce a random number between 0 and 27 (let's say 19).
Then I would note that 19 is >= 16 but < 28, giving me bucket 3 (BA).
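A short C# sketch of that cumulative-series approach, using the sample frequencies above (variable names are illustrative):

// Build the cumulative series (1, 15, 12, 0 => 1, 16, 28, 28), then draw a
// number in [0, 28) and find the first bucket whose cumulative value exceeds
// it. Zero-frequency buckets can never be selected.
int[] freqs = { 1, 15, 12, 0 };          // AA, AB, BA, BB
int[] cumulative = new int[freqs.Length];
int total = 0;
for (int i = 0; i < freqs.Length; i++)
{
    total += freqs[i];
    cumulative[i] = total;
}

var rand = new Random();
int r = rand.Next(total);                // 0..27 inclusive
int bucket = 0;
while (r >= cumulative[bucket])
    bucket++;
// e.g. r = 19 lands in bucket index 2 (BA), matching the walkthrough above.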
There are some good suggestions in the other answers for your specific problem. To solve the general problem of "I have a source of random numbers conforming to a uniform probability distribution, but I would like it to conform to a given nonuniform probability distribution", then you can work out the quantile function, which is the function that performs that transformation. I give a gentle introduction that explains why the quantile function is the function you want here:
Generating Random Non-Uniform Data In C#
How about summing all the frequencies and using that total, from AA to ZZ, to generate your pair?
Say you draw a random number against the total frequency of all pairs: if it returns 0 you get AA, if it returns 1-15 then it's AB, etc.
Use your frequency matrix to generate a complete set of values. Order the set by Random.Next(). Store the randomized set in an array. Then you can just select an element out of that array based on Random.Next(randomArray.Length).
If there is a mathematical way to calculate the frequency you could do that as well. But creating a precompiled and cached set will reduce the calculation time if this is called repeatedly.
As a note, depending on the max frequency this could require a good amount of storage. You would also want to create the instance of random before you loop to build the set. This is so you don't reseed the random generator.
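A rough sketch of that idea, assuming a 26x26 freq matrix as in the question (the expansion and shuffle shown here are my interpretation):

// Expand each pair according to its frequency, shuffle once, then draw by
// index. Pairs with frequency 0 never enter the set. (Uses System.Linq.)
int[,] freq = new int[26, 26];           // assumed filled from the text analysis
var expanded = new List<(int Row, int Col)>();

for (int r = 0; r < 26; r++)
    for (int c = 0; c < 26; c++)
        for (int k = 0; k < freq[r, c]; k++)
            expanded.Add((r, c));

Random rand = new Random();              // created once, outside any loop
var randomArray = expanded.OrderBy(_ => rand.Next()).ToArray();
var pick = randomArray[rand.Next(randomArray.Length)];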
...
Another way (similar to what you suggested at the end of your question) would be to do this in two passes, with the first selecting the row and the second using your weighted frequency to select the column. That would just be the sum of the row frequencies bounded over ranges. The first suggestion should give a more even distribution based on weight.
Take the sum of the probabilities. Take a random number between zero and that sum. Add up the probabilities until the running total exceeds your random number. Then use the item you're on.
E.g. pseudocode:
b = getProbabilities()
s = sum(b)
r = randomInt() % s   // r is in [0, s)
i = 0
acc = 0
while (acc <= r) {
    acc += b[i]
    i++
}
return i - 1
If efficiency is not a problem, you could create a key->value hash instead of an array. An upside of this would be that (if you format it well in the text) it would be very easy to update the values should the need arise. Something like
{
AA => 5, AB => 2, AC => 4,
BA => 6, BB => 5, BC => 9,
CA => 2, CB => 7, CC => 8
}
With this, you could easily retrieve the value for the sequence you want, and quickly find the entry to update. If the table is automatically generated and extremely large, it could help to get/be familiar with vim's use of regular expressions.
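In C#, that key->value idea could be a plain Dictionary (a sketch using the sample values above):

// Dictionary lookup/update for pair frequencies; keys mirror the example above.
var freq = new Dictionary<string, int>
{
    ["AA"] = 5, ["AB"] = 2, ["AC"] = 4,
    ["BA"] = 6, ["BB"] = 5, ["BC"] = 9,
    ["CA"] = 2, ["CB"] = 7, ["CC"] = 8,
};

int ab = freq["AB"];   // retrieve the value for a sequence
freq["AB"] = 3;        // update it just as easily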
Looking at another question of mine, I realized that technically there is nothing preventing this algorithm from running for an infinite period of time (i.e. it never returns),
because of the chance that rand.Next(1, 100000); could theoretically keep generating values that are already in the list.
Out of curiosity; how would I calculate the probability of this happening? I assume it would be very small?
Code from other question:
Random rand = new Random();
List<Int32> result = new List<Int32>();

for (Int32 i = 0; i < 300; i++)
{
    Int32 curValue = rand.Next(1, 100000);

    while (result.Exists(value => value == curValue))
    {
        curValue = rand.Next(1, 100000);
    }

    result.Add(curValue);
}
On ONE given draw of a random number, the probability of repeating a value already found in the result list is
P(Collision) = i * 1/100000, where i is the number of values in the list.
That is because all 100,000 possible numbers are assumed to have the same probability of being drawn (assumption of a uniform distribution), and the drawing of any number is independent from that of drawing any other number. (Strictly, rand.Next(1, 100000) can return only the 99,999 values 1 through 99999, since the upper bound is exclusive; 100,000 keeps the arithmetic round.)
The probability of experiencing such a "collision" with the numbers from the list several times in a row is
P(n Collisions) = P(Collision) ^ n
where n is the number of times a collision happens
That is because the drawings are independent.
Numerically...
when the list is half full, i = 150 and
P(Collision) = 0.15% = 0.0015 and
P(2 Collisions) = 0.00000225
P(3 Collisions) = 0.000000003375
P(4 Collisions) = 0.0000000000050625
when the list is all full but for the last one, i = 299 and
P(Collision) = 0.299% = 0.00299 and
P(2 Collisions) = 0.0000089401 (approx)
P(3 Collisions) = 0.00000002673 (approx)
P(4 Collisions) = 0.000000000079925 (approx)
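A quick snippet to reproduce these figures (assuming, as above, 100,000 equally likely values):

// Prints P(n collisions in a row) = (i / 100000)^n for the two list sizes above.
int range = 100000;
foreach (int i in new[] { 150, 299 })
{
    double p = (double)i / range;
    for (int n = 1; n <= 4; n++)
        Console.WriteLine($"i={i}, n={n}: P = {Math.Pow(p, n)}");
}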
You are therefore right to assume that the probability of having to draw multiple times for finding the next suitable value to add to the array is very small, and should therefore not impact the overall performance of the snippet. Beware that there will be a few retries (statistically speaking), but the total number of retries will be small compared to 300.
If however the total number of items desired in the list were to increase much, or if the range of random numbers sought were to be reduced, P(Collision) would not be so small, and hence the number of "retries" needed would grow accordingly. That is why other algorithms exist for drawing multiple values without replacement; most are based on the idea of using the random number as an index into an array of all the remaining values.
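One common variant of that idea, sketched in C#: keep the remaining values in a list and use the random number as an index, removing each pick with a swap (a partial Fisher-Yates; the details here are my own):

// Draw 300 distinct values from 1..99999 without retries. (Uses System.Linq.)
Random rand = new Random();
List<int> pool = Enumerable.Range(1, 99999).ToList(); // all candidate values
List<int> result = new List<int>();

for (int i = 0; i < 300; i++)
{
    int idx = rand.Next(pool.Count);     // random index into the remaining values
    result.Add(pool[idx]);
    pool[idx] = pool[pool.Count - 1];    // swap-remove: O(1) deletion
    pool.RemoveAt(pool.Count - 1);
}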
Assuming a uniform distribution (not a bad assumption, I believe) the chance of getting the number n times in a row is (0.00001)^n.
It's quite possible for a PRNG to generate the same number in a limited range in consecutive calls. The probability would be a function of the bit-size of the raw PRNG and the method used to reduce that size to the numeric range you want (in this case 1 - 100000).
To answer your question exactly: no, it isn't "very small"; the probability of it going on for an infinite period of time "is" 0. I say "is" because it actually tends to 0 as the number of iterations tends to infinity.
As bdares said, it will tend to 0 as (1/range)^n, with n being the number of iterations, if we can assume a uniform distribution (which we kinda can).
This program will not halt if:
A random number is picked that is in the result set
That number generates a cycle (i.e. a loop) in the random number generator's algorithm (they all do)
All numbers in the loop are already in the result set
All random number generators eventually loop back on themselves, due to the limited number of integers possible: for 32-bit values, only 2^32 possibilities.
"Good" generators have very large loops. "Poor" algorithms yield short loops for certain values. Consult Knuth's The Art of Computer Programming for random number generators. It is a fascinating read.
Now, assuming there is a cycle of (n) numbers. For your program, which loops 300 times, that means (n) <= 300. Also, the number of attempts you try before you hit on a number in this cycle, plus the length of the cycle, must not be greater than 300. Therefore, assuming the first try you hit on the cycle, then the cycle can be 300 long. If on the second try you hit the cycle, it can only be 299 long.
Assuming that most random number generation algorithms have reasonably-flat probability distribution, the probability of hitting a 300-cycle the first time is (300/2^32), multiplied by the probability of having a 300-cycle (this depends on the rand algorithm), plus the probability of hitting a 299-cycle the first time (299/2^32) x probability of having a 299-cycle, etc. And so on and so forth. Then add up the second try, third try, all the way up to the 300-th try (which can only be a 1-cycle).
Now this is assuming that any number can take on the full 2^32 generator space. If you are limiting it to 100000 only, then in essence you increase the chance of having much shorter cycles, because multiple numbers (in the 2^32 space) can map to the same number in "real" 100000 space.
In reality, most random generator algorithms have minimum cycle lengths of > 300. A random generator implementation based on the simplest LCG (linear congruential generator, see Wikipedia) can have a "full period" (i.e. 2^32) with the correct choice of parameters. So it is safe to say that minimum cycle lengths are definitely > 300. If this is the case, then it depends on the generator's mapping algorithm, which maps 2^32 numbers onto 100,000 numbers. Good mappers will not create 300-cycles; poor mappers may create short cycles.
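For illustration, a minimal full-period LCG sketch (the parameters are the well-known Numerical Recipes constants; this is an example, not the generator .NET actually uses):

// LCG with modulus 2^32 (via uint wraparound). These parameters satisfy the
// Hull-Dobell conditions, so the cycle visits all 2^32 values before repeating.
uint state = 12345;                         // any seed works for a full-period LCG

uint NextLcg()
{
    state = state * 1664525u + 1013904223u; // multiplier and increment from Numerical Recipes
    return state;
}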