I have some random integers like
99 20 30 1 100 400 5 10
I have to find a sum from any combination of these integers that is closest to a given number (equal to or greater than it, but never less), for example
183
What is the fastest accurate way of doing this?
If your numbers are small, you can use a simple dynamic programming (DP) technique. Don't let the name scare you; the technique is fairly easy to understand. Basically, you break the larger problem into subproblems.
Here we define the subproblem can[number]: if number can be constructed from the integers in your file, can[number] is true; otherwise it is false. Clearly 0 can be constructed by not using any numbers at all, so can[0] is true. Now take every number from the input file in turn and, for each candidate sum j, check whether j is achievable: if some already achieved sum + the current number == j, then j is clearly achievable. Scanning j from the largest value down to 1 ensures that each number is used at most once. If you want to keep track of which numbers made a particular sum, use an additional prev array that stores the last number used to make that sum. See the code below for an implementation of this idea:
using System;
using System.Linq;

int[] numbers = { 99, 20, 30, 1, 100, 400, 5, 10 }; // the input integers (assumed non-negative)
int N = numbers.Length;
int SUM = 183;                                      // the target sum
int UPPER_BOUND = numbers.Sum();                    // the largest sum you can construct
bool[] can = new bool[UPPER_BOUND + 1];             // can[number] is true if number can be constructed
can[0] = true;                                      // 0 is always achievable by not using any number
int[] prev = new int[UPPER_BOUND + 1];              // prev[number] is the last number used to achieve sum "number"
for (int i = 0; i < N; i++)                         // try to use every number (numbers[i]) from the input
{
    // go downwards so that numbers[i] is used at most once in any sum
    for (int j = UPPER_BOUND; j >= 1; j--)          // try to see if j is an achievable sum
    {
        if (can[j]) continue;                       // already achieved, so go to the next j
        if (j - numbers[i] >= 0 && can[j - numbers[i]]) // (already achievable sum) + numbers[i] == j
        {
            can[j] = true;
            prev[j] = numbers[i];                   // to achieve j we used numbers[i]
        }
    }
}
int CLOSEST_SUM = -1;
for (int i = SUM; i <= UPPER_BOUND; i++)
    if (can[i])
    {
        CLOSEST_SUM = i;                            // the smallest achievable sum >= SUM
        break;
    }
if (CLOSEST_SUM == -1)
    Console.WriteLine("No combination reaches SUM");
else
{
    int currentSum = CLOSEST_SUM;
    while (currentSum > 0)                          // walk back through prev[] to list the numbers used
    {
        int usedNumber = prev[currentSum];
        Console.WriteLine(usedNumber);
        currentSum -= usedNumber;
    }
}
This seems to be a Knapsack-like problem, where the value of your integers would be the "weight" of each item, the "profit" of each item is 1, and you are looking for the least number of items to exactly sum to the maximum allowable weight of the knapsack.
This is a variant of the SUBSET-SUM problem, and is also NP-Hard like SUBSET-SUM.
But if the numbers involved are small, pseudo-polynomial time algorithms exist. Check out:
http://en.wikipedia.org/wiki/Subset_sum_problem
OK, more details.
The following problem is NP-hard:
Given an array of integers and integers a, b, is there some subset whose sum lies in the interval [a, b]?
This is so because we can solve subset-sum by choosing a=b=0.
Now this problem easily reduces to your problem and so your problem is NP-Hard too.
Now you can use the polynomial time approximation algorithm mentioned in the wiki link above.
Given an array of N integers, a target S and an approximation threshold c,
there is a polynomial time approximation algorithm (involving 1/c) which tells if there is a subset sum in the interval [(1-c)S, S].
You can use this repeatedly (with some form of binary search) to find the best approximation to S you need. Note that you can also use this on intervals of the form [S, (1+c)S], while the knapsack will only give you a solution <= S.
Of course there might be better algorithms, in fact I can bet on it. There should be plenty of literature on the web. Some search terms you can use: approximation algorithms for subset-sum, pseudo-polynomial time algorithms, dynamic programming algorithm etc.
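For the "<= S" direction, here is a minimal Python sketch of the list-trimming approximation scheme (the textbook APPROX-SUBSET-SUM from CLRS; I am assuming this is the kind of scheme the Wikipedia article refers to). The function name approx_subset_sum and the parameter eps, which plays the role of c above, are illustrative. It assumes a non-empty list of positive integers and 0 < eps < 1; the returned sum is at most target and within a factor (1 + eps) of the best achievable sum not exceeding target.

```python
def approx_subset_sum(numbers, target, eps):
    """Approximate the largest subset sum <= target (list-trimming scheme).

    Assumes a non-empty list of positive integers and 0 < eps < 1.
    The result z satisfies z <= target and best / z <= 1 + eps,
    where best is the exact optimum.
    """
    delta = eps / (2 * len(numbers))  # per-step trimming tolerance
    sums = [0]                        # sorted list of (approximately) reachable sums
    for x in numbers:
        # every old sum, plus every old sum extended by x, without duplicates
        merged = sorted(set(sums) | {s + x for s in sums})
        # trim: drop a sum if it is within a factor (1 + delta) of the last one kept
        trimmed = [merged[0]]
        for y in merged[1:]:
            if y > trimmed[-1] * (1 + delta):
                trimmed.append(y)
        # sums above the target can never become valid again
        sums = [s for s in trimmed if s <= target]
    return max(sums)
```

A smaller eps keeps more sums and gets closer to the exact answer; the trimmed list stays polynomial in len(numbers) and 1/eps.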
A simple brute-force method would be to read the text in, parse it into numbers, and then go through all combinations until you find the required sum.
A quicker solution would be to sort the numbers, then:
Add the largest number to your sum. Is it too big? If so, take it off and try the next smallest.
If the sum is too small, add the next largest number and repeat.
Continue adding numbers without letting the sum exceed the target. Finish when you hit the target.
Note that when you backtrack, you may need to backtrack more than one level. Sounds like a good case for recursion...
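The sorted, largest-first backtracking described above can be sketched in Python as follows. This is an illustrative adaptation to the question's "equal or more" requirement (a branch stops as soon as it reaches the target, and the smallest such sum is kept), not the poster's exact procedure; the function name closest_sum_at_least is hypothetical. The worst case is still exponential, but the pruning can cut the search down considerably.

```python
def closest_sum_at_least(numbers, target):
    """Smallest subset sum >= target, by depth-first backtracking.

    Assumes non-negative integers. Returns (best_sum, chosen_numbers),
    or (None, []) if even the sum of all numbers is below target.
    """
    nums = sorted(numbers, reverse=True)  # try the largest numbers first
    # suffix[i] = sum of nums[i:], used to prune hopeless branches
    suffix = [0] * (len(nums) + 1)
    for i in range(len(nums) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + nums[i]

    best_sum, best_chosen = None, []

    def search(i, total, chosen):
        nonlocal best_sum, best_chosen
        if total >= target:
            # adding more numbers can only make the sum larger, so stop here
            if best_sum is None or total < best_sum:
                best_sum, best_chosen = total, list(chosen)
            return
        if i == len(nums) or total + suffix[i] < target:
            return  # even taking every remaining number cannot reach target
        if best_sum == target:
            return  # an exact hit cannot be beaten
        chosen.append(nums[i])
        search(i + 1, total + nums[i], chosen)  # take nums[i]
        chosen.pop()
        search(i + 1, total, chosen)            # skip nums[i] (backtrack)

    search(0, 0, [])
    return best_sum, best_chosen
```

With the question's numbers and a target of 183, the sketch returns 199, made from 100 and 99.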
If the numbers are large, you can turn this into an integer program. Using Mathematica's solver, it might look something like this:
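nums = {99, 20, 30, 1, 100, 400, 5, 10};
vars = a /@ Range@Length@nums;  (* one 0/1 selection variable a[i] per number *)
(* smallest total that is at least 183, with every a[i] restricted to 0 or 1 *)
Minimize[{vars.nums, vars.nums >= 183, And @@ Table[0 <= v <= 1, {v, vars}]}, vars, Integers]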
You can sort the list of values, find the first value that's greater than the target, and start concentrating on the values that are less than the target. Find the sum that's closest to the target without going over, then compare that to the first value greater than the target. If the difference between the closest sum and the target is less than the difference between the first value greater than the target and the target, then you have the sum that's closest.
Kinda hokey, but I think the logic hangs together.
Consider the following interface that describes a continuous range of integer values.
public interface IRange {
int Minimum { get;}
int Maximum { get;}
IRange LargestOverlapRange(IEnumerable<IRange> ranges);
}
I am looking for an efficient algorithm to find the largest overlap range given a list of IRange objects. The idea is outlined in the following diagram, where the top numbers represent the integer values and each |-----| represents an IRange object with a min and max value. I stacked the IRange objects so that the solution is easy to visualize.
[Diagram: IRange objects drawn as stacked |-----| bars over the integer axis 0123456789 ... N; the horizontal alignment of the bars was lost.]
Here, the LargestOverlapRange method would return a short range (drawn as |---|), since that range has a total of 4 'overlaps'. If there are two separate IRange with the same number of overlaps, I want to return null.
Here is some brief code of what I tried.
using System.Collections.Generic;

public class Range : IRange
{
    public int Minimum { get; set; }
    public int Maximum { get; set; }

    public IRange LargestOverlapRange(IEnumerable<IRange> ranges) {
        int maxInt = 20000;
        // Create a histogram of the counts (values 0..maxInt inclusive)
        int[] histogram = new int[maxInt + 1];
        foreach(IRange r in ranges) {
            for(int i = r.Minimum; i <= r.Maximum; i++) {
                histogram[i]++;
            }
        }
        // Find the mode of the histogram
        int mode = 0;
        int bin = 0;
        for(int i = 0; i <= maxInt; i++) {
            if(histogram[i] > mode) {
                mode = histogram[i];
                bin = i;
            }
        }
        if(mode == 0)
            return null; // no ranges at all
        // Construct a new range of the mode values, if they are continuous
        Range range = null;
        for(int i = bin; i <= maxInt; i++) {
            if(histogram[i] == mode) {
                if(range != null)
                    return null; // violates two ranges with the same mode
                range = new Range();
                range.Minimum = i;
                while(i <= maxInt && histogram[i] == mode)
                    i++;
                range.Maximum = i - 1; // i is one past the last bin at the mode
            }
        }
        return range;
    }
}
This involves four loops and is easily O(n^2) if not higher. Is there a more efficient algorithm (speed wise) to find the largest overlap range from a list of other ranges?
EDIT
Yes, the O(n^2) is not correct, I was thinking about it incorrectly. It should be O(N * M) as was pointed out in the comments.
EDIT 2
Let me stipulate a few things. First, the absolute min and max integer values will be within (0, 20000). Second, the average number of IRange objects will be on the order of 100. I don't know if this changes the way the algorithm should be designed.
EDIT 3
I am implementing this algorithm on a scientific instrument (a mass spectrometer) in which the speed of the data processing is paramount to the quality of data (faster analysis time = more spectra collected in time T). The firmware language (proprietary) only has arrays[] and is not object-oriented. I chose C# since I am decent at porting concepts between the two languages and thought that, in the interest of the SO community, a good answer would have a wider audience.
Convert your list of ranges to a list of start and stop points. Sort the list with an O(n log n) algorithm. Now you can iterate through the list and increment or decrement a counter depending on whether it's a start or stop point, which will give you the current overlap depth.
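A minimal Python sketch of this sweep, assuming inclusive integer ranges given as (minimum, maximum) pairs. Like the histogram in the question, it reports the contiguous run of integers covered by the most ranges and returns None when two separate runs tie; the function name largest_overlap is hypothetical. The stop point is placed at maximum + 1 so that inclusive ends are counted correctly.

```python
def largest_overlap(ranges):
    """Return (lo, hi) of the single contiguous run of integers covered by the
    most ranges, or None if there are no ranges or two separate runs tie.

    ranges: iterable of inclusive (minimum, maximum) integer pairs.
    """
    events = []
    for lo, hi in ranges:
        events.append((lo, +1))      # range starts covering at lo
        events.append((hi + 1, -1))  # inclusive end: stops covering at hi + 1
    events.sort()  # O(n log n)

    best_depth, runs = 0, []  # runs: [start, end] intervals at best_depth
    depth, i = 0, 0
    while i < len(events):
        x = events[i][0]
        # apply every start/stop at coordinate x before measuring the depth
        while i < len(events) and events[i][0] == x:
            depth += events[i][1]
            i += 1
        if i == len(events):
            break  # past the last stop point: depth is 0 from here on
        end = events[i][0] - 1  # depth is constant on [x, end]
        if depth > best_depth:
            best_depth, runs = depth, [[x, end]]
        elif depth == best_depth and depth > 0:
            if runs[-1][1] == x - 1:
                runs[-1][1] = end  # touches the previous run: extend it
            else:
                runs.append([x, end])
    return tuple(runs[0]) if len(runs) == 1 else None
```

Note that this follows the histogram semantics of the question's code; for example, the ranges 0 to 5, 2 to 6, 4 to 8, and 6 to 9 give the run 4 to 6 at depth 3, even though points 4 to 5 and point 6 are covered by different sets of three ranges.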
As I understood OP's question, the solution given the 3 ranges
A: 012
B: 123
C: 34
would be the range 12 (a common subset of A and B), not range 123 (because it isn't a common subset of any pair).
Think about the algorithm on paper before writing any code. How about a dynamic programming solution? (If you don't know dynamic programming, it's worth reading about it in a book). The idea of dynamic programming is to build up solutions of simpler subproblems.
Let $f_i(n, k)$ be the size of the longest interval starting at $n$ that is common to at least $k$ of the first $i$ given ranges.
You can work out $f_1$ from $f_0$, $f_2$ from $f_1$, and so on. Each update depends only on the one extra range being considered.
Suppose there are $M$ ranges. The values of $f_M$ tell us the answer to your problem.
The deepest depth you talked about is the greatest $k$ such that $f_M(n, k)$ is nonzero for some $n$. Let's call that maximal depth $K$. Then we look for the maximum of $f_M(n, K)$ over $n$. That maximum is the size of your largest range, which begins at the maximising $n$.
The maximising $n$ must be the lower bound of some range, so we only need to calculate $f$ for those values of $n$. There are $M$ ranges, so at most $M$ lower bounds. Thus, this algorithm has complexity $O(M^2 K)$.
Let the $i$th range be from $a$ to $b$.
If $n$ is outside $a$ to $b$, there is no change:
$f_i(n,k) = f_{i-1}(n,k)$
If $n$ is within $a$ to $b$, we test the $k$-deep solution made by combining the fresh interval with our old $(k-1)$-deep solution, and we only use it if it's better than what we already had. Here $f_{i-1}(n,0) = \infty$, since any interval is common to at least 0 ranges:
$f_i(n,k) = \max\bigl(f_{i-1}(n,k),\ \min(f_{i-1}(n,k-1),\ b-n+1)\bigr)$
Example! For ranges 0 to 5, 2 to 6, 4 to 8, and 6 to 9.
n          0123456789
           ......      range 0 to 5
f_1(n,1)   6543210000
             .....     range 2 to 6
f_2(n,1)   6554321000
f_2(n,2)   0043210000
               .....   range 4 to 8
f_3(n,1)   6554543210
f_3(n,2)   0043321000
f_3(n,3)   0000210000
                 ....  range 6 to 9
f_4(n,1)   6554544321
f_4(n,2)   0043323210
f_4(n,3)   0000211000
f_4(n,4)   0000000000
Thus the deepest depth K is 3, and the longest range at that depth is 4 to 5. We can also see that the longest range of depth 2 has size 4 and starts at 2.
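The recurrence can be checked with a short Python sketch that builds the whole table $f_i(n,k)$ one range at a time. It assumes inclusive (a, b) ranges inside 0..width-1 and $f_0(n, 0) = \infty$. For clarity it fills every n rather than only the lower bounds, and it does not implement the question's "return null on a tie" rule; overlap_table and deepest_range are hypothetical names.

```python
import math

def overlap_table(ranges, width):
    """f[k][n] = longest interval starting at n common to at least k ranges.

    ranges: inclusive (a, b) pairs within 0..width-1. Each range turns
    f_{i-1} into f_i using the recurrence above.
    """
    m = len(ranges)
    # f_0: depth 0 holds everywhere (infinite length); depths >= 1 are empty
    f = [[math.inf] * width] + [[0] * width for _ in range(m)]
    for a, b in ranges:
        old = [row[:] for row in f]  # snapshot of f_{i-1}
        for n in range(a, b + 1):    # positions outside [a, b] are unchanged
            for k in range(1, m + 1):
                f[k][n] = max(old[k][n], min(old[k - 1][n], b - n + 1))
    return f

def deepest_range(ranges, width):
    """Return (K, start, end) for the deepest depth K and its longest range."""
    f = overlap_table(ranges, width)
    K = max(k for k in range(1, len(ranges) + 1) if any(f[k]))
    start = max(range(width), key=lambda n: f[K][n])  # first maximising n
    return K, start, start + f[K][start] - 1
```

On the example ranges, the rows f[1] to f[4] reproduce the f_4 lines of the table, and deepest_range returns (3, 4, 5), i.e. depth 3 over 4 to 5.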
I'm running a little experiment to increase my knowledge, and I have reached a part where I feel I could really optimize it, but I'm not quite sure how.
I have many arrays of numbers (for simplicity, let's say each array holds the 4 numbers 1, 2, 3, and 4).
The target is to have all of the numbers in ascending order (i.e., 1-2-3-4), but the numbers are scrambled differently in each array.
A higher weight is placed upon larger numbers.
I need to sort all of these arrays by how close they are to the target.
I.e., 4-3-2-1 would be the worst possible case.
Some example cases:
3-4-2-1 is better than 4-3-2-1
2-3-4-1 is better than 1-4-3-2 (even though two numbers match in 1-4-3-2 (1 and 3), the biggest number is closer to its spot in 2-3-4-1.)
So the big numbers always take precedence over the smaller numbers. Here is my attempt:
var tmp = from m in moves
let mx = m.Max()
let ranking = m.IndexOf(s => s == mx)
orderby ranking descending
select m;
return tmp.ToArray();
P.S. IndexOf in the above example is an extension method I wrote that takes an array and an expression and returns the index of the element that satisfies the expression. It is needed because the real situation is a little more complicated; I'm simplifying it with my example.
The problem with my attempt, though, is that it only sorts by the biggest number and ignores all of the other numbers. It SHOULD rank by the biggest number first, then by the second largest, then by the third, and so on.
Also, since it will be doing this operation over and over again for several minutes, it should be as efficient as possible.
You could implement a bubble sort, and count the number of times you have to move data around. The number of data moves will be large on arrays that are far away from the sorted ideal.
int GetUnorderedness<T>(T[] data) where T : IComparable<T>
{
    data = (T[])data.Clone(); // don't modify the input data,
                              // we weren't asked to actually sort.
    int swapCount = 0;
    bool isSorted;
    do
    {
        isSorted = true;
        for(int i = 1; i < data.Length; i++)
        {
            if(data[i-1].CompareTo(data[i]) > 0)
            {
                // swap the adjacent out-of-order pair and count the move
                T temp = data[i];
                data[i] = data[i-1];
                data[i-1] = temp;
                swapCount++;
                isSorted = false;
            }
        }
    } while(!isSorted);
    return swapCount; // number of adjacent swaps = number of inversions
}
From your sample data, this will give slightly different results than you specified.
Some example cases:
3-4-2-1 is better than 4-3-2-1
2-3-4-1 is better than 1-4-3-2
3-4-2-1 will take 5 swaps to sort, 4-3-2-1 will take 6, so that works.
2-3-4-1 will take 3, 1-4-3-2 will also take 3, so this doesn't match up with your expected results.
This algorithm doesn't treat the largest number as the most important, which it seems you want; all numbers are treated equally. From your description, you'd consider 2-1-3-4 as much better than 1-2-4-3, because the first one has both the largest and second largest numbers in their proper place. This algorithm would consider those two equal, because each requires only 1 swap to sort the array.
This algorithm does have the advantage that it's not just a comparison algorithm: each input has a discrete output, so you only need to run the algorithm once for each input array.
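If you want a per-array score that does weight the larger numbers first, one possibility (my own suggestion, assuming each array is a permutation of 1..n as in the question's example) is to record, from the largest value down, how far each value sits from its target index, and sort by that tuple. Like the swap count, it is computed once per array. A Python sketch with the hypothetical name closeness_key:

```python
def closeness_key(arr):
    """Ranking key: smaller means closer to ascending order 1..n.

    Assumes arr is a permutation of 1..n. Values are examined from the
    largest down, so the largest value's displacement dominates the order.
    """
    position = {value: index for index, value in enumerate(arr)}
    # distance of each value from its target index (value - 1), largest first
    return tuple(abs(position[v] - (v - 1)) for v in sorted(arr, reverse=True))

arrays = [[4, 3, 2, 1], [3, 4, 2, 1], [1, 4, 3, 2], [2, 3, 4, 1]]
ranked = sorted(arrays, key=closeness_key)  # best first
```

With the question's examples, this ranks 2-3-4-1 ahead of 1-4-3-2 and 3-4-2-1 ahead of 4-3-2-1, and it rates 2-1-3-4 better than 1-2-4-3.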
I hope this helps
var i = 0;
var temp = (from m in moves select m).ToArray();
do
{
temp = (from m in temp
orderby m[i] descending
select m).ToArray();
}
while (++i < moves[0].Length);
This is fairly 'math-y' but I'm posting here because it's a Project Euler problem, & I have working code that presumably has bugs in it.
The question Determining longest repeating cycle in a decimal expansion solves the problem using logarithms, but I'm interested in solving it with simple brute force. More precisely, I'm interested in understanding why my algorithm and code are not returning the correct solution.
The algorithm is simple:
replicate a 'long division',
at each step record the divisor and the remainder
when a divisor / remainder tuple is repeated, infer that the decimal representation will repeat.
Here are private fields, as requested
private int numerator;
private int recurrence;
private int result;
private int resultRecurrence;
private List<dynamic> digits;
and here is the code:
private void Go()
{
foreach (var i in primes)
{
digits = new List<dynamic>();
numerator = 1;
recurrence = 0;
while (numerator != 0)
{
numerator *= 10;
// quotient
var q = numerator / i;
// remainder
var r = numerator % i;
digits.Add(new { Divisor = q, Remainder = r });
// if we've found a repetition then break out
var m = digits.Where(p => p.Divisor == q && p.Remainder == r).ToList();
if (m.Count > 1)
{
recurrence = digits.LastIndexOf(m[0]) - digits.IndexOf(m[0]);
break;
}
numerator = r;
}
if (recurrence > resultRecurrence)
{
resultRecurrence = recurrence;
result = i;
}
}
}
When testing integers < 10 and < 20 I get the correct result, and I correctly identify the value of i as well. However, the decimal representation that I get is incorrect: I calculate i-1, whereas the correct result is far less (something like i-250).
So presumably I either have a programming bug - which I can't find - or a logic bug.
I'm confused because it feels like a multiplicative group over p to me, in which there would be p-1 elements. I'm sure I'm missing something, can anyone provide suggestions?
edit
I'm not going to include my prime number code - it's not relevant, as I explain above I correctly identify the value of i (from memory it is 983) but I'm having problems getting the correct value for resultRecurrence.
Close.
For all primes except 2 and 5 (which divide 10), the sequence of remainders is formed by starting with 1 and transforming by
remainder = (10 * remainder) % prime
thus the k-th remainder is 10^k (mod prime), and the set of remainders forms a subgroup of the group of nonzero remainders modulo prime [1]. The length of the recurring cycle is the order of that subgroup, which is also known as the order of 10 modulo prime.
The order of the group of nonzero remainders modulo prime is prime-1, and there's Lagrange's theorem (from which Fermat's little theorem follows):
Let G be a finite group of order g and H be a subgroup of G. Then the order h of H divides g.
So the length of the cycle is always a divisor of prime-1, and sometimes it's prime-1, e.g. for 7 or 19.
[1] For composite numbers n coprime to 10, that would be the group of remainders modulo n that are coprime to n.
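A minimal Python sketch of this: the cycle length of 1/d is the multiplicative order of 10 modulo d, found by iterating remainder = (10 * remainder) % d until it returns to 1. It assumes gcd(d, 10) == 1 (otherwise the loop would never return to 1), so the search skips denominators divisible by 2 or 5; their period equals that of a smaller denominator with those factors removed. The function names are hypothetical.

```python
def cycle_length(d):
    """Order of 10 modulo d = length of the recurring cycle of 1/d.

    Assumes d > 1 and gcd(d, 10) == 1.
    """
    remainder, k = 1, 0
    while True:
        remainder = (10 * remainder) % d  # next remainder in the long division
        k += 1
        if remainder == 1:                # back to the start: the cycle closes
            return k

def longest_cycle_below(limit):
    """(d, cycle length) for the d < limit whose 1/d has the longest cycle."""
    best_d, best_len = 0, 0
    for d in range(2, limit):
        if d % 2 == 0 or d % 5 == 0:
            continue  # same period as a smaller, already-checked denominator
        length = cycle_length(d)
        if length > best_len:
            best_d, best_len = d, length
    return best_d, best_len
```

The cycle lengths it finds for primes are divisors of prime-1, as the theorem requires; 7 and 19 hit the maximum prime-1.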
First off, you don’t need the divisors, you only need the remainders.
Secondly, I would split the function into multiple independent parts instead of having everything in one big method: The long division / finding of the cycle length is independent of the rest (= finding the longest cycle).
Your break on Where coupled with Count is unintuitive. Why not just use a while loop with the condition (! digits.Contains(r))? (This would require putting 0 as a remainder into the digits list before the loop start.)
This leaves us with much cleaner code that should be straightforward to debug.
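A Python sketch of that restructuring, tracking only remainders. A dict maps each remainder to the step at which it first appeared, so when a remainder repeats, the cycle length is the distance between the two positions, and a remainder of 0 means the decimal terminates. Using a dict instead of a list plus Contains is my substitution; the function name is hypothetical.

```python
def recurring_cycle_length(d):
    """Length of the recurring cycle of 1/d by long division (0 if none)."""
    first_seen = {}   # remainder -> step at which it first appeared
    remainder, step = 1, 0
    while remainder != 0 and remainder not in first_seen:
        first_seen[remainder] = step
        remainder = (remainder * 10) % d  # next long-division remainder
        step += 1
    if remainder == 0:
        return 0      # terminating decimal, e.g. 1/8 = 0.125
    return step - first_seen[remainder]
```

Because it measures from the first occurrence of the repeated remainder, it also handles fractions such as 1/6 = 0.1(6) whose cycle does not start right after the decimal point.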
recurrence = digits.LastIndexOf(m[0]) - digits.IndexOf(m[0]);
Surely the value of resultRecurrence is always going to be i-1? For a fraction of the form 1/n, the decimal starts repeating exactly when the division in progress (the ith digit) gives the same quotient and remainder as the very first trial division (1, hence i-1).
(as a side note, may I introduce you to Math.DivRem).
I have a piece of code that in principle looks like the one below. The issue is that I am triggering this code tens of thousands of times and need it to be more optimized. Any suggestions would be welcome.
//This array is in reality enormous and needs to be triggered loads of times in my code
int[] someArray = { 1, 631, 632, 800, 801, 1600, 1601, 2211, 2212, 2601, 2602 };
//I need to know where in the array a certain value is located
//806 is located between entry 801 and 1600 so I want the array ID of 801 to be returned (4).
int id = 806;
//Since my arrays are very large, this operation takes far too long
for (int i = 0; i < someArray.Length; i++)
{
if (someArray[i] <= id)
return i;
}
Edit: Sorry, I got the condition wrong. It should return the index when 806 is greater than 801. Hope you can make sense out of it.
The array values look sorted. If that’s indeed the case, use binary search:
int result = Array.BinarySearch(someArray, id);
return result < 0 ? (~result - 1) : result;
If the searched value does not appear in the array, Array.BinarySearch will return the bitwise complement of the next greater value’s index. This is why I am testing for negative numbers and using the bitwise complement operator in the code above. The result should then be the same as in your code.
Binary search has logarithmic running time instead of linear. That is, in the worst case only about log2 n entries have to be examined instead of n (where n is the array’s size).
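For comparison, the same "last entry not greater than id" lookup in Python uses the standard bisect module: bisect_right returns the insertion point after any equal entries, so subtracting 1 gives the index of the last entry <= id (or -1 if id is smaller than every entry). The function name is hypothetical, and the array must already be sorted.

```python
import bisect

def index_of_floor(sorted_values, target):
    """Index of the last element <= target, or -1 if there is none."""
    return bisect.bisect_right(sorted_values, target) - 1

some_array = [1, 631, 632, 800, 801, 1600, 1601, 2211, 2212, 2601, 2602]
```

index_of_floor(some_array, 806) gives 4, the index of 801, matching the question's expected result.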
Provided someArray's content is sorted, use binary search; see also Array.BinarySearch.
Note: In your example the condition in if (someArray[i] <= id) return i; will trigger whenever id >= 1. I doubt that's what you want to do.
I am having difficulties solving this problem:
For a positive number n, define C(n) as the number of integers x for which 1 < x < n and x^3 ≡ 1 mod n.
When n = 91, there are 8 possible values for x, namely: 9, 16, 22, 29, 53, 74, 79, 81. Thus, C(91) = 8.
Find the sum of the positive numbers n <= 10^11 for which C(n) = 242.
My Code:
double intCount2 = 91;
double intHolder = 0;
for (int i = 0; i <= intCount2; i++)
{
if ((Math.Pow(i, 3) - 1) % intCount2 == 0)
{
if ((Math.Pow(i, 3) - 1) != 0)
{
Console.WriteLine(i);
intHolder += i;
}
}
}
Console.WriteLine("Answer = " + intHolder);
Console.ReadLine();
This works for 91 but when I put in any large number with a lot of 0's, it gives me a lot of answers I know are false. I think this is because it is so close to 0 that it just rounds to 0. Is there any way to see if something is precisely 0? Or is my logic wrong?
I know I need some optimization to get this to provide a timely answer but I am just trying to get it to produce correct answers.
Let me generalize your questions to two questions:
1) What specifically is wrong with this program?
2) How do I figure out where a problem is in a program?
Others have already answered the first part, but to sum up:
Problem #1: Math.Pow uses double-precision floating point numbers, which are only accurate to about 15 significant decimal digits. They are unsuitable for problems that require perfect accuracy with large integers. If you try to compute, say, 1000000000000000000 - 1 in doubles, you'll get 1000000000000000000, which is an accurate answer to 15 significant digits; that's all we guarantee. If you need a perfectly accurate answer when working with large numbers, use longs for results less than about 9 billion billion (long.MaxValue is about 9.2 × 10^18), or System.Numerics.BigInteger, which shipped with .NET Framework 4.
Problem #2: There are far more efficient ways to compute modular exponents that do not involve generating huge numbers; use them.
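The precision loss described in Problem #1 is easy to reproduce. Python's float is also an IEEE 754 double, while its int is arbitrary-precision, so a few lines show the difference (adjacent doubles near 10^18 are 128 apart, so subtracting 1 changes nothing):

```python
exact = 10**18 - 1               # integer arithmetic: exact
rounded = float(10**18) - 1.0    # double arithmetic: the -1 is lost

print(exact)         # 999999999999999999
print(int(rounded))  # 1000000000000000000
```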
However, what we've got here is a "give a man a fish" situation. What would be better is to teach you how to fish; learn how to debug a program using the debugger.
If I had to debug this program the first thing I would do is rewrite it so that every step along the way was stored in a local variable:
double intCount2 = 91;
double intHolder = 0;
for (int i = 0; i <= intCount2; i++)
{
double cube = Math.Pow(i, 3) - 1;
double remainder = cube % intCount2;
if (remainder == 0)
{
if (cube != 0)
{
Console.WriteLine(i);
intHolder += i;
}
}
}
Now step through it in the debugger with an example where you know the answer is wrong, and look for places where your assumptions are violated. If you do so, you'll quickly discover that 1000000 cubed minus 1 is not 999999999999999999, but rather 1000000000000000000.
So that's advice #1: write the code so that it is easy to step through in the debugger, and examine every step looking for the one that seems wrong.
Advice #2: Pay attention to quiet nagging doubts. When something looks dodgy or there's a bit you don't understand, investigate it until you do understand it.
Wikipedia has an article on Modular exponentiation that you may find informative. Python has it built in (the three-argument form of pow). In C#, System.Numerics.BigInteger.ModPow does the same for BigInteger values; otherwise you'll need to implement it yourself.
Don't compute powers modulo n using Math.Pow; you are likely to run into precision and overflow problems, among others. Instead, compute them from first principles. To compute the cube of an integer i modulo n, first reduce i modulo n to an integer j such that i is congruent to j modulo n and 0 <= j < n. Then iteratively multiply by j and reduce modulo n after each multiplication; to compute a cube you would perform this step twice. Of course, that's the naive approach; you can make it more efficient with the classic exponentiation by squaring algorithm.
Also, as far as efficiency goes, note that you are unnecessarily computing Math.Pow(i, 3) - 1 twice. At a minimum, replace
if ((Math.Pow(i, 3) - 1) % intCount2 == 0) {
if ((Math.Pow(i, 3) - 1) != 0) {
Console.WriteLine(i);
intHolder += i;
}
}
with
double cubed = Math.Pow(i, 3) - 1; // Math.Pow returns a double; an int would need an explicit cast
if((cubed % intCount2 == 0) && (cubed != 0)) {
Console.WriteLine(i);
intHolder += i;
}
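A minimal Python sketch of modular exponentiation by squaring, used to recompute C(91) from the problem statement without ever forming large powers. mod_pow mirrors what Python's built-in three-argument pow already does, and the names are illustrative. This only shows the exact-arithmetic approach; scanning every x for every n up to 10^11 this way would still be far too slow for the full problem.

```python
def mod_pow(base, exponent, modulus):
    """base ** exponent % modulus by repeated squaring, all values kept < modulus."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                       # current bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus         # square for the next bit
        exponent >>= 1
    return result

def cube_roots_of_one(n):
    """All x with 1 < x < n and x^3 == 1 (mod n)."""
    return [x for x in range(2, n) if mod_pow(x, 3, n) == 1]
```

cube_roots_of_one(91) returns the eight values listed in the problem statement, so C(91) = 8.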
Well, there's something missing or a typo...
"intHolder1" should presumably be "intHolder", and for intCount2=91 to result in 8, the increment line should be:
intHolder++;
I don't have a solution to your problem, but here's a piece of advice:
Don't use floating point numbers for calculations that only involve integers. Type int (Int32) is clearly not big enough for your needs, and a raw cube won't fit in long (Int64) either: with n up to 10^11, (10^11 - 1)^3 is close to 10^33, far beyond Int64.MaxValue (about 9.2 × 10^18). Even if you reduce modulo n after every multiplication, a product of two remainders can approach 10^22, so those products need System.Numerics.BigInteger (or a 128-bit multiply). Benefits of sticking to integers:
all the results of your calculations are exact, since there are no approximations due to the internal representation of doubles
wherever the values fit in 64 bits, long arithmetic is pretty efficient on a 64-bit processor
Don't use Math.Pow to calculate the cube of an integer... x*x*x is just as simple, and more efficient since it doesn't need a conversion to/from double. Anyway, I'm not very good at math, but you probably don't need to calculate x^3... check the links about modular exponentiation in other answers