I was wondering whether there's a known algorithm for doing the following, and also wondering how it would be implemented in C#. Maybe this is a known type of problem.
Example:
Suppose I have a class
class GoldMine
{
public int TonsOfGold { get; set; }
}
and a List of N=3 such items
var mines = new List<GoldMine>() {
new GoldMine() { TonsOfGold = 10 },
new GoldMine() { TonsOfGold = 12 },
new GoldMine() { TonsOfGold = 5 }
};
Then consolidating the mines into K=2 mines gives these consolidations
{ {mines[0],mines[1]}, {mines[2]} }, // { 22 tons, 5 tons }
{ {mines[0],mines[2]}, {mines[1]} }, // { 15 tons, 12 tons }
{ {mines[1],mines[2]}, {mines[0]} }  // { 17 tons, 10 tons }
and consolidating into K=1 mine gives the single consolidation
{ mines[0],mines[1],mines[2] } // { 27 tons }
What I'm interested in is the algorithm for the consolidation process.
If I'm not mistaken, the problem you're describing is Number of k-combinations for all k
I found a code snippet which I believe addresses your use case, but I just can't remember where I got it from. It must have been from Stack Overflow. If anyone recognizes this particular piece of code, please let me know and I'll make sure to credit it.
So here's the extension method:
public static class ListExtensions
{
public static List<ILookup<int, TItem>> GroupCombinations<TItem>(this List<TItem> items, int count)
{
var keys = Enumerable.Range(1, count).ToList();
var indices = new int[items.Count];
var maxIndex = items.Count - 1;
var nextIndex = maxIndex;
indices[maxIndex] = -1;
var groups = new List<ILookup<int, TItem>>();
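// The 'indices' array works like an odometer: indices[i] holds the key index assigned to items[i],
// and each pass of the loop below advances it to the next candidate assignment of items to groups.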
while (nextIndex >= 0)
{
indices[nextIndex]++;
if (indices[nextIndex] == keys.Count)
{
indices[nextIndex] = 0;
nextIndex--;
continue;
}
nextIndex = maxIndex;
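// Skip assignments that leave any of the requested groups empty.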
if (indices.Distinct().Count() != keys.Count)
{
continue;
}
var group = indices.Select((keyIndex, valueIndex) =>
new
{
Key = keys[keyIndex],
Value = items[valueIndex]
})
.ToLookup(x => x.Key, x => x.Value);
groups.Add(group);
}
return groups;
}
}
And a little utility method that prints the output:
public void PrintGoldmineCombinations(int count, List<GoldMine> mines)
{
Debug.WriteLine("count = " + count);
var groupNumber = 0;
foreach (var group in mines.GroupCombinations(count))
{
groupNumber++;
Debug.WriteLine("group " + groupNumber);
foreach (var set in group)
{
Debug.WriteLine(set.Key + ": " + set.Sum(m => m.TonsOfGold) + " tons of gold");
}
}
}
You would use it like so:
var mines = new List<GoldMine>
{
new GoldMine {TonsOfGold = 10},
new GoldMine {TonsOfGold = 12},
new GoldMine {TonsOfGold = 5}
};
PrintGoldmineCombinations(1, mines);
PrintGoldmineCombinations(2, mines);
PrintGoldmineCombinations(3, mines);
Which will produce the following output:
count = 1
group 1
1: 27 tons of gold
count = 2
group 1
1: 22 tons of gold
2: 5 tons of gold
group 2
1: 15 tons of gold
2: 12 tons of gold
group 3
1: 10 tons of gold
2: 17 tons of gold
group 4
2: 10 tons of gold
1: 17 tons of gold
group 5
2: 15 tons of gold
1: 12 tons of gold
group 6
2: 22 tons of gold
1: 5 tons of gold
count = 3
group 1
1: 10 tons of gold
2: 12 tons of gold
3: 5 tons of gold
group 2
1: 10 tons of gold
3: 12 tons of gold
2: 5 tons of gold
group 3
2: 10 tons of gold
1: 12 tons of gold
3: 5 tons of gold
group 4
2: 10 tons of gold
3: 12 tons of gold
1: 5 tons of gold
group 5
3: 10 tons of gold
1: 12 tons of gold
2: 5 tons of gold
group 6
3: 10 tons of gold
2: 12 tons of gold
1: 5 tons of gold
Note: this does not filter out duplicates that have the same set contents under different key labels, and I'm not sure whether you actually want those filtered out or not.
Is this what you need?
EDIT
Actually, looking at your comment, it seems you don't want the duplicates and you also want the lower values of k included. So here is a minor modification that takes out the duplicates (in a really ugly way, I apologize) and gives you the lower values of k per group:
public static List<ILookup<int, TItem>> GroupCombinations<TItem>(this List<TItem> items, int count)
{
var keys = Enumerable.Range(1, count).ToList();
var indices = new int[items.Count];
var maxIndex = items.Count - 1;
var nextIndex = maxIndex;
indices[maxIndex] = -1;
var groups = new List<ILookup<int, TItem>>();
while (nextIndex >= 0)
{
indices[nextIndex]++;
if (indices[nextIndex] == keys.Count)
{
indices[nextIndex] = 0;
nextIndex--;
continue;
}
nextIndex = maxIndex;
var group = indices.Select((keyIndex, valueIndex) =>
new
{
Key = keys[keyIndex],
Value = items[valueIndex]
})
.ToLookup(x => x.Key, x => x.Value);
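// Keep this assignment only if no previously kept group splits the items into exactly
// the same sets (ignoring which key labels them).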
if (!groups.Any(existingGroup => group.All(grouping1 => existingGroup.Any(grouping2 => grouping2.Count() == grouping1.Count() && grouping2.All(item => grouping1.Contains(item))))))
{
groups.Add(group);
}
}
return groups;
}
It produces the following output for k = 2:
group 1
1: 27 tons of gold
group 2
1: 22 tons of gold
2: 5 tons of gold
group 3
1: 15 tons of gold
2: 12 tons of gold
group 4
1: 10 tons of gold
2: 17 tons of gold
This is actually the problem of enumerating all K-partitions of a set of N objects, often described as enumerating the ways to place N labelled objects into K unlabelled, non-empty boxes.
As is almost always the case, the easiest way to solve a problem involving enumeration of unlabelled or unordered alternatives is to create a canonical ordering and then figure out how to generate only canonically-ordered solutions. In this case, we assume that the objects have some total ordering so that we can refer to them by integers between 1 and N, and then we place the objects in order into the partitions, and order the partitions by the index of the first object in each one. It's pretty easy to see that this ordering cannot produce duplicates and that every partitioning must correspond to some canonical ordering.
We can then represent a given canonical ordering by a sequence of N integers, where each integer is the number of the partition for the corresponding object. Not every sequence of N integers will work, however; we need to constrain the sequences so that the partitions are in the canonical order (sorted by the index of the first element). The constraint is simple: each element in the sequence must either be some integer which previously appeared in the sequence (an object placed into an already present partition) or it must be the index of the next partition, which is one more than the index of the last partition already present. In summary:
The first entry in the sequence must be 1 (because the first object can only be placed into the first partition); and
Each subsequent entry is at least 1 and no greater than one more than the largest entry preceding that point.
(These two criteria could be combined if we interpret "the largest entry preceding" the first entry as 0.)
That's not quite enough, since it doesn't restrict the number of partitions to exactly K. If we wanted to find all of the partitions, that would be fine, but if we want the partitions whose size is precisely K then the largest entry in the completed sequence must be exactly K. So no entry may exceed K, and because each remaining element can raise the running maximum by at most one, the running maximum after position i must be at least K−(N−i):
The element at position i must be no greater than K, and if the largest entry before position i is less than K+i−N then the element at position i must be at least K+i−N. (Elements may still be small once the maximum is already large enough: for N=3, K=2 the sequence 1,2,1 is a valid 2-partition even though its last element is not K.)
Generating sequences according to a simple set of constraints like the above can easily be done recursively. We start with an empty sequence, and then successively add each possible next element, calling this procedure recursively to fill in the entire sequence. As long as it is simple to produce the list of possible next elements, the recursive procedure will be straightforward. In this case, we need three pieces of information to produce this list: N, K, and the maximum value generated so far.
That leads to the following pseudo-code:
GenerateAllSequencesHelper(N, K, M, Prefix):
    if length(Prefix) is N:
        Prefix is a valid sequence; handle it
    else:
        # [See Note 1]
        low = 1
        if M < K + length(Prefix) + 1 - N:
            low = K + length(Prefix) + 1 - N
        for i from low up to min(M + 1, K):
            Append i to Prefix
            GenerateAllSequencesHelper(N, K, max(M, i), Prefix)
            Pop i off of Prefix

GenerateAllSequences(N, K):
    GenerateAllSequencesHelper(N, K, 0, [])
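For concreteness, here is a minimal C# sketch of that recursive generator (the class and method names are mine, not part of the pseudo-code above). Each emitted array assigns a partition number 1..K to the object at the same index:
using System;
using System.Collections.Generic;
using System.Linq;

static class KPartitionGenerator
{
    // Yields every canonical sequence of length n whose entries are partition
    // numbers 1..k and whose maximum is exactly k (restricted growth strings).
    public static IEnumerable<int[]> Sequences(int n, int k)
    {
        return Helper(n, k, 0, new List<int>(n));
    }

    private static IEnumerable<int[]> Helper(int n, int k, int maxSoFar, List<int> prefix)
    {
        if (prefix.Count == n)
        {
            yield return prefix.ToArray();
            yield break;
        }
        int pos = prefix.Count + 1;                              // 1-based position being filled
        int low = maxSoFar >= k + pos - n ? 1 : k + pos - n;     // keep the maximum K reachable
        int high = Math.Min(maxSoFar + 1, k);                    // canonical upper bound
        for (int v = low; v <= high; v++)
        {
            prefix.Add(v);
            foreach (var seq in Helper(n, k, Math.Max(maxSoFar, v), prefix))
                yield return seq;
            prefix.RemoveAt(prefix.Count - 1);                   // backtrack
        }
    }
}

// Usage with the gold mines from the question (K = 2):
// foreach (var seq in KPartitionGenerator.Sequences(mines.Count, 2))
// {
//     var groups = mines.Select((m, i) => new { m, part = seq[i] })
//                       .GroupBy(x => x.part, x => x.m);
//     Console.WriteLine(string.Join(", ",
//         groups.Select(g => g.Sum(m => m.TonsOfGold) + " tons")));
// }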
Since the recursion depth will be extremely limited for any practical application of this procedure, the recursive solution should be fine. However, it is also quite simple to produce an iterative solution even without using a stack. This is an instance of a standard enumeration algorithm for constrained sequences:
Start with the lexicographically smallest possible sequence
While possible:
Scan backwards to find the last element which could be increased. ("Could be" means that increasing that element would still result in the prefix of some valid sequence.)
Increment that element to the next largest possible value
Fill in the rest of the sequence with the smallest possible suffix.
In the iterative algorithm, the backwards scan might involve checking O(N) elements, which might seem to make it slower than the recursive algorithm. However, in most cases they will have the same computational complexity, because in the recursive algorithm each generated sequence also incurs the cost of the recursive calls and returns required to reach it. If each (or, at least, most) recursive calls produce more than one alternative, the recursive algorithm will still be O(1) per generated sequence.
But in this case, it is likely that the iterative algorithm will also be O(1) per generated sequence, as long as the scan step can be performed in O(1); that is, as long as it can be performed without examining the entire sequence.
In this particular case, computing the maximum value of the sequence up to a given point is not O(1), but we can produce an O(1) iterative algorithm by also maintaining the vector of cumulative maxima. (In effect, this vector corresponds to the stack of M arguments in the recursive procedure above.)
It's easy enough to maintain the M vector; once we have it, we can easily identify "incrementable" elements in the sequence: element i is incrementable if i>0, M[i] is equal to M[i−1] (so raising it keeps the canonical bound), and the element itself is less than K. [Note 2]
Notes
1. If we wanted to produce all partitions, we would replace the for loop above with the rather simpler:
   for i from 1 to M+1:
2. This answer is largely based on this answer, but that question asked for all partitions; here, you want to generate the K-partitions. As indicated, the algorithms are very similar.
Related
C# why does binarysearch have to be made on sorted arrays and lists?
Is there any other method that does not require me to sort the list?
Sorting the list interferes with my program; I can't sort it and still have it work the way I want.
A binary search works by repeatedly dividing the sorted list of candidates in half using comparisons. Imagine the following set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can also represent this as a binary tree, to make it easier to visualise:
              8
      4               12
  2       6       10      14
1   3   5   7   9   11  13  15
Now, say we want to find the number 3. We can do it like so:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3.
Is 3 smaller than 2? No. OK, now we're looking at 3.
We found it!
Now, if your list isn't sorted, how will we divide the list in half? The simple answer is: we can't. If we swap 3 and 15 in the example above, it would work like this:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3 (except we swapped it with 15).
Is 3 smaller than 2? No. OK, now we're looking at 15.
Huh? There's no more items to check but we didn't find it. I guess it's not in the list.
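You can see the same failure with List<T>.BinarySearch (a small illustrative sketch, using the swapped list from above):
var numbers = new List<int> { 1, 2, 15, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 3 }; // 3 and 15 swapped
int notFound = numbers.BinarySearch(3); // negative: the probe path lands on 15, so 3 looks "missing"
numbers.Sort();                         // restore the sorted precondition
int index = numbers.BinarySearch(3);    // 2: found correctly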
The solution is to use an appropriate data type instead. For fast lookups of key/value pairs, use a Dictionary. For fast checks of whether something already exists, use a HashSet. For general storage, use a List or an array.
Dictionary example:
var values = new Dictionary<int, string>();
values[1] = "hello";
values[2] = "goodbye";
var value2 = values[2]; // this lookup will be fast because a Dictionary internally partitions keys' hash codes into buckets.
HashSet example:
var mySet = new HashSet<int>();
mySet.Add(1);
mySet.Add(2);
if (mySet.Contains(2)) // this lookup is fast for the same reason as a dictionary.
{
// do something
}
List example:
var list = new List<int>();
list.Add(1);
list.Add(2);
if (list.Contains(2)) // this isn't fast because it has to visit each item in the list, but it works OK for small sets or places where performance isn't so important
{
}
var idx2 = list.IndexOf(2);
If you have multiple values with the same key, you could store a list in a Dictionary like this:
var values = new Dictionary<int, List<string>>();
if (!values.ContainsKey(key))
{
values[key] = new List<string>();
}
values[key].Add("value1");
values[key].Add("value2");
There is no way to use binary search on unordered collections. A sorted collection is the whole premise of binary search. The key is that on every step you take the middle index m between l and r. On the first step they are 0 and size - 1, and after every step one of them becomes that middle index: if x > arr[m] then l becomes m + 1, otherwise r becomes m - 1. Basically, on every step you keep half of the array you had, and of course it remains sorted. This code is recursive; if you don't know what recursion is (which is very important in programming), you can review and learn it here.
// C# implementation of recursive Binary Search
using System;
class GFG {
// Returns index of x if it is present in
// arr[l..r], else return -1
static int binarySearch(int[] arr, int l,
int r, int x)
{
if (r >= l) {
int mid = l + (r - l) / 2;
// If the element is present at the
// middle itself
if (arr[mid] == x)
return mid;
// If element is smaller than mid, then
// it can only be present in left subarray
if (arr[mid] > x)
return binarySearch(arr, l, mid - 1, x);
// Else the element can only be present
// in right subarray
return binarySearch(arr, mid + 1, r, x);
}
// We reach here when element is not present
// in array
return -1;
}
// Driver method to test above
public static void Main()
{
int[] arr = { 2, 3, 4, 10, 40 };
int n = arr.Length;
int x = 10;
int result = binarySearch(arr, 0, n - 1, x);
if (result == -1)
Console.WriteLine("Element not present");
else
Console.WriteLine("Element found at index "
+ result);
}
}
Output:
Element found at index 3
Sure there is.
var list = new List<int>();
list.Add(42);
list.Add(1);
list.Add(54);
var index = list.IndexOf(1); //TADA!!!!
EDIT: Ok, I hoped the irony was obvious. But strictly speaking, if your array is not sorted, you are pretty much stuck with the linear search, readily available by means of IndexOf() or IEnumerable.First().
We have a number of payments (Transaction) that come into our business each day. Each Transaction has an ID and an Amount. We have the requirement to match a number of these transactions to a specific amount. Example:
Transaction Amount
1 100
2 200
3 300
4 400
5 500
If we wanted to find the transactions that add up to 600, you would have a number of sets: (1,2,3), (2,4), (1,5).
I found an algorithm that I have adapted, which works as defined below. For 30 transactions it takes 15 ms, but the number of transactions averages around 740 and can reach close to 6000. Is there a more efficient way to perform this search?
sum_up(TransactionList, remittanceValue, ref MatchedLists);
private static void sum_up(List<Transaction> transactions, decimal target, ref List<List<Transaction>> matchedLists)
{
sum_up_recursive(transactions, target, new List<Transaction>(), ref matchedLists);
}
private static void sum_up_recursive(List<Transaction> transactions, decimal target, List<Transaction> partial, ref List<List<Transaction>> matchedLists)
{
decimal s = 0;
foreach (Transaction x in partial) s += x.Amount;
if (s == target)
{
matchedLists.Add(partial);
}
if (s > target)
return;
for (int i = 0; i < transactions.Count; i++)
{
List<Transaction> remaining = new List<Transaction>();
Transaction n = new Transaction(0, transactions[i].ID, transactions[i].Amount);
for (int j = i + 1; j < transactions.Count; j++) remaining.Add(transactions[j]);
List<Transaction> partial_rec = new List<Transaction>(partial);
partial_rec.Add(new Transaction(n.MatchNumber, n.ID, n.Amount));
sum_up_recursive(remaining, target, partial_rec, ref matchedLists);
}
}
With Transaction defined as:
class Transaction
{
public int ID;
public decimal Amount;
public int MatchNumber;
public Transaction(int matchNumber, int id, decimal amount)
{
ID = id;
Amount = amount;
MatchNumber = matchNumber;
}
}
As already mentioned, your problem can be solved by a pseudo-polynomial algorithm in O(n*G), with n the number of items and G your target sum.
The first part of the question is: can the target sum G be reached at all? The following pseudo/Python code solves it (I have no C# on my machine):
def subsum(values, target):
    reached = [False] * (target + 1)  # initially no sums are reachable at all
    reached[0] = True                 # with 0 elements we can only achieve the sum 0
    for val in values:
        for s in reversed(range(target + 1)):  # for target, target-1, ..., 0
            if reached[s] and s + val <= target:  # if subsum s is reachable, adding val reaches a new sum
                reached[s + val] = True
    return reached[target]
What is the idea? Let's consider values [1,2,3,6] and target sum 7:
We start with an empty set - the possible sum is obviously 0.
Now we look at the first element 1 and have two options: take it or not. That leaves us with possible sums {0,1}.
Now looking at the next element 2: leads to possible sets {0,1} (not taking)+{2,3} (taking).
Until now there is not much difference to your approach, but now for element 3 we have the possible sets a. for not taking {0,1,2,3} and b. for taking {3,4,5,6}, resulting in {0,1,2,3,4,5,6} as possible sums. The difference to your approach is that there are two ways to get to 3, and your recursion will be started twice from there (which is not needed). Calculating basically the same stuff over and over again is the problem with your approach and why the proposed algorithm is better.
As last step we consider 6 and get {0,1,2,3,4,5,6,7} as possible sums.
But you also need the subset which leads to the target sum; for this we just remember which element was taken to achieve each subsum. This version returns a subset which results in the target sum, or None otherwise:
def subsum(values, target):
    reached = [False] * (target + 1)
    val_ids = [-1] * (target + 1)
    reached[0] = True  # with 0 elements we can only achieve the sum 0
    for (val_id, val) in enumerate(values):
        for s in reversed(range(target + 1)):  # for target, target-1, ..., 0
            if reached[s] and s + val <= target:
                reached[s + val] = True
                val_ids[s + val] = val_id
    # reconstruct the subset for target:
    if not reached[target]:
        return None  # not possible
    else:
        result = []
        current = target
        while current != 0:  # walk backwards from predecessor to predecessor
            val_id = val_ids[current]
            result.append(val_id)
            current -= values[val_id]
        return result
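If it helps, here is a rough C# translation of that second version (my sketch, not the answerer's code; it assumes the amounts have been scaled to positive integers, e.g. whole cents):
using System.Collections.Generic;

// Returns the indices of one subset of 'values' that sums to 'target', or null if none exists.
// Assumes positive amounts (otherwise the walk-back below could loop).
static List<int> SubsetSumIndices(int[] values, int target)
{
    var reached = new bool[target + 1];
    var valIds = new int[target + 1];
    for (int i = 0; i <= target; i++) valIds[i] = -1;
    reached[0] = true;                          // the empty subset sums to 0

    for (int valId = 0; valId < values.Length; valId++)
    {
        int val = values[valId];
        for (int s = target; s >= 0; s--)       // descend so each value is used at most once
        {
            if (reached[s] && s + val <= target)
            {
                reached[s + val] = true;
                valIds[s + val] = valId;
            }
        }
    }

    if (!reached[target]) return null;          // target sum is not achievable

    var result = new List<int>();
    int current = target;
    while (current != 0)                        // walk back through the recorded items
    {
        int valId = valIds[current];
        result.Add(valId);
        current -= values[valId];
    }
    return result;
}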
As another approach you could use memoization to speed up your current solution, remembering for each state (subsum, number of elements not yet considered) whether the target sum can still be achieved. But I would say standard dynamic programming is the less error-prone option here.
Yes.
I can't provide full code at the moment, but instead of iterating each list of transactions twice until finding matches (O(n²)), try this concept:
Set up a hashtable with the existing transaction amounts as entries, as well as the sum of each pair of transactions, assuming each value is made up of at most two transactions (weekend credit card processing).
For each total, look it up in the hashtable; the sets of transactions in that slot are the list of matching transactions.
Instead of O(n²), you can get it down to about 4·n operations, which would make a noticeable difference in speed.
Good luck!
Dynamic programming can solve this problem efficiently:
Assume you have n transactions and the target amount is m.
We can solve it in just O(n*m) time.
Learn about it under the knapsack problem.
For this problem, define dp[i][sum] as the number of subsets of the first i transactions that add up to sum.
The recurrence:
for i = 1 to n:
    dp[i][sum] = dp[i - 1][sum] + dp[i - 1][sum - amount_i]
dp[n][target] is the count you need, and you need to add some tricks to recover what all the subsets are.
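As a hedged sketch of just the counting part (assuming positive integer amounts), the table can be rolled into one dimension:
// Counts the subsets of 'amounts' that sum exactly to 'target'.
static long CountSubsets(int[] amounts, int target)
{
    var dp = new long[target + 1];
    dp[0] = 1;                              // one way to make 0: the empty subset
    foreach (int a in amounts)
        for (int s = target; s >= a; s--)   // descend so each amount is used at most once
            dp[s] += dp[s - a];             // dp[i][s] = dp[i-1][s] + dp[i-1][s - a]
    return dp[target];
}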
You have a couple of practical assumptions here that would make brute force with smartish branch pruning feasible:
items are unique, hence you wouldn't be getting combinatorial blow up of valid subsets (i.e. (1,1,1,1,1,1,1,1,1,1,1,1,1) adding up to 3)
if the number of resulting feasible sets is still huge, you would run out of memory collecting them before running into total runtime issues.
ordering input ascending would allow for an easy early stop check - if your remaining sum is smaller than the current element, then none of the yet unexamined items could possibly be in a result (as current and subsequent items would only get bigger)
keeping running sums would speed up each step, as you wouldn't be recalculating it over and over again
Here's a bit of code:
public static List<T[]> SubsetSums<T>(T[] items, int target, Func<T, int> amountGetter)
{
Stack<T> unusedItems = new Stack<T>(items.OrderByDescending(amountGetter));
Stack<T> usedItems = new Stack<T>();
List<T[]> results = new List<T[]>();
SubsetSumsRec(unusedItems, usedItems, target, results, amountGetter);
return results;
}
public static void SubsetSumsRec<T>(Stack<T> unusedItems, Stack<T> usedItems, int targetSum, List<T[]> results, Func<T,int> amountGetter)
{
if (targetSum == 0)
results.Add(usedItems.ToArray());
if (targetSum < 0 || unusedItems.Count == 0)
return;
var item = unusedItems.Pop();
int currentAmount = amountGetter(item);
if (targetSum >= currentAmount)
{
// case 1: use current element
usedItems.Push(item);
SubsetSumsRec(unusedItems, usedItems, targetSum - currentAmount, results, amountGetter);
usedItems.Pop();
// case 2: skip current element
SubsetSumsRec(unusedItems, usedItems, targetSum, results, amountGetter);
}
unusedItems.Push(item);
}
I've run it against 100k input that yields around 1k results in under 25 millis, so it should be able to handle your 740 case with ease.
I have three kinds of numerical ranges which are defined within some interval:
1. continuous range (any value within the specified interval)
2. periodical sequence (sequence start, step and step count are specified)
3. a set of exact values (like 1, 3, 7, etc.)
I need to union/intersect them (from 2 to N ranges of different types) and get optimized results. Obviously, intersecting the above will return a result of one of the types above, and unioning them will result in 1 to M ranges of the types above.
Example 1:
The 1st range is defined as a continuous range from 5 to 11 and the 2nd is a periodical sequence from 2 to 18 with step 2 (thus, 8 steps).
Intersection will return a periodic sequence from 6 to 10 with step 2.
Union will return three results: a periodic sequence from 2 to 4 with step 2, a continuous range from 5 to 11, and a periodic sequence from 12 to 18 with step 2.
Example 2:
The 1st range is defined as a periodic sequence from 0 to 10 with step 2 and the 2nd is a periodical sequence from 1 to 7 with step 2 (thus, 3 steps).
Intersection will return null as they don't intersect.
Union will return two results: a periodic sequence from 0 to 8 with step 1 (note: optimized result) and the exact value 10.
Hope I didn't make mistakes :)
Well, these operations over these kinds of sequences shouldn't be too complex, and I hope there is a library for things like this. Please advise (I will use it in C#/.NET).
Thanks!
UPDATE
Answering to "how do i think i use the library". All three types above can be easily defined in programming language as:
1. Contionous: { decimal Start; decimal End; } where Start is the beginning of range and End is the end
2. Periodical: { decimal Start; decimal Step; int Count; } where Start is the beginning of sequence, Step is increment and Count is steps count
3. Set of exact numbers: { decimal[] Values; } where Values is array of some decimals.
When i make an intersection of any two ranges of any types listed above then i'll certainly get result of one of that types, e.g. "continous" and "periodical" ranges intersection will result to "periodical", "cont." and "set of exact" will result to set of exact, "cont." and "exact" will return exacts as well. Intersection of the same types will return result of the input types.
Union of 2 ranges is a bit more complex but will anyway return 2 to 3 ranges defined in types above as well.
Having intersection/unioning functions i'll always be able run it over 2 to N ranges and will get results in the input types terms.
First, following other answers before: I don't believe there's a standard library for this: it looks very much as a special case.
Second, the problem/requirement is not stated sufficiently clearly. I'll follow the problem introduction and refer to the 3 different types as "set", "periodic", "continuous".
Consider two sets, {1,4,5,6} and {4,5,6,8}. Their intersection is {4,5,6}. Do we have to label that as "periodic" because that description fits the case, or as "set" because it is an intersection of sets?
From this, more in general: do we need to change a "set" label into "periodic" as soon as its contents are periodic? After all, a "periodic" is a special case of a "set".
Likewise, consider the union of a "periodic" {4,6,8} and a set {10,15,16}. Do we have to define the result as a periodic {4,6,8,10} plus a periodic {15,16}, or rather as one set with all the values, or yet another variety?
And how about the degenerate cases: is {3} a "set", a "periodic", or even a "continuous"? What type is the intersection of "continuous" {1-4} and "set" {4,7,8}? And the intersection of continuous {1-4} and continuous {4-7}?
And so forth: it has to be clear how any result - from union and/or intersection - must be labeled/described, whether - for instance - the intersection or union of two discrete types (non-continuous) is always ONE type (usually a set, possibly a periodic), or rather always a series of periodics, or ...
Third, assuming that the above questions have been taken care of, I believe you may approach the implementation with the following guidelines:
consider only two "types", namely "continuous" and "set". The "periodic" is just a special form of a "set", and at any moment we can, if and where appropriate, label a "set" as "periodic".
define a common baseclass, and use appropriate derived (overloaded) methods for union, intersection, and whatever you may need, to produce - for instance - a list(of baseType) as result.
The case-by-case implementations are basically quite simple - assuming that my introductory questions have been answered. Wherever the initial problem appears somewhat "difficult", it is only because the problem and specifications are not yet very well defined.
That all sounds pretty custom - so I don't think you'll find a library that does all of that off the peg - but there's nothing too hard there.
I'll start you off. Works fine for the first example, but the method Range.ConstructAppropriate will need to be particularly complicated to return "a periodic sequence from 2 to 4 with step 2, a continuous range from 5 to 11 and a periodic sequence from 12 to 18 with step 2" from the set { 5, 6, 7, 8, 9, 10, 11, 2, 4, 12, 14, 16, 18 } which results from the union of the two sets in example 1.
Also (after going to the trouble of wiring all that code) I note from the comments you have a requirement for non-integer numbers. This complicates things. I imagine you could come up with a creative way to apply this pattern.
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
static void Main(string[] args)
{
// Example 1:
// 1st range is defined as continous range from 5 to 11 and 2nd is periodical sequence from 2 to 18 with step 2 (thus, 8 steps).
// Intersection will return a periodic sequence from 6 to 10 with step 2.example 1
ContinuousRange ex1_r1 = new ContinuousRange(5, 11);
PeriodicRange ex1_r2 = new PeriodicRange(2, 2, 8);
IEnumerable<int> ex1_intersection = ex1_r1.Values.Intersect(ex1_r2.Values);
IList<Range> ex1_intersection_results = Range.ConstructAppropriate(ex1_intersection);
}
}
abstract class Range
{
public abstract IEnumerable<int> Values { get; }
public static IList<Range> ConstructAppropriate(IEnumerable<int> values)
{
var results = new List<Range>();
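// Try both structured representations; fall back to a plain set of points only if neither fits.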
results.Add(ContinuousRange.ConstructFrom(values));
results.Add(PeriodicRange.ConstructFrom(values));
if (!results.Any(r => r != null))
results.Add(Points.ConstructFrom(values));
return results.Where(r => r != null).ToList();
}
}
class ContinuousRange : Range
{
int start, endInclusive;
public ContinuousRange(int start, int endInclusive)
{
this.start = start;
this.endInclusive = endInclusive;
}
public override IEnumerable<int> Values
{
get
{
return Enumerable.Range(start, endInclusive - start + 1);
}
}
internal static ContinuousRange ConstructFrom(IEnumerable<int> values)
{
int[] sorted = values.OrderBy(v => v).ToArray();
if (sorted.Length <= 1)
return null;
for (int i = 1; i < sorted.Length; i++)
{
if (sorted[i] - sorted[i - 1] != 1)
return null;
}
return new ContinuousRange(sorted.First(), sorted.Last());
}
}
class PeriodicRange : Range
{
int start, step, count;
public PeriodicRange(int start, int step, int count)
{
this.start = start;
this.step = step;
this.count = count;
}
public override IEnumerable<int> Values
{
get
{
var nums = new List<int>();
int i = start;
int cur = 0;
while (cur <= count)
{
nums.Add(i);
i += step;
cur++;
}
return nums;
}
}
internal static Range ConstructFrom(IEnumerable<int> values)
{
// check the difference is the same between all values
if (values.Count() < 2)
return null;
var sorted = values.OrderBy(a => a).ToArray();
int step = sorted[1] - sorted[0];
for (int i = 2; i < sorted.Length; i++)
{
if (step != sorted[i] - sorted[i - 1])
return null;
}
return new PeriodicRange(sorted[0], step, sorted.Length - 1);
}
}
class Points : Range
{
int[] nums;
public Points(params int[] nums)
{
this.nums = nums;
}
public override IEnumerable<int> Values
{
get { return nums; }
}
internal static Range ConstructFrom(IEnumerable<int> values)
{
return new Points(values.ToArray());
}
}
This is the problem I'm solving (it's a sample problem, not a real problem):
Given N numbers , [N<=10^5] we need to count the total pairs of
numbers that have a difference of K. [K>0 and K<1e9]
Input Format: 1st line contains N & K (integers). 2nd line contains N
numbers of the set. All the N numbers are assured to be distinct.
Output Format: One integer saying the no of pairs of numbers that have
a diff K.
Sample Input #00:
5 2
1 5 3 4 2
Sample Output #00:
3
Sample Input #01:
10 1
363374326 364147530 61825163 1073065718 1281246024 1399469912 428047635 491595254 879792181 1069262793
Sample Output #01:
0
I already have a solution (and I haven't been able to optimize it as well as I had hoped). Currently my solution gets a score of 12/15 when it is run, and I'm wondering why I can't get 15/15 (my solution to another problem wasn't nearly as efficient, but got all of the points). Apparently, the code is run using "Mono 2.10.1, C# 4".
So can anyone think of a better way to optimize this further? The VS profiler says to avoid calling String.Split and Int32.Parse. The calls to Int32.Parse can't be avoided, although I guess I could optimize tokenizing the array.
My current solution:
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
namespace KDifference
{
class Solution
{
static void Main(string[] args)
{
char[] space = { ' ' };
string[] NK = Console.ReadLine().Split(space);
int N = Int32.Parse(NK[0]), K = Int32.Parse(NK[1]);
int[] nums = Console.ReadLine().Split(space, N).Select(x => Int32.Parse(x)).OrderBy(x => x).ToArray();
int KHits = 0;
for (int i = nums.Length - 1, j, k; i >= 1; i--)
{
for (j = 0; j < i; j++)
{
k = nums[i] - nums[j];
if (k == K)
{
KHits++;
}
else if (k < K)
{
break;
}
}
}
Console.Write(KHits);
}
}
}
Your algorithm is still O(n^2), even with the sorting and the early-out. And even if you eliminated the O(n^2) bit, the sort is still O(n lg n). You can use an O(n) algorithm to solve this problem. Here's one way to do it:
Suppose the set you have is S1 = { 1, 7, 4, 6, 3 } and the difference is 2.
Construct the set S2 = { 1 + 2, 7 + 2, 4 + 2, 6 + 2, 3 + 2 } = { 3, 9, 6, 8, 5 }.
The answer you seek is the cardinality of the intersection of S1 and S2. The intersection is {6, 3}, which has two elements, so the answer is 2.
You can implement this solution in a single line of code, provided that you have sequence of integers sequence, and integer difference:
int result = sequence.Intersect(from item in sequence select item + difference).Count();
The Intersect method will build an efficient hash table for you that is O(n) to determine the intersection.
Try this (note, untested):
Sort the array
Start two indexes at 0
If difference between the numbers at those two positions is equal to K, increase count, and increase one of the two indexes (if numbers aren't duplicated, increase both)
If difference is larger than K, increase index #1
If difference is less than K, increase index #2, if that would place it outside the array, you're done
Otherwise, go back to 3 and keep going
Basically, try to keep the two indexes apart by K value difference.
You should write up a series of unit-tests for your algorithm, and try to come up with edge cases.
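Here's a rough C# sketch of those steps (my interpretation, only lightly checked against the samples; it relies on the values being distinct, as the problem guarantees):
using System;

// Counts pairs in 'nums' whose difference is exactly k, using the two-index walk above.
static int CountPairsWithDiffK(int[] nums, int k)
{
    Array.Sort(nums);                            // step 1: sort
    int count = 0, i = 0, j = 1;                 // step 2: two indexes near the start
    while (j < nums.Length)
    {
        long diff = (long)nums[j] - nums[i];
        if (diff == k) { count++; i++; j++; }    // found a pair; values are distinct, so move both
        else if (diff > k) { i++; }              // gap too wide: advance the lower index
        else { j++; }                            // gap too small: advance the upper index
        if (i == j) j++;                         // keep the two indexes apart
    }
    return count;
}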
This would allow you to do it in a single pass. Using hash sets is beneficial if there are many values to parse/check. You might also want to use a bloom filter in combination with hash sets to reduce lookups.
Initialize. Let A and B be two empty hash sets. Let c be zero.
Parse loop. Parse the next value v. If there are no more values the algorithm is done and the result is in c.
Back check. If v exists in A then increment c and jump back to 2.
Low match. If v - K > 0 then:
insert v - K into A
if v - K exists in B then increment c (and optionally remove v - K from B).
High match. If v + K < 1e9 then:
insert v + K into A
if v + K exists in B then increment c (and optionally remove v + K from B).
Remember. Insert v into B.
Jump back to 2.
// PHP solution for this K-difference problem
function getEqualSumSubstring($l, $s) {
    // $l is "N K", $s is the space-separated list of numbers
    $lParts = explode(' ', trim($l));
    $k = (int)$lParts[1];
    $array1 = array_map('intval', explode(' ', trim($s)));
    $array2 = array();
    foreach ($array1 as $v) {
        $array2[] = $v + $k;
    }
    return count(array_intersect($array1, $array2));
}
echo getEqualSumSubstring("5 2", "1 3 5 4 2");
Actually that's trivial to solve with a hashmap:
First put each number into a hashmap: dict((x, x) for x in numbers) in "pythony" pseudo code ;)
Now you just iterate through every number in the hashmap and check if number + K is in the hashmap. If yes, increase count by one.
The obvious improvement to the naive solution is to ONLY check for the higher (or lower) bound, otherwise you get the double results and have to divide by 2 afterwards - useless.
This is O(N) for creating the hashmap when reading the values in and O(N) when iterating through, i.e. O(N) overall and about 8 LOC in Python (and it is correct, I just solved it ;-) )
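In C# the same idea is a couple of lines (a sketch; numbers and K stand for the parsed input, and long avoids overflow when x + K exceeds the int range):
var seen = new HashSet<long>(numbers.Select(n => (long)n));
// Only look "upwards" (x + K) so each pair is counted once.
int count = numbers.Count(x => seen.Contains((long)x + K));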
Following Eric's answer, here is the implementation of the Intersect method (it is O(n)):
private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource current in second)
{
set.Add(current);
}
foreach (TSource current2 in first)
{
if (set.Remove(current2))
{
yield return current2;
}
}
yield break;
}
I have a site where users can post and vote on suggestions. On the front page I initially list 10 suggestions and the header fetches a new random suggestion every 7 seconds.
I want the votes to influence the probability a suggestion will show up, both on the 10-suggestion list and in the header-suggestion. To that end I have a small algorithm to calculate popularity, taking into account votes, age and a couple other things (needs lots of tweaking).
Anyway, after running the algorithm I have a dictionary of suggestions and popularity index, sorted by popularity:
{ S = Suggestion1, P = 0.86 }
{ S = Suggestion2, P = 0.643 }
{ S = Suggestion3, P = 0.134 }
{ S = Suggestion4, P = 0.07 }
{ S = Suggestion5, P = 0.0 }
{ . . .}
I don't want this to be a glorified sort, so I'd like to introduce some random element to the selection process.
In short, I'd like the popularity to be the probability a suggestion gets picked out of the list.
Having a full list of suggestion/popularity, how do I go about picking 10 out based on probabilities? How can I apply the same to the looping header suggestion?
I'm afraid I don't know how to do this very fast, but if you have the collection in memory you can do it like this:
Note that you do not need to sort the list for this algorithm to work.
First sum up all the probabilities (if the probability is linked to popularity, just sum the popularity numbers, where I assume higher values means higher probability)
Calculate a random number in the range of 0 up to but not including that sum
Start at one end of the list and iterate through it
For each element, if the random number you generated is less than the popularity, pick that element
If not, subtract the popularity of the element from the random number, and continue to the next
If the list is static, you could build ranges and do some binary searches, but if the list keeps changing, then I don't know a better way.
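A small sketch of those steps in C# (the names are mine):
using System;
using System.Collections.Generic;
using System.Linq;

static class WeightedPicker
{
    // Picks one item with probability proportional to its weight (popularity).
    public static T Pick<T>(IList<T> items, Func<T, double> weight, Random rng)
    {
        double total = items.Sum(weight);
        double r = rng.NextDouble() * total;      // uniform in [0, total)
        foreach (var item in items)
        {
            double w = weight(item);
            if (r < w) return item;               // r landed inside this item's slice
            r -= w;
        }
        return items[items.Count - 1];            // guard against floating-point drift
    }
}
To fill the 10-suggestion list you could call this repeatedly, removing each picked suggestion from the candidates so it can't be chosen twice.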
Here is a sample LINQPad program that demonstrates:
void Main()
{
var list = Enumerable.Range(1, 9)
.Select(i => new { V = i, P = i })
.ToArray();
list.Dump("list");
var sum =
(from element in list
select element.P).Sum();
Dictionary<int, int> selected = new Dictionary<int, int>();
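// Sweep every value in [0, sum) deterministically instead of drawing random numbers,
// so each element ends up selected exactly in proportion to its P.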
foreach (var value in Enumerable.Range(0, sum))
{
var temp = value;
var v = 0;
foreach (var element in list)
{
if (temp < element.P)
{
v = element.V;
break;
}
temp -= element.P;
}
Debug.Assert(v > 0);
if (!selected.ContainsKey(v))
selected[v] = 1;
else
selected[v] += 1;
}
selected.Dump("how many times was each value selected?");
}
Output:
list
[] (9 items)
V P
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
45 45 <-- sum
how many times was each value selected?
Dictionary<Int32,Int32> (9 items)
Key Value
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
45 <-- again, sum