get all subsets of a specific set iteratively - c#

I know the iterative solution:
given a set of n elements
save an int v = 2^n and generate all binaries number up to this v.
But what if n > 32?
I know it's already 2^32 subsets, but yet - what's the way to bypass the 32 elements limitation?

If you're happy with a 64 item limit, you can simply use long.
Array / ArrayList of ints / longs. Have a next function something like:
bool next(uint[] arr)
for (int i = 0; i < arr.length; i++)
if (arr[i] == 2^n-1) // 11111 -> 00000
arr[i] = 0
else
arr[i]++
return true
return false // reached the end -> there is no next
BitArray. Probably not a very efficient option compared to the above.
You can have a next function which sets the least significant bit 0 to 1 and all remaining bits to 0. e.g.:
10010 -> 10011
10011 -> 10100
Note - this will probably take forever, simply because there's so many subsets, but that's not the question.

You can use #biziclop approach, by propagating the carry bit in the following way: store your number as vector of 32-bit "digits" of length K. So, you can generate 2^(K*32) subsets, and every increment operation will take at most O(K) operations.
The other thing that I can think of is recursive backtrack, that will generate all subsets in one array.

You could write an analog of this concise Haskell implementation:
powerSet = filterM (const [True, False])
Except there is no built-in filterM in C#. That's no problem, you can implement it yourself.
Here is my attempt at it:
public static IEnumerable<IEnumerable<T>> PowerSet<T>(IEnumerable<T> els)
{
return FilterM(_ => new[] {true, false}, els);
}
public static IEnumerable<IEnumerable<T>> FilterM<T>(
Func<T, IEnumerable<bool>> p,
IEnumerable<T> els)
{
var en = els.GetEnumerator();
if (!en.MoveNext())
{
yield return Enumerable.Empty<T>();
yield break;
}
T el = en.Current;
IEnumerable<T> tail = els.Skip(1);
foreach (var x in
from flg in p(el)
from ys in FilterM(p, tail)
select flg ? new[] { el }.Concat(ys) : ys)
{
yield return x;
}
}
And then you can use it like this:
foreach (IEnumerable<int> subset in PowerSet(new [] { 1, 2, 3, 4 }))
{
Console.WriteLine("'{0}'", string.Join(",", subset));
}
As you can see, neither int nor long are explicitly used anywhere in the implementation, so the real limit here is the maximum recursion depth reachable with the current stack size limit.
UPD: Rosetta Code gives a non-recursive implementation:
public static IEnumerable<IEnumerable<T>> GetPowerSet<T>(IEnumerable<T> input)
{
var seed = new List<IEnumerable<T>>() { Enumerable.Empty<T>() }
as IEnumerable<IEnumerable<T>>;
return input.Aggregate(seed, (a, b) =>
a.Concat(a.Select(x => x.Concat(new List<T> { b }))));
}

Related

Group List into Ranges By Specific Amount

Let's say I have a List of items in which look like this:
Number Amount
1 10
2 12
5 5
6 9
9 4
10 3
11 1
I need it so that the method takes in any number even as a decimal and use that number to group the list into ranges based on that number. So let's say my number was 1 the following output would be...
Ranges Total
1-2 22
5-6 14
9-11 8
Because it basically grouped the numbers that are 1 away from each other into ranges. What's the most efficient way I can convert my list to look like the output?
There are a couple of approaches to this. Either you can partition the data and then sum on the partitions, or you can roll the whole thing into a single method.
Since partitioning is based on the gaps between the Number values you won't be able to work on unordered lists. Building the partition list on the fly isn't going to work if the list isn't ordered, so make sure you sort the list on the partition field before you start.
Partitioning
Once the lists is ordered (or if it was pre-ordered) you can partition. I use this kind of extension method fairly often for breaking up ordered sequences into useful blocks, like when I need to grab sequences of entries from a log file.
public static partial class Ext
{
public static IEnumerable<T[]> PartitionStream<T>(this IEnumerable<T> source, Func<T, T, bool> partitioner)
{
var partition = new List<T>();
T prev = default;
foreach (var next in source)
{
if (partition.Count > 0 && !partitioner(prev, next))
{
new { p = partition.ToArray(), prev, next }.Dump();
yield return partition.ToArray();
partition.Clear();
}
partition.Add(prev = next);
}
if (partition.Count > 0)
yield return partition.ToArray();
}
}
The partitioner parameter compares two objects and returns true if they belong in the same partition. The extension method just collects all the members of the partition together and returns them as an array once it finds something for the next partition.
From there you can just do simple summing on the partition arrays:
var source = new (int n, int v)[] { (1,10),(2,12),(5,5),(6,9),(9,4),(10,3),(11,1) };
var maxDifference = 2;
var aggregate =
from part in source.PartitionStream((l, r) => (r.n - l.n) <= maxDifference)
let low = grp.Min(g => g.n)
let high = grp.Max(g => g.n)
select new { Ranges = $"{low}-{high}", Total = grp.Sum(g => g.v) };
This gives the same output as your example.
Stream Aggregation
The second option is both simpler and more efficient since it does barely any memory allocations. The downside - if you can call it that - is that it's a lot less generic.
Rather than partitioning and aggregating over the partitions, this just walks through the list and aggregates as it goes, spitting out results when the partitioning criteria is reached:
IEnumerable<(string Ranges, int Total)> GroupSum(IEnumerable<(int n, int v)> source, int maxDistance)
{
int low = int.MaxValue;
int high = 0;
int total = 0;
foreach (var (n, v) in source)
{
// check partition boundary
if (n < low || (n - high) > maxDistance)
{
if (n > low)
yield return ($"{low}-{high}", total);
low = high = n;
total = v;
}
else
{
high = n;
total += v;
}
}
if (total > 0)
yield return ($"{low}-{high}", total);
}
(Using ValueTuple so I don't have to declare types.)
Output is the same here, but with a lot less going on in the background to slow it down. No allocated arrays, etc.

faster algorithm or technique for building sparse arrays in c#

I have a matrix-building problem. To build the matrix (for a 3rd party package), I need to do it row-by-row by passing a double[] array to the 3rd-party object. Here's my problem: I have a list of objects that represent paths on a graph. Each object is a path with a 'source' property (string) and a 'destination' property (also string). I need to build a 1-dimensional array where all the elements are 0 except where the source property is equal to a given name. The given name will occur multiple times in the path list. Here's my function for building the sparse array:
static double[] GetNodeSrcRow3(string nodeName)
{
double[] r = new double[cpaths.Count ];
for (int i = 1; i < cpaths.Count; i++)
{
if (cpaths[i].src == nodeName) r[i] = 1;
}
return r;
}
Now I need to call this function about 200k times with different names. The function itself takes between 0.05 and 0.1 seconds (timed with Stopwatch). As you can imagine, if we take the best possible case of 0.05 seconds * 200k calls = 10,000 seconds = 2.7 hours which is too long. The object 'cpaths' contains about 200k objects.
Can someone think of a way to accomplish this in a faster way?
I can't see the rest of your code, but I suspect most of the time is spent allocating and garbage collecting all the arrays. Assuming the size of cpaths doesn't change, you can reuse the same array.
private static double[] NodeSourceRow == null;
private static List<int> LastSetIndices = new List<int>();
static double[] GetNodeSrcRow3(string nodeName) {
// create new array *only* on the first call
NodeSourceRow = NodeSourceRow ?? new double[cpaths.Count];
// reset all elements to 0
foreach(int i in LastSetIndices) NodeSourceRow[i] = 0;
LastSetIndices.Clear();
// set the 1s
for (int i = 1; i < cpaths.Count; i++) {
if (cpaths[i].src == nodeName) {
NodeSourceRow[i] = 1;
LastSetIndices.Add(i);
}
}
// tada!!
return NodeSourceRow;
}
One drawback potential drawback would be if you need all the arrays to used at the same time, they will always have identical contents. But if you only use one at a time, this should be much faster.
if cpaths is normal list then that's not suitable for your case. you need a dictionary of src to list of indexes. like Dictionary<string, List<int>>.
then you can fill sparse array with random access. I would also suggest you to use Sparse list implementation for efficient memory usage rather than using memory inefficient double[]. a good implementation is SparseAList. (written by David Piepgrass)
Before generating your sparse lists, you should convert your cpaths list into a suitable dictionary, this step may take a little long (up to few seconds), but after that you will generate your sparse lists super fast.
public static Dictionary<string, List<int>> _dictionary;
public static void CacheIndexes()
{
_dictionary = cpaths.Select((x, i) => new { index = i, value = x })
.GroupBy(x => x.value.src)
.ToDictionary(x => x.Key, x => x.Select(a => a.index).ToList());
}
you should call CacheIndexes before starting to generate your sparse arrays.
public static double[] GetNodeSrcRow3(string nodeName)
{
double[] r = new double[cpaths.Count];
List<int> indexes;
if(!_dictionary.TryGetValue(nodeName, out indexes)) return r;
foreach(var index in indexes) r[index] = 1;
return r;
}
Note that if you use SparseAList it will occupy very small space. for example if double array is 10K length and has only one index set in it, with SparseAList you will have virtually 10K items, but in fact there is only one item stored in memory. its not hard to use that collection, I suggest you to give it a try.
same code using SparseAList
public static SparseAList<double> GetNodeSrcRow3(string nodeName)
{
SparseAList<double> r = new SparseAList<double>();
r.InsertSpace(0, cpaths.Count); // allocates zero memory.
List<int> indexes;
if(!_dictionary.TryGetValue(nodeName, out indexes)) return r;
foreach(var index in indexes) r[index] = 1;
return r;
}
You could make use of multi-threading using the TPL's Parallel.For method.
static double[] GetNodeSrcRow3(string nodeName)
{
double[] r = new double[cpaths.Count];
Parallel.For(1, cpaths.Count, (i, state) =>
{
if (cpaths[i].src == nodeName) r[i] = 1;
});
return r;
}
Fantastic Answers!
If I may add some, to the already great examples:
System.Numerics.Tensors.SparseTensor<double> GetNodeSrcRow3(string text)
{
// A quick NuGet System.Numerics.Tensors Install:
System.Numerics.Tensors.SparseTensor<double> SparseTensor = new System.Numerics.Tensors.SparseTensor<double>(new int[] { cpaths.Count }, true, 1);
Parallel.For(1, cpaths.Count, (i, state) =>
{
if (cpaths[i].src == nodeName) SparseTensor[i] = 1.0D;
});
return SparseTensor;
}
System.Numerics is optimised hugely, also uses hardware acceleration. It is also Threadsafe. At least from what I have read about it.
For Speed and scalability, a small bit of code that could make all the difference.

LINQ query and sub-query enumeration count in C#?

suppose I have this query :
int[] Numbers= new int[5]{5,2,3,4,5};
var query = from a in Numbers
where a== Numbers.Max (n => n) //notice MAX he should also get his value somehow
select a;
foreach (var element in query)
Console.WriteLine (element);
How many times does Numbers is enumerated when running the foreach ?
how can I test it ( I mean , writing a code which tells me the number of iterations)
It will be iterated 6 times. Once for the Where and once per element for the Max.
The code to demonstrate this:
private static int count = 0;
public static IEnumerable<int> Regurgitate(IEnumerable<int> source)
{
count++;
Console.WriteLine("Iterated sequence {0} times", count);
foreach (int i in source)
yield return i;
}
int[] Numbers = new int[5] { 5, 2, 3, 4, 5 };
IEnumerable<int> sequence = Regurgitate(Numbers);
var query = from a in sequence
where a == sequence.Max(n => n)
select a;
It will print "Iterated sequence 6 times".
We could make a more general purpose wrapper that is more flexible, if you're planning to use this to experiment with other cases:
public class EnumerableWrapper<T> : IEnumerable<T>
{
private IEnumerable<T> source;
public EnumerableWrapper(IEnumerable<T> source)
{
this.source = source;
}
public int IterationsStarted { get; private set; }
public int NumMoveNexts { get; private set; }
public int IterationsFinished { get; private set; }
public IEnumerator<T> GetEnumerator()
{
IterationsStarted++;
foreach (T item in source)
{
NumMoveNexts++;
yield return item;
}
IterationsFinished++;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public override string ToString()
{
return string.Format(
#"Iterations Started: {0}
Iterations Finished: {1}
Number of move next calls: {2}"
, IterationsStarted, IterationsFinished, NumMoveNexts);
}
}
This has several advantages over the other function:
It records both the number of iterations started, the number of iterations that were completed, and the total number of times all of the sequences were incremented.
You can create different instances to wrap different underlying sequences, thus allowing you to inspect multiple sequences per program, instead of just one when using a static variable.
Here is how you can estimate a quick count of the number of times the collection is enumerated: wrap your collection in a CountedEnum<T>, and increment counter on each yield return, like this --
static int counter = 0;
public static IEnumerable<T> CountedEnum<T>(IEnumerable<T> ee) {
foreach (var e in ee) {
counter++;
yield return e;
}
}
Then change your array declaration to this,
var Numbers= CountedEnum(new int[5]{5,2,3,4,5});
run your query, and print the counter. For your query, the code prints 30 (link to ideone), meaning that your collection of five items has been enumerated six times.
Here is how you can check the count
void Main()
{
var Numbers= new int[5]{5,2,3,4,5}.Select(n=>
{
Console.Write(n);
return n;
});
var query = from a in Numbers
where a== Numbers.Max (n => n)
select a;
foreach (var element in query)
{
var v = element;
}
}
Here is output
5 5 2 3 4 5 2 5 2 3 4 5 3 5 2 3 4 5 4 5 2 3 4 5 5 5 2 3 4 5
The number of iteration has to be equal to query.Count().
So to the count of the elements in the result of the first query.
If you're asking about something else, please clarify.
EDIT
After clarification:
if you're searching for total count of the iteration in the code provided, there will be 7 iterations (for this concrete case).
var query = from a in Numbers
where a== Numbers.Max (n => n) //5 iterations to find MAX among 5 elements
select a;
and
foreach (var element in query)
Console.WriteLine (element); //2 iterations over resulting collection(in this question)
How many times does Numbers is enumerated when running the foreach
Loosely speaking, your code is morally equivalent to:
foreach(int a in Numbers)
{
// 1. I've gotten rid of the unnecessary identity lambda.
// 2. Note that Max works by enumerating the entire source.
var max = Numbers.Max();
if(a == max)
Console.WriteLine(a);
}
So we enumerate the following times:
One enumeration of the sequence for the outer loop (1).
One enumeration of the sequence for each of its members (Count).
So in total, we enumerate Count + 1 times.
You could bring this down to 2 by hoisting the Max query outside the loop by introducing a local.
how can I test it ( I mean , writing a code which tells me the number
of iterations)
This wouldn't be easy with a raw array. But you could write your own enumerable implementation (that perhaps wrapped an array) and add some instrumentation to the GetEnumerator method. Or if you want to go deeper, go the whole hog and write a custom enumerator with instrumentation on MoveNext and Current as well.
Count via public property also yields 6.
private static int ncount = 0;
private int[] numbers= new int[5]{5,2,3,4,5};
public int[] Numbers
{
get
{
ncount++;
Debug.WriteLine("Numbers Get " + ncount.ToString());
return numbers;
}
}
This brings the count down to 2.
Makes sense but I would not have thought of it.
int nmax = Numbers.Max(n => n);
var query = from a in Numbers
where a == nmax //notice MAX he should also get his value somehow
//where a == Numbers.Max(n => n) //notice MAX he should also get his value somehow
select a;
It will be iterated 6 times. Once for the Where and once per element for the Max.
Define and initialize a count variable outside the foreach loop and increment the count variable as count++ inside the loop to get the number of times of enumeration.

Optimizing this C# algorithm (K Difference)

This is the problem I'm solving (it's a sample problem, not a real problem):
Given N numbers , [N<=10^5] we need to count the total pairs of
numbers that have a difference of K. [K>0 and K<1e9]
Input Format: 1st line contains N & K (integers). 2nd line contains N
numbers of the set. All the N numbers are assured to be distinct.
Output Format: One integer saying the no of pairs of numbers that have
a diff K.
Sample Input #00:
5 2
1 5 3 4 2
Sample Output #00:
3
Sample Input #01:
10 1
363374326 364147530 61825163 1073065718 1281246024 1399469912 428047635 491595254 879792181 1069262793
Sample Output #01:
0
I already have a solution (and I haven't been able to optimize it as well as I had hoped). Currently my solution gets a score of 12/15 when it is run, and I'm wondering why I can't get 15/15 (my solution to another problem wasn't nearly as efficient, but got all of the points). Apparently, the code is run using "Mono 2.10.1, C# 4".
So can anyone think of a better way to optimize this further? The VS profiler says to avoid calling String.Split and Int32.Parse. The calls to Int32.Parse can't be avoided, although I guess I could optimize tokenizing the array.
My current solution:
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
namespace KDifference
{
class Solution
{
static void Main(string[] args)
{
char[] space = { ' ' };
string[] NK = Console.ReadLine().Split(space);
int N = Int32.Parse(NK[0]), K = Int32.Parse(NK[1]);
int[] nums = Console.ReadLine().Split(space, N).Select(x => Int32.Parse(x)).OrderBy(x => x).ToArray();
int KHits = 0;
for (int i = nums.Length - 1, j, k; i >= 1; i--)
{
for (j = 0; j < i; j++)
{
k = nums[i] - nums[j];
if (k == K)
{
KHits++;
}
else if (k < K)
{
break;
}
}
}
Console.Write(KHits);
}
}
}
Your algorithm is still O(n^2), even with the sorting and the early-out. And even if you eliminated the O(n^2) bit, the sort is still O(n lg n). You can use an O(n) algorithm to solve this problem. Here's one way to do it:
Suppose the set you have is S1 = { 1, 7, 4, 6, 3 } and the difference is 2.
Construct the set S2 = { 1 + 2, 7 + 2, 4 + 2, 6 + 2, 3 + 2 } = { 3, 9, 6, 8, 5 }.
The answer you seek is the cardinality of the intersection of S1 and S2. The intersection is {6, 3}, which has two elements, so the answer is 2.
You can implement this solution in a single line of code, provided that you have sequence of integers sequence, and integer difference:
int result = sequence.Intersect(from item in sequence select item + difference).Count();
The Intersect method will build an efficient hash table for you that is O(n) to determine the intersection.
Try this (note, untested):
Sort the array
Start two indexes at 0
If difference between the numbers at those two positions is equal to K, increase count, and increase one of the two indexes (if numbers aren't duplicated, increase both)
If difference is larger than K, increase index #1
If difference is less than K, increase index #2, if that would place it outside the array, you're done
Otherwise, go back to 3 and keep going
Basically, try to keep the two indexes apart by K value difference.
You should write up a series of unit-tests for your algorithm, and try to come up with edge cases.
This would allow you to do it in a single pass. Using hash sets is beneficial if there are many values to parse/check. You might also want to use a bloom filter in combination with hash sets to reduce lookups.
Initialize. Let A and B be two empty hash sets. Let c be zero.
Parse loop. Parse the next value v. If there are no more values the algorithm is done and the result is in c.
Back check. If v exists in A then increment c and jump back to 2.
Low match. If v - K > 0 then:
insert v - K into A
if v - K exists in B then increment c (and optionally remove v - K from B).
High match. If v + K < 1e9 then:
insert v + K into A
if v + K exists in B then increment c (and optionally remove v + K from B).
Remember. Insert v into B.
Jump back to 2.
// php solution for this k difference
function getEqualSumSubstring($l,$s) {
$s = str_replace(' ','',$s);
$l = str_replace(' ','',$l);
for($i=0;$i<strlen($s);$i++)
{
$array1[] = $s[$i];
}
for($i=0;$i<strlen($s);$i++)
{
$array2[] = $s[$i] + $l[1];
}
return count(array_intersect($array1,$array2));
}
echo getEqualSumSubstring("5 2","1 3 5 4 2");
Actually that's trivially to solve with a hashmap:
First put each number into a hashmap: dict((x, x) for x in numbers) in "pythony" pseudo code ;)
Now you just iterate through every number in the hashmap and check if number + K is in the hashmap. If yes, increase count by one.
The obvious improvement to the naive solution is to ONLY check for the higher (or lower) bound, otherwise you get the double results and have to divide by 2 afterwards - useless.
This is O(N) for creating the hashmap when reading the values in and O(N) when iterating through, i.e. O(N) and about 8loc in python (and it is correct, I just solved it ;-) )
Following Eric's answer, paste the implementation of Interscet method below, it is O(n):
private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource current in second)
{
set.Add(current);
}
foreach (TSource current2 in first)
{
if (set.Remove(current2))
{
yield return current2;
}
}
yield break;
}

How do I find the closest array element to an arbitrary (non-member) number?

Seemingly similar questions: "Finding closest number in an array" (in Java) and "find nearest match to array of doubles" (actually a geography problem).
I have a (sorted) array of doubles. Given an arbitrary number (which may or may not be an exact match for one of the array elements), how can I return the index of the number which is the closest match?
For example, using the following array:
1.8
2.4
2.7
3.1
4.5
Querying 2.5 would return with an index of 1, corresponding to the value of 2.4.
Bonus points for detecting values that lie completely outside of the range of the array elements. For example, using the array listed above, your code may decide that 4.6 is in, but 5.9 is out. If you want to try this part of the question, the specifics are in your hands.
Array.BinarySearch, which returns:
The index of the specified value in the specified array, if value is found. If value is not found and value is less than one or more elements in array, a negative number which is the bitwise complement of the index of the first element that is larger than value. If value is not found and value is greater than any of the elements in array, a negative number which is the bitwise complement of (the index of the last element plus 1).
Now that won't get you 100% of the way there, since you'll know the number is either less than or greater than the match, but it really only leaves you with two indices to check.
One way to do this using LINQ is like this:
public int GetClosestIndex( List<double> doublelist, double targetvalue )
{
return doublelist.IndexOf(doublelist.OrderBy(d => Math.Abs(d - targetvalue)).ElementAt(0));
}
It might have some performance issues, but If the list is not that long, it should not pose a problem. Also, if two elements are equally distant from the target value, it will return the first index of those.
Perhaps not the fastest solution, but certainly pleasant eye-candy:
double search;
double[] array;
var nearest = (
from value in array
orderby Math.Abs(value - search)
select value).First();
var index = array.IndexOf(nearest);
Note that this will absolutely be slower than a binary search algorithm, because it need to process each element in the array and sorting means building a hash table of those items.
Something like this:
double[] values = new double[]
{
1.8,
2.4,
2.7,
3.1,
4.5
};
double difference = double.PositiveInfinity;
int index = -1;
double input = 2.5;
for (int i = 0; i < values.Length; i++)
{
double currentDifference = Math.Abs(values[i] - input);
if (currentDifference < difference)
{
difference = currentDifference;
index = i;
}
// Stop searching when we've encountered a value larger
// than the inpt because the values array is sorted.
if (values[i] > input)
break;
}
Console.WriteLine("Found index: {0} value {1}", index, values[index]);
List<int> results;
int target = 0;
int nearestValue = 0;
if (results.Any(ab => ab == target)) {
nearestValue= results.FirstOrDefault<int>(i => i == target);
} else {
int greaterThanTarget = 0;
int lessThanTarget = 0;
if (results.Any(ab => ab > target) {
greaterThanTarget = results.Where<int>(i => i > target).Min();
}
if (results.Any(ab => ab < target)) {
lessThanTarget = results.Where<int>(i => i < target).Max();
}
if (lessThanTarget == 0 ) {
nearestValue= greaterThanTarget;
} else if (greaterThanTarget == 0) {
nearestValue = lessThanTarget;
} else {
if (target - lessThanTarget < greaterThanTarget - target) {
nearestValue = lessThanTarget;
} else {
nearestValue = greaterThanTarget;
}
}
}

Categories