Find Strings with certain Hamming distance LINQ - c#

If we run the following LINQ query (thanks to @octavioccl for the help):
var result = stringsList
.GroupBy(s => s)
.Where(g => g.Count() > 1)
.OrderByDescending(g => g.Count())
.Select(g => g.Key);
It gives us all the strings that occur in the list at least twice (exact matches only, i.e. Hamming distance = 0).
I was wondering whether there is an elegant solution where we can specify the Hamming distance in the Where clause, so that strings within the specified Hamming distance are returned as well. Everything I have tried so far uses either loops with a counter, which is ugly, or regex.
P.S.: All the strings are of equal length.
UPDATE
Many thanks to krontogiannis for his detailed answer. As I mentioned earlier, I want to get the list of strings whose Hamming distance is below the given threshold. His code works perfectly for that (thanks again).
The only thing remaining is to take the strings out of the result set and add them to a List<string>.
Basically, this is what I want:
List<string> outputList = new List<string>();
foreach (string str in patternsList)
{
var rs = wordsList
.GroupBy(w => hamming(w, str))
.Where(h => h.Key <= hammingThreshold)
.OrderByDescending(h => h.Key)
.Select(h => h.Count());
outputList.Add(rs); //I know it won't work but just to show what is needed
}
Thanks

Calculating the hamming distance between two strings using LINQ can be done in an elegant way:
Func<string, string, int> hamming = (s1, s2) => s1.Zip(s2, (l, r) => l - r == 0 ? 0 : 1).Sum();
Your question is a bit vague about the 'grouping'. As you can see, calculating the Hamming distance requires two strings. So you need to either calculate the Hamming distance of every word in your list against an input, or calculate the distance between all pairs of words in your list (or something different that you need to tell us :-) ).
Either way, I'll give two examples for this input:
var words = new[] {
"hello",
"rellp",
"holla",
"fooba",
"hempd"
};
Case 1
var input = "hello";
var hammingThreshold = 3;
var rs = words
.GroupBy(w => hamming(w, input))
.Where(h => h.Key <= hammingThreshold)
.OrderByDescending(h => h.Key);
Output would be something like
hempd with distance 3
rellp holla with distance 2
hello with distance 0
Case 2
var hs = words
.SelectMany((w1, i) =>
words
.Where((w2, j) => i > j)
.Select(w2 => new { Word1 = w1, Word2 = w2 })) // all word pairs except with self
.GroupBy(pair => hamming(pair.Word1, pair.Word2))
.Where(g => g.Key <= hammingThreshold)
.OrderByDescending(g => g.Key);
Output would be something like
(holla, rellp) (fooba, holla) (hempd, hello) with distance 3
(rellp, hello) (holla, hello) with distance 2
Edit: To get only the words from the first grouping, you can use SelectMany:
var output = rs.SelectMany(g => g).ToList();
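Putting the pieces together for the updated question, here is a minimal sketch that collects, for every pattern, the words within the threshold into one flat List<string>. The sample contents of wordsList and patternsList are made up for illustration, and the length check inside hamming is an added safeguard, since Zip would otherwise silently ignore trailing characters of the longer string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hamming distance: number of positions at which the characters differ.
// The length check is an added safeguard (Zip stops at the shorter input).
Func<string, string, int> hamming = (s1, s2) =>
{
    if (s1.Length != s2.Length)
        throw new ArgumentException("Strings must have equal length.");
    return s1.Zip(s2, (l, r) => l == r ? 0 : 1).Sum();
};

// Illustrative sample data (not from the question).
var wordsList = new List<string> { "hello", "rellp", "holla", "fooba", "hempd" };
var patternsList = new List<string> { "hello" };
int hammingThreshold = 2;

var outputList = new List<string>();
foreach (string str in patternsList)
{
    // Keep only the words close enough to the current pattern,
    // nearest first, and append them to the flat output list.
    outputList.AddRange(
        wordsList
            .Where(w => hamming(w, str) <= hammingThreshold)
            .OrderBy(w => hamming(w, str)));
}

Console.WriteLine(string.Join(", ", outputList)); // hello, rellp, holla
```

Filtering with Where instead of GroupBy keeps the individual words available, so no SelectMany step is needed afterwards.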

The OP asked for Hamming distance, whereas my code uses the Levenshtein distance. The code is easily adapted, though.
namespace Program
{
public static class Utils
{
public static string LongestCommonSubstring(this IEnumerable<string> arr)
{
// Determine size of the array
var n = arr.Count();
// Take first word from array as reference
var s = arr.ElementAt(0);
var len = s.Length;
var res = "";
for (var i = 0; i < len; i++)
{
for (var j = i + 1; j <= len; j++)
{
// generating all possible substrings
// of our reference string arr[0] i.e s
var stem = s.Substring(i, j - i);
var k = 1;
//for (k = 1; k < n; k++) {
foreach (var item in arr.Skip(1))
{
// Check if the generated stem is
// common to all words
if (!item.Contains(stem))
break;
++k;
}
// If current substring is present in
// all strings and its length is greater
// than current result
if (k == n && res.Length < stem.Length)
res = stem;
}
}
return res;
}
public static HashSet<string> GetShortestGroupedString(this HashSet<string> items, int distanceThreshold = 3, int minimumStringLength = 2)
{
var cluster = new Dictionary<int, List<Tuple<string, string>>>();
var clusterGroups = new HashSet<string>();
var itemCount = items.Count * items.Count;
int k = 0;
var first = items.First();
var added = "";
foreach (var item in items)
//Parallel.ForEach(merged, item => // TODO
{
var computed2 = new List<string>();
foreach (var item2 in items)
{
var distance = LevenshteinDistance.Compute(item, item2);
var firstDistance = LevenshteinDistance.Compute(first, item2);
if (!cluster.ContainsKey(distance)) // TODO: check false
cluster.Add(distance, new List<Tuple<string, string>>());
if (distance > distanceThreshold)
{
++k;
continue;
}
cluster[distance].Add(new Tuple<string, string>(item, item2));
if (firstDistance > distance)
{
var computed = new List<string>();
foreach (var kv in cluster)
{
if (kv.Value.Count == 0) continue;
var longest = kv.Value.Select(dd => dd.Item1).LongestCommonSubstring();
if (string.IsNullOrEmpty(longest)) continue;
computed.Add(longest);
}
var currentAdded = computed.OrderBy(s => s.Length).FirstOrDefault();
var diff = string.IsNullOrEmpty(added) || string.IsNullOrEmpty(currentAdded)
? string.Empty
: currentAdded.Replace(added, string.Empty);
if (!string.IsNullOrEmpty(currentAdded) && diff.Length == currentAdded.Length)
{
var ff = computed2.OrderBy(s => s.Length).FirstOrDefault();
if (ff.Length >= minimumStringLength)
clusterGroups.Add(ff);
computed2.Clear(); // TODO: check false
computed2.Add(diff);
}
else
{
if (diff.Length == 0 && !string.IsNullOrEmpty(added) && !string.IsNullOrEmpty(currentAdded))
computed2.Add(diff);
}
added = currentAdded;
cluster.Clear();
first = item;
}
++k;
}
var f = computed2.OrderBy(s => s.Length).FirstOrDefault();
if (f.Length >= minimumStringLength)
clusterGroups.Add(f);
}
//});
return clusterGroups;
}
}
/// <summary>
/// Contains approximate string matching
/// </summary>
internal static class LevenshteinDistance
{
/// <summary>
/// Compute the distance between two strings.
/// </summary>
public static int Compute(string s, string t)
{
var n = s.Length;
var m = t.Length;
var d = new int[n + 1, m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (var i = 0; i <= n; d[i, 0] = i++)
{
}
for (var j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (var i = 1; i <= n; i++)
{
//Step 4
for (var j = 1; j <= m; j++)
{
// Step 5
var cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
}
The code has two references:
The LevenshteinDistance class was extracted from: https://stackoverflow.com/a/2344347/3286975
The LongestCommonSubstring method was extracted from: https://www.geeksforgeeks.org/longest-common-substring-array-strings/
My code is being reviewed at https://codereview.stackexchange.com/questions/272379/get-shortest-grouped-distinct-string-from-hashset-of-strings so I expect improvements on it.
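Since the question guarantees equal-length strings, the Levenshtein call could be swapped for a Hamming distance. A minimal sketch of such a replacement follows. HammingDistance and WithinDistance are hypothetical names that are not part of the code above, and the early exit is an added optimisation for threshold checks.

```csharp
using System;

// Counts the positions at which the two strings differ.
static int HammingDistance(string s, string t)
{
    if (s.Length != t.Length)
        throw new ArgumentException("Hamming distance needs equal-length strings.");
    var distance = 0;
    for (var i = 0; i < s.Length; i++)
        if (s[i] != t[i]) distance++;
    return distance;
}

// True when the distance is at most maxDistance.
// Stops comparing as soon as the limit is exceeded.
static bool WithinDistance(string s, string t, int maxDistance)
{
    if (s.Length != t.Length) return false;
    var distance = 0;
    for (var i = 0; i < s.Length; i++)
        if (s[i] != t[i] && ++distance > maxDistance) return false;
    return true;
}

Console.WriteLine(HammingDistance("hello", "holla"));   // 2
Console.WriteLine(WithinDistance("hello", "fooba", 3)); // False
```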

You could do something like this:
int hammingDistance = 2;
var result = stringsList
.GroupBy(s => s.Substring(0, s.Length - hammingDistance))
.Where(g => g.Count() > 1)
.OrderByDescending(g => g.Count())
.Select(g => g.Key);
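Note that grouping by a shared prefix only finds strings whose differences all fall in the last hammingDistance characters. As a rough sketch of a position-independent variant, each distinct string can instead be paired with the number of list entries that lie within the threshold. This is quadratic in the list size, and the sample data and the name maxDistance are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Func<string, string, int> hamming =
    (a, b) => a.Zip(b, (x, y) => x == y ? 0 : 1).Sum();

// Invented sample data; all strings have the same length.
var stringsList = new List<string> { "abcd", "abcd", "xbcd", "abcx", "zzzz" };
int maxDistance = 1;

var result = stringsList
    .Distinct()
    // For each distinct string, count every entry (including itself) within range.
    .Select(s => new { Key = s, Count = stringsList.Count(t => hamming(s, t) <= maxDistance) })
    .Where(x => x.Count > 1)
    .OrderByDescending(x => x.Count)
    .Select(x => x.Key)
    .ToList();

Console.WriteLine(string.Join(", ", result)); // abcd, xbcd, abcx
```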

Related

How to find all possible strings of a certain length using only the chars out of an array [duplicate]

Possible Duplicate:
Listing all permutations of a string/integer
For example,
aaa .. aaz .. aba .. abz .. aca .. acz .. azz .. baa .. baz .. bba .. bbz .. zzz
Basically, imagine counting binary but instead of going from 0 to 1, it goes from a to z.
I have been trying to get this working to no avail and the formula is getting quite complex. I'm not sure if there's a simpler way to do it.
Edit
I have something like this at the moment but it's not quite there and I'm not sure if there is a better way:
private IEnumerable<string> GetWordsOfLength(int length)
{
char letterA = 'a', letterZ = 'z';
StringBuilder currentLetters = new StringBuilder(new string(letterA, length));
StringBuilder endingLetters = new StringBuilder(new string(letterZ, length));
int currentIndex = length - 1;
while (currentLetters.ToString() != endingLetters.ToString())
{
yield return currentLetters.ToString();
for (int i = length - 1; i > 0; i--)
{
if (currentLetters[i] == letterZ)
{
for (int j = i; j < length; j++)
{
currentLetters[j] = letterA;
}
if (currentLetters[i - 1] != letterZ)
{
currentLetters[i - 1]++;
}
}
else
{
currentLetters[i]++;
break;
}
}
}
}
For a variable amount of letter combinations, you can do the following:
var alphabet = "abcdefghijklmnopqrstuvwxyz";
var q = alphabet.Select(x => x.ToString());
int size = 4;
for (int i = 0; i < size - 1; i++)
q = q.SelectMany(x => alphabet, (x, y) => x + y);
foreach (var item in q)
Console.WriteLine(item);
var alphabet = "abcdefghijklmnopqrstuvwxyz";
//or var alphabet = Enumerable.Range('a', 'z' - 'a' + 1).Select(i => (char)i);
var query = from a in alphabet
from b in alphabet
from c in alphabet
select "" + a + b + c;
foreach (var item in query)
{
Console.WriteLine(item);
}
EDIT
For a general solution, you can use the CartesianProduct here
int N = 4;
var result = Enumerable.Range(0, N).Select(_ => alphabet).CartesianProduct();
foreach (var item in result)
{
Console.WriteLine(String.Join("",item));
}
// Eric Lippert’s Blog
// Computing a Cartesian Product with LINQ
// http://blogs.msdn.com/b/ericlippert/archive/2010/06/28/computing-a-cartesian-product-with-linq.aspx
public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
// base case:
IEnumerable<IEnumerable<T>> result = new[] { Enumerable.Empty<T>() };
foreach (var sequence in sequences)
{
var s = sequence; // don't close over the loop variable
// recursive case: use SelectMany to build the new product out of the old one
result =
from seq in result
from item in s
select seq.Concat(new[] { item });
}
return result;
}
You have 26^3 counts for 3 "digits". Just iterate from 'a' to 'z' in three loops.
Here's a very simple solution:
for(char first = 'a'; first <= (int)'z'; first++)
for(char second = 'a'; second <= (int)'z'; second++)
for(char third = 'a'; third <= (int)'z'; third++)
Console.WriteLine(first.ToString() + second + third);
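The counting analogy from the question can also be applied directly: treat each word as a number written in base 26, with a as the zero digit. A minimal sketch follows; WordsOfLength is an illustrative name, and the code assumes 26^length fits in a long, which holds for lengths up to 13.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Enumerates aaa..zzz style words by counting in base 26.
static IEnumerable<string> WordsOfLength(int length)
{
    if (length < 1 || length > 13)
        throw new ArgumentOutOfRangeException(nameof(length));

    long total = 1;
    for (int i = 0; i < length; i++) total *= 26;

    var buffer = new char[length];
    for (long n = 0; n < total; n++)
    {
        long rest = n;
        // Fill from the right: the last letter is the least significant digit.
        for (int pos = length - 1; pos >= 0; pos--)
        {
            buffer[pos] = (char)('a' + (int)(rest % 26));
            rest /= 26;
        }
        yield return new string(buffer);
    }
}

foreach (var word in WordsOfLength(2).Take(3))
    Console.WriteLine(word); // aa, ab, ac (one per line)
```

Because the sequence is produced lazily, callers can stop early with Take without generating every word.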

Which iterator (other than for, foreach) can be used to count the number of characters in a string?

I don't know which iteration method is the most efficient. Below are the solutions I have tried. Is there any other way to iterate, i.e. any special methods?
Method One:
Here I have used two for loops, so the iteration runs 2N times.
public void CountChar()
{
String s = Ipstring();
int[] counts = new int[256];
char[] c = s.ToCharArray();
for (int i = 0; i < c.Length; ++i)
{
counts[c[i]]++;
}
for (int i = 0; i < c.Length; i++)
{
Console.WriteLine(c[i].ToString() + " " + counts[c[i]]);
Console.WriteLine();
}
}
Method Two:
public void CountChar()
{
_inputWord = Ipstring();
char[] test = _inputWord.ToCharArray();
char temp;
int count = 0, tcount = 0;
Array.Sort(test);
int length = test.Length;
temp = test[0];
while (length > 0)
{
for (int i = 0; i < test.Length; i++)
{
if (temp == test[i])
{
count++;
}
}
Console.WriteLine(temp + " " + count);
tcount = tcount + count;
length = length - count;
count = 0;
if (tcount != test.Length)
temp = test[tcount];
//atchutharam. aaachhmrttu
}
}
Method Three:
public void CountChar()
{
int indexcount = 0;
s = Ipstring();
int[] count = new int[s.Length];
foreach (char c in s)
{
Console.Write(c);
count[s.IndexOf(c)]++;
}
foreach (char c in s)
{
if (indexcount <= s.IndexOf(c))
{
Console.WriteLine(c);
Console.WriteLine(count[s.IndexOf(c)]);
Console.WriteLine("");
}
indexcount++;
////atchutharam
}
}
You can use LINQ methods to group the characters and count them:
public void CountChar() {
String s = Ipstring();
foreach (var g in s.GroupBy(c => c)) {
Console.WriteLine("{0} : {1}", g.Key, g.Count());
}
}
Your loops are not nested, so your complexity is not N*N (O(N^2)) but 2*N, which gives O(N) because constant factors can be ignored:
for(){}
for(){} // O(2N) = O(N)
for()
{
for(){}
} // O(N*N) = O(N^2)
If you really want to know which of these 3 solutions has the fastest execution time in a specific environment, run a benchmark.
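A rough timing harness along those lines might look like this. It is only a sketch: the input string and repetition count are arbitrary, CountWithArray, CountWithLinq and TimeIt are illustrative names, and a dedicated tool such as BenchmarkDotNet would give more reliable numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Array-indexed counting, as in Method One (assumes character codes below 256).
static Dictionary<char, int> CountWithArray(string s)
{
    var counts = new int[256];
    foreach (char c in s) counts[c]++;
    var result = new Dictionary<char, int>();
    for (int i = 0; i < counts.Length; i++)
        if (counts[i] > 0) result[(char)i] = counts[i];
    return result;
}

// LINQ grouping.
static Dictionary<char, int> CountWithLinq(string s) =>
    s.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());

// Runs the action repeatedly and returns the elapsed milliseconds.
static long TimeIt(Action action, int repetitions)
{
    action(); // warm-up run so JIT compilation is not measured
    var watch = Stopwatch.StartNew();
    for (int i = 0; i < repetitions; i++) action();
    watch.Stop();
    return watch.ElapsedMilliseconds;
}

string input = string.Concat(Enumerable.Repeat("atchutharam", 1000));
Console.WriteLine("array: {0} ms", TimeIt(() => CountWithArray(input), 200));
Console.WriteLine("linq:  {0} ms", TimeIt(() => CountWithLinq(input), 200));
```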
If you want the cleanest and most readable one (and you should almost always aim for that), just use LINQ:
String s = Ipstring();
int count = s.Count();
It will execute in O(N) too.
If you need the results in arrays:
var groups = s.GroupBy(i => i ).OrderBy( g => g.Key );
var chars = groups.Select(g => g.Key).ToArray();
var counts = groups.Select(g => g.Count()).ToArray();
Otherwise:
var dict = s.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
foreach (var g in dict)
{
Console.WriteLine( "{0}: {1}", g.Key, g.Value );
}

How to get the most common value in an Int array? (C#)

How to get the most common value in an Int array using C#
eg: Array has the following values: 1, 1, 1, 2
Ans should be 1
var query = (from item in array
group item by item into g
orderby g.Count() descending
select new { Item = g.Key, Count = g.Count() }).First();
For just the value and not the count, you can do
var query = (from item in array
group item by item into g
orderby g.Count() descending
select g.Key).First();
Lambda version on the second:
var query = array.GroupBy(item => item).OrderByDescending(g => g.Count()).Select(g => g.Key).First();
Some old fashioned efficient looping:
var cnt = new Dictionary<int, int>();
foreach (int value in theArray) {
if (cnt.ContainsKey(value)) {
cnt[value]++;
} else {
cnt.Add(value, 1);
}
}
int mostCommonValue = 0;
int highestCount = 0;
foreach (KeyValuePair<int, int> pair in cnt) {
if (pair.Value > highestCount) {
mostCommonValue = pair.Key;
highestCount = pair.Value;
}
}
Now mostCommonValue contains the most common value, and highestCount contains how many times it occurred.
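If the second loop is unwanted, the running maximum can be tracked while counting. Here is a sketch of that variation. MostCommon is an illustrative name; on ties the value that first reaches the highest count wins, and rejecting an empty array is an added assumption.

```csharp
using System;
using System.Collections.Generic;

static int MostCommon(int[] values)
{
    if (values.Length == 0)
        throw new ArgumentException("Array must not be empty.", nameof(values));

    var counts = new Dictionary<int, int>();
    int mostCommonValue = values[0];
    int highestCount = 0;
    foreach (int value in values)
    {
        // TryGetValue leaves count at 0 for a value not seen before.
        counts.TryGetValue(value, out int count);
        counts[value] = ++count;
        if (count > highestCount)
        {
            highestCount = count;
            mostCommonValue = value;
        }
    }
    return mostCommonValue;
}

Console.WriteLine(MostCommon(new[] { 1, 1, 1, 2 })); // 1
```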
I know this post is old, but someone asked me the inverse of this question today.
LINQ Grouping
sourceArray.GroupBy(value => value).OrderByDescending(group => group.Count()).First().First();
Temp Collection, similar to Guffa's:
var counts = new Dictionary<int, int>();
foreach (var i in sourceArray)
{
if (!counts.ContainsKey(i)) { counts.Add(i, 0); }
counts[i]++;
}
return counts.OrderByDescending(kv => kv.Value).First().Key;
public static int get_occure(int[] a)
{
int maxcount = 0;
int result = 0;
for (int i = 0; i < a.Length; i++)
{
// count how many elements equal a[i], including a[i] itself
int c = 0;
for (int j = 0; j < a.Length; j++)
{
if (a[i] == a[j])
{
c++;
}
}
// remember the value with the highest count seen so far
if (c > maxcount)
{
maxcount = c;
result = a[i];
}
}
return result;
}
Maybe O(n log n), but fast:
Array.Sort(a); // sort the array a
int n = a.Length; // assuming n > 0
int iBest = -1; // index of first number in most popular subset
int nBest = -1; // popularity of most popular number
// for each subset of numbers
for(int i = 0; i < n; ){
int ii = i; // ii = index of first number in subset
int nn = 0; // nn = count of numbers in subset
// for each number in subset, count it
for (; i < n && a[i]==a[ii]; i++, nn++ ){}
// if the subset has more numbers than the best so far
// remember it as the new best
if (nBest < nn){nBest = nn; iBest = ii;}
}
// print the most popular value and how popular it is
Console.WriteLine("{0} {1}", a[iBest], nBest);
Yet another solution with linq:
static int[] GetMostCommonIntegers(int[] nums)
{
return nums
.ToLookup(n => n)
.ToLookup(l => l.Count(), l => l.Key)
.OrderBy(l => l.Key)
.Last()
.ToArray();
}
This solution can handle the case where several numbers have the same number of occurrences:
[1,4,5,7,1] => [1]
[1,1,2,2,3,4,5] => [1,2]
[6,6,6,2,2,1] => [6]

How to get all subsets of an array?

Given an array: [dog, cat, mouse]
what is the most elegant way to create:
[,,]
[,,mouse]
[,cat,]
[,cat,mouse]
[dog,,]
[dog,,mouse]
[dog,cat,]
[dog,cat,mouse]
I need this to work for any sized array.
This is essentially a binary counter, where array indices represent bits. This presumably lets me use some bitwise operation to count, but I can't see a nice way of translating this to array indices though.
Elegant? Why not Linq it.
public static IEnumerable<IEnumerable<T>> SubSetsOf<T>(IEnumerable<T> source)
{
if (!source.Any())
return Enumerable.Repeat(Enumerable.Empty<T>(), 1);
var element = source.Take(1);
var haveNots = SubSetsOf(source.Skip(1));
var haves = haveNots.Select(set => element.Concat(set));
return haves.Concat(haveNots);
}
string[] source = new string[] { "dog", "cat", "mouse" };
for (int i = 0; i < Math.Pow(2, source.Length); i++)
{
string[] combination = new string[source.Length];
for (int j = 0; j < source.Length; j++)
{
if ((i & (1 << (source.Length - j - 1))) != 0)
{
combination[j] = source[j];
}
}
Console.WriteLine("[{0}, {1}, {2}]", combination[0], combination[1], combination[2]);
}
You can use the BitArray class to easily access the bits in a number:
string[] animals = { "Dog", "Cat", "Mouse" };
List<string[]> result = new List<string[]>();
int cnt = 1 << animals.Length;
for (int i = 0; i < cnt; i++) {
string[] item = new string[animals.Length];
BitArray b = new BitArray(i);
for (int j = 0; j < item.Length; j++) {
item[j] = b[j] ? animals[j] : null;
}
result.Add(item);
}
static IEnumerable<IEnumerable<T>> GetSubsets<T>(IList<T> set)
{
var state = new BitArray(set.Count);
do
yield return Enumerable.Range(0, state.Count)
.Select(i => state[i] ? set[i] : default(T));
while (Increment(state));
}
static bool Increment(BitArray flags)
{
int x = flags.Count - 1;
while (x >= 0 && flags[x]) flags[x--] = false ;
if (x >= 0) flags[x] = true;
return x >= 0;
}
Usage:
foreach(var strings in GetSubsets(new[] { "dog", "cat", "mouse" }))
Console.WriteLine(string.Join(", ", strings.ToArray()));
Guffa's answer had the basic functionality I was looking for; however, the line with
BitArray b = new BitArray(i);
did not work for me, it gave an ArgumentOutOfRangeException. Here's my slightly adjusted and working code:
string[] array = { "A", "B", "C","D" };
int count = 1 << array.Length; // 2^n
for (int i = 0; i < count; i++)
{
string[] items = new string[array.Length];
BitArray b = new BitArray(BitConverter.GetBytes(i));
for (int bit = 0; bit < array.Length; bit++) {
items[bit] = b[bit] ? array[bit] : "";
}
Console.WriteLine(String.Join("",items));
}
Here's a solution similar to David B's method, but perhaps more suitable if it's really a requirement that you get back sets with the original number of elements (even if empty):
static public List<List<T>> GetSubsets<T>(IEnumerable<T> originalList)
{
if (originalList.Count() == 0)
return new List<List<T>>() { new List<T>() };
var setsFound = new List<List<T>>();
foreach (var list in GetSubsets(originalList.Skip(1)))
{
setsFound.Add(originalList.Take(1).Concat(list).ToList());
setsFound.Add(new List<T>() { default(T) }.Concat(list).ToList());
}
return setsFound;
}
If you pass in a list of three strings, you'll get back eight lists with three elements each (but some elements will be null).
Here's an easy-to-follow solution along the lines of your conception:
private static void Test()
{
string[] test = new string[3] { "dog", "cat", "mouse" };
foreach (var x in Subsets(test))
Console.WriteLine("[{0}]", string.Join(",", x));
}
public static IEnumerable<T[]> Subsets<T>(T[] source)
{
int max = 1 << source.Length;
for (int i = 0; i < max; i++)
{
T[] combination = new T[source.Length];
for (int j = 0; j < source.Length; j++)
{
int tailIndex = source.Length - j - 1;
combination[tailIndex] =
((i & (1 << j)) != 0) ? source[tailIndex] : default(T);
}
yield return combination;
}
}
This is a small change to Mehrdad's solution above:
static IEnumerable<T[]> GetSubsets<T>(T[] set) {
bool[] state = new bool[set.Length+1];
for (int x; !state[set.Length]; state[x] = true ) {
yield return Enumerable.Range(0, state.Length)
.Where(i => state[i])
.Select(i => set[i])
.ToArray();
for (x = 0; state[x]; state[x++] = false);
}
}
or with pointers
static IEnumerable<T[]> GetSubsets<T>(T[] set) {
bool[] state = new bool[set.Length+1];
for (bool *x; !state[set.Length]; *x = true ) {
yield return Enumerable.Range(0, state.Length)
.Where(i => state[i])
.Select(i => set[i])
.ToArray();
for (x = state; *x; *x++ = false);
}
}
I'm not very familiar with C# but I'm sure there's something like:
// input: Array A
foreach S in AllSubsetsOf1ToN(A.Length):
print (S.toArray().map(lambda x |> A[x]));
Ok, I've been told the answer above won't work. If you value elegance over efficiency, I would try recursion, in my crappy pseudocode:
Array_Of_Sets subsets(Array a)
{
if (a.length == 0)
return [new Set();] // emptyset
return subsets(a[1:]) + subsets(a[1:]) . map(lambda x |> x.add a[0])
}
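In C#, that recursive idea could be sketched roughly as follows. Subsets is an illustrative name, and the recursive result is computed once per level and reused, rather than twice as in the pseudocode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Subsets of items[start..]: every subset of the tail, without and with items[start].
static List<List<T>> Subsets<T>(IList<T> items, int start = 0)
{
    if (start == items.Count)
        return new List<List<T>> { new List<T>() }; // only the empty set

    var withoutFirst = Subsets(items, start + 1);
    var withFirst = withoutFirst
        .Select(rest => new List<T> { items[start] }.Concat(rest).ToList());
    return withoutFirst.Concat(withFirst).ToList();
}

foreach (var subset in Subsets(new[] { "dog", "cat", "mouse" }))
    Console.WriteLine("[" + string.Join(",", subset) + "]");
```

For the three animals this yields the eight subsets in the same order as the listing in the question, from the empty set to [dog,cat,mouse], but without placeholders for the missing elements.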
Here is a variant of mqp's answer, that uses as state a BigInteger instead of an int, to avoid overflow for collections containing more than 30 elements:
using System.Numerics;
public static IEnumerable<IEnumerable<T>> GetSubsets<T>(IList<T> source)
{
BigInteger combinations = BigInteger.One << source.Count;
for (BigInteger i = 0; i < combinations; i++)
{
yield return Enumerable.Range(0, source.Count)
.Select(j => (i & (BigInteger.One << j)) != 0 ? source[j] : default);
}
}
Easy to understand version (with descriptions)
I assumed that source = {1,2,3,4}
public static IEnumerable<IEnumerable<T>> GetSubSets<T>(IEnumerable<T> source)
{
var result = new List<IEnumerable<T>>() { new List<T>() }; // empty cluster added
for (int i = 0; i < source.Count(); i++)
{
var elem = source.Skip(i).Take(1);
// for elem = 2
// and currently result = [ [],[1] ]
var matchUps = result.Select(x => x.Concat(elem));
//then matchUps => [ [2],[1,2] ]
result = result.Concat(matchUps).ToList();
// matchUps and result concat operation
// finally result = [ [],[1],[2],[1,2] ]
}
return result;
}
The way this is written, it is more of a Product (Cartesian product) rather than a list of all subsets.
You have three sets: (Empty,"dog"), (Empty,"cat"),(Empty,"mouse").
There are several posts on general solutions for products. As noted though, since you really just have 2 choices for each axis a single bit can represent the presence or not of the item.
So the total set of sets is all numbers from 0 to 2^N-1. If N < 31 an int will work.
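A sketch of that numbering scheme, returning only the chosen elements for each mask. The name SubsetsByMask is illustrative, and a long is used here so that up to 62 elements are supported (enumerating that many subsets is of course impractical).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<List<T>> SubsetsByMask<T>(IList<T> source)
{
    if (source.Count > 62)
        throw new ArgumentException("Too many elements for a 64-bit mask.");

    long total = 1L << source.Count; // 2^N subsets
    for (long mask = 0; mask < total; mask++)
    {
        var subset = new List<T>();
        for (int bit = 0; bit < source.Count; bit++)
        {
            // Bit j of the mask decides whether source[j] is present.
            if ((mask & (1L << bit)) != 0)
                subset.Add(source[bit]);
        }
        yield return subset;
    }
}

foreach (var s in SubsetsByMask(new[] { "dog", "cat", "mouse" }))
    Console.WriteLine("[" + string.Join(",", s) + "]");
```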
