Declarations and entries in two_Dict dictionary is created as given:
Dictionary<string, List<string>>two_Dict = new Dictionary<string, List<string>>();
List<string> list;
if (!two_Dict.TryGetValue(d.ToString(), out list))
{
two_Dict.Add( d.ToString(), list = new List<string>());
list.Add(possibility_cell_list[0]);
list.Add(possibility_cell_list[1]);
}
Sample entries in two_Dict:
two_Dict["5"] Count = 2 [0]: "A2" [1]: "D2"
two_Dict["6"] Count = 2 [0]: "A2" [1]: "D2"
I am looking to form a linq query to get the keys which have the same list entries in the dictionary two_Dict. Any help will be appreciated.
You can use a fairly simple expression with linq:
var keys = from kvp1 in two_dict
where two_dict.Any(kvp2 => kvp2.Key != kvp1.Key
&& kvp2.Value.SequenceEqual(kvp1.Value))
select kvp1.Key;
However, this does not give the best performance as it will search through the entire dictionary n times, where n is the number of entries in the dictionary.
You can get slightly better performance if you only look at the items that have already been looked at so far. This way, on average you only go through half of the dictionary n times, so it's theoretically twice as fast. Unfortunately, I don't think there is a good way to do this purely using linq.
public static IEnumerable GetDuplicates(IDictionary<string, List<string>> dict)
{
var previousItems = new List<KeyValuePair<string, List<string>>>(dict.Count);
var matchedItems = new List<bool>();
foreach (var kvp in dict)
{
var match = previousItems.Select((kvp2, i) => Tuple.Create(kvp2.Key, kvp2.Value, i)).FirstOrDefault(t => kvp.Value.SequenceEqual(t.Item2));
if (match != null)
{
var index = match.Item3;
if (!matchedItems[index])
{
yield return match.Item1;
matchedItems[index] = true;
}
yield return kvp.Key;
}
else
{
previousItems.Add(kvp);
matchedItems.Add(false);
}
}
}
You would call the function like this:
var keys = GetDuplicates(two_dict);
All linq, no duplicate result but also, be careful, no null checks:
var res = from k1 in dict.Keys
from k2 in dict.Keys
where string.Compare(k1, k2) == -1 &&
dict[k1].Count == dict[k2].Count && !dict[k1].Any(x => !dict[k2].Contains(x)) && !dict[k2].Any(x => !dict[k1].Contains(x))
select new { k1, k2 };
How about this
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Dictionary<string, List<string>> two_Dict = new Dictionary<string, List<string>>();
var keyIndex = two_Dict.Keys.AsEnumerable()
.Select((x, i) => new { index = i, value = x })
.ToList();
var result = (from key1 in keyIndex
from key2 in keyIndex
where key1.index > key2.index
where AreEqual(two_Dict[key1.value],two_Dict[key2.value])
select new {key1 = key1.value, key2 = key2.value}).ToList();
}
static bool AreEqual<T>(List<T> x, List<T> y)
{
// same list or both are null
if (x == y)
{
return true;
}
// one is null (but not the other)
if (x == null || y == null)
{
return false;
}
// count differs; they are not equal
if (x.Count != y.Count)
{
return false;
}
x.Sort();
y.Sort();
for (int i = 0; i < x.Count; i++)
{
if (!x[i].Equals(y[i]))
{
return false;
}
}
return true;
}
}
}
Related
I have a list of image name like this {"1.jpg", "10.jpg", "2.jpg"}.
I would like to sort like this {"1.jpg", "2.jpg", "10.jpg"}.
I created this comparer. That means if x or y == "DSC_10.jpg", so if list is {"DSC_1.jpg", "DSC_10.jpg", "DSC_2.jpg", ...} don't sort and keep the list.
var comparer = new CompareImageName();
imageUrls.Sort(comparer);
return imageUrls;
public class CompareImageName : IComparer<string>
{
public int Compare(string x, string y)
{
if (x == null || y == null) return 0;
var l = x.Split('/');
var l1 = y.Split('/');
int a, b;
var rs = int.TryParse(l[l.Length - 1].Split('.')[0], out a);
var rs2 = int.TryParse(l1[l1.Length - 1].Split('.')[0], out b);
if (!rs || !rs2) return 0;
if (a == b || a == 0 && b == 0) return 0;
return a > b ? 1 : -1;
}
}
This sort correctly with name {"1.jpg", "10.jpg", "2.jpg"}, but incorrectly if list is {"DSC_1.jpg", "DSC_10.jpg", "DSC_2.jpg", ...}.
I read in MSDN:
What wrong with my code?
I think you're better off doing a bit of Regex for this. Try this solution:
public class CompareImageName : IComparer<string>
{
public int Compare(string x, string y)
{
if (x == null || y == null) return 0;
var regex = new Regex(#"/(((?<prefix>\w*)_)|)((?<number>\d+))\.jpg$");
var mx = regex.Match(x);
var my = regex.Match(y);
var r = mx.Groups["prefix"].Value.CompareTo(my.Groups["prefix"].Value);
if (r == 0)
{
r = int.Parse(mx.Groups["number"].Value).CompareTo(int.Parse(my.Groups["number"].Value));
}
return r;
}
}
Apart from the Regex string itself this is easier to follow the logic.
Here is your solution check this example, following class will do the comparison
public class NumericCompare : IComparer<string>
{
public int Compare(string x, string y)
{
int input1,input2;
input1=int.Parse(x.Substring(x.IndexOf('_')+1).Split('.')[0]);
input2= int.Parse(y.Substring(y.IndexOf('_')+1).Split('.')[0]);
return Comparer<int>.Default.Compare(input1,input2);
}
}
You can make use of this class like the following:
var imageUrls = new List<string>() { "DSC_1.jpg", "DSC_10.jpg", "DSC_2.jpg" };
var comparer = new NumericCompare();
imageUrls.Sort(comparer);
Console.WriteLine(String.Join("\n",imageUrls));
Try this with simple OrderBy
var SortedList = imageUrls.OrderBy(
x=>int.Parse(
x.Substring(x.IndexOf('_')+1).Split('.')[0])
).ToList();
Basically what you want to do is sort by the numeric part within the string. You are almost there. You just have to handle the part when you split a case like this DSC_2.jpg using a . then the first part is not all digits. So you need to get digits and then compare those. Here is the code. Please note I have made the assumption you will have backslash and if that is not the case then please handle it:
public int Compare(string x, string y)
{
if (x == null || y == null) return 0;
var nameX = x.Substring(x.LastIndexOf('/'));
var nameY = y.Substring(y.LastIndexOf('/'));
var nameXParts = nameX.Split('.');
var nameYParts = nameY.Split('.');
int a, b;
var rs = int.TryParse(nameXParts[0], out a);
var rs2 = int.TryParse(nameYParts[0], out b);
var nameXDigits = string.Empty;
if (!rs)
{
for (int i = 0; i < nameXParts[0].Length; i++)
{
if (Char.IsDigit(nameXParts[0][i]))
nameXDigits += nameXParts[0][i];
}
}
var nameYDigits = string.Empty;
if (!rs2)
{
for (int i = 0; i < nameYParts[0].Length; i++)
{
if (Char.IsDigit(nameYParts[0][i]))
nameYDigits += nameYParts[0][i];
}
}
int.TryParse(nameXDigits, out a);
int.TryParse(nameYDigits, out b);
if (a == b || a == 0 && b == 0) return 0;
return a > b ? 1 : -1;
}
Don't use imageUrls.Sort(comparer); on List because it doesn't accept 0 value as keeping the order of elements.
Reason:
The Sort performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.
Link: https://msdn.microsoft.com/en-gb/library/w56d4y5z.aspx
Solution: Let's try to use OrderBy with your compare
var imageUrls1 = new List<string>() { "1.jpg", "10.jpg", "2.jpg" };
var imageUrls2 = new List<string>() { "DSC_1.jpg", "DSC_10.jpg", "DSC_2.jpg" };
var comparer = new CompareImageName();
//Sort normally
imageUrls1 = imageUrls1.OrderBy(p=>p, comparer).ToList();
//Keep the order as your expectation
imageUrls2 = imageUrls2.OrderBy(p=>p, comparer).ToList();
Maybe you can try doing this in a function instead of writing a comparator. I can't think of a good way to implement this logic as a comparator since there are different rules based on the contents (don't sort if the file name is not numeric).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace sortinglists
{
public class MainProgram
{
public static void Main()
{
var imageUrlsNumbers = new List<string>();
imageUrlsNumbers.Add("c:/a/b/1.jpg");
imageUrlsNumbers.Add("c:/a/b/10.jpg");
imageUrlsNumbers.Add("c:/a/b/2.jpg");
CustomSort(ref imageUrlsNumbers);
foreach (var imageUrl in imageUrlsNumbers)
{
Console.WriteLine(imageUrl);
}
var imageUrlsText = new List<string>();
imageUrlsText.Add("c:/a/b/DSC_1.jpg");
imageUrlsText.Add("c:/a/b/DSC_10.jpg");
imageUrlsText.Add("c:/a/b/DSC_2.jpg");
CustomSort(ref imageUrlsText);
foreach (var imageUrl in imageUrlsText)
{
Console.WriteLine(imageUrl);
}
}
public static void CustomSort(ref List<string> imageUrls)
{
if (imageUrls
.Select(s => s.Substring(s.LastIndexOf("/", StringComparison.OrdinalIgnoreCase) + 1))
.Select(t => t.Substring(0, t.IndexOf(".", StringComparison.OrdinalIgnoreCase)))
.Where(u => new Regex("[A-Za-z_]").Match(u).Success)
.Any())
{
imageUrls = imageUrls
.Select(x => x.Substring(x.LastIndexOf("/", StringComparison.OrdinalIgnoreCase) + 1))
.ToList();
}
else
{
imageUrls = imageUrls
.Select(v => v.Substring(v.LastIndexOf("/", StringComparison.OrdinalIgnoreCase) + 1))
.OrderBy(w => Convert.ToInt32(w.Substring(0, w.LastIndexOf(".", StringComparison.OrdinalIgnoreCase))))
.ToList();
}
}
}
}
The output for imageUrlsNumbers after sorting is:
1.jpg
2.jpg
10.jpg
And the output for imageUrlsText after sorting is:
DSC_1.jpg
DSC_10.jpg
DSC_2.jpg
This one should not be too hard but my mind seems to be having a stack overflow (huehue). I have a series of Lists and I want to find all permutations they can be ordered in. All of the lists have different lengths.
For example:
List 1: 1
List 2: 1, 2
All permutations would be:
1, 1
1, 2
In my case I don't switch the numbers around. (For example 2, 1)
What is the easiest way to write this?
I can't say if the following is the easiest way, but IMO it's the most efficient way. It's basically a generalized version of the my answer to the Looking at each combination in jagged array:
public static class Algorithms
{
public static IEnumerable<T[]> GenerateCombinations<T>(this IReadOnlyList<IReadOnlyList<T>> input)
{
var result = new T[input.Count];
var indices = new int[input.Count];
for (int pos = 0, index = 0; ;)
{
for (; pos < result.Length; pos++, index = 0)
{
indices[pos] = index;
result[pos] = input[pos][index];
}
yield return result;
do
{
if (pos == 0) yield break;
index = indices[--pos] + 1;
}
while (index >= input[pos].Count);
}
}
}
You can see the explanation in the linked answer (shortly it's emulating nested loops). Also since for performace reasons it yields the internal buffer w/o cloning it, you need to clone it if you want store the result for later processing.
Sample usage:
var list1 = new List<int> { 1 };
var list2 = new List<int> { 1, 2 };
var lists = new[] { list1, list2 };
// Non caching usage
foreach (var combination in lists.GenerateCombinations())
{
// do something with the combination
}
// Caching usage
var combinations = lists.GenerateCombinations().Select(c => c.ToList()).ToList();
UPDATE: The GenerateCombinations is a standard C# iterator method, and the implementation basically emulates N nested loops (where N is the input.Count) like this (in pseudo code):
for (int i0 = 0; i0 < input[0].Count; i0++)
for (int i1 = 0; i1 < input[1].Count; i1++)
for (int i2 = 0; i2 < input[2].Count; i2++)
...
for (int iN-1 = 0; iN-1 < input[N-1].Count; iN-1++)
yield { input[0][i0], input[1][i1], input[2][i2], ..., input[N-1][iN-1] }
or showing it differently:
for (indices[0] = 0; indices[0] < input[0].Count; indices[0]++)
{
result[0] = input[0][indices[0]];
for (indices[1] = 0; indices[1] < input[1].Count; indices[1]++)
{
result[1] = input[1][indices[1]];
// ...
for (indices[N-1] = 0; indices[N-1] < input[N-1].Count; indices[N-1]++)
{
result[N-1] = input[N-1][indices[N-1]];
yield return result;
}
}
}
Nested loops:
List<int> listA = (whatever), listB = (whatever);
var answers = new List<Tuple<int,int>>;
for(int a in listA)
for(int b in listB)
answers.add(Tuple.create(a,b));
// do whatever with answers
Try this:
Func<IEnumerable<string>, IEnumerable<string>> combine = null;
combine = xs =>
xs.Skip(1).Any()
? xs.First().SelectMany(x => combine(xs.Skip(1)), (x, y) => String.Format("{0}{1}", x, y))
: xs.First().Select(x => x.ToString());
var strings = new [] { "AB", "12", "$%" };
foreach (var x in combine(strings))
{
Console.WriteLine(x);
}
That gives me:
A1$
A1%
A2$
A2%
B1$
B1%
B2$
B2%
I made the following IEnumerable<IEnumerable<TValue>> class to solve this problem which allows use of generic IEnumerable's and whose enumerator returns all permutations of the values, one from each inner list. It can be conventiently used directly in a foreach loop.
It's a variant of Michael Liu's answer to IEnumerable and Recursion using yield return
I've modified it to return lists with the permutations instead of the single values.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
namespace Permutation
{
public class ListOfListsPermuter<TValue> : IEnumerable<IEnumerable<TValue>>
{
private int count;
private IEnumerable<TValue>[] listOfLists;
public ListOfListsPermuter(IEnumerable<IEnumerable<TValue>> listOfLists_)
{
if (object.ReferenceEquals(listOfLists_, null))
{
throw new ArgumentNullException(nameof(listOfLists_));
}
listOfLists =listOfLists_.ToArray();
count = listOfLists.Count();
for (int i = 0; i < count; i++)
{
if (object.ReferenceEquals(listOfLists[i], null))
{
throw new NullReferenceException(string.Format("{0}[{1}] is null.", nameof(listOfLists_), i));
}
}
}
// A variant of Michael Liu's answer in StackOverflow
// https://stackoverflow.com/questions/2055927/ienumerable-and-recursion-using-yield-return
public IEnumerator<IEnumerable<TValue>> GetEnumerator()
{
TValue[] currentList = new TValue[count];
int level = 0;
var enumerators = new Stack<IEnumerator<TValue>>();
IEnumerator<TValue> enumerator = listOfLists[level].GetEnumerator();
try
{
while (true)
{
if (enumerator.MoveNext())
{
currentList[level] = enumerator.Current;
level++;
if (level >= count)
{
level--;
yield return currentList;
}
else
{
enumerators.Push(enumerator);
enumerator = listOfLists[level].GetEnumerator();
}
}
else
{
if (level == 0)
{
yield break;
}
else
{
enumerator.Dispose();
enumerator = enumerators.Pop();
level--;
}
}
}
}
finally
{
// Clean up in case of an exception.
enumerator?.Dispose();
while (enumerators.Count > 0)
{
enumerator = enumerators.Pop();
enumerator.Dispose();
}
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
You can use it directly in a foreach like this:
public static void Main(string[] args)
{
var listOfLists = new List<List<string>>()
{
{ new List<string>() { "A", "B" } },
{ new List<string>() { "C", "D" } }
};
var permuter = new ListOfListsPermuter<string>(listOfLists);
foreach (IEnumerable<string> item in permuter)
{
Console.WriteLine("{ \"" + string.Join("\", \"", item) + "\" }");
}
}
The output:
{ "A", "C" }
{ "A", "D" }
{ "B", "C" }
{ "B", "D" }
I have a MultiValueDictionary<string, string> where I am trying to get a key by value.
var dic = na.prevNext; // Getter to get MultiValueDictionary
string nodePointingToThisOne = "";
foreach (var item in dic)
{
if(item.Value == "test")
{
nodePointingToThisOne = item.Key;
}
break;
}
This does not work so I tried Linq:
string nodePointingToThisOne = dic.Where(x => x.Value == this.nodeID).Select(x => x.Key);
But on both I get this error: Operator '==' cannot be applied to operands of type 'System.Collections.Generic.IReadOnlyCollection<string>' and 'string'
So my question is how do I make this comparison work for a read-only collection? I am aware that I get problems if a key exists multiple times but I reduced the problem to this one for now.
I read
Get Dictionary key by using the dictionary value
LINQ: Getting Keys for a given list of Values from Dictionary and vice versa
get dictionary key by value
Getting key of value of a generic Dictionary?
Get key from value - Dictionary<string, List<string>>
but they deal with a "normal" dictionary.
Since the Value property it self may contain multiple values you can't compare it directly against certain string using == operator. Use Contains() instead :
.....
if (item.Value.Contains("test"))
{
.....
}
...or in method chain version :
string nodePointingToThisOne = dic.Where(x => x.Value.Contains("test"))
.Select(x => x.Key)
.FirstOrDefault();
Try to iterate by Keys and then compare the value, and return the Key only if the Value matches.
Like this:
foreach (var key in dic.Keys)
{
if(dic[key].Contains("your value"))
return key;
}
You can iterate over keys like this
foreach (var key in dic.Keys)
{
if(key == "your key")
return key;
}
You can also iterate over values like this
foreach (var v in dic.Values)
{
if(v == "your value")
return v;
}
Example:
Dictionary<string, string> c = new Dictionary<string, string>();
c.Add("Pk", "Pakistan");
c.Add("Aus", "Australia");
c.Add("Ind", "India");
c.Add("Nz", "New Zeland");
c.Add("SA", "South Africa");
foreach (var v in c.Values)
{
if (v == "Australia")
{
Console.WriteLine("Your Value is = " + v);
// perform your task
}
}
foreach (var k in c.Keys)
{
if (k == "Aus")
{
// perform your task
Console.WriteLine("Your Key is = " + k);
}
}
Output:
Your Value is = "Australia"
Your Key is = "Aus"
If you look at the MultiValueDictionary, line 81 will give you a hint. It is:
[TestInitialize]
public void TestInitialize()
{
MultiValueDictionary = new MultiValueDictionary<TestKey, string>();
}
protected static void AssertAreEqual( IDictionary<TestKey, string[]> expected,
IMultiValueDictionary<TestKey, string> actual )
{
Assert.AreEqual( expected.Count, actual.Count );
foreach ( var k in expected.Keys )
{
var expectedValues = expected[ k ];
var actualValues = actual[ k ];
AssertAreEqual( expectedValues, actualValues );
}
}
So for your case, the solution is similar:
foreach (var item in dic.keys)
{
if(dict[item] == "test")
{
nodePointingToThisOne = item;
return nodePointingToThisOne;
}
}
worked for me:
foreach (int i in Dictionary.Keys)
{
if (Dictionary.Values.ToString() == "Your Value")
{
return Dictionary.Keys;
}
}
Say I have the following array of strings as an input:
foo-139875913
foo-aeuefhaiu
foo-95hw9ghes
barbazabejgoiagjaegioea
barbaz8gs98ghsgh9es8h
9a8efa098fea0
barbaza98fyae9fghaefag
bazfa90eufa0e9u
bazgeajga8ugae89u
bazguea9guae
aifeaufhiuafhe
There are 3 different prefixes used here, "foo-", "barbaz" and "baz" - however these prefixes are not known ahead of time (they could be something completely different).
How could you establish what the different common prefixes are so that they could then be grouped by? This is made a bit tricky since in the data I've provided there's two that start with "bazg" and one that starts "bazf" where of course "baz" is the prefix.
What I've tried so far is sorting them into alphabetical order, and then looping through them in order and counting how many characters in a row are identical to the previous. If the number is different or when 0 characters are identical, it starts a new group. The problem with this is it falls over at the "bazg" and "bazf" problem I mentioned earlier and separates those into two different groups (one with just one element in it)
Edit: Alright, let's throw a few more rules in:
Longer potential groups should generally be preferred over shorter ones, unless there is a closely matching group of less than X characters difference in length. (So where X is 2, baz would be preferred over bazg)
A group must have at least Y elements in it or not be a group at all
It's okay to simply throw away elements that don't match any of the 'groups' to within the rules above.
To clarify the first rule in relation to the second, if X was 0 and Y was 2, then the two 'bazg' entries would be in a group, and the 'bazf' would be thrown away because its on its own.
Well, here's a quick hack, probably O(something_bad):
IEnumerable<Tuple<String, IEnumerable<string>>> GuessGroups(IEnumerable<string> source, int minNameLength=0, int minGroupSize=1)
{
// TODO: error checking
return InnerGuessGroups(new Stack<string>(source.OrderByDescending(x => x)), minNameLength, minGroupSize);
}
IEnumerable<Tuple<String, IEnumerable<string>>> InnerGuessGroups(Stack<string> source, int minNameLength, int minGroupSize)
{
if(source.Any())
{
var tuple = ExtractTuple(GetBestGroup(source, minNameLength), source);
if (tuple.Item2.Count() >= minGroupSize)
yield return tuple;
foreach (var element in GuessGroups(source, minNameLength, minGroupSize))
yield return element;
}
}
Tuple<String, IEnumerable<string>> ExtractTuple(string prefix, Stack<string> source)
{
return Tuple.Create(prefix, PopWithPrefix(prefix, source).ToList().AsEnumerable());
}
IEnumerable<string> PopWithPrefix(string prefix, Stack<string> source)
{
while (source.Any() && source.Peek().StartsWith(prefix))
yield return source.Pop();
}
string GetBestGroup(IEnumerable<string> source, int minNameLength)
{
var s = new Stack<string>(source);
var counter = new DictionaryWithDefault<string, int>(0);
while(s.Any())
{
var g = GetCommonPrefix(s);
if(!string.IsNullOrEmpty(g) && g.Length >= minNameLength)
counter[g]++;
s.Pop();
}
return counter.OrderBy(c => c.Value).Last().Key;
}
string GetCommonPrefix(IEnumerable<string> coll)
{
return (from len in Enumerable.Range(0, coll.Min(s => s.Length)).Reverse()
let possibleMatch = coll.First().Substring(0, len)
where coll.All(f => f.StartsWith(possibleMatch))
select possibleMatch).FirstOrDefault();
}
public class DictionaryWithDefault<TKey, TValue> : Dictionary<TKey, TValue>
{
TValue _default;
public TValue DefaultValue {
get { return _default; }
set { _default = value; }
}
public DictionaryWithDefault() : base() { }
public DictionaryWithDefault(TValue defaultValue) : base() {
_default = defaultValue;
}
public new TValue this[TKey key]
{
get { return base.ContainsKey(key) ? base[key] : _default; }
set { base[key] = value; }
}
}
Example usage:
string[] input = {
"foo-139875913",
"foo-aeuefhaiu",
"foo-95hw9ghes",
"barbazabejgoiagjaegioea",
"barbaz8gs98ghsgh9es8h",
"barbaza98fyae9fghaefag",
"bazfa90eufa0e9u",
"bazgeajga8ugae89u",
"bazguea9guae",
"9a8efa098fea0",
"aifeaufhiuafhe"
};
GuessGroups(input, 3, 2).Dump();
Ok, well as discussed, the problem wasn't initially well defined, but here is how I'd go about it.
Create a tree T
Parse the list, for each element:
for each letter in that element
if a branch labeled with that letter exists then
Increment the counter on that branch
Descend that branch
else
Create a branch labelled with that letter
Set its counter to 1
Descend that branch
This gives you a tree where each of the leaves represents a word in your input. Each of the non-leaf nodes has a counter representing how many leaves are (eventually) attached to that node. Now you need a formula to weight the length of the prefix (the depth of the node) against the size of the prefix group. For now:
S = (a * d) + (b * q) // d = depth, q = quantity, a, b coefficients you'll tweak to get desired behaviour
So now you can iterate over each of the non-leaf node and assign them a score S. Then, to work out your groups you would
For each non-leaf node
Assign score S
Insertion sort the node in to a list, so the head is the highest scoring node
Starting at the root of the tree, traverse the nodes
If the node is the highest scoring node in the list
Mark it as a prefix
Remove all nodes from the list that are a descendant of it
Pop itself off the front of the list
Return up the tree
This should give you a list of prefixes. The last part feels like some clever data structures or algorithms could speed it up (the last part of removing all the children feels particularly weak, but if you input size is small, I guess speed isn't too important).
I'm wondering if your requirements aren't off. It seems as if you are looking for a specific grouping size as opposed to specific key size requirements. I have below a program that will, based on a specified group size, break up the strings into the largest possible groups up too, and including the group size specified. So if you specify a group size of 5, then it will group items on the smallest key possible to make a group of size 5. In your example it would group foo- as f since there is no need to make a more complex key as an identifier.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication2
{
class Program
{
/// <remarks><c>true</c> in returned dictionary key are groups over <paramref name="maxGroupSize"/></remarks>
public static Dictionary<bool,Dictionary<string, List<string>>> Split(int maxGroupSize, int keySize, IEnumerable<string> items)
{
var smallItems = from item in items
where item.Length < keySize
select item;
var largeItems = from item in items
where keySize < item.Length
select item;
var largeItemsq = (from item in largeItems
let key = item.Substring(0, keySize)
group item by key into x
select new { Key = x.Key, Items = x.ToList() } into aGrouping
group aGrouping by aGrouping.Items.Count() > maxGroupSize into x2
select x2).ToDictionary(a => a.Key, a => a.ToDictionary(a_ => a_.Key, a_ => a_.Items));
if (smallItems.Any())
{
var smallestLength = items.Aggregate(int.MaxValue, (acc, item) => Math.Min(acc, item.Length));
var smallItemsq = (from item in smallItems
let key = item.Substring(0, smallestLength)
group item by key into x
select new { Key = x.Key, Items = x.ToList() } into aGrouping
group aGrouping by aGrouping.Items.Count() > maxGroupSize into x2
select x2).ToDictionary(a => a.Key, a => a.ToDictionary(a_ => a_.Key, a_ => a_.Items));
return Combine(smallItemsq, largeItemsq);
}
return largeItemsq;
}
static Dictionary<bool, Dictionary<string,List<string>>> Combine(Dictionary<bool, Dictionary<string,List<string>>> a, Dictionary<bool, Dictionary<string,List<string>>> b) {
var x = new Dictionary<bool,Dictionary<string,List<string>>> {
{ true, null },
{ false, null }
};
foreach(var condition in new bool[] { true, false }) {
var hasA = a.ContainsKey(condition);
var hasB = b.ContainsKey(condition);
x[condition] = hasA && hasB ? a[condition].Concat(b[condition]).ToDictionary(c => c.Key, c => c.Value)
: hasA ? a[condition]
: hasB ? b[condition]
: new Dictionary<string, List<string>>();
}
return x;
}
public static Dictionary<string, List<string>> Group(int maxGroupSize, IEnumerable<string> items, int keySize)
{
var toReturn = new Dictionary<string, List<string>>();
var both = Split(maxGroupSize, keySize, items);
if (both.ContainsKey(false))
foreach (var key in both[false].Keys)
toReturn.Add(key, both[false][key]);
if (both.ContainsKey(true))
{
var keySize_ = keySize + 1;
var xs = from needsFix in both[true]
select needsFix;
foreach (var x in xs)
{
var fixedGroup = Group(maxGroupSize, x.Value, keySize_);
toReturn = toReturn.Concat(fixedGroup).ToDictionary(a => a.Key, a => a.Value);
}
}
return toReturn;
}
static Random rand = new Random(unchecked((int)DateTime.Now.Ticks));
const string allowedChars = "aaabbbbccccc"; // "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";
static readonly int maxAllowed = allowedChars.Length - 1;
static IEnumerable<string> GenerateText()
{
var list = new List<string>();
for (int i = 0; i < 100; i++)
{
var stringLength = rand.Next(3,25);
var chars = new List<char>(stringLength);
for (int j = stringLength; j > 0; j--)
chars.Add(allowedChars[rand.Next(0, maxAllowed)]);
var newString = chars.Aggregate(new StringBuilder(), (acc, item) => acc.Append(item)).ToString();
list.Add(newString);
}
return list;
}
static void Main(string[] args)
{
// runs 1000 times over autogenerated groups of sample text.
for (int i = 0; i < 1000; i++)
{
var s = GenerateText();
Go(s);
}
Console.WriteLine();
Console.WriteLine("DONE");
Console.ReadLine();
}
static void Go(IEnumerable<string> items)
{
var dict = Group(3, items, 1);
foreach (var key in dict.Keys)
{
Console.WriteLine(key);
foreach (var item in dict[key])
Console.WriteLine("\t{0}", item);
}
}
}
}
I have this:
List<string> list = new List<string>();
list.Add("a-b-c>d");
list.Add("b>c");
list.Add("f>e");
list.Add("f>e-h");
list.Add("a-d>c-b");
I want to delete duplicates. In this case duplicates are "a-b-c>d" and "a-d>c-b". Both have same chars but in diferente order.
I tried with:
list.Distinct().ToList();
But didn't work!
It looks like you want:
var distinct = list
.Select((str, idx) => new { Str = str, Idx = idx })
.GroupBy(pair => new HashSet<char>(pair.Str), HashSet<char>.CreateSetComparer())
.Select(grp => grp.OrderBy(p => p.Idx).First())
.ToList();
This will keep the first element and remove any later strings in the sequence which contains the same characters.
You can also use Aggregate to track the character sets you've already seen:
var distinct = list
.Aggregate(new Dictionary<HashSet<char>, string>(HashSet<char>.CreateSetComparer()), (dict, str) =>
{
var set = new HashSet<char>(str);
if (!dict.ContainsKey(set))
dict.Add(set, str);
return dict;
})
.Values
.ToList();
You'll have to define a custom IEqualityComparer that allows the system to understand when you consider two strings to be "equal". For example:
List<string> list = new List<string>();
list.Add("a-b-c>d");
list.Add("b>c-d-f");
list.Add("c-d-f>e");
list.Add("a-d>c-b");
var distinctItems = list.Distinct(new KeyFuncEqualityComparer<string>(
s => new String(s.AsEnumerable().OrderBy(c => c).ToArray())));
Result:
a-b-c>d
b>c-d-f
c-d-f>e
... using this generic IEqualityComparer implementation:
public class KeyFuncEqualityComparer<T> :IEqualityComparer<T>
{
private readonly Func<T, object> _getKey;
public KeyFuncEqualityComparer(Func<T, object> getKey)
{
_getKey = getKey;
}
public bool Equals(T x, T y)
{
return _getKey(x).Equals(_getKey(y));
}
public int GetHashCode(T obj)
{
return _getKey(obj).GetHashCode();
}
}
untested
list<int> dups = new list<int>();
for(int i = 0; i < list.count - 1; i++)
{
for(int j = 1; j < list.count; j++)
{
int len = list[j].length;
bool b = false;
for(int k = 0; k < list[i].length; k++)
{
if(k < len && (list[i][k] != list[j][k]))
{
b = false;
break;
}
b = true;
}
if(b == true)
{
dups.add(i);
}
}
}
then remove all dups from list