LINQ: Compare two lists and count subset - c#

I am comparing two lists, and I need to collect the modules of a subset (modulesToDelete) that occur in the master list (allModules) ONLY when MORE than one occurrence is found (allModules contains modulesToDelete). Multiple occurrences of a module from modulesToDelete mean that module is shared. One occurrence means the module is isolated and safe to delete (it just found itself). I can do this with nested foreach loops, but this is as far as I got with a LINQ expression (which doesn't work):
List<Module> modulesToDelete = { A, B, C, K }
List<string> allModules = {R, A, B, C, K, D, G, T, B, K } // need to flag B and K
var mods = from mod in modulesToDelete
where allModules.Any(name => name.Contains(mod.Name) && mod.Name.Count() > 1)
select mod;
Here are my nested foreach loops, which I want to replace with a LINQ expression:
foreach (Module mod in modulesToDelete)
{
int count = 0;
foreach (string modInAllMods in allModules)
{
if (modInAllMods == mod.Name)
{
count++;
}
}
if (count > 1)
{
m_moduleMarkedForKeep.Add(mod);
}
else if( count == 1)
{
// Delete the linked modules
}
}

You can use a lookup which is similar to a dictionary but allows multiple equal keys and returns an IEnumerable<T> as value.
var nameLookup = modulesToDelete.ToLookup(m => m.Name);
var safeToDelete = modulesToDelete.Where(m => nameLookup[m.Name].Count() == 1);
var sharedModules = modulesToDelete.Where(m => nameLookup[m.Name].Count() > 1);
Edit: However, I don't see how allModules is related at all.
Probably easier and with the desired result on your sample data:
var mods = modulesToDelete.Where(m => allModules.Count(s => s == m.Name) > 1);
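For larger lists, counting every name in allModules once avoids rescanning the master list for each module. The following is a minimal sketch of that idea, not the answerer's code. It assumes plain strings in place of the Module type (whose definition the question does not show) and uses the sample data from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question; plain strings stand in for Module.Name.
var modulesToDelete = new List<string> { "A", "B", "C", "K" };
var allModules = new List<string> { "R", "A", "B", "C", "K", "D", "G", "T", "B", "K" };

// Count each name once, so the master list is scanned a single time.
Dictionary<string, int> counts = allModules
    .GroupBy(name => name)
    .ToDictionary(g => g.Key, g => g.Count());

// Shared modules occur more than once in the master list.
List<string> shared = modulesToDelete
    .Where(m => counts.TryGetValue(m, out int c) && c > 1)
    .ToList();

// Isolated modules occur exactly once and are safe to delete.
List<string> safeToDelete = modulesToDelete
    .Where(m => counts.TryGetValue(m, out int c) && c == 1)
    .ToList();

Console.WriteLine(string.Join(", ", shared));       // B, K
Console.WriteLine(string.Join(", ", safeToDelete)); // A, C
```

With real Module objects the same filters would compare on m.Name instead of m.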

One way to solve this is to use the Intersect function; see "Intersection of two string array (ignore case)".

Related

Loop to check for duplicate strings

I want to create a loop to check a list of titles for duplicates.
I currently have this:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var x in productTitles)
{
var title = x.Text;
productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var y in productTitles.Skip(productTitles.IndexOf(x) + 1))
{
if (title == y.Text)
{
Assert.Fail("Found duplicate product in the table");
}
}
}
But this takes the item I skip out of the array for the next loop, so item 2 never gets compared with item 1; it moves straight to item 3.
I was under the impression that skip just passed over the index you pass in rather than removing it from the list.
You can use GroupBy:
var anyDuplicates = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.GroupBy(p => p.Text, p => p)
.Any(g => g.Count() > 1);
Assert.That(anyDuplicates, Is.False);
or Distinct:
var productTitles = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text)
.ToArray();
var distinctProductTitles = productTitles.Distinct().ToArray();
Assert.AreEqual(productTitles.Length, distinctProductTitles.Length);
Or, if it is enough to find a first duplicate without counting all of them it's better to use a HashSet<T>:
var titles = new HashSet<string>();
foreach (var title in SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text))
{
if (!titles.Add(title))
{
Assert.Fail("Found duplicate product in the table");
}
}
All approaches are better in terms of computational complexity (O(n)) than what you propose (O(n^2)).
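If the failure message should name the offending titles, the GroupBy variant can be extended to collect them. This is a minimal sketch, assuming the titles have already been read into a list of strings; the sample titles are made up for illustration and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical titles standing in for the Text of each Selenium element.
var titles = new List<string> { "Kettle", "Toaster", "Kettle", "Blender", "Toaster" };

// Group equal titles and keep only the groups with more than one member.
List<string> duplicates = titles
    .GroupBy(t => t)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(string.Join(", ", duplicates)); // Kettle, Toaster
```

An empty duplicates list means the table has no repeated titles, so the test could assert on duplicates.Count and print the list on failure.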
You don't need a loop. Simply use the Where() function to find all same titles, and if there is more than one, then they're duplicates:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach(var x in productTitles) {
if (productTitles.Where(y => x.Text == y.Text).Count() > 1) {
Assert.Fail("Found duplicate product in the table");
}
}
I would try a slightly different way since you only need to check for duplicates in a one-dimensional array.
You only have to check the previous element with the next element within the array/collection so using Linq to iterate through all of the items seems a bit unnecessary.
Here's a piece of code to better understand:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
for ( int i = 0; i < productTitles.Count; i++ )
{
var currentObject = productTitles[i];
// compare only with the items after index i; earlier pairs were already checked
for ( int j = i + 1; j < productTitles.Count; j++ )
{
if ( currentObject.Text == productTitles[j].Text )
{
// here's your duplicate
}
}
}
Since you've checked that item at index 0 is not the same as item placed at index 3 there's no need to check that again when you're at index 3. The items will remain the same.
The Skip(IEnumerable, n) method returns an IEnumerable that doesn't "contain" the n first element of the IEnumerable it's called on.
Also I don't know what sort of behaviour could arise from this, but I wouldn't assign a new IEnumerable to the variable over which the foreach is being executed.
Here's another possible solution with LINQ:
int i = 0;
foreach (var x in productTitles)
{
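// ++i skips x itself and everything before it, so x is never matched against itself
var possibleDuplicate = productTitles.Skip(++i).FirstOrDefault(y => y.Text == x.Text);
// if possibleDuplicate is not null, x has a duplicate later in the list
// do stuff here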
}
This goes without saying, but the best solution for you will depend on what you are trying to do. Also, I think the Skip method call is more trouble than it's worth, as it will probably make the search less efficient.

C# Delete one of two successive and same lines in a list

How can I delete one of two identical successive lines in a list?
For example:
load
testtest
cd /abc
cd /abc
testtest
exit
cd /abc
In this case ONLY line three OR four should be deleted. The lists have about 50,000 lines, so speed matters too.
Do you have an idea?
Thank you!
Homeros
You just have to look at the last added element in the second list:
var secondList = new List<string>(firstList.Count){ firstList[0] };
foreach(string next in firstList.Skip(1))
if(secondList.Last() != next)
secondList.Add(next);
Since you wanted to delete the duplicates you have to assign this new list to the old variable:
firstList = secondList;
This approach is more efficient than deleting from a list.
Side note: since Enumerable.Last is optimized for collections with an indexer (IList<T>), it is as efficient as secondList[secondList.Count-1], but more readable.
Use a reverse for-loop and check the adjacent elements:
List<string> list = new List<string>();
for (int i = list.Count-1; i > 0 ; i--)
{
if (list[i] == list[i-1])
{
list.RemoveAt(i);
}
}
The reverse version is advantageous here because the list may shrink with every removed element, and removing at index i does not shift the indices that are still to be visited.
I would first split the list, then use LINQ to only select items that don't have the same previous item:
string[] source = text.Split(Environment.NewLine);
var list = source.Select((l, idx) => new { Line = l, Index = idx } )
.Where(x => x.Index == 0 || source[x.Index - 1] != x.Line)
.Select(x => x.Line)
.ToList() // materialize
;
O(n) as extension method
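// Extension methods must be declared inside a static class.
public static IEnumerable<string> RemoveSameSuccessiveItems(this IEnumerable<string> items)
{
string previousItem = null;
foreach(var item in items)
{
// string.Equals is null-safe, unlike item.Equals(previousItem)
if (!string.Equals(item, previousItem))
{
previousItem = item;
yield return item;
}
}
}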
Then use it
lines = lines.RemoveSameSuccessiveItems();
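To see the single-pass approach on the sample input from the question, here is a self-contained sketch. It uses a local iterator function instead of an extension method so that it runs as one file; the name RemoveSuccessiveDuplicates is chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample lines from the question.
var lines = new List<string> { "load", "testtest", "cd /abc", "cd /abc", "testtest", "exit", "cd /abc" };

List<string> result = RemoveSuccessiveDuplicates(lines).ToList();
Console.WriteLine(string.Join(" | ", result));
// load | testtest | cd /abc | testtest | exit | cd /abc

// Yields each item unless it equals the item directly before it.
static IEnumerable<string> RemoveSuccessiveDuplicates(IEnumerable<string> items)
{
    bool hasPrevious = false;
    string previous = "";
    foreach (string item in items)
    {
        if (!hasPrevious || item != previous)
        {
            yield return item;
        }
        previous = item;
        hasPrevious = true;
    }
}
```

Only the adjacent pair on lines three and four collapses; the later "cd /abc" and the second "testtest" are kept because their neighbours differ.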

How to Check All Values in Dictionary is same in C#?

I have a Dictionary, I want to write a method to check whether all values are same in this Dictionary.
Dictionary Type:
Dictionary<string, List<string>>
The lists {1,2,3} and {2,1,3} are the same in my case.
I have done this previously for simple datatype values, but I can not find logic for new requirement, please help me.
For simple values:
MyDict.GroupBy(x => x.Value).Where(x => x.Count() > 1)
I have also written a Generic Method to compare two datatypes in this way.
// 1
// Require that the counts are equal
if (a.Count != b.Count)
{
return false;
}
// 2
// Initialize new Dictionary of the type
Dictionary<T, int> d = new Dictionary<T, int>();
// 3
// Add each key's frequency from collection A to the Dictionary
foreach (T item in a)
{
int c;
if (d.TryGetValue(item, out c))
{
d[item] = c + 1;
}
else
{
d.Add(item, 1);
}
}
// 4
// Add each key's frequency from collection B to the Dictionary
// Return early if we detect a mismatch
foreach (T item in b)
{
int c;
if (d.TryGetValue(item, out c))
{
if (c == 0)
{
return false;
}
else
{
d[item] = c - 1;
}
}
else
{
// Not in dictionary
return false;
}
}
// 5
// Verify that all frequencies are zero
foreach (int v in d.Values)
{
if (v != 0)
{
return false;
}
}
// 6
// We know the collections are equal
return true;
Implement an IEqualityComparer for List<string> that compares two list based on their content. Then just use Distinct on Values and check the count:
dictionary.Values.Distinct(new ListEqualityComparer()).Count() == 1
This should do the trick
var lists = dic.Select(kv => kv.Value.OrderBy(x => x)).ToList();
var first = lists.First();
var areEqual = lists.Skip(1).All(hs => hs.SequenceEqual(first));
You'll need to add some checks to make this work for the empty case.
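One way to add that check is sketched below. Treating an empty dictionary as "all values equal" is an assumption made here, not something the question specifies, and the helper name AllValuesEqual is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dic = new Dictionary<string, List<string>>
{
    ["a"] = new List<string> { "1", "2", "3" },
    ["b"] = new List<string> { "2", "1", "3" },
};
var empty = new Dictionary<string, List<string>>();

Console.WriteLine(AllValuesEqual(dic));   // True
Console.WriteLine(AllValuesEqual(empty)); // True (by the assumption below)

static bool AllValuesEqual(Dictionary<string, List<string>> source)
{
    // Sort each list once so that element order no longer matters.
    List<List<string>> lists = source.Values
        .Select(v => v.OrderBy(x => x, StringComparer.Ordinal).ToList())
        .ToList();

    // Assumption: an empty dictionary counts as "all values equal".
    if (lists.Count == 0)
    {
        return true;
    }

    List<string> first = lists[0];
    return lists.Skip(1).All(l => l.SequenceEqual(first));
}
```

Materializing the sorted lists with ToList also avoids re-sorting the first list for every comparison.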
...or if you want to take @Selman's approach, here's an implementation of the IEqualityComparer:
class SequenceComparer<T>:IEqualityComparer<IEnumerable<T>>
{
public bool Equals(IEnumerable<T> left, IEnumerable<T> right)
{
return left.OrderBy(x => x).SequenceEqual(right.OrderBy(x => x));
}
public int GetHashCode(IEnumerable<T> item)
{
//no need to sort because XOR is commutative
return item.Aggregate(0, (acc, val) => val.GetHashCode() ^ acc);
}
}
You could make a variant of this combining the best of both approaches using a HashSet<T> that might be considerably more efficient in the case that you have many candidates to test:
HashSet<IEnumerable<int>> hs = new HashSet<IEnumerable<int>>(new SequenceComparer<int>());
hs.Add(dic.First().Value);
var allEqual = dic.All(kvp => !hs.Add(kvp.Value));
This uses the feature of HashSets that disallows adding more than one item that is considered equal with an item already in the set. We make the HashSet use the custom IEqualityComparer above...
So we insert an arbitrary item from the dictionary before we start, then the moment another item is allowed into the set (i.e. hs.Add(kvp.Value) is true), we can say that there's more than one item in the set and bail out early. .All does this automatically.
Selman22's answer works perfectly - you can also do this for your Dictionary<string, List<string>> without having to implement an IEqualityComparer yourself:
var firstValue = dictionary.Values.First().OrderBy(x => x);
return dictionary.Values.All (x => x.OrderBy(y => y).SequenceEqual(firstValue));
We compare the first value to every other value, and check equality in each case. Note that List<string>.OrderBy(x => x) simply sorts the list of strings alphabetically.
It's not the fastest solution, but it works for me:
bool AreEqual = l1.Intersect(l2).ToList().Count() == l1.Count() && l1.Count() == l2.Count();

LINQ: Collapsing a series of strings into a set of "ranges"

I have an array of strings similar to this (shown on separate lines to illustrate the pattern):
{ "aa002","aa003","aa004","aa005","aa006","aa007", // note that aa008 is missing
"aa009",
"ba023","ba024","ba025",
"bb025",
"ca002","ca003",
"cb004",
...}
...and the goal is to collapse those strings into this comma-separated string of "ranges":
"aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004, ... "
I want to collapse them so I can construct a URL. There are hundreds of elements, but I can still convey all the information if I collapse them this way - putting them all into a URL "longhand" (it has to be a GET, not a POST) isn't feasible.
I've had the idea to separate them into groups using the first two characters as the key - but does anyone have any clever ideas for collapsing those sequences (without gaps) into ranges? I'm struggling with it, and everything I've come up with looks like spaghetti.
So the first thing that you need to do is parse the strings. It's important to have the alphabetic prefix and the integer value separately.
Next you want to group the items on the prefix.
For each of the items in that group, you want to order them by number, and then group items while the previous value's number is one less than the current item's number. (Or, put another way, while the previous item plus one is equal to the current item.)
Once you've grouped all of those items you want to project that group out to a value based on that range's prefix, as well as the first and last number. No other information from these groups is needed.
We then flatten the list of strings for each group into just a regular list of strings, since once we're all done there is no need to separate out ranges from different groups. This is done using SelectMany.
When that's all said and done, that, translated into code, is this:
public static IEnumerable<string> Foo(IEnumerable<string> data)
{
return data.Select(item => new
{
Prefix = item.Substring(0, 2),
Number = int.Parse(item.Substring(2))
})
.GroupBy(item => item.Prefix)
.SelectMany(group => group.OrderBy(item => item.Number)
.GroupWhile((prev, current) =>
prev.Number + 1 == current.Number)
.Select(range =>
RangeAsString(group.Key,
range.First().Number,
range.Last().Number)));
}
The GroupWhile method can be implemented like so:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (!predicate(previous, iterator.Current))
{
yield return list;
list = new List<T>();
}
list.Add(iterator.Current);
previous = iterator.Current;
}
yield return list;
}
}
And then the simple helper method to convert each range into a string:
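private static string RangeAsString(string prefix, int start, int end)
{
// int.Parse drops leading zeros ("002" becomes 2), so pad them back with "D3".
// This assumes three-digit numbers, as in the sample data.
if (start == end)
return prefix + start.ToString("D3");
else
return string.Format("{0}{1:D3}-{0}{2:D3}", prefix, start, end);
}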
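For comparison, the same parse, group and collapse steps can be written as a single loop. This is a sketch under stated assumptions, not the answer's implementation: the input is already sorted, every item is a two-character prefix followed by digits, and the helper name Collapse is made up. It reuses the original strings for the range ends, so leading zeros survive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question, already sorted.
var data = new[]
{
    "aa002", "aa003", "aa004", "aa005", "aa006", "aa007",
    "aa009",
    "ba023", "ba024", "ba025",
    "bb025",
    "ca002", "ca003",
    "cb004",
};

Console.WriteLine(string.Join(",", Collapse(data)));
// aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004

// Assumes sorted input where each item is a two-character prefix followed by digits.
static List<string> Collapse(IReadOnlyList<string> items)
{
    var ranges = new List<string>();
    int start = 0; // index of the first item in the current range
    for (int i = 1; i <= items.Count; i++)
    {
        // The range continues while the prefix matches and the number grows by one.
        bool continues = i < items.Count
            && items[i].Substring(0, 2) == items[i - 1].Substring(0, 2)
            && int.Parse(items[i].Substring(2)) == int.Parse(items[i - 1].Substring(2)) + 1;
        if (continues)
        {
            continue;
        }
        // Close the range [start, i - 1], reusing the original strings to keep leading zeros.
        ranges.Add(start == i - 1 ? items[start] : items[start] + "-" + items[i - 1]);
        start = i;
    }
    return ranges;
}
```

Unsorted input would need an OrderBy on prefix and number before the loop.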
Here's a LINQ version without the need to add new extension methods:
var data2 = data.Skip(1).Zip(data, (d1, d0) => new
{
value = d1,
jump = d1.Substring(0, 2) == d0.Substring(0, 2)
? int.Parse(d1.Substring(2)) - int.Parse(d0.Substring(2))
: -1,
});
var agg = new { f = data.First(), t = data.First(), };
var query2 =
data2
.Aggregate(new [] { agg }.ToList(), (a, x) =>
{
var last = a.Last();
if (x.jump == 1)
{
a.RemoveAt(a.Count() - 1);
a.Add(new { f = last.f, t = x.value, });
}
else
{
a.Add(new { f = x.value, t = x.value, });
}
return a;
});
var query3 =
from q in query2
select (q.f) + (q.f == q.t ? "" : "-" + q.t);
I get these results:

How to find duplicate pairs in a Dictionary?

I'd like to calculate the TCC metric:
The Tight Class Cohesion (TCC) measures the ratio of the number of method pairs of directly connected visible methods in a class NDC(C) and the number of maximal possible method pairs of connections between the visible methods of a class NP(C). Two visible methods are directly connected, if they are accessing the same instance variables of the class. n is the number of visible methods leading to:
NP(C) = (n(n-1))/2
and
TCC(C) = NDC(C) / NP(C)
So I wrote a method that parses all methods in the class I want to check. This method stores all methods of that class, together with the fields they use, in a dictionary that looks like this:
Dictionary<MethodDefinition, IList<FieldReference>> references = new Dictionary<MethodDefinition, IList<FieldReference>>();
So now, how do I iterate through this dictionary to check the condition mentioned above? If I understand it correctly, I have to find the pairs of methods that use the same set of fields. What is the best way to do this? I think I have to iterate over the dictionary and see if the IList contains the same set (even if not in the same order)?
Any other ideas?
My code is the following, but it does not work correctly:
class TCC
{
public static int calculate(TypeDefinition type)
{
int count = 0;
Dictionary<MethodDefinition, HashSet<FieldReference>> references = new Dictionary<MethodDefinition, HashSet<FieldReference>>();
foreach (MethodDefinition method in type.Methods)
{
if (method.IsPublic)
{
references.Add(method, calculateReferences(method));
}
}
for (int i = 0; i < references.Keys.Count; i++)
{
HashSet<FieldReference> list = new HashSet<FieldReference>();
references.TryGetValue(references.Keys.ElementAt(i), out list);
if (isPair(references, list)) {
count++;
}
}
if (count > 0)
{
count = count / 2;
}
return count;
}
private static bool isPair(Dictionary<MethodDefinition, HashSet<FieldReference>> references, HashSet<FieldReference> compare)
{
for (int j = 0; j < references.Keys.Count; j++)
{
HashSet<FieldReference> compareList = new HashSet<FieldReference>();
references.TryGetValue(references.Keys.ElementAt(j), out compareList);
for (int i = 0; i < compare.Count; i++)
{
if (containsAllElements(compareList, compare)) {
return true;
}
}
}
return false;
}
private static bool containsAllElements(HashSet<FieldReference> compareList, HashSet<FieldReference> compare)
{
for (int i = 0; i < compare.Count; i++)
{
if (!compareList.Contains(compare.ElementAt(i)))
{
return false;
}
}
return true;
}
private static HashSet<FieldReference> calculateReferences(MethodDefinition method)
{
HashSet<FieldReference> references = new HashSet<FieldReference>();
foreach (Instruction instruction in method.Body.Instructions)
{
if (instruction.OpCode == OpCodes.Ldfld)
{
FieldReference field = instruction.Operand as FieldReference;
if (field != null)
{
references.Add(field);
}
}
}
return references;
}
}
Well, if you don't mind keeping another dictionary, we can hit this thing with a big-durn-hammer.
Simply put, if we imagine a dictionary where ordered_set(field-references) is the key instead, and we keep a list of the values for each key.... Needless to say this isn't the most clever approach, but it is quick, easy, and uses data structures you are already familiar with.
EG:
Dictionary< HashSet< FieldReference >, IList< MethodDefinition >> Favorite_Delicatessen
Build ReferenceSet for method
Look up ReferenceSet in Favorite_Delicatessen
If there:
Add method to method list
Else:
Add Referenceset,method pair
And your methods list is thus the list of methods that share the same state-signature, if you'll let me coin a term.
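The steps above can be sketched in C# with a dictionary keyed by the field set. In this sketch plain strings stand in for the Mono.Cecil types MethodDefinition and FieldReference, and the method and field names are invented. HashSet<T>.CreateSetComparer() supplies the order-independent key equality. Note that it groups methods with identical field sets, which is how the question reads "connected"; counting pairs that share at least one field would need a different comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical method-to-fields map; strings stand in for MethodDefinition and FieldReference.
var references = new Dictionary<string, HashSet<string>>
{
    ["GetName"] = new HashSet<string> { "name" },
    ["SetName"] = new HashSet<string> { "name" },
    ["Resize"] = new HashSet<string> { "width", "height" },
    ["Area"] = new HashSet<string> { "height", "width" },
    ["Reset"] = new HashSet<string> { "name", "width" },
};

// CreateSetComparer treats two sets as equal when they hold the same elements, in any order.
var methodsByFieldSet = new Dictionary<HashSet<string>, List<string>>(
    HashSet<string>.CreateSetComparer());

foreach (var pair in references)
{
    if (!methodsByFieldSet.TryGetValue(pair.Value, out List<string> methods))
    {
        methods = new List<string>();
        methodsByFieldSet.Add(pair.Value, methods);
    }
    methods.Add(pair.Key);
}

// Each bucket of size k contributes k * (k - 1) / 2 method pairs.
int pairCount = methodsByFieldSet.Values.Sum(m => m.Count * (m.Count - 1) / 2);
Console.WriteLine(pairCount); // 2
```

The sets used as keys must not be modified after insertion, since that would change their hash codes.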
Since you didn't tell us how can we tell two FieldReferences are duplicated, I will use the default.
LINQ version:
int duplicated = references.SelectMany( p => p.Value )
.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Count();
Can you use ContainsValue to check for duplicates? From what you described it appears you only have duplicates if the values are the same.
How about getting a dictionary where the key is the duplicate item, and the value is a list of keys from the original dictionary that contain the duplicate:
var dupes = references
.SelectMany(k => k.Value)
.GroupBy(v => v)
.Where(g => g.Count() > 1)
.ToDictionary(i => i.Key, i => references
.Where(f => f.Value.Contains(i.Key))
.Select(o => o.Key));
