How to find duplicate pairs in a Dictionary? - c#

I'd like to calculate the TCC metric:
The Tight Class Cohesion (TCC)
measures the ratio of the number of
method pairs of directly connected
visible methods in a class NDC(C) and
the number of maximal possible method
pairs of connections between the
visible methods of a class NP(C). Two
visible methods are directly
connected, if they are accessing the
same instance variables of the class.
n is the number of visible methods
leading to:
NP(C) = (n(n-1))/2
and
TCC(C) = NDC(C) / NP(C)
So I wrote a method that parses all methods in the class I want to check. It stores each method of that class, together with the fields it uses, in a dictionary that looks like this:
Dictionary<MethodDefinition, IList<FieldReference>> references = new Dictionary<MethodDefinition, IList<FieldReference>>();
So now, how do I iterate through this dictionary to check the condition mentioned above? If I understand it correctly, I have to find the pairs of methods that use the same set of fields. What is the best way to do this? I think I have to iterate over the dictionary and check whether the IList contains the same set (even if not in the same order)?
Any other ideas?
My code is the following, but it does not work correctly:
class TCC
{
public static int calculate(TypeDefinition type)
{
int count = 0;
Dictionary<MethodDefinition, HashSet<FieldReference>> references = new Dictionary<MethodDefinition, HashSet<FieldReference>>();
foreach (MethodDefinition method in type.Methods)
{
if (method.IsPublic)
{
references.Add(method, calculateReferences(method));
}
}
for (int i = 0; i < references.Keys.Count; i++)
{
HashSet<FieldReference> list = new HashSet<FieldReference>();
references.TryGetValue(references.Keys.ElementAt(i), out list);
if (isPair(references, list)) {
count++;
}
}
if (count > 0)
{
count = count / 2;
}
return count;
}
private static bool isPair(Dictionary<MethodDefinition, HashSet<FieldReference>> references, HashSet<FieldReference> compare)
{
for (int j = 0; j < references.Keys.Count; j++)
{
HashSet<FieldReference> compareList = new HashSet<FieldReference>();
references.TryGetValue(references.Keys.ElementAt(j), out compareList);
for (int i = 0; i < compare.Count; i++)
{
if (containsAllElements(compareList, compare)) {
return true;
}
}
}
return false;
}
private static bool containsAllElements(HashSet<FieldReference> compareList, HashSet<FieldReference> compare)
{
for (int i = 0; i < compare.Count; i++)
{
if (!compareList.Contains(compare.ElementAt(i)))
{
return false;
}
}
return true;
}
private static HashSet<FieldReference> calculateReferences(MethodDefinition method)
{
HashSet<FieldReference> references = new HashSet<FieldReference>();
foreach (Instruction instruction in method.Body.Instructions)
{
if (instruction.OpCode == OpCodes.Ldfld)
{
FieldReference field = instruction.Operand as FieldReference;
if (field != null)
{
references.Add(field);
}
}
}
return references;
}
}

Well, if you don't mind keeping another dictionary, we can hit this thing with a big-durn-hammer.
Simply put, if we imagine a dictionary where ordered_set(field-references) is the key instead, and we keep a list of the values for each key.... Needless to say this isn't the most clever approach, but it is quick, easy, and uses data structures you are already familiar with.
EG:
Dictionary< HashSet< FieldReference >, IList< MethodDefinition >> Favorite_Delicatessen
Build ReferenceSet for method
Look up ReferenceSet in Favorite_Delicatessen
If there:
Add method to method list
Else:
Add Referenceset,method pair
And your methods list is thus the list of methods that share the same state-signature, if you'll let me coin a term.
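The steps above can be sketched roughly as follows. This is only an illustration: plain strings stand in for MethodDefinition and FieldReference so the snippet runs without Mono.Cecil, and the names StateSignatureGrouping and GroupByFieldSet are made up for this example. It relies on HashSet<T>.CreateSetComparer() so that two sets with the same elements count as the same dictionary key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StateSignatureGrouping
{
    // Groups method names by the exact set of fields they touch.
    // Strings stand in for MethodDefinition / FieldReference (assumption of this sketch).
    public static Dictionary<HashSet<string>, List<string>> GroupByFieldSet(
        IDictionary<string, HashSet<string>> references)
    {
        // The set comparer treats two sets as equal when they hold the same
        // elements, regardless of insertion order.
        var groups = new Dictionary<HashSet<string>, List<string>>(
            HashSet<string>.CreateSetComparer());

        foreach (var pair in references)
        {
            // Look up the field set; create a new method list if it is not there yet.
            if (!groups.TryGetValue(pair.Value, out var methods))
            {
                methods = new List<string>();
                groups.Add(pair.Value, methods);
            }
            methods.Add(pair.Key);
        }
        return groups;
    }
}
```

Every entry whose method list has more than one element is then a group of methods sharing the same state-signature.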

Since you didn't tell us how can we tell two FieldReferences are duplicated, I will use the default.
LINQ version:
int duplicated = references.SelectMany( p => p.Value )
.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Count();

Can you use ContainsValue to check for duplicates? From what you described it appears you only have duplicates if the values are the same.

How about getting a dictionary where the key is the duplicate item, and the value is a list of keys from the original dictionary that contain the duplicate:
var dupes = references
.SelectMany(k => k.Value)
.GroupBy(v => v)
.Where(g => g.Count() > 1)
.ToDictionary(i => i.Key, i => references
.Where(f => f.Value.Contains(i.Key))
.Select(o => o.Key));
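For the TCC metric itself, a pairwise check may be closer to what the quoted definition asks for. As far as I understand it, "accessing the same instance variables" is usually read as "sharing at least one instance variable", not "using an identical set"; treat that reading as an assumption. Under it, NDC is the number of unordered method pairs whose field sets overlap. The sketch below uses strings in place of FieldReference, and the names TccSketch and Calculate are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TccSketch
{
    // Computes TCC = NDC / NP, where two methods count as directly connected
    // when their field sets share at least one element (assumed reading).
    public static double Calculate(IReadOnlyList<HashSet<string>> fieldsPerMethod)
    {
        int n = fieldsPerMethod.Count;
        // With fewer than two methods NP is 0; returning 0 is a choice of this sketch.
        if (n < 2) return 0.0;

        int connected = 0;
        for (int i = 0; i < n; i++)
        {
            // Start at i + 1 so every unordered pair is visited exactly once.
            for (int j = i + 1; j < n; j++)
            {
                if (fieldsPerMethod[i].Overlaps(fieldsPerMethod[j]))
                {
                    connected++;
                }
            }
        }

        double possible = n * (n - 1) / 2.0; // NP(C) = n(n-1)/2
        return connected / possible;
    }
}
```

With the dictionary from the question, something like references.Values.ToList() could be passed in. Because each pair is visited once, there is no need to halve the count afterwards.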

Related

Loop to check for duplicate strings

I want to create a loop to check a list of titles for duplicates.
I currently have this:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var x in productTitles)
{
var title = x.Text;
productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var y in productTitles.Skip(productTitles.IndexOf(x) + 1))
{
if (title == y.Text)
{
Assert.Fail("Found duplicate product in the table");
}
}
}
But this is taking the item I skip out of the array for the next loop, so item 2 never gets checked against item 1; it moves straight to item 3.
I was under the impression that skip just passed over the index you pass in rather than removing it from the list.
You can use GroupBy:
var anyDuplicates = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.GroupBy(p => p.Text, p => p)
.Any(g => g.Count() > 1);
Assert.That(anyDuplicates, Is.False);
or Distinct:
var productTitles = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text)
.ToArray();
var distinctProductTitles = productTitles.Distinct().ToArray();
Assert.AreEqual(productTitles.Length, distinctProductTitles.Length);
Or, if it is enough to find a first duplicate without counting all of them it's better to use a HashSet<T>:
var titles = new HashSet<string>();
foreach (var title in SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text))
{
if (!titles.Add(title))
{
Assert.Fail("Found duplicate product in the table");
}
}
All approaches are better in terms of computational complexity (O(n)) than what you propose (O(n^2)).
You don't need a loop. Simply use the Where() function to find all same titles, and if there is more than one, then they're duplicates:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach(var x in productTitles) {
if (productTitles.Where(y => x.Text == y.Text).Count() > 1) {
Assert.Fail("Found duplicate product in the table");
}
}
I would try a slightly different way since you only need to check for duplicates in a one-dimensional array.
You only have to check the previous element with the next element within the array/collection so using Linq to iterate through all of the items seems a bit unnecessary.
Here's a piece of code to better understand:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
for ( int i = 0; i < productTitles.Count; i++ )
{
var currentObject = productTitles[i];
// only look at the elements after index i
for ( int j = i + 1; j < productTitles.Count; j++ )
{
if ( currentObject.Text == productTitles[j].Text )
{
// here's your duplicate
}
}
}
Since you've checked that item at index 0 is not the same as item placed at index 3 there's no need to check that again when you're at index 3. The items will remain the same.
The Skip(IEnumerable, n) method returns an IEnumerable that doesn't "contain" the n first element of the IEnumerable it's called on.
Also I don't know what sort of behaviour could arise from this, but I wouldn't assign a new IEnumerable to the variable over which the foreach is being executed.
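To illustrate the point about Skip: it returns a new, lazily evaluated sequence and leaves the source collection untouched. Here is a small sketch with plain strings instead of web elements (the names SkipDemo and FindDuplicates are made up for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipDemo
{
    // Returns every title that occurs again later in the list.
    public static List<string> FindDuplicates(IReadOnlyList<string> titles)
    {
        var duplicates = new List<string>();
        for (int i = 0; i < titles.Count; i++)
        {
            // Skip(i + 1) yields the elements after index i as a new sequence;
            // 'titles' itself is not modified.
            if (titles.Skip(i + 1).Contains(titles[i]))
            {
                duplicates.Add(titles[i]);
            }
        }
        return duplicates;
    }
}
```

After the call, the input list still has all of its elements, so nothing is "taken out" by Skip.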
Here's another possible solution with LINQ:
int i = 0;
foreach (var x in productTitles)
{
// Skip(++i) starts the search at the element after x, so x is not compared with itself
var possibleDuplicate = productTitles.Skip(++i).FirstOrDefault(y => y.Text == x.Text);
//if possibleDuplicate is not null
//do stuff here
}
This goes without saying, but the best solution for you will depend on what you are trying to do. Also, I think the Skip method call is more trouble than it's worth, as I'm pretty sure it will make the search less efficient.

Is it possible to modify fields using a LINQ expression without using a SELECT?

I am using LINQ to populate a list of PhraseSource. Following this I am using a foreach to modify the list:
List<PhraseSource> phraseSources = App.DB.db1.Table<PhraseSource>().ToList();
List<PhraseSource> phraseSources2 = phraseSources
.Where(p => categorySources.Any(c => c.Id == p.CategoryId)).ToList();
foreach (var item in phraseSources2)
{
item.OneHash = Math.Abs(item.Id.GetHashCode() % 10);
item.TwoHash = Math.Abs(item.Id.GetHashCode() % 100);
}
My knowledge of LINQ is pretty limited and this was the code given to me.
Is there a way I could combine the foreach into the LINQ expression, or must I use a foreach in this way to modify the data? Note that I think I could do this with a Select, but I am wondering if there is another way, as PhraseSource has a lot of fields.
Thanks
LINQ is to query a datasource, not to modify it. It is strongly recommended to not abuse LINQ to modify it. In this case you have a list so you could use List.ForEach:
phraseSources2.ForEach(item => {
item.OneHash = Math.Abs(item.Id.GetHashCode() % 10);
item.TwoHash = Math.Abs(item.Id.GetHashCode() % 100);
});
Although I don't see the benefit over foreach.
For what it's worth, you can write an extension which allows that with any sequence:
public static void ForEachDo<T>(this IEnumerable<T> seq, Action<T> doWithEveryItem, Action doIfEmpty = null)
{
bool isEmpty = true;
foreach (T item in seq)
{
isEmpty = false;
doWithEveryItem(item);
}
if (isEmpty && doIfEmpty != null)
doIfEmpty();
}
Now you could use it even if it's not a list:
var phraseSources2 = phraseSources.Where(p => categorySources.Any(c => c.Id == p.CategoryId));
phraseSources2.ForEachDo(item=> {
item.OneHash = Math.Abs(item.Id.GetHashCode() % 10);
item.TwoHash = Math.Abs(item.Id.GetHashCode() % 100);
});
But again, foreach is the perfect tool for this job.
Although LINQ does not let you modify items, you have several options at your disposal:
Since OneHash and TwoHash are computed based on other properties of item, you can replace stored properties with computed ones
If you do not need to modify items inside phraseSources (which you would need to do if you plan to save them) you could create copies with computed properties set
If OneHash and TwoHash are not stored in the PhraseSource table, you could remove them from PhraseSource object, and add them as properties of an anonymous "wrapper" object. This would let you compute properties in a LINQ query.
The first approach would look like this:
public class PhraseSource {
...
public int OneHash => Math.Abs(Id.GetHashCode() % 10);
public int TwoHash => Math.Abs(Id.GetHashCode() % 100);
...
}
Note: in older C# versions that do not support expression-bodied properties use the getter syntax:
public class PhraseSource {
...
public int OneHash { get { return Math.Abs(Id.GetHashCode() % 10); } }
public int TwoHash { get { return Math.Abs(Id.GetHashCode() % 100); } }
...
}
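The third option listed above (keeping the hashes outside of PhraseSource and computing them in the query) could look roughly like this. It is only a sketch: the PhraseSource class here is a minimal stand-in, Id is assumed to be a string, and value tuples are used instead of an anonymous type so the result can be returned from a method. WrapperProjection and WithHashes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's PhraseSource (assumption: Id is a string).
public class PhraseSource
{
    public string Id { get; set; }
    public int CategoryId { get; set; }
}

public static class WrapperProjection
{
    // Pairs every source with its two computed hashes without modifying the source.
    public static List<(PhraseSource Source, int OneHash, int TwoHash)> WithHashes(
        IEnumerable<PhraseSource> sources)
    {
        return sources
            .Select(p => (Source: p,
                          OneHash: Math.Abs(p.Id.GetHashCode() % 10),
                          TwoHash: Math.Abs(p.Id.GetHashCode() % 100)))
            .ToList();
    }
}
```

The original objects stay untouched, which fits a query better than mutating items inside a Select.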

What to do to get only one List?

Hello, I have a method that compares the objects of two lists for differences. Right now this works, but only for one property at a time.
Here is the Method:
public SPpowerPlantList compareTwoLists(string sqlServer, string database, DateTime timestampCurrent, string noteCurrent, DateTime timestampOld, string noteOld)
{
int count = 0;
SPpowerPlantList powerPlantListCurrent = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampCurrent, noteCurrent);
SPpowerPlantList powerPlantListOld = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampOld, noteOld);
SPpowerPlantList powerPlantListDifferences = new SPpowerPlantList();
count = powerPlantListOld.Count - powerPlantListCurrent.Count;
var differentObjects = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.mwWeb == l.mwWeb)).ToList();
foreach (var differentObject in differentObjects)
{
powerPlantListDifferences.Add(differentObject);
}
return powerPlantListDifferences;
}
This works and I get 4 objects in the new list. The problem is that I have a few other properties that I need to compare, for example name instead of mwWeb. When I try to change it, I need a new list and a new foreach loop for every new property.
e.g.
int count = 0;
SPpowerPlantList powerPlantListCurrent = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampCurrent, noteCurrent);
SPpowerPlantList powerPlantListOld = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampOld, noteOld);
SPpowerPlantList powerPlantListDifferences = new SPpowerPlantList();
SPpowerPlantList powerPlantListDifferences2 = new SPpowerPlantList();
count = powerPlantListOld.Count - powerPlantListCurrent.Count;
var differentObjects = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.mwWeb == l.mwWeb)).ToList();
var differentObjects2 = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.shortName == l.shortName)).ToList();
foreach (var differentObject in differentObjects)
{
powerPlantListDifferences.Add(differentObject);
}
foreach (var differentObject in differentObjects2)
{
powerPlantListDifferences2.Add(differentObject);
}
return powerPlantListDifferences;
Is there a way to prevent this, or to run more queries and still get back only one list with all the different objects?
I tried it with Except and Intersect, but that didn't work.
So any help or advice would be great, and thanks for your time.
PS: If there is something wrong with my question style, please tell me, because I am trying to learn to ask better questions.
You may be able to simply chain the properties that you want to compare within your Where() clause using OR operators:
// This should get you any elements whose A has no match in old, or whose B has no match in old, etc.
var different = current.Where(p => !old.Any(l => p.A == l.A) || !old.Any(l => p.B == l.B))
.ToList();
If that doesn't work and you really want to use the Except() or Intersect() methods to properly compare the objects, you could write your own custom IEqualityComparer<YourPowerPlant> to use to properly compare them :
class PowerPlantComparer : IEqualityComparer<YourPowerPlant>
{
// Power plants are equal if specific properties are equal.
public bool Equals(YourPowerPlant x, YourPowerPlant y)
{
// Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
// Checks the other properties to compare (examples using mwWeb and shortName)
return x.mwWeb == y.mwWeb && x.shortName == y.shortName;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(YourPowerPlant powerPlant)
{
// Check whether the object is null
if (Object.ReferenceEquals(powerPlant, null)) return 0;
// Get hash code for the mwWeb field if it is not null.
int hashA = powerPlant.mwWeb == null ? 0 : powerPlant.mwWeb.GetHashCode();
// Get hash code for the shortName field if it is not null.
int hashB = powerPlant.shortName == null ? 0 : powerPlant.shortName.GetHashCode();
// Combine the two hash codes.
return hashA ^ hashB;
}
}
and then you could use one of the following, depending on your needs. Except() returns the elements of current that have no equal in old:
var different = current.Except(old,new PowerPlantComparer());
whereas Intersect() returns the elements that appear in both lists:
var common = current.Intersect(old,new PowerPlantComparer());
One way is to use IEqualityComparer as Rion Williams suggested. If you'd like a more flexible solution, you can split the logic into two parts. First create a helper method that accepts two lists and a function in which you define which properties you wish to compare. For example:
public static class Helper
{
public static SPpowerPlantList GetDifference(this SPpowerPlantList current, SPpowerPlantList old, Func<PowerPlant, PowerPlant, bool> func)
{
var diff = current.Where(p => old.All(l => func(p, l))).ToList();
var result = new SPpowerPlantList();
foreach (var item in diff) result.Add(item);
return result;
}
}
And use it :
public SPpowerPlantList compareTwoLists(string sqlServer, string database,
DateTime timestampCurrent, string noteCurrent,
DateTime timestampOld, string noteOld)
{
var powerPlantListCurrent = ...;
var powerPlantListOld = ...;
var diff = powerPlantListCurrent.GetDifference(
powerPlantListOld,
(x, y) => x.mwWeb != y.mwWeb ||
x.shortName != y.shortName);
return diff;
}
P.S. if it better suits your needs, you could move method inside of existing class :
public class MyClass
{
public SPpowerPlantList GetDifference(SPpowerPlantList current, SPpowerPlantList old, Func<PowerPlant, PowerPlant, bool> func)
{
...
}
}
And call it (inside of class) :
var result = GetDifference(currentValues, oldValues, (x, y) => x.mwWeb != y.mwWeb);
The easiest way to do this would be to compare some unique identifier (ID):
var differentObjects = powerPlantListCurrent
.Where(p => !powerPlantListOld.Any(l => p.Id == l.Id))
.ToList();
If the other properties might have been updated and you want to check that too, you'll have to compare all of them to detect changes made to existing elements:
Implement a comparison method (IComparable, IEquatable, IEqualityComparer, or override Equals) or, if that's not possible because you didn't write the class yourself (generated code or an external assembly), write a method that compares two of the SPpowerPlantList elements and use that instead of comparing every single property in LINQ. For example:
// PowerPlant stands in for whatever the element type of SPpowerPlantList is
public bool AreThoseTheSame(PowerPlant a, PowerPlant b)
{
if(a.mwWeb != b.mwWeb) return false;
if(a.shortName != b.shortName) return false;
//etc.
return true;
}
Then replace your difference call with this:
var differentObjects = powerPlantListCurrent
.Where(p => !powerPlantListOld.Any(l => AreThoseTheSame(p,l)))
.ToList();
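If the lists get large, another option might be to put the compared properties of the old list into a HashSet of tuples and filter the current list against it, which avoids rescanning the old list for every element. This is a sketch under assumptions: PowerPlant is a hypothetical element type, only mwWeb and shortName are modelled, and both are assumed to be strings. PowerPlantDiff and Differences are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; only the two properties named in the question are modelled.
public class PowerPlant
{
    public string mwWeb { get; set; }
    public string shortName { get; set; }
}

public static class PowerPlantDiff
{
    // Returns the plants in 'current' whose (mwWeb, shortName) combination
    // does not occur anywhere in 'old'.
    public static List<PowerPlant> Differences(
        IEnumerable<PowerPlant> current, IEnumerable<PowerPlant> old)
    {
        // Value tuples compare member-wise, so they can serve as composite keys.
        var oldKeys = new HashSet<(string, string)>(
            old.Select(o => (o.mwWeb, o.shortName)));

        return current
            .Where(p => !oldKeys.Contains((p.mwWeb, p.shortName)))
            .ToList();
    }
}
```

Adding another property to the comparison then only means adding another member to the tuple.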

LINQ: Compare two lists and count subset

I am comparing 2 lists and I need to collect occurrences of a subset (modulesToDelete) from the master list (allModules) ONLY when MORE than one occurrence is found. (allModules contains modulesToDelete.) Multiple occurrences of any module in modulesToDelete mean those modules are being shared. One occurrence of a module in modulesToDelete means that module is isolated and is safe to delete (it just found itself). I can do this with nested foreach loops, but this is as far as I got with a LINQ expression (which doesn't work):
List<Module> modulesToDelete = { A, B, C, K }
List<string> allModules = {R, A, B, C, K, D, G, T, B, K } // need to flag B and K
var mods = from mod in modulesToDelete
where allModules.Any(name => name.Contains(mod.Name) && mod.Name.Count() > 1)
select mod;
here is my nested foreach loops which I want to replace with a LINQ expression:
foreach (Module mod in modulesToDelete)
{
int count = 0;
foreach (string modInAllMods in allModules)
{
if (modInAllMods == mod.Name)
{
count++;
}
}
if (count > 1)
{
m_moduleMarkedForKeep.Add(mod);
}
else if( count == 1)
{
// Delete the linked modules
}
}
You can use a lookup which is similar to a dictionary but allows multiple equal keys and returns an IEnumerable<T> as value.
var nameLookup = modulesToDelete.ToLookup(m => m.Name);
var safeToDelete = modulesToDelete.Where(m => nameLookup[m.Name].Count() == 1);
var sharedModules = modulesToDelete.Where(m => nameLookup[m.Name].Count() > 1);
Edit: However, I don't see how allModules is related at all.
Probably easier and with the desired result on your sample data:
var mods = modulesToDelete.Where(m => allModules.Count(s => s == m.Name) > 1);
One way to go about solving this would be to use the Intersect function; see
"Intersection of two string array (ignore case)".
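If allModules is large, counting every name once up front may be cheaper than calling Count for each module. A rough sketch, using plain strings in place of Module.Name (the names SharedModules and FindShared are made up for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedModules
{
    // Returns the names from 'toDelete' that occur more than once in 'allModules'.
    public static List<string> FindShared(
        IEnumerable<string> toDelete, IEnumerable<string> allModules)
    {
        // Build a name -> occurrence count table in a single pass over allModules.
        Dictionary<string, int> counts = allModules
            .GroupBy(name => name)
            .ToDictionary(g => g.Key, g => g.Count());

        return toDelete
            .Where(name => counts.TryGetValue(name, out int c) && c > 1)
            .ToList();
    }
}
```

With the sample data from the question this should flag B and K, since they are the only names that appear twice in allModules.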

How to Check All Values in Dictionary is same in C#?

I have a Dictionary, I want to write a method to check whether all values are same in this Dictionary.
Dictionary Type:
Dictionary<string, List<string>>
Lists {1,2,3} and {2,1,3} are the same in my case.
I have done this previously for simple datatype values, but I cannot work out the logic for the new requirement. Please help me.
For simple values:
MyDict.GroupBy(x => x.Value).Where(x => x.Count() > 1)
I have also written a Generic Method to compare two datatypes in this way.
// 1
// Require that the counts are equal
if (a.Count != b.Count)
{
return false;
}
// 2
// Initialize new Dictionary of the type
Dictionary<T, int> d = new Dictionary<T, int>();
// 3
// Add each key's frequency from collection A to the Dictionary
foreach (T item in a)
{
int c;
if (d.TryGetValue(item, out c))
{
d[item] = c + 1;
}
else
{
d.Add(item, 1);
}
}
// 4
// Add each key's frequency from collection B to the Dictionary
// Return early if we detect a mismatch
foreach (T item in b)
{
int c;
if (d.TryGetValue(item, out c))
{
if (c == 0)
{
return false;
}
else
{
d[item] = c - 1;
}
}
else
{
// Not in dictionary
return false;
}
}
// 5
// Verify that all frequencies are zero
foreach (int v in d.Values)
{
if (v != 0)
{
return false;
}
}
// 6
// We know the collections are equal
return true;
Implement an IEqualityComparer for List<string> that compares two list based on their content. Then just use Distinct on Values and check the count:
dictionary.Values.Distinct(new ListEqualityComparer()).Count() == 1
This should do the trick
var lists = dic.Select(kv => kv.Value.OrderBy(x => x)).ToList();
var first = lists.First();
var areEqual = lists.Skip(1).All(hs => hs.SequenceEqual(first));
You'll need to add some checks to make this work for the empty case.
...or if you want to take @Selman's approach, here's an implementation of the IEqualityComparer:
class SequenceComparer<T>:IEqualityComparer<IEnumerable<T>>
{
public bool Equals(IEnumerable<T> left, IEnumerable<T> right)
{
return left.OrderBy(x => x).SequenceEqual(right.OrderBy(x => x));
}
public int GetHashCode(IEnumerable<T> item)
{
//no need to sort because XOR is commutative
return item.Aggregate(0, (acc, val) => val.GetHashCode() ^ acc);
}
}
You could make a variant of this combining the best of both approaches using a HashSet<T> that might be considerably more efficient in the case that you have many candidates to test:
HashSet<IEnumerable<string>> hs = new HashSet<IEnumerable<string>>(new SequenceComparer<string>());
hs.Add(dic.First().Value);
var allEqual = dic.All(kvp => !hs.Add(kvp.Value));
This uses the feature of HashSets that disallows adding more than one item that is considered equal with an item already in the set. We make the HashSet use the custom IEqualityComparer above...
So we insert an arbitrary item from the dictionary before we start, then the moment another item is allowed into the set (i.e. hs.Add(kvp.Value) is true), we can say that there's more than one item in the set and bail out early. .All does this automatically.
Selman22's answer works perfectly - you can also do this for your Dictionary<string, List<string>> without having to implement an IEqualityComparer yourself:
var firstValue = dictionary.Values.First().OrderBy(x => x);
return dictionary.Values.All (x => x.OrderBy(y => y).SequenceEqual(firstValue));
We compare the first value to every other value, and check equality in each case. Note that List<string>.OrderBy(x => x) simply sorts the list of strings alphabetically.
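It's not the fastest solution, but it works for me (note that Intersect returns distinct elements, so this only holds when the lists contain no duplicates):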
bool AreEqual = l1.Intersect(l2).ToList().Count() == l1.Count() && l1.Count() == l2.Count();
