Linq Except not giving desired results in C#, datatable - c#

I have two DataTables. I applied the Except operator as follows,
and got either unfiltered or undesired results.
resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable()).CopyToDataTable();
Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?
Note:
Both DataTables are plain simple with just one column.
dtA contains a dozen rows of strings.
dtB contains thousands of rows of strings.
I also tried the same syntax with another use case the set operator, Intersect. This does not work either.
resultDataTable2 =dtA.AsEnumerable().Intersect(dtB.AsEnumerable()).CopyToDataTable();

Except uses the default equality comparer, which for DataRow objects compares references.
I think you are expecting the result to be filtered by comparing the rows' contents.
I recommend implementing your own IEqualityComparer that compares two objects based on their members.
e.g.
resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable(), new TestComparer()).CopyToDataTable();
// MyTestClass stands in for the element type being compared;
// for the DataTable case above the comparer must be an IEqualityComparer<DataRow>.
class TestComparer : IEqualityComparer<MyTestClass>
{
    public bool Equals(MyTestClass b1, MyTestClass b2)
    {
        if (b1 == null && b2 == null)
            return true;
        if (b1 == null || b2 == null)
            return false;
        // ToDo: add more checks based on the class
        return b1.Prop1 == b2.Prop1 && b1.Prop2 == b2.Prop2;
    }
    public int GetHashCode(MyTestClass obj)
    {
        // Hash the same members that Equals compares (assumes they are non-null);
        // add more based on class properties
        return obj.Prop1.GetHashCode() ^ obj.Prop2.GetHashCode();
    }
}
Doc
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.except?view=net-6.0#system-linq-enumerable-except-1(system-collections-generic-ienumerable((-0))-system-collections-generic-ienumerable((-0))-system-collections-generic-iequalitycomparer((-0)))
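For the single-column DataTable case in the question, a comparer along these lines might do. This is only a sketch: the name FirstColumnComparer, the column name "Value" and the sample strings are made up for illustration, and it assumes the first column holds strings.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows are equal when their first column holds the same string.
sealed class FirstColumnComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(Convert.ToString(x[0]), Convert.ToString(y[0]), StringComparison.Ordinal);
    }

    // Must agree with Equals: rows with equal first columns get equal hash codes.
    public int GetHashCode(DataRow row) =>
        StringComparer.Ordinal.GetHashCode(Convert.ToString(row[0]) ?? string.Empty);
}

static class ExceptDemo
{
    // Builds a one-column table of strings (the column name "Value" is an assumption).
    static DataTable MakeTable(params string[] values)
    {
        var table = new DataTable();
        table.Columns.Add("Value", typeof(string));
        foreach (var v in values) table.Rows.Add(v);
        return table;
    }

    public static List<string> Run()
    {
        DataTable dtA = MakeTable("apple", "pear", "plum");
        DataTable dtB = MakeTable("pear", "kiwi");
        return dtA.AsEnumerable()
                  .Except(dtB.AsEnumerable(), new FirstColumnComparer())
                  .Select(r => r.Field<string>(0))
                  .ToList(); // "apple", "plum"
    }
}
```

System.Data also provides DataRowComparer.Default, which compares rows by the values in all of their columns, so for simple cases you may not need a hand-written comparer at all.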

Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?
When you do a.Except(b), the contents of b are loaded into a hash set. A hash set is a collection that doesn't accept duplicates: Add returns false when you try to add something that is already inside it.
After b is loaded into the hash set, a is looped over, and each item is also added to the hash set. Anything that adds successfully (set.Add returns true because it is not already there) is returned. Anything that is already there (set.Add returns false because it was added by being in b, or by appearing earlier in a) is not returned. Note that this process also dedupes a, so 1,1,2,3 except 2,3 would return just a single 1. You've achieved "every unique thing in a that isn't in b", but do check whether you wanted a to be deduped too.
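The process just described can be sketched roughly as follows. This is a simplified illustration of the idea, not the actual framework source, and the names ExceptSketch and MyExcept are made up.

```csharp
using System.Collections.Generic;
using System.Linq;

static class ExceptSketch
{
    // Simplified illustration of a.Except(b) using a hash set.
    public static IEnumerable<T> MyExcept<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        var seen = new HashSet<T>(b);   // load everything from b first
        foreach (var item in a)
        {
            // Add returns true only when the item was not in the set yet,
            // i.e. it is neither in b nor an earlier duplicate from a.
            if (seen.Add(item))
                yield return item;
        }
    }
}
```

Calling ExceptSketch.MyExcept(new[] { 1, 1, 2, 3 }, new[] { 2, 3 }) yields a single 1, the same as the built-in Except.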
A hash set is a wonderful thing that enables super fast lookups. To do this it relies on two methods that every object has: GetHashCode and Equals. GetHashCode converts an object into a probably-unique number. Equals makes absolutely sure an object A equals B. The hash set tracks all the hash codes and objects it has seen before, so when you add something it first gets the hash code of what you're trying to add. If it never saw that hash code before, it adds the item. If you try to add anything that has the same hash code as something it saw already, it uses Equals to check whether or not it's the same as what it already saw (sometimes hash codes are the same for different data) and adds the item if Equals declares it to be different. This whole operation is very fast, much faster than searching object by object through all the objects it saw before.
By default any class in C# gets its implementation of GetHashCode and Equals from object. Object's versions of these methods essentially return the memory address for GetHashCode, and compare the memory addresses for Equals.
This works fine for stuff that really is at the same memory address:
var p = new Person(){Name="John"};
var q = p; //same mem address as p
But it doesn't work for objects that have the same data but live at different memory addresses:
var p = new Person(){Name="John"};
var q = new Person(){Name="John"}; //not same mem address as p
If you define two people as being equal if they have the same name, and you want C# to consider them equal in the same way, you have to instruct C# to compare the names, not the memory addresses.
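As a rough sketch of what "instructing C#" could look like for the Person example, here is one way to override Equals and GetHashCode so that they work off Name. The Person class body and the EqualityDemo helper are assumptions made for illustration.

```csharp
using System;

public class Person
{
    public string Name { get; set; }

    // Two people are equal when their names match, wherever they live in memory.
    public override bool Equals(object obj) =>
        obj is Person other && Name == other.Name;

    // Equal objects must produce equal hash codes, so hash the same data Equals compares.
    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();
}

public static class EqualityDemo
{
    public static (bool SameData, bool SameReference) Compare()
    {
        var p = new Person { Name = "John" };
        var q = new Person { Name = "John" }; // different object, same data
        return (p.Equals(q), ReferenceEquals(p, q)); // (true, false)
    }
}
```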
A DataRow is like Person above: just because it has the same data as another DataRow doesn't mean it's the same row in C#'s opinion. Further, because a single DataRow cannot belong to two datatables, it's certain that the "John" in row 1 of dtA is a different object to the "John" in row 1 of dtB..
By default Equals returns false for these two data rows, so Except will never consider them equal, and will never remove dtA's John because of the presence of John in dtB..
..unless you provide an alternative comparison strategy that overrides C#'s default opinion of equality. That might look like:
provide a comparer (like Kalpesh's answer, typos on the class name aside),
override Equals/GetHashCode for the datarows so they work off column data, not memory addresses,
or use some other thing that does already have Equals and GetHashCode that work off data rather than memory addresses
As these are just datatables of a single column full of strings they're notionally not much more than an array of string. If we make them into an array of strings, when we do a.Except(b) we will be comparing strings. By default C#'s opinion of whether one string equals another is based on the data content of the string rather than the memory address it lives at [1], so you can either use string arrays/lists to start with or convert your dtA/B to a string array:
var arrA = dtA.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var arrB = dtB.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var result = arrA.Except(arrB);
Technically we don't even need to call ToArray()..
If you really need the result to be a datatable, make one and add all the strings to it:
var resultDt = new DataTable();
resultDt.Columns.Add("x");
foreach(var s in result)
resultDt.Rows.Add(s);
[1] We'll ignore interning for now

Related

Remove duplicates in custom IComparable class

I have a table that has combo pairs identifiers, and I use that to go through CSV files looking for matches. I'm trapping the unidentified pairs in a List, and sending them to an output box for later addition. I would like the output to only have single occurrences of unique pairs. The class is declared as follows:
public class Unmatched:IComparable<Unmatched>
{
public string first_code { get; set; }
public string second_code { get; set; }
public int CompareTo(Unmatched other)
{
if (this.first_code == other.first_code)
{
return this.second_code.CompareTo(other.second_code);
}
return other.first_code.CompareTo(this.first_code);
}
}
One note on the above code: This returns it in reverse alphabetical order, to get it in alphabetical order use this line:
return this.first_code.CompareTo(other.first_code);
Here is the code that adds it. This is directly after the comparison against the datatable elements
unmatched.Add(new Unmatched()
{ first_code = fields[clients[global_index].first_match_column]
, second_code = fields[clients[global_index].second_match_column] });
I would like to remove all pairs from the list where both first code and second code are equal, i.e.;
PTC,138A
PTC,138A
PTC,138A
MA9,5A
MA9,5A
MA9,5A
MA63,138A
MA63,138A
MA59,87BM
MA59,87BM
Should become:
PTC, 138A
MA9, 5A
MA63, 138A
MA59, 87BM
I have tried adding my own Equals and GetHashCode as outlined here:
http://www.morgantechspace.com/2014/01/Use-of-Distinct-with-Custom-Class-objects-in-C-Sharp.html
The SE links I have tried are here:
How would I distinct my list of key/value pairs
Get list of distinct values in List<T> in c#
Get a list of distinct values in List
All of them return a list that still has all the pairs. Here is the current code (Yes, I know there are two distinct lines, neither appears to be working) that outputs the list:
parser.Close();
List<Unmatched> noDupes = unmatched.Distinct().ToList();
noDupes.Sort();
noDupes.Select(x => x.first_code).Distinct();
foreach (var pair in noDupes)
{
txtUnmatchedList.AppendText(pair.first_code + "," + pair.second_code + Environment.NewLine);
}
Here is the Equals/GetHashCode code as requested:
public bool Equals(Unmatched notmatched)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(notmatched, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, notmatched)) return true;
//Check whether the UserDetails' properties are equal.
return first_code.Equals(notmatched.first_code) && second_code.Equals(notmatched.second_code);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for the UserName field if it is not null.
int hashfirst_code = first_code == null ? 0 : first_code.GetHashCode();
//Get hash code for the City field.
int hashsecond_code = second_code.GetHashCode();
//Calculate the hash code for the GPOPolicy.
return hashfirst_code ^ hashsecond_code;
}
I have also looked at a couple of answers that are using queries and Tuples, which I honestly don't understand. Can someone point me to a source or answer that will explain the how (And why) of getting distinct pairs out of a custom list?
(Side question-Can you declare a class as both IComparable and IEquatable?)
The problem is you are not implementing IEquatable<Unmatched>.
public class Unmatched : IComparable<Unmatched>, IEquatable<Unmatched>
EqualityComparer<T>.Default uses the Equals(T) method only if you implement IEquatable<T>. You are not doing this, so it will instead use Object.Equals(object) which uses reference equality.
The overload of Distinct you are calling uses EqualityComparer<T>.Default to compare different elements of the sequence for equality. As the documentation states, the returned comparer uses your implementation of GetHashCode to find potentially-equal elements. It then uses the Equals(T) method to check for equality, or Object.Equals(Object) if you have not implemented IEquatable<T>.
You have an Equals(Unmatched) method, but it will not be used since you are not implementing IEquatable<Unmatched>. Instead, the default Object.Equals method is used which uses reference equality.
Note your current Equals method is not overriding Object.Equals since that takes an Object parameter, and you would need to specify the override modifier.
For an example on using Distinct see here.
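Putting that together, a version of the class that implements both interfaces might look roughly like this. It is a sketch rather than a drop-in replacement: the ordinal string comparisons, the hash combination and the DistinctDemo helper are my own choices, and the sample pairs are a subset of those in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Unmatched : IComparable<Unmatched>, IEquatable<Unmatched>
{
    public string first_code { get; set; }
    public string second_code { get; set; }

    // Used by List<T>.Sort: alphabetical by first code, then by second code.
    public int CompareTo(Unmatched other)
    {
        if (ReferenceEquals(other, null)) return 1;
        int byFirst = string.CompareOrdinal(first_code, other.first_code);
        return byFirst != 0 ? byFirst : string.CompareOrdinal(second_code, other.second_code);
    }

    // Used by Distinct (via EqualityComparer<Unmatched>.Default) because IEquatable<Unmatched> is implemented.
    public bool Equals(Unmatched other) =>
        !ReferenceEquals(other, null)
        && first_code == other.first_code
        && second_code == other.second_code;

    public override bool Equals(object obj) => Equals(obj as Unmatched);

    public override int GetHashCode()
    {
        unchecked
        {
            return ((first_code?.GetHashCode() ?? 0) * 397) ^ (second_code?.GetHashCode() ?? 0);
        }
    }
}

public static class DistinctDemo
{
    public static List<string> Run()
    {
        var unmatched = new List<Unmatched>
        {
            new Unmatched { first_code = "PTC", second_code = "138A" },
            new Unmatched { first_code = "PTC", second_code = "138A" },
            new Unmatched { first_code = "MA9", second_code = "5A" },
            new Unmatched { first_code = "MA9", second_code = "5A" },
            new Unmatched { first_code = "MA63", second_code = "138A" },
        };
        List<Unmatched> noDupes = unmatched.Distinct().ToList();
        noDupes.Sort();
        return noDupes.Select(p => p.first_code + "," + p.second_code).ToList();
    }
}
```

This also answers the side question: a class can be declared as both IComparable<T> and IEquatable<T>. Sort uses the former, Distinct uses the latter.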
You have to supply an IEqualityComparer<TSource> to Distinct (or implement IEquatable<T> on the class), not just IComparable<TSource>; Distinct never calls CompareTo.

Using IndexOf to search a combo box

I've inserted a few StaffRole files into a combobox using the below;
for (int i=0; i < staffRoles.Count; i++)
{
user_Role_Combo.Items.Add(staffRoles[i]);
}
I'm trying to find the index of a specific element within the combo box so it displays the correct element when loaded. I've got this, but it just returns -1 every time:
StaffRole sr = new StaffRole("",roleID);
int comboBoxID = user_Role_Combo.Items.IndexOf(sr);
I'm doing this correct way no?!
In order for your new StaffRole instance to be 'found' in the combobox you need to describe why two StaffRole instances should be considered equivalent.
So you need to override Equals and GetHashCode. Technically, you need only Equals, but these two methods need to be overridden together.
One way to deal with it is to base object equality on roleId equality, like this:
public override int GetHashCode() {
return roleId.GetHashCode();
}
public override bool Equals(object obj) {
if (obj == this) return true;
var other = obj as StaffRole;
if (other == null) return false;
return roleId == other.roleId;
}
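To see the effect without a form, here is a small sketch that uses a plain List<object> as a stand-in for the combo box's Items collection. The StaffRole constructor shape is guessed from the question, and the role names and IndexOfDemo helper are invented.

```csharp
using System.Collections.Generic;

public class StaffRole
{
    public string Name { get; }
    private readonly int roleId;

    public StaffRole(string name, int roleId)
    {
        Name = name;
        this.roleId = roleId;
    }

    public override int GetHashCode() => roleId.GetHashCode();

    // Two roles are considered the same when their IDs match.
    public override bool Equals(object obj)
    {
        var other = obj as StaffRole;
        return other != null && roleId == other.roleId;
    }
}

public static class IndexOfDemo
{
    public static int Find(int roleId)
    {
        // Stand-in for user_Role_Combo.Items (an assumption for this sketch).
        var items = new List<object>
        {
            new StaffRole("Admin", 1),
            new StaffRole("Clerk", 2),
            new StaffRole("Manager", 3),
        };
        // IndexOf now finds the entry through the overridden Equals.
        return items.IndexOf(new StaffRole("", roleId));
    }
}
```

With the overrides in place, Find(2) returns 1, and an ID that is not in the list still returns -1.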
I'm doing this correct way no?!
No. By default IndexOf will check if the same reference exists in the items list. Since it's a new StaffRole that you just instantiated, it doesn't exist in the list.
I think what you want to do is compare by ID. To do this, you could override Equals and GetHashCode in the StaffRole class. In your custom Equals method, you would compare two objects by role ID. After doing this, IndexOf will work as you expect it to, by comparing using IDs instead of references.
Perhaps you could use either
FindString(String)
FindStringExact(String)
Both methods will return the index of the element in the list that matches the value of the string parameter that the method receives.
Combobox documentation here.
I didn't want to replace the Equals / GetHashCode methods as I need them to be different for different instances.
So, I used some Linq to find the proper element inside the collection:
this.comboBox_group.SelectedIndex =
this.comboBox_group.Items.IndexOf
(comboBox_group.Items.Cast<Group>().Where(x => x.Id == SelectedId).First());
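Note that First() throws if no element has the requested Id. A variant that returns -1 instead could be sketched like this, with a plain IEnumerable<object> standing in for comboBox_group.Items; the Group class and the LookupDemo helper are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Group
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class LookupDemo
{
    // Returns the position of the first Group with the given Id, or -1 when there is none.
    public static int IndexOfId(IEnumerable<object> items, int selectedId) =>
        items.Cast<Group>().ToList().FindIndex(g => g.Id == selectedId);
}
```

The result can be assigned to SelectedIndex directly, since -1 means "nothing selected" there.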

C# return generic list of objects using linq

i got a generic list that looks like this:
List<PicInfo> pi = new List<PicInfo>();
PicInfo is a class that looks like this:
[ProtoContract]
public class PicInfo
{
[ProtoMember(1)]
public string fileName { get; set; }
[ProtoMember(2)]
public string completeFileName { get; set; }
[ProtoMember(3)]
public string filePath { get; set; }
[ProtoMember(4)]
public byte[] hashValue { get; set; }
public PicInfo() { }
}
what i'm trying to do is:
first, filter the list with duplicate file names and return the duplicate objects;
then, filter the returned list with duplicate hash values;
i can only find examples on how to do this which return anonymous types. but i need it to be a generic list.
if someone can help me out, I'd appreciate it. also please explain your code. it's a learning process for me.
thanks in advance!
[EDIT]
the generic list contains a list of objects. these objects are pictures. every picture has a file name, hash value (and some more data which is irrelevant at this point). some pictures have the same name (duplicate file names). and i want to get a list of the duplicate file names from this generic list 'pi'.
But those pictures also have a hash value. from the file names that are identical, i want another list of those identical files names that also have identical hash values.
[/EDIT]
Something like this should work. Whether it is the best method I am not sure. It is not very efficient because for each element you are iterating through the list again to get the count.
List<PicInfo> pi = new List<PicInfo>();
IEnumerable<PicInfo> filt = pi.Where(x=>pi.Count(z=>z.fileName==x.fileName)>1);
I hope the code isn't too complicated to need explaining. I always think it's best to work it out on your own anyway, but if anything is confusing then just ask and I'll explain.
If you want the second filter to be filtering for the same filename and same hash being a duplicate then you just need to extend the lambda in the Count to check against hash too.
Obviously if you just want filenames at the end then it is easy enough to do a Select to get just an enumerable list of those filenames, possibly with a Distinct if you only want them to appear once.
NB. Code written by hand so do forgive typos. May not compile first time, etc. ;-)
Edit to explain code - spoilers! ;-)
In english what we want to do is the following:
for each item in the list we want to select it if and only if there is more than one item in the list with the same filename.
Breaking this down to iterate over the list and select things based on a criteria we use the Where method. The condition of our where method is
there is more than one item in the list with the same filename
for this we clearly need to count the list so we use pi.Count. However we have a condition that we are only counting if the filename matches so we pass in an expression to tell it only to count those things.
The expression will work on each item of the list and return true if we want to count it and false if we don't want to.
The filename we are interested in is on x, the item we are filtering. So we want to count how many items have a filename the same as x.fileName. Thus our expression is z=>z.fileName==x.fileName. So z is our variable in this expression and x.fileName in this context is unchanging as we iterate over z.
We then of course put our criteria in of >1 to get the boolean value we want.
If you wanted those that are duplicates when considering both the filename and the hash value, you would expand the lambda in the Count to compare hashValue as well. Since hashValue is a byte array, == would only compare references, so use the SequenceEqual method to compare the arrays value by value.
So your final code to get the duplicates on both values would be:
List<PicInfo> pi = new List<PicInfo>();
List<PicInfo> filt = pi.Where(x=>pi.Count(z=>z.fileName==x.fileName && z.hashValue.SequenceEqual(x.hashValue))>1).ToList();
Note that I didn't create the intermediary list and just went straight from the original list. You could go from the intermediate list but the code would be much the same if going from the original as from a filtered list.
I think you have to use the SequenceEqual method for finding duplicates
(http://msdn.microsoft.com/ru-ru/library/bb348567.aspx).
For filter use
var p = pi.GroupBy(rs => rs.fileName) // group by name
    .Where(rs => rs.Count() > 1) // keep groups with more than one element
    .Select(rs => rs.First()) // select 1st element from each group
    .GroupBy(rs => Convert.ToBase64String(rs.hashValue)) // now group by hash value (as a string, because byte[] keys compare by reference)
    .Where(rs => rs.Count() > 1) // keep groups with multiple values
    .Select(rs => rs.First()) // select first element from each group
    .ToList(); // make the result a List<PicInfo>
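If the goal is to get back every duplicate object as a List<PicInfo> (rather than one representative per group), a GroupBy-based sketch might look like the following. It avoids re-counting the list for each element. The trimmed-down PicInfo (no ProtoBuf attributes) and the DuplicateFinder helper are assumptions for illustration, and hashValue is assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PicInfo
{
    public string fileName { get; set; }
    public byte[] hashValue { get; set; }
}

public static class DuplicateFinder
{
    // Every picture whose file name occurs more than once.
    public static List<PicInfo> SameName(IEnumerable<PicInfo> pics) =>
        pics.GroupBy(p => p.fileName)
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .ToList();

    // Every picture that shares both file name and hash bytes with another one.
    // The byte[] is converted to a Base64 string so the grouping key compares by content.
    public static List<PicInfo> SameNameAndHash(IEnumerable<PicInfo> pics) =>
        pics.GroupBy(p => (p.fileName, Hash: Convert.ToBase64String(p.hashValue)))
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .ToList();
}
```

SameNameAndHash can be called on the original list or on the result of SameName; the outcome is the same, because anything sharing name and hash also shares the name.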

How to compare two struct lists?

I have a small struct and I have to compare the values to find which ones have the same FreeFlow text, and then grab that struct ENumber.
public struct Holder
{
public string FreeFlow;
public int ENumber;
}
and here is how I add them
foreach(Class1.TextElement re in Class1._TextElements)
{
//create struct with all details will be good for later
Holder ph = new Holder();
ph.FreeFlow = re.FreeFlow;
ph.ENumber = re.ENumber;
lstHolder.Add(ph);
}
foreach(Class1.TextElement2 re in Class1._TextElements2)
{
//create struct with all details will be good for later
Holder phi = new Holder();
phi.FreeFlow = re.FreeFlow;
phi.ENumber = re.ENumber;
lstHolder2.Add(phi);
}
I can do a comparing using a foreach within a foreach, but I think this will not be the most effective way. Any help?
EDIT: I am trying to determine if freeflow text is exactly the same as the other struct freeflow text
I have to compare the values to find
which ones have the same FreeFlow
text, and then grab that struct
ENumber.
If you can use LINQ you can join on the items with the same FreeFlow text then select the ENumber values of both items:
var query = from x in Class1._TextElements
join y in Class1._TextElements2 on x.FreeFlow equals y.FreeFlow
select new { xId = x.ENumber, yId = y.ENumber };
foreach (var item in query)
{
Console.WriteLine("{0} : {1}", item.xId, item.yId);
}
EDIT: my understanding is the FreeFlow text is the common member and that ENumber is probably different, otherwise it would make sense to determine equivalence based on that. If that is the case the join query above should be what you need.
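A self-contained version of the same join, run against two lists of Holder, might look like this. The JoinDemo helper and the tuple element names are invented for the sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

public struct Holder
{
    public string FreeFlow;
    public int ENumber;
}

public static class JoinDemo
{
    // Pairs up the ENumber values of items whose FreeFlow text is exactly the same.
    public static List<(int First, int Second)> MatchByFreeFlow(
        IEnumerable<Holder> first, IEnumerable<Holder> second) =>
        (from x in first
         join y in second on x.FreeFlow equals y.FreeFlow
         select (First: x.ENumber, Second: y.ENumber)).ToList();
}
```

A join builds a lookup on one side internally, so it avoids the nested foreach over both lists.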
If I'm interpreting you correctly, you want to find the elements that are in both lstHolder and lstHolder2, which is the intersection. If so, there is a two-step solution: first, override Equals() (and GetHashCode()) on your Holder struct; then use the LINQ Intersect operator:
var result = lstHolder.Intersect(lstHolder2);
What do you mean by "compare"? This could mean a lot of things. Do you want to know which items are common to both sets? Do you want to know which items are different?
LINQ might have the answer no matter what you mean. Union, Except, etc.
If you are using C# 3.0 or higher then try the SequenceEqual method
Class1._TextElements.SequenceEqual(Class1._TextElements2);
This will run equality checks on the elements in the collection. If the sequences are of different lengths or any of the elements in the same position are not equal it will return false.

Search in a List<DataRow>?

I have a List which I create from a DataTable which only has one column in it. Let's say the column is called MyColumn. Each element in the list is an object array containing my columns, in this case, only one (MyColumn). What's the most elegant way to check if that object array contains a certain value?
var searchValue = SOME_VALUE;
var result = list.Where(row => row["MyColumn"].Equals(searchValue)); // returns collection of DataRows containing needed value
var resultBool = list.Any(row => row["MyColumn"].Equals(searchValue)); // checks, if any DataRows containing needed value exists
If you should make this search often, I think it's not convenient to write LINQ-expression each time. I'd write extension-method like this:
private static bool ContainsValue(this List<DataRow> list, object value)
{
return list.Any(dataRow => dataRow["MyColumn"].Equals(value));
}
And after that make search:
if (list.ContainsValue("Value"))
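A runnable sketch of such an extension method is below. Compared with the version above it takes the column name as a parameter instead of hard-coding "MyColumn"; the class names and sample values are made up.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataRowListExtensions
{
    // True when any row holds the given value in the named column.
    public static bool ContainsValue(this IEnumerable<DataRow> rows, string columnName, object value) =>
        rows.Any(row => Equals(row[columnName], value));
}

public static class SearchDemo
{
    public static (bool Found, bool Missing) Run()
    {
        var table = new DataTable();
        table.Columns.Add("MyColumn", typeof(string));
        table.Rows.Add("Value");
        table.Rows.Add("Other");

        List<DataRow> list = table.AsEnumerable().ToList();
        return (list.ContainsValue("MyColumn", "Value"),
                list.ContainsValue("MyColumn", "Missing")); // (true, false)
    }
}
```

Using the static Equals(a, b) rather than a.Equals(b) also avoids an exception if the cell value is ever null.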
http://dotnetperls.com/list-find-methods has something about exists & find.
Well, it depends on what version C# and .NET you are on, for 3.5 you could do it with LINQ:
var qualifyingRows =
    from row in rows
    where Equals(row["MyColumn"], value)
    select row;
// We can see if we found any at all, though.
bool valueFound = qualifyingRows.FirstOrDefault() != null;
That will give you both the rows that match and a bool that tells you if you found any at all.
However if you don't have LINQ or the extension methods that come with it you will have to search the list "old skool":
DataRow matchingRow = null;
foreach (DataRow row in rows)
{
if (Equals(row["MyColumn"], value))
{
matchingRow = row;
break;
}
}
bool valueFound = matchingRow != null;
Which will give you the first row that matches, it can obviously be altered to find all the rows that match, which would make the two examples more or less equal.
The LINQ version has a major difference though: the IEnumerable you get from it is deferred, so the computation will not be done until you actually enumerate its members. I do not know enough about DataRow or your application to know if this can be a problem or not, but it was a problem in a piece of my code that dealt with NHibernate. Basically I was enumerating a sequence whose members were no longer valid.
You can create your own deferred IEnumerables easily through the iterators in C# 2.0 and higher.
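As an illustration of deferred execution with an iterator, here is a minimal sketch. The names DeferredSearch and WhereEquals are invented, and a list of ints stands in for the rows.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredSearch
{
    // Iterator method: the loop body does not run until the result is enumerated.
    public static IEnumerable<T> WhereEquals<T>(IEnumerable<T> source, T value)
    {
        foreach (var item in source)
        {
            if (EqualityComparer<T>.Default.Equals(item, value))
                yield return item;
        }
    }

    public static int Run()
    {
        var numbers = new List<int> { 1, 2, 2 };
        var query = WhereEquals(numbers, 2); // nothing has been evaluated yet
        numbers.Add(2);                      // the source changes after the query is built
        return query.Count();                // enumeration happens here, so it sees three 2s
    }
}
```

If you need a snapshot that is not affected by later changes to the source, call ToList() on the query at the point where the data is still valid.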
I may have misread this but it seems like the data is currently in a List<object[]> and not in a datatable so to get the items that match a certain criteria you could do something like:
var matched = items.Where(objArray => objArray.Contains(value));
items would be your list of object[]s and matched would be an IEnumerable<object[]> containing the object[]s that have the value in them.
