Please read my previous question first; my concern is about getting a collision when using GetHashCode on strings.
Previous question
I have a database table with items in a repo, and an "incoming" function with items from a model that should be synced to the database table.
I'm using Intersect and Except to make this possible.
The class I use for the sync:
private class syncItemModel
{
    public override int GetHashCode()
    {
        return this.ItemLookupCode.GetHashCode();
    }

    public override bool Equals(object other)
    {
        if (other is syncItemModel)
            return ((syncItemModel)other).ItemLookupCode == this.ItemLookupCode;
        return false;
    }

    public string Description { get; set; }
    public string ItemLookupCode { get; set; }
    public int ItemID { get; set; }
}
Then I use this in my method:
1) Convert the datatable items to the sync model:
var DbItemsInCampaignDiscount_SyncModel =
    DbItemsInCampaignDiscount(dbcampaignDiscount, datacontext)
        .Select(i => new syncItemModel
        {
            Description = i.Description,
            ItemLookupCode = i.ItemLookupCode,
            ItemID = i.ID
        }).ToList();
2) Convert my incoming item model to the sync model:
var ItemsInCampaignDiscountModel_SyncModel = modelItems
    .Select(i => new syncItemModel
    {
        Description = i.Description,
        ItemLookupCode = i.ItemLookUpCode,
        ItemID = 0
    }).ToList();
3) Make an intersect:
var CommonItemInDbAndModel =
ItemsInCampaignDiscountModel_SyncModel.Intersect(DbItemsInCampaignDiscount_SyncModel).ToList();
4) Take out items to be deleted from the database (those that do not exist in the incoming model items):
var SyncModel_OnlyInDb =
DbItemsInCampaignDiscount_SyncModel.Except(CommonItemInDbAndModel).ToList();
5) Take out items to be added to the database (those that exist in the incoming model but not in the db):
var SyncModel_OnlyInModel =
ItemsInCampaignDiscountModel_SyncModel.Except(CommonItemInDbAndModel).ToList();
My question is then: can there be a collision? Can two different ItemLookupCode values in my example be treated as the same ItemLookupCode, since Intersect and Except use the hash code? Or will the Equals function "double check" it, so this approach is safe to use? If a collision is possible, how big is the chance?
Yes, there can always be a hash collision; that's why identity is confirmed by calling Equals(). GetHashCode() and Equals() must be implemented consistently.
Except() in LINQ to Objects internally uses a HashSet; in the case of a hash collision it will call Equals to confirm identity. As you are comparing on a single property, it is fine to proxy the calls to that property's GetHashCode and Equals.
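To see that a collision only costs an extra Equals() call rather than producing a wrong result, here is a minimal sketch with a hypothetical type whose GetHashCode deliberately always collides (the type and names are made up for illustration, not taken from your code):

using System;
using System.Linq;

// Hypothetical type that forces every instance into the same hash bucket,
// so Intersect/Except can only tell items apart by calling Equals().
class AlwaysColliding
{
    public string Code { get; }
    public AlwaysColliding(string code) { Code = code; }

    public override int GetHashCode() => 0; // deliberate worst case: everything collides
    public override bool Equals(object other) =>
        other is AlwaysColliding a && a.Code == Code;
}

class Demo
{
    static void Main()
    {
        var db    = new[] { new AlwaysColliding("A"), new AlwaysColliding("B") };
        var model = new[] { new AlwaysColliding("B"), new AlwaysColliding("C") };

        // Despite identical hash codes, Equals() keeps "A" and "C" apart:
        Console.WriteLine(string.Join(",", db.Intersect(model).Select(x => x.Code))); // B
        Console.WriteLine(string.Join(",", db.Except(model).Select(x => x.Code)));    // A
    }
}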
Please find some comments below about your implementation:
comparison with ==
Comparing strings with == is fine, but if the property's type is ever changed to something non-primitive you will get issues, because object references rather than content will be compared. Proxy the call to Equals() instead of using ==.
mutability of the object
It is very error-prone to bind GetHashCode/Equals logic to mutable state. I'd strongly recommend encapsulating your state so that the object cannot be changed once created; make the setters private for the sake of safety.
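For illustration, here is a minimal sketch of the class with both comments applied; the constructor shape and the Ordinal comparison mode are assumptions, not taken from your code:

using System;

// State is fixed at construction time, and equality proxies to string.Equals
// with an explicit comparison mode (Ordinal here is an assumption; pick the
// mode that matches how your lookup codes should compare).
class SyncItemModel
{
    public SyncItemModel(string itemLookupCode, string description, int itemId)
    {
        ItemLookupCode = itemLookupCode;
        Description = description;
        ItemID = itemId;
    }

    public string Description { get; }
    public string ItemLookupCode { get; }
    public int ItemID { get; }

    public override int GetHashCode() =>
        ItemLookupCode == null ? 0 : StringComparer.Ordinal.GetHashCode(ItemLookupCode);

    public override bool Equals(object other) =>
        other is SyncItemModel m &&
        string.Equals(m.ItemLookupCode, ItemLookupCode, StringComparison.Ordinal);
}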
Related
As far as I know, GetHashCode() should only use read-only/immutable properties. But if I change, for example, the Id property that GetHashCode() uses, then I simply get a new hash code. So why should it be immutable? If the hash code stayed the same after changing the property I would see the problem, but it does change.
class Program
{
    public class Point
    {
        public int Id { get; set; }

        public override bool Equals(object obj)
        {
            return obj is Point point &&
                   Id == point.Id;
        }

        public override int GetHashCode()
        {
            return HashCode.Combine(Id);
        }
    }

    static void Main(string[] args)
    {
        Point point = new Point();
        point.Id = 5;
        var r1 = point.GetHashCode(); // 467047723
        point.Id = 10;
        var r2 = point.GetHashCode(); // 1141379410
    }
}
GetHashCode() exists mainly for one reason: retrieval of an object from a hash table. You are right that it is desirable for the hash code to be computed only from immutable fields, but think about the reason for this. Since the hash code is used to retrieve an object from a hash table, it will lead to errors if the hash code changes while the object is stored in the hash table.
To put it more generally: the value returned by GetHashCode must stay stable for as long as some structure depends on that hash code staying stable. For your example this means you can change the Id field as long as the object is not currently used in any such structure.
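Here is a minimal sketch of what "stays stable as long as a structure depends on it" means in practice, using a copy of the Point class from the question:

using System;
using System.Collections.Generic;

class Point
{
    public int Id { get; set; }
    public override bool Equals(object obj) => obj is Point p && p.Id == Id;
    public override int GetHashCode() => HashCode.Combine(Id);
}

class Demo
{
    static void Main()
    {
        var point = new Point { Id = 5 };
        var set = new HashSet<Point> { point };

        Console.WriteLine(set.Contains(point)); // True: found in the bucket for hash(5)

        point.Id = 10; // hash changes while the object is inside the set

        // Lookup now probes the bucket for hash(10) and misses,
        // even though the object is still physically stored in the set.
        Console.WriteLine(set.Contains(point)); // False

        // The safe order is: remove first, then mutate, then re-add.
        var other = new Point { Id = 7 };
        set.Add(other);
        set.Remove(other);   // still findable, nothing has changed yet
        other.Id = 8;        // mutate while *outside* the set
        set.Add(other);
        Console.WriteLine(set.Contains(other)); // True
    }
}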
Exactly because of this: if the value is not immutable, the hash code changes every time the value does.
A hash code is a numeric value that is used to identify an object
during equality testing. It can also serve as an index for an object
in a collection.
So if it changes all the time, you can't use it for that purpose.
I think it is strange that the GetHashCode implementations of these collections don't base their hash code on the items they contain.
I need this to work in order to provide dirty checking (you have unsaved data).
I've written a wrapping class that overrides the GetHashCode method but I find it weird that this is not the default implementation.
I guess this is a performance optimization?
class Program
{
    static void Main(string[] args)
    {
        var x = new ObservableCollection<test>();
        int hash = x.GetHashCode();
        x.Add(new test("name"));
        int hash2 = x.GetHashCode();

        var z = new List<test>();
        int hash3 = z.GetHashCode();
        z.Add(new test("tets"));
        int hash4 = z.GetHashCode();

        var my = new CustomObservableCollection<test>();
        int hash5 = my.GetHashCode();
        var test = new test("name");
        my.Add(test);
        int hash6 = my.GetHashCode();
        test.Name = "name2";
        int hash7 = my.GetHashCode();
    }
}
public class test
{
    public test(string name)
    {
        Name = name;
    }

    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        if (obj is test)
        {
            var o = (test)obj;
            return o.Name == this.Name;
        }
        return base.Equals(obj);
    }

    public override int GetHashCode()
    {
        return Name.GetHashCode();
    }
}
public class CustomObservableCollection<T> : ObservableCollection<T>
{
    public override int GetHashCode()
    {
        int collectionHash = base.GetHashCode();
        foreach (var item in Items)
        {
            var itemHash = item.GetHashCode();
            if (int.MaxValue - itemHash > collectionHash)
            {
                collectionHash = collectionHash * -1;
            }
            collectionHash += itemHash;
        }
        return collectionHash;
    }
}
If it did, it would break a few of the guidelines for implementing GetHashCode. Namely:
the integer returned by GetHashCode should never change
Since the contents of a list can change, so would its hash code.
the implementation of GetHashCode must be extremely fast
Depending on the size of the list, you could risk slowing down the calculation of its hash code.
Also, I do not believe you should be using an object's hashcode to check if data is dirty. The probability of collision is higher than you think.
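If the goal is a "you have unsaved data" check, one collision-free alternative is to keep a snapshot of the last saved contents and compare them directly; a rough sketch (the DirtyTracker name is made up, and note the caveat in the comments):

using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: remembers what the collection looked like at the last
// save and reports "dirty" by comparing contents rather than hash codes.
class DirtyTracker<T>
{
    private List<T> _snapshot = new List<T>();

    public void MarkSaved(IEnumerable<T> current) => _snapshot = current.ToList();

    // SequenceEqual calls the items' Equals, so a hash collision can never hide
    // a change. Order matters here, which suits an ObservableCollection.
    // Caveat: the snapshot stores references, so an in-place edit of an item is
    // only detected if you snapshot copies (or immutable versions) of the items.
    public bool IsDirty(IEnumerable<T> current) => !_snapshot.SequenceEqual(current);
}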
The Equals/GetHashCode of lists check for reference equality, not content equality. The reason is that lists are both mutable and reference (not struct) objects, so if the hash were based on the contents, it would change every time you changed them.
The common use case for hash codes is hash tables (for example Dictionary<K,V> or HashSet), which place their items in buckets based on the hash when they are first inserted. If the hash of an object which is already in the table changes, it may no longer be found, which leads to erratic behavior.
The point of GetHashCode is to reflect the Equals() logic in a lightweight way.
List<T> inherits Object.Equals(), and Object.Equals() compares equality by reference, so the hash code is not based on the list's items but on the list object itself.
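A quick sketch of the difference between the list's own (reference-based) equality and the content comparison you may actually want:

using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 1, 2, 3 };

        Console.WriteLine(a.Equals(b));        // False: Object.Equals compares references
        Console.WriteLine(a.SequenceEqual(b)); // True: compares the items themselves
        // a.GetHashCode() and b.GetHashCode() will (almost certainly) differ as well,
        // because the default hash is tied to object identity, not contents.
    }
}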
It would be helpful to have a couple of types which behaved like List<T> and could generally be used interchangeably with it, but with GetHashCode and Equals methods which would define equivalence either in terms of the sequence of identities, or in terms of the Equals and GetHashCode behaviors of the items encapsulated therein.
Making such methods behave efficiently, however, would require that the class cache its hash value and invalidate or update the cached value whenever the collection is modified. (It would not be legitimate to modify a list while it is stored as a dictionary key, but it should be legitimate to remove a list, modify it, and re-add it, and it would be very desirable to avoid having such modification necessitate re-hashing the entire contents of the list.)
It was not considered worthwhile to have ordinary lists go through the effort of supporting such behavior at the cost of slowing down operations on lists that never get hashed; nor was it considered worthwhile to define multiple types of list, multiple types of dictionary, etc. based upon the kind of equivalence they should look for in their members or expose to the outside world.
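For illustration, here is a rough sketch of the kind of type described above: list-like, equality over contents, with a cached hash that is invalidated on every modification. The ValueList name is made up, and the caching only helps repeated hashing of an unmodified list; it does not make it safe to mutate a list while it is a dictionary key.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

class ValueList<T> : Collection<T>
{
    private int? _cachedHash;

    // Every mutation path of Collection<T> funnels through these overrides,
    // so the cached hash can be invalidated in one place.
    protected override void InsertItem(int index, T item) { _cachedHash = null; base.InsertItem(index, item); }
    protected override void SetItem(int index, T item)    { _cachedHash = null; base.SetItem(index, item); }
    protected override void RemoveItem(int index)         { _cachedHash = null; base.RemoveItem(index); }
    protected override void ClearItems()                  { _cachedHash = null; base.ClearItems(); }

    public override int GetHashCode()
    {
        if (_cachedHash == null)
        {
            var hash = new HashCode();
            foreach (var item in this)
                hash.Add(item);
            _cachedHash = hash.ToHashCode();
        }
        return _cachedHash.Value;
    }

    public override bool Equals(object obj) =>
        obj is ValueList<T> other && this.SequenceEqual(other);
}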
Before I start, I'd like to clarify that this is not like all the other somewhat "similar" questions out there. I've tried implementing each approach, but the phenomena I am getting here are really weird.
I have a dictionary where ContainsKey always returns false, even though the keys' GetHashCode functions return the same output and their Equals method returns true.
What could this mean? What am I doing wrong here?
Additional information
The two elements I am inserting are both of type Owner, which has no GetHashCode or Equals overrides of its own. It inherits from a type Storable, which implements an interface and has GetHashCode and Equals defined.
Here's my Storable class. You are probably wondering if the two Guid properties are indeed equal - and yes, they are. I double-checked. See the sample code afterwards.
public abstract class Storable : IStorable
{
    public override int GetHashCode()
    {
        return Id == default(Guid) ? 0 : Id.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        var other = obj as Storable;
        return other != null && (other.Id == Id || ReferenceEquals(obj, this));
    }

    public Guid Id { get; set; }

    protected Storable()
    {
        Id = Guid.NewGuid();
    }
}
Now, here's the relevant part of my code where the dictionary stuff occurs. It takes in a Supporter object which has a link to an Owner.
public class ChatSession : Storable, IChatSession
{
    static ChatSession()
    {
        PendingSupportSessions = new Dictionary<IOwner, LinkedList<IChatSession>>();
    }

    private static readonly IDictionary<IOwner, LinkedList<IChatSession>> PendingSupportSessions;

    public static ChatSession AssignSupporterForNextPendingSession(ISupporter supporter)
    {
        var owner = supporter.Owner;
        if (!PendingSupportSessions.ContainsKey(owner)) // always returns false
        {
            var hashCode1 = owner.GetHashCode();
            var hashCode2 = PendingSupportSessions.First().Key.GetHashCode();
            var equals = owner.Equals(PendingSupportSessions.First().Key);
            // here, equals is true, and the two hash codes are identical,
            // and there is only one element in the dictionary according to the debugger.
            // however, calling two "Add" calls after each other does indeed crash.
            PendingSupportSessions.Add(owner, new LinkedList<IChatSession>());
            PendingSupportSessions.Add(owner, new LinkedList<IChatSession>()); // crash
        }
        ...
    }
}
If you need additional information, let me know. I am not sure what kind of information would be sufficient, so it was hard for me to include more.
Guillaume was right. It turns out I was changing the value of one of my keys after it was added to the dictionary. Doh!
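For anyone who lands here with the same symptom, a minimal sketch of that failure mode (a stripped-down, hypothetical stand-in for the Storable pattern; the settable Id is exactly what makes it possible):

using System;
using System.Collections.Generic;

class Key
{
    public Guid Id { get; set; } = Guid.NewGuid();
    public override int GetHashCode() => Id == default(Guid) ? 0 : Id.GetHashCode();
    public override bool Equals(object obj) => obj is Key k && k.Id == Id;
}

class Demo
{
    static void Main()
    {
        var key = new Key();
        var map = new Dictionary<Key, string> { [key] = "pending" };

        key.Id = Guid.NewGuid(); // key mutated *after* it was added

        // The dictionary stored the entry under the old hash; lookup now uses
        // the new one, so the entry is effectively lost.
        Console.WriteLine(map.ContainsKey(key)); // False
    }
}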
Make sure you are passing the same object that is stored as the key in the dictionary. If you are creating a new object each time and trying to find the key assuming it is already stored because the values are similar, then ContainsKey returns false. Object comparisons are different from value comparisons.
Here is my situation. I have two lists of the same type; imagine they are named FullList and ElementsRemoved. In order to avoid a database roundtrip, any time I delete an element from FullList I add it to ElementsRemoved, so that if the user regrets the deletion it can be reverted.
I was thinking of looping over ElementsRemoved to insert the items back into the FullList they were originally removed from.
Is there any way to do this simply with List methods?
Something like
FullList.Insert, Add, ..... (x =>
in order to reduce code and keep it optimized?
Instead of deleting the item from your database consider using a flag in the table.
For example consider this entities table (written in TSQL):
CREATE TABLE Entity
(
Id INT IDENTITY PRIMARY KEY
,Name NVARCHAR(20) NOT NULL
,IsDelete BIT NOT NULL DEFAULT 0
);
This way you can set the IsDelete bit when the user deletes the entity which will prevent the data from being lost. The data can be pruned on a job in the off hours.
This would lead to only needing one list instead of keeping track of two.
public class Entity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool IsDelete { get; set; }
}

public static void UndoDelete(IEnumerable<Entity> fullList, int[] removedIds)
{
    foreach (var entity in fullList.Where(e => removedIds.Contains(e.Id)))
    {
        entity.IsDelete = false;
    }
}
In case you cannot modify your application, you can simply add the entities back in. See List<T>.AddRange:
var entitiesToAdd = new[] { 2, 3, 4 };
var entitiesToInsert = ElementsRemoved.Where(e => entitiesToAdd.Contains(e.Id));
FullList.AddRange(entitiesToInsert);
In your front end make a class that holds a bool and your object:
public class DelPair<T>
{
    public bool IsDeleted { get; set; }
    public T Item { get; set; }
}
Now instead of using a list of objects use a list of DelPair<YourClass> and set IsDeleted=true when deleting.
This pattern will also allow you to track other things, such as IsModified if it comes to that.
Based on the OP's comment that he's using an ENTITY class and needs it to function as such: one option is to make your DelPair class inherit from ENTITY. Another may be to add an implicit casting operator:
...
// not exactly sure about the signature, trial/error should do :)
public static implicit operator T(DelPair<T> pair)
{
return pair.Item;
}
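A short, hypothetical usage sketch of that pattern (Order is just a placeholder entity; DelPair<T> is the wrapper defined above, assuming the implicit operator is declared inside it):

using System;
using System.Collections.Generic;
using System.Linq;

class Order
{
    public int Id { get; set; }
}

class Demo
{
    static void Main()
    {
        var rows = new List<DelPair<Order>>
        {
            new DelPair<Order> { Item = new Order { Id = 1 } },
            new DelPair<Order> { Item = new Order { Id = 2 } }
        };

        rows[0].IsDeleted = true;                             // "delete" without losing the row
        var visible = rows.Where(p => !p.IsDeleted).ToList(); // what the user still sees

        Order second = rows[1];    // implicit DelPair<Order> -> Order conversion
        rows[0].IsDeleted = false; // undo is just flipping the flag back
        Console.WriteLine(visible.Count); // 1
    }
}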
Suppose you have an element having a field id which uniquely identifies it.
class Element{public int id;}
In that case you can do this
FullList.Add(ElementsRemoved.FirstOrDefault(e=>e.id==id));
In case you want to add all elements use AddRange
FullList.AddRange(ElementsRemoved);
You can use the AddRange method
FullList.AddRange(ElementsRemoved);
But consider doing this
public class YourClass
{
    public string AnyValue { get; set; }
    public bool IsDeleted { get; set; }
}
And you have a list like this: List<YourClass> FullList. Now, whenever the user removes an item, you just set
IsDeleted = true
on the item that was removed. This will help you keep just one list instead of adding to and removing from two.
I'm adding values to a C# generic list while trying to prevent duplicates, but without success. Does anyone know of a reason why the code below wouldn't work?
I have a simple class here:
public class DrivePairs
{
    public int Start { get; set; }
    public int End { get; set; }
}
And here is my method which tries to return a generic list of the above class:
ArrayList found = DriveRepository.GetDriveArray(9, 138);
List<DrivePairs> drivePairs = new List<DrivePairs>();
foreach (List<int> item in found)
{
    int count = item.Count;
    if (count > 1)
    {
        for (int i = 0; i < (count - 1); i++)
        {
            DrivePairs drivePair = new DrivePairs();
            drivePair.Start = item[i];
            drivePair.End = item[i + 1];
            if (!drivePairs.Contains(drivePair))
                drivePairs.Add(drivePair);
        }
    }
}
drivePairs = drivePairs.Distinct().ToList();
As you can maybe see, I have an ArrayList, and each row contains a List<int>. What I'm doing is going through each and adding to a list which contains only pairs. E.g. if my List<int> contains [1,3,6,9] I want to add three entries to my pairs list:
[1,3]
[3,6]
[6,9]
It all works fine apart from not recognising duplicates. I thought this line would be enough:
if (!drivePairs.Contains(drivePair))
drivePairs.Add(drivePair);
but it continues to add them all. Even when I add a Distinct() at the end, it still doesn't remove them. I've also tried adding them to a HashSet, but it still includes all the duplicates.
Anyone know of a reason why the duplicates might not be getting picked up?
Your DrivePairs class does not define equality; as a result, the Contains method uses reference equality. Add an Equals method that uses both Start and End to determine equality and you will probably find your code works.
See: Equality Comparisons (C# Programming Guide)
List.Contains Method
This method determines equality by using the default equality
comparer, as defined by the object's implementation of the
IEquatable<T>.Equals method for T (the type of values in the list).
Change your DrivePairs class
public class DrivePairs : IEquatable<DrivePairs>
{
    public int Start { get; set; }
    public int End { get; set; }

    public bool Equals(DrivePairs other)
    {
        return this.Start == other.Start && this.End == other.End;
    }
}
See: http://msdn.microsoft.com/en-us/library/bhkz42b3.aspx
Hope this helps
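One addition worth making: List<T>.Contains only needs Equals, but Distinct() and HashSet<T> (both of which the question also tries) consult GetHashCode first, so it should be overridden consistently as well; a minimal sketch:

using System;

public class DrivePairs : IEquatable<DrivePairs>
{
    public int Start { get; set; }
    public int End { get; set; }

    public bool Equals(DrivePairs other)
    {
        return other != null && this.Start == other.Start && this.End == other.End;
    }

    // Route object.Equals through the typed overload so non-generic callers agree.
    public override bool Equals(object obj)
    {
        return Equals(obj as DrivePairs);
    }

    // Equal pairs must return equal hash codes for Distinct()/HashSet<T> to work.
    public override int GetHashCode()
    {
        return unchecked((Start * 397) ^ End);
    }
}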
You are creating new objects each time. These are different instances, and when compared to each other, even if they contain identical values, they will be evaluated as different, because the default comparison for reference types is a reference comparison.
You need to write a custom comparer that will identify equal lists in the manner your application requires.
I've marked Colin's as the answer, but here was the code just in case it's any use to anyone:
Equality comparer:
public class EqualityComparer : IEqualityComparer<DrivePairs>
{
    public bool Equals(DrivePairs x, DrivePairs y)
    {
        return x.Start.Equals(y.Start) && x.End.Equals(y.End);
    }

    public int GetHashCode(DrivePairs obj)
    {
        return obj.Start.GetHashCode();
    }
}
and in the controller:
IEqualityComparer<DrivePairs> customComparer = new EqualityComparer();
IEnumerable<DrivePairs> distinctDrivePairs = drivePairs.Distinct(customComparer);
drivePairs = distinctDrivePairs.ToList();
Thanks for all the help and comments
I have not tested it, but I think the default equality test is whether it is the same instance. Try overriding the Equals method and make it use your properties.
The DrivePairs class is a reference type (remember the reference type vs. value type concept). So when you check whether a DrivePairs variable has already been added to the List collection, it returns false, because every DrivePairs variable has a different memory location from the others.
Try using a Dictionary, StringDictionary, or any other key/value pair collection. It will definitely work.