I have a question regarding how updates to an object's memory are shared between references.
If I have two objects equal to each other and I change a property of one, it changes the property of the other. That's fine.
If I then make one object equal to a third object, the first two objects lose all relationship.
Is there a way to make both the first and the second equal to the third by only explicitly making one of them equal to the third?
var a = new ObjectA();
var b = a;
var equality = a.GetHashCode() == b.GetHashCode(); // true => makes sense
var c = new ObjectA();
a = c;
equality = a.GetHashCode() == c.GetHashCode(); // true => makes sense
equality = a.GetHashCode() == b.GetHashCode(); // false => is it possible to make this true without again explicitly setting them equal?
Is there a way to do this?
Thanks a lot!
var a = new ObjectA();
Now you have one object
var b = a;
Now you have two references pointing to the same object.
var c = new ObjectA();
a = c;
Now you have two (different) objects, and three references.
I think the important thing you are missing is value vs. reference equality. Reference types, i.e. classes, default to reference equality: comparing them compares the addresses of the objects, not their contents. You can change this by overriding the Equals/GetHashCode methods, but you need to be careful if your object is mutable, since a changing hash code can make your object incompatible with dictionaries and other collections that rely on the hash code never changing.
Value types, i.e. structs, default to value equality, meaning the actual contents are compared.
Records also default to value equality, and have some nice syntax to help make them immutable, avoiding the hash code problem.
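For example (a quick sketch; PointClass and PointRecord are made-up names):
var c1 = new PointClass { X = 1, Y = 2 };
var c2 = new PointClass { X = 1, Y = 2 };
Console.WriteLine(c1.Equals(c2)); // False: classes compare references by default

var r1 = new PointRecord(1, 2);
var r2 = new PointRecord(1, 2);
Console.WriteLine(r1.Equals(r2)); // True: records compare contents

class PointClass { public int X; public int Y; }
record PointRecord(int X, int Y);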
So I ended up using Reflection to manually set properties a bit like this:
var a = new SomeObject();
var b = a;
var equality = object.ReferenceEquals(a, b); // true, of course
var c = new SomeObject();
a.GetType().GetProperties().ToList().ForEach(p =>
{
    var value = c.GetType().GetProperty(p.Name).GetValue(c);
    p.SetValue(a, value);
});
equality = object.ReferenceEquals(a, b); // still true
Now a, b and c are equivalent.
Otherwise #InBetween has a good answer.
Sorry if my question was not phrased so well.
I have two DataTables. I applied the Except operator as follows,
and got either unfiltered or undesired results.
resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable()).CopyToDataTable();
Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?
Note:
Both DataTables are plain simple with just one column.
dtA contains a dozen rows of strings.
dtB contains thousands of rows of strings.
I also tried the same syntax with another set operator, Intersect. This does not work either.
resultDataTable2 =dtA.AsEnumerable().Intersect(dtB.AsEnumerable()).CopyToDataTable();
Except will use the default comparer, i.e. it will compare references.
I think you are expecting a filtered result where the comparison is based on members.
I recommend implementing your own IEqualityComparer to compare two objects based on their members.
e.g.
resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable(), new TestComparer()).CopyToDataTable();
class TestComparer : IEqualityComparer<MyTestClass>
{
    public bool Equals(MyTestClass b1, MyTestClass b2)
    {
        if (b1 == null && b2 == null)
            return true;
        else if (b1 == null || b2 == null)
            return false;
        else if (b1.Prop1 == b2.Prop1 && b1.Prop2 == b2.Prop2) // ToDo: add more checks based on your class
            return true;
        else
            return false;
    }
    public int GetHashCode(MyTestClass obj)
    {
        int hCode = obj.Prop1.GetHashCode() ^ obj.Prop2.GetHashCode(); // combine more properties based on your class
        return hCode;
    }
}
Doc
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.except?view=net-6.0#system-linq-enumerable-except-1(system-collections-generic-ienumerable((-0))-system-collections-generic-ienumerable((-0))-system-collections-generic-iequalitycomparer((-0)))
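Note that the elements in this question are DataRow, so the comparer would have to be typed on DataRow rather than a placeholder class. A minimal sketch for a single-column table like dtA/dtB:
class SingleColumnRowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow r1, DataRow r2)
    {
        if (r1 == null || r2 == null)
            return ReferenceEquals(r1, r2);
        return object.Equals(r1[0], r2[0]); // compare the column value, not the row reference
    }
    public int GetHashCode(DataRow row)
    {
        return row[0]?.GetHashCode() ?? 0;
    }
}

// usage
resultDataTable = dtA.AsEnumerable()
    .Except(dtB.AsEnumerable(), new SingleColumnRowComparer())
    .CopyToDataTable();
.NET also ships DataRowComparer.Default (in System.Data.DataSetExtensions), which compares rows column by column, so for DataRow you may not need to write a comparer at all.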
Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?
When you do a.Except(b), the contents of b are loaded into a hash set. A hash set is a device that doesn't accept duplicates: it returns false when you try to add something that is already inside it.
After b is loaded into the hash set, a is looped over, also being added to the hash set. Anything that adds successfully (set.Add returns true because it is not already there) is returned. Anything that is already there (set.Add returns false because it was added from b, or appeared earlier in a) is not returned. Note that this process also dedupes a, so 1,1,2,3 except 2,3 would return just a single 1. You've achieved "every unique thing in a that isn't in b" - but do check whether you wanted a to be deduped too.
A hash set is a wonderful thing that enables super fast lookups. To do this it relies on two methods that every object has: GetHashCode and Equals. GetHashCode converts an object into a probably-unique number. Equals makes absolutely sure that object A equals object B. The hash set tracks all the hash codes and objects it has seen before, so when you add something it first gets the hash code of what you're trying to add. If it never saw that hash code before, it adds the item. If you try to add anything that has the same hash code as something it saw already, it uses Equals to check whether or not it's the same as what it saw before (sometimes hash codes are the same for different data), and adds the item if Equals declares it to be different. This whole operation is very fast, much faster than searching object by object through all the objects it has seen before.
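A tiny illustration of that Add behavior (a sketch of the mechanism, not Except's actual source):
var set = new HashSet<int> { 2, 3 };  // contents of b
var a = new[] { 1, 1, 2, 3 };
foreach (var item in a)
{
    if (set.Add(item))            // true only if the set hasn't seen this value yet
        Console.WriteLine(item);  // prints 1 exactly once: a gets deduped as well
}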
By default, any class in C# gets its implementation of GetHashCode and Equals from object. Object's versions of these methods essentially return the memory address for GetHashCode, and compare memory addresses for Equals.
This works fine for stuff that really is at the same memory address:
var p = new Person(){Name="John"};
var q = p; //same mem address as p
But it doesn't work for objects that have the same data but live at different memory addresses:
var p = new Person(){Name="John"};
var q = new Person(){Name="John"}; //not same mem address as p
If you define two people as being equal if they have the same name, and you want C# to consider them equal in the same way, you have to instruct C# to compare the names, not the memory addresses.
A DataRow is like Person above: just because it has the same data as another DataRow doesn't mean it's the same row in C#'s opinion. Further, because a single DataRow cannot belong to two DataTables, it's certain that the "John" in row 1 of dtA is a different object to the "John" in row 1 of dtB.
By default, Equals returns false for these two data rows, so Except will never consider them equal and remove dtA's John because of the presence of John in dtB..
..unless you provide an alternative comparison strategy that overrides C#'s default opinion of equality. That might look like:
- providing a comparer, as in Kalpesh's answer (typed on DataRow rather than a custom class),
- overriding Equals/GetHashCode so they work off column data, not memory addresses,
- or using some other type that already has Equals and GetHashCode based on data rather than memory addresses.
As these are just DataTables of a single column full of strings, they're notionally not much more than an array of strings. If we make them into arrays of strings, then when we do a.Except(b) we will be comparing strings. By default, C#'s opinion of whether one string equals another is based on the data content of the string rather than the memory address it lives at [1], so you can either use string arrays/lists to start with or convert your dtA/dtB to string arrays:
var arrA = dtA.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var arrB = dtB.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var result = arrA.Except(arrB);
Technically we don't even need to call ToArray(); Except works on any IEnumerable<string>.
If you really need the result to be a datatable, make one and add all the strings to it:
var resultDt = new DataTable();
resultDt.Columns.Add("x");
foreach (var s in result)
    resultDt.Rows.Add(s);
[1]: we'll ignore string interning for now
I've considered 2 cases:
var a = new { a = 5 };
var b = new { a = 6 };
Console.WriteLine(a.GetType() == b.GetType()); // True
Ideone: http://ideone.com/F8QwHY
and:
var a = new { a = 5, b = 7 };
var b = new { b = 7, a = 6 };
Console.WriteLine(a.GetType() == b.GetType()); // False
Ideone: http://ideone.com/hDTcxX
The question is: why does the order of the fields actually matter?
Is there any reason for this, or is it simply the way it was designed?
If the reason is just that anonymous types are not supposed to be used this way and you are not supposed to rely on GetType, then why does the compiler re-use a single class in the first case instead of just generating a new class for each anonymous type declaration?
So the reason for the design decision was ToString. An anonymous type returns a different string according to the order. Read Eric Lippert's blog.
{ a = 5, b = 7 }
{ b = 7, a = 6 }
Demo
C# language specification, section 7.6.10.6, requires the sequence of properties to be the same in order for the anonymous types to be considered identical:
Within the same program, two anonymous object initializers that specify a sequence of properties of the same names and compile-time types in the same order will produce instances of the same anonymous type.
In your second example, the first anonymous class has the sequence of properties {a, b}, while the second anonymous class has the sequence {b, a}. These sequences are not considered to be the same.
You lead on to what I'm guessing the reason is: the types are anonymous, so depending on GetType() to return reliable results is a terrible idea.
As for why the compiler re-uses a type if the order matches, I'm guessing it's simply to save time and space. When you take order into account, it's far easier to cache generated classes during compilation and then re-use them when needed.
I have a List that I have populated in the main method of a console project. I pass this population to a method which is meant to take two members of the population and decompose and recombine them in a way to create two new unique members which will later be added to the population.
However, when I manipulate the two original members to create the two new unique members, the two original members change within the initial population (hence altering the initial population). This means that when I go to add the new members, I get duplicate entries in the List.
I'm not doing anything overly complicated; I think I am just doing something stupid.
Does anyone have any insight as to why this is happening?
Here is the method that gets called to choose to initial two members of the population:
public static List<Chromosome<Gene>> runEpoch(Random rand, List<Chromosome<Gene>> population, SelectionMethod selectionMethod)
{
    int populationSize = population.Count;
    int selectionCount = (int)Math.Truncate((population.Count * 0.75));
    if (selectionMethod == SelectionMethod.Tournament)
    {
        for (int i = 0; i < selectionCount; i++)
        {
            Chromosome<Gene> parent = selection.runTournament(rand, population);
            Chromosome<Gene> parentTwo = selection.runTournament(rand, population);
            // Checks for the presence of incestuous mating. In some cases incestuous mating causes a stack overflow that the program cannot recover from
            if (parent != parentTwo)
            {
                // runGeneOperators calls the crossOver method directly
                offSpring = runGeneOperators(rand, parent, parentTwo);
            }
            else
            {
                i--;
            }
        }
    }
    else
    {
        //NSGAII
    }
    // fixPopulation is meant to sort and remove any excess members
    return fixPopulation(rand, population, selectionMethod, populationSize);
}
And here is the code that is creating the two new unique members :
public List<Chromosome<Gene>> crossOver(Random rand, Chromosome<Gene> parentOne, Chromosome<Gene> parentTwo)
{
    List<Chromosome<Gene>> offSpring = new List<Chromosome<Gene>>();
    int crossPtOne = rand.Next(0, parentOne.Length);
    int crossPtTwo = rand.Next(0, parentTwo.Length);
    if ((crossPtOne == 0) && (crossPtTwo == 0))
    {
        offSpring.Add(parentOne);
        offSpring.Add(parentTwo);
        return offSpring;
    }
    else
    {
        GeneNode<Gene> fragOne = parentOne.Children[crossPtOne];
        GeneNode<Gene> fragTwo = parentTwo.Children[crossPtTwo];
        crossOverPoint = crossPtOne;
        GeneNode<Gene> genotype = performCrossOver(parentOne.Genotype, fragTwo);
        success = false;
        parentOne.repair(genotype);
        offSpring.Add(parentOne);
        crossOverPoint = crossPtTwo;
        GeneNode<Gene> genotype2 = performCrossOver(parentTwo.Genotype, fragOne);
        success = false;
        parentTwo.repair(genotype2);
        offSpring.Add(parentTwo);
    }
    return offSpring;
}
private GeneNode<Gene> performCrossOver(GeneNode<Gene> tree, GeneNode<Gene> frag)
{
    if (tree != null)
    {
        if (crossOverPoint > 0)
        {
            if (!success && tree.Left != null)
            {
                crossOverPoint--;
                tree.Children[0] = performCrossOver(tree.Left, frag);
            }
        }
        if (crossOverPoint > 0)
        {
            if (!success && tree.Right != null)
            {
                crossOverPoint--;
                tree.Children[1] = performCrossOver(tree.Right, frag);
            }
        }
    }
    if (!success)
    {
        if (crossOverPoint == 0)
        {
            success = true;
            return frag;
        }
    }
    return tree;
}
In C#, objects are reference types, meaning adding something to a collection only adds a reference. If you manipulate the object behind that reference (in your case, the "original" members), the change is visible through every reference pointing to that object. You need to copy the object somehow to get a different object you can manipulate.
To manipulate your collection and the objects within it without altering the original, you need to do a deep clone of the collection and its contained objects, so you can alter list A without altering clone B. If you just wish to change some object inside the list, you need to make a deep clone of that object and then alter the clone. This makes sure that your original is not altered.
The ICloneable interface can be implemented on your objects to make them clonable.
EDIT: Per Henk's comments, you should just implement your own copy method without implementing the ICloneable interface, as per this article.
Cloning of objects comes in two forms, deep and shallow.
Shallow: this form of cloning creates a new object and then copies the nonstatic fields of the current object to the new object. If a field is a value type, a bit-by-bit copy of the field is performed. If a field is a reference type, the reference is copied but the referred object is not; therefore, the original object and its clone refer to the same object.
Deep: this creates a new object and clones all data members and fields of the clonable object, so where the original references some object X, the clone references its own copy X2.
Here's a post from SO that might assist you with cloning.
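A minimal sketch of the difference, using made-up Gene/Chromosome types rather than the actual classes from the question:
using System.Collections.Generic;
using System.Linq;

public class Gene
{
    public int Value;
}

public class Chromosome
{
    public List<Gene> Genes = new List<Gene>();

    // Shallow copy: a new list, but the same Gene objects are shared
    public Chromosome ShallowCopy()
    {
        return new Chromosome { Genes = new List<Gene>(Genes) };
    }

    // Deep copy: a new list and new Gene objects
    public Chromosome DeepCopy()
    {
        return new Chromosome
        {
            Genes = Genes.Select(g => new Gene { Value = g.Value }).ToList()
        };
    }
}
With deep copies, crossOver could mutate the offspring freely without touching the parents that are still referenced from the population list.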
Item 1 - Keep your list references private
From an API perspective, an object should never give out the reference to its internal list. This is due to the issue you have just found: some other consumer code can go ahead and modify that list as it sees fit, and every holder of that reference, including the 'owning' object, sees the list change. This is usually not what was intended.
Further, the responsibility is not on the consumer to make its own copy. The responsibility is on the owner of the list to keep its version private.
It's easy enough to create a new List:
private List<Gene> initialGeneList;

public List<Gene> InitialGenes
{
    get { return new List<Gene>(initialGeneList); } // return a new list
}
Note that this is a shallow copy: the list itself is new, but the Gene instances inside it are still shared, which is where Item 2 below comes in.
This advice also includes passing your own list to another class/method.
Item 2 - Consider using an immutable object
Without going too much into reference vs. value types, what this means is: treat the object like a value type, and whenever you have an operation, return a new object instead of mutating the original.
Real value types in .NET, like integers and DateTimes, have copy-on-assignment semantics: when you assign, a copy is created.
Reference types can also behave like value types, and strings are a good example. Many people make the following mistake because they fail to realize the value-type semantics of System.String.
string s = "Hello World";
s.Substring(0,5); // s unchanged. still 'Hello World'
s = s.Substring(0,5); // s changed. now 'Hello'
The string method Substring() doesn't mutate the original object, but instead returns a new one containing the result of the operation.
Where appropriate, the use of immutable objects can make the program a lot easier to understand and can reduce defects considerably.
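A minimal sketch of the pattern, with a made-up type:
var t1 = new Temperature(20);
var t2 = t1.Plus(5); // t1 is still 20; t2 is a new object holding 25

sealed class Temperature
{
    public double Celsius { get; }

    public Temperature(double celsius)
    {
        Celsius = celsius;
    }

    // Operations return a new instance instead of mutating this one
    public Temperature Plus(double delta)
    {
        return new Temperature(Celsius + delta);
    }
}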
When implementing immutable objects, one caveat is the likely need to implement value equality for comparing two objects (by overriding Equals). Objects are reference types, and by default two objects are equal if their references are equal.
In this snippet, the values of the objects are equal, but they are different objects.
string s1 = "Hello";
string s2 = string.Concat("He","l", "lo");
bool equal = s1.Equals(s2);
bool referenceEqual = object.ReferenceEquals(s1, s2);
// s1 == s2 => True, Reference Equal => False
Console.Write("s1 == s2 => {0}, Reference Equal => {1}", equal, referenceEqual);
This is probably a basic question for some, but it affects how I design a piece of my program.
I have a single collection of type A:
IEnumerable<A> myCollection;
I am filtering my collection on 2 different criteria:
IEnumerable<A> subCollection1 = myCollection.Where(x => x.Count > 10);
etc.
Now, I know that the .Where expression will return a new IEnumerable instance, but does the new collection contain the same references to instances of type A that 'myCollection' references, or are new copies of type A created? And if new instances of type 'A' are created, is there a way to make 'subCollection1' reference the same instances of A as 'myCollection'?
Edit: to add further clarification.
I am looking for a way so that when I make a change to an instance of 'A' in 'subCollection1', it is also modified in 'myCollection'.
The instances are the same if they are classes, but copies if they are structs/value types.
int, byte and double are value types, as are structs (like System.Drawing.Point and self-defined structs).
But strings, all of your own classes, basically "the rest", are reference types.
Note: LINQ uses the same rules as all other assignments.
For objects:
Person p1 = new Person();
p1.Name = "Mr Jones";
Person p2 = p1;
p2.Name = "Mr Anderssen";
// Now p1.Name is also "Mr Anderssen"
For structs:
Point p1 = new Point();
p1.X = 5;
Point p2 = p1;
p2.X = 10;
// p1.X is still 5
The same rules apply when using LINQ.
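For example (a sketch reusing the Person class from above, assumed to be a simple class with a settable Name property):
var people = new List<Person> { new Person { Name = "Mr Jones" } };
IEnumerable<Person> filtered = people.Where(p => p.Name.StartsWith("Mr"));

filtered.First().Name = "Mr Anderssen";
Console.WriteLine(people[0].Name); // "Mr Anderssen": Where yielded the same instance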
Actually it depends on the collection. In some cases, LINQ methods can return cloned objects instead of references to originals. Take a look at this test:
[Test]
public void Test_weird_linq()
{
    var names = new[] { "Fred", "Roman" };
    var list = names.Select(x => new MyClass() { Name = x });
    list.First().Name = "Craig";
    Assert.AreEqual("Craig", list.First().Name);
}

public class MyClass
{
    public string Name { get; set; }
}
This test will fail, even though many people believe that the same object will be returned by list.First(). It will pass if you materialize the query with ToList():
var list = names.Select(x => new MyClass() { Name = x }).ToList();
This happens because the Select query executes lazily: each call to First() re-enumerates the sequence and runs the projection again, producing a fresh MyClass instance. It's something to keep in mind when you write your code :)
This question can help you understand how LINQ works internally.
They are the same objects. Where only filters; Select produces (or can produce) new instances.
Making a copy of a reference type is non-trivial, and LINQ would have no idea how to do it. Operators like Where simply yield the same instances that are in the source.
I just wanted to add to some of the other answers: in general, when I'm not sure of something but require a particular behavior, I'll add a unit test for it. You could easily put this into a test and then check for equality, which will tell you whether you're looking at a reference to the object in the original container. Some may argue that this is stupid because you "should just know" what happens, but 1) I know I will sometimes be unsure, because I'm not an awesome programmer, and 2) there are always nights where I have to burn the midnight oil, and it's good to have the reassurance that something behaves as you need it to.
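Such a test might look like this (a sketch in the same [Test] style as the answer above; Person is assumed to be a simple class with a Name property):
[Test]
public void Where_yields_same_instances()
{
    var original = new Person { Name = "Mr Jones" };
    var collection = new List<Person> { original };

    var filtered = collection.Where(p => p.Name == "Mr Jones").ToList();

    // Passes: Where does not copy; it yields references to the original objects
    Assert.IsTrue(object.ReferenceEquals(original, filtered[0]));
}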
I have a simple class representing an object. It has 5 properties (a date, 2 decimals, an integer and a string). I have a collection class, derived from CollectionBase, which is a container class for holding multiple objects from my first class.
My question is, I want to remove duplicate objects (e.g. objects that have the same date, same decimals, same integers and same string). Is there a LINQ query I can write to find and remove duplicates? Or find them at the very least?
You can remove duplicates using the Distinct operator.
There are two overloads - one uses the default equality comparer for your type (which, for a custom type, will call the Equals() method on the type). The second allows you to supply your own equality comparer. Both return a new sequence representing your original set without duplicates; neither overload actually modifies your initial collection.
If you want to just find the duplicates, you can use GroupBy to do so:
var groupsWithDups = list.GroupBy( x => new { A = x.A, B = x.B, ... }, x => x )
.Where( g => g.Count() > 1 );
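From those groups you can then pull out the duplicate keys, or the surplus items themselves (A and B stand in for your five properties, as above):
// the key combinations that occur more than once
var dupKeys = groupsWithDups.Select( g => g.Key );

// every duplicate beyond the first occurrence in each group
var surplus = groupsWithDups.SelectMany( g => g.Skip(1) );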
To remove duplicates from something like an IList<> you could do:
var distinct = yourList.Distinct().ToList();
yourList.Clear();
foreach (var item in distinct)
    yourList.Add(item);
If your simple class uses Equals in a manner that satisfies your requirements then you can use the Distinct method
var col = ...;
var noDupes = col.Distinct();
If not then you will need to provide an instance of IEqualityComparer<T> which compares values in the way you desire. For example (null problems ignored for brevity)
public class MyTypeComparer : IEqualityComparer<MyType> {
    public bool Equals(MyType left, MyType right) {
        return left.Name == right.Name;
    }
    public int GetHashCode(MyType type) {
        return 42;
    }
}
var noDupes = col.Distinct(new MyTypeComparer());
Note the use of a constant for GetHashCode is intentional. Without knowing intimate details about the semantics of MyType it is impossible to write an efficient and correct hashing function. In lieu of an efficient hashing function I used a constant which is correct irrespective of the semantics of the type.
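That said, if (as in this example) equality is defined by Name alone, a hash that mirrors Equals is straightforward and more efficient:
public int GetHashCode(MyType type) {
    return type.Name == null ? 0 : type.Name.GetHashCode();
}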