.Net Merge two identical objects, all properties, recursively when null

I have a task where I am given a base object that contains objects and primitives from our dataservice, and I must combine it with data that the client provides on an identical object. The result needs to be one complete object. I'll refer to the object as "MyObject".
Here is what the object looks like from the dataservice:
MyObject.FirstName = null
MyObject.LastName = null
MyObject.DataProperty1 = anotherobject
anotherobject.property1 = somevalue1
anotherobject.property2 = somevalue2
anotherobject.property2 = somevalue2
MyObject.DataProperty2 = yetanotherobject
yetanotherobject.property1 = someothervalue1
yetanotherobject.property2 = someothervalue2
yetanotherobject.property3 = someothervalue3
yetanotherobject.property4 = someothervalue4
Here is what the object looks like when provided by the client side:
MyObject.FirstName = John
MyObject.LastName = Doe
MyObject.DataProperty1 = anotherobject
anotherobject.property1 = null
anotherobject.property2 = null
anotherobject.property2 = null
MyObject.DataProperty2 = yetanotherobject
yetanotherobject.property1 = null
yetanotherobject.property2 = null
yetanotherobject.property3 = null
yetanotherobject.property4 = null
I can't expect to know exactly which sub-item objects will be null or not, but I do know that, recursively, I need the final merged object to contain the actual data from both original objects, and not the nulls. Obviously the objects are going to be way more complicated than what I've typed above, but the gist of my question is valid.
I've tried something like the approach in "merging two objects in C#", but I couldn't figure out the non-primitives.
I really don't think this is a task for AutoMapper since the type of MyObject is the same class for both the client and the data side. It wouldn't make sense to map it to itself.
Too bad I can't just go
MyObject1 + MyObject2 = NewCombinedObject haha.
Also, this is legacy code and I realize it's not 'best practice' at all. I still need to solve the problem though.

I can see two issues you will run into with this that are potential show-stoppers: nested object references and children that refer to their parents. Both can lead to infinite recursion scenarios and out of memory errors.
You can mitigate the nesting issue by simply deciding you won't merge any deeper than, say, x levels deep. It's not ideal, but it can prevent infinite recursion for that scenario.
The issue with children referring to their parents is more complex. You have to be able to map merged objects to their originals and then, in your output, map the referring node back to the one you've already merged. This is not a task for the faint of heart.
Are you sure this is a task that AutoMapper can't do for you? If it can, my gut tells me that it will do this far better than you can reinvent it, and with a greatly reduced risk of errors.
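A minimal sketch of the depth limit and the "remember what you've already merged" idea from this answer, using reflection. NullMerger, FillNulls, Details and the maxDepth default are hypothetical names and values chosen for illustration; MyObject here is a cut-down stand-in for the real class. The sketch assumes both objects have the same runtime type, only looks at public read/write properties, does not merge collections element by element, and shares (rather than clones) any sub-object it copies across.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical sample types standing in for "MyObject" and its children.
public class Details
{
    public string Property1 { get; set; }
    public string Property2 { get; set; }
}

public class MyObject
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Details DataProperty1 { get; set; }
}

// Hypothetical helper; not part of .NET or AutoMapper.
public static class NullMerger
{
    // Copies values from source into every null property of target,
    // descending into nested class-typed properties up to maxDepth levels.
    public static void FillNulls(object target, object source, int maxDepth = 5)
    {
        Fill(target, source, maxDepth, new HashSet<object>(new ReferenceComparer()));
    }

    private static void Fill(object target, object source, int depth, HashSet<object> visited)
    {
        if (target == null || source == null || depth < 0) return;
        if (target.GetType() != source.GetType()) return; // only merge identical types
        if (!visited.Add(target)) return;                 // already merged: breaks cycles

        PropertyInfo[] properties =
            target.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (PropertyInfo p in properties)
        {
            // Skip indexers and properties that cannot be both read and written.
            if (!p.CanRead || !p.CanWrite || p.GetIndexParameters().Length > 0) continue;

            object targetValue = p.GetValue(target, null);
            object sourceValue = p.GetValue(source, null);

            if (targetValue == null)
            {
                // Reference-type values are shared with source, not cloned.
                if (sourceValue != null) p.SetValue(target, sourceValue, null);
            }
            else if (p.PropertyType.IsClass && p.PropertyType != typeof(string))
            {
                // Both sides may hold partial data: merge the nested objects.
                Fill(targetValue, sourceValue, depth - 1, visited);
            }
        }
    }

    // Compares by reference so that overridden Equals/GetHashCode on the
    // merged types cannot confuse the cycle check.
    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) { return ReferenceEquals(x, y); }
        int IEqualityComparer<object>.GetHashCode(object obj) { return RuntimeHelpers.GetHashCode(obj); }
    }
}
```

Calling NullMerger.FillNulls(serviceObject, clientObject) would leave serviceObject holding the non-null values from both sides. Non-nullable value types such as int are never null, so under this sketch they always keep the target's value.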

Related

Tell if an instance is pointed to by other variables?

This is a longshot but...
I understand there is a way to tell if variable A (var1, var2...., varX) points to instance of class by using Equals:
List<string> _mainInstance = new();
var var1 = _mainInstance;
var var2 = _mainInstance;
var var3 = _mainInstance;
//I understand how to tell if variable points to _mainInstance
if (var1.Equals(_mainInstance))
{
//It does because it points by reference
}
//I want to check if any variable still point to _mainInstance
if (!_mainInstance.HasOthersPointingToIt())
{
//Safe to delete _mainInstance
}
I don't know at design time how many pointers to _mainInstance there will be. I want to check every so often whether my _mainInstance has any variables still pointing to it. I thought maybe reflection or garbage collection could help, but all my research there is coming up with nothing.
Is there a way to examine a variable (in this case an instance of a class) and tell if anything else (variables, properties of a class instance) still point to it?
Edit:
@GuruStron asks: what is the underlying problem I am trying to solve?
Answer: I have a parent/child "tree" that I need to keep track of. That in itself is pretty easy. My difficulty is that the tree has blocks that have a single definition and that definition gets reused.
In the graph below it shows the top tree with stars. The stars represents pointers to block definitions. Below that on the left are the actual block definitions.
The block definition gets substituted and the final tree looks like this:
When a block pointer is deleted I need to make sure that nothing else is pointing to that block definition and delete the block definition when it is no longer needed.

Differences between object instantiation in C#: storing objects in references vs. calling a method directly

I have a question about object declarations in C#. Let me explain with this example.
I can do this:
MyObject obj = new MyObject();
int a = obj.getInt();
Or I can do this:
int a = new MyObject().getInt();
The result is the same, but are there any differences between these two forms (other than the syntax)?
Thanks.

This isn't a declaration: it's a class instantiation.
There's no practical difference: it's all about readability and your own coding style.
I would add that there are a few cases where you do need to store the object in a reference: when the object is IDisposable.
For example:
// WRONG! Underlying stream may still be locked after reading to the end....
new StreamReader(...).ReadToEnd();
// OK! Store the whole instance in a reference so you can dispose it when you
// don't need it anymore.
using(StreamReader r = new StreamReader(...))
{
} // This will call r.Dispose() automatically
As a comment has pointed out, there are a lot of edge cases where instantiating a class and storing the object in a reference (a variable) is better, but for your simple sample I believe the difference is negligible and it's still a coding style/readability issue.
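A runnable version of the disposable case might look like the sketch below. ReaderDemo and ReadAll are hypothetical names, and a caller-supplied Stream stands in for the elided StreamReader arguments above. By default StreamReader also disposes the stream it wraps, which is why holding the reader in a variable inside a using block releases the underlying resource.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration.
public static class ReaderDemo
{
    // Reads the whole stream as UTF-8 text and disposes the reader
    // (and, by default, the wrapped stream) when the block exits.
    public static string ReadAll(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        } // reader.Dispose() is called here, even if ReadToEnd throws
    }
}
```

With the one-liner form (new StreamReader(...).ReadToEnd()) there is no variable to dispose, so cleanup is left to the finalizer and happens at an unpredictable time.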

It's mostly syntax.
The main difference is that you can't reuse the instance of MyObject in the second example, because nothing refers to it afterwards. Also, it may become eligible for garbage collection immediately.

No, technically they are the same.
The only thing I would suggest considering in this case: if the function does not actually need an instance to be created, you may declare it static, so you can simply call it like:
int a = MyObject.getInt();
but this naturally depends on concrete implementation.

Understanding Lists of Objects

I was recently introduced to the concept of the generic List<>. Consider the following:
// Create an instance of the Theater class.
this.MarcusTheater = new Theater();
// Set the field values of the Theater.
this.MarcusTheater.Screens = new List<Screen>();
this.MarcusTheater.Screen1 = new Screen();
// Set the field values for Screen1.
this.MarcusTheater.Screen1.Lenght = 23;
this.MarcusTheater.Screen1.Width = 50;
// Add Screen1 to the Screen list.
this.MarcusTheater.Screens.Add(this.MarcusTheater.Screen1);
From my understanding Screen1 is a temporary holder for the Screen instance. Once added to the list it becomes indexed within that list and isn't really Screen1? Since the instance of the Screen object is being stored within the Screen List, can I pull back this object in its entirety? If so, what is the best way to loop through a List<> of Screens in order to find Screen1? I know this might seem like a trivial question but I'm trying to nail down the basics. Thank you in advance.

"From my understanding Screen1 is a temporary holder for the Screen instance."
Kind of. It is a "holder", but it's not temporary. Even after you add Screen1 to the Screens list, Screen1 is still valid. You've just copied the reference to that object. Consider this:
this.MarcusTheater = new Theater();
this.MarcusTheater.Screens = new List<Screen>();
this.MarcusTheater.Screen1 = new Screen();
// <your stuff here>
this.MarcusTheater.Screens.Add(this.MarcusTheater.Screen1);
Screen thisIsTheSameScreen = this.MarcusTheater.Screens[0];
At this point, this.MarcusTheater.Screen1 and thisIsTheSameScreen point to the same object. We're just passing its reference around.
So, if we did something like
thisIsTheSameScreen.Lenght = 20;
we would be changing it for everyone, because it's the same object.
"Once added to the list it becomes indexed within that list and isn't really Screen1?"
No. It's still the same object; we are simply sharing the reference.
"Since the instance of the Screen object is being stored within the Screen List, can I pull back this object in its entirety?"
Sure, just like I did above.
"If so what is the best way to loop through a list<> of Screens in order to find Screen1?"
You need a way to identify each screen, like an ID or a name. This way you can iterate over that list and fetch the one you're looking for, either using LINQ or a simple foreach.
"I know this might seem like a trivial question but I'm trying to nail down the basics. Thank you in advance."
And you're perfectly correct. We all should understand the tools we're using.
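The lookup described in this answer could be sketched as follows. The Name property and the ScreenLookup helper are assumptions made for the example; the original Screen class is not shown in the question, so any unique identifier would serve the same purpose.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Screen type with an identifying Name (an assumption).
public class Screen
{
    public string Name { get; set; }
    public int Length { get; set; }
    public int Width { get; set; }
}

// Hypothetical helper showing both lookup styles.
public static class ScreenLookup
{
    // LINQ version: returns the first match, or null if there is none.
    public static Screen FindWithLinq(List<Screen> screens, string name)
    {
        return screens.FirstOrDefault(s => s.Name == name);
    }

    // Plain foreach version with the same behaviour.
    public static Screen FindWithLoop(List<Screen> screens, string name)
    {
        foreach (Screen screen in screens)
        {
            if (screen.Name == name) return screen;
        }
        return null;
    }
}
```

Either method hands back a reference to the very object that was added to the list, not a copy, so changing its Width through the returned reference is visible through the list as well.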

I would suggest that you think of class-type storage locations (fields, parameters, array elements, etc.) as holding "object IDs". The statement someScreen.Width = 123; doesn't change someScreen. If, before that statement executed, someScreen identified the 24,601st object that was created since the program started, the statement will ask object #24601 to set its Width property to 123, and leave someScreen referring to object #24601. If one says someList.Add(someScreen), and someList identifies the 8,675,309th object, then object #8675309 will be asked to add "object #24601" to itself. Note that the actual object #24601 will not be added to the list, merely its identity.
(I'm unaware of .NET providing a means by which one could determine which object was created between the 24,600th and 24,602nd objects, but if more than 24,602 objects have been created, exactly one such object must exist; during that run of the program, that object can never be anything other than the 24,601st object, nor can any other object ever be the 24,601st; if one accepts hypothetically that a particular object is the 24,601st one created, then "object #24601" may, within that hypothetical context, be used to refer to the object in question.)

Screen1 is not a temporary screen; it is the one and only instance. When you add it to the list, a pointer or reference to Screen1 is stored. If you modify Screen1 through either the property or the list, the change is visible through both, because they refer to the same instance.
You should read about references, and possibly C++ pointers; that should help you understand.

Yes, the objects are reference types, so both variables point to the same object in memory. There's a dead easy way to test this: you can use the Object.ReferenceEquals method to determine whether two variables reference the same object.
Screen screen1 = new Screen();
screen1.Length = 23;
screen1.Width = 50;
Screen screen2 = new Screen();
screen2.Length = 23;
screen2.Width = 50;
List<Screen> screens = new List<Screen>();
screens.Add(screen1);
screens.Add(screen2);
Debug.WriteLine(Object.ReferenceEquals(screen1, screens[0])); // outputs true
Debug.WriteLine(Object.ReferenceEquals(screen2, screens[1])); // outputs true
Debug.WriteLine(Object.ReferenceEquals(screen1, screen2)); // outputs false

Handling null parent T of all object T

BaseUnit > Unit > ContainerUnit
BaseUnit is the core class.
Unit adds a ContainerUnit property called Parent.
ContainerUnit adds a List<Unit> property called Children.
So, all Unit types (including ContainerUnit) must have a parent that is a ContainerUnit. ContainerUnit types can have children that are ContainerUnit types or just Unit types.
So you can have a box of items, some of which are boxes of items.
I want to have a master ContainerUnit that is treated as the highest level parent of all Unit types. But that would make its Parent property null. Meaning, I want to say "who's your daddy?" to anything, without being aware of its position in the hierarchy, but then if I ask (say, in a loop) who the master container's parent is, it gets handled gracefully.
I'm looking for approaches that others have taken to solve this. I did search for this, I just didn't have much luck with my queries.

Having the outermost "universe" container return null for its container is the traditional thing to do. This has the advantage that it is easy. It has the disadvantage that you don't know that you've gone past the edge of the universe until it is too late to get back. As you said in a comment: using "null" as a flag is weak.
Two other solutions that I've seen employed are:
1) The universe object is its own container. This has the advantage that nothing is null; it has the disadvantage that it is easy to go into an infinite loop when walking the container chain, and it is unintuitive; the universe does not actually contain itself. Basically you're using equality as a flag instead of nullability as a flag; this seems weak too.
2) The universe object throws an exception when you ask for the container. This forces the caller to, instead of checking for null container, check instead for "are you the entire universe?" before asking for the container. That is, stop when you get to the top, instead of stopping when you get beyond the top. This is actually a kind of nice solution because it forces people to write defensive code. You can't just ask for a container unless you know there is one. Of course, it requires that the caller be somehow able to identify the universe object without inspecting its parent. You need an "Am I the entire universe?" method, or a well-known singleton object to compare against, or some other mechanism for identifying which is the topmost container.
A third approach is to deny the premise of the question; is it possible to construct your data type so that the container need not be known, or such that the importance of knowing it is minimized?
For example, in the compiler of course we have lots of "container" chains to walk, and we signal the global namespace by having its containing symbol be null (and by it being a well-known singleton object.) But a lot of the time we don't need to ever check for whether the parent is null because instead I write code that builds an abstraction on top of it:
static IEnumerable<Container> AllContainers(this Thing thing)
{
    if (thing == null) yield break;
    Container current = thing.Container;
    while (current != null)
    {
        yield return current;
        current = current.Container;
    }
}
Great. Now that I have that helper method, I don't ever need to check the Container property of a thing. If I want to know, "is there any container of this thing that contains this other thing?" then I can say:
var query = from container in oneThing.AllContainers()
where container.Contains(otherThing)
select container;
bool result = query.Any();
Use the power of LINQ to move mechanistic implementation details like "how do I determine when I'm at the top?" into higher-level helper methods. Then write your logic in at the "business" level, not at the "mechanism" level.
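For comparison, option 2 above (the topmost container throws when asked for its parent, and callers ask "are you the entire universe?" first) might be sketched like this with the question's Unit and ContainerUnit names. IsRoot, RootContainer, Add and Ancestors are hypothetical names introduced for the example, and BaseUnit is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

public class Unit
{
    private ContainerUnit parent;

    // "Am I the entire universe?" Only the root container answers true.
    public virtual bool IsRoot { get { return false; } }

    public ContainerUnit Parent
    {
        get
        {
            if (IsRoot) throw new InvalidOperationException("The root container has no parent.");
            return parent;
        }
        set { parent = value; }
    }
}

public class ContainerUnit : Unit
{
    private readonly List<Unit> children = new List<Unit>();

    public IEnumerable<Unit> Children { get { return children; } }

    public void Add(Unit child)
    {
        child.Parent = this;
        children.Add(child);
    }
}

// Hypothetical topmost container.
public sealed class RootContainer : ContainerUnit
{
    public override bool IsRoot { get { return true; } }
}

public static class UnitExtensions
{
    // Walks up the chain and stops AT the root instead of beyond it,
    // so callers never have to touch Parent on the root themselves.
    public static IEnumerable<ContainerUnit> Ancestors(this Unit unit)
    {
        Unit current = unit;
        while (current != null && !current.IsRoot)
        {
            ContainerUnit next = current.Parent;
            if (next == null) yield break; // unit not attached to any container
            yield return next;
            current = next;
        }
    }
}
```

As in the AllContainers helper, the "where is the top?" check lives in one place; code that wants every enclosing container just enumerates someUnit.Ancestors().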

The best way I can think of is just to handle your null case. That is, when you're looking at a ContainerUnit and trying to get its Parent you just add a check for null.
An example might be:
//Get all the Master Units from batch XYZ
var masterUnits = batches.Where(b => b.BatchId == XYZ)
    .Single().Units.Where(u => u.Parent == null);
Another example
//Get only units which have a parent [batch is an already initialized variable]
var childUnits = batch.Units.Where(u=> u.Parent != null);

I think having the Parent property return null when it is the highest object in the hierarchy makes the most sense and is the most graceful solution.

Is object creation in getters bad practice?

Let's have an object created in a getter like this :
public class Class1
{
    public string Id { get; set; }
    public string Oz { get; set; }
    public string Poznamka { get; set; }
    public Object object
    {
        get
        {
            // maybe some more code
            return new Object { Id = Id, poznamka = Poznamka, Oz = Oz };
        }
    }
}
Or should I rather create a method that will create and return the object?

Yes, it is bad practice.
Ideally, a getter should not be changing or creating anything (aside from lazy loading, and even then I think it leads to less clear code...). That way you minimise the risk of unintentional side effects.

Properties look like fields but they are methods. This has been known to cause a phenomenal amount of confusion. When a programmer sees code that appears to be accessing a field, there are many assumptions that the programmer makes that may not be true for a property. So there are some common property design guidelines.
Avoid returning different values from the property getter. If called multiple times in a row, a property method may return a different value each time; a field returns the same value each time.
A property method may require additional memory or return a reference to something that is not actually part of the object's state, so modifying the returned object has no effect on the original object; querying a field always returns a reference to an object that is guaranteed to be part of the original object's state. Working with a property that returns a copy can be very confusing to developers, and this characteristic is frequently not documented.
Consider that a property cannot be passed as an out or ref parameter to a method; a field can.
Avoid long running property getters. A property method can take a long time to execute; field access always completes immediately.
Avoid throwing exceptions from getters.
Do preserve previous values if a property setter throws an exception.
Avoid observable side effects.
Allow properties to be set in any order even if this results in a temporary invalid state of objects.
Sources
"CLR via C#", Jeffrey Richter. Chapter 9. Defining Properties Intelligently
"Framework Design Guidelines" 2nd edition, Brad Abrams, Krzysztof Cwalina, Chapter 5.2 Property Design

If you want your getter to create a new object every time it is accessed, that's the way to do it. This pattern is normally referred to as a Factory Method.
However, this is not normally needed on properties (ie. getters and setters), and as such is considered bad practice.

yes, it is ... from the outside, it should be transparent, whether you access a property or a field ...
when reading twice from field, or a property, you expect two things:
there is no impact on the object's (external) behaviour
you get identical results
I have no real knowledge of C#, but I hope, the following makes my point clear. let's start like this:
Object o1 = myInst.object;
Object o2 = myInst.object;
o1.poznamka = "some note";
in the case of a field, conditions like the following will be true:
o1 == o2;
o2.poznamka == "some note";
if you use a property with a getter, that returns a new object every time called, both conditions will be false ...
your getter seems to be meant to produce a temporary snapshot of your instance ... if that is what you want to do, then make it a plain method ... it avoids any ambiguities ...
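The two conditions described above can be checked with a small sketch. Class1Snapshot, SnapshotProperty and CreateSnapshot are hypothetical names; the question's own Object type is replaced here by a concrete class so that the example compiles.

```csharp
// Hypothetical snapshot type standing in for the object built by the getter.
public class Class1Snapshot
{
    public string Id { get; set; }
    public string Poznamka { get; set; }
}

public class Class1
{
    public string Id { get; set; }
    public string Poznamka { get; set; }

    // The pattern under discussion: every read allocates a new object,
    // so two reads never return the same instance.
    public Class1Snapshot SnapshotProperty
    {
        get { return new Class1Snapshot { Id = Id, Poznamka = Poznamka }; }
    }

    // The suggested alternative: a method name makes it obvious
    // that work is done and a fresh object is returned.
    public Class1Snapshot CreateSnapshot()
    {
        return new Class1Snapshot { Id = Id, Poznamka = Poznamka };
    }
}
```

With this sketch, reading SnapshotProperty twice gives two distinct objects, and writing "some note" into the first one changes neither the second one nor the Class1 instance. That is exactly the surprise a caller would not expect from something that looks like a field; CreateSnapshot() behaves identically but does not hide it.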

A property should, to all intents and purposes, act like a field. That means no exceptions should be thrown, and no new objects should be created (so you don't create lots of unnecessary objects if the property is used in a loop).
Use a wrapper class or similar instead.

In my view, if something is a 'property', the getter should return a property (basically data that already exists) relevant to the object.
In your case, you are returning something that is not a property of that object at that moment. You are not returning a property of your object but the product of some action.
I would go with a method, something like GetMyObject(), instead. Especially if an 'action' will take place, I think it is usually better to have a method than a property.
And try to imagine what would other developers who are not familiar with your code expect after seeing your property.

A property is just a convenient way to express a calculated field.
It should still represent something about an object, regardless of how the value itself is arrived at. For example, if the object in question is an invoice, you might have to add up the cost of each line item, and return the total.
What's written in the question breaks that rule, because returning a copy of the object isn't something that describes the object. If the return value changes between calls to the property without an explicit change of object state, then the object model is broken.
Speaking in generalities, returning a new object like this will just about always break the rule (I can't think of a counter-example right now), so I would say that it's bad practice.
There's also the gotcha of properties where you can so easily and innocently call on a property multiple times and end up running the same code (which hopefully isn't slow!).

To write code that is easily tested, you have to keep object initialization separate.
That is, in a test case you often want to exercise only some specific parts.
For example, with a House object you may not want to test anything related to the Kitchen object
and want to test only the Garden. If the House class creates those objects itself in its constructors or getters, the code will not support that kind of testing well.

As an aside to the comments already made, you can run into some real debugging headaches when lazy loading fields via a property.
I had a class with
private Collection<int> moo;
public Collection<int> Moo
{
get
{
if (this.moo == null) this.moo = new Collection<int>();
return this.moo;
}
}
Then somewhere else in the class there was a public method that referenced
this.moo.Add(baa);
without checking it was instantiated.
It threw a null reference exception, as expected. But the exception was on a UI thread, so it was not immediately obvious where it was coming from. I started tracing through, and every time I traced through, the error disappeared.
For a while I have to admit I thought I was going crazy. Debugger - no error. Runtime, error. Much scratching of head later I spotted the error, and realised that the Visual Studio debugger was instantiating the Collection as it displayed the public properties of the class.

It's maybe at most acceptable for structs. For reference types, I would only create a new object in a getter when it's only done once using some lazy-load pattern.
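A lazy-load getter that creates its object only once could be sketched with System.Lazy<T>, available since .NET Framework 4. Herd, IsMooCreated and Add are hypothetical names; the Moo collection echoes the anecdote earlier in this thread.

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical class for illustration.
public class Herd
{
    // The factory delegate runs at most once, on first access to Value.
    private readonly Lazy<Collection<int>> moo =
        new Lazy<Collection<int>>(() => new Collection<int>());

    // Every read returns the same instance, so the getter still
    // behaves like a field from the caller's point of view.
    public Collection<int> Moo
    {
        get { return moo.Value; }
    }

    // Exposed only so the lazy behaviour can be observed.
    public bool IsMooCreated
    {
        get { return moo.IsValueCreated; }
    }

    // Always go through the property, never a raw backing field,
    // so the collection is guaranteed to exist.
    public void Add(int baa)
    {
        Moo.Add(baa);
    }
}
```

Because there is no nullable backing field to touch directly, the "works in the debugger, fails at runtime" problem from the anecdote cannot occur in this form.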

It depends on the use of the getter. It's a great place to include this kind of code for lazy loading.

It is a bad practice. In your example, you should be able to expect the same Object every time you access the object property.

As you have it, it is bad, but it is not dissimilar to an acceptable practice called lazy loading, which you can read about here:
http://www.aspcode.net/Lazy-loading-of-structures-in-C-howto-part-8.aspx

It is a bad practice. But if you are thinking of objects as a bunch of getters & setters you should check the classical discussions about the topic.
As some folks mentioned, lazy loading could be a reason to do so. Depends on the actual business logic you are modeling here. You should create a separate method if it is better for legibility purposes, but if the code to create the object is simple you could avoid the indirection.
