Comparing boxed value types

Comparing boxed value types - c#

Today I stumbled upon an interesting bug I wrote. I have a set of properties which can be set through a general setter. These properties can be value types or reference types.
public void SetValue( TEnum property, object value )
{
if ( _properties[ property ] != value )
{
// Only come here when the new value is different.
}
}
When writing a unit test for this method I found out the condition is always true for value types. It didn't take me long to figure out this is due to boxing/unboxing. It didn't take me long either to adjust the code to the following:
public void SetValue( TEnum property, object value )
{
if ( !_properties[ property ].Equals( value ) )
{
// Only come here when the new value is different.
}
}
The thing is I'm not entirely satisfied with this solution. I'd like to keep a simple reference comparison, unless the value is boxed.
The current solution I am thinking of is only calling Equals() for boxed values. Doing a check for a boxed values seems a bit overkill. Isn't there an easier way?

If you need different behaviour when you're dealing with a value-type then you're obviously going to need to perform some kind of test. You don't need an explicit check for boxed value-types, since all value-types will be boxed** due to the parameter being typed as object.
This code should meet your stated criteria: If value is a (boxed) value-type then call the polymorphic Equals method, otherwise use == to test for reference equality.
public void SetValue(TEnum property, object value)
{
bool equal = ((value != null) && value.GetType().IsValueType)
? value.Equals(_properties[property])
: (value == _properties[property]);
if (!equal)
{
// Only come here when the new value is different.
}
}
( ** And, yes, I know that Nullable<T> is a value-type with its own special rules relating to boxing and unboxing, but that's pretty much irrelevant here.)

Equals() is generally the preferred approach.
The default implementation of .Equals() does a simple reference comparison for reference types, so in most cases that's what you'll be getting. Equals() might have been overridden to provide some other behavior, but if someone has overridden .Equals() in a class it's because they want to change the equality semantics for that type, and it's better to let that happen if you don't have a compelling reason not to. Bypassing it by using == can lead to confusion when your class sees two things as different when every other class agrees that they're the same.

Since the input parameter's type is object, you will always get a boxed value inside the method's context.
I think your only chance is to change the method's signature and to write different overloads.

How about this:
if(object.ReferenceEquals(first, second)) { return; }
if(first.Equals(second)) { return; }
// they must differ, right?
Update
I realized this doesn't work as expected for a certain case:
For value types, ReferenceEquals returns false so we fall back to Equals, which behaves as expected.
For reference types where ReferenceEquals returns true, we consider them "same" as expected.
For reference types where ReferenceEquals returns false and Equals returns false, we consider them "different" as expected.
For reference types where ReferenceEquals returns false and Equals returns true, we consider them "same" even though we want "different"
So the lesson is "don't get clever"

I suppose
I'd like to keep a simple reference comparison, unless the value is boxed.
is somewhat equivalent to
If the value is boxed, I'll do a non-"simple reference comparison".
This means the first thing you'll need to do is to check whether the value is boxed or not.
If there exists a method to check whether an object is a boxed value type or not, it should be at least as complex as that "overkill" method you provided the link to unless that is not the simplest way. Nonetheless, there should be a "simplest way" to determine if an object is a boxed value type or not. It's unlikely that this "simplest way" is simpler than simply using the object Equals() method, but I've bookmarked this question to find out just in case.
(not sure if I was logical)

Related

Check if object is null without using ==

In my program, I have objects derived from Dictionary. I need to check if 2 objects are equal, so I made an overload operator ==.
But at a later point, I need to check if a object is null.
If (object == null)
{...}
So at this point, the program goes into the overload operation I have defined, but will throw a NullReferenceException, since one of the objects to compare is null.
So within the overload operation, I need to check if one object is null, but without using ==, since that would give me a StackOverflowException.
How can I check this?

In my program, I have objects derived from Dictionary.
I would strongly reconsider deriving from Dictionary in the first place. That's very rarely a good idea. Favour composition over inheritance in general, and think very carefully about how you want equality (and hash codes) to behave in the face of mutable data. It's usually a bad sign if two objects are equal at one point and then not equal later.
So at this point, the program goes into the overload operation I have defined, but will throw a NullReferenceException, since one of the objects to compare is null.
That's the problem. Your overloaded == operator should not throw an exception - it should compare the value with null.
Typically an overload of == looks something like this:
public static bool ==(Foo left, Foo right)
{
if (ReferenceEquals(left, right))
{
return true;
}
if (ReferenceEquals(left, null))
{
return false;
}
// Equals should be overridden, and should be handling the case where
// right is null (by checking with ReferenceEquals too). Also consider
// implementing IEquatable<Foo>
return left.Equals(right);
}
Here Object.ReferenceEquals is used to perform the reference identity comparison. You could use this outside (in the code that you're currently considering changing) but it would be cleaner just to make == behave correctly.

You either need to cast to object,
(object)obj == null
or use object.ReferenceEquals(obj, null);
(object is not a nice name for an object)

You have to check for null in your implementation of overriden ==, following these guidelines for example :
http://msdn.microsoft.com/en-us/library/ms173147%28v=vs.80%29.aspx
You could also use
object.ReferenceEquals(myInstance, null);
Please note that overriding base operators such as == and/or directly overriding .Net base types such as Dictionary are both not very good practices.

overriding the == operator, as you have discovered, has non-obvious side effects and generally should be avoided.
You might consider having your object implement IComparable instead, and then calling CompareTo instead of ==.

use the is null operator from c# 7
if(object is null)...

What's a maintainable way to determine if a value is set to its type's default value?

Say I have some code such as the following:
var someCollection = new int[] {};
var result = someCollection.SingleOrDefault();
I then want to determine if result was the default value. However, I want to do so in a maintainable way so that if the element type of someCollection changes, the rest of the code doesn't require changing.
The way this typically seems to be done (in a general sense) is result == null. In this case, of course, the type is not a reference type, so this won't work.
An improvement that avoids this assumption is result == default(int). However, changing the element type would also require changing the argument to default, so the requirement of only changing the type in one place is still not met.
Acceptance Criteria
Built-in logic is preferred over custom logic.
Elegant and concise code is preferred.
Efficient code is preferred. (For reference types, only a reference comparison should occur.)

You can use the default keyword. Since you don't know what the type will be, you can use generics.
public bool IsDefault<T>(T value)
{
return EqualityComparer<T>.Default.Equals(value, default(T));
}

Stealing from Sam, and improving it:
public static bool IsDefault<T>(this T value)
{
return value == null || value.Equals(default(T));
}
No need for a type check. The JIT will make it work because it knows what T is at JIT time.
Note that if the type overrides Equals then it might say false even
when it is default(T) and it might say true even when it is not.
– commented by Eric Lippert

I think this will work.
public static bool IsDefault<T>(this T value)
{
var isValueType = typeof(T).IsValueType;
if (isValueType)
return value.Equals(default(T));
else
return value == null;
}
However, for value types, I figure this will call their overloaded Equals methods, which may or may not be a problem.

Why is the default value of the string type null instead of an empty string?

It's quite annoying to test all my strings for null before I can safely apply methods like ToUpper(), StartWith() etc...
If the default value of string were the empty string, I would not have to test, and I would feel it to be more consistent with the other value types like int or double for example.
Additionally Nullable<String> would make sense.
So why did the designers of C# choose to use null as the default value of strings?
Note: This relates to this question, but is more focused on the why instead of what to do with it.

Why is the default value of the string type null instead of an empty
string?
Because string is a reference type and the default value for all reference types is null.
It's quite annoying to test all my strings for null before I can
safely apply methods like ToUpper(), StartWith() etc...
That is consistent with the behaviour of reference types. Before invoking their instance members, one should put a check in place for a null reference.
If the default value of string were the empty string, I would not have
to test, and I would feel it to be more consistent with the other
value types like int or double for example.
Assigning the default value to a specific reference type other than null would make it inconsistent.
Additionally Nullable<String> would make sense.
Nullable<T> works with the value types. Of note is the fact that Nullable was not introduced on the original .NET platform so there would have been a lot of broken code had they changed that rule.(Courtesy #jcolebrand)

Habib is right -- because string is a reference type.
But more importantly, you don't have to check for null each time you use it. You probably should throw a ArgumentNullException if someone passes your function a null reference, though.
Here's the thing -- the framework would throw a NullReferenceException for you anyway if you tried to call .ToUpper() on a string. Remember that this case still can happen even if you test your arguments for null since any property or method on the objects passed to your function as parameters may evaluate to null.
That being said, checking for empty strings or nulls is a common thing to do, so they provide String.IsNullOrEmpty() and String.IsNullOrWhiteSpace() for just this purpose.

You could write an extension method (for what it's worth):
public static string EmptyNull(this string str)
{
return str ?? "";
}
Now this works safely:
string str = null;
string upper = str.EmptyNull().ToUpper();

You could also use the following, as of C# 6.0
string myString = null;
string result = myString?.ToUpper();
The string result will be null.

Empty strings and nulls are fundamentally different. A null is an absence of a value and an empty string is a value that is empty.
The programming language making assumptions about the "value" of a variable, in this case an empty string, will be as good as initiazing the string with any other value that will not cause a null reference problem.
Also, if you pass the handle to that string variable to other parts of the application, then that code will have no ways of validating whether you have intentionally passed a blank value or you have forgotten to populate the value of that variable.
Another occasion where this would be a problem is when the string is a return value from some function. Since string is a reference type and can technically have a value as null and empty both, therefore the function can also technically return a null or empty (there is nothing to stop it from doing so). Now, since there are 2 notions of the "absence of a value", i.e an empty string and a null, all the code that consumes this function will have to do 2 checks. One for empty and the other for null.
In short, its always good to have only 1 representation for a single state. For a broader discussion on empty and nulls, see the links below.
https://softwareengineering.stackexchange.com/questions/32578/sql-empty-string-vs-null-value
NULL vs Empty when dealing with user input

Why the designers of C# chose to use null as the default value of
strings?
Because strings are reference types, reference types are default value is null. Variables of reference types store references to the actual data.
Let's use default keyword for this case;
string str = default(string);
str is a string, so it is a reference type, so default value is null.
int str = (default)(int);
str is an int, so it is a value type, so default value is zero.

The fundamental reason/problem is that the designers of the CLS specification (which defines how languages interact with .net) did not define a means by which class members could specify that they must be called directly, rather than via callvirt, without the caller performing a null-reference check; nor did it provide a meany of defining structures which would not be subject to "normal" boxing.
Had the CLS specification defined such a means, then it would be possible for .net to consistently follow the lead established by the Common Object Model (COM), under which a null string reference was considered semantically equivalent to an empty string, and for other user-defined immutable class types which are supposed to have value semantics to likewise define default values. Essentially, what would happen would be for each member of String, e.g. Length to be written as something like [InvokableOnNull()] int String Length { get { if (this==null) return 0; else return _Length;} }. This approach would have offered very nice semantics for things which should behave like values, but because of implementation issues need to be stored on the heap. The biggest difficulty with this approach is that the semantics of conversion between such types and Object could get a little murky.
An alternative approach would have been to allow the definition of special structure types which did not inherit from Object but instead had custom boxing and unboxing operations (which would convert to/from some other class type). Under such an approach, there would be a class type NullableString which behaves as string does now, and a custom-boxed struct type String, which would hold a single private field Value of type String. Attempting to convert a String to NullableString or Object would return Value if non-null, or String.Empty if null. Attempting to cast to String, a non-null reference to a NullableString instance would store the reference in Value (perhaps storing null if the length was zero); casting any other reference would throw an exception.
Even though strings have to be stored on the heap, there is conceptually no reason why they shouldn't behave like value types that have a non-null default value. Having them be stored as a "normal" structure which held a reference would have been efficient for code that used them as type "string", but would have added an extra layer of indirection and inefficiency when casting to "object". While I don't foresee .net adding either of the above features at this late date, perhaps designers of future frameworks might consider including them.

Because a string variable is a reference, not an instance.
Initializing it to Empty by default would have been possible but it would have introduced a lot of inconsistencies all over the board.

If the default value of string were the empty string, I would not have to test
Wrong! Changing the default value doesn't change the fact that it's a reference type and someone can still explicitly set the reference to be null.
Additionally Nullable<String> would make sense.
True point. It would make more sense to not allow null for any reference types, instead requiring Nullable<TheRefType> for that feature.
So why did the designers of C# choose to use null as the default value of strings?
Consistency with other reference types. Now, why allow null in reference types at all? Probably so that it feels like C, even though this is a questionable design decision in a language that also provides Nullable.

Perhaps if you'd use ?? operator when assigning your string variable, it might help you.
string str = SomeMethodThatReturnsaString() ?? "";
// if SomeMethodThatReturnsaString() returns a null value, "" is assigned to str.

A String is an immutable object which means when given a value, the old value doesn't get wiped out of memory, but remains in the old location, and the new value is put in a new location. So if the default value of String a was String.Empty, it would waste the String.Empty block in memory when it was given its first value.
Although it seems minuscule, it could turn into a problem when initializing a large array of strings with default values of String.Empty. Of course, you could always use the mutable StringBuilder class if this was going to be a problem.

Since string is a reference type and the default value for reference type is null.

Since you mentioned ToUpper(), and this usage is how I found this thread, I will share this shortcut (string ?? "").ToUpper():
private string _city;
public string City
{
get
{
return (this._city ?? "").ToUpper();
}
set
{
this._city = value;
}
}
Seems better than:
if(null != this._city)
{ this._city = this._city.ToUpper(); }

Maybe the string keyword confused you, as it looks exactly like any other value type declaration, but it is actually an alias to System.String as explained in this question.
Also the dark blue color in Visual Studio and the lowercase first letter may mislead into thinking it is a struct.

Nullable types did not come in until 2.0.
If nullable types had been made in the beginning of the language then string would have been non-nullable and string? would have been nullable. But they could not do this du to backward compatibility.
A lot of people talk about ref-type or not ref type, but string is an out of the ordinary class and solutions would have been found to make it possible.

Why can an int? set to null have instance properties?

I'm curious as to why the following code works (run under the VS debugger):
int? x = null;
null
x.HasValue
false
If x is indeed null, what instance does HasValue refer to? Is HasValue implemented as an extension method, or does the compiler special case this to make it magically work?

Because x isn't a reference type. The ? is just syntactic sugar for Nullable<T>, which is a struct (value type).

int? is actually a structure Nullable<int>. Hence this, your x cannot be null, because it is always instance of a structure.

Hand-waving answer: Nullable structs are magic.
Longer answer: Null is not actually what is represented by the value. When you assign null to a nullable struct, what you will see happen behind the scenes is different.
int? val = null; // what you wrote
Nullable<Int32> val = new Nullable<Int32>(); // what is actually there
In this case, an instance of the struct is created that has the T Value set to a default value and the bool HasValue property set to false.
The only time you will actually obtain a null reference from a Nullable<T> is when it is boxed, as a Nullable<T> boxes directly to T or null, depending upon if the value is set.

There are several meanings to null.
One in programming languages which present variables and memory in a pointer-based manner (which includes C#'s references though it hides some of the details) is "this doesn't point to anything".
Another is "this has no meaningful value".
With reference types, we often use the former to represent the latter. We might use string label = null to mean "no meaningful label. It remains though that it's still also a matter of what's going on in terms of what's where in memory and what's pointing to it. Still, it's pretty darn useful, what a shame we couldn't do so with int and DateTime in C#1.1
That's what Nullable<T> provides, a means to say "no meaningful value", but at the level below it's not null in the same way a null string is (unless boxed). It's been assigned null and is equal to null so it's logically null and null according to some other semantics, but it's not null in the "doesn't point to anything" implementation difference between reference and value types.
It's only the "doesn't point to anything" aspect of reference-type null that stops you from calling instance methods on it.
And actually, even that isn't strictly true. IL let's you call instance methods on a null reference and as long as it doesn't interact with any fields, it will work. It can't work if it needs (directly or indirectly) those fields since they don't exist on a null refernce, but it could call null.FineWithNull() if that method was defined as:
int FineWithNull()
{
//note that we don't actually do anything relating to the state of this object.
return 43;
}
With C# it was decided to disallow this, but it's not a rule for all .NET (I think F# allows it, but I'm not sure, I know unmanaged C++ allowed it and it was useful in some very rare cases).

When using int? x = null then x is assigned a new instance of Nullable<int> and ist value is set to null.
I don't exactly know the internals but I would assume that the assignment operator itself is somewhat overloaded.

A nullable type isn't actually null since it still doesn't get around the fact that value types can't be null. It is, instead, a reference to the Nullable<> struct (which is also a value type and can't be null).
More information here.
Basically, you're always referring to an instance of something when you use a nullable type. The default information returned by that reference is the stored value (or null if there is no stored value).

Nullable isn't really a reference type, and its instance methods are one of the places where this shows up. Fundamentally, it is a struct type containing a boolean flag and a value of the type it is a nullable of. The languages special-case various operators [to be lifting, or to consider (bool?)null false in some cases] and the null literal, and the runtime special-cases boxing, but apart from that it's just a struct.

It's a completely new type. Nullable is not T.
What you have is a generic class something like this:
public struct Nullable<T>
{
public bool HasValue { get { return Value != null; } }
public T Value { get; set; }
}
I'm sure there's more to it (particularly in the getter and setter, but that's it in a nutshell.

The nullable type (in this case: nullable int) has a property of HasValue which is boolean. If HasValue is True, the Value property (of Type T, in this case, an int) will have a valid value.

C# System.Object.operator==()

I'm tryign to get my head around the use of System.Object.operator==().
My Effective C# book, and the page here (http://www.srtsolutions.com/just-what-is-the-default-equals-behavior-in-c-how-does-it-relate-to-gethashcode), says that:
"System.Object.operator==() will call a.Equals(b) to determine if a and b are equal".
So with my code:
object a = 1;
object b = 1;
if(object.Equals(a, b))
{
// Will get here because it calls Int32.Equals(). I understand this.
}
if(a == b)
{
// I expected it to get here, but it doesn't.
}
I expected (a == b) to call Int32's overriden Equals and compare values in the same way that static objet.Equals() does. What am I missing?
Edit: I should perhaps have added that I can see what (a == b) is testing - it's testing reference equality. I was thrown by the book which seems to suggest it will work internally much as static object.Equals(obect, object) will.

I'm not sure why the book would say that; it is emphatically untrue that the default == calls Equals. Additionally, object does NOT overload ==. The operator == by default performs a value-equality comparison for value types and a reference-equality comparison for reference types. Again, it is NOT overloaded for object (it is for string). Therefore, when you compare object a = 1 and object b = 1 using the == operator you are doing a reference-equality comparison. As these are different instances of a boxed int, they will compare differently.
For all that are confused by this issue, I encourage you to read §7.10 and especially §7.10.6 of the specification extremely carefully.
For more on the subtleties of boxing (or why we need it in the first place), I refer you to a previous post on this subject.

As the object type doesn't override == and == checks for reference equality by default, the references of a and b are compared, as both are objects. If you want to compare value equality, you have to unbox the ints first.

When two objects are tested for equality they are tested to see if they are referencing the same object. (EDIT: this is generally true, however == could be overloaded to provide the functionality that you receive from a.equals)
So
object a = 1;
object b = 1;
These do not point to the same address space.
However if you did
object a = 1;
object b = a;
Then these would point to the same address.
For a real life example, take two different apartments in the same building, they have the exact same layout, same 1 bedroom, same kitchen, same paint everything about them is the same, except that apartment a is #101 and apartment b is #102. In one sense they are the same a.equals(b), but in another sense they are completely different a != b.

== implementation of object checks for identity, not equality. You have two variables that point to two different objects that's why == returns false.

You declared a and b as object which is a reference type and not a value type. So with a==b you are comparing references of objects (which will be different) rather than the values.

System.Object.operator==() will call a.Equals(b) to determine if a and b are equal
This is simply not true. If that were the case then you'd have a == b returning true, since a.Equals(b) returns true. Equals is a virtual method, so it doesn't matter that int values are boxed; if you call a.Equals, that compiles to a callvirt and the vtable is used.
So the static == operator does not use a.Equals(b) internally. It tests for reference equality by default. It only does otherwise if the static == operator has been overloaded for the types in the expression as they are declared at compile time.

System.Object does not overload ==, so a == b just tests for reference equality (and returns false). Since operator overloading is implemented as a static method, it's not virtual.
Object.Equals, on the other side, is specified as follows:
The default implementation of Equals supports reference equality for reference types, and bitwise equality for value types. Reference equality means the object references that are compared refer to the same object. Bitwise equality means the objects that are compared have the same binary representation.
Since a and b have the same binary representation, Object.Equals(a, b) returns true.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.