What is the difference between string.Empty and null in memory - c#

I understand the difference between assigning a value or not; what I would like to understand is how the assignment is handled in memory.
What will be stored in the HEAP and in the STACK? Which one is the most efficient?
For example, is it more efficient to have a method signature like
private Item GetItem(pageModel page, string clickableText = null);
Or
private Item GetItem(pageModel page, string clickableText = "");
Note:
The question is not about which one to use. It is about how they differ in memory.
The proposed method might be called a few hundred times, so could a different variable assignment have an impact?

There's no difference. The compiler interns string literals, so you're not creating a new string with the call, just referencing an existing string.
The heap and the stack are implementation details in C#. There is some behaviour that depends on the runtime, but the only real contract is that the runtime provides as much memory as you ask for, and guarantees the memory is still there if you access it in the future.
If you do care about the implementation details of the current desktop .NET runtimes, instances of reference types are never themselves placed on the stack. String is a reference type, so what gets passed is always a reference to the string object, never a copy of its contents. However, arguments aren't even required to be on the stack in the first place - the reference can also be passed in a register.
In general, in a managed language like C#, you should only care about what exactly happens in memory if you have a good reason it affects the characteristics of your program. The default case should always be thinking about the semantics. Should an empty string mean "no value"? Should a null string mean "no value"? That depends on the semantics of your program. Until you have a good reason to believe the decision is e.g. performance critical, just go with the most clear option, least prone to mistakes, and easiest to read and modify.
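To make the semantic choice concrete, here is a minimal sketch. DescribeClickableText is a hypothetical stand-in for the GetItem method in the question, not code from it; it only illustrates that a null default lets the callee tell "nothing supplied" apart from "empty text supplied".

```csharp
using System;

public static class DefaultArgumentDemo
{
    // Hypothetical helper: null is taken to mean "the caller supplied nothing".
    public static string DescribeClickableText(string clickableText = null)
    {
        if (clickableText == null) return "no text supplied";
        if (clickableText.Length == 0) return "empty text supplied";
        return "text: " + clickableText;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeClickableText());        // no text supplied
        Console.WriteLine(DescribeClickableText(""));      // empty text supplied
        Console.WriteLine(DescribeClickableText("Save"));  // text: Save
    }
}
```

Whether the two cases should be treated differently at all is exactly the semantic decision described above.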

A null string is a string variable that does not refer to any string object, so no memory has been allocated to hold character data. A field declared without an initializer starts out as null:
string myString; // As a field this defaults to null; as a local it must be assigned before it is read.
An empty string is a string that has been initialized and given some memory, but it just doesn't contain any characters (except a null terminator at the end, but you don't see that) so as far as the compiler and you are concerned, it is a string with a length of 0.
string myString = String.Empty; //Will create an empty string.
In terms of efficiency, there shouldn't be a difference at all, but it would be good to keep in mind that nulls can cause projects to crash more than empty strings, unless you are using the null object pattern in your code.
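A small sketch of the practical difference described above. SafeLength is a hypothetical helper introduced for illustration; it uses the null-conditional and null-coalescing operators to treat null like an empty string.

```csharp
using System;

public static class NullVersusEmptyDemo
{
    // Hypothetical helper: returns 0 for both null and empty strings.
    public static int SafeLength(string s) => s?.Length ?? 0;

    public static void Main()
    {
        string empty = string.Empty;
        string missing = null;

        Console.WriteLine(empty.Length);            // 0

        try
        {
            Console.WriteLine(missing.Length);      // throws: there is no object to ask
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("null has no Length");
        }

        Console.WriteLine(SafeLength(missing));     // 0
    }
}
```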

We have four main types of things we'll be putting in the Stack and Heap as our code is executing: Value Types, Reference Types, Pointers, and Instructions.
Rules
A Reference Type always goes on the Heap.
Value Types and Pointers always go where they were declared. This is a little more complex and needs a bit more understanding of how the Stack works to figure out where "things" are declared.
The Stack, as we mentioned earlier, is responsible for keeping track of where each thread is during the execution of our code (or what's been called).
You can think of it as a thread "state", and each thread has its own stack. When our code makes a call to execute a method, the thread starts executing the instructions that have been JIT compiled and live on the method table; it also puts the method's parameters on the thread stack. Then, as we go through the code and run into variables within the method, they are placed on top of the stack.
String.Empty and "" are almost the same, both refer to an existing string that has no content.
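I say almost because "" is a compile-time constant literal (it is interned, so using it does not create a new string each time), while String.Empty is a static read-only field rather than a constant, which is why it cannot be used as a default parameter value or in a case label.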
On the other hand, null means nothing, no object at all.
In more familiar terms, String.Empty is like having an empty drawer while null means no drawer at all!
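One observable difference between the two spellings is that "" is a compile-time constant, while String.Empty is a static read-only field. The sketch below shows this; the class and member names are made up for illustration. Whether both refer to the very same object is a runtime detail, so that line deliberately states no expected output.

```csharp
using System;

public static class EmptyLiteralDemo
{
    // "" is a compile-time constant, so it is allowed in a const declaration.
    public const string EmptyConstant = "";

    // string.Empty is a static readonly field, not a constant. The line below
    // would fail to compile if it were uncommented:
    // public const string NotAllowed = string.Empty;

    public static void Main()
    {
        Console.WriteLine(EmptyConstant == string.Empty);   // True (value equality)
        Console.WriteLine(EmptyConstant.Length);            // 0

        // Reference identity depends on the runtime, so no result is promised here.
        Console.WriteLine(object.ReferenceEquals(EmptyConstant, string.Empty));
    }
}
```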

Related

What does the stack entry for a reference type contain?

I'm refreshing my memory on how reference and value types work in .NET. I understand that the entry on the stack for a reference type contains a pointer to a memory location on the heap. What I can't seem to find details about is what else the stack entry contains. So, given the following:
Customer customer;
customer = new Customer();
After the first line of code an entry on the stack containing a null pointer will exist. Does that entry also contain the identifying name "customer"? Does it contain type information?
Have I fundamentally misunderstood something?
Only the pointer to the object is stored, nothing else. And usually not on the stack: it is stored in a processor register. Only if the method is large and the register is better used for other purposes will it be stored on the stack. Making that decision is the job of the optimizer. Spilling to the stack is in general fairly unlikely to happen, since you'll probably be using properties and methods of the Customer class, so keeping the pointer in a register is efficient.
This is otherwise the basic reason why a NullReferenceException can tell you nothing useful about what reference is null. Associating the pointer value back to a named "customer" variable can only be done by a debugger. It needs the PDB file to know the name of the variable, that name is not present in the metadata of the assembly. Which is the basic reason why you cannot use Reflection to discover local variable names and values.
And this is also why debugging only works well when you use it on the Debug build of your project. That disables the optimizer and gets the pointer value to always be stored back to a stack slot, allowing the debugger to reliably read it back. At a significant cost, the basic reason you should only ever deploy the Release build of your project.
The variable name customer exists only in the source code (as it is of no interest to your computer). At runtime, nobody knows what the variable was called in the code.
Jon Skeet wrote a good article about what is stored on the heap and what goes to the stack.
The first paragraph (what's in a variable?) should answer your question:
The value of a reference type variable is always either a reference or null. If it's a reference, it must be a reference to an object which is compatible with the type of the variable. For instance, a variable declared as Stream s will always have a value which is either null or a reference to an instance of the Stream class. (Note that an instance of a subclass of Stream, eg FileStream, is also an instance of Stream.) The slot of memory associated with the variable is just the size of a reference, however big the actual object it refers to might be. (On the 32-bit version of .NET, for instance, a reference type variable's slot is always just 4 bytes.)
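As a rough illustration of "the slot is just the size of a reference", the sketch below copies a reference and shows that both variables see the same object. The Customer class here is a made-up minimal type, and printing IntPtr.Size as a proxy for the reference size is an assumption that holds on common runtimes rather than something the language guarantees.

```csharp
using System;

public class Customer
{
    public string Name { get; set; }
}

public static class ReferenceSlotDemo
{
    public static void Main()
    {
        Customer customer = null;                   // the slot holds a null reference
        customer = new Customer { Name = "Ada" };   // the slot now refers to a heap object

        Customer alias = customer;                  // copies the reference, not the object
        alias.Name = "Grace";

        Console.WriteLine(customer.Name);                            // Grace
        Console.WriteLine(object.ReferenceEquals(customer, alias));  // True
        Console.WriteLine(IntPtr.Size);   // 4 in a 32-bit process, 8 in a 64-bit process
    }
}
```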
1- Variable names are not stored in memory along with the reference.
2- A null reference doesn't store the type of the object.
Example to make things clear:
string str = null;
if(str is String)
{
Console.WriteLine("I'm a string");
}
else
{
Console.WriteLine("I'm not a string");
}
// This will print: "I'm not a string"
With the declaration you're really telling the compiler: only allow string references to be stored in the storage location named str.
Variables represent storage locations. Every variable has a type that
determines what values can be stored in the variable

Why do local variables require initialization, but fields do not?

If I create a bool within my class, just something like bool check, it defaults to false.
When I create the same bool within my method, bool check (instead of within the class), I get an error "use of unassigned local variable check". Why?
Yuval and David's answers are basically correct; summing up:
Use of an unassigned local variable is a likely bug, and this can be detected by the compiler at low cost.
Use of an unassigned field or array element is less likely a bug, and it is harder to detect the condition in the compiler. Therefore the compiler makes no attempt to detect the use of an uninitialized variable for fields, and instead relies upon the initialization to the default value in order to make the program behavior deterministic.
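A minimal sketch of the two behaviours summarized above; the Flags class and its members are invented for illustration. The fields are never assigned and still have deterministic default values, while the local has to be assigned before it can be read.

```csharp
using System;

public class Flags
{
    public bool Check;      // never assigned explicitly, defaults to false
    public string Label;    // reference-type field, defaults to null
}

public static class FieldDefaultDemo
{
    public static void Main()
    {
        var flags = new Flags();
        Console.WriteLine(flags.Check);           // False
        Console.WriteLine(flags.Label == null);   // True

        // Writing "bool check;" and then reading it would be compile error CS0165,
        // so the local is assigned explicitly.
        bool check = false;
        Console.WriteLine(check);                 // False
    }
}
```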
A commenter to David's answer asks why it is impossible to detect the use of an unassigned field via static analysis; this is the point I want to expand upon in this answer.
First off, for any variable, local or otherwise, it is in practice impossible to determine exactly whether a variable is assigned or unassigned. Consider:
bool x;
if (M()) x = true;
Console.WriteLine(x);
The question "is x assigned?" is equivalent to "does M() return true?" Now, suppose M() returns true if Fermat's Last Theorem is true for all integers less than eleventy gajillion, and false otherwise. In order to determine whether x is definitely assigned, the compiler must essentially produce a proof of Fermat's Last Theorem. The compiler is not that smart.
So what the compiler does instead for locals is implement an algorithm which is fast, and overestimates when a local is not definitely assigned. That is, it has some false positives, where it says "I can't prove that this local is assigned" even though you and I know it is. For example:
bool x;
if (N() * 0 == 0) x = true;
Console.WriteLine(x);
Suppose N() returns an integer. You and I know that N() * 0 will be 0, but the compiler does not know that. (Note: the C# 2.0 compiler did know that, but I removed that optimization, as the specification does not say that the compiler knows that.)
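For readers who hit this false positive, one common way to satisfy the checker is to assign the local on every path. The sketch below assumes a trivial N(); it is a stand-in for the method in the text, not the original.

```csharp
using System;

public static class DefiniteAssignmentDemo
{
    // Hypothetical stand-in for N() in the text.
    public static int N() => 7;

    public static bool AssignedOnEveryPath()
    {
        bool x;
        if (N() * 0 == 0) x = true;
        else x = false;   // never taken at run time, but it satisfies the compiler

        return x;         // compiles: x is definitely assigned on both branches
    }

    public static void Main()
    {
        Console.WriteLine(AssignedOnEveryPath());   // True
    }
}
```

Initializing at the declaration (bool x = false;) works just as well.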
All right, so what do we know so far? It is impractical for locals to get an exact answer, but we can overestimate not-assigned-ness cheaply and get a pretty good result that errs on the side of "make you fix your unclear program". That's good. Why not do the same thing for fields? That is, make a definite assignment checker that overestimates cheaply?
Well, how many ways are there for a local to be initialized? It can be assigned within the text of the method. It can be assigned within a lambda in the text of the method; that lambda might never be invoked, so those assignments are not relevant. Or it can be passed as "out" to another method, at which point we can assume it is assigned when the method returns normally. Those are very clear points at which the local is assigned, and they are right there in the same method that the local is declared. Determining definite assignment for locals requires only local analysis. Methods tend to be short -- far less than a million lines of code in a method -- and so analyzing the entire method is quite quick.
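The "out" case mentioned above can be sketched with int.TryParse from the base class library. ParseOrDefault is a made-up wrapper used only to show that the local counts as assigned once the call returns.

```csharp
using System;

public static class OutAssignmentDemo
{
    // Hypothetical wrapper around int.TryParse.
    public static int ParseOrDefault(string text)
    {
        int value;                                  // not assigned yet
        bool ok = int.TryParse(text, out value);    // definitely assigned once the call returns
        return ok ? value : -1;                     // reading value is allowed here
    }

    public static void Main()
    {
        Console.WriteLine(ParseOrDefault("42"));    // 42
        Console.WriteLine(ParseOrDefault("abc"));   // -1
    }
}
```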
Now what about fields? Fields can be initialized in a constructor of course. Or a field initializer. Or the constructor can call an instance method that initializes the fields. Or the constructor can call a virtual method that initializes the fields. Or the constructor can call a method in another class, which might be in a library, that initializes the fields. Static fields can be initialized in static constructors. Static fields can be initialized by other static constructors.
Essentially the initializer for a field could be anywhere in the entire program, including inside virtual methods that will be declared in libraries that haven't been written yet:
// Library written by BarCorp
public abstract class Bar
{
// Derived class is responsible for initializing x.
protected int x;
protected abstract void InitializeX();
public void M()
{
InitializeX();
Console.WriteLine(x);
}
}
Is it an error to compile this library? If yes, how is BarCorp supposed to fix the bug? By assigning a default value to x? But that's what the compiler does already.
Suppose this library is legal. If FooCorp writes
public class Foo : Bar
{
protected override void InitializeX() { }
}
is that an error? How is the compiler supposed to figure that out? The only way is to do a whole program analysis that tracks the initialization state of every field on every possible path through the program, including paths that involve choice of virtual methods at runtime. This problem can be arbitrarily hard; it can involve simulated execution of millions of control paths. Analyzing local control flows takes microseconds and depends on the size of the method. Analyzing global control flows can take hours because it depends on the complexity of every method in the program and all the libraries.
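What actually happens today can be seen in a runnable variant of the library scenario. The class names below (LibraryBase, ForgetfulDerived, CarefulDerived) are invented to avoid clashing with the listing above, and M returns x instead of printing it so the result is easy to check.

```csharp
using System;

public abstract class LibraryBase
{
    protected int x;                         // the derived class is expected to set this
    protected abstract void InitializeX();

    public int M()
    {
        InitializeX();
        return x;
    }
}

public class ForgetfulDerived : LibraryBase
{
    protected override void InitializeX() { }          // never assigns x
}

public class CarefulDerived : LibraryBase
{
    protected override void InitializeX() { x = 42; }
}

public static class FieldInitializationDemo
{
    public static void Main()
    {
        Console.WriteLine(new ForgetfulDerived().M());  // 0, the automatic default
        Console.WriteLine(new CarefulDerived().M());    // 42
    }
}
```

Both derived classes compile without complaint; the compiler makes no attempt to tell them apart.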
So why not do a cheaper analysis that doesn't have to analyze the whole program, and just overestimates even more severely? Well, propose an algorithm that works that doesn't make it too hard to write a correct program that actually compiles, and the design team can consider it. I don't know of any such algorithm.
Now, the commenter suggests "require that a constructor initialize all fields". That's not a bad idea. In fact, it is such a not-bad idea that C# already has that feature for structs. A struct constructor is required to definitely-assign all fields by the time the ctor returns normally; the default constructor initializes all the fields to their default values.
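A short sketch of the struct rule. The Point type is a made-up example; note that newer compilers (C# 11 and later) relax the rule by defaulting any unassigned fields automatically, so the error mentioned in the comment applies to earlier language versions.

```csharp
using System;

public struct Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;   // before C# 11, leaving out either assignment is error CS0171
        Y = y;
    }
}

public static class StructInitializationDemo
{
    public static void Main()
    {
        Point origin = default(Point);      // all fields set to their default values
        Console.WriteLine(origin.X + origin.Y);   // 0

        Point p = new Point(3, 4);
        Console.WriteLine(p.X + p.Y);             // 7
    }
}
```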
What about classes? Well, how do you know that a constructor has initialized a field? The ctor could call a virtual method to initialize the fields, and now we are back in the same position we were in before. Structs don't have derived classes; classes might. Is a library containing an abstract class required to contain a constructor that initializes all its fields? How does the abstract class know what values the fields should be initialized to?
John suggests simply prohibiting calling methods in a ctor before the fields are initialized. So, summing up, our options are:
Make common, safe, frequently used programming idioms illegal.
Do an expensive whole-program analysis that makes the compilation take hours in order to look for bugs that probably aren't there.
Rely upon automatic initialization to default values.
The design team chose the third option.
When I create the same bool within my method, bool check(instead of
within the class), i get an error "use of unassigned local variable
check". Why?
Because the compiler is trying to prevent you from making a mistake.
Does initializing your variable to false change anything in this particular path of execution? Probably not, considering default(bool) is false anyway, but it is forcing you to be aware that this is happening. The .NET environment prevents you from accessing "garbage memory", since it initializes every value to its default. But still, imagine this was a reference type, and you passed an uninitialized (null) value to a method expecting a non-null one, and got a NullReferenceException at runtime. The compiler is simply trying to prevent that, accepting the fact that this may sometimes result in bool b = false statements.
Eric Lippert talks about this in a blog post:
The reason why we want to make this illegal is not, as many people
believe, because the local variable is going to be initialized to
garbage and we want to protect you from garbage. We do in fact
automatically initialize locals to their default values. (Though the C
and C++ programming languages do not, and will cheerfully allow you to
read garbage from an uninitialized local.) Rather, it is because the
existence of such a code path is probably a bug, and we want to throw
you in the pit of quality; you should have to work hard to write that
bug.
Why doesn't this apply to a class field? Well, I assume the line had to be drawn somewhere, and local variable initialization is a lot easier to diagnose and get right than class field initialization. The compiler could do this, but think of all the possible checks it would need to make (some of them independent of the class code itself) in order to evaluate whether each field in a class is initialized. I am no compiler designer, but I am sure it would be considerably harder, as there are plenty of cases to take into account, and it has to be done in a timely fashion as well. Every feature has to be designed, written, tested and deployed, and the value of implementing this would not be worth the effort and complexity.
Why do local variables require initialization, but fields do not?
The short answer is that code accessing uninitialised local variables can be detected by the compiler in a reliable way, using static analysis. Whereas this isn't the case of fields. So the compiler enforces the first case, but not the second.
Why do local variables require initialization?
This is no more than a design decision of the C# language, as explained by Eric Lippert. The CLR and the .NET environment do not require it. VB.NET, for example, will compile just fine with uninitialised local variables, and in reality the CLR initialises all uninitialised variables to default values.
The same could occur with C#, but the language designers chose not to allow it. The reason is that uninitialised variables are a huge source of bugs and so, by mandating initialisation, the compiler helps to cut down on accidental mistakes.
Why don't fields require initialization?
So why doesn't this compulsory explicit initialisation happen with fields within a class? Simply because that explicit initialisation could occur during construction, through a property being called by an object initializer, or even by a method being called long after the event. The compiler cannot use static analysis to determine if every possible path through the code leads to the variable being explicitly initialised before use. Getting it wrong would be annoying, as the developer could be left with valid code that won't compile. So C# doesn't enforce it at all and the CLR is left to automatically initialise fields to a default value if not explicitly set.
What about collection types?
C#'s enforcement of local variable initialisation is limited, which often catches developers out. Consider the following four lines of code:
string str;
var len1 = str.Length;
var array = new string[10];
var len2 = array[0].Length;
The second line of code won't compile, as it's trying to read an uninitialised string variable. The fourth line of code compiles just fine though, as array has been initialised, but only with default values. Since the default value of a string is null, we get an exception at run-time. Anyone who's spent time here on Stack Overflow will know that this explicit/implicit initialisation inconsistency leads to a great many "Why am I getting a “Object reference not set to an instance of an object” error?" questions.
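A defensive version of the fourth line can be sketched with the null-conditional operator. FirstLength is a hypothetical helper added here for illustration; it returns null instead of throwing when the element has not been set.

```csharp
using System;

public static class ArrayDefaultDemo
{
    // Hypothetical helper: null when the first element is null, otherwise its length.
    public static int? FirstLength(string[] items) => items[0]?.Length;

    public static void Main()
    {
        var array = new string[10];                       // elements default to null
        Console.WriteLine(array[0] == null);              // True
        Console.WriteLine(FirstLength(array).HasValue);   // False

        array[0] = "hello";
        Console.WriteLine(FirstLength(array));            // 5
    }
}
```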
Good answers above, but I thought I'd post a much simpler/shorter answer for people too lazy to read a long one (like me).
Class
class Foo {
private string Boo;
public Foo() { /** bla bla bla **/ }
public string DoSomething() { return Boo; }
}
Field Boo may or may not have been initialized in the constructor. So when the compiler finds return Boo; it doesn't assume that it's been initialized. It simply suppresses the error.
Function
public string Foo() {
string Boo;
return Boo; // triggers error
}
The { } characters define the scope of a block of code. The compiler walks the branches of these { } blocks keeping track of stuff. It can easily tell that Boo was not initialized. The error is then triggered.
Why does the error exist?
The error was introduced to reduce the number of lines of code required to make source code safe. Without the error the above would look like this.
public string Foo() {
string Boo;
/* bla bla bla */
if(Boo == null) {
return "";
}
return Boo;
}
From the manual:
The C# compiler does not allow the use of uninitialized variables. If the compiler detects the use of a variable that might not have been initialized, it generates compiler error CS0165. For more information, see Fields (C# Programming Guide). Note that this error is generated when the compiler encounters a construct that might result in the use of an unassigned variable, even if your particular code does not. This avoids the necessity of overly-complex rules for definite assignment.
Reference: https://msdn.microsoft.com/en-us/library/4y7h161d.aspx

Why does the String class not have a parameterless constructor?

int and object have a parameterless constructor. Why not string?
Because there is no point in doing that.
string is immutable. Creating an empty string is just useless.
MSDN:
Strings are immutable--the contents of a string object cannot be changed after the object is created, although the syntax makes it appear as if you can do this.
As Jonathan Lonowski pointed out, we have string.Empty for that.
Update:
To provide more information for you.
You don't have an empty constructor for string; however, you do have String.Empty. The reason is that a string is an immutable object: every time you modify a string, you are actually creating a new string in memory.
For instance: string name = ""; the empty string object itself takes around twenty bytes, whereas a variable referring to it only holds a reference of four or eight bytes. Because the compiler interns string literals, "" and string.Empty normally end up referring to an already existing empty string, so in practice neither is measurably more efficient than the other.
However I believe you want an empty Constructor to do manipulation that may be more commonly handled by the StringBuilder. Some really nice usage between the two can be found here (Determine performance hit / usage).
Some additional information on the string can be found here. They are immutable thus the contents cannot be changed afterwards.
Example:
string first = "Greg "; // Creates string "first" in memory.
string last = "Arrigotti "; // Creates string "last" in memory.
string name = first + last; // Creates string "name" in memory.
As you edit one of these, it is simply creating a whole new string in memory. If you are looking for a way to handle user data in a field where, for instance, no middle name exists, the empty string may be a valid choice.
Hopefully these point you in the proper direction.
Strings are immutable, therefore new String() has no purpose. What would you do with it?
As said before, strings are immutable and therefore if you manipulate a string you actually create a new one every time.
Example:
string s = "str"; // str was created in the memory.
s += "2"; // str2 was created in the memory.
Use StringBuilder when you want to manipulate a string (that's why you wanted an empty ctor, right?)
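A minimal sketch of that advice. JoinWithBuilder is a made-up helper; it starts from an empty StringBuilder, which is the mutable "empty starting point" a parameterless string constructor might have been wanted for.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Hypothetical helper: appends into one mutable buffer instead of
    // creating a new string for every concatenation.
    public static string JoinWithBuilder(string[] parts)
    {
        var builder = new StringBuilder();   // starts out empty
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();           // one final string is produced here
    }

    public static void Main()
    {
        Console.WriteLine(JoinWithBuilder(new[] { "str", "2" }));   // str2
    }
}
```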
Why indeed?
It would be completely logical and sensical to provide a parameterless constructor for the string type, yet it doesn't have one.
The reason is because the designers of that type thought it would be a much better idea to have string.Empty.
There could be a logical reason for having the ability to construct multiple empty strings that are different instances. I fail to see one off the top of my head, but that doesn't mean someone else can't see one.
There are some technical reasons behind why limiting the usage to string.Empty might be a good idea. First, all empty strings are considered equal, though not necessarily ReferenceEquals, so having multiple empty strings would seemingly make no sense. The second you say that "I have these two seemingly similar things, yet I've attached a different meaning to each" then perhaps you're trying to solve a problem with the wrong tool.
There are also some upsides to having a predefined string.Empty. Whenever you reference it, you're referencing the same object instance as every other place, and thus you don't have lots of empty (and identical) string objects in memory.
But could it be done? Sure.
So while everybody here has tried to justify that there should be no such constructor, I am saying that there could be such a constructor.
However, someone decided to design the type without one.
Also there is already a defined constant for this: String.Empty
int is a value type, and as such it must have a parameterless constructor. There is no consideration that can be made here.
object has no reason to have anything but a parameterless constructor. There is no data to give it. What parameters would you expect it to take? objects constructed with a parameterless constructor also have a purpose; they are used, for example, as objects to lock on. It is however a class, so it doesn't need to have a public parameterless constructor, however since it has no need for parameters, it's a question of whether you want instance of it to be constructed at all; Microsoft chose to make it concrete, rather than abstract.
string is a class, so it isn't required to have a parameterless constructor. The team building it simply never saw a need to have one. One could sensibly use such a constructor to create an empty string, but they chose to expose string.Empty (as well as an empty string literal) as a way of explicitly creating an empty string. Those options have improved clarity over a parameterless constructor.
Another pretty significant advantage of string.Empty and the empty literal string is that they are capable of re-using the same string instance. Since strings are immutable, the only way to observe the difference between two different references to empty strings is through the use of ReferenceEquals (or a lock on the instance). Because there is virtually never a need to go out of your way to have different references to an empty string, removing the parameterless constructor removes the possibility of an equivalent but poorer performing method of constructing an empty string. In the very unlikely event that it is important to construct a new string instance that is an empty string, an empty char array can be passed to the relevant constructor overload, so removing the parameterless constructor doesn't remove any functionality from the end user; it simply forces you to go out of your way to do something really unusual if you want to do something really unusual, which is the sign of good language design.
Provided that you know that string is immutable, your question can be rephrased as the following:
why on earth can't I initiate a null object??
answer:
Because there is no null object :)

What's the practical difference between a variable being mutable vs non-mutable

I'm just learning C# and working with some examples of strings and StringBuilder. From my reading, I understand that if I do this:
string greeting = "Hello";
greeting += " my good friends";
that I get a new string called greeting with the concatenated value. I understand that the run-time (or compiler, or whatever) is actually getting rid of the reference to the original string greeting and replacing it with a new concatenated one of the same name.
I was just wondering what practical application/ramification this has. Why does it matter to me how C# shuffles strings around in the background when the effect to me is simply that my initial variable changed value.
I was wondering if someone could give me a scenario where a programmer would need to know the difference. A simple example would be nice, as I'm a relative beginner to this.
Thanks in advance..
Strings, again, are a good example. A very common error is:
string greeting = "Hello Foo!";
greeting.Replace("Foo", "World");
Instead of the proper:
string greeting = "Hello Foo!";
greeting = greeting.Replace("Foo", "World");
Unless you knew that string was an immutable class, you could suspect the first method would be appropriate.
Why does it matter to me how C# shuffles strings around in the background when the effect to me is simply that my initial variable changed value.
The other major place where this has huge advantages is when concurrency is introduced. Immutable types are much easier to deal with in a concurrent situation, as you don't have to worry about whether another thread is modifying the same value within the same reference. Using an immutable type often allows you to avoid the potentially significant cost of synchronization (ie: locking).
I understand that the run-time (or compiler, or whatever) is actually getting rid of the reference to the original string greeting and replacing it with a new concatenated one of the same name.
Pedantic intro: No. Objects do not have names -- variables do. It is storing a new object in the same variable. Thus, the name (variable) used to access the object is the same, even though it (the variable) now refers to another object. An object may also be stored in multiple variables and have multiple "names" at the same time or it might not be accessible directly by any variable.
The other parts of the question have already been succinctly answered for the case of strings -- however, the mutable/immutable ramifications are much larger. Here are some questions which may widen the scope of the issue in context.
What happens if you set a property of an object passed into a method? (There are these pesky "value-types" in C#, so it depends...)
What happens if a sequence of actions leaves an object in an inconsistent state? (E.g. property A was set and an error occurred before property B was set?)
What happens if multiple parts of code expect to be modifying the same object, but are not because the object was cloned/duplicated somewhere?
What happens if multiple parts of code do not expect the object to be modified elsewhere, but it is? (This applies in both threading and non-threading situations)
In general, the contract of an object (API and usage patterns/scope/limitations) must be known and correctly adhered to in order to ensure program validity. I generally find that immutable objects make life easier (as then only one of the above "issues" -- a meager 25% -- even applies).
Happy coding.
C# isn't doing any "shuffling", you are! Your statement assigns a new value to the variable, the referenced object itself did not change, you just dropped the reference.
The major reason immutability is useful is this:
String greeting = "Hello";
// who knows what foo does
foo(greeting);
// always prints "Hello" since String is immutable
System.Console.WriteLine(greeting);
You can share references to immutable objects without worrying about other code changing the object--it can't happen. Therefore immutable objects are easier to reason about.
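A runnable version of the same idea, contrasted with a mutable type. TryToChange and Change are hypothetical methods playing the role of foo: the first can only rebind its own parameter, while the second really does alter the object the caller holds.

```csharp
using System;
using System.Text;

public static class SharingDemo
{
    // Rebinding the parameter only changes the callee's own copy of the reference.
    public static void TryToChange(string text)
    {
        text = text + " (changed)";
    }

    // StringBuilder is mutable, so the caller observes the modification.
    public static void Change(StringBuilder text)
    {
        text.Append(" (changed)");
    }

    public static void Main()
    {
        string greeting = "Hello";
        TryToChange(greeting);
        Console.WriteLine(greeting);             // Hello

        var mutable = new StringBuilder("Hello");
        Change(mutable);
        Console.WriteLine(mutable.ToString());   // Hello (changed)
    }
}
```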
Most of the time, very little effect. However, in the situation of concatenating many strings, the performance hit of garbage collecting all those strings becomes problematic. Do too many string manipulations with just a string, and the performance of your application can take a nosedive.
This is the reason why StringBuilder is more effective when you have a lot of string manipulation to do; leaving all those 'orphaned' strings out there makes a bigger problem for the Garbage Collector than simply modifying an in memory buffer.
I think the main benefit of immutable strings lies in making memory management easier.
As a simplified picture, memory is allocated contiguously, one object after another. Suppose the string "Tom" takes up three bytes (real .NET strings use two bytes per character plus an object header, but the idea is the same). You may then allocate an integer, and that would be four bytes. If you then tried to change the string "Tom" to "Tomas", it would require moving all the other memory to make room for the two new characters a and s.
To eliminate this pain, it's easier (and quicker) to just allocate five new bytes for the string "Tomas".
Does that help?
In performance terms, the advantage of an immutable object is that copying it is cheap in terms of both CPU and memory, since it only involves making a copy of a pointer. The downside is that writing to the object becomes more expensive, since it must make a copy of the object in the process.

Which is the best practice to use in string in ASP.NET(C#)

I want to know which is the preferred way to declare a string in C#:
string strEmpty = String.Empty;
or
string strNull = null;
Which practice is better or what is the difference between these statements.
The first statement makes the value of the string an actual empty string. Assigning null makes the reference point to nothing. This means that if you tried to call a method on the string, such as strNull.Function(), it would fail in the second case.
The first takes more memory initially, but is more clear.
The correct answer depends on what you would do next. If you are just going to reassign the string, I would make it null. If you intend to do stuff to the string (execute functions, append, etc), I would make it string.Empty.
The difference is semantic.
Whenever you find a null pointer in the code, it means that it is truly nothing. It means it has not been initialized to anything, so you can't use it.
On the other hand, an empty string is a string. It's been initialized and any programmer looking at the code will consider that string as a valid object to be used.
Consider for example having a relative URL stored in a string. If you find that string empty, you'll think it's pointing to the root path. But if it's NULL, you should consider it's not a valid variable whatsoever - it's not been initialized and you should not use it.
Based on that, you'll take different paths - an empty URL means something very different from a NULL variable.
There is no "best". What you do depends on what you mean.
An empty string has a length of 0, but it is a string. Do this when you absolutely must return some kind of string, even if it's zero length. This is rare.
A null is not a string in the first place, it's a null pointer. Do this when the absence of a string is a meaningful condition that may influence other parts of the application.
In a typical line-of-business app, the semantic difference is very clear: Null means the value is unknown or not applicable, and Empty means the value is known to be blank.
In other types of app, the semantics are more or less up to you, as long as you document them and apply them consistently across your whole team.
There is no difference in performance.
The difference between null and empty string is indeed semantic but they are largely equivalent by convention.
In .NET the System.String class has a static method IsNullOrEmpty that it is best practice to use in most cases.
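A brief sketch of that practice. DisplayName is a made-up helper; string.IsNullOrEmpty and string.IsNullOrWhiteSpace are the real framework methods.

```csharp
using System;

public static class NullOrEmptyDemo
{
    // Hypothetical helper: treats null and "" the same way.
    public static string DisplayName(string name) =>
        string.IsNullOrEmpty(name) ? "(none)" : name;

    public static void Main()
    {
        Console.WriteLine(DisplayName(null));    // (none)
        Console.WriteLine(DisplayName(""));      // (none)
        Console.WriteLine(DisplayName("Ada"));   // Ada

        // IsNullOrWhiteSpace additionally treats blank text as "no value".
        Console.WriteLine(string.IsNullOrWhiteSpace("   "));   // True
    }
}
```

This fits code where null and empty are equivalent by convention; where they carry different meanings, as in the line-of-business case above, they need to be checked separately.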
