String - Difference between Clone, Copy, and Standard Assignment - c#

I've just come across a block like this while browsing through legacy code:
object exeName = _connectionSettings.ApplicationName.Clone();
RandomFunction(exeName);
It seemed useless to me at first, but it made me wonder. Is there a fundamental difference between:
var copiedString = initialString;
var copiedString = initialString.Clone();
var copiedString = string.Copy(initialString);
I've created a basic unit test that seems to show there is none, since it behaves the same way regardless of the method used (initial assignment of copiedString, change of initialString, assertion of copiedString's value). Am I missing something?

Using Reflector to look at the implementation of String.Clone() reveals this:
public object Clone()
{
return this;
}
So the answer is "No, there is no difference between assigning and cloning for a string".
However, Copy() is somewhat different:
public static unsafe string Copy(string str)
{
if (str == null)
{
throw new ArgumentNullException("str");
}
int length = str.Length;
string str2 = FastAllocateString(length);
fixed (char* chRef = &str2.m_firstChar)
{
fixed (char* chRef2 = &str.m_firstChar)
{
wstrcpy(chRef, chRef2, length);
}
}
return str2;
}
This is actually making a copy - but since strings are immutable, it's not very useful anyway.
But - and this is important - Copy() will return a DIFFERENT REFERENCE from the original string, and Clone() will return the SAME REFERENCE as the original string.
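The difference can be checked with ReferenceEquals. This is only a minimal sketch: the class and method names are made up for illustration, and because string.Copy is marked obsolete on newer .NET versions the sketch suppresses that warning (CS0618).

```csharp
using System;

public static class CloneVersusCopyDemo
{
    // Clone() returns the very same instance, so the references match.
    public static bool CloneReturnsSameReference(string s)
    {
        return ReferenceEquals(s, s.Clone());
    }

    // Copy() allocates a new string with the same characters.
    public static bool CopyReturnsSameReference(string s)
    {
#pragma warning disable CS0618 // string.Copy is obsolete on newer .NET versions
        string copy = string.Copy(s);
#pragma warning restore CS0618
        return ReferenceEquals(s, copy);
    }

    public static void Main()
    {
        string original = "Hello";
        Console.WriteLine(CloneReturnsSameReference(original)); // True
        Console.WriteLine(CopyReturnsSameReference(original));  // False
    }
}
```

Both results still compare equal by value (== and Equals), which is why a value-based unit test cannot tell the three approaches apart.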
Another thing to be aware of is string interning which causes strings with identical values to share the data (and therefore have the same string reference).
For example, the following code will print "Same!":
string s1 = "Hello";
string s2 = "Hello";
if (ReferenceEquals(s1, s2))
Console.WriteLine("Same!");
But the following code will print "Not same!", even though the string values are the same:
string s1 = "Hello";
string s2 = "He";
string s3 = "llo";
string s4 = s2 + s3;
if (!ReferenceEquals(s1, s4))
Console.WriteLine("Not Same!");
We can explicitly intern s4, so that the following prints "Same!":
string s1 = "Hello";
string s2 = "He";
string s3 = "llo";
string s4 = s2 + s3;
s4 = string.Intern(s4);
if (ReferenceEquals(s1, s4))
Console.WriteLine("Same!");

String.Clone() does nothing but return the reference to the same string (see here)
But since strings in C# are immutable anyway, there's no practical difference between the three approaches you've listed, unless your code compares references.

Since the CLR implements immutable strings, and treats strings like values, semantically, the only time it would ever be an issue in correct code is outside of the managed code sandbox, and even so, correct code would properly consider all aspects of CLR strings (as in 2 strings may refer to the same "value").
In context of managed code, strings should be simply assigned, just like int and byte and float.

Related

string interning and referenceEquals

I'm trying to understand string interning. Not for any real purpose other than learning.
Here's where I'm at:
Strings are immutable and a reference type. It's this immutability that allows us to do string interning.
Without string interning, the two strings will be two strings on the heap.
e.g.
private static void Main()
{
var a = "foo";
var b = "foo";
ReferenceEquals(a, b); // would expect this to be false...
}
I would expect that ReferenceEquals to be false. It isn't, though; it's true. I thought that to make it true I would have to do:
private static void Main()
{
var a = "foo";
var b = "foo";
ReferenceEquals(a, b); // false??
string.Intern(a);
string.Intern(b);
ReferenceEquals(a, b); // true?
}
Since the interning process, as I understand it, looks for the string in a hash table and, if it's not there, adds it. On further interning it looks for the string and, if it finds it, changes the reference to point to the same place in the hash table.
This should speed up comparisons, since it doesn't need to check whether each char matches and can just check whether both strings point to the same location. (Let's ignore the overhead of actually interning for now, until I understand how this works.)
So what am I not getting? Why is the first code block returning true and not false?
This occurs because "foo" is interned.
static void Main(string[] args)
{
var a = "foo";
var b = "foo";
Console.WriteLine(string.IsInterned(a));
Console.WriteLine(ReferenceEquals(a, b));
Console.ReadLine();
}
The compiler will intern all literals / constants by default.
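A rough sketch of how this plays out, assuming all strings live in the same assembly (the class and helper names are invented for the example). It also shows why the calls in the question change nothing: string.Intern returns the pooled reference, it does not modify the variable that was passed in.

```csharp
using System;

public static class InternDemo
{
    // Builds "foo" at run time, so the result is not the literal's instance.
    public static string BuildAtRunTime()
    {
        return new string(new[] { 'f', 'o', 'o' });
    }

    public static void Main()
    {
        string literal = "foo";
        string folded = "fo" + "o";      // constant expression, folded to "foo" at compile time
        string built = BuildAtRunTime(); // distinct object with the same characters

        Console.WriteLine(ReferenceEquals(literal, folded)); // True
        Console.WriteLine(ReferenceEquals(literal, built));  // False

        string.Intern(built);                                // return value discarded: built is unchanged
        Console.WriteLine(ReferenceEquals(literal, built));  // False

        built = string.Intern(built);                        // keep the returned, pooled reference
        Console.WriteLine(ReferenceEquals(literal, built));  // True
    }
}
```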

Assigning a string-literal to a string in C#

Do string-literals have a particular type in C# (like const char* in c++) or does C# just create a new string object for each string-literal that appears in a program ?
I am curious to understand what happens behind the scene when the following statement is executed:
string s1 = "OldValue";
Does this call a particular method in the string class (a constructor, or an implicit conversion operator, ...), or does C# create a new string object that contains "OldValue" (then just assign its reference to s1, as it would for any reference type)?
I am trying to understand what it is in the design of the string class that guarantees the value of s1 remains "OldValue":
string s2 = s1;
s2 = "NewValue";
To your last question, why the values were preserved: it is not in the String class. It is in the way that object references work.
The String class is not a value type, it is a reference type. It is a full-featured object that is not copied around when passed into or out of variables.
When you write:
string s1 = "mom";
string s2 = s1;
string s3 = s1;
s3 = "dad";
there is only one instance of "mom", that is first created somewhere in the heap, then a reference to it is assigned to s1. Then another reference is created and assigned to s2. Then another reference is created and assigned to s3. No copies. References. Like for any real, normal CLR object.
Finally, in the last line, another string is created on the heap and then a reference to it is assigned to the s3 variable. Note that this sentence says absolutely nothing about "mom" or the s1/s2 variables. They didn't notice a thing.
Remember that String is not a value type. It is just a normal immutable object that has some handy Equals and GetHashCode overrides. The String class has a little magic inside, but it is not relevant here.
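The walk-through above can be checked with ReferenceEquals. This is just an illustrative sketch; the class name and the Run helper are not from the original answer.

```csharp
using System;

public static class ReferenceAssignmentDemo
{
    // Returns { s1 and s3 same before reassignment, s1 and s2 same after, s1 and s3 same after }.
    public static bool[] Run()
    {
        string s1 = "mom";
        string s2 = s1;   // copies the reference, not the characters
        string s3 = s1;
        bool sameBefore = ReferenceEquals(s1, s3);

        s3 = "dad";       // only the s3 variable now points elsewhere
        return new[] { sameBefore, ReferenceEquals(s1, s2), ReferenceEquals(s1, s3) };
    }

    public static void Main()
    {
        foreach (bool result in Run())
            Console.WriteLine(result); // True, True, False
    }
}
```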
Good question.
Actually, in C# strings are stored in a character buffer: every string requires roughly 20 bytes of overhead, plus 2 bytes for each character. (The exact overhead depends on the runtime version and on whether the process is 32-bit or 64-bit.)
So whenever you declare a string, e.g. string s1 = "Bhushan";
one string buffer will be created, with memory requirements as follows:
Bytes required for data (overhead): 20 bytes
2 bytes per character, so (2 * 7): 14 bytes
Overall it will require 20 + 14 = 34 bytes.
string is an immutable class, which means that every time we change the value, a new instance is created.
string s2 = s1;
s2 = "NewValue";
It can be explained like this:
string s2 = s1;
s2 = new string("NewValue"); // It doesn't compile, just an example.
And for string modification, it can be explained like this.
string s = "blah";
s = s.Insert(0, "blah"); // Insert returns a new instance, which s now references
This is the same as:
string s = "blah";
s = new string("blah") + new string("blah"); // Doesn't compile, just an explanation

Get a string to reference another in C#

I'm coming from a C++ background. This question has been asked before, but try as I might I cannot find the answer. Let's say I have:
string[] ArrayOfReallyVeryLongStringNames = new string[500];
ArrayOfReallyVeryLongStringNames[439] = "Hello world!";
Can I create a string that references the above (neither of these will compile):
string a = ref ArrayOfReallyVeryLongStringNames[439]; // no compile
string a = &ArrayOfReallyVeryLongStringNames[439]; // no compile
I do understand that strings are immutable in C#. I also understand that you cannot get the address of a managed object.
I'd like to do this:
a = "Donkey Kong"; // Now ArrayOfReallyVeryLongStringNames[439] = "Donkey Kong";
I have read the Stack Overflow question Make a reference to another string in C#
which has an excellent answer, but to a slightly different question. I do NOT want to pass this parameter to a function by reference. I know how to use the "ref" keyword for passing a parameter by reference.
If the answer is "You cannot do this in C#", is there a convenient workaround?
EDIT:
Some of the answers indicate the question was unclear. Let's ask it in a different way. Say I needed to manipulate all items in the original long-named array that have prime indices. I'd like to add aliases to Array...[2], Array...[3], Array...[5], etc. to a list. Then, modify the items in the list using a "for" loop (perhaps by passing the list just created to a function).
In C# the "using" keyword creates an alias to a class or namespace. It seems from the answers, however, that it is not possible to create an alias to a variable.
You could create a wrapper that keeps a reference to the underlying array AND the index of the string:
public sealed class ArrayStringReference
{
private readonly string[] _array;
private readonly int _index;
public ArrayStringReference(string[] array, int index)
{
_array = array;
_index = index;
}
public string Value
{
get
{
return _array[_index];
}
set
{
_array[_index] = value;
}
}
public override string ToString()
{
return Value;
}
}
Then this will work:
string[] ArrayOfReallyVeryLongStringNames = new string[500];
ArrayOfReallyVeryLongStringNames[439] = "Hello world!";
var strRef = new ArrayStringReference(ArrayOfReallyVeryLongStringNames, 439);
Console.WriteLine(ArrayOfReallyVeryLongStringNames[439]); // Outputs "Hello world!"
strRef.Value = "Donkey Kong";
Console.WriteLine(ArrayOfReallyVeryLongStringNames[439]); // Outputs "Donkey Kong"
You could make this more convenient to use by providing an implicit string operator so you don't have to use .Value to access the underlying string:
// Add this to class ArrayStringReference implementation
public static implicit operator string(ArrayStringReference strRef)
{
return strRef.Value;
}
Then instead of having to access the underlying string like this:
strRef.Value = "Donkey Kong";
...
string someString = strRef.Value;
You can do this:
strRef.Value = "Donkey Kong";
...
string someString = strRef; // Don't need .Value
This is just syntactic sugar, but it might make it easier to start using an ArrayStringReference in existing code. (Note that you will still need to use .Value to set the underlying string.)
The closest you can get is this:
unsafe
{
string* a = &ArrayOfReallyVeryLongStringNames[439]; // no compile
}
Which gives a compiler error:
Cannot take the address of, get the size of, or declare a pointer to a managed type ('string')
So no, not possible...
Also read this MSDN article which explains what types can be used (blittable types).
When I do something like this in C#:
string a = "String 1";
string b = a;
a = "String 2";
Console.WriteLine(a); // String 2
Console.WriteLine(b); // String 1
The thing is, both "String 1" and "String 2" literals are created at the start of the program, and strings are always pointers: at first a references "String 1" literal and afterwards it references "String 2". If you want them to always reference the same thing, in C# you just use the same variable.
The string objects themselves are immutable in C#:
Because a string "modification" is actually a new string creation, you must use caution when you create references to strings. If you create a reference to a string, and then "modify" the original string, the reference will continue to point to the original object instead of the new object that was created when the string was modified.
When the string mutability is needed, for example, to concatenate a lot of strings faster, other classes are used, like StringBuilder.
To sum it up, what you're trying to do is impossible.
In C#, a String is an object. Therefore String a = "Donkey Kong" says that a now has a reference to this string, which is allocated in memory. Then all you need to do is:
ArrayOfReallyVeryLongStringNames[439] = a;
And that will copy the reference (which is what you should be thinking in terms of in C#!) to that location in the array.
BUT! When you do a = "new string";, a will get a new reference. See the example I made:
http://prntscr.com/3kw18v
You can only do this with unsafe mode.
You could create a wrapper
public class StringWrapper
{
public string Value {get;set;}
}
StringWrapper[] arrayOfWrappers = new StringWrapper[500];
arrayOfWrappers[439] = new StringWrapper { Value = "Hello World" };
StringWrapper a = arrayOfWrappers[439];
a.Value = "New Value";
What you are trying to do is universally discouraged, and actively prevented, in C#, where the logic should be independent of the memory model. However, refer to the related SO question C# memory address and variable for some info.
EDIT 1
A more canonical approach to your actual problem in C# would be:
// using System.Linq;
string[] raw = new string[] { "alpha", "beta", "gamma", "delta" };
List<int> evenIndices = Enumerable.Range(0, raw.Length)
.Where(x => x % 2 == 0)
.ToList();
foreach (int x in evenIndices)
raw[x] = raw[x] + " (even)";
foreach (string x in raw)
Console.WriteLine(x);
/*
OUTPUT:
alpha (even)
beta
gamma (even)
delta
*/
If you really want to modify the original memory structure itself, then perhaps C++ is a more appropriate language choice for the solution.
EDIT 2
Looking around on SO, you may want to look at this answer Hidden Features of C#? to an unrelated question.
[TestMethod]
public void TestMethod1()
{
string[] arrayOfString = new string[500];
arrayOfString[499] = "Four Ninty Nine";
Console.WriteLine("Before Modification : {0} " , arrayOfString[499]);
string a = arrayOfString[499];
ModifyString(out arrayOfString[499]);
Console.WriteLine("after a : {0}", a);
Console.WriteLine("after arrayOfString [499]: {0}", arrayOfString[499]);
}
private void ModifyString(out string arrayItem)
{
arrayItem = "Five Hundred less one";
}
Of course you can, hehe:
var a = __makeref(array[666]);
__refvalue(a, string) = "hello";
But you would have to have a very good reason to do it this way.
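On C# 7.0 and later, a ref local is a documented alternative to the undocumented __makeref keywords. The sketch below assumes such a compiler; the class and helper names are invented for illustration. Note that a ref local cannot be stored in a List or in a class field, so it does not cover the "list of aliases" case from the question's edit.

```csharp
using System;

public static class RefLocalDemo
{
    // Hypothetical helper: overwrites one array slot through a ref local alias.
    public static void OverwriteThroughAlias(string[] items, int index, string newValue)
    {
        ref string alias = ref items[index]; // refers to the array slot itself, not a copy of the reference
        alias = newValue;                    // writes into items[index]
    }

    public static void Main()
    {
        string[] arrayOfReallyVeryLongStringNames = new string[500];
        arrayOfReallyVeryLongStringNames[439] = "Hello world!";

        ref string a = ref arrayOfReallyVeryLongStringNames[439];
        a = "Donkey Kong";
        Console.WriteLine(arrayOfReallyVeryLongStringNames[439]); // Donkey Kong
    }
}
```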

How can i differ two same objects in c#?

Pardon me, I am not very good at explaining questions.
I can better explain my question through the following example:
string first = "hello";
string second = "Bye";
first = second;
In the above example, consider the third line, first = second.
Here I assigned object second to first. Because strings in C# are immutable, i.e. every time you assign a new value to an existing string object, a new object is created and the old object is released by the CLR. (I read this here.)
So it simply means that the object first in the first line is different from the object first in the third line.
So my question is: how can I prove that both are different?
I.e. if this (string) were possible in C, I could print the addresses of both objects before and after the third statement to prove it.
Is there any method to access their addresses, or are there other alternatives?
If you'd like to see the physical location in memory, you can use the following (unsafe) code.
private static void Main(string[] args)
{
unsafe
{
string first = "hello";
fixed (char* p = first)
{
// Cast to long rather than int so the address is not truncated in a 64-bit process.
Console.WriteLine("Address of first: {0}", ((long)p).ToString());
}
string second = "Bye";
fixed (char* p = second)
{
Console.WriteLine("Address of second: {0}", ((long)p).ToString());
}
first = second;
fixed (char* p = first)
{
Console.WriteLine("Address of first: {0}", ((long)p).ToString());
}
}
}
Sample output on my machine:
Address of first: 41793976
Address of second: 41794056
Address of first: 41794056
You'll notice that .NET caches string instances, which is perfectly valid because they are immutable. To demonstrate this behavior, you can change second to hello and all memory addresses will be the same. That's why you shouldn't rely on native memory stuff and should just use the managed ways to work with objects.
See also:
The common language runtime conserves string storage by maintaining a
table, called the intern pool, that contains a single reference to
each unique literal string declared or created programmatically in
your program. Consequently, an instance of a literal string with a
particular value only exists once in the system.
Source: String.Intern (MSDN)
I believe that you want the ReferenceEquals method. It can be used to check if two instances of the object are exactly the same - i.e. references the same object.
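As a rough illustration of identity versus value equality (the names below are made up for the example), two strings can be equal by value while being different objects, and ReferenceEquals is what tells them apart:

```csharp
using System;

public static class IdentityDemo
{
    // Creates a separate string object holding the same characters.
    public static string DistinctCopy(string s)
    {
        return new string(s.ToCharArray());
    }

    public static void Main()
    {
        string a = "hello";
        string b = DistinctCopy(a);

        Console.WriteLine(a == b);                              // True: same characters
        Console.WriteLine(a.Equals(b));                         // True: Equals compares values
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True: hash is based on the value
        Console.WriteLine(ReferenceEquals(a, b));               // False: two different objects

        b = a;                                                  // copy the reference
        Console.WriteLine(ReferenceEquals(a, b));               // True: one object, two variables
    }
}
```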
You can use the Equals() method or the GetHashCode() method to compare, but note that both compare string values, not object identity.
If you have to compare the underlying memory addresses, the following code might help you (untested; it needs using System.Runtime.InteropServices;):
string first = "hello";
GCHandle handle = GCHandle.Alloc(first, GCHandleType.Pinned);
IntPtr address = handle.AddrOfPinnedObject();
handle.Free(); // release the pin once the address has been read
string second = "Bye";
first = second;
GCHandle handle2 = GCHandle.Alloc(first, GCHandleType.Pinned);
IntPtr address2 = handle2.AddrOfPinnedObject();
handle2.Free();
if (address != address2)
{
// memory addresses are different afterwards
}
For this you should get the memory address of the first variable before assigning second to it, and check the memory address again after the assignment.
For getting the address of a string, follow this link.
Hope this helps.
You've misunderstood what you've read. Yes, strings are immutable. That means you cannot change an existing string. This won't work:
string x = "Hello";
x[3] = 'q';
When you're concatenating strings, you get a new one:
string a = "a";
string b = "b";
string c = a+b; // You get a new string and a and b are unchanged.
Even when you're self-concatenating, you get a new string:
string a = "a";
a += "b"; // The same as a = a + "b" and yields a new string.
But assigning to a variable (or passing to a function, or returning from a function, etc) does NOT create a new string.
Strings are "reference types". That means that this variable:
string a = "Hello";
Is just a reference to the string. Doing this:
string b = a;
Just assigns the reference to the variable. It does not alter the string.
Or, to put it in C terms: Reference variables are pointers to objects. Consider:
string a = "Hello"; // a now points to the string object
string b = a; // b now points to the same object.
What the immutability means is that you cannot change the memory that the pointer points to (the string object itself). But the pointer variable is as changeable as ever. You can assign a different address to it.
To return to your original example:
string first = "hello"; // Allocates memory for "hello" and points first to it.
string second = "Bye"; // Allocates memory for "Bye" and points second to it.
first = second; // Assigns the address of second to first.
In the end, both first and second point to the same address, which is the address of the string Bye. The string hello is no longer referenced by any variable. A string built at runtime would now be unreachable and the garbage collector would reclaim it sometime later; a literal such as this one is interned, so the intern pool keeps it alive.
Added: Yet another analogy with C. String variables in .NET are somewhat like this:
const char* str;
It's a pointer to a constant. You can change the pointer, but you cannot change the stuff that it points to.
Added 2: You should read up on Value Types vs Reference Types in .NET. In a nutshell, value types are all struct types, and reference types are all class types. Value types get copied on assignment (or when passed/returned from a function); reference types are pointers.
Note that there is one unintuitive piece here. The class object, which is the base class of ALL types, is a reference type. Yet Value Types inherit from it, and you can assign a Value Type to a variable of type object. If you do it, this will cause something called boxing and it involves making a copy of the value, so it's a bit of an expensive operation.
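A small sketch of the boxing behaviour described above (the class and method names are illustrative only). The boxed object holds its own copy, so later changes to the original variable do not reach it:

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a value, changes the original variable, and returns the unboxed copy.
    public static int UnboxAfterChange(int number)
    {
        object boxed = number; // boxing: the value is copied into a new heap object
        number = number + 1;   // changes only the local variable
        return (int)boxed;     // unboxing yields the value captured at boxing time
    }

    public static void Main()
    {
        Console.WriteLine(UnboxAfterChange(1)); // 1

        int value = 42;
        // Each argument is boxed separately, so the two boxes are different objects.
        Console.WriteLine(ReferenceEquals(value, value)); // False
    }
}
```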

Does the immutable principle still apply when manipulating a char in a string?

If I manipulate a specific char of a string, would it still be considered a string manipulation internally by the CLR, resulting in temporary string creation?
For example:
string myString = "String";
myString[0] = 's';
How about creating a char[] array equivalent of the string being edited, performing all position-specific manipulation on that, and transforming it back to a string?
Would it help cut the number of temp strings, at least to just 2 actual string manipulations?
This code doesn't compile:
error CS0200: Property or indexer 'string.this[int]' cannot be assigned to -- it is read only
That's because strings are immutable, as you said.
For the purpose of modifying a string, you need to use the StringBuilder class (a mutable string for all intents and purposes), which has a read-write this[int] indexer.
StringBuilder builder = new StringBuilder("Sample");
builder[0] = 's'; //produces "sample"
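The char[] approach from the question also works. As a sketch (the Edit helper and class name are hypothetical), all position-specific changes are made on one array and a single new string is created at the end, while the original string stays untouched:

```csharp
using System;

public static class CharArrayEditDemo
{
    // Copies the string into a char[], applies the edits, and builds one new string.
    public static string Edit(string source, Action<char[]> edits)
    {
        char[] chars = source.ToCharArray(); // one copy of the characters
        edits(chars);                        // in-place changes on the array
        return new string(chars);            // one new string for the final result
    }

    public static void Main()
    {
        string original = "String";
        string edited = Edit(original, c => { c[0] = 's'; c[5] = 'G'; });

        Console.WriteLine(edited);   // strinG
        Console.WriteLine(original); // String
    }
}
```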
It is probably worth mentioning that although strings are considered immutable in normal usage, it is actually possible to mutate them. If you configure a VS project to "allow unsafe code", you can then use unsafe code to mutate a string:
internal class Program
{
private static void Main()
{
const string SomeString = "hello";
Console.WriteLine("Original string:'{0}'.", SomeString);
unsafe
{
fixed (char* charArray = SomeString)
{
byte* buffer = (byte*)(charArray);
buffer[0] = 66;
buffer[2] = 76;
buffer[4] = 73;
buffer[6] = 78;
buffer[8] = 71;
}
}
Console.WriteLine("Mutated string:'{0}'.", SomeString);
}
}
This could be considered as pretty evil, but it does have its uses, e.g. .NET Regular expressions on bytes instead of chars.
Strings are immutable. You cannot change individual chars. The indexer operator on strings is read-only, so you will get an error if you try to assign using array style index on a string.
