I have the following class:
[StructLayout(LayoutKind.Sequential)]
class Class
{
public int Field1;
public byte Field2;
public short? Field3;
public bool Field4;
}
How can I get the byte offset of Field4 starting from the start of the class data (or object header)?
To illustrate:
Class cls = new Class();
fixed(int* ptr1 = &cls.Field1) //first field
fixed(bool* ptr2 = &cls.Field4) //requested field
{
Console.WriteLine((byte*)ptr2-(byte*)ptr1);
}
The resulting offset is, in this case, 5, because the runtime actually moves Field3 to the end of the type (and pads it), probably because its type is generic. I know there is Marshal.OffsetOf, but it returns the unmanaged offset, not the managed one.
How can I retrieve this offset from a FieldInfo instance? Is there any .NET method used for that, or do I have to write my own, taking all the exceptions into account (type size, padding, explicit offsets, etc.)?
Offset of a field within a class or struct in .NET 4.7.2:
public static int GetFieldOffset(this FieldInfo fi) =>
GetFieldOffset(fi.FieldHandle);
public static int GetFieldOffset(RuntimeFieldHandle h) =>
Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;
These return the byte offset of a field within a class or struct, relative to the layout of some respective managed instance at runtime. This works for all StructLayout modes, and for both value- and reference-types (including generics, reference-containing, or otherwise non-blittable). The offset value is zero-based relative to the beginning of the user-defined content or 'data body' of the struct or class only, and doesn't include any header, prefix, or other pad bytes.
Discussion
Since struct types have no header, the returned integer offset value can be used directly via pointer arithmetic, and System.Runtime.CompilerServices.Unsafe if necessary (see the sketch below). Reference-type objects, on the other hand, have a header which has to be skipped over in order to reference the desired field. This object header is usually a single IntPtr, which means IntPtr.Size needs to be added to the offset value. It is also necessary to dereference the GC ("garbage collection") handle to obtain the object's address in the first place.
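For example, here is a minimal sketch of the struct case (the Sample type and its fields are made up for illustration; it assumes the GetFieldOffset method shown above and the System.Runtime.CompilerServices.Unsafe package):
using System;
using System.Runtime.CompilerServices;
struct Sample { public int A; public byte B; public long C; }
...
var fi = typeof(Sample).GetField(nameof(Sample.C));
int offs = fi.GetFieldOffset();       // offset within the data body; structs have no header
var s = new Sample { C = 42 };
ref byte body = ref Unsafe.As<Sample, byte>(ref s);   // first byte of the struct
ref long c = ref Unsafe.As<byte, long>(ref Unsafe.AddByteOffset(ref body, (IntPtr)offs));
Console.WriteLine(c);                 // 42
c = 7;                                // writes through to s.C
Console.WriteLine(s.C);               // 7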
With these considerations, we can synthesize a tracking reference to the interior of a GC object at runtime by combining the field offset (obtained via the method shown above) with an instance of the class (e.g. an Object handle).
The following method, which is only meaningful for class (and not struct) types, demonstrates the technique. For simplicity, it uses ref-return and the System.Runtime.CompilerServices.Unsafe library. Error checking, such as asserting that fi.DeclaringType.IsAssignableFrom(obj.GetType()), for example, is also elided for simplicity.
/// <summary>
/// Returns a managed reference ("interior pointer") to the value or instance of type 'U'
/// stored in the field indicated by 'fi' within managed object instance 'obj'
/// </summary>
public static unsafe ref U RefFieldValue<U>(Object obj, FieldInfo fi)
{
var pobj = Unsafe.As<Object, IntPtr>(ref obj);
pobj += IntPtr.Size + GetFieldOffset(fi.FieldHandle);
return ref Unsafe.AsRef<U>(pobj.ToPointer());
}
This method returns a managed "tracking" pointer into the interior of the garbage-collected object instance obj. It can be used to arbitrarily read or write the field, so this one function replaces the traditional pair of separate getter/setter functions. Although the returned pointer cannot be stored in the GC heap and thus has a lifetime limited to the scope of the current stack frame (and below), it is very cheap to obtain at any time by simply calling the function again.
Note that this generic method is only parameterized with <U>, the type of the fetched pointed-at value, and not for the type ("<T>", perhaps) of the containing class (the same applies for the IL version below). It's because the bare-bones simplicity of this technique doesn't require it. We already know that the containing instance has to be a reference (class) type, so at runtime it will present via a reference handle to a GC object with object header, and those facts alone are sufficient here; nothing further needs to be known about putative type "T".
It's a matter of opinion whether adding vacuous <T, … >, which would allow us to indicate the where T: class constraint, would improve the look or feel of the example above. It certainly wouldn't hurt anything; I believe the JIT is smart enough to not generate additional generic method instantiations for generic arguments that have no effect. But since doing so seems chatty (other than for stating the constraint), I opted for the minimalism of strict necessity here.
In my own use, rather than passing a FieldInfo or its respective FieldHandle every time, what I actually retain are the various integer offset values for the fields of interest as returned from GetFieldOffset, since these are also invariant at runtime, once obtained. This eliminates the extra step (of calling GetFieldOffset) each time the pointer is fetched. In fact, since I am able to include IL code in my projects, here is the exact code that I use for the function above. As with the C# just shown, it trivially synthesizes a managed pointer from a containing GC-object obj, plus a (retained) integer offset offs within it.
// Returns a managed 'ByRef' pointer to the (struct or reference-type) instance of type U
// stored in the field at byte offset 'offs' within reference type instance 'obj'
.method public static !!U& RefFieldValue<U>(object obj, int32 offs) aggressiveinlining
{
ldarg obj
ldarg offs
sizeof object
add
add
ret
}
So even if you are not able to directly incorporate this IL, showing it here, I think, nicely illustrates the extremely low runtime overhead and alluring simplicity, in general, of this technique.
Example usage
class MyClass { public byte b_bar; public String s0, s1; public int iFoo; }
The first demonstration gets the integer offset of reference-typed field s1 within an instance of MyClass, and then uses it to get and set the field value.
var fi = typeof(MyClass).GetField("s1");
// note that we can get a field offset without actually
// having any instance of 'MyClass'
var offs = GetFieldOffset(fi);
// i.e., later...
var mc = new MyClass();
RefFieldValue<String>(mc, offs) = "moo-maa"; // field "setter"
// note: method call used as l-value, on the left-hand side of '=' assignment!
RefFieldValue<String>(mc, offs) += "!!"; // in-situ access
Console.WriteLine(mc.s1); // --> moo-maa!! (in the original)
// can be used as a non-ref "getter" for by-value access
var _ = RefFieldValue<String>(mc, offs) + "%%"; // 'mc.s1' not affected
If this seems a bit cluttered, you can dramatically clean it up by retaining the managed pointer as ref local variable. As you know, this type of pointer is automatically adjusted--with interior offset preserved--whenever the GC moves the containing object. This means that it will remain valid even as you continue accessing the field unawares. In exchange for allowing this capability, the CLR requires that the ref local variable itself not be allowed to escape its stack frame, which in this case is enforced by the C# compiler.
// demonstrate using 'RuntimeFieldHandle', and accessing a value-type
// field (int) this time
var h = typeof(MyClass).GetField(nameof(mc.iFoo)).FieldHandle;
// later... (still using 'mc' instance created above)
// acquire managed pointer to 'mc.iFoo'
ref int i = ref RefFieldValue<int>(mc, h);
i = 21; // directly affects 'mc.iFoo'
Console.WriteLine(mc.iFoo == 21); // --> true
i <<= 1; // operates directly on 'mc.iFoo'
Console.WriteLine(mc.iFoo == 42); // --> true
// any/all 'ref' uses of 'i' just affect 'mc.iFoo' directly:
Interlocked.CompareExchange(ref i, 34, 42); // 'mc.iFoo' (and 'i' also): 42 -> 34
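Note that the snippet just above passes the RuntimeFieldHandle straight into RefFieldValue; to make that compile you would need a small convenience overload along these lines (my own sketch, not shown in the IL above), which simply resolves the handle to its offset and forwards to the offset-based version:
// Hypothetical convenience overload: resolve the handle to a field offset once,
// then forward to the (object, int32) version shown earlier.
public static ref U RefFieldValue<U>(Object obj, RuntimeFieldHandle h) =>
    ref RefFieldValue<U>(obj, GetFieldOffset(h));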
Summary
The usage examples focused on using the technique with a class object, but as noted, the GetFieldOffset method shown here works perfectly fine with struct as well. Just be sure not to use the RefFieldValue method with value types, since that code includes adjusting for an expected object header. For that simpler case, just use System.Runtime.CompilerServices.Unsafe.AddByteOffset for your address arithmetic instead.
Needless to say, this technique might seem a bit radical to some. I'll just note that it has worked flawlessly for me for many years, specifically on .NET Framework 4.7.2, and including 32- and 64-bit mode, debug vs. release, plus whichever various JIT optimization settings I've tried.
With some tricks around TypedReference.MakeTypedReference, it is possible to obtain the reference to the field, and to the start of the object's data, then just subtract. The method can be found in SharpUtils.
Related
Please note that this question is NOT the same as "Why do local variables require initialization, but fields do not?" or "Why can't I define a default constructor for a struct in .NET?".
Let's say we have the following code:
struct MyStruct {
int num;
}
static void Main(string[] args) {
MyStruct m = new MyStruct();
Console.WriteLine(m.num); // display 0
}
We can see that when we use new MyStruct(), the default parameterless constructor is invoked, which initializes its fields to their default values (0 in this case).
But if we do:
static void Main(string[] args) {
MyStruct m;
Console.WriteLine(m.num); // compile error
}
This code doesn't compile because we have to assign a value to the struct's fields before we can use them. This means that when we just declare a struct with MyStruct m;, the struct's default constructor won't be called, so the stack just gets decremented by the size of m (space is allocated for m). When the stack gets decremented, that space can contain any value (e.g. values left by previous stack operations).
But if I put a breakpoint:
static void Main(string[] args) {
MyStruct m;
<------------ breakpoint here
...
}
and run in debug mode, when I hover the mouse over m, I can clearly see that m.num is 0; it is always zero no matter how many times I try.
How come it is always zero? Does the CLR initialize the newly allocated stack space to 0? If the CLR does initialize it to 0, then that means MyStruct m; is equivalent to MyStruct m = new MyStruct();. Then why doesn't the Microsoft team make MyStruct m; the same as MyStruct m = new MyStruct();?
This also happens with integers and other built-ins; consider local variable int x. The C# spec mandates that it must be assigned prior to use, but the IL initializes it to 0.
Verifiable IL (which is what C# produces unless you use an "unsafe" feature like pointers or the SkipLocalsInit feature Matthew Watson mentioned) guarantees the locals are zeroed out at method startup.
I'd guess this is because if the locals weren't zeroed, they could contain arbitrary data (whatever happened to be on the call stack prior to adding the current method's frame), which would hinder the JITter's safety guarantees.
Otherwise, the JITter would also have to do some kind of definite assignment analysis itself, which would be an unnecessary cost if high-level languages are already going to guarantee it.
C# has an additional rule that says local variables must be definitely assigned prior to use. This extends to fields of value-type variables.
(I'll add that this feature has been helpful to me in the past: if I have a method that returns a struct, and I start the method off with a local variable that I return at the end of the method, then I can use this compiler check to guarantee all code paths through the method fully populate the struct.)
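A minimal sketch of that pattern (the struct and its fields are made up for illustration):
struct Config { public int Retries; public bool Verbose; }
static Config BuildConfig(bool verbose)
{
    Config c;              // no 'new', so 'c' is not definitely assigned yet
    c.Retries = 3;
    c.Verbose = verbose;
    return c;              // compiles only because every field was assigned on every path;
                           // forget one and the compiler reports a definite-assignment error
}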
So you're right in that at runtime MyStruct m; and MyStruct m = new MyStruct(); will both lead to m being zeroed-out. The difference is that C# enforces an additional compile-time requirement.
As to why there is this difference, that's a matter of language design. IL is intended to be quickly understood and compiled by the JITter, so fewer / simpler rules makes that job easier. But C# is intended to help developers write programs, and checking that a local is assigned before access is apparently worth the cost of the compile time check to ensure developers don't forget to intentionally give value type variables a value before using them.
I'm aware that Pinnable<T> is an internal class used by the methods in the new Unsafe class, and it's not meant to be used anywhere else other than in that class. This question is not about something practical, but it's just to understand why it's been designed like this and to learn a bit more about the language and its various "tricks" like this one.
As a recap, the Pinnable<T> class is defined here, and it looks like this:
[StructLayout(LayoutKind.Sequential)]
internal sealed class Pinnable<T>
{
public T Data;
}
And it's mainly used in the Span<T>.DangerousCreate method, here:
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)
{
Pinnable<T> pinnable = Unsafe.As<Pinnable<T>>(obj);
IntPtr byteOffset = Unsafe.ByteOffset<T>(ref pinnable.Data, ref objectData);
return new Span<T>(pinnable, byteOffset, length);
}
The reason for Pinnable<T> is that it's used to keep track of the original object, in case the Span<T> instance was created from one (instead of from a native pointer).
Given that the reference type doesn't matter when pinning a reference (fixing both a ref T and Unsafe.As<T, byte>(ref T) works the same), is there a specific reason why the Pinnable<T> class was made generic? The original design in DotNetCross here in fact had a Pinnable class with just a single byte field, and it worked just the same. Is there any reason why using a generic class in this case would be an advantage, other than avoiding having to cast the reference type when writing/reading/returning it?
Is there any other way, other than this unsafe cast done with Unsafe.As, to get a reference to an object (I mean a reference to the object contents, otherwise it'd be the same as any variable of a class type)? I mean, any way to get a reference (which should basically have the same address as the actual object variable in the first place, right?) to an object without having to pass through some custom-defined secondary class.
First of all, the Struct in [StructLayout(LayoutKind.Sequential)] doesn't mean that it is only valid for structs; it refers to the layout of the actual structure of the fields in memory, be it in a class or in a value type. This controls the actual runtime layout of the data, not just how the type would marshal to unmanaged code. Sequential is important because without it, the runtime is pretty much free to store the memory however it sees fit, which means that Data may have some padding before it.
From what I understand about the implementation, the reason for Pinnable is to allow creating an instance of Span to a memory that may be moved by the GC, without having to pin the object first. If you don't use actual pointers and just references, nothing at all will need to be pinned.
I have noticed that it was introduced in a commit with a description saying it made Span more "portable" (a bold word for something that does a lot of unsafe things). I can't think of any other reason than something related to alignment for why it is generic. I suppose representing a T in terms of an offset from another T is better than as an offset from a byte. It may happen that the type of the first field may play a role in its actual address, even if the type was marked with LayoutKind.Sequential.
A reference to an object is different from an interior reference to an object (a reference to its data). It is implementation defined, but in .NET Framework, an instance of any class (or a boxed value type) starts with a header consisting of a sync block (for lock) and a pointer to the method table, a.k.a. the type of the object. On 32-bit, the header is 8 bytes, but the actual pointer points to the pointer to the method table (for performance reasons, getting the type happens more often than locking an object).
One (but not portable) way of getting the pointer to the start of the data is therefore to cast the object reference to a pointer and add 4 bytes to it (on 32-bit). The first field should start there.
Another way I can think of is utilising GCHandle.AddrOfPinnedObject. It is commonly used for accessing array or string data, but it works for other objects:
[StructLayout(LayoutKind.Sequential)]
class Obj
{
public int A;
}
var obj = new Obj();
var gc = GCHandle.Alloc(obj, GCHandleType.Pinned);
IntPtr interior = gc.AddrOfPinnedObject();
Marshal.WriteInt32(interior, 0, 16); // writes to the first field, obj.A
Console.WriteLine(obj.A); // 16
gc.Free();
I think this actually is quite portable, but still needs to pin the object (there is InternalAddrOfPinnedObject defined in GCHandle, but even if that doesn't check whether the handle is actually pinned, the returned value may not be valid if it was used on a non-pinned object).
Still, the technique Span uses seems like the most portable way of doing that, since a lot of the underlying work is done in pure CIL (like reference arithmetic).
I have a class, and I want to inspect its fields and eventually report how many bytes each field takes. I assume all fields are of type Int32, byte, etc.
How can I find out easily how many bytes does the field take?
I need something like:
Int32 a;
// int a_size = a.GetSizeInBytes;
// a_size should be 4
You can't, basically. It will depend on padding, which may well be based on the CLR version you're using and the processor etc. It's easier to work out the total size of an object, assuming it has no references to other objects: create a big array, use GC.GetTotalMemory for a base point, fill the array with references to new instances of your type, and then call GetTotalMemory again. Take one value away from the other, and divide by the number of instances. You should probably create a single instance beforehand to make sure that no new JITted code contributes to the number. Yes, it's as hacky as it sounds - but I've used it to good effect before now.
Just yesterday I was thinking it would be a good idea to write a little helper class for this. Let me know if you'd be interested.
EDIT: There are two other suggestions, and I'd like to address them both.
Firstly, the sizeof operator: this only shows how much space the type takes up in the abstract, with no padding applied round it. (It includes padding within a structure, but not padding applied to a variable of that type within another type.)
Next, Marshal.SizeOf: this only shows the unmanaged size after marshalling, not the actual size in memory. As the documentation explicitly states:
The size returned is actually the size of the unmanaged type. The unmanaged and managed sizes of an object can differ. For character types, the size is affected by the CharSet value applied to that class.
And again, padding can make a difference.
Just to clarify what I mean about padding being relevant, consider these two classes:
class FourBytes { byte a, b, c, d; }
class FiveBytes { byte a, b, c, d, e; }
On my x86 box, an instance of FourBytes takes 12 bytes (including overhead). An instance of FiveBytes takes 16 bytes. The only difference is the "e" variable - so does that take 4 bytes? Well, sort of... and sort of not. Fairly obviously, you could remove any single variable from FiveBytes to get the size back down to 12 bytes, but that doesn't mean that each of the variables takes up 4 bytes (think about removing all of them!). The cost of a single variable just isn't a concept which makes a lot of sense here.
Depending on the needs of the questionee, Marshal.SizeOf might or might not give you what you want. (Edited after Jon Skeet posted his answer).
using System;
using System.Runtime.InteropServices;
public class MyClass
{
public static void Main()
{
Int32 a = 10;
Console.WriteLine(Marshal.SizeOf(a));
Console.ReadLine();
}
}
Note that, as jkersch says, sizeof can be used, but unfortunately only with value types. If you need the size of a class, Marshal.SizeOf is the way to go.
Jon Skeet has laid out why neither sizeof nor Marshal.SizeOf is perfect. I guess the questionee needs to decide whether either is acceptable for his problem.
From Jon Skeet's recipe in his answer, I tried to make the helper class he was referring to. Suggestions for improvements are welcome.
public class MeasureSize<T>
{
private readonly Func<T> _generator;
private const int NumberOfInstances = 10000;
private readonly T[] _memArray;
public MeasureSize(Func<T> generator)
{
_generator = generator;
_memArray = new T[NumberOfInstances];
}
public long GetByteSize()
{
//Make one to make sure it is jitted
_generator();
long oldSize = GC.GetTotalMemory(false);
for(int i=0; i < NumberOfInstances; i++)
{
_memArray[i] = _generator();
}
long newSize = GC.GetTotalMemory(false);
return (newSize - oldSize) / NumberOfInstances;
}
}
Usage:
It should be created with a Func that generates new instances of T. Make sure the same instance is not returned every time. E.g., this would be fine:
public long SizeOfSomeObject()
{
var measure = new MeasureSize<SomeObject>(() => new SomeObject());
return measure.GetByteSize();
}
It can be done indirectly, without considering alignment.
The number of bytes that a reference-type instance takes is equal to the size of the service fields plus the size of the type's fields.
Service fields (each takes 4 bytes on x86, 8 bytes on x64):
Sync block index
Pointer to method table
+ Optional (only for arrays) array length
So, for a class without any fields, an instance takes 8 bytes on an x86 machine. If it is a class with one field, a reference to the same class, then this class takes (on x64):
Sync block index + method table pointer + reference to class = 8 + 8 + 8 = 24 bytes
If it is a value type, it does not have any service fields, therefore it takes only its fields' size. For example, if we have a struct with one int field, then on an x86 machine it takes only 4 bytes of memory.
I had to boil this down all the way to IL level, but I finally got this functionality into C# with a very tiny library.
You can get it (BSD licensed) at bitbucket
Example code:
using Earlz.BareMetal;
...
Console.WriteLine(BareMetal.SizeOf<int>()); //returns 4 everywhere I've tested
Console.WriteLine(BareMetal.SizeOf<string>()); //returns 8 on 64-bit platforms and 4 on 32-bit
Console.WriteLine(BareMetal.SizeOf<Foo>()); //returns 16 in some places, 24 in others. Varies by platform and framework version
...
struct Foo
{
int a, b;
byte c;
object foo;
}
Basically, what I did was write a quick class-method wrapper around the sizeof IL instruction. This instruction will get the raw amount of memory a reference to an object will use. For instance, if you had an array of T, then the sizeof instruction would tell you how many bytes apart each array element is.
This is extremely different from C#'s sizeof operator. For one, C# only allows pure value types because it's not really possible to get the size of anything else in a static manner. In contrast, the sizeof instruction works at a runtime level. So, however much memory a reference to a type would use during this particular instance would be returned.
You can see some more info and a bit more in-depth sample code at my blog
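If you can't include IL directly in your project, a rough equivalent of the same idea can be put together with DynamicMethod (this is just a sketch of the technique, not the BareMetal implementation):
using System;
using System.Reflection.Emit;
static class RuntimeSize
{
    public static int Of(Type t)
    {
        var dm = new DynamicMethod("SizeOf_" + t.Name, typeof(int), Type.EmptyTypes);
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Sizeof, t);   // the IL 'sizeof' instruction; for a reference type
        il.Emit(OpCodes.Ret);         // this yields the size of a reference
        return ((Func<int>)dm.CreateDelegate(typeof(Func<int>)))();
    }
}
// RuntimeSize.Of(typeof(int))     --> 4
// RuntimeSize.Of(typeof(string))  --> IntPtr.Size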
If you have the type, use the sizeof operator. It will return the type's size in bytes.
e.g.
Console.WriteLine(sizeof(int));
will output:
4
You can use method overloading as a trick to determine the field size:
public static int FieldSize(int Field) { return sizeof(int); }
public static int FieldSize(bool Field) { return sizeof(bool); }
public static int FieldSize(SomeStructType Field) { return sizeof(SomeStructType); }
Simplest way is: int size = *((int*)type.TypeHandle.Value + 1)
I know this is an implementation detail, but the GC relies on it, and it needs to be close to the start of the method table for efficiency; plus, taking into consideration how complex the GC code is, nobody will dare to change it in the future. In fact it works for every minor/major version of .NET Framework and .NET Core. (Currently unable to test for 1.0.)
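Wrapped up as a method, the one-liner looks like this (unsafe code, with the same implementation-detail caveat as above):
static unsafe int BaseInstanceSize(Type type)
{
    // Reads the base instance size stored in the type's method table.
    return *((int*)type.TypeHandle.Value + 1);
}
// BaseInstanceSize(typeof(object)) --> typically 12 on x86, 24 on x64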
If you want a more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] and the exact same fields in the same order, and take its size with the sizeof IL instruction. You may want to emit a static method within the struct which simply returns this value. Then add 2*IntPtr.Size for the object header. This should give you the exact value.
But if your class derives from another class, you need to find the size of each base class separately and add them, plus 2*IntPtr.Size again for the header. You can do this by getting fields with the BindingFlags.DeclaredOnly flag.
System.Runtime.CompilerServices.Unsafe
Use System.Runtime.CompilerServices.Unsafe.SizeOf<T>() where T: unmanaged
(when not running in .NET Core you need to install that NuGet package)
Documentation states:
Returns the size of an object of the given type parameter.
It seems to use the sizeof IL instruction, just as Earlz's solution does. (source)
The unmanaged constraint is new in C# 7.3
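Usage sketch (assuming the System.Runtime.CompilerServices.Unsafe package is referenced):
using System;
using System.Runtime.CompilerServices;
...
Console.WriteLine(Unsafe.SizeOf<int>());    // 4
Console.WriteLine(Unsafe.SizeOf<Guid>());   // 16
Console.WriteLine(Unsafe.SizeOf<IntPtr>()); // 4 or 8, depending on the platform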
Can a Dynamic Method work like a normal method or code, in that it can access variables where variables can normally be accessed, call methods, and initialize variables (of course in the scope of the method)?
The only examples I've seen is where it is passed some parameters and returns some value and does nothing to change any variables outside of it.
I'm talking about the System.Reflection.Emit.DynamicMethod class. I'm having trouble understanding it since one needs to use MSIL which I don't know much of yet.
Yes. A DynamicMethod can be attached to a class, in which case it can access class-private static fields (and possibly class-private fields if the DynamicMethod is an instance method, but I don't recall whether that's a supported scenario). It can also access assembly-internal methods, properties, and types that are internal to the assembly in which the DynamicMethod is created.
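For example, here is a minimal sketch of attaching a DynamicMethod to a type so that it can read one of that type's private static fields (the names are made up for illustration):
using System;
using System.Reflection;
using System.Reflection.Emit;
class Counter
{
    private static int _count = 7;
}
static class Demo
{
    static void Main()
    {
        var field = typeof(Counter).GetField("_count",
            BindingFlags.NonPublic | BindingFlags.Static);
        // Passing typeof(Counter) as the owner (plus skipVisibility) lets the
        // generated IL access Counter's non-public members.
        var dm = new DynamicMethod("ReadCount", typeof(int), Type.EmptyTypes,
                                   typeof(Counter), skipVisibility: true);
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldsfld, field);
        il.Emit(OpCodes.Ret);
        var read = (Func<int>)dm.CreateDelegate(typeof(Func<int>));
        Console.WriteLine(read());   // 7
    }
}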
What's the scenario you're using dynamic methods for?
I have some blog articles about dynamic programming, including a couple of entries about using the DynamicMethod class, on my blog: http://robpaveza.net/tag/dynamic-programming . Specifically, this article talks about how to calculate a file revision proof, and you can see the resulting implementation here (evidently, I never wrote part 2, but the implementation in BN# that I linked was the result of the analysis).
Let me walk you through the Compile method:
Type parameterType = typeof(uint).MakeByRefType();
The final method is going to take four out uint parameters; this line obtains a reference to the uint-ref runtime type. The method declaration would look like this if I were to write it in normal C#:
public static void CheckRevision(out uint a, out uint b, out uint c, out uint s);
38-40. foreach (string formula in formulas) CompileStandardFormula(generator, formula)
As I mention in my blog post about it, the math that I do is always provided in the form of:
A=A-S B=B-C C=C+A A=A+B
Where A, B, and C are state variables and S is an input (the next uint value from the file).
The CompileStandardFormula function emits the IL that computes the logic for one individual operation of the four shown. Recall that the CLR is a stack-based machine: math operations pop their operands off the stack and push their results. So, for A=A-S, for example, the following IL is what would be emitted:
ldarg.0 // push &A, which is a reference to the location that actually contains the value of A
ldarg.0 // push &A
ldind.u4 // dereference the top-most value on the stack, which puts the actual value of &A ready for operation
ldarg.3 // push &S
ldind.u4 // dereference &S
sub // subtracts [stack-1] from [stack-2], which effectively is A-S
stind.u4 // remember the first ldarg.0? That's getting accessed now and the subtraction result is going there
So, at this point, it should be pretty easy to figure out: my DynamicMethod compiles the math operation required to update all state variables for a single pass in the file. After all of the IL is emitted, because we know the state of the stack has nothing on it (more than when the method entered, anyway), we can just throw out a quick 'ret' instruction and we're done.
Anyway, hope this is helpful.
Someone asked me the other day when they should use the parameter keyword out instead of ref. While I (I think) understand the difference between the ref and out keywords (that has been asked before) and the best explanation seems to be that ref == in and out, what are some (hypothetical or code) examples where I should always use out and not ref.
Since ref is more general, why do you ever want to use out? Is it just syntactic sugar?
You should use out unless you need ref.
It makes a big difference when the data needs to be marshalled e.g. to another process, which can be costly. So you want to avoid marshalling the initial value when the method doesn't make use of it.
Beyond that, it also shows the reader of the declaration or the call whether the initial value is relevant (and potentially preserved), or thrown away.
As a minor difference, an out parameter need not be initialized.
Example for out:
string a, b;
person.GetBothNames(out a, out b);
where GetBothNames is a method to retrieve two values atomically, the method won't change behavior whatever a and b are. If the call goes to a server in Hawaii, copying the initial values from here to Hawaii is a waste of bandwidth. A similar snippet using ref:
string a = String.Empty, b = String.Empty;
person.GetBothNames(ref a, ref b);
could confuse readers, because it looks like the initial values of a and b are relevant (though the method name would indicate they are not).
Example for ref:
string name = textbox.Text;
bool didModify = validator.SuggestValidName(ref name);
Here the initial value is relevant to the method.
Use out to denote that the parameter's incoming value is not being used, only set. This helps the caller understand that you're always initializing the parameter.
Also, ref and out are not just for value types. They also let you reset the object that a reference type is referencing from within a method.
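For example (a quick sketch):
static void Replace(ref List<int> list)
{
    list = new List<int> { 99 };   // reassigns the caller's variable, not just the contents
}
var xs = new List<int> { 1, 2 };
Replace(ref xs);
Console.WriteLine(xs[0]);          // 99; without 'ref', the reassignment would be lost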
You're correct in that, semantically, ref provides both "in" and "out" functionality, whereas out only provides "out" functionality. There are some things to consider:
out requires that the method accepting the parameter MUST, at some point before returning, assign a value to the variable. You find this pattern in some of the key/value data storage classes like Dictionary<K,V>, where you have functions like TryGetValue. This function takes an out parameter that holds what the value will be if retrieved. It wouldn't make sense for the caller to pass a value into this function, so out is used to guarantee that some value will be in the variable after the call, even if it isn't "real" data (in the case of TryGetValue where the key isn't present). See the sketch after this list.
out and ref parameters are marshaled differently when dealing with interop code
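A quick sketch of the TryGetValue pattern mentioned above:
var ages = new Dictionary<string, int> { ["alice"] = 30 };
int age;                                 // no need to give it a value first
if (ages.TryGetValue("bob", out age))
    Console.WriteLine(age);
else
    Console.WriteLine("not found; 'age' was still assigned its default (0)");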
Also, as an aside, it's important to note that while reference types and value types differ in the nature of their value, every variable in your application points to a location of memory that holds a value, even for reference types. It just happens that, with reference types, the value contained in that location of memory is another memory location. When you pass values to a function (or do any other variable assignment), the value of that variable is copied into the other variable. For value types, that means that the entire content of the type is copied. For reference types, that means that the memory location is copied. Either way, it does create a copy of the data contained in the variable. The only real relevance that this holds deals with assignment semantics; when assigning a variable or passing by value (the default), when a new assignment is made to the original (or new) variable, it does not affect the other variable. In the case of reference types, yes, changes made to the instance are available on both sides, but that's because the actual variable is just a pointer to another memory location; the content of the variable--the memory location--didn't actually change.
Passing with the ref keyword says that both the original variable and the function parameter will actually point to the same memory location. This, again, affects only assignment semantics. If a new value is assigned to one of the variables, then because the other points to the same memory location the new value will be reflected on the other side.
It depends on the compile context (See Example below).
out and ref both denote passing a variable by reference, yet ref requires the variable to be initialized before being passed, which can be an important difference in the context of marshaling (Interop: UnmanagedToManagedTransition or vice versa).
MSDN warns:
Do not confuse the concept of passing by reference with the concept of reference types. The two concepts are not the same. A method parameter can be modified by ref regardless of whether it is a value type or a reference type. There is no boxing of a value type when it is passed by reference.
From the official MSDN Docs:
out:
The out keyword causes arguments to be passed by reference. This is similar to the ref keyword, except that ref requires that the variable be initialized before being passed
ref:
The ref keyword causes an argument to be passed by reference, not by value. The effect of passing by reference is that any change to the parameter in the method is reflected in the underlying argument variable in the calling method. The value of a reference parameter is always the same as the value of the underlying argument variable.
We can verify that the out and ref are indeed the same when the argument gets assigned:
CIL Example:
Consider the following example
static class outRefTest{
public static int myfunc(int x){x=0; return x; }
public static void myfuncOut(out int x){x=0;}
public static void myfuncRef(ref int x){x=0;}
public static void myfuncRefEmpty(ref int x){}
// Define other methods and classes here
}
In CIL, the instructions of myfuncOut and myfuncRef are identical, as expected.
outRefTest.myfunc:
IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: starg.s 00
IL_0004: ldarg.0
IL_0005: stloc.0
IL_0006: br.s IL_0008
IL_0008: ldloc.0
IL_0009: ret
outRefTest.myfuncOut:
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldc.i4.0
IL_0003: stind.i4
IL_0004: ret
outRefTest.myfuncRef:
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldc.i4.0
IL_0003: stind.i4
IL_0004: ret
outRefTest.myfuncRefEmpty:
IL_0000: nop
IL_0001: ret
nop: no operation, ldloc: load local, stloc: store local, ldarg: load argument, br.s: branch to target....
(See: List of CIL instructions )
Below are some notes which I pulled from this CodeProject article on C# Out Vs Ref:
out should be used only when we are expecting multiple outputs from a function or a method. Returning a struct can also be a good option for the same purpose.
REF and OUT are keywords which dictate how data is passed from caller to callee and vice versa.
In REF, data passes two ways: from caller to callee and vice versa.
In OUT, data passes only one way, from callee to caller. In this case, if the caller tries to send data to the callee, it will be overlooked/rejected.
If you are a visual person, then please see this YouTube video, which demonstrates the difference practically: https://www.youtube.com/watch?v=lYdcY5zulXA
You need to use ref if you plan to read and write to the parameter. You need to use out if you only plan to write. In effect, out is for when you'd need more than one return value, or when you don't want to use the normal return mechanism for output (but this should be rare).
There are language mechanics that assist these use cases. Ref parameters must have been initialized before they are passed to a method (putting emphasis on the fact that they are read-write), and out parameters cannot be read before they are assigned a value, and are guaranteed to have been written to by the end of the method (putting emphasis on the fact that they are write-only). Contravening these principles results in a compile-time error.
int x;
Foo(ref x); // error: x is uninitialized
void Bar(out int x) {} // error: x was not written to
For instance, int.TryParse returns a bool and accepts an out int parameter:
int value;
if (int.TryParse(numericString, out value))
{
/* numericString was parsed into value, now do stuff */
}
else
{
/* numericString couldn't be parsed */
}
This is a clear example of a situation where you need to output two values: the numeric result and whether the conversion was successful or not. The authors of the CLR decided to opt for out here since they don't care about what the int could have been before.
For ref, you can look at Interlocked.Increment:
int x = 4;
Interlocked.Increment(ref x);
Interlocked.Increment atomically increments the value of x. Since you need to read x to increment it, this is a situation where ref is more appropriate. You totally care about what x was before it was passed to Increment.
In the next version of C#, it will even be possible to declare variables in out parameters, adding even more emphasis on their output-only nature:
if (int.TryParse(numericString, out int value))
{
// 'value' exists and was declared in the `if` statement
}
else
{
// conversion didn't work, 'value' doesn't exist here
}
How to use in or out or ref in C#?
All three keywords pass arguments by reference, but with different restrictions:
in arguments cannot be modified by the called method.
ref arguments may be modified.
ref arguments must be initialized by the caller before being passed; they can be read and updated in the method.
out arguments must be modified by the called method.
out arguments must be assigned in the method before it returns.
Variables passed as in arguments must be initialized before being passed in a method call. However, the called method may not assign a value or modify the argument.
You can't use the in, ref, and out keywords for the following kinds of methods:
Async methods, which you define by using the async modifier.
Iterator methods, which include a yield return or yield break statement.
I still felt the need for a good summary; this is what I came up with.
Summary,
When we are inside the function, this is how each keyword controls access to the variable's data:
in = R
out = must W before R
ref = R+W
Explanation,
in
Function may only READ that variable.
out
The variable need not be initialised first, because
the function MUST WRITE to it before it can READ it.
ref
Function may READ/WRITE to that variable.
Why is it named as such?
Focusing on where data gets modified,
in
Data must only be set before entering (in) function.
out
Data must only be set before leaving (out) function.
ref
Data must be set before entering (in) function.
Data may be set before leaving (out) function.
out is a more constrained version of ref.
In a method body, you need to assign to all out parameters before leaving the method.
Also, any value assigned to an out argument before the call is ignored, whereas ref requires the argument to be assigned before the call.
So out allows you to do:
int a, b, c = foo(out a, out b);
where ref would require a and b to be assigned.
How it sounds:
out = the parameter only gets initialized/filled inside the method (it goes in empty) and is returned out.
ref = reference, a standard parameter (possibly already holding a value), but the function can modify it.
You can use the out contextual keyword in two contexts (each is a link to detailed information), as a parameter modifier or in generic type parameter declarations in interfaces and delegates. This topic discusses the parameter modifier, but you can see this other topic for information on the generic type parameter declarations.
The out keyword causes arguments to be passed by reference. This is like the ref keyword, except that ref requires that the variable be initialized before it is passed. To use an out parameter, both the method definition and the calling method must explicitly use the out keyword. For example:
C#
class OutExample
{
static void Method(out int i)
{
i = 44;
}
static void Main()
{
int value;
Method(out value);
// value is now 44
}
}
Although variables passed as out arguments do not have to be initialized before being passed, the called method is required to assign a value before the method returns.
Although the ref and out keywords cause different run-time behavior, they are not considered part of the method signature at compile time. Therefore, methods cannot be overloaded if the only difference is that one method takes a ref argument and the other takes an out argument. The following code, for example, will not compile:
C#
class CS0663_Example
{
// Compiler error CS0663: "Cannot define overloaded
// methods that differ only on ref and out".
public void SampleMethod(out int i) { }
public void SampleMethod(ref int i) { }
}
Overloading can be done, however, if one method takes a ref or out argument and the other uses neither, like this:
C#
class OutOverloadExample
{
public void SampleMethod(int i) { }
public void SampleMethod(out int i) { i = 5; }
}
Properties are not variables and therefore cannot be passed as out parameters.
For information about passing arrays, see Passing Arrays Using ref and out (C# Programming Guide).
You can't use the ref and out keywords for the following kinds of methods:
Async methods, which you define by using the async modifier.
Iterator methods, which include a yield return or yield break statement.
Example
Declaring an out method is useful when you want a method to return multiple values. The following example uses out to return three variables with a single method call. Note that the third argument is assigned to null. This enables methods to return values optionally.
C#
class OutReturnExample
{
static void Method(out int i, out string s1, out string s2)
{
i = 44;
s1 = "I've been returned";
s2 = null;
}
static void Main()
{
int value;
string str1, str2;
Method(out value, out str1, out str2);
// value is now 44
// str1 is now "I've been returned"
// str2 is (still) null;
}
}
Just to clarify the OP's comment that the use of ref and out is for "a reference to a value type or struct declared outside the method", which has already been established as incorrect.
Consider the use of ref on a StringBuilder, which is a reference type:
private void Nullify(StringBuilder sb, string message)
{
sb.Append(message);
sb = null;
}
// -- snip --
StringBuilder sb = new StringBuilder();
string message = "Hi Guy";
Nullify(sb, message);
System.Console.WriteLine(sb.ToString());
// Output
// Hi Guy
As opposed to this:
private void Nullify(ref StringBuilder sb, string message)
{
sb.Append(message);
sb = null;
}
// -- snip --
StringBuilder sb = new StringBuilder();
string message = "Hi Guy";
Nullify(ref sb, message);
System.Console.WriteLine(sb.ToString());
// Output
// NullReferenceException
Basically, both ref and out are for passing an object/value between methods.
The out keyword causes arguments to be passed by reference. This is like the ref keyword, except that ref requires that the variable be initialized before it is passed.
out : Argument is not initialized and it must be initialized in the method
ref : Argument is already initialized and it can be read and updated in the method.
What is the use of "ref" for reference types?
You can change the given reference to a different instance.
Did you know?
Although the ref and out keywords cause different run-time behavior, they are not considered part of the method signature at compile time. Therefore, methods cannot be overloaded if the only difference is that one method takes a ref argument and the other takes an out argument.
You can't use the ref and out keywords for the following kinds of methods:
Async methods, which you define by using the async modifier.
Iterator methods, which include a yield return or yield break statement.
Properties are not variables and therefore cannot be passed as out parameters.
An argument passed as ref must be initialized before being passed to the method, whereas an out argument need not be initialized before being passed to the method.
why do you ever want to use out?
To let others know that the variable will be initialized when the called method returns!
As mentioned above:
"for an out parameter, the calling method is required to assign a value before the method returns."
example:
Car car;
SetUpCar(out car);
car.drive(); // You know car is initialized.
Extra notes regarding C# 7:
In C# 7 there's no need to predeclare variables using out. So a code like this:
public void PrintCoordinates(Point p)
{
int x, y; // have to "predeclare"
p.GetCoordinates(out x, out y);
WriteLine($"({x}, {y})");
}
Can be written like this:
public void PrintCoordinates(Point p)
{
p.GetCoordinates(out int x, out int y);
WriteLine($"({x}, {y})");
}
Source: What's new in C# 7.
It should be noted that in is a valid keyword as of C# 7.2:
The in parameter modifier is available in C# 7.2 and later. Previous versions generate compiler error CS8107 ("Feature 'readonly references' is not available in C# 7.0. Please use language version 7.2 or greater.") To configure the compiler language version, see Select the C# language version.
...
The in keyword causes arguments to be passed by reference. It makes the formal parameter an alias for the argument, which must be a variable. In other words, any operation on the parameter is made on the argument. It is like the ref or out keywords, except that in arguments cannot be modified by the called method. Whereas ref arguments may be modified, out arguments must be modified by the called method, and those modifications are observable in the calling context.
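A small sketch of the in modifier described in that quote:
readonly struct Big
{
    public readonly long A, B, C, D;
    public Big(long a, long b, long c, long d) { A = a; B = b; C = c; D = d; }
}
static long Sum(in Big b)        // passed by reference, but read-only inside the method
{
    // b.A = 1;                  // would not compile: 'in' arguments cannot be modified
    return b.A + b.B + b.C + b.D;
}
...
var big = new Big(1, 2, 3, 4);
Console.WriteLine(Sum(big));     // 10; 'in' can normally be omitted at the call site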