I've just decompiled some third-party source to debug an issue, using dotPeek. The output code contains some unusual operators which, as far as I know, aren't valid C#, so I'm wondering what they mean...
The extract looks like this (with the dotPeek comments included, as they are probably relevant):
protected internal void DoReceive(ref byte[] Buffer, int MaxSize, out int Written)
{
Written = 0;
.
.
.
// ISSUE: explicit reference operation
// ISSUE: variable of a reference type
int& local = #Written;
int num = SomeMethod();
.
.
.
// ISSUE: explicit reference operation
^local = num;
}
So, three unusual operators in there... int& local = #Written seems to be assigning a reference to a variable whose name is, pointlessly, prefixed with the # character?
But what is ^local = num;?
OK, here is the equivalent snippet from ILSpy, which makes more sense; I guess the decompilation to C# didn't produce a valid equivalent?
'C#'
int& local = #Written;
byte[] numArray2 = this.FInSpool;
int num = (int) __Global.Min(numArray2 == null ? 0L : (long) numArray2.Length, (long) MaxSize);
^local = num;
IL
byte[] expr_22 = this.FInSpool;
Written = (int)__Global.Min((long)((expr_22 == null) ? 0 : expr_22.Length), (long)MaxSize);
So I guess the 'C#' isn't quite valid? The 'IL' version would be valid C#; I'm not sure why dotPeek produced the output it did. Perhaps I'll stick with ILSpy for this one...?
If you look at the raw IL (from ildasm, not the C# equivalent via IL Spy), that may help you see what the decompiler is trying to say. 'Out' parameters are represented using a (managed) typed-reference, which isn't explicitly exposed as a type in C#. 'Instances' of this type can normally only be passed as parameters to methods accepting typed references ('ref' or 'out' parameters.) See OpCodes.Mkrefany for more information.
What dotPeek is complaining about is that this 'out' typed reference was stored from the argument into a local variable slot, then written to later via that local slot. The '#' and '^' are placeholders used to indicate this unexpected pattern detected by the decompiler (the one the ISSUE comments describe).
It's possible the code was compiled from C++/CLI and thus the IL looks different from the typical C# compiler output. It's also possible this is some level of minor obfuscation to confuse decompilers (though I don't think so.) I don't think this is functionally any different from loading the reference from its argument onto the operation stack directly (avoiding the use of a local variable slot), but I could be wrong.
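For what it's worth, modern C# (7.0 and later) can express the pattern the IL is using directly, via ref locals. Here is a minimal sketch of the equivalent (SomeMethod is just a hypothetical stand-in for whatever the real code calls):
static int SomeMethod() => 42; // hypothetical stand-in

static void DoReceive(out int written)
{
    written = 0;
    ref int local = ref written; // like 'int& local = #Written'
    int num = SomeMethod();
    local = num;                 // like '^local = num'
}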
Putting an @ before a name allows you to use a reserved word as a variable name in C# (the # in dotPeek's output appears to serve the same purpose in its pseudo-C#). For example, if I wanted to have a variable called return I would need to do this:
public int Weird()
{
int @return = 0;
return @return;
}
See this SO question for more details.
Putting a ^ before the name ... umm, no clue. (Will update as I research; so far I couldn't find any info on what ^ means when it's not being used as XOR.)
It's clearly a decompilation issue. Because of the broad set of languages supported, a decompiler targeting any particular language may not always find an exact match, but it still tries to produce some output. The decompiler MAY try to produce somewhat equivalent output, which in this case could be:
protected internal void DoReceive(ref byte[] Buffer, int MaxSize, out int Written)
{
Written = 0;
.
int num = SomeMethod();
.
Written = num;
}
but SHOULD it really do this? In this case the decompiler actually provided you with a hint, so you can decide whether this is important for your particular case, as there MAY be some side effects.
I have already found useful answers explaining why it shouldn't be possible at all:
Why does C# limit the set of types that can be declared as const?
Why can't structs be declared as const?
The first one has a detailed answer, which I still have to re-read a couple of times until I fully get it.
The second one has a very easy and clear answer (along the lines of 'the constructor might do anything, so it would have to be run and evaluated at compile time, which isn't possible').
But both refer to C#.
However, I am using C++/CLI and have a
value class CLocation
{
public:
double x, y, z;
CLocation ( double i_x, double i_y, double i_z) : x(i_x), y(i_y), z(i_z) {}
CLocation ( double i_all) : x(i_all), y(i_all), z(i_all) {}
...
}
where I can easily create a
const CLocation c_loc (1,2,3);
which indeed is immutable, meaning 'const'.
Why?
CLocation furthermore has a function
System::Drawing::Point CLocation::ToPoint ()
{
return System::Drawing::Point (x_int, y_int); // x_int, y_int: presumably integer conversions of x and y, elided here
}
which works well on CLocation, but doesn't on a const CLocation. I think this comes from the limitation in C# (known from the links above), which likely comes from the underlying IL, so C++/CLI is affected by that limitation in the same way.
Is this correct?
Is there a way to run this member function on a const CLocation?
You must indicate to the compiler that your function doesn't change the object by adding const after the argument list.
Your function may then be called on a const variable, but may not modify its fields.
Also note that some keywords (including const and struct) have different meanings in C# and C++ (and other C-based languages).
Update
As C++/CLI doesn't allow a const modifier on a member function, you'll have to copy the variable to a non-const one to be able to call any member function (on the copy).
I have a class, and I want to inspect its fields and eventually report how many bytes each field takes. I assume all fields are of type Int32, byte, etc.
How can I find out easily how many bytes does the field take?
I need something like:
Int32 a;
// int a_size = a.GetSizeInBytes;
// a_size should be 4
You can't, basically. It will depend on padding, which may well be based on the CLR version you're using and the processor etc. It's easier to work out the total size of an object, assuming it has no references to other objects: create a big array, use GC.GetTotalMemory for a base point, fill the array with references to new instances of your type, and then call GetTotalMemory again. Take one value away from the other, and divide by the number of instances. You should probably create a single instance beforehand to make sure that no new JITted code contributes to the number. Yes, it's as hacky as it sounds - but I've used it to good effect before now.
Just yesterday I was thinking it would be a good idea to write a little helper class for this. Let me know if you'd be interested.
EDIT: There are two other suggestions, and I'd like to address them both.
Firstly, the sizeof operator: this only shows how much space the type takes up in the abstract, with no padding applied round it. (It includes padding within a structure, but not padding applied to a variable of that type within another type.)
Next, Marshal.SizeOf: this only shows the unmanaged size after marshalling, not the actual size in memory. As the documentation explicitly states:
The size returned is actually the size of the unmanaged type. The unmanaged and managed sizes of an object can differ. For character types, the size is affected by the CharSet value applied to that class.
And again, padding can make a difference.
Just to clarify what I mean about padding being relevant, consider these two classes:
class FourBytes { byte a, b, c, d; }
class FiveBytes { byte a, b, c, d, e; }
On my x86 box, an instance of FourBytes takes 12 bytes (including overhead). An instance of FiveBytes takes 16 bytes. The only difference is the "e" variable - so does that take 4 bytes? Well, sort of... and sort of not. Fairly obviously, you could remove any single variable from FiveBytes to get the size back down to 12 bytes, but that doesn't mean that each of the variables takes up 4 bytes (think about removing all of them!). The cost of a single variable just isn't a concept which makes a lot of sense here.
Depending on the needs of the asker, Marshal.SizeOf might or might not give you what you want. (Edited after Jon Skeet posted his answer.)
using System;
using System.Runtime.InteropServices;
public class MyClass
{
public static void Main()
{
Int32 a = 10;
Console.WriteLine(Marshal.SizeOf(a));
Console.ReadLine();
}
}
Note that, as jkersch says, sizeof can be used, but unfortunately only with value types. If you need the size of a class, Marshal.SizeOf is the way to go.
Jon Skeet has laid out why neither sizeof nor Marshal.SizeOf is perfect. I guess the asker needs to decide whether either is acceptable for their problem.
From Jon Skeet's recipe in his answer, I tried to make the helper class he was referring to. Suggestions for improvements are welcome.
public class MeasureSize<T>
{
private readonly Func<T> _generator;
private const int NumberOfInstances = 10000;
private readonly T[] _memArray;
public MeasureSize(Func<T> generator)
{
_generator = generator;
_memArray = new T[NumberOfInstances];
}
public long GetByteSize()
{
//Make one to make sure it is jitted
_generator();
long oldSize = GC.GetTotalMemory(false);
for (int i = 0; i < NumberOfInstances; i++)
{
_memArray[i] = _generator();
}
long newSize = GC.GetTotalMemory(false);
return (newSize - oldSize) / NumberOfInstances;
}
}
Usage:
Should be created with a Func that generates new instances of T. Make sure the same instance is not returned every time, e.g. this would be fine:
public long SizeOfSomeObject()
{
var measure = new MeasureSize<SomeObject>(() => new SomeObject());
return measure.GetByteSize();
}
It can be done indirectly, without considering the alignment.
The number of bytes a reference-type instance occupies is equal to the size of the service fields plus the size of the type's fields.
Service fields (each takes 4 bytes on x86, 8 bytes on x64):
Sync block index
Pointer to the method table
+ optionally (only for arrays) the array length
So, for a class without any fields, an instance takes 8 bytes on an x86 machine. For a class with one field, a reference to an instance of the same class, the instance takes (on x64):
sync block index + method table pointer + reference field = 8 + 8 + 8 = 24 bytes
If it is a value type, it has no service fields, so it takes only its fields' size. For example, a struct with one int field takes only 4 bytes of memory on an x86 machine.
I had to boil this down all the way to IL level, but I finally got this functionality into C# with a very tiny library.
You can get it (BSD licensed) at bitbucket
Example code:
using Earlz.BareMetal;
...
Console.WriteLine(BareMetal.SizeOf<int>()); //returns 4 everywhere I've tested
Console.WriteLine(BareMetal.SizeOf<string>()); //returns 8 on 64-bit platforms and 4 on 32-bit
Console.WriteLine(BareMetal.SizeOf<Foo>()); //returns 16 in some places, 24 in others. Varies by platform and framework version
...
struct Foo
{
int a, b;
byte c;
object foo;
}
Basically, what I did was write a quick class-method wrapper around the sizeof IL instruction. This instruction will get the raw amount of memory a reference to an object will use. For instance, if you had an array of T, then the sizeof instruction would tell you how many bytes apart each array element is.
This is extremely different from C#'s sizeof operator. For one, C#'s sizeof only allows pure value types, because it's not really possible to get the size of anything else in a static manner. In contrast, the sizeof instruction works at runtime, so it returns however much memory a reference to the given type occupies in the current environment.
You can see some more info and a bit more in-depth sample code at my blog
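If you'd rather not take a dependency, the same idea fits in a few lines: emit the sizeof IL instruction into a DynamicMethod and call it. Here is a minimal sketch of the technique (the names are mine, not the library's):
using System;
using System.Reflection.Emit;

static class RuntimeSize
{
    // Emits 'sizeof !!T' into a throwaway method and invokes it.
    public static int Of<T>()
    {
        var dm = new DynamicMethod("SizeOf", typeof(int), Type.EmptyTypes);
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Sizeof, typeof(T));
        il.Emit(OpCodes.Ret);
        return ((Func<int>)dm.CreateDelegate(typeof(Func<int>)))();
    }
}
Note this measures what a variable of T occupies (pointer size for reference types), not the size of the object on the heap.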
If you have the type, use the sizeof operator. It will return the type's size in bytes.
e.g.
Console.WriteLine(sizeof(int));
will output:
4
You can use method overloading as a trick to determine the field size:
public static int FieldSize(int Field) { return sizeof(int); }
public static int FieldSize(bool Field) { return sizeof(bool); }
public static int FieldSize(SomeStructType Field) { return sizeof(SomeStructType); }
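Overload resolution then picks the right overload at compile time. Note that sizeof on a user-defined struct such as SomeStructType only compiles in an unsafe context; usage might look like:
int i = 0;
bool b = false;
Console.WriteLine(FieldSize(i)); // 4
Console.WriteLine(FieldSize(b)); // 1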
The simplest way is: int size = *((int*)type.TypeHandle.Value + 1);
I know this is an implementation detail, but the GC relies on it, and the value needs to be near the start of the method table for efficiency; besides, considering how complex the GC code is, nobody will dare to change it in the future. In fact it works for every minor/major version of .NET Framework and .NET Core. (I'm currently unable to test on 1.0.)
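Wrapped up as a method, and hedged as the implementation detail it is, that one-liner looks like this (compile with /unsafe):
static unsafe int InstanceSize(Type type)
{
    // The base instance size is stored one 32-bit slot past the method
    // table pointer on current CLRs; this is undocumented and could change.
    return *((int*)type.TypeHandle.Value.ToPointer() + 1);
}

// e.g. InstanceSize(typeof(object)) typically returns 24 on x64.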
If you want a more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] and the exact same fields in the same order, and take its size with the sizeof IL instruction. You may want to emit a static method within the struct which simply returns this value. Then add 2 * IntPtr.Size for the object header. This should give you the exact value.
But if your class derives from another class, you need to find the size of each base class separately and add them, plus 2 * IntPtr.Size for the header. You can do this by getting fields with the BindingFlags.DeclaredOnly flag.
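As a rough illustration of that per-class walk (not the emit-a-struct approach, just a reflection-based estimate that ignores the packing and alignment the CLR may apply):
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class SizeEstimator
{
    // Walks the hierarchy with DeclaredOnly and sums approximate field sizes.
    // Marshal.SizeOf reports unmanaged sizes (e.g. 4 for bool), so treat
    // this strictly as an estimate.
    public static int RoughInstanceSize(Type t)
    {
        int total = 2 * IntPtr.Size; // sync block index + method table pointer
        for (Type cur = t; cur != null && cur != typeof(object); cur = cur.BaseType)
        {
            foreach (FieldInfo f in cur.GetFields(
                BindingFlags.Instance | BindingFlags.Public |
                BindingFlags.NonPublic | BindingFlags.DeclaredOnly))
            {
                total += f.FieldType.IsValueType
                    ? Marshal.SizeOf(f.FieldType) // unmanaged size; approximation
                    : IntPtr.Size;                // reference fields are pointer-sized
            }
        }
        return total;
    }
}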
System.Runtime.CompilerServices.Unsafe
Use System.Runtime.CompilerServices.Unsafe.SizeOf<T>() where T: unmanaged
(when not running in .NET Core you need to install that NuGet package)
Documentation states:
Returns the size of an object of the given type parameter.
It seems to use the sizeof IL instruction just as Earlz's solution does. (source)
The unmanaged constraint is new in C# 7.3
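A quick usage sketch:
using System;
using System.Runtime.CompilerServices;

class Program
{
    static void Main()
    {
        Console.WriteLine(Unsafe.SizeOf<byte>()); // 1
        Console.WriteLine(Unsafe.SizeOf<int>());  // 4
        Console.WriteLine(Unsafe.SizeOf<Guid>()); // 16
    }
}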
After seeing how double.NaN == double.NaN is always false in C#, I became curious how the equality was implemented under the hood. So I used ReSharper to decompile the Double struct, and here is what I found:
public struct Double : IComparable, IFormattable, IConvertible, IComparable<double>, IEquatable<double>
{
// stuff removed...
public const double NaN = double.NaN;
// more stuff removed...
}
This seems to indicate that the struct Double declares a constant that is defined in terms of this special lowercase double, though I'd always thought the two were completely synonymous. What's more, if I Go To Implementation on the lowercase double, ReSharper simply scrolls me to the declaration at the top of the file. Similarly, jumping to the implementation of the lowercase type's NaN just takes me to the constant declaration earlier in the line!
So I'm trying to understand this seemingly recursive definition. Is this just an artefact of the decompiler? Perhaps a limitation in ReSharper? Or is this lowercase double actually a different beast altogether, representing something at a lower level than the CLR/CTS?
Where does NaN really come from?
Beware looking at decompiled code, especially if it is for something inbuilt. The actual IL here (for .NET 4.5, at least) is:
.field public static literal float64 NaN = float64(NaN)
{
.custom instance void __DynamicallyInvokableAttribute::.ctor()
}
i.e. this is handled directly in IL via the NaN token.
However, because it is a const (literal in IL), it will get "burned into" the call site; anywhere else that uses double.NaN will also be using float64(NaN). Similarly, for example, if I do:
const int I = 2;
int i = I;
int j = 2;
both of these assignments will look identical in the final IL (they will both be ldc.i4.2).
Because of this, most decompilers will recognise the IL pattern float64(NaN) and represent it with the language's equivalent of double.NaN. But that doesn't mean the code itself is recursive; they probably just don't have a check for "but is it double.NaN itself?". Ultimately, this is simply a special case, where float64(NaN) is a recognised value in IL.
Incidentally, reflector decompiles it as:
[__DynamicallyInvokable]
public const double NaN = (double) 1.0 / (double) 0.0;
That again doesn't mean that this is the truth :p Merely that it is something which may have the same end result.
By far the best source you can get for .NET assemblies is the actual source code that was used to build them. It beats any decompiler for accuracy, and the comments can be quite useful as well. Download the Reference Source.
You'll then also see that Double.NaN isn't defined in IL as Marc assumed; it's actually in a C# source file. The net/clr/bcl/system/double.cs source file shows the real declaration:
public const double NaN = (double)0.0 / (double)0.0;
Which takes advantage of the C# compiler evaluating constant expressions at compile time. Or to put it tongue-in-cheek, NaN is defined by the C++ compiler since that's the language that was used to write the C# compiler ;)
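Incidentally, the comparison behaviour that prompted the question is easy to demonstrate: == follows IEEE 754 semantics, while Equals deliberately treats NaN as equal to itself:
double nan = double.NaN;
Console.WriteLine(nan == double.NaN);      // False: under IEEE 754, NaN != NaN
Console.WriteLine(double.IsNaN(nan));      // True: the correct way to test
Console.WriteLine(nan.Equals(double.NaN)); // True: Equals special-cases NaN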
I have some concurrent code which has an intermittent failure and I've reduced the problem down to two cases which seem identical, but where one fails and the other doesn't.
I've now spent way too much time trying to create a minimal, complete example that fails, but without success, so I'm just posting the lines that fail in case anyone can see an obvious problem.
Object @lock = new Object(); // 'lock' is a reserved word; escaped here so the snippet compiles
struct MyValueType { readonly public int i1, i2; };
class Node { public MyValueType x; public int y; public Node z; };
volatile Node[] m_rg = new Node[300];
unsafe void Foo(MyValueType e) // parameter 'e' inferred from the body below; it was elided in the original snippet
{
Node[] temp;
while (true)
{
temp = m_rg;
/* ... */
Monitor.Enter(@lock);
if (temp == m_rg)
break;
Monitor.Exit(@lock);
}
#if OK // this works:
Node cur = temp[33];
fixed (MyValueType* pe = &cur.x)
*(long*)pe = *(long*)&e;
#else // this reliably causes random corruption:
fixed (MyValueType* pe = &temp[33].x)
*(long*)pe = *(long*)&e;
#endif
Monitor.Exit(@lock);
}
I have studied the IL code and it looks like what's happening is that the Node object at array position 33 is moving (in very rare cases) despite the fact that we are holding a pointer to a value type within it.
It's as if the CLR doesn't notice that we are passing through a heap (movable) object--the array element--in order to access the value type. The 'OK' version has never failed under extended testing on an 8-way machine, but the alternate path fails quickly every time.
Is this never supposed to work, and 'OK' version is too streamlined to fail under stress?
Do I need to pin the object myself using GCHandle (I notice in the IL that the fixed statement alone is not doing so)?
If manual pinning is required here, why is the compiler allowing access through a heap object (without pinning) in this way?
note: This question is not discussing the elegance of reinterpreting the blittable value type in a nasty way, so please, no criticism of this aspect of the code unless it is directly relevant to the problem at hand.. thanks
[edit: jitted asm]
Thanks to Hans' reply, I understand better why the jitter is placing things on the stack in what otherwise seem like vacuous asm operations. See [rsp + 50h] for example, and how it gets nulled out after the 'fixed' region. The remaining unresolved question is whether [cur+18h] (lines 207-20C) on the stack is somehow sufficient to protect the access to the value type in a way that is not adequate for [temp+33*IntPtr.Size+18h] (line 24A).
[edit]
summary of conclusions, minimal example
Comparing the two code fragments below, I now believe that #1 is not ok, whereas #2 is acceptable.
(1.) The following fails (on the x64 JIT at least); the GC can still move the MyClass instance if you try to fix it in situ via an array reference. There's no place on the stack where a reference to the particular object instance (the array element that needs to be fixed) is published for the GC to notice.
struct MyValueType { public int foo; };
class MyClass { public MyValueType mvt; };
MyClass[] rgo = new MyClass[2000];
fixed (MyValueType* pvt = &rgo[1234].mvt)
*(int*)pvt = 1234;
(2.) But you can access a structure inside a (movable) object using fixed (without pinning) if you provide an explicit reference on the stack which can be advertised to the GC:
struct MyValueType { public int foo; };
class MyClass { public MyValueType mvt; };
MyClass[] rgo = new MyClass[2000];
MyClass mc = rgo[1234]; // <-- only difference -- add this line
fixed (MyValueType* pvt = &mc.mvt) // <-- and adjust accordingly here
*(int*)pvt = 1234;
This is where I'll leave it unless someone can provide corrections or more information...
Modifying objects of managed type through fixed pointers can result in undefined behavior (C# Language Specification, chapter 18.6).
Well, you are doing just that. In spite of the verbiage in the spec and the MSDN library, the fixed keyword does not in fact make the object unmoveable, it doesn't get pinned. You probably found out from looking at the IL. It uses a clever trick by generating a pointer + offset and letting the garbage collector adjust the pointer. I don't have a great explanation why this fails in one case but not the other. I don't see a fundamental difference in the generated machine code. But then I probably didn't reproduce your exact machine code either, the snippet isn't great.
As near as I can tell it should fail in both cases because of the structure member access. That causes the pointer + offset to collapse to a single pointer with a LEA instruction, preventing the garbage collector from recognizing the reference. Structures have always been trouble for the jitter. Thread timing could explain the difference, perhaps.
You could post to connect.microsoft.com for a second opinion. It is however going to be difficult to navigate around the spec violation. If my theory is correct then a read could fail too, much harder to prove though.
Fix it by actually pinning the array with GCHandle.
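A minimal sketch of what that looks like. Note that GCHandleType.Pinned only accepts blittable contents, so this is shown on a value-type array (fields made writable for the demo); a Node[] (an array of references) can't be pinned this way directly, which is part of why the fix may require restructuring:
using System;
using System.Runtime.InteropServices;

struct MyValueType { public int i1, i2; }

class PinDemo
{
    static unsafe void Main()
    {
        var data = new MyValueType[300];
        GCHandle gch = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // 'data' cannot move while the pinned handle is held.
            MyValueType* p = (MyValueType*)gch.AddrOfPinnedObject().ToPointer() + 33;
            p->i1 = 42;
        }
        finally
        {
            gch.Free();
        }
    }
}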
Puzzling over this, and I'm guessing here, but it looks like the compiler is taking &temp (a fixed pointer to the temp array) and then indexing it with [33]. So you're pinning the temp array rather than the node. Try...
fixed (MyValueType* pe = &(temp[33]).x)
*(long*)pe = *(long*)&e;
In an answer to his own controversial question, Mash has illustrated that you don't need the "unsafe" keyword to read and write directly to the bytes of any .NET object instance. You can declare the following types:
[StructLayout(LayoutKind.Explicit)]
struct MemoryAccess
{
[FieldOffset(0)]
public object Object;
[FieldOffset(0)]
public TopBytes Bytes;
}
class TopBytes
{
public byte b0;
public byte b1;
public byte b2;
public byte b3;
public byte b4;
public byte b5;
public byte b6;
public byte b7;
public byte b8;
public byte b9;
public byte b10;
public byte b11;
public byte b12;
public byte b13;
public byte b14;
public byte b15;
}
And then you can do things like change an "immutable" string. The following code prints "bar" on my machine:
string foo = "foo";
MemoryAccess mem = new MemoryAccess();
mem.Object = foo;
mem.Bytes.b8 = (byte)'b';
mem.Bytes.b10 = (byte)'a';
mem.Bytes.b12 = (byte)'r';
Console.WriteLine(foo);
You can also trigger an AccessViolationException by corrupting object references with the same technique.
Question: I thought that (in pure managed C# code) the unsafe keyword was necessary to do things like this. Why is it not necessary here? Does this mean that pure managed "safe" code is not really safe at all?
OK, that is nasty... the dangers of using a union. It may work, but it isn't a very good idea; I guess I'd compare it to reflection (where you can do most things). I'd be interested to see whether this works in a constrained-access environment; if so, it may represent a bigger problem...
I've just tested it without the "Full Trust" flag, and the runtime rejects it:
Could not load type 'MemoryAccess' from assembly 'ConsoleApplication4, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' because objects overlapped at offset 0 and the assembly must be verifiable.
And to have this flag, you already need high trust - so you can already do more nasty things. Strings are a slightly different case, because they aren't normal .NET objects - but there are other examples of ways to mutate them - the "union" approach is an interesting one, though. For another hacky way (with enough trust):
string orig = "abc ", copy = orig;
typeof(string).GetMethod("AppendInPlace",
BindingFlags.NonPublic | BindingFlags.Instance,
null, new Type[] { typeof(string), typeof(int) }, null)
.Invoke(orig, new object[] { "def", 3 });
Console.WriteLine(copy); // note we didn't touch "copy", so we have
// mutated the same reference
Whoops, I've muddled unsafe with fixed. Here's a corrected version:
The reason the sample code does not require tagging with the unsafe keyword is that it does not contain pointers (see the quote below for why this is regarded as unsafe). You are quite correct: "safe" might better be termed "runtime friendly". For more information on this topic I refer you to Don Box and Chris Sells' Essential .NET.
To quote MSDN,
In the common language runtime (CLR), unsafe code is referred to as unverifiable code. Unsafe code in C# is not necessarily dangerous; it is just code whose safety cannot be verified by the CLR. The CLR will therefore only execute unsafe code if it is in a fully trusted assembly. If you use unsafe code, it is your responsibility to ensure that your code does not introduce security risks or pointer errors.
The difference between fixed and unsafe is that fixed stops the CLR from moving things around in memory, so that things outside the runtime can safely access them, whereas unsafe is about exactly the opposite problem: while the CLR can guarantee correct resolution for a .NET reference, it cannot do so for a pointer. You may recall various Microsofties going on about how a reference is not a pointer; this is why they make such a fuss about that subtle distinction.
You are still opting out of the 'managed' bit. There is the underlying assumption that if you can do that then you know what you're doing.