C# unsafe pointer fields - c#

Is this going to break? It compiles fine, but based on what I've read, I'm unsure whether it's guaranteed that _ptRef will always point to the struct referenced in the constructor.
I guess by 'break' I mean...will the GC move the struct pointed to by the pointer (_ptRef)?
public unsafe class CPointType0
{
private PointType0* _ptRef = null;
public CPointType0(ref PointType0 spt)
{
fixed (PointType0 * pt = &spt)
{
_ptRef = pt;
}
}
// ...a bunch of property accessors to fields of _ptRef (accessed as return _ptRef->Thing)
}
The scenario is
-PointType0 is a struct.
-millions of PointType0's in memory in a data structure. These used to be reference types but there gets to be way too much memory overhead.
-A List is returned only when a search operation finds a relevant PointType0, and this List is passed around and operated on a lot.

It's not safe.
After the code leaves the fixed block, the garbage collector is free to move things around again. What are you trying to accomplish here? Do you maybe want to use the index of an item in the list, instead of a pointer?
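The index suggestion might look like the following minimal sketch. It assumes the points live in an array that outlives the wrapper; PointRef and the X and Y fields are hypothetical names, since the real fields of PointType0 are not shown in the question.

```csharp
using System;

// Hypothetical stand-in for the asker's struct; the real fields are not shown.
public struct PointType0
{
    public double X;
    public double Y;
}

// Holds (array, index) instead of a raw pointer. The array reference is
// tracked by the runtime, so the GC may move the array without breaking this.
public sealed class PointRef
{
    private readonly PointType0[] _points;
    private readonly int _index;

    public PointRef(PointType0[] points, int index)
    {
        if (points == null) throw new ArgumentNullException("points");
        if (index < 0 || index >= points.Length) throw new ArgumentOutOfRangeException("index");
        _points = points;
        _index = index;
    }

    // An array element is a variable, so the setters update the struct in place.
    public double X
    {
        get { return _points[_index].X; }
        set { _points[_index].X = value; }
    }

    public double Y
    {
        get { return _points[_index].Y; }
        set { _points[_index].Y = value; }
    }
}
```

Note that this in-place update works for arrays but not for List<PointType0>, whose indexer returns a copy of the struct.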

Related

c# Is it possible to recreate an array?

I have a class, CircularBuffer, which has a method CreateBuffer. The class does a number of things but occasionally I need to change the size of an array that is used in the class. I do not need the data any longer. Here is the class:
static class CircularBuffer
{
static Array[,] buffer;
static int columns, rows;
public static void CreateBuffer(int columns, int rows)
{
buffer = new Array[rows,columns];
}
//other methods that use the buffer
}
Now the size of the buffer is up to 100 x 2048 floats. Is this going to cause any memory issues, or will it be automatically replaced with no issues?
Thanks
You are, technically speaking, not recreating anything. You are simply creating a new array and overwriting the variable's value (the address, so to speak, of the array it's referencing).
It's important therefore that you distinguish what you are really replacing: you are not replacing the array, only the reference to it.
Problems? None. With your code, the old array will no longer be reachable and will therefore be eligible for collection by the GC. Whether and when the collection actually happens is up to the GC, but it's not something you should worry about.
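A minimal sketch of what "replacing the reference" means here. It assumes the element type is float (the question mentions floats, although the posted code declares Array[,]); the Rows and Columns properties are added only for illustration.

```csharp
using System;

static class CircularBuffer
{
    // Assumption: float[,] rather than the Array[,] in the question.
    static float[,] buffer;

    public static int Rows { get { return buffer == null ? 0 : buffer.GetLength(0); } }
    public static int Columns { get { return buffer == null ? 0 : buffer.GetLength(1); } }

    public static void CreateBuffer(int columns, int rows)
    {
        // Allocates a new array and overwrites the reference. If nothing else
        // still references the old array, it becomes eligible for collection.
        buffer = new float[rows, columns];
    }
}
```

Calling CircularBuffer.CreateBuffer(2048, 100) and later CircularBuffer.CreateBuffer(16, 4) leaves only the 4 x 16 array reachable; no explicit cleanup of the first array is needed.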

Clear structure in c#

Say I have a structure with several members eg.
private struct MyStats
{
public int packsGood, packsBad, packsTotal;
public bool haveAcceptedStart;
// ...and a bunch of other parameters
}
If I have a member variable of type MyStats, how do I use it for a while and then quickly clear it? In Delphi or C++ I might have used the following code:
memset(&m_stats, 0, sizeof(MyStats)); // C++
ZeroMemory(@m_stats, SizeOf(MyStats)); // Delphi
but that doesn't appear possible in C#. Surely I don't have to rattle through all members each time or PInvoke an API call?
Memory Management should generally not be one of your concerns in C#, as opposed to C or C++. There is the garbage collector to take care of memory clearing. As soon as all the references to your struct have disappeared, the garbage collector will reclaim the memory somewhere in the future. When exactly is irrelevant for a developer.
If you have a field in your class just to be used for a while and then cleared, then your class isn't properly designed and does not comply with OOP principles.
If you are using that field just as an auxiliary value you should just replace it with a variable in every scope it is used; in this way you'll reduce the number of daunting side effects and since it's a value type it will be allocated on the stack and cleared as soon as the control flow leaves the current block (given you don't capture it in a lambda expression).
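For reference, the closest C# counterpart to the memset call in the question is a plain assignment of the struct's default value. A minimal sketch, where PacketCounter, RecordGood, Total and Reset are hypothetical names wrapped around the asker's MyStats:

```csharp
using System;

public class PacketCounter
{
    private struct MyStats
    {
        public int packsGood, packsBad, packsTotal;
        public bool haveAcceptedStart;
    }

    private MyStats m_stats;

    public void RecordGood()
    {
        m_stats.packsGood++;
        m_stats.packsTotal++;
    }

    public int Total { get { return m_stats.packsTotal; } }

    // default(MyStats) is the all-zero value of the struct, so this single
    // assignment resets every field (numbers to 0, bools to false).
    public void Reset()
    {
        m_stats = default(MyStats);
    }
}
```

No P/Invoke or per-member assignment is required; the assignment copies a zeroed value over the field.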

How to design an api to a persistent collection in C#?

I am thinking about creating a persistent collection (lists or other) in C#, but I can't figure out a good API.
I use 'persistent' in the Clojure sense: a persistent list is a list that behaves as if it has value semantics instead of reference semantics, but does not incur the overhead of copying large value types. Persistent collections use copy-on-write to share internal structure. Pseudocode:
l1 = PersistentList()
l1.add("foo")
l1.add("bar")
l2 = l1
l1.add("baz")
print(l1) # ==> ["foo", "bar", "baz"]
print(l2) # ==> ["foo", "bar"]
# l1 and l2 share a common structure of ["foo", "bar"] to save memory
Clojure uses such datastructures, but additionally in Clojure all data structures are immutable. There is some overhead in doing all the copy-on-write stuff so Clojure provides a workaround in the form of transient datastructures that you can use if you are sure you're not sharing the datastructure with anyone else. If you have the only reference to a datastructure, why not mutate it directly instead of going through all the copy-on-write overhead.
One way to get this efficiency gain would be to keep a reference count on your datastructure (though I don't think Clojure works that way). If the refcount is 1, you're holding the only reference so do the updates destructively. If the refcount is higher, someone else is also holding a reference to it that's supposed to behave like a value type, so do copy-on-write to not disturb the other referrers.
In the API to such a datastructure, one could expose the refcounting, which makes the API seriously less usable, or one could not do the refcounting, leading to unnecessary copy-on-write overhead if every operation is COW'ed, or the API loses its value type behaviour and the user has to manage when to do COW manually.
If C# had copy constructors for structs, this would be possible. One could define a struct containing a reference to the real datastructure, and do all the incref()/decref() calls in the copy constructor and destructor of the struct.
Is there a way to do something like reference counting or struct copy constructors automatically in C#, without bothering the API users?
Edit:
Just to be clear, I'm just asking about the API. Clojure already has an implementation of this written in Java.
It is certainly possible to make such an interface by using a struct with a reference to the real collection that is COW'ed on every operation. The use of refcounting would be an optimisation to avoid unnecessary COWing, but apparently isn't possible with a sane API.
What you're looking to do isn't possible, strictly speaking. You could get close by using static functions that do the reference counting, but I understand that that isn't a terribly palatable option.
Even if it were possible, I would stay away from this. While the semantics you describe may well be useful in Clojure, this cross between value type and reference type semantics will be confusing to most C# developers (mutable value types--or types with value type semantics that are mutable--are also usually considered Evil).
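For comparison, structural sharing without any mutable value semantics can be sketched as an immutable linked list, where every "modification" returns a new list that reuses the old one as its tail. This is only an illustration of the persistent idea from the question, not something the answer above proposes; PersistentStack and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Immutable singly linked list: Push returns a new list whose tail is the
// existing list, so both lists share all previously stored nodes.
public sealed class PersistentStack<T>
{
    public static readonly PersistentStack<T> Empty =
        new PersistentStack<T>(default(T), null, 0);

    private readonly T _head;
    private readonly PersistentStack<T> _tail;
    private readonly int _count;

    private PersistentStack(T head, PersistentStack<T> tail, int count)
    {
        _head = head;
        _tail = tail;
        _count = count;
    }

    public int Count { get { return _count; } }

    public PersistentStack<T> Push(T value)
    {
        return new PersistentStack<T>(value, this, _count + 1);
    }

    // Enumerates from the most recently pushed item to the oldest.
    public IEnumerable<T> Items()
    {
        for (var node = this; node._count > 0; node = node._tail)
            yield return node._head;
    }
}
```

With var l1 = PersistentStack<string>.Empty.Push("foo").Push("bar"); var l2 = l1; l1 = l1.Push("baz"); the variable l2 still sees two items while l1 sees three, and no copying takes place.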
You may use the WeakReference class as an alternative to refcounting and achieve some of the benefits that refcounting gives you. When you hold the only copy to an object in a WeakReference, it will be garbage collected. WeakReference has some hooks for you to inspect whether that's been the case.
EDIT 3: While this approach does do the trick, I'd urge you to stay away from pursuing value semantics on C# collections. Users of your structure do not expect this kind of behavior on the platform. These semantics add confusion and the potential for mistakes.
EDIT 2: Added an example. @AdamRobinson: I'm afraid I was not clear about how WeakReference can be of use. I must warn that performance-wise, most of the time it might be even worse than doing a naive Copy-On-Write at every operation. This is due to the Garbage Collector call. Therefore this is merely an academic solution, and I cannot recommend its use in production systems. It does do exactly what you ask, however.
class Program
{
static void Main(string[] args)
{
var l1 = default(COWList);
l1.Add("foo"); // initialize
l1.Add("bar"); // no copy
l1.Add("baz"); // no copy
var l2 = l1;
l1.RemoveAt(0); // copy
l2.Add("foobar"); // no copy
l1.Add("barfoo"); // no copy
l2.RemoveAt(1); // no copy
var l3 = l2;
l3.RemoveAt(1); // copy
Trace.WriteLine(l1.ToString()); // bar baz barfoo
Trace.WriteLine(l2.ToString()); // foo baz foobar
Trace.WriteLine(l3.ToString()); // foo foobar
}
}
struct COWList
{
List<string> theList; // Contains the actual data
object dummy; // helper variable to facilitate detection of copies of this struct instance.
WeakReference weakDummy; // helper variable to facilitate detection of copies of this struct instance.
/// <summary>
/// Check whether this COWList has already been constructed properly.
/// </summary>
/// <returns>true when this COWList has already been initialized.</returns>
bool EnsureInitialization()
{
if (theList == null)
{
theList = new List<string>();
dummy = new object();
weakDummy = new WeakReference(dummy);
return false;
}
else
{
return true;
}
}
void EnsureUniqueness()
{
if (EnsureInitialization())
{
// If the COWList has been copied, removing the 'dummy' reference will not kill weakDummy because the copy retains a reference.
dummy = new object();
GC.Collect(2); // OUCH! This is expensive. You may replace it with GC.Collect(0), but that will cause spurious Copy-On-Write behaviour.
if (weakDummy.IsAlive) // I don't know if the GC guarantees detection of all GC'able objects, so there might be cases in which the weakDummy is still considered to be alive.
{
// At this point there is probably a copy.
// To be safe, do the expensive Copy-On-Write
theList = new List<string>(theList);
// Prepare for the next modification
weakDummy = new WeakReference(dummy);
Trace.WriteLine("Made copy.");
}
else
{
// At this point it is guaranteed there is no copy.
weakDummy.Target = dummy;
Trace.WriteLine("No copy made.");
}
}
else
{
Trace.WriteLine("Initialized an instance.");
}
}
public void Add(string val)
{
EnsureUniqueness();
theList.Add(val);
}
public void RemoveAt(int index)
{
EnsureUniqueness();
theList.RemoveAt(index);
}
public override string ToString()
{
if (theList == null)
{
return "Uninitialized COWList";
}
else
{
var sb = new StringBuilder("[ ");
foreach (var item in theList)
{
sb.Append("\"").Append(item).Append("\" ");
}
sb.Append("]");
return sb.ToString();
}
}
}
This outputs:
Initialized an instance.
No copy made.
No copy made.
Made copy.
No copy made.
No copy made.
No copy made.
Made copy.
[ "bar" "baz" "barfoo" ]
[ "foo" "baz" "foobar" ]
[ "foo" "foobar" ]
I read what you're asking for, and I'm thinking of a "terminal-server"-type API structure.
First, define an internal, thread-safe singleton class that will be your "server"; it actually holds the data you're looking at. It will expose a Get and Set method that will take the string of the value being set or gotten, controlled by a ReaderWriterLock to ensure that the value can be read by anyone, but not while anyone's writing and only one person can write at a time.
Then, provide a factory for a class that is your "terminal"; this class will be public, and contains a reference to the internal singleton (which otherwise cannot be seen). It will contain properties that are really just pass-throughs for the singleton instance. In this way, you can provide a large number of "terminals" that will all see the same data from the "server", and will be able to modify that data in a thread-safe way.
You could use copy constructors and a list of the values accessed by each instance to provide copy-type knowledge. You can also mash up the value names with the object's handle to support cases where L1 and L2 share an A, but L3 has a different A because it was declared separately. Or, L3 can get the same A that L1 and L2 have. However you structure this, I would very clearly document how it should be expected to behave, because this is NOT the way things behave in basic .NET.
I'd like to have something like this on a flexible tree collection object of mine, though it wouldn't be by using value-type semantics (which would be essentially impossible in .net) but by having a clone generate a "virtual" deep clone instead of actually cloning every node within the collection. Instead of trying to keep an accurate reference count, every internal node would have three states:
Flexible
SharedImmutable
UnsharedMutable
Calling Clone() on a sharedImmutable node would simply yield the original object; calling Clone on a Flexible node would turn it into a SharedImmutable one. Calling Clone on an unshared mutable node would create a new node holding clones of all its descendents; the new object would be Flexible.
Before an object could be written, it would have to be made UnsharedMutable. To make an object UnsharedMutable if it isn't already, make its parent (the node via which it was accessed) UnsharedMutable (recursively). Then if the object was SharedImmutable, clone it (using a ForceClone method) and update the parent's link to point to the new object. Finally, set the new object's state to UnsharedMutable.
An essential aspect of this technique would be having separate classes for holding the data and providing the interface to it. A statement like MyCollection["this"]["that"]["theOther"].Add("George") needs to be evaluated by having the indexing operations return an indexer class which holds a reference to MyCollection. At that point, the "Add" method would be able to act upon whatever intermediate nodes it had to in order to perform any necessary copy-on-write operations.

C#: Using pointer types as fields?

In C#, it's possible to declare a struct (or class) that has a pointer type member, like this:
unsafe struct Node
{
public Node* NextNode;
}
Is it ever safe (err.. ignore for a moment that ironic little unsafe flag..) to use this construction? I mean for longterm storage on the heap. From what I understand, the GC is free to move things around, and while it updates the references to something that's been moved, does it update pointers too? I'm guessing no, which would make this construction very unsafe, right?
I'm sure there are way superior alternatives to doing this, but call it morbid curiosity.
EDIT: There appears to be some confusion. I know that this isn't a great construction, I purely want to know if this is ever a safe construction, ie: is the pointer guaranteed to keep pointing to whatever you originally pointed it to?
The original C-code was used to traverse a tree (depth first) without recursion, where the tree is stored in an array. The array is then traversed by incrementing a pointer, unless a certain condition is met, then the pointer is set to the NextNode, where traversal continues. Of course, the same can in C# be accomplished by:
struct Node
{
public int NextNode;
... // other fields
}
Where the int is the index in the array of the next node. But for performance reasons, I'd end up fiddling with pointers and fixed arrays to avoid bounds checks anyway, and the original C-code seemed more natural.
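The index-based traversal described above could be sketched like this. The Value field, the skip predicate and the convention that a NextNode of -1 ends the walk are assumptions made for the example; the original C code is not shown.

```csharp
using System;
using System.Collections.Generic;

struct Node
{
    public int Value;     // hypothetical payload
    public int NextNode;  // index to jump to when this node's subtree is skipped
}

static class TreeWalk
{
    // Walks the array front to back. When skip(node) is true, the walk jumps
    // to node.NextNode instead of stepping to the following element.
    public static List<int> Visit(Node[] nodes, Func<Node, bool> skip)
    {
        var visited = new List<int>();
        int i = 0;
        while (i >= 0 && i < nodes.Length)
        {
            if (skip(nodes[i]))
            {
                i = nodes[i].NextNode; // -1 (assumed sentinel) ends the loop
                continue;
            }
            visited.Add(nodes[i].Value);
            i++;
        }
        return visited;
    }
}
```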
Is it ever safe to use this construction? I mean for long term storage on the heap.
Yes. Doing so is usually foolish, painful and unnecessary, but it is possible.
From what I understand, the GC is free to move things around, and while it updates the references to something that's been moved, does it update pointers too?
No. That's why we make you mark it as unsafe.
I'm guessing no, which would make this construction very unsafe, right?
Correct.
I'm sure there are way superior alternatives to doing this, but call it morbid curiosity.
There certainly are.
is the pointer guaranteed to keep pointing to whatever you originally pointed it to?
Not unless you ensure that happens. There are two ways to do that.
Way one: Tell the garbage collector to not move the memory. There are two ways to do that:
Fix a variable in place with the "fixed" statement.
Use interop services to create a gc handle to the structures you wish to keep alive and in one place.
Doing either of these things will with high likelihood wreck the performance of the garbage collector.
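The GC handle option can be sketched as follows. This is a minimal example assuming a plain int[] payload; PinnedDemo and SumPinned are hypothetical names, and the code must be compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinnedDemo
{
    public static unsafe int SumPinned(int[] data)
    {
        // Pin the array so the GC cannot relocate it while raw pointers exist.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            int* p = (int*)handle.AddrOfPinnedObject().ToPointer();
            int sum = 0;
            for (int i = 0; i < data.Length; i++)
            {
                sum += p[i];
            }
            return sum;
        }
        finally
        {
            // Once the handle is freed the array may move again,
            // so p must not be used past this point.
            handle.Free();
        }
    }
}
```

Holding such a handle for a long time is exactly the case the warning about garbage collector performance refers to.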
Way two: Don't take references to memory that the garbage collector can possibly move. There are two ways to do that:
Only take addresses of local variables, value parameters, or stack-allocated blocks. Of course, in doing so you are then required to ensure that the pointers do not survive longer than the relevant stack frame, otherwise, you're referencing garbage.
Allocate a block out of the unmanaged heap and then use pointers inside that block. In essence, implement your own memory manager. You are required to correctly implement your new custom memory manager. Be careful.
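The unmanaged-heap option might look like the sketch below: one block is allocated with Marshal.AllocHGlobal, the nodes are linked with raw pointers inside it, and the block is released in Dispose. NativeNodeBlock, Value and Sum are hypothetical names; the node payload has to be an unmanaged type, and the code needs unsafe compilation enabled.

```csharp
using System;
using System.Runtime.InteropServices;

public unsafe struct Node
{
    public int Value;   // hypothetical payload; must not be a managed type
    public Node* Next;
}

// Owns one unmanaged block of nodes. The GC never sees or moves this memory,
// so pointers into it stay valid until Dispose is called.
public sealed unsafe class NativeNodeBlock : IDisposable
{
    private Node* _nodes;
    private readonly int _count;

    public NativeNodeBlock(int[] values)
    {
        _count = values.Length;
        int bytes = sizeof(Node) * Math.Max(_count, 1);
        _nodes = (Node*)Marshal.AllocHGlobal(bytes).ToPointer();
        for (int i = 0; i < _count; i++)
        {
            _nodes[i].Value = values[i];
            _nodes[i].Next = (i < _count - 1) ? &_nodes[i + 1] : null;
        }
    }

    public Node* First { get { return _count > 0 ? _nodes : null; } }

    // Follows the Next pointers from the first node to the end of the chain.
    public int Sum()
    {
        int total = 0;
        for (Node* n = First; n != null; n = n->Next)
        {
            total += n->Value;
        }
        return total;
    }

    public void Dispose()
    {
        if (_nodes != null)
        {
            Marshal.FreeHGlobal((IntPtr)_nodes);
            _nodes = null;
        }
    }
}
```

Any Node* obtained from First becomes dangling after Dispose, which is the "be careful" part of writing your own memory manager.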
Some obvious integrity checks have been excluded. The obvious problem with this is you have to allocate more than you will need because you cannot reallocate the buffer as the keyword fixed implies.
public unsafe class NodeList
{
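// Illustrative only: fixed-size buffers accept only primitive element types (int, byte, ...), so this declaration does not compile as written.
fixed Node _Nodes[1024];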
Node* _Current;
public NodeList(params String[] data)
{
for (int i = 0; i < data.Length; i++)
{
_Nodes[i].Data = data[i];
// The last node has no successor, so compare against data.Length - 1.
_Nodes[i].Next = (i < data.Length - 1 ? &_Nodes[i + 1] : null);
}
_Current = &_Nodes[0];
}
public Node* Current()
{
return _Current++;
}
}
public unsafe struct Node
{
public String Data;
public Node* Next;
}
Why not:
struct Node
{
public Node NextNode;
}
or at least:
struct Node
{
public IntPtr NextNode;
}
You could use the fixed statement to prevent the GC from moving the pointed-to object around.
Yes, the garbage collector can move the objects around and, no, it will not update your pointers. You need to fix the objects you point to. More information can be found on this memory management explanation.
You can fix objects like this:
unsafe {
fixed (byte* pPtr = byteArray) {
// byteArray (a byte[]) stays fixed in memory until this block ends
}
}
The advantages of pointers are usually performance and interaction with other unsafe code. There will be no out-of-bounds checks etc, speeding up your code. But just as if you were programming in e.g. C you have to be very careful of what you are doing.
A dangerous idea, but it may work:
When your array of structs exceeds a certain size (85000 bytes) it will be allocated on the Large Object Heap where blocks are scanned and collected but not moved...
The linked article points out the danger that a newer CLR version might move stuff on the LOH...

Pinning a delegate within a struct before passing to unmanaged code

I'm trying to use an unmanaged C dll for loading image data into a C# application. The library has a fairly simple interface where you pass in a struct that contains three callbacks, one to receive the size of the image, one that receives each row of the pixels and finally one called when the load is completed. Like this (C# managed definition):
[System.Runtime.InteropServices.StructLayoutAttribute(System.Runtime.InteropServices.LayoutKind.Sequential)]
public struct st_ImageProtocol
{
public st_ImageProtocol_done Done;
public st_ImageProtocol_setSize SetSize;
public st_ImageProtocol_sendLine SendLine;
}
The types starting with st_ImageProtocol are delegates:
public delegate int st_ImageProtocol_sendLine(System.IntPtr localData, int rowNumber, System.IntPtr pixelData);
With the test file that I'm using, SetSize should get called once, then SendLine should get called 200 times (once for each row of pixels in the image), and finally the Done callback gets triggered. What actually happens is that SendLine is called 19 times and then an AccessViolationException is thrown claiming that the library tried to access protected memory.
I have access to the code of the C library (though I can't change the functionality) and during the loop where it calls the SendLine method it does not allocate or free any new memory, so my assumption is that the delegate itself is the issue and I need to pin it before I pass it in (I have no code inside the delegate itself currently, besides a counter to see how often it gets called, so I doubt I'm breaking anything on the managed side). The problem is that I don't know how to do this; the method I've been using to declare the structs in unmanaged space doesn't work with delegates (Marshal.AllocHGlobal()) and I can't find any other suitable method. The delegates themselves are static fields in the Program class so they shouldn't be being garbage collected, but I guess the runtime could be moving them.
This blog entry by Chris Brumme says that delegates don't need to be pinned before being passed into unmanaged code:
Clearly the unmanaged function pointer must refer to a fixed address. It would be a disaster if the GC were relocating that! This leads many applications to create a pinning handle for the delegate. This is completely unnecessary. The unmanaged function pointer actually refers to a native code stub that we dynamically generate to perform the transition & marshaling. This stub exists in fixed memory outside of the GC heap.
But I don't know if this holds true when the delegate is part of a struct. It does imply that it is possible to manually pin them though, and I'm interested in how to do this or any better suggestions as to why a loop would run 19 times then suddenly fail.
Thanks.
Edited to answer Johan's questions...
The code that allocates the struct is as follows:
_sendLineFunc = new st_ImageProtocol_sendLine(protocolSendLineStub);
_imageProtocol = new st_ImageProtocol()
{
//Set some other properties...
SendLine = _sendLineFunc
};
int protocolSize = Marshal.SizeOf(_imageProtocol);
_imageProtocolPtr = Marshal.AllocHGlobal(protocolSize);
Marshal.StructureToPtr(_imageProtocol, _imageProtocolPtr, true);
Where the _sendLineFunc and the _imageProtocol variables are both static fields of the Program class. If I understand the internals of this correctly, that means that I'm passing an unmanaged pointer to a copy of the _imageProtocol variable into the C library, but that copy contains a reference to the static _sendLineFunc. This should mean that the copy isn't touched by the GC - since it is unmanaged - and the delegate won't be collected since it is still in scope (static).
The struct actually gets passed to the library as a return value from another callback, but as a pointer:
private static IntPtr beginCallback(IntPtr localData, en_ImageType imageType)
{
return _imageProtocolPtr;
}
Basically there is another struct type that holds the image filename and the function pointer to this callback, the library figures out what type of image is stored in the file and uses this callback to request the correct protocol struct for the given type. My filename struct is declared and managed in the same way as the protocol one above, so probably contains the same mistakes, but since this delegate is only called once and called quickly I haven't had any problems with it yet.
Edited to update
Thanks to everybody for their responses, but after spending another couple of days on the problem and making no progress I decided to shelve it. In case anyone is interested I was attempting to write a tool for users of the Lightwave 3D rendering application and a nice feature would have been the ability to view all the different image formats that Lightwave supports (some of which are fairly exotic). I thought that the best way to do this would be to write a C# wrapper for the plugin architecture that Lightwave uses for image manipulation so I could use their code to actually load the files. Unfortunately after trying a number of the plugins against my solution I had a variety of errors that I couldn't understand or fix and my guess is that Lightwave doesn't call the methods on the plugins in a standard way, probably to improve the security of running external code (wild stab in the dark, I admit). For the time being I'm going to drop the image feature and if I do decide to reinstate it I'll approach it in a different way.
Thanks again, I learnt a lot through this process even though I didn't get the result I wanted.
I had a similar problem when registering a callback delegate (it would be called, then poof!). My problem was that the object with the method being delegated was getting GC'ed. I created the object in a more global place so as to keep it from being GC'ed.
If something like that doesn't work, here are some other things to look at:
As additional info, take a look at GetFunctionPointerForDelegate from the Marshal class. That is another way you could do this. Just make sure that the delegates are not GC'ed. Then, instead of delegates in your struct, declare them as IntPtr.
That may not solve the pinning, but take a look at fixed keyword, even though that may not work for you since you are dealing with a longer lifetime than for what that is typically used.
Finally, look at stackalloc for creating non-GC memory. These methods will require the use of unsafe, and might therefore put some other constraints on your Assemblies.
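The GetFunctionPointerForDelegate suggestion could be sketched as below. The struct mirrors the field order from the question but stores IntPtr values; ImageProtocolNative, SendLineCallback, ProtocolHolder and OnSendLine are hypothetical names, and the Done and SetSize callbacks are left unset for brevity.

```csharp
using System;
using System.Runtime.InteropServices;

public delegate int SendLineCallback(IntPtr localData, int rowNumber, IntPtr pixelData);

[StructLayout(LayoutKind.Sequential)]
public struct ImageProtocolNative
{
    public IntPtr Done;
    public IntPtr SetSize;
    public IntPtr SendLine;
}

public static class ProtocolHolder
{
    // The static field keeps the delegate reachable, so it is not collected
    // while native code still holds the function pointer.
    private static readonly SendLineCallback _sendLine = OnSendLine;

    private static int OnSendLine(IntPtr localData, int rowNumber, IntPtr pixelData)
    {
        return 0; // placeholder body
    }

    public static ImageProtocolNative Create()
    {
        return new ImageProtocolNative
        {
            // Done and SetSize would be filled in the same way.
            SendLine = Marshal.GetFunctionPointerForDelegate(_sendLine)
        };
    }
}
```

Because the struct now contains only IntPtr fields, it is blittable and can be copied into unmanaged memory with Marshal.StructureToPtr without any delegate marshalling.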
It would be interesting to know a little more:
How do you create the ImageProtocol struct? Is it a local variable or a class member or do you allocate it in unmanaged memory with Marshal.AllocHGlobal?
How is it sent to the C function? Directly as stack variable or as a pointer?
A really tricky problem! It feels like the delegate data is moved around by the GC which causes the access violation. The interesting thing is that the delegate data type is a reference data type, which stores its data on the GC heap. This data contains things like the address of the function to call (function pointer) but also a reference to the object that contains the function. This should mean that even though the actual function code is stored outside of the GC heap, the data that holds the function pointer is stored in the GC heap and can hence be moved by the GC. I thought about the problem a lot last night but haven't come up with a solution....
You don't say exactly how the callback is declared in the C library. Unless it is explicitly declared __stdcall you'll slowly corrupt your stack. You'll see your method get called (probably with the parameters reversed) but at some point in the future the program will crash.
So far as I know there is no way around that, other than writing another callback function in C that sits between the C# code and the library that wants a __cdecl callback.
If the C function is a __cdecl function then you have to use the attribute
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
before the delegate declaration.
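Applied to the delegate from the question, that would look like the following sketch (assuming the C library really does use __cdecl for this callback, which the question does not state):

```csharp
using System;
using System.Runtime.InteropServices;

// Tells the marshaller that native callers invoke this callback using the
// cdecl calling convention instead of the default.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate int st_ImageProtocol_sendLine(IntPtr localData, int rowNumber, IntPtr pixelData);
```

The other callback delegates in the struct would need the same attribute.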
