In C#, it's possible to declare a struct (or class) that has a pointer type member, like this:
unsafe struct Node
{
    public Node* NextNode;
}
Is it ever safe (well, ignore for a moment that ironic little unsafe flag) to use this construction? I mean for long-term storage on the heap. From what I understand, the GC is free to move things around, and while it updates references to objects that have been moved, does it update pointers too? I'm guessing not, which would make this construction very unsafe, right?
I'm sure there are way superior alternatives to doing this, but call it morbid curiosity.
EDIT: There appears to be some confusion. I know that this isn't a great construction; I purely want to know whether it is ever a safe one, i.e., is the pointer guaranteed to keep pointing to whatever you originally pointed it to?
The original C code was used to traverse a tree (depth-first) without recursion, where the tree is stored in an array. The array is traversed by incrementing a pointer unless a certain condition is met, in which case the pointer is set to NextNode and traversal continues from there. Of course, the same can be accomplished in C# with:
struct Node
{
    public int NextNode;
    ... // other fields
}
where the int is the index of the next node in the array. But for performance reasons, I'd end up fiddling with pointers and fixed arrays to avoid bounds checks anyway, and the original C code seemed more natural.
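A minimal sketch of the index-based traversal described above. It assumes the nodes are stored in depth-first (pre-order) order and that NextNode holds the index of the first node after a node's subtree. The names FlatNode, FlatTree, Traverse and the skipSubtree predicate are hypothetical stand-ins for the original fields and "certain condition":

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node layout: nodes are stored in pre-order, and NextNode is the
// index of the first node after this node's subtree.
public struct FlatNode
{
    public int NextNode;
    public int Value; // stand-in for the "other fields"
}

public static class FlatTree
{
    // Depth-first traversal with no recursion and no explicit stack.
    // skipSubtree plays the role of the original "certain condition".
    public static List<int> Traverse(FlatNode[] nodes, Func<FlatNode, bool> skipSubtree)
    {
        var visited = new List<int>();
        int i = 0;
        while (i < nodes.Length)
        {
            if (skipSubtree(nodes[i]))
            {
                i = nodes[i].NextNode; // jump past this node's whole subtree
            }
            else
            {
                visited.Add(nodes[i].Value);
                i++; // the next array slot is the next node in pre-order
            }
        }
        return visited;
    }
}
```

For example, the tree A(B(C, D), E) stored as indices 0 to 4 has NextNode values 5, 4, 3, 4, 5. Skipping B visits only A and E. The array indexing is bounds-checked, which is the cost the question wants to avoid with pointers.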
Is it ever safe to use this construction? I mean for long term storage on the heap.
Yes. Doing so is usually foolish, painful and unnecessary, but it is possible.
From what I understand, the GC is free to move things around, and while it updates the references to something that's been moved, does it update pointers too?
No. That's why we make you mark it as unsafe.
I'm guessing no, which would make this construction very unsafe, right?
Correct.
I'm sure there are way superior alternatives to doing this, but call it morbid curiosity.
There certainly are.
is the pointer guaranteed to keep pointing to whatever you originally pointed it to?
Not unless you ensure that happens. There are two ways to do that.
Way one: Tell the garbage collector to not move the memory. There are two ways to do that:
Fix a variable in place with the "fixed" statement.
Use interop services (System.Runtime.InteropServices.GCHandle with GCHandleType.Pinned) to create a GC handle to the structures you wish to keep alive and in one place.
Doing either of these things is likely to wreck the performance of the garbage collector.
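A minimal sketch of the GC-handle option. The helper name PinnedDemo.ReadThroughPinnedAddress is hypothetical. It pins an array with GCHandle.Alloc(..., GCHandleType.Pinned) and reads an element through the raw address with Marshal.ReadInt32, so it does not even need an unsafe context:

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedDemo
{
    // Pins the array, reads one element through its raw address,
    // then releases the pin so the GC may move the array again.
    public static int ReadThroughPinnedAddress(int[] values, int index)
    {
        GCHandle handle = GCHandle.Alloc(values, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject(); // stable only while pinned
            return Marshal.ReadInt32(address, index * sizeof(int));
        }
        finally
        {
            handle.Free(); // a forgotten Free leaves the array pinned indefinitely
        }
    }
}
```

Keeping a handle like this alive for the lifetime of a data structure is what makes the pinning approach expensive: the pinned block becomes an immovable obstacle every time the GC compacts.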
Way two: Don't take references to memory that the garbage collector can possibly move. There are two ways to do that:
Only take addresses of local variables, value parameters, or stack-allocated blocks. Of course, in doing so you are then required to ensure that the pointers do not survive longer than the relevant stack frame, otherwise, you're referencing garbage.
Allocate a block out of the unmanaged heap and then use pointers inside that block. In essence, implement your own memory manager. You are required to correctly implement your new custom memory manager. Be careful.
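A rough sketch of the unmanaged-heap option, assuming a simple bump allocator that frees everything at once is enough. NativeNode and NativeNodePool are hypothetical names. Memory from Marshal.AllocHGlobal lives outside the managed heap, so the GC never moves it, but you must free it yourself. This requires unsafe code to be enabled (for example `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file):

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical node type with only unmanaged fields, so pointers to it are allowed.
public unsafe struct NativeNode
{
    public int Value;
    public NativeNode* Next;
}

// Hypothetical bump allocator over one block of unmanaged memory.
public unsafe sealed class NativeNodePool : IDisposable
{
    private NativeNode* _block;
    private readonly int _capacity;
    private int _used;

    public NativeNodePool(int capacity)
    {
        _capacity = capacity;
        // Unmanaged memory: the GC neither moves nor frees it.
        _block = (NativeNode*)Marshal.AllocHGlobal(sizeof(NativeNode) * capacity);
    }

    public NativeNode* Allocate(int value, NativeNode* next)
    {
        if (_block == null) throw new ObjectDisposedException(nameof(NativeNodePool));
        if (_used == _capacity) throw new InvalidOperationException("Pool exhausted.");
        NativeNode* node = _block + _used++;
        node->Value = value;
        node->Next = next;
        return node;
    }

    // Releases every node at once; all pointers handed out become dangling.
    public void Dispose()
    {
        if (_block != null)
        {
            Marshal.FreeHGlobal((IntPtr)_block);
            _block = null;
        }
    }
}
```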
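Some obvious integrity checks have been omitted. The obvious problem with this approach is that you have to allocate more than you will need, because the buffer cannot be reallocated without invalidating every pointer into it. (C# only allows fixed-size buffers of primitive types, and only inside structs, and a pointer cannot target a struct with a managed String field. This version therefore pins an ordinary array of unmanaged nodes with a GCHandle and stores an int payload instead.)
using System;
using System.Runtime.InteropServices;

public unsafe sealed class NodeList : IDisposable
{
    private const int Capacity = 1024;
    private readonly Node[] _nodes = new Node[Capacity];
    private GCHandle _handle;   // pins _nodes so the GC cannot move it
    private Node* _current;

    public NodeList(params int[] data)
    {
        if (data.Length > Capacity) throw new ArgumentException("Too many nodes.", nameof(data));
        _handle = GCHandle.Alloc(_nodes, GCHandleType.Pinned);
        Node* first = (Node*)_handle.AddrOfPinnedObject();
        for (int i = 0; i < data.Length; i++)
        {
            first[i].Data = data[i];
            // The last node has no successor.
            first[i].Next = (i < data.Length - 1 ? &first[i + 1] : null);
        }
        _current = data.Length > 0 ? first : null;
    }

    // Returns the current node and follows its Next pointer (null once the list is exhausted).
    public Node* Current()
    {
        Node* node = _current;
        if (node != null) _current = node->Next;
        return node;
    }

    public void Dispose()
    {
        if (_handle.IsAllocated) _handle.Free();   // unpin; pointers into _nodes are invalid afterwards
    }
}

public unsafe struct Node
{
    public int Data;
    public Node* Next;
}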
Why not use a reference? (This has to be a class, because a struct cannot contain a field of its own type.)
class Node
{
    public Node NextNode;
}
or at least:
struct Node
{
    public IntPtr NextNode;
}
You could use the fixed statement to prevent the GC from moving the object your pointer refers to, but only for the duration of the fixed block.
Yes, the garbage collector can move objects around and, no, it will not update your pointers. You need to fix (pin) the objects you point to. More information can be found in this explanation of memory management.
You can fix objects like this:
byte[] buffer = new byte[16];
unsafe
{
    fixed (byte* pPtr = buffer)
    {
        // The GC will not move buffer inside this block; pPtr must not be used after it.
    }
}
The advantages of pointers are usually performance and interaction with other unsafe code. There are no bounds checks etc., which can speed up your code. But, just as when programming in C, you have to be very careful about what you are doing.
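A small sketch of what "no bounds checks" means in practice. PointerSum is a hypothetical name, and the code requires unsafe code to be enabled. Note that the JIT can already remove bounds checks from simple loops of the form `for (int i = 0; i < array.Length; i++)`, so it is worth measuring before switching to pointers:

```csharp
public static class PointerSum
{
    // Sums an array through a pinned pointer. The loop does no bounds checks,
    // so an off-by-one here would silently read past the array instead of throwing.
    public static unsafe long Sum(int[] values)
    {
        long total = 0;
        fixed (int* start = values)
        {
            int* end = start + values.Length;
            for (int* p = start; p < end; p++)
            {
                total += *p;
            }
        }
        return total;
    }
}
```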
A dangerous idea, but it may work:
When your array of structs is 85,000 bytes or larger, it will be allocated on the Large Object Heap, where blocks are scanned and collected but, by default, not moved...
The linked article points out the danger that a newer CLR version might move things on the LOH. Indeed, since .NET Framework 4.5.1 the LOH can be compacted on request through GCSettings.LargeObjectHeapCompactionMode.
Related
I was reading a description of some code written in C that gains speed by allocating temporary arrays on the stack instead of the heap for use in very hot loops. (It was described as being similar to the small buffer optimization, or SBO.) The object in question is similar to a List<T> in that it's just an array with some basic convenience functionality on top. It allocates a small section of memory to use, and if the list is expanded past the size of the array, it allocates a new array on the heap, copies the data, and updates the pointer.
I would like to do the same thing in C#, but I'm not sure how to accomplish it. I want to keep this in a safe context, so I can't use a pointer to update the data reference if it's expanded, and Span<int> doesn't have an implicit cast to int[]. Specifically:
stackalloc memory is released on method exit, so I'm not sure if there's a simpler way to use a struct like this than giving it a Span field and assigning it after creating within the method using it.
How do I neatly switch between using backing fields of different types (Span and int[]) without changing the public-facing interface?
I managed to come up with a solution, not sure if it's the best implementation, but it seems to work. I also have a couple of alternatives.
Note: This is useful for increasing speed only when you have a function that needs to create a temporary array and is called very frequently. The ability to switch to a heap allocated object is just a fallback in case you overrun the buffer.
Option 1 - Using Span and stackalloc
If you're building to .NET Core 2.1 or later, .NET Standard 2.1 or later, or can use NuGet to use the System.Memory package, the solution is really simple.
Instead of a class, use a ref struct (this is necessary to have a Span<T> field, and neither can leave the method where they're declared. If you need a long-lived class, then there's no reason to try to allocate on the stack since you'll just have to move it to the heap anyway.)
public ref struct SmallList
{
    private Span<int> data;
    private int count;
    //...
}
Then add in all your list functionality. Add(), Remove(), etc. In Add or any functions that might expand the list, add a check to make sure you don't overrun the span.
if (count == data.Length)
{
    int[] newArray = new int[data.Length * 2]; // double the capacity
    data.CopyTo(newArray);                      // copy the existing items into the heap array
    data = newArray;                            // implicit int[] -> Span<int> conversion. Easy peasy!
}
Span<T> can be used to work with stack-allocated memory, but it can also point to heap-allocated memory. So if you can't guarantee your list will always be small enough to fit on the stack, the snippet above gives you a nice fallback that shouldn't happen frequently enough to cause noticeable problems. If it does, either increase the initial stack allocation size (within reason, don't overflow!), or use another solution like an array pool.
Using the struct just requires an extra line and a constructor that takes a span to assign to the data field. Not sure if there's a way to do it all in one shot, but it's easy enough:
Span<int> span = stackalloc int[32];
SmallList list = new SmallList(span);
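And if you need to use it in a nested function (which was part of my issue), you just pass it in as a ref parameter instead of having the nested function return a list. Passing the struct by value would give the callee a copy, so items it adds would not be visible to the caller.
void DoStuff(ref SmallList results) { /* do stuff */ }
DoStuff(ref list);
//use results...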
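Putting the pieces together, a minimal sketch of such a SmallList might look like the following. Only Add, Count, the indexer and AsSpan are shown, and the growth policy (double the size, with a minimum of 4) is an assumption:

```csharp
using System;

// Hypothetical minimal SmallList: starts on caller-provided (typically stackalloc'd)
// memory and falls back to a heap array once that buffer is full.
public ref struct SmallList
{
    private Span<int> data;
    private int count;

    public SmallList(Span<int> initialBuffer)
    {
        data = initialBuffer;
        count = 0;
    }

    public int Count => count;

    public int this[int index]
    {
        get
        {
            if ((uint)index >= (uint)count) throw new ArgumentOutOfRangeException(nameof(index));
            return data[index];
        }
    }

    public void Add(int value)
    {
        if (count == data.Length)
        {
            // Grow onto the heap and copy the existing items.
            int[] newArray = new int[Math.Max(4, data.Length * 2)];
            data.Slice(0, count).CopyTo(newArray);
            data = newArray; // implicit int[] -> Span<int> conversion
        }
        data[count++] = value;
    }

    public ReadOnlySpan<int> AsSpan() => data.Slice(0, count);
}
```

Because SmallList is a mutable struct, pass it by ref to any helper that calls Add, since a by-value copy would not share the count.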
Option 2: ArrayPool
The System.Memory package also brings in the ArrayPool<T> class (namespace System.Buffers), which lets your class/struct rent arrays from a shared pool and return them afterwards, instead of allocating new ones for the garbage collector to clean up. This has comparable speed depending on the use case. It also has the benefit that it works for classes that have to live beyond a single method. It's also fairly easy to write your own if you can't use System.Memory.
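A minimal sketch of the rent/return pattern with the shared pool. PooledWork.SumOfSquares is just a hypothetical workload. Rent may hand back an array longer than requested, so the code only touches the first input.Length elements:

```csharp
using System;
using System.Buffers;

public static class PooledWork
{
    // Rents a temporary buffer from the shared pool instead of allocating one per call.
    public static int SumOfSquares(ReadOnlySpan<int> input)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(input.Length); // may be longer than requested
        try
        {
            for (int i = 0; i < input.Length; i++) buffer[i] = input[i] * input[i];
            int sum = 0;
            for (int i = 0; i < input.Length; i++) sum += buffer[i];
            return sum;
        }
        finally
        {
            ArrayPool<int>.Shared.Return(buffer); // hand it back for reuse
        }
    }
}
```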
Option 3: Pointers
You can do something like this with pointers and other unsafe code, but the question was technically asking about safe code. I just like my lists to be thorough.
Option 4: Without System.Memory
If, like me, you're using Unity / Mono, you can't use System.Memory and related features until at least 2021, which leaves you to roll your own solution. An array pool is fairly straightforward to implement, and does the job of avoiding garbage allocations. A stack-allocated array is a bit more complicated.
Luckily, someone has already done it, specifically with Unity in mind. The page linked is quite long, but includes both sample code demonstrating the concept and a code generation tool that can make a SmallBuffer class specific to your exact use case. The basic idea is to just create a struct with individual variables that you index as if they were an array.
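To illustrate that idea, here is a hand-written sketch (not the generated code from the linked page) of a four-slot buffer stored in plain fields. SmallBuffer4 and TryAdd are hypothetical names, and the buffer refuses to grow rather than falling back to the heap:

```csharp
using System;

// Hypothetical fixed-capacity buffer whose "elements" are ordinary fields,
// so it lives wherever the struct lives (e.g., a local on the stack) with no array allocation.
public struct SmallBuffer4
{
    private int _item0, _item1, _item2, _item3;
    private int _count;

    public const int Capacity = 4;

    public int Count => _count;

    public int this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_count) throw new ArgumentOutOfRangeException(nameof(index));
            switch (index)
            {
                case 0: return _item0;
                case 1: return _item1;
                case 2: return _item2;
                default: return _item3;
            }
        }
    }

    // Returns false instead of growing when all four slots are used.
    public bool TryAdd(int value)
    {
        switch (_count)
        {
            case 0: _item0 = value; break;
            case 1: _item1 = value; break;
            case 2: _item2 = value; break;
            case 3: _item3 = value; break;
            default: return false;
        }
        _count++;
        return true;
    }
}
```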
Update: I tried both these solutions and the array pool was slightly faster (and a lot easier) than the SmallBuffer in my case, so remember to profile!
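For the roll-your-own array pool mentioned in Option 4, a minimal sketch that only recycles arrays of one fixed length might look like this. SimpleArrayPool is a hypothetical name. It is not thread-safe and relies only on long-standing types (Stack<T>, Array.Clear), which should be available on older Mono/Unity profiles:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal pool that recycles arrays of a single fixed length. Not thread-safe.
public sealed class SimpleArrayPool<T>
{
    private readonly Stack<T[]> _free = new Stack<T[]>();
    private readonly int _arrayLength;

    public SimpleArrayPool(int arrayLength)
    {
        _arrayLength = arrayLength;
    }

    // Reuses a returned array if one is available; otherwise allocates a new one.
    public T[] Rent()
    {
        return _free.Count > 0 ? _free.Pop() : new T[_arrayLength];
    }

    public void Return(T[] array)
    {
        if (array == null || array.Length != _arrayLength)
            throw new ArgumentException("Array does not belong to this pool.", nameof(array));
        Array.Clear(array, 0, array.Length); // don't leak old contents to the next renter
        _free.Push(array);
    }
}
```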
In C and C++, the developer decides in which memory an object is instantiated: the stack or the heap.
In C#, it is determined by the author of the data type (struct vs. class).
You can achieve your goal using Span and pointers. https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=netcore-3.1.
But I would not recommend doing that, because your code is not safe, meaning that the CLR gives you all the responsibility for managing that memory, including at least cleaning it up when you no longer need the object. C# developers usually resort to such tricks when they want to optimize really big data collections that allocate a lot of memory on the heap.
If that is still what you are looking for, then C# is probably not the best option.
Moreover, if you have a big collection and somehow find a way to put it in stack memory, you can easily run into a StackOverflowException.
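A common way to get the stack speed-up for small inputs without risking a StackOverflowException for large ones is to stackalloc only below a size limit and fall back to a heap array otherwise. A minimal sketch, in which StackOrHeap, SumFirstN and the cutoff of 256 ints are all hypothetical (stackalloc inside a conditional expression needs C# 8 or later):

```csharp
using System;

public static class StackOrHeap
{
    // Hypothetical cutoff: small requests use the stack, large ones fall back to the heap.
    private const int MaxStackInts = 256;

    // Fills a temporary buffer with 1..n and sums it.
    public static int SumFirstN(int n)
    {
        Span<int> buffer = n <= MaxStackInts ? stackalloc int[n] : new int[n];
        for (int i = 0; i < n; i++)
        {
            buffer[i] = i + 1;
        }

        int sum = 0;
        foreach (int x in buffer)
        {
            sum += x;
        }
        return sum;
    }
}
```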
Say I have a structure with several members eg.
private struct MyStats
{
    public int packsGood, packsBad, packsTotal;
    public bool haveAcceptedStart;
    // ...and a bunch of other parameters
}
If I have a member variable of type MyStats, how do I use it for a while and then quickly clear it? In Delphi or C++ I might have used the following code:
memset(&m_stats, 0, sizeof(MyStats)); // C++
ZeroMemory(@m_stats, SizeOf(MyStats)); // Delphi
but that doesn't appear possible in C#. Surely I don't have to rattle through all members each time or PInvoke an API call?
Memory management should generally not be one of your concerns in C#, as opposed to C or C++. The garbage collector takes care of reclaiming memory. As soon as all the references to your struct have disappeared, the garbage collector will reclaim the memory at some point in the future. Exactly when is irrelevant to the developer.
If you have a field in your class that is used for a while and then cleared, your class isn't properly designed and does not comply with OOP principles.
If you are using that field just as an auxiliary value you should just replace it with a variable in every scope it is used; in this way you'll reduce the number of daunting side effects and since it's a value type it will be allocated on the stack and cleared as soon as the control flow leaves the current block (given you don't capture it in a lambda expression).
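That said, for the original question there is a one-line way to reset a struct field: assigning default(MyStats) sets every member to 0, false, or null. A sketch, in which StatsTracker and its methods are hypothetical and MyStats is made public and top-level only so the example is self-contained:

```csharp
// MyStats as in the question, made public and top-level only for this sketch.
public struct MyStats
{
    public int packsGood, packsBad, packsTotal;
    public bool haveAcceptedStart;
}

// Hypothetical owner of an m_stats field.
public class StatsTracker
{
    private MyStats m_stats;

    public MyStats Current => m_stats;

    public void RecordGood()
    {
        m_stats.packsGood++;
        m_stats.packsTotal++;
    }

    // One assignment resets every member to its default value,
    // the C# counterpart of memset/ZeroMemory on the struct.
    public void Reset() => m_stats = default(MyStats);
}
```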
Is this going to break? It compiles fine, but based on my reading, I'm unsure whether it's guaranteed that _ptRef will always point to the struct referenced in the constructor.
I guess by 'break' I mean...will the GC move the struct pointed to by the pointer (_ptRef)?
public unsafe class CPointType0
{
    private PointType0* _ptRef = null;

    public CPointType0(ref PointType0 spt)
    {
        fixed (PointType0* pt = &spt)
        {
            _ptRef = pt;
        }
    }

    // ...a bunch of property accessors to fields of _ptRef (accessed as return _ptRef->Thing)
}
The scenario is
-PointType0 is a struct.
-millions of PointType0's in memory in a data structure. These used to be reference types but there gets to be way too much memory overhead.
-A List is returned only when a search operation finds a relevant PointType0, and this List is passed around and operated on a lot.
It's not safe.
After the code leaves the fixed block, the garbage collector is free to move things around again. What are you trying to accomplish here? Do you maybe want to use the index of an item in the list, instead of a pointer?
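A minimal sketch of the index-based alternative: the wrapper keeps a reference to the shared array plus an index, so it stays valid no matter where the GC moves the array. PointType0's X and Y fields are hypothetical stand-ins for the real struct, and the ref-returning property (C# 7 or later) reads and writes the element in place without copying it:

```csharp
using System;

// Hypothetical stand-in for the question's PointType0.
public struct PointType0
{
    public double X, Y;
}

// Refers to one element of a shared array by (array, index) instead of a raw pointer.
public sealed class CPointType0
{
    private readonly PointType0[] _points;
    private readonly int _index;

    public CPointType0(PointType0[] points, int index)
    {
        if ((uint)index >= (uint)points.Length) throw new ArgumentOutOfRangeException(nameof(index));
        _points = points;
        _index = index;
    }

    // ref access to the element itself, so property writes go straight into the array.
    private ref PointType0 Point => ref _points[_index];

    public double X { get => Point.X; set => Point.X = value; }
    public double Y { get => Point.Y; set => Point.Y = value; }
}
```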
Curious about the reputed performance gains in xobotos, I checked out the binary tree benchmark code.
The Java version of the binary tree node is:
private static class TreeNode
{
    private TreeNode left, right;
    private int item;
}
The C# version is:
struct TreeNode
{
    class Next
    {
        public TreeNode left, right;
    }

    private Next next;
    private int item;
}
I'm wondering what the benefit of using a struct here is, since the left and right references are still encapsulated in a class.
Well, there is one - leaf nodes are pure value types since they don't need left and right pointers. In a typical binary tree where half the nodes are leaves, that means a 50% reduction in the number of objects. Still, the performance gains listed seem far greater.
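A compact sketch of that layout, modeled on the C# TreeNode above. The method names BottomUpTree and ItemCheck follow the usual shape of the binary-trees benchmark but are assumptions here. Leaves keep next == null, so only interior nodes allocate, and each allocation holds both children:

```csharp
// Sketch modeled on the C# TreeNode above.
struct TreeNode
{
    private sealed class Next
    {
        public TreeNode left, right;
    }

    private Next next;
    private int item;

    private TreeNode(int item)
    {
        this.item = item;
        next = null; // leaf: no heap object at all
    }

    private TreeNode(TreeNode left, TreeNode right, int item)
    {
        this.item = item;
        next = new Next { left = left, right = right }; // one allocation holds both children
    }

    public static TreeNode BottomUpTree(int item, int depth) =>
        depth > 0
            ? new TreeNode(BottomUpTree(2 * item - 1, depth - 1), BottomUpTree(2 * item, depth - 1), item)
            : new TreeNode(item);

    public int ItemCheck() =>
        next == null ? item : item + next.left.ItemCheck() - next.right.ItemCheck();
}
```

For a tree of depth d, this allocates 2^d - 1 Next objects for the interior nodes. A class-based node would instead allocate one object for every one of the 2^(d+1) - 1 nodes.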
Question: Is there more to this?
Also, since I wouldn't have thought of defining tree nodes this way in C# (thanks Xamarin!) what other data structures can benefit from using structs in a non-obvious way? (Even though that's a bit off-topic and open ended.)
I just ran across this odd code and had the same question. If you change the code to match the Java version it will run just slightly slower. I believe most of the 'struct TreeNode' will get boxed and allocated anyway, except for the bottom row. However, each node results in 2 allocations: boxed TreeNode and class Next. The allocation savings quickly disappear. IMO, this is not an appropriate use of struct.
This groups two nodes into one allocation, ideally halving the total number of allocations.
IMO this makes the comparison pretty meaningless.
Structures can be allocated on the stack instead of the heap (though not in every case), which means they are deallocated as soon as they go out of scope; the garbage collector is not involved in that scenario. That can result in lower memory pressure and fewer garbage collections. Also, the stack is (mostly) a contiguous area of memory, so accessing it has better locality, which can (again, possibly) improve the CPU cache hit ratio.
Microsoft guidelines on choosing between classes and structures:
structures should be small (generally under 16 bytes) and short-lived
they should be immutable
they should not have to be boxed frequently
If those conditions are met, using a structure instead of a class can result in a performance gain.
I don't think using a struct here makes any difference at all, especially after looking at the source code of TreeNode, where TreeNode instances are always copied in the constructor and in the recursive bottomUpTree call.
In C#, does the following save any memory?
private List<byte[]> _stream;

public object Stream
{
    get
    {
        if (_stream == null)
        {
            _stream = new List<byte[]>();
        }
        return _stream;
    }
}
Edit: sorry, I guess I should have been more specific.
Specifically using "object" instead of List... I thought that would kinda clue itself in because it's a weird thing to do.
It saves only a very small amount of memory, because an empty List<byte[]> is itself small.
The reason why is that your reference variable _stream only needs to allocate enough memory to hold a reference to an object. Once an object is allocated, it will take up a certain amount of memory which may grow or shrink over time, such as when new byte[]s are added to the List. However the memory taken up by the reference to that object will remain the same size.
This is simpler and less prone to corner cases that cause you headaches:
private List<byte[]> _stream = new List<byte[]>();

public object Stream
{
    get
    {
        return _stream;
    }
}
Although, in most cases it's not really optimal to return references to private members when they are collections/arrays, etc. It's better to return _stream.AsReadOnly().
Save memory compared to what?
byte[][] _stream;
maybe? Then no, a List<T> will take up more memory since it is an array at its heart (which isn't necessarily exactly the size of its contents, but usually larger) and some statekeeping needs to be done too.
That is lazy loading: the stream (in your case, a list) is created only when someone first requests it, and not before.
One might say that it saves some memory because it will not use any unless required. So before using the stream there is no memory allocated for it.
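A minimal sketch of the same lazy initialization using the ??= operator (C# 8 or later). StreamHolder and IsAllocated are hypothetical names. If several threads may hit the property at once, System.Lazy<T> is the usual thread-safe alternative:

```csharp
using System.Collections.Generic;

public class StreamHolder
{
    private List<byte[]> _stream; // null until first requested

    // Allocates the list on first access only; later calls return the same instance.
    public List<byte[]> Stream => _stream ??= new List<byte[]>();

    public bool IsAllocated => _stream != null;
}
```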
If your edit means you are asking whether using the object keyword instead of List<byte[]> as the type of the property saves memory: no, it doesn't. And your if block only saves a negligible amount of memory (and CPU at instantiation) until the first time the property is called, and it makes that first call slightly slower. Consider returning null instead if that makes sense for the property. And, as another answerer suggested, it may be better to keep the property read-only unless you want other classes to alter it. In general, I'd say attempts at optimization like this are mostly misguided and make your code less maintainable.
Are you sure a Stream wouldn't be just a byte[] or a List of byte? Or even better, a MemoryStream? :) I think you are somewhat confused, so a bigger example and some scenario details will help a lot.
What are objects really
I'd suggest thinking of objects as structs... and of object references as pointers to those structs.
If you instantiate an object, you are reserving memory for a "struct" with all its fields (and a reference to the class it implements), plus all memory reserved by the constructor (other objects, arrays, etc.).
In a List you are reserving memory for state keeping (I don't know how it's implemented in C#) and for the initial internal array, maybe of ten references. So, if you count, it's something like this (assuming a 32-bit runtime; I'm not a .NET specialist):
pointer to class: 4 bytes
pointer to array: 4 bytes
array of initialCapacity references: 40 bytes
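So by my estimate it's about 48 bytes, but it depends on the implementation. (In current .NET, new List<T>() actually starts with a shared empty array and only allocates its internal array, with a capacity of 4, on the first Add.)
As SoloBold says: most of the time it's not worth it.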