I'm currently working on a ray tracer in C# as a hobby project. I'm trying to achieve a decent rendering speed by implementing some tricks from a C++ implementation, and I have run into a spot of trouble.
The objects in the scenes which the ray tracer renders are stored in a KdTree structure, and the tree's nodes are, in turn, stored in an array. The optimization I'm having problems with involves fitting as many tree nodes as possible into a cache line. One means of doing this is for nodes to contain a pointer to the left child node only; it is then implicit that the right child follows directly after the left one in the array.
The nodes are structs, and during tree construction they are successfully put into the array by a static memory manager class. When I begin to traverse the tree it seems, at first, to work just fine. Then, at a point early in the rendering (about the same place each time), the left child pointer of the root node is suddenly null. I have come to the conclusion that the garbage collector has moved the structs, since the array lies on the heap.
I've tried several things to pin the addresses in memory, but none of them seems to last for the entire application lifetime as I need. The 'fixed' keyword only seems to help during single method calls, and 'fixed' arrays can only be declared for simple types, which a node isn't. Is there a good way to do this, or am I just too far down the path of stuff C# wasn't meant for?
By the way, changing to C++, while perhaps the better choice for a high-performance program, is not an option.
Firstly, if you're using C# normally, you can't suddenly get a null reference because the garbage collector moved something: the garbage collector updates all references when it moves objects, so you don't need to worry about that.
You can pin things in memory but this may cause more problems than it solves. For one thing, it prevents the garbage collector from compacting memory properly, and may impact performance in that way.
One thing I would say from your post is that using structs may not help performance as you hope. C# fails to inline any method calls involving structs, and even though they've fixed this in their latest runtime beta, structs frequently don't perform that well.
Personally, I would say C++ tricks like this don't generally tend to carry over too well into C#. You may have to learn to let go a bit; there can be other more subtle ways to improve performance ;)
What is your static memory manager actually doing? Unless it is doing something unsafe (P/Invoke, unsafe code), the behaviour you are seeing is a bug in your program, and not due to the behaviour of the CLR.
Secondly, what do you mean by 'pointer', with respect to links between structures? Do you literally mean an unsafe KdTree* pointer? Don't do that. Instead, use an index into the array. Since I expect that all nodes for a single tree are stored in the same array, you won't need a separate reference to the array. Just a single index will do.
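For illustration, here's a minimal sketch of what an index-based node might look like; the field names and layout are my own, not from the question:
public struct KdNode
{
    public float SplitPosition;  // position of the split plane along SplitAxis
    public int SplitAxis;        // 0 = x, 1 = y, 2 = z; another value can mark a leaf
    public int LeftChild;        // index into the shared node array; the right child is LeftChild + 1
}

// Traversal then works purely with indices into a single KdNode[] array, so the
// GC is free to move the array around without invalidating anything:
// KdNode left  = nodes[node.LeftChild];
// KdNode right = nodes[node.LeftChild + 1];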
Finally, if you really really must use KdTree* pointers, then your static memory manager should allocate a large block using e.g. Marshal.AllocHGlobal or another unmanaged memory source; it should both treat this large block as a KdTree array (i.e. index a KdTree* C-style) and it should suballocate nodes from this array, by bumping a "free" pointer.
If you ever have to resize this array, then you'll need to update all the pointers, of course.
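For what it's worth, here is a hedged sketch of that suballocation idea, reusing the illustrative KdNode struct from above and assuming the node count is bounded up front:
using System;
using System.Runtime.InteropServices;

public static unsafe class NodeArena
{
    static KdNode* _block;   // start of the unmanaged block
    static int _capacity;    // total number of node slots
    static int _count;       // number of slots handed out so far

    public static void Init(int capacity)
    {
        _capacity = capacity;
        _count = 0;
        _block = (KdNode*)Marshal.AllocHGlobal(capacity * sizeof(KdNode));
    }

    // "Bump" allocation: hand out the next free slot and advance the counter.
    public static KdNode* Allocate()
    {
        if (_count >= _capacity)
            throw new OutOfMemoryException("node arena is full");
        return _block + _count++;
    }

    public static void Release()
    {
        Marshal.FreeHGlobal((IntPtr)_block);
        _block = null;
    }
}
The GC never sees this block, so nothing in it will ever be moved; the trade-off is that you now own its lifetime, exactly as you would in C++.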
The basic lesson here is that unsafe pointers and managed memory do not mix outside of 'fixed' blocks, which of course have stack frame affinity (i.e. when the function returns, the pinned behaviour goes away). There is a way to pin arbitrary objects, like your array, using GCHandle.Alloc(yourArray, GCHandleType.Pinned), but you almost certainly don't want to go down that route.
You will get more sensible answers if you describe in more detail what you are doing.
If you really want to do this, you can use the GCHandle.Alloc method to pin an object without it being automatically released at the end of a scope, as it would be with the fixed statement.
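A hedged sketch of that, assuming the nodes are blittable structs stored in a managed array (all names here are illustrative):
using System;
using System.Runtime.InteropServices;

static class PinnedNodes
{
    // KdNode stands in here for any blittable struct of simple fields.
    static KdNode[] _nodes = new KdNode[1024];
    static GCHandle _pin;

    public static IntPtr Pin()
    {
        // Unlike 'fixed', this pin survives until Free() is called explicitly.
        _pin = GCHandle.Alloc(_nodes, GCHandleType.Pinned);
        return _pin.AddrOfPinnedObject();   // stable base address of the array
    }

    public static void Unpin()
    {
        if (_pin.IsAllocated)
            _pin.Free();   // let the GC move (and eventually collect) the array again
    }
}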
But, as other people have been saying, doing this is putting undue pressure on the garbage collector. What about just creating a struct that holds onto a pair of your nodes and then managing an array of NodePairs rather than an array of nodes?
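For instance, something like this sketch (names made up, KdNode standing in for your node struct), which keeps each pair of siblings adjacent without any pinning at all:
struct NodePair
{
    public KdNode Left;    // left child
    public KdNode Right;   // right child, laid out immediately after its sibling
}

// The parent then stores a single index into a NodePair[] instead of a child pointer.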
If you really do want to have completely unmanaged access to a chunk of memory, you would probably be better off allocating the memory directly from the unmanaged heap rather than permanently pinning a part of the managed heap (which prevents the heap from being able to compact itself properly). One quick and simple way to do this would be to use the Marshal.AllocHGlobal method.
Is it really prohibitive to store the pair of array reference and index?
What is your static memory manager actually doing? Unless it is doing something unsafe (P/Invoke, unsafe code), the behaviour you are seeing is a bug in your program, and not due to the behaviour of the CLR.
I was in fact speaking about unsafe pointers. What I wanted was something like Marshal.AllocHGlobal, though with a lifetime exceeding a single method call. On reflection, it seems that just using an index is the right solution, as I might have gotten too caught up in mimicking the C++ code.
One thing I would say from your post is that using structs may not help performance as you hope. C# fails to inline any method calls involving structs, and even though they've fixed this in their latest run-time beta, structs frequently don't perform that well.
I looked into this a bit and I see it has been fixed in .NET 3.5 SP1; I assume that's what you were referring to as the runtime beta. In fact, I now understand that this change accounted for a doubling of my rendering speed. Now structs are aggressively inlined, improving their performance greatly on x86 systems (x64 already had better struct performance).
Related
I know C# gives the programmer the ability to access and use pointers in an unsafe context. But when is this needed?
Under what circumstances does using pointers become inevitable?
Is it only for performance reasons?
Also, why does C# expose this functionality through an unsafe context and remove all of the managed advantages from it? Is it possible, theoretically, to use pointers without losing any of the advantages of a managed environment?
When is this needed? Under what circumstances does using pointers become inevitable?
When the net cost of a managed, safe solution is unacceptable but the net cost of an unsafe solution is acceptable. You can determine the net cost or net benefit by subtracting the total benefits from the total costs. The benefits of an unsafe solution are things like "no time wasted on unnecessary runtime checks to ensure correctness"; the costs are (1) having to write code that is safe even with the managed safety system turned off, and (2) having to deal with potentially making the garbage collector less efficient, because it cannot move around memory that has an unmanaged pointer into it.
Or, if you are the person writing the marshalling layer.
Is it only for performance reasons?
It seems perverse to use pointers in a managed language for reasons other than performance.
You can use the methods in the Marshal class to deal with interoperating with unmanaged code in the vast majority of cases. (There might be a few cases in which it is difficult or impossible to use the marshalling gear to solve an interop problem, but I don't know of any.)
Of course, as I said, if you are the person writing the Marshal class then obviously you don't get to use the marshalling layer to solve your problem. In that case you'd need to implement it using pointers.
Why does C# expose this functionality through an unsafe context, and remove all of the managed advantages from it?
Those managed advantages come with performance costs. For example, every time you ask an array for its tenth element, the runtime needs to do a check to see if there is a tenth element, and throw an exception if there isn't. With pointers that runtime cost is eliminated.
The corresponding developer cost is that if you do it wrong, you get to deal with memory corruption bugs that format your hard disk and crash your process an hour later, rather than a nice clean exception at the point of the error.
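To make the trade-off concrete, here is a hedged sketch (not from the answer) contrasting a checked loop with a raw pointer walk over the same array:
static class BoundsCheckDemo
{
    static int SumSafe(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];   // every access is range-checked (though the JIT can often elide the check in a simple loop like this)
        return sum;
    }

    static unsafe int SumUnsafe(int[] data)
    {
        int sum = 0;
        fixed (int* p = data)   // pin the array so the raw pointer stays valid
        {
            for (int i = 0; i < data.Length; i++)
                sum += p[i];    // no range check; walking past the end is silent memory corruption
        }
        return sum;
    }
}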
Is it possible to use pointers without losing any advantages of managed environment, theoretically?
By "advantages" I assume you mean advantages like garbage collection, type safety and referential integrity. Thus your question is essentially "is it in theory possible to turn off the safety system but still get the benefits of the safety system being turned on?" No, clearly it is not. If you turn off that safety system because you don't like how expensive it is then you don't get the benefits of it being on!
Pointers are an inherent contradiction to the managed, garbage-collected, environment.
Once you start messing with raw pointers, the GC has no clue what's going on.
Specifically, it cannot tell whether objects are reachable, since it doesn't know where your pointers are.
It also cannot move objects around in memory, since that would break your pointers.
All of this would be solved by GC-tracked pointers; that's what references are.
You should only use pointers in messy advanced interop scenarios or for highly sophisticated optimization.
If you have to ask, you probably shouldn't.
The GC can move objects around in memory; keeping an object outside of the GC's control (in unmanaged memory) avoids this. 'fixed' pins an object in place, but still lets the GC manage the memory.
By definition, if you have a pointer to the address of an object, and the GC moves it, your pointer is no longer valid.
As to why you need pointers: the primary reason is to work with unmanaged DLLs, e.g. those written in C++.
Also note, when you pin variables and use pointers, you're more susceptible to heap fragmentation.
Edit
You've touched on the core issue of managed vs. unmanaged code... how does the memory get released?
You can mix code for performance as you describe, you just can't cross managed/unmanaged boundaries with pointers (i.e. you can't use pointers outside of the 'unsafe' context).
As for how they get cleaned up... you have to manage your own memory; objects that your pointers point to were created/allocated (usually within the C++ DLL) using (hopefully) CoTaskMemAlloc(), and you have to release that memory in the same manner, calling CoTaskMemFree(), or you'll have a memory leak. Note that only memory allocated with CoTaskMemAlloc() can be freed with CoTaskMemFree().
The other alternative is to expose a method from your native C++ DLL that takes a pointer and frees it... this lets the DLL decide how to free the memory, which works best if it used some other method to allocate the memory. Most native DLLs you work with are third-party DLLs that you can't modify, and they don't usually have (that I've seen) such functions to call.
An example of freeing memory:
// 'test' here is assumed to be an external native function (e.g. called via
// P/Invoke) that returns a string allocated with CoTaskMemAlloc.
string[] array = new string[2];
array[0] = "hello";
array[1] = "world";
IntPtr ptr = test(array);                        // native call returns unmanaged memory
string result = Marshal.PtrToStringAuto(ptr);    // copy the string into managed memory
Marshal.FreeCoTaskMem(ptr);                      // free it the same way it was allocated
System.Console.WriteLine(result);
Some more reading material:
C# deallocate memory referenced by IntPtr
The second answer down explains the different allocation/deallocation methods
How to free IntPtr in C#?
Reinforces the need to deallocate in the same manner the memory was allocated
http://msdn.microsoft.com/en-us/library/aa366533%28VS.85%29.aspx
Official MSDN documentation on the various ways to allocate and deallocate memory.
In short... you need to know how the memory was allocated in order to free it.
Edit
If I understand your question correctly, the short answer is yes, you can hand the data off to unmanaged pointers, work with it in an unsafe context, and have the data available once you exit the unsafe context.
The key is that you have to pin the managed object you're referencing with a fixed block. This prevents the memory you're referencing from being moved by the GC while you're in the unsafe block. There are a number of subtleties involved here, e.g. you can't reassign a pointer initialized in a fixed block... you should read up on the unsafe and fixed statements if you're really set on managing memory yourself.
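A hedged sketch of that pattern (illustrative names): the array is mutated through a pointer inside a fixed block, and the changes are still visible in the managed array afterwards:
static class FixedDemo
{
    static unsafe void FillThroughPointer(byte[] buffer)
    {
        fixed (byte* p = buffer)          // buffer is pinned only for the duration of this block
        {
            for (int i = 0; i < buffer.Length; i++)
                p[i] = (byte)i;           // write through the raw pointer
        }                                 // pin released here; the GC may move 'buffer' again
        // buffer still holds the written values - it is the same managed object.
    }
}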
All that said, the benefits of managing your own objects and using pointers in the manner you describe may not buy you as much of a performance increase as you might think. Reasons why not:
C# is very optimized and very fast
Your pointer code is still generated as IL, which has to be jitted (at which point further optimizations come into play)
You're not turning the Garbage Collector off... you're just keeping the objects you're working with out of the GC's purview. So every 100ms or so, the GC still interrupts your code and executes its functions for all the other variables in your managed code.
HTH,
James
The most common reasons to use pointers explicitly in C#:
doing low-level work (like string manipulation) that is very performance sensitive,
interfacing with unmanaged APIs.
The reason why the syntax associated with pointers was removed from C# (according to my knowledge and viewpoint; Jon Skeet would answer better B-)) was that it turned out to be superfluous in most situations.
From the language-design perspective, once you manage memory with a garbage collector you have to introduce severe constraints on what is and what is not possible to do with pointers. For example, using a pointer to point into the middle of an object can cause severe problems for the GC. Hence, once the restrictions are in place, you can just omit the extra syntax and end up with “automatic” references.
Also, the extremely permissive approach found in C/C++ is a common source of errors. For most situations, where micro-performance doesn't matter at all, it is better to offer tighter rules and constrain the developer in favor of fewer bugs, which would otherwise be very hard to discover. Thus, for common business applications the so-called “managed” environments like .NET and Java are better suited than languages that work close to the bare metal.
Say you want to communicate between two applications using IPC (shared memory); you can marshal the data into memory and pass the pointer to the other application via Windows messaging or something similar. The receiving application can then fetch the data back.
It is also useful for transferring data from .NET to legacy VB6 apps: you marshal the data into memory, pass the pointer to the VB6 app using Windows messaging, and use VB6's CopyMemory() to fetch the data from the managed memory space into the VB6 app's unmanaged memory space.
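As a hedged sketch of the marshalling step only (the struct, names and delivery mechanism are made up; actually getting the pointer to another process needs shared memory, WM_COPYDATA or similar):
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Message
{
    public int Id;
    public double Value;
}

static class MessageMarshaller
{
    public static IntPtr CopyToUnmanaged(Message msg)
    {
        IntPtr ptr = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(Message)));
        Marshal.StructureToPtr(msg, ptr, false);   // copy the managed struct into unmanaged memory
        return ptr;                                // this address can now be handed across the boundary
    }

    public static Message ReadFromUnmanaged(IntPtr ptr)
    {
        return (Message)Marshal.PtrToStructure(ptr, typeof(Message));
    }
    // Whoever allocated the block must eventually call Marshal.FreeHGlobal(ptr).
}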
I suspect that the reason this function doesn't exist is that implementing it is complex and that few people need it. To be safe, you'd want pinning to work transitively, i.e., you'd want the entire graph of reachable objects to be pinned. But it doesn't seem like something that fundamentally can't be done.
E.g., suppose you have the following class:
[StructLayout(LayoutKind.Sequential)]
class SomeObject
{
    public SomeObject r;
}
which you allocate like:
SomeObject o = new SomeObject();
and you try to pin it with:
GCHandle oh = GCHandle.Alloc(o, GCHandleType.Pinned);
you'll get the dreaded:
Object contains non-primitive or non-blittable data.
OK, fine, I can live with that. But suppose I had access to .NET's garbage collector implementation. What would be the obstacles? Here are the obstacles I see:
Circular references.
You want the garbage collector to limit itself to objects inside the app's heap(s).
It could take a long time.
It would be hard/painful to make the operation atomic.
It seems to me that the GC already has to deal with some of these issues. So what am I forgetting?
NOTE: Before you ask "What are you trying to accomplish?", etc., the purpose of my asking is for research code, not necessarily limited to C#, and not necessarily limited to the CLR. I understand that fiddling with the runtime's own memory is not a typical scenario. In any case, this isn't a purely speculative question.
NOTE 2: Also, I don't care about marshaling. I'm just concerned about pinning.
The GC just knows that whatever you are going to do next isn't going to work. You pin memory for a reason; surely it is to obtain a stable IntPtr to the object, which you then, say, pass to unmanaged code.
There is, however, a problem with the content of the pointed-to memory: it contains a pointer to a managed object, and that pointer is going to change randomly whenever another thread allocates memory and triggers a collection, which will wreak havoc on any code that uses the pinned memory's content. There is no way to obtain a stable pointer; you can't "freeze" the collector. Pinning that inner pointer doesn't work either, it just passes the buck to the next pointed-to object. Hopefully the chain ends in null sooner or later (it would have to), but GCHandle.Alloc doesn't traverse the entire dependency graph to check that, and there's no decent upper bound on how long that could take. Getting an entire generation pinned is possible, and that's a very hard deadlock.
Ugly problems; it was just much simpler to forbid it. And it's not much of a real problem in practice, P/Invoke happens every day anyway.
I'm converting a C# project to C++ and have a question about deleting objects after use. In C# the GC of course takes care of deleting objects, but in C++ it has to be done explicitly using the delete keyword.
My question is, is it OK to just follow each object's usage throughout a method and then delete it as soon as it goes out of scope (i.e. method end / re-assignment)?
I know, though, that the GC waits for a certain amount of garbage (~1 MB) before deleting; does it do this because there is an overhead when using delete?
As this is a game I am creating, there will potentially be lots of objects being created and deleted every second, so would it be better to keep track of pointers that go out of scope, and once that size reaches 1 MB then delete the pointers?
(as a side note: later when the game is optimised, objects will be loaded once at startup so there is not much to delete during gameplay)
Your problem is that you are using pointers in C++.
This is a fundamental problem that you must fix; then all your problems go away. As chance would have it, I got so fed up with this general trend that I created a set of presentation slides on this issue (CC BY, so feel free to use them).
Have a look at the slides. While they are certainly not entirely serious, the fundamental message is still true: Don’t use pointers. But more accurately, the message should read: Don’t use delete.
In your particular situation you might find yourself with a lot of small, short-lived objects. This is indeed a situation which a modern GC handles quite well, and which reference-counting smart pointers (shared_ptr) handle less efficiently. If (and only if!) this becomes a performance problem, consider switching to a small-object allocator library.
You should be using RAII as much as possible in C++, so that you never have to explicitly delete anything.
Once you use RAII through smart pointers and your own resource-managing classes, every dynamic allocation you make will exist only as long as there are references to it; you do not have to manage any resources explicitly.
Memory management in C# and C++ is completely different. You shouldn't try to mimic the behavior of .NET's GC in C++. In .NET allocating memory is super fast (basically moving a pointer) whereas freeing it is the heavy task. In C++ allocating memory isn't that lightweight for several reasons, mainly because a large enough chunk of memory has to be found. When memory chunks of different sizes are allocated and freed many times during the execution of the program the heap can get fragmented, containing many small "holes" of free memory. In .NET this won't happen because the GC will compact the heap. Freeing memory in C++ is quite fast, though.
Best practices in .NET don't necessarily work in C++. For example, pooling and reusing objects in .NET isn't recommended most of the time, because the objects get promoted to higher generations by the GC. The GC works best for short lived objects. On the other hand, pooling objects in C++ can be very useful to avoid heap fragmentation. Also, allocating a larger chunk of memory and using placement new can work great for many smaller objects that need to be allocated and freed frequently, as it can occur in games. Read up on general memory management techniques in C++ such as RAII or placement new.
Also, I'd recommend getting the books "Effective C++" and "More effective C++".
Well, the simplest solution might be to just use garbage collection in C++. The Boehm collector works well, for example. Still, there are pros and cons (but porting code originally written in C# would be a likely candidate for a case where the pros largely outweigh the cons).
Otherwise, if you convert the code to idiomatic C++, there shouldn't be that many dynamically allocated objects to worry about. Unlike C#, C++ has value semantics by default, and most of your short-lived objects should simply be local variables, possibly copied if they are returned, but not allocated dynamically. In C++, dynamic allocation is normally only used for entity objects, whose lifetime depends on external events; e.g. a Monster is created at some random time, with a probability depending on the game state, and is deleted at some later time, in reaction to events which change the game state. In this case, you delete the object when the monster ceases to be part of the game. In C#, you probably have a dispose function, or something similar, for such objects, since they typically have concrete actions which must be carried out when they cease to exist, things like deregistering as an Observer, if that's one of the patterns you're using. In C++, this sort of thing is typically handled by the destructor, and instead of calling dispose, you delete the object.
Substituting a shared_ptr for every instance where you use a reference in C# would give you the closest approximation, at probably the lowest effort, when converting the code.
However, you specifically mention following an object's use through a method and deleting it at the end; a better approach is not to new up the object at all, but simply to instantiate it inline / on the stack. In fact, if you take this approach even for returned objects, with the new copy/move semantics being introduced this becomes an efficient way to deal with returned objects as well, so there is hardly any scenario in which you need pointers.
There are a lot more things to take into consideration when deallocating objects than just calling delete whenever one goes out of scope. You have to make sure that you only call delete once, and only call it once all pointers to that object have gone out of scope. The garbage collector in .NET handles all of that for you.
The construct in C++ that corresponds most closely is tr1::shared_ptr<>, which keeps a reference count for the object and deallocates it when the count drops to zero. A first approach to get things running would be to turn all C# references into C++ tr1::shared_ptr<>. Then you can go into those places where it is a performance bottleneck (only after you've verified with a profiler that it is an actual bottleneck) and change to more efficient memory handling.
Garbage collection for C++ has been discussed a lot on SO. Try reading through this:
Garbage Collection in C++
What are the consequences (positive/negative) of using the unsafe keyword in C# to use pointers? For example, what becomes of garbage collection, what are the performance gains/losses, what are the performance gains/losses compared to other languages' manual memory management, what are the dangers, in which situations is it really justifiable to make use of this language feature, does it take longer to compile...?
As already mentioned by Conrad, there are some situations where unsafe access to memory in C# is useful. There are not as many of them, but there are some:
Manipulating a Bitmap is almost the typical example of a place where you need some additional performance that you can get by using unsafe.
Interoperability with older APIs (such as the WinAPI or native C/C++ DLLs) is another area where unsafe can be quite useful - for example, you may want to call a function that takes/returns an unmanaged pointer.
On the other hand, you can write most of these things using the Marshal class, which hides many of the unsafe operations inside method calls. This will be a bit slower, but it is an option if you want to avoid using unsafe (or if you're using VB.NET, which doesn't have unsafe).
Positive consequences:
So, the main positive consequences of the existence of unsafe in C# are that you can write some code more easily (interoperability) and you can write some code more efficiently (manipulating bitmaps, or maybe some heavy numeric calculations using arrays - although I'm not so sure about the second one).
Negative consequences: Of course, there is some price that you have to pay for using unsafe:
Non-verifiable code: C# code that is written using the unsafe features becomes non-verifiable, which means that your code could compromise the runtime. This isn't a big problem in a full-trust scenario (e.g. an unrestricted desktop app) - you just don't have all the nice .NET CLR guarantees. However, you cannot run the application in a restricted environment such as public web hosting, Silverlight or partial trust (e.g. an application running from the network).
The garbage collector also needs to be careful when you use unsafe. The GC is usually allowed to relocate objects on the managed heap (to keep the memory compacted). When you take a pointer to some object, you need to use the fixed keyword to tell the GC that it cannot move the object until you finish (which could affect the performance of garbage collection, depending on the exact scenario).
My guess is that if C# didn't have to interoperate with older code, it probably wouldn't support unsafe (and research projects like Singularity, which attempt to create a more verifiable operating system based on managed languages, definitely disallow unsafe code). However, in the real world, unsafe is useful in some (rare) cases.
I can give you a situation where it was worth using:
I have to generate a bitmap pixel by pixel. System.Drawing.Bitmap.SetPixel() is way too slow. So I build my own managed array of bitmap data, and use unsafe to get the IntPtr for the Bitmap(Int32, Int32, Int32, PixelFormat, IntPtr) constructor.
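A hedged sketch of that approach (the fill pattern and sizes are mine; it needs the /unsafe compiler switch and a reference to System.Drawing):
using System;
using System.Drawing;
using System.Drawing.Imaging;

static class PixelRenderer
{
    static unsafe Bitmap Render(int width, int height)
    {
        int[] pixels = new int[width * height];   // one ARGB value per pixel, built in managed memory
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                pixels[y * width + x] = unchecked((int)0xFF000000 | (x ^ y));   // fill however you like

        fixed (int* p = pixels)                    // pin so the address is stable
        {
            // The Bitmap created from scan0 borrows this memory, so clone it into a
            // bitmap that owns its own pixels before the fixed block ends.
            using (var borrowed = new Bitmap(width, height, width * 4,
                                             PixelFormat.Format32bppArgb, (IntPtr)p))
            {
                return new Bitmap(borrowed);
            }
        }
    }
}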
To quote Professional C# 2008:
"The two main reasons for using pointers are:
Backward compatibility - Despite all of the facilities provided by the .NET runtime, it is still possible to call native Windows API functions, and for some operations this may be the only way to accomplish your task. These API functions are generally written in C and often require pointers as parameters. However, in many cases it is possible to write the DllImport declaration in a way that avoids use of pointers; for example, by using the System.IntPtr class.
Performance - On those occasions where speed is of the utmost importance, pointers can provide a route to optimized performance. If you know what you are doing, you can ensure that data is accessed or manipulated in the most efficient way. However, be aware that, more often than not, there are other areas of your code where you can make the necessary performance improvements without resorting to pointers. Try using a code profiler to look for bottlenecks in your code - one comes with Visual Studio 2008."
And if you use pointers, your code will require a higher level of trust to execute, and if the user does not grant that, your code will not run.
And to wrap it up with a last quote:
"We strongly advice against using
pointers unnecessarily because it will
not only be harder to write and debug,
but it will also fail the memory
type-safety checks imposed by the
CLR."
Garbage Collection is inefficient with long-lived objects. .Net's garbage collector works best when most objects are released rather quickly, and some objects "live forever." The problem is that longer-living objects are only released during full garbage collections, which incurs a significant performance penalty. In essence, long-living objects quickly move into generation 2.
(For more information, you might want to read up on .Net's generational garbage collector: http://msdn.microsoft.com/en-us/library/ms973837.aspx)
In situations where objects, or memory use in general, is going to be long-lived, manual memory management will yield better performance because it can be released to the system without requiring a full garbage collection.
Implementing some kind of a memory management system based around a single large byte array, structs, and lots of pointer arithmetic, could theoretically increase performance in situations where data will be stored in RAM for a long time.
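A hedged sketch of that idea (all names and the layout are made up, not from the answer): long-lived data lives in one big array of structs, so the GC sees a single object instead of millions of small ones, and "references" between records become array indices:
struct Record
{
    public long Key;
    public double Value;
    public int NextIndex;   // "pointer" to another record, expressed as an index into the same array
}

class RecordStore
{
    readonly Record[] _records;   // one large, long-lived allocation
    int _count;

    public RecordStore(int capacity)
    {
        _records = new Record[capacity];
    }

    public int Add(long key, double value)
    {
        int index = _count++;
        _records[index] = new Record { Key = key, Value = value, NextIndex = -1 };
        return index;             // hand out indices instead of object references
    }

    public Record Get(int index) { return _records[index]; }
    public void Set(int index, Record value) { _records[index] = value; }
}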
Unfortunately, I'm not aware of a good way to do manual memory management in .Net for objects that are going to be long-lived. This basically means that applications that have long-lived data in RAM will periodically become unresponsive when they run a full garbage collection of all of the memory.
Since RAM seems to be the new disk, and since that statement also means that access to memory is now considered slow similarly to how disk access has always been, I do want to maximize locality of reference in memory for high performance applications. For example, in a sorted index, I want adjacent values to be close (unlike say, in a hashtable), and I want the data the index is pointing to close by, too.
In C, I can whip up a data structure with a specialized memory manager, like the developers of the (immensely complex) Judy array did. With direct control over the pointers, they even went so far as to encode additional information in the pointer value itself. When working in Python, Java or C#, I am deliberately one (or more) level(s) of abstraction away from this type of solution and I'm entrusting the JIT compilers and optimizing runtimes with doing clever tricks on the low levels for me.
Still, I guess, even at this high level of abstraction, there are things that can be semantically considered "closer" and therefore are likely to be actually closer at the low levels. For example, I was wondering about the following (my guess in parentheses):
Can I expect an array to be an adjacent block of memory (yes)?
Are two integers in the same instance closer than two in different instances of the same class (probably)?
Does an object occupy a contiguous region in memory (no)?
What's the difference between an array of objects with only two int fields and a single object with two int[] fields? (this example is probably Java specific)
I started wondering about these in a Java context, but my wondering has become more general, so I'd suggest to not treat this as a Java question.
In .NET, elements of an array are certainly contiguous. In Java I'd expect them to be in most implementations, but it appears not to be guaranteed.
I think it's reasonable to assume that the memory used by an instance for fields is in a single block... but don't forget that some of those fields may be references to other objects.
For the Java array part, Sun's JNI documentation includes this comment, tucked away in a discussion about strings:
For example, the Java virtual machine may not store arrays contiguously.
For your last question, if you have two int[] then each of those arrays will be a contiguous block of memory, but they could be very "far apart" in memory. If you have an array of objects with two int fields, then each object could be a long way from each other, but the two integers within each object will be close together. Potentially more importantly, you'll end up taking a lot more memory with the "lots of objects" solution due to the per-object overhead. In .NET you could use a custom struct with two integers instead, and have an array of those - that would keep all the data in one big block.
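For illustration, a short C# sketch of that struct-array alternative (type and field names are mine):
struct Pair
{
    public int A;   // first value
    public int B;   // second value
}

// Usage: one contiguous block holding all two thousand ints, no per-object headers.
//   Pair[] pairs = new Pair[1000];
//   pairs[0].A = 1;
//   pairs[0].B = 2;
// Compare with an array of small class instances, where each element is only a
// reference to a separately allocated object that can live anywhere on the heap.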
I believe that in both Java and .NET, if you allocate a lot of smallish objects in quick succession within a single thread then those objects are likely to have good locality of reference. When the GC compacts a heap, this may improve - or it may potentially become worse, if a heap with
A B C D E
is compacted to
A D E B
(where C is collected) - suddenly A and B, which may have been "close" before, are far apart. I don't know whether this actually happens in any garbage collector (there are loads around!) but it's possible.
Basically in a managed environment you don't usually have as much control over locality of reference as you do in an unmanaged environment - you have to trust that the managed environment is sufficiently good at managing it, and that you'll have saved enough time by coding to a higher level platform to let you spend time optimising elsewhere.
First, your title implies C#; "managed code" is a term coined by Microsoft, if I'm not mistaken.
Java primitive arrays are guaranteed to be a contiguous block of memory. If you have a
int[] array = new int[4];
you can from JNI (native C) get an int* p that points to the actual array. I think this goes for the Array* classes of containers as well (ArrayList, ArrayBlockingQueue, etc.).
Early implementations of the JVM had objects as a contiguous struct, I think, but this cannot be assumed with newer JVMs. (JNI abstracts this away.)
Two integers in the same object will as you say probably be "closer", but they may not be. This will probably vary even using the same JVM.
An object with two int fields is an object, and I don't think any JVM makes any guarantee that the members will be "close". An int array with two elements will very likely be backed by an 8-byte contiguous block.
With regard to arrays, here is an excerpt from the CLI (Common Language Infrastructure) specification:
Array elements shall be laid out within the array object in row-major order (i.e., the elements associated with the rightmost array dimension shall be laid out contiguously from lowest to highest index). The actual storage allocated for each array element can include platform-specific padding. (The size of this storage, in bytes, is returned by the sizeof instruction when it is applied to the type of that array's elements.)
Good question! I think I would resort to writing extensions in C++ that handle memory in a more carefully managed way and just exposing enough of an interface to allow the rest of the application to manipulate the objects. If I was that concerned about performance I would probably resort to a C++ extension anyway.
I don't think anyone has talked about Python, so I'll have a go.
Can I expect an array to be an adjacent block of memory (yes)?
In Python, arrays are more like arrays of pointers in C. So the pointers will be adjacent, but the actual objects are unlikely to be.
Are two integers in the same instance closer than two in different instances of the same class (probably)?
Probably not, for the same reason as above. The instance will only hold pointers to the objects which are the actual integers. Python doesn't have native ints (like Java), only boxed Ints (in Java-speak).
Does an object occupy a contiguous region in memory (no)?
Probably not. However, if you use the __slots__ optimisation then some parts of it will be contiguous!
What's the difference between an array of objects with only two int fields and a single object with two int[] fields?
(this example is probably Java specific)
In Python, in terms of memory locality, they are both pretty much the same! One will make an array of pointers to objects, which will in turn contain two pointers to ints; the other will make two arrays of pointers to integers.
If you need to optimise to that level then I suspect a VM based language is not for you ;)