I have kind of an issue. I am trying to pin a jagged array (which i am using due to the sheer size of the data i am handling):
public void ExampleCode(double[][] variables) {
int nbObservants = variables.Length;
var allHandles = new List<GCHandle>();
double*[] observationsPointersTable = new double*[nbObservants];
double** observationsPointer;
GCHandle handle;
for (int i = 0; i < nbObservants; i++) {
handle = GCHandle.Alloc(variables[i], GCHandleType.Pinned);
allHandles.Add(handle);
observationsPointersTable[i] = (double*) handle.AddrOfPinnedObject(); // no prob here
}
fixed(double** obsPtr = observationsPointersTable) { // works just fine
Console.WriteLine("haha {0}", obsPtr[0][0]);
}
handle = GCHandle.Alloc(observationsPointersTable, GCHandleType.Pinned); // won't work
allHandles.Add(handle);
observationsPointer = (double**) handle.AddrOfPinnedObject();
// ...
foreach (var aHandle in allHandles) {
aHandle.Free();
}
allHandles.Clear();
}
I need to use these double** in multiple parts of my code, and don't really want to explicitly pin them every time I need to use them. It seems to me that, as I can fix them through the usual fixed statement, I should be able to allocate a pinned handle to them.
Is there any way to actually pin a double*[] ?
Apparently there is no way to do this. I therefore settled for using fixed statements instead of allocating pinned handles.
Related
I apologize in advance. My domain is mostly C (and C++). I'm trying to write something similar in C#. Let me explain with code.
In C++, I can use large static arrays that are processed during compile-time and stored in a read-only section of the PE file. For instance:
typedef struct _MY_ASSOC{
const char* name;
unsigned int value;
}MY_ASSOC, *LPMY_ASSOC;
bool GetValueForName(const char* pName, unsigned int* pnOutValue = nullptr)
{
bool bResult = false;
unsigned int nValue = 0;
static const MY_ASSOC all_assoc[] = {
{"name1", 123},
{"name2", 213},
{"name3", 1433},
//... more to follow
{"nameN", 12837},
};
for(size_t i = 0; i < _countof(all_assoc); i++)
{
if(strcmp(all_assoc[i].name, pName) == 0)
{
nValue = all_assoc[i].value;
bResult = true;
break;
}
}
if(pnOutValue)
*pnOutValue = nValue;
return bResult;
}
In the example above, the initialization of static const MY_ASSOC all_assoc is never called at run-time. It is entirely processed during the compile-time.
Now if I write something similar in C#:
public struct NameValue
{
public string name;
public uint value;
}
private static readonly NameValue[] g_arrNV_Assoc = new NameValue[] {
new NameValue() { name = "name1", value = 123 },
new NameValue() { name = "name2", value = 213 },
new NameValue() { name = "name3", value = 1433 },
// ... more to follow
new NameValue() { name = "nameN", value = 12837 },
};
public static bool GetValueForName(string name, out uint nOutValue)
{
foreach (NameValue nv in g_arrNV_Assoc)
{
if (name == nv.name)
{
nOutValue = nv.value;
return true;
}
}
nOutValue = 0;
return false;
}
The line private static readonly NameValue[] g_arrNV_Assoc has to be called once during the host class initialization, and it is done for every single element in that array!
So my question -- can I somehow optimize it so that the data stored in g_arrNV_Assoc array is stored in the PE section and not initialized at run-time?
PS. I hope I'm clear for the .NET folks with my terminology.
Indeed the terminology is sufficient enough, large static array is fine.
There is nothing you can really do to make it more efficient out of the box.
It will load initially once (at different times depending on which version of .net and if you have a static constructor). However, it will load before you call it.
Even if you created it empty with just the predetermined size, the CLR is still going to initialize each element to default, then you would have to buffer copy over your data somehow which in turn will have to be loaded from file.
The question are though
How much overhead does loading the default static array of struct actually have compared to what you are doing in C
Does it matter when in the lifecycle of the application when its loaded
And if this is way too much over-head (which i have already assumed you have determined), what other options are possibly available outside the box?
You could possibly pre-allocate a chunk of unmanaged memory, then read and copy the bytes in from somewhere, then inturn access using pointers.
You could also create this in a standard Dll, Pinvoke just like an other un-managed DLL. However i'm not really sure you will get much of a free-lunch here anyway, as there is overhead to marshal these sorts of calls to load your dll.
If your question is only academic, these are really your only options. However if this is actually a performance problem you have, you will need to try and benchmark this for micro-optimization and try to figure out what is suitable to you.
Anyway, i don't profess to know everything, maybe someone else has a better idea or more information. Good luck
In the worst case, does this sample allocate testCnt * xArray.Length storage in the GPU global memory? How to make sure just one copy of the array is transferred to the device? The GpuManaged attribute seems to serve this purpose but it doesn't solve our unexpected memory consumption.
void Worker(int ix, byte[] array)
{
// process array - only read access
}
void Run()
{
var xArray = new byte[100];
var testCnt = 10;
Gpu.Default.For(0, testCnt, ix => Worker(ix, xArray));
}
EDIT
The main question in a more precise form:
Does each worker thread get a fresh copy of xArray or is there only one copy of xArray for all threads?
Your sample code should allocate 100 bytes of memory on the GPU and 100 bytes of memory on the CPU.
(.Net adds a bit of overhead, but we can ignore that)
Since you're using implicit memory, some resources need to be allocated to track that memory, (basically where it lives: CPU/GPU).
Now... You're probably seeing a bigger memory consumption on the CPU side I assume.
The reason for that is possibly due to kernel compilation happening on the fly.
AleaGPU has to compile your IL code into LLVM, that LLVM is fed into the Cuda compiler which in turn converts it into PTX.
This happens when you run a kernel for the first time.
All of the resources and unmanaged dlls are loaded into memory.
That's possibly what you're seeing.
testCnt has no effect on the amount of memory being allocated.
EDIT*
One suggestion is to use memory in an explicit way.
Its faster and more efficient:
private static void Run()
{
var input = Gpu.Default.AllocateDevice<byte>(100);
var deviceptr = input.Ptr;
Gpu.Default.For(0, input.Length, i => Worker(i, deviceptr));
Console.WriteLine(string.Join(", ", Gpu.CopyToHost(input)));
}
private static void Worker(int ix, deviceptr<byte> array)
{
array[ix] = 10;
}
Try use explicit memory:
static void Worker(int ix, byte[] array)
{
// you must write something back, note, I changed your Worker
// function to static!
array[ix] += 1uy;
}
void Run()
{
var gpu = Gpu.Default;
var hostArray = new byte[100];
// set your host array
var deviceArray = gpu.Allocate<byte>(100);
// deviceArray is of type byte[], but deviceArray.Length = 0,
assert deviceArray.Length == 0
assert Gpu.ArrayGetLength(deviceArray) == 100
Gpu.Copy(hostArray, deviceArray);
var testCnt = 10;
gpu.For(0, testCnt, ix => Worker(ix, deviceArray));
// you must copy memory back
Gpu.Copy(deviceArray, hostArray);
// check your result in hostArray
Gpu.Free(deviceArray);
}
I have recently encountered a problem about the memory my program used. The reason is the memory of an array of string i used in a method. More specifically, this program is to read an integer array from a outside file. Here is my code
class Program
{
static void Main(string[] args)
{
int[] a = loadData();
for (int i = 0; i < a.Length; i++)
{
Console.WriteLine(a[i]);
}
Console.ReadKey();
}
private static int[] loadData()
{
string[] lines = System.IO.File.ReadAllLines(#"F:\data.txt");
int[] a = new int[lines.Length];
for (int i = 0; i < lines.Length; i++)
{
string[] temp = lines[i].Split(new char[]{','},StringSplitOptions.RemoveEmptyEntries);
a[i] = Convert.ToInt32(temp[0]);
}
return a;
}
}
File data.txt is about 7.4 MB and 574285 lines. But when I run, the memory of program shown in task manager is : 41.6 MB. It seems that the memory of the array of string I read in loadData() (it is string[] lines) is not be freed. How can i free it, because it is never used later.
You can call GC.Collect() after setting lines to null, but I suggest you look at all answers here, here and here. Calling GC.Collect() is something that you rarely want to do. The purpose of using a language such as C# is that it manages the memory for you. If you want granular control over the memory read in, then you could create a C++ dll and call into that from your C# code.
Instead of reading the entire file into a string array, you could read it line by line and perform the operations that you need to on that line. That would probably be more efficient as well.
What problem does the 40MB of used memory cause? How often do you read the data? Would it be worth caching it for future use (assuming the 7MB is tolerable).
I have an array list which is continuously updated every second. I have to use the same array list in two other threads and make local copies of it. I have done all of this but i get weird exceptions of index out of bound , What i have found out so far is that i have to ensure some synchronization mechanism for the array list to be used across multiple threads.
this is how i am making it synchronized:
for (int i = 0; i < Globls.iterationCount; i++)
{
if (bw_Obj.CancellationPending)
{
eve.Cancel = true;
break;
}
byte[] rawData4 = DMM4.IO.Read(4 * numReadings);
TempDisplayData_DMM4.Add(rawData4);
Globls.Display_DataDMM4 = ArrayList.Synchronized(TempDisplayData_DMM4);
Globls.Write_DataDMM4 = ArrayList.Synchronized(TempDisplayData_DMM4);
}
in other thread i do the following to make local copies:
ArrayList Local_Write_DMM4 = new ArrayList();
Local_Write_DMM4 = new ArrayList(Globls.Write_DataDMM4);
Am i synchronizing the arraylist in right way?, also do i need to lock while copying array-list as well:
lock (Globls.Display_DataDMM4.SyncRoot){Local_Temp_Display1 = new ArrayList(Globls.Display_DataDMM4);}
or for single operations its safe?. I haven't actually ran this code i need to run it over the weekend and i don't want to see another exception :(. please help me on this!
as #Trickery stated assignment needs to be locked since the source array Globls.Write_DataDMM4 can be modified by another thread during enumeration.
It is essential therefore to lock both when populating the original array and when making your copy
for (int i = 0; i < Globls.iterationCount; i++)
{
if (bw_Obj.CancellationPending)
{
eve.Cancel = true;
break;
}
byte[] rawData4 = DMM4.IO.Read(4 * numReadings);
TempDisplayData_DMM4.Add(rawData4);
lock (Globls.Display_DataDMM4.SyncRoot)
{
Globls.Write_DataDMM4 = ArrayList.Synchronized(TempDisplayData_DMM4);
}
}
and
lock (Globls.Display_DataDMM4.SyncRoot)
{
Local_Temp_Display1 = new ArrayList(Globls.Display_DataDMM4);
}
Yes, all operations on your ArrayList need to use Lock.
EDIT: Sorry, the site won't let me add a comment to your question for some reason.
I have a huge array that is being analyzed differently by two threads:
Data is large- no copies allowed
Threads must process concurrently
Must disable bounds checking for maximum performance
Therefore, each thread looks something like this:
unsafe void Thread(UInt16[] data)
{
fixed(UInt16* pData = data)
{
UInt16* pDataEnd = pData + data.Length;
for(UInt16* pCur=pData; pCur != pDataEnd; pCur++)
{
// do stuff
}
}
}
Since there is no mutex (intentionally), I'm wondering if it's safe to use two fixed statements on the same data on parallel threads?? Presumably the second fixed should return the same pointer as the first, because memory is already pinned... and when the first completes, it won't really unpin memory because there is a second fixed() still active.. Has anyone tried this scenario?
According to "CLR via C#" it is safe to do so.
The compiler sets a 'pinned' flag on pData variable (on the pointer, not on the array instance).
So multiple/recursive use should be OK.
Maybe instead of using fixed, you could use GCHandle.Alloc to pin the array:
// not inside your thread, but were you init your shared array
GCHandle handle = GCHandle.Alloc(anArray, GCHandleType.Pinned);
IntPtr intPtr = handle.AddrOfPinnedObject();
// your thread
void Worker(IntPtr pArray)
{
unsafe
{
UInt16* ptr = (UInt16*) pArray.ToPointer();
....
}
}
If all you need to do is
for(int i = 0; i < data.Length; i++)
{
// do stuff with data[i]
}
the bounds check is eliminated by the JIT compiler. So no need for unsafe code.
Note that this does not hold if your access pattern is more complex than that.