Keep space for arrays/buffers for repeated use - c#

I'm using large arrays (many MB per array) but at any one time there is only one array - one gets disposed and another created to take its place. They are not of equal length, but the length does have an upper bound.
Instead of having a new array allocated every time, is there a way to allocate space for the largest possible array (whose size I can find out) and use however much of it is needed for each new array? I can't use the exact same array with a variable for the length, because I need to pass the array on to other methods which I do not control, and they need the array to be exactly the length of the data contained in it (which is not constant). I remember reading about some class that can do that: we ask it for a buffer and then return the buffer to the class.

You can create your own memory manager that creates a new array when the one you have is too small, and otherwise returns the previously allocated one.
You can also use an InMemoryRandomAccessStream to store your data. This stream will resize itself to hold the data you have to store.
Using DataWriter or DataReader, you can then easily insert/read data to/from the stream.
To get an input or output stream from the InMemoryRandomAccessStream, you can use GetInputStreamAt(0) or GetOutputStreamAt(0).
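The "own memory manager" idea above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name ReusableBuffer and its Get method are hypothetical, and the array it returns may be longer than requested, so it does not by itself satisfy callers that need an exact-length array.

```csharp
using System;

// Hypothetical helper (not a framework class): keeps one backing array
// and only reallocates when a larger one is requested.
public sealed class ReusableBuffer
{
    private byte[] _buffer = Array.Empty<byte>();

    // Returns an array of at least minLength bytes.
    // The same instance is returned again while it is large enough.
    public byte[] Get(int minLength)
    {
        if (minLength < 0)
            throw new ArgumentOutOfRangeException(nameof(minLength));

        if (_buffer.Length < minLength)
            _buffer = new byte[minLength];

        return _buffer;
    }
}
```

If the upper bound is known, calling Get once with that bound up front means no later call allocates.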

Related

Writing disparate objects to a single file

This is an offshoot of my question found at Pulling objects from binary file and putting in List<T> and the question posed by David Torrey at Serializing and Deserializing Multiple Objects.
I don't know if this is even possible or not. Say that I have an application that uses five different classes to contain the necessary information to do whatever it is that the application does. The classes do not refer to each other and have various numbers of internal variables and the like. I would want to save the collection of objects spawned from these classes into a single save file and then be able to read them back out. Since the objects are created from different classes they can't go into a list to be sent to disc. I originally thought that using something like sizeof(this) to record the size of the object to record in a table that is saved at the beginning of a file and then have a common GetObjectType() that returns an actionable value as the kind of object it is would have worked, but apparently sizeof doesn't work the way I thought it would and so now I'm back at square 1.
Several options: wrap all of your objects in a larger object and serialize that. The problem with that is that you can't deserialize just one object; you have to load all of them. If you have 10 objects that are each several megabytes, that isn't a good idea. You want random access to any of the objects, but you don't know the offsets in the file. The "wrapper" object can be something as simple as a List<object>, but I'd use my own object for better type safety.
Second option: use a fixed size file header/footer. Serialize each object to a MemoryStream, and then dump the memory streams from the individual objects into the file, remembering the number of bytes in each. Finally add a fixed size block at the start or end of the file to record the offsets in the file where the individual objects begin. In the example below the header of the file has first the number of objects in the file (as an integer), then one integer for each object giving the size of each object.
// Pseudocode to serialize two objects into the same file
// First serialize to memory
byte[] bytes1 = object1.Serialize();
byte[] bytes2 = object2.Serialize();
// Write header:
file.Write(2); // Number of objects
file.Write(bytes1.Length); // Size of first object
file.Write(bytes2.Length); // Size of second object
// Write data:
file.Write(bytes1);
file.Write(bytes2);
The offset of object N is the size of the header plus the sum of the sizes of all objects before N. So in this example, to read the file, you read the header like so (pseudocode, again):
numObjs = file.readInt();
for(i=0..numObjs)
size[i] = file.readInt();
After which you can compute and seek to the correct location for object N.
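The header layout described above could be sketched with BinaryWriter and BinaryReader roughly as follows. The class and method names (MultiObjectFile, Write, ReadOne) are made up for illustration, the objects are assumed to be already serialized to byte arrays, and the header is assumed to start at position 0 of a seekable stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper illustrating the "count + sizes" header layout.
public static class MultiObjectFile
{
    // Writes: object count, one Int32 length per object, then the raw bytes.
    // Assumes the stream is positioned at 0.
    public static void Write(Stream stream, IReadOnlyList<byte[]> blobs)
    {
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        writer.Write(blobs.Count);
        foreach (byte[] blob in blobs) writer.Write(blob.Length);
        foreach (byte[] blob in blobs) writer.Write(blob);
    }

    // Reads only object number n (zero-based), skipping the others by seeking.
    public static byte[] ReadOne(Stream stream, int n)
    {
        stream.Seek(0, SeekOrigin.Begin);
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);

        int count = reader.ReadInt32();
        if (n < 0 || n >= count)
            throw new ArgumentOutOfRangeException(nameof(n));

        var sizes = new int[count];
        for (int i = 0; i < count; i++) sizes[i] = reader.ReadInt32();

        // Header size: the count itself plus one Int32 per object.
        long offset = sizeof(int) * (1L + count);
        for (int i = 0; i < n; i++) offset += sizes[i];

        stream.Seek(offset, SeekOrigin.Begin);
        return reader.ReadBytes(sizes[n]);
    }
}
```

Because the sizes are stored up front, reading object N touches only the header and that one object's bytes.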
Third option: Use a format that does option #2 for you, for example zip. In .NET you can use System.IO.Packaging to create a structured zip (OPC format), or you can use a third party zip library if you want to roll your own zip format.

Can I create a new array reference which contains a subset of an existing array, without copying memory?

I am building a file reader in C#, and large volumes of data will be enumerated. I want to use the same buffer for each element I read out, then pass the buffer on for further processing by the client. The API would be cleaner if I could return a byte[] of the correct size, rather than the raw buffer and a length.
Is it possible to do this in C# without copying memory?
You can use ArraySegment<T>
http://msdn.microsoft.com/en-us/library/1hsbd92d.aspx
That lets you specify the offset and length of the segment you want to pass on without copying any data.
If you can change your API parameter types, I think you could use an ArraySegment.
ArraySegment<T> is a generic struct that stores information about a range of an array: the array itself, an offset, and a count. It facilitates optimizations that reduce memory copying and heap allocations.
From MSDN:
The Array property returns the entire original array, not a copy of the array; therefore, changes made to the array returned by the Array property are made to the original array.
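A small sketch of how that looks in practice. The SegmentDemo class and its methods are hypothetical names used only for illustration, and the consumer is assumed to accept an ArraySegment<byte> instead of a byte[].

```csharp
using System;

public static class SegmentDemo
{
    // Wraps the first `count` bytes of `buffer`; no bytes are copied.
    public static ArraySegment<byte> FirstBytes(byte[] buffer, int count)
    {
        return new ArraySegment<byte>(buffer, 0, count);
    }

    // Shows that writing through the segment changes the original buffer.
    public static bool WritesThrough()
    {
        var buffer = new byte[1024];
        ArraySegment<byte> segment = FirstBytes(buffer, 16);

        // Array is the original array; Offset and Count describe the range.
        segment.Array[segment.Offset] = 42;

        return buffer[0] == 42 && segment.Count == 16;
    }
}
```

The segment is only a view: a method that insists on a byte[] of exactly the right length still needs a copy.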

Best Way to Represent Large Byte Array

I'm looking for the most efficient way to store and manage a large byte array in memory. I will have the need to both insert and delete bytes from any position within the array.
At first, I was thinking that a regular array was best.
byte[] buffer = new byte[ArraySize];
This would allow me to access any byte within the array. I can also resize the array. However, there doesn't appear to be any built-in support for shifting or moving items within the array.
One option is to have a loop to move items one by one but that sounds horribly inefficient in C#. Another option is to create a new array and copy bytes over to the correct position, but that requires copying all data in the array.
Is there no better option?
Actually, I just found the Buffer Class, which appears ideal for what I need.
It looks like the BlockCopy method will block copy a bunch of items and supports copying within the same array, and even correctly handles overlapping items.
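As a rough sketch of how Buffer.BlockCopy can shift bytes within one array: the ByteShift class and its method names are hypothetical, and the array is assumed to be allocated with spare capacity, with the number of used bytes tracked separately.

```csharp
using System;

public static class ByteShift
{
    // Inserts `value` at `index`, shifting bytes [index, count) one place right.
    // Returns the new number of used bytes.
    public static int InsertAt(byte[] buffer, int count, int index, byte value)
    {
        if (count >= buffer.Length)
            throw new InvalidOperationException("Buffer is full.");
        if (index < 0 || index > count)
            throw new ArgumentOutOfRangeException(nameof(index));

        // Source and destination overlap; BlockCopy handles this correctly.
        Buffer.BlockCopy(buffer, index, buffer, index + 1, count - index);
        buffer[index] = value;
        return count + 1;
    }

    // Removes the byte at `index`, shifting the later bytes one place left.
    // Returns the new number of used bytes.
    public static int RemoveAt(byte[] buffer, int count, int index)
    {
        if (index < 0 || index >= count)
            throw new ArgumentOutOfRangeException(nameof(index));

        Buffer.BlockCopy(buffer, index + 1, buffer, index, count - index - 1);
        return count - 1;
    }
}
```

Each insert or delete still moves every byte after the position, so the cost stays proportional to the amount of data behind it.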
I think the best option in this case is a hybrid between a regular array and a list. This would only be necessary with megabyte sized arrays though.
So you could do something like this:
List<byte[]> buffer;
and have each element of the list hold just a chunk of the data (say 64K, or something similarly small and manageable).
It'd require quite a bit of custom code, but would definitely be the fastest option when having to shift data around in a large array.
Also, if you're doing a lot more shifting of bytes than anything else, LinkedList<T> may work better (but it's famously bad for everything but a specific set of cases)
To clarify why this works better than a single array, consider inserting 1 byte at the beginning of an array. You must allocate another array (doubling memory consumption), write the new byte, copy every existing byte into the new array after it, and then release the old array (which can fragment the heap, depending on size).
Consider now this method with lists.
If you have to insert a lot of bytes, you'll probably want to insert them as a new chunk at the beginning of the buffer list. Inserting into the list costs time proportional to the number of chunks, so the efficiency of this operation is O(n/CHUNK_SIZE).
Or, if you just need to insert a single byte, you can just get the first element of the list and copy the array as normal. Then, the speed is O(CHUNK_SIZE), which isn't horrible, especially if n in comparison is very large (megabytes of data)
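A minimal sketch of the chunked idea, under some assumptions: it uses List<List<byte>> rather than List<byte[]> so that chunks can grow and split, the ChunkedBuffer name and the splitting policy are invented for illustration, and removal is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chunked byte buffer: an insert only shifts bytes in one chunk.
public sealed class ChunkedBuffer
{
    private readonly int _maxChunk;
    private readonly List<List<byte>> _chunks = new List<List<byte>> { new List<byte>() };

    public ChunkedBuffer(int maxChunk = 64 * 1024) { _maxChunk = maxChunk; }

    public long Length { get; private set; }

    // Finds the chunk containing `index` and the offset inside that chunk.
    // An index equal to Length maps to the end of the last chunk (append).
    private int Locate(long index, out int offset)
    {
        for (int i = 0; i < _chunks.Count; i++)
        {
            if (index < _chunks[i].Count || i == _chunks.Count - 1)
            {
                offset = (int)index;
                return i;
            }
            index -= _chunks[i].Count;
        }
        throw new InvalidOperationException("No chunks available.");
    }

    public byte this[long index]
    {
        get
        {
            if (index < 0 || index >= Length)
                throw new ArgumentOutOfRangeException(nameof(index));
            int c = Locate(index, out int offset);
            return _chunks[c][offset];
        }
    }

    public void Insert(long index, byte value)
    {
        if (index < 0 || index > Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        int c = Locate(index, out int offset);
        List<byte> chunk = _chunks[c];
        chunk.Insert(offset, value);
        Length++;

        // Split an over-full chunk in half so later inserts stay cheap.
        if (chunk.Count > _maxChunk)
        {
            int half = chunk.Count / 2;
            _chunks.Insert(c + 1, chunk.GetRange(half, chunk.Count - half));
            chunk.RemoveRange(half, chunk.Count - half);
        }
    }
}
```

Lookups in this sketch walk the chunk list linearly, which is the price paid for cheap inserts; a real implementation might keep running offsets to speed that up.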

Resizing big arrays

I have an array whose size is about 2 GB (filled with audio samples). Now I want to apply a filter to that array. The filter generates about 50% more samples than the input source, so I need to create a new array of about 3 GB. Now I have 5 GB of memory in use. But the filter could operate on the source array alone, if only that array had some more space in it.
Question: can I allocate a memory in C# that can be resized w/o creating a second memory block, then removing that first one?
I just thought: if memory in PCs is divided into 4 kB pages (or larger), why can't C# (?) use that feature?
If your filter can work in-place just allocate 50% more space at the beginning. All you need to know is the actual length of the original sample.
If that code doesn't work always and you don't want to consume more memory beforehand, you can allocate half of the original array (the extension array) and check which part your access relates to:
// Pseudocode: "2GB" and "1GB" stand for the actual element counts
byte[] myOriginalArray = new byte[2GB]; // previously allocated
byte[] myExtensionArray = new byte[1GB]; // 50% of the original
for(... my processing code of the array ...)
{
    byte value = read(index);
    ... process the index and the value here
    store(index, value);
}
// index is a long because the combined length exceeds int.MaxValue
byte read(long index)
{
    if(index < 2GB) return myOriginalArray[index];
    return myExtensionArray[index - 2GB];
}
void store(long index, byte value)
{
    if(index < 2GB) myOriginalArray[index] = value;
    else myExtensionArray[index - 2GB] = value; // only when past the original
}
You add index check and subtraction overhead for each access to the array. That could also be made smarter for certain cases. For instance for the portion you do not need to access extension you can use your faster loop and for the part where you need to write to extension part you can use the slower version (two consecutive loops).
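The read/store pair can also be wrapped in a small class with an indexer. This is only a sketch under stated assumptions: ExtendedArray is a hypothetical name, and the sizes are whatever the caller passes in rather than the 2 GB and 1 GB of the example.

```csharp
using System;

// Hypothetical wrapper: presents an original array plus an extension array
// as one contiguous index space, without copying the original.
public sealed class ExtendedArray
{
    private readonly byte[] _original;
    private readonly byte[] _extension;

    public ExtendedArray(byte[] original, int extensionLength)
    {
        _original = original ?? throw new ArgumentNullException(nameof(original));
        _extension = new byte[extensionLength];
    }

    public long Length => (long)_original.Length + _extension.Length;

    public byte this[long index]
    {
        get
        {
            return index < _original.Length
                ? _original[index]
                : _extension[index - _original.Length];
        }
        set
        {
            // Exactly one of the two arrays receives the write.
            if (index < _original.Length)
                _original[index] = value;
            else
                _extension[index - _original.Length] = value;
        }
    }
}
```

Every access pays for one comparison and, in the extension range, one subtraction, which is the overhead the answer describes.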
Question: can I allocate a memory in C# that can be resized w/o creating a second memory block, then removing that first one?
No, you cannot resize an array in .NET. If you want to increase the size of an array you will have to create a new and bigger array and copy all the data from the existing array to the new array.
To get around this problem you could provide your own "array" implementation based on allocating smaller chunks of memory but presenting it as one big buffer of data. An example of this is StringBuilder that is based on an implementation of chunks of characters, each chunk being a separate Char[] array.
Another option is to use P/Invoke to get access to low level memory management functions like VirtualAlloc that allows you to reserve pages of memory in advance. You need to do this in a 64 bit process because the virtual address space of a 32 bit process is only 4 GB. You probably also need to work with unsafe code and pointers.

C# Increasing an array by one element at the end

In my program I have a bunch of growing arrays, where new elements are added one by one to the end of the array. I identified Lists as a speed bottleneck in a critical part of my program due to their slow access time in comparison with an array; switching to an array increased performance tremendously, to an acceptable level. So to grow the array I'm using Array.Resize. This works well, as my implementation restricts the array size to approximately 20 elements, so the O(N) performance of Array.Resize is bounded.
But it would be better if there was a way to just increase an array by one element at the end without having to use Array.Resize; which I believe does a copy of the old array to the newly sized array.
So my question is: is there a more efficient method for adding one element to the end of an array without using List or Array.Resize?
A List has constant time access just like an array. For 'growing arrays' you really should be using List.
When you know that you may be adding elements to an array-backed structure, you don't want to grow it by one element at a time. Usually it is best to grow an array by doubling its size when it fills up.
As has been previously mentioned, List<T> is what you are looking for. If you know the initial size of the list, you can supply an initial capacity to the constructor, which will increase your performance for your initial allocations:
List<int> values = new List<int>(5);
values.Add(1);
values.Add(2);
values.Add(3);
values.Add(4);
values.Add(5);
A List<T> allocates room for 4 elements on the first Add (unless you specify a capacity when you construct it) and then doubles its capacity each time it fills up.
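How a List<T> grows can be observed through its Capacity property. The helper below is a hypothetical sketch (ListGrowth and ObserveCapacities are made-up names); the exact growth steps are an implementation detail of the runtime rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;

public static class ListGrowth
{
    // Returns each distinct Capacity value seen while adding `count` items
    // to a list created without an initial capacity.
    public static List<int> ObserveCapacities(int count)
    {
        var list = new List<int>();
        var seen = new List<int> { list.Capacity };

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != seen[seen.Count - 1])
                seen.Add(list.Capacity);
        }
        return seen;
    }
}
```

Each change in capacity corresponds to one reallocation and copy of the backing array, so supplying the expected size to the constructor avoids those intermediate steps.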
Why don't you try a similar thing with Array? I.e. create it as having 4 elements, then when you insert the fifth element, first grow the array by another 4 elements.
There is no way to resize an array, so the only way to get a larger array is to use Array.Resize to create a new array.
Why not just create the arrays to have 20 elements from start (or whatever capacity you need at most), and use a variable to keep track of how many elements are used in the array? That way you never have to resize any arrays.
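A preallocated array with a separate element count might look like this. It is a sketch only: FixedCapacityList is a hypothetical name, and the capacity is assumed to be a known upper bound such as the 20 elements mentioned in the question.

```csharp
using System;

// Hypothetical fixed-capacity list: one array allocated up front,
// plus a counter for how many slots are in use. Adding never reallocates.
public sealed class FixedCapacityList<T>
{
    private readonly T[] _items;

    public FixedCapacityList(int capacity)
    {
        _items = new T[capacity];
    }

    public int Count { get; private set; }

    public void Add(T item)
    {
        if (Count == _items.Length)
            throw new InvalidOperationException("Capacity exceeded.");
        _items[Count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }
}
```

In a hot loop the same pattern can be used without the wrapper class, keeping the array and the count as plain fields.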
Growing an array AFAIK means that a new array is allocated, the existing content being copied to the new instance. I doubt that this should be faster than using List...?
It's much faster to resize an array in chunks (like 10) and store the capacity in a separate variable, and then only resize the array when the capacity is reached. This is how a list works, but if you prefer to use arrays then you should look into resizing them in larger chunks, especially if you have a large number of Array.Resize calls.
I think any approach that sticks with an array can never be fully optimized, because an array is a static structure, so I think it's better to use dynamic structures like List or others.
