Using pointers, memory gets overwritten, how to "reserve" memory? - C#

I am trying to create a class that simulates an array and makes use of pointers. Everything works well until my pointer values are overwritten.
Here's a sample of my code; this is the indexer that I use to get/set the values. As I said, everything works well until at some point the address is overwritten with some other random values.
How can I fix or "reserve" the space for the length of the "array"?
public int this[int x]
{
    get
    {
        if (x >= _length || x < 0)
        {
            throw new IndexOutOfRangeException();
        }
        int* offsetToReturn = indexZeroPointer + x;
        return *offsetToReturn;
    }
    set
    {
        if (x >= _length || x < 0)
        {
            throw new IndexOutOfRangeException();
        }
        int* offset = indexZeroPointer + x;
        *offset = value;
    }
}
As index 0 of the array I used the address of a random integer I declared in the class:
indexZeroPointer = &someValue;

You can't just randomly take addresses of objects in C#. You're in a managed memory environment where the locations in virtual memory space can (and will) change.
By taking the address of a single integer, you're at best getting four bytes of memory to use. You can't just access the memory beyond the allocated piece and hope for the best - not only does its location change (due to memory relocation), it will also be taken up by other data. This is especially true if you took the address of a local, which would be allocated on the stack - you'd be rewriting the stack willy-nilly.
If you want to use pointers (relatively) safely, you need to ensure that the memory you use is actually allocated, and persisted as long as necessary. For example, if you know the length in advance, you can use this piece of code to get the "zero address":
var _length = 10;
var indexZeroPointer = (int*)Marshal.AllocHGlobal(_length * sizeof(int)).ToPointer();
This is just the very beginning of your problems, though. As soon as you enter pointer territory, you lose all the benefits of dealing with managed memory. You'll need to release memory as necessary, get rid of invalid or dangling pointers, handle all the bounds checking and many others.
This is yet another subject where just feeling your way around is going to hurt you. You really want to learn what you're doing, and how the architecture of the computer and the operating system works, and how all this integrates with the .NET memory model. As you just discovered, unsafe code has a tendency of appearing to work, while causing random issues all over the place if you don't know what you're doing (and even if you do - remember the Heartbleed bug and friends?). Make sure you understand how the lower layers work - by using unsafe code, the abstractions that help you avoid understanding that disappear. Low-level coding isn't very friendly.
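For illustration, here is a minimal, hedged sketch of wrapping the indexer from the question around a block that is explicitly allocated up front and explicitly freed afterwards. The class name (UnsafeIntArray) and the Dispose-based cleanup are mine, not from the question, and the code must be compiled with /unsafe:
using System;
using System.Runtime.InteropServices;
public unsafe sealed class UnsafeIntArray : IDisposable
{
    private readonly int _length;
    private int* _indexZeroPointer;

    public UnsafeIntArray(int length)
    {
        _length = length;
        // Reserve length * sizeof(int) bytes of unmanaged memory up front.
        _indexZeroPointer = (int*)Marshal.AllocHGlobal(length * sizeof(int)).ToPointer();
    }

    public int this[int x]
    {
        get
        {
            if (x >= _length || x < 0)
                throw new IndexOutOfRangeException();
            return *(_indexZeroPointer + x);
        }
        set
        {
            if (x >= _length || x < 0)
                throw new IndexOutOfRangeException();
            *(_indexZeroPointer + x) = value;
        }
    }

    public void Dispose()
    {
        // Unmanaged memory is never garbage collected; free it explicitly.
        if (_indexZeroPointer != null)
        {
            Marshal.FreeHGlobal((IntPtr)_indexZeroPointer);
            _indexZeroPointer = null;
        }
    }
}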

Related

Data buffering from a file: code not working as intended

I am not very experienced in C#, but have lots of experience from other languages.
I am doing a project in C# where I have to read and modify large files.
For this I have coded a buffering scheme where I keep chunks of data in memory, and swap them to disk when I need to read more. I always eliminate the [0] element from the array, by moving the following elements back one position.
public struct TBBuffer
{
    public long offset;
    public short[] data;
    public GCHandle dataHandle;
}
// tb is a TBBuffer[]; the data[] is initialized to 4096.
If I use a small sample file, where everything fits in the buffers allocated, everything works as intended.
Whenever I need to free up some memory for more data I do:
int bufIdx, bufNo;
for (bufIdx = 0; bufIdx < tb.buffer.Length - 1; bufIdx++)
{
    tb.buffer[bufIdx] = tb.buffer[bufIdx + 1];
}
bufNo = tb.Length - 1;
I have determined that the above code is the source of the problem, but I am unable to find out why that is so.
So my question is: considering the TBBuffer struct and its contents, does anybody have a clue why this is not working as expected?
Is there a more efficient way to do this?
Are you looking for array resize?
Array.Resize(ref tb.buffer, newSize);
Just state your intention to .NET and let it do the work for you (in an efficient way).
In the for loop you use tb.Length. I think it should be tb.buffer.Length:
int bufIdx, bufNo;
for (bufIdx = 0; bufIdx < tb.buffer.Length - 1; bufIdx++)
{
    tb.buffer[bufIdx] = tb.buffer[bufIdx + 1];
}
bufNo = tb.buffer.Length - 1;
I solved my problem... Rather embarrassing, but the problem was that the data[] of the very last buffer would point at the same array as the next-to-last one after I did my for () move stunt. After adding a tb.buffer[bufNo].data = new short[4096]; statement everything is back to normal. Thank you for your time, and all the constructive feedback. I will be looking into memory-mapped files to see whether that feature will be a better option.
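For reference, here is a hedged sketch of the fixed eviction step, assuming tb.buffer is the TBBuffer[] and 4096 is the chunk size used elsewhere in the question:
int bufIdx, bufNo;
for (bufIdx = 0; bufIdx < tb.buffer.Length - 1; bufIdx++)
{
    tb.buffer[bufIdx] = tb.buffer[bufIdx + 1];
}
bufNo = tb.buffer.Length - 1;
// Without this line, the last slot still references the same short[]
// as the next-to-last slot after the shift.
tb.buffer[bufNo].data = new short[4096];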

C#/XNA Giant Memory Leak

this is my first post here.
I'm making a game in Visual Studio 2010 using XNA, and I've hit a giant memory leak. My game starts out using 17k RAM and after ten minutes it's up to 65k. I ran some memory profilers, and they all say that new instances of the String object are being created, but they aren't live. The amount of live instances of String hasn't changed at all. It's also creating instances of Char[] (which I'd expect), Object[], and StringBuilder. My game is pretty new, but there's too much code to post here. I have no idea how to get rid of instances that aren't live, please help!
You haven't posted enough information to provide you with more than an educated guess. Here is my educated guess:
If you are doing things like this in your Draw method:
spriteBatch.DrawString(font, "Score: " + score, location, Color.Black);
spriteBatch.DrawString(font, "Something else: " + foo, overHere, Color.Black);
spriteBatch.DrawString(font, "And also: " + bar, overThere, Color.Black);
Then each of those calls will be creating new string and StringBuilder objects behind your back each time they run. Because they are in your Draw method, each is probably running 60 times per second. That is a lot of temporary objects being allocated!
To verify that this is the case - use the CLR Profiler. It sounds like you have already done this.
While this isn't really a "leak" - the Garbage Collector will eventually clean them up - this allocation pattern is undesirable in a game. See this blog post about two methods for dealing with garbage collection in a game. Method 1 is usually easier and provides better results - so I am discussing it here.
It is worth mentioning at this point that the GC on the PC is fast enough that allocations like this don't really matter. The GC will clean up tiny objects (like your temporary strings) with very little overhead.
On the Xbox 360, on the other hand, even producing tiny amounts of garbage like this regularly can cause some serious performance issues. (I'm not sure about WP7, but I would personally treat it like the Xbox -- with care!)
How do we fix this?
The answer is simple: DrawString will accept an instance of StringBuilder in place of string. Create one instance of StringBuilder and then reuse it each time you need to put together a custom string.
Note that converting a number or other object to a string, implicitly or via its ToString() method, will also cause allocations. So you might have to write your own custom code to append to the StringBuilder without causing allocations.
Here is one I use, in the form of an extension method, for appending integers to a string without allocating:
public static class StringBuilderExtensions
{
// 11 characters will fit -4294967296
static char[] numberBuffer = new char[11];
/// <summary>Append an integer without generating any garbage.</summary>
public static StringBuilder AppendNumber(this StringBuilder sb, Int32 number)
{
bool negative = (number < 0);
if(negative)
number = -number;
int i = numberBuffer.Length;
do
{
numberBuffer[--i] = (char)('0' + (number % 10));
number /= 10;
}
while(number > 0);
if(negative)
numberBuffer[--i] = '-';
sb.Append(numberBuffer, i, numberBuffer.Length - i);
return sb;
}
}
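To tie it together, here is a hedged usage sketch; the field names (scoreText, location) are mine, and it assumes font, spriteBatch and an int score field already exist in your Game class. The idea is to build the text once per frame into a single reused StringBuilder instead of concatenating strings:
StringBuilder scoreText = new StringBuilder(32);
Vector2 location = new Vector2(10, 10);

protected override void Draw(GameTime gameTime)
{
    GraphicsDevice.Clear(Color.CornflowerBlue);

    scoreText.Length = 0;                          // reset without allocating
    scoreText.Append("Score: ").AppendNumber(score);

    spriteBatch.Begin();
    spriteBatch.DrawString(font, scoreText, location, Color.Black); // StringBuilder overload
    spriteBatch.End();

    base.Draw(gameTime);
}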
There are no memory leaks in C# (or rather, they're very difficult to create). What you are experiencing is normal. The garbage collector doesn't "feel" like it needs to collect memory, so it does not. Whenever memory runs short, garbage collection will occur. If you're absolutely certain that you aren't keeping unneeded references to your strings, then everything's fine.
If you want to force a GC cycle use GC.Collect().

Struct vs class memory overhead

I'm writing an app that will create thousands of small objects and store them recursively in arrays. By "recursively" I mean that each instance of K will have an array of K instances, each of which will have an array of K instances, and so on; this array plus one int field are the only properties, plus some methods. I found that memory usage grows very fast even for a small amount of data (about 1MB), and when the data I'm processing is about 10MB I get an OutOfMemoryException, not to mention when it's bigger (I have 4GB of RAM) :). So what do you suggest I do? I figured that if I created a separate class V to process those objects, so that instances of K would have only the array of K's plus one integer field, and made K a struct rather than a class, it should optimize things a bit - no garbage collection and stuff... But it's a bit of a challenge, so I'd rather ask you whether it's a good idea before I start a total rewrite :).
EDIT:
Ok, some abstract code
public void Add(string word)
{
    int i;
    string shorterWord;
    if (word.Length > 0)
    {
        i = //something, it's really irrelevant
        if (t[i] == null)
        {
            t[i] = new MyClass();
        }
        shorterWord = word.Substring(1);
        // end of word
        if (shorterWord.Length == 0)
        {
            t[i].WordEnd = END;
        }
        // saving the word letter by letter
        t[i].Add(shorterWord);
    }
}
When researching this more deeply I worked from the following assumptions (they may be inexact; I'm getting old for a programmer). A class has extra memory consumption because a reference is required to address it: storing the reference needs an Int32-sized pointer on a 32-bit compile. Classes are also always allocated on the heap (I can't remember if C++ has other possibilities; I would venture yes).
The short answer, found in this article, is that an object has a 12-byte basic footprint plus 4 possibly unused bytes depending on your class (which no doubt has something to do with padding):
http://www.codeproject.com/Articles/231120/Reducing-memory-footprint-and-object-instance-size
Another issue you'll run into is that arrays also have overhead. One possibility would be to manage your own offsets into one larger array (or a few of them), which in turn is getting closer to something a more memory-efficient language would be better suited for.
I'm not sure if there are libraries that provide storage for small objects in an efficient manner. There probably are.
My take on it: use structs, manage your own offsets in a large array, and use proper packing instructions if it serves you (although I suspect this comes at a runtime cost of a few extra instructions each time you address unevenly packed data):
[StructLayout(LayoutKind.Sequential, Pack = 1)]
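To make that concrete, here is a hedged sketch of the idea; the Node layout and the NodePool class are illustrative, not from the question. Packed structs live in one big array, and plain integer indexes act as your own "pointers":
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Node
{
    public int FirstChild;   // index into the pool, -1 if none
    public int NextSibling;  // index into the pool, -1 if none
    public byte Letter;      // letter stored as a byte (a=0, b=1, ...)
    public byte IsWordEnd;   // 1 if a word ends here
}

class NodePool
{
    private Node[] _nodes = new Node[1024];
    private int _count;

    // Returns the index (our "pointer") of a freshly allocated node.
    public int Allocate(byte letter)
    {
        if (_count == _nodes.Length)
            Array.Resize(ref _nodes, _nodes.Length * 2);
        _nodes[_count] = new Node { FirstChild = -1, NextSibling = -1, Letter = letter };
        return _count++;
    }

    public Node Get(int index) { return _nodes[index]; }
    public void Set(int index, Node node) { _nodes[index] = node; }
}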
Your stack is blowing up.
Do it iteratively instead of recursively.
You're not blowing the system stack up, you're blowing the call stack up - 10K nested function calls will blow it out of the water.
You need proper tail recursion, which is just an iterative hack.
Make sure you have enough memory in your system - over 100MB+, etc.; it really depends on your system. A linked list of recursive objects is what you are looking at here. If you keep recursing, it is going to hit the memory limit and an OutOfMemoryException will be thrown. Make sure you keep track of the memory usage in any program; nothing is unlimited, especially memory. If memory is limited, save the data to disk.
It looks like there is infinite recursion in your code and that is why out of memory is thrown. Check the code: there should be a start and an end in recursive code. Otherwise it will go past 10 terabytes of memory at some point.
You can use a better data structure, i.e. each letter can be a byte (a=0, b=1, ...). Each word fragment can be indexed as well, especially substrings - you should get away with significantly less memory (though with a performance penalty).
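A hedged sketch of that suggestion (the names are illustrative): encode letters as bytes and reuse (intern) repeated fragments instead of allocating a new array for every substring. It assumes lowercase a-z input:
using System.Collections.Generic;

static class FragmentEncoder
{
    private static readonly Dictionary<string, byte[]> interned =
        new Dictionary<string, byte[]>();

    // Maps 'a'..'z' to 0..25 and reuses the encoded fragment if it was seen before.
    public static byte[] Encode(string fragment)
    {
        byte[] encoded;
        if (interned.TryGetValue(fragment, out encoded))
            return encoded;

        encoded = new byte[fragment.Length];
        for (int i = 0; i < fragment.Length; i++)
            encoded[i] = (byte)(fragment[i] - 'a');

        interned[fragment] = encoded;
        return encoded;
    }
}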
Just post your recursive algorithm and sanitize the variable names. If you are doing a BFS-type traversal and keeping all objects in memory, you will run out of memory. In that case, for example, replace it with DFS.
Edit 1:
You can speed up the algorithm by estimating how many items you will generate and then allocating that much memory at once. As the algorithm progresses, fill up the allocated memory. This reduces fragmentation, reallocation, and copy-on-full-array operations.
Nonetheless, after you are done operating on these generated words you should delete them from your data structure so they can be GC'd and you don't run out of memory.

Microsoft Visual C# 2008 Reducing number of loaded dlls

How can I reduce the number of loaded dlls When debugging in Visual C# 2008 Express Edition?
When running a Visual C# project in the debugger I get an OutOfMemoryException due to fragmentation of the 2GB virtual address space, and we assume that the loaded DLLs might be the reason for the fragmentation.
Brian Rasmussen, you made my day! :)
His proposal of "disabling the visual studio hosting process" solved the problem.
(for more information see history of question-development below)
Hi,
I need two big int-arrays to be loaded in memory with ~120 million elements (~470MB) each, and both in one Visual C# project.
When I'm trying to instantiate the 2nd Array I get an OutOfMemoryException.
I do have enough total free memory and after doing a web-search I thought my problem is that there aren't big enough contiguous free memory blocks on my system.
BUT! - when I'm instantiating only one of the arrays in one Visual C# instance and then open another Visual C# instance, the 2nd instance can instantiate an array of 470MB.
(Edit for clarification: In the paragraph above I meant running it in the debugger of Visual C#)
And the task-manager shows the corresponding memory usage-increase just as you would expect it.
So a lack of big enough contiguous memory blocks on the whole system isn't the problem. I then tried running a compiled executable that instantiates both arrays, which also works (memory usage 1GB).
Summary:
OutOfMemoryException in Visual C# when using two big int arrays, but running the compiled exe works (memory usage 1GB), and two separate Visual C# instances are able to find two big enough contiguous memory blocks for my big arrays - yet I need one Visual C# instance to be able to provide the memory.
Update:
First of all special thanks to nobugz and Brian Rasmussen, I think they are spot on with their prediction that "the Fragmentation of 2GB virtual address space of the process" is the problem.
Following their suggestions I used VMMap and listdlls for my short amateur-analysis and I get:
* 21 dlls listed for the "standalone"-exe. (the one that works and uses 1GB of memory.)
* 58 dlls listed for vshost.exe-version. (the version which is run when debugging and that throws the exception and only uses 500MB)
VMMap showed me that the biggest free memory blocks for the debugger version are 262, 175, 167, 155 and 108 MB.
So VMMap says that there is no contiguous 500MB block, and according to the info about the free blocks I added ~9 smaller int arrays, which added up to more than 1.2GB of memory usage and actually did work.
So from that I would say that we can call "fragmentation of the 2GB virtual address space" guilty.
From the listdlls output I created a small spreadsheet with the hex numbers converted to decimal to check the free areas between DLLs. I did find big free spaces between the (21) DLLs for the standalone version, but not for the vshost debugger version (58 DLLs). I'm not claiming that there can't be anything else in between, and I'm not really sure whether what I'm doing there makes sense, but it seems consistent with VMMap's analysis, and it looks as if the DLLs alone already fragment the memory for the debugger version.
So perhaps a solution would be if I would be able to reduce the number of dlls used by the debugger.
1. Is that possible?
2. If yes how would I do that?
You are battling virtual memory address space fragmentation. A process on the 32-bit version of Windows has 2 gigabytes of memory available. That memory is shared by code as well as data. Chunks of code are the CLR and the JIT compiler as well as the ngen-ed framework assemblies. Chunks of data are the various heaps used by .NET, including the loader heap (static variables) and the garbage collected heaps. These chunks are located at various addresses in the memory map. The free memory is available for you to allocate your arrays.
Problem is, a large array requires a contiguous chunk of memory. The "holes" in the address space, between chunks of code and data, are not large enough to allow you to allocate such large arrays. The first hole is typically between 450 and 550 Megabytes, that's why your first array allocation succeeded. The next available hole is a lot smaller. Too small to fit another big array, you'll get OOM even though you've got an easy gigabyte of free memory left.
You can look at the virtual memory layout of your process with the SysInternals' VMMap utility. Okay for diagnostics, but it isn't going to solve your problem. There's only one real fix, moving to a 64-bit version of Windows. Perhaps better: rethink your algorithm so it doesn't require such large arrays.
3rd update: You can reduce the number of loaded DLLs significantly by disabling the Visual Studio hosting process (project properties, debug). Doing so will still allow you to debug the application, but it will get rid of a lot of DLLs and a number of helper threads as well.
On a small test project the number of loaded DLLs went from 69 to 34 when I disabled the hosting process. I also got rid of 10+ threads. All in all a significant reduction in memory usage which should also help reduce heap fragmentation.
Additional info on the hosting process: http://msdn.microsoft.com/en-us/library/ms242202.aspx
The reason you can load the second array in a new application is that each process gets a full 2 GB virtual address space. I.e. the OS will swap pages to allow each process to address the total amount of memory. When you try to allocate both arrays in one process the runtime must be able to allocate two contiguous chunks of the desired size. What are you storing in the array? If you store objects, you need additional space for each of the objects.
Remember, an application doesn't actually request physical memory. Instead, each application is given an address space from which it can allocate virtual memory. The OS then maps the virtual memory to physical memory. It is a rather complex process (Russinovich spends 100+ pages on how Windows handles memory in his Windows Internals book). For more details on how Windows does this please see http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
Update: I've been pondering this question for a while and it does sound a bit odd. When you run the application through Visual Studio, you may see additional modules loaded depending on your configuration. On my setup I get a number of different DLLs loaded during debug due to profilers and TypeMock (which essentially does its magic via the profiler hooks).
Depending on the size and load address of these they may prevent the runtime from allocating contiguous memory. Having said that, I am still a bit surprised that you get an OOM after allocating just two of those big arrays as their combined size is less than 1 GB.
You can look at the loaded DLLs using the listdlls tools from SysInternals. It will show you load addresses and size. Alternatively, you can use WinDbg. The lm command shows loaded modules. If you want size as well, you need to specify the v option for verbose output. WinDbg will also allow you to examine the .NET heaps, which may help you to pinpoint why memory cannot be allocated.
2nd Update: If you're on Windows XP, you can try to rebase some of the loaded DLLs to free up more contiguous space. Vista and Windows 7 use ASLR, so I am not sure you'll benefit from rebasing on those platforms.
This isn't an answer per se, but perhaps an alternative might work.
If the problem is indeed that you have fragmented memory, then perhaps one workaround would be to just use those holes, instead of trying to find a hole big enough for everything consecutively.
Here's a very simple BigArray class that doesn't add too much overhead (some overhead is introduced, especially in the constructor, in order to initialize the buckets).
The statistics for the array is:
Main executes in 404ms
static Program-constructor doesn't show up
The statistics for the class is:
Main took 473ms
static Program-constructor takes 837ms (initializing the buckets)
The class allocates a bunch of 8192-element arrays (13-bit indexes), which on 64-bit for reference types will fall below the LOH limit. If you're only going to use this for Int32, you can probably up this to 14 and probably even make it non-generic, although I doubt it will improve performance much.
In the other direction, if you're afraid you're going to have a lot of holes smaller than the 8192-element arrays (64KB on 64-bit or 32KB on 32-bit), you can just reduce the bit-size for the bucket indexes through its constant. This will add more overhead to the constructor, and add more memory overhead, since the outermost array will be bigger, but the performance should not be affected.
Here's the code:
using System;
using NUnit.Framework;

namespace ConsoleApplication5
{
    class Program
    {
        // static int[] a = new int[100 * 1024 * 1024];
        static BigArray<int> a = new BigArray<int>(100 * 1024 * 1024);

        static void Main(string[] args)
        {
            int l = a.Length;
            for (int index = 0; index < l; index++)
                a[index] = index;
            for (int index = 0; index < l; index++)
                if (a[index] != index)
                    throw new InvalidOperationException();
        }
    }

    [TestFixture]
    public class BigArrayTests
    {
        [Test]
        public void Constructor_ZeroLength_ThrowsArgumentOutOfRangeException()
        {
            Assert.Throws<ArgumentOutOfRangeException>(() =>
            {
                new BigArray<int>(0);
            });
        }

        [Test]
        public void Constructor_NegativeLength_ThrowsArgumentOutOfRangeException()
        {
            Assert.Throws<ArgumentOutOfRangeException>(() =>
            {
                new BigArray<int>(-1);
            });
        }

        [Test]
        public void Indexer_SetsAndRetrievesCorrectValues()
        {
            BigArray<int> array = new BigArray<int>(10001);
            for (int index = 0; index < array.Length; index++)
                array[index] = index;
            for (int index = 0; index < array.Length; index++)
                Assert.That(array[index], Is.EqualTo(index));
        }

        private const int PRIME_ARRAY_SIZE = 10007;

        [Test]
        public void Indexer_RetrieveElementJustPastEnd_ThrowsIndexOutOfRangeException()
        {
            BigArray<int> array = new BigArray<int>(PRIME_ARRAY_SIZE);
            Assert.Throws<IndexOutOfRangeException>(() =>
            {
                array[PRIME_ARRAY_SIZE] = 0;
            });
        }

        [Test]
        public void Indexer_RetrieveElementJustBeforeStart_ThrowsIndexOutOfRangeException()
        {
            BigArray<int> array = new BigArray<int>(PRIME_ARRAY_SIZE);
            Assert.Throws<IndexOutOfRangeException>(() =>
            {
                array[-1] = 0;
            });
        }

        [Test]
        public void Constructor_BoundarySizes_ProducesCorrectlySizedArrays()
        {
            for (int index = 1; index < 16384; index++)
            {
                BigArray<int> arr = new BigArray<int>(index);
                Assert.That(arr.Length, Is.EqualTo(index));
                arr[index - 1] = 42;
                Assert.That(arr[index - 1], Is.EqualTo(42));
                Assert.Throws<IndexOutOfRangeException>(() =>
                {
                    arr[index] = 42;
                });
            }
        }
    }

    public class BigArray<T>
    {
        const int BUCKET_INDEX_BITS = 13;
        const int BUCKET_SIZE = 1 << BUCKET_INDEX_BITS;
        const int BUCKET_INDEX_MASK = BUCKET_SIZE - 1;

        private readonly T[][] _Buckets;
        private readonly int _Length;

        public BigArray(int length)
        {
            if (length < 1)
                throw new ArgumentOutOfRangeException("length");

            _Length = length;
            int bucketCount = length >> BUCKET_INDEX_BITS;
            bool lastBucketIsFull = true;
            if ((length & BUCKET_INDEX_MASK) != 0)
            {
                bucketCount++;
                lastBucketIsFull = false;
            }

            _Buckets = new T[bucketCount][];
            for (int index = 0; index < bucketCount; index++)
            {
                if (index < bucketCount - 1 || lastBucketIsFull)
                    _Buckets[index] = new T[BUCKET_SIZE];
                else
                    _Buckets[index] = new T[(length & BUCKET_INDEX_MASK)];
            }
        }

        public int Length
        {
            get
            {
                return _Length;
            }
        }

        public T this[int index]
        {
            get
            {
                return _Buckets[index >> BUCKET_INDEX_BITS][index & BUCKET_INDEX_MASK];
            }
            set
            {
                _Buckets[index >> BUCKET_INDEX_BITS][index & BUCKET_INDEX_MASK] = value;
            }
        }
    }
}
I had a similar issue once and what I ended up doing was using a list instead of an array. When creating the lists I set the capacity to the required sizes and I defined both lists BEFORE I tried adding values to them. I'm not sure if you can use lists instead of arrays, but it might be something to consider. In the end I had to run the executable on a 64-bit OS, because when I added the items to the list the overall memory usage went above 2GB, but at least I was able to run and debug locally with a reduced set of data.
A question: Are all elements of your array occupied? If many of them contain some default value then maybe you could reduce memory consumption using an implementation of a sparse array that only allocates memory for the non-default values. Just a thought.
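A hedged sketch of what such a sparse array could look like (a generic illustration, not a specific library): only the non-default values take up memory.
using System.Collections.Generic;

public class SparseArray<T>
{
    private readonly Dictionary<int, T> items = new Dictionary<int, T>();
    private readonly int length;

    public SparseArray(int length) { this.length = length; }

    public int Length { get { return length; } }

    public T this[int index]
    {
        get
        {
            T value;
            return items.TryGetValue(index, out value) ? value : default(T);
        }
        set
        {
            if (EqualityComparer<T>.Default.Equals(value, default(T)))
                items.Remove(index);   // don't waste memory storing defaults
            else
                items[index] = value;
        }
    }
}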
Each 32-bit process has a 2GB address space (unless you ask the user to add /3GB to the boot options), so if you can accept some performance drop-off, you can start a new process to get 2GB more of address space - well, a little less than that. The new process would still be fragmented by all the CLR DLLs plus all the Win32 DLLs they use, so you can get rid of the address space fragmentation caused by the CLR DLLs by writing the new process in a native language, e.g. C++. You can even move some of your calculations to the new process so you get more address space in your main app and stay less chatty with your main process.
You can communicate between your processes using any of the interprocess communication methods. You can find many IPC samples in the All-In-One Code Framework.
I have experience with two desktop applications and one mobile application hitting out-of-memory limits, so I understand the issues. I do not know your requirements, but I suggest moving your lookup arrays into SQL CE. Performance is good - you will be surprised - and SQL CE runs in-process. With the last desktop application, I was able to reduce my memory footprint from 2.1GB to 720MB, which had the benefit of speeding up the application due to significantly reducing page faults. (Your problem is fragmentation of the AppDomain's memory, which you have no control over.)
Honestly, I do not think you will be satisfied with performance after squeezing these arrays into memory. Don't forget, excessive page faults have a significant impact on performance.
If you do go SqlServerCe, make sure to keep the connection open to improve performance. Also, single row lookups (scalar) may be slower than returning a result set.
If you really want to know what is going on with memory, use the CLR Profiler. VMMap is not going to help. The OS does not allocate memory to your application. The Framework does, by grabbing large chunks of OS memory for itself (caching the memory) and then allocating, when needed, pieces of this memory to applications.
CLR Profiler for the .NET Framework 2.0 at
https://github.com/MicrosoftArchive/clrprofiler

C# byte[] substring? (design)

I'm downloading some files asynchronously into a large byte array, and I have a callback that fires off periodically whenever some data is added to that array. If I want to give developers the ability to use the last chunk of data that was added to array, then... well how would I do that? In C++ I could give them a pointer to somewhere in the middle, and then perhaps tell them the number of bytes that were added in the last operation so they at least know the chunk they should be looking at... I don't really want to give them a 2nd copy of that data, that's just wasteful.
I'm just thinking if people want to process this data before the file has completed downloading. Would anyone actually want to do that? Or is it a useless feature anyway? I already have a callback for when the buffer (entire byte array) is full, and then they can dump the whole thing without worrying about start and end points...
.NET has a struct that does exactly what you want:
System.ArraySegment.
In any case, it's easy to implement it yourself too - just make a constructor that takes a base array, an offset, and a length. Then implement an indexer that offsets indexes behind the scenes, so your ArraySegment can be seamlessly used in the place of an array.
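A small usage sketch (the buffer sizes and variable names are made up): hand callers a view over just the newly appended bytes without copying them.
using System;

class LastChunkExample
{
    static void Main()
    {
        byte[] downloadBuffer = new byte[1024];
        int previousLength = 100;   // bytes that were already in the buffer
        int bytesJustAdded = 50;    // bytes appended by the latest read

        // A view over the new bytes only; no copy of the data is made.
        ArraySegment<byte> lastChunk =
            new ArraySegment<byte>(downloadBuffer, previousLength, bytesJustAdded);

        // The consumer reads only its slice of the shared buffer.
        for (int i = lastChunk.Offset; i < lastChunk.Offset + lastChunk.Count; i++)
        {
            byte b = lastChunk.Array[i];
            // process b ...
        }
    }
}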
You can't give them a pointer into the array, but you could give them the array and start index and length of the new data.
But I have to wonder what someone would use this for. Is this a known need, or are you just guessing that someone might want it someday? And if so, is there any reason why you couldn't wait to add the capability once someone actually needs it?
Whether this is needed or not depends on whether you can afford to accumulate all the data from a file before processing it, or whether you need to provide a streaming mode where you process each chunk as it arrives. This depends on two things: how much data there is (you probably would not want to accumulate a multi-gigabyte file), and how long it takes the file to completely arrive (if you are getting the data over a slow link you might not want your client to wait till it had all arrived). So it is a reasonable feature to add, depending on how the library is to be used. Streaming mode is usually a desirable attribute, so I would vote for implementing the feature. However, the idea of putting the data into an array seems wrong, because it fundamentally implies a non-streaming design, and because it requires an additional copy. What you could do instead is to keep each chunk of arriving data as a discrete piece. These could be stored in a container for which adding at the end and removing from the front is efficient.
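A hedged sketch of that container (the names are illustrative): keep each arriving chunk as a discrete piece in a queue, append at the back, consume from the front.
using System.Collections.Generic;

public class ChunkStream
{
    private readonly Queue<byte[]> chunks = new Queue<byte[]>();

    // Called by the download callback with each newly received chunk.
    public void OnChunkArrived(byte[] chunk)
    {
        chunks.Enqueue(chunk);
    }

    // Called by the consumer; returns null when nothing is pending.
    public byte[] TakeNextChunk()
    {
        return chunks.Count > 0 ? chunks.Dequeue() : null;
    }
}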
Copying a chunk of a byte array may seem "wasteful," but then again, object-oriented languages like C# tend to be a little more wasteful than procedural languages anyway. A few extra CPU cycles and a little extra memory consumption can greatly reduce complexity and increase flexibility in the development process. In fact, copying bytes to a new location in memory to me sounds like good design, as opposed to the pointer approach which will give other classes access to private data.
But if you do want to use pointers, C# does support them. Here is a decent-looking tutorial. The author is correct when he states, "...pointers are only really needed in C# where execution speed is highly important."
I agree with the OP: sometimes you just plain need to pay some attention to efficiency. I don't think the example of providing an API is the best, because that certainly calls for leaning toward safety and simplicity over efficiency.
However, a simple example is when processing large numbers of huge binary files that have zillions of records in them, such as when writing a parser. Without using a mechanism such as System.ArraySegment, the parser becomes a big memory hog, and is greatly slowed down by creating a zillion new data elements, copying all the memory over, and fragmenting the heck out of the heap. It's a very real performance issue. I write these kinds of parsers all the time for telecommunications stuff which generate millions of records per day in each of several categories from each of many switches with variable length binary structures that need to be parsed into databases.
Using the System.ArraySegment mechanism versus creating new structure copies for each record tremendously speeds up the parsing, and greatly reduces the peak memory consumption of the parser. These are very real advantages because the servers run multiple parsers, run them frequently, and speed and memory conservation = very real cost savings in not having to have so many processors dedicated to the parsing.
System.ArraySegment is very easy to use. Here's a simple example of providing a base way to track the individual records in a typical big binary file full of records with a fixed-length header and a variable-length record size (obvious exception handling deleted):
public struct MyRecord
{
    public ArraySegment<byte> header;
    public ArraySegment<byte> data;
}

public class Parser
{
    const int HEADER_SIZE = 10;
    const int HDR_OFS_REC_TYPE = 0;
    const int HDR_OFS_REC_LEN = 4;

    byte[] m_fileData;
    List<MyRecord> records = new List<MyRecord>();

    bool Parse(FileStream fs)
    {
        int fileLen = (int)fs.Length;
        m_fileData = new byte[fileLen];
        fs.Read(m_fileData, 0, fileLen);
        fs.Close();
        fs.Dispose();

        int offset = 0;
        while (offset + HEADER_SIZE < fileLen)
        {
            int recType = (int)m_fileData[offset];
            switch (recType) { /* puke if not a recognized type */ }

            int varDataLen = ((int)m_fileData[offset + HDR_OFS_REC_LEN]) * 256
                           + (int)m_fileData[offset + HDR_OFS_REC_LEN + 1];
            if (offset + varDataLen > fileLen) { /* puke as file has odd bytes at end */ }

            MyRecord rec = new MyRecord();
            rec.header = new ArraySegment<byte>(m_fileData, offset, HEADER_SIZE);
            rec.data = new ArraySegment<byte>(m_fileData, offset + HEADER_SIZE,
                                              varDataLen);
            records.Add(rec);

            offset += HEADER_SIZE + varDataLen;
        }
        return true;
    }
}
The above example gives you a list with ArraySegments for each record in the file while leaving all the actual data in place in one big array per file. The only overhead are the two array segments in the MyRecord struct per record. When processing the records, you have the MyRecord.header.Array and MyRecord.data.Array properties which allow you to operate on the elements in each record as if they were their own byte[] copies.
I think you shouldn't bother.
Why on earth would anyone want to use it?
That sounds like you want an event.
public class ArrayChangedEventArgs : EventArgs
{
    public ArrayChangedEventArgs(byte[] array, int start, int length)
    {
        Array = array;
        Start = start;
        Length = length;
    }

    public byte[] Array { get; private set; }
    public int Start { get; private set; }
    public int Length { get; private set; }
}
// ...
// and in your class:
public event EventHandler<ArrayChangedEventArgs> ArrayChanged;

protected virtual void OnArrayChanged(ArrayChangedEventArgs e)
{
    // Using a temporary variable avoids a common potential multithreading issue
    // where the multicast delegate changes midstream.
    // Best practice is to grab a copy first, then test for null.
    EventHandler<ArrayChangedEventArgs> handler = ArrayChanged;
    if (handler != null)
    {
        handler(this, e);
    }
}

// Finally, your code that downloads a chunk just needs to call OnArrayChanged()
// with the appropriate args.
Clients hook into the event and get called when things change. This is what most client code in .NET expects to have in an API ("call me when something happens"). They can hook into the code with something as simple as:
yourDownloader.ArrayChanged += (sender, e) =>
Console.WriteLine(String.Format("Just downloaded {0} byte{1} at position {2}.",
e.Length, e.Length == 1 ? "" : "s", e.Start));
