I have 10 threads writing thousands of small buffers (16-30 bytes each) to a huge file at random positions. Some of the threads throw an OutOfMemoryException on the FileStream.Write() operation.
What is causing the OutOfMemoryException? What should I look for?
I'm using the FileStream like this (for every written item - this code runs from 10 different threads):
using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite, BigBufferSizeInBytes, FileOptions.SequentialScan))
{
    ...
    fs.Write();
}
I suspect that all the buffers allocated inside the FileStream don't get released in time by the GC. What I don't understand is why the CLR, instead of throwing, doesn't just run a GC cycle and free up all the unused buffers?
If ten threads are opening files as your code shows, then you have a maximum of ten undisposed FileStream objects at any one time. Yes, FileStream does have an internal buffer, the size of which you specify with "BigBufferSizeInBytes" in your code. Could you please disclose the exact value? If this is big enough (e.g. ~100MB) then it could well be the source of the problem.
By default (i.e. when you don't specify a number upon construction), this buffer is 4 kB, and that is usually fine for most applications. In general, if you really care about disk write performance, you might increase it to a few hundred kB, but not more.
However, for your specific application doing so wouldn't make much sense, as said buffer will never contain more than the 16-30 bytes you write into it before you Dispose() the FileStream object.
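To make that concrete, here is a rough sketch of what I mean; for 16-30 byte writes a small buffer is plenty. The names path, position and buffer are placeholders for whatever your code uses, and I've swapped in FileOptions.RandomAccess (my suggestion, not from your code) since the writes land at random positions rather than sequentially.
using System.IO;

// A small per-stream buffer: 4 kB (the default) is more than enough for a
// 16-30 byte record; a huge value here just wastes memory per open stream.
using (var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write,
                               FileShare.ReadWrite, 4096, FileOptions.RandomAccess))
{
    fs.Seek(position, SeekOrigin.Begin);      // random position in the big file
    fs.Write(buffer, 0, buffer.Length);       // the 16-30 byte record
}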
To answer your question, an OutOfMemoryException is thrown only when the requested memory can't be allocated after a GC has run. Again, if the buffer is really big then the system could have plenty of memory left, just not a contiguous chunk. This is because the large object heap is never compacted.
I've reminded people about this one a few times, but the Large Object Heap can throw that exception fairly subtly, when you seemingly have plenty of available memory or the application is running OK.
I've run into this issue fairly frequently when doing almost exactly what you're describing here.
You need to post more of your code to answer this question properly. However, I'm guessing it could also be related to a potential Halloween problem (Spooky Dooky).
The buffer you are reading into may also be the problem (again, Large Object Heap related); again, you need to put up more details about what's going on there in the loop. I've just nailed down the last bug I had which is virtually identical (I am performing many parallel hash updates which all require independent state to be maintained across reads of the input file)...
Oops! I just scrolled over and noticed "BigBufferSizeInBytes"; I'm leaning towards the Large Object Heap again...
If I were you (and this is exceedingly difficult due to the lack of context), I would provide a small dispatch "mbuf" where you copy in and out, instead of allowing all of your disparate threads to individually read across your large backing array (i.e. it's hard not to cause incidental allocations with very subtle code syntax).
Buffers aren't generally allocated inside the FileStream. Perhaps the problem is the line "writing thousands of small buffers" - do you really mean that? Normally you re-use a buffer many, many, many times (i.e. on different calls to Read/Write).
Also - is this a single file? A single FileStream is not guaranteed to be thread safe... so if you aren't doing synchronization, expect chaos.
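If the threads really do all target one file, a rough sketch of the kind of synchronization I mean is below; treat it as an illustration only, since the names (RandomWriter, writeLock, WriteAt) are mine and not from your code.
using System.IO;

class RandomWriter
{
    private readonly object writeLock = new object();
    private readonly FileStream sharedStream;

    public RandomWriter(string path)
    {
        // One stream for the whole process instead of one per write.
        sharedStream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write,
                                      FileShare.Read, 4096, FileOptions.RandomAccess);
    }

    public void WriteAt(long position, byte[] buffer)
    {
        // Seek + Write must happen as a unit, so take the lock around both.
        lock (writeLock)
        {
            sharedStream.Seek(position, SeekOrigin.Begin);
            sharedStream.Write(buffer, 0, buffer.Length);
        }
    }
}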
It's possible that these limitations arise from the underlying OS, and that the .NET Framework is powerless to overcome these kinds of limitations.
What I cannot deduce from your code sample is whether you open up a lot of these FileStream objects at the same time, or open them really fast in sequence. Your use of the 'using' keyword will make sure that the files are closed after the fs.Write() call. There's no GC cycle required to close the file.
The FileStream class is really geared towards sequential read/write access to files. If you need to quickly write to random locations in a big file, you might want to take a look at using virtual file mapping.
Update: It seems that virtual file mapping will not be officially supported in .NET until 4.0. You may want to take a look at third party implementations for this functionality.
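On .NET 4.0 and later, a rough sketch of what that could look like with the built-in MemoryMappedFile class is below; path, fileLength, position and buffer are placeholders, and each thread could also create its own view accessor if you prefer.
using System.IO;
using System.IO.MemoryMappedFiles;

// Map the big file once; each small record is then written directly at its offset.
using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate, "hugefile", fileLength))
using (var accessor = mmf.CreateViewAccessor())
{
    // Writes buffer.Length bytes starting at 'position' in the mapped file.
    accessor.WriteArray(position, buffer, 0, buffer.Length);
}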
Dave
I'm experiencing something similar and wondered if you ever pinned down the root of your problem?
My code does quite a lot of copying between files, passing quite a few megs between different byte files. I've noticed that whilst the process memory usage stays within a reasonable range, the system memory allocation shoots up way too high during the copying - much more than is being used by my process.
I've tracked the issue down to the FileStream.Write() call - when this line is taken out, memory usage seems to go as expected. My BigBufferSizeInBytes is the default (4k), and I can't see anywhere where these could be collecting...
Anything you discovered whilst looking at your problem would be gratefully received!
I am attempting to collect data from a USB port using D3XX.NET from FTDI. The data is collected and then sent to a fast fourier transform for plotting a spectrum. This works fine, even if you miss some data. You can't tell. However, if you then want to send this data to an audio output component, you will notice data missing. This is where my problem appears to be.
The data is collected and then sent to the audio device. All packets are making it within the time span needed. However, the audio is dropping data it appears. Here is a picture of what a sine wave looks like at the output of the audio:
You can see that some data is missing at the beginning and it seems a whole cycle is missing near the end. This is just one example, it changes all the time. Sometimes it appears that the data is just not there.
I have gone through the whole processing chain and I'm pretty sure the data packets for the sound are making it.
I have since used the JetBrains performance profiler. What I have found is the following: the ReadPipe method takes 8.5 ms, which is exactly what you expect the read to take. So far so good. Once the ReadPipe command is finished, you have 0.5 ms to do another ReadPipe or you will lose some data. Looking at the profiler output I see this:
The ReadPipe takes 8.5ms and then there is this entry for garbage collection which on average takes 1.6ms. If this is indeed occurring even occasionally, then I have lost some data.
So here is the code. It runs in a BackgroundWorker:
private void CollectData(object sender, DoWorkEventArgs e)
{
    while (keepGoing)
    {
        // read IQ data - will get 1024 pairs - 2 bytes per value
        ftStatus = d3xxDevice.ReadPipe(0x84, iqBuffer, 65536, ref bytesTransferred);
        _waitForData.Set();
    }
}
The wait handle signals to the other thread that data is available.
So is the GC the cause of the lost data? And if so, how can I avoid this?
Thanks!
If you can confirm that you aren't running out of memory, you could try setting GCSettings.LatencyMode to GCLatencyMode.SustainedLowLatency. This will prevent certain blocking garbage collections from occurring, unless you're low on memory. Check out the docs on latency modes for more details and restrictions.
If garbage collection is still too disruptive for your use case and you're using .NET 4.6 or later, you may be able to try calling GC.TryStartNoGCRegion. This method will attempt to reserve enough memory to allocate up to the amount specified, and block GC until you've exhausted the reservation. If your memory usage is fairly consistent, you might be able to get away with passing in a large enough value to accommodate your application's usage, but there's no guarantee that the call will succeed.
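As a rough sketch (not tested against your code), the two options look something like this; the 64 MB budget and the RunReadLoop method are placeholders you would replace with your own values and code.
using System;
using System.Runtime;

// Option 1: discourage blocking collections unless memory is actually low.
GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

// Option 2 (.NET 4.6+): reserve memory up front and suppress GC while reading.
if (GC.TryStartNoGCRegion(64 * 1024 * 1024))    // budget is a guess; tune to your allocation rate
{
    try
    {
        RunReadLoop();                          // placeholder for your ReadPipe loop
    }
    finally
    {
        // Ending is only valid while the no-GC region is still in effect.
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            GC.EndNoGCRegion();
    }
}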
If you're on an older version of .NET that doesn't support either of these, you're probably out of luck. If this is a GUI application (which it looks like, judging by the event handler), you don't have enough control over allocations.
Another thing to consider is that C# isn't really the right tool for applications that can't tolerate disruptions. If you're familiar with writing native code, you could perform your time-sensitive work on an unmanaged thread; as far as I'm aware, this is the only reliable solution, especially if your application is going to run on end-user machines.
You need to be friendlier to your garbage collector and not allocate so much.
In short, if the GC is stalling your threads, you have a garbage problem. The GC will pause all threads to do a clean-up, and there is nothing you can really do apart from better management of the garbage you create.
If you have arrays, don't keep creating them constantly; instead, reuse them (and so on). Use lighter-weight structures, and use tools which let you reduce allocations, like Span<T> and Memory<T>. Consider using fewer awaits if your code is heavily async, and don't put them in loops. Pass by ref, use ref locals and such, and stay away from large unmanaged data blocks if you can.
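As a small illustration of the reuse idea (the stream and ProcessChunk names are placeholders, and ArrayPool needs the System.Buffers package on older frameworks):
using System;
using System.Buffers;

// Rent a buffer from the shared pool instead of allocating a fresh array per read.
byte[] buffer = ArrayPool<byte>.Shared.Rent(65536);
try
{
    int bytesRead = stream.Read(buffer, 0, buffer.Length);   // 'stream' is a placeholder
    ProcessChunk(buffer.AsSpan(0, bytesRead));                // placeholder processing method
}
finally
{
    // Returning the buffer is what makes this allocation-free over time.
    ArrayPool<byte>.Shared.Return(buffer);
}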
Also, it might be beneficial to call GC.Collect in any downtime when it won't matter, though better design will likely be more beneficial.
In my Azure role running C# code inside a 64 bit process I want to download a ZIP file and unpack it as fast as possible. I figured I could do the following: create a MemoryStream instance, download to that MemoryStream, then pass the stream to some ZIP handling library for unpacking and once unpacking is done discard the stream. This way I would get rid of write-read-write sequence that unnecessarily performs a lot of I/O.
However, I've read that MemoryStream is backed by an array, and at half a gigabyte that array will definitely be considered a "large object" and will be allocated on the large object heap, which doesn't compact on garbage collection. This makes me worried that this usage of MemoryStream will fragment the process memory and have negative long-term effects.
Will this likely have any long-term negative effects on my process?
The answer is in the accepted answer to the question you linked to. Thanks for providing the reference.
The real problem is the assumption that a program should be allowed to consume all virtual memory at any time; that problem otherwise disappears completely by just running the code on a 64-bit operating system.
I would say if this is a 64 bit process you have nothing to worry about.
The hole that is created only leads to fragmentation of the virtual address space of the LOH. Fragmentation here isn't a big problem for you. In a 64 bit process any whole pages wasted due to fragmentation will just become unused and the physical memory they were mapped to becomes available again to map a new page. Very few partial pages will be wasted because these are large allocations. And locality of reference (the other advantage of defragmentation) is mostly preserved, again because these are large allocations.
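If you want to reduce the reallocations on the LOH anyway, one cheap trick is to give the MemoryStream its final capacity up front when you know (or can estimate) the download size. The names below are placeholders, a sketch rather than a drop-in solution.
using System.IO;

// One LOH allocation instead of repeated grow-and-copy as the stream fills.
int expectedSize = 512 * 1024 * 1024;            // ~0.5 GB, known or estimated in advance
using (var ms = new MemoryStream(expectedSize))
{
    downloadStream.CopyTo(ms);                   // 'downloadStream' is a placeholder
    ms.Position = 0;
    // hand 'ms' to the ZIP library here, then let everything go out of scope
}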
I have a program that processes high volumes of data, and can cache much of it for reuse with subsequent records in memory. The more I cache, the faster it works. But if I cache too much, boom, start over, and that takes a lot longer!
I haven't been too successful trying to do anything after the exception occurs - I can't get enough memory to do anything.
Also I've tried allocating a huge object, then de-allocating it right away, with inconsistent results. Maybe I'm doing something wrong?
Anyway, what I'm stuck with is just setting a hard-coded limit on the number of cached objects that, from experience, seems to be low enough. Any better ideas? Thanks.
edit after answer
The following code seems to be doing exactly what I want:
Imports System.Runtime ''// MemoryFailPoint lives here

Do
    Dim memFailPoint As MemoryFailPoint = Nothing
    Try
        ''// size in MB of the several objects I'm about to add to the cache
        memFailPoint = New MemoryFailPoint(mysize)
        memFailPoint.Dispose()
    Catch ex As InsufficientMemoryException
        ''// dump the oldest items here
    End Try
    ''// do work
Loop
I need to test if it is slowing things down in this arrangement or not, but I can see the yellow line in Task Manager looking like a very healthy sawtooth pattern with a consistent top - yay!!
You can use MemoryFailPoint to check for available memory before allocating.
You may need to think about your release strategy for the cached objects. There is no possible way you can hold all of them forever so you need to come up with an expiration timeframe and have older cached objects removed from memory. It should be possible to find out how much memory is left and use that as part of your strategy but one thing is certain, old objects must go.
If you implement your cache with WeakReferences (http://msdn.microsoft.com/en-us/library/system.weakreference.aspx), the cached objects remain eligible for garbage collection in situations where you might otherwise throw an OutOfMemory exception.
This is an alternative to a fixed-size cache, but it potentially has the problem of being overly aggressive in clearing out the cache when a GC does occur.
You might consider taking a hybrid approach, where there is a (tunable) fixed number of non-weak references in the cache but you let it grow additionally with weak references. Or this may be overkill.
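A minimal sketch of the weak-reference idea (the class and method names are mine, and a real cache would also prune entries whose targets have died):
using System;
using System.Collections.Generic;

class WeakCache<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference> entries = new Dictionary<TKey, WeakReference>();

    public void Add(TKey key, TValue value)
    {
        // The GC is free to collect 'value' once nothing else references it.
        entries[key] = new WeakReference(value);
    }

    public bool TryGet(TKey key, out TValue value)
    {
        value = null;
        WeakReference wr;
        if (entries.TryGetValue(key, out wr))
        {
            value = wr.Target as TValue;   // null if it has already been collected
        }
        return value != null;
    }
}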
There are a number of metrics you can use to keep track of how much memory your process is using:
GC.GetTotalMemory
Environment.WorkingSet (This one isn't useful, my bad)
The native GlobalMemoryStatusEx function
There are also various properties on the Process class
The trouble is that there isn't really a reliable way of telling from these values alone whether or not a given memory allocation will fail: although there may be sufficient space in the address space for the allocation, fragmentation means that the space may not be contiguous, and so the allocation may still fail.
You can, however, use these values as an indication of how much memory the process is using and therefore of whether or not you should think about removing objects from your cache.
Update: It's also important to make sure that you understand the distinction between virtual memory and physical memory - unless your page file is disabled (very unlikely), the OutOfMemoryException will be caused by a lack, or fragmentation, of the virtual address space.
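A quick sketch of reading a few of these values (they are rough indicators only; none of them proves that the next allocation will succeed):
using System;
using System.Diagnostics;

long managedBytes = GC.GetTotalMemory(false);        // estimate of the managed heap
using (Process proc = Process.GetCurrentProcess())
{
    Console.WriteLine("Managed heap:  {0:N0} bytes", managedBytes);
    Console.WriteLine("Working set:   {0:N0} bytes", proc.WorkingSet64);
    Console.WriteLine("Private bytes: {0:N0} bytes", proc.PrivateMemorySize64);
    Console.WriteLine("Virtual size:  {0:N0} bytes", proc.VirtualMemorySize64);
}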
If you're only using managed resources you can use the GC.GetTotalMemory method and compare the results with the maximum allowed memory for a process on your architecture.
A more advanced solution (I think this is how SQL Server manages to actually adapt to the available memory) is to use the CLR Hosting APIs:
the interface allows the CLR to inform the host of the consequences of failing a particular allocation
which will mean actually removing some objects from the cache and trying again.
Anyway, I think this is probably overkill for almost all applications unless you really need amazing performance.
The simple answer... By knowing what your memory limit is.
The closer you get to that limit, the more likely you are to get an OutOfMemoryException.
The more elaborate answer: unless you write such a mechanism yourself, programming languages/systems do not work that way; as far as I know they cannot warn you ahead of time that you are about to exceed a limit, BUT they gladly inform you when the problem has occurred, and that usually happens through exceptions which you are supposed to write code to handle.
Memory is a resource that you can use; it has limits and it also has some conventions and rules for you to follow to make good use of that resource.
I believe what you are already doing, setting a sensible limit (hard-coded or configurable), is your best bet.
I have a large string (e.g. 20 MB).
I am now parsing this string. The problem is that strings in C# are immutable; this means that once I've created a substring and looked at it, the memory is wasted.
Because of all the processing, memory is getting clogged up with String objects that I no longer use, need or reference, but it takes the garbage collector too long to free them.
So the application runs out of memory.
I could use the poorly performing club approach and sprinkle a few thousand calls to:
GC.Collect();
everywhere, but that's not really solving the issue.
I know StringBuilder exists for creating a large string.
I know TextReader exists to read a String into a char array.
I need to somehow "reuse" a string, making it no longer immutable, so that I don't needlessly allocate gigabytes of memory when 1 kB will do.
If your application is dying, that's likely to be because you still have references to strings - not because the garbage collector is just failing to clean them up. I have seen it fail like that, but it's pretty unlikely. Have you used a profiler to check that you really do have a lot of strings in memory at a time?
The long and the short of it is that you can't reuse a string to store different data - it just can't be done. You can write your own equivalent if you like - but the chances of doing that efficiently and correctly are pretty slim. Now if you could give more information about what you're doing, we may be able to suggest alternative approaches that don't use so much memory.
This question is almost 10 years old. These days, please look at ReadOnlySpan - instantiate one from the string using the AsSpan() method. Then you can apply index operators to get slices as spans without allocating any new strings.
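A small sketch of what that looks like (the offsets and the LoadBigString method are placeholders for whatever your parser does):
using System;

string input = LoadBigString();                      // placeholder for your 20 MB string
ReadOnlySpan<char> span = input.AsSpan();

// A 64-character window into the string: no new string is allocated.
ReadOnlySpan<char> field = span.Slice(1024, 64);

if (field.StartsWith("HEADER".AsSpan(), StringComparison.Ordinal))
{
    // parse 'field' in place; call ToString() only when a real string is required
}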
Given that you cannot reuse strings in C#, I would suggest using Memory-Mapped Files. You simply save the string to disk and process it through the mapped file like a stream, with an excellent performance/memory-consumption trade-off. In this case you reuse the same file and the same stream, and operate only on the smallest possible portion of the data (the string you need at that precise moment), throwing it away immediately afterwards.
Whether this solution fits depends strictly on your project requirements, but I think it is one of the solutions you should seriously consider, as memory consumption in particular will go down dramatically, although you will "pay" something in terms of performance.
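As a rough sketch of the idea (the file name and encoding are placeholders, and whether line-by-line processing fits depends on your parser):
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Process the text in small pieces from a memory-mapped file instead of
// holding one giant string (and all its substrings) in memory.
using (var mmf = MemoryMappedFile.CreateFromFile("data.txt", FileMode.Open))
using (var view = mmf.CreateViewStream())
using (var reader = new StreamReader(view, Encoding.UTF8))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // work on one small piece, then let it become garbage immediately
    }
}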
Do you have some sample code to test whether possible solutions would work well?
In general though, any object that is bigger than 85KB is going to be allocated onto the Large Object Heap, which will probably be garbage collected less often.
Also, if you're really pushing the CPU hard, the garbage collector will likely perform its work less often, trying to stay out of your way.
I have an out-of-memory exception using C# when reading in a massive file.
I need to change the code, but for the time being can I increase the heap size (like I would in Java) as a short-term fix?
.Net does that automatically.
Looks like you have reached the limit of the memory one .NET process can use for its objects (on a 32-bit machine this is 2 GB as standard, or 3 GB by using the /3GB boot switch). Credits to Leppie & Eric Lippert for the info.
Rethink your algorithm, or perhaps a change to a 64 bit machine might help.
No, this is not possible. This problem might occur because you're running on a 32-bit OS and memory is too fragmented. Try not to load the whole file into memory (for instance, by processing line by line) or, when you really need to load it completely, by loading it in multiple, smaller parts.
No you can't; see my answer here: Is there any way to pre-allocate the heap in the .NET runtime, like -Xmx/-Xms in Java?
For reading large files it is usually preferable to stream them from disk, reading them in chunks and dealing with them a piece at a time instead of loading the whole thing up front.
As others have already pointed out, this is not possible. The .NET runtime handles heap allocations on behalf of the application.
In my experience .NET applications commonly suffer from OOM when there should be plenty of memory available (or at least, so it appears). The reason for this is usually the use of huge collections such as arrays, List (which uses an array to store its data) or similar.
The problem is that these types will sometimes create peaks in memory use. If these peak requests cannot be honored, an OOM exception is thrown. E.g. when a List needs to increase its capacity, it does so by allocating a new array of double the current size and then copying all the references/values from one array to the other. Similarly, operations such as ToArray make a new copy of the array. I've also seen similar problems with big LINQ operations.
Each array is stored as contiguous memory, so to avoid OOM the runtime must be able to obtain one big chunk of memory. As the address space of the process may be fragmented due to both DLL loading and general use for the heap, this is not always possible in which case an OOM exception is thrown.
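A small illustration of avoiding those peaks by presizing (the element type, the count and ComputeValue are placeholders):
using System.Collections.Generic;

int expectedCount = 5000000;                       // known or estimated in advance
var values = new List<double>(expectedCount);      // one backing array, no grow-and-copy peaks
for (int i = 0; i < expectedCount; i++)
{
    values.Add(ComputeValue(i));                   // placeholder per-item work
}
// Prefer iterating 'values' directly over calling ToArray(), which copies the whole array again.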
What sort of file are you dealing with ?
You might be better off using a StreamReader and yield returning the ReadLine result, if it's textual.
Sure, you'll be keeping a file-pointer around, but the worst case scenario is massively reduced.
There are similar methods for binary files: if you're uploading a file to SQL Server, for example, you can read a byte[] a chunk at a time and use the SQL pointer mechanics to write each buffer to the end of a blob.
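For the textual case, here is a sketch of the yield-return approach I mean (File.ReadLines in .NET 4+ does essentially the same thing out of the box):
using System.Collections.Generic;
using System.IO;

// Only one line is held in memory at a time; the file stays open only while you enumerate.
static IEnumerable<string> ReadLines(string path)
{
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

// usage: foreach (string line in ReadLines(path)) { /* process one line */ }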