I have a class that inherits from MemoryStream in order to provide some buffering. The class works exactly as expected, but every now and then I get an InvalidOperationException during a Read, with the error message:
Collection was modified; enumeration operation may not execute.
My code is below, and the only line that enumerates a collection appears to be:
m_buffer = m_buffer.Skip(count).ToList();
However, that line and every other operation that modifies the m_buffer object are inside locks, so I'm mystified as to how a Write operation could interfere with a Read and cause that exception.
public class MyMemoryStream : MemoryStream
{
private ManualResetEvent m_dataReady = new ManualResetEvent(false);
private List<byte> m_buffer = new List<byte>();
public override void Write(byte[] buffer, int offset, int count)
{
lock (m_buffer)
{
m_buffer.AddRange(buffer.ToList().Skip(offset).Take(count));
}
m_dataReady.Set();
}
public override int Read(byte[] buffer, int offset, int count)
{
if (m_buffer.Count == 0)
{
// Block until the stream has some more data.
m_dataReady.Reset();
m_dataReady.WaitOne();
}
lock (m_buffer)
{
if (m_buffer.Count >= count)
{
// More bytes available than were requested.
Array.Copy(m_buffer.ToArray(), 0, buffer, offset, count);
m_buffer = m_buffer.Skip(count).ToList();
return count;
}
else
{
int length = m_buffer.Count;
Array.Copy(m_buffer.ToArray(), 0, buffer, offset, length);
m_buffer.Clear();
return length;
}
}
}
}
I cannot say exactly what's going wrong from the code you posted, but one oddity is that you lock on m_buffer yet replace the buffer, so the object you lock on is not always the collection that is being read and modified.
It is good practice to use a dedicated private readonly object for the locking:
private readonly object locker = new object();
// ...
lock(locker)
{
// ...
}
You have at least one data race there: in the Read method, if you're pre-empted after the if (m_buffer.Count == 0) block and before the lock, Count can be 0 again by the time you enter the lock. You should check the count inside the lock, and use Monitor.Wait, Monitor.Pulse and/or Monitor.PulseAll for the wait/signal coordination, like this:
// On Write
lock(m_buffer)
{
// ...
Monitor.PulseAll(m_buffer);
}
// On Read
lock(m_buffer)
{
while(m_buffer.Count == 0)
Monitor.Wait(m_buffer);
// ...
}
You have to protect all accesses to m_buffer, and calling m_buffer.Count is not special in that regard.
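Putting the two suggestions together (a dedicated readonly lock object, re-checking the count inside the lock, and Monitor.Wait/PulseAll), a minimal untested sketch of what the corrected class could look like is below; only the Read/Write overrides are shown:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

public class MyMemoryStream : MemoryStream
{
    private readonly object m_lock = new object(); // dedicated lock object
    private readonly List<byte> m_buffer = new List<byte>();

    public override void Write(byte[] buffer, int offset, int count)
    {
        lock (m_lock)
        {
            m_buffer.AddRange(buffer.Skip(offset).Take(count));
            Monitor.PulseAll(m_lock); // wake any blocked readers
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        lock (m_lock)
        {
            // Check inside the lock; Wait releases the lock while blocked.
            while (m_buffer.Count == 0)
                Monitor.Wait(m_lock);

            int length = Math.Min(count, m_buffer.Count);
            m_buffer.CopyTo(0, buffer, offset, length);
            m_buffer.RemoveRange(0, length);
            return length;
        }
    }
}

Since m_buffer is never replaced here, the collection being locked is always the collection being modified.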
Do you modify the contents of buffer on another thread somewhere? I suspect the enumeration throwing the error may be over buffer rather than m_buffer.
I have multiple threads asking for data that has to be loaded over the network.
To reduce network traffic and get faster responses, I'd like to cache frequently requested data. I also want to limit the cache's data size.
My class looks something like this:
public class DataProvider
{
private ConcurrentDictionary<string, byte[]> dataCache;
private int dataCacheSize;
private int maxDataCacheSize;
private object dataCacheSizeLockObj = new object();
public DataProvider(int maxCacheSize)
{
maxDataCacheSize = maxCacheSize;
dataCache = new ConcurrentDictionary<string,byte[]>();
}
public byte[] GetData(string key)
{
byte[] retVal;
if (dataCache.ContainsKey(key))
{
retVal = dataCache[key];
}
else
{
retVal = ... // get data from somewhere else
if (dataCacheSize + retVal.Length <= maxDataCacheSize)
{
lock (dataCacheSizeLockObj)
{
dataCacheSize += retVal.Length;
}
dataCache[key] = retVal;
}
}
return retVal;
}
}
My problem is: how do I make sure that dataCacheSize always has the correct value? If two threads request the same uncached data at the same time, they will both write their data to the cache, which in itself is no problem, because the data is the same and the second thread just overwrites the cached data with identical data. But how do I know whether it was overwritten or not, so I avoid counting its size twice?
It could also happen that two threads are adding data at the same time, resulting in a dataCache size larger than allowed...
Is there an elegant way to accomplish this without adding complex locking mechanisms?
Instead of trying to roll your own caching, take a look at System.Runtime.Caching.MemoryCache.
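For illustration, a minimal sketch of how that could look; the sliding expiration and the LoadFromNetwork helper are assumptions for the example, not part of the question:

using System;
using System.Runtime.Caching;

// MemoryCache evicts entries on its own under memory pressure (and its
// limit is configurable), so there is no size counter to keep by hand.
public class DataProvider
{
    private readonly ObjectCache cache = MemoryCache.Default;

    public byte[] GetData(string key)
    {
        var cached = cache[key] as byte[];
        if (cached != null)
            return cached;

        byte[] retVal = LoadFromNetwork(key); // stands in for "get data from somewhere else"
        var policy = new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromMinutes(10) // assumed policy
        };
        // AddOrGetExisting returns the already-cached value when another
        // thread won the race, or null when our value was inserted.
        return (byte[])cache.AddOrGetExisting(key, retVal, policy) ?? retVal;
    }

    private byte[] LoadFromNetwork(string key) { /* ... */ return new byte[0]; }
}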
Since you update dataCacheSize inside a lock, you can simply check again inside that lock whether the new entry still fits:
if (dataCacheSize + retVal.Length <= maxDataCacheSize)
{
lock (dataCacheSizeLockObj)
{
if (dataCacheSize + retVal.Length > maxDataCacheSize)
{
return retVal;
}
dataCacheSize += retVal.Length;
}
byte[] oldVal = dataCache.GetOrAdd(key, retVal);
if (oldVal != retVal)
{
// retVal wasn't actually added
lock (dataCacheSizeLockObj)
{
dataCacheSize -= retVal.Length;
}
}
}
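GetOrAdd returns the value already stored under key when one exists, so comparing the returned reference with retVal tells you whether another thread won the race to insert; if it did, the size you reserved for retVal is released again.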
I'm trying to take all items in one fell swoop from a ConcurrentBag. Since there's nothing like TryEmpty on the collection, I've resorted to using Interlocked.Exchange in the same fashion as described here: How to remove all Items from ConcurrentBag?
My code looks like this:
private ConcurrentBag<Foo> _allFoos; //Initialized in constructor.
public bool LotsOfThreadsAccessingThisMethod(Foo toInsert)
{
this._allFoos.Add(toInsert);
return true;
}
public void SingleThreadProcessingLoopAsALongRunningTask(object state)
{
var token = (CancellationToken) state;
var workingSet = new List<Foo>();
while (!token.IsCancellationRequested)
{
if (!workingSet.Any())
{
workingSet = Interlocked.Exchange(ref this._allFoos, new ConcurrentBag<Foo>()).ToList();
}
var processingCount = (int)Math.Min(workingSet.Count, TRANSACTION_LIMIT);
if (processingCount > 0)
{
using (var ctx = new MyEntityFrameworkContext())
{
ctx.BulkInsert(workingSet.Take(processingCount));
}
workingSet.RemoveRange(0, processingCount);
}
}
}
The problem is that this sometimes misses items that are added to the list. I've written a test application that feeds data to my ConcurrentBag.Add method and verified that it is sending all of the data. When I set a breakpoint on the Add call and check the count of the ConcurrentBag after, it's zero. The item just isn't being added.
I'm fairly sure it's because the Interlocked.Exchange call doesn't use the internal locking mechanism of the ConcurrentBag, so it loses data somewhere in the swap, but I have no insight into what's actually happening.
How can I just grab all the items out of the ConcurrentBag at one time without resorting to my own locking mechanism? And why does Add ignore the item?
I don't think you need to take all the items from the ConcurrentBag at once. You can achieve exactly the behavior you are after simply by changing the processing logic as follows (no custom synchronization or interlocked swaps needed):
public void SingleThreadProcessingLoopAsALongRunningTask(object state)
{
var token = (CancellationToken)state;
var buffer = new List<Foo>(TRANSACTION_LIMIT);
while (!token.IsCancellationRequested)
{
Foo item;
if (!this._allFoos.TryTake(out item))
{
if (buffer.Count == 0) continue;
}
else
{
buffer.Add(item);
if (buffer.Count < TRANSACTION_LIMIT) continue;
}
using (var ctx = new MyEntityFrameworkContext())
{
ctx.BulkInsert(buffer);
}
buffer.Clear();
}
}
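If the busy spin while the bag is empty is a concern, a BlockingCollection<Foo> (from System.Collections.Concurrent) wrapping the ConcurrentBag would let the consumer block until items arrive; the producers keep calling Add exactly as before. A rough sketch under those assumptions, reusing MyEntityFrameworkContext and TRANSACTION_LIMIT from the question:

private BlockingCollection<Foo> _allFoos =
    new BlockingCollection<Foo>(new ConcurrentBag<Foo>());

public void SingleThreadProcessingLoopAsALongRunningTask(object state)
{
    var token = (CancellationToken)state;
    var buffer = new List<Foo>(TRANSACTION_LIMIT);
    while (!token.IsCancellationRequested)
    {
        Foo item;
        // Blocks for up to 100 ms instead of spinning on an empty bag.
        bool got = _allFoos.TryTake(out item, TimeSpan.FromMilliseconds(100));
        if (got)
            buffer.Add(item);

        if (buffer.Count == 0)
            continue; // nothing to flush yet

        // Flush on a full batch, or when the queue went quiet.
        if (buffer.Count >= TRANSACTION_LIMIT || !got)
        {
            using (var ctx = new MyEntityFrameworkContext())
            {
                ctx.BulkInsert(buffer);
            }
            buffer.Clear();
        }
    }
}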
I have a simple performance test, that indirectly calls WriteAsync many times. It performs reasonably as long as WriteAsync is implemented as shown below. However, when I inline WriteByte into WriteAsync, performance degrades by about factor 7.
(To be clear: The only change that I make is replacing the statement containing the WriteByte call with the body of WriteByte.)
Can anybody explain why this happens? I've had a look at the differences in the generated code with Reflector, but nothing struck me as so different that it would explain the huge perf hit.
public sealed override async Task WriteAsync(
byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
var writeBuffer = this.WriteBuffer;
var pastEnd = offset + count;
while ((offset < pastEnd) && ((writeBuffer.Count < writeBuffer.Capacity) ||
await writeBuffer.FlushAsync(cancellationToken)))
{
offset = WriteByte(buffer, offset, writeBuffer);
}
this.TotalCount += count;
}
private int WriteByte(byte[] buffer, int offset, WriteBuffer writeBuffer)
{
var currentByte = buffer[offset];
if (this.previousWasEscapeByte)
{
this.previousWasEscapeByte = false;
this.crc = Crc.AddCrcCcitt(this.crc, currentByte);
currentByte = (byte)(currentByte ^ Frame.EscapeXor);
++offset;
}
else
{
if (currentByte < Frame.InvalidStart)
{
this.crc = Crc.AddCrcCcitt(this.crc, currentByte);
++offset;
}
else
{
currentByte = Frame.EscapeByte;
this.previousWasEscapeByte = true;
}
}
writeBuffer[writeBuffer.Count++] = currentByte;
return offset;
}
async methods are rewritten by the compiler into a giant state machine, very similar to methods using yield return. All of your locals become fields in the state machine's class. The compiler currently doesn't try to optimize this at all, so any optimization is up to the coder.
Every local which would have sat happily in a register is now being read from and written to memory. Refactoring synchronous code out of async methods and into a sync method is a very valid performance optimization -- you're just doing the reverse!
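To make that concrete, here is a hypothetical pair of methods (not from the question) showing the direction that helps: the hot loop lives in a synchronous helper, so its locals can stay in registers instead of becoming fields on the generated state machine:

// The hot loop in a sync helper: sum and i remain ordinary locals.
private static int SumCore(byte[] data, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += data[i];
    return sum;
}

public static async Task<int> SumAsync(Stream source)
{
    var data = new byte[4096];
    int read = await source.ReadAsync(data, 0, data.Length);
    // Written inline here, sum and i would be hoisted onto the
    // compiler-generated state machine and hit memory on every access.
    return SumCore(data, read);
}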
I think this might be a fundamental flaw in my approach to threading and incrementing a global counter, but here is my problem. I have a collection of file names from a DB that I iterate through, and for each file name I search for it within a top-level folder. Each iteration I run the search on a thread and increment a counter when it completes, so I can determine when the whole job finishes. The problem is that the counter never gets as high as the total file count; it sometimes comes very close, but never reaches what I would expect.
public class FindRogeRecords
{
private delegate void FindFileCaller(string folder, int uploadedID, string filename);
private Dictionary<int, string> _files;
private List<int> _uploadedIDs;
private int _filesProcessedCounter;
private bool _completed;
public void Run()
{
_files = GetFilesFromDB(); //returns a dictionary of id's and filenames
FindRogueRecords();
}
private void FindRogueRecords()
{
_uploadedIDs = new List<int>();
foreach (KeyValuePair<int, string> pair in _files)
{
var caller = new FindFileCaller(FindFile);
var result = caller.BeginInvoke(txtSource.Text, pair.Key, pair.Value, new AsyncCallback(FindFile_Completed), null);
}
}
private void FindFile(string documentsFolder, int uploadedID, string filename)
{
var docFolders = AppSettings.DocumentFolders;
foreach (string folder in docFolders)
{
string path = Path.Combine(documentsFolder, folder);
var directory = new DirectoryInfo(path);
var files = directory.GetFiles(filename, SearchOption.AllDirectories);
if (files != null && files.Length > 0) return;
}
lock (_uploadedIDs) _uploadedIDs.Add(uploadedID);
}
private void FindFile_Completed(System.IAsyncResult ar)
{
var result = (AsyncResult)ar; // System.Runtime.Remoting.Messaging.AsyncResult
var caller = (FindFileCaller)result.AsyncDelegate;
_filesProcessedCounter++;
_completed = (_files.Count == _filesProcessedCounter); //this never evaluates to true
}
}
You are accessing the _filesProcessedCounter variable from multiple threads without any synchronization (not even a simple lock()), which causes a race condition in your code.
To increment an integer variable you can use Interlocked.Increment(), which is thread-safe, but consider that the following line requires synchronization as well:
_completed = (_files.Count == _filesProcessedCounter);
so I would suggest using a lock that covers both lines, which also keeps the code clear:
// Add this field to the class
private readonly object counterLock = new object();
// wrap access to shared variables by lock as shown below
lock (counterLock)
{
_filesProcessedCounter++;
_completed = (_files.Count == _filesProcessedCounter);
}
This is because you have a race condition in your program. The ++ operator is equivalent to the following code
c = c + 1; // c++;
so you can see that it is not atomic. A thread that increments the value loads c into a register, increments it by 1, and then writes it back. If the thread is pre-empted because another thread gets the CPU, the execution of c = c + 1 may not be finished. The second thread performs the same action, reading the old c and incrementing its value by one. When the first thread gains the CPU again, it overwrites the data written by the second thread.
You can use locks to make sure that only one thread can access the variable at a time, or use atomic functions like
Interlocked.Increment(ref c);
to increment c in a thread-safe manner.
Use Interlocked.Increment(ref _filesProcessedCounter).
I think it should be:
if (files != null && files.Length > 0) continue;
and not return (otherwise you will not increase your counter...).
Convert
_filesProcessedCounter++;
_completed = (_files.Count == _filesProcessedCounter);
to
_completed = (_files.Count == Interlocked.Increment(ref _filesProcessedCounter));
I have an extremely large 2D byte array in memory:
byte[][] MyBA = new byte[int.MaxValue][]; // jagged: int.MaxValue rows of 10 bytes each
Is there any way (probably unsafe) that I can fool C# into thinking this is one huge contiguous byte array? I want to do this so that I can pass it to a MemoryStream and then a BinaryReader.
MyReader = new BinaryReader(MemoryStream(*MyBA)) //Syntax obviously made-up here
I do not believe .NET provides this, but it should be fairly easy to write your own System.IO.Stream implementation that seamlessly switches between backing arrays. Here are the (untested) basics:
public class MultiArrayMemoryStream: System.IO.Stream
{
byte[][] _arrays;
long _position;
int _arrayNumber;
int _posInArray;
public MultiArrayMemoryStream(byte[][] arrays){
_arrays = arrays;
_position = 0;
_arrayNumber = 0;
_posInArray = 0;
}
public override int Read(byte[] buffer, int offset, int count){
int read = 0;
while(read<count){
if(_arrayNumber>=_arrays.Length){
return read;
}
if(count-read <= _arrays[_arrayNumber].Length - _posInArray){
Buffer.BlockCopy(_arrays[_arrayNumber], _posInArray, buffer, offset+read, count-read);
_posInArray+=count-read;
_position+=count-read;
read=count;
}else{
Buffer.BlockCopy(_arrays[_arrayNumber], _posInArray, buffer, offset+read, _arrays[_arrayNumber].Length - _posInArray);
read+=_arrays[_arrayNumber].Length - _posInArray;
_position+=_arrays[_arrayNumber].Length - _posInArray;
_arrayNumber++;
_posInArray=0;
}
}
return count;
}
public override long Length{
get {
long res = 0;
for(int i=0;i<_arrays.Length;i++){
res+=_arrays[i].Length;
}
return res;
}
}
public override long Position{
get { return _position; }
set { throw new NotSupportedException(); }
}
public override bool CanRead{
get { return true; }
}
public override bool CanSeek{
get { return false; }
}
public override bool CanWrite{
get { return false; }
}
public override void Flush(){
}
public override long Seek(long offset, SeekOrigin origin){
throw new NotSupportedException();
}
public override void SetLength(long value){
throw new NotSupportedException();
}
public override void Write(byte[] buffer, int offset, int count){
throw new NotSupportedException();
}
}
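Usage would then be along these lines, with MyBA being the jagged array from the question:

var stream = new MultiArrayMemoryStream(MyBA);
var reader = new BinaryReader(stream);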
Another way to work around the size limitation of 2^31 bytes is UnmanagedMemoryStream, which implements System.IO.Stream on top of an unmanaged memory buffer (which might be as large as the OS supports). Something like this might work (untested, and it requires an unsafe context):
using (var fileStream = new FileStream("data",
    FileMode.Open,
    FileAccess.Read,
    FileShare.Read,
    16 * 1024,
    FileOptions.SequentialScan))
{
    long length = fileStream.Length;
    IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(length));
    try
    {
        var memoryStream = new UnmanagedMemoryStream(
            (byte*)buffer.ToPointer(), length, length, FileAccess.ReadWrite);
        fileStream.CopyTo(memoryStream);
        memoryStream.Seek(0, SeekOrigin.Begin);
        // work with the UnmanagedMemoryStream
    }
    finally
    {
        // free the unmanaged buffer even if something above throws
        Marshal.FreeHGlobal(buffer);
    }
}
Agreed. In any case you are limited by the maximum size of a single array.
If you really need to operate on huge arrays through a stream, write your own custom memory stream class.
I think you can use a linear structure instead of a 2D structure, using the following approach.
Instead of byte[int.MaxValue][10] you can have byte[int.MaxValue*10]. You would address the item at [4,5] (1-based) as (4-1)*10+(5-1); the general formula is (i-1)*numberOfColumns+(j-1).
Of course you could use the other convention.
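For illustration, the 0-based version of that formula as a hypothetical helper:

// index = row * columnCount + column (0-based)
static int FlatIndex(int row, int column, int columnCount)
{
    return row * columnCount + column;
}
// The 1-based element [4,5] in a 10-column layout: FlatIndex(3, 4, 10) == 34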
If I understand your question correctly, you've got a massive file that you want to read into memory and then process. But you can't do this because the amount of data in the file exceeds what any single-dimensional array can hold.
You mentioned that speed is important, and that you have multiple threads running in parallel to process the data as quickly as possible. If you're going to have to partition the data for each thread anyway, why not base the number of threads on the number of byte[int.MaxValue] buffers required to cover everything?
You can create a MemoryStream and then write the array into it row by row using the Write method.
EDIT:
The limit of a MemoryStream is essentially the amount of memory available to your application. There may be a limit below that, but if you need more memory you should consider modifying your overall architecture. E.g. you could process your data in chunks, or use a swapping mechanism to a file.
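A minimal sketch of that approach, with MyBA as the jagged array from the question (the total still has to fit within a single MemoryStream):

var ms = new MemoryStream();
foreach (byte[] row in MyBA)
{
    ms.Write(row, 0, row.Length); // append each row
}
ms.Seek(0, SeekOrigin.Begin);
var reader = new BinaryReader(ms);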
If you are using Framework 4.0, you have the option of working with a MemoryMappedFile. Memory mapped files can be backed by a physical file, or by the Windows swap file. Memory mapped files act like an in-memory stream, transparently swapping data to/from the backing storage if and when required.
If you are not using Framework 4.0, you can still use this option, but you will need to either write your own wrapper or find an existing one. I expect there are plenty on The Code Project.
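For illustration, a minimal sketch using System.IO.MemoryMappedFiles (Framework 4.0+); the map name and capacity are assumptions for the example:

// Backed by the system paging file, so it can exceed any single managed array.
long capacity = 20L * 1024 * 1024 * 1024; // 20 GB, chosen for illustration
using (var mmf = MemoryMappedFile.CreateNew("MyBigData", capacity))
using (var stream = mmf.CreateViewStream())
using (var reader = new BinaryReader(stream))
{
    // Read as if the data were one contiguous stream.
}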