C#: is FileStream.ReadByte() a multi-threading friendly function?

So I have 16 threads that simultaneously run this method:
private void Work()
{
int currentByte;
char currentChar;
try
{
while (true)
{
position++;
currentByte = file.ReadByte();
currentChar = Convert.ToChar(currentByte);
entries.Add(new Entry(currentChar));
}
}
catch (Exception) { }
}
And then I have one more thread running this method:
private void ManageThreads()
{
bool done;
for(; ; )
{
done = !threads.Any(x => x.IsAlive == true);//Check if each thread is dead before continuing
if (done)
break;
else
Thread.Sleep(100);
}
PrintData();
}
Here is the problem: the PrintData method just prints everything in the 'entries' list to a text file. This text file is different every time the program is run even with the same input file. I am a bit of a noob when it comes to multi-threaded applications so feel free to dish out the criticism.

In general, unless a type explicitly calls out thread safety in its documentation, you should assume it is not thread-safe*. Streams in .NET have no such section and should be treated as not thread-safe: use appropriate synchronization (i.e. locks) that guarantees the stream is accessed from only one thread at a time.
With file streams there is another concern: the OS-level file object may be updated from other threads. FileStream tries to mitigate this by checking whether its internal state matches the OS state; see the Remarks section of the FileStream documentation on MSDN.
If you want a thread-safe stream, you can try the Stream.Synchronized method, as shown in "C#, is there such a thing as a "thread-safe" stream?".
Note that the code in your post will produce output in a nondeterministic order whether the stream is thread-safe or not. Thread safety of the stream guarantees only that all bytes show up in the output. With a non-thread-safe stream there are no guarantees at all: some bytes may show up multiple times, some may be skipped, and any other behavior (crashes, partial reads, ...) is possible.
* Thread-safe as in "the internal state of the instance will be consistent whether it is called from one thread or from multiple threads". It does not mean that calling arbitrary methods from different threads will lead to useful behavior.
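To get the same output on every run, each byte has to be tagged with the position it was read from, and the read and the position increment have to happen under one lock. Here is a minimal sketch of that idea. OrderedReader and its members are hypothetical names that are not in the question, and it assumes the per-byte locking overhead is acceptable:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

// Hypothetical helper, not part of the original post.
public static class OrderedReader
{
    public static string ReadAll(Stream stream, int threadCount)
    {
        var gate = new object();   // guards both the stream and the position counter
        var entries = new ConcurrentBag<KeyValuePair<long, char>>();
        long position = 0;

        ThreadStart work = () =>
        {
            while (true)
            {
                int b;
                long myPosition;
                lock (gate)
                {
                    b = stream.ReadByte();      // -1 signals end of stream
                    if (b < 0) return;
                    myPosition = position++;    // tag taken under the same lock as the read
                }
                // Work outside the lock may finish in any order across threads.
                entries.Add(new KeyValuePair<long, char>(myPosition, (char)b));
            }
        };

        var threads = Enumerable.Range(0, threadCount).Select(_ => new Thread(work)).ToList();
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());         // Join instead of polling IsAlive

        // Sorting by the recorded position restores file order.
        return new string(entries.OrderBy(e => e.Key).Select(e => e.Value).ToArray());
    }
}
```

Calling OrderedReader.ReadAll(file, 16) should then return the same string for the same input, although a single thread reading sequentially would be simpler and probably faster.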

Related

CopyToAsync weird behaviour when used from multiple threads

I have the following function to write to a file asynchronously from multiple threads in parallel:
static int startOffset = 0; // This variable will store the offset at which the thread begins to write
static int blockSize = 10; // size of block written by each thread
static async Task<long> WriteToFile(Stream dataToWrite)
{
var startOffset= getStartOffset(); // Definition of this function is given later
using(var fs = new FileStream(fileName,
FileMode.OpenOrCreate,
FileAccess.ReadWrite,
FileShare.ReadWrite))
{
fs.Seek(startOffset,SeekOrigin.Begin);
await dataToWrite.CopyToAsync(fs);
}
return startOffset;
}
/**
*I use reader writer lock here so that only one thread can access the value of the startOffset at
*a time
*/
static int getStartOffset()
{
int result = 0;
try
{
rwl.AcquireWriterLock();
result = startOffset;
startOffset+=blockSize; // increment the startOffset for the next thread
}
finally
{
rwl.ReleaseWriterLock();
}
return result;
}
I then call the above function to write some strings from multiple threads.
var tasks = new List<Task>();
for(int i=1;i<=4;i++)
{
tasks.Add(Task.Run( async() => {
String s = "aaaaaaaaaa";
byte[] buffer = new byte [10];
buffer = Encoding.Default.GetBytes(s);
Stream data = new MemoryStream(buffer);
long offset = await WriteToFile(data);
Console.WriteLine($"Data written at offset - {offset}");
}));
}
Task.WaitAll(tasks.ToArray());
Now, this code executes well most of the time. But sometimes, at random, it writes some Japanese characters or other symbols to the file. Is there something that I am doing wrong in the multithreading?

Your calculation of startOffset assumes that each thread is writing exactly 10 bytes. There are several issues with this.
One, the data has unknown length:
byte[] buffer = new byte [10];
buffer = Encoding.Default.GetBytes(s);
The assignment doesn't put data into the newly allocated 10 byte array, it leaks the new byte[10] array (which will be garbage collected) and stores a reference to the return of GetBytes(s), which could have any length at all. It could overflow into the next Task's area. Or it could leave some content that existed in the file beforehand (you use OpenOrCreate) which lies in the area for the current Task, but past the end of the actual dataToWrite.
Two, you try to seek past the areas that other threads are expected to write to, but if those writes haven't completed, they haven't increased the file length. So you attempt to seek past the end of the file, which is allowed by the Windows API but might cause problems with the .NET wrappers. However, the FileStream.Seek documentation does indicate you are OK:
"When you seek beyond the length of the file, the file size grows"
although this might not be precisely correct, since the Windows API documentation says:
"It is not an error to set a file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function. A write operation increases the size of the file to the file pointer position plus the size of the buffer written, which results in the intervening bytes uninitialized."
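One way to avoid the fixed 10-byte assumption is to reserve each region from the actual length of the encoded bytes. The following is only a sketch under that assumption. RegionWriter, Reserve and the 4096-byte buffer size are illustrative choices, not part of the question's code:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original post.
public sealed class RegionWriter
{
    private readonly string path;
    private long nextOffset;   // first byte that has not been reserved yet

    public RegionWriter(string path)
    {
        this.path = path;
    }

    // Atomically reserves `length` bytes and returns the start of the reserved region.
    private long Reserve(int length)
    {
        return Interlocked.Add(ref nextOffset, length) - length;
    }

    public async Task<long> WriteAsync(byte[] data)
    {
        // Reserve exactly as many bytes as will be written, so regions never overlap.
        long offset = Reserve(data.Length);
        using (var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write,
                                       FileShare.ReadWrite, 4096, true))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            await fs.WriteAsync(data, 0, data.Length);
        }
        return offset;
    }
}
```

A caller would encode first, e.g. writer.WriteAsync(Encoding.Default.GetBytes(s)), so the reserved size always matches what is written. This does not deal with stale content in a file that already exists; truncating the file once before the writers start would cover that case.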

I think that asynchronous file I/O is not usually meant to be utilized with multithreading. Just because something is asynchronous does not mean that an operation should have multiple threads assigned to it.
To quote the documentation for async file I/O: Asynchronous operations enable you to perform resource-intensive I/O operations without blocking the main thread. Basically, instead of using a bunch of threads on one operation, it dispatches a new thread to accomplish a less meaningful task. Eventually with a big enough application, nearly everything can be abstracted to be a not-so-meaningful task and computers can run massive apps pretty quickly utilizing multithreading.
What you are likely experiencing is corruption caused by multiple threads writing to overlapping regions of the file. The Japanese characters you mention are likely malformed ASCII/Unicode byte sequences that your text editor is attempting to interpret.
If you would like to remedy the undefined behavior and remain using asynchronous operations, you should be able to await each individual task before the next one can start. This will prevent the offset variable from being in the incorrect position for the newest task. Although, logically it will run the same as a synchronous version.

Beyond "honor code", is there a difference between using a dedicated "lock object" and locking data directly?

I have two threads: one that feeds updates and one that writes them to disk. Only the most recent update matters, so I don't need a PC queue.
In a nutshell:
The feeder thread drops the latest update into a buffer, then sets a flag to indicate a new update.
The writer thread checks the flag, and if it indicates new content, writes the buffered update to disk and disables the flag again.
I'm currently using a dedicated lock object to ensure that there's no inconsistency, and I'm wondering how that differs from locking the flag and buffer directly. The only difference I'm aware of is that a dedicated lock object requires trust that everyone who wants to manipulate the flag and buffer uses the lock.
Relevant code:
private object cacheStateLock = new object();
string textboxContents;
bool hasNewContents;
private void MainTextbox_TextChanged(object sender, TextChangedEventArgs e)
{
lock (cacheStateLock)
{
textboxContents = MainTextbox.Text;
hasNewContents = true;
}
}
private void WriteCache() // running continually in a thread
{
string toWrite;
while (true)
{
lock (cacheStateLock)
{
if (!hasNewContents)
continue;
toWrite = textboxContents;
hasNewContents = false;
}
File.WriteAllText(cacheFilePath, toWrite);
}
}

First of all, if you're trying to use a bool flag in this manner, you should mark it as volatile (which isn't recommended at all, yet is better than your code).
Second, the lock statement is syntactic sugar for the Monitor class methods, which take an object. Even if you were able to pass a value type to it (which is a compile error, by the way), the value would be boxed into a new object on every call, so two threads would lock on different objects and the lock would be useless. So you must provide a reference type to the lock statement.
Third, strings are immutable in C#, so assigning textboxContents replaces the reference rather than changing the object; one thread could hold a lock on the old string while another locks on the new one. Also, the string from MainTextbox.Text could be null in your case, and lock would then throw at run time, whereas a private object never changes (you should mark it as readonly, by the way).
So introducing a dedicated object for synchronization is the easiest and most natural way to separate locking from the actual logic.
As for your initial code, it has a problem: MainTextbox_TextChanged could overwrite text that has not yet been written to disk. You can introduce some additional synchronization logic or use a library here. @Aron suggested Rx; I personally prefer TPL Dataflow, but it doesn't matter.
You can add a BroadcastBlock linked to an ActionBlock<string>(WriteCache), which removes the infinite loop from the WriteCache method and the lock from both of your methods:
var broadcast = new BroadcastBlock<string>(s => s);
var consumer = new ActionBlock<string>(s => WriteCache(s));
broadcast.LinkTo(consumer);
// fire and forget
private async void MainTextbox_TextChanged(object sender, TextChangedEventArgs e)
{
await broadcast.SendAsync(MainTextbox.Text);
}
// running continually in a thread without a loop
private void WriteCache(string toWrite)
{
File.WriteAllText(cacheFilePath, toWrite);
}
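For comparison, here is a rough sketch of the dedicated-lock-object approach with the lock hidden inside one small class, so nobody outside can forget to take it. LatestValueCache, Publish and TryTake are made-up names for illustration. TryTake is written with Monitor directly to show roughly what the lock statement expands to:

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the original post.
public sealed class LatestValueCache
{
    private readonly object gate = new object();   // never exposed and never reassigned
    private string latest;
    private bool hasNew;

    // Called by the feeder: only the most recent value is kept.
    public void Publish(string value)
    {
        lock (gate)
        {
            latest = value;
            hasNew = true;
        }
    }

    // Called by the writer: returns true only if there is an unseen value.
    public bool TryTake(out string value)
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(gate, ref lockTaken);    // what `lock (gate)` does on entry
            bool taken = hasNew;
            value = taken ? latest : null;
            hasNew = false;
            return taken;
        }
        finally
        {
            if (lockTaken) Monitor.Exit(gate);     // what `lock (gate)` does on exit
        }
    }
}
```

The writer thread would call TryTake in its loop and do File.WriteAllText outside the lock. It should probably sleep or wait on an event when TryTake returns false rather than spin.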

Detect Boolean value changes inside Thread

I have a C++ DLL function that I want to run inside a C# thread.
Sometimes I need to cancel that thread, and here is the issue:
Thread.Abort() is evil, according to the multitude of articles I've read on the topic.
The only way to do that was to use a bool and check its value periodically.
My problem is that even when I set this value to true, it doesn't change and is still false in the C++ code. However, when I show a MessageBox, the value changes and it works fine.
Any ideas why the value changes only when the MessageBox is shown? Please tell me how to fix this issue.
C#
public void AbortMesh()
{
if (currMeshStruct.Value.MeshThread != null && currMeshStruct.Value.MeshThread.IsAlive)
{
//here is my c++ Object and cancel mesh used to set bool to true;
MeshCreator.CancelMesh();
}
}
C++
STDMETHODIMP MeshCreator::CancelMesh(void)
{
this->m_StopMesh = TRUE;
return S_OK;
}
when I test the boolean value
if (m_StopMesh)
return S_FALSE;
The value here is always false even if I call AbortMesh()
if (m_StopMesh)
return S_FALSE;
MessageBox(NULL,aMessage,L"Test",NULL);
if (m_StopMesh) // here the value is changed to true
return S_FALSE;

Non-deterministic thread abortion (as with Thread.Abort) is a really bad practice. The problem is that it is the only practice that allows you to stop a job when the job does not know that it could be stopped.
There is no library or framework in .NET that I know of that lets you run an arbitrary task and abort it at any time without dire consequences.
So you were completely right when you decided to implement manual cancellation using some synchronization technique.
Solutions:
1) The simplest one is to use a volatile Boolean variable, as was already suggested:
C#
public void AbortMesh()
{
if (currMeshStruct.Value.MeshThread != null && currMeshStruct.Value.MeshThread.IsAlive)
{
MeshCreator.CancelMesh();
}
}
C++/CLI
public ref class MeshCreator
{
private:
volatile System::Boolean m_StopMesh;
...
}
STDMETHODIMP MeshCreator::CancelMesh(void)
{
this->m_StopMesh = TRUE;
return S_OK;
}
void MeshCreator::ProcessMesh(void)
{
Int32 processedParts = 0;
while(processedParts != totalPartsToProcess)
{
ContinueProcessing(processedParts);
processedParts++;
if (this->m_StopMesh)
{
this->MakeCleanup();
MessageBox(NULL,aMessage,L"Test",NULL);
break; // stop processing once cancellation has been requested
}
}
}
Such code should not require any synchronization as long as you make no assumptions about when the thread completes after the CancelMesh call: cancellation is not instantaneous and may take a variable amount of time.
I don't know why volatile didn't help you, but there are a few things you could check:
Are you sure that the MeshCreator.CancelMesh(); call actually happens?
Are you sure that m_StopMesh is properly initialized before the actual processing begins?
Are you sure that you check the variable inside ProcessMesh often enough to get a decent response time from your worker, and that you are not expecting something instantaneous?
2) Also, if you use .NET 4 or higher, you could try the CancellationToken/CancellationTokenSource model. It was initially designed to work with the Task model but works well with standard threads. It won't really simplify your code, but given the asynchronous nature of your processing code, it may simplify future integration with the TPL:
CancellationTokenSource cancTokenSource = new CancellationTokenSource();
CancellationToken cancToken = cancTokenSource.Token;
Thread thread = new Thread(() =>
{
Int32 iteration = 0;
while (true)
{
Console.WriteLine("Iteration {0}", iteration);
iteration++;
Thread.Sleep(1000);
if (cancToken.IsCancellationRequested)
break;
}
});
thread.Start();
Console.WriteLine("Press any key to cancel...");
Console.ReadKey();
cancTokenSource.Cancel();
3) You may want to read about the Interlocked class, Monitor locks, AutoResetEvent and other synchronization primitives, but they are not actually needed in this application.
EDIT:
Well, I don't know how it couldn't help (it is not the best idea, but it should work for such a scenario), so I'll try later to mock your app and check the issue. Possibly it has something to do with how MSVC and CSC handle the volatile specifier.
For now, try to use Interlocked reads and writes in your app:
public ref class MeshCreator
{
private:
System::Boolean m_StopMesh;
...
}
STDMETHODIMP MeshCreator::CancelMesh(void)
{
Interlocked::Exchange(%(this->m_StopMesh), true);
return S_OK;
}
void MeshCreator::ProcessMesh(void)
{
Int32 processedParts = 0;
while(processedParts != totalPartsToProcess)
{
ContinueProcessing(processedParts);
processedParts++;
if (Interlocked::Read(%(this->m_StopMesh)))
{
this->MakeCleanup();
MessageBox(NULL,aMessage,L"Test",NULL);
break; // stop processing once cancellation has been requested
}
}
}
P.S.: Can you post the code that actually processes the data and checks the variable(I don't mean your full meshes calculations method, just its main stages and elements)?
EDIT: AT LEAST IT'S CLEAR WHAT THE SYSTEM IS ABOUT
It is possible that your child processes are just not terminated quickly enough. Read this SO thread about process killing.
P.S.: And edit your question to describe your system and problem more clearly. It is difficult to get the right answer to a wrong or incomplete question.

Try putting volatile before the field m_StopMesh:
volatile BOOL m_StopMesh;

I launched the c++ process using a thread and it worked fine.
If you want to communicate across process boundaries, you will need to use some sort of cross-process communication.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365574(v=vs.85).aspx
I find Named Pipes convenient and easy to use.
UPDATE
Your comment clarifies that the C++ code is running in-process.
I would suggest a ManualResetEvent. For a great overview of thread synchronization (and threads in general) check out http://www.albahari.com/threading/
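A rough sketch of what the ManualResetEvent suggestion could look like on the C# side. StoppableWorker, RequestStop and Run are invented names for illustration, and the per-part callback stands in for whatever the real mesh code does:

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the original post.
public sealed class StoppableWorker
{
    // Starts unsignaled; Set() marks that a stop has been requested.
    private readonly ManualResetEvent stopRequested = new ManualResetEvent(false);

    // Safe to call from any thread.
    public void RequestStop()
    {
        stopRequested.Set();
    }

    // Processes parts until done or until a stop is requested.
    // Returns the number of parts that were processed.
    public int Run(int totalParts, Action<int> processPart)
    {
        int processed = 0;
        while (processed < totalParts)
        {
            // WaitOne(0) checks the event without blocking.
            if (stopRequested.WaitOne(0))
                break;

            processPart(processed);
            processed++;
        }
        return processed;
    }
}
```

Typical use would be to start Run on a worker thread and call RequestStop from the UI thread. The worker notices the request at the next part boundary, so the response time depends on how long one part takes.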

How to handle large numbers of concurrent disk write requests as efficiently as possible

Say the method below is being called several thousand times by different threads in a .NET 4 application. What's the best way to handle this situation? I understand that the disk is the bottleneck here, but I'd like the WriteFile() method to return quickly.
Data can be up to a few MB. Are we talking thread pool, TPL or the like?
public void WriteFile(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}

If you want to return quickly and don't really care whether the operation is synchronous, you could create some kind of in-memory queue into which you put write requests; as long as the queue is not full, the method returns quickly. Another thread is responsible for draining the queue and writing the files. If WriteFile is called while the queue is full, you have to wait until there is room, and execution becomes synchronous again. Still, with a big buffer, if the write requests do not arrive at a steady rate but in spikes (with pauses between bursts of WriteFile calls), this change can be seen as an improvement in performance.
UPDATE:
Made a little picture for you. Notice that the bottleneck always exists; all you can do is smooth out requests by using a queue. Notice that the queue has limits, so when it fills up you cannot instantly queue more files; you have to wait until there is free space in that buffer too. But for the situation shown in the picture (3 bucket requests), it's obvious you can quickly put the buckets into the queue and return, while in the first case you have to do that one by one and block execution.
Notice that you never need to run many IO threads at once, since they all share the same bottleneck and you will just waste memory if you try to parallelize this heavily. I believe 2 - 10 threads at most will easily take all available IO bandwidth and will limit application memory usage too.
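The bounded queue described above can be sketched with BlockingCollection, which is available from .NET 4. QueuedFileWriter and its capacity parameter are hypothetical, and a single consumer thread is assumed here, though a few more could be added:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original post.
public sealed class QueuedFileWriter : IDisposable
{
    private readonly BlockingCollection<KeyValuePair<string, byte[]>> queue;
    private readonly Task consumer;

    public QueuedFileWriter(int capacity)
    {
        // The capacity bounds how many pending writes are held in memory.
        queue = new BlockingCollection<KeyValuePair<string, byte[]>>(capacity);
        consumer = Task.Factory.StartNew(Drain, TaskCreationOptions.LongRunning);
    }

    // Returns immediately while there is room; blocks when the queue is full.
    public void WriteFile(string fileName, byte[] data)
    {
        queue.Add(new KeyValuePair<string, byte[]>(fileName, data));
    }

    // Runs on the consumer thread until CompleteAdding is called and the queue is empty.
    private void Drain()
    {
        foreach (var item in queue.GetConsumingEnumerable())
        {
            try
            {
                File.WriteAllBytes(item.Key, item.Value);
            }
            catch (IOException e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    // Stops accepting new work and waits for the pending writes to finish.
    public void Dispose()
    {
        queue.CompleteAdding();
        consumer.Wait();
        queue.Dispose();
    }
}
```

Since the data can be a few MB per file, the capacity should be chosen with memory in mind: a capacity of 100 could mean several hundred MB waiting in the queue.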

Since you say that the files don't need to be written in order nor immediately, the simplest approach would be to use a Task:
private void WriteFileAsynchronously(string FileName, MemoryStream Data)
{
Task.Factory.StartNew(() => WriteFileSynchronously(FileName, Data));
}
private void WriteFileSynchronously(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
The TPL uses the thread pool internally, and should be fairly efficient even for large numbers of tasks.

If data is coming in faster than you can log it, you have a real problem. A producer/consumer design that has WriteFile just throwing stuff into a ConcurrentQueue or similar structure, and a separate thread servicing that queue works great ... until the queue fills up. And if you're talking about opening 50,000 different files, things are going to back up quick. Not to mention that your data that can be several megabytes for each file is going to further limit the size of your queue.
I've had a similar problem that I solved by having the WriteFile method append to a single file. The records it wrote had a record number, file name, length, and then the data. As Hans pointed out in a comment to your original question, writing to a file is quick; opening a file is slow.
A second thread in my program starts reading that file that WriteFile is writing to. That thread reads each record header (number, filename, length), opens a new file, and then copies data from the log file to the final file.
This works better if the log file and the final file are on different disks, but it can still work well with a single spindle. It sure exercises your hard drive, though.
It has the drawback of requiring 2X the disk space, but with 2-terabyte drives under $150, I don't consider that much of a problem. It's also less efficient overall than directly writing the data (because you have to handle the data twice), but it has the benefit of not causing the main processing thread to stall.
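The record layout described above (record number, file name, length, data) could look something like the following with BinaryWriter and BinaryReader. This is a guess at the format, not the actual code from that program, and WriteLog, Append and TryReadNext are invented names:

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original post.
public static class WriteLog
{
    // Appends one record: number, file name, data length, then the data itself.
    public static void Append(BinaryWriter log, int recordNumber, string fileName, byte[] data)
    {
        log.Write(recordNumber);
        log.Write(fileName);        // BinaryWriter stores strings length-prefixed
        log.Write(data.Length);
        log.Write(data);
    }

    // Reads the next record; returns false when the end of the log is reached.
    public static bool TryReadNext(BinaryReader log, out int recordNumber,
                                   out string fileName, out byte[] data)
    {
        recordNumber = 0;
        fileName = null;
        data = null;
        if (log.BaseStream.Position >= log.BaseStream.Length)
            return false;

        recordNumber = log.ReadInt32();
        fileName = log.ReadString();
        int length = log.ReadInt32();
        data = log.ReadBytes(length);
        return true;
    }
}
```

The second thread would loop on TryReadNext and write each data block to its final file. If it reads while the first thread is still appending, it also needs some way to know that a record is complete, for example by only reading up to a position the writer has published after flushing.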

Encapsulate your complete method implementation in a new Thread(). Then you can "fire-and-forget" these threads and return to the main calling thread.
foreach (var file in filesArray)
{
try
{
System.Threading.Thread updateThread = new System.Threading.Thread(delegate()
{
WriteFileSynchronous(fileName, data);
});
updateThread.Start();
}
catch (Exception ex)
{
string errMsg = ex.Message;
Exception innerEx = ex.InnerException;
while (innerEx != null)
{
errMsg += "\n" + innerEx.Message;
innerEx = innerEx.InnerException;
}
errorMessages.Add(errMsg);
}
}

readerwriterlock allowing reads while write lock is acquired?

I have a static class which is accessed by multiple remoting threads and other threads internal to the application. Part of the functionality of this class is controlling read/write access to various files, so I've implemented a static ReaderWriterLock on the list of files. The project uses .NET Framework 2.0 as part of the customer requirements.
However, when I stress test the system using a number of different clients (generally 16), each performing a large number of reads and writes, then very intermittently, and only after several hours or even days with at least 500k transactions completed, the system crashes. OK, so we have a bug...
But when I check the logs of all locking events I can see that the following has happened:
1: Thread A acquires a write lock directly; checking IsWriterLockHeld shows it to be true.
2: Thread B tries to acquire a reader lock and succeeds even though Thread A still holds the write lock.
3: The system now crashes; the stack trace shows a null reference exception in the ReaderWriterLock.
This process has been run several hundred thousand times previously with no errors, and I can check the logs and see that in all previous cases the read lock was blocked until the writer had exited. I have also tried implementing the ReaderWriterLock as a singleton, but the issue still occurs.
Has anybody ever seen anything like this before?
A slimmed-down version of the ReaderWriterLock implementation used is shown below:
private const int readwriterlocktimeoutms = 5000;
private static ReaderWriterLock readerWriterLock = new ReaderWriterLock();
// this method will be called by thread A
public static void MethodA()
{
// bool to indicate that we have the lock
bool IsTaken = false;
try
{
// get the lock
readerWriterLock.AcquireWriterLock(readwriterlocktimeoutms);
// log that we have the lock for debug
// Logger.LogInfo("MethodA: acquired write lock; writer lock held {0}; reader lock held {1}", readerWriterLock.IsWriterLockHeld.ToString(),readerWriterLock.IsReaderLockHeld.ToString(), );
// mark that we have taken the lock
IsTaken = true;
}
catch(Exception e)
{
throw new Exception(string.Format("Error getting lock {0} {1}", e.Message, Environment.StackTrace));
}
try
{
// do some work
}
finally
{
if (IsTaken)
{
readerWriterLock.ReleaseWriterLock();
}
}
}
// this method will be called by thread B
public static void MethodB()
{
// bool to indicate that we have the lock
bool IsTaken = false;
try
{
// get the lock
readerWriterLock.AcquireReaderLock(readwriterlocktimeoutms);
// log that we have the lock for debug
// Logger.LogInfo("MethodB: acquired read lock; writer lock held {0}; reader lock held {1}", readerWriterLock.IsWriterLockHeld.ToString(),readerWriterLock.IsReaderLockHeld.ToString(), );
// mark that we have taken the lock
IsTaken = true;
}
catch (Exception e)
{
throw new Exception(string.Format("Error getting lock {0} {1}", e.Message, Environment.StackTrace));
}
try
{
// do some work
}
finally
{
if (IsTaken)
{
readerWriterLock.ReleaseReaderLock();
}
}
}

@All, I finally have a solution to this problem. @Yannick, you were on the right track...
If MSDN says that it's impossible to have reader and writer lock held at same time.
Today I got confirmation from Microsoft that under very heavy load on multiprocessor systems (note: I could never reproduce this problem on an AMD system, only on Intel) it's possible for ReaderWriterLock objects to become corrupted. The risk increases as the number of writers at any given stage grows, as these can back up in the queue.
For the last two weeks I've been running with the .NET 3.5 ReaderWriterLockSlim class and have not encountered the issue, which corresponds to what Microsoft has confirmed: the ReaderWriterLockSlim class does not have the same risk of corruption as the fat ReaderWriterLock class.
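For anyone making the same switch, here is a minimal sketch of the pattern with ReaderWriterLockSlim. FileRegistry and its members are placeholders rather than the real class, and the 5000 ms timeout is carried over from the code in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the static class from the question.
public static class FileRegistry
{
    private const int TimeoutMs = 5000;
    private static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private static readonly List<string> files = new List<string>();

    // Writer path: exclusive access to the list.
    public static void Add(string file)
    {
        // TryEnterWriteLock returns false on timeout instead of throwing.
        if (!rwLock.TryEnterWriteLock(TimeoutMs))
            throw new TimeoutException("Timed out waiting for the write lock.");
        try
        {
            files.Add(file);
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }

    // Reader path: multiple readers may hold the lock at the same time.
    public static bool Contains(string file)
    {
        if (!rwLock.TryEnterReadLock(TimeoutMs))
            throw new TimeoutException("Timed out waiting for the read lock.");
        try
        {
            return files.Contains(file);
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }
}
```

Because the lock is only released in the finally block that follows a successful TryEnter call, the separate IsTaken flag from the original code is not needed.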

If MSDN says that it's impossible to have reader and writer lock held at same time.
Is it possible that your process has two ReaderWriterLock objects at some point, for some other reason?
Another strange thing: checking IsWriterLockHeld from the current thread, which is a reader, tells you nothing about a writer lock held by another thread.
How do you know that Thread A still holds the writer lock, and how do you know that it's not the debug logging system that delays or "mixes" the output of the threads?
Another thought: is it possible that some other shared resource leads to a deadlock, which somehow results in a crash? (A null reference exception is still strange, unless we consider that the deadlock was cleaned up and the ReaderWriterLock reset.)
Your problem is strange, true.
And another question, which won't solve your problem: why do you use IsTaken, when for debugging your application you rely on IsWriterLockHeld (or IsReaderLockHeld)?
Why not use those in your finally blocks?
