Prevent Reading and Writing to a File at the Same Time - c#

I have a process that needs to read and write to a file. The application has a specific order to its reads and writes and I want to preserve this order. What I would like to do is implement something that lets the first operation start and makes the second operation wait until the first is done, with a first-come, first-served kind of queue for access to the file. From what I have read, file locking seems like it might be what I am looking for, but I have not been able to find a very good example. Can anyone provide one?
Currently I am using a TextReader/Writer with .Synchronized but this is not doing what I hoped it would.
Sorry if this is a very basic question, threading gives me a headache :S

It should be as simple as this:
public static readonly object LockObj = new object();
public void AnOperation()
{
lock (LockObj)
{
using (var fs = File.Open("yourfile.bin", FileMode.Open))
{
// do something with file
}
}
}
public void SomeOperation()
{
lock (LockObj)
{
using (var fs = File.Open("yourfile.bin", FileMode.Open))
{
// do something else with file
}
}
}
Basically, define a lock object, then whenever you need to do something with your file, make sure you get a lock using the C# lock keyword. On reaching the lock statement, execution will block indefinitely until a lock has been obtained.
There are other constructs you can use for locking, but I find the lock keyword to be the most straightforward.
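One caveat for the question's "first come, first served" requirement: as far as I know, lock is not documented to hand the lock to waiting threads in arrival order. If strict ordering matters, one option is to funnel every file operation through a single worker. The following is only a minimal sketch under that assumption; FileOperationQueue is a hypothetical helper (not part of .NET), and it does nothing about exceptions thrown by an operation, which would stop the worker.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper (not part of .NET): runs queued file operations
// one at a time, in the order they were enqueued.
public sealed class FileOperationQueue : IDisposable
{
    // BlockingCollection wraps a ConcurrentQueue by default, so items come out FIFO.
    private readonly BlockingCollection<Action> _operations = new BlockingCollection<Action>();
    private readonly Task _worker;

    public FileOperationQueue()
    {
        // A single consumer means two operations never overlap.
        _worker = Task.Factory.StartNew(() =>
        {
            foreach (var operation in _operations.GetConsumingEnumerable())
            {
                operation();
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Adds an operation to the end of the queue and returns immediately.
    public void Enqueue(Action operation)
    {
        _operations.Add(operation);
    }

    // Stops accepting work and waits for everything queued so far to finish.
    public void Dispose()
    {
        _operations.CompleteAdding();
        _worker.Wait();
        _operations.Dispose();
    }
}
```

Callers would wrap each read or write in a delegate, for example queue.Enqueue(() => File.AppendAllText(path, text)), and the operations then run in exactly the order in which Enqueue was called.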

If you're using a current version of the .NET Framework, you can benefit from Task.ContinueWith.
If your units of work are logically always, "read some, then write some", the following expresses that intent succinctly and should scale:
string path = "file.dat";
// Start a reader task
var task = Task.Factory.StartNew(() => ReadFromFile(path));
// Continue with a writer task
task.ContinueWith(tt => WriteToFile(path));
// We're guaranteed that the read will occur before the write
// and that the write will occur once the read completes.
// We also can check the antecedent task's result (tt.Result in our
// example) for any special error logic we need.
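To make the error check on the antecedent task concrete, here is a slightly fuller sketch of the same idea. It is an illustration only: ReadThenWrite is a hypothetical class, and File.ReadAllText and File.AppendAllText stand in for the ReadFromFile and WriteToFile methods above, whose bodies are not shown.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical illustration of a read task followed by a write continuation.
public static class ReadThenWrite
{
    // Reads the file, then appends a line to it only if the read succeeded.
    // The returned task completes when the write has finished.
    public static Task Run(string path, string lineToAppend)
    {
        Task<string> readTask = Task.Factory.StartNew(() => File.ReadAllText(path));

        return readTask.ContinueWith(antecedent =>
        {
            if (antecedent.IsFaulted)
            {
                // Surface the read failure instead of writing.
                throw new IOException("Read failed; skipping write.", antecedent.Exception);
            }
            File.AppendAllText(path, lineToAppend + Environment.NewLine);
        });
    }
}
```

Waiting on the returned task (or attaching a further continuation to it) is how a caller would learn whether both steps succeeded.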

Related

C#: is FileStream.ReadByte() a multi-threading friendly function?

So I have 16 threads that simultaneously run this method:
private void Work()
{
int currentByte;
char currentChar;
try
{
while (true)
{
position++;
currentByte = file.ReadByte();
currentChar = Convert.ToChar(currentByte);
entries.Add(new Entry(currentChar));
}
}
catch (Exception) { }
}
And then I have one more thread running this method:
private void ManageThreads()
{
bool done;
for(; ; )
{
done = !threads.Any(x => x.IsAlive == true);//Check if each thread is dead before continuing
if (done)
break;
else
Thread.Sleep(100);
}
PrintData();
}
Here is the problem: the PrintData method just prints everything in the 'entries' list to a text file. This text file is different every time the program is run even with the same input file. I am a bit of a noob when it comes to multi-threaded applications so feel free to dish out the criticism.
In general, unless a type explicitly calls out thread safety in its documentation, you should assume it is not thread-safe*. Streams in .NET make no such promise and should be treated as not thread-safe: use appropriate synchronization (i.e. locks) that guarantees each stream is accessed from one thread at a time.
With file streams there is another concern: the OS-level file object may be updated from other threads. FileStream tries to mitigate this by checking whether its internal state matches the OS state; see the Remarks section of the FileStream documentation on MSDN.
If you want a thread-safe stream, you can try the Synchronized method, as shown in "C#, is there such a thing as a "thread-safe" stream?".
Note that the code in your post will produce random results whether the stream is thread-safe or not. Thread safety of the stream only guarantees that all bytes show up in the output, not the order in which they show up. With a non-thread-safe stream there are no guarantees at all: some bytes may show up multiple times, some may be skipped, and any other behavior (crashes, partial reads, ...) is possible.
* Thread-safe as in "the internal state of the instance will be consistent whether it is called from one thread or multiple". It does not mean that calling arbitrary methods from different threads will lead to useful behavior.
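As a rough illustration of the locking advice, the sketch below guards ReadByte with a lock, stops at the end-of-stream value (-1) instead of relying on an exception, and uses Join instead of polling IsAlive. SharedStreamReader and its members are hypothetical names, not the asker's code. Every byte ends up in the result exactly once, but the order still depends on thread scheduling, so the output is not deterministic.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical sketch: several threads draining one stream under a shared lock.
public static class SharedStreamReader
{
    // Reads every byte of the stream using several threads.
    // All threads share one lock, so only one ReadByte call runs at a time.
    public static ConcurrentBag<char> ReadAll(Stream stream, int threadCount)
    {
        var entries = new ConcurrentBag<char>();
        var gate = new object();

        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                while (true)
                {
                    int currentByte;
                    lock (gate)
                    {
                        currentByte = stream.ReadByte();
                    }
                    if (currentByte == -1)
                    {
                        break; // end of stream reached
                    }
                    entries.Add((char)currentByte);
                }
            });
            threads[i].Start();
        }

        // Join blocks until each thread has finished.
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return entries;
    }
}
```

If the output must match the input order, reading the file sequentially on one thread is the simpler choice.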

Beyond "honor code", is there a difference between using a dedicated "lock object" and locking data directly?

I have two threads: one that feeds updates and one that writes them to disk. Only the most recent update matters, so I don't need a PC queue.
In a nutshell:
The feeder thread drops the latest update into a buffer, then sets a flag to indicate a new update.
The writer thread checks the flag, and if it indicates new content, writes the buffered update to disk and disables the flag again.
I'm currently using a dedicated lock object to ensure that there's no inconsistency, and I'm wondering how that differs from locking the flag and buffer directly. The only difference I'm aware of is that a dedicated lock object requires trust that everyone who wants to manipulate the flag and buffer uses the lock.
Relevant code:
private object cacheStateLock = new object();
string textboxContents;
bool hasNewContents;
private void MainTextbox_TextChanged(object sender, TextChangedEventArgs e)
{
lock (cacheStateLock)
{
textboxContents = MainTextbox.Text;
hasNewContents = true;
}
}
private void WriteCache() // running continually in a thread
{
string toWrite;
while (true)
{
lock (cacheStateLock)
{
if (!hasNewContents)
continue;
toWrite = textboxContents;
hasNewContents = false;
}
File.WriteAllText(cacheFilePath, toWrite);
}
}
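First of all, if you ever read or write the bool flag outside of a lock, you should mark it as volatile (which isn't recommended at all, yet better than unsynchronized access). In the code you posted, every access to the flag happens inside lock (cacheStateLock), which already provides the necessary memory barriers.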
Second, the lock statement is syntactic sugar for the Monitor class methods, so even if you were able to pass a value type to it (which is a compile error, by the way), the value would be boxed into a new object each time, so every thread would lock on its own copy of the flag, making the lock useless. So you must provide a reference type to the lock statement.
Third, strings are immutable in C#, so assigning new text replaces the reference; it's theoretically possible for some method to hold an old reference to the string and lock on the wrong object. Also, the string could become null from MainTextbox.Text in your case, and locking on null throws at run time, whereas a private lock object never changes (you should mark it as readonly, by the way).
So introducing a dedicated object for synchronization is the easiest and most natural way to separate locking from the actual logic.
As for your initial code, it has a problem: MainTextbox_TextChanged could overwrite text that has not yet been written to disk. You can introduce some additional synchronization logic or use a library here. @Aron suggested Rx; I personally prefer TPL Dataflow, but either works.
You can add the BroadcastBlock linked to ActionBlock<string>(WriteCache), which will remove the infinite loop from WriteCache method and the lock from both of your methods:
var broadcast = new BroadcastBlock<string>(s => s);
var consumer = new ActionBlock<string>(s => WriteCache(s));
broadcast.LinkTo(consumer);
// fire and forget
private async void MainTextbox_TextChanged(object sender, TextChangedEventArgs e)
{
await broadcast.SendAsync(MainTextbox.Text);
}
// running continually in a thread without a loop
private void WriteCache(string toWrite)
{
File.WriteAllText(cacheFilePath, toWrite);
}

Open thread in foreach loop

I am getting an XML feed and I pass it to my MQ server; then I have a service that listens to the MQ server and reads all its messages.
I have a foreach loop that opens a new thread on each iteration in order to make the parsing faster, because there are around 500 messages in the MQ (which means there are 500 XMLs).
foreach (System.Messaging.Message m in msgs)
{
byte[] bytes = new byte[m.BodyStream.Length];
m.BodyStream.Read(bytes, 0, (int)m.BodyStream.Length);
System.Text.ASCIIEncoding ascii = new System.Text.ASCIIEncoding();
ParserClass tst = new ParserClass(ascii.GetString(bytes, 0, (int)m.BodyStream.Length));
new Thread( new ThreadStart(tst.ProcessXML)).Start();
}
In the ParserClass I have this code:
private static object thLockMe = new object();
public string xmlString { get; set; }
public ParserClass(string xmlStringObj)
{
this.xmlString = xmlStringObj;
}
public void ProcessXML()
{
lock (thLockMe)
{
XDocument reader = XDocument.Parse(xmlString);
//Some more code...
}
}
The problem is, when I run this foreach loop with 1 thread only, it works perfect, but slow.
When I run it with more than 1 thread, I get an error "Object reference not set to an instance of an object".
I guess there is something wrong with my locking since I am not very experienced with threading.
I am kinda hopeless, hope you can help!
Cheers!
I note that you are running a bunch of threads with their entire code wrapped inside a lock statement. You might as well run the methods in sequence, because you are not getting any parallelism this way.
Since you are creating a new ParserClass instance on every iteration of your loop, and also creating and starting a new thread every iteration, you do not need a lock in your ProcessXML method.
The object you lock on is currently static, so it is not bound to an instance, which means that once one thread is inside your ProcessXML method, no other thread can enter it until the first has finished.
You are not sharing any data (from the code I can see) in your ParserClass among threads, so you don't need a lock inside your ProcessXML method.
If you are using data that is shared between threads, then you should have a lock.
If you're going to be using lots of threads, then you're better off using a ThreadPool: take a finite number of threads (4 perhaps) from your pool, assign them some work, and recycle them for the next 4 tasks.
Creating threads is an expensive operation that requires a call into the OS kernel, so you do not want to do it 500 times. Also, the default reserved memory for a thread stack in Windows is 1 MB, so that is 500 MB of reserved stack space alone for your threads.
An optimal number of threads is roughly equal to the number of cores in your machine. Since that's not realistic for most purposes, you can use double or triple that, but then you're better off with a thread pool, where you recycle threads instead of creating new ones all the time.
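One way to get a bounded, recycled set of worker threads without managing them by hand is Parallel.ForEach with a capped degree of parallelism. Note that this swaps in the Task Parallel Library rather than the raw ThreadPool calls discussed in this thread. The sketch below is an assumption-laden illustration: XmlBatchParser is a hypothetical class, and it takes plain XML strings instead of MQ messages so that it stays self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical sketch: parse many XML strings with bounded parallelism.
public static class XmlBatchParser
{
    // Parses each XML string on a limited number of pooled threads and
    // returns the root element name of every document.
    public static List<string> ParseAll(IEnumerable<string> xmlStrings)
    {
        var rootNames = new ConcurrentBag<string>();
        var options = new ParallelOptions
        {
            // Cap concurrency at the number of cores.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(xmlStrings, options, xml =>
        {
            // No lock needed here: each iteration works on its own document.
            XDocument document = XDocument.Parse(xml);
            rootNames.Add(document.Root.Name.LocalName);
        });

        return new List<string>(rootNames);
    }
}
```

Only the write to the shared collection needs to be thread-safe, which ConcurrentBag handles; the parsing itself touches no shared state.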
Even though this probably won't solve your problem, instead of creating 500 simultaneous threads you should just use the ThreadPool, which manages threads in a much more efficient way:
foreach (System.Messaging.Message m in msgs)
{
byte[] bytes = new byte[m.BodyStream.Length];
m.BodyStream.Read(bytes, 0, (int)m.BodyStream.Length);
System.Text.ASCIIEncoding ascii = new System.Text.ASCIIEncoding();
ParserClass tst = new ParserClass(ascii.GetString(bytes, 0, (int)m.BodyStream.Length));
ThreadPool.QueueUserWorkItem(x => tst.ProcessXML());
}
And to make sure they run as simultaneously as possible change your code in the ParserClass like this (assuming you indeed have resources you share between threads - if you don't have any, you don't have to lock at all):
private static object thLockMe = new object();
public string XmlString { get; set; }
public ParserClass(string xmlString)
{
XmlString = xmlString;
}
public void ProcessXML()
{
XDocument reader = XDocument.Parse(XmlString);
// some more code which doesn't need to access the shared resource
lock (thLockMe)
{
// the necessary code to access the shared resource (and only that)
}
// more code
}
Regarding your actual question:
Instead of calling OddService.InsertEvent(...) multiple times with the same parameters (that method reeks of remote calls and side effects...) you should call it once, store the result in a variable and do all subsequent operations on that variable. That way you can also conveniently check if it's not that precise method which returns null sometimes (when accessed simultaneously?).
Edit:
Does it work if you put all calls to OddService.* in lock blocks?

Implementing stop and restart in file stream transfer - how to? C# .NET

I'm looking for texts or advice on implementing stop and restart in file stream transfer.
The goal is for the application to use a single read source, and output to multiple write sources, and be able to restart from a recorded position if a transfer fails.
The application is being written in C# .NET.
Pseudocode:
while (reader.Read())
{
foreach(writer in writers)
{
writer.WriteToStream();
}
}
I need to be able to implement stop or pause. Which could work like so. To stop, continue is marked false:
while (reader.Read() && Continue)
{
foreach(writer in writers)
{
writer.WriteToStream();
}
}
Clearly at this stage I need to record the number of bytes read, and the number of bytes written to each write source.
My questions are:
If I were to only record the read bytes, and use this for restarts, one or more writers could have written while others have not. Simply restarting using a measure of read progress might corrupt the written data. So I need to use a 'written bytes per writer' record as my new start position. How can I be sure that the bytes were written (I may not have the ability to read the file from the write source to read the file length)?
Can anyone advise or point me in the right direction of a text on this kind of issue?
Use a thread synchronization event.
(pseudocode):
ManualResetEvent _canReadEvent = new ManualResetEvent(true);
public void WriterThreadFunc()
{
while (_canReadEvent.WaitOne() && reader.Read())
{
foreach(writer in writers)
{
writer.WriteToStream();
}
}
}
public void Pause()
{
_canReadEvent.Reset();
}
public void Continue()
{
_canReadEvent.Set();
}
The good thing is that the writer thread won't consume any CPU while it's paused, and it will resume as soon as it's signaled (as opposed to using a flag and Thread.Sleep()).
The other note is that any such check should be the first operand of the while condition; otherwise reader.Read() will still read from the stream, and that content will be lost because the flag prevents the while block from executing.
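The pause gate can be combined with the per-writer byte counts the question asks about. The sketch below is one possible shape, not a definitive implementation: PausableCopier and its members are hypothetical names, and "written" here only means the bytes were handed to the destination stream and flushed, which is weaker than a guarantee that they reached the remote end or the disk.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical sketch: copy one source to several destinations with pause/resume.
public sealed class PausableCopier
{
    // Starts signaled, so copying runs until Pause is called.
    private readonly ManualResetEvent _canRun = new ManualResetEvent(true);

    // Bytes written and flushed to each destination, by index.
    public long[] BytesWritten { get; private set; }

    public void Pause() { _canRun.Reset(); }
    public void Resume() { _canRun.Set(); }

    public void Copy(Stream source, IList<Stream> destinations)
    {
        BytesWritten = new long[destinations.Count];
        var buffer = new byte[4096];
        while (true)
        {
            // Check the gate before reading, so nothing is read and then dropped.
            _canRun.WaitOne();
            int read = source.Read(buffer, 0, buffer.Length);
            if (read == 0)
            {
                break; // end of source
            }
            for (int i = 0; i < destinations.Count; i++)
            {
                destinations[i].Write(buffer, 0, read);
                destinations[i].Flush();
                BytesWritten[i] += read; // per-writer restart position
            }
        }
    }
}
```

After a failure, each destination could be resumed from its own BytesWritten entry rather than from the reader's position.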

Multithreading and concurrency with C#

In a Windows Forms window, multiple events can trigger an asynchronous method. This method downloads a file and caches it. My problem is that I want that method to be executed only once. In other words, I want to prevent the file from being downloaded multiple times.
If the method downloading the file is triggered twice, I want the second call to wait for the file (or wait for the first method to be done).
Does someone have an idea on how to achieve that?
UPDATE: I am simply trying to prevent unnecessary downloads. In my case, when a user keeps the mouse over an item in a ListBox for more than a couple of milliseconds, we start the download, on the assumption that the user will click and request the file. What can potentially happen is that the user keeps the mouse over the item for one second and then clicks. In this case two downloads start. I am looking for the best way to handle such a scenario.
UPDATE 2: There is a possibility that the user will move the mouse over multiple items. As a consequence, multiple downloads will occur. I've not really thought about this scenario, but right now, if we face it, we don't abandon the download. The files will be downloaded (they are usually around 50-100 KB) and then cached.
Maintain the state of what's happening in a form variable and have your async method check that state before it does anything. Make sure you synchronize access to it, though! Mutexes and semaphores are good for this kind of thing.
If you can download different files simultaneously, you'll need to keep track of what's being downloaded in a list for reference.
If only one file can be downloaded at a time, and you don't want to queue things up, you could just unhook the event while something is being downloaded, too, and rehook it when the download is complete.
Here is a dummy implementation that supports multiple file downloads:
Dictionary<string, object> downloadLocks = new Dictionary<string, object>();
void DownloadFile(string localFile, string url)
{
object fileLock;
lock (downloadLocks)
{
if (!downloadLocks.TryGetValue(url, out fileLock))
{
fileLock = new object();
downloadLocks[url] = fileLock;
}
}
lock (fileLock)
{
// check if file is already downloaded
// if not then download file
}
}
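An alternative to per-URL lock objects is to cache the download task itself, so that later callers simply wait on the task the first caller started. This is only a sketch under stated assumptions: DownloadCache is a hypothetical class, and the download delegate is a placeholder for the real download code. A failed download stays cached as a faulted task here, which real code would probably want to handle.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch: start each download at most once and share its task.
public sealed class DownloadCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _downloads =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();
    private readonly Func<string, string> _download;

    // 'download' is a caller-supplied function that fetches the URL and
    // returns the local file path; it stands in for real download code.
    public DownloadCache(Func<string, string> download)
    {
        _download = download;
    }

    // Every caller asking for the same URL gets the same task, so the
    // download function runs at most once per URL.
    public Task<string> GetAsync(string url)
    {
        // GetOrAdd may build more than one Lazy under contention, but only one
        // is stored, and only the stored Lazy ever starts a download.
        return _downloads.GetOrAdd(
            url,
            key => new Lazy<Task<string>>(() => Task.Factory.StartNew(() => _download(key)))
        ).Value;
    }
}
```

The hover handler and the click handler would both call GetAsync(url); whichever comes second receives the already running (or already completed) task.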
You can simply wrap your method call within a lock statement like this
private static readonly Object padLock = new Object();
...
lock(padLock)
{
YourMethod();
}
I'm not sure how it would be done in C#, but in Java you would synchronize on a private static final object in the class before downloading the file. This would block any further requests until the current one was completed. You could then check whether the file was downloaded and act appropriately.
private static final Object lock = new Object();
private File theFile;
public void method() {
synchronized(lock) {
if(theFile == null) {
//download the file
}
}
}
In general, I agree with Michael, use a lock around the code that actually gets the file. However, if there's a single event that always occurs first and you can always load the file then, consider using Futures. In the initial event, start the future running
Future<String> file = InThe.Future<String>(delegate { return LoadFile(); });
and in every other event, wait on the future's value
DoSomethingWith(file.Value);
If you want one thread to wait for another thread to finish a task, you probably want to use a ManualResetEvent. Maybe something like this:
private ManualResetEvent downloadCompleted = new ManualResetEvent(false);
private bool downloadStarted = false;
public void Download()
{
bool doTheDownload = false;
lock(downloadCompleted)
{
if (!downloadStarted)
{
downloadCompleted.Reset();
downloadStarted = true;
doTheDownload = true;
}
}
if (doTheDownload)
{
// Code to do the download
lock(downloadCompleted)
{
downloadStarted = false;
}
// When finished notify anyone waiting.
downloadCompleted.Set();
}
else
{
// Wait until it is done...
downloadCompleted.WaitOne();
}
}
