I would like to ask for help with my code. I am a newbie and wanted to implement safe multi-threading when writing to a text file.
StreamWriter sw = new StreamWriter(@"C:\DailyLog.txt");

private void Update()
{
    var collection = Database.GetCollection<Entity>("products");
    StreamReader sr = new StreamReader(@"C:\LUSTK.txt");
    string[] line = sr.ReadLine().Split(new char[] { ';' });
    while (!sr.EndOfStream)
    {
        line = sr.ReadLine().Split(new char[] { ';' });
        ThreadStart t = delegate {
            UpdateEach(Convert.ToInt32(line[5]));
        };
        new Thread(t).Start();
    }
    sr.Close();
}

private void UpdateEach(int stock)
{
    sw.WriteLine(stock);
}
I got no error in my code, but it seems not everything is written to my text file. I did not call sw.Close() because I know some threads have not finished yet. In addition, how can I call sw.Close() knowing that no thread is left unfinished? I have 5 million records in my LUSTK.txt that are read by the StreamReader; each line creates a thread, and each thread accesses the same text file.
You aren't going to be able to concurrently write to the same writer from different threads. The object wasn't designed to support concurrent access.
Beyond that, the general idea of writing to the same file from multiple threads is flawed. You still only have one physical disk, and it can only spin so fast. Telling it to do more things at once won't make it spin any faster.
Beyond that, you're not closing the writer, as you said, and as a result, the buffer isn't being flushed.
You also have a bug in that your anonymous method is closing over line, and all of the methods are closing over the same variable, which is changing. It's important that they each close over their own identifier that won't change. (This can be accomplished simply by declaring line inside of the while loop.) But since you shouldn't be using multiple threads to begin with, there's no real need to focus on this.
You can also use File.ReadLines and File.WriteAllLines to do your file IO; it results in much cleaner code:
var values = File.ReadLines(inputFile)
.Select(line => line.Split(';')[5]);
File.WriteAllLines(outputFile, values);
If you were to want to parallelize this process it would be because you're doing some CPU bound work on each item after you read the line and before you write the line. Parallelizing the actual file IO, as said before, is likely to be harmful, not helpful. In this case the CPU bound work is just splitting the line and grabbing one value, and that's likely to be amazingly fast compared to the file IO. If you needed to, for example, hit the database or do some expensive processing on each line, then you would consider parallelizing just that part of the work, while synchronizing the file IO through a single thread.
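As a sketch of that split (the Process call here is a hypothetical stand-in for the expensive per-item work), PLINQ can parallelize the CPU-bound part while a single thread enumerates the results and writes them:

```csharp
// Hypothetical sketch: Process() stands in for expensive CPU-bound work.
// Only the processing runs in parallel; the write stays on one thread.
var results = File.ReadLines(inputFile)
    .AsParallel()
    .AsOrdered() // preserve input order in the output
    .Select(line => Process(line.Split(';')[5]));

// WriteAllLines enumerates the results on the calling thread only
File.WriteAllLines(outputFile, results);
```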
A StreamWriter is simply not thread-safe; you would need to synchronize access to this via lock or similar. However, I would advise rethinking your strategy generally:
starting lots of threads is a really bad idea - threads are actually pretty expensive and should not be used for small items of work (a Task or the ThreadPool might be fine, though) - a small number of threads, each dequeuing from a thread-safe queue, would be preferable
you will have no guarantee of order in terms of the output
frankly, I would expect IO to be your biggest performance issue here, and that isn't impacted by the number of threads (or worse: can be adversely impacted)
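If you do keep a single shared StreamWriter, a minimal sketch of the synchronization it would need (assuming the sw field from the question) looks like this:

```csharp
// Sketch: serialize all writes through one lock object (sw is assumed
// to be the shared StreamWriter field from the question).
private readonly object writeLock = new object();

private void WriteSafe(string value)
{
    lock (writeLock) // only one thread can be inside at a time
    {
        sw.WriteLine(value);
    }
}
```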
Related
I'm going to start by describing my use case:
I have built an app which processes LARGE datasets, runs various transformations on them and then spits them out. This process is very time-sensitive, so a lot of time has gone into optimising it.
The idea is to read a bunch of records at a time, process each one on different threads and write the results to file. But instead of writing them to one file, the results are written to one of many temp files which get combined into the desired output file at the end. This is so that we avoid memory write protection exceptions or bottlenecks (as much as possible).
To achieve that, we have an array of 10 fileUtils, one of which gets passed to a thread as it is initiated. There is a threadCountIterator which increments at each localInit, and is reset back to zero when that count reaches 10. That value determines which of the fileUtils objects gets passed to the record-processing object per thread. The idea is that each util class is responsible for collecting and writing to just one of the temp output files.
It's worth noting that each FileUtils object gathers about 100 records in a member outputBuildString variable before writing them out, hence having them exist separately and outside of the threading process, where an object's lifespan is limited.
The idea is to more or less evenly disperse the responsibility for collecting, storing and then writing the output data across multiple fileUtil objects, which means we can write more per second than if we were just writing to one file.
My problem is that this approach results in an IndexOutOfRangeException, as my threadedOutputIterator jumps above the upper limit value, despite there being code that is supposed to reduce it when this happens:
//by default threadCount = 10
private void ProcessRecords()
{
    try
    {
        Parallel.ForEach(clientInputRecordList, new ParallelOptions { MaxDegreeOfParallelism = threadCount }, LocalInit, ThreadMain, LocalFinally);
    }
    catch (Exception e)
    {
        Console.WriteLine("The following error occurred: " + e);
    }
}
private SplitLineParseObject LocalInit()
{
    if (threadedOutputIterator >= threadCount)
    {
        threadedOutputIterator = 0;
    }
    //still somehow goes above 10, and this is where the exception hits since there are only 10 objects in the threadedFileUtils array
    SplitLineParseObject splitLineParseUtil = new SplitLineParseObject(parmUtils, ref recCount, ref threadedFileUtils[threadedOutputIterator], ref recordsPassedToFileUtils);
    if (threadedOutputIterator < threadCount)
    {
        threadedOutputIterator++;
    }
    return splitLineParseUtil;
}
private SplitLineParseObject ThreadMain(ClientInputRecord record, ParallelLoopState state, SplitLineParseObject threadLocalObject)
{
    threadLocalObject.clientInputRecord = record;
    threadLocalObject.ProcessRecord();
    recordsPassedToObject++;
    return threadLocalObject;
}

private void LocalFinally(SplitLineParseObject obj)
{
    obj = null;
}
As explained in the comment above, it still manages to jump above 10, and this is where the exception hits, since there are only 10 objects in the threadedFileUtils array. I understand that this is because multiple threads can increment that number at the same time, before either of the if statements runs, meaning there's still a chance it will fail in its current state.
How could I better approach this such that I avoid that exception, while still being able to take advantage of the read, store and write efficiency that having multiple fileUtils gives me?
Thanks!
But instead of writing them to one file, the results are written to one of many temp files which get combined into the desired output file at the end
That is probably not a great idea. If you can fit the data in memory it is most likely better to keep it in memory, or do the merging of data concurrently with the production of data.
To achieve that, we have an array of 10 fileUtils, 1 of which get passed to a thread as it is initiated. There is a threadCountIterator which increments at each localInit, and is reset back to zero when that count reaches 10
This does not sound safe to me. The parallel loop should guarantee that no more than 10 threads run concurrently (if that is your limit), and that localInit runs once for each thread that is used. As far as I know it makes no guarantee that no more than 10 threads will be used in total, so it seems possible that thread #0 and thread #10 could run concurrently.
The correct usage would be to create a new fileUtils-object in the localInit.
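A rough sketch of that correct usage (the FileUtils constructor and FlushAndClose cleanup method here are hypothetical, since the real API isn't shown in the question):

```csharp
// Sketch: every worker creates its own FileUtils in localInit, so no
// shared index is needed. The constructor and FlushAndClose are hypothetical.
Parallel.ForEach(
    clientInputRecordList,
    new ParallelOptions { MaxDegreeOfParallelism = threadCount },
    localInit: () => new SplitLineParseObject(new FileUtils(Path.GetTempFileName())),
    body: (record, state, local) =>
    {
        local.clientInputRecord = record;
        local.ProcessRecord();
        return local;
    },
    localFinally: local => local.FlushAndClose()); // write out any buffered records
```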
This more or less works and ends up being more efficient than if we are writing to just one file
Are you sure? Typically IO does not scale very well with concurrency. While SSDs are absolutely better than HDDs, both tend to work best with sequential IO.
How could I better approach this?
My approach would be to use a single writing thread, and a BlockingCollection as a thread-safe buffer between the producers and the writer. This assumes that the order of items is not significant:
public async Task ProcessAndWriteItems(List<int> myItems)
{
    // BlockingCollection uses a ConcurrentQueue by default.
    // It can also be given a max size, in case the writer cannot keep up with the producers.
    var writeQueue = new BlockingCollection<string>();
    var writeTask = Task.Run(() => Writer(writeQueue));

    Parallel.ForEach(
        myItems,
        item =>
        {
            writeQueue.Add(item.ToString());
        });

    writeQueue.CompleteAdding(); // signal the writer to stop once all items have been processed
    await writeTask;
}

private void Writer(BlockingCollection<string> queue)
{
    using var stream = new StreamWriter(myFilePath);
    foreach (var line in queue.GetConsumingEnumerable())
    {
        stream.WriteLine(line);
    }
}
There is also TPL Dataflow, which should be suitable for tasks like this. But I have not used it, so I cannot provide specific recommendations.
Note that multi-threaded programming is difficult. While it can be made easier by proper use of modern programming techniques, you still need to know a fair bit about thread safety to understand the problems, and what options and tools exist to solve them. You will not always be so lucky as to get actual exceptions; a more typical result of multi-threading bugs is that your program just produces the wrong result. If you are unlucky this only occurs in production, on a full moon, and only when processing important data.
LocalInit obviously is not thread safe, so when invoked multiple times in parallel it will have all the multithreading problems caused by not-atomic operations. As a quick fix you can lock the whole method:
private object locker = new object();

private SplitLineParseObject LocalInit()
{
    lock (locker)
    {
        if (threadedOutputIterator >= threadCount)
        {
            threadedOutputIterator = 0;
        }
        SplitLineParseObject splitLineParseUtil = new SplitLineParseObject(parmUtils, ref recCount,
            ref threadedFileUtils[threadedOutputIterator], ref recordsPassedToFileUtils);
        if (threadedOutputIterator < threadCount)
        {
            threadedOutputIterator++;
        }
        return splitLineParseUtil;
    }
}
Or maybe try to work around it with Interlocked for more fine-grained control and better performance (but that would not be very easy, if it is even possible).
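For what it's worth, a rough sketch of the Interlocked idea - atomically take a ticket and map it onto the array with modulo, instead of resetting the counter - might look like this (it still does nothing to make the shared FileUtils objects themselves safe):

```csharp
// Sketch: Interlocked.Increment is atomic, so two threads can never get
// the same ticket; the modulo keeps the index inside the array bounds.
int ticket = Interlocked.Increment(ref threadedOutputIterator);
int index = (ticket - 1) % threadCount; // always in 0..threadCount-1
var fileUtil = threadedFileUtils[index];
```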
Note that even if you implement this in the current code, there is still no guarantee that all previous writes have actually finished: with 10 files, it is possible that the one with index 0 is not yet finished while the next 9 are, so the 10th thread would try writing to the same file the 0th is still writing to. Consider another approach (if you still want to write to multiple files; though IO does not usually scale that well, so a blocking write with a queue into one file may be the way to go): split your data into chunks and process them in parallel (i.e. one "thread" per chunk), with every chunk writing to its own file, so there is no need for synchronization.
Some potentially useful reading:
Overview of synchronization primitives
System.Threading.Channels
TPL Dataflow
Threading in C# by Joseph Albahari
I need to read 1M rows from an IDataReader, and write n text files simultaneously. Each of those files will be a different subset of the available columns; all n text files will be 1M lines long when complete.
Current plan is one TransformManyBlock to iterate the IDataReader, linked to a BroadcastBlock, linked to n BufferBlock/ActionBlock pairs.
What I'm trying to avoid is having my ActionBlock delegate perform a using (StreamWriter x...) { x.WriteLine(); } that would open and close every output file a million times over.
My current thought is, in lieu of ActionBlock, to write a custom class that implements ITargetBlock<>. Is there a simpler approach?
EDIT 1: The discussion is of value for my current problem, but the answers so far got hyper focused on file system behavior. For the benefit of future searchers, the thrust of the question was how to build some kind of setup/teardown outside the ActionBlock delegate. This would apply to any kind of disposable that you would ordinarily wrap in a using-block.
EDIT 2: Per @Panagiotis Kanavos the executive summary of the solution is to set up the object before defining the block, then tear down the object in the block's Completion.ContinueWith.
Writing to a file one line at a time is expensive in itself even when you don't have to open the stream each time. Keeping a file stream open has other issues too, as file streams are always buffered, from the FileStream level all the way down to the file system driver, for performance reasons. You'd have to flush the stream periodically to ensure the data was written to disk.
To really improve performance you'd have to batch the records, eg with a BatchBlock. Once you do that, the cost of opening the stream becomes negligible.
The lines should be generated at the last possible moment too, to avoid generating temporary strings that will need to be garbage collected. At n*1M records, the memory and CPU overhead of those allocations and garbage collections would be severe.
Logging libraries batch log entries before writing to avoid this performance hit.
You can try something like this :
var batchBlock = new BatchBlock<Record>(1000);
var writerBlock = new ActionBlock<Record[]>(records => {
    //Create or open a file for appending
    using var writer = new StreamWriter(ThePath, true);
    foreach (var record in records)
    {
        writer.WriteLine("{0} = {1} :{2}", record.Prop1, record.Prop5, record.Prop2);
    }
});

batchBlock.LinkTo(writerBlock, options);
or, using asynchronous methods
var batchBlock = new BatchBlock<Record>(1000);
var writerBlock = new ActionBlock<Record[]>(async records => {
    //Create or open a file for appending
    await using var writer = new StreamWriter(ThePath, true);
    foreach (var record in records)
    {
        // WriteLineAsync has no format-string overload, so format first
        await writer.WriteLineAsync(string.Format("{0} = {1} :{2}", record.Prop1, record.Prop5, record.Prop2));
    }
});

batchBlock.LinkTo(writerBlock, options);
You can adjust the batch size and the StreamWriter's buffer size for optimum performance.
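For example (the 64 KB figure is just an illustrative starting point to tune, not a recommendation):

```csharp
// Sketch: the StreamWriter constructor accepts an explicit buffer size.
using var writer = new StreamWriter(ThePath, append: true,
    encoding: Encoding.UTF8, bufferSize: 64 * 1024);
```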
Creating an actual "Block" that writes to a stream
A custom block can be created using the technique shown in the Custom Dataflow block walkthrough - instead of creating an actual custom block, create something that returns whatever is needed for LinkTo to work, in this case an ITargetBlock<T>:
ITargetBlock<Record> FileExporter(string path)
{
    var writer = new StreamWriter(path, true);
    var block = new ActionBlock<Record>(async msg => {
        await writer.WriteLineAsync(string.Format("{0} = {1} :{2}", msg.Prop1, msg.Prop5, msg.Prop2));
    });
    //Close the stream when the block completes
    block.Completion.ContinueWith(_ => writer.Close());
    return block;
}
...
var exporter1 = FileExporter(path1);
previous.LinkTo(exporter1, options);
The "trick" here is that the stream is created outside the block and remains active until the block completes. It's not garbage-collected because it's used by other code. When the block completes, we need to explicitly close it, no matter what happened. block.Completion.ContinueWith(_ => writer.Close()); will close the stream whether the block completed gracefully or not.
This is the same code used in the Walkthrough, to close the output BufferBlock :
target.Completion.ContinueWith(delegate
{
    if (queue.Count > 0 && queue.Count < windowSize)
        source.Post(queue.ToArray());
    source.Complete();
});
Streams are buffered by default, so calling WriteLine doesn't mean the data will actually be written to disk. This means we don't know when the data will actually be written to the file. If the application crashes, some data may be lost.
Memory, IO and overheads
When working with 1M rows over a significant period of time, things add up. One could use eg File.AppendAllLinesAsync to write batches of lines at once, but that would result in the allocation of 1M temporary strings. At each iteration, the runtime would have to use at least as much RAM for those temporary strings as the batch itself. RAM usage would balloon to hundreds of MBs, then GBs, before the GC fired, freezing the threads.
With 1M rows and lots of data it's hard to debug and track data in the pipeline. If something goes wrong, things can crash very quickly. Imagine for example 1M messages stuck in one block because one message got blocked.
It's important (for sanity and performance reasons) to keep individual components in the pipeline as simple as possible.
Often when using TPL Dataflow, I will make custom classes so I can have private member variables and private methods used by the blocks in my pipeline. Instead of implementing ITargetBlock or ISourceBlock, I'll just have whatever blocks I need inside my custom class, and then expose an ITargetBlock and/or an ISourceBlock as public properties so that other classes can use the source and target blocks to link things together.
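A minimal sketch of that wrapper pattern (all names here are illustrative, not from the question's code):

```csharp
// Sketch: the class owns its stream and block privately and exposes
// only the ITargetBlock (and Completion) for linking.
public class RecordFileWriter
{
    private readonly StreamWriter _writer;
    private readonly ActionBlock<Record> _block;

    public RecordFileWriter(string path)
    {
        _writer = new StreamWriter(path, append: true);
        _block = new ActionBlock<Record>(r => _writer.WriteLine(r));
        // dispose the stream when the block finishes, success or not
        _block.Completion.ContinueWith(_ => _writer.Dispose());
    }

    public ITargetBlock<Record> Target => _block;
    public Task Completion => _block.Completion;
}
```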
I'm not completely new to C#, but I'm not familiar enough with the language to know how to do what I need to do.
I have a file, call it File1.txt. File1.txt has 100,000 lines or so.
I will duplicate File1.txt and call it File1_untested.txt.
I will also create an empty file "Successes.txt"
For each line in the file:
Remove this line from File1_untested.txt
If this line passes the test, write it to Successes.txt
So, my question is, how can I multithread this?
My approach so far has been to create an object (LineChecker), give the object its line to check, and pass the object into a ThreadPool. I understand how to use ThreadPools for a few tasks with a CountdownEvent. However, it seems unreasonable to queue up 100,000 tasks all at once. How can I gradually feed the pool? Maybe 1000 lines at a time or something like that.
Also, I need to ensure that no two threads are adding to Successes.txt or removing from File1_untested.txt at the same time. I can handle this with lock(), right? What should I be passing into lock()? Can I use a static member of LineChecker?
I'm just trying to get a broad understanding of how something like this can be designed.
Since the test takes a relatively significant amount of time, it makes sense to utilize multiple CPU cores. However, such utilization should be done only for the relatively expensive test, not for reading/updating the file, because reading/updating the file is relatively cheap.
Here is some example code that you can use:
Assuming you have a relatively expensive Test method:
private bool Test(string line)
{
    //This test is expensive
}
Here is a code sample that can utilize multiple CPUs for testing.
We limit the number of items in the collection to 10, so that the thread that is reading from the file will wait for the testing threads to catch up before reading more lines from the file.
The input thread will read much faster than the other threads can test, so in the worst case we will have read only 10 more lines than the testing threads have finished testing. This keeps memory consumption bounded.
CancellationTokenSource cancellation_token_source = new CancellationTokenSource();
CancellationToken cancellation_token = cancellation_token_source.Token;
BlockingCollection<string> blocking_collection = new BlockingCollection<string>(10);

using (StreamReader reader = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read)))
using (StreamWriter writer = new StreamWriter(new FileStream(success_filename, FileMode.OpenOrCreate, FileAccess.Write)))
{
    var input_task = Task.Factory.StartNew(() =>
    {
        try
        {
            while (!reader.EndOfStream)
            {
                if (cancellation_token.IsCancellationRequested)
                    return;
                blocking_collection.Add(reader.ReadLine());
            }
        }
        finally //In all cases, even in the case of an exception, we need to mark that we are done adding to the collection so that the Parallel.ForEach loop will exit. Note that Parallel.ForEach will not exit until we call CompleteAdding
        {
            blocking_collection.CompleteAdding();
        }
    });

    try
    {
        Parallel.ForEach(blocking_collection.GetConsumingEnumerable(), (line) =>
        {
            bool test_result = Test(line);
            if (test_result)
            {
                lock (writer)
                {
                    writer.WriteLine(line);
                }
            }
        });
    }
    catch
    {
        cancellation_token_source.Cancel(); //If Parallel.ForEach throws an exception, we inform the input thread to stop
        throw;
    }

    input_task.Wait(); //This will make sure that exceptions thrown in the input thread are propagated here
}
If your "test" was fast, then multithreading would not have given you any advantage whatsoever, because your code would be 100% disk-bound, and presumably you have all of your files on the same disk: you cannot improve the throughput of a single disk with multithreading.
But since your "test" will be waiting for a response from a webserver, this means that the test is going to be slow, so there is plenty of room for improvement by multithreading. Basically, the number of threads you need depends on how many requests the webserver can be servicing simultaneously without degrading the performance of the webserver. This number might still be low, so you might end up not gaining anything, but at least you can try.
If your file is not really huge, then you can read it all at once, and write it all at once. If each line is only 80 characters long, then this means that your file is only 8 megabytes, which is peanuts, so you can read all the lines into a list, work on the list, produce another list, and in the end write out the entire list.
This will allow you to create a structure, say, MyLine which contains the index of each line and the text of each line, so that you can sort all lines before writing them, so that you do not have to worry about out-of-order responses from the server.
Then, what you need to do is use a bounding blocking queue like BlockingCollection as @Paul suggested.
BlockingCollection accepts as a constructor parameter its maximum capacity. This means that once its maximum capacity has been reached, any further attempts to add to it are blocked (the caller sits there waiting) until some items are removed. So, if you want to have up to 10 simultaneously pending requests, you would construct it as follows:
var sourceCollection = new BlockingCollection<MyLine>(10);
Your main thread will be stuffing sourceCollection with MyLine objects, and you will have 10 threads which block waiting to read MyLines from the collection. Each thread sends a request to the server, waits for a response, saves the result into a thread-safe resultCollection, and attempts to fetch the next item from sourceCollection.
Instead of using multiple threads you could instead use the async features of C#, but I am not terribly familiar with them, so I cannot advise you on precisely how you would do that.
In the end, copy the contents of resultCollection into a List, sort the list, and write it to the output file. (The copy into a separate List is probably a good idea because sorting the thread-safe resultCollection will probably be much slower than sorting a non-thread-safe List. I said probably.)
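A sketch of that MyLine idea (the names are illustrative): each line carries its original index so results can be sorted back into input order before writing.

```csharp
// Sketch: pair every line with its original index.
struct MyLine
{
    public int Index;
    public string Text;
}

// after all worker threads have finished:
var ordered = resultCollection.ToList(); // copy out of the thread-safe collection
ordered.Sort((a, b) => a.Index.CompareTo(b.Index));
File.WriteAllLines(outputFile, ordered.Select(l => l.Text));
```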
I am learning to use Rx and tried this sample, but could not fix the exception that happens in the highlighted while statement - while (!f.EndOfStream).
I want to read a huge file - line by line - and for every line of data I want to do some processing on a different thread (so I used ObserveOn).
I want the whole thing async. I want to use ReadLineAsync, since it returns a Task, so I can convert that to an observable and subscribe to it.
I guess the task thread which I create first gets in between the Rx threads. But even if I observe and subscribe using the current thread, I still cannot stop the exception. I wonder how to accomplish this neatly and asynchronously with Rx.
Wondering if the whole thing could be done even simpler ?
static void Main(string[] args)
{
    RxWrapper.ReadFileWithRxAsync();
    Console.WriteLine("this should be called even before the file read begins");
    Console.ReadLine();
}

public static async Task ReadFileWithRxAsync()
{
    Task t = Task.Run(() => ReadFileWithRx());
    await t;
}

public static void ReadFileWithRx()
{
    string file = @"C:\FileWithLongListOfNames.txt";
    using (StreamReader f = File.OpenText(file))
    {
        string line = string.Empty;
        bool continueRead = true;
        ***while (!f.EndOfStream)***
        {
            f.ReadLineAsync()
                .ToObservable()
                .ObserveOn(Scheduler.Default)
                .Subscribe(t =>
                {
                    Console.WriteLine("custom code to manipulate every line data");
                });
        }
    }
}
The exception is an InvalidOperationException - I'm not intimately familiar with the internals of FileStream, but according to the exception message this is being thrown because there is an in-flight asynchronous operation on the stream. The implication is that you must wait for any ReadLineAsync() calls to finish before checking EndOfStream.
Matthew Finlay has provided a neat re-working of your code to solve this immediate problem. However, I think it has problems of its own - and that there is a bigger issue that needs to be examined. Let's look at the fundamental elements of the problem:
You have a very large file.
You want to process it asynchronously.
This suggests that you don't want the whole file in memory, you want to be informed when the processing is done, and presumably you want to process the file as fast as possible.
Both solutions are using a thread to process each line (the ObserveOn is passing each line to a thread from the thread pool). This is actually not an efficient approach.
Looking at both solutions, there are two possibilities:
A. It takes more time on average to read a file line than it does to process it.
B. It takes less time on average to read a file line than it does to process it.
A. File read of a line slower than processing a line
In the case of A, the system will basically spend most of its time idle while it waits for file IO to complete. In this scenario, Matthew's solution won't result in memory filling up - but it's worth seeing if using ReadLines directly in a tight loop produces better results due to less thread contention. (ObserveOn pushing the line to another thread will only buy you something if ReadLines isn't getting lines in advance of calling MoveNext - which I suspect it does - but test and see!)
B. File read of a line faster than processing a line
In the case of B (which I assume is more likely given what you have tried), all those lines will start to queue up in memory and, for a big enough file, you will end up with most of it in memory.
You should note that unless your handler is firing off asynchronous code to process a line, then all lines will be processed serially because Rx guarantees OnNext() handler invocations won't overlap.
The ReadLines() method is great because it returns an IEnumerable<string> and it's your enumeration of this that drives reading the file. However, when you call ToObservable() on this, it will enumerate as fast as possible to generate the observable events - there is no feedback (known as "backpressure" in reactive programs) in Rx to slow down this process.
The problem is not the ToObservable itself - it's the ObserveOn. ObserveOn doesn't block the OnNext() handler it is invoked from while waiting for its subscribers to finish with the event - it queues up events against the target scheduler as fast as possible.
If you remove the ObserveOn, then - as long as your OnNext handler is synchronous - you'll see each line is read and processed one at a time because the ToObservable() is processing the enumeration on the same thread as the handler.
If this isn't what you want, and you attempt to mitigate it in pursuit of parallel processing by firing an async job in the subscriber - e.g. Task.Run(() => /* process line */) or similar - then things won't go as well as you hope.
Because it takes longer to process a line than read a line, you will create more and more tasks that aren't keeping pace with the incoming lines. The thread count will gradually increase and you will be starving the thread pool.
In this case, Rx isn't a great fit really.
What you probably want is a small number of worker threads (probably 1 per processor core) that fetch a line of code at a time to work on, and limit the number of lines of the file in memory.
A simple approach could be this, which limits the number of lines in memory to a fixed number of workers. It's a pull-based solution, which is a much better design in this scenario:
private Task ProcessFile(string filePath, int numberOfWorkers)
{
    var lines = File.ReadLines(filePath);
    var parallelOptions = new ParallelOptions {
        MaxDegreeOfParallelism = numberOfWorkers
    };

    return Task.Run(() =>
        Parallel.ForEach(lines, parallelOptions, ProcessFileLine));
}

private void ProcessFileLine(string line)
{
    /* Your processing logic here */
    Console.WriteLine(line);
}
And use it like this:
static void Main()
{
    var processFile = ProcessFile(
        @"C:\Users\james.world\Downloads\example.txt", 8);
    Console.WriteLine("Processing file...");
    processFile.Wait();
    Console.WriteLine("Done");
}
Final Notes
There are ways of dealing with back pressure in Rx (search around SO for some discussions) - but it's not something that Rx handles well, and I think the resulting solutions are less readable than the alternative above. There are also many other approaches that you can look at (actor based approaches such as TPL Dataflows, or LMAX Disruptor style ring-buffers for high-performance lock free approaches) but the core idea of pulling work from queues will be prevalent.
Even in this analysis, I am conveniently glossing over what you are doing to process the file, and tacitly assuming that the processing of each line is compute bound and truly independent. If there is work to merge the results and/or IO activity to store the output then all bets are off - you will need to examine the efficiency of this side of things carefully too.
In most cases where performing work in parallel as an optimization is under consideration, there are usually so many variables in play that it is best to measure the results of each approach to determine what is best. And measuring is a fine art - be sure to measure realistic scenarios, take averages of many runs of each test and properly reset the environment between runs (e.g. to eliminate caching effects) in order to reduce measurement error.
I haven't looked into what is causing your exception, but I think the neatest way to write this is:
File.ReadLines(file)
    .ToObservable()
    .ObserveOn(Scheduler.Default)
    .Subscribe(Console.WriteLine);
Note: ReadLines differs from ReadAllLines in that it will start yielding without having read the entire file, which is the behavior that you want.
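A quick illustration of the difference:

```csharp
// ReadLines is lazy: lines are produced as you enumerate them.
IEnumerable<string> lazy = File.ReadLines(file);

// ReadAllLines is eager: the whole file is loaded into memory up front.
string[] eager = File.ReadAllLines(file);
```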
I have a server which handles multiple incoming socket connections and creates 2 different threads which store the data in XML format.
I was using the lock statement for thread safety in almost every event handler called asynchronously, and in the 2 threads, in different parts of the code. Sadly, using this approach my application slows down significantly.
I tried not using lock at all, and the server is very fast in execution - even the file storage seems faster; but the program crashes, for reasons I don't understand, after 30 sec - 1 min of work.
So I thought that the best way is to use fewer locks, or to use them only where strictly necessary. As such, I have 2 questions:
Is the lock needed when I write to the publicly accessed variables (C# lists) only, or even when I read from them?
Is the lock needed only in the asynchronous threads created by the socket handler, or in other places too?
Could someone give me some practical guidelines about how to proceed? I'll not post the whole code this time; it makes no sense to post about 2500 lines of code.
You ever sit in your car or on the bus at a red light when there's no cross traffic? Big waste of time, right? A lock is like a perfect traffic light. It is always green except when there is traffic in the intersection.
Your question is "I spend too much time in traffic waiting at red lights. Should I just run the red light? Or even better, should I remove the lights entirely and just let everyone drive through the intersection at highway speeds without any intersection controls?"
If you're having a performance problem with locks then removing locks is the last thing you should do. You are waiting at that red light precisely because there is cross traffic in the intersection. Locks are extraordinarily fast if they are not contended.
You can't eliminate the light without eliminating the cross traffic first. The best solution is therefore to eliminate the cross traffic. If the lock is never contended then you'll never wait at it. Figure out why the cross traffic is spending so much time in the intersection; don't remove the light and hope there are no collisions. There will be.
If you can't do that, then adding more finely-grained locks sometimes helps. That is, maybe you have every road in town converging on the same intersection. Maybe you can split that up into two intersections, so that code can be moving through two different intersections at the same time.
Note that making the cars faster (getting a faster processor) or making the roads shorter (eliminating code path length) often makes the problem worse in multithreaded scenarios. Just as it does in real life; if the problem is gridlock then buying faster cars and driving them on shorter roads gets them to the traffic jam faster, but not out of it faster.
Is the lock needed when I write to the publicly accessed variables (C# lists) only, or even when I read from them?
Yes (even when you read).
Is the lock needed only in the asynchronous threads created by the socket handler, or in other places too?
Yes. Wherever code accesses data that is shared between threads, always lock.
This sounds like you may not be locking individual objects, but using one lock for every lock situation.
If so, put in smart, discrete locks by creating individual unique lock objects, each guarding only a certain section at a time, so that threads working in different sections don't interfere with each other.
Here is an example:
// This class simulates the use of two different thread safe resources and how to lock them
// for thread safety but not block other threads getting different resources.
public class SmartLocking
{
    private string _Resource1 { get; set; }
    private string _Resource2 { get; set; }

    private object _Lock1 = new object();
    private object _Lock2 = new object();

    public void DoWorkOn1(string change)
    {
        lock (_Lock1)
        {
            _Resource1 = change;
        }
    }

    public void DoWorkOn2(string change2)
    {
        lock (_Lock2)
        {
            _Resource2 = change2;
        }
    }
}
Always use a lock when you access members (either read or write). If you are iterating over a collection while another thread is removing items, things can go wrong quickly.
A suggestion is when you want to iterate a collection, copy all the items to a new collection and then iterate the copy. I.e.
// Use the element type of mycollection here
var newcollection = new List<string>();

lock (mycollection)
{
    // Copy from mycollection to newcollection
    newcollection.AddRange(mycollection);
}

foreach (var item in newcollection)
{
    // Do stuff
}
Likewise, only hold the lock for the moment you are actually writing to the list.
The reason that you need to lock while reading is:
let's say you are making a change to one property, and it is read twice while the writing thread is in between locks - once right before the change and once after - then we will get inconsistent results.
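A small sketch of that hazard (names are illustrative): if the reader doesn't take the same lock as the writer, it can observe one field already updated and the other not yet.

```csharp
// Sketch: reader and writer must share one lock, or the reader can
// see a half-updated pair (new first name, old last name).
private readonly object sync = new object();
private string first = "Ada", last = "Byron";

public void Rename() // writer updates both fields atomically
{
    lock (sync) { first = "Ada"; last = "Lovelace"; }
}

public string FullName() // reader takes the same lock
{
    lock (sync) { return first + " " + last; }
}
```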
I hope that helps,
Basically this can be answered pretty simply:
You need to lock all the things that are accessed by different threads. It doesn't really matter whether it's reading or writing: if you are reading while another thread is overwriting the data at the same time, the data you read may be invalid, and you may then perform invalid operations on it.