Which way is better for a frequent-write case with threads? - C#

We need to write some data to a file at roughly 100ms intervals for about 30 minutes, then wait for a while, and repeat. Our application is a C# .NET 3.5 application. The data for each write is small, less than 1MB. Right now we get a thread from the ThreadPool and let that thread write to the file each time new data is received (at about 100ms intervals).
There is another way to do this, I think. We could get a thread from the ThreadPool at the beginning and keep it running for the entire writing session. When that thread finishes a write, it waits for the next signal, picks up the updated data from a shared place, and writes again. The downside of this approach is that we need to synchronize the shared data object to make sure it is not overwritten by new data if the writing is slower, and that may slow down the communication by which the data is transferred from the other system.
I don't have time to write code and test them yet. Do you think it is worth testing both, or is one way obviously better than the other?

You can use a producer/consumer pattern, so a thread can have the file locked all the time and you don't need to open and close the FileStream.
Here an example: http://www.yoda.arachsys.com/csharp/threads/deadlocks.shtml
(in the "More Monitor methods" section)

If you think that you might have contention writing to your file, then a single writer thread is a good idea. Otherwise, there is no real problem dispatching a single operation to a miscellaneous thread, imo.

Related

How do I ensure that none of the threads are waiting for something indefinitely?

I'm in the process of writing a multi-threaded application. Here's my case.
I grab a thousand records from the database, divide them into 5 chunks of list objects, and create 5 threads to process them. I do this same thing every minute for as long as there are records remaining in the database.
Task.Factory.StartNew(() => ProcessRecords(listRecords))
Inside the ProcessRecords method there is a small database update and some mail sending. (I'm using System.Net.Mail for email and don't use any ORM for the db operations.)
Now I am worried that a thread might not complete because of some unknown issue. What will happen in that situation? Let's say one process (or even more) keeps waiting on a database deadlock or something; what will happen to my application? It will keep adding new threads with new sets of records while some threads never finish. How can I implement something like a timeout in this situation?
I want to run this process and terminate it after 5 minutes if it is not able to complete.
Check out the CancellationToken (handed out by a CancellationTokenSource). You can use that to kill the task if you decide (by whatever means you prefer) that it's been running too long.
Alternatively, you could build that into the ProcessRecords() method itself: just have it commit seppuku if it runs too long by having it track its own start time and check the elapsed time now and then; that could be simpler.
That said, if you haven't already given it a shot, you might check to see whether .AsParallel() will save you some headaches here. There are a lot of cases where you can leave your parallelization woes to the framework entirely.
Parallel.ForEach(db.Records, r => ProcessRecord(r));
Edit:
Parallel.ForEach(db.Records, ProcessRecord);
Yes. :)
Further edit:
For the OP, no, the TaskFactory doesn't offer anything like that out of the box. If you want to terminate the process from outside the process, you'll need to roll your own mechanism using some kind of a watcher thread to keep track of which tasks you have running, how long they've been running, and their respective cancellation tokens (or maybe just a bool you have at the top of a while loop...).
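As a hedged sketch of the cancellation-plus-timeout idea (assuming .NET 4.5 or later for the CancellationTokenSource timeout constructor; ProcessRecords and listRecords are the names from the question, while the token parameter, the Record stand-in type, and the per-record loop are assumptions):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Record { }   // stand-in for the real record type

static class RecordJob
{
    // Start one chunk and cancel it if it runs longer than 5 minutes.
    public static Task StartChunk(List<Record> listRecords)
    {
        var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
        return Task.Factory.StartNew(
            () => ProcessRecords(listRecords, cts.Token),
            cts.Token);
    }

    static void ProcessRecords(List<Record> records, CancellationToken token)
    {
        foreach (Record record in records)
        {
            // Cancellation is cooperative: check the token between records.
            token.ThrowIfCancellationRequested();
            // ... db update and mail sending for this record ...
        }
    }
}

Note that cancellation only takes effect at the token checks; a thread stuck inside a blocking database call or SMTP send will not be interrupted mid-call, so command timeouts on those operations are still worth setting.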

Will using multiple threads speed up my HTML file processing application?

I just finished up my most complex and feature-laden WinForms application to date. It loads a list of any number of HTML files, then loads the content of one at a time, uses some RegEx to match some tags and remove or replace them (yes, yes, I've seen this. It works just fine, thanks Cthulhu), then writes it to disk.
However, I noticed that ~200 files takes roughly 30 seconds to process, and after the first 5-10 seconds the program is reported as "Not Responding". I'm assuming it's not wise to do something like this guy did, as the hard drive is a bottleneck.
Perhaps it'd be possible to load as many as possible into memory, then process each one with a thread, write those, then load some more into memory?
At the very least, would creating a worker thread separate from the UI thread prevent the "Not Responding" issue? (This MSDN article covers what I was considering.)
I guess I'm asking if multithreading will offer any sort of speed improvement, and if so, what would be the best way of going about it?
Any help or advice is much appreciated!
Yes, you should start by using a BackgroundWorker to decouple your work from the GUI. Handling a GUI event should never take too much time. Aim for 20ms, not 20s.
Then as a bonus you could see if the processing (CPU intensive part) can be split into independent jobs and execute them as TPL Tasks.
There is insufficient information to say if or how you should do that.
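A minimal BackgroundWorker sketch of that first step, assuming a WinForms form; CleanHtml stands in for the asker's existing RegEx clean-up and is not from the original post:

using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Windows.Forms;

public partial class MainForm : Form
{
    private void ProcessFilesInBackground(List<string> htmlFiles)
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };

        worker.DoWork += (s, e) =>
        {
            var files = (List<string>)e.Argument;
            for (int i = 0; i < files.Count; i++)
            {
                string cleaned = CleanHtml(File.ReadAllText(files[i]));   // the RegEx work
                File.WriteAllText(files[i], cleaned);
                worker.ReportProgress((i + 1) * 100 / files.Count);
            }
        };

        // These two handlers are raised back on the UI thread.
        worker.ProgressChanged += (s, e) => Text = e.ProgressPercentage + "%";
        worker.RunWorkerCompleted += (s, e) => Text = "Done";

        worker.RunWorkerAsync(htmlFiles);
    }

    // Placeholder for the existing tag removal/replacement logic.
    private string CleanHtml(string html) { return html; }
}

The UI thread only touches the progress and completion handlers, so the form never shows "Not Responding" while the files are being processed.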
Threading jobs, tasks, etc. will, in most cases, keep the primary (main) thread from becoming unresponsive. Do not create multiple threads for disk IO (obviously). I would dedicate a single worker thread to taking your files off a queue and processing the disk IO. Otherwise, one or two worker threads for the in-memory processing should be sufficient while your main thread remains responsive.
First of all, if you want the program to remain responsive move the calculations to a separate thread (remove it from the UI thread).
The actual performance improvement depends on the number of processors you have, not the number of threads.
So if you have P processors, you can divide the work into P work items and get some improvement (see Amdahl's Law).
You can use a BackgroundWorker to divide the work properly: C# BackgroundWorker Tutorial
Why not use File.ReadAllLines() to read each file into an array, and then process each element of the array?
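For example, a sketch of that per-file approach (ProcessLine is a placeholder for whatever per-line RegEx replacement applies; tags that span multiple lines would still need whole-file handling):

using System.IO;

static class HtmlCleaner
{
    // Read the whole file into memory, rewrite each line, and save it back.
    public static void CleanFile(string path)
    {
        string[] lines = File.ReadAllLines(path);
        for (int i = 0; i < lines.Length; i++)
            lines[i] = ProcessLine(lines[i]);
        File.WriteAllLines(path, lines);
    }

    static string ProcessLine(string line) { return line; }   // placeholder
}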
If you do all your processing on the GUI thread, your application will show as 'Not Responding' if it takes very long. In my opinion, you should never do (extensive) processing in the same thread as your GUI.
In addition, you could even just create a thread for each file to be processed. This will most likely speed things up, as long as the separate threads do not need any data from each other.

Creating an image from webcam every x milliseconds

I am using C# to integrate with a webcam. I need to generate a snapshot image every x milliseconds and save it to a file.
I already have the code up and running to save to a file on a button click event; however, I wonder what I am supposed to do when taking snapshots in the background. Should this be multi-threaded? I'm honestly not sure.
I could just block the UI thread with a Thread.Sleep and then take the snapshot, but I don't know if this is right.
I thought of using a background worker, but I am now experiencing cross-thread difficulties with SendMessage... So I wonder if I should even bother to multi-thread or just block the UI.
There will be a physical hardware limit to how fast the camera can update its pixel buffer. Webcams don't go far above 30fps. Getting the actual image should be more or less instantaneous (unless at very high resolution), so you would not require threading to start off with. When I did it a while ago I used the approach given at
http://weblogs.asp.net/nleghari/pages/webcam.aspx
I think you should put this task on a separate thread. The process of creating and saving the image may take more time in some situations, and during that time your HMI may freeze. To avoid this, put the task on a separate thread.
You could create a timer to kick a delegate every n milliseconds and that delegate could queue a worker thread to do what your OnClick() handler does already.
I would NOT write this as a single-threaded app because, depending on the performance of the user's webcam, you could easily end up in an eternal loop handling timer events, causing your main UI thread to be permanently blocked.
ThreadPool.QueueUserWorkItem(state =>
{
    // take and save the snapshot, same as the OnClick() handler does
});
should not require much effort to get working correctly.
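A rough sketch of that timer-plus-worker approach, assuming a WinForms form; TakeSnapshot is a placeholder for whatever the existing OnClick() handler already does:

using System.Threading;
using System.Windows.Forms;

public partial class CameraForm : Form
{
    private System.Windows.Forms.Timer snapshotTimer;   // Tick fires on the UI thread

    private void StartCapturing(int intervalMs)
    {
        snapshotTimer = new System.Windows.Forms.Timer { Interval = intervalMs };
        snapshotTimer.Tick += (s, e) =>
        {
            // Keep the UI thread free: hand the grab/save off to the pool.
            ThreadPool.QueueUserWorkItem(state => TakeSnapshot());
        };
        snapshotTimer.Start();
    }

    // Placeholder for the code already wired to the button click.
    private void TakeSnapshot() { /* grab frame, save to file */ }
}

If the capture API (for example a SendMessage-based approach) insists on being called from the thread that owns the window, do the grab inside the Tick handler instead and only push the file save onto the worker thread.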

After FileSystemWatcher fires - Thread Pool or Dedicated thread?

I am about to implement the archetypal FileSystemWatcher solution. I have a directory to monitor for file creations, and the task of sucking up created files and inserting them into a DB. Roughly, this will involve reading and processing 6 or 7 80-character text files that appear about every 150ms, in bursts that occur every couple of seconds; rarely, a 2MB binary file will also have to be processed. This will most likely be a 24/7 process.
From what I have read about the FileSystemWatcher object it is better to enqueue its events in one thread and then dequeue/process them in another thread. The quandary I have right now is what would be the better creation mechanism of the thread that does the processing. The choices I can see are:
Each time I get a FSW event I manually create a new thread (yeah I know .. stupid architecture, but I had to say it).
Throw the processing at the CLR thread pool whenever I get an FSW event
On start up, create a dedicated second thread for the processing and use a producer/consumer model to handle the work. The main thread enqueues the request and the second thread dequeues it and performs the work.
I am tending towards the third method as the preferred one, as I know the worker thread will always be required, and probably also because I have no feel for the thread pool.
If you know that the second thread will always be required, and you also know that you'll never need more than one worker thread, then option three is good enough.
The third option is the most logical.
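A minimal sketch of option three, with assumed names throughout: the FileSystemWatcher handler only enqueues the path, and one dedicated consumer thread does the reading and DB work.

using System.Collections.Generic;
using System.IO;
using System.Threading;

class FileImportService
{
    private readonly Queue<string> _pending = new Queue<string>();
    private readonly object _sync = new object();
    private FileSystemWatcher _watcher;

    public void Start(string directory)
    {
        _watcher = new FileSystemWatcher(directory);
        _watcher.Created += (s, e) =>
        {
            // Keep the event handler tiny: just hand the path to the consumer.
            lock (_sync)
            {
                _pending.Enqueue(e.FullPath);
                Monitor.Pulse(_sync);
            }
        };
        _watcher.EnableRaisingEvents = true;

        new Thread(ConsumeLoop) { IsBackground = true }.Start();
    }

    private void ConsumeLoop()
    {
        while (true)
        {
            string path;
            lock (_sync)
            {
                while (_pending.Count == 0)
                    Monitor.Wait(_sync);
                path = _pending.Dequeue();
            }
            // Read the file and insert it into the DB here (placeholder).
        }
    }
}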
In regards to FSW missing some file events, I implemented this:
1) An FSW object which fires on FileCreate
2) tmrFileCheck, Interval = 5000 (5 seconds), which calls tmrFileCheck_Tick
When the FileCreate event occurs, if (tmrFileCheck.Enabled == false) then tmrFileCheck.Start()
This way, once the interval elapses, tmrFileCheck_Tick fires, which
a) calls tmrFileCheck.Stop()
b) calls CheckForStragglerFiles()
In the tests I've run, this works effectively where fewer than 100 files are created per minute.
A variant is to merely have a timer tick every NN seconds and sweep the directory(ies) for straggler files.
Another variant is to hire me to press F5 to refresh the window and call you when there are straggler files; just a suggestion. :-P
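Going back to the timer mechanism above, a rough sketch (a WinForms Timer is assumed; CheckForStragglerFiles is the poster's own method name, and its body here is just a placeholder):

using System;
using System.IO;
using System.Windows.Forms;

public partial class WatcherForm : Form
{
    private System.Windows.Forms.Timer tmrFileCheck =
        new System.Windows.Forms.Timer { Interval = 5000 };

    // Wire-up (e.g. in the constructor):
    //   fsw.Created += fsw_Created;  tmrFileCheck.Tick += tmrFileCheck_Tick;

    private void fsw_Created(object sender, FileSystemEventArgs e)
    {
        // normal processing/enqueueing of e.FullPath goes here
        if (!tmrFileCheck.Enabled)
            tmrFileCheck.Start();
    }

    private void tmrFileCheck_Tick(object sender, EventArgs e)
    {
        tmrFileCheck.Stop();
        CheckForStragglerFiles();   // sweep the directory for anything FSW missed
    }

    private void CheckForStragglerFiles() { /* compare directory contents with what was processed */ }
}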
Just be aware that FileSystemWatcher may miss events, there's no guarantee it will deliver all specific events that have transpired. Your design of keeping the work done by the thread receiving events to a minimum, should reduce the chances of that happening, but it is still a possibility, given the finite event buffer size (tops out at 64KB).
I would highly recommend developing a battery of torture tests if you decide to use FileSystemWatcher.
In our testing, we encountered issues with network locations that changing the InternalBufferSize did not fix, and in those scenarios we did not receive Error event notifications either.
Thus, we developed our own polling mechanism: call Directory.GetFiles, then compare the state of the returned files with the previously polled state, ensuring we always have an accurate delta.
Of course, this comes at a substantial cost in performance, which may not be good enough for you.
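A hedged sketch of such a polling fallback (names are illustrative): keep the last directory snapshot and diff it against the current one on every poll. The real mechanism described above also compared file state, not just names, so timestamps and sizes would be added to the comparison.

using System.Collections.Generic;
using System.IO;
using System.Linq;

class DirectoryPoller
{
    private HashSet<string> _known = new HashSet<string>();

    // Call this on a timer; returns the files that appeared since the last poll.
    public List<string> Poll(string directory)
    {
        var current = new HashSet<string>(Directory.GetFiles(directory));
        List<string> added = current.Except(_known).ToList();
        _known = current;
        return added;
    }
}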

Create new threads or get more work for threads

I've got a program I'm creating (in C#) and I see two approaches:
1) A job manager that waits for any number of the X threads to finish; when they finish, it gets the next chunk of work, creates a new thread, and gives it that chunk
or
2) We create X threads to start and give them each a chunk of work; when a thread finishes a chunk, it asks the job manager for more work. If there isn't any more work, it sleeps and then asks again, with the sleep becoming progressively longer.
This program will be run-and-done, though I could see it turning into a service that continually looks for more jobs.
Each chunk will consist of a number of data ids; for each id there is a call to the database to get some info or perform an operation, and then a write back to the database with info about that id.
Assuming you are aware of the additional precautions that need to be taken when dealing with multithreaded database operations, it sounds like you're describing two different scenarios. In the first, you have several threads running, and once ALL of them finish it will look for new work. In the second, you have several threads running and their operations are completely parallel. Your environment is going to be what determines the proper approach to take; if there is something tying all of the work together such that additional work cannot continue until all of the threads are finished, then go with the former. If they don't have much effect on each other, go with the latter.
The second option isn't really right, as making the sleep time progressively longer means that you will unnecessarily keep those threads blocked.
Rather, you should have a pooled set of threads like the second option, but they use WaitHandles to wait for work and use a producer/consumer pattern. Basically, when the producer indicates that there is work, it sends a signal to a consumer (there will be a manager which will determine which thread will get the work, and then signal that thread) which will wake up and start working.
You might want to look into the Task Parallel Library. It's in beta now, but if you can use it and are comfortable with it, I would recommend it, as it will manage a great deal of this for you (and much better, taking into account the number of cores on a machine, the optimal number of threads, etc., etc.).
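A rough sketch of that WaitHandle-based producer/consumer (assumed names; an AutoResetEvent plays the role of the signal, and the data ids are modeled as ints):

using System.Collections.Generic;
using System.Threading;

class WorkQueue
{
    private readonly Queue<int> _workItems = new Queue<int>();      // data ids
    private readonly object _sync = new object();
    private readonly AutoResetEvent _workAvailable = new AutoResetEvent(false);

    // Producer side: the job manager adds work and signals a waiting consumer.
    public void Add(int dataId)
    {
        lock (_sync) _workItems.Enqueue(dataId);
        _workAvailable.Set();
    }

    // Consumer side: each worker thread runs this loop; no sleeping or polling.
    public void ConsumerLoop()
    {
        while (true)
        {
            _workAvailable.WaitOne();       // block until signalled
            while (true)                    // drain whatever is queued
            {
                int dataId;
                lock (_sync)
                {
                    if (_workItems.Count == 0) break;
                    dataId = _workItems.Dequeue();
                }
                Process(dataId);            // db read + db write for this id
            }
        }
    }

    private void Process(int dataId) { /* placeholder */ }
}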
The former solution (spawn a thread for each new piece of work) is easier to code, and not too bad if the units of work are large enough.
The second solution (a thread pool with a queue of work) is more complicated to code, but supports smaller units of work.
Instead of rolling your own solution, you should look at the ThreadPool class in the .NET framework. You could use the QueueUserWorkItem method. It should do exactly what you want to accomplish.
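For instance, a sketch of that (ProcessChunk and the chunk list are assumed names, not part of the original question):

using System.Collections.Generic;
using System.Threading;

static class ChunkDispatcher
{
    // Hand each chunk of data ids to the CLR thread pool.
    public static void Dispatch(IEnumerable<List<int>> chunks)
    {
        foreach (List<int> chunk in chunks)
        {
            List<int> captured = chunk;     // avoid the foreach capture pitfall on older compilers
            ThreadPool.QueueUserWorkItem(state => ProcessChunk(captured));
        }
    }

    static void ProcessChunk(List<int> dataIds) { /* db read + db write per id (placeholder) */ }
}

The pool decides how many of those run at once, which is essentially option two without the hand-rolled sleeping.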
