I am about to implement the archetypal FileSystemWatcher solution. I have a directory to monitor for file creations, and the task of sucking up created files and inserting them into a DB. Roughly, this will involve reading and processing 6 or 7 80-character text files that arrive about 150 ms apart, in bursts that occur every couple of seconds; rarely, a 2 MB binary file will also have to be processed. This will most likely be a 24/7 process.
From what I have read about the FileSystemWatcher object it is better to enqueue its events in one thread and then dequeue/process them in another thread. The quandary I have right now is what would be the better creation mechanism of the thread that does the processing. The choices I can see are:
Each time I get a FSW event I manually create a new thread (yeah I know .. stupid architecture, but I had to say it).
Throw the processing at the CLR thread pool whenever I get an FSW event
On start up, create a dedicated second thread for the processing and use a producer/consumer model to handle the work. The main thread enqueues the request and the second thread dequeues it and performs the work.
I am tending towards the third method as the preferred one as I know the work thread will always be required - and also probably more so because I have no feel for the thread pool.
If you know that the second thread will always be required, and you also know that you'll never need more than one worker thread, then option three is good enough.
The third option is the most logical.
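For what it's worth, here is a minimal sketch of option three, assuming .NET 4 or later so that BlockingCollection<T> is available. The class name FileIntakeQueue, its members, and the process callback are hypothetical names made up for illustration; the actual DB insert would live in whatever you pass as process.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical helper: queues file paths reported by FileSystemWatcher and
// processes them one at a time on a single dedicated thread.
public sealed class FileIntakeQueue : IDisposable
{
    private readonly BlockingCollection<string> _paths = new BlockingCollection<string>();
    private readonly Action<string> _process;
    private readonly Thread _worker;

    public FileIntakeQueue(Action<string> process)
    {
        _process = process;
        _worker = new Thread(Consume) { IsBackground = true, Name = "file-intake" };
        _worker.Start();
    }

    // Producer side: cheap enough to call straight from the Created handler.
    public void Enqueue(string path)
    {
        _paths.Add(path);
    }

    // Wires up a watcher whose Created event does nothing but enqueue the path.
    // Dispose the returned watcher before disposing this queue.
    public FileSystemWatcher Watch(string directory)
    {
        var watcher = new FileSystemWatcher(directory);
        watcher.Created += (sender, e) => Enqueue(e.FullPath);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }

    private void Consume()
    {
        // Blocks while the queue is empty; the loop ends after CompleteAdding().
        foreach (string path in _paths.GetConsumingEnumerable())
        {
            try { _process(path); }
            catch (Exception ex) { Console.Error.WriteLine("Failed on {0}: {1}", path, ex.Message); }
        }
    }

    // Stops accepting work, drains what is already queued, then joins the worker.
    public void Dispose()
    {
        _paths.CompleteAdding();
        _worker.Join();
        _paths.Dispose();
    }
}
```

Because there is only one consumer, files are processed in the order their events arrived, and a slow DB insert only delays the queue rather than the thread that receives the watcher events.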
In regards to FSW missing some file events, I implemented this:
1) FSW Object which fires on FileCreate
2) tmrFileCheck, interval = 5000 ms (5 seconds)
- Calls tmrFileCheck_Tick
When the FileCreate event occurs, if (tmrFileCheck.Enabled == false) then tmrFileCheck.Start()
This way, once the interval elapses, tmrFileCheck_Tick fires, which calls:
a) tmrFileCheck.Stop()
b) CheckForStragglerFiles
In the tests I've run, this works effectively when fewer than 100 files are created per minute.
A variant is to merely have a timer tick every NN seconds and sweep the directory(ies) for straggler files.
Another variant is to hire me to press F5 to refresh the window and call you when there are straggler files; just a suggestion. :-P
Just be aware that FileSystemWatcher may miss events; there's no guarantee it will deliver every event that has occurred. Your design of keeping the work done by the thread receiving events to a minimum should reduce the chances of that happening, but it is still a possibility, given the finite event buffer size (InternalBufferSize tops out at 64 KB).
I would highly recommend developing a battery of torture tests if you decide to use FileSystemWatcher.
In our testing, we encountered issues with network locations that changing the InternalBufferSize did not fix, and when events were missed in this scenario we did not receive Error event notifications either.
Thus, we developed our own polling mechanism for doing so, using Directory.GetFiles, followed by comparing the state of the returned files with the previously polled state, ensuring we always had an accurate delta.
Of course, this comes at a substantial cost in performance, which may not be good enough for you.
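A rough sketch of that kind of polling, assuming that "state" means the last-write time of each file. DirectoryPoller is a hypothetical name, and a real implementation might also compare sizes or track deletions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical poller: remembers the previous snapshot of a directory and
// reports what is new or modified on each call to Poll().
public sealed class DirectoryPoller
{
    private readonly string _directory;
    private Dictionary<string, DateTime> _previous = new Dictionary<string, DateTime>();

    public DirectoryPoller(string directory)
    {
        _directory = directory;
    }

    // Returns the files that are new, or whose last-write time changed,
    // since the previous call.
    public List<string> Poll()
    {
        var current = new Dictionary<string, DateTime>();
        var changed = new List<string>();

        foreach (string path in Directory.GetFiles(_directory))
        {
            DateTime stamp = File.GetLastWriteTimeUtc(path);
            current[path] = stamp;

            DateTime old;
            if (!_previous.TryGetValue(path, out old) || old != stamp)
                changed.Add(path);
        }

        _previous = current; // this snapshot becomes the baseline for the next poll
        return changed;
    }
}
```

Calling Poll() from a timer gives an accurate delta regardless of what the watcher missed, at the cost of enumerating the whole directory every time.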
I wrote some code that mass-imports a high volume of users into AD. To avoid overloading the server, I put a Thread.Sleep() in the code, executed at every iteration.
Is this a good use of the method, or is there a better alternative (.NET 4.0 applies here)?
Does Thread.Sleep() even aid in performance? What is the cost and performance impact of sleeping a thread?
The Thread.Sleep() method just puts the thread into a paused state for the specified amount of time. There are three different ways to get the same Sleep() behaviour by calling the method on three different types, and they all have different features. Most importantly, if you call Sleep() on the main UI thread, it will stop processing messages during the pause and the GUI will look locked. Run the job that needs to sleep on a BackgroundWorker instead.
My opinion is to use the Thread.Sleep() method and just follow the advice above. In your specific case I guess you'll have no issues. If you put some effort into searching SO for this exact topic, I'm sure you'll find much better explanations than my summary.
If you have no way to receive feedback from the called service, as you would in a typical event-driven system (speaking in the abstract: a callback, or any other information that tells you how the service is affected by your calls), then Sleep may be the way to go.
I think that Thread.Sleep is one way to handle this; @cHao is correct that using a timer would allow you to do this in another fashion. Essentially, you're trying to cut down the number of commands sent to the AD server over a period of time.
In using timers, you're going to need to devise a way to detect trouble (that's more intuitive than a try/catch). For instance, if your server starts stalling and responding slower, you're going to continue stacking commands that the server can't handle (which may cascade in other errors).
When working with AD I've seen the Domain Controller freak out when too many commands come in (similar to a DoS attack) and bring the server to a crawl or crash it. I think by using the sleep method you're creating a manageable and measurable flow.
In this instance, using a thread with a low priority may slow it down, but not to any controllable level. The thread priority will only be a factor on the machine sending the commands, not to the server having to process them.
Hope this helps; cheers!
If what you want is to avoid overloading the server, you can just reduce the priority of the thread.
Thread.Sleep() does not consume any resources. However, the correct way to do this is to set the priority of the thread to a value below Normal: Thread.CurrentThread.Priority = ThreadPriority.Lowest, for example.
Thread.Sleep is not "evil, do not do it ever", but maybe (just maybe) the fact that you need to use it reflects a flaw in the solution design. This is not a rule at all, though.
Personally, I have never found a situation where I had to use Thread.Sleep.
Right now I'm working on an ASP.NET MVC application that uses a background thread to load a lot of data from database into a memory cache and after that write some data to the database.
The only thing I have done to prevent this thread from eating all of my web server's and database's processor time was to reduce the thread priority to Lowest. That thread takes about 35 minutes to complete all the operations, instead of 7 minutes with a Normal-priority thread. By the end of the process, the thread will have done about 230k selects against the database server, but this has not affected my database or web server performance in a way the user can perceive.
tip: remember to set the priority back to Normal if you are using a thread from ThreadPool.
Here you can read about Thread.Priority:
http://msdn.microsoft.com/en-us/library/system.threading.thread.priority.aspx
Here is a good article about why not to use Thread.Sleep in a production environment:
http://msmvps.com/blogs/peterritchie/archive/2007/04/26/thread-sleep-is-a-sign-of-a-poorly-designed-program.aspx
EDIT: As others have said here, merely reducing your thread priority may not prevent the thread from sending a large number of commands/data to AD. Maybe you'll get better results if you rethink the whole thing and use timers or something like that. I personally think that reducing the priority could resolve your problem, although you need to run some tests with your data to see what happens to your server and the other servers involved in the process.
You could schedule the thread at BelowNormal priority instead. That said, that could potentially lead to your task never running if something else overloads the server. (Assuming Windows scheduling works the way the documentation on scheduling threads mentions for "some operating systems".)
That said, you said you're moving data into AD. If it's over the network, it's entirely possible the CPU impact of your code will be negligible compared to I/O and processing on the AD side.
I don't see any issue with it except that during the time you put the thread to sleep then that thread will not be responsive. If that is your main thread then your GUI will become non responsive. If it is a background thread then you won't be able to communicate with it (eg to cancel it). If the time you sleep is short then it shouldn't matter.
I don't think reducing the priority of the thread will help as 1) your code might not even be running on the server and 2) most of the work being done by the server is probably not going to be on your thread anyway.
Thread.Sleep does not aid performance (unless your thread has to wait for some resource). It incurs at least some overhead, and the amount of time that you sleep for is not guaranteed. The OS can decide to have your thread sleep longer than the amount of time you specify.
As such, it would make more sense to do a significant batch of work between calls to Thread.Sleep().
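As a rough illustration of batching between sleeps, here is a minimal sketch. Throttle and RunInBatches are hypothetical names, and the batch size and pause are numbers you would have to tune against your own server.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class Throttle
{
    // Runs `work` for every item and pauses once after each full batch.
    // Returns the number of pauses taken, which is handy for logging.
    public static int RunInBatches<T>(IEnumerable<T> items, Action<T> work,
                                      int batchSize, TimeSpan pause)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");

        int inBatch = 0;
        int pauses = 0;
        foreach (T item in items)
        {
            work(item);
            if (++inBatch == batchSize)
            {
                Thread.Sleep(pause); // actual sleep may be longer than requested
                inBatch = 0;
                pauses++;
            }
        }
        return pauses;
    }
}
```

For the AD import this might be called as Throttle.RunInBatches(users, ImportUser, 50, TimeSpan.FromSeconds(1)), where ImportUser stands in for whatever method creates a single account.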
Thread.Sleep() is a CPU-less wait state, so its overhead should be pretty minimal. If you execute Thread.Sleep(0), you don't [necessarily] sleep; you voluntarily surrender the rest of your time slice so the scheduler can run another ready thread of equal priority.
You can also lower your thread's priority by setting Thread.Priority.
Another way of throttling your task is to use a Timer:
// System.Threading.Timer that 'ticks' 10 times per second (your ideal rate might be different).
// Keep a reference to the timer for as long as it should run, or it may be garbage collected.
Timer timer = new Timer( ImportUserIntoActiveDirectory , null , 0 , 100 ) ;
where ImportUserIntoActiveDirectory is a callback that imports just one user into AD:
private void ImportUserIntoActiveDirectory( object state )
{
// import just one user into AD
return ;
}
This lets you dial things in. The event handler is called on thread pool worker threads, so you don't tie up your primary thread. Let the OS do the work for you: all you do is decide on your target transaction rate.
We need to write some data into a file at roughly 100 ms intervals for about 30 minutes, then wait for a while, and repeat. Our application is a C#, .NET 3.5 application. The data for each write is small, less than 1 MB. Right now we get a thread from the ThreadPool and let that thread write to the file each time new data is received (about every 100 ms).
There is another way to do this, I think. We can get a thread from the ThreadPool at the beginning and keep that thread running during the entire writing session. When that thread finishes a write, it waits for the next signal, gets the updated data from a shared place, and writes again. The downside is that we need to synchronize the shared data object to make sure it is not overwritten by new data if the writing is slower. That may in turn slow down the communication over which the data is transferred from another system.
I don't have time to write code to test them yet. Do you think it is worth testing both, or is one way obviously better than the other?
You can use a producer/consumer pattern, so that a single thread can hold the file open the whole time and you don't need to open and close the FileStream for every write.
Here is an example: http://www.yoda.arachsys.com/csharp/threads/deadlocks.shtml
(in the "More Monitor methods" section)
If you think that you might have contention writing to your file, then a single writer thread is a good idea. Otherwise, there is no real problem with dispatching a single operation to a miscellaneous thread, imo.
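A minimal sketch of the single-writer idea that stays within .NET 3.5 by using a plain Queue<T> with Monitor.Wait/Pulse. SingleFileWriter is a hypothetical name, and error handling for a failed file open or write is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical writer: one thread owns the FileStream for the whole session;
// producers only enqueue buffers, so new data never overwrites pending data.
public sealed class SingleFileWriter : IDisposable
{
    private readonly Queue<byte[]> _pending = new Queue<byte[]>();
    private readonly object _gate = new object();
    private readonly Thread _writer;
    private bool _closing;

    public SingleFileWriter(string path)
    {
        _writer = new Thread(() => WriteLoop(path));
        _writer.Start();
    }

    // Producer: hands over a buffer and returns immediately.
    // The caller must not modify the buffer afterwards.
    public void Write(byte[] data)
    {
        lock (_gate)
        {
            _pending.Enqueue(data);
            Monitor.Pulse(_gate); // wake the writer if it is waiting
        }
    }

    private void WriteLoop(string path)
    {
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            while (true)
            {
                byte[] data;
                lock (_gate)
                {
                    while (_pending.Count == 0 && !_closing)
                        Monitor.Wait(_gate);
                    if (_pending.Count == 0) return; // closing and fully drained
                    data = _pending.Dequeue();
                }
                stream.Write(data, 0, data.Length); // file I/O happens outside the lock
            }
        }
    }

    // Signals the writer to finish what is queued, then waits for it to exit.
    public void Dispose()
    {
        lock (_gate)
        {
            _closing = true;
            Monitor.Pulse(_gate);
        }
        _writer.Join();
    }
}
```

The lock is only held while touching the queue, so a slow disk delays the writer thread but not the code that receives data from the other system.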
I have a queue of tasks for the ThreadPool, and each task has a tendency to freeze, locking up all the resources it is using. These can't be released unless the service is restarted.
Is there a way for the ThreadPool to know that one of its threads is frozen? I have an idea of using a timeout (though I still don't know how to write it), but I think it's not safe because the processing time is not uniform.
I don't want to be too presumptuous here, but a good dose of actually finding out what the problem is and fixing it is the best course with deadlocks.
Run a debug version of your service and wait until it deadlocks. It will stay deadlocked as this is a wonderful property of deadlocks.
Attach the Visual Studio debugger to the service.
"Break All".
Bring up the Threads window, and start spelunking...
Unless you have a sound architecture\design\reason to choose victims in the first place, don't do it - period. It's pretty much a recipe for disaster to arbitrarily bash threads over the head when they're in the middle of something.
(This is perhaps a bit lowlevel, but at least it is a simple solution. As I don't know C#'s API, this is a general solution for any language using thread-pools.)
Insert a watchdog task after each real task that updates a time value with the current time. If the age of this value (current time minus the stored time) is larger than your max task run time (say 10 seconds), you know that something is stuck.
Instead of setting a time and polling it, you could continuously set and reset some timers 10 secs into the future. When it triggers, a task has hung.
The best way is probably to wrap each task in a "Watchdog" Task class that does this automatically. That way, upon completion, you'd clear the timer, and you could also set a per-task timeout, which might be useful.
You obviously need one time/timer object for each thread in the threadpool, but that's solvable via thread-local variables.
Note that this solution does not require you to modify your tasks' code. It only modifies the code putting tasks into the pool.
One way is to use a watchdog timer (a solution usually done in hardware but applicable to software as well).
Have each thread set a thread-specific value to 1 at least once every five seconds (for example).
Then your watchdog timer wakes every ten seconds (again, this is an example figure only) and checks to ensure that all the values are 1. If they're not 1, then a thread has locked up.
The watchdog timer then sets them all to 0 and goes back to sleep for the next cycle.
Providing your worker threads are written in such a way so that they will be able to set the values in a timely manner under non-frozen conditions, this scheme will work okay.
The first thread that locks up will not set its value to 1, and this will be detected by the watchdog timer on the next cycle.
However, a better solution is to find out why the threads are freezing in the first place and fix that.
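The flag-based watchdog described above could be sketched roughly like this. Watchdog, Beat and Check are hypothetical names, and the sketch assumes a fixed number of workers identified by index; it only detects a stall, it does not try to recover from one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical watchdog: each worker sets its flag to 1 periodically;
// the checker reports flags still at 0 and resets all of them.
public sealed class Watchdog
{
    private readonly int[] _flags;

    public Watchdog(int workerCount)
    {
        _flags = new int[workerCount];
    }

    // Called by worker `id` at least once per check interval.
    public void Beat(int id)
    {
        Interlocked.Exchange(ref _flags[id], 1);
    }

    // Called from the watchdog timer. Returns the ids of workers that have not
    // called Beat since the previous check, and resets every flag to 0.
    public List<int> Check()
    {
        var stalled = new List<int>();
        for (int i = 0; i < _flags.Length; i++)
        {
            if (Interlocked.Exchange(ref _flags[i], 0) == 0)
                stalled.Add(i);
        }
        return stalled;
    }
}
```

A System.Threading.Timer could call Check() every ten seconds and log or alert on a non-empty result. A worker that is merely idle would need to keep beating (or be excluded) to avoid false alarms.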
I'm working on my first ThreadPool application in Visual Studio 2008 with C#.
I have a report that has to perform calculations on 2000 to 4000 parts using data on our SQL Server.
I am Queuing all of the part numbers in a ThreadPool, where they go off and calculate their results. When these threads are finished, the RegisterWaitForSingleObject event fires to Unregister the Queued Item's Handle.
After all of the Queued Items have finished, is there a way to remove them from the ThreadPool?
The way it looks, if someone runs another report using a new set of 2000 to 4000 parts, I have no way of removing the previous array of parts.
How would I remove the Previously Queued Items? Would calling SetMaxThreads with workerThreads = 0 do it?
I realize I could experiment, but then I could waste most of the week experimenting.
Thanks for your time,
Joe
Once a ThreadPool item completes, it is automatically removed from the queue. What is indicating to you that they aren't?
Assuming you mean to interrupt (cancel) the work on the current queue...
Changing the max-threads won't affect the pending work; it'll just change the number of threads available to do it - and it is generally a bad idea to mess with this (your code isn't the only thing using the ThreadPool). I would use a custom queue - it is fairly easy to write a basic (thread-safe) producer/consumer queue, or .NET 4.0 includes some very good custom thread queues.
Then you can just abort the custom queue and start a new one.
I wrote a simple one here; at the moment it wants to exit cleanly (i.e. drain the queue until it is empty) before terminating, but it would be easy enough to add a flag to stop immediately after the current item (don't resort to interrupting/aborting threads at an arbitrary point in execution; never a good idea).
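This is not the queue referred to above, just a minimal sketch of the same idea under the assumption that .NET 4's BlockingCollection<T> and CancellationTokenSource are available. CancellableWorkQueue is a hypothetical name. Cancelling lets the item that is currently running finish and abandons whatever is still queued, without aborting any thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical custom queue with a "stop after the current item" switch.
public sealed class CancellableWorkQueue
{
    private readonly BlockingCollection<Action> _items = new BlockingCollection<Action>();
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Thread _worker;

    public CancellableWorkQueue()
    {
        _worker = new Thread(Run) { IsBackground = true };
        _worker.Start();
    }

    public void Enqueue(Action item)
    {
        _items.Add(item);
    }

    // Number of queued items that have not been started.
    public int Pending
    {
        get { return _items.Count; }
    }

    // Requests a stop: the running item completes, queued items are left untouched.
    public void Cancel()
    {
        _cts.Cancel();
    }

    // Blocks until the worker thread has exited.
    public void Join()
    {
        _worker.Join();
    }

    private void Run()
    {
        try
        {
            // The token is checked before each take, so no new item starts after Cancel().
            foreach (Action item in _items.GetConsumingEnumerable(_cts.Token))
                item();
        }
        catch (OperationCanceledException)
        {
            // Cancelled: exit without draining the remaining items.
        }
    }
}
```

For the report scenario, each report run would get its own queue instance; starting a new report means calling Cancel() on the old queue and creating a fresh one for the new set of parts.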
When using APIs handling asynchronous events in .Net I find myself unable to predict how the library will scale for large numbers of objects.
For example, using the Microsoft.Office.Interop.UccApi library, when I create an endpoint it gets events when phone events happen. Now let's say I want to create 1000 endpoints. The number of events per endpoint is small, but is what's happening behind the scenes in the API able to keep up with the event flow? I don't know because it never says how it's architected.
Let's say I want to create all 1000 objects in the main thread. Then I want to put the Login method into a large thread pool so all objects login in parallel. Then once all the objects have logged in the next phase will begin.
Are the event callbacks the API raises happening in the original creating thread? A separate threadpool? Or the same threadpool I'm accessing with ThreadPool.QueueUserWorkItem?
Would I be better off putting each object in its own thread? Grouping a few objects in each thread? Or is it fine to just create all 1000 objects in the main thread, and through .NET magic it will all be OK?
thanx
The events from interop assemblies are just wrappers around the COM connection points. The thread on which the call from the connection point arrives depends on the threading model of the object that advised on that connection point. COM will ensure the proper thread switching for this.
If your objects are implemented on the main thread, which in .Net is usually an STA, all events should arrive on that same thread. If you want your calls to arrive on a random thread from the COM thread pool (which I think is the same as the CLR thread pool), you need to create your objects on a thread that is configured as an MTA.
I would strongly advise against creating a thread for each object: 1) if you create these threads as STA, each of them will have a message queue, wasting system resources; 2) if you create them as MTA, nothing guarantees that the event call will arrive on your thread; 3) you'll have 1000 idle threads doing nothing and just waiting on an event to shut down; and 4) starting up and shutting down all these threads will have a terrible perf cost for your application.
It really depends on a lot of things, primarily how powerful your hardware is. The threadpool does have a certain number of threads (which you can increase) that it will make available for your application. So if all of your events are firing at the same time some will most likely be waiting for a few moments while your threadpool waits for threads to become free again. The tradeoff is that you don't have the performance hit of creating new threads all the time either. Probably creating 1000 threads isn't the right answer either.
It may turn out that this is ideal, both because of the performance gains in reusing threads but also because having 1000 threads all running simultaneously might be more memory / CPU usage than it's worth.
I just wanted to note that in .NET 2.0 and greater it's possible to programmatically change the maximum number of threads in the thread pool using ThreadPool.SetMaxThreads(). Given this, you can put a hard cap on the number of threads and so ensure the scheduler won't be brought to its knees by the overhead.
Even more useful in this sort of case, you can set the minimum number of threads with ThreadPool.SetMinThreads(). With this you can ensure that you only pay the "horrible performance price" Franci is talking about once, at application startup. You could balance this against the expected peak number of users and so ensure you won't be creating tons of new threads.
A single new thread creation won't destroy you. What I would be worried about is the case where a lot of threads need to be created at the same time. If you can say that this will only happen at startup you would be golden.
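A small sketch of raising the minimum, with PoolTuning and EnsureMinWorkers as hypothetical names. As far as I understand it, the minimum is the number of threads the pool will create on demand without its usual delay between thread injections, so measure before settling on a number.

```csharp
using System;
using System.Threading;

public static class PoolTuning
{
    // Raises the worker-thread minimum to `desired` (never lowering it and never
    // exceeding the current maximum); I/O completion thread settings are untouched.
    // Returns false if the runtime rejected the change.
    public static bool EnsureMinWorkers(int desired)
    {
        int minWorker, minIo, maxWorker, maxIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);

        int target = Math.Min(Math.Max(desired, minWorker), maxWorker);
        return ThreadPool.SetMinThreads(target, minIo);
    }
}
```

Calling PoolTuning.EnsureMinWorkers(32) once at startup, for example, would let a burst of 32 queued logins start without waiting for the pool to ramp up. The settings are process-wide, so they affect everything else that uses the ThreadPool.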