Monitoring a multi-threaded application - C#

I am writing a multi-threaded application (using C#) where each thread's job is to insert some data into a database. As soon as a thread completes its insert, it becomes free (i.e. ready to insert the next piece of data). All the threads read their data from a queue.
The problem is: how do I monitor which thread has completed its current job and is ready to take a second job? And can we use C# Tasks instead of threads, and how?
Please note that every thread inserts data into the same database.

The problem is, how do I monitor which thread has completed its current job and is ready to take a second job?
Why would you do that? The threads you create should simply loop until there is no data left. If there is no thread (or fewer than you want) and data arrives, start a new thread/task. Actually, use tasks; there is no need to monitor them individually, and doing so would be ridiculously inefficient.
Can we use C# Tasks instead of threads, and how?
Yes, and it is as simple as looking up how to start a task, which Google has an answer for. That said, your architecture likely needs adjustment: doing too many things in parallel will only waste memory, so limit the number of active threads/tasks to a specific number, as in the sketch below.
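For example, a minimal sketch of that shape: a fixed number of consumer tasks looping over a shared queue. BlockingCollection and the InsertIntoDatabase placeholder are assumptions, not your code:

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static readonly BlockingCollection<string> queue = new BlockingCollection<string>();

    static void Main()
    {
        // A fixed number of consumer tasks; each loops until the queue is
        // marked complete, so nothing has to "monitor" which worker is free.
        const int workerCount = 4;
        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (string item in queue.GetConsumingEnumerable())
                    InsertIntoDatabase(item);   // placeholder for your insert
            }))
            .ToArray();

        // ... producer calls queue.Add(data) as data arrives ...

        queue.CompleteAdding();   // signal: no more data
        Task.WaitAll(workers);    // let the remaining inserts finish
    }

    static void InsertIntoDatabase(string item) { /* your DB code */ }
}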

In my opinion you should use only Tasks, not "Tasks in Threads".
Tasks are more flexible and already robustly implemented. In your case you can create a Task[] (with 10 tasks if you want); to know whether a task has completed its work you can check Task.IsCompleted, or read the Task<TResult>.Result value if you have declared Task<TResult> objects (note that reading Result blocks until the task finishes).
This way you have direct control over the processes during asynchronous execution, including exception handling. For example:
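A sketch of that idea, assuming the work can be wrapped in an InsertBatch placeholder that returns a row count:

using System;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        // Ten tasks, each returning how many rows it inserted.
        var tasks = new Task<int>[10];
        for (int i = 0; i < tasks.Length; i++)
        {
            int id = i;   // capture a copy of the loop variable
            tasks[i] = Task.Run(() => InsertBatch(id));
        }

        Task.WaitAll(tasks);   // or poll task.IsCompleted if you must not block

        foreach (var task in tasks)
            Console.WriteLine(task.Result);   // Result also rethrows any exception

        // InsertBatch is a placeholder for the real database work.
    }

    static int InsertBatch(int id) { /* insert data, return row count */ return 0; }
}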

Related

Thread management console program with continuous (endless) tasks

I'm new to threading, so please have some patience.
I have tens of thousands of rows in a database. Each row represents a job to be done over the internet. I read a data row, do some network-related work (which can take anywhere from a couple of seconds to a couple of minutes), and then grab the next data row (my C# application is a console app, not GUI). As you might expect, I want to do these jobs concurrently.
I looked into this subject and thought I would use background threads, but if I understand correctly, people suggest there is no point in using them in a console application.
I assume I should not use Tasks, because each of my "tasks" will be represented by a single thread.
So I thought I would use ThreadPool with regular Threads.
To make things simple, I just want to keep a constant number of threads (spawning a new one when one finishes) until I run out of things to do (then I wait for data - usually a lot of it - to arrive in the database and spawn threads again). I need to know when a thread ends, because I have to spawn a new thread and update the database row containing the data it was working with. To keep the threads and the database in sync, I would probably have to mark the database row with some kind of thread id when it is retrieved and then mark the row (success/fail) when the thread ends.
Is this solution (a try/catch in the thread delegate) enough to be sure that a thread has ended (and whether it succeeded or threw an exception)?
I am also not sure how to "wait" for the first thread to end - not for all of them, and not for a particular one.
I also don't want to read too much data in advance (and potentially have it wait for a thread to free up), because there might be other programs doing the same thing using the same database.
Any ideas appreciated!
Just use Parallel.ForEach to do this:
Parallel.ForEach(rows, row => ProcessRow(row));
If you need to specify a maximum degree of parallelism, because the automatic partitioner happens to use too many thread-pool threads, you can do it like so:
Parallel.ForEach(rows,
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    row => ProcessRow(row));
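If each row's outcome also needs to be recorded (the try/catch question above), here is a sketch of what ProcessRow might look like; MarkRowSucceeded and MarkRowFailed are hypothetical helpers that update the row's status column:

void ProcessRow(DataRow row)
{
    try
    {
        DoNetworkWork(row);       // the seconds-to-minutes network job
        MarkRowSucceeded(row);    // hypothetical: update the row as done
    }
    catch (Exception ex)
    {
        MarkRowFailed(row, ex);   // hypothetical: record the failure
    }
}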

The "bag of tasks" concept in C#, enqueue,pause,cancel logical tasks

The app I'm developing is composed this way:
A producer task scans the file system for text files and puts references to them in a bag.
Many consumer tasks take file refs from the bag concurrently and read the files (doing some short work with their content).
I must be able to pause and resume the whole process.
I've tried using the TPL, creating a task for every file ref as it is put in the bag (in this case the bag is just a concept; the producer directly creates the consumer tasks as it finds files), but this way I don't have control over the tasks I create: I can't (or don't know how to) pause them. I could write some code to suspend the thread currently executing a task, but that would defeat the point of working with logical tasks instead of manually creating threads, wouldn't it? What I want is something like: "tasks already assigned to a physical thread may complete, but waiting logical tasks should not start until a resume command".
How can I achieve this? Can it be done with the TPL, or should I use something else?
EDIT:
Your answers are all valid, but my main doubt remains unanswered. We are talking about tasks: if I use the TPL, my producer and my many consumers will be tasks (right?), not threads (well, OK, at the moment of execution tasks will be mapped onto threads). Every synchronization mechanism I've found (like the ManualResetEventSlim proposed in a comment) works at the thread level.
E.g. the description of the Wait() method of ManualResetEventSlim is "Blocks the current thread until the current ManualResetEventSlim is set."
My knowledge of tasks is purely academic; I don't know how things work in the "real world", but it seems logical to me that I need a way to coordinate (wait/signal/...) tasks at the task level, or things could get weird: two tasks might be mapped onto the same thread, but one was supposed to signal the other that was waiting, hence deadlock. I'm a bit confused. This is why I asked whether my app can use the TPL instead of old-style simple threads.
Yes, you can do that. First, you have a main thread: your application. There you have two workers, represented by threads. The first worker is a producer and the second worker is a consumer.
When your application starts, you start the workers. Both of them operate on the concurrent collection, the bag. The producer searches for files and puts references into the bag; the consumer takes references from the bag and starts a task per reference.
When you want to signal a pause, simply pause the producer. If you do that, the consumer also stops working once there is nothing in the bag. If this is not the desired behaviour, you can define that pausing the producer also clears the bag: back up your bag first and then clear it. This way all running tasks will finish their job and the consumer will not start new tasks, but it can still run and wait for the results.
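A minimal sketch of that pause switch, assuming the bag is a ConcurrentBag<string> of file paths, a ManualResetEventSlim acts as the gate, and ProcessFile is a placeholder:

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var bag = new ConcurrentBag<string>();        // the bag of file references
var gate = new ManualResetEventSlim(true);    // set = running, reset = paused

// Consumer: only *starts* new tasks while the gate is open; tasks
// already handed to a thread simply run to completion.
Task.Run(() =>
{
    while (true)   // loop until shutdown (details elided)
    {
        gate.Wait();                            // blocks here while paused
        if (bag.TryTake(out var file))
            Task.Run(() => ProcessFile(file));  // short work per file
    }
});

gate.Reset();   // pause: no new tasks are started
gate.Set();     // resume

void ProcessFile(string file) { /* read the file, do the short work */ }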
EDIT:
Based on your edit: I don't know how to achieve it exactly the way you want, but although it is nice to try new technologies, don't let your mind be clouded. Using a ThreadPool is also a fine approach. It takes a bit more time to start the application, but once it is running, consuming will be faster, because you already have workers ready.
It is not a bad idea that you can specify a maximum number of workers. If you create a task for every item in the bag, it will be more memory-consuming, because you will keep allocating and releasing memory. This does not happen with a ThreadPool.
Sure, you can use the TPL for this, and maybe also Reactive Extensions and LINQ to simplify grouping and pausing/resuming the work.
If you have just a short job per file, it is a pretty good idea not to disturb the handler function with cancellations; you can simply suspend queueing the workers instead.
I imagine something like this:
Your directory-scanner thread puts the found files into an observable collection.
The consumer thread subscribes to the collection's changes, takes/removes the files, and assigns them to workers, as in the sketch below.
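A rough sketch of that wiring with a plain ObservableCollection; conceptual only, since ObservableCollection is not thread-safe and a real scanner would have to marshal the Add calls (AssignToWorker is a placeholder):

using System.Collections.ObjectModel;
using System.Collections.Specialized;

var foundFiles = new ObservableCollection<string>();

// Consumer side: react to every file the scanner adds.
foundFiles.CollectionChanged += (sender, e) =>
{
    if (e.Action == NotifyCollectionChangedAction.Add)
        foreach (string file in e.NewItems)
            AssignToWorker(file);   // hand the file to a worker task
};

// Producer side: each Add raises CollectionChanged on the adding thread.
foundFiles.Add(@"C:\files\sample.txt");

void AssignToWorker(string file) { /* start/queue a worker for the file */ }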

Resource usage of ThreadPool RegisterWaitForSingleObject

I am writing a server application which processes requests from multiple clients. For processing the requests I am using the thread pool.
Some of these requests modify a database record, and I want to restrict access to any specific record to one thread-pool thread at a time. For this I am using named semaphores (other processes also access these records).
For each new request that wants to modify a record, the thread should wait in line for its turn.
And this is where the question comes in:
As I don't want the thread pool to fill up with threads waiting for access to a record, I found the RegisterWaitForSingleObject method on the thread pool.
But when I read the documentation (MSDN), under the Remarks section:
New wait threads are created automatically when required. ...
Does this mean that the thread pool will fill up with wait threads? And how does this affect the performance of the thread pool?
Any other suggestions to boost performance are more than welcome!
Thanks!
Your solution is a viable option. In the absence of more specific details I do not think I can offer other tangible options. However, let me try to illustrate why I think your current solution is, at the very least, based on sound theory.
Let's say you have 64 requests that come in simultaneously. It is reasonable to assume that the thread pool could dispatch each of those requests to a thread immediately, so you might have 64 threads that begin processing at once. Now let's assume that the mutex has already been acquired by another thread and is held for a really long time. Those 64 threads will be blocked for a long time, waiting for the thread that currently owns the mutex to release it. That means those 64 threads are wasted, doing nothing.
On the other hand, if you choose to use RegisterWaitForSingleObject, as opposed to a blocking call to wait for the mutex to be released, you can immediately release those 64 waiting threads (work items) back into the pool. If I were to implement my own version of RegisterWaitForSingleObject, I would use the WaitHandle.WaitAny method, which lets me specify up to 64 handles (I did not choose 64 for the number of requests randomly, after all) in a single blocking call. I am not saying it would be easy, but I could replace my 64 waiting threads with a single thread from the pool. I do not know how Microsoft implemented the RegisterWaitForSingleObject method, but I am guessing they did it in a manner that is at least as efficient as my strategy. To put this another way, you should be able to reduce the number of pending work items in the thread pool by at least a factor of 64 by using RegisterWaitForSingleObject.
So you see, your solution is based on sound theory. I am not saying it is optimal, but I do believe your concern is unwarranted with regard to the specific question asked.
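For illustration, a sketch of the registration itself, assuming a named Semaphore guards the record and ModifyRecord is a placeholder for the database update:

using System;
using System.Threading;

class RecordGate
{
    // One named semaphore per record; other processes open the same name.
    static readonly Semaphore recordLock = new Semaphore(1, 1, @"Global\record-42");

    static void QueueModification(object request)
    {
        // Rather than blocking a pool thread on WaitOne, register a
        // callback that runs on a pool thread once the wait succeeds
        // (or once the 30 s timeout elapses).
        ThreadPool.RegisterWaitForSingleObject(
            recordLock,
            (state, timedOut) =>
            {
                if (timedOut) return;                 // give up or re-queue here
                try { ModifyRecord(state); }          // a successful wait acquired the semaphore
                finally { recordLock.Release(); }
            },
            request,    // state handed to the callback
            30_000,     // timeout in milliseconds
            true);      // executeOnlyOnce
    }

    static void ModifyRecord(object request) { /* the database update */ }
}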
IMHO you should let the database do its own synchronization. All you need to do is ensure you're synchronized within your own process.
Building your own synchronization from the Interlocked class would be a premature optimization that is too complex to get right. I would recommend higher-level sync objects, such as ReaderWriterLockSlim. Or, better yet, a Monitor.
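In the simplest case that is just a lock statement (which is Monitor.Enter/Exit under the hood); Record and WriteToDatabase are placeholders:

private static readonly object recordSync = new object();

void UpdateRecord(Record record)
{
    lock (recordSync)   // Monitor.Enter/Exit under the hood
    {
        // one thread in this process mutates the record at a time;
        // the database handles cross-process consistency
        WriteToDatabase(record);
    }
}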
An approach to this problem that I've used before is to have the first thread that gets one of these work items be responsible for any others that arrive while it is processing. It does this by queueing the work items and then dropping into a critical section to process the queue. Only the "first" thread enters the critical section; if a thread can't get it, it leaves its work item on the queue for the thread already inside to handle.
It's really not very complicated. The only thing that might not be obvious is that, when leaving the critical section, the processing thread has to do it in a way that doesn't leave a late-arriving work item stranded on the queue: the "processing" lock has to be released while still holding the queue lock. If not for this one requirement, a synchronized queue would be sufficient and the code would be really simple!
Pseudo code:
// `workitem` is an object that contains the database modification request
//
// `queue` is a Queue<WorkItem> that holds these workitem requests
//
// `processingLock` is an object used to provide a lock
// indicating that a thread is processing the queue
//
// Any number of threads can call this function, but only one
// will end up processing all the workitems.
//
// The other threads simply drop their workitem in the queue
// and leave.
void ThreadpoolHandleDatabaseUpdateRequest(WorkItem workitem)
{
    // put the workitem on the queue
    Monitor.Enter(queue);
    queue.Enqueue(workitem);
    Monitor.Exit(queue);

    bool doProcessing = Monitor.TryEnter(processingLock);
    if (!doProcessing)
    {
        // another thread has the processing lock; it'll
        // handle the workitem we just queued
        return;
    }

    for (;;)
    {
        Monitor.Enter(queue);
        if (queue.Count == 0)
        {
            // done processing the queue:
            // release the locks in an order that ensures
            // a workitem won't get stranded on the queue
            Monitor.Exit(processingLock);
            Monitor.Exit(queue);
            break;
        }
        workitem = queue.Dequeue();
        Monitor.Exit(queue);

        // this will get the database mutex, do the update and release
        // the database mutex
        doDatabaseModification(workitem);
    }
}
The ThreadPool creates one wait thread for every ~64 registered waitable objects.
Good comments are here: Thread.sleep vs Monitor.Wait vs RegisteredWaitHandle?

Threading in C#

I have a console application. In it, a process fetches data from a database through different layers (business and data access) and stores the fetched data in the corresponding objects: if data is fetched for a student, it is assigned to a Student object, and the same for a school. Then a delegate calls a certain method that generates output as required. This process executes many times, say 10 times, and currently the runs are sequential: one starts, finishes, and then the next starts. I want the runs to execute simultaneously: after the 1st starts, the 2nd, 3rd, ..., 10th should start too. In other words, it should be multithreaded. How can I achieve this? And will it give me errors when opening and closing connections to the database?
I have tried this, but when the 1st thread starts, the data fetched for it is stored in its respective (Student, School) objects; then, when the 2nd thread starts simultaneously, the 1st thread's objects change while control flows through the program. What do I have to do?
Here is some sample code:
static void Main(string[] args)
{
    for (int i = 0; i < 10; i++)
        System.Threading.ThreadPool.QueueUserWorkItem(new WaitCallback(DbWork));

    Console.ReadLine(); // keep the process alive until the work items finish
}

// must be static so it can be referenced from the static Main
public static void DbWork(object state)
{
    // Call your database code here.
}
Have a look at the ThreadPool class. It should put you in the right place for easy handling of multi-threaded applications.
Yes, this requires multithreading, but I feel it makes a poor choice for a first attempt. Multithreading is complicated and requires you to "unlearn" a few things you picked up from procedural programming. It is further complicated here by simultaneous connections to the same database engine, which may or may not improve your performance anyway.
Have a look at Parallel Programming in .NET 4.0. The parallel task library in particular gives you great control over threading different tasks that can have dependencies on each other.
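For instance, a sketch of two dependent steps with the task library; FetchStudent and GenerateOutput stand in for the layers described in the question:

int id = 1;   // which student to process
Task<Student> fetch = Task.Run(() => FetchStudent(id));            // data-access layer
Task output = fetch.ContinueWith(t => GenerateOutput(t.Result));   // runs only after the fetch
output.Wait();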
You need to instantiate a new instance of each class for each thread. If each call to the database is modifying the data from the first call, you are referencing the same instance. Multithreading is considered an "advanced topic", so you may want to find another way to solve this problem if at all possible.
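Concretely, construct the objects inside each work item instead of sharing one set of fields; a sketch using hypothetical Student/School types:

System.Threading.ThreadPool.QueueUserWorkItem(_ =>
{
    // Fresh instances per work item: no thread can overwrite another's data.
    var student = new Student();
    var school = new School();
    FetchInto(student, school);        // placeholder for the data-access call
    GenerateOutput(student, school);   // placeholder for the output step
});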
This may help:
Start a master thread that calls threads 1, 2, 3, ... in sequence, and put this logic in the master thread:
master thread started
thread 1 started (if first time) or resumed (if not)
work completed
t1 suspended
t2 started (if first time) or resumed (if not)
t2 work completed
t2 suspended
t3 started or resumed, with the same first-time condition
and so on for all threads
The master thread controls the sequence for all the threads, and also suspends and resumes them.
Try this.
You should not implement a multithreaded solution unless you understand the mechanisms and inherent problems.
That said, when you feel you are ready to move to a parallel algorithm, use the pattern known as APM (Asynchronous Programming Model), where you spawn the worker threads and let them notify the main thread via callback methods (i.e. event delegates).
Jeffrey Richter explains: Implementing the CLR Asynchronous Programming Model
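As a minimal illustration of the APM shape (classic .NET Framework; delegate BeginInvoke is not supported on .NET Core), with ExpensiveComputation as a placeholder:

using System;

class ApmDemo
{
    static void Main()
    {
        // A worker delegate invoked asynchronously on a thread-pool thread.
        Func<int, int> worker = n => ExpensiveComputation(n);

        worker.BeginInvoke(
            42,
            ar =>
            {
                int result = worker.EndInvoke(ar);   // completes the call, rethrows exceptions
                Console.WriteLine(result);           // the callback is the notification
            },
            null);   // optional state object

        Console.ReadLine();   // keep the console app alive for the callback
    }

    static int ExpensiveComputation(int n) { return n * n; }
}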

C# lower thread priority in thread pool

I have several low-importance tasks to be performed when some CPU time is available. I don't want these tasks to run if other, more important tasks are running; i.e. if a normal/high-priority task comes, I want the low-importance tasks to pause until the important task is done.
There is a pretty big number of low-importance tasks to be performed (50 to 1000), so I don't want to create one thread per task. However, I believe the thread pool does not allow a priority specification, does it?
How would you solve this?
You can new up a Thread and use a Dispatcher to send it tasks of various priorities.
The priorities are a bit UI-centric but that doesn't really matter.
You shouldn't mess with the priority of the regular ThreadPool, since you aren't the only consumer. I suppose the logical approach would be to write your own: perhaps something as simple as a producer/consumer queue, with your own Thread(s) as the consumer(s), setting the thread priority yourself.
.NET 4.0 includes new libraries (the TPL etc.) to make all this easier; until then, you need additional code to create a custom thread pool or work queue.
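A sketch of that producer/consumer queue; BlockingCollection (which shipped with .NET 4.0) stands in for the hand-rolled queue you would need on earlier versions:

using System;
using System.Collections.Concurrent;
using System.Threading;

var lowPriorityWork = new BlockingCollection<Action>();

// One dedicated consumer thread, kept below normal priority so the
// scheduler favours your normal/high-priority work.
var consumer = new Thread(() =>
{
    foreach (var job in lowPriorityWork.GetConsumingEnumerable())
        job();
})
{
    IsBackground = true,
    Priority = ThreadPriority.BelowNormal
};
consumer.Start();

// Producer side: enqueue the 50-1000 small jobs.
lowPriorityWork.Add(() => { /* low-importance work */ });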
When you use the built-in ThreadPool, all threads execute with the default priority; if you mess with that setting, it is ignored. This is a case where you should roll your own thread pool. A few years ago I extended the SmartThreadPool to meet my needs; it may satisfy yours as well.
I'd create a shared queue of pending task objects, with each object specifying its priority, then write a dispatcher thread that watches the queue and launches a new thread for each task, up to some maximum thread limit, specifying the thread priority as it creates each one. It's only a small amount of work to do that, and you can have the dispatcher report activity and even dynamically adjust the number of running threads. That concept has worked very well for me, and it can be wrapped in a Windows service to boot if you make your queue a database table.
