I am automating some tasks on my website, but I'm currently stuck.
public void Execute(JobExecutionContext context)
{
var linqFindAccount = from Account in MainAccounts
where Account.Done == false
select Account;
foreach (var acc in linqFindAccount)
{
acc.Done = true;
// stuff
}
}
The issue is that when I start multiple threads the first threads get assigned to the same first account because they set the Done value to true at the same time. How am I supposed to avoid this?
EDIT:
private object locker = new object();
public void Execute(JobExecutionContext context)
{
lock (locker)
{
var linqFindAccount = from Account in MainAccounts
where Account.Done == false
select Account;
foreach (var acc in linqFindAccount)
{
Console.WriteLine(context.JobDetail.Name + " assigned to " + acc.Mail);
acc.Done = true;
// stuff
}
}
}
Instance [ 2 ] assigned to firstmail#hotmail.com
Instance [ 1 ] assigned to firstmail#hotmail.com
First two threads got assigned to the first account, even though the list contains 30 accounts.
Thanks.
Use
private static readonly object locker = new object();
instead of
private object locker = new object();
Your problem is that the deferred query is executed when the foreach loop starts, and its result is not reevaluated on each iteration. Every thread therefore works with its own list of items, so when one thread marks an Account as done, the other threads' lists still contain that object.
A Queue is more suitable in this case. Put the items in a shared Queue and let each loop take items off the queue until it is empty.
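The shared-queue idea could look roughly like the sketch below. It assumes .NET 4 or later (for `ConcurrentQueue<T>`); the `Account` class is a hypothetical stand-in for the question's type, and `AccountQueueDemo`, `ProcessAll` and `workerCount` are names made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the question's account type.
public class Account
{
    public string Mail;
    public bool Done;
}

public static class AccountQueueDemo
{
    // Processes every pending account exactly once using workerCount threads.
    // Returns how many accounts each worker handled.
    public static int[] ProcessAll(IEnumerable<Account> accounts, int workerCount)
    {
        // Evaluate the query once and put the results into a thread-safe queue.
        var pending = new ConcurrentQueue<Account>(accounts.Where(a => !a.Done));
        var handled = new int[workerCount];
        var threads = new Thread[workerCount];

        for (int i = 0; i < workerCount; i++)
        {
            int worker = i; // copy the loop variable for the closure
            threads[i] = new Thread(() =>
            {
                Account acc;
                // TryDequeue hands each account to exactly one thread.
                while (pending.TryDequeue(out acc))
                {
                    acc.Done = true;
                    // stuff
                    handled[worker]++; // each worker writes only its own slot
                }
            });
            threads[i].Start();
        }

        foreach (Thread t in threads)
        {
            t.Join();
        }
        return handled;
    }
}
```

Because the queue is filled once and every dequeue is atomic, no lock is needed around the loop and no two threads can receive the same account.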
Few problems with your code:
1) Assuming you use stateless Quartz jobs, your lock does not help. Quartz creates a new job instance every time it fires a trigger, so each instance gets its own locker object. That is why you see the same account processed twice. The lock would only work if you used a stateful job (IStatefulJob) or made the lock static, but read on.
2) Even if 1) is fixed, it would defeat the purpose of having multiple threads, because they would all wait for each other on the same lock. You might as well have one thread doing this.
I don't know enough about the requirements, especially what's going on in // stuff. It may be that you don't need this code to run on multiple threads and sequential execution will do just fine. I assume this is not the case and you want to run it on multiple threads. The easiest way is to have only one Quartz job. In this job, load Accounts in chunks, say 100 accounts in every chunk. This will give you 5 chunks if you have 500 accounts. Offload the processing of every chunk to the thread pool, which will take care of using an optimal number of threads. This would be a poor man's Producer Consumer Queue.
public void Execute(JobExecutionContext context) {
var linqFindAccount = from Account in MainAccounts
where Account.Done == false
select Account;
IList<IList<Account>> chunks = linqFindAccount.SplitIntoChunks(/* TODO */);
foreach (IList<Account> chunk in chunks) {
ThreadPool.QueueUserWorkItem(DoStuff, chunk);
}
}
private static void DoStuff(Object parameter) {
IList<Account> chunk = (IList<Account>) parameter;
foreach (Account account in chunk) {
// stuff
}
}
As usual, with multiple threads you have to be very careful with accessing mutable shared state. You will have to make sure that everything you do in 'DoStuff' method will not cause undesired side effects. You may find this and this useful.
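The `SplitIntoChunks` call in the code above is left as a TODO. One possible shape for it, as a sketch only: the extension method below is an assumption about what was intended, not part of Quartz or the framework, and `chunkSize` is a hypothetical parameter.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkExtensions
{
    // Splits a sequence into consecutive lists of at most chunkSize items.
    public static IList<IList<T>> SplitIntoChunks<T>(this IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        var chunks = new List<IList<T>>();
        List<T> current = null;
        foreach (T item in source)
        {
            // Start a new chunk at the beginning and whenever the current one is full.
            if (current == null || current.Count == chunkSize)
            {
                current = new List<T>();
                chunks.Add(current);
            }
            current.Add(item);
        }
        return chunks;
    }
}
```

With this in place, `linqFindAccount.SplitIntoChunks(100)` would enumerate the query once and hand each thread pool work item its own list, so no two work items share an account.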
foreach (var acc in linqFindAccount)
{
string mailComponent = acc.Mail;
Console.WriteLine(context.JobDetail.Name + " assigned to " + mailComponent);
acc.Done = true;
// stuff
}
Try above.
Sorry for the confusing title, but that's basically what I need. I could do something with global variables, but that would only be viable for 2 threads that are requested one after the other.
Here is some pseudocode that might explain it better.
/*Async function that gets requests from a server*/
if (/* received request from server */)
{
new Thread(() =>
{
//do stuff
//in the meantime a new thread has been requested from server
//and another one 10 seconds later.. etc.
//wait for this current thread to finish
//fire up the first thread that was requested while this ongoing thread
//after the second thread is finished fire up the third thread that was requested 10 seconds after this thread
//etc...
}).Start();
}
I don't know when each thread will be requested, as it depends on the server sending info to the client, so I can't use Task.ContinueWith because the sequence is dynamic.
Michael suggested that I look into Queues, and I came up with this:
static Queue<Action> myQ = new Queue<Action>();
static void Main(string[] args)
{
new Thread(() =>
{
while (1 == 1)
{
if (myQ.FirstOrDefault() == null)
break;
myQ.FirstOrDefault().Invoke();
}
}).Start();
myQ.Enqueue(() =>
{
TestQ("First");
});
myQ.Enqueue(() =>
{
TestQ("Second");
});
Console.ReadLine();
}
private static void TestQ(string s)
{
Console.WriteLine(s);
Thread.Sleep(5000);
myQ.Dequeue();
}
I commented the code; I basically need to check whether the action is first in the queue or not.
EDIT: I re-made it, and now it works. Surely there is a better way to do this? I can't afford to use an infinite while loop.
You will have to use a global container for the threads. Maybe check Queues.
This class implements a queue as a circular array. Objects stored in a
Queue are inserted at one end and removed from the other.
Queues and stacks are useful when you need temporary storage for
information; that is, when you might want to discard an element after
retrieving its value. Use Queue if you need to access the information
in the same order that it is stored in the collection. Use Stack if
you need to access the information in reverse order. Use
ConcurrentQueue(Of T) or ConcurrentStack(Of T) if you need to access
the collection from multiple threads concurrently.
Three main operations can be performed on a Queue and its elements:
Enqueue adds an element to the end of the Queue.
Dequeue removes the oldest element from the start of the Queue.
Peek returns the oldest element that is at the start of the Queue but does not remove it from the Queue.
EDIT (From what you added)
Here is how I would change your example code to implement the infinite loop and keep it under your control.
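static Queue<Action> myQ = new Queue<Action>();
// Set to false to make the worker loop exit.
static volatile bool running = true;
static void Main(string[] args)
{
    myQ.Enqueue(() =>
    {
        TestQ("First");
    });
    myQ.Enqueue(() =>
    {
        TestQ("Second");
    });
    // Thread.Start() returns void, so create the thread first and start it separately.
    Thread thread = new Thread(() =>
    {
        while (running)
        {
            Thread.Sleep(5000);
            if (myQ.Count > 0)
            {
                myQ.Dequeue().Invoke();
            }
        }
    });
    thread.Start();
    // Do other stuff, eventually setting "running = false" to stop the loop.
    // If other threads enqueue while the worker runs, guard myQ with a lock
    // or use ConcurrentQueue<Action> instead.
    Console.ReadLine();
    running = false;
}
private static void TestQ(string s)
{
    Console.WriteLine(s);
}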
You could put the requests that you receive into a queue if there is a thread currently running. Then, to find out when threads return, they could fire an event. When this event fires, if there is something in the queue, start a new thread to process this new request.
The only thing with this is you have to be careful about race conditions, since you are communicating essentially between multiple threads.
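One way to get this "run requests one after another, in arrival order" behaviour without an event or a polling loop is a single worker thread that blocks on a `BlockingCollection<Action>`. The sketch below assumes .NET 4 or later; `SequentialWorker` and its members are hypothetical names, not something from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Runs queued actions one at a time, in arrival order, on a single worker thread.
public sealed class SequentialWorker : IDisposable
{
    // The default BlockingCollection is backed by a ConcurrentQueue, so it is FIFO.
    private readonly BlockingCollection<Action> queue = new BlockingCollection<Action>();
    private readonly Thread worker;

    public SequentialWorker()
    {
        worker = new Thread(() =>
        {
            // GetConsumingEnumerable blocks while the queue is empty (no busy loop)
            // and ends once CompleteAdding has been called and the queue is drained.
            foreach (Action action in queue.GetConsumingEnumerable())
            {
                action();
            }
        });
        worker.IsBackground = true;
        worker.Start();
    }

    // Safe to call from any thread, for example whenever a server request arrives.
    public void Enqueue(Action action)
    {
        queue.Add(action);
    }

    // Stops accepting work, lets the queued actions finish, then waits for the worker.
    public void Dispose()
    {
        queue.CompleteAdding();
        worker.Join();
        queue.Dispose();
    }
}
```

Each request handler would call `Enqueue` instead of starting its own thread. Note that an exception thrown by an action would end the worker thread in this sketch, so real code would likely wrap `action()` in a try/catch.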
I have an integration service which runs a calculation-heavy, data-bound process. I want to make sure that there are never more than, say, n = 5 of these processes running at the same time (n will be configurable and changeable at runtime). The idea is to throttle the load on the server to a safe level. The amount of data processed by the method is limited by batching, so I don't need to worry about one process representing a much bigger load than another.
The processing method is called by another process, where requests to run payroll are held on a queue, and I can insert some logic at that point to determine whether to process this request now, or leave it on the queue.
So I want a separate method on the same service as the processing method, which can tell me if the server can accept another call to the processing method. It's going to ask, "How many payroll runs are going on? Is that less than n?" What's a neat way of achieving this?
-----------edit------------
I think I need to make it clear: the process that decides whether to take the request off the queue is separated from the service that processes the payroll data by a WCF boundary. Stopping a thread in the payroll-processing service isn't going to prevent more requests coming in.
You can use a Semaphore to do this.
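public class Foo
{
    private Semaphore semaphore;
    public Foo(int numConcurrentCalls)
    {
        semaphore = new Semaphore(numConcurrentCalls, numConcurrentCalls);
    }
    public bool isReady()
    {
        // WaitOne(0) takes a slot when one is free, so give it straight back;
        // otherwise every successful check would permanently use up a slot.
        // The answer is only a snapshot: another caller may take the slot right after.
        if (!semaphore.WaitOne(0))
        {
            return false;
        }
        semaphore.Release();
        return true;
    }
    public void Bar()
    {
        // Blocks until fewer than "numConcurrentCalls" threads are in this method.
        // Acquire outside the try so Release only runs after a successful wait.
        semaphore.WaitOne();
        try
        {
            //do stuff
        }
        finally
        {
            semaphore.Release();
        }
    }
}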
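A Semaphore's maximum count is fixed when it is created, so it does not directly cover the "n changeable at runtime" part of the question. A plain counter behind a lock is one alternative; the sketch below is an assumption about how that could look, and `ConcurrencyGate`, `TryEnter`, `Exit` and `Limit` are hypothetical names.

```csharp
using System;

// Counts running jobs against a limit that can be changed at runtime.
public sealed class ConcurrencyGate
{
    private readonly object sync = new object();
    private int running;
    private int limit;

    public ConcurrencyGate(int limit)
    {
        if (limit < 0) throw new ArgumentOutOfRangeException("limit");
        this.limit = limit;
    }

    // Changing the limit only affects future TryEnter calls;
    // jobs that are already running are left alone.
    public int Limit
    {
        get { lock (sync) { return limit; } }
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException("value");
            lock (sync) { limit = value; }
        }
    }

    public int Running
    {
        get { lock (sync) { return running; } }
    }

    // Atomically checks for a free slot and reserves it.
    public bool TryEnter()
    {
        lock (sync)
        {
            if (running >= limit) return false;
            running++;
            return true;
        }
    }

    // Must be called exactly once for every successful TryEnter.
    public void Exit()
    {
        lock (sync)
        {
            if (running == 0) throw new InvalidOperationException("Exit without a matching TryEnter.");
            running--;
        }
    }
}
```

Checking and reserving in one call matters here: a separate "is there room?" query followed by a second call to start the work leaves a window in which another caller can take the slot. The service method would call `TryEnter` first, report "busy" to the caller if it returns false, and call `Exit` in a finally block when the payroll run ends.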
Review the Object Pool pattern. This is what you're describing. While not strictly required by the pattern, you can expose the number of objects currently in the pool, the maximum (configured) number, the high-watermark, etc.
I think that you might want a BlockingCollection, where each item in the collection represents one of the concurrent calls.
Also see IProducerConsumerCollection.
If you were just using threads, I'd suggest you look at the methods for limiting thread concurrency (e.g. the TaskScheduler.MaximumConcurrencyLevel property, and this example.).
Also see ParallelEnumerable.WithDegreeOfParallelism
void ThreadTest()
{
ConcurrentQueue<int> q = new ConcurrentQueue<int>();
int MaxCount = 5;
Random r = new Random();
for (int i = 0; i <= 10000; i++)
{
q.Enqueue(r.Next(100000, 200000));
}
ThreadStart proc = null;
proc = () =>
{
int read = 0;
if (q.TryDequeue(out read))
{
Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] starting: {0}... #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(r.Next(100, 1000));
Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] {0} ended! #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
proc();
}
};
for (int i = 0; i < MaxCount; i++) // start exactly MaxCount threads
{
new Thread(proc).Start();
}
}
I have a queue that receives data over time.
I use multiple threads to dequeue items and save them to a database.
I create an array of threads to do this job.
for (int i = 0; i < thr.Length; i++)
{
thr[i] = new Thread(new ThreadStart(SaveData));
thr[i].Start();
}
SaveData
Note: eQ and eiQ are two global queues. I use a while loop to keep the threads alive.
public void SaveData()
{
    var imgDAO = new imageDAO();
    try
    {
        while (eQ.Count > 0 && eiQ.Count > 0)
        {
            var newRecord = eQ.Dequeue();
            var newRecordImage = eiQ.Dequeue();
            imgDAO.SaveEvent(newRecord, newRecordImage);
            var storepath = Properties.Settings.Default.StorePath;
            save.WriteFile(storepath, newRecord, newRecordImage);
        }
    }
    catch (Exception e)
    {
        Global._logger.Info(e.Message + e.Source);
    }
}
It did create multiple threads, but when I debug, only one thread is alive; the rest are dead.
I don't know why. Does anyone have an idea? Thanks.
You are using WriteFile in that thread function.
Is it possible that you are trying to write a file that is locked by another thread (same filename or something)?
One more thing: I don't like saving data to disk from multiple threads.
I think you should use a buffer instead of many threads and write it out every few records/entries.
As mentioned in the comments, your threads will only live as long as there are elements in the queues; as soon as both are emptied, the threads will terminate. This could explain why you see only one living thread while debugging.
A potential answer to your question would be to use a BlockingCollection from the System.Collections.Concurrent namespace instead of a Queue. It supports a blocking dequeue, which will stop the thread(s) until more elements are available for processing.
Another problem is the nice race condition between eQ and eiQ -- consider using a single queue with a Tuple or custom data type so you can dequeue both newRecord and newRecordImage in a single operation.
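Both suggestions (a blocking collection and a single queue of pairs) can be combined. The sketch below assumes .NET 4 or later; the record and image types are unknown from the question, so `string` and `byte[]` are used as stand-ins, and `SaveQueueDemo` and `Run` are hypothetical names. The save itself is replaced by a counter.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class SaveQueueDemo
{
    // Starts workerCount consumers, feeds them itemCount pairs,
    // and returns how many pairs were "saved".
    public static int Run(int itemCount, int workerCount)
    {
        // One queue carries both halves of an item, so they cannot get out of step.
        var queue = new BlockingCollection<Tuple<string, byte[]>>();
        int saved = 0;
        var workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            var thread = new Thread(() =>
            {
                // Blocks while the queue is empty instead of letting the thread exit.
                foreach (Tuple<string, byte[]> item in queue.GetConsumingEnumerable())
                {
                    // Placeholder for SaveEvent(item.Item1, item.Item2) and WriteFile.
                    Interlocked.Increment(ref saved);
                }
            });
            thread.Start();
            workers.Add(thread);
        }

        // Producer side: in the question this would happen as data arrives.
        for (int i = 0; i < itemCount; i++)
        {
            queue.Add(Tuple.Create("record " + i, new byte[] { (byte)i }));
        }

        // Signals that no more items will come, so the consumers can finish.
        queue.CompleteAdding();
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
        return saved;
    }
}
```

In a long-running service `CompleteAdding` would only be called at shutdown; until then the consumer threads stay alive and simply wait for the next pair.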
I have a finite set of consumer threads each consuming a job. Once they process the job, they have a list of subjobs that were listed in the consumed job. I need to add the subjobs from that list that I don't already have in the database. There are 3 million in the database, so getting the list of which ones aren't already in the database is slow. I don't mind each thread blocking on that call, but since I have a race condition (see code) I have to lock them all on that slow call, so they can only call that section one at a time and my program crawls. What can I do to fix this so the threads don't slow down for that call? I tried a queue, but since the threads are pushing out lists of jobs faster than the computer can determine which ones should be added to the database, I end up with a queue that keeps growing and never empties.
My code:
IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID)
{
return subJobs.Select(el => el.name)
.Except(db.jobs.Where(el => el.set_ID==setID).Select(el => el.name));
}
//...consumer thread i
lock(lockObj)
{
var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID);
//if there was a context switch here to some thread i+1
// and that thread found uniqueJobs that also were found in thread i
// then there will be multiple copies of the same job added in the database.
// So I put this section in a lock to prevent that.
saveJobsToDatabase(uniqueJobNames, consumerSetID);
}
//continue consumer thread i...
Rather than going back to the database to check for uniqueness of job names, you could load the relevant info into an in-memory lookup structure, which allows you to check for existence much faster:
Dictionary<int, HashSet<string>> jobLookup = db.jobs.GroupBy(j => j.set_ID)
.ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(j => j.Name)));
This you only do once. Afterwards every time you need to check for uniqueness you use the lookup:
IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID)
{
var existingJobs = jobLookup.ContainsKey(setID) ? jobLookup[setID] : new HashSet<string>();
return subJobs.Select(el => el.Name)
.Except(existingJobs);
}
If you need to enter a new sub job also add it to the lookup:
lock(lockObj)
{
    // Materialize once so the names are not recomputed after the lookup changes.
    var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID).ToList();
    //if there was a context switch here to some thread i+1
    // and that thread found uniqueJobs that also were found in thread i
    // then there will be multiple copies of the same job added in the database.
    // So I put this section in a lock to prevent that.
    saveJobsToDatabase(uniqueJobNames, consumerSetID);
    // Record the new names so later checks see them without a database round trip.
    if(!jobLookup.ContainsKey(consumerSetID))
    {
        jobLookup.Add(consumerSetID, new HashSet<string>(uniqueJobNames));
    }
    else
    {
        jobLookup[consumerSetID].UnionWith(uniqueJobNames);
    }
}
Current implementation: Waits until parallelCount values are collected, uses ThreadPool to process the values, waits until all threads complete, re-collect another set of values and so on...
Code:
private static int parallelCount = 5;
private int taskIndex;
private object[] paramObjects;
// Each ThreadPool thread should access only one item of the array,
// release object when done, to be used by another thread
private object[] reusableObjects = new object[parallelCount];
private void MultiThreadedGenerate(object paramObject)
{
paramObjects[taskIndex] = paramObject;
taskIndex++;
if (taskIndex == parallelCount)
{
MultiThreadedGenerate();
// Reset
taskIndex = 0;
}
}
/*
* Called when 'paramObjects' array gets filled
*/
private void MultiThreadedGenerate()
{
int remainingToGenerate = paramObjects.Length;
resetEvent.Reset();
for (int i = 0; i < paramObjects.Length; i++)
{
ThreadPool.QueueUserWorkItem(delegate(object obj)
{
try
{
int currentIndex = (int) obj;
Generate(currentIndex, paramObjects[currentIndex], reusableObjects[currentIndex]);
}
finally
{
if (Interlocked.Decrement(ref remainingToGenerate) == 0)
{
resetEvent.Set();
}
}
}, i);
}
resetEvent.WaitOne();
}
I've seen significant performance improvements with this approach, however there are a number of issues to consider:
[1] Collecting values in paramObjects and synchronization using resetEvent can be avoided as there is no dependency between the threads (or current set of values with the next set of values). I'm only doing this to manage access to reusableObjects (when a set paramObjects is done processing, I know that all objects in reusableObjects are free, so taskIndex is reset and each new task of the next set of values will have its unique 'reusableObj' to work with).
[2] There is no real connection between the size of reusableObjects and the number of threads the ThreadPool uses. I might initialize reusableObjects to have 10 objects, and say due to some limitations, ThreadPool can run only 3 threads for my MultiThreadedGenerate() method, then I'm wasting memory.
So by getting rid of paramObjects, how can the above code be refined in a way that as soon as one thread completes its job, that thread returns its taskIndex(or the reusableObj) it used and no longer needs so that it becomes available to the next value. Also, the code should create a reUsableObject and add it to some collection only when there is a demand for it. Is using a Queue here a good idea ?
Thank you.
There's really no reason to do your own manual threading and task management any more. You could restructure this to a more loosely-coupled model using Task Parallel Library (and possibly System.Collections.Concurrent for result collation).
Performance could be further improved if you don't need to wait for a full complement of work before handing off each Task for processing.
TPL came along in .Net 4.0 but was back-ported to .Net 3.5. Download here.
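As a rough illustration of that restructuring, the sketch below starts a task per value as soon as it arrives and keeps the reusable objects in a `ConcurrentBag<T>`, creating a new one only when none is idle. It assumes .NET 4 or later; `PooledGenerator`, `GenerateAsync` and `CreatedCount` are hypothetical names, and `StringBuilder` merely stands in for the question's reusable object.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class PooledGenerator
{
    // Idle reusable objects; StringBuilder stands in for the question's reusable object.
    private static readonly ConcurrentBag<StringBuilder> pool = new ConcurrentBag<StringBuilder>();
    private static int created;

    // How many reusable objects have been created so far.
    public static int CreatedCount
    {
        get { return created; }
    }

    // Starts processing one value immediately; no batching and no reset event.
    public static Task<string> GenerateAsync(object paramObject)
    {
        return Task.Factory.StartNew(() =>
        {
            StringBuilder reusable;
            // Take an idle object if there is one, otherwise create one on demand.
            if (!pool.TryTake(out reusable))
            {
                reusable = new StringBuilder();
                Interlocked.Increment(ref created);
            }
            try
            {
                reusable.Clear();
                reusable.Append("generated:").Append(paramObject);
                return reusable.ToString();
            }
            finally
            {
                // Hand the object back so the next task can reuse it.
                pool.Add(reusable);
            }
        });
    }
}
```

With this shape the number of reusable objects never exceeds the number of tasks that actually ran at the same time, which addresses the concern in [2], and a caller that needs all results can still use `Task.WaitAll` on the returned tasks.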