So, I've got a WCF application that accepts requests to do work at a specific time. I could have a list of thousands of things to do in the future at varying times. Is there an existing framework that we can leverage to do this? The current implementation polls a database, looking for things to do based on a datetime, which smells.
A few ideas:
Timers. Set a timer when the request comes in that fires at the appropriate time. This seems like it could leave too many threads floating around.
Maintain a list of objects with a datetime in memory, and poll it for things to do.
Use a library like Quartz.NET. I have concerns as to whether it can handle the volume.
If you keep a list of tasks sorted by their trigger times, you can get by with a single timer that always activates for the first one in the list. Your database should be able to do this without any issues; if you want to keep it in memory, Power Collections has a priority queue you could use.
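As a rough sketch of that single-timer idea, assuming the list is kept in memory: a plain SortedList keyed on (due time, sequence number) stands in for the Power Collections priority queue, and Action is a placeholder for whatever represents the work. The timer is simply re-armed for whichever task is due first:

using System;
using System.Collections.Generic;
using System.Threading;

public class Scheduler
{
    private readonly object _lock = new object();
    // Sorted by due time; ties are broken by a sequence number so keys stay unique.
    private readonly SortedList<Tuple<DateTime, long>, Action> _tasks =
        new SortedList<Tuple<DateTime, long>, Action>();
    private readonly Timer _timer;
    private long _sequence;

    public Scheduler()
    {
        _timer = new Timer(OnTimer, null, Timeout.Infinite, Timeout.Infinite);
    }

    public void Schedule(DateTime dueAtUtc, Action work)
    {
        lock (_lock)
        {
            _tasks.Add(Tuple.Create(dueAtUtc, _sequence++), work);
            RescheduleTimer();   // the earliest entry may have changed
        }
    }

    private void OnTimer(object state)
    {
        var due = new List<Action>();
        lock (_lock)
        {
            // Collect everything whose trigger time has passed.
            while (_tasks.Count > 0 && _tasks.Keys[0].Item1 <= DateTime.UtcNow)
            {
                due.Add(_tasks.Values[0]);
                _tasks.RemoveAt(0);
            }
            RescheduleTimer();
        }
        foreach (Action work in due)
            work();   // or hand off to the thread pool / a worker
    }

    // Arm the timer for the first task in the list, or disable it if none remain.
    private void RescheduleTimer()
    {
        if (_tasks.Count == 0)
        {
            _timer.Change(Timeout.Infinite, Timeout.Infinite);
            return;
        }
        TimeSpan delay = _tasks.Keys[0].Item1 - DateTime.UtcNow;
        if (delay < TimeSpan.Zero) delay = TimeSpan.Zero;
        _timer.Change(delay, TimeSpan.FromMilliseconds(-1));   // one-shot
    }
}

The same shape works against the database: ORDER BY the trigger column, take the first row, and sleep until its time.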
I am developing a C# WinForms GUI that revolves around a few DataGridViews that need to be updated at some manageable interval depending on the number of updates, somewhere around the 1-3 second mark.
Currently, upon startup, it retrieves the current batch of data for the grids from a MySQL database, then listens to various Redis channels for updates/adds/deletes to those grids. These updates are handled by the Redis multiplexer, heavy calculations are offloaded to worker threads, and the GUI is then updated. So essentially it is one to one: one update/add/delete is processed and the GUI is then updated. This works well enough so far; however, during a crunch of data I'm beginning to notice slowness, which is worrying since I'm expecting much heavier loads in the future.
As system throughput grows (it is currently at most around 100 Redis messages every couple of seconds), this should be designed to handle thousands.
Conceptually, when it comes to general GUI design in this fashion, would it be better to do one of the following:
1. Decouple from the current one-to-one scenario described above (Redis msg -> process -> update GUI): have all Redis messages queue up in a list or DataTable, then have the GUI poll that pending-update queue on a timer and apply updates in batches. This way the GUI is not flooded; it updates on its own schedule. (A rough sketch of this option follows the question.)
2. Since these updates coming from Redis are also persisted in the MySQL database, ignore Redis completely and query the database at some timed interval. However, this would probably amount to re-pulling everything, since it will be tough to know what has changed since the last pull.
3. Do away with attempting to update the GUI in semi-realtime fashion and only provide a summary view; if the user digs in, retrieve data accordingly. This still runs into the same problem, since the data then being viewed should also be updated, albeit a smaller subset. However, there are plenty of sophisticated, enterprise-level C# applications, especially in the finance industry, that present large amounts of updating data and seem to work just fine.
What is best practice here? I prefer options 1 or 2 because in theory they should be able to work.
Thank you in advance.
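A minimal sketch of option 1, assuming the Redis handler callbacks run on worker threads; GridUpdate, dataGridView1, and ApplyToGrid are placeholders for your own types and logic:

using System;
using System.Collections.Concurrent;
using System.Windows.Forms;

public class GridUpdate { /* whatever a single update/add/delete carries */ }

public partial class MainForm : Form
{
    // Redis handler threads enqueue here; ConcurrentQueue needs no locking.
    private readonly ConcurrentQueue<GridUpdate> _pending = new ConcurrentQueue<GridUpdate>();
    private readonly Timer _flushTimer = new Timer();   // WinForms timer: ticks on the UI thread

    public MainForm()
    {
        InitializeComponent();
        _flushTimer.Interval = 1000;                    // flush once a second; tune as needed
        _flushTimer.Tick += FlushPendingUpdates;
        _flushTimer.Start();
    }

    // Called from the Redis multiplexer's handler threads.
    public void OnRedisMessage(GridUpdate update)
    {
        _pending.Enqueue(update);
    }

    private void FlushPendingUpdates(object sender, EventArgs e)
    {
        dataGridView1.SuspendLayout();                  // batch the repaints
        GridUpdate update;
        while (_pending.TryDequeue(out update))
            ApplyToGrid(update);                        // existing add/update/delete logic
        dataGridView1.ResumeLayout();
    }

    private void ApplyToGrid(GridUpdate update) { /* ... */ }
}

The GUI's update rate is now fixed by the timer interval regardless of how fast messages arrive, which is exactly the decoupling option 1 describes.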
So the question is long but pretty self explanatory. I have an app that runs on multiple servers and uses parallel looping to handle objects coming out of a MongoDB collection. Since MongoDB allows concurrent reads, I cannot stop multiple processes and/or servers from grabbing the same document from the collection and duplicating work.
The program is such that the app waits for information to appear, does some work to figure out what to do with it, then deletes the document once it's done. What I hope to achieve is this: if I could keep documents from being accessed at the same time (knowing that once one has been read it will eventually be deleted), I could speed up my overall throughput a bit by reducing the number of duplicates and letting the apps grab things that aren't already being worked on.
I don't think pessimistic locking is quite what I'm looking for, but maybe I misunderstood the concept. Also, if alternative setups are being used to solve the same problem, I would love to hear about them.
Thanks!
What I hope to achieve is that if I could keep documents from being accessed at the same time
The simplest way to achieve this is by introducing a dispatcher process. Add a dedicated process that just watches for changes, then delegates or dispatches the tasks out to multiple workers.
The process could utilise MongoDB Change Streams to access real-time data changes on a single collection, a database, or an entire deployment. Once it receives a change document, it simply sends it to a worker for processing.
This also means multiple workers no longer try to grab the same tasks, so they need no back-off logic.
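A hedged sketch with the MongoDB .NET driver, assuming MongoDB 3.6+ running as a replica set (change streams require one); ProcessDocument and the connection details are placeholders:

using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public class Dispatcher
{
    public static void Main()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var collection = client.GetDatabase("work").GetCollection<BsonDocument>("tasks");

        // Watch for inserts only; each change is handed to exactly one worker.
        var pipeline = new EmptyPipelineDefinition<ChangeStreamDocument<BsonDocument>>()
            .Match(change => change.OperationType == ChangeStreamOperationType.Insert);

        using (var cursor = collection.Watch(pipeline))
        {
            foreach (var change in cursor.ToEnumerable())
            {
                BsonDocument doc = change.FullDocument;
                Task.Run(() => ProcessDocument(doc));   // dispatch to a worker
            }
        }
    }

    private static void ProcessDocument(BsonDocument doc)
    {
        // ... do the work, then delete the document ...
    }
}

Because only the dispatcher reads the stream, each document is dispatched exactly once, no matter how many workers you run behind it.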
First off, I will be talking about some legacy code, and we are trying to avoid changing it as much as possible. Also, my experience with Windows services and WCF is a bit limited, so some of my questions may be a bit newbie. Just to give a bit of context before the question.
We have an existing service that loops. It checks via a database call to see if it has records to process. If it does not find any records, it sleeps for 30 seconds and then wakes back up to try again.
I would like to add an entry point to this service that would allow me to pass a record to it, in addition to it processing the records from the database. So the basic flow would be:
Loop
* Read record from database
* If no record from DB, process any records that were passed in via the entry point.
* If there are no records at all, sleep for 30 seconds.
Is it possible to implement this in one service, such that I have the looping process but also allow calls to come in at any time and add additional items to a queue that can be processed within the loop? My concern is with concurrency and keeping the loop and the listener from stepping on each other.
I know this question may not be worded quite right but I am on the new side with working with this. Any help would be appreciated.
My concern is with concurrency and keeping the loop and the listener from stepping on each other.
This shouldn't be an issue, provided you synchronize access correctly.
The simplest option might be to use a thread safe collection, such as a ConcurrentQueue<T>, to hold your items to process. The WCF service can just add items to the collection without worry, and your next processing step would handle it. The synchronization in this case is really minimal, as the queue would already be fully thread safe.
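A minimal sketch of that hand-off, where WorkRecord, Process, and ProcessDatabaseRecords stand in for your existing types and logic:

using System;
using System.Collections.Concurrent;
using System.Threading;

public class WorkerService
{
    private readonly ConcurrentQueue<WorkRecord> _pending = new ConcurrentQueue<WorkRecord>();

    // Called by the WCF operation whenever a record is passed in; fully thread safe.
    public void SubmitRecord(WorkRecord record)
    {
        _pending.Enqueue(record);
    }

    // The existing service loop, extended to drain the queue.
    public void RunLoop()
    {
        while (true)
        {
            bool didWork = ProcessDatabaseRecords();

            WorkRecord record;
            while (_pending.TryDequeue(out record))
            {
                Process(record);
                didWork = true;
            }

            if (!didWork)
                Thread.Sleep(TimeSpan.FromSeconds(30));
        }
    }

    private bool ProcessDatabaseRecords() { /* existing DB polling */ return false; }
    private void Process(WorkRecord record) { /* existing processing */ }
}

public class WorkRecord { /* placeholder for the record type */ }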
In addition to Reed's excellent answer, you might want to persist the records in an MSMQ queue to prevent losing records on shutdown, restart, or crash of your service.
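A hedged sketch of that with System.Messaging; the queue path is an arbitrary choice, and WorkRecord is the same placeholder as above:

using System;
using System.Messaging;

public static class DurableBuffer
{
    private const string QueuePath = @".\private$\WorkRecords";

    public static void Enqueue(WorkRecord record)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath, true);   // true = transactional queue

        using (var queue = new MessageQueue(QueuePath))
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            queue.Send(record, tx);   // serialized with the default XML formatter
            tx.Commit();
        }
    }

    // Returns null when nothing is pending, so the service loop stays non-blocking.
    public static WorkRecord TryDequeue()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(WorkRecord) });
            try
            {
                Message message = queue.Receive(TimeSpan.FromMilliseconds(100));
                return (WorkRecord)message.Body;
            }
            catch (MessageQueueException)
            {
                return null;   // receive timed out: queue is empty
            }
        }
    }
}

Anything enqueued here survives a process restart, at the cost of the MSMQ dependency.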
I have been given a Windows service, written by a previous intern at my current internship, that monitors an archive and alerts specific people through emails and pop-ups should one of the recorded values go outside a certain range. It currently uses a timer to check the archive every 30 seconds, and I have been asked whether I could update it to allow a choice of interval depending on which "tag" is being monitored. It uses an XML file to keep track of which tags are being monitored. Would creating multiple timers in the service be the most efficient way of going about this? I'm not really sure what approach to take.
The service is written in C# using .NET 3.5.
Depending on the granularity, you could use a single timer whose interval is a common factor of the timing intervals they want. Say the XML file specifies that each archive is to be checked every so many minutes: set up a timer that goes off once a minute, and on each tick check how long it has been since you last handled each tag and whether it is due (see the sketch below).
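A rough sketch of that, assuming each tag's interval is loaded from the XML file; TagMonitor and CheckArchive are placeholders:

using System;
using System.Collections.Generic;
using System.Timers;

public class TagMonitor
{
    public string Name;
    public TimeSpan Interval;       // per-tag check interval from the XML file
    public DateTime LastChecked;
}

public class ArchiveWatcher
{
    private readonly List<TagMonitor> _tags;
    private readonly Timer _timer = new Timer(60000);   // one tick per minute

    public ArchiveWatcher(List<TagMonitor> tags)
    {
        _tags = tags;
        _timer.Elapsed += OnTick;
        _timer.Start();
    }

    private void OnTick(object sender, ElapsedEventArgs e)
    {
        DateTime now = DateTime.UtcNow;
        foreach (TagMonitor tag in _tags)
        {
            // Only process tags whose interval has elapsed since the last check.
            if (now - tag.LastChecked >= tag.Interval)
            {
                CheckArchive(tag);          // existing range check and alerting
                tag.LastChecked = now;
            }
        }
    }

    private void CheckArchive(TagMonitor tag) { /* existing logic */ }
}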
If you're getting a chance to re-architect, I would move away from a service to a set of scheduled tasks. Write it so one task does one archive. Then write a controller program that sets up the scheduled tasks (and can stop them, change them etc.) The API for scheduled tasks on Windows 7 is nice and understandable, and unlike a service you can impose restrictions like "don't do it if the computer is on battery" or "only do it if the machine is idle" along with your preferences for what to do if a chance to run the task was missed. 7 or 8 scheduled tasks, each on their own schedule, using the same API of yours, passing in the archive path and the email address, is a lot neater than one service trying to juggle everything at once. Plus the machine will start up faster when you don't have yet another autostart service on it.
Efficient? Possibly not - especially if you have lots of tags, as each timer takes a tiny but finite amount of resources.
An alternative approach might be to have one timer that fires every second, and when that happens you check a list of outstanding requests.
This has the benefit of being easier to debug if things go wrong as there's only one active thread.
As in most code maintenance situations, however, it depends on your existing code, your ability, and how you feel more comfortable.
I would suggest just using one timer scheduled at the greatest common divisor of the desired intervals.
For example, configure your timer to signal every second; you can then handle every interval (1 second, 2 seconds, ...) by counting the appropriate number of timer ticks.
I will say this right off the bat: I am an amateur at threading. I am a senior C# web developer, but I have a project that requires me to populate a lot of objects that take a long time to populate, as they require WebRequests and Responses. I have everything working without threading, but it does not run fast enough for my requirements. I would like to pass everything to a ThreadPool so the threading is managed for me, as I may be queuing up 20,000 work items at the same time, and for obvious reasons I do not want to hit a website with all of the requests needed to populate them at once.
What I would like to do is pass in an object, populate it, and then add it to a collection in the main thread once it is populated. Then, once all the objects are populated, continue on with execution of the program. I also do not know how many objects will need to be populated until they are all populated.
My question...What is the best approach to doing this?
Here is the loop that I am trying to speed up:
// Look up the element type and the Add method once; they do not change per iteration.
Type itemType = currentObject.GetType().GetGenericArguments()[0];
MethodInfo addMethod = currentObject.GetType().GetMethod("Add");

foreach (HElement hElement in repeatingTag.RunRepeatingTagInstruction())
{
    // Create an instance of the collection's element type and populate it.
    object newObject = Activator.CreateInstance(itemType);
    List<XElement> ordering = GetOrdering(tagInstructions.Attribute("type").Value);
    RunOrdering(ordering, newObject, hElement);

    // Add the populated object to the collection via reflection.
    addMethod.Invoke(currentObject, new[] { newObject });
}
I don't know what the object is beforehand, so I create it using the Activator. The RunOrdering method runs through the instructions that I pass, which tell it how to populate the object. Then I add it to the collection. The object itself may also have properties that will require this method to run through and populate their data.
Since you probably have to wait for them all to be complete, all you need is a Parallel.ForEach() or equivalent, plus a thread-safe collection. Note that for I/O-intensive tasks you will want to limit the number of threads; 20,000 threads would be insane in any situation.
But we would need to see more details (code). Note that there is no such thing as "a collection in the main thread".
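Something along these lines, as a sketch rather than a drop-in: MaxDegreeOfParallelism caps the concurrent requests, and PopulateFromWeb, MyObject, and WorkItem are placeholders for the question's types:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Populator
{
    // Populate every item in parallel and return the results once all are done.
    public static ICollection<MyObject> PopulateAll(IEnumerable<WorkItem> workItems)
    {
        var results = new ConcurrentBag<MyObject>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };   // cap concurrent requests

        Parallel.ForEach(workItems, options, item =>
        {
            MyObject populated = PopulateFromWeb(item);   // the slow WebRequest/Response call
            results.Add(populated);                       // ConcurrentBag is thread safe
        });

        // Parallel.ForEach blocks until every item has completed, so by the time
        // we get here all objects are populated and execution can continue.
        return results;
    }

    private static MyObject PopulateFromWeb(WorkItem item)
    {
        /* build the WebRequest, parse the response, populate the object */
        return new MyObject();
    }
}

public class MyObject { }
public class WorkItem { }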
populate a lot of objects that take a long time to populate as they require WebRequests and Responses
Avoid threading if you are doing web requests.
There is no speedup after two threads, and barely any with two; by default .NET allows only two concurrent HTTP connections per host (ServicePointManager.DefaultConnectionLimit), so that becomes the bottleneck.
A lot of trouble for nothing.
Couple of suggestions:
If you are on .NET 4, try using Tasks instead; you would have much better control over scheduling. Try not to share any objects, make them immutable, and heed all the usual warnings and best practices about synchronisation and shared data (a small sketch follows below).
And secondly, you might want to think about an out-of-process solution like message queues (xMQ products, or a poor man's database table as a queue) so you would have the chance to distribute your tasks over multiple machines if you need to.
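For the first suggestion, a brief sketch on .NET 4 (Task.Run only arrived in 4.5, hence Task.Factory.StartNew); it reuses the placeholder MyObject and WorkItem types from the sketch above:

using System.Threading.Tasks;

public static class TaskPopulator
{
    public static MyObject[] PopulateAll(WorkItem[] workItems)
    {
        var tasks = new Task<MyObject>[workItems.Length];
        for (int i = 0; i < workItems.Length; i++)
        {
            WorkItem item = workItems[i];   // capture a fresh variable per iteration
            tasks[i] = Task.Factory.StartNew(() => PopulateFromWeb(item));
        }

        Task.WaitAll(tasks);                // block until every populate has finished

        var results = new MyObject[tasks.Length];
        for (int i = 0; i < tasks.Length; i++)
            results[i] = tasks[i].Result;
        return results;
    }

    private static MyObject PopulateFromWeb(WorkItem item)
    {
        /* the WebRequest/Response work goes here */
        return new MyObject();
    }
}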