C# Windows Service queue with Pool

EDIT: Context
I have to develop an ASP.NET web application that will allow users to create a "conversion request" for one or several CAO files.
My application should just upload the files to a directory where I can convert them.
I then want a service that checks the database updated by the web application to see if a conversion is waiting to be done. Then I have to call a batch command on the file with some arguments.
The thing is that those conversions can freeze if the user supplies a badly built CAO file. In that case, we want the user or an admin to be able to cancel the conversion process.
The batch command used to convert is a third-party tool that I can't change. It needs a token to convert, and I can multithread as long as I have more than one token available. Another application can use those tokens at the same moment, so I can't just have a pre-sized thread pool matching the number of tokens I have.
The only way I have to know whether I can convert is to start the conversion with the command and check whether the logs tell me I have a licence problem, which means either "No token available" or "Current licence doesn't accept the given input format". As I allow only supported input formats in the web application, I can't hit the second problem.
The web application is almost done: I can upload files and download the results and conversion logs at the end. Now I need to build the service that will take the input files, convert them, update the conversion status in the database and finally add the files to the correct download directory.
I have to work on a service that polls the database at a high frequency (maybe every 5 or 10 seconds) for rows set to "Ready to convert" or "Must be canceled".
If the row is set to "Ready to convert" I must try to convert it, but I do it using a third-party DLL with a licence token system that allows me to run only 5 conversions simultaneously at the moment.
If the row is set to "Must be canceled" I must kill the conversion, either because it froze and the admin had to kill it or because the user canceled his own task.
Also, conversion times can be very long, from 1 or 2 seconds to several hours, depending on the file size and how it was created.
I was thinking about a pooling system, as I saw here:
Stackoverflow answer
A pooling system gives me the advantage of isolating the database-reading part from the conversion process. But I have the feeling that I lose a kind of control over the background processes, which is maybe just because I'm not used to them.
I'm not very used to services, and even if this pool system seems good, I don't know how I can cancel a task if needed.
The tool I use to convert works as a simple batch command that just returns an error if no licence is available right now. Using a pool, how can I make the conversion thread wait until the conversion can be done when no licence is available? Is a simple infinite while loop an appropriate answer? It seems quite bad to me.
Finally, I can't just use a "5-thread pool", as those licences are also used by 2 other applications, which means I never know how many of them are available at any given time.
The whole idea of using a pool may also be wrong; as I said, I'm not very used to services, and before starting something stupid, I'd rather ask about the right way to do it.
Moreover, about the database reading/writing, even if I think the second option is better, should I:
Use the big model files I already use in my ASP.NET web application, which will create a lot of objects (one per row, as they are entity models)?
Or not use entity models but lighter models, which will be less entity-oriented but probably less resource-demanding? This will also be harder to maintain.
So I'm asking more for advice on how I should do it than for a pure code answer, but some examples could be very useful.
EDIT: to be more precise, I need to find a way to:
(For the moment, I'll stay with only one licence available; I will evolve it later if needed.)
Have a service that runs as a loop and, if possible, starts a new thread for the given request. The service must keep running, as the status can be set to "Require to be canceled" at any time.
At the moment I'm thinking about a task with a cancellation token, which would probably achieve this.
But if the task finds that the token isn't currently available, how can I tell the main loop of the service that it can't process now? I was thinking of just returning an integer value, where the return code would indicate the reason for ending: Cancellation / No token / Ended... Is that a good way to do it?
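The return-code idea above could be sketched as a task that reports a small outcome enum instead of a bare integer. This is only a sketch: `RunConverter` and the log string it checks are placeholders, since the real third-party batch tool and its log wording aren't shown in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

enum ConversionOutcome { Completed, Canceled, NoTokenAvailable, Failed }

class ConversionWorker
{
    public Task<ConversionOutcome> ConvertAsync(string inputFile, CancellationToken ct)
    {
        return Task.Run(() =>
        {
            // Hypothetical call to the batch tool; returns its log output.
            string log = RunConverter(inputFile, ct);

            if (ct.IsCancellationRequested)
                return ConversionOutcome.Canceled;
            if (log.Contains("No token available")) // placeholder for the tool's real log message
                return ConversionOutcome.NoTokenAvailable;
            return ConversionOutcome.Completed;
        });
    }

    static string RunConverter(string inputFile, CancellationToken ct)
    {
        // Placeholder: start the third-party batch process here and kill it
        // if ct is signalled. Returns the tool's log output.
        return "conversion finished";
    }
}
```

On `NoTokenAvailable`, the main loop can simply leave the row flagged "Ready to convert" and retry it on the next polling pass, which avoids the infinite while loop the question worries about.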

What I'm hearing is that the biggest bottleneck in your process is the conversion; pooling / object mapping / direct SQL don't sound as important as the conversion bottleneck.
There are several ways to solve this depending on your environment and what your constraints are. Good, fast, and cheap... pick 2.
As far as "best practice" goes, there are large-scale solutions (ESBs, message-queue-based systems, etc.), there are small-scale solutions (console apps, batch files, PowerShell scripts on the Windows scheduler, etc.) and the in-between solutions (this is typically where the "good enough" answer is found). The volume of the stuff you need to process should decide which one is the best fit for you.
Regardless of which you choose above...
My first step will be to eliminate variables...
How much volume will you be processing? If you wrote something that's not optimized but works, would that be enough to process your current volume? (e.g. a console app run by the Windows Task Scheduler every 10-15 seconds, killed if it runs for more than, say, 15 minutes)
Does the third party dll support multi-threading? If no, that eliminates all your multi-threading related questions and narrows down your options on how to scale out.
You can then determine what approach will fit your problem domain...
will it be the same service deployed on several machines, each hitting the database every 10-15 seconds?, or
will it be one service on one machine using multi-threading?
will it be something else (pooling might or might not be in play)?
Once you get the answer above, the next question is:
will the work required fit within the allocated budget and time constraints of your project? If not, go back to the questions above and see if there are any you can answer differently that would change the answer to this question to yes.
I hope that these questions help you get to a good answer.

Related

Using TPL in .NET

I have to refactor a fairly time-consuming process in one of my applications and, after doing some research, I think it's a perfect match for the TPL. I wanted to clarify my understanding of it and ask whether there are any more issues I should take into account.
In a few words, I have a Windows service which runs overnight and sends out emails with data updates to around 10,000 users. At present, the whole process takes around 8 hours to complete. I would like to reduce it to 2 hours max.
Application workflow follows steps below:
1. Iterate through all users list
2. Check if this user has to be notified
3. If so, create an email body by calling external service
4. Send an email
Analysis of the code has shown that step 3 is the most time-consuming one, taking around 3.5 seconds to complete. It means that when processing 10,000 users, my application waits well over 6 hours in total for responses from the external service! I think this is reason enough to try to introduce some asynchronous and parallel processing.
So, my plan is to use the Parallel class and its ForEach method to iterate through the users in step 1. As I understand it, this should distribute the processing of each user onto separate threads, making them run in parallel. The processes are completely independent of each other and none of them returns any value; if an exception is thrown, it will be persisted in the logs DB. Regarding step 3, I would like to convert the call to the external service into an async call. As I understand it, this would release the thread's resources so it could be reused by the Parallel class to start processing the next user from the list?
I had a read through the MS documentation regarding the TPL, especially the Potential Pitfalls in Data and Task Parallelism document, and the only point I'm not sure about is "Avoid Writing to Shared Memory Locations". I am using a local integer to count the total number of emails processed. As for all the rest, I'm quite positive it isn't applicable to my scenario.
My question is (without any implementation as yet): is what I'm trying to achieve possible (especially the async/await part for the external service call)? Should I be aware of any other obstacles that might affect my implementation? Is there any better way of improving the workflow?
Just to clarify, I'm using .NET 4.0.
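The plan above might look roughly like the sketch below. Note that async/await itself requires .NET 4.5 or the Async CTP, so on plain .NET 4.0 a bounded `Parallel.ForEach` is the simpler route; `BuildEmailBody` is a stand-in for the external service from step 3, the degree of parallelism of 16 is an arbitrary placeholder to be tuned by measurement, and the shared counter uses `Interlocked` to address the "shared memory locations" concern.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Notifier
{
    // Placeholder for step 3, the external body-building call (~3.5 s in reality).
    static string BuildEmailBody(int userId)
    {
        return "body for user " + userId;
    }

    public static int NotifyAll(int[] userIds)
    {
        int sent = 0; // shared across threads: increment with Interlocked, not ++

        Parallel.ForEach(
            userIds,
            new ParallelOptions { MaxDegreeOfParallelism = 16 }, // tune from load testing
            id =>
            {
                string body = BuildEmailBody(id);
                // SendEmail(id, body); // step 4, omitted here
                Interlocked.Increment(ref sent);
            });

        return sent;
    }
}
```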
Yes, you can use the TPL for your problem. If you cannot influence your external service, then this might be the best way.
However, you can make the biggest gains if you can get your external source to accept batches, because then that source can actually optimize its performance. Right now you have the overhead of 10,000 messages to serialize, send, work on, receive and deserialize; this is stuff that could be done once. In addition, your external source might be able to optimize the work it does if it knows it will get multiple records.
So the bottom line is: if you need to optimize locally, the TPL is fine. If you want to optimize your whole process for actual gains, try to find out if your external source can help you, because that is where you can make some real progress.
You didn't show any code, and I'm assuming that step 4 (send an e-mail) is not that fast either.
In the presented case, unless your external service from step 3 (create an email body by calling an external service) processes requests in parallel and supports a good load of simultaneous requests, you will not gain much from this refactor.
In other words, test the external service and the e-mail server first for:
Parallel request execution
The way to test this is to send at least 2 simultaneous requests and observe how long it takes to process them.
If it takes about double the time of a single request, there is some serial processing: either the requests are queued or some broad lock is being taken.
Load test
Go up to 4, 8, 12, 16, 20, etc, and see where it starts to degrade.
You should set a limit on the number of simultaneous requests, keeping it at a level where execution time stays above e.g. 80% of the time it takes to process a single request, assuming you're the sole consumer,
or a few requests below the point where it starts degrading (e.g. divide by the number of consumers), to leave the external service available for other consumers.
Only then can you decide whether the refactor is worth it. If you can't change the external service or the e-mail server, you must weigh whether they offer enough parallel capability without degrading.
Even so, be realistic. Don't let your service push the external service and the e-mail server to their limits in production.
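One way to enforce the limit suggested above is a semaphore gate around the external call. This is only a sketch: the cap of 4 is an arbitrary placeholder for whatever your load test finds, and `request` stands in for the real service call.

```csharp
using System;
using System.Threading;

class BoundedCaller
{
    // 4 is a placeholder; replace with the limit found by load testing.
    static readonly SemaphoreSlim Gate = new SemaphoreSlim(4, 4);

    public static T Call<T>(Func<T> request)
    {
        Gate.Wait(); // blocks while 4 requests are already in flight
        try { return request(); }
        finally { Gate.Release(); }
    }
}
```

Worker threads can then call the service through `BoundedCaller.Call` and the external dependency never sees more than the configured number of simultaneous requests.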

Request timeout error while processing long tasks

I have a C# ASP.NET management system with a button that calls a SQL Server query to get 90,000 strings of text in multiple languages, categorized into sections. This in turn is sorted, and 150 binary files are made before saving them as a .ZIP and emailing the user with the results. The total time to process this and email the results is about 6 minutes, during which the web page sits waiting for the whole process to complete. I would like to be able to press the start button and then let this work away in the background while I continue using the web management system, but I am unsure what the most efficient method for doing this is. I initially created an .asmx file thinking this would work, but the result is the same, so I am now looking at async and await. Can anyone give me any pointers on this and let me know if I am on the right track? I don't currently need anything back to tell me the process has completed successfully, as I can handle that by emailing the user if something went wrong. The reason for this is that the user could be on any number of pages.
There are probably a few ways to tackle this problem. Your options will vary based on what version of .NET you are using, so I'll not post code directly; however, you can implement the concept I describe using ASMX web services, WCF, MVC, and so on.
Start-and-poll Approach
The classic response for this kind of problem is to implement a StartSomething() method and a GetProgress() method. A very-simple example of this approach using ASMX-based web services is presented here.
In the example, one service method is used to start a process on a background thread. Myself, I would change the example by having the start method return a value to the client to identify which background process was started, as you could feasibly have several going on at a time.
The client then can call a separate method to get progress updates, and continue polling until the process is complete.
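Stripped of the ASMX plumbing, the start-and-poll pattern could be sketched like this. The short loop inside the background task is a stand-in for the real zip-and-email job, and the names `StartSomething`/`GetProgress` are just the ones used above, not an existing API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ProgressTracker
{
    static readonly ConcurrentDictionary<Guid, int> Progress =
        new ConcurrentDictionary<Guid, int>();

    // Starts the long job on a background thread and returns an id the
    // client can use to poll for progress (several jobs can run at once).
    public static Guid StartSomething()
    {
        var id = Guid.NewGuid();
        Progress[id] = 0;
        Task.Run(() =>
        {
            for (int pct = 20; pct <= 100; pct += 20)
            {
                Thread.Sleep(10); // stand-in for a slice of real work
                Progress[id] = pct;
            }
        });
        return id;
    }

    public static int GetProgress(Guid id)
    {
        int pct;
        return Progress.TryGetValue(id, out pct) ? pct : -1; // -1: unknown id
    }
}
```

A real service would expose these two methods over ASMX/WCF/MVC and persist progress somewhere more durable than process memory, for exactly the app-pool-recycling reason discussed next.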
There are a number of reasons why you should prefer to do lengthy background processing in a non-IIS service. I recommend using a Windows service to protect yourself from IIS somewhat-randomly restarting your application pool in the middle of a big job.
WebSockets
Another option worth some exploration on your part is to use WebSockets, which allow the server to contact a modern browser when the process is complete. The main advantage of this approach is that the client does not need to busily poll the service for updates. Its primary disadvantage is that WebSockets are new enough that there are still plenty of browsers that could not be clients for such a service.
Good luck!

How would I create a long running process in C#?

This may seem very elementary, but I don't really have any experience in this realm - all my experience has been on the web side of things.
I need to create a process of some sort that will repeatedly query an API (around 5 times a second), get the results from the API (in JSON format), and then my process will do what it needs to do with the results (in my case, insert them into a SQL database). These details don't really matter to the scope of the question I have, I just want to give you an idea on what I'm trying to achieve in case someone wants to recommend a better way of doing it.
My first thought was to create a console app that basically never quits (unless I specifically tell it to). Is a console app the way to go for this? The idea is I'll have a VM set up which will host my solution, including this "process" I create. I'm not all that familiar with Windows Services or Windows Tasks, but I probably need to write some custom code, so I imagine I can't use the Windows Task Scheduler. Am I right?
One option would be to create a Windows Service, which is the OS-level implementation of a long-running process. To do so in C# you may wish to read through some tutorials online, perhaps starting with the MSDN Walkthrough. You should also read about Windows Services in general and the differences between a service and a regular user process (mainly the fact that services have no UI and can't interact with the user directly, and some of the other security considerations).
Other options may be to leverage a framework such as WCF or similar.
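Whichever host you pick (a console app or the `OnStart` of a Windows Service), the core of the process described in the question is a cancellable polling loop. In this sketch, `PollOnce` is a placeholder for the real API call, JSON parsing and SQL insert:

```csharp
using System;
using System.Threading;

class Poller
{
    // Placeholder for the real work: call the API, parse the JSON, insert into SQL.
    public static Action PollOnce = () => { };

    public static int Run(TimeSpan interval, CancellationToken ct)
    {
        int polls = 0;
        while (!ct.IsCancellationRequested)
        {
            PollOnce();
            polls++;
            // Sleep between polls, but wake immediately if cancellation is requested.
            if (ct.WaitHandle.WaitOne(interval))
                break;
        }
        return polls;
    }
}
```

For roughly 5 calls per second the interval would be about 200 ms; a service's `OnStop` would signal the `CancellationTokenSource` so the loop exits cleanly.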

How should I complete this type of notification?

I am basically creating a site for recruiters. One piece of functionality in my application requires posting to Facebook periodically. The posting frequency can be from 0 (Never) to 4 (High).
E.g. if a recruiter has 4 open jobs and has the posting frequency set to 4, each job should be posted in turn: the 1st job on the 1st day, the 2nd job on the 2nd, the 3rd job on the 3rd, etc., and on the 5th day the 1st job again (round-robin fashion).
Had he set the posting frequency to 2, two jobs would be posted daily (thus each job would be posted every 2 days).
My only question is what type of threading I should create for this, since it is all dynamic. Also, any guidelines on what type of information I should store in the database?
I need just a general strategy to solve this problem. No code..
I think you need to separate it from your website; I mean it's better to run the logic for posting jobs in a service hosted on IIS (I am not sure whether such a thing exists, but I guess it does).
You also need a table for the job queue to remember which jobs need to be posted; your service would then pick them up and post them one by one.
To decide whether it is time to post a job, you can define a timer with a configurable interval that checks if there is any job to post.
Make sure you keep verbose log details if posting fails. This is important because it is possible that Facebook changes its API, or your API key becomes invalid, or anything else; then you need to know what happened.
I also strongly suggest having a webpage that reports the status of the jobs-to-post queue and, if jobs failed, the causes of the problem.
If your program runs non-stop, you can just use one of the Timer classes available in the .NET Framework, without needing to go for full-blown concurrency (e.g. via the Task Parallel Library).
I suspect, though, that you'll need more than that: some kind of mechanism to detect which jobs were successfully posted and which were "missed" due to the program not running (or network problems, etc.), so they can be posted the next time the program is started (or the network becomes available). A small local database (such as SQLite or MS SQL Server Compact) should serve this purpose nicely.
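A minimal sketch of the timer idea, using `System.Threading.Timer`; `checkAndPost` stands in for the "check the queue and post one job" logic, which is not shown here:

```csharp
using System;
using System.Threading;

class PostScheduler
{
    // Fires checkAndPost immediately and then once per interval until the
    // returned Timer is disposed.
    public static Timer Start(TimeSpan interval, Action checkAndPost)
    {
        return new Timer(_ => checkAndPost(), null, TimeSpan.Zero, interval);
    }
}
```

The interval can come from configuration, which keeps the "how often do we check" decision out of the code.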
If the requirements are as simple as you described, then I wouldn't use threading at all. It wouldn't even need to be a long-running app. I'd create a simple app that would just try to post a job and then exit immediately. However, I would schedule it to run once every given period (via the Windows Task Scheduler).
This app would first check whether it has already posted a job for the given posting frequency. Maybe put a "Last-Successful-Post-Time" setting in your datastore. If it's allowed to post, the app would just query the highest-priority job and post it to Facebook. Once it successfully posts, that job would be downgraded to the lowest priority.
The job priority could just be a simple integer column in your datastore. Lower values mean higher priorities.
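The priority-column scheme could be sketched like this (here against an in-memory list rather than the datastore; `PostToFacebook` is a placeholder for the real API call):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Job
{
    public string Name;
    public int Priority; // lower value = posted sooner
}

class RoundRobinPoster
{
    // Post the job with the lowest priority value, then push it to the back
    // of the line so the jobs rotate in round-robin fashion.
    public static string PostNext(List<Job> jobs)
    {
        if (jobs.Count == 0) return null;
        Job next = jobs.OrderBy(j => j.Priority).First();
        // PostToFacebook(next); // placeholder for the real API call
        next.Priority = jobs.Max(j => j.Priority) + 1; // downgrade to lowest priority
        return next.Name;
    }
}
```

With jobs at priorities 1, 2, 3, successive runs post them as 1st, 2nd, 3rd, then the 1st again, matching the round-robin behavior the question describes.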
Edit:
I guess what I'm suggesting is: if you have clear boundaries in your requirements, break your project into multiple applications. This way there is a separation of concerns, and you won't need to worry about how to spawn your Facebook notification process from inside your website code.

Queue file operations for later when file is locked

I am trying to implement a file-based auto-increment identity value (an int value stored in a TXT file) and I am trying to come up with the best way to handle concurrency issues. This identity will be used as a unique ID for my content. When saving new content, this file gets opened, the value gets read and incremented, the new content is saved, and the incremented value is written back to the file (whether we store the next available ID or the last issued one doesn't really matter). While this is being done, another process might come along and try to save new content. The previous process opens the file with FileShare.None, so no other process will be able to read the file until it is released by the first process. While the odds of this happening are minimal, it could still happen.
Now when this does happen we have two options:
wait for the file to become available -
Emulate waiting on File.Open in C# when file is locked
we are talking about milliseconds here, so I guess this wouldn't be an issue; but if something strange happens and the file never becomes available, this solution would result in an infinite loop, so it's not ideal
implement some sort of queue and run all file operations through it. My user-experience requirements are such that at the time of saving/modifying files the user should never be informed about exceptions or that something went wrong; he would be informed about them later, through a very friendly user interface, if the operations failed on the queue too.
At the moment of writing this, the solution should work within an ASP.NET MVC application (both synchronously and asynchronously through AJAX), but, if possible, it should use concepts that could also work in a Silverlight, Windows Forms or WPF application.
With regard to those two options, which one do you think is better? And for the second option, what are possible technologies to implement it?
The ReaderWriterLockSlim class seems like a good solution for synchronizing access to the shared resource.
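A sketch of that approach for the counter file. Note the caveat in the comment: ReaderWriterLockSlim serializes threads within one process only, so cross-process callers would still need the FileShare.None open described in the question.

```csharp
using System;
using System.IO;
using System.Threading;

class FileIdentity
{
    static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();

    // Read, increment and write back the counter under an exclusive lock.
    // This serializes threads within ONE process only; other processes
    // still need the FileShare.None open described in the question.
    public static int NextId(string path)
    {
        Lock.EnterWriteLock();
        try
        {
            int current = 0;
            if (File.Exists(path))
            {
                string text = File.ReadAllText(path);
                if (text.Length > 0) current = int.Parse(text);
            }
            int next = current + 1;
            File.WriteAllText(path, next.ToString());
            return next;
        }
        finally { Lock.ExitWriteLock(); }
    }
}
```

Because every reader also writes the file back, an exclusive write lock is taken on every call; the read/write split that ReaderWriterLockSlim offers would only pay off if some callers merely inspected the current value.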
