I am using Windows Workflow Foundation 4 and as part of a more complicated scenario I have these two states:
The idea is that users are allowed to upload files as part of the whole workflow. There are several reasons why an uploaded file might not be processed immediately. One of them is that a file uploaded earlier by the same user is currently being processed, so every other file uploaded during that processing should be in the WaitingProcessing state. When I enter the WaitingProcessing state I therefore need to check for this, so I have to implement something like this:
where HasFileToProcess is a function that calls a stored procedure in the database to check whether there is currently a file for this user in the Processing state.
Almost all parts of the task are clear; the one thing I don't know is how to call a function inside the condition field. In fact I have almost zero experience with Windows Workflow, so I'm not even sure this is the way to go. As a sub-question, I would appreciate it if someone could show me a better way to implement this kind of logic.
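For illustration only, one common approach is to wrap the database check in a custom CodeActivity&lt;bool&gt;, bind its Result to a workflow variable, and use that variable in the transition's Condition. A minimal sketch, where the stored procedure name, its parameter and the connection string are all assumptions:

```csharp
using System.Activities;
using System.Data;
using System.Data.SqlClient;

// Hypothetical activity: returns true if the user already has a file in the
// Processing state. Assumes the stored procedure selects a single BIT value.
public sealed class HasFileToProcess : CodeActivity<bool>
{
    public InArgument<int> UserId { get; set; }

    protected override bool Execute(CodeActivityContext context)
    {
        using (var connection = new SqlConnection("<connection string>"))
        using (var command = new SqlCommand("dbo.HasFileToProcess", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@UserId", UserId.Get(context));
            connection.Open();
            return (bool)command.ExecuteScalar();
        }
    }
}
```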
EDIT: Context
I have to develop an ASP.NET web application that will allow users to create a "Conversion request" for one or several CAD files.
My application should just upload files to a directory where I can convert them.
I then want to have a service that will check the database updated by the web application to see if a conversion is waiting to be done. Then I have to call a batch command on this file with some arguments.
The thing is that those conversions can freeze if the user supplies a badly produced CAD file. In that case, we want the user or an admin to be able to cancel the conversion process.
The batch command used to convert is a third-party tool that I can't change. It needs a token to convert, and I can multithread as long as I have more than one token available. Another application can use those tokens at the same moment, so I can't just have a thread pool pre-sized according to the number of tokens I have.
The only way I have to know whether I can convert is to start the conversion with the command and see if the logs tell me that I have a licence problem, which means either "No token available" or "Current licence doesn't accept the given input format". As I only allow available input formats on the web application, I can't hit the second problem.
The web application is almost done, meaning that I can upload files and download the results and conversion logs at the end. Now I need to build the service that will take the input files, convert them, update the conversion status in the database and finally put the files in the correct download directory.
I have to work on a service that will poll the database at a high frequency (maybe every 5 or 10 seconds) for rows set to "Ready to convert" or "Must be canceled".
If the row is set to "ready to convert" I must try to convert it, but I do it using a third party dll that have a licence token system that allow me to do only 5 converts simultaneously atm.
If the row is set to "Must be canceled" I must kill the conversion because it's either freeze and the admin had to kill it or because the user canceled his own task.
Also, conversion times can be very long, from 1 or 2 seconds to several hours, depending on the file size and how the file was created.
I was thinking about a pooling system, as I saw here:
Stackoverflow answer
A pooling system gives me the advantage of isolating the database-reading part from the conversion process. But I have the feeling that I lose some control over the background processes, which may just be because I'm not used to them.
But I'm not very used to services, and even if this pool system seems good, I don't know how I could cancel a task if needed.
The tool I use to convert works as a simple batch command that just returns an error if no licence is available at that moment. Using a pool, how can I make the conversion thread wait until the conversion can be done when no licence is available? Is a simple infinite while loop an appropriate answer? It seems quite bad to me.
Finally, I can't just use a pool of 5 threads, as those licences are also used by 2 other applications, which means I never know how many of them are available at any given time.
The idea of using a pool may also be wrong; as I said, I'm not very used to services, and before starting something stupid I prefer to ask about the right way to do it.
Moreover, regarding the database reading/writing, even though I think the second option below is better, should I:
Use the big model files that I already use in my ASP.NET web application, which will create a lot of objects (one for each row, as they are entity models), or
Not use entity models but lighter ones, which will be less entity-oriented but probably less resource-demanding. This will also be harder to maintain.
So I'm asking more for advice on how I should do it than for a pure code answer, but some examples could be very useful.
EDIT: to be more precise, I need to find a way to:
(For the moment I'm staying with only one licence available; I will evolve this later if needed.)
Have a service that runs as a loop and, if possible, starts a new thread for the given request. The service must keep running, as the status can still be set to "Must be canceled" at any time.
At the moment I was thinking about a task with a cancellation token, which would probably achieve this.
But if the task finds that no licence token is currently available, how can I tell the main loop of the service that it can't process right now? I was thinking of using an integer return value whose code indicates the reason the task ended: Cancellation / No token / Ended... Is that a good way to do it?
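For what it's worth, a minimal sketch of that idea might look like the code below, assuming a Task per conversion, a linked CancellationTokenSource per request, and an enum in place of the raw integer return code. Every type and method name here is illustrative, and the database-access methods are stubs:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Why a conversion task ended; stands in for the integer return code.
public enum ConversionOutcome { Ended, Canceled, NoTokenAvailable }

public class ConversionService
{
    private readonly ConcurrentDictionary<int, CancellationTokenSource> _running =
        new ConcurrentDictionary<int, CancellationTokenSource>();

    public async Task RunAsync(CancellationToken serviceToken)
    {
        while (!serviceToken.IsCancellationRequested)
        {
            foreach (int id in GetRowsReadyToConvert()) // database poll, stubbed below
            {
                var cts = CancellationTokenSource.CreateLinkedTokenSource(serviceToken);
                _running[id] = cts;
                _ = Task.Run(() => Convert(id, cts.Token))
                        .ContinueWith(t =>
                        {
                            _running.TryRemove(id, out _);
                            ReportOutcome(id, t.Result); // write the outcome back to the database
                        });
            }

            foreach (int id in GetRowsToCancel()) // database poll, stubbed below
                if (_running.TryRemove(id, out var cts))
                    cts.Cancel(); // Convert() must observe this and kill the batch process

            await Task.Delay(TimeSpan.FromSeconds(5), serviceToken);
        }
    }

    private ConversionOutcome Convert(int id, CancellationToken token)
    {
        // Run the third-party batch command here, checking token.IsCancellationRequested
        // periodically and killing the process when it is set. Parse the tool's log to
        // detect "No token available" so the row can be retried on a later poll.
        if (token.IsCancellationRequested) return ConversionOutcome.Canceled;
        return ConversionOutcome.Ended;
    }

    // Stubs standing in for the real database access.
    private IEnumerable<int> GetRowsReadyToConvert() => Enumerable.Empty<int>();
    private IEnumerable<int> GetRowsToCancel() => Enumerable.Empty<int>();
    private void ReportOutcome(int id, ConversionOutcome outcome) { }
}
```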
What I'm hearing is that the biggest bottleneck in your process is the conversion... pooling / object mapping / direct SQL don't sound as important as the conversion bottleneck.
There are several ways to solve this depending on your environment and what your constraints are. Good, fast, and cheap... pick 2.
As far as "best practice" go, there are large scale solutions (ESBs, Message Queue based etc), there are small scale solutions (console apps, batch files, powershell scripts on Windows scheduler, etc) and the in-between solutions (this is typically where the "Good Enough" answer is found). The volume of the stuff you need to process should decide which one is the best fit for you.
Regardless of which you choose above...
My first step will be to eliminate variables...
How much volume will you be processing? If you wrote something that's not optimized but works, would that be enough to handle your current volume? (e.g. a console app run by the Windows Task Scheduler every 10-15 seconds and killed if it runs for more than, say, 15 minutes)
Does the third party dll support multi-threading? If no, that eliminates all your multi-threading related questions and narrows down your options on how to scale out.
You can then determine what approach will fit your problem domain...
will it be the same service deployed on several machines, each hitting the database every 10-15 seconds, or
will it be one service on one machine using multi-threading?
will it be something else (pooling might or might not be in play)?
Once you have the answers above, the next question is:
will the work required fit within the allocated budget and time constraints of your project? If not, go back to the questions above and see whether any of them can be answered differently in a way that changes this answer to yes.
I hope that these questions help you get to a good answer.
I'm currently working on a C# project for an application we'd like to develop. We're brainstorming about how to share data between users. We'd like to be able to specify a folder where all the files of the application will be saved, and we'd like to be able to put that folder on a shared location (a server, a different PC or Mac, a NAS, etc.).
The deployment would be like so:
Installation on the first PC: we choose a network drive, share, whatever, and create all the application's files in this location.
On the second PC we install the application and choose the same location (on the network); the application doesn't create anything, it sees that the files already exist and uses them as the application's data.
Same thing on the other clients
The application's files are going to be documents (most likely XML-formatted documents), and when opening the application we want to show all the existing documents. The thing is, we don't only want to list the documents and edit their content, we'd also like to be able to edit each document's properties, so in a way we'd like a file (SQLite, XML, whatever) representing the list of all the documents and their attributes. The same goes for a list of addresses.
I know all that looks exactly like a client/server solution with a database, but that solution is out of the question. I first looked at SQLite for my data files, but I know concurrency can be a real problem and file locking doesn't work well. The thing is, I would have the same problem with simple XML files (refreshing the content when several users are working, accessing locked files).
So I guess my final question is: is it feasible? Is there an alternative I didn't see that would let us do this more easily?
EDIT:
OK, I'm not responding to every post or comment because I'm currently testing concurrency with SQLite. What I did (and please correct me if my testing method is wrong) is launch X BackgroundWorkers which all insert records into a sample database (recreated every time I start the application). I tried launching 100 INSERT iterations in the database via these BackgroundWorkers.
Of course concurrency works with one application instance running: it simply waits for the previous BackgroundWorker to do its job before writing the next record. I also tried inserting at (almost) the same time by putting a loop in every BackgroundWorker that waits for a modulo-5 timestamp (every 5 seconds, every BackgroundWorker runs). Again, each insert waits for the previous one to end before doing the next, and everything works fine. I even tried it with 500 BackgroundWorkers and it worked fine.
I then tried launching my app several times and running the instances simultaneously. This time I did have some issues. With two instances of my app it was still working fine, but with 4-5 instances it got really buggy and I got two types of error: 1. database is locked, 2. disk I/O failure. But mostly locked databases.
What I did was pretty intensive; in my application's actual scenario it will never come to 5 processes simultaneously trying to insert 500 rows at the same time (maybe I'll get a concurrency of two or three connections). But what really bugged me, and what makes me think my testing method is not a good one, is that I got these errors when working on a database on a shared network location, on a NAS AND on my own HDD. Every time, it worked for maybe 30-40 queries and then threw the "database is locked" error.
Am I testing it wrong? Maybe I shouldn't be trying so hard to make this work, but I'm still not convinced that SQLite is a bad fit for what I'm trying to do, since the expected concurrency is really small.
With your optimistic/pessimistic locking, you are ultimately trying to build a database. You WILL also have consistency issues while trying to keep multiple files in sync with each other. Think about what happens if you update the "metadata" file and the write fails halfway through because of a network blip: file corruption will ensue, and you will be left trying to reconstruct things from backups.
I would suggest a couple of likely solutions:
1) Host the content yourselves and let the installations be pure clients (cloud-based deployments are ideal for this). Most network/firewall issues can be circumvented by using HTTP as your transport (web services).
2) Have one of the workstations be the "server", keeping its data files on the NFS share. This will give you transactional integrity, incremental backups, etc. There are lots of good embedded database management systems to help you manage this complexity. MS SQL Server even has some great options for this.
You're right: SQLite uses file locks on the database file, so storing all the data files in the database would create a write-starvation problem when editing your documents.
Maybe it's a better choice to implement simple optimistic/pessimistic locking yourself at the individual-file level? For example, with a pessimistic lock you simply don't allow anyone to edit a particular file if somebody is already editing it. In that case you hold a lock on just one file, not on the entire database. If the probability of a conflict (two people editing the same file at the same time) is pretty low, it is better to go with optimistic locking.
Simple optimistic locking implementation:
When a user gets a file for reading, that's OK, no problem there. If a user gets a file for editing, you can calculate a hash for the file (or take the file's last-modified timestamp), and then, when the user tries to save the edited file, compare the current hash/timestamp (at the moment of saving) to make sure the file has not been changed by somebody else. If the file has not changed, it's OK to save it. If the file has changed, the current user is out of luck and you need to inform him about it. This optimistic scenario is nice when the probability of being "out of luck" is pretty low. Otherwise it's better to stick with pessimistic locking, where you don't allow a user to even start editing a file if somebody else is already doing so.
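A minimal sketch of that hash-based check, assuming SHA-256 and text files; all names here are illustrative:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class OptimisticFileLock
{
    // Hash taken when the user opens the file for editing.
    public static string ComputeHash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
            return Convert.ToBase64String(sha.ComputeHash(stream));
    }

    // Returns false (and writes nothing) if somebody else changed the
    // file since hashAtOpen was taken; the caller then informs the user.
    public static bool TrySave(string path, string hashAtOpen, string newContent)
    {
        if (ComputeHash(path) != hashAtOpen)
            return false;

        File.WriteAllText(path, newContent);
        return true;
    }
}
```

Note that there is still a small race between the hash check and the write; in practice you would open the file with FileShare.None for the duration of the check-and-write so that the two steps are atomic with respect to other writers.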
As a result of a previous post (Architecture: simple CQS), I've been thinking about how I could build a simple system that is flexible enough to be extended later.
In other words: I don't see the need for a full-blown CQRS now, but I want it to be easy to evolve to it later, if needed.
So I was thinking of separating commands from queries, with both based on the same database.
The query part would be easy: a WCF Data Service based on views, so that it's easy to query for data. Nothing special there.
The command part is more difficult, and here's an idea: commands are of course executed asynchronously, so they don't return a result. But my ASP.NET MVC site's controllers often need feedback from a command (for example, whether the registration of a member succeeded or not). So when the controller sends a command, it also generates a transaction ID (a GUID) that is passed along with the command properties. The command service receives the command, puts it into a transactions table in the database with state 'processing', and executes it (using DDD principles). After execution, the transactions table is updated so that the state becomes 'completed' or 'failed', together with more detailed information such as the primary key that was generated.
Meanwhile, the site uses the QueryService to poll the state of this transaction until it receives 'completed' or 'failed', and then continues its work based on that result. When the transactions table is polled and the result is 'completed' or 'failed', the entry is deleted.
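A minimal sketch of that handshake, with an in-memory dictionary standing in for the transactions table; all names here are illustrative assumptions rather than the actual services:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class CommandService
{
    // Stands in for the transactions table.
    private readonly ConcurrentDictionary<Guid, string> _transactions =
        new ConcurrentDictionary<Guid, string>();

    public Guid Send(Action command)
    {
        var transactionId = Guid.NewGuid();
        _transactions[transactionId] = "processing";
        Task.Run(() =>
        {
            try { command(); _transactions[transactionId] = "completed"; }
            catch { _transactions[transactionId] = "failed"; }
        });
        return transactionId; // the controller polls with this id
    }

    // Polled by the site; the entry is removed once a terminal state is seen.
    public string PollState(Guid transactionId)
    {
        if (!_transactions.TryGetValue(transactionId, out var state))
            return "unknown";
        if (state == "completed" || state == "failed")
            _transactions.TryRemove(transactionId, out _);
        return state;
    }
}
```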
A side effect is that I don't need GUIDs as keys for my entities, which is a good thing for performance and size.
In most cases this polling mechanism is probably not needed, but it's possible when needed. And the interfaces are designed with CQS in mind, so they're open for the future.
Do you see any flaws in this approach? Any other ideas or suggestions?
Thanks!
Lud
I think you are very close to a full CQRS system with your approach.
I have a site where I do something similar to what you are describing. My site, braincredits.com, is architected using CQRS, and all commands are async in nature. As a result, when I create an entry, there is really no feedback to the user other than that the command was successfully submitted for processing (not that it was processed).
But I have a user score on the site (a count of their "credits") that should change as the user submits more items, and I don't want the user to keep hitting F5 to refresh the browser. So I am doing what you are proposing: I have an AJAX call that fires every second or two to see if the user's credit count has changed. If it has, the new amount is brought back and the UI is updated (with a little animation to catch the user's attention, but nothing too flashy).
What you're talking about is eventual consistency: the state of the application that the user is seeing will eventually be consistent with the system data (the system of record). That concept is pretty key to CQRS and, in my opinion, makes a lot of sense. As soon as you retrieve data in a system (whether it's CQRS-based or not), the data is old. But if you assume that, and assume that the client will eventually be consistent, then your approach makes sense, and you can also design your UI to account for that AND take advantage of it.
As far as suggestions go, I would watch how much polling you do and how much data you send back and forth. Don't go overboard with polling, which it sounds like you're not doing. Target what should be updated on a regular basis on your site and I think you'll be good.
The WCF Data Service layer for the query side is a good idea; just make sure it's read-only (which I'm sure you've done).
Other than that, it sounds like you're off to a good start.
I hope this helps. Good luck!
I need some clarification on how these 2 entities interact...
If I use the BackgroundTransferService alone to upload some files, then the moment I navigate away from the application the upload will stop, and when I come back to the application the upload will resume. Is that correct? Or is the upload lost?
However, if I wanted to make sure that the file uploads regardless of whether the user navigates away from the application, should I kick off the BackgroundTransferService upload inside a class that implements ScheduledTaskAgent? Is that correct? If so, how can that be done? BackgroundTransferService reports its progress via events, so I can't call NotifyComplete from the OnInvoke method of the ScheduledTaskAgent.
Am I going about it the wrong way?
No, this is not correct. When a background transfer is initiated, it is inserted into a queue whose processing depends on a set of factors, including other pending background transfers (from other third-party applications) and the general network speed. You can find additional details here. That queue is processed even if the application is tombstoned.
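For illustration, queueing such an upload usually looks something like the sketch below; the URL and file name are assumptions, and note that the uploaded file must live under /shared/transfers in isolated storage:

```csharp
using System;
using Microsoft.Phone.BackgroundTransfer;

public static class UploadHelper
{
    public static void QueueUpload()
    {
        var request = new BackgroundTransferRequest(new Uri("https://example.com/upload"))
        {
            Method = "POST",
            // The file to upload must already be in isolated storage under /shared/transfers.
            UploadLocation = new Uri("/shared/transfers/myfile.dat", UriKind.Relative),
            Tag = "myfile.dat" // lets the app find this transfer again after relaunch
        };

        request.TransferStatusChanged += (s, e) =>
        {
            if (e.Request.TransferStatus == TransferStatus.Completed)
                BackgroundTransferService.Remove(e.Request); // completed requests must be removed
        };

        BackgroundTransferService.Add(request);
    }
}
```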
I am trying to implement a file-based auto-increment identity value (an int value stored in a TXT file) and I am trying to come up with the best way to handle concurrency issues. This identity will be used as the unique ID for my content. When saving new content, the file gets opened, the value is read and incremented, the new content is saved, and the incremented value is written back to the file (whether we store the next available ID or the last issued one doesn't really matter). While this is being done, another process might come along and try to save new content. The first process opens the file with FileShare.None, so no other process will be able to read the file until it is released. While the odds of this happening are minimal, it could still happen.
Now when this does happen we have two options:
wait for the file to become available -
Emulate waiting on File.Open in C# when file is locked
we are talking about milliseconds here, so I guess this wouldn't normally be an issue; but if something strange happens and the file never becomes available, this solution would result in an infinite loop, so it's not ideal (a bounded version is sketched after this list)
implement some sort of queue and run all file operations through it. My user-experience requirements are such that at the time of saving/modifying files the user should never be informed about exceptions or that something went wrong; he would be informed through a very friendly user interface later, if the operations also failed on the queue.
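For the first option, a bounded retry loop (rather than an infinite one) could look like this sketch; the timeout and the retry delay are assumptions:

```csharp
using System;
using System.IO;
using System.Threading;

public static class ExclusiveFile
{
    // Retries File.Open with FileShare.None until it succeeds or the deadline
    // passes; after that the IOException propagates, avoiding the infinite
    // loop mentioned above.
    public static FileStream OpenExclusive(string path, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            try
            {
                return new FileStream(path, FileMode.OpenOrCreate,
                                      FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException) when (DateTime.UtcNow < deadline)
            {
                Thread.Sleep(50); // another process holds the file; retry shortly
            }
        }
    }
}
```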
At the moment of writing this, the solution needs to work within an ASP.NET MVC application (both synchronously and asynchronously through AJAX) but, if possible, it should use concepts that would also work in a Silverlight, Windows Forms or WPF application.
With regard to those two options, which one do you think is better, and for the second option, what are possible technologies for implementing it?
The ReaderWriterLockSlim class seems like a good solution for synchronizing access to the shared resource.
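A minimal sketch of that idea, assuming the identity file holds a single integer; the file path is illustrative:

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileIdentity
{
    private static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();
    private const string FilePath = "identity.txt"; // assumed location

    // Read-increment-write under an exclusive (write) lock.
    public static int NextId()
    {
        Lock.EnterWriteLock();
        try
        {
            int next = int.Parse(File.ReadAllText(FilePath)) + 1;
            File.WriteAllText(FilePath, next.ToString());
            return next;
        }
        finally
        {
            Lock.ExitWriteLock();
        }
    }

    // Concurrent readers only need the shared (read) lock.
    public static int Current()
    {
        Lock.EnterReadLock();
        try { return int.Parse(File.ReadAllText(FilePath)); }
        finally { Lock.ExitReadLock(); }
    }
}
```

Bear in mind that ReaderWriterLockSlim only synchronizes threads within a single process; if several worker processes (for example a web garden) touch the same file, you would still need the FileShare.None retry approach or a named Mutex on top of it.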