I'm looking for feedback on the best option for a problem I am working on.
To give you some background, I recently inherited a broken business application (our project was using it, so we gained responsibility for fixing it). I come from a SharePoint development background, so I know a little C#, ASP.NET and SQL.
Currently the application continually hits timeout errors. I have narrowed the cause down to the web application calling a batch of stored procedures that update status fields in other tables whenever a change might affect the status of other objects.
Without completely overhauling this application, I have determined that our best option is to offload these stored procedures to run in the background so they are not tied to the UI. I've looked at a couple of options, including:
Creating a separate thread to handle the execution. (Still times out)
Using BackgroundWorker (still times out, obviously it shouldn't but I can't seem to find out what is causing it to wait for the BackgroundWorker to finish)
Moving the Stored Proc execution to a job, which I then call from another SP. (This works, but the limitation is that I can only have one job running at once, and if multiple users update objects they then receive an exception because the job won't start)
Right now we have moved these stored procedures into a twice a day script, which updates all objects, however this is only a temporary fix.
I have two options that I'm looking at, and I'm hoping to get some guidance on the implementation of whatever you consider to be the best option:
Continue using the job and have the executing stored proc queue up items in a db which the job will loop through until empty. The executing stored proc will have to check if the job is running when it adds a new entry and then act accordingly.
It's been recommended that I look at using Service Broker, but I am not familiar with its use at all. I understand that it would likely be a better overall solution, as it allows me to queue up these updates in a more transactional way.
I think both of these options are viable, although I need some help understanding the implementation of the second one. My other dilemma: with these stored procedures running anywhere from 45 seconds to 20 minutes, how can I notify the user who changed the object that his/her updates have been made? This is where I fall back to using the job, because I could simply add a user field to the 'queue' and have the stored proc send a quick email at the end.
Thoughts, suggestions? Maybe I'm over-thinking this?
If you are on .NET 4.5 and C# 5.0, use async/await; if you are on .NET 4.0, use the TPL. The two share (almost) the same underlying machinery: the async feature is built on top of the TPL, with some extra internals.
In either case the TPL would be a proper choice.
Sounds like Service Broker would be an excellent solution to this problem. It's true that there is a bit of a learning curve to climb to get your head round how it works, but it's fundamentally pretty simple especially when your implementation is in a single database.
There's a good (and mercifully short) intro to how it works at http://msdn.microsoft.com/en-US/library/ms345108(v=SQL.90).aspx
Have a look at Asynchronous Procedure Execution. But I would first check whether the updates themselves can be improved: perhaps a simple index can eliminate the timeouts, and/or you could try to leverage snapshot isolation. These would be much simpler to try out without committing to a 'major overhaul' of the application code.
I must also urge you to read Waits and Queues. This is a SQL Server methodology for identifying performance bottlenecks. It is a great way of narrowing down the problem of 'timeouts' to something more actionable (blocking, IO, indexes etc).
In my client-server architecture I have a few API functions whose usage needs to be limited.
The server is written in .NET C# and runs on IIS.
Until now I didn't need to perform any synchronization. The code was written in such a way that even if a client sent the same request multiple times (e.g. a "create something" request), one call would succeed and all the others would fail with an error (because of the server code plus the db structure).
What is the best way to enforce such limits? For example, I want no more than 1 call of the API method foo() per user per minute.
I thought about a SynchronizationTable with just one column, unique_text. Before computing the foo() call I would write something like foo{userId}{date}{HH:mm} to this table. If the write succeeds, I know there wasn't a foo call from that user in the current minute.
I think there is a much better way, probably in server code, without using the db for that. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some light DictionaryMutex.
For example:
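// DictionaryMutex is the hypothetical type I have in mind; it does not exist in .NET.
private static DictionaryMutex FooLock = new DictionaryMutex();

FooLock.Lock(User.GUID);   // "lock" is a C# keyword, so the methods are named Lock/Unlock
try
{
    ...
}
finally
{
    FooLock.Unlock(User.GUID);
}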
EDIT:
A solution in which one user cannot call foo twice at the same time would also be sufficient for me. By "at the same time" I mean that the server starts handling a second call before returning the result of the first call.
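A minimal sketch of the "one call per user at a time" variant described in the EDIT, assuming a single worker process. KeyedGate, TryEnter and Exit are hypothetical names invented for this illustration; only ConcurrentDictionary is a real .NET type. Instead of blocking a second caller the way a mutex would, this version rejects it, which avoids tying up request threads.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: tracks which keys currently have a call in flight.
public sealed class KeyedGate<TKey>
{
    private readonly ConcurrentDictionary<TKey, byte> _inFlight =
        new ConcurrentDictionary<TKey, byte>();

    // Returns true if the caller now owns the key, false if another call holds it.
    public bool TryEnter(TKey key)
    {
        return _inFlight.TryAdd(key, 0);
    }

    // Releases the key so the next call for it can enter.
    public void Exit(TKey key)
    {
        byte ignored;
        _inFlight.TryRemove(key, out ignored);
    }
}

public static class GateDemo
{
    private static readonly KeyedGate<Guid> FooGate = new KeyedGate<Guid>();

    // Hypothetical API method body; userId stands in for User.GUID.
    public static string Foo(Guid userId)
    {
        if (!FooGate.TryEnter(userId))
        {
            return "rejected: foo() is already running for this user";
        }
        try
        {
            return "ok"; // the real work would go here
        }
        finally
        {
            FooGate.Exit(userId);
        }
    }

    public static void Main()
    {
        Console.WriteLine(Foo(Guid.NewGuid()));
    }
}
```

The state lives only in the memory of one process, so it is lost on a restart and is not shared between servers.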
Note that keeping this state in memory in an IIS worker process opens the possibility of losing all this data at any instant. Worker processes can restart for any number of reasons.
Also, you probably want to have two web servers for high availability. Keeping the state inside worker processes means the application is no longer clustering-ready. This is often a no-go.
Web apps really should be stateless, for many reasons. If you can help it, don't manage your own data structures as suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect 1000s of such queries per second per CPU core. This can bear a lot of load. You can use SQL Server Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is nearly worthless, so you don't even need to back it up.
I think you're overestimating how costly this rate limitation will be. Your web-service is probably doing a lot more costly things than a single UPDATE by primary key to a simple table.
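The one-key-per-user-per-minute idea from the question can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: RateWindow and InMemoryKeyStore are hypothetical names, and the HashSet only stands in for a table (or key/value store) with a unique constraint on the key.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RateWindow
{
    // Builds one key per method, user and calendar minute,
    // e.g. "foo|<userId>|2024-01-31 13:05".
    public static string KeyFor(string method, Guid userId, DateTime utcNow)
    {
        return method + "|" + userId.ToString("N") + "|" +
               utcNow.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture);
    }
}

// Stand-in for a table with a unique constraint on the key column.
public sealed class InMemoryKeyStore
{
    private readonly HashSet<string> _keys = new HashSet<string>();
    private readonly object _sync = new object();

    // True when the key was new (call allowed), false when it already existed.
    public bool TryInsert(string key)
    {
        lock (_sync)
        {
            return _keys.Add(key);
        }
    }
}

public static class RateLimitDemo
{
    public static void Main()
    {
        var store = new InMemoryKeyStore();
        Guid user = Guid.NewGuid();
        var t = new DateTime(2024, 1, 31, 13, 5, 10, DateTimeKind.Utc);

        // First call in minute 13:05, a second call 20 seconds later, then 13:06.
        Console.WriteLine(store.TryInsert(RateWindow.KeyFor("foo", user, t)));
        Console.WriteLine(store.TryInsert(RateWindow.KeyFor("foo", user, t.AddSeconds(20))));
        Console.WriteLine(store.TryInsert(RateWindow.KeyFor("foo", user, t.AddMinutes(1))));
    }
}
```

The demo prints True, False, True. Note that this limits calls per calendar minute, so two calls a few seconds apart can both succeed if they straddle a minute boundary.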
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
The basic concept I thought would be best is this: once the data is received and confirmed as the entire message, the message is passed off to some sort of collection to await processing on a FIFO basis. The processing step parses the values and inserts them into SQL Server. I suppose this is what's known as the producer/consumer pattern.
I have been looking into the best collection and way of doing this, and so far I have seen BlockingCollection, ConcurrentCollection and BufferBlock used with async/await. I think this may be the way to go, but to be honest I'm not sure.
The best example I have found is on Stephen Cleary's blog, in particular this article:
http://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
My main reservation is that I in no way want to slow down or interrupt the receiving of messages. To me that suggests using the multiple producer/consumer example from the above link, but what I want to know is:
Am I correct in this assumption, or is there a more suitable way of doing this in my scenario?
And if I'm correct in my assumption, could anyone suggest the best way of implementing this, taking my use case into consideration?
Any and all help is much appreciated.
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
There's a common pitfall with this kind of scenario. It is usually wrong to report success back to the client when the work has yet to be done. Most of the time I've seen this design, it's because of an efficiency "requirement" self-imposed by the developer, not by the client or for technical reasons. So first, take a step back and make absolutely sure that you do want to return a "successful completion" message to the client when the operation has not actually completed yet.
If you are sure that's what you want to do, then there's another question you must ask: is it acceptable to lose requests? That is, after you tell the client that the operation successfully completed, will the system still be stable if the operation does not actually ever complete?
The answer to that question is usually "no." At that point, the most common architectural solution is to have an out-of-process reliable queue (such as an Azure queue or MSMQ), with an independent backend (such as an Azure worker role or Win32 service) that processes the queue messages. This definitely complicates the architecture, but it is a necessary complication if the system must return completion messages early and must not lose messages.
On the other hand, if losing messages is acceptable, then you can keep them in-memory. It is only in this case that you can use one of the in-memory producer/consumer types mentioned on my blog. This is a very rare situation, but it does happen from time to time.
In general, I would avoid using BlockingCollection and friends for this sort of work. Doing so encourages you to architect the entire system into a single process, which is the enemy of scalability and reliability.
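For the rare case described above, where losing messages is acceptable and an in-memory queue is good enough, the producer/consumer shape might look like the sketch below. InMemoryPipeline and its Run method are hypothetical names; the upper-casing step is only a placeholder for "parse the values and insert them into SQL Server", and the producer loop stands in for the TCP receive loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class InMemoryPipeline
{
    // Runs one consumer over a bounded FIFO queue and returns what it processed.
    public static List<string> Run(IEnumerable<string> messages)
    {
        var processed = new List<string>();
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // Consumer: takes messages in arrival order until adding is completed.
            Task consumer = Task.Run(() =>
            {
                foreach (string message in queue.GetConsumingEnumerable())
                {
                    processed.Add(message.ToUpperInvariant()); // placeholder for parse + insert
                }
            });

            // Producer: hands each complete message to the queue.
            foreach (string message in messages)
            {
                queue.Add(message); // blocks only while the queue is full
            }
            queue.CompleteAdding(); // lets the consumer's foreach finish

            consumer.Wait();
        }
        return processed;
    }

    public static void Main()
    {
        foreach (string s in Run(new[] { "a=1", "b=2", "c=3" }))
        {
            Console.WriteLine(s);
        }
    }
}
```

Everything still queued is lost if the process dies, which is exactly the trade-off the answer warns about.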
I second Stephen Cleary's suggestion of using an out-of-process queue to manage the work. I disagree that this necessarily complicates the architecture, though - in fact, I think it can make things quite a bit simpler. Specifically, a major complication of the original requirement ("put together an asynchronous tcp server") disappears. Asynchronous TCP servers are a pain in the butt to write and easy to screw up - why not just skip that part altogether and be free to focus all of your energy on the post-processing code?
When I built a system like this, I used a Redis List as the task queue. Tasks were serialized to JSON, and clients would add their task to the queue with an RPUSH command. Worker processes retrieve the next task from the queue with BLPOP, do their thing, then go back to waiting for the next task.
Advantages:
No locks. All synchronization comes for free from Redis (or whatever task queue you choose).
Everything in the system is single-threaded. Multi-threading is hard.
I'm free to spin up as many worker processes as I want, across as many nodes as I want.
I know several topics on the subject have been discussed, because I have been reading a lot to try to resolve my issue, but somehow they happen to not fulfill my needs (maybe for the lack of detail). Anyway, if you think some specific 'topic' might be useful, please link it.
I'm developing a desktop application with WPF (and MVVM) and I'm using NHibernate. After researching possible ways to manage my session, I have decided to use the session-per-form approach. This way, I think I can make full use of NHibernate features like lazy loading, caching and so on.
As I'm working with a database, I don't want to freeze my UI while I'm loading or saving my entities, so I thought I should use a dedicated thread (in each form, which I think simplifies the development) to handle the database interaction. The problem, though, is how I should 'reuse' the thread (supposing I have a session associated with that thread) to make my 'database calls'.
I don't think I can use the TPL, because there is no guarantee that two tasks will run on the same thread (it's not even guaranteed that they will run on a different thread than the invoker).
I would prefer to use session-per-form, as I have seen similar discussions that end by using session-per-conversation or something like that. But anyway, if you find that session-per-conversation would be better, please tell me (and hopefully explain why)
Threads don't provide a way to directly run more than one method, so I think I would have to 'listen' for requests, but I'm still unsure if I really have to do this and how I would 'use' the session (and save it) only inside the thread.
EDIT:
Maybe I'm having this problem because I'm confusing thread-safety with something else.
When the NHibernate documentation says that ISession instances are not thread-safe, does it mean that I will (or could) get into trouble if two threads attempt to use a session at the same time? In my case, if I use the TPL, different threads could use the same session, but I would never perform more than one operation on the same session at the same time. So, would I get into trouble in that situation?
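The "dedicated thread that listens for requests" idea from the question could be sketched like this. SingleThreadWorker is a hypothetical helper written for this illustration; NHibernate is deliberately left out, the assumption being that a form would open its ISession inside a delegate passed to Run and only ever touch it from such delegates.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs every queued delegate on one dedicated thread.
public sealed class SingleThreadWorker : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadWorker()
    {
        _thread = new Thread(() =>
        {
            // "Listens" for requests until CompleteAdding is called.
            foreach (Action action in _work.GetConsumingEnumerable())
            {
                action();
            }
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Queues func for the worker thread and returns a Task for its result.
    public Task<T> Run<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>();
        _work.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        _work.CompleteAdding(); // no more requests; the loop drains and exits
        _thread.Join();
        _work.Dispose();
    }
}

public static class WorkerDemo
{
    public static void Main()
    {
        using (var worker = new SingleThreadWorker())
        {
            int first = worker.Run(() => Thread.CurrentThread.ManagedThreadId).Result;
            int second = worker.Run(() => Thread.CurrentThread.ManagedThreadId).Result;
            Console.WriteLine(first == second); // both delegates ran on the same thread
        }
    }
}
```

In a WPF view model the returned Task would normally be awaited rather than blocked on with .Result, so the UI thread stays free while the worker does the database call.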
If I may make a suggestion, desktop applications are poorly suited to interact with the database directly. The communication is not encrypted and it's really easy for someone with even the slightest amount of know-how to grab the database password and begin messing with records using a SQL connection and corrupt your database.
It would be better to create a web service with authentication that stands between the desktop application and the database as you could create credentials for each person and every transaction would be forcibly subjected to your various business rules.
This would also take care of your threading issue as you would be able to create HTTP connections on another thread with little to no trouble concerning session management. A cookie value is likely all that would be required and RestSharp makes this fairly trivial.
We ran into strange sql / linq behaviour today:
We used to use a web application to perform some intensive database actions on our system. Recently we moved to a winforms interface for various reasons.
We found out that performance has seriously decreased: an action that used to take about 15 minutes now takes as long as one whole hour. The strange thing is that it's the exact same method being called. The method performs quite a bit of reading and writing using LINQ to SQL, and profiling on the client machine showed that the problematic section is the SQL action itself, in LINQ's "Save" method.
The only difference between the cases is that on one case the method is called from a web application's code behind (MVC in this case), and on the other from a windows form.
The one idea I could come up with is that SQL performance has something to do with the identity of the user accessing the db, but I could not find any support for that assumption.
Any ideas?
Did you run both tests from the same machine? If not, hardware differences could be the issue, or the network: one machine could be in a higher-speed section of your network, such as the same VLAN as the SQL Server. Try running the client code on the same server the web app was running on.
Also, if your app is updating progress in a synchronous manner, it could be waiting a long time for the display to update, as opposed to working with a stream à la Response.Write.
If you are actually outputting progress as you go, you should make sure that the progress updates are events and that they are displayed on another thread, so that the processing isn't waiting on the display. Actually, you probably should put the processing on its own thread and just have an event handler take care of the updates, but that is a whole different discussion. The point is that your app could be waiting to update the display of progress.
It's a very old issue, but I happened to run into the question just now. So, for whom it may concern nowadays: the solution (and therefore the problem) was frustratingly silly. LINQ to SQL was configured on the dev machines to constantly write a log to the console.
This was causing a huge delay due to the simple act of outputting a large amount of text to the console. On the web server the log was not being written, and therefore there was no performance drawback. There was a colossal face-palming once we figured this one out. Thanks to everyone who helped; I hope this answer will help someone solve it faster next time.
Unattended logging. That was the problem.
I have a requirement to monitor database rows continuously to check for changes (updates). If there are changes or updates from other sources, an event should be fired in my application (I am using WCF). Is there any way to listen continuously for changes to a database row?
I may have a large number of events monitoring different rows in the same table. Would that cause any performance problems? I am using a C# web service to monitor the SQL Server back end.
You could use an AFTER UPDATE trigger on the respective tables to add an item to a SQL Server Service Broker queue. Then have the queued notifications sent to your web service.
Another poster mentioned SqlDependency, which I also thought of mentioning but the MSDN documentation is a little strange in that it provides a windows client example but also offers this advice:
SqlDependency was designed to be used in ASP.NET or middle-tier services where there is a relatively small number of servers having dependencies active against the database. It was not designed for use in client applications, where hundreds or thousands of client computers would have SqlDependency objects set up for a single database server.
Ref.
I had a very similar requirement some time ago, and I solved it using a CLR SP to push the data into a message queue.
To ease deployment, I created a CLR SP with a tiny function called SendMessage that just pushed a message into a message queue, and tied it to my tables using an AFTER INSERT trigger (a normal trigger, not a CLR trigger).
Performance was my main concern in this case, but I have stress tested it and it greatly exceeded my expectations. And compared to SQL Server Service Broker, it's a very easy-to-deploy solution. The code in the CLR SP is really trivial as well.
Monitoring "continuously" could mean every few hours, minutes, seconds or even milliseconds. This solution might not work for millisecond updates, but if you only have to "monitor" a table a few times a minute you could simply have an external process check the table for updates (provided there is a DateTime column recording when each row was last changed). You could then process the changed or newly added rows and perform whatever notification you need to. So you wouldn't be listening for changes, you'd be checking for them. One benefit of checking in this manner is that you wouldn't risk as much of a performance hit if a lot of rows were updated during a given quantum of time, since you'd batch them together (as opposed to responding to each and every change individually).
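A minimal sketch of this checking approach, assuming each row carries a last-modified timestamp. Row and ChangePoller are hypothetical names, and the in-memory list stands in for a query along the lines of "select the rows whose timestamp is newer than the last one seen".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: a primary key plus a last-modified timestamp.
public sealed class Row
{
    public int Id;
    public DateTime LastModifiedUtc;
}

// Hypothetical poller: remembers the newest timestamp it has seen so far.
public sealed class ChangePoller
{
    private DateTime _watermark = DateTime.MinValue;

    // Returns the rows changed since the previous poll, oldest first.
    public List<Row> Poll(IEnumerable<Row> table)
    {
        List<Row> changed = table
            .Where(r => r.LastModifiedUtc > _watermark)
            .OrderBy(r => r.LastModifiedUtc)
            .ToList();

        if (changed.Count > 0)
        {
            _watermark = changed[changed.Count - 1].LastModifiedUtc;
        }
        return changed;
    }
}

public static class PollerDemo
{
    public static void Main()
    {
        var start = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var table = new List<Row>
        {
            new Row { Id = 1, LastModifiedUtc = start },
            new Row { Id = 2, LastModifiedUtc = start.AddSeconds(5) }
        };
        var poller = new ChangePoller();

        Console.WriteLine(poller.Poll(table).Count); // both rows are new to the poller
        Console.WriteLine(poller.Poll(table).Count); // nothing changed since last poll

        table[0].LastModifiedUtc = start.AddSeconds(30); // simulate an update to row 1
        Console.WriteLine(poller.Poll(table).Count); // only the updated row
    }
}
```

The demo prints 2, 0, 1. All changes within one polling interval arrive as a single batch, which is the benefit mentioned above; deleted rows are not detected by this scheme.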
I pondered the idea of a CLR function or something of the sort that calls the service after successfully inserting/updating/deleting data from the tables. Is that even good in this situation?
Probably it's not a good idea, but I guess it's still better than getting into table trigger hell.
I assume your problem is you want to do something after every data modification, let's say, recalculate some value or whatever. Letting the database be responsible for this is not a good idea because it can have severe impacts on performance.
You mentioned you want to detect inserts, updates and deletes on different tables. Doing it the way you are leaning towards would require you to set up three triggers/CLR functions per table and have them post an event to your WCF service (is that even supported in the subset of .NET available inside SQL Server?). The WCF service then takes the appropriate actions based on the events received.
A better solution for the problem would be moving the responsibility for detecting data modification from your database to your application. This can actually be implemented very easily and efficiently.
Each table has a primary key (int, GUID or whatever) and a timestamp column, indicating when the entry was last updated. This is a setup you'll see very often in optimistic concurrency scenarios, so it may not even be necessary to update your schema definitions. Though, if you need to add this column and can't offload updating the timestamp to the application using the database, you just need to write a single update trigger per table, updating the timestamp after each update.
To detect modifications, your WCF service or monitoring application builds a local dictionary (preferably a hashtable) of primary key/timestamp pairs at a given time interval. With a covering index in the database, this operation should be really fast. The next step is to compare the new dictionary with the one from the previous interval, and voilà, there you go.
There are some caveats to this approach, though. One is the number of records per table, another is the update frequency (if it gets too low, the approach is ineffective), and yet another is whether you need access to the data as it was before the modification or insertion.
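The comparison step can be sketched as below. SnapshotDiff and Compare are hypothetical names, and an int primary key is assumed purely for illustration; the dictionaries would be filled from a query that returns only the key and timestamp columns.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotDiff
{
    // Compares two primary-key -> timestamp snapshots and reports what changed.
    public static void Compare(
        IDictionary<int, DateTime> previous,
        IDictionary<int, DateTime> current,
        List<int> inserted, List<int> updated, List<int> deleted)
    {
        foreach (KeyValuePair<int, DateTime> pair in current)
        {
            DateTime oldStamp;
            if (!previous.TryGetValue(pair.Key, out oldStamp))
            {
                inserted.Add(pair.Key); // key is new in this snapshot
            }
            else if (oldStamp != pair.Value)
            {
                updated.Add(pair.Key);  // same key, different timestamp
            }
        }
        foreach (int key in previous.Keys)
        {
            if (!current.ContainsKey(key))
            {
                deleted.Add(key);       // key vanished since the last snapshot
            }
        }
    }

    public static void Main()
    {
        var t = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var previous = new Dictionary<int, DateTime> { { 1, t }, { 2, t }, { 3, t } };
        var current = new Dictionary<int, DateTime> { { 1, t }, { 2, t.AddMinutes(1) }, { 4, t } };

        var inserted = new List<int>();
        var updated = new List<int>();
        var deleted = new List<int>();
        Compare(previous, current, inserted, updated, deleted);

        Console.WriteLine("inserted: " + string.Join(",", inserted));
        Console.WriteLine("updated: " + string.Join(",", updated));
        Console.WriteLine("deleted: " + string.Join(",", deleted));
    }
}
```

With the sample data, row 4 is reported as inserted, row 2 as updated and row 3 as deleted. After each comparison the current snapshot becomes the previous one for the next interval.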
Hope this helps.
Why don't you use SQL Server Notification Services? I think that's exactly what you are looking for. Go through the Notification Services documentation and see if it fits your requirement.
I think there are some great ideas here; from the scalability perspective I'd say that externalizing the check (e.g. Paul Sasik's answer) is probably the best one so far (+1 to him).
If, for some reason, you don't want to externalize the check, then another option would be to use the HttpCache to store a watcher and a callback.
In short, when you put the record in the DB that you want to watch, you also add it to the cache (using the .Add method) and set a SqlCacheDependency on it, and a callback to whatever logic you want to call when the dependency is invoked and the item is ejected from the cache.