I am quite confused about which approach to take and what best practice is.
Let's say I have a C# application which does the following:
It sends emails from a queue. The emails to send and all their content are stored in the DB.
Now, I know how to make my C# application almost scalable, but I need to go somewhat further.
I want some way of distributing the tasks across, say, X servers, so it is not just one server doing all the processing but the work is shared amongst the servers.
If one server goes down, the load is shared between the other servers. I know NLB does this, but I'm not looking for an NLB here.
Sure, you could add a column of some kind to the DB table to indicate which server should process each record, and each application instance would have an ID that matches a value in the DB and would only pull its own records - but I consider this cheap, bad practice and unrealistic.
Holding a DB row lock is also not something I would do, due to potential deadlocks and other possible issues.
I am also NOT suggesting using threading "to the extreme" here, but yes, there will be a thread per item to process, or items batched up across X threads.
How should I approach this, and what do you recommend for making a C# application which is scalable and highly available? The aim is to have X servers, each running the same application, each able to pull records and process them, with the load shared amongst the servers so that if one server or service fails, the others can take on that load until another server is brought back.
Sorry for my lack of understanding or knowledge, but I have been thinking about this quite a lot and have lost sleep trying to come up with a good, robust solution.
I would be thinking of batching up the work, so each app only pulls back X records at a time, marking those retrieved records as taken with a bool field in the table. I'd amend the SELECT statement to pull only records not marked as taken/done. Table locks would be OK in this instance for very short periods, to ensure there is no overlap of apps processing the same records.
EDIT: It's not very elegant, but you could have a datestamp and a status for each entry (instead of the bool field above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which are In Progress but have gone beyond a time threshold without being set to Complete. They would then be ready for reprocessing by another app later on.
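For example, the claim-and-mark step can be a single atomic statement (a sketch; the table and column names - EmailQueue, Status, ClaimedAt, ClaimedBy - are invented for illustration):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

static class EmailClaimer
{
    // Atomically claim up to batchSize pending emails for this worker.
    // READPAST skips rows already locked by other workers, so two apps
    // can never claim the same record, and no long table lock is held.
    public static List<int> ClaimBatch(string connStr, string worker, int batchSize)
    {
        const string sql = @"
            UPDATE TOP (@batchSize) EmailQueue WITH (ROWLOCK, READPAST)
            SET Status = 'InProgress',
                ClaimedAt = GETUTCDATE(),
                ClaimedBy = @worker
            OUTPUT inserted.EmailId
            WHERE Status = 'Pending';";

        var claimed = new List<int>();
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@batchSize", batchSize);
            cmd.Parameters.AddWithValue("@worker", worker);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    claimed.Add(reader.GetInt32(0));
        }
        return claimed; // these ids now belong to this worker only
    }
}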
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as unsophisticated and work just fine. The best things work with the least complexity.
Related
I have researched a lot and haven't found anything that meets my needs. I'm hoping someone from SO can shed some insight on this.
I have an application where the expected load is thousands of jobs per customer, and I can have hundreds of customers. Currently it is 50 customers with close to 1,000 jobs each. These jobs are time-sensitive (scheduled by the customer) and can run up to 15 minutes (each job).
In order to scale and match the schedules, I'm planning to run this multi-threaded on a single server. So far so good. But the business wants to scale further (as needed) by adding more servers into the mix. Currently, when a job becomes ready in the database, a console application picks up the first 500 and uses the Task Parallel Library to spawn 10 threads, waiting until they are complete. I can't scale this to another server because that one could pick up the same records. I can't just update a status on the DB record as being processed, because if the application crashes on one server, the job would be left in limbo.
I could use a message queue and have multiple machines pick from it. The problem with this is that the queue has to be transactional to support handling any crashes. MSMQ supports only MS DTC transactions when a database is involved, and I'm not really comfortable with DTC transactions, especially with multiple threads and multiple machines. Too much maintenance and setup, and possibly unknown issues.
Is SQL Service Broker a good approach instead? Has anyone done something like this in a production environment? I also want to keep the transactions short (a job could run for 15-20 minutes - mostly streaming data from a service). The only reason I'm using a transaction is to keep the message integrity of the queue. I need the job to be re-picked if it crashes (to reappear in the queue).
Any words of wisdom?
Why not have an application receive the jobs and insert them into a table that acts as the job queue? Each worker process can then pick up a set of jobs, set their status to Processing, complete the work, and set the status to Done. Other info, such as the name of the server that processed each job and start and end timestamps, could also be logged. Moreover, instead of using multiple threads, you could use independent worker processes to make your programming easier.
[EDIT]
SQL Server supports record-level locking, and lock escalation can be prevented. See Is it possible to force row level locking in SQL Server?. Using such a mechanism, you can have your worker processes take exclusive locks on jobs being processed until they are done or crash (thereby releasing the lock).
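A rough sketch of what that could look like (the Jobs table and column names are illustrative, not from the question); if the worker crashes before the commit, the connection dies and the row locks are released automatically:

using System.Collections.Generic;
using System.Data.SqlClient;

static class JobWorker
{
    public static void ProcessBatch(string connStr)
    {
        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                // Lock up to 10 ready jobs; READPAST makes other workers
                // skip rows we hold, so nobody picks the same job twice.
                var jobIds = new List<int>();
                using (var claim = new SqlCommand(@"
                    SELECT TOP 10 JobId
                    FROM Jobs WITH (UPDLOCK, ROWLOCK, READPAST)
                    WHERE Status = 'Ready';", conn, tx))
                using (var reader = claim.ExecuteReader())
                    while (reader.Read())
                        jobIds.Add(reader.GetInt32(0));

                foreach (var jobId in jobIds)
                {
                    // ... do the actual work for jobId here ...
                    using (var done = new SqlCommand(
                        "UPDATE Jobs SET Status = 'Done' WHERE JobId = @id",
                        conn, tx))
                    {
                        done.Parameters.AddWithValue("@id", jobId);
                        done.ExecuteNonQuery();
                    }
                }
                tx.Commit(); // releases the row locks
            }
        }
    }
}

Note that this holds the transaction open for the duration of the work, which is the trade-off the question raises for long-running jobs.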
In my client-server architecture, I have a few API functions whose usage needs to be limited.
The server is written in C# (.NET) and runs on IIS.
Until now I didn't need to perform any synchronization. The code was written in such a way that even if a client sent the same request multiple times (e.g. a create request), one call would end in success and all the others in error (because of the server code + DB structure).
What is the best way to implement such limitations? For example, I want no more than 1 call of API method foo() per user per minute.
I thought about some SynchronizationTable which would have just one column, unique_text, and before executing a foo() call I'd write something like foo{userId}{date}{HH:mm} to this table. If the write succeeds, I know there wasn't a foo call from that user in the current minute.
I think there is a much better way, probably in server code, without using the DB for this. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some lightweight DictionaryMutex.
For example:
private static DictionaryMutex FooLock = new DictionaryMutex();

FooLock.Lock(User.GUID);
try
{
    // ... handle the foo() call ...
}
finally
{
    FooLock.Unlock(User.GUID);
}
EDIT:
A solution in which one user cannot call foo twice at the same time would also be sufficient for me. By "at the same time" I mean that the server starts handling the second call before returning the result of the first.
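For illustration, I imagine the DictionaryMutex could be built on a ConcurrentDictionary of semaphores, something like this sketch:

using System.Collections.Concurrent;
using System.Threading;

public class DictionaryMutex
{
    // One semaphore per key, created on first use. Entries are never
    // removed here, so memory grows with the number of distinct keys.
    private readonly ConcurrentDictionary<string, SemaphoreSlim> locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public void Lock(string key)
    {
        locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1)).Wait();
    }

    public void Unlock(string key)
    {
        SemaphoreSlim sem;
        if (locks.TryGetValue(key, out sem))
            sem.Release();
    }
}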
Note that keeping this state in memory in an IIS worker process opens up the possibility of losing all of this data at any instant. Worker processes can restart for any number of reasons.
Also, you probably want two web servers for high availability. Keeping the state inside worker processes makes the application no longer clustering-ready. This is often a no-go.
Web apps really should be stateless; there are many reasons for that. If you can help it, don't manage your own data structures like those suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect thousands of such queries per second per CPU core. This can bear a lot of load. You can use SQL Server Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is nearly worthless, so you don't even need to back it up.
I think you're overestimating how costly this rate limiting will be. Your web service is probably doing much more expensive things than a single UPDATE by primary key against a simple table.
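To illustrate, the whole check can be one parameterized UPDATE (a sketch; the RateLimit table and its columns are assumed, with one pre-created row per user):

using System.Data.SqlClient;

static class RateLimiter
{
    // Returns true if the user may call foo() now, and records the call
    // in the same statement. One round trip, one UPDATE by primary key.
    // Assumes a RateLimit(UserId PK, LastCall) row exists per user.
    public static bool TryAcquire(string connStr, int userId)
    {
        const string sql = @"
            UPDATE RateLimit
            SET LastCall = GETUTCDATE()
            WHERE UserId = @userId
              AND LastCall < DATEADD(minute, -1, GETUTCDATE());";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@userId", userId);
            conn.Open();
            return cmd.ExecuteNonQuery() == 1; // 0 = still inside the window
        }
    }
}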
I am basically creating a site for recruiters. One piece of functionality in my application requires posting to Facebook periodically. The posting frequency can be from 0 (Never) to 4 (High).
For example, if a recruiter has 4 open jobs and the posting frequency is set to 4, each job should be posted in its turn: the 1st job on the 1st day, the 2nd job on the 2nd, the 3rd job on the 3rd, and so on; on the 5th day the 1st job is posted again (round-robin fashion).
Had he set the posting frequency to 2, two jobs would be posted daily (thus each job would be posted every 2 days)
My only question is what type of threading I should create for this, since this is all dynamic! Also, any guidelines on what type of information I should store in the database?
I just need a general strategy to solve this problem. No code.
I think you need to separate it from your website; I mean it's better to run the job-posting logic in a service hosted on IIS (I am not sure whether such a thing exists, but I guess it does).
You also need a job queue table to remember which jobs need to be posted; your service would then pick them up and post them one by one.
To decide whether it is time to post a job, you can define a timer with a configurable interval that checks if there is any job to post.
Make sure that you keep verbose log details if posting fails. This is important because it is possible that Facebook changes its API, or your API key becomes invalid, or anything else happens - then you need to know what went wrong.
I also strongly suggest having a webpage for reporting the status of the jobs-to-post queue, including the causes of any failures.
If your program runs non-stop, you can just use one of the Timer classes available in the .NET Framework, without the need to go for full-blown concurrency (e.g. via the Task Parallel Library).
I suspect, though, that you'll need more than that - some kind of mechanism to detect which jobs were successfully posted and which were "missed" because the program wasn't running (or due to network problems etc.), so they can be posted the next time the program starts (or the network becomes available). A small local database (such as SQLite or MS SQL Server Compact) should serve this purpose nicely.
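A sketch of what I mean (the interval and the method contents are illustrative):

using System;
using System.Threading;

class PostingScheduler
{
    private Timer timer; // kept in a field so it isn't garbage collected

    public void Start()
    {
        // Fire once immediately, then re-check every 15 minutes.
        timer = new Timer(_ => PostDueJobs(), null,
                          TimeSpan.Zero, TimeSpan.FromMinutes(15));
    }

    private void PostDueJobs()
    {
        // 1. Ask the local DB for jobs whose next post time has passed
        //    (this also catches jobs missed while the program was down).
        // 2. Post each to Facebook; on success, record the post time and
        //    advance the job's next post time per its frequency setting.
        // 3. On failure, leave the job due so the next tick retries it.
    }
}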
If the requirements are as simple as you described, then I wouldn't use threading at all. It wouldn't even need to be a long-running app. I'd create a simple app that just tries to post a job and then exits immediately, and schedule it to run once every given period (via Windows Task Scheduler).
This app would first check whether it has already posted a job within the given posting frequency. Maybe put a "Last-Successful-Post-Time" setting in your datastore. If it's allowed to post, the app would query the highest-priority job and post it to Facebook. Once it successfully posts, that job would be downgraded to the lowest priority.
The job priority could just be a simple integer column in your data store, where lower values mean higher priority.
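I know you asked for no code, but just to make the rotation concrete, the two statements could look roughly like this (table and column names assumed):

static class JobRotation
{
    // Pick the most overdue job: the lowest Priority value wins.
    public const string PickSql =
        "SELECT TOP 1 JobId FROM Jobs ORDER BY Priority ASC;";

    // After a successful post, send the job to the back of the rotation.
    public const string DowngradeSql =
        @"UPDATE Jobs
          SET Priority = (SELECT MAX(Priority) FROM Jobs) + 1
          WHERE JobId = @jobId;";
}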
Edit:
I guess what I'm suggesting is that if you have clear boundaries in your requirements, break the project into multiple applications. That way there is a separation of concerns, and you wouldn't need to worry about how to spawn your Facebook notification process inside your web site code.
I have a requirement to monitor database rows continuously for changes (updates). If there are changes or updates from other sources, an event should be fired in my application (I am using WCF). Is there any way to listen to database rows continuously for changes?
I may have many events monitoring different rows in the same table. Is there any problem with this in terms of performance? I am using a C# web service to monitor the SQL Server back end.
You could use an AFTER UPDATE trigger on the respective tables to add an item to a SQL Server Service Broker queue, then have the queued notifications sent to your web service.
Another poster mentioned SqlDependency, which I also thought of mentioning, but the MSDN documentation is a little strange in that it provides a Windows client example yet also offers this advice:
SqlDependency was designed to be used in ASP.NET or middle-tier services where there is a relatively small number of servers having dependencies active against the database. It was not designed for use in client applications, where hundreds or thousands of client computers would have SqlDependency objects set up for a single database server.
Ref.
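For completeness, a minimal SqlDependency sketch (table and connection details are placeholders; the query must follow the notification rules - two-part table names, an explicit column list, no SELECT *):

using System.Data.SqlClient;

static class ChangeWatcher
{
    public static void Watch(string connStr)
    {
        SqlDependency.Start(connStr); // once per application

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(
            "SELECT RowId, Payload FROM dbo.WatchedTable", conn))
        {
            var dep = new SqlDependency(cmd);
            dep.OnChange += (sender, e) =>
            {
                // Fires once per notification; to keep listening you
                // must re-run the query with a fresh SqlDependency.
            };

            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read()) { /* capture the current rows */ }
        }
    }
}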
I had a very similar requirement some time ago, and I solved it using a CLR SP to push the data into a message queue.
To ease deployment, I created a CLR SP with a tiny little function called SendMessage that just pushes a message into a message queue, and tied it to my tables using an AFTER INSERT trigger (a normal trigger, not a CLR trigger).
Performance was my main concern in this case, but I stress tested it and it greatly exceeded my expectations. Compared to SQL Server Service Broker, it's a very easy-to-deploy solution, and the code in the CLR SP is trivial as well.
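The SP really is that small - roughly this (the queue path is illustrative, and the assembly has to be cataloged with EXTERNAL_ACCESS permission so it can reach MSMQ):

using System.Messaging;
using Microsoft.SqlServer.Server;

public class QueueProcs
{
    // Called from a normal AFTER INSERT trigger:
    //   EXEC dbo.SendMessage @body;
    [SqlProcedure]
    public static void SendMessage(string body)
    {
        using (var queue = new MessageQueue(@".\private$\tableChanges"))
        {
            queue.Send(body);
        }
    }
}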
Monitoring "continuously" could mean every few hours, minutes, seconds or even milliseconds. This solution might not work for millisecond updates: but if you only have to "monitor" a table a few times a minute you could simply have an external process check a table for updates. (If there is a DateTime column present.) You could then process the changed or newly added rows and perform whatever notification you need to. So you wouldn't be listening for changes, you'd be checking for them. One benefit of doing the checking in this manner would be that you wouldn't risk as much of a performance hit if a lot of rows were updated during a given quantum of time since you'd bulk them together (as opposed to responding to each and every change individually.)
I pondered the idea of a CLR function or something of the sort that calls the service after successfully inserting/updating/deleting data from the tables. Is that even good in this situation?
Probably it's not a good idea, but I guess it's still better than getting into table-trigger hell.
I assume your problem is that you want to do something after every data modification - say, recalculate some value or whatever. Making the database responsible for this is not a good idea because it can have a severe impact on performance.
You mentioned you want to detect inserts, updates and deletes on different tables. Done the way you are leaning towards, this would require you to set up three triggers/CLR functions per table and have them post an event to your WCF service (is that even supported in the subset of .NET available inside SQL Server?). The WCF service would then take the appropriate actions based on the events received.
A better solution would be to move the responsibility for detecting data modifications from your database to your application. This can actually be implemented very easily and efficiently.
Each table has a primary key (int, GUID or whatever) and a timestamp column indicating when the entry was last updated. This is a setup you'll see very often in optimistic concurrency scenarios, so it may not even be necessary to update your schema definitions. If you do need to add the column and can't offload updating the timestamp to the application using the database, you just need to write a single update trigger per table that sets the timestamp after each update.
To detect modifications, your WCF service/monitoring application builds up a local dictionary (preferably a hashtable) of primary key/timestamp pairs at a given time interval. Using a covering index in the database, this operation should be really fast. The next step is to compare the previous and current dictionaries, and voilà, there you go.
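The comparison itself is only a few lines (a sketch assuming int keys and DateTime timestamps):

using System;
using System.Collections.Generic;
using System.Linq;

static class SnapshotDiff
{
    public static void Compare(Dictionary<int, DateTime> previous,
                               Dictionary<int, DateTime> current)
    {
        var inserted = current.Keys.Except(previous.Keys).ToList();
        var deleted  = previous.Keys.Except(current.Keys).ToList();
        var updated  = current
            .Where(kv => previous.ContainsKey(kv.Key)
                      && previous[kv.Key] != kv.Value)
            .Select(kv => kv.Key)
            .ToList();
        // Fire your insert/update/delete events from these three sets.
    }
}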
There are some caveats to this approach, though. One is the number of records per table, another is the polling frequency (if it's too low, detection becomes ineffective), and yet another is whether you need access to the data as it was prior to the modification/insertion.
Hope this helps.
Why don't you use SQL Server Notification Services? I think that's exactly what you are looking for. Go through the Notification Services documentation and see if it fits your requirements.
I think there are some great ideas here; from the scalability perspective I'd say that externalizing the check (e.g. Paul Sasik's answer) is probably the best one so far (+1 to him).
If, for some reason, you don't want to externalize the check, then another option would be to use the HttpCache to store a watcher and a callback.
In short, when you put the record you want to watch into the DB, you also add it to the cache (using the .Add method), set a SqlCacheDependency on it, and register a callback to whatever logic you want to run when the dependency is triggered and the item is ejected from the cache.
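Roughly like this (a sketch; "MyDatabase" must be configured as a sqlCacheDependency entry in web.config and the table enabled for notifications - the names here are placeholders):

using System;
using System.Web;
using System.Web.Caching;

static class CacheWatcher
{
    public static void WatchRecord(string recordKey)
    {
        HttpRuntime.Cache.Add(
            recordKey,
            DateTime.UtcNow,                       // value is just a marker
            new SqlCacheDependency("MyDatabase", "WatchedTable"),
            Cache.NoAbsoluteExpiration,
            Cache.NoSlidingExpiration,
            CacheItemPriority.Default,
            (key, value, reason) =>
            {
                // The item was ejected because the table changed:
                // run your notification logic, then re-add the watcher.
            });
    }
}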
I have one BIG table (90k rows, approx. 60 MB) which holds info about free room capacity for about 50 hotels. This table has very few updates/inserts per hour.
My application sends async requests to this table (and joined tables) up to 30 times per second.
When I start 30 threads (with the default AppPool class in .NET 3.5 C#) at one time (each with a random valid SQL query string), only a few (approx. 4) are processed concurrently while the other threads wait. Why?
Is it because of SQL Server 2008 table locking, or because of the .NET side? Or something else?
If it is a SQL problem, would it help if I split this big table into one table for each hotel?
My goal is to have at least 10 threads served at a time.
This table is tiny. It doesn't even qualify as a "medium-sized" table. It's trivial.
You could be full-table-scanning it 30 times per second, or copying the whole thing into RAM, and no server would be the slightest bit bothered.
If your data fits in RAM, databases are fast. If you aren't seeing that, you're doing something REALLY WRONG. Therefore I also think the problems are all on the client side.
It is more than likely on the .NET side. If it were table locking, more threads would be processing, but they would be waiting on their queries to return. If I remember correctly, there's a property on thread pools that controls how many actual threads they create at once; if there are more pending work items than that number, they get in line and wait for running threads to finish. Check that.
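A quick way to check and raise that limit (a diagnostic sketch):

using System;
using System.Threading;

class PoolCheck
{
    static void Main()
    {
        int minWorkers, minIocp, maxWorkers, maxIocp;
        ThreadPool.GetMinThreads(out minWorkers, out minIocp);
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIocp);
        Console.WriteLine("min workers: {0}, max workers: {1}",
                          minWorkers, maxWorkers);

        // The pool only keeps minWorkers threads warm and adds more
        // slowly; if you need 30 truly concurrent requests, raise it:
        ThreadPool.SetMinThreads(30, minIocp);
    }
}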
Have you tried changing the transaction isolation level?
Even when just reading from a table, SQL Server takes shared locks.
Try setting the isolation level to READ UNCOMMITTED and see if that improves the situation.
But be advised that you may then read 'dirty' (uncommitted) data - make sure you understand the ramifications if this is the solution.
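For example (a sketch with placeholder table/column names; the SET statement applies to the whole session on that connection):

using System.Data.SqlClient;

static class DirtyReader
{
    public static void QueryCapacities(string connStr, int hotelId)
    {
        const string sql = @"
            SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
            SELECT RoomType, FreeRooms
            FROM dbo.HotelCapacity
            WHERE HotelId = @hotelId;";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@hotelId", hotelId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read()) { /* ... */ }
        }
        // Equivalent per-table hint: FROM dbo.HotelCapacity WITH (NOLOCK)
    }
}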
Rather than ask, measure. Each SQL query actually submitted by your application creates a request on the server, and the sys.dm_exec_requests DMV shows the state of each request. When a request is blocked, the wait_type column shows a non-empty value. From this you can judge whether your requests are blocked, and if they are, you'll also know why.
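The query itself is simple; for example (requires VIEW SERVER STATE permission - run it while the 30 threads are live):

static class Diagnostics
{
    // Paste into Management Studio or run via SqlCommand.
    public const string BlockedRequestsSql = @"
        SELECT session_id, status, command, wait_type, wait_time,
               blocking_session_id
        FROM sys.dm_exec_requests
        WHERE session_id > 50;  -- skip system sessions";
}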