In my client-server architecture I have a few API functions whose usage needs to be limited.
The server is written in .NET C# and runs on IIS.
Until now I didn't need to perform any synchronization. The code was written in such a way that even if a client sent the same request multiple times (e.g. a create-something request), one call would end with success and all the others with an error (because of the server code + db structure).
What is the best way to enforce such limits? For example, I want no more than 1 call of the API method foo() per user per minute.
I thought about a SynchronizationTable with just one unique column, unique_text; before computing a foo() call I would write something like foo{userId}{date}{HH:mm} to this table. If the write succeeds, I know there was no foo call from that user in the current minute.
I think there is a much better way, probably in server code, without using the db for that. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some lightweight DictionaryMutex.
For example:
private static DictionaryMutex FooLock = new DictionaryMutex();

FooLock.Lock(User.GUID);
try
{
    ...
}
finally
{
    FooLock.Unlock(User.GUID);
}
EDIT:
A solution in which one user cannot call foo twice at the same time would also be sufficient for me. By "at the same time" I mean that the server started to handle the second call before returning the result of the first.
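For reference, here is a minimal sketch of how such a DictionaryMutex could look, keeping one gate per user; the ConcurrentDictionary/SemaphoreSlim combination is just my assumption of a reasonable implementation, not a finished component:

using System;
using System.Collections.Concurrent;
using System.Threading;

public class DictionaryMutex
{
    // One gate per key; entries are never removed here, so a real
    // implementation would also need some form of eviction.
    private readonly ConcurrentDictionary<Guid, SemaphoreSlim> gates =
        new ConcurrentDictionary<Guid, SemaphoreSlim>();

    public void Lock(Guid key)
    {
        gates.GetOrAdd(key, _ => new SemaphoreSlim(1, 1)).Wait();
    }

    public void Unlock(Guid key)
    {
        SemaphoreSlim gate;
        if (gates.TryGetValue(key, out gate))
            gate.Release();
    }
}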
Note that keeping this state in memory in an IIS worker process opens up the possibility of losing all of this data at any instant. Worker processes can restart for any number of reasons.
Also, you probably want to have two web servers for high availability. Keeping the state inside of worker processes makes the application no longer clustering-ready. This is often a no-go.
Web apps really should be stateless; there are many reasons for that. If you can help it, don't manage your own data structures like the ones suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect thousands of such queries per second per CPU core. This can bear a lot of load. You can use SQL Server Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is nearly worthless, so you don't even need backups.
I think you're overestimating how costly this rate limitation will be. Your web service is probably doing far more costly things than a single UPDATE by primary key against a simple table; a sketch of what that could look like follows.
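For illustration, a hedged sketch of the SQL Server option. It assumes a table RateLimit(UserId, Slot) with a composite primary key; the table and all names are my invention, not anything from the question:

using System;
using System.Data.SqlClient;

// The primary key (UserId, Slot) enforces "one foo() per user per minute":
// a second INSERT in the same minute fails with a key violation.
static bool TryEnterFooSlot(SqlConnection conn, Guid userId)
{
    string slot = DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm");
    using (var cmd = new SqlCommand(
        "INSERT INTO RateLimit (UserId, Slot) VALUES (@u, @s)", conn))
    {
        cmd.Parameters.AddWithValue("@u", userId);
        cmd.Parameters.AddWithValue("@s", slot);
        try
        {
            cmd.ExecuteNonQuery();
            return true;               // first call in this minute
        }
        catch (SqlException ex)
        {
            if (ex.Number == 2627)     // primary key violation
                return false;          // user already called foo() this minute
            throw;
        }
    }
}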
Related
I'm working on a web application that uses a number of external data sources for data that we need to display on the front end. Some of the external calls are expensive and some also come with a monetary cost, so we need a way to persist the results of these external requests so that they survive, for example, an app restart.
I've started with a proof of concept; my current solution is a combination of a persistent cache/storage (which stores serialized JSON in files on disk) and a runtime cache. When the app starts it populates the runtime cache from the persistent cache; if the persistent cache is empty it goes ahead and calls the web services. The next time the app restarts we load from the persistent cache, avoiding the calls to the external sources.
After the first population we want the cache to be updated in the background by some kind of update process on a given schedule. We also want this update process to be smart enough to only update the cache if the request to the web service was successful, and otherwise keep the old version (something like the sketch below). There's also a twist here: some web services might return a complete collection while others require one call per entity, so the update process might differ depending on the concrete web service.
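To make the "keep the old version on failure" rule concrete, this hedged sketch is roughly what I have in mind; the file layout and all names are just placeholders:

using System;
using System.IO;

// Refresh one cache entry, keeping the old version if the fetch fails.
// Assumes the "cache" directory already exists.
static void RefreshCacheEntry(string key, Func<string> fetchJson)
{
    string path = Path.Combine("cache", key + ".json");
    try
    {
        string fresh = fetchJson();        // call the external web service
        File.WriteAllText(path, fresh);    // overwrite only on success
    }
    catch (Exception)
    {
        // Fetch failed: leave the previously persisted version untouched.
    }
}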
I'm thinking that this scenario can't be totally unique, so I've looked around and done a fair bit of Googling, but I haven't found any patterns or libraries that deal with something like this.
So what I'm looking for is any patterns that might be useful for us, and any C# libraries or articles on the subject. I don't want to "reinvent the wheel". If anyone has solved similar problems I would love to hear more about how you approached them.
Thank you so much!
I'm looking for some feedback regarding the best option for a problem I am working on.
To give you some background, I recently inherited a broken business application (our project was using it, so we gained the responsibility of fixing it). I come from a SharePoint development background, so I know a little C#, ASP.NET and SQL.
Currently we have an issue with the application where we continually receive timeout errors. I have narrowed it down to the web application calling a bunch of stored procedures that update status fields in other tables when something changes that might affect the status of other objects.
Without completely overhauling this application, I have determined our best option is to offload these stored procedures to run in the background, not tied to the UI. I've looked at a couple of options, including:
Creating a separate thread to handle the execution. (Still times out)
Using BackgroundWorker (still times out; obviously it shouldn't, but I can't figure out what is causing it to wait for the BackgroundWorker to finish)
Moving the stored proc execution to a job, which I then call from another SP. (This works, but the limitation is that I can only have one job running at once, and if multiple users update objects they receive an exception because the job won't start.)
Right now we have moved these stored procedures into a twice-a-day script which updates all objects, but this is only a temporary fix.
I have two options that I'm looking at, and I'm hoping to get some guidance on the implementation of whatever you consider to be the best option:
Continue using the job and have the executing stored proc queue up items in a db table which the job loops through until it is empty. The executing stored proc will have to check whether the job is running when it adds a new entry, and then act accordingly.
It's been recommended that I look at using Service Broker, but I am not familiar with its use at all. I understand that it would likely be a better overall solution, as it allows me to queue up these updates in a more transactional way.
I think both of these options are viable, although I need some help in understanding the implementation of the second one. My other dilemma is that with these stored procedures running anywhere from 45 s to 20 min, how can I notify the user who changed the object that his/her updates have been made? This is where I fall back to using the job, because I could simply add a user field to the 'queue' and have the stored proc send a quick email at the end.
Thoughts, suggestions? Maybe I'm over-thinking this?
If you are on .NET 4.5 and C# 5.0, use async; if you are on .NET 4.0, use the TPL. They share (almost) the same underlying machinery, and the async feature is built on top of the TPL (with some extra internals).
In either case the TPL would be a proper choice.
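For illustration only, a hedged sketch of pushing the long-running procedure onto the TPL so the request thread returns immediately. The connection string and procedure name are placeholders, and note that work queued this way dies if the IIS worker process recycles:

using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

static void QueueStatusUpdate(string connectionString)
{
    Task.Factory.StartNew(() =>
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.UpdateStatuses", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.CommandTimeout = 0;   // 0 = no client-side timeout for the long-running proc
            conn.Open();
            cmd.ExecuteNonQuery();    // errors here should be logged, not swallowed
        }
    });
}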
Sounds like Service Broker would be an excellent solution to this problem. It's true that there is a bit of a learning curve to climb to get your head around how it works, but it's fundamentally pretty simple, especially when your implementation is in a single database.
There's a good (and mercifully short) intro to how it works at http://msdn.microsoft.com/en-US/library/ms345108(v=SQL.90).aspx
Have a look at Asynchronous Procedure Execution. But I would first check whether the updates can be improved: perhaps a simple index can eliminate the timeouts, and/or try to leverage snapshot isolation. These would be much simpler to try out without committing to a 'major overhaul' of the application code.
I must also urge you to read Waits and Queues. This is a SQL Server methodology for identifying performance bottlenecks, and a great way of narrowing the 'timeouts' down to something more actionable (blocking, IO, indexes etc.).
I am quite confused about which approach to take and what is best practice.
Let's say I have a C# application which does the following:
sends emails from a queue. The emails to send and all their content are stored in the DB.
Now, I know how to make my C# application almost scalable but I need to go somewhat further.
I want some way of distributing the tasks across, say, X servers, so it is not just one server doing all the processing but the work is shared among the servers.
If one server goes down, the load should be shared between the other servers. I know NLB does this, but I'm not looking for NLB here.
Sure, you could add a column of some kind to the DB table to indicate which server is assigned to process each record, and each application instance would have an ID of some kind that matches the value in the DB so it would only pull its own records - but I consider this cheap, bad practice and unrealistic.
Taking DB table row locks is also not something I would do, due to potential deadlocks and other possible issues.
I am also NOT suggesting using threading "to the extreme" here, but yes, there will be threading per item to process, or batching items up per thread for x threads.
How should I approach this, and what do you recommend for making a C# application that is scalable and highly available? The aim is to have X servers, each running the same application and each able to get records and process them, with the load shared among the servers so that if one server or service fails, the others can take on that load until another server is put back.
Sorry for my lack of understanding or knowledge, but I have been thinking about this quite a lot and have lost sleep trying to come up with a good, robust solution.
I would be thinking of batching up the work, so each app only pulls back x records at a time, marking those retrieved records as taken with a bool field in the table. I'd amend the SELECT statement to pull only records not marked as taken/done. Table locks would be OK in this instance for very short periods, to ensure there is no overlap of apps processing the same records.
EDIT: It's not very elegant, but you could have a datestamp and a status for each entry (instead of the bool field above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which are In Progress but have gone beyond a time threshold without being set to complete. They would then be ready for reprocessing by another app later on. A sketch of the claim step under this scheme follows.
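A hedged sketch of how an app could claim a batch this way; the table and column names are illustrative, and the READPAST hint keeps the apps from blocking on each other's already-claimed rows:

// Atomically claim up to 10 pending rows so no two servers get the same ones;
// the OUTPUT clause returns exactly the rows this server now owns.
const string claimBatchSql = @"
    UPDATE TOP (10) EmailQueue WITH (ROWLOCK, READPAST)
    SET    Status    = 'InProgress',
           ClaimedAt = GETUTCDATE()
    OUTPUT inserted.Id, inserted.Recipient, inserted.Body
    WHERE  Status = 'Pending';";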
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as unsophisticated and work just fine. The best things work with the least complexity.
How much traffic is heavy traffic? What are the best resources for learning about heavy-traffic web site development? What are the approaches?
There are a lot of principles that apply to any web site, irrespective of the underlying stack:
use HTTP caching facilities. For one there is the user agent cache. Second, the entire web backbone is full of proxies that can cache your requests, so use this to full advantage. A request that does not even land on your server adds 0 to your load; you can't optimize better than that :)
as a corollary to the point above, use CDNs (Content Delivery Networks, like CloudFront) for your static content. CSS, JPG, JS, static HTML and many more pages can be served from a CDN, saving the web server an HTTP request.
as a second corollary to the first point: add expiration caching hints to your dynamic content. Even a short cache lifetime like 10 seconds will save a lot of hits that will instead be served by the proxies sitting between the client and the server (a short ASP.NET sketch follows the list below).
Minimize the number of HTTP requests. Seems basic, but it is probably the most overlooked optimization available. In fact, Yahoo's best practices put this as the topmost optimization; see Best Practices for Speeding Up Your Web Site. Here is their best practices list:
Minimize HTTP Requests
Use a Content Delivery Network
Add an Expires or a Cache-Control Header
Gzip Components
... (the list is quite long actually, just read the link above)
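To illustrate the expiration-hints point above, a hedged ASP.NET sketch; the 10-second lifetime is just an example value:

// Inside a System.Web.UI.Page: emit Cache-Control/Expires headers so
// user agents and proxies may serve this response without hitting the server.
protected void Page_Load(object sender, EventArgs e)
{
    Response.Cache.SetCacheability(HttpCacheability.Public);    // cacheable by shared proxies
    Response.Cache.SetMaxAge(TimeSpan.FromSeconds(10));         // Cache-Control: max-age=10
    Response.Cache.SetExpires(DateTime.UtcNow.AddSeconds(10));  // matching Expires header
}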
Now, after you have eliminated as many of the superfluous hits as possible, you are still left with optimizing whatever requests actually hit your server. Once your ASP.NET code starts to run, everything will pale in comparison with the database requests:
reduce the number of DB calls per page. The best optimization possible is, obviously, not to make the request to the DB at all to start with. Some say 4 reads and 1 write per page are the most a high-load server should handle, others say one DB call per page, still others say 10 calls per page is OK. The point is that fewer is always better than more, and writes are significantly more costly than reads. Review your UI design; perhaps that hit count in the corner of the page that nobody sees doesn't need to be that accurate...
Make sure every single DB request you send to the SQL server is optimized. Look at each and every query plan, make sure you have proper covering indexes in place, make sure you don't do any table scans, review your clustered index design strategy, review all your IO load, storage design, etc. Really, there is no shortcut you can take here; you have to analyze and optimize the heck out of your database, as it will be your choking point.
eliminate contention. Don't have readers wait for writers. For your stack, SNAPSHOT ISOLATION is a must.
cache results. And usually this is where the cookie crumbles. Designing a good cache is actually quite hard to pull off. I would recommend you watch the Facebook SOCC keynote: Building Facebook: Performance at Massive Scale. Somewhere around slide 47 they show what a typical internal Facebook API looks like:
cache_get(
    $ids,
    'cache_function',
    $cache_params,
    'db_function',
    $db_params);
Everything is requested from a cache, and if not found, requested from their MySQL back end. You probably won't start with 60,000 servers though :)
On the SQL Server stack the best caching strategy is one based on Query Notifications. You can almost mix it with LINQ...
I will define heavy traffic as traffic which triggers resource-intensive work. Meaning, if one web request triggers multiple SQL calls, or they all calculate pi to a lot of decimals, then it is heavy.
If you are returning static HTML, then bandwidth is more of an issue than what a good server today can handle (more or less).
The principles of optimizing for speed are the same whether you use MVC or not.
Having a decoupled architecture makes it easier to scale by adding more servers etc.
Use a repository pattern for data retrieval (makes adding a cache easier; see the sketch below).
Cache data which is expensive to query.
Data to be written could be written through a cache, so that the client doesn't have to wait for the actual database commit.
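As a hedged illustration of the repository and cache points above, assuming a trivial Product type (all names here are invented):

using System.Collections.Generic;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IProductRepository
{
    Product Get(int id);
}

// A caching decorator over the real repository: callers never know whether
// the data came from the cache or the database. (Not thread-safe as written;
// a real one would lock or use a concurrent/expiring cache.)
public class CachedProductRepository : IProductRepository
{
    private readonly IProductRepository inner;
    private readonly Dictionary<int, Product> cache = new Dictionary<int, Product>();

    public CachedProductRepository(IProductRepository inner)
    {
        this.inner = inner;
    }

    public Product Get(int id)
    {
        Product p;
        if (!cache.TryGetValue(id, out p))   // expensive query only on a cache miss
        {
            p = inner.Get(id);
            cache[id] = p;
        }
        return p;
    }
}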
There are probably more ground rules as well. Maybe you can say something about the architecture of your application, and how much load you need to plan for?
MSDN has some resources on this. This particular article is out of date, but is a start.
I would also suggest not limiting yourself to reading about the MVC stack: many principles are cross-platform.
The solution we developed uses a database (SQL Server 2005) for persistence purposes; thus, all updated data is saved to the database instead of being sent to the program.
I have a front-end (desktop) application that currently keeps polling the database for updates that may happen at any time to some critical data, and I am not really a fan of database polling and wasting CPU cycles on work that is uselessly redone.
Our manager doesn't seem to mind us polling the database. The amount of data is small (fewer than 100 records) and the interval is long (1 min), but I am a coder. I do mind. Is there a better way to accomplish the task of keeping the data in memory as synced as possible with the data in the database? The system is developed using C# 3.5.
Since you're on SQL 2005, you can use a SqlDependency to be notified of changes. Note that you can use it pretty effortlessly with System.Web.Caching.Cache, which, despite its namespace, runs just fine in a WinForms app.
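A hedged sketch of the idea, assuming SqlDependency.Start(connectionString) was called once at application start-up and that the query meets the notification rules (explicit column list, two-part table name); the table itself is invented:

using System.Data;
using System.Data.SqlClient;

// Load the critical data once and get a callback when it changes in the
// database, instead of polling on a timer.
static DataTable LoadCriticalData(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT Id, Value FROM dbo.CriticalData", conn))
    {
        var dependency = new SqlDependency(cmd);
        dependency.OnChange += (sender, e) =>
        {
            // Fires once when the result set changes: re-run LoadCriticalData
            // here to refresh the in-memory copy and set up a new dependency.
        };
        conn.Open();
        var table = new DataTable();
        table.Load(cmd.ExecuteReader());
        return table;
    }
}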
First thought off the top of my head is a trigger combined with a message queue.
This may well be overkill for your situation, but it may be interesting to take a look at the Microsoft Sync Framework.
SQL Server Notification Services will allow you to have the database call back to an app over a number of protocols. One method of implementation is to have the notification service create (or modify) a file on an accessible network share and have your desktop app react by using a FileSystemWatcher.
More information on Notification Services can be found at: http://technet.microsoft.com/en-us/library/aa226909(SQL.80).aspx
Please note that this may be a sledgehammer approach to a nut type problem though.
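On the desktop side, the reaction could be as simple as this hedged sketch; the share path, file name and reload routine are all hypothetical:

using System.IO;

// Keep a reference so the watcher is not garbage collected.
static FileSystemWatcher watcher;

static void WatchForNotifications()
{
    watcher = new FileSystemWatcher(@"\\server\notifications", "critical-data.flag");
    watcher.Changed += (sender, e) => ReloadCriticalData();   // re-query the database once
    watcher.Created += (sender, e) => ReloadCriticalData();
    watcher.EnableRaisingEvents = true;
}

static void ReloadCriticalData()
{
    // Hypothetical: re-run the SELECT and refresh the in-memory copy.
}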
In ASP.NET, http://msdn.microsoft.com/en-us/library/ms178604(VS.80).aspx.
This may also be overkill, but maybe you could implement some sort of caching mechanism: when the data is written to the database, you cache it at the same time, and when you're trying to fetch data back from the DB, check the cache first.