I have a C# web app. I have multiple databases where the data is the same, so I can use a round-robin method to distribute the database calls.
I plan to read in each connection string, and iterate through each DB and return the data for the first call that passes.
I would like to record the last database that was used, so I can try the next database in the list for the next call that comes in.
A database seems overkill for this, so could I use a static List to track this and lock the read and update of the list?
In terms of doing a round-robin approach, you can definitely use a List with a lock, but I've made some general comments below which might be helpful.
You are trying to implement a network load balancer here, and you will face a couple of problems. First, IIS will happily spin up multiple threads of calls to your website if it receives multiple requests before the first one has completed. Secondly, if your website runs in a Web Garden (multiple IIS worker processes on the same machine), as a Web Farm (multiple machines), or on Azure or some other cloud platform, then those multiple instances of your web call might not even be on the same machine (or VM). So you need to be clear that it will be almost impossible to generate a true round-robin series of database hits on a properly scalable website.
I'm not sure creating a new synchronisation point between all your web threads is a good idea for scalability. Round robin is also not the best use of resources: if you want your website to run as fast as possible using as few resources as possible (generally why an NLB system is put in place), then use a pool-based approach that leases an open database connection rather than iterating around a set of open database connections. The calling code gets handed the next connection that has not yet been released back into the pool.
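For the simple single-process case, here is a minimal sketch of a lock-free round-robin selector (the connection strings and names are assumptions for illustration). Note that ADO.NET already pools physical connections per connection string, and, as said above, this cannot give a true round robin across a web garden or farm:

using System.Threading;

public static class ConnectionRoundRobin
{
    // Hypothetical connection strings; substitute your own.
    private static readonly string[] ConnectionStrings =
    {
        "Server=db1;Database=App;Integrated Security=true",
        "Server=db2;Database=App;Integrated Security=true",
    };

    private static int _counter = -1;

    public static string Next()
    {
        // Interlocked.Increment is atomic, so no lock is needed.
        // The mask keeps the index non-negative after the counter wraps.
        int index = Interlocked.Increment(ref _counter) & int.MaxValue;
        return ConnectionStrings[index % ConnectionStrings.Length];
    }
}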
Related
In my client-server architecture I have a few API functions whose usage needs to be limited.
The server is written in C# (.NET) and runs on IIS.
Until now I didn't need to perform any synchronization. The code was written in such a way that even if a client sent the same request multiple times (e.g. a create request), one call would end with success and all the others with an error (because of the server code and the DB structure).
What is the best way to enforce such limitations? For example, I want no more than one call of the API method foo() per user per minute.
I thought about a SynchronizationTable which would have just one column, unique_text, and before computing a foo() call I would write something like foo{userId}{date}{HH:mm} to this table. If the call ends with success, I know that there wasn't a foo call from that user in the current minute.
I think there is a much better way, probably in server code, without using the DB for that. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some lightweight DictionaryMutex.
For example:
private static readonly DictionaryMutex FooLock = new DictionaryMutex();

FooLock.Lock(User.GUID);
try
{
    ...
}
finally
{
    FooLock.Unlock(User.GUID);
}
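For illustration, a minimal sketch of what such a DictionaryMutex could look like, built on ConcurrentDictionary and SemaphoreSlim (the class and method names simply mirror the hypothetical API above):

using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class DictionaryMutex
{
    // One semaphore per key. Entries are never evicted, which is
    // acceptable as long as the set of users is bounded.
    private readonly ConcurrentDictionary<Guid, SemaphoreSlim> _locks =
        new ConcurrentDictionary<Guid, SemaphoreSlim>();

    public void Lock(Guid key)
    {
        _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1)).Wait();
    }

    public void Unlock(Guid key)
    {
        SemaphoreSlim semaphore;
        if (_locks.TryGetValue(key, out semaphore))
            semaphore.Release();
    }
}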
EDIT:
A solution in which one user cannot call foo twice at the same time is also sufficient for me. By "at the same time" I mean that the server starts handling the second call before returning the result of the first.
Note that keeping this state in memory in an IIS worker process means you can lose all of it at any instant: worker processes can restart for any number of reasons.
Also, you probably want to have two web servers for high availability. Keeping the state inside of worker processes makes the application no longer clustering-ready. This is often a no-go.
Web apps really should be stateless; there are many reasons for that. If you can help it, don't manage your own data structures as suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect thousands of such queries per second per CPU core; this can bear a lot of load. You can use SQL Server Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is nearly worthless, so you don't even need backups.
I think you're overestimating how costly this rate limiting will be. Your web service is probably doing far more costly things than a single UPDATE by primary key against a simple table.
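To make the SQL Server option concrete, here is a hedged sketch of the minute-slot idea from the question (the RateLimit table with a primary-key Slot column is an assumption; old rows would need a periodic cleanup job):

using System;
using System.Data.SqlClient;

public static class RateLimiter
{
    // Returns true if this is the first foo() call for this user in the
    // current UTC minute; the primary key on Slot rejects duplicates.
    public static bool TryAcquire(SqlConnection connection, Guid userId)
    {
        string slot = userId + "|" + DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm");
        using (var command = new SqlCommand(
            "INSERT INTO RateLimit (Slot) VALUES (@slot)", connection))
        {
            command.Parameters.AddWithValue("@slot", slot);
            try
            {
                command.ExecuteNonQuery();
                return true;  // first call this minute
            }
            catch (SqlException ex) when (ex.Number == 2627 || ex.Number == 2601)
            {
                return false; // duplicate key: already called this minute
            }
        }
    }
}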
I am quite confused about which approach to take and what best practice is.
Let's say I have a C# application which does the following:
sends emails from a queue; the emails to send and all their content are stored in the DB.
Now, I know how to make my C# application almost scalable, but I need to go somewhat further.
I want some way to distribute the tasks across, say, X servers, so it is not just one server doing all the processing but the work is shared among them.
If one server goes down, the load is shared between the remaining servers. I know an NLB does this, but I'm not looking for an NLB here.
Sure, you could add a column of some kind to the DB table to indicate which server is assigned to process each record, give each application instance an ID that matches the value in the DB, and have each pull only its own records, but I consider this cheap, bad practice, and unrealistic.
Holding a DB table row lock is also not something I would do, due to potential deadlocks and other possible issues.
I am also NOT suggesting taking threading "to the extreme" here, but yes, there will be a thread per item to process, or items batched up per thread for X threads.
How should I approach this, and what do you recommend for making a C# application that is scalable and highly available? The aim is to have X servers, each running the same application, each able to fetch and process records, with the load shared among the servers so that if one server or service fails, the others can take on that load until another server is brought back.
Sorry for my lack of understanding or knowledge, but I have been thinking about this quite a lot and have lost sleep trying to come up with a good, robust solution.
I would be thinking of batching up the work, so each app only pulls back X records at a time, marking the retrieved records as taken with a bool field in the table. I'd amend the SELECT statement to pull only records not marked as taken/done. Table locks would be OK in this instance for very short periods, to ensure there is no overlap between apps processing the same records.
EDIT: It's not very elegant, but you could have a timestamp and a status for each entry (instead of the bool field above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which are In Progress but have gone beyond a time threshold without being set to Complete. They would then be ready for reprocessing by another app later on.
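A hedged sketch of that claiming step (table and column names are assumptions): the UPDLOCK/READPAST hints let several app instances pull batches concurrently without ever taking the same rows:

using System.Collections.Generic;
using System.Data.SqlClient;

public static class WorkQueue
{
    public static List<int> ClaimBatch(string connectionString, int batchSize)
    {
        // READPAST skips rows another server has already locked;
        // UPDLOCK holds the claimed rows until the UPDATE commits.
        const string sql = @"
            WITH batch AS (
                SELECT TOP (@batchSize) Id, Status, TakenAtUtc
                FROM   EmailQueue WITH (ROWLOCK, UPDLOCK, READPAST)
                WHERE  Status = 'Pending'
                ORDER  BY Id)
            UPDATE batch
            SET    Status = 'InProgress', TakenAtUtc = GETUTCDATE()
            OUTPUT inserted.Id;";

        var claimed = new List<int>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@batchSize", batchSize);
            connection.Open();
            using (var reader = command.ExecuteReader())
                while (reader.Read())
                    claimed.Add(reader.GetInt32(0));
        }
        return claimed;
    }
}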
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as un-sophisticated and work just fine. The best things work with the least complexity.
I would like to maintain a list of objects that is distributed between N load balanced servers: whenever a client changes the list on one server, I would like these changes to migrate to the other servers. So, I guess this is a case of master-master replication.
What is the simplest way of handling this? One simplifying fact is that each change to an object in the list has an associated increasing version number attached to it. So, it is possible to resolve conflicts if an item was changed on two different servers, and these two deltas make their way to a third server.
Edit: clarification: I am quite familiar with distributed key-value stores like Memcached and Redis. That is not the issue here; what I am interested in is a mechanism to resolve conflicts in a shared list: if server A changes an item in the list, and server B removes the item, for example, how to resolve the conflict programmatically.
I suggest memcached. It's a distributed server cache system that seems to fit your needs perfectly. Check out this link:
Which .NET Memcached client do you use, EnyimMemcached vs. BeITMemcached?
If passing the entire list around doesn't suit you (I don't know whether memcached is smart enough to diff your lists), then I would suggest giving the old DataSet object a look, as its DiffGrams should be well suited to passing around just the deltas if your data set is large.
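For what it's worth, a minimal sketch of extracting just the changed rows as a DiffGram (this uses the standard DataSet API; the wrapper class is illustrative):

using System.Data;
using System.IO;

public static class DataSetDelta
{
    public static string GetDelta(DataSet current)
    {
        // GetChanges() returns only the added/modified/deleted rows,
        // or null if nothing changed.
        DataSet delta = current.GetChanges();
        if (delta == null)
            return null;

        using (var writer = new StringWriter())
        {
            // A DiffGram serializes original and current row versions together.
            delta.WriteXml(writer, XmlWriteMode.DiffGram);
            return writer.ToString();
        }
    }
}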
Put your changes in a queue. Have each server look at the queue, and act upon it.
For example, queue could have:
add item #33
remove item #55
update item #22
and so on
Upon making a change, write it to the queue, and have each server pick up items from the queue and update its own list accordingly.
I built an in-memory database with this method, and it worked perfectly across multiple 'servers'.
EDIT:
When servers want to update each other, this is what has to happen:
Each server that makes an update puts an UPDATE (or ADD or DELETE) request into the queue for all the other servers. Each server should also store the list of queued requests that originated from it, so it will not load its own updates from the queue.
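A hedged sketch of how the receiving side could apply such queued changes (all type and field names are assumptions), using the question's per-item version numbers so the highest version wins and a removal acts as a tombstone:

using System;
using System.Collections.Concurrent;

public enum ChangeKind { Add, Update, Remove }

public sealed class Change
{
    public Guid OriginServer;   // which server queued this change
    public ChangeKind Kind;
    public int ItemId;
    public long Version;        // increasing per item, as in the question
    public string Payload;
}

public sealed class ReplicatedList
{
    private readonly Guid _serverId = Guid.NewGuid();

    // Keeps the latest Change per item; a Remove stays in the map as a
    // tombstone so a stale, lower-versioned Update cannot resurrect it.
    private readonly ConcurrentDictionary<int, Change> _items =
        new ConcurrentDictionary<int, Change>();

    public void Apply(Change change)
    {
        if (change.OriginServer == _serverId)
            return; // skip updates this server queued itself

        _items.AddOrUpdate(change.ItemId, change,
            (_, current) => change.Version > current.Version ? change : current);
    }
}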
Does each server have its own version of the List locally cached, or do you plan to use a centralized caching layer?
As suggested, you can have a centralized "push" process which works off a centralized queue. Any changes submitted by any server are en-queued, and the "push" process can push updates to all the servers via some remoting / WebService mechanism.
This offers the advantage that any changes/updates/deletes are applied at once (or close in time) to all the servers, with centralized validation or logging if needed. It also solves the problem of multiple updates: the latest one takes precedence.
I've seen this implemented as a Windows service with an internal queue (which can be persisted to a DB asynchronously for resiliency): it manages the queue, simply takes items one by one, validates each item, logs the change/content, and finally pushes it to the local Lists via WebService calls to each web server (the servers maintain in-memory lists which simply get updated/added to/deleted from as needed).
There are algorithms that can be used to synchronize distributed systems.
In your case you need an algorithm that, given two events in the system, tells you which of them happened first. If you can decide, for any two events, which one came first, then all the conflicts can be resolved.
I recommend you use Lamport clocks.
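A minimal sketch of a Lamport clock (the algorithm is standard; the class shape is just illustrative). Comparing (time, serverId) pairs then gives a total order over events that can be used to resolve conflicting list edits:

using System;

public sealed class LamportClock
{
    private long _time;
    private readonly object _gate = new object();

    // Call for every local event, including sending a message.
    public long Tick()
    {
        lock (_gate) return ++_time;
    }

    // Call when a message stamped with a remote time arrives.
    public long Receive(long remoteTime)
    {
        lock (_gate) return _time = Math.Max(_time, remoteTime) + 1;
    }
}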
If you're on a Windows platform, I suggest you take a look at "Windows Server AppFabric", and especially the Caching feature. The name is funky, but I think it's exactly what you're looking for, I quote:
A distributed in-memory cache that provides .NET applications with high-speed access, scale, and high availability to application data.
I have an application that, once started, gets some initial data from my database; after that, some functions may update or insert data into it.
Since my database is not on the same computer as the one running the application, and I would like to be able to move the application server around freely, I am looking for a more flexible way to insert/update/query data as needed.
I was thinking of using a website API on a separate thread in my application, with some kind of list, where this thread tries to push the updates every X minutes; once a given entry has been applied, it is removed from the list.
This way, instead of being held up by the database queries and the like, the application would run freely, queuing whatever has to be updated/inserted, etc.
The main point here is that I can run the functions without worrying about connectivity issues on the database end, or related issues, since all the changes are queued to be applied to it.
Is this approach OK? Bad? Are there better recommendations for this scenario?
On "can access DB through some web server instead of talking directly to DB server": yes this is very common and recommended approach. It is much easier to limit set of operations exposed through custom API (web services, REST services, ...) than restrict direct communication with DB.
On "sync on separate thread..." - you need to figure out what are requirements of the synchronization. Delayed sync may be ok if you don't need to know latest data and not care if updates from client are commited to storage immediately.
I have a requirement to read data from a table (SQL Server 2005) and send that data to another application every 5 seconds. I am looking for the best approach to do this.
Right now I am planning to write a console application (.NET, C#) which will read the data from SQL Server 2005 (a QUEUE table filled by different applications) and send it to the other application over TCP/IP (a central server), and to run that console application as a scheduled task every 5 seconds. I am assuming the task scheduler will discard a new run event if the task is already running (to avoid concurrent executions).
Has anybody come across a similar situation? Please share your experience and advise me on the best approach.
We have done similar work. If you are going to query a SQL database every 5 seconds, be sure to use a stored procedure that is optimized to be very fast. It should not update data unless absolutely necessary. This approach is typically called 'polling', and I've found it acceptable if your SQL Server is not otherwise bogged down with too many other calls.
In approaches we've used, a Windows Service that does the polling works well.
To communicate results to another app, it all depends on what the other app is doing, what type of interface you can make into it, and how quickly you need the results. The WCF class libraries from Microsoft provide many workable approaches for real-time communication. My preference is to write to the application's database and then have the application read the data (if that works for the app). If you need something real-time, WCF is the way to go; I'd suggest a stateless protocol like HTTP (standard HTTP POSTs) if a response time under 5 seconds is required, or TCP/IP if sub-second response time is required.
Since I assume your central storage is also SQL Server 2005, have you considered using what SQL Server 2005 offers out of the box to achieve your requirements? Rather than polling every 5 seconds, marshalling and unmarshalling TCP/IP, implementing authentication and authorization for the TCP/IP pipe, scaling TCP transmission with boxcarring, managing message acknowledgments and retries, dealing with central-site availability, fragmenting large messages, implementing fairness in transmission, and so on and so forth, why not simply use Service Broker? It does all you need and more, out of the box, already tested, and already tuned for performance and scalability.
Getting reliable messaging right is not trivial, and you should focus your efforts on meeting your business specifics, not reinventing the wheel.
I would recommend writing a Windows Service (since you are using C#) with a timer that fires every 5 seconds. That way you won't be starting and stopping an application all the time, it can run even when no one is logged into the machine, and it will start automatically when the machine is restarted.
For one of my projects, I needed to do something periodically. I opted for a service and set up a timer that takes care of reading the data. You might consider that solution. It has worked well for me.
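A minimal sketch of such a service (names are assumed; requires a reference to System.ServiceProcess). Overlapping runs are prevented in code rather than relying on a scheduler:

using System.ServiceProcess;
using System.Timers;

public sealed class QueuePollingService : ServiceBase
{
    private readonly Timer _timer = new Timer(5000); // 5 seconds
    private readonly object _gate = new object();

    protected override void OnStart(string[] args)
    {
        _timer.Elapsed += (sender, e) => Poll();
        _timer.Start();
    }

    protected override void OnStop()
    {
        _timer.Stop();
    }

    private void Poll()
    {
        // Skip this tick if the previous poll is still running.
        if (!System.Threading.Monitor.TryEnter(_gate))
            return;
        try
        {
            // Read the QUEUE table and forward rows to the central
            // server here (details depend on the chosen transport).
        }
        finally
        {
            System.Threading.Monitor.Exit(_gate);
        }
    }
}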
I suggest creating a Windows service, not an application, and performing the timing yourself: create a timer and execute one step on each timer event. For the communication you have many choices; I would consider standard technologies like a web service or Windows Communication Foundation.
Besides this custom solution, I would evaluate whether the task can be solved using Microsoft Integration Services.
Finally, another question comes to mind: why do you need this application at all? Why don't the applications consuming the data query the database themselves? Is the expensive polling required? Is it possible for the data producers to signal the availability of new data directly to the data consumers?
I am not sure about the details of your project, specifically regarding security, but maybe it would be better to create an SSIS package and schedule it as a job?