I'm looking at putting together a fairly straightforward WCF-based service, and I have a question about how best to decouple it from the database.
Background: The service I'm going to be implementing is highly critical, geographically distributed, and needs to be as available as possible through a disaster or database failure. The business logic is pretty simple; it receives events from an external source, maintains a state table, and broadcasts processed updates to connected clients. I'm replacing a service that currently handles 400-600 incoming events per second, and approximately 10-20 concurrently connected clients. There will be multiple instances of the service running in multiple locations across the US. All instances host the same state data and share events. There is one instance of a master (SQL Server 2008) database in one location.
Challenge: I've built a number of applications similar to this in the past, and I have most of the architectural hurdles behind me. But there's one challenge I've come across to which I can't help but imagine there's a better solution: in my design, the database (MSSQL) is used only for persistence; the database is only read when the first instance of the service starts and for offline reporting. During normal operation, the application only ever writes historical data to the DB.
To fully decouple the application from the database, in the past I've used SQL Service Broker: On each server running the service, I install an instance of SQL Server Express that essentially just acts as a queue for Service Broker messages to the core (SSB "target") database. In normal operating conditions, the application executes all its SQL operations against the local instance, which queues/forwards them to the target DB via SSB. This works pretty well, and to be honest I'm fairly happy with it... As long as the local instance of SQL Server Express is up, the application will obviously stay unaware of problems at the target DB, network issues between it and the target DB, etc., and it's highly survivable in the case of a localized disaster. It's easy to monitor, not too horribly ugly to set up, and it's all supported. In short, it works, and I'm content to live with it if I have to.
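For readers unfamiliar with the pattern, the store-and-forward idea described above can be sketched in a few lines. This is not Service Broker and not the poster's implementation: it is a minimal Python illustration that assumes a local SQLite table as the queue, and `LocalOutbox`, `FlakyTarget` and the event names are all hypothetical.

```python
import sqlite3


class LocalOutbox:
    """Durable local queue: writes land here first and are forwarded in order."""

    def __init__(self, path=":memory:"):
        # A file path would let queued messages survive a process restart;
        # ":memory:" just keeps this sketch self-contained.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload):
        # The application only ever talks to the local store.
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        self.db.commit()

    def forward(self, send):
        """Deliver queued messages oldest first; stop at the first failure."""
        delivered = 0
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            try:
                send(payload)
            except ConnectionError:
                break  # target unreachable: keep the message for a later retry
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            delivered += 1
        return delivered

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]


class FlakyTarget:
    """Hypothetical central database that can be switched up or down."""

    def __init__(self):
        self.up = False
        self.rows = []

    def write(self, payload):
        if not self.up:
            raise ConnectionError("target DB unreachable")
        self.rows.append(payload)


target = FlakyTarget()
outbox = LocalOutbox()
outbox.enqueue("event-1")
outbox.enqueue("event-2")

sent_while_down = outbox.forward(target.write)       # target is down
target.up = True
sent_after_recovery = outbox.forward(target.write)   # backlog drains in order
```

Note that this sketch is only at-least-once: a crash between `send` and the `DELETE` would resend the message. Ordered, transactional delivery without that gap is part of what Service Broker handles for you.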
But it strikes me as a bit of a kludge. It feels like there should be a better way to do that.
Obviously one option is to just queue the database operations in process. I don't like that because if I'm going to decouple things at all, I'd prefer to really decouple and keep my application itself as far away from the DB as possible. I could also write a Data Service that queues these operations... I actually briefly started down that path before thinking to myself, "Wait, isn't this what SSB already does?"
Due to unchangeable external constraints, a more robust/HA SQL Server architecture is not an option. I've been given my one DB cluster and that's that.
So I'm open to just about any thoughts and/or criticisms. Is there something obvious I'm missing? This feels like the kind of thing where there could be something stone-simple I've just somehow overlooked (though not for lack of searching.) Am I making some kind of wider architectural mistake here?
Thanks in advance!
My opinion is obviously biased, but for the record I can point to several fairly big projects that do (or did) it the same way, like High Volume Contiguous Real Time ETL, March Madness on Demand or MySpace SQL Server Service Broker.
But several things have changed in recent years, and the primary change is the rise of PaaS offerings. Today you can have a highly available, scalable database and messaging platform, e.g. SQL Azure and Azure Queues/Azure Service Bus. Or DynamoDB and SQS if you're willing to step outside SQL/ACID. Arguably, the price point of a park of SQL Express instances pushing to a central SQL Server Standard Edition will be lower than a PaaS solution, but it will be hard to beat the PaaS in terms of availability, free maintenance and scale on-demand.
So aside from the PaaS point of view above, I would argue that the solution you have is superior to pretty much anything else the MS stack has. WCF is sure easy to program against, unless you have the anti-SOAP fever, but has basically 0 (zero) to offer in terms of availability/reliability. Your process is gone === your data is gone, end of story. WCF over MSMQ is 'WCF' just in name; the programming model of queue channels is miles away from the http/net binding WCF programming model. And MSMQ has little to stand up against Service Broker (aside from ubiquity). But then again, as you probably know, I am really biased in my opinion...
Related
I am trying to set up a global counter in Windows Azure which would keep track of the number of games started within a day. Each time a player starts a game, a Web Service call is made from the client to the server and a global counter would be incremented by one. This should be fairly simple to do with a database... But I wonder how I could do this efficiently. The database approach is good for a few hundred simultaneous clients, but what will happen if I have 100,000 clients?
Thanks for your help/ideas!
A little over a year ago, this was a topic in a Cloud Cover episode: Cloud Cover Episode 43 - Scalable Counters with Windows Azure. They discussed how to create an Apathy Button (similar to the Like Button on Facebook).
Steve Marx also discusses this in detail in a blog post with source code: Architecting Scalable Counters with Windows Azure. In this solution they're doing the following:
On each instance, keep track of a local counter
Use Interlocked.Increment to modify the local counter
If the counter changed, save the new value in table storage (have a timer do this every few seconds). For each deployment/instance, you'll have 1 record in the counters table.
To display the total count, take the sum of all records in the counters table.
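The steps above can be sketched roughly as follows. This is not the code from the blog post: it is a Python illustration in which a lock stands in for .NET's Interlocked.Increment, a plain dict stands in for the counters table, and the names `InstanceCounter`, `simulate_requests` and the instance ids are made up for the example.

```python
import threading


class InstanceCounter:
    """Counter kept locally on one instance and flushed periodically."""

    def __init__(self, instance_id, table):
        self.instance_id = instance_id
        self.table = table            # assumed stand-in for table storage
        self._lock = threading.Lock()
        self._value = 0
        self._last_flushed = 0

    def increment(self):
        # The lock plays the role of an atomic increment.
        with self._lock:
            self._value += 1

    def flush(self):
        """Meant to be called by a timer every few seconds."""
        with self._lock:
            value = self._value
        if value == self._last_flushed:
            return False              # nothing changed, skip the write
        self.table[self.instance_id] = value   # one record per instance
        self._last_flushed = value
        return True


def total(table):
    # Displaying the count means summing all per-instance records.
    return sum(table.values())


def simulate_requests(counter, n):
    for _ in range(n):
        counter.increment()


counters_table = {}
a = InstanceCounter("instance-a", counters_table)
b = InstanceCounter("instance-b", counters_table)

threads = [
    threading.Thread(target=simulate_requests, args=(a, 1000)),
    threading.Thread(target=simulate_requests, args=(a, 1000)),
    threading.Thread(target=simulate_requests, args=(b, 500)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

a.flush()
b.flush()
grand_total = total(counters_table)
```

The trade-off is that the displayed total lags by up to one flush interval, and increments made since the last flush are lost if an instance dies.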
Well, there are a bunch of choices. And I don't know which is best for you. But I'll present them here with some pros and cons and you can come to your own conclusions given your requirements.
The simplest answer is "put it in storage." Both SQL Azure and the core Azure table or blob storage options are out there for you. One issue to contend with is performance in the face of large scale concurrency, but I'd also encourage you to think about correctness. You really want something that supports atomic increment to outsource this problem IMO.
Another variation of a storage oriented option would be a highly available VM. You could spin up your own VM on Azure, back a data drive on to Azure Drives, and then use something on top of the OS to do this (a database server, an app that uses the file system directly, whatever). This would be more similar to what you'd do at home but would have fairly unfortunate trade-offs...your entire cloud is now reliant on the availability of this one VM, cost is something to think about, scalability of the solution, and so on.
Splunk is also an option to consider, if you look at VMs.
As an earlier commenter mentioned, you could compute off of log data. But this would likely not be super real time.
Service Bus is another option to consider. You could pump messages over SB for these events and have a consumer that reads them and emits a "summary." There are a bunch of design patterns to consider if you look at this. The SB stack is pretty well documented. Another interesting element of SB is that you might be able to trade off 100% correctness for perf/scale/cost. This might be a worthy trade-off for you depending upon your goals.
Azure also exposes queues which might be a fit. I'll admit I think SB is probably a better fit but it is worth looking at both if you are going down this path.
Sorry I don't have a silver bullet but I hope this helps.
I would suggest you follow the pattern described in .NET Multi-Tier Application. This would help you decouple the Web role which faces your clients and the Worker role, which will store the data to a persistence medium (either SQL Server / Azure Storage) by using the Service Bus.
Also, this is an efficient model to scale, as you can spin up new instances of the web role, the worker role, or both. For the dashboard, depending on the load, you can cache your data periodically and serve it from the cache. This would compromise the accuracy of the data, but would still give you an option for easy scaling. You can even invalidate the cache every minute and reload it from the persistence medium to get the latest value.
As for whether to use SQL Server or Azure storage: if there is no need for relational capabilities like JOINs etc., you can very well go for Azure storage.
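The "invalidate the cache every minute" idea could look something like this. It is a minimal Python sketch under assumptions: `TtlCache`, `FakeClock` and `load_from_store` are hypothetical names, a dict stands in for the persistence medium, and the clock is injectable only so the behaviour can be shown without waiting.

```python
import time


class TtlCache:
    """Serve a cached value and reload it once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()   # hit the persistence medium
            self._loaded_at = now
        return self._value


class FakeClock:
    """Test clock so the example does not need to sleep for a minute."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


store = {"games_started": 5}   # assumed stand-in for SQL Server / Azure storage
load_count = []


def load_from_store():
    load_count.append(1)
    return store["games_started"]


clock = FakeClock()
cache = TtlCache(load_from_store, ttl_seconds=60.0, clock=clock)

first = cache.get()            # initial load
store["games_started"] = 9     # the worker role persists a newer value
clock.now = 30.0
stale = cache.get()            # still inside the TTL, old value served
clock.now = 60.0
fresh = cache.get()            # TTL expired, value reloaded
```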
I've got a C# service that currently runs single-instance on a PC. I'd like to split this component so that it runs on multiple PCs. Each PC should be assigned a certain part of the work. If one PC fails, its work should be moved to a backup machine.
Data synchronization can be done by the DB, so that should not be much of an issue. My current idea is to use some kind of load balancer that splits and sends the incoming requests to the array of PCs and makes sure the work is actually processed.
How would I implement such a functionality? I'm not sure if I'm asking the right question. If my understanding of how this goal should be achieved is wrong, please give me a hint.
Edit:
I wonder if the idea given above (load balancer splits work packages to PCs and checks for results) is feasible at all. If there is some kind of already implemented solution to this seemingly common problem, I'd love to use that solution.
Availability is a critical requirement.
I'd recommend looking at a Pull model of load-sharing, rather than a Push model. When pushing work, the coordinating server(s)/load-balancer must be aware of all the servers that are currently running in your system so that it knows where to forward requests; this must either be set in config or dynamically set (such as in the Publisher-Subscriber model), then constantly checked to detect if any servers have gone offline. Whilst it's entirely feasible, it can complicate the scaling-out of your application.
With a Pull architecture, you have a central work queue (hosted in MSMQ, SQL Server Service Broker or similar) and each processing service pulls work off that queue. Expose a WCF service to accept external requests and place work onto the queue, safe in the knowledge that some server will do the work, even though you don't know exactly which one. This has the added benefits that each server monitors its own workload and picks up work as and when it is ready, and you can easily add or remove servers to/from this model without any change in config.
This architecture is supported by NServiceBus and the communication between Windows Azure Web & Worker roles.
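To make the pull model concrete, here is a small in-process Python sketch. It assumes an in-memory `queue.Queue` as a stand-in for MSMQ or Service Broker and threads as stand-ins for separate servers; `submit`, `worker` and the squaring "work" are purely illustrative.

```python
import queue
import threading

work_queue = queue.Queue()      # assumed stand-in for the central work queue
results = {}
results_lock = threading.Lock()
STOP = object()                 # sentinel telling a worker to shut down


def submit(item):
    # The front-end only enqueues; it does not know which server will run it.
    work_queue.put(item)


def worker(name):
    while True:
        item = work_queue.get()         # each server pulls when it is ready
        if item is STOP:
            work_queue.task_done()
            break
        with results_lock:
            results[item] = (name, item * item)
        work_queue.task_done()


workers = [
    threading.Thread(target=worker, args=(f"server-{i}",)) for i in range(3)
]
for w in workers:
    w.start()

for n in range(10):
    submit(n)
for _ in workers:
    submit(STOP)

work_queue.join()
for w in workers:
    w.join()

processed = {item: value for item, (_, value) in results.items()}
```

Adding a fourth "server" is just one more thread pulling from the same queue, which is the scaling property the answer describes.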
From what you said, each PC will require a full copy of your service:
"Each PC should be assigned a certain part of the work. If one PC fails, its work should be moved to a backup machine"
Otherwise you won't be able to move its work to another PC.
I would be tempted to have a central server which farms out work to individual PCs. This means that you would need some form of communication between each machine, and you would need to keep a record back on the central server of what work has been assigned where.
You'll also need each machine to measure its CPU load and reject work if it is too busy.
A multi-threaded approach to the service would make good use of those multiple processor cores that are ubiquitous nowadays.
How about using a server and multi-threading your processing? Or even multi-threading on a PC as you can get many cores on a standard desktop now.
This obviously doesn't deal with the machine going down, but could give you much more performance for less investment.
You can look at Windows clustering, but you will have to handle a set of issues that depend on the behaviour of the service (if you add more details about the service itself, I can give a more specific answer).
This depends on how you want to split your workload. It is usually done in one of two ways:
1. Splitting the same workload across multiple services
The same service is installed on different servers and does the same job. Assume your service reads huge amounts of data from the DB servers, processes it to produce huge client-specific data files, and finally sends those files to the clients. In this approach all the services installed on the different servers do the same work, but they split it between them to increase performance.
2. Splitting parts of the workload across multiple services
In this approach each service is assigned an individual job and works towards a different goal. In the example above, one service is responsible for reading data from the DB and generating the huge data files, and another service is configured only to read the data files and send them to clients.
I have implemented the 2nd approach in one of my projects, because it let me isolate and debug errors in case of failures.
The usual approach for load balancer is to split service requests evenly between all service instances.
For each work item (request) you can store the relevant information in the database. Then each service should also have at least one background thread checking the database for abandoned work items.
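One way to detect abandoned work items is a lease timeout: an item that has been "in progress" for too long goes back to pending. The Python sketch below uses an in-memory dict instead of a database table, and `WorkItemStore`, `LEASE_SECONDS` and the service names are assumptions for illustration only.

```python
LEASE_SECONDS = 30  # assumed timeout after which a claimed item counts as abandoned


class WorkItemStore:
    """In-memory stand-in for a work-item table in the database."""

    def __init__(self):
        self.items = {}  # item id -> {"status", "owner", "claimed_at"}

    def add(self, item_id):
        self.items[item_id] = {
            "status": "pending", "owner": None, "claimed_at": None,
        }

    def claim(self, item_id, owner, now):
        item = self.items[item_id]
        if item["status"] != "pending":
            return False
        item.update(status="in_progress", owner=owner, claimed_at=now)
        return True

    def complete(self, item_id):
        self.items[item_id]["status"] = "done"

    def reclaim_abandoned(self, now):
        """What the background thread would run on each pass."""
        reclaimed = []
        for item_id, item in self.items.items():
            expired = (
                item["status"] == "in_progress"
                and now - item["claimed_at"] > LEASE_SECONDS
            )
            if expired:
                item.update(status="pending", owner=None, claimed_at=None)
                reclaimed.append(item_id)
        return reclaimed


store = WorkItemStore()
store.add(1)
store.add(2)
store.claim(1, "svc-a", now=0)      # svc-a then crashes and never completes
store.claim(2, "svc-b", now=0)
store.complete(2)

early_pass = store.reclaim_abandoned(now=10)   # lease still valid
late_pass = store.reclaim_abandoned(now=31)    # lease expired for item 1
reclaimed_by_b = store.claim(1, "svc-b", now=31)
```

Against a real database the claim step would need to be atomic (for example a single UPDATE that only succeeds while the row is still pending), otherwise two services can pick up the same item.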
I would suggest that you publish your service through WCF (Windows Communication Foundation).
Then implement a "central" client application which can keep track of available providers of your service and dish out work. The central app will act as scheduler and load balancer of the tasks to be performed.
Check out Juval Löwy's book on WCF ("Programming WCF Services") for a good introduction on this topic.
You can have a look at NGrid : http://ngrid.sourceforge.net/
or Alchemi : http://www.gridbus.org/~alchemi/index.html
Both are grid computing frameworks with load balancers that will get you started in no time.
Cheers,
Florian
I have an eCommerce app, which is hosted on 2 geographically different servers
Server_A - hosted on our premises, contains our ERP product (Dynamics Navision) software & database
Server_B - hosted in external data center - web application & database (not on same server - just for simplification within this question)
When someone places an order on the website, the order is written to the orders table on Server_B
These orders need to be placed into Server_A orders table.
Currently, there is a DTS script that runs and copies across any orders that are in Server_B, but not Server_A
Due to moving servers and application bits around, this has become difficult to manage.
My idea is to use MSMQ to transfer the orders "messages" between the two locations.
Is this a viable option?
What about WebService call right after storing order "locally"?
You're talking about MSMQ with WCF, right?
http://code.msdn.microsoft.com/msmqpluswcf
Yes, that sounds viable.
At an MS event I went to a few years ago, this scenario was almost exactly the case study one of the presenters used (i.e. a major site had a tightly coupled process that couldn't scale and crashed during the Valentine's Day ordering period; they then changed to use MSMQ so orders could always be taken and queued up, and then processed later as the other machines were able to).
Only thing to remember with MSMQ is that it can't store messages over a certain size (~4MB if I recall). It doesn't sound like it'll matter to you, but was a hurdle I ran into building a system that had to take big reports along with purchase order messages.
I have a requirement to read data from a table (SQL 2005) and send that data to another application every 5 seconds. I am looking for the best approach to do this.
Right now I am planning to write a console application (.NET and C#) which will read the data from SQL Server 2005 (a QUEUE table which will be filled by different applications) and send it to the other application (a central server) over TCP/IP. The console application would run as a scheduled task every 5 seconds. I am assuming the scheduled task will discard a new run event if the task is already running (to avoid concurrent executions).
Has anybody come across a similar situation? Please share your experience and advise me on the best approach.
Thanks in advance for the valuable time spent on my request.
-Por-hills-
We have done similar work. If you are going to query a SQL database every 5 seconds, be sure to use a stored procedure that is optimized to be very fast. It should not update data unless absolutely necessary. This approach is typically called 'polling' and I've found that it is acceptable if your SQL Server is not otherwise bogged down with too many other calls.
In approaches we've used, a Windows Service that does the polling works well.
To communicate results to another app, it all depends on what your other app is doing and what type of interface you can make into it, and how quickly you need the results. The WCF class libraries from Microsoft provide many workable approaches for real time communication. My preference is to write to the applications database, and then have the application read the data (if it works for that app). If you need something real time, WCF is the way to go, and I'd suggest using a stateless protocol like http if < 5 sec response time is required, (using standard HTTP posts), or TCP/IP if subsecond response time is required.
Since I assume your central storage is also SQL 2005, have you considered using what SQL Server 2005 offers out of the box to achieve your requirements? Rather than poll every 5 seconds, marshal and unmarshal TCP/IP, implement authentication and authorization for the TCP/IP pipe, scale TCP transmission with boxcarring, manage message acknowledgments and retries, deal with central site availability, fragment large messages, implement fairness in transmission and so on and so forth, why not simply use Service Broker? It does all you need and more, out of the box, already tested, already tuned for performance and scalability.
Getting reliable messaging right is not trivial and you should focus your efforts on meeting your business specifics, not reinventing the wheel.
I would recommend writing a Windows Service (since you are using C#) that has a timer which fires every 5 seconds. That way you won't be starting and stopping an application all the time, it can run even when no one is logged into the machine, and it will automatically start when the machine is restarted.
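The timer loop inside such a service can be written so that runs never overlap, which was the concern with the scheduled task. Below is a language-neutral sketch in Python rather than C#; `run_poller`, `poll_once` and the list standing in for the QUEUE table are hypothetical, and `max_runs` and the short interval exist only so the example terminates quickly.

```python
import threading


def run_poller(poll_once, interval_seconds, stop_event, max_runs=None):
    """Call poll_once repeatedly, waiting interval_seconds between runs.

    Runs are sequential, so a slow poll can never overlap the next one.
    """
    runs = 0
    while not stop_event.is_set():
        poll_once()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # wait() returns early if the service is asked to stop
        stop_event.wait(interval_seconds)
    return runs


queue_table = ["row-1", "row-2", "row-3"]   # assumed stand-in for the QUEUE table
sent = []                                   # assumed stand-in for the central server


def poll_once():
    # Drain whatever is currently queued and forward it.
    while queue_table:
        sent.append(queue_table.pop(0))


stop = threading.Event()
# A real service would use interval_seconds=5 and no max_runs.
runs = run_poller(poll_once, interval_seconds=0.01, stop_event=stop, max_runs=3)
```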
For one of my projects, I needed to do something periodically. I opted for a service and set up a timer that takes care of reading the data. You might consider that solution. It has worked well for me.
I suggest creating a Windows service rather than an application and performing the timing yourself: create a timer and execute one step on each timer event. For the communication you have many choices. I would consider using standard technologies like a web service or Windows Communication Foundation.
Besides this custom solution, I would evaluate whether the task can be solved using Microsoft Integration Services.
Finally, another question comes to mind: why do you need this application? Why doesn't/don't the application(s) consuming the data query the database? Is the expensive polling required? Is it possible for the data producers to signal the availability of new data directly to the data consumers?
I am not sure about the details of your project, specifically related to security but maybe it would be better to create an SSIS package and schedule it as a job?
The solution we developed uses a database (sqlserver 2005) for persistence purposes, and thus, all updated data is saved to the database, instead of sent to the program.
I have a front-end (desktop) that currently keeps polling the database for updates that may happen anytime on some critical data, and I am not really a fan of database polling and wasted CPU cycles with work that is being redone uselessly.
Our manager doesn't seem to mind us polling the database. The amount of data is small (less than 100 records) and the interval is long (1 min), but I am a coder. I do. Is there a better way to keep the data in memory as synced as possible with the data in the database? The system is developed using C# 3.5.
Since you're on SQL2005, you can use a SqlDependency to be notified of changes. Note that you can use it pretty effortlessly with System.Web.Caching.Cache, which, despite its namespace, runs just fine in a WinForms app.
First thought off the top of my head is a trigger combined with a message queue.
This is probably overkill for your situation, but it may be interesting to take a look at the Microsoft Sync Framework.
SQL Notification Services will allow you to have the database callback to an app based off a number of protocols. One method of implementation is to have the notification service create (or modify) a file on an accessible network share and have your desktop app react by using a FileSystemWatcher.
More information on Notification Services can be found at: http://technet.microsoft.com/en-us/library/aa226909(SQL.80).aspx
Please note that this may be a sledgehammer approach to a nut type problem though.
In ASP.NET, http://msdn.microsoft.com/en-us/library/ms178604(VS.80).aspx.
This may also be overkill but maybe you could implement some sort of caching mechanism. That is, when the data is written to the database, you could cache it at the same time and when you're trying to fetch data back from the DB, check the cache first.
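A write-through cache along those lines could look roughly like this. It is a Python sketch, not C#, with a dict standing in for the database; `WriteThroughCache` and the key names are made up for the example.

```python
class WriteThroughCache:
    """Write to the database and the cache together; read from the cache first."""

    def __init__(self, database):
        self._db = database      # assumed stand-in: a dict acting as the database
        self._cache = {}
        self.db_reads = 0

    def write(self, key, value):
        self._db[key] = value    # persist first
        self._cache[key] = value # then keep the in-memory copy in step

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1       # cache miss: fall back to the database
        value = self._db[key]
        self._cache[key] = value
        return value


database = {"threshold": 10}     # a row that existed before the app started
cache = WriteThroughCache(database)

cache.write("status", "green")
status = cache.read("status")          # served from the cache, no DB read
threshold = cache.read("threshold")    # first read goes to the database
threshold_again = cache.read("threshold")
```

This only stays correct if every write goes through the same cache. If other programs update the database directly, as in the question, the cached copy goes stale and you are back to polling or notifications.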