I have a little experience with WCF and would like to get your opinion/suggestion on how the following problem can be solved:
A web service needs to be accessible from multiple clients simultaneously, and the service needs to return a result from a shared data set. The concrete project I'm working on has to store a list of IP addresses/ranges. This list will be queried by a bunch of web servers for validation purposes, and we are talking about a couple of thousand or more queries per minute.
My initial draft approach was to use a Windows service as a WCF host, with the service contract implemented by a class decorated with ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Multiple) that holds a list object and uses custom locking to access it. So basically I have a WCF service singleton with a list = shared data -> multiple clients. What I do not like about it is that the data and communication layers are merged into one, and performance-wise this doesn't feel "right".
What I really want is a Windows service running an instance of a container class that holds the IP list, a second service running the WCF service contract implementation, and a way for the latter to query the former cleanly with minimal blocking. Using another WCF channel would not take me far from the initial draft implementation, or would it?
What approach would you take? The project is still in a very early stage, so a complete design redo is not out of the question.
All ideas are appreciated. Thanks!
UPDATE: The data set will be changed dynamically. Web service will have a separate method to add IP or IP range and on top of that there will be a scheduled task that will trigger data cleanup every 10-15 minutes according to some rules.
UPDATE 2: A separate benchmark project will be set up that uses MySQL as the data backend (instead of an in-memory list).
It depends how far it has to scale. If a single server will suffice, then fine; keep it conveniently in memory (as long as you can recreate the data if the server gets restarted). If the data-volume is low, then simple blocking (lock) should work fine to synchronize the data, or for higher throughput a ReaderWriterLockSlim. I would probably not store it directly in the WCF class instance, though.
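A minimal sketch of that idea, assuming a hypothetical `IpStore` class that lives outside the WCF service class and is shared by it. It guards a `HashSet<IPAddress>` with a `ReaderWriterLockSlim`, so many lookups can run in parallel while adds and cleanup take the write lock. Range handling is left out, and the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

// Hypothetical shared store; the WCF service class would hold a reference
// to one instance of this instead of owning the list itself.
public sealed class IpStore
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly HashSet<IPAddress> _addresses = new HashSet<IPAddress>();

    // Writers are exclusive: no reader or other writer runs concurrently.
    public void Add(IPAddress address)
    {
        _lock.EnterWriteLock();
        try { _addresses.Add(address); }
        finally { _lock.ExitWriteLock(); }
    }

    // Used by the scheduled cleanup; returns the number of removed entries.
    public int RemoveWhere(Predicate<IPAddress> shouldRemove)
    {
        _lock.EnterWriteLock();
        try { return _addresses.RemoveWhere(shouldRemove); }
        finally { _lock.ExitWriteLock(); }
    }

    // Readers share the lock, so concurrent lookups do not block each other.
    public bool Contains(IPAddress address)
    {
        _lock.EnterReadLock();
        try { return _addresses.Contains(address); }
        finally { _lock.ExitReadLock(); }
    }
}
```

Keeping the store behind its own class also makes it easier to swap the in-memory set for a database or cache later without touching the service contract.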
I would avoid anything involving sessions (if/when this ties into the WCF life-cycle); this is rarely helpful to simple services.
For distributed load (over multiple servers) I would give consideration to a separate dedicated backend. A database or memcached / AppFabric / etc would be worth consideration.
Related
I have an ASP.Net webform application and ASP.Net WebApi, both are on the same IIS but in different sites and App pools. Both work with the same DB. I have stored some settings values from DB in the static class. Now I need to refresh this static class on the webform app when I change the settings via WebApi and vice versa. I'm using named pipes for sending the flag into the second app 'on setting change'. But I think that named pipes are not 100% reliable. Is there any other (better) mechanism for how to sync these two classes?
There are a number of solutions to this; which one you choose will depend on the frequency of the updates and how critical it is that the data is in sync.
Ideally you should look for a solution that supports your service instances being distributed across multiple physical locations. You will find the overall implementation simpler, and it will allow you to scale your solution beyond the current single server.
If it is critical that the many instances are in sync, then a WebSocket solution is a proven protocol and design pattern to orchestrate between multiple instances.
At a high level, you define a single server instance that will orchestrate messaging between all the client instances. The clients (your static class) establish a persistent Web Socket connection to the server that the server can use to send messages to the client when they need to refresh the config.
You can do this from first principles following this Asynchronous Server Socket Example, but there are implementation frameworks like SignalR that you might find useful as well.
A simpler but less efficient pattern is to simply poll a single source frequently to determine when you need to refresh. The source could be a single timestamp value in a SQL database, or you could use a reliable cloud based storage like MS Azure Tables or Blob storage.
If the call to check for the update is simple and efficient you can usually get away with this without too much effort or causing too much trouble.
Polling can even be more efficient in scenarios where the update frequency is high, especially if the updates are more frequent than the times you need to check if the values have changed.
You could also look into a distributed cache, either to replace the whole static class or just to manage the refresh token. Redis Cache is a reliable option that is easy to plug in to ASP.Net; you can set up a local Redis server as explained here, or you could use a cloud-hosted implementation like that offered by Azure.
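The polling pattern can be sketched as follows, assuming a hypothetical `PolledSettings` class in each application. The two delegates stand in for whatever reads the version marker (for example a timestamp column) and whatever reloads the settings from the DB; neither name comes from the original setup.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache that reloads settings only when a cheap version marker changes.
public sealed class PolledSettings
{
    private readonly Func<long> _readVersion;   // cheap call, e.g. reads one timestamp value
    private readonly Func<IReadOnlyDictionary<string, string>> _load; // full reload
    private readonly object _gate = new object();
    private long _version = -1;                 // -1 means "nothing loaded yet"
    private IReadOnlyDictionary<string, string> _current = new Dictionary<string, string>();

    public PolledSettings(Func<long> readVersion,
                          Func<IReadOnlyDictionary<string, string>> load)
    {
        _readVersion = readVersion;
        _load = load;
    }

    public IReadOnlyDictionary<string, string> Current
    {
        get { lock (_gate) { return _current; } }
    }

    // Returns true when the settings were reloaded, false when nothing changed.
    public bool RefreshIfChanged()
    {
        long latest = _readVersion();
        lock (_gate)
        {
            if (latest == _version) return false;
            _current = _load();
            _version = latest;
            return true;
        }
    }
}
```

Each application would call `RefreshIfChanged()` on a timer (for example a `System.Threading.Timer` firing every few seconds), and whichever side changes a setting bumps the version marker in the shared store.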
I would need some help to point me in the right direction.
We want to expose service functionality (which consists of reading + updating a SQL Server database) via WebHTTP end points as per-call services to users.
We don't want to use SOAP if avoidable, as we have trouble making it interoperate with other platforms.
This must be scalable to 1000+ users, which, in this scenario, are unlikely to submit many concurrent requests. It is estimated that at any given time there should be max 25 concurrent requests.
(That's why per-session services were ruled out, since that would mean keeping 1000+ sessions open while only 25 actions are performed.)
From experience with a test service, however, we find that pure Per-Call WCF services over HTTP perform poorly, with the largest time lapse being the initialization of the SQL Server connection.
It's sort of a similar scenario to what a web server normally would encounter.
Therefore it appeared sensible to use a similar approach as web servers do - for performance reasons they keep a pool of HTTP engines active, and incoming requests are being assigned one of the engines in the pool.
So we want to keep a pool of 25-30 "Business Logic Objects" (i.e. classes with the actual service logic decoupled from mere service interfaces) open which should be instantiated when the service host starts.
It seems that WCF does not support this scenario out of the box.
How would I go about it?
When I am self hosting, I can derive a custom class from ServiceHost and add a Dictionary with the Business objects. This would incur threading issues I guess, which I would have to handle with manual synchronization, correct?
If we decide to host in IIS, how would I do it then, since IIS automatically takes care of creating an instance of the ServiceHost class, and thus I have not much of a chance to throw my own custom host in-between, do I?
Or is this a bad approach altogether? Any other ideas are appreciated.
Is there actually a bottleneck with the stateless, session-free approach?
The pool of "business logic objects" doesn't look like a good idea to me. You'll face hard-to-debug concurrency issues.
Have you actually tested the following pattern?
one business logic object per request, with the shortest lifetime possible
one SQL connection per business logic object
stateless services
"From experience with a test service, however, we find that pure Per-Call WCF services over HTTP perform poorly, with the largest time lapse being the initialization of the SQL Server connection."
Really, the SQL server connection shouldn't be a bottleneck because of SQL Server connection pooling.
I don't think there would be much cost associated with instantiating a business logic object. You can enable pooling on the SQL connection object, as Ken pointed out. It is better to cache business objects than to pool business logic objects.
I have a web service that looks like this:
public class TheService : System.Web.Services.WebService
{
    [WebMethod(EnableSession = true)]
    public string GetData(string Param1, string Param2) { ... }
}
In other words, it's contained in one class and in there, I have one public method and there is another private method that does a read to the database.
The issue I'm facing is in terms of scalability. I'm building a web app that should work for 1,000 daily users and each user will do about 300-500 calls a day to the web service and so that's about 300,000 to 500,000 requests per day. I need to add 9 more calls to the web service. Some of these calls will involve database writes.
My question is this: am I better off creating 9 separate web services, or continuing with the one service I have and adding the other methods? Or maybe something different and better? I'm planning to deploy the application on Azure, so I'm not really concerned about hardware, just the application side of things.
I wouldn't base my decision off the volume, or for performance/scalability reasons. You won't get much if any performance benefit from keeping them lumped together or separating them. Any grouping or filtering that can be done while the services are grouped one way can also be done with the services grouped the other way. The ability to partition between servers will be the same, too.
Design
Instead I would focus on trying to make your code understandable and maintainable. Group your services how they make the most sense architecturally within your program. Keep them logically grouped how they make the most sense to be grouped, from a problem-domain perspective (as opposed to a solution domain perspective).
Since you're free to group them how you want, I recommend you read up on SOLID, which is a set of guiding principles for creating software architecture.
One of the principles listed that is particularly important is the Interface Segregation Principle, which can be defined by the notion that "many client specific interfaces are better than one general purpose interface."
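As a small illustration of that principle, here is a sketch that splits reads and writes into two client-specific interfaces. All type and member names (`IDataReader`, `IDataWriter`, `InMemoryData`, `ReportPage`) are hypothetical and only loosely modelled on the `GetData` method in the question.

```csharp
using System;
using System.Collections.Generic;

// Clients that only read depend on this interface...
public interface IDataReader
{
    string GetData(string key);
}

// ...and clients that write depend on this one.
public interface IDataWriter
{
    void SaveData(string key, string value);
}

// One class may still implement both; callers only see the part they need.
public sealed class InMemoryData : IDataReader, IDataWriter
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    public string GetData(string key)
    {
        string value;
        return _values.TryGetValue(key, out value) ? value : null;
    }

    public void SaveData(string key, string value)
    {
        _values[key] = value;
    }
}

// A read-only consumer cannot accidentally call write operations.
public sealed class ReportPage
{
    private readonly IDataReader _reader;

    public ReportPage(IDataReader reader) { _reader = reader; }

    public string Render(string key)
    {
        return "Value: " + (_reader.GetData(key) ?? "(none)");
    }
}
```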
Performance and scalability
Since you mentioned performance and scalability being a concern, I recommend you follow this plan:
Determine how long you can wait until you can patch/maintain the software
Determine your expected load, including both average and peak load-per-time (you've determined the average), and how much you expect this traffic to grow over time (specifically over the period you can go without patching/maintaining the software)
Create a model describing exactly which calls will be done and in which ratios (per time and per server)
Create automation that mirrors these models as closely as you can. Try to model average traffic, peak traffic, and traffic beyond your highest expected scale
Profile your code, DB, network traffic, and disk traffic while running this automation
Determine the bottlenecks, and if they are within acceptable tolerance
Optimize your bottlenecks (as required), and repeat from the profiling step forward
For the next release of your software, repeat from the top to add scenarios/load/automation
Perform regression testing using your existing tests, altered to fit the new scale
Splitting the web methods into several web services won't help you here; load balancing will.
The number of web services will not have any effect on the scalability of the app.
Finding your bottlenecks will help scalability. If your bottleneck is the DB, you may need to find ways to tune your queries, partition your data across more stores, etc. If your bottleneck is CPU on the web services (web roles in Azure), then adding more than one web role to your cluster will help. Azure supports that.
But don't simply start adding roles. Understand where your bottlenecks are. Measure, profile and tune.
Azure has devfabric and IIS locally to help you profile locally as well.
Splitting the web-services into multiple web roles because of physical constraints and not necessarily due to logical layout may be worth considering because:
Using Azure you can scale out your Roles independently of one another. This means that IF different web methods need to scale in different patterns (e.g. your first web method has the biggest volume in the mornings and after lunch, your other two web methods have the biggest volume in the evening and during the night, and the last 2 web methods are usually flat throughout the day), it may very well be worth it to split your methods across Roles by scalability constraints and not by logical constraints.
By increasing/decreasing the servers allocated to each method independently, you may be able to match capacity to demand with much greater precision.
HTH
Actually, creating separate Web Services, as Igorek suggested, will provide much more granular scale-out. In that scenario, you can deploy different Web Services to different Roles, each role getting its own set of instances (along with the option to create different instance sizes per role). Windows Azure will load-balance across all the instances of a Role.
So from a granularity standpoint:
Least granular: Combine all methods into a single Web Service, hosted on a single Role. As you scale out to multiple instances, all service method requests are load-balanced across all instances. Because you're combining everything into one Role, you will find this to be optimized for cost: You can run all Web Services code in a single instance (really 2 instances, to qualify for the SLA).
More granular: Create separate Web Services, each with their own methods, and host on the same Role (allows you to exercise SOLID principles, as Merlyn described). Same basic performance characteristics as the first option, as all requests are still load-balanced across the same set of instances.
Most granular: Create separate Web Services, each with their own methods, and host each Web Service endpoint on a separate Role, allowing for independent VM sizing and scale-out of each Web Service endpoint. This option has a higher runtime cost to it, as you now have a minimum of one instance per Web Service endpoint (again, 2 instances in a real world, live application).
I am not sure about your exact case, but moving expensive tasks (from a CPU/DB point of view) to a separate Worker Role is usually a good solution on Azure. In that case you will have one Web Role with services that receive requests (it will be lightweight, so you should not need many instances of it) and create tasks for Worker Roles, plus one or a few Worker Roles that process those tasks. Either (#1) a Worker Role is created per kind of task (to group similar actions, like reading/writing data to the DB), or (#2) one Worker Role handles any type of task. I don't see any benefit in #2, because to get the same behavior you can just create one Web Role with many instances and handle everything there. This way you can control processing time by adding or removing Worker Roles.
As other people suggested, using the Azure platform will not by itself make the app scalable. Especially if you are using SQL Azure, you will need to implement sharding or add several databases to avoid one big DB serving all requests.
I don't know if this is related to the question, but just to let you know: Azure drops connections that have been inactive for 60 seconds (I did not find a way to increase that timeout; you can Google this problem). This may be an issue if you are porting web services to Azure and your responses can take 60 seconds. One way to avoid it is to keep the connection active, which is pretty simple if clients know about this "feature".
I need to process thousands of user details from different (clients) web applications. I have finished a console app that does the actual processing. I have also decided to use MSMQ (the console app will get the user details from a Queue).
I need help deciding how the client web applications will pass data to the Queue. I am thinking I can add a WCF service that will receive data from the client apps and pass it on to the Queue.
Would this be the best way to go? Or is there a better way(s)?
If the whole architecture is Microsoft-based, I suggest pushing messages to MSMQ using an in-process DLL, which is much faster than going through WCF (which adds one more layer to the architecture and slows the process down, as it needs to serialize/deserialize the objects). If you design this component properly (SOLID principles) and keep it decoupled from the rest of the code, you can easily switch to WCF if you need it, by adding a data contract and an endpoint to expose your component as a service (at the end of the day, WCF exposes an interface).
Yes it would be the best - in that it's what WCF is for; as it's config driven you'll be able to use different binding types to suit the environment you're in (sending the data across).
The assumption is that the web clients are all (mostly) out on the public internet; being on a private network would give you more options.
WCF can use a queue as a binding type, though I'm not sure that gives you any advantage since you're going to put the messages into a queue anyway. A synchronous WCF call using an HTTP binding will be fine performance-wise, as the act of handing the message to MSMQ should be pretty quick.
Take a look at NServiceBus
I've got a C# service that currently runs single-instance on a PC. I'd like to split this component so that it runs on multiple PCs. Each PC should be assigned a certain part of the work. If one PC fails, its work should be moved to a backup machine.
Data synchronization can be done by the DB, so that should not be much of an issue. My current idea is to use some kind of load balancer that splits and sends the incoming requests to the array of PCs and makes sure the work is actually processed.
How would I implement such a functionality? I'm not sure if I'm asking the right question. If my understanding of how this goal should be achieved is wrong, please give me a hint.
Edit:
I wonder if the idea given above (a load balancer splits work packages across PCs and checks for results) is feasible at all. If there is an existing solution to this seemingly common problem, I'd love to use it.
Availability is a critical requirement.
I'd recommend looking at a Pull model of load-sharing, rather than a Push model. When pushing work, the coordinating server(s)/load-balancer must be aware of all the servers that are currently running in your system so that it knows where to forward requests; this must either be set in config or dynamically set (such as in the Publisher-Subscriber model), then constantly checked to detect if any servers have gone offline. Whilst it's entirely feasible, it can complicate the scaling-out of your application.
With a Pull architecture, you have a central work queue (hosted in MSMQ, SQL Server Service Broker or similar) and each processing service pulls work off that queue. Expose a WCF service to accept external requests and place work onto the queue, safe in the knowledge that some server will do the work, even though you don't know exactly which one. This has the added benefits that each server monitors its own workload and picks up work as-and-when it is ready, and you can easily add or remove servers to/from this model without any change in config.
This architecture is supported by NServiceBus and the communication between Windows Azure Web & Worker roles.
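A minimal in-process sketch of the Pull model, assuming a `BlockingCollection<int>` as a stand-in for the central queue (MSMQ or Service Broker in a real deployment) and tasks as stand-ins for the worker machines. The `PullModelDemo` class and its `Run` method are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PullModelDemo
{
    // Starts workerCount workers that each pull items when they are ready.
    // Returns a map of work item -> id of the worker that processed it.
    public static ConcurrentDictionary<int, int> Run(IEnumerable<int> workItems, int workerCount)
    {
        var queue = new BlockingCollection<int>();           // stand-in for the central queue
        var processedBy = new ConcurrentDictionary<int, int>();
        var workers = new Task[workerCount];

        for (int i = 0; i < workerCount; i++)
        {
            int workerId = i; // capture a copy for the closure
            workers[i] = Task.Run(() =>
            {
                // Each worker takes the next item only after finishing the previous one,
                // so the producer never needs to know which workers exist.
                foreach (int item in queue.GetConsumingEnumerable())
                {
                    processedBy[item] = workerId;
                }
            });
        }

        // The producer (the WCF front end in the answer) only enqueues.
        foreach (int item in workItems) queue.Add(item);
        queue.CompleteAdding();   // lets the workers' loops end once the queue drains

        Task.WaitAll(workers);
        return processedBy;
    }
}
```

Adding or removing a worker changes only `workerCount` here; with a real queue it means starting or stopping another service instance, with no change to the producer.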
From what you said each PC will require a full copy of your service -
"Each PC should be assigned a certain part of the work. If one PC fails, its work should be moved to a backup machine"
Otherwise you won't be able to move its work to another PC.
I would be tempted to have a central server which farms out work to individual PCs. This means that you would need some form of communication between the machines, and you would need to keep a record on the central server of what work has been assigned where.
You'll also need each machine to measure its CPU load and reject work if it is too busy.
A multi-threaded approach to the service would make good use of the multiple processor cores that are ubiquitous nowadays.
How about using a server and multi-threading your processing? Or even multi-threading on a PC as you can get many cores on a standard desktop now.
This obviously doesn't deal with the machine going down, but could give you much more performance for less investment.
You can look into Windows clustering; you will have to handle a set of issues that depend on the behaviour of the service (if you add more details about the service itself, I can give a more specific answer).
This depends on how you want to split your workload. It is usually done in one of two ways:
Splitting the same workload across multiple services
This means the same service is installed on different servers and does the same job. Assume your service reads huge amounts of data from the DB servers, processes it to produce huge client-specific data files, and finally sends these files to the clients. In this approach, all the services installed on the different servers do the same work, but they split it between them to increase performance.
Splitting parts of the workload across multiple services
In this approach, each service is assigned an individual job and works toward a different goal. In the example above, one service is responsible for reading data from the DB and generating the data files, and another service is configured only to read the data files and send them to clients.
I implemented the second approach in one of my projects, because it lets me isolate and debug errors when something fails.
The usual approach for a load balancer is to split service requests evenly between all service instances.
For each work item (request) you can store the relevant information in a database. Each service should then also have at least one background thread checking the database for abandoned work items.
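One way to detect abandoned work items is a lease: a service claims an item for a limited time, and any item whose lease has expired without being completed can be claimed by another service. The sketch below assumes a hypothetical in-memory `WorkTable` standing in for the database table; in a real database the claim would need to be a single atomic update.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row of the work-item table.
public sealed class WorkItem
{
    public int Id;
    public string Owner;              // null while unclaimed
    public DateTime LeaseExpiresUtc;  // only meaningful once claimed
    public bool Done;
}

// In-memory stand-in for the database table of work items.
public sealed class WorkTable
{
    private readonly object _gate = new object();
    private readonly List<WorkItem> _items = new List<WorkItem>();

    public void Add(int id)
    {
        lock (_gate) { _items.Add(new WorkItem { Id = id }); }
    }

    // Claims the first unfinished item that is unclaimed or whose lease expired.
    // Returns the item id, or null when nothing is available.
    public int? TryClaim(string owner, DateTime nowUtc, TimeSpan lease)
    {
        lock (_gate)
        {
            WorkItem item = _items.FirstOrDefault(
                w => !w.Done && (w.Owner == null || w.LeaseExpiresUtc <= nowUtc));
            if (item == null) return null;
            item.Owner = owner;
            item.LeaseExpiresUtc = nowUtc + lease;
            return item.Id;
        }
    }

    public void Complete(int id)
    {
        lock (_gate)
        {
            WorkItem item = _items.First(w => w.Id == id);
            item.Done = true;
        }
    }
}
```

The background thread in each service would call `TryClaim` periodically; an item whose owner crashed becomes claimable again as soon as its lease runs out.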
I would suggest that you publish your service through WCF (Windows Communication Foundation).
Then implement a "central" client application which can keep track of available providers of your service and dish out work. The central app will act as scheduler and load balancer of the tasks to be performed.
Check out Juval Löwy's book on WCF ("Programming WCF Services") for a good introduction to this topic.
You can have a look at NGrid: http://ngrid.sourceforge.net/
or Alchemi: http://www.gridbus.org/~alchemi/index.html
Both are grid computing frameworks with load balancers that will get you started in no time.
Cheers,
Florian