Can anyone help me with a question about webservices and scalability? I have written a webservice as a facade into our document management system and need to think about scalability issues. What areas should I be looking at to ensure performance and availability?
Thanks in advance
Performance is separate from scalability. Scalability means that you can add more servers to linearly increase system throughput (i.e. more client connections). The best way to start is having stateless web services. That way any client can call any of the n web service instances on n different machines. If there is a shared database at the end for persistence, that will ultimately be your bottleneck. There are ways to reduce that with data partitioning and sharding, but only when you get to that point.
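To illustrate the stateless point, here is a minimal sketch of what such a facade could look like in WCF (the contract and type names are invented, not taken from your system). With InstanceContextMode.PerCall, no state survives between requests, so any instance on any machine can serve any client:

    using System.Runtime.Serialization;
    using System.ServiceModel;

    [DataContract]
    public class DocumentInfo
    {
        [DataMember] public string Id { get; set; }
        [DataMember] public string Title { get; set; }
    }

    [ServiceContract]
    public interface IDocumentFacade
    {
        // Every call carries everything it needs; no per-client session state.
        [OperationContract]
        DocumentInfo GetDocument(string documentId);
    }

    // PerCall = a fresh service instance per request, so any of the n machines can answer.
    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
    public class DocumentFacade : IDocumentFacade
    {
        public DocumentInfo GetDocument(string documentId)
        {
            // In the real service this would call into the document management system.
            return new DocumentInfo { Id = documentId, Title = "stub" };
        }
    }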
First of all, decide what is acceptable behaviour of your web service. What should it be able to cope with - 1,000 connections per second? What response time should each connection have?
Then you need to automate the usage of your web service so you can stress test the system.
What happens when you have 100 requests per second? 1000? 10000?
Then you can decide whether performance is OK, whether the acceptable behaviour is too strict, or whether you need to do heavy performance tuning based on actual profiling data.
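As a rough starting point for that kind of automated test, here is a minimal C# sketch (assuming .NET 4.5+ for HttpClient; the URL is a placeholder) that fires a batch of concurrent requests and reports average and worst-case response times:

    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;

    class LoadTest
    {
        static void Main()
        {
            RunAsync().GetAwaiter().GetResult();
        }

        static async Task RunAsync()
        {
            const string url = "http://localhost:8080/DocumentFacade/GetDocument?id=42"; // placeholder endpoint
            const int concurrentRequests = 100;   // scale this up: 100, 1000, 10000...

            using (var client = new HttpClient())
            {
                long[] timings = await Task.WhenAll(
                    Enumerable.Range(0, concurrentRequests).Select(async i =>
                    {
                        var sw = Stopwatch.StartNew();
                        HttpResponseMessage response = await client.GetAsync(url);
                        response.EnsureSuccessStatusCode();
                        sw.Stop();
                        return sw.ElapsedMilliseconds;
                    }));

                Console.WriteLine("requests: {0}  avg: {1:F1} ms  max: {2} ms",
                                  timings.Length, timings.Average(), timings.Max());
            }
        }
    }

Run it with increasing request counts and compare the numbers against the acceptable behaviour you defined above.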
You should be looking to host your WCF service in IIS. IIS has a lot of performance, scalability, and security mechanisms built in, and is the best starting point to save you reinventing the wheel.
Some of the performance is certainly down to your own code, but let's assume that it's already optimized. At that point, the additional performance and scaling issues involve the service host (e.g. IIS), the machines that host it, and their network (inter/intranet) connection speeds. You'll need to do some speed tests to be sure of things.
Well it really depends on what you're doing in your web service, but the only way you're going to find out is by simulating lots of users and measuring it.
Take a look at my answer to this question: Measuring performance
When we tested our code in this manner (where the web services were hosted in Windows service(s)), we found that the bottleneck was authenticating each user in the facade service. In particular, the Windows component LSASS was using most of the CPU.
Luckily we were able to create new machines, each with a facade service, which then called through to our main set of web services. This enabled us to scale up to a large number of users (in the region of 100,000 users using our software normally).
I'm considering using WCF or mORMot as the framework for a RESTful service, where the business/legacy code that needs to be accessed is written in Delphi. Performance is a key requirement of the project.
The application must be prepared for load balancing. The clients of the REST service are Windows desktop applications. These desktop clients allow the user to view large volumes of data, with huge result sets from SQL statements. What is the best way to implement a service that caches a recordset and lets the client consume it gradually through the REST service? Can you demonstrate a good example? The recordset must be cached in the session until the client has completed the query or decided to do the full fetch. I'm looking for the right architecture.
Will load balancing work in WCF here? Since the recordset is cached on a single server, any row-fetch requests must fall on the same server.
Both WCF and mORMot share the same high-performance kernel-mode http.sys server. Both feature IOCP and multi-threading.
For performance, mORMot will be lighter, will allocate (much) less memory, won't be affected by garbage collector freezes, and is able to get JSON content directly from the database engine (bypassing most temporary data conversion and allocation) - so you can achieve amazing speed. In short, mORMot was designed from the ground up for serving REST/JSON content fast, with a multi-threaded kernel (whereas e.g. node.js is mono-threaded). If your purpose is also to cache some data, mORMot works very well as a 64-bit native service, giving access to all your system RAM if needed, and has built-in real-time content compression.
WCF is a great general-purpose communication library, which can be RESTful, but is not RESTful in its (historical) roots. The main issues I saw with WCF are the difficulty of configuring it between applications (.exe.config tuning may be confusing), and that it is a big black box. For instance, it was not possible to implement Cross-Origin Resource Sharing with WCF when the server is hosted as a Windows service (the Access-Control-Allow-Origin HTTP headers are deleted by WCF!): you have to host it within IIS - and can't fix the issue, whereas with a fully open-source solution, you can fix any issue.
Load balancing can be implemented in mORMot and WCF with the same algorithms. Instead of a round-robin algorithm, in your case simple routing based on the content may be enough.
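For the "row fetches must hit the same server" requirement, one very simple form of content-based routing is to derive the target node from the session (or cursor) token itself, so every follow-up request for that token lands on the node holding the cached recordset. A rough sketch of the idea (node addresses and names are placeholders):

    using System;

    static class SessionAffinityRouter
    {
        // The pool of back-end nodes holding cached recordsets (illustrative addresses).
        static readonly string[] Nodes =
        {
            "http://node1.example.local:8080",
            "http://node2.example.local:8080",
            "http://node3.example.local:8080"
        };

        // Route every request for a given session token to the same node,
        // so the cached recordset and its follow-up row fetches stay together.
        public static string NodeFor(string sessionToken)
        {
            // Stable FNV-1a hash of the token; avoid string.GetHashCode, which can differ between processes.
            uint hash = 2166136261u;
            foreach (char c in sessionToken)
            {
                hash ^= c;
                hash *= 16777619u;
            }
            return Nodes[hash % (uint)Nodes.Length];
        }
    }

The facade (or the reverse proxy in front of the nodes) would simply forward each request to NodeFor(sessionToken).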
Using WCF to serve business logic written in Delphi will be slow, error prone and difficult to maintain. Mixing technologies induces unneeded complexity. I would not go into this direction.
If you have an existing Delphi code base, and some Delphi skills, I guess mORMot may be a better choice. It has been reported, for example, that a single server in production is able to handle more than one million requests per day, serving thousands of concurrent clients, with a dedicated JavaScript process on the server side. One of the mORMot design goals was to help working with existing code and legacy projects. But I'm not 100% fair, since I'm the main maintainer of this open source project. :)
I have an ASP.NET MVC application that runs on IIS 7. It is set up as a web garden and the number of worker processes matches the number of my processors. I tend to experience some heavy load at times and this setup has worked best.
I have implemented some caching using System.Web.Cache. I will occasionally need to invalidate some of the items in my cache; however, I cannot clear the cache across all processes.
Do the .NET 4 System.Runtime.Caching features make this any easier? Here is a similar question, but I'm hoping there is better advice for .NET 4.
Flush HttpRuntime.Cache objects across all worker processes on IIS webserver
System.Web.Cache and System.Runtime.Caching provide almost the same features; each is just a simple in-memory cache where items can have an expiration time, dependencies, etc.
If you want to run your site on multiple physical machines, or, as in your case, run it as a web garden, caching data in any in-process cache doesn't make a lot of sense, because each process would cache the data again. That lets memory consumption grow pretty quickly, I would guess...
In those scenarios a distributed cache system is the best choice, because all processes can leverage the already cached data...
I have worked with two pretty popular distributed in-memory cache systems; one is memcached, which was also mentioned in your link.
The other one is AppFabric Cache; here is a good example of how to use it.
Memcached is a "simple" cache; it doesn't care about security and all that stuff in the first place. But it is very easy to set up, and there are good .NET clients which are really simple to use, almost exactly like the built-in .NET cache.
If you want to encrypt your cache's data transfers and have all of this highly secured, you might want to go with AppFabric...
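For example, with the Enyim memcached client the cross-process invalidation problem largely disappears, because every worker process talks to the same cache. This is only a sketch - the DTO, keys and lifetimes are made up, and the exact method signatures should be checked against the client version you use:

    using System;
    using Enyim.Caching;
    using Enyim.Caching.Memcached;

    [Serializable]                       // the default Enyim transcoder serializes with BinaryFormatter
    public class ProductDto
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    class CacheDemo
    {
        static void Main()
        {
            // The memcached server list comes from the "enyim.com/memcached" section in web.config/app.config.
            using (var cache = new MemcachedClient())
            {
                var dto = new ProductDto { Id = 42, Name = "example" };

                // Any worker process in the web garden can read this entry...
                cache.Store(StoreMode.Set, "product:42", dto, TimeSpan.FromMinutes(10));
                ProductDto fromCache = cache.Get<ProductDto>("product:42");

                // ...and removing it here invalidates it for every process at once.
                cache.Remove("product:42");

                Console.WriteLine(fromCache != null ? fromCache.Name : "(miss)");
            }
        }
    }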
Is it a good idea to build an OLTP system using WCF?
The system must process 5-8k requests per second.
As noted by @nonnb in a comment, WCF is a great platform for building service-oriented or distributed applications. This includes using WCF in OLTP applications (we do that here). With WCF you could theoretically keep adding servers to scale and handle the load, but usually you will end up hitting some database contention (e.g. locking).
5K-8K requests per second is a large number. That translates to 300K-~500K requests per minute. To put this in perspective, if you take a look at the TPC-C benchmark results the top end of your range is almost in the top 50 results with the lower end being in (maybe) the top third of results.
Note that the Microsoft TPC-C results are C++ running in COM+ and do not involve .NET or WCF.
In terms of WCF some reading of interest would be Creating high performance WCF services and A Performance Comparison of Windows Communication Foundation. The latter is almost 4 years old so some of those performance benchmarks may have been improved over the years.
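As a taste of the kind of tuning those articles cover: WCF's default throttling limits are fairly conservative, and raising them is often one of the first steps for a high-throughput service. A sketch (self-hosted for brevity; the service and the limit values are purely illustrative and should come from your own load tests, not be copied blindly):

    using System;
    using System.ServiceModel;
    using System.ServiceModel.Description;

    [ServiceContract]
    public interface IOrderService
    {
        [OperationContract]
        void Submit(int orderId);
    }

    public class OrderService : IOrderService
    {
        public void Submit(int orderId) { /* transaction processing would go here */ }
    }

    class Host
    {
        static void Main()
        {
            using (var host = new ServiceHost(typeof(OrderService),
                                              new Uri("http://localhost:8080/orders")))
            {
                host.AddServiceEndpoint(typeof(IOrderService), new BasicHttpBinding(), "");

                // Raise the throttle from the (low) defaults; numbers below are illustrative.
                var throttle = new ServiceThrottlingBehavior
                {
                    MaxConcurrentCalls = 256,
                    MaxConcurrentSessions = 1024,
                    MaxConcurrentInstances = 1280
                };
                host.Description.Behaviors.Add(throttle);

                host.Open();
                Console.WriteLine("Listening... press Enter to stop.");
                Console.ReadLine();
            }
        }
    }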
I have a project that I have recently started working on seriously but had a bit of a design discussion with a friend and I think he raised some interesting points.
The project is designed to be highly scalable, with business objects that can be maintained completely independently. The emphasis on scalability has forced some design decisions that impede the project's initial efficiency.
The basic design is as follows.
There is a "core" that is written in ASP.NET MVC and manages all interactions, both the JSON API and the HTML web front end. It doesn't, however, create or manage "business objects" like Posts, Contributors, etc. Those are all handled in their own separate WCF web services.
The idea is for the core to be really simple, leveraging individual controls that use management objects to retrieve the business data/objects from the web services. This in turn means that the core could be multithreaded and could call the controls on the page simultaneously.
Each web service will manage the relevant business object and its data in the DB. Any business-specific processing will also live here, such as mapping data in the tables to data structures that are useful to the controls. The whole object will be passed to the core, and the core should only retrieve or set a business object once per transaction. If operations that affect multiple objects become necessary in the future, I will need to make that functionality available.
Also, the web services can perform their own independent caching, and depending on the request and their own knowledge of their specific area (e.g. Users), could return either a newly created object or a pre-created one.
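To make that concrete, a management object in the core would call one of those services roughly like this (just a sketch - the IPostService contract and the address are placeholders, not the real code):

    using System;
    using System.ServiceModel;

    [ServiceContract]
    public interface IPostService              // placeholder contract for illustration
    {
        [OperationContract]
        string GetPostTitle(int postId);
    }

    public class PostManager
    {
        // The factory is created once and reused: channel creation is cheap, factory creation is not.
        static readonly ChannelFactory<IPostService> Factory =
            new ChannelFactory<IPostService>(
                new BasicHttpBinding(),
                new EndpointAddress("http://posts.internal.example/PostService")); // placeholder address

        public string GetPostTitle(int postId)
        {
            IPostService channel = Factory.CreateChannel();
            try
            {
                return channel.GetPostTitle(postId);
            }
            finally
            {
                ((IClientChannel)channel).Close();   // production code would Abort() on fault
            }
        }
    }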
After the talk with my friend I have the following questions.
I appreciate that WCF isn't as fast as DLL calls or something similar, but how much overhead will there be, given that the whole system is based on them?
Creating a thread can be expensive. Will it cost more to do this than just calling all the controls one after another?
Are there any other inherent pitfalls that you can see with this design?
Do you have any other clients for the web service beyond your web site? If not, then I think that the web service isn't really needed. A service interface is reasonable, but that doesn't mean it needs to be a web service. Using a web service you'll incur the extra overhead of serialization and one more network transfer of the data. You gain, perhaps, some automatic caching capabilities for your service, but it sounds like you are planning to implement this on your own in any case. It's hard to quantify the amount of overhead because we don't know how complex your objects are nor how much data you intend to transfer, but I would wager that it's not insignificant.
If it were me, I would simplify the design: go single-threaded and use an embedded service interface. Then, if performance were an issue, I'd look to see where I could address the existing performance problems via caching, multiprocessing, etc. This lets the actual application drive the design, though you'd still apply good patterns and practices when a performance issue crops up. In the event that performance doesn't become an issue, then you haven't built a lot of complicated infrastructure -- YAGNI! You are not gonna need it!
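To make "embedded service interface" concrete, here's a rough sketch of what I mean (all names invented): the core codes against a plain interface with an in-process implementation, and nothing stops you from putting WCF behind that same interface later if a real need appears.

    // The core only ever sees this interface.
    public interface IContributorService
    {
        Contributor GetContributor(int id);
    }

    public class Contributor
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    // In-process implementation: a plain assembly reference, no serialization, no network hop.
    public class InProcessContributorService : IContributorService
    {
        public Contributor GetContributor(int id)
        {
            // Talk to the database / repository directly here.
            return new Contributor { Id = id, Name = "stub" };
        }
    }

    // If a second, non-web client ever shows up, the same interface can be re-hosted behind WCF
    // without the core changing: a proxy implementing IContributorService just forwards each call
    // to the remote service.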
It depends on the granularity of your service calls. One principle in SOA is to make your interfaces less chatty, i.e. have one call perform a whole bunch of actions. If you design your service interface as if it were a regular business object, it is very likely to be too chatty (see the sketch below).
It depends on your usage pattern. Also regarding threads, granularity is a key factor.
It looks very much like you're overdesigning the system. Changing a service interface is much more cumbersome than changing a simple method signature. If all your business objects are exposed as services, you are in for a debugging nightmare.
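Here is the sketch mentioned above: the same "create a post" use case, first as a chatty, business-object-style contract and then as a single coarse-grained call (all names invented for illustration):

    using System.Runtime.Serialization;
    using System.ServiceModel;

    // Chatty: mirrors a business object, so one use case = several round trips.
    [ServiceContract]
    public interface IPostServiceChatty
    {
        [OperationContract] int  CreatePost();
        [OperationContract] void SetTitle(int postId, string title);
        [OperationContract] void SetBody(int postId, string body);
        [OperationContract] void AssignContributor(int postId, int contributorId);
        [OperationContract] void Publish(int postId);
    }

    // Coarse-grained: one call carries the whole unit of work.
    [DataContract]
    public class NewPostRequest
    {
        [DataMember] public string Title { get; set; }
        [DataMember] public string Body { get; set; }
        [DataMember] public int ContributorId { get; set; }
        [DataMember] public bool PublishImmediately { get; set; }
    }

    [ServiceContract]
    public interface IPostServiceCoarse
    {
        [OperationContract]
        int SubmitPost(NewPostRequest request);   // one round trip per use case
    }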
1. Web-service-oriented design is reasonable if you have one or more non-native clients (that cannot access your logic directly) - for example AJAX, Flash, or another web application from a different domain. But using WCF for your application when you can call your logic directly is a very bad idea.
If you need web services later, you can easily wrap your domain model with a service layer.
2. Use a thread pool to minimize thread-creation calls where necessary. Beyond that, the answer to this question depends on what you need to achieve, which is not clear from your explanation.
3. The main pitfall is that you are trying to use too many things. Overdesigning is probably a good term for it.
If you are worried about the overhead of calling a WCF service, you can use the null transport. This avoids the serialization and deserialization that would otherwise be necessary if the client and server were on separate machines.
It doesn't sound like something that will be highly scalable; at least, not to lots of users per second. Slapping WCF in everywhere will slow things down by creating far more threads than you need. If the WCF calls don't do much work, the thread overhead will hurt you hard. Although it will be multithreaded, multiple calls to ASPX pages are already multithreaded. You might speed up your system when just one user is active, but hurt performance hard when lots of users are running. E.g., if one user requests the page, then ten separate WCF calls may gain from multithreading. However, if you have 100 page requests per second, that's 1,000 WCF calls per second. That's a lot of overhead.
I currently have an application that sends XML over TCP sockets from a Windows client to a Windows server.
We are rewriting the architecture, and our servers are going to be in Java. One architecture we are looking at is REST over HTTP, so the C# WinForms clients will send information that way. We are looking for high throughput and low latency.
Does anyone have any performance metrics on this approach versus other C#-client-to-Java-server communication options?
This isn't really well enough defined to make any statements about metrics: how big are the messages, how often will you be hitting the REST service, is it plain HTTP or do you need to secure it with SSL? In other words, what can you tell us about the workload parameters?
(I say this over and over again on performance questions: unless you can tell me something about the workload, I can't -- nobody really can -- tell you what will give better performance. That's why they used to say you couldn't consider performance until you had an implementation: it's not that you can't think about performance, it's that people often couldn't or at least wouldn't think about workload.)
That said, though, you can make some good estimates simply by looking at how many messages you want to exchange, because TCP/IP connection setup time often dominates a REST exchange. REST offers two advantages here: first, that connection handling is pretty well optimized in production web servers like Apache or lighttpd; second, a RESTful architecture enhances scalability by eliminating session state. That means you can scale freely using just a simple TCP/IP load balancer.
I would set up a test to try it and see. I understand that the only part of your application you're changing is the client/server communication. So analyse what you're sending now, and put together a test client/server setup sending messages which are representative of what you think your final solution is going to be doing (perhaps representative only in terms of size/throughput).
As noted in the previous post, there's not enough detail to really judge what the performance is going to be like, e.g.:
Is your message structure/format going to be the same, merely over HTTP rather than raw sockets?
Are you going to be sending subsets of the XML data? Processing large quantities of XML can be memory-intensive (e.g. if you're using a DOM-based approach).
What overhead is your chosen REST framework going to introduce (hopefully very little, but at the moment we don't know)?
The best approach is to set something up using (say) Jersey and spend some time testing various scenarios. If you're re-architecting a solution, it's going to be worth a few days investigating performance (let alone functionality, ease of development, etc.).
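As a starting point for such a test, the C# side can simply time a representative XML message against whatever endpoint you stand up (Jersey or otherwise). A rough sketch, with a placeholder URL and payload - substitute something representative of your real message sizes:

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Net;
    using System.Text;

    class RestTimingTest
    {
        static void Main()
        {
            const string url = "http://localhost:8080/orders";   // placeholder endpoint
            byte[] payload = Encoding.UTF8.GetBytes("<order><id>42</id><qty>7</qty></order>");

            const int iterations = 1000;
            var sw = Stopwatch.StartNew();

            for (int i = 0; i < iterations; i++)
            {
                var request = (HttpWebRequest)WebRequest.Create(url);
                request.Method = "POST";
                request.ContentType = "application/xml";
                request.KeepAlive = true;                 // reuse the TCP connection between calls
                using (Stream body = request.GetRequestStream())
                {
                    body.Write(payload, 0, payload.Length);
                }
                using (WebResponse response = request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    reader.ReadToEnd();                   // drain the response so the connection can be reused
                }
            }

            sw.Stop();
            Console.WriteLine("{0} requests in {1} ms ({2:F2} ms/request)",
                              iterations, sw.ElapsedMilliseconds,
                              (double)sw.ElapsedMilliseconds / iterations);
        }
    }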
It's going to be plenty fast, unless you have a very, very large number of concurrent clients hitting those servers. The XML shredding keeps getting faster in both Java and .NET. If you are on CLR2 and Java 5 or above, you will be fine. But of course you still need to do the tests to verify.
We've tested REST and SOAP transactions in our lab, and they are faster than you might think - tens of thousands of messages per second. A small number of modern CPUs generating XML messages can easily saturate a gigabit network. In other words, the network (transmission of data) is the bottleneck, not the CPU (serializing and deserializing XML).
And if you do your software design properly, then in the very unlikely situation where REST is not sufficient, swapping out the message-format layer (REST => protobufs) will get you better transmission performance with minimal disruption.
But before you need to go there, you will be able to send some money to Cisco and get lots more headroom.