I have a project that I've recently started working on seriously, but a design discussion with a friend raised some points I think are worth asking about.
The project is designed to be highly scalable and to let each business object be maintained completely independently. That emphasis on scalability has forced some design decisions that hurt the project's initial efficiency.
The basic design is as follows.
There is a "core" that is written in ASP.NET MVC and manages all interactions JSON API and HTML web. It however doesn't create or manage "business objects" like Posts, Contributors etc. Those are all handled in their own separate WCF web services.
The idea of the core is to be really simple leveraging individual controls that use management objects to retrieve the business data/objects from the web services. This in turn means that the core could be multithreaded and could call the controls on the page simultaneously.
Each web service will manage the relevant business object and their data in the DB. Any business specific processing will also be in here such as mapping data in the tables to useful data structures for use in the controls. The whole object will be passed to the core, and the core should only be either retrieving or setting a business object once per transaction. If multi-affecting operations are necessary in the future then I will need to make that functionality available.
Also the web services can perform their own independent caching and depending on the request and their own knowledge of their specific area (e.g. Users) could return a newly created object or a pre-created one.
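To make the pattern concrete, something like the following is what I have in mind for a single business object; all of the names here (IPostService, PostManager, "PostServiceEndpoint") are placeholders rather than real code:

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface IPostService
{
    [OperationContract]
    Post GetPost(int id);
}

[DataContract]
public class Post
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Title { get; set; }
    [DataMember] public string Body { get; set; }
}

// Manager used by a control in the MVC "core": one service call retrieves the
// whole business object for the transaction.
public class PostManager
{
    public Post GetPost(int id)
    {
        var factory = new ChannelFactory<IPostService>("PostServiceEndpoint");
        IPostService proxy = factory.CreateChannel();
        try
        {
            return proxy.GetPost(id);
        }
        finally
        {
            ((IClientChannel)proxy).Close();
            factory.Close();
        }
    }
}
```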
After the talk with my friend I have the following questions.
I appreciate that WCF isn't as fast as DLL calls or something similar, but how much overhead will there be, given that the whole system is based on WCF calls?
Creating a thread can be expensive. Will it cost more to do this than just calling all the controls one after another?
Are there any other inherent pitfalls that you can see with this design?
Do you have any clients for the web services beyond your web site? If not, then I think the web services aren't really needed. A service interface is reasonable, but that doesn't mean it needs to be a web service. Using a web service you'll incur the extra overhead of serialization and one more network transfer of the data. You gain, perhaps, some automatic caching capabilities for your service, but it sounds like you are planning to implement this on your own in any case. It's hard to quantify the amount of overhead because we don't know how complex your objects are nor how much data you intend to transfer, but I would wager that it's not insignificant.
If it were me, I would simplify the design: go single-threaded and use an in-process service interface. Then, if performance were an issue, I'd look to see where I could address the existing performance problems via caching, multiprocessing, etc. This lets the actual application drive the design, though you'd still apply good patterns and practices when the performance issue crops up. In the event that performance doesn't become an issue, then you haven't built a lot of complicated infrastructure -- YAGNI (you aren't gonna need it)!
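A rough sketch of what I mean (reusing the hypothetical IPostService/Post shape from the question, minus the WCF attributes): the core codes against an ordinary interface, starts with an in-process implementation, and a WCF-backed one can slot behind the same interface later if a remote boundary is ever genuinely needed.

```csharp
// Hypothetical shapes only: the callers depend on IPostService, not on how it
// is hosted.
public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public interface IPostService
{
    Post GetPost(int id);
}

// Default: direct, in-process calls into the business layer.
public class InProcessPostService : IPostService
{
    public Post GetPost(int id)
    {
        // query the database / business layer directly here
        return new Post { Id = id, Title = "example" };
    }
}

// Later, if a remote boundary is genuinely required, a WCF client can sit
// behind the same interface without any caller changing.
public class WcfBackedPostService : IPostService
{
    public Post GetPost(int id)
    {
        // call the remote WCF service here and map the result to Post
        throw new System.NotImplementedException();
    }
}
```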
It depends on the granularity of your service calls. One principle in SOA is to make your interfaces less chatty, i.e. have one call perform a whole bunch of actions. If you design your service interface as if it were a regular business object, it is very likely to be too chatty.
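To illustrate the chattiness point with hypothetical contracts: the first forces one network round trip per property, the second returns the whole object in a single call.

```csharp
public interface IChattyPostService       // one round trip per property: avoid
{
    string GetPostTitle(int postId);
    string GetPostBody(int postId);
    string GetPostAuthor(int postId);
}

public interface IChunkyPostService       // one round trip per object: prefer
{
    PostDto GetPost(int postId);
}

public class PostDto
{
    public string Title { get; set; }
    public string Body { get; set; }
    public string Author { get; set; }
}
```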
It depends on your usage pattern. Also regarding threads, granularity is a key factor.
It looks very much like you're overdesigning the system. Changing a service interface is much more cumbersome than changing a simple method signature. If all your business objects are exposed as services, you are in for a debugging nightmare.
1. Web-service-oriented design is reasonable if you have one or more non-native clients that cannot access your logic directly: for example AJAX, Flash, or another web application from a different domain. But using WCF for your application when you can call your logic directly is a very bad idea. If you need web services later, you can easily wrap your domain model with a service layer.
2. Use a thread pool to minimize thread-creation costs when background work is actually necessary (see the sketch after this list). Beyond that, the answer depends on what you need to achieve, which isn't clear from your explanation.
3. The main pitfall is that you are trying to use too many things at once; "overdesigning" is probably a good term for it.
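Regarding point 2, a minimal sketch of queuing work to the shared thread pool instead of creating a new thread per control (the workload method is hypothetical):

```csharp
using System.Threading;

static void LoadControlData(object state)
{
    // fetch or prepare the data for one control here
}

static void QueueAllControls(int controlCount)
{
    // Reuses pooled threads rather than paying thread-creation cost per control.
    for (int i = 0; i < controlCount; i++)
        ThreadPool.QueueUserWorkItem(LoadControlData, i);
}
```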
If you are worried about the overhead of calling a WCF service, you can use the null transport. This avoids the serialization and deserialization that would otherwise be necessary if the client and server were on separate machines.
It doesn't sound like something that'll be highly scalable; at least, not to lots of users per second. Slapping WCF in all over the place will slow things down by creating far more threads than you need. If the WCF calls don't do much work, the thread overhead will hurt you hard. Although it'll be multithreaded, multiple calls to ASPX pages are already multithreaded. You might speed up your system when just one person is using it, but hit performance hard when lots of users are. E.g., if one user requests the page, then ten separate WCF calls may gain from multithreading. However, if you have 100 page requests per second, that's 1,000 WCF calls per second. That's a lot of overhead.
At the moment we have a genetic algorithm (GA) that runs for quite a while, and I thought we could distribute it using Service Fabric, because theoretically it fits nicely as a microservice. This is my first try at Service Fabric.
How should we do it? Should we have a stateful service that runs and aggregates other actors' tasks? It's kinda similar to this project: https://github.com/Azure-Samples/service-fabric-dotnet-data-streaming-websockets
I'm not really sure how to approach this and there is not much documented on the subject. This GA is really extensive and our goal is to distribute its calculations.
I implemented a basic genetic algorithm app with Service Fabric as an app building exercise. Not sure if my approach is the best way to do things for your scenario, but I can describe what I did.
My app consisted of only actors, both stateful and stateless. I had a Processor stateful actor which provided all the management tasks and drove the algorithm. Because it was stateful, it maintained the history of all the genetic state across each of the generations that were produced.
I also had a FitnessEvalTask stateless actor. This task was simply responsible for evaluating the fitness of an entity. Its input was the gene representation and its output was the fitness value. The idea was that you'd be spinning up instances of this actor at a high rate and they'd be distributed appropriately. The Processor actor, being responsible for driving the algorithm, would create the necessary instances of the FitnessEvalTask actors, provide their input, have them report back with their fitness values and do the necessary processing afterwards.
My client process, just a simple console app, would communicate with the Processor actor to initiate the algorithm and perform any necessary management tasks.
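For reference, the actor split looked roughly like this in code; the interface and member names here are simplified illustrations, not my actual types:

```csharp
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Client;

public interface IFitnessEvalTask : IActor
{
    // Input: a gene representation; output: its fitness value.
    Task<double> EvaluateAsync(double[] genes);
}

public interface IProcessor : IActor
{
    // Drives the GA: spins up FitnessEvalTask actors for the current
    // generation, aggregates their results, and keeps generation history
    // in its own (stateful) actor state.
    Task RunAsync(int generations);
}

public static class GaClient
{
    // The console client only talks to the Processor actor.
    public static Task StartAlgorithm()
    {
        var processor = ActorProxy.Create<IProcessor>(
            ActorId.CreateRandom(), "fabric:/GeneticAlgorithmApp");
        return processor.RunAsync(generations: 100);
    }
}
```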
In general I think Service Fabric can accommodate a long-running, distributed genetic algorithm like you describe, and would be a reasonable solution.
You would likely use SF actors to represent candidate solutions in your population, and also (as you describe) a SF reliable service to perform data aggregation, manage the population and generations, etc.
The choice of whether to use stateful vs. stateless actors/services largely depends on whether you want (or need) to manage state yourself (say, if you're integrating with a custom datastore) or if you're okay with SF managing state on your behalf. A "stateless" SF service can still have durable state... you are simply responsible for managing it yourself.
The nice thing about using SF is that it formally separates the logic and state of your solution from the low-level resource management needed to execute it. You define your application in code and separately configure a SF cluster with whatever resources you wish, and SF takes care of distributing the work efficiently and reliably across the cluster. Certainly you can do that yourself, but it's challenging to do correctly.
Sounds like a fun problem... best of luck!
I'm considering using WCF or mORMot as the framework for a RESTful service, where the business/legacy code that needs to be accessed is written in Delphi. Performance is a key requirement of the project.
The application must be prepared for load balancing. The clients of the REST service are Windows desktop applications. These desktop clients allow the user to view large volumes of data, with huge result sets from SQL statements. What is the best way to implement a service that caches a recordset and lets the client consume it gradually through the REST service? Can you demonstrate a good example? The recordset must be cached in the session until the client completes the consultation or decides to do the full fetch. What is the right architecture for this?
Will load balancing still work in WCF? Since the recordset is cached on a single server, any row-fetch requests must fall on the same server.
Both WCF and mORMot share the same high-performance kernel-mode http.sys server. Both feature IOCP and multi-threading.
For performance, mORMot will be lighter, will allocate (much) less memory, won't be affected by Garbage Collector freezes, and is able to get JSON content directly from the database engine (by-passing most temporary data conversion and allocation) - so that you can achieve amazing speed. In short, mORMot was designed for performance of serving REST/JSON content from the ground up - with a multi-threaded kernel (whereas e.g. node.js is mono-threaded). If your purpose is also to cache some data, mORMot works very well as 64 bit native services, giving access to all your system RAM if needed, and has built-in real-time content compression.
WCF is a great general-purpose communication library, which can be RESTful, but is not RESTful from its (historical) roots. The main issue I saw with WCF is the difficulty to configure it between applications (.exe.config tuning may be confusing), and that it is a big black box. For instance, it was not possible to implement Cross-origin resource sharing with WCF when the server is hosted as a Windows service (the Access-Control-Allow-Origin: HTTP headers are deleted by WCF!): you have to host it within IIS - and can't fix the issue, whereas with a full Open Source solution, you can fix any issue.
Load balancing can be implemented in mORMot and WCF with the same algorithm. Instead of using a round-robin algorithm in your case, a simple routing based on the content may be enough.
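Whatever the framework, that routing can be as simple as a stable hash of the session identifier, so that every follow-up fetch lands on the node that holds the cached recordset. An illustrative sketch (language aside, not tied to mORMot or WCF):

```csharp
// Deterministic hash of the session id -> the same backend node every time,
// so the cached recordset stays local to that node.
public static string PickNode(string sessionId, string[] nodes)
{
    int hash = 17;
    foreach (char c in sessionId)
        hash = unchecked(hash * 31 + c);
    return nodes[(hash & 0x7FFFFFFF) % nodes.Length];
}
```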
Using WCF to serve business logic written in Delphi will be slow, error prone and difficult to maintain. Mixing technologies induces unneeded complexity. I would not go into this direction.
If you have an existing Delphi code base and some Delphi skills, I guess mORMot may be a better choice. It was reported, for example, that a single production server is able to handle more than one million requests per day, serving thousands of concurrent clients, with a dedicated JavaScript process on the server side. One of the mORMot design goals was to help working with existing code and legacy projects. But I'm not 100% fair, since I'm the main maintainer of this open source project. :)
I'm working on a web application framework which uses MSSQL for data storage, mostly just does CRUD operations (but on arbitrarily complex structures), provides a WCF interface for a rich Silverlight admin and has an MVC3 display (and some basic forms like user settings, etc.).
It's getting quite good at being able to load, display, edit and save any (reasonably) complex data structure, in a user-friendly way.
But I'm looking towards the future and want to expand my capabilities (and it would be fun to learn new things along the way as well...), so I've decided (in light of what's coming for C# 5...) to try to get some parallel/async optimization in. Now, I haven't even learned TPL and PLINQ yet, so I'm happy for any advice there as well.
So my question is: what are possible areas where parallel processing may be of help, and where do TPL and PLINQ help me with that?
My gut tells me I could try saving branches of a data structure to the database in parallel (this is where I'd expect the biggest performance gain), and I could perform some complex operations (file upload, mail sending maybe?) in a multithreaded environment, etc. Can I build complex SL UI views in parallel on the client? (Creating 60 data-bound fields on a view can cause "blinking"...) Can I create partial views (menus, category trees, search forms, etc.) in MVC at once?
P.S.: If this turns into a "tell me everything about parallel stuff" thread, I'm happy to make it community wiki...
Remember that an asp.net web application is intrinsically a parallel application in any case. Requests can be serviced in parallel and this will all be managed by the asp.net framework. So there are two cases:
You have lots of users all hitting the site at once. In which case the parallel processing capability of the server is probably being used to capacity in any case.
You don't have lots of users all hitting the site at once. In which case the server is probably quite capable of dealing with the responses without parallel processing in a suitable fast response time.
Any time you start thinking about optimising something just because it might be fun, or because you just think you should make stuff faster, you are almost certainly guilty of premature optimisation. Your efforts could almost certainly be better spent enriching the functionality of the framework, rather than making what is probably a plenty-fast-enough solution a little bit faster (at the cost of significantly increased complexity).
As for where TPL and PLINQ can really help: in my opinion the main advantage of these technologies is in places in the application where you really do have a lot of long-running, blocking processes. For example, if you have a situation where you call out several times to an external web service, it can be a significant advantage to make these calls in parallel. I would strongly question whether writing to a local database, or even a database on a different box on a local network, would count as a long-running blocking process to the extent that this kind of parallelisation is of any significant value.
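A minimal sketch of that external-call case (the URLs are hypothetical): each request spends most of its time waiting on the network, so running them concurrently makes the total time roughly the slowest call rather than the sum of all three.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static async Task<string[]> FetchAllAsync()
{
    using (var client = new HttpClient())
    {
        // Start all three calls before awaiting any of them.
        Task<string> a = client.GetStringAsync("https://service-a.example/api/data");
        Task<string> b = client.GetStringAsync("https://service-b.example/api/data");
        Task<string> c = client.GetStringAsync("https://service-c.example/api/data");

        return await Task.WhenAll(a, b, c);
    }
}
```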
Pretty much all the examples you list fall into the category of getting the PC to do something in parallel that it was previously doing in sequence. How many CPUs are on your server, and how many are really free when the website is under load? Making something parallel does not necessarily make it faster unless the process involved spends some of its time sitting around doing nothing, waiting for an external event.
The first thing to do is ask the users/testers which bits seem slow. The only way to know for sure what's slowing you down is to use a profiler like dotTrace. The results are sometimes surprising.
If you do find something, parallel processing may not be the answer. You need to remember that there is an overhead in splitting tasks up, so if the task is fairly quick in the first place, it could end up being slower. You also have to consider the added complexity, e.g. what happens if half a task succeeds and half fails? (Although TPL and PLINQ shield you from this to an extent.)
Have fun, but I wonder whether this is a case of 1) a solution chasing a problem, and 2) premature optimization.
Can anyone help me with a question about webservices and scalability? I have written a webservice as a facade into our document management system and need to think about scalability issues. What areas should I be looking at to ensure performance and availability?
Thanks in advance
Performance is separate from scalability. Scalability means that you can add more servers to linearly increase system throughput (i.e. more client connections). The best way to start is having stateless web services. That way any client can call any of the n web service instances on n different machines. If there is a shared database at the end for persistence, that will ultimately be your bottleneck. There are ways to reduce that with data partitioning and sharding, but only when you get to that point.
First of all, decide what is acceptable behaviour for your web service. What should it be able to cope with: 1,000 connections per second? What response time should each connection have?
Then you need to automate the usage of your web service so you can stress test the system.
What happens when you have 100 requests per second? 1000? 10000?
Then you can decide whether performance is OK, whether the acceptable behaviour is too strict, or whether you need to do heavy performance tuning based on actual profiling data.
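A rough sketch of one way to automate that kind of stress test (the endpoint is hypothetical); dedicated load-testing tools will give you better numbers, but even this shows how the service behaves as you raise the request count:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static async Task LoadTest(string url, int concurrentRequests)
{
    using (var client = new HttpClient())
    {
        var sw = Stopwatch.StartNew();

        // Fire all requests concurrently, then wait for every response.
        var tasks = Enumerable.Range(0, concurrentRequests)
                              .Select(_ => client.GetAsync(url))
                              .ToArray();
        var responses = await Task.WhenAll(tasks);

        sw.Stop();
        Console.WriteLine("{0} requests in {1} ms, {2} succeeded",
            concurrentRequests, sw.ElapsedMilliseconds,
            responses.Count(r => r.IsSuccessStatusCode));
    }
}

// Usage: await LoadTest("https://myservice.example/api/documents", 1000);
```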
You should be looking to host your WCF service in IIS. IIS has a lot of performance, scalability, security etc. mechanisms built in and is the best starting point to save you reinventing the wheel.
Some of the performance is certainly down to your own code, but let's assume that it's already optimized. At that point, the additional performance and scaling issues involve the service host (e.g. IIS), the machines that host it, and their network (inter/intranet) connection speeds. You'll need to do some speed tests to be sure of things.
Well it really depends on what you're doing in your web service, but the only way you're going to find out is by simulating lots of users and measuring it.
Take a look at my answer to this question: Measuring performance
When we tested our code in this manner (where the web services were hosted in Windows services), we found that the bottleneck was authenticating each user in the facade service. In particular, the Windows component LSASS was using most of the CPU.
Luckily we were able to create new machines, each with a facade service, which then called through to our main set of web services. This enabled us to scale up to a large number of users (in the region of 100,000 users using our software normally).
I have a core .NET application that needs to spawn an arbitrary number of subprocesses. These processes need to be able to access some form of state object in the core application.
What is the best technique? I'll be moving a large amount of data between processes (Bitmaps), so it needs to be fast.
WCF would probably fit the bill.
Here's a really good article on .NET Remoting for performing distributed intensive analysis. Though Remoting has been replaced with the WCF, the article is relevant and shows how to make the calls asynchronously, etc.
This article contrasts WCF to .NET Remoting—the key takeaway here shows that WCF throughput outperforms Remoting for small data, but it approaches Remoting performance as data size increases.
I have similar requirements and am using Windows Communication Foundation to do that right now. My data sizes are probably a bit smaller though.
For reference I'm doing about 30-60 requests of about 5 KB-30 KB per second on a quad-core machine. WCF has been holding up quite well so far.
With WCF you have the added advantages of choosing a transport protocol and security mode that is suitable for your application.
I'd be hesitant to move large data around. I'd be inclined to move pointers to large data around instead, i.e., memory mapped files.
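A minimal sketch of what that looks like with a named memory-mapped file, assuming the map name and data length can be passed to the subprocess out of band (e.g. on its command line); the names here are made up:

```csharp
using System.IO.MemoryMappedFiles;

// Parent process: copy the pixel data into a named map. Keep the returned
// handle alive for as long as the child needs to read from it.
static MemoryMappedFile ShareBitmap(string mapName, byte[] pixels)
{
    var mmf = MemoryMappedFile.CreateNew(mapName, pixels.Length);
    using (var accessor = mmf.CreateViewAccessor())
        accessor.WriteArray(0, pixels, 0, pixels.Length);
    return mmf;
}

// Child process: open the same map by name and read the bytes back, without
// the data ever being serialized over a channel.
static byte[] ReadSharedBitmap(string mapName, int length)
{
    using (var mmf = MemoryMappedFile.OpenExisting(mapName))
    using (var accessor = mmf.CreateViewAccessor())
    {
        var pixels = new byte[length];
        accessor.ReadArray(0, pixels, 0, length);
        return pixels;
    }
}
```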
If you truly need to have separate processes there is always named pipes which would perform quite well.
However, would an application domain boundary suffice? Then you could do object marshalling and things would be a lot easier. Your application could share instances of the same object across domains by deriving from MarshalByRefObject.
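A small sketch of that app-domain option (the type is hypothetical): because SharedState derives from MarshalByRefObject, the caller gets a proxy to the one real instance instead of a serialized copy, and calls cross the domain boundary.

```csharp
using System;

public class SharedState : MarshalByRefObject
{
    public int Counter { get; set; }
}

class Program
{
    static void Main()
    {
        var domain = AppDomain.CreateDomain("Worker");

        // The unwrapped object is a transparent proxy; member access is
        // forwarded across the domain boundary rather than copying the object.
        var state = (SharedState)domain.CreateInstanceAndUnwrap(
            typeof(SharedState).Assembly.FullName,
            typeof(SharedState).FullName);

        state.Counter = 42;
        AppDomain.Unload(domain);
    }
}
```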
You can use .NET Remoting for inter-process communication (IPC) with IpcChannel. Otherwise you can search for shared memory wrappers and other IPC forms.
There is an MSDN article comparing WCF to a variety of methods including Remoting. However, unless I am reading the bar graph wrong, it shows Remoting to be the same or slightly better (unlike the other comment said).
There is also a blog post about WCF vs. Remoting. The blog post clearly shows Remoting is faster for binary objects, and if you are passing Bitmaps (binary objects) then it seems Remoting, shared memory or another IPC option might be faster, although WCF might not be a bad choice anyway.