I'm after a couple of ideas or opinions, if you don't mind. I'm trying to understand the best approach for a solution that needs to process near-real-time events received over a Web API using REST and JSON. Several events could be received every second.
As an event is received, it's processed against a number of rules, which could be computationally expensive. Each event would be processed against hundreds of rules to find a match. A rule might be based on multiple events, so I need to store state in memory, not on disk or in a database, as performance will become key. The rules will be pushed in from a database as a one-time exercise and will also be held in memory. If a rule is changed, it will be re-pushed.
Would it be best to write this as a single C# WebAPI application that receives and correlates the events, or as a WebAPI plus a Windows service?
If the latter, how do I get the API and the Windows service to pass data between each other? These could be on the same or separate servers.
With the Windows service, rather than starting a new thread for every event received, I'm thinking I should create an event queue or buffer (some sort of FIFO array). I'd have several buffers assigned to different threads or processes to achieve some level of parallelism.
Similarly, if I produced this as just a WebAPI, is it possible to create the same queuing/threading approach?
This question might be too big to provide a single answer for. Designing a system like this depends on a multitude of factors, such as the requirements of the system.
For a general event-processing solution, it's a good idea to have a web API that saves the events into a queue system that stores them for later processing. The queue can be an external service, such as an Azure Storage Queue, or any custom queue implementation that you're able to communicate with and that satisfies your requirements.
Then you would have one or more event processors that retrieve events from the queue for processing. An event processor could be a custom program written by you. Generally, the queue should have a way to lease an event, so that if a processor fails (crashes), the event is returned to the queue for another processor to pick up. Once a processor has processed an event, it can be permanently removed from the queue.
That type of architecture is a good starting point for building a reliable and possibly even scalable solution for processing events. Of course, one has to consider the performance of the queue, as it could become a bottleneck if the number of events per second is huge.
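For illustration, here is a minimal sketch of that lease-and-delete pattern, assuming Azure Storage Queues as one possible backend (the Azure.Storage.Queues client); the connection string and queue name are placeholders:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Queues; // NuGet: Azure.Storage.Queues

// Sketch of the lease-and-delete pattern described above. A received message
// stays invisible for the visibility timeout (the "lease"); if the processor
// crashes before DeleteMessageAsync, the message reappears for another worker.
class Processor
{
    static async Task Main()
    {
        var queue = new QueueClient("UseDevelopmentStorage=true", "events");
        await queue.CreateIfNotExistsAsync();

        var response = await queue.ReceiveMessagesAsync(
            maxMessages: 16, visibilityTimeout: TimeSpan.FromSeconds(30));

        foreach (var msg in response.Value)
        {
            Console.WriteLine($"processing {msg.Body}");                   // run the rules here
            await queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt); // only on success
        }
    }
}
```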
I am using the code below to receive events from Azure Event Hub:
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dotnet-framework-getstarted-send#receive-events
I want to handle the requests coming to the Event Hub sequentially. For example, if someone sends 5 events in very quick succession, I want to finish processing the first request before taking the second one.
How can I handle the events coming to the Event Hub sequentially?
Event Hub uses partitions to enable horizontal scaling of event processing. You can specify the number of partitions, from 1 to 32, when you create the event hub. Message order is guaranteed only within a partition, not across partitions.
If you need order to be maintained, you need to write events only to a specific partition and read only from that same partition. In Azure Event Hub, partitions are distributed across different instances for high availability, which means a partition may go offline for maintenance and come online later. So if you want to maintain order, you need to write to and read from a single partition, and you may need to handle situations such as a partition going offline in your application logic.
If you need to manage order, I would recommend using an Azure Service Bus queue, where ordering and availability are managed by Service Bus itself.
In order to make it sequential, you need to select the proper partitionKey. From the docs:
If you don't specify a partition key when publishing an event, a round-robin assignment is used. In many cases, using a partition key is a good choice if event ordering is important. When you use a partition key, these partitions require availability on a single node, and outages can occur over time; for example, when compute nodes reboot and patch.
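To make that concrete, here is a hedged sketch of publishing with a partition key, assuming the modern Azure.Messaging.EventHubs client library; the connection string, hub name, and key value are placeholders:

```csharp
using System.Text;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;          // NuGet: Azure.Messaging.EventHubs
using Azure.Messaging.EventHubs.Producer;

class Publisher
{
    static async Task Main()
    {
        await using var producer = new EventHubProducerClient(
            "<connection-string>", "<event-hub-name>");

        // All events sent with the same partition key land on the same
        // partition, so their relative order is preserved.
        var options = new SendEventOptions { PartitionKey = "order-12345" };
        var events = new[] { new EventData(Encoding.UTF8.GetBytes("payload")) };

        await producer.SendAsync(events, options);
    }
}
```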
In the bundle of events you receive from the Event Hub, each event carries an attribute called sequence_number. Since the bundle is a list, you can sort it by sequence_number and then process the events in order.
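A minimal sketch of that approach, assuming the modern Azure.Messaging.EventHubs SDK, where EventData exposes a SequenceNumber property (ProcessEvent is a hypothetical handler; the legacy SDK exposes the same value under the event's system properties):

```csharp
using System.Collections.Generic;
using System.Linq;
using Azure.Messaging.EventHubs; // EventData.SequenceNumber in the modern SDK

static class Ordered
{
    // Sort a received batch by sequence number before handling each event.
    public static void ProcessBatch(IEnumerable<EventData> batch)
    {
        foreach (var evt in batch.OrderBy(e => e.SequenceNumber))
            ProcessEvent(evt);
    }

    static void ProcessEvent(EventData evt) { /* handle one event */ }
}
```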
I want to use SqlDependency in my project, but the table I want the dependency on is used by several programs for very important purposes, so they have to be able to keep inserting into this table while the SqlDependency is in action. Is that possible?
I've read this question but didn't find my answer.
To answer your question: SqlDependency will not 'lock' the table, but it may increase lock contention in high-write environments, as it uses the same mechanism as indexed views to detect changes to the underlying data.
However, it should be a good fit unless:
The frequency of changes is likely to be high. To define 'high', you really need to test your ecosystem, but a suggested guideline is that if your data changes many times per second, it's probably not a good fit: the response time is not guaranteed for SqlDependency, and the callback mechanism is not designed to reliably handle many concurrent changes where you need to be notified of every change. In addition, SqlDependency can increase blocking/contention on the underlying table, as the index used to keep track of changes can become a bottleneck under a high frequency of writes.
You are intending to build the SqlDependency into a client application (e.g. desktop app) which accesses the database directly, and of which there will be many instances. In this case, the sheer volume of listeners, queues and messages could impact database performance and is just inefficient. In this case you need to put some middleware in between your database and your app before thinking about SqlDependency.
You need to be reliably notified of every single change. The mechanism underlying SqlDependency within SQL Server will generate a notification for every change, but the .NET side of things is not inherently designed to handle them in a multi-threaded way: if a notification arrives while the SqlDependency's worker thread is already handling another notification, it will be missed. In this case, you may be able to use SqlNotificationRequest instead.
You need to be notified immediately of the change (i.e. guaranteed sub-second). SqlDependency is not designed to be low-latency; it's designed for a cache-invalidation scenario.
If SqlDependency is not a good fit, have a look at the Planning for Notifications and underlying Query Notifications pages on MSDN for more guidance and suggestions of alternatives. Otherwise see below for a bit more detail on how to assess performance based on the underlying technologies at play.
SqlDependency largely relies upon two key SQL Server technologies: query notifications (based on indexed views), and service broker. It effectively hooks into the mechanism that updates an indexed view whenever the underlying data changes. It adds a message to a queue for each change, and service broker handles the messaging and notifications. In cases where the write frequency is very high, SQL Server will work hard to handle the writes, keep its 'indexed view' up-to-date, as well as queueing and serving up the many resulting messages. If you need near-instant notification, this may still be the best approach, otherwise have a look at either polling, or using an After Update trigger which perhaps uses Service Broker as suggested on MSDN.
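For completeness, here is a minimal sketch of the cache-invalidation scenario SqlDependency is designed for, using System.Data.SqlClient; the connection string, table, and columns are placeholders, and the query must follow the Query Notifications rules (two-part table names, explicit column list):

```csharp
using System;
using System.Data.SqlClient;

class CacheWatcher
{
    const string ConnStr = "Data Source=.;Initial Catalog=MyDb;Integrated Security=true";

    static void Main()
    {
        SqlDependency.Start(ConnStr); // starts the Service Broker listener
        Subscribe();
        Console.ReadLine();
        SqlDependency.Stop(ConnStr);
    }

    static void Subscribe()
    {
        using var conn = new SqlConnection(ConnStr);
        using var cmd = new SqlCommand("SELECT Id, Value FROM dbo.MyTable", conn);
        var dependency = new SqlDependency(cmd);
        dependency.OnChange += (s, e) =>
        {
            Console.WriteLine($"Change detected: {e.Info}");
            Subscribe(); // subscriptions fire only once; re-register after each notification
        };
        conn.Open();
        using var reader = cmd.ExecuteReader(); // executing the command registers the notification
        while (reader.Read()) { /* refresh the cache here */ }
    }
}
```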
At the minute I am trying to put together an asynchronous TCP server to receive data which I then want to process, extracting values and inserting them into SQL Server.
The basic concept I thought would be best is: once the data is received and confirmed as an entire message, the message should be passed off to some sort of collection to await processing on a FIFO basis, which will parse the values and insert them into SQL Server. I suppose this is what's known as the producer/consumer pattern.
I have been looking into the best collection / way of doing this and have so far seen BlockingCollection, the concurrent collections, and BufferBlock with async/await. I think this may be the way to go, but to be honest I'm not sure.
The best example I have found is on Stephen Cleary's blog, in particular this article:
http://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
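(For reference, a minimal sketch of the async producer/consumer queue idea that article describes, using BufferBlock from TPL Dataflow; the producer and consumer bodies here are stand-ins for the TCP receive loop and the SQL insert.)

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

class Demo
{
    static async Task Main()
    {
        var queue = new BufferBlock<string>();

        // Producer: the TCP receive loop would post complete messages here.
        var producer = Task.Run(async () =>
        {
            for (int i = 0; i < 5; i++)
                await queue.SendAsync($"message {i}");
            queue.Complete(); // signal that no more items will arrive
        });

        // Consumer: parses values and writes them to SQL Server (stubbed).
        while (await queue.OutputAvailableAsync())
        {
            var msg = await queue.ReceiveAsync();
            Console.WriteLine($"processing {msg}"); // stand-in for parse + insert
        }

        await producer;
    }
}
```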
My main reservation is that I in no way want to slow down or interrupt the receiving of messages, which to me suggests using the multiple producer/consumer example that can be seen at the above link. What I want to know is:
Am I correct in this assumption, or is there a more suitable way of doing this in my scenario?
And if I am correct in my assumption, could anyone suggest the best way of implementing this, taking my use case into consideration?
Any and all help is much appreciated.
At the minute I am trying to put together an asynchronous TCP server to receive data which I then want to process, extracting values and inserting them into SQL Server.
There's a common pitfall with this kind of scenario. It is usually wrong to report success back to the client when the work has yet to be done. Most of the time I've seen this design, it's because of an efficiency "requirement" self-imposed by the developer, not by the client or for technical reasons. So first, take a step back and make absolutely sure that you do want to return a "successful completion" message to the client when the operation has not actually completed yet.
If you are sure that's what you want to do, then there's another question you must ask: is it acceptable to lose requests? That is, after you tell the client that the operation successfully completed, will the system still be stable if the operation does not actually ever complete?
The answer to that question is usually "no." At that point, the most common architectural solution is to have an out-of-process reliable queue (such as an Azure queue or MSMQ), with an independent backend (such as an Azure worker role or Win32 service) that processes the queue messages. This definitely complicates the architecture, but it is a necessary complication if the system must return completion messages early and must not lose messages.
On the other hand, if losing messages is acceptable, then you can keep them in-memory. It is only in this case that you can use one of the in-memory producer/consumer types mentioned on my blog. This is a very rare situation, but it does happen from time to time.
In general, I would avoid using BlockingCollection and friends for this sort of work. Doing so encourages you to architect the entire system into a single process, which is the enemy of scalability and reliability.
I second Stephen Cleary's suggestion of using an out-of-process queue to manage the work. I disagree that this necessarily complicates the architecture, though - in fact, I think it can make things quite a bit simpler. Specifically, a major complication of the original requirement ("put together an asynchronous tcp server") disappears. Asynchronous TCP servers are a pain in the butt to write and easy to screw up - why not just skip that part altogether and be free to focus all of your energy on the post-processing code?
When I built a system like this, I used a Redis list as the task queue. Tasks were serialized to JSON, and clients would add their task to the queue with an RPUSH command. Worker processes retrieve the next task from the queue with BLPOP, do their thing, then go back to waiting for the next task. (A sketch follows the list of advantages below.)
Advantages:
No locks. All synchronization comes for free from Redis (or whatever task queue you choose).
Everything in the system is single-threaded. Multi-threading is hard.
I'm free to spin up as many worker processes as I want, across as many nodes as I want.
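A minimal sketch of this pattern, assuming the StackExchange.Redis client; note that this client deliberately does not expose the blocking BLPOP, so the worker below polls instead (a client that supports BLPOP could simply block):

```csharp
using System;
using System.Threading;
using StackExchange.Redis; // NuGet: StackExchange.Redis

// Sketch of the Redis list-as-queue pattern described above.
class Worker
{
    static void Main()
    {
        var redis = ConnectionMultiplexer.Connect("localhost");
        var db = redis.GetDatabase();

        db.ListRightPush("tasks", "{\"job\":\"example\"}"); // client side: RPUSH

        while (true)
        {
            RedisValue task = db.ListLeftPop("tasks");      // worker side: LPOP
            if (task.IsNullOrEmpty) { Thread.Sleep(100); continue; }
            Console.WriteLine($"processing {task}");        // deserialize + do the work
        }
    }
}
```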
I'm doing a project with some timing constraints right now. The setup: a web service accepts (tiny) XML files, and I have to process these, fast.
The first and most naive idea was to handle this processing in the request dispatcher itself, but that didn't scale and was doomed from the start.
So now I'm looking at a varying load of incoming requests, each of which produces ~50 jobs on my side. The technologies available for use are limited by the customer's rules: if it's not SQL Server or MSMQ, it probably won't fly.
I thought about going down the MSMQ route (the web service just submits messages; multiple consumer processes pick them up later on), and small proof-of-concept modules worked like a charm.
There's one problem though: the priority of these jobs might change a lot while they sit in the queue. The system is fairly time-critical, so if we - for whatever reason - cannot process incoming jobs in a timely fashion, we need to prefer the latest ones.
Basically, the use case changes from reliable messaging in general to LIFO under (too) heavy load. Old entries still have to be processed, but they have just lost all of their priority.
Is there any manageable way to build something like this in MSMQ?
Expanding the business side, as requested:
The processing of an incoming job is bound to some tracks where physical goods are moved around. If I cannot process the messages in time, the goods are "gone".
I still want the results for statistical purposes, but I really need to focus on the newer messages now.
Think of me being able to influence mechanical things and reroute goods moving on a track - if they haven't moved past point X yet.
So, if I understand this, you want to be able to switch between sorting the queue by priority OR by arrival time, depending on the situation. MSMQ can only sort the queue by priority AND then by arrival time.
Although I understand what you are trying to do, I don't quite see the business justification for it. Can you expand on this?
I would propose using a service to move messages from the incoming queue to a number of work queues for processing. Under normal load, there would be several queues, each with a monitoring thread.
Under heavy load, new traffic would all go to just one "panic" queue until the load dropped. The threads on the other work queues could be paused if necessary.
Cheers,
John Breakwell
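As a purely illustrative aside: since MSMQ dequeues by priority first and arrival time second, one partial workaround is to raise Message.Priority on traffic accepted during overload, which gives a rough "newest first" effect without reordering the existing queue. A sketch using System.Messaging (the queue path and overload flag are placeholders):

```csharp
using System.Messaging; // .NET Framework reference: System.Messaging

// MSMQ serves higher-priority messages before older normal-priority ones,
// so marking messages submitted during overload as Highest approximates
// "prefer the latest" while old entries remain queued for later.
class Dispatcher
{
    static readonly MessageQueue Jobs = new MessageQueue(@".\private$\jobs");

    public static void Submit(string body, bool systemOverloaded)
    {
        var msg = new Message(body)
        {
            Priority = systemOverloaded ? MessagePriority.Highest
                                        : MessagePriority.Normal
        };
        Jobs.Send(msg);
    }
}
```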
I'm working on an application that may generate thousands of messages in a fairly tight loop on a client, to be processed on a server. The chain of events is something like:
Client processes item, places in local queue.
Local queue processing picks up messages and calls web service.
Web service creates message in service bus on server.
Service bus processes message to database.
The idea being that all communications are asynchronous, as there will be many clients for the web service. I know that MSMQ can do this directly, but we don't always have that kind of admin capability on the clients to set things up like security etc.
My question is about the granularity of the messages at each stage. The simplest method would mean that each item processed on the client generates one client message / web service call / service bus message. That's fine, but I know it's better to batch up the web service calls if possible, although there is a tradeoff between large-granularity web service DTOs and short-running transactions on the database. This particular scenario does not require a "business transaction" where either all items are processed or none are; I'm just looking for the best balance of message size vs. number of web service calls vs. database transactions.
Any advice?
Chatty interfaces (i.e. lots and lots of small messages) tend to have a high overhead from dispatching each incoming message (and, on the client, the reply) to the correct code to process it - a fixed cost per message. Big messages, by contrast, consume their resources in the processing of the message itself.
Additionally a lot of web service calls in progress will mean a lot of TCP/IP connections to manage, and concurrency issues (including locking in a database) might become an issue.
But without some details of the processing of the message it is hard to be specific, other than the general advice against chatty interfaces because of the fixed overheads.
Measure first, optimize later. Unless you can make a back-of-the-envelope estimate that shows that the simplest solution yields unacceptably high loads, try it, establish good supervisory measurements, see how it performs and scales. Then start thinking about how much to batch and where.
This approach, of course, requires you to be able to change the web service interface after deployment, so you need a versioning approach to deal with clients which may not have been redesigned, supporting several WS versions in parallel. But not thinking about versioning almost always traps you in suboptimal interfaces, anyway.
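If measurement does point toward batching, one simple policy is to flush when a batch reaches a size limit or an age limit, whichever comes first. A hedged sketch (the limits and SendToWebServiceAsync are made-up stand-ins to tune and replace; the class assumes single-threaded use, and a production version would also flush on a timer):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Accumulate items and flush to the web service by size or age,
// trading fewer calls against larger DTOs and longer transactions.
class Batcher
{
    private readonly List<string> _items = new();
    private DateTime _batchStarted = DateTime.UtcNow;
    private const int MaxItems = 50;                                  // tuning knob
    private static readonly TimeSpan MaxAge = TimeSpan.FromMilliseconds(500); // tuning knob

    public async Task AddAsync(string item)
    {
        _items.Add(item);
        if (_items.Count >= MaxItems || DateTime.UtcNow - _batchStarted >= MaxAge)
            await FlushAsync();
    }

    private async Task FlushAsync()
    {
        await SendToWebServiceAsync(_items.ToArray()); // hypothetical client call
        _items.Clear();
        _batchStarted = DateTime.UtcNow;
    }

    private Task SendToWebServiceAsync(string[] batch) => Task.CompletedTask; // stub
}
```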
Abstract the message queue
and have a swappable message queue backend. This way you can test many backends, and give yourself an easy bail-out should you pick the wrong one or grow to like a new one that appears. The overhead of messaging is usually in packing and handling the request. Different systems are designed for different levels of traffic and different symmetries over time.
If you abstract out the basic features you can swap the mechanics in and out as your needs change, or are more accurately assessed.
You can also translate messages between differing queue types at various portions of the application or message route as the recipients' stresses change, because they are handling, for example, 1000:1/s at one level versus 10:1/s at a higher level.
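A minimal sketch of such an abstraction, with an in-memory backend (useful for tests) as one swappable implementation; all names are illustrative:

```csharp
using System.Threading.Tasks;

// The application codes against IMessageQueue; the concrete backend
// (MSMQ, an Azure queue, Redis, in-memory for tests) is chosen at
// composition time and can be swapped without touching callers.
public interface IMessageQueue
{
    Task EnqueueAsync(byte[] message);
    Task<byte[]> DequeueAsync();
}

public sealed class InMemoryQueue : IMessageQueue
{
    private readonly System.Threading.Channels.Channel<byte[]> _channel =
        System.Threading.Channels.Channel.CreateUnbounded<byte[]>();

    public Task EnqueueAsync(byte[] message) =>
        _channel.Writer.WriteAsync(message).AsTask();

    public async Task<byte[]> DequeueAsync() =>
        await _channel.Reader.ReadAsync();
}
```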
Good Luck