I'm a bit mixed up about the difference between a Redis transaction and a pipeline, and ultimately about how to use pipelines with BookSleeve. I see that BookSleeve supports the Redis transaction feature (MULTI/EXEC), but there is no mention of a pipelining feature in its API or tests. However, other implementations clearly make a distinction between pipelines and transactions, namely in atomicity, as the redis-ruby version below shows, yet in some places the terms seem to be used interchangeably.
redis-ruby implementation:
r.pipelined {
# these commands will be pipelined
r.get("insensitive_key")
}
r.multi {
# these commands will be executed atomically
r.set("sensitive_key")
}
I'd just use MULTI/EXEC instead, but they seem to block all other users until the transaction has completed (which isn't necessary in my case), so I worry about their performance. Has anyone used pipelines with BookSleeve, or have any ideas about how to implement them?
In BookSleeve, everything is always pipelined. There are no synchronous operations. Not a single one. As such, every operation returns some form of Task (could be a vanilla Task, could be a Task<string>, Task<long>, etc), which at some point in the future (i.e. when redis responds) will have a value. You can use Wait at your calling code to perform a synchronous wait, or ContinueWith / await (C# 5 language feature) to perform an asynchronous callback.
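For illustration, a minimal sketch of that usage; the exact member names (Strings.Set, Strings.GetString, Wait) are assumptions based on common BookSleeve usage and may differ between versions:

// Hedged sketch: every BookSleeve call returns a Task that completes when redis replies.
using (var conn = new RedisConnection("localhost"))
{
    conn.Wait(conn.Open());                                 // even Open() is asynchronous

    conn.Strings.Set(0, "insensitive_key", "value");        // fire-and-forget; pipelined immediately
    Task<string> pending = conn.Strings.GetString(0, "insensitive_key");

    // synchronous wait at the call-site...
    string value = conn.Wait(pending);

    // ...or an asynchronous callback instead of blocking:
    // pending.ContinueWith(t => Console.WriteLine(t.Result));
}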
Transactions are no different; they are pipelined. The only subtle change with transactions is that they are additionally buffered at the call-site until complete (since it is a multiplexer, we can't start pipelining transaction-related messages until we have a complete unit-of-work, as it would adversely impact other callers on the same multiplexer).
So: the reason there is no explicit .pipelined is that everything is pipelined and asynchronous.
Pipelining is a protocol-level communication strategy and has nothing to do with atomicity. It is entirely orthogonal to the notion of 'transactions'. (For example, you can use MULTI .. EXEC in a pipelined connection.)
What is pipelining?
The most basic connector to redis would be a synchronous client interacting in a request-reply manner: the client sends a request and then waits for the response from Redis before sending the next request.
With pipelining, the client can keep sending requests without pausing to see the Redis response to each request. Redis is, of course, a single-threaded server and a natural serialization point, and thus request order is preserved and reflected in the response order. This means the client can have one thread sending requests (typically by dequeuing from a request queue) and another thread constantly processing responses from Redis. Note that you can of course still use pipelining with a single-threaded client, but you do lose some of the efficiencies. The two-threaded model allows for full utilization of your local CPU and the network bandwidth (e.g. saturation).
If you are following this so far, you must ask yourself: well, how are the requests and responses matched on the client side? Good question! There are various ways to approach this. In JRedis, I wrap requests in a (Java) Future object to deal with the asynchrony of the request/response processing. Every time a request is sent, a corresponding Future object is wrapped in a pending-response object and queued. The response listener simply dequeues from this queue one item at a time, parses the response (stream), and updates the Future object.
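To make that matching concrete, here is a hedged C# sketch of the same idea, with TaskCompletionSource playing the role of the Java Future and WriteToSocket as a placeholder for the network write. Because redis answers in request order, a FIFO queue of pending completions is all that is needed:

using System.Collections.Concurrent;
using System.Threading.Tasks;

sealed class PipelinedClient
{
    readonly ConcurrentQueue<TaskCompletionSource<string>> _pending =
        new ConcurrentQueue<TaskCompletionSource<string>>();

    // writer side: send the command and queue a "future" for its eventual reply
    public Task<string> Send(string command)
    {
        var completion = new TaskCompletionSource<string>();
        _pending.Enqueue(completion);
        WriteToSocket(command);                 // placeholder: write the command without waiting
        return completion.Task;
    }

    // reader side: a response-listener loop calls this for each reply, in arrival order
    public void OnReply(string reply)
    {
        TaskCompletionSource<string> completion;
        if (_pending.TryDequeue(out completion))
            completion.SetResult(reply);        // complete the oldest outstanding request
    }

    void WriteToSocket(string command) { /* omitted: network write */ }
}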
Now the end user of the client can either be exposed to a synchronous or an asynchronous interface. If the interface is synchronous, the implementation naturally must block on the Future's response.
If you have followed so far, then it should be clear that a single-threaded app using synchronous semantics with pipelining defeats the entire purpose of pipelining (since the app is blocking on the response and is not busy feeding the client additional requests). But if the app is multithreaded, a synchronous interface to the pipeline allows you to use a single connection while serving N client-app threads. (So here, it is an implementation strategy to help build a thread-safe connection.)
If the interface to pipeline is asynchronous, then even a single threaded client app can benefit. Throughput increases at least by an order of magnitude.
(Caveats with pipelining: It is non-trivial to write a fault-tolerant pipelined client.)
Ideally I should use a diagram, but pay attention to what happens at the end of the clip:
http://www.youtube.com/watch?v=NeK5ZjtpO-M
Here is the link to Redis Transactions Documentation
Regarding BookSleeve, please refer to this post from Marc.
"CreateTransaction() creates a staging area to build commands (using exactly the same API) and capture future results. Then, when Execute() is called the buffered commands are assembled into a MULTI/EXEC unit and sent down in a contiguous block (the multiplexer will send all of these together, obviously)."
If you create your commands inside a transaction they will automatically be "pipelined".
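As a hedged sketch of that, assuming an open RedisConnection named conn and member names that may differ by BookSleeve version:

// Nothing is sent to redis until Execute(); the buffered commands then go out as one MULTI/EXEC block.
var tran = conn.CreateTransaction();

Task setPending = tran.Strings.Set(0, "sensitive_key", "value");       // buffered, not yet sent
Task<string> getPending = tran.Strings.GetString(0, "sensitive_key");  // buffered, not yet sent

conn.Wait(tran.Execute());      // assembles MULTI ... EXEC and pipelines it as one contiguous block
string value = conn.Wait(getPending);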
I'm implementing an API. The front-end will likely be REST/HTTP, back-end MSSQL, with a lightweight middle-tier in between. Probably IIS hosted.
Each incoming request will have a non-unique Id attached. Any requests that share the same Id must be processed serially (and in FIFO order), whereas requests with different Ids can (and should) be processed concurrently for performance and efficiency.
Assume that when clients call my API, a new thread is created to process the request (thread-per-call model).
Assume that every request is about the same size and requires about the same amount of computational work.
Current Design
My current design and implementation is very simple and straightforward. A Monitor is created on a per-Id basis, and all requests are processed through a critical section of code that enforces serialisation. In other words, the Monitor for Id X will block all threads carrying a request with Id X until the current thread carrying Id X has completed its work.
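A hedged sketch of that arrangement (the string Id and processRequest delegate are placeholders; note that Monitor does not strictly guarantee FIFO ordering of waiting threads):

using System;
using System.Collections.Concurrent;

static class PerIdSerializer
{
    // one lock object per Id; entries are never removed in this simple sketch
    static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>();

    public static TResponse Process<TResponse>(string id, Func<TResponse> processRequest)
    {
        object gate = Locks.GetOrAdd(id, _ => new object());
        lock (gate)                   // requests sharing an Id are serialised here
        {
            return processRequest();  // requests with different Ids run concurrently
        }
    }
}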
An advantage of this current design is simplicity (it was easy to implement).
A second advantage is that there is none of the cost that comes with switching data between threads - each thread carries a request all the way from its initial arrival at the API through to sending a response back to the client.
One possible disadvantage is that where many requests arrive sharing the same Id, there will be lots of threads blocking (and, ultimately, unblocking again).
Another possible disadvantage is that this design does not lend itself easily to Asynchronous code for potentially increasing scalability (scalability would likely have to be realised in other ways).
Alternate Design
Another design might consist of a more complex arrangement:
Create a dedicated BlockingCollection for each Id encountered.
Create a single, dedicated long-running consumer thread for each BlockingCollection.
Each thread that processes a request acts as a producer by enqueuing the request to the relevant BlockingCollection.
The producing thread then waits (in async style) until a response is ready to be collected.
Each consumer thread processes items from its BlockingCollection queue in a serialised manner, and signals the thread that is awaiting a response once the response is ready.
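A hedged sketch of that alternate arrangement, assuming a string Id, a string payload, and a placeholder Handle method; each Id gets its own BlockingCollection and long-running consumer, and the producer awaits a TaskCompletionSource for the response:

using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Threading.Tasks;

sealed class WorkItem
{
    public string Payload;                            // placeholder request content
    public readonly TaskCompletionSource<string> Completion = new TaskCompletionSource<string>();
}

sealed class PerIdQueues
{
    readonly object _gate = new object();
    readonly Dictionary<string, BlockingCollection<WorkItem>> _queues =
        new Dictionary<string, BlockingCollection<WorkItem>>();

    public Task<string> EnqueueAsync(string id, string payload)
    {
        BlockingCollection<WorkItem> queue;
        lock (_gate)
        {
            if (!_queues.TryGetValue(id, out queue))
            {
                queue = new BlockingCollection<WorkItem>();
                _queues[id] = queue;
                var q = queue;
                // one dedicated, long-running consumer thread per Id
                Task.Factory.StartNew(() =>
                {
                    foreach (WorkItem item in q.GetConsumingEnumerable())
                        item.Completion.SetResult(Handle(item.Payload));   // serialised, FIFO processing
                }, TaskCreationOptions.LongRunning);
            }
        }

        var work = new WorkItem { Payload = payload };
        queue.Add(work);                 // producer side: enqueue the request...
        return work.Completion.Task;     // ...and hand back a task the caller can await
    }

    static string Handle(string payload)
    {
        return payload;                  // placeholder for the real per-request work
    }
}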
A disadvantage of this design is complexity
A second disadvantage is that there will be overhead due to switching data between threads (at least twice per request)
However, I think that on a reasonably busy system, where lots of requests are coming in, this might be better at reducing the number of blocking threads, and the number of threads required overall will possibly be fewer.
It also lends itself better to Asynchronous code than the original design, which might make it scale better.
Questions
Given the effort and complexity of a re-implementation using the Alternate Design, is it likely to be worthwhile?
(At the moment I am leaning towards a NO and sticking with my current design: but any views or general thoughts would be much appreciated.)
If there is no straightforward answer to the above question, then what are the key considerations that I need to factor in to my decision?
Your current solution will scale badly if the number of requests gets too high (requests start queuing). Each request spawns a new thread and allocates the necessary resources as well.
Have a look at the Actor Model.
You would spawn a thread per request ID and just push the requests via "message" to the actor.
Consider using lazy initialization for the actors, meaning you only spawn a thread if there is actually a request for that ID in progress. If an actor's message queue is empty you can terminate it and only spawn it again when a new request for that ID comes in.
An implementation based on the thread pool should also help with performance in the future.
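A hedged sketch of that lazy-actor idea (message handling is just an Action here; the worker removes its mailbox and exits once it is drained, and Post re-spawns it when a new request for that ID arrives):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

sealed class LazyActors
{
    readonly object _gate = new object();
    readonly Dictionary<string, Queue<Action>> _mailboxes = new Dictionary<string, Queue<Action>>();

    public void Post(string id, Action message)
    {
        lock (_gate)
        {
            Queue<Action> mailbox;
            if (_mailboxes.TryGetValue(id, out mailbox))
            {
                mailbox.Enqueue(message);       // an actor for this ID is already running
                return;
            }
            mailbox = new Queue<Action>();
            mailbox.Enqueue(message);
            _mailboxes[id] = mailbox;
            // lazily spawn a worker for this ID only when a request for it arrives
            Task.Factory.StartNew(() => Run(id, mailbox), TaskCreationOptions.LongRunning);
        }
    }

    void Run(string id, Queue<Action> mailbox)
    {
        while (true)
        {
            Action next;
            lock (_gate)
            {
                if (mailbox.Count == 0)
                {
                    _mailboxes.Remove(id);      // mailbox drained: terminate this actor
                    return;
                }
                next = mailbox.Dequeue();
            }
            next();                             // messages for one ID are processed serially, in FIFO order
        }
    }
}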
I am using the TcpListener class example (https://msdn.microsoft.com/es-es/library/system.net.sockets.tcplistener(v=vs.110).aspx) in order to process TCP requests.
But it seems that this TcpListener will accept multiple requests at the same time; each request should then be processed by a couple of web services, and the result must be returned to the TCP client.
I am thinking of doing the following:
Get a stream object for reading and writing (NetworkStream stream = client.GetStream();) and save it in a special container class.
Put this class into a special queue helper class like the one in "C#: Triggering an Event when an object is added to a Queue".
When the queue changes, fire the implemented event to process the next queue item asynchronously using a Task.
Within the Task, communicate with the web services and send the response to the TCP client.
Please let me know whether this architecture is viable and able to handle multiple requests to the TcpListener.
I'd recommend NetMQ. Have a look: https://github.com/zeromq/netmq
Using a queue is definitely a viable idea, but consider what purpose it serves: it limits how many requests you can process in parallel. You may need to limit that in several cases; the most usual one is when processing each request performs CPU-bound work (heavy computations). Then your ability to process a lot of them in parallel is limited, and you may want to use the queue approach.
In your case, request processing performs IO-bound work (waiting for a web request to complete). This does not consume many server resources, and you can process a lot of such requests in parallel, so most likely no queue is needed in your case.
Even if you use a queue, it's very rarely useful to process just one item at a time. Instead, process the queue with X threads (where X again depends on whether the work is CPU- or IO-bound; for CPU-bound work you might be fine with X = number of cores, while for IO-bound work you need more). If you use too few threads to process your queue, your clients will wait longer for basically nothing, and can even fail with a timeout.
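A hedged sketch of the no-queue, IO-bound variant (the port number and CallWebServicesAsync are placeholders): every accepted client is handled by an async method, so no thread blocks while the web services are doing their work.

using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

sealed class TcpFrontEnd
{
    public async Task RunAsync()
    {
        var listener = new TcpListener(IPAddress.Any, 13000);   // placeholder port
        listener.Start();
        while (true)
        {
            TcpClient client = await listener.AcceptTcpClientAsync();
            var ignored = HandleClientAsync(client);            // handle each client concurrently
        }
    }

    async Task HandleClientAsync(TcpClient client)
    {
        using (client)
        using (NetworkStream stream = client.GetStream())
        {
            var buffer = new byte[4096];
            int read = await stream.ReadAsync(buffer, 0, buffer.Length);
            byte[] response = await CallWebServicesAsync(buffer, read);   // IO-bound work, no blocked thread
            await stream.WriteAsync(response, 0, response.Length);
        }
    }

    Task<byte[]> CallWebServicesAsync(byte[] request, int length)
    {
        return Task.FromResult(request);     // placeholder for the real web-service calls
    }
}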
I am implementing a component that reads all the messages off a specific queue as they become available, but should only remove messages from the queue asynchronously, after the message contents have been processed and persisted. We read messages off faster than we acknowledge them (e.g. we could have read 10 messages off before we are ready to ack the first). The current implementation uses the XMS API, but we can switch to MQI if XMS is inappropriate for these purposes.
We have tried two approaches to try solve this problem, but both have drawbacks that make them unacceptable. I was hoping that someone could suggest a better way.
The first implementation uses an IMessageConsumer in a dedicated thread to read all messages and publish their content as they are available. When the message has been processed, the message.Acknowledge() method is called. The Session is created with AcknowledgeMode.ClientAcknowledge. The problem with this approach is that, as per the documentation, this acknowledges (and deletes) ALL unacknowledged messages that have been received. With the example above, that would mean that all 10 read messages would be acked with the first call. So this does not really solve the problem. Because of the reading throughput we need, we cannot really modify this solution to wait for the first message's ack before reading the second, etc.
The second implementation uses an IQueueBrowser in a dedicated thread to read all messages and publish their content. This does not delete the messages off the queue as it reads. A separate dedicated thread then waits (on a BlockingQueue) for the JMS message IDs of messages that have been processed. For each of these, it then constructs a dedicated IMessageConsumer (using a message selector with JMSMessageID) to read off the message and ack it. (This pairing of an IQueueBrowser with a dedicated IMessageConsumer is recommended by the XMS documentation's section on queue browsers.) This method does work as expected but, as one would imagine, it is too CPU-intensive on the MQ server.
Both of the methods proposed in the question appear to rely on a single instance of the app. What's wrong with using multiple app instances, transacted sessions and COMMIT? The performance reports (these are the SupportPacs with names like MP**) all show that throughput is maximized with multiple app instances, and horizontal scaling is one of the most used approaches in your scenario.
The design for this would be either multiple application instances or multiple threads within the same application. The key to making it work correctly is to keep in mind that transactions are scoped to a connection handle. The implication is that a multi-threaded app must dispatch a separate thread for each connection instance and the messages are read in the same thread.
The process flow is that, using a transacted session, the app performs a normal MQGet against the queue, processes the message contents as required and then issues an MQCommit. (I'll use the MQ native API names in my examples because this isn't language dependent.) If this is an XA transaction the app would call MQBegin to start the transaction but for single-phase commit the transaction is assumed. In both cases, MQCommit ends the transaction which removes the messages from the queue. While messages are under syncpoint, no other app instance can retrieve them; MQ just delivers the next available message. If a transaction is rolled back, the next MQGet from any thread retrieves it, assuming FIFO delivery.
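A hedged sketch of that flow using the IBM MQ .NET classes, for the single-phase commit case (so no MQBegin); the queue manager and queue names are placeholders:

using System;
using IBM.WMQ;

var queueManager = new MQQueueManager("QM1");                       // placeholder queue manager
MQQueue queue = queueManager.AccessQueue("APP.REQUESTS",            // placeholder queue
    MQC.MQOO_INPUT_SHARED | MQC.MQOO_FAIL_IF_QUIESCING);

var gmo = new MQGetMessageOptions();
gmo.Options = MQC.MQGMO_SYNCPOINT | MQC.MQGMO_WAIT;                 // get under syncpoint
gmo.WaitInterval = 5000;

var message = new MQMessage();
queue.Get(message, gmo);                // MQGet: the message is now invisible to other instances

try
{
    string body = message.ReadString(message.MessageLength);
    Console.WriteLine(body);            // placeholder: process and persist the contents here
    queueManager.Commit();              // MQCommit: the message is removed from the queue
}
catch
{
    queueManager.Backout();             // roll back: the message becomes available to any instance again
    throw;
}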
There are some samples in:
[WMQ install home]\tools\dotnet\samples\cs\xms\simple\wmq\
...and SimpleXAConsumer.cs is one example that shows the XA version of this. The non-XA version is simpler since you don't need the external coordinator, the MQBegin verbs and so forth. If you start with one of these, make sure that they do not specify exclusive use of the queue and you can fire up several instances using the same configuration. Alternatively, take the portion of the sample that includes creation of the connection, message handling, connection close and destroy, and wrap all that in a thread spawner class.
[Insert usual advice about using the latest version of the classes here.]
Since it's a long question, cliff notes come first.
Cliff notes:
One client sends input to several services and they keep on working and sending results until the client tells them to stop or they have reached a pre-set maximum number of results.
Do you know how one should go about implementing this, or do you have a C# example of something like this? Are WCF and streaming the right toolset for this? (Consider that the results are custom objects, so it's not exactly the same as streaming a file.)
More Detailed Problem Definition:
Situation:
I have full control over the code of the client and the services (in other words, not dependent on closed third-party stuff)
everything is in C#
We have one client who wants to get one task done and has several equal independent services for that.
(equal = the same service software; the hardware each service runs on can vary, so service speeds can vary)
One task consists of "1000 pieces of work" which are all independent from one another.
Within one task all of the 1000 pieces of work are based upon the same piece of input data.
I mention solutions A+B since I think they help explain the problem:
Solution (A) - The slow non-parallel way:
1. Client sends input to one service.
2. Service initializes based upon the input.
3. Service processes all 1000 pieces of work
(results get added up (super fast, btw), so the result of 1000 pieces of work has the same size as the result of one)
4. Service sends result to the client.
5. Client receives result and is happy
Solution (B) - Parallel faster way:
Let's say ten services, so we split the work evenly and each should process 100.
The problem is that some services may be much faster than others, so giving each the same number (100) is slower than necessary.
Furthermore, we can't split the work up according to an a priori speed test, since the speed of one service can change and some might even go down during processing. These are the reasons why I think the following would be best for my purpose.
Solution (C) - The way I would like to implement it:
Client sends out the same request to all services. (The same request still implies that the task gets processed in parallel; parallelization is super easy for my problem because the 1000 pieces of work are so independent that doing the "first" piece of work 1000 times means we are done.)
A service keeps working and sending results until it is told to stop or has processed 1000 pieces of work.
One result gets sent for every 10 pieces of work done.
This means all services work parallel on the task and when the client has gotten a sum of 1000 results from all service replies combined it will send the stop signal.
That means normally no single service should reach 1000, but with having 1000 we have covered the situation where there is only one service and we have a fail-safe to avoid infinite loops if the stop signal gets lost. (client neither needs to wait nor to be absolutely sure that the stop signal has reached a service)
Throwing away additional results beyond our goal of 1000 is fine.
(The alternative of instead making follow-up requests to services that have responded faster than others would come with the overhead of wasted time due to messages going back and forth and additional initializations. (The additional initializations could be avoided, but it would be complicated and you would still have the other overhead.))
I basically have solutions/would know how to implement A+B but I have no clue how I would go about realizing (C).
How do you implement a client/service-architecture in C# where the service keeps sending results and doesn't just return one object/value? (Results are custom objects, btw)
Does someone know of C# example code where something like that is implemented? Would streaming be the right way?
I've found the "writing a custom stream" example, but it seems like it's a pretty long way from there to what I want. (As a WCF noob I can easily be wrong on that, though.)
Streaming in WCF doesn't work in such a way that you open a stream, return it to the client, and the service keeps writing results into it. If you want to work this way you must go deeper and use sockets directly. In WCF the stream must be written before it is returned from the operation (I tried writing to the returned stream from another thread, but it didn't work). Streaming in WCF is only for data transport.
I don't like any of your solutions. I would try one of the following:
A variant of B, but the tasks will not be divided equally up front. If you have 10 services and 1000 tasks, you send the first 10 tasks (one to each service), and only after a service returns its result does it get another task. If tasks can be completed within a reasonable time, all you need is multiple async calls to the services and to wait for the responses. If a service is not able to complete a task within a defined timeout, you send the task to another service. If tasks complete quickly, you can send small batches instead of single tasks; if task completion takes long, you will need duplex communication. (See the dispatch sketch after these options.)
Use a transactional message queue - MSMQ. Your client generates 1000 messages into a "producer queue" and the services take these messages one by one and process them. They send the results as messages to another "consumer queue", where the client picks them up and processes them (each result must be correlated with its task). The transactional queue ensures that each task can be processed by only a single service, but if a service fails or the transaction times out, the task becomes available for processing by another service. MSMQ also offers some additional features, such as a queue for faulty messages. This is a slightly more advanced scenario. The main problem with it can be the limit on message size (max. 4 MB per message).
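For the first option, a hedged sketch of the dispatch loop; the service calls are represented as placeholder delegates, and timeout/retry handling is omitted:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class Dispatcher
{
    // services: one async call per service, taking a task index and returning its result
    public static async Task<List<TResult>> RunAsync<TResult>(
        IReadOnlyList<Func<int, Task<TResult>>> services, int taskCount)
    {
        var results = new List<TResult>();
        var inFlight = new Dictionary<Task<TResult>, int>();   // running call -> index of its service
        int next = 0;

        // prime each service with one task
        for (int s = 0; s < services.Count && next < taskCount; s++, next++)
            inFlight.Add(services[s](next), s);

        while (inFlight.Count > 0)
        {
            Task<TResult> done = await Task.WhenAny(inFlight.Keys);
            int serviceIndex = inFlight[done];
            inFlight.Remove(done);
            results.Add(await done);                            // a faulted call would throw here

            if (next < taskCount)                               // hand the now-free service the next task
                inFlight.Add(services[serviceIndex](next++), serviceIndex);
        }
        return results;
    }
}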
Edit:
OK, given your clarification it looks like you need to send the same task to multiple services, and the task will just trigger a series of the same computations on the same data. You can achieve it this way:
Build a duplex service using the net.tcp binding.
The service implements a service contract which has operations to start the computation and to stop it (you can use the IsInitiating and IsTerminating properties of OperationContract).
The service does the computation in a separate thread started by the start operation.
The stop operation aborts the computation thread.
The client implements a callback contract to receive results from the service.
The service calls the client callback whenever the processing thread has a result (or multiple results) to send back.
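Putting the contracts from that list together, a hedged sketch (the names, and the shapes of InputData and ResultBatch, are placeholders):

using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract(SessionMode = SessionMode.Required, CallbackContract = typeof(IComputationCallback))]
public interface IComputationService
{
    [OperationContract(IsOneWay = true, IsInitiating = true)]
    void StartComputation(InputData input);       // service starts its computation thread here

    [OperationContract(IsOneWay = true, IsTerminating = true)]
    void StopComputation();                       // service stops/aborts the computation thread here
}

public interface IComputationCallback
{
    [OperationContract(IsOneWay = true)]
    void OnResults(ResultBatch batch);            // service pushes each batch of results to the client
}

[DataContract] public class InputData   { /* the shared input for all 1000 pieces of work */ }
[DataContract] public class ResultBatch { /* e.g. one result per 10 pieces of work */ }

On the service side, OperationContext.Current.GetCallbackChannel<IComputationCallback>() gives the channel used to push results back; the client connects through a DuplexChannelFactory over NetTcpBinding.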
Here is an example of using duplex services with WSDualHttpBinding - don't use that binding in your scenario, because it is much more complicated to have a single client communicate with multiple identical services over duplex HTTP.
What you describe as Solution (C) sounds like a good use for Asynchronous WCF. Some of these might help...
Synchronous and Asynchronous Operations
Asynchronous Programming Design Patterns
How to: Call WCF Service Operations Asynchronously
Are client stubs generated from WSDL by .NET WSE thread-safe?
Of course, "thread-safe" isn't necessary a rigorously defined term, so I'm at least interested in the following:
Are different instances of the same stub class accessible concurrently by different threads, with the same effective behavior as single-threaded execution?
Is a single instance of the same stub class accessible concurrently by different threads, with the same effective behavior as the same calls interleaved in some arbitrary way in single-threaded execution?
You may also wish to use the terminology described here (and originating here) to discuss this more precisely.
Well, the short answer to "is it thread-safe" is yes. The reason is that the server side of the service has more to say than the client connection as to threading capabilities. The client is just a proxy that lays out the request in a fashion that the server can understand; it knows nothing. It is a basic class with no outside access other than the connection to a server. So as long as the server allows multiple connections you would be fine, and there is no resource contention (except for the server being able to handle all your requests).
On the client side you can have multiple threads use the same class but different instances. This would probably be the preferred scenario, so that each transaction can be atomic. With a shared instance, you would have to handle your own locking around access to the class itself; otherwise you may run into a race condition on resources internal to your code.
There is also the ability to make an asynchronous call. The stubs generated by the wsdl tool include Begin/End invoke methods, so you can provide a callback method that effectively allows you to submit your request and continue your code without waiting for a reply. This would probably be best for your second scenario with the single instance.
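For example, a hedged sketch using hypothetical names (MyServiceStub and GetQuote are placeholders; the generated proxy exposes a BeginXxx/EndXxx pair for each of its own operations):

var stub = new MyServiceStub();
stub.BeginGetQuote("MSFT", asyncResult =>
{
    string quote = stub.EndGetQuote(asyncResult);   // runs when the reply arrives
    Console.WriteLine(quote);
}, null);
// the calling thread continues without waiting for the reply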
However, it also depends on how the server component is coded. If it's a web service you should be able to submit multiple requests simultaneously. If it's a socket-based service, you may need to do some additional coding on your end, for example to handle multiple incoming connections or to create the sockets yourself.
So, in short: yes, different instances behave the same as single-threaded execution, within the limits of the server side being able to handle multiple concurrent connections.
As for a single instance, if you use the callback mechanism that is provided, you may be able to get what you are after without too much headache. However, it is also restricted by the limits of the server-side code.
The reason I mention the server limits is that some companies build web services that restrict the number of connections coming from a given host, so your throughput is limited by that. Thus the number of effective threads you could use would be reduced or made irrelevant.