I'm trying to write a producer/consumer system using Redis in C#. Each message produced must be consumed by only one consumer, and I want the consumers to wait for elements created by the producer. My system must support many producer/consumer sets.
I am using StackExchange.Redis to communicate with Redis, and using lists where elements are added using ListLeftPush and removed with ListRightPop. What I am experiencing is that while the ListRightPop method should block until an element exists in the list (or until a defined timeout expires), it always returns immediately if there are no elements in the list. This is the test code I wrote to check this:
IDatabase cache = connection.GetDatabase();
Trace.TraceInformation("waiting "+DateTime.Now);
var res = cache.ListRightPop("test");
Trace.TraceInformation("Got "+res+", Ended" + DateTime.Now);
And I'm getting a nil result after less than 1 second.
The standard pop operations do not block: they return nil if the list is empty or does not exist.
SE.Redis is a multiplexer: many callers share a single connection, so a blocking pop would stall that connection for everyone else using it. That makes a blocking pop a very bad idea. This is explained more, with workarounds discussed specifically for blocking pops, in the documentation: https://stackexchange.github.io/StackExchange.Redis/PipelinesMultiplexers
StackExchange.Redis is merely hitting the redis server's exposed API, the relevant method of which is BRPOP in your case. The documentation for that is:
http://redis.io/commands/blpop - blocking left pop
http://redis.io/commands/brpop - blocking right pop
While those methods do describe the blocking behavior you are looking for, I believe SE.Redis ListRightPop is calling
http://redis.io/commands/rpop - right pop
I may not be up to the latest SE.Redis package, but intellisense is not giving me an option to supply a timeout like you claim. Additionally, there do not appear to be any methods starting with .List in the IDatabase interface that have the word "block" in their name, so I'm not sure SE.Redis exposes a Redis BRPOP API. You can either write your own or ask Marc Gravell nicely, but this is a pretty big request I think because of the blocking nature of the call and the way the multiplexer works.
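The workaround described on the documentation page linked above is to pair the list with a pub/sub notification, so that consumers wake up when work arrives instead of blocking the shared connection. Below is a minimal sketch of that idea, assuming the StackExchange.Redis package; the class name RedisWorkQueue and the ":notify" channel suffix are made up for illustration, and error handling is left out.

```csharp
using System;
using StackExchange.Redis;

// Sketch only: wakes consumers via pub/sub and then uses the ordinary,
// non-blocking RPOP, so the multiplexed connection is never held up.
public sealed class RedisWorkQueue
{
    private readonly IDatabase _db;
    private readonly ISubscriber _sub;
    private readonly RedisKey _listKey;
    private readonly RedisChannel _channel;

    public RedisWorkQueue(IConnectionMultiplexer connection, string name)
    {
        _db = connection.GetDatabase();
        _sub = connection.GetSubscriber();
        _listKey = name;
        // Hypothetical naming scheme for the notification channel.
        _channel = new RedisChannel(name + ":notify", RedisChannel.PatternMode.Literal);
    }

    // Producer side: push the item, then tell consumers there is work.
    public void Produce(RedisValue item)
    {
        _db.ListLeftPush(_listKey, item);
        _sub.Publish(_channel, "");
    }

    // Consumer side: drain on every notification, and once at startup
    // to pick up items that were pushed before we subscribed.
    public void StartConsumer(Action<RedisValue> handle)
    {
        _sub.Subscribe(_channel, (channel, message) => Drain(handle));
        Drain(handle);
    }

    private void Drain(Action<RedisValue> handle)
    {
        RedisValue item;
        // RPOP is atomic, so each item is handed to exactly one consumer.
        while (!(item = _db.ListRightPop(_listKey)).IsNull)
        {
            handle(item);
        }
    }
}
```

Every consumer receives the notification, but only the one whose RPOP wins gets the item; the others see an empty list and go back to waiting.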
I have an endpoint which returns a response containing hotels and a flag which shows whether more results are available. The client needs to call this endpoint repeatedly until the server returns the more-results flag as false. What is the best way to implement this? Could anyone help me with this?
First Option: Avoid It If Possible
Please try to avoid calls on HTTP APIs so as to avoid network latency.
This is very important if you want to make multiple calls from a client which is supposed to be responsive.
For example, if you are developing a web application or WPF application and you want the user to click on something which triggers 10-20 calls to the API, the operation may not complete quickly, which may result in a poor user experience.
If it is a background job, then multiple calls would probably make more sense.
Second Option: Optimize HTTP Calls From Client
If you still want to make multiple calls over HTTP, then you will have to somehow optimize the code in such a way that at least you avoid the network latency.
For avoiding network latency, you can bring all the data or major chunk of the data in one call on the client side. Then client can iterate over this set of data.
Even if you cut the number of calls in half, you buy much more time for client processing.
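The loop from the question can be sketched as follows. The names HotelPage and HotelPager, the zero-based page index, and the maxPages cap are assumptions made for illustration; the question does not say how the endpoint identifies the next batch. The fetchPage delegate stands in for the actual HTTP call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape of one response from the endpoint.
public sealed class HotelPage
{
    public IReadOnlyList<string> Hotels { get; set; } = Array.Empty<string>();
    public bool MoreResults { get; set; }
}

public static class HotelPager
{
    // Calls fetchPage with page 0, 1, 2, ... until the server reports that
    // no more results are available, or until maxPages is reached.
    public static async Task<List<string>> FetchAllAsync(
        Func<int, CancellationToken, Task<HotelPage>> fetchPage,
        int maxPages = 100,
        CancellationToken cancellationToken = default)
    {
        var all = new List<string>();
        for (int page = 0; page < maxPages; page++)
        {
            HotelPage result = await fetchPage(page, cancellationToken);
            all.AddRange(result.Hotels);
            if (!result.MoreResults)
            {
                break; // server says this was the last batch
            }
        }
        return all;
    }
}
```

A plain loop avoids the stack growth of a recursive call, and the cap guards against a server that never clears the flag. Asking for larger batches per call reduces the number of round trips, which is the point of the second option above.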
Another Option
You can also try to think if this can be a disconnected operation - client sending just one notification to server and then server performing all iterations.
Client can read status somewhere from database to know if this operation is complete.
That way your client UI would still stay responsive and you will be able to offload all heavy processing to the server.
You will have to think about which of these options suits the high-level design of your product/project.
Hope I have given enough food for thought (although this may not solve your issue directly).
Consider a web application that implemented every database action except querying (i.e. add, update, remove) as a NServiceBus message, so that whenever a user calls a web API, in the back-end it will be mapped to await endpointInstance.Request method to return the response in the same HTTP request connection.
The challenge is when a message handler needs to send some other messages and wait for their responses to finish its job. NServiceBus does not allow calling Request inside a message handler.
I ended up using a Saga to implement message handlers that rely on responses from other message handlers. But the problem with a Saga is that I can't send back the result in the same HTTP request, because a Saga uses the publish/subscribe pattern.
All our web APIs must respond within the same HTTP request (the connection should be kept open until the result is received or a timeout exception occurs).
Is there any clean solution (preferably without using Saga)?
An example scenario:
user calls http://test.com/purchase?itemId=5&paymentId=133
web server calls await endpointInstance.Request<PurchaseResult>(new PurchaseMessage(itemId, paymentId));
PurchaseMessage handler should call await endpointInstance.Request<AddPaymentResult>(new AddPaymentMessage(paymentId));
if the AddPaymentResult was successful, store the purchase details in the database and return true as PurchaseResult, otherwise return false
You're trying to achieve something that we (at Particular Software) are trying to actively prevent. Let me explain.
With Remote Procedure Calls (RPC) you call another component out-of-process. That is what makes the procedure call 'remote'. Whereas with regular programming you do everything in-process and it is blazing fast, with RPC you have the overhead of serialization, latency and more. Basically, you have to deal with the fallacies of distributed computing.
Still, people do it for various reasons. Sometimes because you want to use a WebAPI (or 'old fashioned' web service) because it offers the functionality you don't want to develop. Oldest example in the book is searching for an address by postal code. Or deducting money from someone's bank account. If you're building a CRM, you can use these remote components. These days a lot of people build distributed monoliths because they are taught at conferences that this is a good thing. In an architecture diagram, it looks really nice, but there's still temporal coupling that can provide a lot of headaches.
Some of these headaches come from the fact that you're trying to do stuff in an atomic action. Back in the day, with in-process calling of code/classes/etc., this was easy and fast. Until you hit limitations, like tons of locks on a database.
A solution to this is asynchronous communication. You send some information via fire-and-forget. This solves temporal coupling. Instead of having a database that is getting dozens and dozens of requests to update data, etc., and as a result your website grinding to a halt, you have various options to make sure this doesn't happen. This is a really good thing, because instead of a single atomic operation, you have various smaller operations and many ways to distribute work, scale your system, etc.
It also brings additional challenges, because not everyone is able to work with fire-and-forget. Some systems that were already built try to introduce asynchronous communication via messaging (and hopefully NServiceBus). Some parts can work flawlessly with this. But other parts can't, mainly the user interface (UI), because it was built to get an immediate result. So when you send a message from the UI, you expect a result!
With NServiceBus we've built a package called "Client-Side Callbacks" to make exactly this a possibility. We highly recommend our customers not to use it, except for this specific scenario that I just described. It is much better to migrate your entire UI to be able to deal with the fact that you don't receive an immediate answer, but we understand this is so much work, that not many will be able to achieve this.
However once that first message was sent and the UI received a result, there is no need to use callbacks anymore. As a result I'd like to propose this scenario:
user calls http://test.com/purchase?itemId=5&paymentId=133
web server calls await endpointInstance.Request<PurchaseResult>(new PurchaseMessage(itemId, paymentId));
PurchaseMessage handler retrieves info it needs and sends or publishes a message to (an)other component(s) and then replies back to the web server with an answer.
The next handler works with the send/published message and continues the process
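The proposed flow could look roughly like the sketch below. It assumes the NServiceBus package with routing for AddPaymentMessage configured elsewhere; the message property names (ItemId, PaymentId, Accepted) are hypothetical and differ from the constructor form used in the question. Treat it as an illustration of the shape, not as a reference implementation.

```csharp
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical message contracts for the purchase scenario.
public class PurchaseMessage : ICommand
{
    public int ItemId { get; set; }
    public int PaymentId { get; set; }
}

public class AddPaymentMessage : ICommand
{
    public int PaymentId { get; set; }
}

public class PurchaseResult : IMessage
{
    // True means the purchase request was accepted for processing,
    // not that the payment has already completed.
    public bool Accepted { get; set; }
}

public class PurchaseHandler : IHandleMessages<PurchaseMessage>
{
    public async Task Handle(PurchaseMessage message, IMessageHandlerContext context)
    {
        // Hand the payment step to another handler without waiting for it.
        await context.Send(new AddPaymentMessage { PaymentId = message.PaymentId });

        // Reply right away so the pending Request<PurchaseResult> on the
        // web server completes within the same HTTP request.
        await context.Reply(new PurchaseResult { Accepted = true });
    }
}
```

The trade-off is that the reply can only confirm that the request was accepted; the outcome of the payment has to reach the user some other way once the later handlers finish.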
Let us know if you need more information. You can always contact us by sending an email to support@particular.net
Short Question:
Has anyone else encountered an issue in using a singleton .NET HttpClient where the application pegs the processor at 100% until it's restarted?
Details:
I'm running a Windows Service that does continuous, schedule-based ETL. One of the data-syncing threads occasionally either just dies, or starts running out of control and pegs the processor at 100%.
I was lucky enough to see this happening live before someone simply restarted the service (the standard fix), and was able to grab a dump-file.
Loading this in WinDbg (w/ SOS and SOSEX), I found that I have about 15 threads (sub-tasks of the main processing thread) all running with identical stack-traces. However, there don't appear to be any deadlocks, i.e. the high-utilization threads are running, but never finishing.
The relevant stack-trace segment follows (addresses omitted):
System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)
System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].TryGetValue(System.__Canon, System.__Canon ByRef)
System.Net.Http.Headers.HttpHeaders.ContainsParsedValue(System.String, System.Object)
System.Net.Http.Headers.HttpGeneralHeaders.get_TransferEncodingChunked()
System.Net.Http.Headers.HttpGeneralHeaders.AddSpecialsFrom(System.Net.Http.Headers.HttpGeneralHeaders)
System.Net.Http.Headers.HttpRequestHeaders.AddHeaders(System.Net.Http.Headers.HttpHeaders)
System.Net.Http.HttpClient.SendAsync(System.Net.Http.HttpRequestMessage, System.Net.Http.HttpCompletionOption, System.Threading.CancellationToken)
...
[Our Application Code]
According to this article (and others I've found), the use of dictionaries is not thread-safe, and infinite loops are possible (as are straight-up crashes) if you access a dictionary in a multi-threaded manner.
BUT our application code is not using a dictionary explicitly. So where is the dictionary mentioned in the stack-trace?
Following through via .NET Reflector, it appears that the HttpClient uses a dictionary to store any values that have been configured in the "DefaultRequestHeaders" property. Any request that gets sent through the HttpClient, therefore, triggers an enumeration of a singleton, non-thread-safe dictionary (in order to add the default headers to the request), which could potentially spin infinitely (or kill) the threads involved if a corruption occurs.
Microsoft has stated bluntly that the HttpClient class is thread-safe. But it seems to me like this is no longer true if any headers have been added to the DefaultRequestHeaders of the HttpClient.
My analysis seems to indicate that this is the real root problem, and an easy workaround is to simply never use the DefaultRequestHeaders where the HttpClient could be used in a multi-threaded manner.
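As an illustration of that workaround, the Accept header can be attached to each request instead of to the shared client. This is a minimal sketch; the helper name JsonRequests is made up for the example.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class JsonRequests
{
    // Builds a GET request that carries its own Accept header, so nothing
    // has to be stored in the shared client's DefaultRequestHeaders.
    public static HttpRequestMessage CreateGet(Uri uri)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, uri);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }
}
```

The request would then be sent with client.SendAsync(request) on the shared HttpClient. Each HttpRequestMessage is created and used by a single caller, so its header collection is never touched by more than one thread.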
However, I'm looking for some confirmation that I'm not barking up the wrong tree. If this is correct, it seems like a bug in the .NET framework, which I automatically tend to doubt.
Sorry for the wordy question, but thanks for any input you may have.
Thanks for all the comments; they got me thinking along different lines, and helped me find the ultimate root cause of the issue.
Although the issue was a result of corruption in the backing dictionary of the DefaultRequestHeaders, the real culprit was the initialization code for the HttpClient object:
private HttpClient InitializeClient()
{
    if (_client == null)
    {
        _client = GetHttpClient();
        _client.DefaultRequestHeaders.Accept.Clear();
        _client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        SetBaseAddress(BaseAddress);
    }
    return _client;
}
I said that the HttpClient was a singleton, which is partially incorrect. It's created as a single-instance that is shared amongst multiple threads doing a unit of work, and is disposed when the work is complete. A new instance will be spun up the next time this particular task must be done.
The "InitializeClient" method above is called every time a request is to be sent, and should just short-circuit due to the "_client" field not being null after the first run-through.
(Note that this isn't being done in the object's constructor because it's an abstract class, and "GetHttpClient" is an abstract method -- BTW: don't ever call an abstract method in a base-class's constructor... that causes other nightmares)
Of course, it's fairly obvious that this isn't thread-safe, and the resultant behavior is non-deterministic.
The fix is to put this code behind a double-checked "lock" statement (although I will be eliminating the use of the "DefaultRequestHeaders" property anyways, just because).
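A minimal sketch of what a double-checked lock could look like here. The class name ApiClientBase is hypothetical, and the SetBaseAddress call from the snippet above is left out because its implementation isn't shown. Note that the client is fully configured before it is assigned to the field, so no other thread can observe a half-initialized instance.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public abstract class ApiClientBase
{
    private readonly object _initLock = new object();
    private volatile HttpClient _client;

    protected abstract HttpClient GetHttpClient();

    protected HttpClient InitializeClient()
    {
        // First check without the lock: the common case after initialization.
        HttpClient client = _client;
        if (client == null)
        {
            lock (_initLock)
            {
                // Second check: another thread may have finished initializing
                // while this one was waiting for the lock.
                client = _client;
                if (client == null)
                {
                    client = GetHttpClient();
                    client.DefaultRequestHeaders.Accept.Clear();
                    client.DefaultRequestHeaders.Accept.Add(
                        new MediaTypeWithQualityHeaderValue("application/json"));

                    // Publish only after configuration is complete.
                    _client = client;
                }
            }
        }
        return client;
    }
}
```

In the original method the field was assigned first and configured afterwards, so a second thread could pass the null check and start sending requests while the first thread was still mutating the default headers.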
In a nutshell, my original question shouldn't ever be an issue if you're careful in how you initialize the HttpClient.
Thanks for the clarity of thought that you all provided!
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
The basic concept I thought would be best is that once the data is received and confirmed as the entire message, the message should then be passed off to some sort of collection to await processing on a FIFO basis, which will parse the values and insert them into SQL Server. I suppose this is what's known as the consumer/producer pattern.
I have been looking into the best collection / way of doing this and have so far seen the BlockingCollection, ConcurrentCollection and BufferBlock using async/await, and I think this may be the way to go, but to be honest I'm not sure.
The best example I have found is on Stephen Cleary's blog, in particular this article:
http://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
My main reservation is that I in no way want to slow down or interrupt the receiving of messages, which to me would suggest using the multiple producer/consumer example which can be seen at the above link, but what I want to know is:
Am I correct in this assumption, or is there a more suitable way of doing this in my scenario?
And if I'm correct in my assumption, could anyone suggest the best way of implementing this, taking into consideration my use case?
Any and all help is much appreciated.
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
There's a common pitfall with this kind of scenario. It is usually wrong to report success back to the client when the work has yet to be done. Most of the time I've seen this design, it's because of an efficiency "requirement" self-imposed by the developer, not by the client or for technical reasons. So first, take a step back and make absolutely sure that you do want to return a "successful completion" message to the client when the operation has not actually completed yet.
If you are sure that's what you want to do, then there's another question you must ask: is it acceptable to lose requests? That is, after you tell the client that the operation successfully completed, will the system still be stable if the operation does not actually ever complete?
The answer to that question is usually "no." At that point, the most common architectural solution is to have an out-of-process reliable queue (such as an Azure queue or MSMQ), with an independent backend (such as an Azure worker role or Win32 service) that processes the queue messages. This definitely complicates the architecture, but it is a necessary complication if the system must return completion messages early and must not lose messages.
On the other hand, if losing messages is acceptable, then you can keep them in-memory. It is only in this case that you can use one of the in-memory producer/consumer types mentioned on my blog. This is a very rare situation, but it does happen from time to time.
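For that rare case, a single-consumer in-memory queue built on BufferBlock could look like the sketch below. It is written for illustration and is not taken from the blog post; the class name InMemoryMessageQueue is made up, and on some targets BufferBlock requires the System.Threading.Tasks.Dataflow package. Anything still in the buffer is lost if the process stops.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public sealed class InMemoryMessageQueue
{
    // Unbounded by default, so Post returns immediately and the
    // receive path is never held up by slow processing.
    private readonly BufferBlock<string> _buffer = new BufferBlock<string>();

    // Called by the producer once a complete message has been received.
    public bool Enqueue(string message) => _buffer.Post(message);

    // Signals that no more messages will be added.
    public void Complete() => _buffer.Complete();

    // Single consumer: handles messages in FIFO order and returns once
    // Complete() has been called and the buffer has been drained.
    public async Task ConsumeAsync(Func<string, Task> process)
    {
        while (await _buffer.OutputAvailableAsync())
        {
            string message = await _buffer.ReceiveAsync();
            await process(message);
        }
    }
}
```

With a single consumer the OutputAvailableAsync/ReceiveAsync pair is safe; with several consumers on the same buffer, ReceiveAsync could fail after another consumer takes the last item, so that case needs TryReceive or exception handling.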
In general, I would avoid using BlockingCollection and friends for this sort of work. Doing so encourages you to architect the entire system into a single process, which is the enemy of scalability and reliability.
I second Stephen Cleary's suggestion of using an out-of-process queue to manage the work. I disagree that this necessarily complicates the architecture, though - in fact, I think it can make things quite a bit simpler. Specifically, a major complication of the original requirement ("put together an asynchronous tcp server") disappears. Asynchronous TCP servers are a pain in the butt to write and easy to screw up - why not just skip that part altogether and be free to focus all of your energy on the post-processing code?
When I built a system like this, I used a Redis List as the task queue. Tasks were serialized to JSON, and clients would add their task to the queue with an RPUSH command. Worker processes retrieve the next task from the queue with BLPOP, do their thing, then go back to waiting for the next task.
Advantages:
No locks. All synchronization comes for free from Redis (or whatever task queue you choose).
Everything in the system is single-threaded. Multi-threading is hard.
I'm free to spin up as many worker processes as I want, across as many nodes as I want.
I'm looking for some feedback in regards to the best option for a problem I am working on.
To give you some background, I recently inherited a broken business application (our project was using it, so we gained responsibility for fixing it). I come from a SharePoint development background, so I know a little C#, ASP.NET and SQL.
Currently we have an issue with the application where we continually receive timeout errors. I have narrowed it down to the web application calling a bunch of stored procedures to update status fields in other tables when something changes that might affect the status of other objects.
Without completely overhauling this application I have determined our best option is to offload these stored procedures to run in the background and not be tied to the UI. I've looked at a couple of options including:
Creating a separate thread to handle the execution. (Still times out)
Using BackgroundWorker (still times out, obviously it shouldn't but I can't seem to find out what is causing it to wait for the BackgroundWorker to finish)
Moving the Stored Proc execution to a job, which I then call from another SP. (This works, but the limitation is that I can only have one job running at once, and if multiple users update objects they then receive an exception because the job won't start)
Right now we have moved these stored procedures into a twice a day script, which updates all objects, however this is only a temporary fix.
I have two options that I'm looking at, and I'm hoping to get some guidance on the implementation of whatever you consider to be the best option:
Continue using the job and have the executing stored proc queue up items in a db which the job will loop through until empty. The executing stored proc will have to check if the job is running when it adds a new entry and then act accordingly.
It's been recommended that I look at using the Service Broker, but I am not familiar with its use at all. I understand that it would likely be a better overall solution, as it allows me to queue up these updates in a more transactional way.
I think both these options are viable, although I need some help in understanding the implementation of the second option. My other dilemma is that, with these stored procedures running anywhere from 45s to 20m, how can I notify the user who changed the object that his/her updates have been made? This is where I fall back to using the job, because I could simply add a user field to the 'queue' and have the stored proc send a quick email at the end.
Thoughts, suggestions? Maybe I'm over-thinking this?
If you are on .NET 4.5 and C# 5.0, use async/await; if you are on .NET 4.0, use the TPL. The two share (almost) the same underpinnings: the async feature is built upon the TPL, with some extra internals.
In any case TPL would be a proper choice.
Sounds like Service Broker would be an excellent solution to this problem. It's true that there is a bit of a learning curve to climb to get your head round how it works, but it's fundamentally pretty simple especially when your implementation is in a single database.
There's a good (and mercifully short) intro to how it works at http://msdn.microsoft.com/en-US/library/ms345108(v=SQL.90).aspx
Have a look at Asynchronous Procedure Execution. But I would first check whether the updates can be improved: perhaps a simple index can eliminate the timeouts, and/or you could try to leverage snapshot isolation. These would be much simpler to try out without committing to a 'major overhaul' of the application code.
I must also urge you to read Waits and Queues. This is a SQL Server methodology for identifying performance bottlenecks. It is a great way of narrowing down the problem of 'timeouts' to something more actionable (blocking, IO, indexes, etc.).