Is Azure EventHubClient thread safe? [duplicate] - c#

I'm writing code that will be publishing messages from multiple threads to an Azure Event Hub in C# using the EventHubClient. The documentation for EventHubClient contains the fairly standard boilerplate:
"Any public static (Shared in Visual Basic) members of this type are
thread safe. Any instance members are not guaranteed to be thread
safe."
There is no additional documentation on thread safety for any of the four send
methods, which are the ones I would most expect to be thread safe. If I assume the send methods are not thread safe, then I would end up creating a new EventHubClient instance each time I wished to send a message. Since the underlying TCP connection is apparently reused unless steps are taken to prevent it, this may not have too much overhead. Similar issues arise with partitioned senders, though given that there is an async method to create one, they may well have their own AMQP connection.
Are some, if not all, instance methods of EventHubClient thread safe despite the documentation?
And for any Azure folks: would it be possible to have this clarified in the documentation? This sort of documentation issue (assuming the boilerplate is wrong, as seems likely) appears to affect Azure Table storage as well and is generally common within the MSDN docs. For Event Hubs this is in contrast to Kafka's clear thread-safety statement, and AWS Kinesis at least does not explicitly label everything as unsafe. I did not find Event Hubs in the open source portion of the SDK, so I could not check myself.

TLDR:
All critical runtime operations (aka data-plane) in the .NET SDK are thread-safe.
Create the EventHubClient object once and re-use it.
The Story
The ServiceBus SDK exposes two patterns for creating senders:
Basic
Advanced
For the Basic version, the developer uses the EventHubClient.CreateFromConnectionString() API directly and doesn't worry about managing MessagingFactory objects (which own the underlying connections). The SDK handles reusing the MessagingFactory across all EventHubClient instances as long as the connection string is the same; a literal match of all keys and values is what the SDK checks for this reuse.
For an Advanced developer who needs a bit more control at the connection level, the SB SDK provides MessagingFactory.CreateFromConnectionString(), and from this the developer can create the EventHubClient instance.
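A rough sketch of the two patterns, assuming the older Microsoft.ServiceBus.Messaging API from the WindowsAzure.ServiceBus package (the connection string and hub name below are placeholders):

using System.Text;
using Microsoft.ServiceBus.Messaging;

// Placeholders - substitute your own namespace connection string and hub name.
var connectionString = "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=<key>";
var eventHubName = "myhub";

// Basic: the SDK reuses the underlying MessagingFactory for identical connection strings.
var basicClient = EventHubClient.CreateFromConnectionString(connectionString, eventHubName);
basicClient.Send(new EventData(Encoding.UTF8.GetBytes("hello")));

// Advanced: manage the MessagingFactory (and therefore the connection) yourself.
var factory = MessagingFactory.CreateFromConnectionString(connectionString);
var advancedClient = factory.CreateEventHubClient(eventHubName);
advancedClient.Send(new EventData(Encoding.UTF8.GetBytes("hello again")));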
All instance methods of EventHubClient used to send to Event Hubs are strictly thread-safe. In general, all data-plane operations are.
However, when reading from Event Hubs, the API is optimized for this pattern:
while (true)
{
    var events = eventHubPartitionReceiver.Receive(100);
    ProcessMyEvents(events);
}
So, for example, a property like EventHubReceiver.RuntimeInformation is populated after every receive call without any synchronization. So even though the actual receive API is thread-safe, the subsequent access to RuntimeInformation isn't, as it is rare for anyone to park multiple receive calls on a single instance of PartitionReceiver.
Creating a new instance of EventHubClient in each component that needs to send messages is the default pattern, and the ServiceBus SDK will take care of reusing the underlying MessagingFactory, which reuses the same physical socket (if the connection string is the same).
If you are looking at really high-throughput scenarios, then you should design a strategy to create multiple MessagingFactory objects and create an EventHubClient from each. However, make sure that you have already increased the throughput units for your Event Hub in the portal before trying this, as the default is just 1 MB/s cumulative across all 16 partitions.
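A minimal sketch of that strategy, reusing the placeholder connectionString and eventHubName from the earlier snippet (the factory count and round-robin scheme are illustrative only):

using System.Collections.Generic;
using System.Text;
using Microsoft.ServiceBus.Messaging;

// Several independent MessagingFactory objects, each with its own EventHubClient,
// so sends are spread across several physical connections.
var clients = new List<EventHubClient>();
for (int i = 0; i < 4; i++)
{
    var factory = MessagingFactory.CreateFromConnectionString(connectionString);
    clients.Add(factory.CreateEventHubClient(eventHubName));
}

// Round-robin over the clients from your sending code.
int messageCounter = 0;
byte[] payloadBytes = Encoding.UTF8.GetBytes("payload");
var client = clients[messageCounter++ % clients.Count];
client.Send(new EventData(payloadBytes));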
Also, if the send pattern you are using is partitioned senders, they will all use the same underlying MessagingFactory as long as you create all the senders from the same EventHubClient instance (via CreatePartitionedSender()).

Related

Azure Function input and output binding objects: shared, or created on each request?

I am using Azure Functions: a Timer trigger and a Cosmos DB change feed trigger.
Timer Trigger Function
I have an output binding for Cosmos DB in the Timer trigger function. I am creating a connection with the Service Bus client, reading messages in a batch, and uploading them to Cosmos DB as shown below. My question: will a new Cosmos DB connection be created on every run? If yes, how can I share the connection? How can I improve the Service Bus connection as well, so that a new connection is not created on every run? Am I doing this the right way in terms of performance?
Cosmos DB feed trigger
In this function I have Cosmos DB as the trigger and an output binding to Service Bus.
Will a new connection be created for every request, or will the function reuse existing objects for both the Cosmos DB and Service Bus connections?
AFAIK, there are three recommended ways to share expensive data between functions on a server to improve performance:
Use static client variables: static variables are reused for every function invocation instead of a new one being created, which saves memory and gives performance benefits. When the load is low, only one server instance is created for your functions in the background, so static variables are reused across multiple function invocations within that server instance. If more than one server instance is created, every instance will have its own static variable, which is reused by the invocations handled within that instance. This is still much better than creating a new connection for every invocation (a minimal sketch follows below).
Check out this detailed blog for performance load-testing proof of this as well.
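A minimal sketch of the static-client pattern for the scenario above, assuming the in-process model with the Azure.Messaging.ServiceBus and Microsoft.Azure.Cosmos packages; the app setting names, queue/database/container names, and the MyDocument type are placeholders, not anything from the original question:

using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class TimerToCosmosFunction
{
    // Created once per host instance and reused by every invocation on that instance.
    private static readonly ServiceBusClient serviceBusClient =
        new ServiceBusClient(Environment.GetEnvironmentVariable("ServiceBusConnection"));

    private static readonly CosmosClient cosmosClient =
        new CosmosClient(Environment.GetEnvironmentVariable("CosmosConnection"));

    [FunctionName("TimerToCosmos")]
    public static async Task Run([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger log)
    {
        // Receivers and containers are lightweight to create from the shared clients.
        ServiceBusReceiver receiver = serviceBusClient.CreateReceiver("myqueue");
        Container container = cosmosClient.GetContainer("mydb", "mycontainer");

        foreach (ServiceBusReceivedMessage message in await receiver.ReceiveMessagesAsync(maxMessages: 32))
        {
            // MyDocument is a hypothetical POCO with an "id" property for Cosmos DB.
            await container.CreateItemAsync(message.Body.ToObjectFromJson<MyDocument>());
            await receiver.CompleteMessageAsync(message);
        }
    }
}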
Use MemoryCache: This would allow you to share a cache between functions. For example:
// Shared across invocations on the same instance; its contents are lost on restart or redeploy.
static MemoryCache memoryCache = MemoryCache.Default;

public static async Task<object> Run(HttpRequestMessage req, TraceWriter log)
{
    // Read the current count from the cache (null on the first invocation).
    var cacheObject = memoryCache["cachedCount"];
    var cachedCount = (cacheObject == null) ? 0 : (int)cacheObject;

    // Increment it and store it back with an absolute expiry five minutes from now.
    memoryCache.Set("cachedCount", ++cachedCount, DateTimeOffset.Now.AddMinutes(5));

    log.Info($"Webhook triggered memory count {cachedCount}");
    return ...
}
Here the code tries to find the count in the cache, increments it, and saves it with a five-minute expiry. If we copy this same code to two functions within the same Azure Function App, then sure enough each can see the count set by the other. Note that this cache will lose its contents every time you edit your code.
Check out this blog for more details around this.
Use Dependency Injection: use DI to create singleton instances and use them. Check out "Use dependency injection in .NET Azure Functions". ServiceBusClient can be registered for dependency injection with the ServiceBusClientBuilderExtensions (a minimal sketch follows below).
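A minimal sketch of that registration for the in-process model, assuming the Microsoft.Azure.Functions.Extensions and Microsoft.Extensions.Azure packages; the namespace and the "ServiceBusConnection" setting name are placeholders:

using System;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.Azure;
using Microsoft.Extensions.DependencyInjection;

[assembly: FunctionsStartup(typeof(MyApp.Startup))]

namespace MyApp
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            // Registers a singleton ServiceBusClient via ServiceBusClientBuilderExtensions.
            builder.Services.AddAzureClients(clients =>
                clients.AddServiceBusClient(
                    Environment.GetEnvironmentVariable("ServiceBusConnection")));
        }
    }
}

Function classes can then take the ServiceBusClient as a constructor parameter and reuse the same instance across invocations.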
Note: static clients that implement the IDisposable interface will be disposed automatically by the .NET Core runtime. Don't dispose them manually, otherwise they won't be reused. Further, ensure that the static clients are thread safe. As the MS doc says:
Establishing a connection is an expensive operation that you can avoid
by reusing the same factory and client objects for multiple
operations. You can safely use these client objects for concurrent
asynchronous operations and from multiple threads.
It is safe to instantiate ServiceBusClient once and share it.
Additional links:
https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections#static-clients
Let me know if you have any follow-up questions.
Another alternative would be to have a Service Bus trigger and add a document as each message arrives, instead of running a timer and handling a batch. This approach has the following benefits (a rough sketch follows the list):
No need to worry about the broker connection as it's handled by Functions.
Less/simpler code. For example, you can receive the message deserialized into a POCO w/o going through manual deserialization.
Alignment with an event-driven approach and not a time-based batch.
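A hedged sketch of that shape using the in-process model; the queue, database, container, connection setting names, and the OrderMessage POCO are placeholders, and the Cosmos DB attribute parameter may be ConnectionStringSetting rather than Connection depending on the extension version:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public class OrderMessage
{
    public string id { get; set; }
    public string Payload { get; set; }
}

public static class SbToCosmos
{
    [FunctionName("SbToCosmos")]
    public static void Run(
        [ServiceBusTrigger("orders", Connection = "ServiceBusConnection")] OrderMessage message,
        [CosmosDB("mydb", "mycontainer", Connection = "CosmosConnection")] out OrderMessage document,
        ILogger log)
    {
        // The trigger has already deserialized the JSON message body into the POCO;
        // assigning the output binding inserts the document into Cosmos DB.
        document = message;
        log.LogInformation($"Stored order {message.id}");
    }
}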

.NET Simple chat server example

I was looking for a simple step-by-step communication tutorial for .NET programmers. After some Google queries, I found the "CSharp Communications" code collection at net-informations.com. It looked quite good until I reached the "How to C# Chat Server" example.
The authors propose a multithreaded server with a Hashtable container to keep all connections in shared memory on the server side. According to the MSDN documentation, the TcpClient and NetworkStream classes used to broadcast messages are not thread safe, yet the example uses them from multiple server threads.
My questions are:
Could you confirm that the example is wrong?
How should it be done? Is it enough to lock the broadcast method (mark it as a critical section)?
Could you recommend some socket communication tutorials (.Net preferred)?
It is not perfect, as I wrote it almost 7 years ago, but it covers the field and will give you a good understanding of TCP communications:
Generic TCP/IP Client Server
According to the MSDN documentation, the TcpClient and NetworkStream classes used to broadcast messages are not thread safe, yet the example uses them from multiple server threads.
This is correct, but it is about concurrent access. If each thread uses the instance in turn (e.g., using locks to control access), then different threads can be used.
In other words: not thread safe does not imply tied to a single thread.
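A minimal sketch of locking a broadcast method, assuming the tutorial's shape of a shared collection of connected clients (the collection, lock object, and message framing here are placeholders, not the tutorial's own code):

using System.Collections.Generic;
using System.Net.Sockets;
using System.Text;

static readonly Dictionary<string, TcpClient> clients = new Dictionary<string, TcpClient>();
static readonly object clientsLock = new object();

static void Broadcast(string message)
{
    byte[] data = Encoding.UTF8.GetBytes(message);

    // Serialize access to the shared collection and to each client's stream,
    // so writes from different server threads cannot interleave bytes.
    lock (clientsLock)
    {
        foreach (var client in clients.Values)
        {
            NetworkStream stream = client.GetStream();
            stream.Write(data, 0, data.Length);
            stream.Flush();
        }
    }
}

One global lock is the simplest fix, but it also means one slow client stalls every broadcast; a per-client lock or a per-client outgoing queue avoids that at the cost of more code.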

Finding or building an inter-process broadcast communication channel

So we have this somewhat unusual need in our product. We have numerous processes running on the local host and need to construct a means of communication between them. The difficulty is that ...
There is no 'server' or master process
Messages will be broadcast to all listening nodes
Nodes are all Windows processes, but may be C++ or C#
Nodes will be running in both 32-bit and 64-bit simultaneously
Any node can jump in/out of the conversation at any time
A process abnormally terminating should not adversely affect other nodes
A process responding slowly should also not adversely affect other nodes
A node does not need to be 'listening' to broadcast a message
A few more important details...
The 'messages' we need to send are trivial in nature. A name of the type of message and a single string argument would suffice.
The communications are not necessarily secure and do not need to provide any means of authentication or access control; however, we want to group communications by Windows logon session. Perhaps of interest here is that a non-elevated process should be able to interact with an elevated process and vice versa.
My first question: is there an existing open-source library, or something that can be used to fulfill this with little effort? As of now I haven't been able to find anything :(
If a library doesn't exist for this, then: what technologies would you use to solve this problem? Sockets, named pipes, memory-mapped files, event handles? It seems like connection-based transports (sockets/pipes) would be a bad idea in a fully connected graph, since n nodes require n(n-1)/2 connections. Using event handles and some form of shared storage seems the most plausible solution right now...
Updates
Does it have to be reliable and guaranteed? Yes, and no... Let's say that if I'm listening, and I'm responding in a reasonable time, then I should always get the message.
What are the typical message sizes? less than 100 bytes including the message identifier and argument(s). These are small.
What message rate are we talking about? Low throughput is acceptable, 10 per second would be a lot, average usage would be around 1 per minute.
What are the number of processes involved? I'd like it to handle between 0 and 50, with the average being between 5 and 10.
I don't know of anything that already exists, but you should be able to build something with a combination of:
Memory mapped files
Events
Mutex
Semaphore
This can be built in such a way that no "master" process is required, since all of those can be created as named objects that are then managed by the OS and not destroyed until the last client uses them. The basic idea is that the first process to start up creates the objects you need, and then all other processes connect to those. If the first process shuts down, the objects remain as long as at least one other process is maintaining a handle to them.
The memory mapped file is used to share memory among the processes. The mutex provides synchronization to prevent simultaneous updates. If you want to allow multiple readers or one writer, you can build something like a reader/writer lock using a couple of mutexes and a semaphore (see Is there a global named reader/writer lock?). And events are used to notify everybody when new messages are posted.
I've waved my hand over some significant technical detail. For example, knowing when to reset the event is kind of tough. You could instead have each app poll for updates.
But going this route will provide a connectionless way of sharing information. It doesn't require that a "server" process is always running.
For the implementation, I would suggest writing it in C++ and letting the C# programs call it through P/Invoke, or perhaps in C# and letting the C++ apps call it through COM interop. That's assuming, of course, that your C++ apps are native rather than C++/CLI.
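A rough C# sketch of the named-object approach described above; the object names and the fixed-size layout are illustrative only, and a real implementation would need a proper ring buffer plus per-reader sequence numbers:

using System;
using System.IO.MemoryMappedFiles;
using System.Text;
using System.Threading;

class BroadcastChannel : IDisposable
{
    private readonly MemoryMappedFile file;
    private readonly Mutex mutex;
    private readonly EventWaitHandle newMessageEvent;

    public BroadcastChannel()
    {
        // The first process creates the named objects; later processes open the same ones.
        // The OS keeps them alive until the last handle is closed.
        file = MemoryMappedFile.CreateOrOpen(@"Local\MyAppBroadcastShm", 64 * 1024);
        mutex = new Mutex(false, @"Local\MyAppBroadcastMutex");
        newMessageEvent = new EventWaitHandle(false, EventResetMode.ManualReset, @"Local\MyAppBroadcastEvent");
    }

    public void Publish(string message)
    {
        byte[] payload = Encoding.UTF8.GetBytes(message);

        mutex.WaitOne();          // one writer at a time
        try
        {
            using var view = file.CreateViewAccessor();
            view.Write(0, payload.Length);                  // length prefix
            view.WriteArray(4, payload, 0, payload.Length); // message body
        }
        finally
        {
            mutex.ReleaseMutex();
        }

        newMessageEvent.Set();    // wake listeners (deciding when to reset it is the hard part)
    }

    public void Dispose()
    {
        newMessageEvent.Dispose();
        mutex.Dispose();
        file.Dispose();
    }
}

Using the "Local\" kernel namespace scopes the named objects to the current logon session, which matches the requirement to group communications per session.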
I've never tried this, but in theory it should work. As I mentioned in my comment, use a UDP port on the loopback device. Then all the processes can read and write from/to this socket. As you say, the messages are small, so they should fit into a single packet; maybe you can look at something like Google's protocol buffers to generate the structures, or simply memcpy the structure into the packet to send and cast it back at the other end. Given it's all on the local host, you don't have any alignment or network byte order issues to worry about. To support different types of messages, ensure a common header that can be checked for the type so that you can stay backward compatible.
2cents...
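One hedged way to realize the UDP idea is multicast on the local host, so several processes can bind the same port and each receive a copy of every datagram; the group address, port, and message format below are arbitrary picks, not anything the answer specified:

using System.Net;
using System.Net.Sockets;
using System.Text;

var groupAddress = IPAddress.Parse("239.255.0.42");
var port = 49152;

// Listener: allow several processes to bind the same port, then join the group.
var listener = new UdpClient();
listener.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
listener.Client.Bind(new IPEndPoint(IPAddress.Any, port));
listener.JoinMulticastGroup(groupAddress);

// Sender: any process (listening or not) can broadcast to the group.
using var sender = new UdpClient();
byte[] payload = Encoding.UTF8.GetBytes("MessageType|argument");
sender.Send(payload, payload.Length, new IPEndPoint(groupAddress, port));

// Receive loop (blocking): each joined process gets its own copy of the datagram.
var remote = new IPEndPoint(IPAddress.Any, 0);
byte[] received = listener.Receive(ref remote);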
I think one more important consideration is performance: what message rate are we talking about, and how many processes?
Either way you are relying on a "master" that enables the communication, be it a custom service or something system-provided (pipes, message queues, and such).
If you don't need to keep track of and query past messages, I think you should consider a dead-simple service that opens a named pipe, allowing all other processes to either read from or write to it as pipe clients. If I am not mistaken, that checks all the items on your list.
What you're looking for is mailslots!
See CreateMailslot:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365147(v=vs.85).aspx

Tibco EMS Session sharing Connection object

Our EMS connectivity code was initially ill-designed and created one TopicConnection object per topic that we listened to. So, in effect, whenever we subscribed to a topic we created a new connection, a new session and, lastly, a new listener.
We would like to switch to a single-connection model. Although I am able to do this easily in our code by sharing one connection object and creating a new session object per topic, we are unsure whether this is going to cause any issues with our code.
My understanding is that the Tibco EMS client library is thread safe with regard to sharing a connection. In effect, a connection is just a pipe, and sessions can reuse this pipe in a thread-safe manner.
Is this assumption correct or is there more to this?
The .NET EMS API is based on JMS. In JMS, the Connection and Session objects are specified to be thread-safe and can be reused within the program. You are quite correct in that the Connection object simply represents a network pipe to the EMS server. The EMS User's Guide states:
A connection is a fairly heavyweight object, so most clients will create a connection once and keep it open until the client exits. Your application can create multiple connections, if necessary.
And regarding sessions:
A Session is a single-threaded context for producing or consuming messages. You create Message Producers or Message Consumers using Session objects.
Essentially, unless you need very large volumes and are bumping into performance limitations, it's perfectly safe to use just one connection in your application. The session controls the transaction/acknowledgement semantics of any producers or consumers created within it, but it is again safe to reuse. I would probably use separate sessions for modules within the application that have distinct life cycles (think separate deployment units within an application server).
Your EMS server installation will contain a samples directory with various code (something like C:\tibco\ems\5.0\samples\cs). The code in csTopicSubscriber.cs shows how to write a single-threaded topic consumer. There is no multi-threaded topic consumer example but csMsgConsumerPerf.cs demonstrates how to do it with queues.
Be sure to clean up any objects you create after you're done with them - e.g. close the topic consumer object, the session, and the connection when you're finished. Leaking handles without closing them can result in unpredictable behaviour when combined with prefetch and fault-tolerant reconnect settings.
I think yes, as long as the sharing is within the same application (exe, binary).
We have shared the same connection object and used it as a singleton in our code.
Agree with an earlier answer: the JMS Session must not be shared between threads, but the Connection can and should be. So one connection per application is fine (make sure you start/close it only once, best before/after the individual threads are created).
And then create and use one Session per thread. Remember that when you close() a Session, it will block until all receive callbacks have really returned. So do NOT call close() from within a callback's onMessage().
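A hedged sketch of that pattern, assuming the TIBCO.EMS .NET API, which mirrors JMS; the server URL, credentials, and topic name are placeholders, and the acknowledge-mode constant may differ by API version (compare with the csTopicSubscriber.cs sample shipped with EMS):

using TIBCO.EMS;

ConnectionFactory factory = new ConnectionFactory("tcp://emshost:7222");
Connection connection = factory.CreateConnection("user", "password");
connection.Start();   // one connection for the whole application

// One session, and the consumers created from it, per thread.
Session session = connection.CreateSession(false, Session.AUTO_ACKNOWLEDGE);
Topic topic = session.CreateTopic("my.topic");
MessageConsumer consumer = session.CreateConsumer(topic);
Message message = consumer.Receive();   // blocking receive on this thread

// On shutdown, close in reverse order (never from inside a message callback):
consumer.Close();
session.Close();
connection.Close();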

static WCF proxy class object

I have a WCF app based on NetTcpBinding. In the client app I have created its proxy class object as static. This client app may run for 4-8 hours after deployment. Basically, at the login window I create and initialize the DataServiceClient proxy class (mainly for database inserts and updates) and use the same object throughout the application until the user closes the main window.
Is there any adverse effect (performance-wise) of creating a static object of the proxy class? If yes, how can I avoid it? Before using a static object I was creating an individual object in every window (wherever required), but this increased window loading time.
How can I improve WCF performance? I am satisfied with it, but that could be my illusion.
Nothing is wrong with using the same instance, but make sure your error handling is good. Otherwise the proxy object will go into a faulted state when an error happens and you will have to restart the whole application. There are events you can attach to when the state changes.
After the proxy object goes into the faulted state you have to create a new one; there is no way to recover a faulted proxy object.
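A minimal sketch of handling that, using the question's own DataServiceClient proxy type (the lock and helper method here are illustrative, not anything from the original code):

using System.ServiceModel;

private static readonly object proxyLock = new object();
private static DataServiceClient proxy = new DataServiceClient();

private static DataServiceClient GetProxy()
{
    lock (proxyLock)
    {
        // A faulted proxy cannot be reused; abort it and create a fresh instance.
        if (((ICommunicationObject)proxy).State == CommunicationState.Faulted)
        {
            proxy.Abort();
            proxy = new DataServiceClient();
        }
        return proxy;
    }
}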
I have found that using message headers reduces the amount of methods I actually need to expose, but that really depends on what your service does.
Otherwise I would recommend to use streaming when possible. Keep your data as small as possible. Use the binary formatter.
It looks like your client is a Windows Forms application. A static service proxy should be fine for you as long as you don't do any multi-threading or callbacks on your proxy; essentially, in such a case, you need to synchronize access to static variables.
Talking in general terms, WCF performance can be improved by:
Designing the service contract carefully - it should be a chunky interface rather than a chatty one, so that the number of service calls is reduced
Choosing an appropriate binding - TCP binding is faster than HTTP binding, but it is .NET proprietary and may not work over the internet where other ports are blocked. If you are communicating on the same machine, then named pipe binding is the fastest option
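A rough sketch of choosing the binding in code; IDataService and the endpoint addresses are placeholders (the same choice can equally be made in configuration):

using System.ServiceModel;

// Cross-machine on an intranet: NetTcpBinding.
var tcpFactory = new ChannelFactory<IDataService>(
    new NetTcpBinding(), new EndpointAddress("net.tcp://server:8080/DataService"));

// Same machine: named pipes are the fastest transport.
var pipeFactory = new ChannelFactory<IDataService>(
    new NetNamedPipeBinding(), new EndpointAddress("net.pipe://localhost/DataService"));

IDataService channel = tcpFactory.CreateChannel();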
