Performance Issue Sending Large Objects Using NetMQ Multipart Messages - C#

I'm sending a large object using the following code in C# (NetMQ):
var client = new DealerSocket("<connection String>");
var serializedData = new string('*', 500000);
var message = new List<byte[]>();
message.Add(Encoding.UTF8.GetBytes("BulkSend"));
message.Add(Encoding.UTF8.GetBytes(serializedData));
client.TrySendMultipartBytes(TimeSpan.FromMilliseconds(3000), message);
This code consumes about 90% CPU when used under high traffic (for example, 10 MB of messages per second).
After some research, I tried the following two variations. First, I removed the first frame ("BulkSend"):
var client = new DealerSocket("<connection String>");
var serializedData = new string('*', 500000);
var message = new List<byte[]>();
message.Add(Encoding.UTF8.GetBytes(serializedData));
client.TrySendMultipartBytes(TimeSpan.FromMilliseconds(3000), message);
Surprisingly, performance improved. Second, I rearranged the two frames, moving the large frame to the front:
var client = new DealerSocket("<connection String>");
var serializedData = new string('*', 500000);
var message = new List<byte[]>();
// change the order of two following codes
message.Add(Encoding.UTF8.GetBytes(serializedData));
message.Add(Encoding.UTF8.GetBytes("BulkSend"));
client.TrySendMultipartBytes(TimeSpan.FromMilliseconds(3000), message);
Again, surprisingly, performance improved!
What's the problem? How can I improve the performance? What about using zproto on NetMQ? Is there any proper documentation on that?

As I found, there is a known problem with sending multipart messages; see http://hintjens.com/blog:84.
The workaround is probably to encode everything into one message and send that instead.
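For example, a minimal sketch of that workaround (the one-byte length prefix is an illustrative framing choice, not anything NetMQ prescribes):
using System;
using System.Text;
using NetMQ;
using NetMQ.Sockets;

var client = new DealerSocket("<connection String>");
byte[] header = Encoding.UTF8.GetBytes("BulkSend");
byte[] payload = Encoding.UTF8.GetBytes(new string('*', 500000));

// pack [1-byte header length][header][payload] into a single buffer
var buffer = new byte[1 + header.Length + payload.Length];
buffer[0] = (byte)header.Length;
header.CopyTo(buffer, 1);
payload.CopyTo(buffer, 1 + header.Length);

// one frame on the wire, no multipart bookkeeping
client.TrySendFrame(TimeSpan.FromMilliseconds(3000), buffer);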

High CPU/low performance might be caused by GC and memory allocations. I tried to send the same message size with my framework, which uses NetMQ, with server and client running on the same machine. When the client waited for a response before sending the next message to the server, I achieved 60K msg/sec. However, if the client sent hundreds of messages in parallel, CPU was similarly high and throughput was tens of messages per second. At some point I even got an OutOfMemoryException.
It would be good if you posted the complete code.
UPD: Sorry, I measured it wrong. With either 100 messages sent in parallel or just one, the performance is just 120 msg/sec.

Related

How to crawl XML(s) very fast — considering the below networking limitations?

I have a .NET crawler that runs when the user makes a request (so it needs to be fast). It crawls 400+ links in real time. (This is the business ask.)
The problem: I need to detect whether a link is XML (think RSS or Atom feeds) or HTML. If the link is XML I continue with processing, but if it is HTML I can skip it. Usually I have 2 XMLs and 398+ HTMLs. Currently I have multiple threads going, but processing is still slow: usually 75 seconds with 10 threads for 400+ links, or 280 seconds with 1 thread. (I want to add more threads, but see below.)
The challenge that I am facing is that I read the streams as follows:
var request = WebRequest.Create(requestUriString: uri.AbsoluteUri);
// ....
var response = await request.GetResponseAsync();
//....
using (var reader = new StreamReader(stream: response.GetResponseStream(), encoding: encoding)) {
    char[] buffer = new char[1024];
    // keep track of how many characters were actually read instead of assuming 1024
    int charsRead = await reader.ReadAsync(buffer: buffer, index: 0, count: 1024);
    responseText = new string(buffer, 0, charsRead);
}
// parse first bytes of responseText to check if it is XML
The problem is that my optimization to read only 1024 characters is quite useless, because GetResponseAsync downloads the entire stream anyway, as far as I can tell.
(The other option I have is to look at the Content-Type header, but AFAIK that's quite similar because I get the content anyway - unless you recommend using OPTIONS, which I have not used so far - and in addition XML might be marked with an incorrect content type (?) and I would miss some content.)
If there is any optimization that I am missing, please help, as I am running out of ideas.
(I am considering optimizing this design by spreading the load across multiple servers, so that I balance the network with the parallelism, but that is a bit of a change from the current architecture that I cannot afford at this point in time.)
Using HEAD requests could speed up the requests significantly, IF you can rely on the Content-Type.
e.g.:
HttpClient client = new HttpClient();
HttpResponseMessage response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Head, uri));
Just showing basic usage; you still need to supply the uri and anything else required by the request.
Also note that even with 10 threads, 400 requests will likely always take quite a while. 400/10 means 40 sequential requests per thread. Unless the requests are to servers close by, 200 ms would be a good response time, meaning a minimum of 8 seconds. Overseas servers that may be slow could easily push this out to 30-40 seconds of unavoidable delay, unless you increase the number of threads to parallelize more of the requests.
Dataflow (Task Parallel Library) can be very helpful for writing parallel pipelines, with a convenient MaxDegreeOfParallelism property for easily adjusting how many parallel instances run.
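For example, a minimal sketch (assuming the System.Threading.Tasks.Dataflow NuGet package; IsXmlAsync and the degree of parallelism are illustrative):
using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class FeedChecker
{
    static readonly HttpClient http = new HttpClient();

    // HEAD request; trusts Content-Type, which the question notes can be wrong
    static async Task<bool> IsXmlAsync(Uri uri)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Head, uri))
        using (var response = await http.SendAsync(request))
        {
            var mediaType = response.Content?.Headers?.ContentType?.MediaType ?? "";
            return mediaType.Contains("xml");
        }
    }

    public static async Task Run(Uri[] uris)
    {
        // one block checks links, up to 32 at a time; a second block consumes results
        var check = new TransformBlock<Uri, Tuple<Uri, bool>>(
            async uri => Tuple.Create(uri, await IsXmlAsync(uri)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 32 });
        var report = new ActionBlock<Tuple<Uri, bool>>(r =>
        {
            if (r.Item2) Console.WriteLine("XML feed: " + r.Item1);
        });

        check.LinkTo(report, new DataflowLinkOptions { PropagateCompletion = true });
        foreach (var uri in uris) check.Post(uri);
        check.Complete();
        await report.Completion;
    }
}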

Send bulk email with MailKit in C#

We are using MailKit to successfully send emails by creating a client, making a connection and sending a mail. Very standard stuff, which works great: we receive a message in our application, do something and send on an email.
However, we want to be able to process tens of thousands of emails as quickly as possible.
Our destination email server may be unavailable, so messages could quickly back up and produce a backlog.
With MailKit, what is the best and quickest way to process the mails so that they get sent as quickly as possible? For example, at the moment each mail may be processed one after the other, and if each takes a second to process it could take a long time to send 40,000 mails.
We have been using a parallel foreach to spin up a number of threads, but this has limitations. Any suggestions or recommendations would be appreciated.
Code sample added. CORRECTION: NEW CODE SAMPLE ADDED. This is much faster, but I cannot get it to work creating a new connection each time; Exchange throws 'sender already specified' errors. This is currently sending around 6 mails per second on average.
var rangePartitioner = Partitioner.Create(0, inpList.Count, 15);
var po = new ParallelOptions { MaxDegreeOfParallelism = 30 };
// note: pass po to the loop, otherwise MaxDegreeOfParallelism is never applied
Parallel.ForEach(rangePartitioner, po, (range, loopState) =>
{
    using (var client = new SmtpClient(new SlabProtocolLogger()))
    {
        client.Connect(_appSettings.RelayAddress, _appSettings.RelayPort);
        client.AuthenticationMechanisms.Remove("XOAUTH2");
        for (int i = range.Item1; i < range.Item2; i++)
        {
            var message = _outboundQueueRepository.Read(inpList[i]).Load();
            client.Send(message.Body, message.Metadata.Sender, message.Metadata.Recipients.Select(r => (MailboxAddress)r));
            _outboundQueueRepository.Remove(inpList[i]);
        }
    }
});
Correct me if I'm wrong, but it looks to me like the way this works is that Parallel.ForEach creates some number of threads, and each thread then creates an SMTP connection and loops to send a batch of messages.
This seems pretty reasonable to me.
The only advice I can give you that might optimize this further: if many of these messages have exactly the same content and the same From address, and the only difference is the recipients, you can vastly reduce the number of messages you need to send.
For example, if you are currently doing something like sending out the following 3 messages:
Message #1:
From: no-reply@company.com
To: Joe The Plumber <joe@plumbing-masters.com>
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
Message #2:
From: no-reply@company.com
To: Sara the Chef <sara@amazing-chefs.com>
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
Message #3:
From: no-reply@company.com
To: Ben the Cobbler <ben@cobblers-r-us.com>
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
Your code might create 3 threads, sending 1 of the messages in each of those threads.
But what if, instead, you created the following single message:
From: no-reply@company.com
To: undisclosed-recipients:;
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
and then used the following code to send the same MimeMessage to all 3 customers?
var sender = new MailboxAddress (null, "no-reply@company.com");
var recipients = new List<MailboxAddress> ();
recipients.Add (new MailboxAddress ("Joe the Plumber", "joe@plumbing-masters.com"));
recipients.Add (new MailboxAddress ("Sara the Chef", "sara@amazing-chefs.com"));
recipients.Add (new MailboxAddress ("Ben the Cobbler", "ben@cobblers-r-us.com"));
client.Send (message, sender, recipients);
All 3 of your customers will receive the same email, and you didn't have to send 3 messages; you only needed to send 1.
You may already understand this concept, so this might not help you at all - I merely mention it because I've noticed that it is not immediately obvious to everyone. Some people think they need to send 1 message per recipient, and so end up trying to optimize sending thousands of messages when really they only ever needed to send 1.
So we found a wider-ranging fix which improved performance massively, in addition to the improvements from the Parallel.ForEach loop. Not related to MailKit, but I thought I would share anyway. Our calling code created inpList by using DirectoryInfo.GetDirectories to enumerate everything in the directory; in some cases that took 2 seconds to execute over a directory with 40k files in it. We changed this to EnumerateDirectories, and it effectively freed up the mail-sending code to send many, many emails.
So to confirm: the Parallel loop worked great; the real bottleneck was an underlying performance issue elsewhere.
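For reference, a minimal sketch of the difference (the path is illustrative): GetDirectories materializes the full array before returning, while EnumerateDirectories yields entries lazily as the file system returns them.
using System.IO;

var root = new DirectoryInfo(@"C:\mail-queue");   // illustrative path

// blocks until all 40k entries have been read and allocated:
DirectoryInfo[] all = root.GetDirectories();

// yields each entry as soon as the file system returns it:
foreach (var sub in root.EnumerateDirectories())
{
    // start processing immediately
}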

Waiting for networking C# console application to fully start

I have run into an issue where slow C# start-up time causes UDP packets to drop initially. Below is what I have done to mitigate this start-up delay: I essentially wait an additional 10 ms between the first two packet transmissions. This fixes the initial drops, at least on my machine. My concern is that a slower machine may need a longer delay.
private void FlushPacketsToNetwork()
{
    MemoryStream packetStream = new MemoryStream();
    while (packetQ.Count != 0)
    {
        byte[] packetBytes = packetQ.Dequeue().ToArray();
        packetStream.Write(packetBytes, 0, packetBytes.Length);
    }
    byte[] txArray = packetStream.ToArray();
    udpSocket.Send(txArray);
    txCount++;
    ExecuteStartupDelay();
}

// socket takes too long to transmit unless I give it some time to "warm up"
private void ExecuteStartupDelay()
{
    if (txCount < 3)
    {
        timer.SpinWait(10e-3); // custom timer helper: spin-wait for 10 ms (10e-3 s)
    }
}
So I am wondering: is there a better approach to let C# fully load all of its dependencies? I really don't mind if it takes several seconds to load completely; I just do not want to do any high-bandwidth transmission until C# is ready for full speed.
Additional relevant details
This is a console application; the network transmission runs on a separate thread, and the main thread just waits for a key press to terminate the network transmitter.
In the Program.Main method I have tried to get the most performance from my application by using the highest priorities reasonable:
public static void Main(string[] args)
{
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    ...
    Thread workerThread = new Thread(new ThreadStart(worker.Run));
    workerThread.Priority = ThreadPriority.Highest;
    workerThread.Start();
    ...
    Console.WriteLine("Press any key to stop the stream...");
    WaitForKeyPress();
    worker.RequestStop = true;
    workerThread.Join();
}
Also, the socket settings I am currently using are shown below:
udpSocket = new Socket(targetEndPoint.Address.AddressFamily,
                       SocketType.Dgram,
                       ProtocolType.Udp);
udpSocket.Ttl = ttl;
udpSocket.SendBufferSize = 1024 * 1024;
udpSocket.Blocking = true;
udpSocket.Connect(targetEndPoint);
The default SendBufferSize is 8192, so I went ahead and raised it to a megabyte, but this setting did not seem to have any effect on the dropped packets at the beginning.
From the comments I learned that TCP is not an option for you (because of inherent delays in transmission), and you also do not want to lose packets because the other side is not fully loaded.
So you actually need to implement some features present in TCP (retransmission), but in a more robust and lightweight fashion. I also assume that you are in control of the receiving side.
I propose that you send some predetermined number of packets and then wait for confirmation. For instance, every packet can carry an id that grows monotonically. Every N packets, the receiving application sends back the number of the last received packet. After receiving this number, the sender knows whether it needs to repeat the last N packets.
This approach should not hurt your bandwidth very much, and you will get some information about received data (although delivery is still not guaranteed).
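A minimal sketch of that scheme (the 4-byte id framing, AckEvery and all names are illustrative, not from the original post):
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

class ReliableUdpSketch
{
    const int AckEvery = 64;               // N: one ack per N packets
    readonly Dictionary<uint, byte[]> _pending = new Dictionary<uint, byte[]>();
    uint _nextId;

    // Sender: prefix each payload with a growing 4-byte id and keep it until acked.
    public byte[] BuildFrame(byte[] payload)
    {
        var frame = new byte[4 + payload.Length];
        BitConverter.GetBytes(_nextId).CopyTo(frame, 0);
        payload.CopyTo(frame, 4);
        _pending[_nextId++] = frame;       // retained for possible retransmission
        return frame;
    }

    // Sender: an ack for id K means everything up to K arrived; drop it.
    public void OnAck(uint lastReceivedId)
    {
        foreach (var id in new List<uint>(_pending.Keys))
            if (id <= lastReceivedId)
                _pending.Remove(id);
        // if no ack arrives within a deadline, resend whatever is left in _pending
    }

    // Receiver: every N packets, echo the id of the last packet seen.
    public static void OnPacketReceived(byte[] frame, Socket ackSocket, EndPoint sender)
    {
        uint id = BitConverter.ToUInt32(frame, 0);
        if (id % AckEvery == 0)
            ackSocket.SendTo(BitConverter.GetBytes(id), sender);
    }
}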
Otherwise it is best to switch to TCP. By the way, did you try using TCP? How much does your bandwidth suffer because of it?

HttpWebRequest Grinding to a halt, possibly just due to page size

I have a WPF app that processes a lot of URLs (thousands); each is sent off to its own thread, does some processing and stores a result in the database.
The URLs can be anything, but some seem to be massively big pages, which shoots memory usage up a lot and makes performance really bad. I set a timeout on the web request so that if it took longer than, say, 20 seconds it doesn't bother with that URL, but it seems to make little difference.
Here's the code section:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(urlAddress.Address);
req.Timeout = 20000;
req.ReadWriteTimeout = 20000;
req.Method = "GET";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    pageSource = reader.ReadToEnd();
    req = null;
}
It also seems to stall/ramp up memory on reader.ReadToEnd();
I would have thought a cut-off of 20 seconds would help. Is there a better method? I assume there's not much advantage to using the async web methods, as each URL download is on its own thread anyway...
Thanks
In general, it's recommended that you use asynchronous HttpWebRequests instead of creating your own threads. The article I've linked above also includes some benchmarking results.
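A minimal sketch of that asynchronous pattern, wrapping the Begin/End pair with Task.Factory.FromAsync (FetchAsync is an illustrative name):
using System.IO;
using System.Net;
using System.Threading.Tasks;

static async Task<string> FetchAsync(string url)
{
    var req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "GET";
    // no dedicated thread is blocked while the response is in flight
    using (var response = await Task.Factory.FromAsync(req.BeginGetResponse, req.EndGetResponse, null))
    using (var reader = new StreamReader(response.GetResponseStream()))
        return await reader.ReadToEndAsync();
}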
I don't know what you're doing with the page source after you read the stream to the end, but using strings can be an issue:
The System.String type is used in any .NET application. We have strings as: names, addresses, descriptions, error messages, warnings or even application settings. Each application has to create, compare or format string data. Considering the immutability, and the fact that any object can be converted to a string, all the available memory can be swallowed by a huge amount of unwanted string duplicates or unclaimed string objects.
Some other suggestions:
Do you have any firewall restrictions? I've seen a lot of issues at work where the firewall enables rate limiting and fetching pages grinds to a halt (happens to me all the time)!
I presume that you're going to use the string to parse HTML, so I would recommend that you initialize your parser with the Stream instead of passing in a string containing the page source (if that's an option).
If you're storing the page source in the database, then there isn't much you can do.
Try to eliminate the reading of the page source as a potential contributor to the memory/performance problem by commenting it out.
Use a streaming HTML parser such as Majestic-12 - it avoids the need to load the entire page source into memory (again, if you need to parse)!
Limit the size of the pages you download, say, to only 150 KB. The average page size is about 100-130 KB.
Additionally, can you tell us your initial rate of fetching pages and what it drops to? Are you seeing any errors/exceptions from the web request as you're fetching pages?
Update
In the comment section I noticed that you're creating thousands of threads, and I would say that you don't need to do that. Start with a small number of threads and keep increasing until you hit peak performance on your system. Once adding threads no longer improves performance, stop adding them. I can't imagine that you will need more than 128 threads (even that seems high). Create a fixed number of threads, e.g. 64, let each thread take a URL from your queue, fetch the page, process it, and then go back to the queue for more.
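A minimal sketch of that fixed-pool arrangement (the worker body is a placeholder):
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

class CrawlerPool
{
    public static void Run(string[] urls, int workerCount = 64)
    {
        var queue = new BlockingCollection<string>(new ConcurrentQueue<string>());
        foreach (var url in urls) queue.Add(url);
        queue.CompleteAdding();

        var workers = Enumerable.Range(0, workerCount).Select(_ => new Thread(() =>
        {
            // each worker pulls URLs until the queue is drained
            foreach (var url in queue.GetConsumingEnumerable())
            {
                // fetch the page, process it, store the result
                Console.WriteLine("processed " + url);
            }
        })).ToList();

        workers.ForEach(t => t.Start());
        workers.ForEach(t => t.Join());
    }
}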
You could enumerate with a buffer instead of calling ReadToEnd, and if it is taking too long, you could log and abandon - something like:
static void Main(string[] args)
{
    Uri largeUri = new Uri("http://www.rfkbau.de/index.php?option=com_easybook&Itemid=22&startpage=7096");
    DateTime start = DateTime.Now;
    int timeoutSeconds = 10;
    foreach (var s in ReadLargePage(largeUri))
    {
        if ((DateTime.Now - start).TotalSeconds > timeoutSeconds)
        {
            Console.WriteLine("Stopping - this is taking too long.");
            break;
        }
    }
}

static IEnumerable<string> ReadLargePage(Uri uri)
{
    int bufferSize = 8192;
    int readCount;
    char[] readBuffer = new char[bufferSize];
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader stream = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        readCount = stream.Read(readBuffer, 0, bufferSize);
        while (readCount > 0)
        {
            // only surface the characters actually read (the original used bufferSize
            // here, which would pad the final chunk with stale buffer contents)
            yield return new string(readBuffer, 0, readCount);
            readCount = stream.Read(readBuffer, 0, bufferSize);
        }
    }
}
Lirik has a really good summary.
I would add that if I were implementing this, I would make a separate process that reads the pages. So it would be a pipeline: the first stage downloads the URL and writes it to a disk location, then queues that file to the next stage. The next stage reads from the disk and does the parsing and DB updates. That way you get maximum throughput on the download and the parsing. You can also tune your thread pools so that you have more workers parsing, etc. This architecture also lends itself very well to distributed processing, where you can have one machine downloading and another host parsing, etc.
Another thing to note is that if you are hitting the same server from multiple threads (even if you are using async), you will run up against the maximum outgoing connection limit. You can throttle yourself to stay below it, or increase the connection limit on the ServicePointManager class.
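For example (the default is 2 concurrent connections per host for classic .NET Framework clients; 64 is just a number to tune):
using System.Net;

// raise the per-host outgoing connection cap before issuing requests
ServicePointManager.DefaultConnectionLimit = 64;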

C# UDPClient bad throughput

We have a production system that gathers telemetry data from remote devices. This data is sent at a reasonable frequency, and we end up receiving up to thousands of messages a second at peak times. The payload is circa 80 bytes per message. I am starting to do some performance testing of various storage mechanisms, but I thought first of all I would try and see how fast I could push UDP without any data storage involved. I am getting approximately 70,000 messages a second maximum throughput testing on my local machine (it seems to be about the same if I use another machine to send the test data). From my rough calculations, this is way lower than I expected given the network link capacity. The sender sits in a tight loop sending data. I am fully aware of all the issues with UDP re: lost packets, etc. I just want to get an idea of our system's weak points.
Is the throughput so low because of the small packet size?
Matt
private IPEndPoint _receiveEndpoint = new IPEndPoint(IPAddress.Any, _receivePort);
private Stopwatch sw = new Stopwatch();
private int _receivedCount = 0;
private long _lastCount = 0;
private Thread _receiverThread;
private bool _running = true;

_clientReceive = new UdpClient();
_clientReceive.Client.Bind(_receiveEndpoint);
_receiverThread = new Thread(DoReceive);
_receiverThread.Start();

while (_running)
{
    // note: the original code called Receive a second time here and discarded
    // the result, silently dropping every other datagram from the count
    byte[] receiveBytes = _clientReceive.Receive(ref _receiveEndpoint);
    if (!sw.IsRunning)
        sw.Start();
    string receiveString = Encoding.ASCII.GetString(receiveBytes);
    _receivedCount++;
    long howLong = sw.ElapsedMilliseconds;
    if (howLong / 1000 > _lastCount)
    {
        _lastCount = howLong / 1000;
        Invoke(new MethodInvoker(() => { Text = _receivedCount + " iterations in " + sw.ElapsedMilliseconds + " msecs"; }));
    }
}
Lots of small UDP packets are definitely going to result in lower network throughput than you'd get with larger packets. However, did you include the IP and UDP header sizes in your calculations?
Apart from that, 70k messages/second is very, very high, and definitely not something you'd want happening across the internet if that is where the app will eventually be deployed. Even thousands of messages/second is high, and if it were me I'd be looking to make the communications from the telemetry equipment less chatty, perhaps by bundling multiple readings into a single transmission, as sketched below.
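A minimal sketch of such bundling (all names and the batch size are illustrative; udpSocket is a connected datagram socket as in the question's code):
using System.IO;
using System.Net.Sockets;

class ReadingBatcher
{
    const int BatchSize = 16;              // 16 x ~80 bytes stays well under a typical MTU
    readonly MemoryStream _batch = new MemoryStream();
    readonly Socket _udpSocket;
    int _count;

    public ReadingBatcher(Socket udpSocket) { _udpSocket = udpSocket; }

    public void Add(byte[] reading)        // one ~80-byte telemetry record
    {
        _batch.Write(reading, 0, reading.Length);
        if (++_count == BatchSize)
        {
            _udpSocket.Send(_batch.ToArray());   // one datagram, 16 readings
            _batch.SetLength(0);
            _count = 0;
        }
    }
}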
If that's not an option, and you are on a private network and need to increase the network throughput, you may have to start looking at your network card, its driver, and then fine-tuning some Windows networking parameters. But whatever you do with the messages, you are almost certainly going to bottleneck on whatever processing you do on them, especially if it involves disk, way before you get to 70k messages/second (I'd be surprised if you can even get to 10k/second when you're doing anything useful with them).
Yes, you should do measurements with various payload sizes and see how much throughput you get. For small payloads, the overhead of the UDP/IP/Ethernet headers might be reducing your throughput.
Also see the following article on SO: Having trouble achieving 1Gbit UDP throughput
