We are using MailKit to successfully send emails by creating a client, making a connection and sending a mail. Very standard stuff which works great as we receive a message in our application, we do something and send on an email.
However we want to be able to process 10's of thousands of emails as quickly as possible.
Our destination email server may be unavailable and therefore we could quickly backup messages therefore producing a backlog.
With MailKit, what is the best and quickwest way to process the mails so that they get sent as quickly as possible. For example at the moment each mail may be processed one after the other and if they take a second each to process it could take a long time to send 40000 mails.
We have been using a parallel foreach to spin up a number of threads but this has limitations. Any suggestions or recommendations would be appreciated.
Code sample added: CORRECTION, NEW CODE SAMPLE ADDED. This is much faster but I cannot get it to work creating a new connection each time. Exchange throws errors 'sender already specified'. This is currently sending around 6 mails per second on average.
var rangePartitioner = Partitioner.Create(0, inpList.Count, 15);
var po = new ParallelOptions { MaxDegreeOfParallelism = 30 };
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
using (var client = new SmtpClient(new SlabProtocolLogger()))
{
client.Connect(_appSettings.RelayAddress, _appSettings.RelayPort);
client.AuthenticationMechanisms.Remove("XOAUTH2");
for (int i = range.Item1; i < range.Item2; i++)
{
var message = _outboundQueueRepository.Read(inpList[i]).Load();
client.Send(message.Body, message.Metadata.Sender, message.Metadata.Recipients.Select(r => (MailboxAddress)r));
_outboundQueueRepository.Remove(inpList[i]);
};
}
});
Correct me if I'm wrong, but it looks to me like the way this works is that the Parallel.Foreach is creating some number of threads. Each thread is then creating an SMTP connection and then looping to send a batch of messages.
This seems pretty reasonable to me.
The only advice I can give you that might optimize this more is if many of these messages have the exact same content and the same From address and that the only difference is who the recipients are, you could vastly reduce the number of messages you need to send.
For example, if you are currently doing something like sending out the following 3 messages:
Message #1:
From: no-reply#company.com
To: Joe The Plumber <joe#plumbing-masters.com>
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
Message #2
From: no-reply#company.com
To: Sara the Chef <sara#amazing-chefs.com>
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
Message #3:
From: no-reply#company.com
To: Ben the Cobbler <ben#cobblers-r-us.com>
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
Your code might create 3 threads, sending 1 of the messages in each of those threads.
But what if, instead, you created the following single message:
From: no-reply#company.com
To: undisclosed-recipients:;
Subject: We've got a new sale! 50% off everything in stock!
some message text goes here.
and then used the following code to send to all 3 customers at the same MimeMessage?
var sender = new MailboxAddress (null, "no-reply#company.com");
var recipients = new List<MailboxAddress> ();
recipients.Add (new MailboxAddress ("Joe the Plumber", "joe#plumbing-masters.com"));
recipients.Add (new MailboxAddress ("Sara the Chef", "sara#amazing-chefs.com"));
recipients.Add (new MailboxAddress ("Ben the Cobbler", "ben#cobblers-r-us.com"));
client.Send (message, sender, recipients);
All 3 of your customers will receive the same email and you didn't have to send 3 messages, you only needed to send 1 message.
You may already understand this concept so this might not help you at all - I merely mention it because I've noticed that this is not immediately obvious to everyone. Some people think they need to send 1 message per recipient and so end up in a situation where they try to optimize sending 1000's of messages when really they only ever needed to send 1.
So we found a wider ranging fix which improved performance massively in addition to the improvements we found with the Parrallel ForEach loop. Not related to MailKit but I thought I would share anyway. The way that our calling code was creating our inputList was to use DirectoryInfo.GetDirectories to enumerate over all the first in the directory. In some cases our code took 2 seconds to execute over the directory with 40k files in it. We changed this to EnumerateDirectories and it effecitvely freed up the mail sending code to send many many emails.
So to confirm, the Parallel loop worked great, underlying performance issue elsewhere was the real bottleneck.
Related
I've encountered a bit of an odd problem with my IMAP parsing code. I'm currently working on a small test solution which would allow me to check the IMAP server every 10 seconds using the S22 library for Unseen messages and display them in a command-line output. However, if I create an ImapClient and the first search for unseen messages returns nothing, at which point I send a new email to the inbox, the Client never manages to find the new email. After closing and running the code again though, it finds the new email without any problems.
Is there something I'm doing wrong here? I'm not sure if not using IMAP IDLE to update is causing any issues here - unfortunately, AOL doesn't support IMAP IDLE.
This is the relevant snippet of code below:
using (ImapClient Client = new ImapClient("imap.aol.com", 993, recieve_email_IMAP_test.Credential_AES.DecryptString(g_key, encryptedEmail, g_iv),
recieve_email_IMAP_test.Credential_AES.DecryptString(g_key, encryptedPass, g_iv), AuthMethod.Login, true))
{
IEnumerable<uint> uids = Client.Search(SearchCondition.Unseen());
while (uids.Count() == 0)
{
Console.WriteLine("No new emails. Sleeping for 10 seconds...");
uids = Client.Search(SearchCondition.Unseen());
Thread.Sleep(10000); // sleep for 10 seconds and wait before polling again
}
recieve_email_IMAP_test.helperFuncs.imapParser(uids, Client); // used for parsing new emails later
Client.Logout();
}
var configs = new Dictionary<string, string>
{
{"bootstrap.servers", MY_SERVER},
{"security.protocol", "SASL_PLAINTEXT"},
{"sasl.mechanism", "SCRAM-SHA-256"},
{"sasl.username", "MY_USERNAME"},
{"sasl.password", "MY_PWD"},
{"group.id", "sample_group"} // added
};
var consumerConfig = new ConsumerConfig(configs);
using (var schemaRegistry = new CachedSchemaRegistryClient(schemaRegistryConfig))
using (var consumer = new ConsumerBuilder<string, MyModel>(consumerConfig)
.SetKeyDeserializer(new AvroDeserializer<string>(schemaRegistry, avroSerializerConfig).AsSyncOverAsync())
.SetValueDeserializer(new AvroDeserializer<MyModel>(schemaRegistry, avroSerializerConfig).AsSyncOverAsync())
.Build())
{
consumer.Subscribe(TOPIC_NAME);
while (true)
{
var result = consumer.Consume(); //stuck here
Console.WriteLine(result);
}
}
As stated in the code, there is no response coming from consumer.Consume() . It does not throw any error message even during consumer.Subscribe() What will be the possible reason? (I am new to Kafka Consumer)
Maybe there is no message in Topic, so nothing to receive?
The code asked for missing 'group.id', so I added {"group.id", "sample_group"} in config and wrap with ConsumerConfig. Is random name ("sample_group") allowed for group.id or should it be something retrieved from Topic information?
anything else?
Your code looks fine and the fact that no errors and Exceptions are showing up is also a good sign.
"1. Maybe there is no message in Topic, so nothing to receive?"
Even if there are no messages in the Kafka topic, your observation matches the expected behavior. In the while(true) loop you are continuously trying to fetch data from the topic and if nothing can be fetched the consumer will try again in the next iteration. Consumers on Kafka topics are meant to read a topic sequentially while running continuously. It is totally fine that some times the consumers has consumed all messages and stays idle for some time until new message arrive into the topic. During the waiting time the consumer will not stop or crash.
Keep in mind that messages in a Kafka topic have by default a retention period of 7 days. After that time, the messages will be deleted.
"2. The code asked for missing 'group.id', so I added {"group.id", "sample_group"} in config and wrap with ConsumerConfig. Is random name ("sample_group") allowed for group.id or should it be something retrieved from Topic information?"
Yes, the name "sample_group" is allowed as a ConsumerGroup name. There are no reserved consumer group names so this name will not cause any trouble.
"3. anything else?"
By default, a KafkaConsumer reads the messages from "latest" offset. That means, if you run a ConsumerGroup for the very first time it will not read all messages from beginning but rather from end. Check the consumer configurations in the .net Kafka-API documentation for something like auto_offset_reset. You might set this configuration to "earliest" in case you want to read all messages from beginning. Please note, that as soon as you run your application with a given ConsumerGroup for the first time, the second time you run this application this configuration auto_offset_reset will not have any impact because the ConsumerGroup is now registered within Kafka.
What you can usually do, to ensure that the consumer should actually read messages is if you start your consumer before you start producing messages to that topic. Then, (almost) independent of your configuration you should see data flowing through your application.
I have an application that processes invoices from vendors by using a Gmail inbox and S22.Imap to read emails and pdf attachments. It has been used for several years now without any issue, but lately, I have been seeing "The stream could not be read" errors when trying to get all messages from the inbox to be processed using the command:
IEnumerable<uint> uids = client.Search(SearchCondition.Unseen());
IEnumerable<MailMessage> messages = client.GetMessages(uids);
This exception seems to only occur when a certain vendor sends a large number of emails at once. I have been looking through the S22.Imap documentation, but have not found anything of help.
Note: I did see this way of getting emails, but the original method I used above is the actual example for "Downloading unseen mail messages" in the documentation:
// Fetch the messages and process each one
foreach (uint uid in uids)
{
MailMessage message = client.GetMessage(uid);
ProcessMessage(message);
}
My problem here is that I have not been able to reproduce the exception locally, and I don't really understand how it relates to the client.GetMessages(uids) command.
Is there a maximum number of messages that can be processed this way, or is there a better way to read and process emails?
If client.GetMessages(uids) is the best way to get those emails, is there a good way that I could catch this exception and continue when it does happen?
I would appreciate any advice. Thanks,
Not sure if this will ever even help anyone since it is so particular to this one situation with how we are using s22 to process emails, but I figured I would post the answer anyhow.
The main issue that was happening was that all the emails were being marked 'unread' when using the client.GetMessages(uids) syntax. This marked all emails in the inbox as 'read' (which I counted unread emails to make sure that every email was processed). Changing this method to go through the collected uid one per time prevented this problem. The error still happens once in a while, but it does not mess anything up for the user. The message that failed to get processed just gets processed the next time the app runs. So basically, I just changed how I retrieved the unread emails with S22.Imap and dealt with the errors by catching them and moving on.
//get the message one at a time from uids. catching any rare errors with a message
try
{
MailMessage message = client.GetMessage(uid);
ProcessMessage(message);
}
catch(Exception e)
{
System.Threading.Thread.Sleep(1000);
SendErrorEmail("An exception has been caught: " + e.ToString() +
Environment.NewLine + "UID: " + uid.ToString());
}
I'm sending large object using the following code in C# (NetMQ):
var client = new DealerSocket("<connection String>");
var serializedData = new string('*', 500000);
var message = new List<byte[]>();
message.Add(Encoding.UTF8.GetBytes("BulkSend"));
message.Add(Encoding.UTF8.GetBytes(serializedData));
client.TrySendMultipartBytes(TimeSpan.FromMilliseconds(3000), message);
This code is taking 90% of CPU usage if it would be used in a high traffic (for example 10MB message per second).
After some research, I've tried the two following codes. First of all, I removed the first frame ("Bulk Send"):
var client = new DealerSocket("<connection String>");
var serializedData = new string('*', 500000);
var message = new List<byte[]>();
message.Add(Encoding.UTF8.GetBytes(serializedData));
client.TrySendMultipartBytes(TimeSpan.FromMilliseconds(3000), message);
Surprisingly, the performance was improved. Secondly, I rearrange two frames. I mean moving the large frame to the first. Like the following:
var client = new DealerSocket("<connection String>");
var serializedData = new string('*', 500000);
var message = new List<byte[]>();
// change the order of two following codes
message.Add(Encoding.UTF8.GetBytes(serializedData));
message.Add(Encoding.UTF8.GetBytes("BulkSend"));
client.TrySendMultipartBytes(TimeSpan.FromMilliseconds(3000), message);
Again surprisingly the performance was improved!
What's the problem? How can I improve the performance? What about using zproto on netmq? Is there any proper document around that?
As I found, there is a problem with sending multipart message. You can see this link http://hintjens.com/blog:84.
And probably encode your message into one message and then sending it!
High CPU/low performance might be because of GC and memory allocations. I tried to send the same message size with my framework, which uses NetMQ, with server and client working on same machine. In case the client was waiting for the response before sending another message to server I achieved 60K msg/sec. Nevertheless, if the client was sending hundreds of messages in parallel, CPU was as well high and throughput was tens of messages per sec. At some point in time I've even got OutOfMemoryException.
Would be good, if you post the complete code.
UPD: Sorry, I measured it wrong. With with either 100 messages sent in parallel or just one the performance is just 120 msg/sec
I'm currently writing a chat-server, bottom up, in C#.
It's like one single big room, with all the clients in, and then you can initiate private chats also. I've also laid the code out for future integration of multiple rooms (but not necessary right now).
It's been written mostly for fun, but also because I'm going to make a new chatsite for young people like myself, as there are no one left here in Denmark.
I've just tested it out with 170 clients (Written in Javascript with JQuery and a Flash bridge to socket connectivity). The response time on local network from a message being sent to it being delivered was less than 1 second. But now I'm considering what kind of performance I'm able to squeeze out of this.
I can see if I connect two clients and then 168 other, and write on client 2 and watch client 1, it comes up immediately on client 1. And the CPU usage and RAM usage shows no signs of server stress at all. It copes fine and I think it can scale to at least 1000 - 1500 without the slightest problem.
I have however noticed something, and that is if I open the 170 clients again and send a message on client 1 and watch on client 170, there is a log around 750 milliseconds or so.
I know the problem, and that is, when the server receives a chat message it broadcasts it to every client on the server. It does however need to enumerate all these clients, and that takes time. The delay right now is very acceptable for a chat, but I'm worried client 1 sending to client 750 maybe (not tested yet) will take 2 - 3 seconds. And i'm also worried when I begin to get maybe 2 - 3 messages a second.
So to sum it up, I want to speed up the server broadcasting process. I'm already utilizing a parallel foreach loop and I'm also using asynchronous sockets.
Here is the broadcasting code:
lock (_clientLock)
{
Parallel.ForEach(_clients, c =>
{
c.Value.Send(message);
});
}
And here is the send function being invoked on each client:
try {
byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
_socket.BeginSend(bytesOut, 0, bytesOut.Length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
}
catch (Exception ex) { Drop(); }
I want to know if there is any way to speed this up?
I've considered writing some kind of helper class accepting messages in a que and then using maybe 20 threads or so, to split up the broadcasting list.
But I want to know YOUR opinions on this topic, I'm a student and I want to learn! (:
Btw. I like how you spot problems in your code when about to post to stack overflow. I've now made an overloaded function to accept a byte array from the server class when using broadcast, so the UTF-8 conversion only needs to happen once. Also to play it safe, the calculation of the byte array length only happens once now. See the updated version below.
But I'm still interested in ways of improving this even more!
Updated broadcast function:
lock (_clientLock)
{
byte[] bytesOut = System.Text.Encoding.UTF8.GetBytes(message + "\0");
int bytesOutLength = bytesOut.Length;
Parallel.ForEach(_clients, c =>
{
c.Value.Send(bytesOut, bytesOutLength);
});
}
Updated send function on client object:
public void Send(byte[] message, int length)
{
try
{
_socket.BeginSend(message, 0, length, SocketFlags.None, new AsyncCallback(OnSocketSent), null);
}
catch (Exception ex) { Drop(); }
}
~1s sounds really slow for a local network. Average LAN latency is 0.3ms. Is Nagle enabled or disabled? I'm guessing it is enabled... so: change that (Socket.NoDelay). That does mean you have to take responsibility for not writing to the socket in an overly-fragmented way, of course - so don't drip the message in character-by-character. Assemble the message to send (or better: multiple outstanding messages) in memory, and send it as a unit.