Download attachments according to number in subject (MailKit library) - c#

I'm downloading attachments from e-mail with this code:
int count = client.Count();
List<MimeMessage> allMessages = new List<MimeMessage>(count);
for (int i = 0; i < count; i++)
{
allMessages.Add(client.GetMessage(i));
foreach (var attachment in allMessages[i].Attachments)
{
using (var stream = File.Create(AppDomain.CurrentDomain.BaseDirectory + "/folderForSegments/" + attachment.ContentType.Name))
{
if (attachment is MessagePart)
{
var part = (MessagePart)attachment;
part.Message.WriteTo(stream);
}
else
{
var part = (MimePart)attachment;
part.ContentObject.DecodeTo(stream);
}
}
}
}
It works perfectly, but I want to download the attachments in sequence, according to the number in the subject. For example, with my inbox as it currently looks,
the attachments are saved to my disk in the order 6, 8, 7, 3, 2... I want to save them in the order 1, 2, 3, 4, 5... How can I do this?

For POP3, there's no way to download the messages in that order without knowing ahead of time what order the messages were in on the server.
If order is more important than wasted bandwidth, you could download the headers first using client.GetHeader(i) for each message so that you can use the Subject header value to determine the order, but that's a lot of wasted bandwidth because you'd just end up downloading the message headers a second time when you downloaded the messages.
Another option is to download all of the messages, add them to a List<T> and then sort them based on Subject before iterating over the messages and saving the attachments, but this might use too much RAM depending on how large your messages are.
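The "download everything, then sort by Subject" option can be sketched as below. This is a minimal sketch, assuming each subject ends in (or at least contains) the sequence number, as in "damian_mistrz_6"; the SubjectOrder class and its method names are hypothetical helpers, not part of MailKit. Sorting on the parsed number rather than on the raw string matters because a plain string sort would put "10" before "2".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: orders items by the last number found in their subject.
public static class SubjectOrder
{
    // Matches the last run of digits in a string, e.g. "damian_mistrz_12" -> "12".
    private static readonly Regex LastNumber =
        new Regex(@"\d+(?!.*\d)", RegexOptions.Compiled | RegexOptions.Singleline);

    // Returns the last number in the subject, or int.MaxValue when there is none
    // (so subjects without a usable number sort to the end).
    public static int NumberOf(string subject)
    {
        var match = LastNumber.Match(subject ?? string.Empty);
        return match.Success && int.TryParse(match.Value, out var n) ? n : int.MaxValue;
    }

    // Returns a new list ordered by subject number; the input is left untouched.
    public static List<T> SortBySubjectNumber<T>(IEnumerable<T> items, Func<T, string> getSubject)
    {
        return items.OrderBy(item => NumberOf(getSubject(item))).ToList();
    }
}
```

With the question's code this would be called as SubjectOrder.SortBySubjectNumber(allMessages, m => m.Subject) once the download loop has finished, and the attachments would then be saved while iterating over the sorted list.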
Edit:
For IMAP, assuming your server supports the SORT extension, you can do something like this:
if (client.Capabilities.HasFlag (ImapCapabilities.Sort)) {
var query = SearchQuery.SubjectContains ("damian_mistrz_");
var orderBy = new OrderBy[] { OrderBy.Subject };
foreach (var uid in folder.Sort (query, orderBy)) {
var message = folder.GetMessage (uid);
// save attachments...
}
}
If your server does not support SORT, then you could probably do something like this:
var query = SearchQuery.SubjectContains ("damian_mistrz_");
var orderBy = new OrderBy[] { OrderBy.Subject };
var uids = folder.Search (query);
var items = folder.Fetch (uids, MessageSummaryItems.Envelope | MessageSummaryItems.UniqueId);
items.Sort (orderBy);
foreach (var item in items) {
var message = folder.GetMessage (item.UniqueId);
// save the attachments...
}

Related

Doc2Vec (Or Word2Vec) In Catalyst C#: How Do I get it to give results? (FastText)

I'm trying to replicate results from Gensim in C# to compare results and see if we need to bother trying to get Python to work within our broader C# context. I have been programming in C# for about a week, am usually a Python coder. I managed to get LDA to function and assign topics with C#, but there is no Catalyst model (that I could find) that does Doc2Vec explicitly, but rather I need to do something with FastText as they have in their sample code:
// Training a new FastText word2vec embedding model is as simple as this:
var nlp = await Pipeline.ForAsync(Language.English);
var ft = new FastText(Language.English, 0, "wiki-word2vec");
ft.Data.Type = FastText.ModelType.CBow;
ft.Data.Loss = FastText.LossType.NegativeSampling;
ft.Train(nlp.Process(GetDocs()));
ft.StoreAsync();
The claim is that it is simple, and fair enough... but what do I do with this? I am using my own data, a list of IDocuments, each with a label attached:
using (var csv = CsvDataReader.Create("Jira_Export_Combined.csv", new CsvDataReaderOptions
{
BufferSize = 0x20000
}))
{
while (await csv.ReadAsync())
{
var a = csv.GetString(1); // issue key
var b = csv.GetString(14); // the actual bug
// if (jira_base.Keys.ToList().Contains(a) == false)
if (jira.Keys.ToList().Contains(a) == false)
{ // not already in our dictionary... too many repeats
if (b.Contains("{panel"))
{
// get just the details/desc/etc
b = b.Substring(b.IndexOf("}") + 1, b.Length - b.IndexOf("}") - 1);
try { b = b.Substring(0, b.IndexOf("{panel}")); }
catch { }
}
b = b.Replace("\r\n", "");
jira.Add(a, nlp.ProcessSingle(new Document(b,Language.English)));
} // end if
} // end while loop
From a set of Jira Tasks and then I add labels:
foreach (KeyValuePair<string, IDocument> item in jira) { jira[item.Key].Labels.Add(item.Key); }
Then I add them to a list, based on a breakdown from a topic model: every doc that is at or above a threshold for a topic is assigned to that topic, giving jira_topics[n], where n is the topic number:
var training_lst = new List<IDocument>();
foreach (var doc in jira_topics[topic_num]) { training_lst.Add(jira[doc]); }
When I run the following code:
// FastText....
var ft = new FastText(Language.English, 0, $"vector-model-topic_{topic_num}");
ft.Data.Type = FastText.ModelType.Skipgram;
ft.Data.Loss = FastText.LossType.NegativeSampling;
ft.Train(training_lst);
var wtf = ft.PredictMax(training_lst[0]);
wtf is (null,NaN). [hence the name]
What am I missing? What else do I need to do to get Catalyst to vectorize my data? I want to grab the cosine similarities between the jira tasks and some other data I have, but I can't even get the Jira data into anything resembling a vectorization I can apply to something. Help!
Update:
So, Predict methods apparently only work for supervised learning in FastText (see comments below). And the following:
var wtf = ft.CompareDocuments(training_lst[0], training_lst[0]);
Throws an Implementation error (and only doesn't work with PVDM). How do I use PVDM, PVDCbow in Catalyst?
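Whichever model ends up producing the vectors, the cosine-similarity step mentioned in the question does not depend on Catalyst. The sketch below assumes each document has already been turned into a float[] of the same length; how to get those arrays out of Catalyst is not covered here, and VectorMath is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper: plain cosine similarity over two equal-length vectors.
public static class VectorMath
{
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * (double)b[i];
            normA += a[i] * (double)a[i];
            normB += b[i] * (double)b[i];
        }

        // A zero vector has no direction; report 0 instead of dividing by zero.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```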

C# MailKit get messages conversation/replies inside treeview

I'm trying to replicate the view of a mailbox. I tried to use references and threads, but it doesn't work: sometimes a thread has a null UniqueId.
foreach (var rfr in Message.References ?? new MimeKit.MessageIdList())
{
var _uids = Imap.Inbox.Search(SearchQuery.HeaderContains("Message-Id", rfr));
if (_uids.Count > 0)
{
var _messages = Imap.Inbox.Fetch(_uids.ToList(), MessageSummaryItems.Envelope | MessageSummaryItems.Flags).OrderByDescending(o => o.Date);
foreach (var msg in _messages)
{
_Added.Add(msg.UniqueId);
RequestModel _model = new RequestModel
{
Address = msg.Envelope.From.Mailboxes.FirstOrDefault().Name ?? msg.Envelope.From.Mailboxes.FirstOrDefault().Address,
Subject = msg.Envelope.Subject,
Date = msg.Date.ToLocalTime().ToString(),
IsSeen = msg.Flags.Value.HasFlag(MailKit.MessageFlags.Seen),
Childs = new List<Scratch.MainWindow.RequestModel>(),
};
_retValue.Add(_model);
}
}
}
var _messages = _imapClient.Inbox.Fetch(_uids.ToList(), MessageSummaryItems.Envelope | MessageSummaryItems.Flags | MessageSummaryItems.References).OrderByDescending(o => o.Date).Take(50);
var _threads = MessageThreader.Thread(_messages, ThreadingAlgorithm.References);
The MessageThreader class uses the References (which contain a list of Message-Id values tracing back to the root of the thread) in order to construct the tree of messages. Obviously, if the list of message summaries that you give to the MessageThreader is missing some of those references, then the returned tree will have some empty nodes. This is why some of said nodes have a null UniqueId value.
FWIW, a few tips for you:
Don't do _uids.ToList() - _uids is already an IList<UniqueId>, why duplicate it for no reason?
It's more efficient to use the orderBy argument to MessageThreader.
Like this:
var orderBy = new OrderBy[] { OrderBy.ReverseDate };
var threads = MessageThreader.Thread (summaries, ThreadingAlgorithm.References, orderBy);
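When filling a tree view, one way to cope with the empty nodes described above is to drop them and promote their children. The sketch below only illustrates that idea: ThreadNode is a hypothetical stand-in with a nullable id and a list of children, not a MailKit type, so the property names would need adapting to the real thread objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a threaded message: a nullable id plus child nodes.
public sealed class ThreadNode
{
    public uint? Uid { get; }
    public string Subject { get; }
    public List<ThreadNode> Children { get; } = new List<ThreadNode>();

    public ThreadNode(uint? uid, string subject, params ThreadNode[] children)
    {
        Uid = uid;
        Subject = subject;
        Children.AddRange(children);
    }
}

public static class ThreadTree
{
    // Returns a pruned copy of the tree: nodes without an id are removed and
    // their (already pruned) children take their place in the parent's list.
    public static List<ThreadNode> Prune(IEnumerable<ThreadNode> nodes)
    {
        var result = new List<ThreadNode>();
        foreach (var node in nodes)
        {
            var children = Prune(node.Children);
            if (node.Uid.HasValue)
            {
                var copy = new ThreadNode(node.Uid, node.Subject);
                copy.Children.AddRange(children);
                result.Add(copy);
            }
            else
            {
                result.AddRange(children);
            }
        }
        return result;
    }
}
```

The alternative is to keep the empty nodes and render them as a placeholder row (for example "message not available"), which preserves the exact shape of the conversation.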

C# check conversion from List<string> to single string using String.Join, is possible or not?

I have a List<string> of unknown length, and I need to convert the entire List<string> to a single string. Before converting, I want to check whether that is possible (is it going to throw an OutOfMemoryException?), so that I can process as much data as fits and continue with the rest in another batch.
Sample
int drc = ImportConfiguration.Data.Count;
List<string> queries = new List<string>() { };
//iterate over data row to generate query and execute it
for (int drn = 0; drn < drc; drn++)//drn stands for Data Row Number
{
queries.Add(Generate(ImportConfiguration.Data[drn], drn));
//SO HERE I WANT TO CHECK THE SIZE
//IF THE NEXT ITERATION WOULD NOT FIT, THEN I'LL EXECUTE IT RIGHT NOW
//AND EMPTY THE LIST AGAIN FOR THE NEXT BATCH
if (drn == drc - 1 || drn % 5000 == 0)
{
SqlHelper.ExecuteNonQuery(connection, System.Data.CommandType.Text, String.Join(Environment.NewLine, queries));
queries = new List<string>() { };
}
}
Since you are trying to send a large amount of text to a SQL Server instance, you could use SQL Server's streaming support to write the string to the stream as you go, minimizing the amount of memory needed to construct the data to send.
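A simpler alternative to streaming (and a different technique from the one suggested above) is to bound each batch by its size in characters rather than by row count, so no single joined string grows without limit. The sketch below is a minimal illustration under that assumption; QueryBatcher and the maxChars parameter are hypothetical names, and a suitable limit would have to be chosen for the actual workload.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: groups queries into newline-joined batches of bounded size.
public static class QueryBatcher
{
    // Yields batches whose length stays at or below maxChars where possible.
    // A single query longer than maxChars is still emitted, alone in its own batch.
    public static IEnumerable<string> Batch(IEnumerable<string> queries, int maxChars)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var builder = new StringBuilder();
        foreach (var query in queries)
        {
            // Flush first if appending this query would push the batch over the limit.
            if (builder.Length > 0 &&
                builder.Length + Environment.NewLine.Length + query.Length > maxChars)
            {
                yield return builder.ToString();
                builder.Clear();
            }

            if (builder.Length > 0) builder.Append(Environment.NewLine);
            builder.Append(query);
        }

        // Emit whatever is left after the last query.
        if (builder.Length > 0) yield return builder.ToString();
    }
}
```

In the question's loop, each string returned by Batch would be passed to SqlHelper.ExecuteNonQuery in turn. Because Batch is an iterator, only one batch is held in memory at a time if the queries themselves are generated lazily.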
I can't say it is not possible but I think a better way would be to do the join and catch any exceptions:
try
{
var joined = string.Join(",", list);
}
catch(OutOfMemoryException)
{
// join failed, take action (log, notify user, etc.)
}
Note: if the exception is happening, then you need to consider a different approach than using a list and joining.
You could try:
List<string> theList;
try {
String allString = String.Join(",", theList.ToArray());
} catch (OutOfMemoryException e) {
// ... handle OutOfMemoryException exception (e)
}
EDIT
Based on your comment.
You could give an estimation in the following way.
Get available memory: Take a look at this post
Get sum size of your list strings theList.Sum(s => s.Length);
List<string> theList = new List<string>{ "AAA", "BBB" };
// number of characters
var allSize = theList.Sum(s => s.Length);
// memory currently allocated to this process (PrivateMemorySize64 reports usage, not free memory)
Process proc = Process.GetCurrentProcess();
var availableMemory = proc.PrivateMemorySize64;
if (availableMemory > allSize) {
// you can try
try {
String allString = String.Join(",", theList.ToArray());
} catch (OutOfMemoryException e) {
// ... handle OutOfMemoryException exception (e)
}
} else {
// it is not going to work...
}

How to get the recipients of an email using EWS

I am struggling to get the recipients of an email.
I understand that the Recipients is an array, so I need to put them into an array, but my code will not compile:
do
{
// set the properties we need for the entire result set
view.PropertySet = new PropertySet(
BasePropertySet.IdOnly,
ItemSchema.Subject,
ItemSchema.DateTimeReceived,
ItemSchema.DisplayTo, EmailMessageSchema.ToRecipients,
EmailMessageSchema.From, EmailMessageSchema.IsRead,
EmailMessageSchema.HasAttachments, ItemSchema.MimeContent,
EmailMessageSchema.Body, EmailMessageSchema.Sender,
ItemSchema.Body) { RequestedBodyType = BodyType.Text };
// load the properties for the entire batch
service.LoadPropertiesForItems(results, view.PropertySet);
e2cSessionLog("\tcommon.GetUnReadMailAll", "retrieved " + results.Count() + " emails from Mailbox (" + common.strInboxURL + ")");
foreach (EmailMessage email in results)
// looping through all the emails
{
emailSenderName = email.From.Address;
sEmailSubject = email.Subject;
emailDateTimeReceived = email.DateTimeReceived.ToShortDateString();
emailHasAttachments = email.HasAttachments;
ItemId itemId = email.Id;
emailDisplayTo = email.DisplayTo;
sEmailBody = email.Body; //.Text;
Recipients = email.ToRecipients;
....
The last line there will not compile, as apparently I cannot implicitly convert the collection ToRecipients to a string,
so I tried to loop through all the ToRecipients:
string[] Recipients;
for (int iIdx=0; iIdx<-email.ToRecipients.Count; iIdx++)
{
Recipients[iIdx] = email.ToRecipients[iIdx].ToString();
}
But I have obviously not declared this properly, as it won't compile, with the message that Recipients is unassigned.
What is the correct way to assign this?
I need to be able to use the recipients again later, for example to send them a 'heads up' email about a problem.
You need to initialize the array correctly, and you need to use the Address property of a ToRecipient:
var Recipients = new string[email.ToRecipients.Count];
for (int iIdx = 0; iIdx < email.ToRecipients.Count; iIdx++) {
Recipients[iIdx] = email.ToRecipients[iIdx].Address;
}
BTW, I think you have a typo in your pseudo-code:
for(...; iIdx<-email.ToRecipients.Count; ...) {
You have a minus - in there, which would result in no iterations since the first iteration would not pass (0 < -count is false). I think you mean
for(...; iIdx < email.ToRecipients.Count; ...) {
UPDATE
A much simpler, less error-prone, solution would be:
var recipients = email.ToRecipients
.Select(x => x.Address)
.ToList(); // or ToArray()

Why this difference between foreach vs Parallel.ForEach?

Can anyone explain to me in simple language why I get a file of about 65 KB when using foreach and more than 3 GB when using Parallel.ForEach?
The code for the foreach:
// start node xml document
var logItems = new XElement("log", new XAttribute("start", DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ss")));
var products = new ProductLogic().SelectProducts();
var productGroupLogic = new ProductGroupLogic();
var productOptionLogic = new ProductOptionLogic();
// loop through all products
foreach (var product in products)
{
// is in a specific group
var id = Convert.ToInt32(product["ProductID"]);
var isInGroup = productGroupLogic.GetProductGroups(new int[] { id }.ToList(), groupId).Count > 0;
// get product stock per option
var productSizes = productOptionLogic.GetProductStockByProductId(id).ToList();
// any stock available
var stock = productSizes.Sum(ps => ps.Stock);
var hasStock = stock > 0;
// get webpage for this product
var productUrl = string.Format(url, id);
var htmlPage = Html.Page.GetWebPage(productUrl);
// check if there is anything to log
var addToLog = false;
XElement sizeElements = null;
// if has no stock or in group
if (!hasStock || isInGroupNew)
{
// page shows => not ok => LOG!
if (!htmlPage.NotFound) addToLog = true;
}
// if page is ok
if (htmlPage.IsOk)
{
sizeElements = GetSizeElements(htmlPage.Html, productSizes);
addToLog = sizeElements != null;
}
if (addToLog) logItems.Add(CreateElement(productUrl, htmlPage, stock, isInGroup, sizeElements));
}
// save
var xDocument = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), new XElement("log", logItems));
xDocument.Save(fileName);
Use of the parallel code is a minor change, just replaced the foreach with Parallel.ForEach:
// loop through all products
Parallel.ForEach(products, product =>
{
... code ...
});
The methods GetSizeElements and CreateElements are both static.
update1
I made the methods GetSizeElements and CreateElements threadsafe with a lock, also doesn't help.
update2
I got an answer that solves the problem, which is nice. But I would like some more insight into why this code creates a file that is so much bigger than the foreach solution. I am trying to understand how the code behaves when using threads, so that I can learn to avoid the pitfalls.
One thing stands out:
if (addToLog)
logItems.Add(CreateElement(productUrl, htmlPage, stock, isInGroup, sizeElements));
logItems is not thread-safe. That could be your core problem, but there are lots of other possibilities.
You have the output files, look for the differences.
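One way to keep the shared XElement out of the parallel section is to collect the results in a thread-safe collection and only build the XML afterwards on a single thread. The sketch below assumes this pattern fits the code in the question; ParallelLog, Build and the createElement delegate are hypothetical stand-ins for the per-product work. Whether the unsynchronized Add is what actually inflates the file is not confirmed in this thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical helper: runs the per-product work in parallel, assembles XML sequentially.
public static class ParallelLog
{
    public static XElement Build(IEnumerable<int> productIds, Func<int, XElement> createElement)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var collected = new ConcurrentBag<XElement>();

        Parallel.ForEach(productIds, id =>
        {
            // Everything declared inside the lambda is local to one iteration.
            var element = createElement(id);
            if (element != null) collected.Add(element);
        });

        // Only this thread ever touches the XElement, so no lock is needed here.
        // Note that the order of the elements is not deterministic.
        var log = new XElement("log");
        foreach (var element in collected) log.Add(element);
        return log;
    }
}
```

A lock around logItems.Add would also work; collecting first simply keeps the XML objects entirely out of the concurrent code.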
Try declaring the following variables inside the loop body, so that each iteration gets its own instances:
var productGroupLogic = new ProductGroupLogic();
var productOptionLogic = new ProductOptionLogic();
I think these two objects are shared by all of your threads inside the Parallel.ForEach loop, and the results are multiplied unnecessarily as a result.
