I'm using MimeKit to receive and send mail in my project. I forward received mails with some modifications (the To and From parts). Now I also need to modify the body: I want to replace a specific word with asterisk characters. The specific text is different for every mail, and the mail may be in any format. As you can see, I found the text I want, but I don't know how to replace it without causing an error.
MimeMessage.Body is a tree structure, like MIME, so you'll have to navigate to the MimePart that contains the content that you want to modify.
In this case, since you want to modify a text/* MimePart, it will actually be a subclass of MimePart called TextPart which is what has the .Text property (which is writable).
I've written documentation on how to traverse the MIME structure of a message to find the part that you are looking for here: http://www.mimekit.org/docs/html/WorkingWithMessages.htm
A very simple solution might be:
var part = message.BodyParts.OfType<TextPart> ().FirstOrDefault ();
if (part != null) // FirstOrDefault returns null when the message has no text part
    part.Text = part.Text.Replace ("x", "y");
But keep in mind that that logic assumes that the first text/* part you find is the one you are looking for.
I'm trying to read the content body of a message in an Azure Logic App, but I'm not having much success. I have seen a lot of suggestions which say that the body is base64 encoded, and suggest using the following to decode:
@{json(base64ToString(triggerBody()?['ContentData']))}
The base64ToString(...) part is decoding the content into a string correctly, but the string appears to contain a prefix with some extra serialization information at the start:
@string3http://schemas.microsoft.com/2003/10/Serialization/�3{"Foo":"Bar"}
There are also some extra characters in that string that are not being displayed in my browser. So the json(...) function doesn't accept the input, and gives an error instead.
InvalidTemplate. Unable to process template language expressions in
action 'HTTP' inputs at line '1' and column '2451': 'The template
language function 'json' parameter is not valid. The provided value
@string3http://schemas.microsoft.com/2003/10/Serialization/�3{"Foo":"bar" }
cannot be parsed: Unexpected character encountered while parsing value: @. Path '', line 0, position 0.. Please see https://aka.ms/logicexpressions#json for usage details.'.
For reference, the messages are added to the topic using the .NET service bus client (the client shouldn't matter, but this looks rather C#-ish):
await TopicClient.SendAsync(new BrokeredMessage(JsonConvert.SerializeObject(item)));
How can I read this correctly as a JSON object in my Logic App?
This is caused by how the message is placed on the ServiceBus, specifically in the C# code. I was using the following code to add a new message:
var json = JsonConvert.SerializeObject(item);
var message = new BrokeredMessage(json);
await TopicClient.SendAsync(message);
This code looks fine, and works between different C# services no problem. The problem is caused by the way the BrokeredMessage(Object) constructor serializes the payload given to it:
Initializes a new instance of the BrokeredMessage class from a given object by using DataContractSerializer with a binary XmlDictionaryWriter.
That means the content is serialized as binary XML, which explains the prefix and the unrecognizable characters. This is hidden by the C# implementation when deserializing, and it returns the object you were expecting, but it becomes apparent when using a different library (such as the one used by Azure Logic Apps).
There are two alternatives to handle this problem:
Make sure the receiver can handle messages in binary XML format
Make sure the sender actually uses the format we want, e.g. JSON.
Paco de la Cruz's answer handles the first case, using substring, indexOf and lastIndexOf:
@json(substring(base64ToString(triggerBody()?['ContentData']), indexof(base64ToString(triggerBody()?['ContentData']), '{'), add(1, sub(lastindexof(base64ToString(triggerBody()?['ContentData']), '}'), indexof(base64ToString(triggerBody()?['ContentData']), '}')))))
As for the second case, fixing the problem at the source simply involves using the BrokeredMessage(Stream) constructor instead. That way, we have direct control over the content:
var json = JsonConvert.SerializeObject(item);
var bytes = Encoding.UTF8.GetBytes(json);
var stream = new MemoryStream(bytes);
var message = new BrokeredMessage(stream, true);
await TopicClient.SendAsync(message);
You can use the substring function together with indexOf and lastIndexOf to get only the JSON substring.
Unfortunately, it's rather complex, but it should look something like this:
@json(substring(base64ToString(triggerBody()?['ContentData']), indexof(base64ToString(triggerBody()?['ContentData']), '{'), add(1, sub(lastindexof(base64ToString(triggerBody()?['ContentData']), '}'), indexof(base64ToString(triggerBody()?['ContentData']), '}')))))
More info on how to use these functions here.
HTH
Paco de la Cruz's solution worked for me, though I had to swap out the last '}' in the expression for a '{'; otherwise it finds the wrong end of the data segment.
I also split it into two steps to make it a little more manageable.
First I get the decoded string out of the message into a variable (that I've called MC) using:
@{base64ToString(triggerBody()?['ContentData'])}
then in another logic app action do the substring extraction:
@{substring(variables('MC'),indexof(variables('MC'),'{'),add(1,sub(lastindexof(variables('MC'),'}'),indexof(variables('MC'),'{'))))}
Note that the last string literal '{' is reversed from Paco's solution.
This is working for my test cases, but I'm not sure how robust this is.
Also, I've left it as a string; I do the conversion to JSON later in my logic app.
UPDATE
We have found that just occasionally (2 in several hundred runs) the text that we want to discard can contain the '{' character.
I have modified our expression to explicitly locate the start of the data segment, which for me is:
'{"IntegrationRequest"'
so the substitution becomes:
@{substring(variables('MC'),indexof(variables('MC'),'{"IntegrationRequest"'),add(1,sub(lastindexof(variables('MC'),'}'),indexof(variables('MC'),'{"IntegrationRequest"'))))}
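The index arithmetic in these expressions is easy to check outside the Logic App. Here is a minimal Python sketch of the same extraction. The function name `extract_json`, the sample prefix bytes and the `Id` field are made up for illustration; they are not the real wire format.

```python
import json


def extract_json(decoded: str, start_marker: str = "{") -> str:
    """Return the text from start_marker through the last '}' (inclusive)."""
    start = decoded.index(start_marker)   # like indexof(..., '{')
    end = decoded.rindex("}")             # like lastindexof(..., '}')
    length = 1 + end - start              # like add(1, sub(lastindexof, indexof))
    return decoded[start:start + length]


# Made-up stand-in for the binary XML prefix; only the JSON tail matters here.
sample = '@\x06string\x083http://schemas.microsoft.com/2003/10/Serialization/\x9a\x0d{"Foo":"Bar"}'
payload = json.loads(extract_json(sample))

# A stray '{' in the discarded prefix breaks the default marker,
# so anchor on the known start of the payload instead.
noisy = 'junk{junk{"IntegrationRequest":{"Id":1}}'
request = json.loads(extract_json(noisy, '{"IntegrationRequest"'))
```

The length has to be measured from the opening marker. If the index of '}' is subtracted instead, the length collapses to 1 whenever the payload contains a single closing brace, which is consistent with the swap described above.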
I'm sending my FIX market data request message as:
8=FIXT.1.1☺9=168☺35=V☺34=2☺49=XXXXX☺52=20160622-09:50:59.240☺56=XXXX☺262=1976060316☺263=1☺264=1☺265=0☺267=1☺269=0☺146=1☺55=ABC☺48=ABC☺22=8☺167=FXNDF☺762=PERIOD☺20000=1M☺10=165☺
In this 35=V message, I used the following field order:
55=ABC|48=ABC|22=8|167=FXNDF|762=PERIOD|20000=1M
I want to rearrange the fields into this order:
146=1|55=ABC||167=FXNDF|762=PERIOD|48=ABC|20000=1M|22=8
I'm using the QuickFIX DLL.
The out-of-range exception you are getting is not a problem with the order of the fields in the FIX4.4 message but with the contents of one particular field, tag 625. This tag is usually called TradingSessionSubID and usually expects STRING content. However, your configuration could easily redefine it as something else entirely. Your dictionary definition file has the precise requirement for your implementation, so that is the best place to look. The file is often named something like FIX44.xml.
There is rarely any need to rearrange the order of your FIX message fields. You could try changing the order of the fields for your particular message in the dictionary definition file and see if that does it for you.
I have a web application that allows users to upload Outlook mails (*.msg) through a file upload.
The customer wants to forbid storing mails that are digitally signed or encrypted.
So after the mail is uploaded, I need to somehow check whether it is signed or encrypted.
Is there a way to check that? Like a pattern in the stream of the file?
Checking for the English words not only is valid, but it is the actual documented way.
Refer to the authority:
2.1.3.1.3 Recognizing a Message Object that Represents a Clear-Signed Message
If a Message object has a message class (PidTagMessageClass property
([MS-OXCMSG] section 2.2.1.3)) value of
"IPM.Note.SMIME.MultipartSigned" and contains exactly one Attachment
object, it SHOULD be treated as a clear-signed message. Additional
verification steps can be performed to verify that the Attachment
object is marked with the appropriate media type (for example, the
PidTagAttachMimeTag property ([MS-OXPROPS] section 2.680) has a value
of "multipart/signed") and represents a valid multipart/signed MIME
entity as specified in [RFC1847]. If the message class value is not
"IPM.Note.SMIME.MultipartSigned" but it ends with the suffix
".SMIME.MultipartSigned", the Message object MAY<7><8> be treated as a
clear-signed message.
If a Message object with a message class value of
"IPM.Note.SMIME.MultipartSigned" does not have the structure specified
in section 2.1.3.1, the behavior is undefined.
2.1.3.2.3 Recognizing a Message Object that Represents an Opaque-Signed or Encrypted S/MIME
If a Message object has the message class (PidTagMessageClass property
([MS-OXCMSG] section 2.2.1.3)) value of "IPM.Note.SMIME" and contains
exactly one Attachment object, it SHOULD be treated as an
opaque-signed message or an encrypted message. Additional verification
steps can be performed to verify that the Attachment object is marked
with the appropriate media type (for example, the PidTagAttachMimeTag
property ([MS-OXPROPS] section 2.680) is either
"application/pkcs7-mime" or "application/x-pkcs7-mime", or it is
"application/octet-stream" and filename, as specified by the
PidTagAttachFilename property ([MS-OXPROPS] section 2.671), and has a
file extension ".p7m") and represents a valid encrypted or
opaque-signed message, as specified in [RFC3852]. If the value of the
message class is not "IPM.Note.SMIME", but ends with the suffix
".SMIME", then the Message object MAY<11> be treated as an
opaque-signed message or an encrypted message.
The message class value "IPM.Note.SMIME" can be ambiguous.<12>
If a Message object has a message class value of "IPM.Note.SMIME" does
not have the appropriate structure or content as specified in section
2.1.3.2, then the behavior is undefined.
EDIT:
To be more specific, yes, you SHOULD look for a "pattern in the Stream of the file".
Specifically, if the MSG is Unicode, you would scan the "__substg1.0_001A001F" stream, and check for the patterns mentioned above.
The MSG file is an OLE Structured Storage file that contains streams and storages. To get at the streams, use an OLE Storage library like OpenMCDF if you are in the C# world. There are similar ones for Java, Python, etc.
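Once the bytes of that stream are in hand, the check itself is a string comparison. The Python sketch below covers only that last step. Both function names are hypothetical, reading the stream out of the OLE file is left to whichever storage library you use, and the case-insensitive comparison is an assumption rather than something the quoted text states.

```python
def decode_message_class(stream_bytes: bytes) -> str:
    """Decode the raw bytes of the __substg1.0_001A001F stream (UTF-16LE text)."""
    return stream_bytes.decode("utf-16-le").rstrip("\x00")


def classify_message_class(message_class: str) -> str:
    """Return 'clear-signed', 'opaque-signed-or-encrypted' or 'other'."""
    value = message_class.lower()  # assumption: compare case-insensitively
    # Check the longer suffix first, because it also ends in ".multipartsigned"
    # rather than ".smime" and must not fall through to the second rule.
    if value.endswith(".smime.multipartsigned"):
        return "clear-signed"
    if value.endswith(".smime"):
        return "opaque-signed-or-encrypted"
    return "other"


raw = "IPM.Note.SMIME".encode("utf-16-le")  # what the stream would contain
kind = classify_message_class(decode_message_class(raw))
```

This treats the exact values (SHOULD in the quoted text) and the suffix-only matches (MAY) the same way, and it skips the optional attachment checks. Whether that is strict enough depends on how much you trust the uploaded files.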
This blog post describes the format pretty well and another post by the same author describes exactly what you're after, which is information on rights managed mail messages.
Essentially as long as the message conforms to the file format these posts and specifications should give you all you need to check for signatures and encryption.
Checking for English words is a bad idea. What if users don't write in English, and what if a pseudo-random stream of encrypted data happens to create words like "or" or "and" in some encoding they're using? It's just not reliable.
EDIT:
To clarify what I mean when I say that checking for English words is a bad idea: simply scanning over the file and checking whether a certain set of words is present is the bad idea. Since someone downvoted this solution, I feel that they may have misunderstood what I was saying because of this ambiguity.
As another user indicated in their answer, parsing the object out and actually handling conditions in the data is fine. You can see from their post that it is the documented method and works fine, because it's based on the standards. This is similar to the information I gave here with the two posts and the format specification.
To open the message and look into it, I suggest using Outlook Redemption. This is what I use, and it works without Outlook installed on the server.
If the GetMessageFromMsgFile method returns an RDOEncryptedMessage, it means your mail is encrypted or signed.
I have an application that allows a user to copy and paste HTML into a form. This HTML gets sent as an email, and the email server will not allow more than 1000 characters per line. So, I'd like to insert line breaks (\r\n) into the HTML after the user has hit submit. How can I do this without changing the content?
My idea is this:
html.replace('<', '\r\n<');
But is that guaranteed to not change the result? Is '<' not allowed in attributes?
Edit: I'm actually thinking this will not work because the html could have a script block with something like if(x < 3). I guess what I need is an html pretty printer that works in either js or C#.
If you Base64 encode the content, then you can break up the content into however many lines you want.
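To see why a transfer encoding sidesteps the line limit, here is a small Python sketch using only the standard library. The sample HTML is made up, and in a real application the mail library would normally apply the encoding and set the matching Content-Transfer-Encoding header for you.

```python
import base64
import quopri

# One single line of HTML, far over the 1000-character limit.
html = '<p style="color:red">' + "x" * 3000 + "</p>"
data = html.encode("utf-8")

b64_body = base64.encodebytes(data)   # inserts a newline after every 76 characters
qp_body = quopri.encodestring(data)   # quoted-printable with soft line breaks

longest_b64 = max(len(line) for line in b64_body.splitlines())
longest_qp = max(len(line) for line in qp_body.splitlines())

# Decoding restores the original bytes, so the HTML itself is unchanged.
assert base64.decodebytes(b64_body) == data
assert quopri.decodestring(qp_body) == data
```

The line breaks belong to the encoding, not to the content, so the receiving client removes them when it decodes the body.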
Email MIME standard uses transfer encoding techniques to solve this problem. Ideally you would be using a mail library that takes care of this for you, so you can insert lines of any length.
Using the System.Net.Mail.MailMessage class in C#, you should be able to construct a normal message and it will transfer-encode it for you. If that doesn't work, you can also construct a multi-part message with a single System.Net.Mail.AlternateView and set the transfer-encoding explicitly.
Here is a sample I am currently using (note it has a character encoding bug, so your body text must be a unicode string):
private void Send(string body, bool isHtml, string subject, string recipientAddress, string recipientName, string fromAddress)
{
    using (var message = new MailMessage(new MailAddress(fromAddress),
        new MailAddress(recipientAddress, recipientName)))
    {
        message.Subject = subject;
        var alternateView = AlternateView.CreateAlternateViewFromString(body, message.BodyEncoding,
            isHtml ? "text/html" : "text/plain");
        alternateView.TransferEncoding = TransferEncoding.QuotedPrintable;
        message.AlternateViews.Add(alternateView);
        var client = new SmtpClient();
        client.Send(message);
    }
}
You're getting into dangerous territory attempting to parse HTML with a replace function. The easiest method would be to just display a warning box on the form that tells the user that lines cannot be longer than 1000 characters, and return an error message if they attempt to submit content with lines over that length.
Otherwise, you could insert a linebreak after X number of characters, and insert some special markup (like <!--AUTO-LINEBREAK-->, or similar) that informs whoever is receiving the e-mail that an automatic line break was inserted.
Add normal line breaks where you think they should be. For example:
Off the top of my head, find all <p>, <table>, <tr>, <td>, <br>, and <div> tags and add a \r\n right before them.
Once that is done, loop through all the lines one more time. If there are any that are still 1000+ characters long, I would insert a \r\n in the whitespace.
Also, you should be removing any script tags from the HTML email body. Having script tags can cause all types of problems (marked as spam, marked as a virus, blocked, etc..).
I am not sure how you are delivering your email... if it is handed off to a PHP script that then sends it to a mail server or uses the mail() function, then this link might help.
http://php.net/manual/en/function.wordwrap.php
If not, can you clarify your question a bit?
Another simple thought is that you could use:
html.replace('','\r\n');
or:
html.replace('',''+String.fromCharCode(13));//inserts a carriage return
However, since the HTML will ideally be parsed in the browser, inserting "\r\n" may not be effective and may actually just display as "\r\n"....
Hope any of this is helpful.
Does anyone have any suggestions as to how I can clean the body of incoming emails? I want to strip out disclaimers, images and maybe any previous email text that may also be present, so that I am left with just the body text content. My guess is it isn't going to be possible in any reliable way, but has anyone tried it? Are there any libraries geared towards this sort of thing?
In email, there are a couple of agreed markings that identify content you may wish to strip. You can look for these lines using regular expressions. I doubt you can really "sanitize" your emails well, but here are some things you can look for:
Line starting with "> " (greater than then whitespace) marks a quote
Line with "-- " (two hyphens then whitespace then linefeed) marks the beginning of a signature, see Signature block on Wikipedia
Multipart messages, boundaries start with --, beyond that you need to do some searching to separate the message body parts from unwanted parts (like base64 images)
As for an actual C# implementation, I leave that for you or other SOers.
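For the first two markers, a regular-expression pass over the lines is enough to show the idea. This is a minimal sketch in Python rather than C#. The function name and the sample message are made up, and it ignores multipart boundaries entirely.

```python
import re

QUOTE_LINE = re.compile(r"^>")               # quoted text from an earlier mail
SIGNATURE_DELIMITER = re.compile(r"^-- ?$")  # "-- " (the trailing space is sometimes lost)


def strip_quotes_and_signature(body: str) -> str:
    """Drop quoted lines and everything from the signature delimiter onward."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_DELIMITER.match(line):
            break  # the rest of the message is the signature block
        if QUOTE_LINE.match(line):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


sample = (
    "Hi Mark,\r\nThis is my message.\r\n"
    "> earlier reply text\r\n"
    "-- \r\nJohn Smith\r\n888-333-4434"
)
cleaned = strip_quotes_and_signature(sample)
```

Messages whose signature has no delimiter, or that quote with something other than ">", pass through unchanged, so treat this as a first filter only.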
A few obvious things to look at:
If the mail is anything but pure plain text, the message will be multi-part MIME. Any part whose type is "image/*" (image/jpeg, etc.) can probably be dropped. In all likelihood any part whose type is not "text/*" can go.
A HTML message will probably have a part of type "multipart/alternative" (I think), and will have 2 parts, one "text/plain" and one "text/html". The two parts should be just about equivalent, so you can drop the HTML part. If the only part present is the HTML bit, you may have to do a HTML to plain text conversion.
The usual format for quoted text is to precede the text by a ">" character. You should be able to drop these lines, unless the line starts ">From", in which case the ">" has been inserted to prevent the mail reader from thinking that the "From " is the start of a new mail.
The signature should start with "-- \r\n", though there is a very good chance that the trailing space will be missing.
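The part selection described in the first two points can be sketched with Python's standard `email` package. The function name `preferred_text` and the demo message are made up for illustration. The sketch falls back to the HTML part when there is no plain-text part, but it does not convert that HTML to text.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def preferred_text(raw_message: str) -> str:
    """Return the text/plain body if present, else the text/html body, else ''."""
    msg = message_from_string(raw_message, policy=policy.default)
    # get_body skips attachments and prefers the subtypes in the order given.
    part = msg.get_body(preferencelist=("plain", "html"))
    return part.get_content() if part is not None else ""


# Build a small multipart message to try it on.
demo = EmailMessage()
demo["Subject"] = "demo"
demo.set_content("Hello in plain text")
demo.add_alternative("<p>Hello in HTML</p>", subtype="html")
demo.add_attachment(b"\x89PNG fake image bytes", maintype="image",
                    subtype="png", filename="logo.png")

text = preferred_text(demo.as_string()).strip()
```

Image and other non-text parts are never returned, which matches the "anything that is not text/* can go" rule of thumb above.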
Version 3 of OSBF-Lua has a mail-parsing library that will handle the MIME and split a message into its MIME parts and so on. I currently have a mess of Lua scripts that do stuff like ignore most non-text attachments, prefer plain text to HTML, and so on. (I also wrap long lines to 80 characters while trying to preserve quoting.)
As far as removing previously quoted mail, the suggestions above are all good (you must subscribe to some ill-mannered mailing lists).
Removing disclaimers reliably is probably going to be hard. My first cut would be simply to maintain a library of disclaimers that would be stripped off the end of each mail message; I would write a script to make it easy for me to add to the library. For something more sophisticated I would try some kind of machine learning.
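The "library of disclaimers" first cut could look something like the Python sketch below. The function name and the single library entry are hypothetical. It only matches a disclaimer that appears verbatim at the very end of the message.

```python
from typing import Sequence


def strip_trailing_disclaimer(body: str, disclaimers: Sequence[str]) -> str:
    """Remove the first known disclaimer found at the very end of body."""
    text = body.rstrip()
    for disclaimer in disclaimers:
        candidate = disclaimer.strip()
        if candidate and text.endswith(candidate):
            return text[: -len(candidate)].rstrip()
    return text


# Hypothetical library entry; real entries would be collected from actual mail.
KNOWN_DISCLAIMERS = [
    "This message is confidential and intended only for the addressee.",
]

body = (
    "See you Monday.\n\n"
    "This message is confidential and intended only for the addressee.\n"
)
cleaned = strip_trailing_disclaimer(body, KNOWN_DISCLAIMERS)
```

Exact matching is brittle against re-wrapped or slightly edited disclaimers, which is where the more sophisticated approach would come in.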
I've been working on spam filtering since Feb 2007 and I've learned that anything to do with email is a mess. A good rule of thumb is that whatever you want to do is a lot harder than you think it is :-(
Given your question "Is it possible to programmatically ‘clean’ emails?", I'd answer "No, not reliably".
The danger you face isn't really a technological one, but a sociological one.
It's easy enough to spot, and filter out, some aspects of the messages - like images. Filtering out signatures and disclaimers is, likewise, possible to achieve (though more of a challenge).
The real problem is the cost of getting it wrong.
What happens if your filter happens to remove a critical piece of the message? Can you trace it back to find the missing piece, or is your filtering destructive? Worse, would you even notice that the piece was missing?
There's a classic comedy sketch I saw years ago that illustrates the point. Two guys working together on a car. One is underneath doing the work, the other sitting nearby reading instructions from a service manual - it's clear that neither guy knows what he's doing, but they're doing their best.
Manual guy, reading aloud: "Undo the bolt in the centre of the oil pan ..." [turns page]
Tool guy: "Ok, it's out."
Manual guy: "... under no circumstances."
If you are creating your own application, I'd look into Regex to find text and replace it. To make the application a little nicer, I'd create a class called Email with a property called RAW and a property called Stripped.
Just some hints, you'll gather the rest when you look into regex!
SigParser has an assembly you can use in .NET. It gives you the body back in both HTML and text forms with the rest of the stuff stripped out. If you give it an HTML email it will convert the email to text if you need that.
var parser = new SigParser.EmailParsing.EmailParser();
var result = await parser.GetCleanedBodyAsync(new SigParser.EmailParsing.Models.CleanedBodyInput {
    FromEmailAddress = "john.smith@example.com",
    FromName = "John Smith",
    TextBody = @"Hi Mark,
This is my message.
Thanks
John Smith
888-333-4434"
});
// This would print "Hi Mark,\r\nThis is my message."
Console.WriteLine(result.CleanedBodyPlain);