Weird characters in email - c#

I have written a mail-processing program, which basically slaps a template on incoming mail and forwards it on. Incoming mail goes to a Gmail account, which I download using POP, then I read the mail (both html and plain text multipart-MIME), make whatever changes I need to the template, then create a new mail with the appropriate plain+html text and send it on to another address.
Trouble is, when the mail gets to the other side, some of the mails have been mangled, with weird characters like Ã and Â magically getting inserted. They weren't in the original mails, they're not in my template, and I can't find any sort of predictable pattern as to when these characters appear. I'm sure it's got something to do with the encoding properties of the mails, but I am making sure to set both the charset and the transfer encoding of the outgoing mail to be the same as the incoming mail. So what else do I need to do?
EDIT: Here's a snipped sample of an incoming mail:
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
=0A=0ASafari Special:=0A=0A=A0=0A=0ASafari in Thornybush Priv=
ate Game Reserve 9-12=0AJanuary 2012 (3nights)
After processing, this comes out as:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
=0D=0A=0D=0ASafari Special:=0D=0A=0D=0A=C2=A0=0D=0A=0D=0A=
Safari in Thornybush Private Game Reserve 9-12=0D=0AJanuary=
2012 (3nights)
Notice the insertion of the =0D and =C2 characters (aside from a few =0A's that weren't in the original).
So what do you think is happening here?
ANOTHER CLUE: Here's my code that creates the alternate view:
var htmlView = AlternateView.CreateAlternateViewFromString(htmlBody, null, "text/html");
htmlView.ContentType.CharSet = charSet;
htmlView.TransferEncoding = transferEncoding;
m.AlternateViews.Add(htmlView);
Along the lines of what @mjwills suggested, perhaps the CreateAlternateViewFromString() method already assumes UTF-8, and changing it later to iso-8859-1 doesn't make a difference?

So every =0A is becoming =0D=0A.
And every =A0 is becoming =C2=A0.
The former looks like it might be related to Carriage Return / Line Feeds.
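The latter looks like it might be related to the question What is "=C2=A0" in MIME encoded, quoted-printable text?: C2 A0 is the UTF-8 encoding of the non-breaking space, which is the single byte A0 in iso-8859-1.
My guess is that even though you have specified the charset, something along the line is treating the text as UTF-8.
You may want to try using the form of CreateAlternateViewFromString where ContentType.CharSet is set appropriately when the view is created, rather than changed afterwards.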
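A minimal sketch of what the two byte patterns mean, and of supplying the encoding when the view is created. CharsetDemo, HexBytes, Misread and BuildHtmlView are hypothetical names used only for illustration. The sketch assumes the three-argument overload of CreateAlternateViewFromString (content, encoding, media type); whether this removes the stray characters in the asker's setup has not been tested here.

```csharp
using System;
using System.Net.Mail;
using System.Text;

public static class CharsetDemo
{
    // Hex dump of a string's bytes in the given encoding, e.g. "C2-A0".
    public static string HexBytes(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    // Encodes text as UTF-8, then decodes those bytes as iso-8859-1.
    // This is the mismatch that produces characters such as Â and Ã.
    public static string Misread(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    // Hypothetical helper: passes the encoding when the view is created
    // instead of passing null and changing ContentType.CharSet afterwards.
    public static AlternateView BuildHtmlView(string htmlBody, string charSet)
    {
        Encoding encoding = Encoding.GetEncoding(charSet);
        return AlternateView.CreateAlternateViewFromString(htmlBody, encoding, "text/html");
    }
}
```

HexBytes("\u00A0", Encoding.GetEncoding("iso-8859-1")) gives A0, while HexBytes("\u00A0", Encoding.UTF8) gives C2-A0, and Misread("\u00A0") starts with Â, which matches the symptoms in the question.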

Related

SendGrid inbound parse nordic chars

Completely stuck on a problem related to the inbound parse webhook functionality offered by SendGrid: https://sendgrid.com/docs/for-developers/parsing-email/setting-up-the-inbound-parse-webhook/
First off everything is working just fine with retrieving the mail sent to my application endpoint. Using Request.Form I'm able to retrieve the data and work with it.
The problem is that we started noticing question mark symbols instead of letters when receiving some mails (written in Swedish using Å Ä and Ö). This occurred both when sending plaintext mails and mails with an HTML body.
However, this only happens every now and then. After a lot of searching I found out that if the mail is sent from e.g. Postbox or Outlook (or the like), and the application has the charset set to iso-8859-1 that's when Å Ä Ö is replaced by question marks.
To replicate the error and be able to debug it, I set up an HTML page with a form using the iso-8859-1 encoding that sends a payload similar to the default one seen in the link above. Since then I have tested a multitude of things trying to get it to work.
As of now I'm trying to recode the input, without success. Code I'm testing:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(Request.Form["html"]);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
This only results in utf8String containing the same "???" where Å Ä Ö should be. My guess is that Request.Form["html"] returns a UTF-16 string built from content that was already handled in the wrong encoding (iso-8859-1).
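A rough sketch of why a conversion applied to the string may come too late. It assumes (the question does not confirm this) that the iso-8859-1 bytes were decoded as UTF-8 before the form value reached the code. LossyDecodeDemo and ReceiveAndConvert are hypothetical names, and iso-8859-1 stands in for code page 1252 so the sketch needs no extra encoding provider on .NET Core.

```csharp
using System;
using System.Text;

public static class LossyDecodeDemo
{
    // Simulates a sender that encodes as iso-8859-1 and a receiver that decodes
    // as UTF-8, then applies a conversion like the one in the question.
    public static string ReceiveAndConvert(string sent)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] wireBytes = latin1.GetBytes(sent);

        // Assumed step: bytes that are not valid UTF-8 are replaced with U+FFFD,
        // so the original byte values are lost at this point.
        string received = Encoding.UTF8.GetString(wireBytes);

        // Conversion on the already-decoded string. U+FFFD has no iso-8859-1
        // byte, so the encoder substitutes '?'.
        byte[] sourceBytes = latin1.GetBytes(received);
        byte[] utf8Bytes = Encoding.Convert(latin1, Encoding.UTF8, sourceBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

Under that assumption, ReceiveAndConvert("\u00C5\u00C4\u00D6") contains only question marks, while plain ASCII text passes through unchanged. If this is what happens, the fix would have to be applied where the raw request bytes are decoded rather than to the string afterwards.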
The method for fetching the POST is as follows
public async Task<InboundParseModel> FetchMail(IFormCollection form)
{
    InboundParseModel _em = new InboundParseModel
    {
        To = form["to"].SingleOrDefault(),
        From = form["from"].SingleOrDefault(),
        Subject = form["subject"].SingleOrDefault(),
        Html = form["html"].SingleOrDefault(),
        Text = System.Net.WebUtility.HtmlEncode(form["text"].SingleOrDefault()),
        Envelope = form["envelope"].SingleOrDefault()
    };
    return _em;
}
It is called from the method that receives the POST, as FetchMail(Request.Form);
Project info: ASP.NET Core 2.2, C#
So as stated earlier, I am completely stuck and don't really have any ideas on how to solve this. Any help would be much appreciated!

Avoid "3D" near = on Exchange 2010

I have a question related to encoding on Microsoft Exchange servers. I have built an app that processes messages on Exchange, and one of its options is to always force the encoding to "US-ASCII".
As long as the mails go directly through Exchange protocols, there is no problem. I have noticed the issue with messages sent by third-party mail clients (e.g. Thunderbird) over SMTP.
Although the charset shown in the message source is US-ASCII, "3D" appears after each = character, so the source is corrupted and some parts of the message (e.g. images) are not displayed correctly.
To resolve this problem I have tried to force 7-bit content transfer encoding, but the issue still persists.
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
<html><head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dus-ascii"=
>
</head>
<body bgcolor=3D"#FFFFFF" text=3D"#000000">
dsadsadsadsdsdsadasdsadasdsad<b>dsa</b>
<p style=3D"FONT-FAMILY: Arial" id=3D"c1-id-6">Some signature with image.=
</p><p style=3D"FONT-FAMILY: Arial" id=3D"c1-id-7"><img alt=3D"" src=3D"cid=
:img1.jpg" id=3D"c1-id-8"></p><p style=3D"FONT-FAMILY: Arial" id=3D"c1-id-9=
"> </p></body>
</html>
As long as the message is processed by my app, the "3D" does not appear, even after changing the charset.

Your choice of content transfer encoding is causing this: Content-Transfer-Encoding: quoted-printable
Quoted-printable uses the equals sign as an escape character, so the mail server has dutifully escaped all the 'raw' equals signs for you.
Quoted-Printable, or QP encoding, is an encoding using printable ASCII characters (alphanumeric and the equals sign "=") to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. It is defined as a MIME content transfer encoding for use in e-mail.
QP works by using the equals sign "=" as an escape character.
If you want to process this properly, look for every '=' character in your content (not the headers), read the next two characters, and replace the '=XX' triple with the byte whose hexadecimal value is XX. Under this scheme "=3D" decodes to "=". An '=' at the very end of a line is a soft line break and is removed together with the line ending.
For more information on Content-Transfer-Encoding, refer to section 5 of RFC 1341 and of RFC 1521 at least; also consider reading the RFCs that obsolete them, such as RFC 2045.
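The decoding procedure described above can be sketched as follows. This is a simplified illustration, not a complete RFC implementation: QuotedPrintable, DecodeBytes and Decode are hypothetical names, the input is assumed to contain only ASCII characters, and malformed escapes are passed through unchanged.

```csharp
using System;
using System.IO;
using System.Text;

public static class QuotedPrintable
{
    // Decodes a quoted-printable body into raw bytes.
    public static byte[] DecodeBytes(string body)
    {
        using (var output = new MemoryStream())
        {
            int i = 0;
            while (i < body.Length)
            {
                char c = body[i];
                if (c != '=')
                {
                    // Literal character (input is assumed to be ASCII).
                    output.WriteByte((byte)c);
                    i++;
                }
                else if (i + 1 < body.Length && body[i + 1] == '\n')
                {
                    i += 2; // soft line break "=\n": drop it
                }
                else if (i + 2 < body.Length && body[i + 1] == '\r' && body[i + 2] == '\n')
                {
                    i += 3; // soft line break "=\r\n": drop it
                }
                else if (i + 2 < body.Length
                         && Uri.IsHexDigit(body[i + 1]) && Uri.IsHexDigit(body[i + 2]))
                {
                    // "=XX" becomes the byte with hexadecimal value XX.
                    output.WriteByte(Convert.ToByte(body.Substring(i + 1, 2), 16));
                    i += 3;
                }
                else
                {
                    output.WriteByte((byte)'='); // malformed escape: keep the '='
                    i++;
                }
            }
            return output.ToArray();
        }
    }

    // Decodes the body and converts the bytes to text using the part's charset.
    public static string Decode(string body, Encoding charset)
    {
        return charset.GetString(DecodeBytes(body));
    }
}
```

For example, QuotedPrintable.Decode("content=3D\"text/html\"", Encoding.ASCII) yields content="text/html". In practice an existing MIME library is likely a safer choice than a hand-written decoder.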

POP3 receive email encoding C#

I use POP3 to receive email, but I get encoding errors: for example, an email Subject of "主题" turns into "涓婚". The errors only occur with Chinese text; when the text is in English, there are no errors. Can anyone tell me what I should do about it? The code is below:
POP3 pop = new POP3();
pop.Connect("userName", "password", "pop.126.com", 110);//smtp.126.com
pop.DownloadMessages();
for (int i = 1; i < pop.Messages.Count; i++)
{
    Email email = new Email();
    Message msg = pop.Messages[i];
    email.From = msg.From;
    email.FromName = msg.FromName;
    email.Body = msg.HTMLBody;
    email.Title = msg.Subject;
}

I'm not sure what POP3 library you are using, but it is clearly broken and there's nothing you can do to "fix" your code to make it work beyond switching to another POP3 library, such as my MailKit library which is the only library that correctly handles charsets in all cases (most will handle Latin1 ok, but completely fail for CJK charsets).
The reason that most clients break for anything outside of Latin1 (ISO-8859-1) is that most email libraries have parsers that only work on strings. In order to convert the message data from bytes into a string, they need to pick a System.Text.Encoding (and most pick ISO-8859-1). They assume that email messages follow the rules outlined in the various RFCs that restrict email headers to US-ASCII, but it is very common for clients to ignore these rules.
Unlike those other parsers, MailKit's email parser parses byte streams and so does not require charset conversion before it can start parsing a message. This allows the parser to properly handle mixed charsets in the headers and body.
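To illustrate the difference between the two approaches, here is a small sketch of splitting a header line while it is still bytes and decoding the value only afterwards. RawHeader and Split are hypothetical names; this is not MailKit's implementation, only a demonstration of the idea under the assumption that the header name is ASCII.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RawHeader
{
    // Splits "Name: value" at the first ':' byte. The name is decoded as ASCII;
    // the value is returned as raw bytes so the caller can pick the charset.
    public static KeyValuePair<string, byte[]> Split(byte[] line)
    {
        int colon = Array.IndexOf(line, (byte)':');
        if (colon < 0)
            throw new FormatException("Not a header line.");

        string name = Encoding.ASCII.GetString(line, 0, colon);

        // Skip the whitespace that follows the colon.
        int start = colon + 1;
        while (start < line.Length && (line[start] == (byte)' ' || line[start] == (byte)'\t'))
            start++;

        byte[] value = new byte[line.Length - start];
        Array.Copy(line, start, value, 0, value.Length);
        return new KeyValuePair<string, byte[]>(name, value);
    }
}
```

With a raw line holding UTF-8 bytes for "Subject: 主题", decoding the returned value with Encoding.UTF8 gives back 主题, whereas a parser that had first converted the whole message with iso-8859-1 would have turned those six bytes into six unrelated characters.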

Parsing MIME mail type

After a lot of effort I created my own mail parser, and I am now able to parse and display emails successfully. But a few mails, especially those sent from Apple devices or iPhones, appear like this after parsing. I have no idea why this is happening. Please help.
=D8=AA=D9=88=D8= =A7=D8=AC=D9=87=D9=86=D9=8A =D9=85=D8=B4=D9=83=D9=84=D8=A9 =D8=A5=D8=B4=D8= =A7=D8=B1=D8=A9 =D9=84=D9=84=D9=83=D8=B1=D8=AA =D8=B1=D9=82=D9=85 410814189= 68 =D8=B9=D9=84=D9=85=D8=A7=D9=8B =D8=A8=D8=A3=D9=86 =D8=A5=D8=B4=D8=

It would appear that your mail parser does not handle decoding of Quoted-Printable content.
I imagine that if you looked at the headers, you'd find a header like this:
Content-Transfer-Encoding: quoted-printable
I've written several email clients and multiple mime parsers and am currently working on writing a new mime parser in C# (the others were in C) called MimeKit here: http://github.com/jstedfast/MimeKit. This may be of interest to you...
I've got a filterable stream class that you can add a QuotedPrintableDecoder to (which I've also implemented), then pass your data through that to decode it. Or you could just pass it through the QuotedPrintableDecoder directly, depending on whatever is easiest for you.
Example usage:
var decoder = new QuotedPrintableDecoder ();
var output = new byte[decoder.EstimateOutputLength (input.Length)];
var outputLength = decoder.Decode (input, 0, input.Length, output);
// convert the output into a string displayable to the user...
var text = System.Text.Encoding.UTF8.GetString (output, 0, outputLength);
Obviously you'd use the proper System.Text.Encoding for the content (by looking at the "charset" parameter in the Content-Type header) instead of blindly using System.Text.Encoding.UTF8.
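One possible way to pick that encoding, sketched with the framework's System.Net.Mime.ContentType class rather than MimeKit's own types. CharsetLookup and FromContentType are hypothetical names, and the fallback encoding is the caller's choice, not something the answer prescribes.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class CharsetLookup
{
    // Returns the encoding named by the charset parameter of a Content-Type
    // header value, or the fallback when the parameter is missing or unknown.
    public static Encoding FromContentType(string headerValue, Encoding fallback)
    {
        string name = new ContentType(headerValue).CharSet;
        if (string.IsNullOrEmpty(name))
            return fallback;

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // Charset name not recognised on this platform.
            return fallback;
        }
    }
}
```

The result can then be used in place of System.Text.Encoding.UTF8 in the GetString call above, for example CharsetLookup.FromContentType("text/plain; charset=utf-8", Encoding.ASCII).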

Generate vCard that can be downloaded on Android using ASP.NET

I have been trying for quite some time now to generate a vCard using ASP.NET (C#) that can be downloaded onto an Android device.
The process of generating the card is quite simple and so I'm not too worried about it. It's the download itself that I can't get to work.
My code for attaching the vCard to the page response looks like this:
public void downloadCard()
{
    //generate the vCard text
    string vCard = generateCard();
    //create the filename the user will download the file as
    string filename = HttpUtility.UrlEncode(username + ".vcf", System.Text.Encoding.UTF8);
    //get a reference to the response
    HttpResponse response = HttpContext.Current.Response;
    //clear the response and write our own one.
    response.Clear();
    response.ContentType = "text/x-vcard";
    response.AddHeader("Content-Disposition", "attachment; filename=" + filename + ";");
    response.Write(vCard);
    response.End();
}
I won't bother showing the generation process as it's not really important. The only parameter the page takes is a username, which is received through a RESTful URL thanks to some URL rewriting in the web.config file. So the URL example.com/vcard/apbarratt produces the vCard for the user apbarratt.
The response that a GET request produces for this code looks like this:
200 OK
Date: Wed, 15 Aug 2012 13:49:56 GMT
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Content-Disposition: attachment; filename=apbarratt.vcf;
Content-Length: 199
Server: Microsoft-IIS/7.5
Content-Type: text/x-vcard; charset=utf-8
Cache-Control: private
BEGIN:VCARD
VERSION:2.1
N;LANGUAGE=en-us:Andy Barratt
FN:Andy Barratt
TEL;CELL;VOICE:07000000000
URL;WORK:http://example.com
EMAIL;INTERNET:apbarratt@example.com
END:VCARD
This works perfectly in every single browser that I have tested it in (not iOS, that's another issue that has been solved in another way), except for Android stock browsers. In these browsers the download fails, either with the filename "unknown" and the status "failed" or, on other devices, with the filename "apbarratt.vcf" and the status "In progress", which never seems to change.
The issue is not a problem in other browsers such as opera mobile/mini.
I've tried every possible thing I can think of, reading so many blogs on similar issues that I'm having dreams about the whole thing... they're really dull...
Anyway, hopefully some fresh eyes will be able to help me. Perhaps someone has done this before and could share some code, looking forward to some help.
Andy

I had exactly the same problem: all but the stock Droid Safari Browsers seemed to work. My solution was to read the file as text, and then convert it to ASCII bytes. Once I changed my code, Droids (2.3 and 3.2) seemed to be happy.
Here is a code snippet (from my MVC-based project):
public ActionResult GetContact()
{
    Response.Clear();
    Response.AddHeader("Content-disposition", string.Format("attachment; filename=\"{0}\";", "MyContact.vcf"));
    // VERY IMPORTANT!!!
    // Read the file as text, and then convert it to ASCII bytes.
    // If ReadAllBytes is used, extra stray characters seem to appear and DROID fails.
    // Put the content type in the second parameter!!!
    var vCardFile = System.IO.File.ReadAllText(Server.MapPath("~/Contacts/MyContact.vcf"));
    return File(System.Text.Encoding.ASCII.GetBytes(vCardFile), "text/x-vcard");
}
Hope this helps...
Cheers.

I don't know if you've solved it, but I encountered the same problem, and one of the stumbling blocks was that the N field seems to be expected to have 5 values, so you should add extra semicolons at the end (4 separators in total in your example), thus:
N;LANGUAGE=en-us:Barratt;Andy;;;
Another thing: it's better to set the content type to text/vcard, which is the standard now.
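A small sketch of building an N line that always carries five components (family name, given name, additional names, prefix, suffix). VCardName and FormatN are hypothetical names; the sketch omits the LANGUAGE parameter and does not escape semicolons inside the values.

```csharp
using System;

public static class VCardName
{
    // Builds an N line with all five components; missing ones are left empty.
    public static string FormatN(string family, string given,
        string additional = "", string prefix = "", string suffix = "")
    {
        return "N:" + string.Join(";", family, given, additional, prefix, suffix);
    }
}
```

VCardName.FormatN("Barratt", "Andy") produces N:Barratt;Andy;;; with the trailing separators in place.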
