How to disable base64-encoded filenames in HttpClient/MultipartFormDataContent - c#

I'm using HttpClient to POST MultipartFormDataContent to a Java web application. I upload several StringContents and one file, which I add as a StreamContent via MultipartFormDataContent.Add(HttpContent content, String name, String fileName), and send the whole thing with HttpClient.PostAsync(String, HttpContent).
This works fine, except when I provide a fileName that contains German umlauts (I haven't tested other non-ASCII characters yet). In that case the fileName is base64-encoded. For a file named 99 2 LD 353 Temp Äüöß-1.txt the result
looks like this:
__utf-8_B_VGVtcCDvv73vv73vv73vv71cOTkgMiBMRCAzNTMgVGVtcCDvv73vv73vv73vv70tMS50eHQ___
The Java server shows this encoded file name in its UI, which confuses the users. I cannot make any server-side changes.
How do I disable this behavior? Any help would be highly appreciated.
Thanks in advance!
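For context, a minimal sketch of the upload pattern described above; the field name Filedata, the sample content, and the commented-out URL are placeholders, not the original code:

```csharp
using System;
using System.Net.Http;
using System.Text;

var multipart = new MultipartFormDataContent();
multipart.Add(new StringContent("some value", Encoding.UTF8), "description");

// The file part: with a non-ASCII fileName, the framework may serialize the
// Content-Disposition filename as an RFC 2047 encoded-word ("=?utf-8?B?...?=").
var fileContent = new ByteArrayContent(new byte[] { 1, 2, 3 }); // stand-in for StreamContent
multipart.Add(fileContent, "Filedata", "99 2 LD 353 Temp Äüöß-1.txt");

// var response = await new HttpClient().PostAsync("http://example.test/upload", multipart);
```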

I just found the same limitation as StrezzOr, as the server that I was consuming didn't respect the filename* standard.
I converted the filename to its UTF-8 byte representation, and then re-assembled the bytes as the chars of a plain (non-UTF-8) string.
This code creates a stream content and adds it to a multipart content:
FileStream fs = File.OpenRead(_fullPath);
StreamContent streamContent = new StreamContent(fs);
streamContent.Headers.Add("Content-Type", "application/octet-stream");

// Build the header value, then map each UTF-8 byte back to a char so the
// filename goes out as raw bytes instead of an encoded-word.
String headerValue = "form-data; name=\"Filedata\"; filename=\"" + _Filename + "\"";
byte[] bytes = Encoding.UTF8.GetBytes(headerValue);
var builder = new StringBuilder(bytes.Length);
foreach (byte b in bytes)
{
    builder.Append((char)b);
}
streamContent.Headers.Add("Content-Disposition", builder.ToString());

// Add() only sets Content-Disposition when none is present yet, so the
// header built above is the one that gets sent.
multipart.Add(streamContent, "Filedata", _Filename);
This works with Spanish accents.
Hope this helps.

I recently ran into this issue and use a workaround on the server side:
private static readonly Regex _regexEncodedFileName = new Regex(@"^=\?utf-8\?B\?([a-zA-Z0-9/+]+={0,2})\?=$");

private static string TryToGetOriginalFileName(string fileNameInput) {
    Match match = _regexEncodedFileName.Match(fileNameInput);
    if (match.Success && match.Groups.Count > 1) {
        string base64 = match.Groups[1].Value;
        try {
            byte[] data = Convert.FromBase64String(base64);
            return Encoding.UTF8.GetString(data);
        }
        catch (FormatException) {
            // not valid base64 - return the input unchanged
            return fileNameInput;
        }
    }
    return fileNameInput;
}
And then use this function like this:
string correctedFileName = TryToGetOriginalFileName(fileRequest.FileName);
It works.

In order to pass non-ASCII characters in the Content-Disposition header's filename attribute, it is necessary to use the filename* attribute instead of the regular filename. See RFC 6266 (and RFC 5987 for the encoding) for the spec.
To do this with HttpClient you can do the following:
var streamContent = new StreamContent(stream);
streamContent.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment") {
    FileNameStar = "99 2 LD 353 Temp Äüöß-1.txt"
};
multipartContent.Add(streamContent);
The header will then end up looking like this:
Content-Disposition: attachment; filename*=utf-8''99%202%20LD%20353%20Temp%20%C3%84%C3%BC%C3%B6%C3%9F-1.txt
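For a multipart form upload like the one in the question, the same idea works with a form-data disposition. A minimal sketch (the part name Filedata and the placeholder MemoryStream are assumptions, not from the original code):

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

var streamContent = new StreamContent(new MemoryStream()); // placeholder stream
streamContent.Headers.ContentDisposition = new ContentDispositionHeaderValue("form-data")
{
    Name = "Filedata",
    FileNameStar = "99 2 LD 353 Temp Äüöß-1.txt"
};

var multipartContent = new MultipartFormDataContent();
multipartContent.Add(streamContent);
```

The single-argument Add overload does not touch Content-Disposition, so the handcrafted header is preserved.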

I finally gave up and solved the task using HttpWebRequest instead of HttpClient. I had to build headers and content manually, but this allowed me to ignore the standards for sending non-ASCII filenames. I ended up cramming unencoded UTF-8 filenames into the filename header, which was the only way the server would accept my request.
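For illustration, a rough sketch of that manual approach: build the multipart body yourself so the filename goes over the wire as raw, unencoded UTF-8. The boundary, field name, and file bytes here are placeholders, and note that this deliberately violates the standard, so many servers will reject it:

```csharp
using System;
using System.IO;
using System.Text;

string boundary = "----myboundary";
string fileName = "99 2 LD 353 Temp Äüöß-1.txt";
byte[] fileBytes = { 1, 2, 3 }; // stand-in for the real file content

var body = new MemoryStream();
void Write(string s)
{
    byte[] b = Encoding.UTF8.GetBytes(s);
    body.Write(b, 0, b.Length);
}

Write("--" + boundary + "\r\n");
// Raw UTF-8 filename, deliberately ignoring the encoded-word/filename* rules:
Write("Content-Disposition: form-data; name=\"Filedata\"; filename=\"" + fileName + "\"\r\n");
Write("Content-Type: application/octet-stream\r\n\r\n");
body.Write(fileBytes, 0, fileBytes.Length);
Write("\r\n--" + boundary + "--\r\n");

// These bytes would then be written to HttpWebRequest.GetRequestStream(),
// with request.ContentType = "multipart/form-data; boundary=" + boundary.
```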

Related

Reading email body with encoding ISO-8859-1

I'm using MailKit to read some emails' body content over IMAP.
Some of these emails come with content-type text/plain and charset ISO-8859-1, which causes my code to replace some Latin characters (á é í ó ú), and apparently also CR and LF, with odd sequences such as =E1 =FA =F3 =...
var body = message.BodyParts.OfType<BodyPart>().FirstOrDefault(x => x.ContentType.IsMimeType("text", "plain"));
var bodyText = (TextPart)folder.GetBodyPart(message.UniqueId, body);
var bodyContent = bodyText.Text;
There is no problem when opening these emails with email clients such as Thunderbird or Outlook. They are showing these chars as they are. I want to be able to retrieve these Latin chars.
I've tried with some encoding options with no success.
var bodyContent = bodyText.GetText(System.Text.Encoding.ASCII);
var bodyContent = bodyText.GetText(System.Text.Encoding.UTF8);
Normally you don't need to decode quoted-printable encoded content yourself, but my guess is that the client that sent this message encoded the content using the quoted-printable encoding but did not set the Content-Transfer-Encoding header properly.
I would probably change your code to something more like this:
// figure out which body part we need
var body = message.BodyParts.OfType<BodyPartText>().FirstOrDefault(x => x.ContentType.IsMimeType("text", "plain"));

// download the body part we need
var bodyText = (TextPart)folder.GetBodyPart(message.UniqueId, body);

// If it's encoded using quoted-printable we'll need to decode it first.
// To do so, we'll need the charset.
//
// The reason I would get it from `bodyText.ContentType` is because
// this will work even if you used MessageSummaryItems.Body instead of
// MessageSummaryItems.BodyStructure.
var charset = bodyText.ContentType.Charset;

// Decode the content using the quoted-printable decoder from the MimeKit library.
var bodyContent = DecodeQuotedPrintable(bodyText.Content, charset);

// The main change I'm making to this function compared to what you have is
// using the stream/filter interfaces rather than using the low-level decoder
// directly. You can do it either way, but if you continue using your
// method, I would recommend Encoding.UTF8.GetBytes() rather than
// Encoding.ASCII.GetBytes(), because UTF-8 can handle all strings while
// ASCII cannot.
static string DecodeQuotedPrintable (IMimeContent content, string charset)
{
    using (var output = new MemoryStream ()) {
        using (var filtered = new FilteredStream (output)) {
            // add a quoted-printable decoder
            filtered.Add (DecoderFilter.Create (ContentEncoding.QuotedPrintable));

            // pump the content through the decoder
            content.DecodeTo (filtered);

            // flush the filtered stream
            filtered.Flush ();
        }

        var encoding = Encoding.GetEncoding (charset);
        return encoding.GetString (output.GetBuffer (), 0, (int) output.Length);
    }
}
The message body is encoded using quoted-printable, so you have to decode it first.
In MailKit that should be the DecodeTo method.
I could finally get it working by using QuotedPrintableDecoder from the MimeKit library:
var body = message.BodyParts.OfType<BodyPart>().FirstOrDefault(x => x.ContentType.IsMimeType("text", "plain"));

// If it's encoded using quoted-printable we'll need to decode it first. To do so, we'll need the charset.
var charset = body.ContentType.Charset;
var bodyText = (TextPart)folder.GetBodyPart(message.UniqueId, body);

// Decode the content using QuotedPrintableDecoder from the MimeKit library.
var bodyContent = DecodeQuotedPrintable(bodyText.Text, charset);

static string DecodeQuotedPrintable (string input, string charset)
{
    var decoder = new QuotedPrintableDecoder ();
    var buffer = Encoding.ASCII.GetBytes (input);
    var output = new byte[decoder.EstimateOutputLength (buffer.Length)];
    int used = decoder.Decode (buffer, 0, buffer.Length, output);
    var encoding = Encoding.GetEncoding (charset);
    return encoding.GetString (output, 0, used);
}

Flurl AddFile fileName Encoding

I'm trying to use Flurl to send a file like this:
public ImportResponse Import(ImportRequest request, string fileName, Stream stream)
{
    return FlurlClient(Routes.Import, request)
        .PostMultipartAsync(mp => mp.AddJson("json", request).AddFile("file", stream, ConvertToAcsii(fileName)))
        .Result<ImportResponse>();
}
fileName = "Файл импорта тарифов (1).xlsx"
But in the POST method I get this:
Request.Files.FirstOrDefault().FileName =
"=?utf-8?B?0KTQsNC50Lsg0LjQvNC/0L7RgNGC0LAg0YLQsNGA0LjRhNC+0LIgKDEpLnhsc3g=?="
Any suggestions?
The filename appears to be encoded using MIME encoded-word syntax. (Flurl doesn't do this directly, it presumably happens deeper down in the HttpClient libraries when non-ASCII characters are detected.) .NET doesn't directly support decoding this format, but you can do it yourself fairly easily. If you strip the =?utf-8?B? from the beginning and ?= from the end, what you're left with is your filename base64 encoded.
Here's one way you could do it:
var base64 = Request.Files.FirstOrDefault().FileName.Split('?')[3];
var bytes = Convert.FromBase64String(base64);
var filename = Encoding.UTF8.GetString(bytes);
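Applied to the exact value from the question, that decoding looks like this (made self-contained for illustration):

```csharp
using System;
using System.Text;

string encoded = "=?utf-8?B?0KTQsNC50Lsg0LjQvNC/0L7RgNGC0LAg0YLQsNGA0LjRhNC+0LIgKDEpLnhsc3g=?=";

// Token 3 of "=?utf-8?B?<base64>?=" is the base64 payload.
var base64 = encoded.Split('?')[3];
var bytes = Convert.FromBase64String(base64);
var filename = Encoding.UTF8.GetString(bytes);
// filename == "Файл импорта тарифов (1).xlsx"
```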

Issue when downloading HTML files from an ASP.NET MVC project

When I try to download Arabic files from my MVC project, the Arabic text (e.g. تاريخ الشكوى) comes out garbled as special characters.
Here is the code I use for the download:
System.Web.Mvc.FileStreamResult FSR = new FileStreamResult(stream, "application/msword");
FSR.FileDownloadName = CorrespondenceselectedFile.FileName;
return FSR;
It seems like the text "تاريخ الشكوى" ("Date of the complaint") is being decoded with the default encoding instead of UTF-8.
You should probably correct the encoding somewhere in your code (not part of the shown code) or repair it manually (not preferred):
// "garbled" must hold the mojibake text exactly as it was received
string garbled = CorrespondenceselectedFile.FileName;
var bytes = Encoding.Default.GetBytes(garbled);
string utf8 = Encoding.UTF8.GetString(bytes);
// utf8 == "تاريخ الشكوى"
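To see why this reversal works, here is a self-contained sketch that simulates the corruption and then undoes it. It uses ISO-8859-1 as a stand-in for Encoding.Default, since the actual default encoding is machine-dependent:

```csharp
using System;
using System.Text;

string original = "تاريخ الشكوى";
var latin1 = Encoding.GetEncoding("ISO-8859-1");

// Simulate the corruption: the UTF-8 bytes are misread as single-byte text.
string garbled = latin1.GetString(Encoding.UTF8.GetBytes(original));

// Undo it: take the bytes back out with the same single-byte encoding,
// then decode them as the UTF-8 they really were.
string repaired = Encoding.UTF8.GetString(latin1.GetBytes(garbled));

Console.WriteLine(repaired == original); // True
```

This round-trip is lossless because ISO-8859-1 maps every byte 0x00-0xFF to a distinct character and back.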

Using Character Encoding with streamreader

My program connects to an FTP server and lists all the files in the folder c:\ClientFiles... The issue I'm having is that the file names contain characters like – (e.g. Billing–File.csv), but my code replaces these characters with a plain dash "-". When I then try to download the files, they are not found.
I've tried all the encoding types in the Encoding class, but none can accommodate these characters.
Please see my code listing the files.
UriBuilder ub;
if (rootnode.Path != String.Empty)
    ub = new UriBuilder("ftp", rootnode.Server, rootnode.Port, rootnode.Path);
else
    ub = new UriBuilder("ftp", rootnode.Server, rootnode.Port);
String uristring = ub.Uri.OriginalString;

req = (FtpWebRequest)FtpWebRequest.Create(ub.Uri);
req.Credentials = ftpcred;
req.UsePassive = pasv;
req.Method = WebRequestMethods.Ftp.ListDirectoryDetails;
try
{
    rsp = (FtpWebResponse)req.GetResponse();
    StreamReader rsprdr = new StreamReader(rsp.GetResponseStream(), Encoding.UTF8); // this is where the problem is
Your help or advice will be highly appreciated.
Not every encoding has a class in the Encoding namespace. You can get a list of all encodings known to your system by using:
Encoding.GetEncodings()
(MSDN info for GetEncodings).
If you know what the name of the file should be, you can iterate through the list and see which encodings produce the correct filename.
Try:
StreamReader rsprdr = new StreamReader(rsp.GetResponseStream(), Encoding.GetEncoding(1251));
You may also try "iso-8859-1" instead of 1251.
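The iteration suggested above can be sketched like this. The raw bytes here are a stand-in (UTF-8 on purpose) for the bytes actually received from the FTP listing:

```csharp
using System;
using System.Text;

// Stand-in for the raw bytes of a directory entry as received from the server.
byte[] raw = Encoding.UTF8.GetBytes("Billing–File.csv");
string expected = "Billing–File.csv"; // the name you know is correct

foreach (EncodingInfo info in Encoding.GetEncodings())
{
    if (info.GetEncoding().GetString(raw) == expected)
        Console.WriteLine(info.Name + " decodes the name correctly");
}
```

In the real scenario you would capture the bytes from the response stream instead of synthesizing them.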

Converting a string encoded in utf8 to unicode in C#

I've got this string returned via HTTP POST from a URL in a C# application. It contains some Chinese characters but arrives garbled, like this:
GelatosÂ® Colors Gift Setä¸­æ–‡
Problem is I want to convert it to
Gelatos® Colors Gift Set中文
Both strings represent the same text but are encoded differently. I understand that in C# everything is UTF-16. I've tried reading a lot of posts here about converting from one encoding to another, but no luck.
Hope someone could help.
Here's the C# code:
WebClient wc = new WebClient();
json = wc.DownloadString("http://mysite.com/ext/export.asp");
textBox2.Text = "Receiving orders....";
//convert the string to UTF16
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] asciiBytes = ascii.GetBytes(json);
byte[] utf8Bytes = utf8.GetBytes(json);
byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);
string sOut = unicode.GetString(unicodeBytes);
System.Windows.Forms.MessageBox.Show(sOut); //doesn't work...
Here's the code from the server:
<%#CodePage = 65001%>
<%option explicit%>
<%
Session.CodePage = 65001
Response.charset ="utf-8"
Session.LCID = 1033 'en-US
.....
response.write (strJSON)
%>
The output on the web is correct, but I was wondering whether something is being changed on the HTTP stream to the C# application.
Thanks.
Download the web pages as bytes in the first place. Then, convert the bytes to the correct encoding.
By first converting it using a wrong encoding you are probably losing data. Especially using ASCII.
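A sketch of that bytes-first approach. The WebClient call is commented out so the decoding step stands on its own; the sample bytes stand in for the downloaded payload:

```csharp
using System;
using System.Text;

// byte[] raw = new WebClient().DownloadData("http://mysite.com/ext/export.asp");
byte[] raw = Encoding.UTF8.GetBytes("Gelatos® Colors Gift Set中文"); // stand-in for downloaded bytes

// Decode once, with the encoding the server actually used (UTF-8 here).
string json = Encoding.UTF8.GetString(raw);
```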
If the server is really returning UTF-8 text, you can configure your WebClient by setting its Encoding property. This would eliminate any need for subsequent conversions.
using (WebClient wc = new WebClient())
{
wc.Encoding = Encoding.UTF8;
json = wc.DownloadString("http://mysite.com/ext/export.asp");
}
