I use the following code to create a zip archive with C#.
using (var zipArchive = new ZipArchive(compressedFileStream, ZipArchiveMode.Create, false))
{
var zipEntry = zipArchive.CreateEntry(name + ".pdf");
...
}
The name often consists of Swedish characters such as ÅÄÖ åäö. When I open the zip and look at the entry names, the Swedish characters are garbled, like this: "Fl+Âdesm+ñtare.pdf".
I tried fixing the name encoding with the following code, but it didn't work.
var iso = Encoding.GetEncoding("ISO-8859-1");
var utf8 = Encoding.UTF8;
var utfBytes = utf8.GetBytes(name);
var isoBytes = Encoding.Convert(utf8, iso, utfBytes);
var isoName = iso.GetString(isoBytes);
Any ideas?
Since DotNetZip is a dead project and this question still turns up in Google searches, here is an alternative solution with the System.IO.Compression library:
Archive = New IO.Compression.ZipArchive(Stream, ZipArchiveMode, LeaveOpen, Text.Encoding.GetEncoding(Globalization.CultureInfo.CurrentCulture.TextInfo.OEMCodePage))
This might not cover all conversions. From what I gathered from the sources on the subject, the underlying code uses the local machine's (i.e. the server's) regional OEM code page for entry names. Passing that encoding explicitly has fixed the issue for my client domain, though I can't guarantee it is a silver bullet.
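The line above is VB.NET; a rough C# equivalent (a sketch only, reusing compressedFileStream and name from the question) would be:
using System.Globalization;
using System.IO.Compression;
using System.Text;

// On .NET Core / .NET 5+, OEM code pages may require the System.Text.Encoding.CodePages
// package and a call to Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
var entryNameEncoding = Encoding.GetEncoding(CultureInfo.CurrentCulture.TextInfo.OEMCodePage);
using (var zipArchive = new ZipArchive(compressedFileStream, ZipArchiveMode.Create, false, entryNameEncoding))
{
    var zipEntry = zipArchive.CreateEntry(name + ".pdf");
    // ...
}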
You can try out the DotNetZip library (get it via NuGet). Here is a code sample where I use the cp866 encoding:
private string GenerateZipFile(string filename, BetPool betPool)
{
using (var zip = new ZipFile(Encoding.GetEncoding("cp866")))
{
//zip.Password = AppConfigHelper.Key + DateTime.Now.Date.ToString("ddMMyy");
zip.AlternateEncoding = Encoding.GetEncoding("cp866");
zip.AlternateEncodingUsage = ZipOption.AsNecessary;
zip.AddFile(filename, "");
var zipFilename = FormZipFileName(betPool);
zip.Save(zipFilename);
return zipFilename;
}
}
To read the archive back with the same entry-name encoding, note that ZipArchive takes a Stream rather than a file path:
using (var zip = new ZipArchive(File.OpenRead(ZipFilePath), ZipArchiveMode.Read, false, Encoding.GetEncoding("cp866")))
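Expanded into a complete read-back sketch (assuming ZipFilePath points at the archive saved above):
using (var zip = new ZipArchive(File.OpenRead(ZipFilePath), ZipArchiveMode.Read, false, Encoding.GetEncoding("cp866")))
{
    foreach (var entry in zip.Entries)
    {
        // Entry names are decoded with cp866, so the original characters survive.
        Console.WriteLine(entry.FullName);
    }
}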
I have a WCF service endpoint which generates an Excel file and returns it as a MemoryStream so that the client can download it.
The file generated in the respective directory has no issues; I don't see any strange characters when I open and check it.
But the file returned via the MemoryStream is full of strange, unreadable characters.
My endpoint looks like this:
public Stream GetEngagementFeedFinalizeData(int workspaceId, string startDate, string endDate, Stream data)
{
try
{
string contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;";
string extension = "xls";
string fileName = "report-" + DateTime.Now.Ticks.ToString();
string contentDisposition = string.Format(CultureInfo.InvariantCulture, "attachment; filename={0}.{1}", fileName, extension);
WebOperationContext.Current.OutgoingResponse.ContentType = contentType;
WebOperationContext.Current.OutgoingResponse.Headers.Set("Content-Disposition", contentDisposition);
//Here is some business logic and fetching data from db. Not any encoding
//related issue. The data set is assigned to a variable
//named "feedFinalizeDataTable" in the end
feedFinalizeDataTable.TableName = "Summary";
DataSet dataSet = new DataSet();
dataSet.Tables.Add(feedFinalizeDataTable);
using (ExcelPackage excelPackage = new ExcelPackage())
{
foreach (DataTable dt in dataSet.Tables)
{
ExcelWorksheet sheet = excelPackage.Workbook.Worksheets.Add(dt.TableName);
sheet.Cells["A1"].LoadFromDataTable(dt, true);
}
var path = System.IO.Path.Combine(System.AppDomain.CurrentDomain.BaseDirectory);
var filePath = path + "\\" + "New.xls";
excelPackage.SaveAs(new System.IO.FileInfo(filePath)); //This file is flawless
FileStream fs = new FileStream(filePath, FileMode.Open);
int length = (int)fs.Length;
WebOperationContext.Current.OutgoingResponse.ContentLength = length;
byte[] buffer = new byte[length];
int sum = 0;
int count;
while ((count = fs.Read(buffer, sum, length - sum)) > 0)
{
sum += count;
}
fs.Close();
return new MemoryStream(buffer); //This file is full of unreadable chars as per above shared screenshot
}
I'm using OfficeOpenXml to generate the Excel files.
Then I checked the encoding of both files by opening them with Notepad. The file in the directory (the flawless one) shows ANSI encoding, while the one returned by the endpoint (the broken one) shows UTF-8.
After that, I tried to change the encoding of the stream like this:
var byteArray = System.IO.File.ReadAllBytes(filePath);
string fileStr = new StreamReader(new MemoryStream(byteArray), true).ReadToEnd();
var encd = Encoding.GetEncoding(1252); //On the other topics I saw that ANSI represented with 1252
var end = encd.GetBytes(fileStr);
return new MemoryStream(end);
But this doesn't help either. Some of the strange characters are replaced with other strange characters, but as I said, the streamed file is still unreadable. And when I open it with Notepad to check its encoding, it is still UTF-8.
So I'm kind of stuck. I have also tried streaming the generated Excel file directly (without writing it to a directory and then reading it back) with OfficeOpenXml's built-in .GetAsByteArray() method, but the downloaded file looks exactly the same as in the screenshot above.
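For reference, a minimal sketch of that GetAsByteArray() attempt (reusing the excelPackage and WebOperationContext objects from the code above):
// Serialize the workbook straight to a byte array instead of going through a temp file.
byte[] excelBytes = excelPackage.GetAsByteArray();
WebOperationContext.Current.OutgoingResponse.ContentLength = excelBytes.Length;
return new MemoryStream(excelBytes);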
Thanks in advance.
I have a file stored online in Azure Blob Storage, in Spanish. Some words have special characters (for example: Almacén).
When I open the file in Notepad++, the encoding is ANSI.
So now I try to read the file with this code:
using StreamReader reader = new StreamReader(Stream, Encoding.UTF8);
blobStream.Seek(0, SeekOrigin.Begin);
var allLines = await reader.ReadToEndAsync();
The issue is that allLines is not decoded properly; I get text like: Almac�n
I have tried some solutions, like this one:
C# Convert string from UTF-8 to ISO-8859-1 (Latin1)
but it is still not working.
(The final goal is to "merge" two CSVs, so I read the stream of both, remove the header, and concatenate the strings to push the result back up. If there is a better way to merge CSVs in C# that avoids this encoding issue, I am open to that as well.)
You are trying to read a non-UTF-8 encoded file as if it were UTF-8 encoded. I can replicate this issue with:
var s = "Almacén";
using var memStream = new MemoryStream(Encoding.GetEncoding(28591).GetBytes(s));
using var reader = new StreamReader(memStream, Encoding.UTF8);
var allLines = await reader.ReadToEndAsync();
Console.WriteLine(allLines); // writes "Almac�n" to console
You should instead read the file with the encoding iso-8859-1, "Western European (ISO)", which is code page 28591:
using var reader = new StreamReader(Stream, Encoding.GetEncoding(28591));
var allLines = await reader.ReadToEndAsync();
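For the stated goal of merging two CSVs, here is a minimal sketch under the assumption that both files use ISO-8859-1 and share an identical header row (blobStream1, blobStream2 and outputStream are hypothetical placeholders for your actual streams, inside an async method):
var latin1 = Encoding.GetEncoding(28591);
using var reader1 = new StreamReader(blobStream1, latin1);
using var reader2 = new StreamReader(blobStream2, latin1);
// Write the merged result back out in the same encoding so the accents survive the round trip.
using var writer = new StreamWriter(outputStream, latin1);

// Copy the first file as-is, header included.
string line;
while ((line = await reader1.ReadLineAsync()) != null)
{
    await writer.WriteLineAsync(line);
}

// Skip the header of the second file, then append the rest.
await reader2.ReadLineAsync();
while ((line = await reader2.ReadLineAsync()) != null)
{
    await writer.WriteLineAsync(line);
}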
My program connects to an FTP server and lists all the files in a specific folder, c:\ClientFiles... The issue I'm having is that the file names contain some unusual characters like – (e.g. Billing–File.csv), but the code replaces these characters with a plain dash "-". When I then try to download the files, they are not found.
I've tried all the encoding types in the Encoding class, but none of them is able to accommodate these characters.
Please see my code for listing the files:
UriBuilder ub;
if (rootnode.Path != String.Empty) ub = new UriBuilder("ftp", rootnode.Server, rootnode.Port, rootnode.Path);
else ub = new UriBuilder("ftp", rootnode.Server, rootnode.Port);
String uristring = ub.Uri.OriginalString;
req = (FtpWebRequest)FtpWebRequest.Create(ub.Uri);
req.Credentials = ftpcred;
req.UsePassive = pasv;
req.Method = WebRequestMethods.Ftp.ListDirectoryDetails;
try
{
rsp = (FtpWebResponse)req.GetResponse();
StreamReader rsprdr = new StreamReader(rsp.GetResponseStream(), Encoding.UTF8); //this is where the problem is.
Your help or advice will be highly appreciated.
Not every encoding has a class in the Encoding namespace. You can get a list of all encodings known to your system by using:
Encoding.GetEncodings()
(MSDN info for GetEncodings).
If you know what the name of the file should be, you can iterate through the list and see what encodings result in the correct filename.
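A rough sketch of that probing approach (expectedName and rawBytes are hypothetical: the file name you know is correct and the raw bytes of the directory listing):
// Try every encoding the system knows about and report which ones
// decode the listing bytes into the expected file name.
foreach (EncodingInfo info in Encoding.GetEncodings())
{
    Encoding candidate = info.GetEncoding();
    string decoded = candidate.GetString(rawBytes);
    if (decoded.Contains(expectedName))
    {
        Console.WriteLine($"{info.CodePage} ({info.Name}) decodes the name correctly");
    }
}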
Try:
StreamReader rsprdr = new StreamReader(rsp.GetResponseStream(), Encoding.GetEncoding(1251));
You may also try Encoding.GetEncoding("iso-8859-1") instead of 1251.
I'm using HttpClient to POST MultipartFormDataContent to a Java web application. I'm uploading several StringContents and one file, which I add as a StreamContent via MultipartFormDataContent.Add(HttpContent content, String name, String fileName), and send with HttpClient.PostAsync(String, HttpContent).
This works fine, except when I provide a fileName that contains German umlauts (I haven't tested other non-ASCII characters yet). In that case, the fileName is base64-encoded. The result for a file named 99 2 LD 353 Temp Äüöß-1.txt
looks like this:
__utf-8_B_VGVtcCDvv73vv73vv73vv71cOTkgMiBMRCAzNTMgVGVtcCDvv73vv73vv73vv70tMS50eHQ___
The Java server shows this encoded file name in its UI, which confuses the users. I cannot make any server-side changes.
How do I disable this behavior? Any help would be highly appreciated.
Thanks in advance!
I just hit the same limitation as StrezzOr, as the server I was consuming didn't respect the filename* standard.
I converted the filename to its UTF-8 byte representation and then reassembled those bytes as the chars of a "plain" string (not UTF-8 decoded).
This code creates a content stream and adds it to a multipart content:
FileStream fs = File.OpenRead(_fullPath);
StreamContent streamContent = new StreamContent(fs);
streamContent.Headers.Add("Content-Type", "application/octet-stream");
// Build the Content-Disposition header value by hand.
String headerValue = "form-data; name=\"Filedata\"; filename=\"" + _Filename + "\"";
// Re-map each UTF-8 byte of the header value to a single char so the
// filename is sent as raw bytes instead of being base64-encoded.
byte[] bytes = Encoding.UTF8.GetBytes(headerValue);
headerValue = "";
foreach (byte b in bytes)
{
    headerValue += (Char)b;
}
streamContent.Headers.Add("Content-Disposition", headerValue);
multipart.Add(streamContent, "Filedata", _Filename);
This is working with Spanish accents.
Hope this helps.
I recently ran into this issue and used a workaround.
On the server side:
private static readonly Regex _regexEncodedFileName = new Regex(@"^=\?utf-8\?B\?([a-zA-Z0-9/+]+={0,2})\?=$");
private static string TryToGetOriginalFileName(string fileNameInput) {
Match match = _regexEncodedFileName.Match(fileNameInput);
if (match.Success && match.Groups.Count > 1) {
string base64 = match.Groups[1].Value;
try {
byte[] data = Convert.FromBase64String(base64);
return Encoding.UTF8.GetString(data);
}
catch (Exception) {
//ignored
return fileNameInput;
}
}
return fileNameInput;
}
And then use this function like this:
string correctedFileName = TryToGetOriginalFileName(fileRequest.FileName);
It works.
In order to pass non-ASCII characters in the Content-Disposition header's filename attribute, it is necessary to use the filename* attribute instead of the regular filename. See the spec here.
To do this with HttpClient, you can do the following:
var streamcontent = new StreamContent(stream);
streamcontent.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment") {
FileNameStar = "99 2 LD 353 Temp Äüöß-1.txt"
};
multipartContent.Add(streamcontent);
The header will then end up looking like this,
Content-Disposition: attachment; filename*=utf-8''99%202%20LD%20353%20Temp%20%C3%84%C3%BC%C3%B6%C3%9F-1.txt
I finally gave up and solved the task using HttpWebRequest instead of HttpClient. I had to build headers and content manually, but this allowed me to ignore the standards for sending non-ASCII filenames. I ended up cramming unencoded UTF-8 filenames into the filename header, which was the only way the server would accept my request.
I use DotNetZip in my project.
using (var zip = new ZipFile())
{
zip.ProvisionalAlternateEncoding = System.Text.Encoding.GetEncoding(866);
zip.AddFile(filename, "directory\\in\\archive");
zip.Save("archive.zip");
}
Everything is OK, but when I use the AddDirectoryByName method I get bad directory names.
A universal way that works for everything is:
zip.AlternateEncoding = Encoding.UTF8;
zip.ProvisionalAlternateEncoding = Encoding.GetEncoding(Console.OutputEncoding.CodePage);
zip.AlternateEncodingUsage = ZipOption.AsNecessary;
This way works for me in the new version:
zip.AlternateEncodingUsage = ZipOption.Always;
zip.AlternateEncoding = Encoding.GetEncoding(866);
You may use Peek Definition first.
Then you will find this:
public ZipFile(Encoding encoding);
So you can use this:
using (ZipFile zip = new ZipFile(Encoding.UTF8))
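A fuller sketch of that usage, building on the constructor above and reusing the directory and file names from the question as assumptions:
using Ionic.Zip;
using System.Text;

using (ZipFile zip = new ZipFile(Encoding.UTF8))
{
    // Entry names are written with the encoding passed to the constructor (UTF-8 here).
    zip.AddDirectoryByName("directory\\in\\archive");
    zip.AddFile(filename, "directory\\in\\archive");
    zip.Save("archive.zip");
}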