How to encode Unicode so both iPad and Excel can understand? - C#

I have a CSV that is encoded with UTF-32. When I open the stream in IE and open it with Excel, I can read everything. On the iPad I stream it and get a blank page with no content whatsoever. (I don't know how to view source on the iPad, so there could be something hidden in the HTML.)
The http response is written in asp.net C#
Response.Clear();
Response.Buffer = true;
Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");
Response.RedirectLocation = "InventoryCount.csv";
Response.ContentEncoding = Encoding.UTF32; // works in Excel, wrong on iPad
//Response.ContentEncoding = Encoding.UTF8; // works on iPad, wrong in Excel
Response.Charset = "UTF-8"; // also tried adding Charset just to see if it works somehow, but it does not
EnableViewState = false;
NMDUtilities.Export oUtilities = new NMDUtilities.Export();
Response.Write(oUtilities.DataGridToCSV(gvExport, ","));
Response.End();
The only guess I can make is that the iPad cannot read UTF-32. Is that true? And how can I view source on the iPad?
UPDATE
I just made an interesting discovery. When my encoding is UTF-8, things work on the iPad and characters are displayed properly, but Excel messes up a character. When I use UTF-32 the inverse is true: the iPad displays nothing, but Excel works perfectly. I really have no idea what I can do about this.
iPad UTF8 outputs = " Quattrode® "
Excel UTF8 outputs = " Quattrode® "
iPad UTF32 outputs = " "
Excel UTF32 outputs = " Quattrode® "
Here's my implementation of DataGridToCsv
public string DataGridToCsv(GridView input, string delimiter)
{
StringBuilder sb = new StringBuilder();
// iterate the GridView and put row results in the StringBuilder...
string result = HttpUtility.HtmlDecode(sb.ToString());
return result;
}
UPDATE 2: Excel is barfing on UTF-8 >:{. Man. I just undid the second option he lists because it doesn't work on the iPad. I can't win for losing on this one.
UPDATE 3: Per your suggestions I have looked at the hex code. There is no BOM, but there is a difference between the file layouts.
UTF-8: 4D 61 74 65 ("Mate", the start of the first word MATERIAL)
UTF-32: 4D 00 00 00 ("M", the start of the first word MATERIAL)
So UTF-32 lays each character out in 32 bits (four bytes), whereas UTF-8 uses 8 bits for plain ASCII characters. I think those null bytes are how Excel can tell the UTF-32 file isn't ASCII. Now I will try your suggested fixes.

The problem is that the browser knows your data's encoding is UTF-8, but it has no way of telling Excel. When Excel opens the file, it assumes your system's default encoding. If you copy some non-ASCII text, paste it in Notepad, and save it with UTF-8 encoding, though, you'll see that Excel can properly detect it. It works on the iPad because its default encoding just happens to be UTF-8.
The reason is that Notepad puts the proper byte order mark (EF BB BF for UTF-8) in the beginning of the file. You can try it yourself by using a hex editor or some other means to create a file containing
EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20
and opening that file in Excel. (I used Excel 2010, but I assume it would work with all recent versions.)
Try making sure your output starts with those first 3 bytes.
How to write a BOM in C#
byte[] BOM = new byte[] { 0xef, 0xbb, 0xbf };
Response.BinaryWrite(BOM); // write the BOM first
Response.Write(utility.DataGridToCSV(gvExport, ",")); // then write your CSV

Excel tries to infer the encoding from your file's contents, and ASCII and UTF-8 happen to overlap on the first 128 characters (letters, digits, and basic punctuation). When you use UTF-16 or UTF-32, Excel can figure out that the content isn't ASCII, but since most of your UTF-8 content matches ASCII byte for byte, there is nothing for it to detect. If you want your file to be read in as UTF-8, you have to say so explicitly by writing the byte order mark, as Gabe said in his answer. Also, see the answer by Andrew Csontos on this other question:
What's the best way to export UTF8 data into Excel?
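Putting the two answers together, a minimal sketch of the asker's response code with the BOM written first (this reuses the names from the question; Encoding.UTF8.GetPreamble() returns the same EF BB BF bytes shown above):
Response.Clear();
Response.Buffer = true;
Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");
Response.ContentEncoding = Encoding.UTF8; // UTF-8 so the iPad can read it
Response.BinaryWrite(Encoding.UTF8.GetPreamble()); // BOM so Excel detects UTF-8
Response.Write(oUtilities.DataGridToCSV(gvExport, ","));
Response.End();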

Related

Load screenshot from adb through C#

I want to get a screenshot into C# using adb without saving files to the filesystem all the time.
I'm using SharpAdbClient to talk to the device.
I'm on a Windows platform.
This is what I've got so far:
AdbServer server = new AdbServer();
StartServerResult result = server.StartServer(@"path\to\adb.exe", restartServerIfNewer: false);
DeviceData device = AdbClient.Instance.GetDevices().First();
ConsoleOutputReceiver receiver = new ConsoleOutputReceiver();
AdbClient.Instance.ExecuteRemoteCommand("screencap -p", device, receiver);
string str_image = receiver.ToString().Replace("\r\r", "");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
Image image = Image.FromStream(new MemoryStream(bytes));
I can successfully load str_image and create the byte array, but it keeps throwing a System.ArgumentException when I try to load it into an Image.
I also tried saving the data to a file, but the file is corrupt.
I tried replacing both "\r\r" and "\r\n"; same result either way.
Does anyone have some insight into how to load this file?
Ideally it would be loaded into an Emgu image, since I'm going to do some CV on it later.
One possible cause is the non-printable ASCII characters in the string.
Look at the code below:
string str_image = File.ReadAllText("test.png");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
byte[] actualBytes = File.ReadAllBytes("test.png");
Inspecting str_image shows that there are some non-printable chars (displayed as question marks).
The first eight bytes of a PNG file are always
137 80 78 71 13 10 26 10
When you read the console output as a string and then use ASCII to encode that string, the first byte becomes 63 (0x3F), which is the ASCII code for a question mark.
Also note that the sizes of the two byte arrays differ considerably (7828 vs 7378).
Another thing: you are replacing "\r\r", while the newline sequence on Windows is actually "\r\n".
So my conclusion is that some image data is lost or modified in the output redirection by the ConsoleOutputReceiver, and you cannot recover the original data from the output string.
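A minimal sketch of that lossiness, using only the PNG signature and no adb at all, in case you want to verify it yourself:
using System;
using System.Text;

class AsciiLossDemo
{
    static void Main()
    {
        // The first byte of every PNG is 0x89 (137), outside the 7-bit ASCII range.
        byte[] pngHeader = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

        // Round-tripping through a string with the ASCII encoding is lossy:
        // bytes above 0x7F are replaced by '?' (0x3F) and cannot be recovered.
        string asString = Encoding.ASCII.GetString(pngHeader);
        byte[] roundTripped = Encoding.ASCII.GetBytes(asString);

        Console.WriteLine(BitConverter.ToString(pngHeader));    // 89-50-4E-47-0D-0A-1A-0A
        Console.WriteLine(BitConverter.ToString(roundTripped)); // 3F-50-4E-47-0D-0A-1A-0A
    }
}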

UTF-8 CSV file created with C# shows Â characters in Excel

When a CSV file is generated using C# and opened in Microsoft Excel, it displays Â characters before special symbols, e.g. £.
In Notepad++ the hex value for Â is: C2
So before writing the £ symbol to file, I have tried the following...
var test = "£200.00";
var replaced = test.Replace("\xC2", " ");
StreamWriter outputFile = File.CreateText("testoutput.csv"); // default UTF-8
outputFile.WriteLine(replaced);
outputFile.Close();
When opening the CSV file in Excel, I still see the "Â" character before the £ symbol (hex \xC2 \xA3). The replacement made no difference.
Do I need to use a different encoding? or am I missing something?
Thank you @Evk and @Mortalier, your suggestions led me in the right direction...
I needed to update my StreamWriter so it would explicitly include the UTF-8 BOM at the beginning: http://thinkinginsoftware.blogspot.co.uk/2017/12/correctly-generate-csv-that-excel-can.html
So my code has changed from:
StreamWriter outputFile = File.CreateText("testoutput.csv"); // default UTF-8
To:
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, new UTF8Encoding(true));
Or: another solution I found here was to use a different encoding, if you're only expecting Latin characters...
http://theoldsewingfactory.com/2010/12/05/saving-csv-files-in-utf8-creates-a-characters-in-excel/
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, Encoding.GetEncoding("Windows-1252"));
My system will most likely use Latin and non-Latin characters, so I'm going with the UTF-8 BOM solution.
Final code
var test = "£200.00";
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, new UTF8Encoding(true));
outputFile.WriteLine(test);
outputFile.Close();
I tried your code and Excel does show Â£ in the cell.
Then I tried to open the CSV with LibreOffice Calc. At first it too showed Â£, but
on import the program asks you about the encoding.
Once I chose UTF-8, the £ symbol was displayed correctly.
So my guess is that there is indeed an issue with your encoding.
This might help with Excel https://superuser.com/questions/280603/how-to-set-character-encoding-when-opening-excel
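For the record, a minimal sketch of the mismatch described here (this assumes Excel falls back to Windows-1252, which matches the Â symptom):
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // Needed on .NET Core/.NET 5+ (via the System.Text.Encoding.CodePages
        // package) to access Windows-1252; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // £ (U+00A3) encodes to two bytes, C2 A3, in UTF-8.
        byte[] utf8 = Encoding.UTF8.GetBytes("£");
        Console.WriteLine(BitConverter.ToString(utf8)); // C2-A3

        // Decoding those same bytes as Windows-1252 yields "Â£":
        // C2 maps to Â and A3 maps to £, which is exactly what Excel shows.
        Console.WriteLine(Encoding.GetEncoding(1252).GetString(utf8)); // Â£
    }
}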

Content-Length Occasionally Wrong on Simple C# HTTP Server

For some experimentation I was working with the Simple HTTP Server code here.
In one case I wanted it to serve some ANSI-encoded text configuration files. I am aware there are more issues with this code, but the only one I'm currently concerned with is that the Content-Length is wrong, and only for certain text files.
Example code:
Output stream initialisation:
outputStream = new StreamWriter(new BufferedStream(socket.GetStream()));
The handling of HTTP get:
public override void handleGETRequest(HttpProcessor p)
{
if (p.http_url.EndsWith(".pac"))
{
string filename = Path.Combine(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location), p.http_url.Substring(1));
Console.WriteLine(string.Format("HTTP request for : {0}", filename));
if (File.Exists(filename))
{
FileInfo fi = new FileInfo(filename);
DateTime lastWrite = fi.LastWriteTime;
Stream fs = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.Read);
StreamReader sr = new StreamReader(fs);
string result = sr.ReadToEnd().Trim();
Console.WriteLine(fi.Length);
Console.WriteLine(result.Length);
p.writeSuccess("application/x-javascript-config",result.Length,lastWrite);
p.outputStream.Write(result);
// fs.CopyTo(p.outputStream.BaseStream);
p.outputStream.BaseStream.Flush();
fs.Close();
}
else
{
Console.WriteLine("404 - FILE not found!");
p.writeFailure();
}
}
}
public void writeSuccess(string content_type,long length,DateTime lastModified) {
outputStream.Write("HTTP/1.0 200 OK\r\n");
outputStream.Write("Content-Type: " + content_type + "\r\n");
outputStream.Write("Last-Modified: {0}\r\n", lastModified.ToUniversalTime().ToString("r"));
outputStream.Write("Accept-Range: bytes\r\n");
outputStream.Write("Server: FlakyHTTPServer/1.3\r\n");
outputStream.Write("Date: {0}\r\n", DateTime.Now.ToUniversalTime().ToString("r"));
outputStream.Write(string.Format("Content-Length: {0}\r\n\r\n", length));
}
For most files I've tested with, Content-Length is correct. However, when testing with the HTTP debugging tool Fiddler, a protocol violation is sometimes reported on Content-Length.
For example, Fiddler says:
Request Count: 1
Bytes Sent: 303 (headers:303; body:0)
Bytes Received: 29,847 (headers:224; body:29,623)
So Content-Length should be 29623. But the HTTP header generated is
Content-Length: 29617
I saved the body of the HTTP content from Fiddler and visually compared the files; I couldn't notice any difference. Then I loaded them into Beyond Compare's hex compare, and there are several problems with files like this:
Original File: 2D 2D 96 20 2A 2F
HTTP Content : 2D 2D EF BF BD 20 2A 2F
Original File: 27 3B 0D 0A 09 7D 0D 0A 0D 0A 09
HTTP Content : 27 3B 0A 09 7D 0A 0A 09
I suspect the problem is related to encoding, but I'm not exactly sure how. I am only serving ANSI-encoded files, no Unicode.
I did make the file serve correctly, with the right Content-Length, by modifying the offending byte sequences. I made this change in 3 parts of the file:
2D 2D 96 (--–) to 2D 2D 2D (---)
Based on the bytes you pasted, it looks like there are a couple of things going wrong here. First, it seems that CRLF in your input file (0D 0A) is being converted to just LF (0A). Second, it looks like the character encoding is changing, either when reading the file into a string or when writing the string to the HTTP client.
The HTTP Content-Length represents the number of bytes in the stream, whereas string.Length gives you the number of characters in the string. Unless your file uses only the first 128 ASCII characters (which precludes non-English characters as well as special Windows-1252 characters like the euro sign), it's unlikely that string.Length will exactly equal the length of the string encoded in either UTF-8 or ISO-8859-1.
If you convert the string to a byte[] before sending it to the client, you'll be able to get the true Content-Length. However, you'll still end up with mangled text if you didn't read the file using the proper encoding. (Whether or not you specify the encoding, a conversion happens when reading the file into a string of Unicode characters.)
I highly recommend specifying the charset in the Content-Type header (e.g. application/x-javascript-config;charset=utf-8). It doesn't matter whether your charset is utf-8, utf-16, iso-8859-1, windows-1251, etc., as long as it's the same character encoding you use when converting your string into a byte[].
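A minimal sketch of that fix, reusing the handler's own names (the charset suffix is an assumption based on the ANSI files described; adjust it to your actual code page):
// Read the file as raw bytes: no newline translation, no charset conversion,
// and the Content-Length is the true byte count rather than a character count.
byte[] body = File.ReadAllBytes(filename);
p.writeSuccess("application/x-javascript-config;charset=windows-1252", body.Length, lastWrite);
p.outputStream.Flush();                                // push the buffered header text out first
p.outputStream.BaseStream.Write(body, 0, body.Length); // then send the bytes untouched
p.outputStream.BaseStream.Flush();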

How to determine encoding of image using header bytes

So I am using C#, and I need to determine the actual encoding of an image file. An image can be stored in one format while carrying a different file extension, and it will still work in most software.
My needs require precise knowledge of the image format.
There is one other thread that deals with this: Determine Image Encoding of Image File
It shows how to find the actual encoding once you have the image's header information. I need to open the image and extract that header information.
FileStream imageFile = new FileStream("myImage.gif", FileMode.Open);
After this bit, how do I read only the bytes that contain the header?
Thank you.
You can't really read "just the header" unless you know its size.
Instead, determine the minimum number of bytes you need to be able to distinguish between the formats you need to support, and read only those bytes. Most likely every format you need will have a unique header.
For example, if you need to support png & jpeg, those formats start with:
PNG: 89 50 4E 47 0D 0A 1A 0A
JPEG: FF D8 FF E0
So in that case you'd only have to read a single byte to distinguish between the two. In practice I'd read a few more bytes, just in case you encounter other file formats.
To read, say 8 bytes, from the beginning of a file:
using( var sr = new FileStream( "file", FileMode.Open ) )
{
var data = new byte[8];
int numRead = sr.Read( data, 0, data.Length );
// numRead gives you the number of bytes read
}
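Building on that, a hedged sketch that turns the bytes just read into a format check (the helper name is made up; the signatures are the ones listed above):
using System.Linq;

// Returns "png", "jpeg", or null, judged purely by the magic bytes.
static string GuessFormat(byte[] header)
{
    byte[] pngSig = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    if (header.Length >= 8 && header.Take(8).SequenceEqual(pngSig))
        return "png";
    // Every JPEG starts with FF D8 (SOI); FF E0 is just the common JFIF APP0 marker.
    if (header.Length >= 2 && header[0] == 0xFF && header[1] == 0xD8)
        return "jpeg";
    return null;
}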
Well, I figured it out in the end, so I'm going to update the thread and close it. The only issue with my solution is that it requires opening the entire image file rather than just the required bytes. This uses a lot more memory and takes longer, so it isn't the optimal solution when speed is a concern.
Just to give credit where it's due, this code was created from a
couple of sources here on Stack Overflow; you can find the links in
the OP and earlier comments. The rest of the code was written by me.
If anyone feels like modifying the code to only open the correct number of bytes, feel free.
TextWriterTraceListener writer = new TextWriterTraceListener(System.Console.Out);
Debug.Listeners.Add(writer);
// A PNG file contains an 8-byte header.
// A JPEG file contains a 2-byte header (SOI) followed by a series of markers;
// some markers can be followed by a data array. Each marker type has a different header format.
// The bytes where the image is stored follow the SOF0 marker (10 bytes long).
// However, between the JPEG header and the SOF0 marker there can be other segments.
// A BMP file contains a 14-byte header.
// A GIF file contains at least 14 bytes in its header.
FileStream memStream = new FileStream(@"C:\a.png", FileMode.Open);
Image fileImage = Image.FromStream(memStream);
// get image format
var fileImageFormat = typeof(System.Drawing.Imaging.ImageFormat)
    .GetProperties(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static)
    .ToList()
    .ConvertAll(property => property.GetValue(null, null))
    .Single(image_format => image_format.Equals(fileImage.RawFormat));
MessageBox.Show("File Format: " + fileImageFormat);
// get image codec
var fileImageFormatCodec = System.Drawing.Imaging.ImageCodecInfo.GetImageDecoders()
    .ToList()
    .Single(image_codec => image_codec.FormatID == fileImage.RawFormat.Guid);
MessageBox.Show("MimeType: " + fileImageFormatCodec.MimeType + "\n" + "Extension: " + fileImageFormatCodec.FilenameExtension + "\n" + "Actual Codec: " + fileImageFormatCodec.CodecName);
Output is as expected:
file_image_format: Png
Built-in PNG Codec, mime: image/png, extension: *.PNG

CSV encoding issues (Microsoft Excel)

I am dynamically creating CSV files using C#, and I am encountering some strange encoding issues. I currently use ASCII encoding, which works fine in Excel 2010, which I use at home and on my work machine. However, the customer uses Excel 2007, and for them there are some strange formatting issues, namely that the '£' sign (UK pound sign) is preceded by an accented 'A' character.
What encoding should I use? The annoying thing is that I can hardly test these fixes, as I don't have access to Excel 2007!
I'm using Windows ANSI codepage 1252 without any problems in Excel 2003. I explicitly changed to this because of the same issue you are seeing.
private const int WIN_1252_CP = 1252; // Windows ANSI codepage 1252
this._writer = new StreamWriter(fileName, false, Encoding.GetEncoding(WIN_1252_CP));
I've successfully used UTF-8 encoding when writing CSV files intended to work with Excel.
The only problem I had was making sure to use the overload of the StreamWriter constructor that takes an encoding as a parameter. The default encoding of StreamWriter claims to be UTF-8, but it is really UTF-8 without a byte order mark, and without a BOM Excel will mangle characters that use multiple bytes.
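For example, a minimal sketch (the file name and row contents are made up):
using System.IO;
using System.Text;

// new UTF8Encoding(true) emits the EF BB BF preamble, so Excel detects UTF-8.
using (var writer = new StreamWriter("export.csv", false, new UTF8Encoding(true)))
{
    writer.WriteLine("Price,Description");
    writer.WriteLine("£200.00,widget");
}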
You need to add the preamble (the BOM bytes) to the file:
var data = Encoding.UTF8.GetBytes(csv);
var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
return File(new MemoryStream(result), "application/octet-stream", "file.csv");
