Generating proper CSV files

I'm having a problem programmatically generating a proper CSV file that is then downloaded by the user and opened in Excel in my ASP.NET project. Excel seems to open the file properly, but when I go to “Save As” it defaults to Unicode Text. I understand that CSV is basically a text file, but if you create a CSV in Excel, save it, and then go to Save As, the save-as type defaults to CSV. Therefore I believe something extra is being saved along with the file. I’ve made sure the HTTP Content-Type header is set to “text/csv”, so I am sure that the response sent to the user is correct.

We generate a lot of CSV where I work, and I've noticed this a lot. There's a really good chance that your file is just fine.
The problem with CSV is that it's not defined by any standard, so every app interprets it slightly differently. Excel probably does this for any CSV file that isn't precisely in its preferred format.
Maybe Excel expects CSV to be ASCII, and you've got a UTF BOM in the file which makes it decide tab-delimited "Unicode text" is a better fit.
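To test the BOM guess, you can look at the first three bytes of the downloaded file. This is only a minimal sketch: the class and method names are made up for illustration, and it only looks for the UTF-8 BOM (EF BB BF), not the UTF-16 ones.

```csharp
using System;

public static class CsvBomCheck
{
    // True when the buffer starts with the UTF-8 byte order mark EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        return data != null && data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    // Returns a copy of the buffer without a leading UTF-8 BOM, if one is present.
    public static byte[] StripUtf8Bom(byte[] data)
    {
        if (!StartsWithUtf8Bom(data)) return data;
        byte[] result = new byte[data.Length - 3];
        Array.Copy(data, 3, result, 0, result.Length);
        return result;
    }
}
```

Calling CsvBomCheck.StartsWithUtf8Bom(File.ReadAllBytes(path)) on the saved download tells you whether a BOM made it into the file.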

This should work:
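protected void btnDownload_Click(object sender, EventArgs e)
{
    // Ask the browser to download the response as a file named myfilename.csv.
    Response.AddHeader("Content-Disposition", "attachment;filename=myfilename.csv");
    Response.ContentType = "text/csv";
    // A single semicolon-delimited record.
    Response.Write("1;computer;1000");
    Response.End();
}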

Have you looked at a binary dump of the file to make sure the file being downloaded is identical to the file you're looking at locally? Different line terminators might be in use, which could cause Excel to read the file tolerantly and display it, but default to saving it as Unicode text.
On a Linux (or Cygwin) system, "od -a -x" will show you how the file is made up.
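If od isn't handy, the line terminators can also be counted from C#. A rough sketch, assuming the whole file fits in memory; LineEndingCounter is a name invented here, not an existing API.

```csharp
using System;

public static class LineEndingCounter
{
    // Counts CRLF pairs, lone LF bytes and lone CR bytes in a buffer.
    public static (int CrLf, int Lf, int Cr) Count(byte[] data)
    {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == 0x0D)
            {
                if (i + 1 < data.Length && data[i + 1] == 0x0A)
                {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF pair
                }
                else
                {
                    cr++;
                }
            }
            else if (data[i] == 0x0A)
            {
                lf++;
            }
        }
        return (crlf, lf, cr);
    }
}
```

Running it over both the downloaded file and a CSV saved by Excel itself shows whether the two use the same terminator.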

Related

SSRS Reportviewer Export Excel Invalid File

We have an application that, among other things, displays an SSRS report through a ReportViewer control. When exporting to Excel, it creates a file that, when opened, has recently begun throwing the error:
Excel cannot open the file 'MyReportName.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
This appears to be an issue across all Office application formats, but the one I'm concerned about is Excel.
I've attempted to rename the file extension to .xls, but that did not help. I've attempted to rename the extension to .zip and open it in 7z, which typically works with xlsx files, but it doesn't open there either. The file is quite a bit smaller than we expect, just a few KB.
When opened directly from the SSRS server, the Excel file produced can be opened fine.
Has anyone encountered this before?
Update:
There are some extra bytes at the start of the xlsx file.
Something is adding them into the stream that is sent to the browser. Remove them and the file opens just fine.
The last 2 bytes of this extra data are 0D 0A, which is carriage return, line feed. So it looks like something is adding a text line before sending the file. If I download the file I get 4 bytes and then 0D 0A. Another file we examined had 5 bytes and then 0D 0A, so it’s definitely something textual.
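An xlsx file is a ZIP container, so a healthy one starts with the bytes 50 4B 03 04 ("PK" followed by 03 04). A small sketch for measuring how much junk precedes that signature; the names are hypothetical, and it only finds the first occurrence of the signature.

```csharp
using System;

public static class XlsxPrefixCheck
{
    // Returns the offset of the first ZIP local file header signature
    // (50 4B 03 04), or -1 when the buffer does not contain one.
    public static int FindZipHeader(byte[] data)
    {
        for (int i = 0; i + 3 < data.Length; i++)
        {
            if (data[i] == 0x50 && data[i + 1] == 0x4B
                && data[i + 2] == 0x03 && data[i + 3] == 0x04)
            {
                return i;
            }
        }
        return -1;
    }
}
```

A result of 0 means the file starts cleanly. For the download described above (4 extra bytes plus 0D 0A) the result would be 6.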

I was having the same issue, with the Content-Length being added to the beginning of file exports in SSRS. The issue stopped when I disabled the URL Rewrite outbound rules. I found a solution based on the thread "How to fix URL Rewriting for links inside CSS files with IIS7".
Short answer: modify the web.config file to add rewriteBeforeCache="true" to your outbound rules tag.
<outboundRules rewriteBeforeCache="true">
This will stop the addition of the Content-Length to the beginning of the file.

Excel opening CSV with wrong encoding

This is partly a question for the Microsoft forums too, but I think there might be some coding involved.
We have a system built in C# .NET that generates CSV files. However, we have problems with the special characters "æÆøØåÅ". When I open the file in Notepad, everything is correct. But when I open the file in Excel, these characters are wrong. If I open the file in Notepad and save it without making any changes, it works in Excel. But I don't understand why. Is there some hidden information added to the file that can be adjusted in our C# code to make it correct in the first place?
There are other questions like this, but all the answers I could find are workarounds for when you already have a wrong CSV file. In our case, we create this file, and the people we send the files to are usually not computer people capable of changing the encoding, etc.
Edit:
Here is the code we tried to use at the end, after generating our result CSV-string:
string result = "some;æøå;string";
byte[] bytes = System.Text.Encoding.GetEncoding(65001).GetBytes(result.ToString());
return System.Text.Encoding.GetEncoding(65001).GetString(bytes);
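Note that the snippet above encodes the string to UTF-8 bytes and decodes it straight back, so it returns the same string it started with. A commonly suggested approach, which is an assumption here and not something confirmed in this thread, is to write the file bytes with a UTF-8 byte order mark in front so Excel can recognise the encoding. A minimal sketch with made-up names:

```csharp
using System;
using System.Text;

public static class CsvEncoder
{
    // Encodes CSV text as UTF-8 and prepends the UTF-8 preamble (EF BB BF).
    // Assumption: the consuming application uses the BOM to detect UTF-8.
    public static byte[] EncodeWithBom(string csv)
    {
        var utf8 = new UTF8Encoding(true);
        byte[] preamble = utf8.GetPreamble();
        byte[] body = utf8.GetBytes(csv);
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }
}
```

The returned bytes would then be written to the file or response as-is, for example with File.WriteAllBytes, rather than being turned back into a string.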

Two files that are binary identical, yet exhibit different behavior

I'm posting with tags asp.net and excel because that is the origination of my problem, but I'm not really sure this is the right place - ultimately, my problem is that I have two files (served by an ASP.Net application) which are identical based on a binary file compare using
fc /B A.xls B.xls
However, they exhibit different behavior: the first one opens fine in Excel; the second one does not. I conclude, then, that there is something different about the files beyond what the FC utility checks.
I have tried sending these two files to a friend to ask for his help, but discovered that when I do so, the problem file gets "fixed". In fact, if I do just about anything with this file, it gets "fixed". By fixed, I mean that it then opens fine in Excel. For example, if I zip it, then extract it from the zip, it is fine. If I open in Notepad++ and "Save As", it is fine. Same with Wordpad. Using plain old Notepad does NOT fix it.
So, obviously, there is some difference about these two files that I am missing.
I'm not sure if I will have any luck asking people to visit a random website, but if you want to see an example of the behavior, I have created a minimal page to duplicate the problem at http://rodj.me/ExcelTest
Click on the link for "MinimalHtml.aspx", and the app will serve an HTML based xls file using the following in the Page Load:
protected void Page_Load(object sender, EventArgs e)
{
    Response.ContentType = "application/vnd.ms-excel";
    Response.AddHeader("Content-Disposition", "filename=MinimalHtml.xls");
}
Depending on your browser and browser settings (my tests have been in Chrome), you may get Excel opened with a blank page. Regardless, you should get the file MinimalHtml.xls downloaded. It is a plain text file. You should find that this file will NOT open in Excel. However, if you zip the file, then extract it from zip, it WILL open.
I'm curious about what other file differences I'm missing when just doing an FC compare, but ultimately, I need to get the ASP.Net application corrected to serve the HTML version of the Excel file correctly. Interestingly, if I create an XML version of the spreadsheet, it downloads/opens fine. That is what the "MinimalXml.aspx" link does.
Can anyone help with either 1) how to figure out what is different about the two files; or 2) what must change in the ASP.Net application to get it to serve the file correctly?

I think your problem might be a Microsoft security patch. See this article:
Infoworld article
When you open the file directly, the patch causes the issue, which results in a blank page because the file content is HTML, not Excel. When you download the file in a Zip file and unzip it, it is deemed safe and opens correctly.

Universal Data Link - File cannot be opened. Ensure it is a valid Data Link file

I am attempting to create a UDL file programmatically in C#. In my program, I want to show the user the Data Link properties window but with my own default values for the connection string. I initially thought to do the following:
string[] lines = new string[]
{
    "[oledb]",
    "; Everything after this line is an OLE DB initstring",
    "Provider=SQLOLEDB.1;Persist Security Info=False"
};
File.WriteAllLines("Test.udl", lines);
Process p = Process.Start("Test.udl");
p.WaitForExit();
However, I get this error when trying to open the file:
File cannot be opened. Ensure it is a valid Data Link file.
This is strange because I created an empty file, named it something.udl, opened it, clicked OK, and then opened the contents of the file which was:
[oledb]
; Everything after this line is an OLE DB initstring
Provider=SQLOLEDB.1;Persist Security Info=False
But there was a newline character at the end of the connection string. I used KDiff to compare this file and the file I created in my program and it said the "Files are equal text but they are not binary equal" or something to that effect.
I believe it has to do with how the File.WriteAllLines method writes the strings. So I attempted to use different encodings with the method but with no success. Any ideas on where I am going wrong?
I am using this MSDN link as a reference about UDL files. It's also interesting to note that if I open a new text file and paste in all of the lines in my lines array, I arrive at the same error.

All you need to do is use the Unicode encoding:
File.WriteAllLines("Test.udl", lines, Encoding.Unicode);

When creating the file in a plain-text editor, use the UTF-16 Little Endian encoding and include a Byte Order Mark (since Microsoft started on the Intel platform they consider that the "default" when they talk about UTF-16).
When using a program, make sure to use that particular encoding as well. Programming languages might still default to a legacy codepage or use UTF-8, in which case opening the UDL file will trigger the error shown in the question.
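To confirm that the file really went out as UTF-16 Little Endian with a Byte Order Mark, you can check that it starts with the bytes FF FE. A short sketch reusing the lines from the question; the UdlWriter class and its method names are invented for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class UdlWriter
{
    // Writes a UDL file using Encoding.Unicode (UTF-16 LE, BOM included).
    public static void Write(string path, string connectionString)
    {
        string[] lines =
        {
            "[oledb]",
            "; Everything after this line is an OLE DB initstring",
            connectionString
        };
        File.WriteAllLines(path, lines, Encoding.Unicode);
    }

    // True when the file starts with the UTF-16 LE byte order mark FF FE.
    public static bool HasUtf16LeBom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;
    }
}
```

For example, UdlWriter.Write("Test.udl", "Provider=SQLOLEDB.1;Persist Security Info=False") followed by UdlWriter.HasUtf16LeBom("Test.udl") should report true.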

Getting the type of file from the ASP.NET FileUpload control?

I want to get the type of file uploaded using the ASP.NET FileUpload control. When I upload a file, I want to be able to get the type of file uploaded, so I can assign an icon to the file (such as a Word, Excel, or PDF icon).
Here is the problem: I can't go off the file extension, because a file could be called test.xxxxxxxx and be a valid PDF file, or a file might not have an extension.
The other option is to read the content type, but some of these appear not to be standard or not in a simple-to-read format (Excel files, for example). Is there another option to determine the file type?

I would review the process that names a PDF file "test.xxxxxxxx" without a ".pdf" extension. Most workflows that handle files have some kind of naming convention (manual or automated).
If you cannot read the extension, the file format will need to be detected by interrogating the markers that make it a known file format, e.g. if the stream starts with or contains "%PDF", it's a PDF.
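A rough sketch of that kind of marker check, comparing only the first few bytes against a handful of well-known signatures. The class name and the returned labels are made up; the signatures are "%PDF" (25 50 44 46), the ZIP header used by docx/xlsx (50 4B 03 04), and the OLE compound file header used by legacy doc/xls files (D0 CF 11 E0).

```csharp
using System;

public static class FileSignature
{
    // True when data begins with the given magic bytes.
    private static bool StartsWith(byte[] data, params byte[] magic)
    {
        if (data == null || data.Length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }

    // Maps the leading bytes of a file to a coarse type label.
    public static string Detect(byte[] header)
    {
        if (StartsWith(header, 0x25, 0x50, 0x44, 0x46)) return "pdf";
        if (StartsWith(header, 0x50, 0x4B, 0x03, 0x04)) return "zip";
        if (StartsWith(header, 0xD0, 0xCF, 0x11, 0xE0)) return "ole";
        return "unknown";
    }
}
```

Note that "zip" and "ole" only identify the container, so telling Word from Excel would still need a look inside the file.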

If you don't know the extension, then you would have to know the file format of popular file types and then read the file and see if it matches one of your known formats. That doesn't sound optimal, though.

An easy way to check the true type of a file server side, using System.IO.BinaryReader, is described here:
http://forums.asp.net/post/2680667.aspx
and VB version here:
http://forums.asp.net/post/2681036.aspx
You'll need to know the binary 'codes' for the file type(s) you're checking for, but you can get those by implementing this solution and debugging the code.
Also note that closing the BinaryReader with the r.Close() statement will make FileUploader.HasFile = False.
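One possible way around the closing problem, sketched here under the assumption that the upload stream is seekable and that the BinaryReader constructor with a leaveOpen argument (available since .NET 4.5) can be used. HeaderReader is a hypothetical helper, not code from the linked posts.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderReader
{
    // Reads up to count bytes from the current position, then rewinds the
    // stream. leaveOpen: true keeps the underlying stream usable afterwards.
    public static byte[] ReadHeader(Stream stream, int count)
    {
        long start = stream.Position;
        byte[] header;
        using (var reader = new BinaryReader(stream, Encoding.UTF8, true))
        {
            header = reader.ReadBytes(count);
        }
        stream.Position = start; // requires a seekable stream
        return header;
    }
}
```

The header bytes can then be compared against known signatures while the original stream stays open for saving the file.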
