We have an application that, among other things, displays an SSRS report through a ReportViewer control. When exporting to Excel, it creates a file that, when opened, has recently begun throwing this error:
Excel cannot open the file 'MyReportName.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
This appears to affect all of the Office export formats, but the one I'm concerned about is Excel.
I've tried renaming the file extension to .xls, but that didn't help. I've also tried renaming the extension to .zip and opening it in 7-Zip, which typically works with xlsx files, but it doesn't open there either. The file is quite a bit smaller than we expect, just a few KB.
When the report is exported directly from the SSRS server, the resulting Excel file opens fine.
Has anyone encountered this before?
Update:
There are some extra bytes at the start of the xlsx file.
Something is adding them into the stream that is sent to the browser. Remove them and the file opens just fine.
The last two bytes of this extra data are 0D 0A, which is carriage return, line feed. So it looks like something is adding a line of text before sending the file. In the file I downloaded there are 4 bytes followed by 0D 0A; another file we examined had 5 bytes followed by 0D 0A, so it's definitely text.
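As a stopgap while the root cause is tracked down, the extra bytes can be stripped off before the file is handed to the user. This is only a rough sketch under the assumption that the real content starts at the ZIP local file header signature ("PK", 0x03, 0x04); the file names are hypothetical.
// Rough workaround sketch: drop whatever precedes the ZIP signature an .xlsx must start with.
// File names are hypothetical.
using System;
using System.IO;

class StripLeadingBytes
{
    static void Main()
    {
        byte[] data = File.ReadAllBytes("MyReportName.xlsx");
        byte[] sig = { 0x50, 0x4B, 0x03, 0x04 }; // "PK\x03\x04"
        int start = -1;
        for (int i = 0; i <= data.Length - sig.Length; i++)
        {
            if (data[i] == sig[0] && data[i + 1] == sig[1] &&
                data[i + 2] == sig[2] && data[i + 3] == sig[3])
            {
                start = i;
                break;
            }
        }
        if (start > 0)
        {
            // Keep only the bytes from the signature onward.
            byte[] cleaned = new byte[data.Length - start];
            Array.Copy(data, start, cleaned, 0, cleaned.Length);
            File.WriteAllBytes("MyReportName.fixed.xlsx", cleaned);
            Console.WriteLine("Stripped " + start + " leading byte(s).");
        }
    }
}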
I was having the same issue, with the Content-Length being prepended to the beginning of file exports in SSRS. The issue stopped when I disabled the URL Rewrite outbound rules. I found a solution based on this thread: How to fix URL Rewriting for links inside CSS files with IIS7.
Short answer: modify the web.config file to add rewriteBeforeCache="true" to your outboundRules tag.
<outboundRules rewriteBeforeCache="true">
This will stop the addition of the Content-Length to the beginning of the file.
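For context, this is roughly where the attribute sits in web.config; your existing rules stay as they are, and everything below other than the rewriteBeforeCache attribute is just the standard URL Rewrite section layout:
<system.webServer>
  <rewrite>
    <outboundRules rewriteBeforeCache="true">
      <!-- your existing outbound <rule> elements, unchanged -->
    </outboundRules>
  </rewrite>
</system.webServer>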
Related
Issue - I am downloading an Excel file from an online service. The problem is that it downloads with an XLS extension, but when you do a Save As it's actually a web page (HTML). What I have to do is open Excel and save it as a real Excel file. When I import the file programmatically into Postgres, the data imports fine, no problem. The problem is when I save it from pgAdmin and the CSV shows those special characters. They look like foreign characters.
However, let's go back to the online service. If I manually copy the list and paste it into Notepad (which strips those characters), then copy the list back into an Excel file, there is no issue.
The question is: how do I strip these characters programmatically instead of doing it manually with Notepad? (See the sketch below the example.)
Example of the issue: look at the image. The date/time has the special character, and the other cell should say "FILLER 5".
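There is no accepted answer in this thread, but as a hedged illustration, one way to approximate the copy-through-Notepad step is to normalize the text before importing it. The file names, and the assumption that the stray characters are non-breaking spaces or other non-ASCII bytes, are mine, not from the original post.
// Sketch only: replace non-breaking spaces and drop other non-ASCII characters,
// roughly what pasting through Notepad does by hand. File names are hypothetical.
using System.IO;
using System.Linq;
using System.Text;

class StripSpecialChars
{
    static void Main()
    {
        string text = File.ReadAllText("download.xls", Encoding.UTF8);

        // Non-breaking spaces (U+00A0) are a common source of "foreign" characters.
        text = text.Replace('\u00A0', ' ');

        // Keep printable ASCII plus tabs and line breaks; drop everything else.
        string cleaned = new string(text
            .Where(c => (c >= ' ' && c < (char)127) || c == '\t' || c == '\r' || c == '\n')
            .ToArray());

        File.WriteAllText("download_clean.csv", cleaned, Encoding.ASCII);
    }
}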
I have a third-party library which creates an xlsx file. It doesn't use the OpenXML SDK; it assembles the file from fragments of XML markup. The ZipArchive class is used for zipping.
But when I try to open it with the OpenXML SDK:
var document = SpreadsheetDocument.Open(fileStream, false);
it fails with the error:
DocumentFormat.OpenXml.Packaging.OpenXmlPackageException: 'The specified package is invalid. The main part is missing.'
MS Excel opens this file normally, and resaving it from Excel helps.
Also, if I unzip the file and then zip it again (without any changes) and call the above code again, it works.
Where is the problem? How do I zip an xlsx file so that it is ready for the OpenXML SDK?
SOLUTION
The problem was with how the third-party library saved the file. The files included in the zip had entry names using \ instead of /. The library's code was edited to fix that, and now everything is OK.
After some research I found people complaining about this exception in two scenarios:
the document uses or references a font that is not installed (as described here: https://github.com/OfficeDev/Open-XML-SDK/issues/561)
an invalid file name extension (other than xlsx, as described here: https://social.msdn.microsoft.com/Forums/office/en-US/6e7e27d4-cd97-46ae-9eca-bfd618dde301/openxml-sdk20-the-specified-package-is-invalid-the-main-part-is-missing?forum=oxmlsdk)
Since you open the file from a stream, the second cause probably does not apply in this case.
If font usage is not the cause, try manually comparing the file versions from before and after resaving with Excel in the Open XML Productivity Tool (https://www.microsoft.com/en-us/download/details.aspx?id=30425).
If there are no differences in the documents' contents, try comparing the archives' compression settings.
UPDATE
It seems I've found some more information about the issue that may help find a solution.
I was able to reproduce the 'The main part is missing.' error by creating an archive with ZipFile.CreateFromDirectory(@"C:\DirToCompress", destFilePath, CompressionLevel.Fastest, false);.
Then I checked that opening the file with Package.Open(destFilePath, FileMode.Open, FileAccess.Read) actually listed 0 parts in the file.
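For reference, this is the sort of check meant here, with a hypothetical path; Package lives in System.IO.Packaging (the WindowsBase assembly on .NET Framework, or the System.IO.Packaging package):
// Diagnostic sketch: list the package parts the OPC layer can see.
using System;
using System.IO;
using System.IO.Packaging;

class ListParts
{
    static void Main()
    {
        string destFilePath = @"C:\temp\test.xlsx"; // hypothetical path
        using (var package = Package.Open(destFilePath, FileMode.Open, FileAccess.Read))
        {
            // A corrupted file produced with '\' entry names lists no parts at all.
            foreach (var part in package.GetParts())
                Console.WriteLine(part.Uri);
        }
    }
}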
After examining the differences, I noticed that in the correct xlsx file, entries nested within folders in the archive have FullName paths written with the / character, for example _rels/.rels. In the corrupted file, the names were written with the \ character, for example _rels\.rels.
You can investigate this by opening the file with the ZipArchive class (for example: new ZipArchive(archiveStream, ZipArchiveMode.Read, false, UTF8Encoding.UTF8)) and inspecting its Entries collection.
The important thing to note is that there are naming rules for parts described in the Office Open XML specification: https://www.ecma-international.org/news/TC45_current_work/Office%20Open%20XML%20Part%202%20-%20Open%20Packaging%20Conventions.pdf
As a test, I wrote code that opens the corrupted xlsx file with the ZipArchive class and rewrites each entry, copying its contents and replacing \ with / in the name of the recreated entry. After this operation, the resulting file opens correctly with the SpreadsheetDocument.Open(...) method.
Please note that the name-fixing method I used was very simple and may not be sufficient or work correctly in every scenario. However, these notes may help you find a workable solution to the issue.
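A minimal sketch of that kind of rewrite, with hypothetical paths and no handling of duplicate names or errors, might look like this:
// Copy every entry into a new archive, replacing '\' with '/' in the entry names.
// Requires references to System.IO.Compression and System.IO.Compression.FileSystem.
using System.IO;
using System.IO.Compression;

class FixEntryNames
{
    static void Main()
    {
        using (var source = ZipFile.OpenRead(@"C:\temp\corrupted.xlsx"))   // hypothetical paths
        using (var target = ZipFile.Open(@"C:\temp\fixed.xlsx", ZipArchiveMode.Create))
        {
            foreach (var entry in source.Entries)
            {
                // Recreate the entry under a name that uses forward slashes.
                var fixedEntry = target.CreateEntry(entry.FullName.Replace('\\', '/'));
                using (var from = entry.Open())
                using (var to = fixedEntry.Open())
                {
                    from.CopyTo(to);
                }
            }
        }
    }
}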
Here is my case:
I'm using ABCPDF to generate an HTML document from a .DOCX file that I need to show on the web.
When you export to HTML from ABCPDF, you get an HTML file and a folder of support files (.css, .js, .png).
These HTML files may contain quite sensitive data, so immediately after generating the files I move them into a password-protected .zip file (from which I fetch them later).
The problem is that this leaves the files unencrypted on the HDD for a few seconds, or even longer if I'm (for some reason) unable to delete them right away.
I'd like suggestions for another way of doing this. I've looked into a RAM drive, but I'm not happy about installing such drivers on my servers (and the RAM drive would still be accessible from the OS).
The cause of the problem here might be that ABCPDF can only export HTML as files (since it's multiple files) and not as a stream.
Any ideas?
I'm using .NET 4.6.x and c#
Since all your files except the .HTML are anonymous, you can use the documented way of writing the HTML to a stream; only the other files will be stored in the file system.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/doc/1-methods/save.htm
When saving to a Stream the format can be indicated using a Doc.SaveOptions.FileExtension property such as ".htm" or ".xps". For HTML you must provide a sensible value for the Doc.SaveOptions.Folder property.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/xsaveoptions/2-properties/folder.htm
This property specifies the folder where to store additional data such as images and fonts. It is only used when exporting documents to HTML. It is ignored otherwise.
For a start, try using a simple MemoryStream to hold the sensitive data. If you get large files or high traffic, open an encrypted stream to a file on your system.
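As a hedged sketch of that idea: only SaveOptions.FileExtension, SaveOptions.Folder and saving to a Stream come from the documentation quoted above; the namespace/version and the way the .DOCX gets into the Doc are assumptions you would adapt to your setup.
// Sketch only, based on the quoted ABCPDF documentation.
using System.IO;
using WebSupergoo.ABCpdf11; // adjust to your ABCpdf version

class HtmlToStream
{
    static void Main()
    {
        using (var doc = new Doc())
        {
            // ... load/convert your .DOCX into doc the way you already do ...

            doc.SaveOptions.FileExtension = ".htm";          // tell Save() we want HTML
            doc.SaveOptions.Folder = @"C:\temp\support";     // support files (.css, .js, .png) still go here
            using (var html = new MemoryStream())
            {
                doc.Save(html); // the sensitive HTML itself never touches the disk
                // from here, encrypt or zip the MemoryStream contents directly
            }
        }
    }
}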
I have a routine in a DLL I made (C#) that I use to upload a document to a Content Manager (IBM FileNet P8).
It works fine, but I have a problem with some characters.
I uploaded a document with a file name like 'filenameØ.docx', and it turns out the 'Ø' character somehow causes trouble: the routine always finishes fine (the document and the metadata are actually loaded), but the content in the CM is corrupted and unreadable.
The strange thing is that this behavior happens only in the web application; in client apps the file is loaded correctly and I can read it back.
Is it perhaps a charset mistake in some configuration? Can anyone help?
I'm posting with the asp.net and excel tags because that is the origin of my problem, but I'm not really sure this is the right place - ultimately, my problem is that I have two files (served by an ASP.Net application) which are identical based on a binary file compare using
fc /B A.xls B.xls
However, they exhibit different behavior: the first one opens fine in Excel; the second one does not. I conclude, then, that there is something different about the files beyond what the FC utility checks.
I have tried sending these two files to a friend to ask for his help, but discovered that when I do so, the problem file gets "fixed". In fact, if I do just about anything with this file, it gets "fixed". By fixed, I mean that it then opens fine in Excel. For example, if I zip it, then extract it from the zip, it is fine. If I open in Notepad++ and "Save As", it is fine. Same with Wordpad. Using plain old Notepad does NOT fix it.
So, obviously, there is some difference about these two files that I am missing.
I'm not sure if I will have any luck asking people to visit a random website, but if you want to see an example of the behavior, I have created a minimal page to duplicate the problem at http://rodj.me/ExcelTest
Click on the link for "MinimalHtml.aspx", and the app will serve an HTML-based xls file using the following in the Page_Load:
protected void Page_Load(object sender, EventArgs e)
{
Response.ContentType = "application/vnd.ms-excel";
Response.AddHeader("Content-Disposition", "filename=MinimalHtml.xls");
}
Depending on your browser and browser settings (my tests have been in Chrome), you may get Excel opened with a blank page. Regardless, you should get the file MinimalHtml.xls downloaded. It is a plain text file. You should find that this file will NOT open in Excel. However, if you zip the file and then extract it from the zip, it WILL open.
I'm curious about what other file differences I'm missing when just doing an FC compare, but ultimately, I need to get the ASP.Net application corrected to serve the HTML version of the Excel file correctly. Interestingly, if I create an XML version of the spreadsheet, it downloads/opens fine. That is what the "MinimalXml.aspx" link does.
Can anyone help with either 1) how to figure out what is different about the two files; or 2) what must change in the ASP.Net application to get it to serve the file correctly?
I think your problem might be a Microsoft security patch. See this article:
Infoworld article
When you open the file directly, the patch blocks it, which results in a blank page because the file's contents are HTML, not Excel. When you download the file inside a zip file and unzip it, the extracted copy is deemed safe and opens correctly.