What do I mean by 'non-standard'?
Take a look at these images: http://imgur.com/a/tFqHQ
The first one is the non-standard Excel file. I'm pretty sure it's not a real Excel file, but its extension is .xls and for some reason Excel can open it and understand its structure.
The second image is the same file after it was opened in Excel and saved as .xls (97-2003).
If Excel can open it and display it correctly, I should be able to do the same. Any tips on how to approach this?
I should mention that my app has to read the non-standard Excel files directly. Otherwise the user would have to open the files one by one (in Excel or LibreOffice) and save each in a correct format, which I would like to avoid for convenience.
Related
Basically, I need to determine whether a file was originally created as a valid Excel file. I have tried using the file format reported by Microsoft.Office.Interop.Excel.
It seems I cannot use this without Microsoft Office being installed, so I need another approach.
EDIT:
I want to be able to distinguish a valid Excel file from an invalid one, such as a DLL that has been renamed to .xls. The extension says it is an .xls file, and if you try to open it, Excel shows an empty workbook without any sheets. But I cannot simply treat every workbook with no sheets as invalid.
If you want to identify whether a file is a valid Excel file or not, you can use the TrID file identifier. TrID - File Identifier.
You can also do it on your own by reading the magic number (file signature bytes). Please go through this question: Original file bytes from StreamReader, magic number detection.
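A minimal sketch of the signature check, assuming you only need to tell the container type apart. The class and method names (ExcelSignature, Detect, DetectFile) are made up for this example. The two signatures are the OLE2 compound-file header used by binary .xls files and the ZIP local-file header used by .xlsx/.xlsm packages. A renamed DLL starts with the bytes 4D 5A ("MZ") and would fall through to "unknown".

```csharp
using System;
using System.IO;

public static class ExcelSignature
{
    // First 8 bytes of an OLE2 compound file, the container of binary .xls (97-2003).
    private static readonly byte[] Ole2 = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // First 4 bytes of a ZIP archive, the container of .xlsx and .xlsm.
    private static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "ole2", "zip" or "unknown" based on the leading bytes only.
    public static string Detect(byte[] header)
    {
        if (StartsWith(header, Ole2)) return "ole2";
        if (StartsWith(header, Zip)) return "zip";
        return "unknown";
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    public static string DetectFile(string path)
    {
        byte[] buffer = new byte[8];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read); // shorter files simply cannot match
        }
        return Detect(buffer);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Keep in mind that a matching signature only identifies the container. A Word .doc is also an OLE2 file and any ZIP archive starts with the same four bytes, so a stricter check would still have to look inside for a workbook.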
I am creating a series of Excel Workbooks using EPPlus v3.1.3. When I open the newly created files, if I close it without touching anything it asks me if I want to save my changes. The only thing I've noticed changes if I say "yes" is that the app.xml file is slightly altered - there is no visible difference in the workbook, and the rest of the XML files are the same. I have tried both of these approaches:
ExcelPackage p = new ExcelPackage(new FileInfo(filename));
p.Save();
as well as
ExcelPackage p = new ExcelPackage();
p.SaveAs(new FileInfo(filename));
and both have the same problem. Is there a way to have the app.xml file output in its final form?
The reason this is an issue is because we use a SAS program to QC, and when the SAS program opens the files as they have been directly output from the EPPlus program it doesn't pick up the values from cells that have formulas in them. If it is opened and "yes" is chosen for "do you want to save changes", it works fine. However, as we are creating several hundred of these, that is not practical.
Also, I am using a template. The template appears normal.
What is particularly strange is that we have been using this system for well over a year, and this is the first time we have encountered this issue.
Is there any way around this? On either the C# or SAS side?
What you are seeing is actually not unusual. EPPlus does not generate a full XLSX file. Rather, it creates the raw XML content (all Office 2007 document formats are XML-based) and places it in a zip file with an .xlsx extension. Since the file has not been run through the Excel engine, it has not been fully formatted to Excel's liking.
If it is a simple data sheet, then chances are Excel does not have to do much calculation, just basic formatting. In that case it will not prompt you to save. But even then, if you do save, you will see it change the XLSX file a little. If you really want to see what it is doing behind the scenes, rename the file to .zip and look at the XML files inside before and after.
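The before/after comparison can also be scripted. The following is a rough sketch, assuming you have kept one copy as written by EPPlus and one copy re-saved by Excel. XlsxDiff, ReadParts and ChangedParts are names invented for this example. It uses only System.IO.Compression and lists the parts inside the package that were added, removed or changed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class XlsxDiff
{
    // Reads every part of the package into memory, keyed by its path inside the archive.
    public static Dictionary<string, byte[]> ReadParts(string path)
    {
        var parts = new Dictionary<string, byte[]>();
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                using (Stream source = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    source.CopyTo(buffer);
                    parts[entry.FullName] = buffer.ToArray();
                }
            }
        }
        return parts;
    }

    // Returns the names of parts that exist in only one file or whose bytes differ.
    public static List<string> ChangedParts(string beforePath, string afterPath)
    {
        Dictionary<string, byte[]> before = ReadParts(beforePath);
        Dictionary<string, byte[]> after = ReadParts(afterPath);
        return before.Keys.Union(after.Keys)
            .Where(name => !before.ContainsKey(name)
                        || !after.ContainsKey(name)
                        || !before[name].SequenceEqual(after[name]))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

Expect Excel to rewrite most parts with different whitespace and attribute order, so a byte comparison is only a first filter for deciding which XML files to open and read side by side.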
The problem you are running into arises because this is not just a simple table export, so Excel has to run calculations when the file is opened for the first time. This could be many things: formulas, autofilters, automatic column/row height adjustments, outlining, etc. Basically, anything that will make the sheet look a little "different" after Excel gets done with it.
Unfortunately, there is no easy fix for this. Running it through excel's DOM somehow would be simplest which of course defeats the purpose of using EPPlus. The other thing you could do is see the difference between the before and after of the xml files (and there are a bunch in there you would have to look at) and mimic what excel would change/add in the "after" file version by manually editing the XML content. This is not a very pretty option depending on how extensive the changes would be. You can see how I have done it in other situations here:
Create Pivot Table Filters With EPPLUS
Adding a specific autofilter on a column
Set Gridline Color Using EPPlus?
I ran into this same issue using EPPlus (version 4.1.0, fyi) and found adding the following code before closing fixed the problem:
p.Workbook.Calculate();
p.Workbook.FullCalcOnLoad = false;
I've been asked to strip an Excel file of macros, leaving only the data. I've been asked to do this by converting the Excel file to XML and then reading that file back into Excel using C#. This seems a bit inefficient to me and I was thinking that it would be easier to simply load the source Excel file into C# and then create a new target Excel file and add the sheets from the source back into the target.
I don't know where macros live inside an Excel file, so I'm not sure if this would accomplish the task or not. So, will this work? Will simply copying the sheets from one file to another strip it of its macros, or are they actually stored at the worksheet level?
As always, any and all suggestions are welcome, including alternate suggestions or even "why are you even doing this???". :)
To do this programmatically, you can use the ZipFile class from the System.IO.Compression library in .NET from C#. (.NET Framework 4.5)
Rename the file to add a ".zip" extension, and then open the file as a ZIP archive. Look for an entry in the resulting "xl" folder called "vbaProject.bin", and delete it. Remove the .zip extension. Macros gone.
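A short sketch of that step with ZipArchive, assuming a ZIP-based macro workbook such as .xlsm (a binary .xls is not a ZIP archive and will not open this way). MacroStripper and RemoveVbaProject are hypothetical names. The rename is not needed in code, because ZipFile.Open looks at the content and not the extension. In these packages the VBA project is normally stored as "xl/vbaProject.bin".

```csharp
using System;
using System.IO.Compression;
using System.Linq;

public static class MacroStripper
{
    // Deletes xl/vbaProject.bin from the package in place.
    // Returns true if the entry was found and removed, false if there was none.
    public static bool RemoveVbaProject(string path)
    {
        using (ZipArchive zip = ZipFile.Open(path, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.Entries.FirstOrDefault(e =>
                string.Equals(e.FullName, "xl/vbaProject.bin", StringComparison.OrdinalIgnoreCase));

            if (entry == null) return false;

            entry.Delete();
            return true;
        }
    }
}
```

This only removes the binary part. [Content_Types].xml and xl/_rels/workbook.xml.rels may still reference it, and the file still has a macro-enabled extension, so Excel may complain or offer to repair the workbook. Work on a copy and check the result in Excel before relying on it.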
Your best bet is to save the workbook as an xlsx, close it, open it, then save as a format of your choice.
This will strip the macros and is robust. It will also work if the VBA is locked for viewing.
Closing and reopening the workbook is necessary; otherwise the macros are retained.
If you need to use C# to do this, I agree that it would be easier to load the source Excel file into C# and create a new target file, copying over only the cells and sheets you need. Especially if you're doing this for a large number of Excel files, I would recommend creating a small console app that, when given an Excel file, automatically generates a new Excel file with just the data.
One tool that I've found extremely useful and easy to use for such tasks is EPPlus.
From a console application written in C#, how can I:
extract an Office Open XML file,
obtain the data part of it,
modify the data, and
re-zip it again?
My motivation is to save an excel file with the formats and use it to populate cells via a console application.
Is this possible to achieve? Do I need a specific library that provides Excel file manipulation (unzipping it, modifying it, etc.)?
For the zip-unzip part, I think you can easily find many examples here.
To edit the Excel file, I'd suggest you have a look at the Open XML SDK. With it, you can easily edit Office files programmatically.
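As one such example, here is a minimal sketch of the extract, modify, re-zip round trip using ZipFile from System.IO.Compression (.NET Framework 4.5 or later). PackageRoundTrip, Rewrite and the editExtractedFolder callback are names invented for illustration. What you change inside the extracted folder is up to the callback.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class PackageRoundTrip
{
    // Extracts sourcePath to a temporary folder, lets the caller edit the
    // extracted files, then zips the folder contents into targetPath.
    public static void Rewrite(string sourcePath, string targetPath, Action<string> editExtractedFolder)
    {
        string workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        try
        {
            ZipFile.ExtractToDirectory(sourcePath, workDir);

            // The callback receives the folder that contains [Content_Types].xml, xl/, docProps/ ...
            editExtractedFolder(workDir);

            // CreateFromDirectory throws if the target already exists.
            if (File.Exists(targetPath)) File.Delete(targetPath);
            ZipFile.CreateFromDirectory(workDir, targetPath);
        }
        finally
        {
            if (Directory.Exists(workDir)) Directory.Delete(workDir, true);
        }
    }
}
```

A call could look like PackageRoundTrip.Rewrite("template.xlsx", "filled.xlsx", dir => { /* edit files under dir */ }), where both file names are placeholders. Entry names inside the package need forward slashes, so it is worth opening the result in Excel once to confirm that the framework version you target produces a valid file.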
Hope it helps
I'm aware that I can generate an HTML <table> and save it as an .xls file to read into Excel, and that works fine and all, but it only gives me one sheet.
Is there a way to generate HTML so that I can have multiple sheets in a single .xls file? I've tried to simply generate more than one <table>, but they just end up getting appended to each other.
Short Answer: No.
Longer Answer: You cannot cause an HTML generated page to split into multiple worksheets in an Excel file. Further, the HTML you generate for even a single page could cause Excel to choke on certain machines as it does the conversion when the file is loaded. We've seen a number of low-powered machines take upwards of 5 minutes to show an HTML file in Excel (a simple table with rows/columns, nothing fancy), depending on size.
Better Answer: Use a third party product like ClosedXML or FileHelpers to generate a proper xlsx file.
There seems to be a way, though I didn't try it. See http://www.c-sharpcorner.com/UploadFile/kaushikborah28/79Nick08302007171404PM/79Nick.aspx and check the Worksheets attribute.
Check the official documentation at http://msdn.microsoft.com/en-us/library/Aa155477%28office.10%29.aspx
After installing the Help file you can find an example of a file with 3 Worksheets in XML Reference / Excel Workbook...
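The format those links describe appears to be the XML Spreadsheet (SpreadsheetML 2003) format, in which a Workbook element holds one Worksheet element per sheet. Below is a small sketch that builds such a document with LINQ to XML. XmlSpreadsheet and Build are names made up for this example, every cell is written as a string, and the output has not been checked against Excel here.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlSpreadsheet
{
    private static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Builds one Worksheet element per (name, rows) pair; every cell is typed as String.
    public static XDocument Build(IEnumerable<KeyValuePair<string, string[][]>> sheets)
    {
        var workbook = new XElement(Ss + "Workbook",
            new XAttribute(XNamespace.Xmlns + "ss", Ss.NamespaceName));

        foreach (KeyValuePair<string, string[][]> sheet in sheets)
        {
            var table = new XElement(Ss + "Table",
                sheet.Value.Select(row => new XElement(Ss + "Row",
                    row.Select(text => new XElement(Ss + "Cell",
                        new XElement(Ss + "Data",
                            new XAttribute(Ss + "Type", "String"),
                            text))))));

            workbook.Add(new XElement(Ss + "Worksheet",
                new XAttribute(Ss + "Name", sheet.Key),
                table));
        }

        // The processing instruction tells Windows to hand the file to Excel.
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XProcessingInstruction("mso-application", "progid=\"Excel.Sheet\""),
            workbook);
    }
}
```

Call Build(...).Save(path) to write the file. If it is saved with an .xls extension, newer Excel versions may warn that the content does not match the extension, much as they can for HTML saved as .xls.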
You can use the open source ClosedXML, a wrapper around OpenXML, to conveniently generate xlsx files, i.e. Office 2007+ format Excel files.
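A minimal sketch of a two-sheet workbook with ClosedXML, assuming the ClosedXML NuGet package is referenced. MultiSheetExport, the sheet names and the cell contents are placeholders for this example.

```csharp
using ClosedXML.Excel;

public static class MultiSheetExport
{
    // Writes an .xlsx file containing two worksheets with a few sample cells.
    public static void Save(string path)
    {
        using (var workbook = new XLWorkbook())
        {
            IXLWorksheet orders = workbook.Worksheets.Add("Orders");
            orders.Cell(1, 1).Value = "Id";
            orders.Cell(2, 1).Value = 1;

            IXLWorksheet customers = workbook.Worksheets.Add("Customers");
            customers.Cell("A1").Value = "Name";

            workbook.SaveAs(path);
        }
    }
}
```

Each call to Worksheets.Add creates a separate tab, which is the part that the HTML <table> approach cannot express.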