How to properly programmatically convert an XLSX file to HTML using C#? - c#

At work, we're modifying an XLSX file, and we would like to turn this modified file into an HTML file (in order to convert it to PDF using Puppeteer#, but that's not the point here).
We know how to get the XML files inside this XLSX, and I have already found XslCompiledTransform for converting XML files to HTML.
The annoyance is that, from what I have read about XslCompiledTransform, transforming an XML file to HTML requires one stylesheet plus one XML file.
This raises three problems:
It looks like the stylesheet inside the XLSX isn't in a format that XslCompiledTransform can apply to each sheet.
The XLSX file contains multiple sheets, so we would have to merge them somehow, and we don't know how.
These aren't just arbitrary XML files; they are parts of an XLSX file. So there are other XML files besides the sheets (such as the workbook), and we can't see how to generate an HTML file that looks exactly like the XLSX file as opened in Excel without using those XML files too.
To sum up: we are struggling to find a way to generate an HTML file that looks exactly like the whole original XLSX file.
We don't really want to create an HTML file from some XML files, so any means to transform an XLSX to HTML is good.
We also know that there are tools and libraries that do this directly, but all the ones I've found are paid, and we would like to avoid paying since this is the first time we need it and maybe the last.
Does anyone know an accurate way to programmatically transform an XLSX file to HTML, keeping every style option, using C#?
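As a starting point for the "multiple sheets plus a workbook part" problem described above, here is a minimal sketch that opens the XLSX as a zip package and reads the sheet names from xl/workbook.xml. It uses only System.IO.Compression and System.Xml.Linq; the class and method names (XlsxInspector, GetSheetNames, GetPartNames) are made up for illustration. It does not produce HTML or touch styles; it only shows how the parts relate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class XlsxInspector
{
    // Main SpreadsheetML namespace used by xl/workbook.xml.
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the sheet names declared in xl/workbook.xml, in document order.
    public static List<string> GetSheetNames(Stream xlsx)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("xl/workbook.xml");
            if (entry == null)
                throw new InvalidDataException("xl/workbook.xml not found in the package.");

            using (Stream s = entry.Open())
            {
                XDocument doc = XDocument.Load(s);
                return doc.Descendants(Main + "sheet")
                          .Select(e => (string)e.Attribute("name"))
                          .ToList();
            }
        }
    }

    // Lists every part in the package (sheets, styles, shared strings, and so on).
    public static List<string> GetPartNames(Stream xlsx)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

Calling XlsxInspector.GetSheetNames(File.OpenRead(path)) on the modified file would give the order in which the sheets could be emitted into a single HTML document, one table per sheet.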

Related

How can I convert an uploaded pptx/ppt file to XML format in C# using OpenXML?

I have an application which allows uploading a ppt/pptx file. I want to convert the presentation file to an equivalent XML format.
A pptx file is essentially a zip file with a .pptx extension. If you rename it to .zip and extract the contents, you will find the XML documents.
With ppt you have more of a problem: it is a proprietary binary format rather than zipped XML, so there is no XML inside to extract. Microsoft has published a specification for it, but parsing it yourself is a lot of work. Office automation would most probably work, but it is rather complicated.
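The "pptx is a zip file" point can be sketched without renaming or extracting anything to disk. The helper below is an illustration only (PptxXml and ReadXmlParts are made-up names); it assumes the upload is available as a Stream and returns the text of every .xml part keyed by its path inside the package. It does not handle the binary .ppt format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the Open XML SDK.
public static class PptxXml
{
    // Reads every *.xml part of a .pptx package into memory.
    public static Dictionary<string, string> ReadXmlParts(Stream pptx)
    {
        var parts = new Dictionary<string, string>();
        using (var zip = new ZipArchive(pptx, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Skip media, relationship files and other non-.xml entries.
                if (!entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                    continue;

                using (var reader = new StreamReader(entry.Open()))
                {
                    parts[entry.FullName] = reader.ReadToEnd();
                }
            }
        }
        return parts;
    }
}
```

In a typical presentation the slide content would be found under keys such as ppt/slides/slide1.xml.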

Get page count of RTF, TXT and XLS documents by OpenXml in C#

I want to count the pages of .rtf, .txt and .xls(x) files through Open XML.
One could suggest the Office Interop assemblies (they also return page counts for TXT and RTF, and they work), but I already tried them and can't use them because they take too long, especially on large files.
I've searched a lot but couldn't find any reference on the web.
Digging deeper, for Excel files I found something described on this site, but it only sets up the page print settings.
I want to show the page count to the user before printing.
I'm also still looking for a way to handle TXT and RTF files. Please share whether this can also be done via Open XML, and how.

How to access the author name and other docx metadata

I want to use C# to get the metadata of a file, for example a docx.
In the screenshot below you see the author and other metadata of a file.
How do I write this metadata to the console?
A word file in DOCX is packaged as a zip file. The metadata is in an XML file within that zip file.
As a very simple way to think about it, this is what you would need to do programmatically through C#:
Unzip the DOCX file into its folder structure.
Open the core.xml file located in the docProps folder of that structure.
Pull out and store the relevant XML elements that you are looking for, such as title, subject or whatever.
Write those elements with Console.WriteLine().
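The steps above can be sketched roughly as follows. This is a minimal illustration, not a full metadata reader: the class and method names (DocxMetadata, ReadCoreProperties, Print) are made up, it reads docProps/core.xml straight from the zip instead of extracting to a folder first, and it only looks at four common properties.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxMetadata
{
    // Namespaces used inside docProps/core.xml.
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    private static readonly XNamespace Cp =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";

    // Returns the core properties that are present in the package.
    public static Dictionary<string, string> ReadCoreProperties(Stream docx)
    {
        var props = new Dictionary<string, string>();
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null)
                return props; // the core properties part is optional

            XElement root;
            using (Stream s = entry.Open())
            {
                root = XDocument.Load(s).Root;
            }

            AddIfPresent(props, "title", root.Element(Dc + "title"));
            AddIfPresent(props, "subject", root.Element(Dc + "subject"));
            AddIfPresent(props, "creator", root.Element(Dc + "creator"));
            AddIfPresent(props, "lastModifiedBy", root.Element(Cp + "lastModifiedBy"));
        }
        return props;
    }

    private static void AddIfPresent(Dictionary<string, string> props, string key, XElement element)
    {
        if (element != null)
            props[key] = element.Value;
    }

    // Writes each property found in the file at 'path' to the console.
    public static void Print(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            foreach (KeyValuePair<string, string> pair in ReadCoreProperties(file))
                Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

The author shown in the file properties dialog corresponds to the creator entry here.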
Image Showing Structure and XML file
Info on Office Open XML format

Reading non-standard excel file with C#

What do I mean by 'non-standard'?
Take a look at these images: http://imgur.com/a/tFqHQ
The first one is the non-standard Excel file. I'm pretty sure it's not really an Excel file, but its extension is .xls and for some reason Excel can open it and understand its structure.
The second image is the same file after it was opened in Excel and saved as .xls (97-2003).
If Excel can open it and display it correctly, I should be able to do so as well. Any tips on how to approach this?
I should mention that my app has to read the non-standard Excel files directly, because otherwise the user has to open the files one by one (in Excel or LibreOffice) and save them in a correct format, which I would like to avoid for convenience.

Generating Excel file from HTML

I'm aware that I can generate an HTML <table> and save it as an .xls file to read into Excel, and that works fine, but it only gives me one sheet.
Is there a way to generate HTML so that I can have multiple sheets in a single .xls file? I've tried simply generating more than one <table>, but they just end up appended to each other.
Short Answer: No.
Longer Answer: You cannot make an HTML-generated page split into multiple worksheets in an Excel file. Further, the HTML you generate for even a single page could cause Excel to choke on certain machines, because it does the conversion when the file is loaded. Depending on file size, we've seen a number of low-powered machines take upwards of 5 minutes to show an HTML file in Excel (a simple table with rows and columns, nothing fancy).
Better Answer: Use a third party product like ClosedXML or FileHelpers to generate a proper xlsx file.
There seems to be a way, though I didn't try it: see http://www.c-sharpcorner.com/UploadFile/kaushikborah28/79Nick08302007171404PM/79Nick.aspx and check the Worksheets attribute.
check the official documentation at http://msdn.microsoft.com/en-us/library/Aa155477%28office.10%29.aspx
After installing the Help file you can find an example of a file with 3 Worksheets in XML Reference / Excel Workbook...
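Assuming the answers above refer to the Excel 2003 "XML Spreadsheet" format, a workbook with several worksheets can be written as a single XML file. The sketch below builds such a document with System.Xml.Linq; the class name XmlSpreadsheet and its Build method are made up for illustration, every cell is written as a string, and no styles are emitted.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class XmlSpreadsheet
{
    private static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Builds one <Worksheet> per entry: Key is the sheet name, Value holds the rows.
    public static XDocument Build(IEnumerable<KeyValuePair<string, string[][]>> sheets)
    {
        // Declare the namespace both as the ss: prefix and as the default,
        // as XML Spreadsheet files commonly do.
        var workbook = new XElement(Ss + "Workbook",
            new XAttribute(XNamespace.Xmlns + "ss", Ss.NamespaceName),
            new XAttribute("xmlns", Ss.NamespaceName));

        foreach (KeyValuePair<string, string[][]> sheet in sheets)
        {
            var table = new XElement(Ss + "Table");
            foreach (string[] row in sheet.Value)
            {
                var rowElement = new XElement(Ss + "Row");
                foreach (string cell in row)
                {
                    rowElement.Add(new XElement(Ss + "Cell",
                        new XElement(Ss + "Data",
                            new XAttribute(Ss + "Type", "String"),
                            cell)));
                }
                table.Add(rowElement);
            }

            workbook.Add(new XElement(Ss + "Worksheet",
                new XAttribute(Ss + "Name", sheet.Key),
                table));
        }

        // The processing instruction tells Windows to open the file with Excel.
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XProcessingInstruction("mso-application", "progid=\"Excel.Sheet\""),
            workbook);
    }
}
```

The result can be written with doc.Save(path) using an .xml extension. This is XML rather than HTML, so it swaps the generated <table> markup for Row and Cell elements in exchange for multiple sheets.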
You can use the open source ClosedXML, a wrapper around OpenXML, to conveniently generate xlsx files - i.e. Office 2007+ format Excel files.
