Best tools to create valid XML files from an Excel file - c#

I need to create a script that extracts some data from a complex Excel 2003 file (with multiple sheets and different tables inside a single sheet) and produces different XML files that need to be validated against a given XSD file.
My preferred language is Python;
to create and validate XML files I would go with lxml.
What do you suggest for parsing XLS files?
Is xlrd the right tool to use for complex Excel files?
Or do I need to convert all the sheets to CSV manually and read the files line by line, splitting each line to get the data?
I accept C#, VB6, VBA suggestions too.

[disclaimer: I'm the author of xlrd]
xlrd is well suited to this kind of job. Get the latest version from PyPI. Get the flavour from the tutorial found here. XLSX support is in alpha test; e-mail me if you need it. The awkwardness and lossiness of the save-as-CSV approach were among the things that prompted me to write xlrd.
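To make the xlrd-plus-lxml pipeline from the question concrete, here is a minimal sketch, not a definitive implementation. The function names, the element names ("records", "record"), and the assumption that the first row of a table holds column headers that are valid XML element names are all hypothetical. The xlrd and lxml imports are done lazily, so the XML-building part runs with the standard library alone.

```python
import xml.etree.ElementTree as ET


def read_sheet_rows(xls_path, sheet_name):
    """Return every row of one sheet as a list of cell values (needs xlrd)."""
    import xlrd  # third-party; imported lazily so the rest runs without it

    book = xlrd.open_workbook(xls_path)
    sheet = book.sheet_by_name(sheet_name)
    return [sheet.row_values(i) for i in range(sheet.nrows)]


def rows_to_xml(rows, root_tag="records", row_tag="record"):
    """Build an element tree, assuming rows[0] holds the column headers."""
    headers = [str(h).strip() for h in rows[0]]
    root = ET.Element(root_tag)
    for row in rows[1:]:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(headers, row):
            # Assumption: each header is a valid XML element name.
            ET.SubElement(record, name).text = str(value)
    return root


def validate_with_xsd(xml_bytes, xsd_path):
    """Return True if the document validates against the XSD (needs lxml)."""
    from lxml import etree  # third-party

    schema = etree.XMLSchema(etree.parse(xsd_path))
    return schema.validate(etree.fromstring(xml_bytes))


# Stand-in for what read_sheet_rows() might return; xlrd reports numbers as floats.
demo_rows = [["id", "name"], [1.0, "Alice"], [2.0, "Bob"]]
demo_xml = ET.tostring(rows_to_xml(demo_rows), encoding="unicode")
print(demo_xml)
```

For a sheet that holds several tables, the same idea applies to a slice of the rows (for example rows[10:25]) once you know where each table starts and ends.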

Xlrd is OK. We use it extensively to import XLS files full of references and formulas with multiple sheets and data presented in custom (not Latin-1) encoding.

I am convinced the simplest solution for this task is to use Excel VBA together with the MSXML parser. Look here for some links on how to use the MSXML parser in VBA for reading XML files; I think you can easily adapt this for writing XML files.

I can't answer whether xlrd/Python is the right tool for the job, as I don't know Python well enough.
But there are many ways to access the Excel data. First of all, you have VBA built directly into Excel.
Then you have ADO.NET (see David Hayden's article here), which allows you to access the data from any .NET language, even IronPython.

Related

Merging two excel files in c# without using interop

I have to merge two excel files containing one sheet in each of them and I have to generate a third file containing two sheets corresponding to the two original sheets.
This task can be done using "interop" and the code works, but when the same code runs on a system that does not have MS Office installed, the process fails with an error.
Can you please guide me as to which DLL files need to be included, or how this merging could be done without using interop?
Thanks in advance.
From what I've experienced, there is unfortunately no framework way of doing this (without writing your own excel file reader). I happened across this interesting library which does just that.
http://exceldatareader.codeplex.com/
So far it has worked for our needs and requires no interop.
You should use an external component to work with Excel files. I use Syncfusion XlsIO.
If you only have raw data (no formulas, etc.) you could also just save the files using the XML Spreadsheet 2003 (*.xml) format (it's very easy to read) and process the data using standard XML tools.
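As an illustration of the "standard XML tools" route, here is a minimal Python sketch that reads the worksheets of an XML Spreadsheet 2003 document into plain lists. The inline sample is a hand-written stand-in for a real file, and the function name is hypothetical. The sketch ignores the ss:Index attribute that Excel writes when it skips empty cells, so sparse rows would need extra handling.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Hand-written stand-in for a file saved as "XML Spreadsheet 2003".
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">name</Data></Cell>
        <Cell><Data ss:Type="String">qty</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">bolt</Data></Cell>
        <Cell><Data ss:Type="Number">12</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""


def read_worksheets(xml_text):
    """Map each worksheet name to its rows; every cell is returned as text."""
    root = ET.fromstring(xml_text)
    sheets = {}
    for worksheet in root.findall("ss:Worksheet", NS):
        name = worksheet.get("{%s}Name" % SS)
        rows = []
        for row in worksheet.findall("ss:Table/ss:Row", NS):
            rows.append([
                cell.findtext("ss:Data", default="", namespaces=NS)
                for cell in row.findall("ss:Cell", NS)
            ])
        sheets[name] = rows
    return sheets


sheets = read_worksheets(SAMPLE)
print(sheets)
```

Because each worksheet is a self-contained element, reading two such files and writing their worksheets into one output document needs no Office installation, though sheet names and style definitions would have to be kept unique.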

Easily read Excel in C#?

I've tried the OleDb driver, LinqToExcel, and Excel Data Reader to read .xls files, but all of them seem to have very annoying limitations. LinqToExcel and the OleDb driver both throw "Too Many Fields Defined" error messages if the Excel files have phantom columns. The Excel Data Reader threw undefined exceptions, which I was never able to get to the bottom of.
Is there any Excel driver that "just works" and can handle slightly mis-formatted Excel files?
A commercial software package would be fine. My current requirements only specify reading dates and text from cells, though more sophisticated functionality would be a plus.
[Edit]
Needs to support both XLS and XLSX file formats.
I can recommend Aspose.Cells and FlexCel. I haven't tried SpreadsheetGear, but I have heard and read lots of good things about it.
A free option (though for the newer .xlsx format only!) is the Open XML SDK 2.0 from Microsoft.
Try EPPlus, an open source library for Excel.
Although I have not tried it yet, this project seems interesting. It is aimed mainly at writing, but it works for reading too. Unfortunately, it only accepts the new .xlsx format.
I've used SpreadSheetGear previously. You have to pay but it worked very well for my needs and handled the different file formats well.
I tested a few other libs but SSG worked best in terms of maintaining fidelity of the file when I saved copies of it but then my files had lots of data validations and controls in place etc. For simpler files there's a range of other options.
If all else fails you could write an Excel macro and, if needed, call the macro from within C#. I am not sure if you need that, as you do not give any reasons why you are doing this in C#.
Another way into the data is by using an interop assembly, which has become much easier since the arrival of the dynamic keyword in C# 4.

How do I generate a PDF/Excel file from an SQL database using C# and MVC 2?

I need to make the information in the database usable by allowing the user to download it as a PDF or Excel spreadsheet (either one works, both is perfect).
I've looked around at a bunch of options, but I really can't decide which one I should use, let alone if any of those options are actually useful. Most of the options I've found revolve around converting already existing HTML files into PDFs, which is not what I need. Also, it needs to be free. My bosses haven't given me a budget to spend on this.
I'm not sure what other information I should include here.
Well, any help is greatly appreciated. If you have questions about missing information, I'll get it posted ASAP. I'm here all day, so I'll be able to respond to any comments very quickly.
EDIT: Oh wow! Huge thanks, guys, for the massive response! I got a ton of ideas. This is super-helpful. Thanks!
If you want to generate an Excel (or Word) document, you can use Open XML. You can create a new document exactly the way you want from pure code.
OpenXml SDK page
The solution I usually propose to my clients in this situation is to use SQL Server Reporting Services (SSRS). You can use the ReportViewer control included with it to generate PDFs, Excel spreadsheets, XML files, CSV files, and others. If you need ad hoc reporting, there is a Report Builder available as well.
Barring that, you can use OpenXml to generate Excel spreadsheets and there are a host of PDF toolkits available.
Have you looked into the reportviewer control, which is part of Visual Studio?
It allows you to export the report in PDF or Excel format.
http://www.carlosag.net/tools/excelxmlwriter/sample
Check this; it might be useful for you.
There are lots of reporting solutions out there, such as SQL Server Reporting Services (for which you might already have a license). Take a look at Reporting (free || open source) Alternatives to Crystal Reports in Winforms, which can likely be applied to the web with a bit of serialization.
I would suggest thinking about rolling your own, depending on the situation. You could use PDFsharp for the PDF export and EPPlus for Excel. They are both very easy to use and, I'm pretty sure, available in NuGet with a couple of clicks.
If you want to go the Excel route, I'd recommend this article from Stephen Walther entitled ASP.NET MVC Tip #2 - Create a custom Action Result that returns Microsoft Excel Documents. This uses an old trick of writing an HTML document with an Excel MIME type, which is different from streaming a native Excel file. It's also fairly easy to change this to render a CSV file if you want to strip it down and make it a more universal file. Just remember to double-quote all the fields if there's a possibility of commas showing up.
If what you're doing isn't too complicated, you can use CSV files. CSV stands for comma-separated values, and it is what it sounds like. You can create simple tables and columns using commas. For example, paste the following lines into a text file:
heading1,heading2,heading3
info1,info2,info3
info1,info2,info3
Save the text file with a .csv extension and voila: a file that Excel opens as a spreadsheet. It is extremely easy to build these by looping over object collections. Mind you, if you need any complicated text formatting, it is not really the best option.
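The two CSV points above (loop over a collection of objects, and double-quote every field in case a value contains a comma) can be sketched as follows. The question is about C#, so treat this Python version as an illustration of the idea only; the Customer type and its fields are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Customer:  # hypothetical record type standing in for a database row
    name: str
    city: str
    notes: str


def customers_to_csv(customers):
    """Write a header row plus one row per object, quoting every field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["name", "city", "notes"])
    for customer in customers:
        writer.writerow([customer.name, customer.city, customer.notes])
    return buffer.getvalue()


csv_text = customers_to_csv([
    Customer("Smith, John", "Leeds", 'said "call back"'),
    Customer("Ann Lee", "York", ""),
])
print(csv_text)
```

Quoting every field keeps the embedded comma in "Smith, John" inside one column, and the writer doubles any quote characters that appear inside a value.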

How do you create an .xslt file from an Excel spreadsheet?

I'm working on a C# application that is going to save a DataSet to an Excel file.
I have found several examples of how to do this, but they all require you to have an xslt style sheet. I already have an existing Excel spreadsheet with all the worksheets and columns created. Is there an easy way to create an .xslt file from my existing Excel spreadsheet?
An example.
First of all, your question is really confusing and hard to understand. You might want to rewrite it to be a bit more specific. If you really have two or three questions, then ask each one on a separate page.
Step 1. You said that you have an Excel file. Open it up in a text editor and look at it. If it is in XML format, then the XSL that you need should be obvious because your job is to convert the existing XML dataset into the Excel XML format.
Step 2. XSL templates are not magic. They are actually a form of programming language that is executed by an XSLT engine. Of course it is possible to write a program that can compare two XML formats and generate an XSL template that would transform one into the other, but that would be a very complex program and it would probably have some rigid requirements on the type of XML files that it would work with. I doubt that anyone has created such a tool that you can leverage.
Step 3. Get a decent tool that allows you to apply an XSL template to your dataset so that you can test it without a lot of work. Personally I use NetBeans with an XML Debug plugin, but there are numerous other tools out there. In fact, you could just write a simple transform tool yourself that runs the XSLT on a sample dataset and then opens the result in both Excel and a text editor.
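The check in Step 1 (is the existing Excel file actually XML?) can also be done programmatically by looking at the first few bytes of the file. This is a minimal Python sketch with hypothetical function names and labels: binary .xls files start with the OLE2 compound-file signature, .xlsx files are zip archives, and an XML spreadsheet starts with "<" after an optional byte order mark.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # binary .xls (compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx and other zip containers
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_spreadsheet_format(head):
    """Guess the container format from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "binary-xls"
    if head.startswith(ZIP_MAGIC):
        return "zip-xlsx"
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"


def sniff_file(path):
    """Read only the start of the file and classify it."""
    with open(path, "rb") as handle:
        return sniff_spreadsheet_format(handle.read(64))


print(sniff_spreadsheet_format(b'<?xml version="1.0"?><Workbook'))
```

Only the "xml" case is a candidate for the XSLT approach; the other two formats would need a library or a conversion first.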

Transforming Excel 2010 documents?

I am interested in writing an application that will take in an excel document of a specific format, massage the data and create a new Excel document that has different formatting.
I am curious if anyone can recommend a good place to start on this.
My first thought was to write something myself in C#. I came across this tool on CodePlex:
http://excelwrapperdotnet.codeplex.com/wikipage?title=Usage%20-%20Example&referringTitle=Documentation
But it appears to only be for Excel 2007.
Is there a best practice for doing this type of thing for Excel 2010 documents? Do I even need to program something custom to do this or does Excel offer something that might handle this?
Another nice library to modify Excel 2007/2010 documents (.xlsx) is EPPlus. It gives you a nice object model on your spreadsheets.
Excel files (.xlsx) are zipped archives of XML files. They use 'Open XML'; take a look here: MICROSOFT Open XML
That should get you going on the right path.
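To see the "archived XML" structure for yourself, any zip library will do. The sketch below is in Python and uses only the standard library; the function names are hypothetical, and it assumes the workbook part sits at xl/workbook.xml, which is where Excel conventionally writes it. The demo archive built here contains only that one part, so it is a stand-in for illustration and not a file Excel would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_parts(xlsx_bytes):
    """Return the names of all parts stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        return archive.namelist()


def sheet_names(xlsx_bytes):
    """Read sheet names from xl/workbook.xml (assumed conventional location)."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter("{%s}sheet" % MAIN_NS)]


def make_demo_archive():
    """Build a tiny stand-in archive holding only a workbook part."""
    workbook_xml = (
        '<workbook xmlns="%s"><sheets>'
        '<sheet name="Input" sheetId="1"/>'
        '<sheet name="Output" sheetId="2"/>'
        "</sheets></workbook>" % MAIN_NS
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/workbook.xml", workbook_xml)
    return buffer.getvalue()


demo = make_demo_archive()
print(list_parts(demo), sheet_names(demo))
```

With a real workbook, pass the bytes of the .xlsx file instead of the demo archive; the part list will also show the worksheet, style, and relationship parts that a library such as EPPlus manages for you.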
