I currently have a website (ASP.NET 3.5, IIS 7.0) that allows users to upload Excel files for processing.
Should I be concerned with viruses and malicious code being executed when the document is opened?
We are currently using the .NET Office.Interop assemblies to fetch the information from the document. The information isn't exactly tabular and requires a little bit of interrogation to get it into the required format.
Once the document has been uploaded it is stored in the database; it is written to disk only when the document is inspected.
Are there any recommendations that would provide a secure implementation?
Using the xlsx (Open XML) file format will be safer than using xls or xlsm since xlsx workbooks cannot contain macros.
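If you go the xlsx-only route, you can do a rough first-pass check on the upload before anything else touches it. This is only a sketch under a few assumptions: it uses System.IO.Compression.ZipArchive, which needs .NET 4.5 or later (so not the 3.5 setup in the question), and the names UploadCheck, LooksLikeMacroFreePackage and BuildPackage are made up for illustration. It rejects anything that is not a zip package, or whose [Content_Types].xml declares a VBA project or a macro-enabled workbook. Treat it as a filter, not a guarantee.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class UploadCheck
{
    // Returns true only for a readable zip package whose content-type manifest
    // mentions neither a VBA project nor a macro-enabled workbook.
    public static bool LooksLikeMacroFreePackage(Stream upload)
    {
        try
        {
            using (var zip = new ZipArchive(upload, ZipArchiveMode.Read, leaveOpen: true))
            {
                ZipArchiveEntry manifest = zip.GetEntry("[Content_Types].xml");
                if (manifest == null) return false; // a zip, but not an Open XML package

                string xml;
                using (var reader = new StreamReader(manifest.Open()))
                    xml = reader.ReadToEnd();

                return xml.IndexOf("vbaProject", StringComparison.OrdinalIgnoreCase) < 0
                    && xml.IndexOf("macroEnabled", StringComparison.OrdinalIgnoreCase) < 0;
            }
        }
        catch (InvalidDataException)
        {
            return false; // not a zip at all, e.g. a legacy binary .xls
        }
    }

    // Demo helper (hypothetical): builds a tiny in-memory package with the given manifest.
    public static MemoryStream BuildPackage(string contentTypesXml)
    {
        var ms = new MemoryStream();
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
        using (var writer = new StreamWriter(zip.CreateEntry("[Content_Types].xml").Open()))
            writer.Write(contentTypesXml);
        ms.Position = 0;
        return ms;
    }

    static void Main()
    {
        string plain = "<Types><Override ContentType=\"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml\"/></Types>";
        string macro = "<Types><Override ContentType=\"application/vnd.ms-excel.sheet.macroEnabled.main+xml\"/></Types>";

        Console.WriteLine(LooksLikeMacroFreePackage(BuildPackage(plain)));
        Console.WriteLine(LooksLikeMacroFreePackage(BuildPackage(macro)));
        Console.WriteLine(LooksLikeMacroFreePackage(new MemoryStream(new byte[] { 1, 2, 3 })));
    }
}
```

On the full .NET Framework you also need a reference to the System.IO.Compression assembly. A file that passes this check can still be malformed, so whatever parses it afterwards should still treat it as untrusted input.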
You might consider using a pure .NET component which does not use COM Interop or any native calls and does not require FullTrust. SpreadsheetGear for .NET is an example of such a component.
Disclaimer: I own SpreadsheetGear LLC
Sanitizing is the only way to be sure. Since it's not simply form input, you want to take extra precautions. The simplest method I can imagine is to nuke any binary-indicators, like control-characters.
As far as best practices, you can't really tell your users "Please don't hack me", so you have to have a certain level of trust (or give up on Excel files)... I would say if the first pass picks up any binary flags, incinerate it and throw a fairly obtuse error like "error in file format", etc.
But of course, your users will murder you if they ever get that error for a good file.
I have been searching for the last two days but have not found anything.
My requirement is to create a document viewer in my web application (C#/.NET), and I don't want to use any third-party tool for this. Can I convert the files to images, PDF, or any other common format that can easily be rendered on a web page? I also cannot use Interop objects.
Any help will be highly appreciated.
You mention in one of your comments that you'd like to write all the code yourself but don't know where to start. Here's how I would go about it...
First, you'll need to familiarize yourself with the Microsoft Office format specification. You can find that here (there's a link to the technical specification). Office documents are actually .zip archives containing XML files along with any binary data such as embedded attachments. Just rename a .docx file to .zip and you'll be able to open it and see the XML and any other supporting files inside (the same is true for .xlsx, etc.).
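To make the "zip with XML inside" point concrete, here is a minimal sketch that opens a .docx as a zip and pulls the plain text of each paragraph out of word/document.xml. It assumes .NET 4.5 or later for ZipArchive, and the names DocxText and ReadParagraphs are hypothetical. It ignores everything a real viewer has to care about (styles, tables, images, page layout), and paragraphs nested inside text boxes may show up twice, so take it as a first look at the structure rather than a converter.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class DocxText
{
    // WordprocessingML main namespace used by <w:p> (paragraph) and <w:t> (text run).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the concatenated text of every paragraph in the main document part.
    public static string[] ReadParagraphs(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("No word/document.xml part found.");

            using (Stream part = entry.Open())
            {
                XDocument xml = XDocument.Load(part);
                return xml.Descendants(W + "p")
                          .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                          .ToArray();
            }
        }
    }

    static void Main(string[] args)
    {
        // args[0] is the path to a .docx file.
        using (FileStream file = File.OpenRead(args[0]))
        {
            foreach (string paragraph in ReadParagraphs(file))
                Console.WriteLine(paragraph);
        }
    }
}
```

Going from this to HTML or PDF means mapping each run's formatting and the section and page settings as well, which is where most of the effort goes.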
Then you'll need to become intimately familiar with either PDF or HTML, as your job now will be to convert the various Office document structures into PDF or HTML structures, being sure to respect page layout, margins, order, etc.
As others have said, this is a large task, which is why third-party tools exist today. Also, each third-party toolset has its limitations, as this is really hard to "get right" in all situations, and there will be edge cases that work for one document and not another (because maybe they didn't use Microsoft Word to save the .docx, maybe they used OpenOffice, and OpenOffice interpreted the standard slightly differently).
If you cannot use COM/Interop technologies in your solution, you can take a look at the specialized 3rd party options. I see that you prefer not to use them, however, there are no existing built-in solutions in the .NET Framework. Check out my answer in a similar thread that describes how to accomplish exactly the same task using 3rd party libraries (for example, DevExpress, since I have experience with it). In addition, take a look at the Documents demo, where you can see how to create images/thumbnails from different types of MS Office documents.
I believe what you need is an intermediate representation of the documents which can be converted into an image for the viewer to display.
Let me try to explain with the diagram below:
You can use tools like smallpdf or OfficeToPDF to do that. Just integrate them into your application.
Small PDF(https://smallpdf.com/library-detail)
officetopdf (https://officetopdf.codeplex.com/)
I am looking for a way in C# to delete (empty) Excel rows in a worksheet without using the Microsoft.Office.Interop.Excel namespace.
I found many examples that use the Interop namespace, like C# and excel deleting rows. But is there a way to do it without third-party tools, using only .NET?
Thank you for your help!
The options for working with Excel files relying only on standard .NET Framework namespaces are limited. Two possibilities come to mind. The first is "simplest", but only applicable if your main interest is in working with the content as a database. The second allows you to do pretty much "anything" with the Excel workbook, but the learning curve will be steep.
Both of these approaches are suited for working in a server environment (unlike those that require presence of the Excel application) and do not require any licenses.
You can use an OLE DB connection (ACE OLE DB provider) to communicate with the contents of an Excel workbook. It allows connecting to individual worksheets as well as named ranges. Basic SQL functionality is supported.
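A minimal sketch of the OLE DB route, assuming the ACE provider (Microsoft.ACE.OLEDB.12.0) is installed on the machine and that the workbook has a sheet called Sheet1; the names ExcelOleDb, ReadSheet and IsEmptyRow are made up for illustration. As far as I know the provider does not support DELETE against a worksheet, so the sketch reads the sheet and filters out the empty rows in memory; you would then write the remaining rows wherever you need them.

```csharp
using System;
using System.Data;
using System.Data.OleDb;
using System.Linq;

static class ExcelOleDb
{
    // A row is treated as empty when every field is null, DBNull, or whitespace.
    public static bool IsEmptyRow(DataRow row)
    {
        return row.ItemArray.All(v =>
            v == null || v == DBNull.Value || v.ToString().Trim().Length == 0);
    }

    // Loads a whole worksheet into a DataTable. The sheet name is pasted into the
    // query text, so it must come from trusted code, not from user input.
    public static DataTable ReadSheet(string path, string sheetName)
    {
        string connectionString = string.Format(
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};" +
            "Extended Properties=\"Excel 12.0 Xml;HDR=YES\";", path);

        var table = new DataTable();
        using (var connection = new OleDbConnection(connectionString))
        using (var adapter = new OleDbDataAdapter(
            "SELECT * FROM [" + sheetName + "$]", connection))
        {
            adapter.Fill(table); // Fill opens and closes the connection itself
        }
        return table;
    }

    static void Main(string[] args)
    {
        // args[0] is the path to an .xlsx file; "Sheet1" is an assumed sheet name.
        DataTable table = ReadSheet(args[0], "Sheet1");
        foreach (DataRow row in table.Rows.Cast<DataRow>().Where(r => !IsEmptyRow(r)))
            Console.WriteLine(string.Join("\t", row.ItemArray));
    }
}
```

HDR=YES tells the provider to treat the first row as column headers. On newer .NET versions System.Data.OleDb is a separate package and only works on Windows.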
The file format of Excel 2007 and later versions is Office Open XML (OOXML). These files are "zip packages" containing the files (XML for the most part) that make up a workbook. So any standard tools that can work with zip packages and XML can be used to open up an Excel workbook, edit the content, then close the workbook back up. In the .NET Framework, these would be the System.IO.Packaging (in WindowsBase.dll, which usually needs to be referenced explicitly) and System.Xml namespaces.
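As a rough illustration of the packaging approach, the sketch below opens a workbook with System.IO.Packaging, finds the worksheet parts by content type, and reports which rows have no cell content. The names EmptyRowFinder and EmptyRowNumbers are hypothetical. It only detects empty rows; actually removing them and shifting the rest up means renumbering the r attributes of the following rows and cells and fixing anything that refers to them (formulas, merged cells, defined names), which is the steep part of the learning curve.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml.Linq;

static class EmptyRowFinder
{
    // SpreadsheetML main namespace used by <row>, <c>, <v>, <is> and <f>.
    static readonly XNamespace S =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    const string WorksheetType =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml";

    // Returns the r attribute of every <row> whose cells carry no value (<v>),
    // inline string (<is>) or formula (<f>).
    public static IEnumerable<string> EmptyRowNumbers(XDocument sheet)
    {
        return sheet.Descendants(S + "row")
            .Where(row => !row.Elements(S + "c").Any(c =>
                c.Element(S + "v") != null ||
                c.Element(S + "is") != null ||
                c.Element(S + "f") != null))
            .Select(row => (string)row.Attribute("r"));
    }

    static void Main(string[] args)
    {
        // args[0] is the path to an .xlsx file.
        using (Package package = Package.Open(args[0], FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts()
                                                .Where(p => p.ContentType == WorksheetType))
            {
                using (Stream stream = part.GetStream(FileMode.Open, FileAccess.Read))
                {
                    XDocument sheet = XDocument.Load(stream);
                    Console.WriteLine("{0}: empty rows {1}", part.Uri,
                        string.Join(", ", EmptyRowNumbers(sheet)));
                }
            }
        }
    }
}
```

Note that rows which are completely blank in Excel are often not stored in the sheet XML at all, so this mainly finds rows that exist only because of formatting.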
The documentation for the file formats is the ECMA-376 standard (http://www.ecma-international.org/publications/standards/Ecma-376.htm). A useful on-line resource is openxmldeveloper.org.
Note that Microsoft also provides the Open XML SDK, a free download which can be distributed license-free with your solution. The Open XML SDK reduces the "learning curve" as it reduces the amount of knowledge you need about the OOXML file formats. I mention this for the sake of completeness, because I know how challenging trying to work directly with the file format is. Also, since the DLL is freely distributable and can be copied as part of your solution it might meet your requirements.
This stackoverflow post may help - it discusses some libraries that can manipulate excel without needing Office installed.
The question regards VB.NET but I believe the options discussed would work with C# too...
How to process excel file in vb.net without office installed
I know this is a little subjective, but I'm looking into the following situation:
I need to produce a number of documents automatically from data in a SQL Server database. There will be an MVC3 app sat on the database to allow data entry etc. and (probably) a "Go" button to produce the documents.
There needs to be some business logic about how these documents are created, named and stored (e.g. "Parent" documents get one name and go in one folder, "Child" documents get a computed name and go in a sub-folder).
The documents can either be PDF or Doc(x) (or even both), as long as the output can be in EN-US and AR-QA (RTL text)
I know there are a number of options, from SSRS, Crystal Reports, VSTO, "manual" PDF in code, Word mail merge, etc., and we already have an HTML to PDF tool, if that's any use?
Does anyone have any real world advice on how to go about this and what the "best" (most pragmatic) approach would be? The less "extras" I need to install and configure on a server the better - the faster the development the better (as always!!)
Findings so far:
Word Mail Merge (or VSTO)
Simply doesn't offer the simplicity, control and flexibility I require, which is a shame really. It would be nice to define a dotx and be able to pass in the data to it on an individual basis to generate the docx. The only way I could achieve this (and I may be wrong here) was to loop through controls/bookmarks by name and replace the values... messy.
OpenXML
Creating documents based on dotx templates, even using OpenXML is not as simple as (IMHO) it should be. You have to replace each Content control by name, so maintenance isn't the simplest task.
SSRS
On the face of it this is a good solution (although it needs SQL Enterprise), however it gets more complicated if you want to dynamically produce the folders and documents. Data driven subscription gets very close to what I want though.
Winnovative HTML to PDF Converter
This is the tool we already have (albeit a .NET 2.0 version). This allows me to generate the HTML pages and convert those to PDF. A good option for me since I can run this on an MVC3 website and pass the parameters into the controllers to generate the PDFs. This gives me much finer-grained control over the folder and naming structures; the issue with this method is simply generating the pages in the correct way. A bonus is that it automatically gives me a "preview"... basically just the HTML page!
Office OpenXML is a nice and simple way of generating Office files. XSLT can be a strong tool to format your content. This technology will not let you create PDFs.
Fast development without using any third party components will be difficult. But if you do consider using a report server, make sure to check out BIRT or Jasper.
To generate pdf's I have been using the deprecated Report.net. It has many ports to different languages and is still sufficient to make simple pdf's. Report.net on sourceforge
I don't think SQL Server itself can produce PDF files. What you can do is, as you mentioned, install an instance of SSRS and create a report that produces the information you need. Then you can create a subscription to deliver your report where you want, when you want.
Here is an example of a simple subscription:
Go for SSRS only if you are OK with setting it up on a server and there is a definite need for schedule reporting and complex reports.
If you already have code for manual PDF/docx generation, I would suggest going ahead with it, provided the complexity of that code is not a concern for you.
I have used both approaches in separate scenarios. We used Excel classes and objects from .NET for minimal reporting from a web application,
but went for an elaborate reporting scheme for a system that required thousands of reports to be generated on a schedule and delivered to a selected set of people.
I need to make the information in the database usable by allowing the user to download it as a PDF or Excel spreadsheet (either one works, both is perfect).
I've looked around at a bunch of options, but I really can't decide which one I should use, let alone whether any of those options are actually useful. Most of the options I've found revolve around converting already existing HTML files into PDFs, which is not what I need. Also, it needs to be free. My bosses haven't given me a budget to spend on this.
I'm not sure what other information I should include here.
Well, any help is greatly appreciated. If you have questions about missing information, I'll get it posted ASAP. I'm here all day, so I'll be able to respond to any comments very quickly.
EDIT: Oh wow! Huge thanks, guys, for the massive response! I got a ton of ideas. This is super-helpful. Thanks!
If you want to generate an Excel (or also a Word) document, you can use OpenXml. You can create a new document exactly the way you want from pure code.
OpenXml SDK page
The solution I usually propose to my clients in this situation is to use Sql Server Reporting Services (SSRS). You can use the ReportViewer control included with it in order to generate PDF's, Excel spreadsheets, XML files, CSV files, and others. If you need ad hoc reporting, there is a Report Builder available as well.
Barring that, you can use OpenXml to generate Excel spreadsheets and there are a host of PDF toolkits available.
Have you looked into the reportviewer control, which is part of Visual Studio?
It allows you to export the report in PDF or Excel format.
http://www.carlosag.net/tools/excelxmlwriter/sample
Check this; it might be useful for you.
There are lots of reporting solutions out there such as SQL Server Reporting Services(for which you might already have a license). Take a look at Reporting (free || open source) Alternatives to Crystal Reports in Winforms which can likely be applied to the web with a bit of serialization.
I would suggest thinking about rolling your own depending on the situation. You could use pdfsharp for the pdf export and EPPlus for excel. They are both very easy to use and, I'm pretty sure, available in nuget with a couple of clicks.
If you want to go the Excel route, I'd recommend this article from Stephen Walther entitled ASP.NET MVC Tip #2 - Create a custom Action Result that returns Microsoft Excel Documents. This uses an old trick of writing an HTML document with an Excel MIME type. This is different from streaming a native Excel file. And it's fairly easy to change this to render a CSV file if you want to strip it down and make it a more universal file. Just remember to double-quote all the fields if there's a possibility of commas showing up.
If what you're doing isn't too complicated, you can use CSV files. CSV stands for comma-separated values, and it is what it sounds like. You can create simple tables and columns using commas. For example, paste the following lines into a text file:
heading1,heading2,heading3
info1,info2,info3
info1,info2,info3
Save the text file as a .csv file and voila: a file that Excel opens as a spreadsheet. Obviously it is extremely easy to build these by looping over object collections. Mind you, if you need any complicated text formatting etc., then it is not really the best option.
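For what it's worth, here is a small sketch of writing such a file from code. The one thing to get right is quoting: a field that contains a comma, a double quote or a line break has to be wrapped in double quotes, with any embedded quotes doubled. The names Csv, Escape and WriteRow are made up for illustration, and the rows in Main are sample data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class Csv
{
    // Quotes a field only when it contains a comma, a double quote or a line break.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool mustQuote = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return mustQuote ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Writes one row straight to the writer, so large exports never sit in memory.
    public static void WriteRow(TextWriter writer, IEnumerable<string> fields)
    {
        writer.Write(string.Join(",", fields.Select(f => Escape(f))));
        writer.Write("\r\n");
    }

    static void Main()
    {
        var rows = new[]
        {
            new[] { "heading1", "heading2", "heading3" },
            new[] { "info1", "Smith, John", "say \"hi\"" }
        };

        foreach (string[] row in rows)
            WriteRow(Console.Out, row);
    }
}
```

In a web application you would pass the response's output writer instead of Console.Out.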
In one of our ASP.NET applications in C#, we take a certain data collection (SubSonic collection) and export it to Excel. We also want to import Excel files in a specific format. I'm looking for a library I can use for this purpose.
Requirements:
Excel 2007 files (Does Excel 2003 support over 64k rows? I need more than that.)
Does not require Excel on the server
Takes a typed collection and, if it can, tries to put numeric fields as numeric in Excel.
Works well with large files (100k to 10M) - fast enough.
Doesn't crash when exporting GUIDs!
Does not cost a crapload of money (no enterprise library like aspose). Free is always great, but can be a commercial library.
What library do you recommend? Have you used it for large quantities of data? Are there other solutions?
Right now, I am using a simple tool that generates HTML that is loaded by Excel later on, but I am losing some capabilities, plus Excel complains when we load it. I don't need to generate charts or anything like that, just export raw data.
I am thinking of flat CSV files, but Excel is a customer requirement. I can work with CSV directly, if I had a tool to convert to and from Excel. Given Excel 2007 is an xml-based (and zipped) file format, I am guessing this kind of library should be easy to find. However, what matters most to me are your comments and opinions.
EDIT: Ironically, in my opinion and following the answer with the most votes, the best Excel import&export library is no export at all. This is not the case for all scenarios, but it is for mine. XLS files support only 64k rows. XLSX supports up to 1M. The free libraries that I've tried feature bad performance (one second to load one row when you have 200k rows). I haven't tried the paid ones, as I feel they are overpriced for the value they deliver when all you need is a fast XLSX<->CSV conversion routine.
I'm going to throw my hand in for flat csv files, if only because you've got the greatest control over the code. Just make sure that you read in the rows and process them one at a time (reading the document to the end and splitting will eat up all of your memory - same with writing, stream it out).
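A minimal sketch of the "one row at a time" idea for reading: a small state machine over a TextReader that yields each row as soon as its line break is seen, so only the current row is held in memory. The names CsvReader and ReadRows are hypothetical. It handles quoted fields with embedded commas, doubled quotes and line breaks, but it assumes rows end in \n or \r\n, and it is not a full CSV implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CsvReader
{
    // Lazily yields one row at a time; nothing beyond the current row is buffered.
    public static IEnumerable<List<string>> ReadRows(TextReader reader)
    {
        var field = new StringBuilder();
        var row = new List<string>();
        bool inQuotes = false;
        bool rowStarted = false; // true once the current row has seen any content
        int c;

        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (inQuotes)
            {
                if (ch == '"')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (reader.Peek() == '"') { field.Append('"'); reader.Read(); }
                    else inQuotes = false;
                }
                else field.Append(ch);
            }
            else if (ch == '"') { inQuotes = true; rowStarted = true; }
            else if (ch == ',') { row.Add(field.ToString()); field.Clear(); rowStarted = true; }
            else if (ch == '\r') { /* skipped; the following \n ends the row */ }
            else if (ch == '\n')
            {
                row.Add(field.ToString());
                field.Clear();
                yield return row;
                row = new List<string>();
                rowStarted = false;
            }
            else { field.Append(ch); rowStarted = true; }
        }

        // Last row when the file does not end with a line break.
        if (rowStarted || field.Length > 0)
        {
            row.Add(field.ToString());
            yield return row;
        }
    }

    static void Main(string[] args)
    {
        // args[0] is the path to a CSV file.
        using (var reader = new StreamReader(args[0]))
        {
            foreach (List<string> row in ReadRows(reader))
                Console.WriteLine("{0} fields, first = {1}", row.Count, row[0]);
        }
    }
}
```

Because ReadRows is an iterator, the file is only read as far as the caller has enumerated, which keeps memory flat even for very large uploads.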
Yes, the user will have to save-as CSV in excel before you can process it, but perhaps this limitation can be overcome by training and providing clear instructions on the page?
Finally, when you export to the customer, if you set the mime type to text/csv, Excel is usually mapped to that type so it appears to the user to be 'an Excel file'.
I discovered the Open XML SDK since my original answer. It provides strongly typed classes for spreadsheet objects, among other things, and seems to be fairly easy to work with.
I am going to use it for reports in one of my projects. Alas, version 2.0 is not supposed to get released until late 2009 or 2010.
The last version of EPPlus (which grew out of ExcelPackage) that is free under the LGPL for commercial projects is 4.5.3.3: https://www.nuget.org/packages/EPPlus/4.5.3.3
If you need latest and greatest, Commercial license is available here: https://epplussoftware.com/en/LicenseOverview/
I'm still fighting with the export-to-Excel function, since my application has to export some data to an Excel 2007 template.
This project seems fine to me, and the developer is very responsive to bugs and issues.
I've been using ClosedXML and it works great!
ClosedXML makes it easier for developers to create Excel 2007/2010 files. It provides a nice object oriented way to manipulate the files (similar to VBA) without dealing with the hassles of XML Documents. It can be used by any .NET language like C# and Visual Basic (VB).
SpreadsheetGear for .NET reads and writes CSV / XLS / XLSX and does more.
You can see live ASP.NET samples with C# and VB source code here and download a free trial here.
Of course I think SpreadsheetGear is the best library to import / export Excel workbooks in ASP.NET - but I am biased. You can see what some of our customers say on the right hand side of this page.
Disclaimer: I own SpreadsheetGear LLC
NPOI For Excel 2003
Open Source
http://www.leniel.net/2009/07/creating-excel-spreadsheets-xls-xlsx-c.html
I've used Flexcel in the past and it was great. But this was more for programmatically creating and updating excel worksheets.
CSV export is simple, easy to implement, and fast. There is one potential issue worth noting, though.
Excel (up to 2007) does not preserve leading zeros in CSV files. This will garble ZIP codes, product ids, and other textual data containing numeric values.
There is one trick that will make Excel import the values correctly (using delimiters and prefix values with the = sign, if I remember correctly, e.g. ..,="02052",...).
If you have users who will do post-processing tasks with the CSV, they need to be aware that they need to change the format to XLS and not save the file back to CSV. If they do, leading zeros will be lost for good.
For years, I have used JExcel for this, an excellent open-source Java project. It was also .NET-able by using J# to compile it, and I have also had great success with it in this incarnation. However, recently I needed to migrate the code to native .NET to support a 64-bit IIS application in which I create Excel output. The 32-bit J# version would not load.
The code for CSharpJExcel is LGPL and is available currently at this page, while we prepare to deploy it on the JExcel SourceForge site. It will compile with VS2005 or VS2008. The examples in the original JExcel documentation will pretty well move over intact to the .NET version.
Hope it is helpful to someone out here.
I've worked with excel jetcell for a long time and can really recommend it.
http://www.devtriogroup.com/exceljetcell
Commercial product
Excel files XLS & XLSX
Based on its own engine in pure .NET.
The following site demonstrates how to export a DataTable, DataSet or List<> into a "proper" Excel 2007 .xlsx file (rather than exporting a .csv file, and getting Excel to open it).
It uses the OpenXML libraries, so you don't need to have Excel installed on your server.
Mikes Knowledge Base - ExportToExcel
All of the source code is given, free of charge, as well as a demo application.
It's very easy to add to your own applications, you just need to call one function, passing in an Excel filename, and your data source:
DataSet ds = CreateSampleData();
string excelFilename = "C:\\Sample.xlsx";
CreateExcelFile.CreateExcelDocument(ds, excelFilename);
Hope this helps.
Check the ExcelPackage project, it uses the Office Open XML file format of Excel 2007, it's lightweight and open source...
I've tried CSharpJExcel and wouldn't recommend it, at least not until there is some documentation available. Contrary to the developer's comments, it is not a straight native port.
I know this is quite late, but I feel compelled to answer xPorter (writing) and xlReader (reading) from xPortTools.Net. We tested quite a few libraries and nothing came close in the way of performance (I'm talking about writing millions of rows in seconds here). Can't say enough good things about these products!
You can use Microsoft.Jet.OLEDB.4.0
We have just identified a similar need. And I think it's important to consider the user experience.
We nearly got sidetracked along the same:
Prepare/work in spreadsheet file
Save file
Import file
Work with data in system
... workflow
Add-in Express allows you to create a button within Excel without all that tedious mucking about with VSTO. Then the workflow becomes:
Prepare/work in spreadsheet file
Import file (using button inside Excel)
Work with data in system
Have the code behind the button use the "native" Excel API (via Add-in Express) and push direct into the recipient system. You can't get much more transparent for the developer or the user. Worth considering.
There's a pretty good article and library on CodeProject by Yogesh Jagota:
Excel XML Import-Export Library
I've used it to export data from SQL queries and other data sources to Excel - works just fine for me.
Cheers
You could try the following library. It is easy enough to use, and it is just a light wrapper over Microsoft's Open XML SDK (you can even reuse formatting, styles and even entire worksheets from a secondary Excel file):
http://officehelper.codeplex.com
Spreadsheetgear is the best commercial library we have found and are using. Our company does a lot of advanced excel import and export and Spreadsheetgear supports lots of advanced excel features far beyond anything you can do with simple CSV, and it's fast. It isn't free or very cheap though but worth it because the support is excellent. The developers will actually respond to you if you run into an issue.
How about the Apache POI Java library? I haven't used it for Excel, but I did use it for Word 2007.