I need some guidance here.
I am reading an Excel file using a StreamReader, then getting its contents into a string with the StreamReader.ReadToEnd() method. Then I write the string to a different location on the file system using the StreamWriter.Write() method.
Then I re-read the file from the location I wrote it to. However, it seems I am reading garbage values, and I can't open the Excel file from the new location.
Am I doing something wrong that causes the file to get corrupted? Am I missing something to do with encoding?
Excel files are binary. StreamReader is a kind of TextReader, and StreamWriter is a kind of TextWriter.
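Binary and text are not the same thing: decoding the bytes into a string and encoding them again changes any byte sequence that is not valid in the assumed encoding, which corrupts the file.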
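If all you need is to move the file from one place to another, a byte-for-byte copy avoids the problem entirely. Here is a minimal sketch using only System.IO; the class and method names (BinaryCopy, CopyFile, CopySmallFile) are made up for illustration:

```csharp
using System.IO;

public static class BinaryCopy
{
    // Copies a file byte for byte. No text decoding or encoding is involved,
    // so the bytes arrive unchanged.
    public static void CopyFile(string sourcePath, string destinationPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream destination = File.Create(destinationPath))
        {
            source.CopyTo(destination);
        }
    }

    // For small files, reading the whole content into a byte[] also works.
    public static void CopySmallFile(string sourcePath, string destinationPath)
    {
        byte[] bytes = File.ReadAllBytes(sourcePath);
        File.WriteAllBytes(destinationPath, bytes);
    }
}
```

If you do not need the bytes in memory at all, File.Copy(sourcePath, destinationPath) does the same job in one call.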
Depending on what Excel format you are using, you will find it very painful to read/write directly. Libraries such as NPOI make this much easier.
The version that reads/writes xlsx files is still in beta, but it has been stable in my use. If you need xlsx support, download it from their site instead of via NuGet. NPOI on GitHub
Basically, I need to identify whether an Excel file was originally created as a valid Excel file. I have tried checking the file format with Microsoft.Office.Interop.Excel.
It seems I cannot use this without Microsoft Office being installed, so I need another approach to this problem.
EDIT:
Basically, I want to be able to distinguish a valid Excel file from an invalid one, such as an "Excel" file that was converted from a DLL. The file extension will say it is xls, and if you try to open that file, it opens an empty workbook without any sheets. But I cannot conclude that a workbook with no sheets is invalid.
If you want to identify whether a file is a valid Excel file or not, you can use the TrID file identifier: TrID - File Identifier.
You can also do it on your own by reading the magic number (the file signature bytes). Please go through this question: Original file bytes from StreamReader, magic number detection.
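As a rough sketch of the magic-number approach: binary .xls files are stored in an OLE2 compound file, which starts with the bytes D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP archives, which start with 50 4B 03 04. The names below (ContainerKind, ExcelSignature, Detect) are made up for this example. Note that a match only identifies the container: an old Word .doc is also an OLE2 file and any ZIP archive matches the second signature, so treat this as a first filter rather than full validation.

```csharp
using System;
using System.IO;
using System.Linq;

public enum ContainerKind { Unknown, Ole2CompoundFile, Zip }

public static class ExcelSignature
{
    // First 8 bytes of an OLE2 compound file (container of binary .xls).
    private static readonly byte[] Ole2Magic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // First 4 bytes of a ZIP local file header (container of .xlsx).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    public static ContainerKind Detect(Stream stream)
    {
        byte[] header = ReadHeader(stream, Ole2Magic.Length);
        if (StartsWith(header, Ole2Magic)) return ContainerKind.Ole2CompoundFile;
        if (StartsWith(header, ZipMagic)) return ContainerKind.Zip;
        return ContainerKind.Unknown;
    }

    // Reads up to 'count' bytes; Stream.Read may return fewer per call.
    private static byte[] ReadHeader(Stream stream, int count)
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        return data.Length >= prefix.Length
            && data.Take(prefix.Length).SequenceEqual(prefix);
    }
}
```

A DLL that was merely given an .xls extension starts with the bytes "MZ", so Detect returns ContainerKind.Unknown for it.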
I was developing an application that reads data from an Excel file, but when I try to open the file, an exception is thrown if the source file is saved in the xls format ("File contains corrupted data" error when opening an Excel sheet with OpenXML). When I save the file in the xlsx format, it works fine. Please help me solve this problem.
Use the Free Spire.XLS DLL, available via NuGet.
Sample:
using Spire.Xls;

// Load the legacy .xls workbook and save it in the .xlsx format.
Workbook workbook = new Workbook();
workbook.LoadFromFile("Input.xls");
workbook.SaveToFile("Output.xlsx", ExcelVersion.Version2013);
For reliably reading XLS files you could use ExcelDataReader which is a lightweight and fast library written in C# for reading Microsoft Excel files. It supports the import of Excel files all the way back to version 2.0 of Excel (released in 1987!)
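For what it's worth, here is roughly what reading a workbook with ExcelDataReader looks like. This is a sketch that assumes the ExcelDataReader NuGet package is referenced; the class and method names (XlsDump, PrintAllCells) are invented for the example:

```csharp
using System;
using System.IO;
using ExcelDataReader; // NuGet package "ExcelDataReader"

public static class XlsDump
{
    // Prints every cell of every sheet, one row per line, tab separated.
    public static void PrintAllCells(string path)
    {
        // On .NET Core / .NET 5+, the legacy code pages used by old .xls files
        // have to be registered first (System.Text.Encoding.CodePages package):
        // System.Text.Encoding.RegisterProvider(
        //     System.Text.CodePagesEncodingProvider.Instance);

        using (FileStream stream = File.Open(path, FileMode.Open, FileAccess.Read))
        using (IExcelDataReader reader = ExcelReaderFactory.CreateReader(stream))
        {
            do
            {
                Console.WriteLine("Sheet: " + reader.Name);
                while (reader.Read()) // advances one row at a time
                {
                    for (int col = 0; col < reader.FieldCount; col++)
                    {
                        Console.Write(reader.GetValue(col) + "\t");
                    }
                    Console.WriteLine();
                }
            } while (reader.NextResult()); // moves to the next sheet
        }
    }
}
```

CreateReader accepts any readable Stream, so the same code should work on a MemoryStream if the file never touches the disk.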
Alternatively you could use a file conversion API like Zamzar. This service has been around for 10+ years, and provides a simple REST API for file conversion - it supports XLS to XLSX conversion. You can use it in C# and it has extra features like allowing you to import and export files to and from Amazon S3, FTP servers etc.
Full disclosure: I'm the lead developer for the Zamzar API.
You cannot read xls files with OpenXML.
The solution from Microsoft is to read the xls file with Office Interop (though Interop is not recommended for use on a server) and then transfer the data step by step from Interop to OpenXML.
Another solution is to use an Excel library like EasyXLS and convert between these two Excel file formats:
ExcelDocument workbook = new ExcelDocument();
workbook.easy_LoadXLSFile("Excel.xls");
workbook.easy_WriteXLSXFile("Excel.xlsx");
Find more information about converting xls to xlsx.
I am not quite sure why you need to convert the file rather than just reading the xls file directly, using a technology other than OpenXML.
XLS is the older Excel file format. XLSX is the newer format stored as OpenXML. An XLSX file is actually a zip file with the various components stored as files within it. You cannot simply rename the file to get it into the new format. To save the file as XLSX, you'll have to save it in the Excel 2007+ format.
If you're using Excel interop then it is an option on the SaveAs method.
For more info, check the _Workbook.SaveAs method and its FileFormat parameter:
FileFormat: Optional Object. The file format to use when you save the file. For a list of valid choices, see the FileFormat property. For an existing file, the default format is the last file format specified; for a new file, the default is the format of the version of Excel being used.
MSDN info here:
https://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel._workbook.saveas(v=office.11).aspx
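A minimal sketch of the interop route, assuming Excel 2007 or later is installed on the machine and the project references Microsoft.Office.Interop.Excel. The wrapper class and method names are made up; XlFileFormat.xlOpenXMLWorkbook is the interop constant for the .xlsx format:

```csharp
using Excel = Microsoft.Office.Interop.Excel;

public static class XlsToXlsxInterop
{
    // Opens an .xls workbook in a hidden Excel instance and saves it as .xlsx.
    // Excel generally expects full (absolute) paths here.
    public static void Convert(string xlsPath, string xlsxPath)
    {
        var app = new Excel.Application();
        app.DisplayAlerts = false; // suppress overwrite and compatibility prompts
        Excel.Workbook workbook = null;
        try
        {
            workbook = app.Workbooks.Open(xlsPath);
            workbook.SaveAs(xlsxPath, Excel.XlFileFormat.xlOpenXMLWorkbook);
        }
        finally
        {
            if (workbook != null) workbook.Close(false); // do not save again
            app.Quit(); // otherwise the Excel process stays alive
        }
    }
}
```

This starts a real Excel process, so the usual warning about not running interop on a server applies.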
I've been asked to strip an Excel file of macros, leaving only the data. I've been asked to do this by converting the Excel file to XML and then reading that file back into Excel using C#. This seems a bit inefficient to me and I was thinking that it would be easier to simply load the source Excel file into C# and then create a new target Excel file and add the sheets from the source back into the target.
I don't know where macros live inside an Excel file, so I'm not sure whether this would accomplish the task. So, will this work? Will simply copying the sheets from one file to another strip it of its macros, or are they actually stored at the worksheet level?
As always, any and all suggestions are welcome, including alternate suggestions or even "why are you even doing this???". :)
Macro-enabled Open XML workbooks (for example .xlsm) are ZIP archives. Rename the file to add a ".zip" extension, and then open the file as a ZIP archive. Look for an entry in the resulting "xl" folder called "vbaProject.bin", and delete it. Remove the .zip extension. Macros gone.
To do this programmatically from C#, you can use the ZipFile class from the System.IO.Compression namespace in .NET (available since .NET Framework 4.5).
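Here is a small sketch of the ZIP approach in C#. It needs no renaming, because ZipFile.Open does not care about the file extension. The class and method names are made up. One assumption to be aware of: this only deletes the xl/vbaProject.bin entry and does not touch the references to it elsewhere in the package (such as [Content_Types].xml), so Excel may still ask to repair the file on opening. It also only applies to the ZIP-based formats, not to binary .xls.

```csharp
using System;
using System.IO.Compression; // ZipFile lives in System.IO.Compression.FileSystem on .NET Framework
using System.Linq;

public static class MacroStripper
{
    // Deletes the xl/vbaProject.bin entry from an Open XML workbook in place.
    // Returns true if an entry was found and removed, false otherwise.
    public static bool RemoveVbaProject(string workbookPath)
    {
        using (ZipArchive archive = ZipFile.Open(workbookPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = archive.Entries.FirstOrDefault(e =>
                e.FullName.Equals("xl/vbaProject.bin", StringComparison.OrdinalIgnoreCase));
            if (entry == null) return false;

            entry.Delete(); // the archive is rewritten when it is disposed
            return true;
        }
    }
}
```

Work on a copy of the workbook, since the archive is modified in place.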
Your best bet is to save the workbook as an xlsx, close it, open it, then save as a format of your choice.
This will strip the macros and is robust. It will also work if the VBA is locked for viewing.
Closing and reopening the workbook is necessary otherwise the macros are retained.
If you need to use C# to do this, I agree that it would be easier to load the source Excel file in C# and create a new target file, copying over only the cells and sheets you need. Especially if you're doing this for a large number of Excel files, I would recommend creating a small console app that, when given an Excel file, automatically generates a new Excel file with just the data.
One tool that I've found extremely useful and easy to use for such tasks is EPPlus.
I need to get data from the database and send it to the FTP server in XLS format.
I have done the same for XML using memory stream. But, I am not sure how to format and load XLS to memory stream.
Any idea or approach? Sample code always helps.
Please see this answer, which uses the Excel OpenXML format. It is the fastest way to get an Excel file without any external library.
Export DataTable to Excel with Open Xml SDK in c#
I misunderstood your question. Reading a file into a stream works the same way for all files:
new MemoryStream(File.ReadAllBytes(inputFilename))
I have an xls file sitting in a byte[] as a result of a file upload on my asp.net web application. Is there a library that can read in and process the xls file as a byte[]? I do not want to save the file to disk.
All I need to do is read the cell contents (I would prefer to accept a csv file if I had the choice).
I discovered SpreadsheetGear which claims to do this, but I would rather not pay $1000 for software that does way more than I need it to.
Note that I am referring to an XLS file and not an XLSX file, but I would appreciate advice on both.
You may check out ExcelLibrary. And if you are dealing with OpenXML (.xlsx), you may check out the Open XML SDK.
EPPlus is also a solid library for working with Excel files. It has some samples that will show how to interact with a file from a MemoryStream.
http://epplus.codeplex.com/
NPOI has a really good library, and it picks up where EPPlus leaves off. http://npoi.codeplex.com/
Your reference to XLS suggests the older Excel 97 format, in which case you can use the ExcelWorkbook / ExcelWorksheet reader code provided as part of the Tarantino project at the Tarantino Bitbucket Repository.
You can pass your XLS in memory as a stream and the helper methods will return a DataSet with workbook data and Tables representing Sheets. You do not need the entire Tarantino project code and can simply grab:
ExcelWorkbookReader.cs
ExcelWorksheetReader.cs
IExcelWorkbookReader.cs
IExcelWorksheetReader.cs
and add these files to your solution.
Using the interface is simple:
[HttpPost]
public ActionResult Uploadfile(HttpPostedFileBase file)
{
    var reader = new ExcelWorkbookReader();
    var data = reader.GetWorkbookData(file.InputStream);
    // Do something with the data here
    return RedirectToAction("List");
}
You can read the contents of an .xls file without an Excel library by using ADO.NET and the OLEDB driver. The worksheet must be laid out as a "table", though. If it is, this works fine.
The connection string should be something like this:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";
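To give an idea of how that connection string is used, here is a sketch with OleDbDataAdapter. The class and method names are made up, and the sheet name is whatever your worksheet is called (OLEDB expects it followed by a "$"). Keep in mind that the Jet provider is only available to 32-bit processes on Windows, and that on .NET Core / .NET 5+ System.Data.OleDb comes from a separate NuGet package.

```csharp
using System.Data;
using System.Data.OleDb;

public static class XlsOleDbReader
{
    // HDR=Yes: the first row holds the column names.
    // IMEX=1: columns with mixed data types are read as text.
    public static string BuildConnectionString(string xlsPath)
    {
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + xlsPath +
               ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\";";
    }

    // Loads one worksheet into a DataTable.
    public static DataTable ReadSheet(string xlsPath, string sheetName)
    {
        var table = new DataTable();
        string query = "SELECT * FROM [" + sheetName + "$]";
        using (var connection = new OleDbConnection(BuildConnectionString(xlsPath)))
        using (var adapter = new OleDbDataAdapter(query, connection))
        {
            adapter.Fill(table); // Fill opens and closes the connection itself
        }
        return table;
    }
}
```

For example, XlsOleDbReader.ReadSheet(@"C:\MyExcel.xls", "Sheet1") would return the rows of a sheet named Sheet1, assuming one exists.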
Regards.