I have an Excel sheet with embedded PDF files. Is there any way to read those embedded PDF files from the Excel worksheet and save them to a database using C#?
I believe you can find classes in the Microsoft.Office.Interop.Excel namespace that can help extract objects embedded in an Excel sheet, for example the WorksheetClass class.
Updated - I am working on retrieving data from a large number of Excel workbooks using C#. Some important PDF documents are embedded in the workbooks, and I need to save each of them as an individual document for further processing.
I am able to loop through all OLE objects in all worksheets and find all the PDFs.
I used progID in DocumentFormat.OpenXml.Spreadsheet to identify the PDFs:
https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.oleobjects?view=openxml-2.8.1
foreach (Worksheet ws in xlWb.Worksheets)
{
    // OLEObjects() returns object, so cast it to the OLEObjects collection
    foreach (OLEObject ole in (OLEObjects)ws.OLEObjects())
    {
        // 1. Identify whether the OLE object is of the AcroExch class type
        if (ole.progID == "AcroExch.Document.DC")
        {
            // 2. Cast the OLE object to AcroExch and save it as a separate PDF
        }
    }
}
From what I gathered online, using the Acrobat DC SDK seems to be the only option.
Is there any other way to achieve what I want?
Thanks
To extract an embedded PDF and save it as a PDF file, please refer to this solution provided by the GemBox dev team:
How to download embedded PDF files in an excel worksheet?
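If the workbook is in the .xlsx format, one library-free approach is worth sketching. An .xlsx file is a zip package, and embedded OLE objects are normally stored as parts under xl/embeddings/. The sketch below opens the package with System.IO.Compression and carves out the bytes between the first "%PDF-" marker and the last "%%EOF" marker in each embedded part. The class and method names (EmbeddedPdfExtractor, CarvePdf, ExtractFromXlsx) are made up for illustration. The carving step is a heuristic: it assumes the PDF bytes are stored contiguously inside the OLE container, which is common but not guaranteed, so treat the output as something to validate rather than a definitive extraction.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class EmbeddedPdfExtractor
{
    private static readonly byte[] PdfHeader = Encoding.ASCII.GetBytes("%PDF-");
    private static readonly byte[] PdfTrailer = Encoding.ASCII.GetBytes("%%EOF");

    // Returns the bytes from the first "%PDF-" through the last "%%EOF",
    // or null when the blob does not appear to contain a PDF.
    public static byte[] CarvePdf(byte[] blob)
    {
        int start = IndexOf(blob, PdfHeader);
        if (start < 0) return null;
        int end = LastIndexOf(blob, PdfTrailer);
        if (end < start) return null;

        int length = end + PdfTrailer.Length - start;
        byte[] pdf = new byte[length];
        Array.Copy(blob, start, pdf, 0, length);
        return pdf;
    }

    // Scans every part under xl/embeddings/ of an .xlsx package
    // (assumption: embedded OLE objects live there) and carves PDFs out of them.
    public static List<byte[]> ExtractFromXlsx(string xlsxPath)
    {
        var result = new List<byte[]>();
        using (ZipArchive zip = ZipFile.OpenRead(xlsxPath))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                if (!entry.FullName.StartsWith("xl/embeddings/", StringComparison.OrdinalIgnoreCase))
                    continue;

                using (Stream s = entry.Open())
                using (var ms = new MemoryStream())
                {
                    s.CopyTo(ms);
                    byte[] pdf = CarvePdf(ms.ToArray());
                    if (pdf != null) result.Add(pdf);
                }
            }
        }
        return result;
    }

    // Index of the first occurrence of pattern in data, or -1.
    private static int IndexOf(byte[] data, byte[] pattern)
    {
        for (int i = 0; i <= data.Length - pattern.Length; i++)
        {
            if (MatchesAt(data, pattern, i)) return i;
        }
        return -1;
    }

    // Index of the last occurrence of pattern in data, or -1.
    private static int LastIndexOf(byte[] data, byte[] pattern)
    {
        for (int i = data.Length - pattern.Length; i >= 0; i--)
        {
            if (MatchesAt(data, pattern, i)) return i;
        }
        return -1;
    }

    private static bool MatchesAt(byte[] data, byte[] pattern, int offset)
    {
        for (int j = 0; j < pattern.Length; j++)
        {
            if (data[offset + j] != pattern[j]) return false;
        }
        return true;
    }
}
```

Each returned byte array can be written with File.WriteAllBytes or passed as a binary parameter to a database insert. On .NET Framework, ZipFile needs a reference to the System.IO.Compression.FileSystem assembly. This does not work for legacy .xls workbooks, which are not zip packages.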
I am developing an application that reads data from an Excel file. When I try to open the file with OpenXML, an exception is thrown if the source file was saved in the xls format ("File contains corrupted data" error when opening the Excel sheet with OpenXML). When I save the same file in the xlsx format, it works fine. Please help me solve this problem.
Use the Free Spire.XLS DLL, available via NuGet.
Sample:
Workbook workbook = new Workbook();
workbook.LoadFromFile("Input.xls");
workbook.SaveToFile("Output.xlsx", ExcelVersion.Version2013);
For reliably reading XLS files you could use ExcelDataReader which is a lightweight and fast library written in C# for reading Microsoft Excel files. It supports the import of Excel files all the way back to version 2.0 of Excel (released in 1987!)
Alternatively you could use a file conversion API like Zamzar. This service has been around for 10+ years, and provides a simple REST API for file conversion - it supports XLS to XLSX conversion. You can use it in C# and it has extra features like allowing you to import and export files to and from Amazon S3, FTP servers etc.
Full disclosure: I'm the lead developer for the Zamzar API.
You cannot read xls files with OpenXML.
The solution from Microsoft is to read the xls file with Office Interop (although Interop is not recommended for use on a server) and then transfer the data from Interop to OpenXML step by step.
Another solution is to use an Excel library like EasyXLS and convert between these two Excel file formats:
ExcelDocument workbook = new ExcelDocument();
workbook.easy_LoadXLSFile("Excel.xls");
workbook.easy_WriteXLSXFile("Excel.xlsx");
Find more information about converting xls to xlsx.
I am not quite sure why you need to convert the file instead of just reading the xls file directly, using a technology other than OpenXML of course.
XLS is the older, binary Excel file format. XLSX is the newer format, stored as OpenXML. An XLSX file is actually a zip file with the various components stored as files within it. You cannot simply rename the file to get it into the new format. To get an XLSX file you have to save the workbook in the Excel 2007+ format.
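Because the two formats use different containers, the first bytes of a file usually reveal which one you have, regardless of the extension. The following is a minimal sketch of that check; the names ExcelContainer and ExcelFormatSniffer are made up for illustration. It is only a heuristic: very old pre-Excel 5 files are not OLE compound files, and a password-protected xlsx is wrapped in an OLE container, so it would be reported as OleCompound here.

```csharp
using System;
using System.IO;

public enum ExcelContainer { Unknown, OleCompound, ZipPackage }

public static class ExcelFormatSniffer
{
    // Signature of an OLE compound file, the container used by legacy .xls workbooks.
    private static readonly byte[] OleMagic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Signature of a zip local file header ("PK\x03\x04"), the container used by .xlsx.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a file from its leading bytes.
    public static ExcelContainer Detect(byte[] header)
    {
        if (header == null) return ExcelContainer.Unknown;
        if (StartsWith(header, OleMagic)) return ExcelContainer.OleCompound;
        if (StartsWith(header, ZipMagic)) return ExcelContainer.ZipPackage;
        return ExcelContainer.Unknown;
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    public static ExcelContainer DetectFile(string path)
    {
        byte[] header = new byte[8];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(header, 0, header.Length);
            Array.Resize(ref header, read);
        }
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A check like this lets the calling code pick a reader up front (OpenXML for ZipPackage, something else for OleCompound) instead of catching the "File contains corrupted data" exception afterwards.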
If you're using Excel interop, the format is an option on the SaveAs method.
For more info, check the _Workbook.SaveAs method
and its FileFormat parameter:
Optional Object.
The file format to use when you save the file. For a list of valid choices,
see the FileFormat property. For an existing file, the default format is the
last file format specified; for a new file, the default is the format of the
version of Excel being used.
MSDN info here:
https://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel._workbook.saveas(v=office.11).aspx
I found this article:
How can you programmatically import XML data into an Excel file?
which shows how to import an XML file into Excel.
My question is how do I import multiple XML files into multiple sheets in one workbook?
The "OpenXML()" method appears to relate to the Workbooks collection only,
and creates a new Workbook...
Thanks
Use XmlMaps instead of OpenXML and then create new sheets:
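// Initialize Excel interop, then loop over the XML files
// (xmlFiles stands for your own list of XML file paths)
foreach (string xmlFile in xmlFiles)
{
    Excel.Worksheet newWorksheet;
    newWorksheet = (Excel.Worksheet)Globals.ThisWorkbook.Worksheets.Add();
    newWorksheet.Select();
    // Run import code
}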
We are using EPPlus.dll to generate Excel files. With EPPlus.dll you can write Excel files in the native format.
The following URLs will definitely help you write an Excel file, but you have to write the logic that reads the XML files and puts the data into the Excel workbook yourself.
http://www.jimmycollins.org/blog/?p=547
https://epplus.codeplex.com/
I know about Word Automation Services, where I can start a ConversionJob that converts a .docx file to, for example, PDF or .doc.
I thought these services also allowed converting an XLSX file to XLS, but I was wrong. Looking at the SaveFormat enumeration, it only supports Word formats, and Excel Automation Services don't seem to have such a conversion job.
How do I convert an XLS file to XLSX without using Excel automation (i.e. without having Excel installed on the server)?
EDIT:
In the end I used Aspose Cells for the conversion.
You might try ExcelLibrary or EPPlus. Those libraries allow you to write Excel files without using the Excel COM object.
You can read the source cell by cell and create a new worksheet copied from the other one (copying cell by cell).
I'm not sure that you can do it (converting XLS, which is a raw MS Excel file, to XLSX, which is an OpenXML format, without having either Excel Services or MS Excel installed [using Interop])!
If you want a way to convert it on a PC with MS Excel installed, check this link: http://devville.net/blog/2011/02/05/how-to-convert-rtf-document-to-doc-using-c/
But if you find a way, I would be happy if you shared it.
I have an HTML table in the body of a Microsoft.Office.Interop.Outlook.MailItem, and I just need to fill an Excel sheet with this table using C# in a desktop application. Could anyone help me with this? Thanks
A quick and dirty way:
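File.WriteAllText(@"C:\Temp\Table.xls", mailItem.HTMLBody);
Excel will open it even though the file does not contain a valid xls document. Note that HTMLBody holds the HTML markup, while Body is the plain-text version of the message.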
You can also use HtmlAgilityPack to parse the e-mail and EPPlus to write to Excel (only 2007/2010 xlsx version).
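As a small refinement of the quick and dirty approach, you could write only the table markup instead of the whole message. The sketch below uses plain string searching, so it assumes the HTML contains a single top-level table (or that everything between the first opening and the last closing table tag is acceptable). HtmlTableExport and its methods are made-up names for illustration, and for anything less regular a real HTML parser such as HtmlAgilityPack is the safer choice.

```csharp
using System;
using System.IO;

public static class HtmlTableExport
{
    // Returns the markup from the first "<table" to the last "</table>",
    // or null when no table is found.
    public static string ExtractTableMarkup(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        int start = html.IndexOf("<table", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;

        const string closeTag = "</table>";
        int end = html.LastIndexOf(closeTag, StringComparison.OrdinalIgnoreCase);
        if (end < start) return null;

        return html.Substring(start, end + closeTag.Length - start);
    }

    // Writes the table as an HTML file with the given path (for example ending in .xls).
    // Returns false when the HTML contains no table.
    public static bool SaveTableAsXls(string html, string path)
    {
        string table = ExtractTableMarkup(html);
        if (table == null) return false;

        File.WriteAllText(path, "<html><body>" + table + "</body></html>");
        return true;
    }
}
```

With Outlook interop this would be called as HtmlTableExport.SaveTableAsXls(mailItem.HTMLBody, @"C:\Temp\Table.xls"). Excel may show a warning that the file format and extension do not match when opening the result.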