Updated - I am working on retrieving data from a large number of Excel workbooks using C#. Some important PDF documents are embedded in the workbooks, and I need to save each of them as an individual document for further processing.
I am able to loop through all OLE objects in all worksheets and find all the PDFs.
I used progID in DocumentFormat.OpenXml.Spreadsheet to identify the PDFs:
https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.oleobjects?view=openxml-2.8.1
foreach (Worksheet ws in xlWb.Worksheets)
{
    foreach (OLEObject ole in ws.OLEObjects())
    {
        // identify whether the oleObject is of AcroExch class type
        if (ole.progID == "AcroExch.Document.DC")
        {
            // 2. Cast oleObject to AcroExch and save it as a pdf separately
        }
    }
}
From what I gathered online, using the Acrobat DC SDK seems to be the only option.
Is there any other way to achieve what I want?
Thanks
To extract an embedded PDF and save it as a PDF file, please refer to this solution provided by the GemBox Dev Team:
How to download embedded PDF files in an excel worksheet?
Related
I am developing an application that reads data from an Excel file. When I try to open a source file saved in the xls format, an exception is thrown ("File contains corrupted data" error when opening an Excel sheet with OpenXML). When I save the same file in the xlsx format, it works fine. Please help me solve this problem.
Use Free Spire.XLS dll available via NuGet.
Sample:
Workbook workbook = new Workbook();
workbook.LoadFromFile("Input.xls");
workbook.SaveToFile("Output.xlsx", ExcelVersion.Version2013);
For reliably reading XLS files you could use ExcelDataReader which is a lightweight and fast library written in C# for reading Microsoft Excel files. It supports the import of Excel files all the way back to version 2.0 of Excel (released in 1987!)
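As a rough illustration of the ExcelDataReader route, the sketch below dumps every sheet of an xls file to the console. It assumes the ExcelDataReader NuGet package is referenced; the file name "Input.xls" is a placeholder, and the encoding registration line is only relevant on .NET Core / .NET 5+ (where it may also need the System.Text.Encoding.CodePages package).

```csharp
using System;
using System.IO;
using System.Text;
using ExcelDataReader;

class ReadXls
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where the legacy code pages
        // used by old xls files must be registered explicitly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // "Input.xls" is a placeholder path.
        using (var stream = File.Open("Input.xls", FileMode.Open, FileAccess.Read))
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            do
            {
                Console.WriteLine("Sheet: " + reader.Name);

                // Each Read() advances to the next row of the current sheet.
                while (reader.Read())
                {
                    for (int col = 0; col < reader.FieldCount; col++)
                    {
                        Console.Write(reader.GetValue(col) + "\t");
                    }
                    Console.WriteLine();
                }
            } while (reader.NextResult()); // move to the next sheet, if any
        }
    }
}
```

This reads the xls file directly, so no conversion to xlsx is needed if the goal is only to get at the data.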
Alternatively you could use a file conversion API like Zamzar. This service has been around for 10+ years, and provides a simple REST API for file conversion - it supports XLS to XLSX conversion. You can use it in C# and it has extra features like allowing you to import and export files to and from Amazon S3, FTP servers etc.
Full disclosure: I'm the lead developer for the Zamzar API.
You cannot read xls files with OpenXML.
The solution from Microsoft is to read the xls file with Office Interop (although Interop is not recommended on a server) and transfer the data from Interop to OpenXML step by step.
Another solution is to use an Excel library like EasyXLS and convert between these two Excel file formats:
ExcelDocument workbook = new ExcelDocument();
workbook.easy_LoadXLSFile("Excel.xls");
workbook.easy_WriteXLSXFile("Excel.xlsx");
Find more information about converting xls to xlsx.
I am not quite sure why you need to convert the file rather than just reading the xls file directly, using a technology other than OpenXML, of course.
XLS is the older Excel file format. XLSX is the newer format stored as OpenXML. An XLSX file is actually a zip file with the various components stored as files within it. You cannot simply rename the file to get it into the new format. To get an XLSX file you have to save the workbook in the Excel 2007+ format.
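To make the "zip file" point concrete, here is a minimal sketch that lists the parts inside an xlsx package using only System.IO.Compression. The default file name "Input.xlsx" is a placeholder; on .NET Framework the project is assumed to reference the System.IO.Compression.FileSystem assembly.

```csharp
using System;
using System.IO.Compression;

class ListXlsxParts
{
    static void Main(string[] args)
    {
        // Placeholder default path; pass a real workbook path as the first argument.
        string path = args.Length > 0 ? args[0] : "Input.xlsx";

        // An xlsx file can be opened like any other zip archive.
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // FullName is the part path inside the package,
                // e.g. xl/workbook.xml or xl/worksheets/sheet1.xml.
                Console.WriteLine("{0} ({1} bytes)", entry.FullName, entry.Length);
            }
        }
    }
}
```

Running this against an xls file fails, because the old binary format is not a zip archive. That is the same reason OpenXML reports corrupted data for such files.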
If you're using Excel interop then it is an option on the SaveAs method.
For more info, check the _Workbook.SaveAs method and its FileFormat argument:
Optional Object.
The file format to use when you save the file. For a list of valid choices,
see the FileFormat property. For an existing file, the default format is the
last file format specified; for a new file, the default is the format of the
version of Excel being used.
msdn info here:
https://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel._workbook.saveas(v=office.11).aspx
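The SaveAs approach could look roughly like the sketch below. It assumes Excel is installed, the project references Microsoft.Office.Interop.Excel, and the two paths are placeholders. XlFileFormat.xlOpenXMLWorkbook is the FileFormat value for a regular xlsx workbook.

```csharp
using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

class XlsToXlsx
{
    static void Main()
    {
        // Requires a local Excel installation; not suitable for server-side use.
        var app = new Excel.Application { DisplayAlerts = false };
        Excel.Workbook wb = null;
        try
        {
            // Placeholder paths; Interop expects full paths.
            wb = app.Workbooks.Open(@"C:\temp\Input.xls");

            // The second argument is the FileFormat parameter described above.
            wb.SaveAs(@"C:\temp\Output.xlsx", Excel.XlFileFormat.xlOpenXMLWorkbook);
        }
        finally
        {
            if (wb != null) wb.Close(false);
            app.Quit();
            // Release the COM object so the Excel process can exit.
            Marshal.ReleaseComObject(app);
        }
    }
}
```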
I have an Excel sheet with embedded PDF files. Is there any way to read those embedded PDF files from the worksheet and save them to a database using C#?
I believe you can find classes in the Microsoft.Office.Interop.Excel namespace that help to extract objects embedded in an Excel sheet,
for example the WorksheetClass class.
I'm generating a CSV file from the following code
public ActionResult Index()
{
    var csv = "मानक हिन्दी;some other value";
    var data = Encoding.UTF8.GetBytes(csv);
    data = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
    var cd = new ContentDisposition
    {
        Inline = false,
        FileName = "newExcelSheet.csv"
    };
    Response.AddHeader("Content-Disposition", cd.ToString());
    return File(data, "text/csv");
}
Now I wish to insert an image in the top row of the Excel sheet. Please assist me with this problem.
Thanks :)
CSV is not a format capable of including binary data such as images. The only thing you can include in a CSV file is text.
If you need to add an image to an excel document you would have to use a proper excel file (i.e. a .xls or .xlsx file). There are various APIs that you can use to write to such files, including the Excel Object Model exposed through COM when you have Office installed.
See this question for details on how to insert images through COM.
You can't do it in a CSV file. You either go the interop assembly route or download EPPlus, a free Excel .NET library that supports what you need.
Code examples on the website:
http://epplus.codeplex.com/
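A minimal sketch of the EPPlus route, assuming the EPPlus 4.x NuGet package (later versions also require a license setting before first use). The sheet name, picture name, and file names are placeholders for illustration.

```csharp
using System.IO;
using OfficeOpenXml;

class ImageInTopRow
{
    static void Main()
    {
        using (var package = new ExcelPackage())
        {
            ExcelWorksheet ws = package.Workbook.Worksheets.Add("Sheet1");

            // Placeholder image file; the picture is anchored at the top-left cell.
            var picture = ws.Drawings.AddPicture("Logo", new FileInfo("logo.png"));
            picture.SetPosition(0, 0, 0, 0); // row, row offset (px), column, column offset (px)

            // Write the data below the image instead of into a CSV string.
            ws.Cells[5, 1].Value = "मानक हिन्दी";
            ws.Cells[5, 2].Value = "some other value";

            package.SaveAs(new FileInfo("newExcelSheet.xlsx"));
        }
    }
}
```

In an MVC action, package.GetAsByteArray() could be passed to File(...) with the xlsx content type instead of saving to disk.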
CSV doesn't support what you ask for, and Interop is officially NOT supported by MS in server scenarios (like ASP.NET).
You will need to create "real" Excel files (XLS or XLSX) - some options to create Excel files:
MS provides the free OpenXML SDK V 2.0 - see http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx
This can read+write MS Office files (including Excel XLSX but not XLS!).
For another option, see http://www.codeproject.com/KB/office/OpenXML.aspx
If you need more, such as rendering or formulas, there are various free and commercial libraries like ClosedXML, EPPlus, Aspose.Cells, SpreadsheetGear, LibXL and Flexcel.
I have a dropdownlist in my application with the items document, excel and powerpoint. When I click on one of them, say excel, an Excel sheet should open so that I can write in it, and I should be able to save the file in a GridView. How can I do that?
Take a look at the NPOI project. From the website:
[N]POI is an open source project which can help you read/write xls,
doc, ppt files. It has a wide application. For example, you can use it
to generate a Excel report without Microsoft Office suite installed on
your server and more efficient than call Microsoft Excel ActiveX at
background; you can also use it to extract text from Office documents
to help you implement full-text indexing feature (most of time this
feature is used to create search engines).
Once you've written a file, you can present it for download in your GridView. Is that what you're after?
In the past, I have created a component to pass and retrieve values to/from excel using the excel libraries. The good thing is that once you have your workbook in memory and modify a cell (let's call it the origin cell) all the other cells with formulas that take this origin cell value are automatically refreshed.
Is this possible in OpenXml?
As far as I can see, this doesn't happen in OpenXml because the Excel engine is not actually running in the background. OpenXml is just a group of classes to serialize, deserialize, and read XML files, right?
That's correct, Office Open XML SDK is just a set of libraries to read/write XML files. It does not have any functionality for performing calculations.
You can specify that Excel should recalculate everything upon load by setting the following attribute, but if you need to read the new values in code (prior to re-opening in Excel) this won't help.
<workbook>
<calcPr fullCalcOnLoad="1"/>
</workbook>
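Or in code with the Office Open XML SDK:
using (var doc = SpreadsheetDocument.Open(path, true)) // open as editable so the change is saved
{
    var workbook = doc.WorkbookPart.Workbook;
    // CalculationProperties is null when the workbook has no <calcPr> element yet
    if (workbook.CalculationProperties == null)
        workbook.CalculationProperties = new CalculationProperties();
    workbook.CalculationProperties.FullCalculationOnLoad = true;
    .
    .
    .
}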