Architecture for creating PDF files from Excel reports - c#

I have one routine that is structured like this:
1. C# console app C opens Excel workbook A.
2. C then runs A's macro M, which saves the target worksheet as a PDF using VBA.
3. C then uses PDFsharp to encrypt the PDF file.
4. C then emails this file.
Currently this procedure is for one report so no problem if the architecture isn't textbook.
I imagine that in the future there may be many target worksheets in many different workbooks, all going to lots of different recipients. If this is the case then Step 2 will need to go, as I will not want to copy this VBA code into every target workbook! My experience is limited, so the only alternative I can imagine is the following:
Take the current VBA code out of Excel and move it into C, using a reference to the Excel interop assembly (Microsoft.Office.Interop.Excel).
Assuming that the target worksheets are the finished article, i.e. no further manipulation is required before going to PDF, is the above the correct approach for moving this step out of VBA and into the console, or should I create the PDF using a different library?

The least-effort option is to move the code from the workbook into a VBA add-in: delete all the sheets and data, then use 'Save As' to turn the workbook into an add-in. Once the add-in is loaded into Excel, the code is available all the time. Depending on how you've written it, some changes may be required, but they won't be as big as a port to C#.
That said, having the code all in one place will make the whole process easier to look after in the future. Plus, you already have C# code automating Excel to fire the VBA, so it may be better to do the port now if you have time.
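If the step is moved into the console app, a minimal sketch using the Excel interop assembly might look like the following. It assumes Excel is installed on the machine, a project reference to Microsoft.Office.Interop.Excel, and C# 4 or later for named and optional arguments. The class and method names are hypothetical.

```csharp
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

public static class PdfExporter
{
    // Hypothetical helper: opens a workbook read-only and saves one sheet as a PDF.
    public static void ExportSheetToPdf(string workbookPath, string sheetName, string pdfPath)
    {
        Excel.Application app = new Excel.Application();
        Excel.Workbook workbook = null;
        try
        {
            app.DisplayAlerts = false; // no prompts in an unattended run
            workbook = app.Workbooks.Open(workbookPath, ReadOnly: true);
            Excel.Worksheet sheet = (Excel.Worksheet)workbook.Worksheets[sheetName];

            // Same call the VBA macro would make on the worksheet.
            sheet.ExportAsFixedFormat(Excel.XlFixedFormatType.xlTypePDF, pdfPath);
        }
        finally
        {
            if (workbook != null) workbook.Close(false); // close without saving
            app.Quit();
            Marshal.ReleaseComObject(app);
        }
    }
}
```

Because the workbook path and sheet name are parameters, the same routine can be reused for every report, and the PDFsharp encryption and email steps can stay as they are.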

Related

Prompted to Save Changes on file created with EPPlus

I am creating a series of Excel Workbooks using EPPlus v3.1.3. When I open the newly created files, if I close it without touching anything it asks me if I want to save my changes. The only thing I've noticed changes if I say "yes" is that the app.xml file is slightly altered - there is no visible difference in the workbook, and the rest of the XML files are the same. I have tried both of these approaches:
ExcelPackage p = new ExcelPackage(new FileInfo(filename));
p.Save();
as well as
ExcelPackage p = new ExcelPackage();
p.SaveAs(new FileInfo(filename));
and both have the same problem. Is there a way to have the app.xml file output in its final form?
The reason this is an issue is because we use a SAS program to QC, and when the SAS program opens the files as they have been directly output from the EPPlus program it doesn't pick up the values from cells that have formulas in them. If it is opened and "yes" is chosen for "do you want to save changes", it works fine. However, as we are creating several hundred of these, that is not practical.
Also, I am using a template. The template appears normal.
What is particularly strange is that we have been using this system for well over a year, and this is the first time we have encountered this issue.
Is there any way around this? On either the C# or SAS side?
What you are seeing is actually not unusual. EPPlus does not generate a full XLSX file. Rather, it creates the raw XML content (all Office 2007 document formats are XML-based) and places it in a zip file which is renamed to XLSX. Since the file has not been run through the Excel engine, it has not been fully formatted to Excel's liking.
If it is a simple data sheet then chances are Excel does not have to do much calculation, just basic formatting. In that case it will not prompt you to save. But even then, if you do save, you will see it change the XLSX file a little. If you really want to see what it is doing behind the scenes, rename the file to .zip and look at the XML files inside before and after.
The problem you are running into arises because the file is not just a simple table export, so Excel has to run calculations when it is opened for the first time. This could be many things: formulas, autofilters, auto column/row height adjustments, outlining, etc. Basically, anything that will make the sheet look a little "different" after Excel gets done with it.
Unfortunately, there is no easy fix for this. Running it through excel's DOM somehow would be simplest which of course defeats the purpose of using EPPlus. The other thing you could do is see the difference between the before and after of the xml files (and there are a bunch in there you would have to look at) and mimic what excel would change/add in the "after" file version by manually editing the XML content. This is not a very pretty option depending on how extensive the changes would be. You can see how I have done it in other situations here:
Create Pivot Table Filters With EPPLUS
Adding a specific autofilter on a column
Set Gridline Color Using EPPlus?
I ran into this same issue using EPPlus (version 4.1.0, fyi) and found adding the following code before closing fixed the problem:
p.Workbook.Calculate();
p.Workbook.FullCalcOnLoad = false;

Strip Excel file of Macros with C#

I've been asked to strip an Excel file of macros, leaving only the data. I've been asked to do this by converting the Excel file to XML and then reading that file back into Excel using C#. This seems a bit inefficient to me and I was thinking that it would be easier to simply load the source Excel file into C# and then create a new target Excel file and add the sheets from the source back into the target.
I don't know where macros live inside an Excel file, so I'm not sure if this would accomplish the task or not. So, will this work? Will simply copying the sheets from one file to another strip it of its macros, or are they actually stored at the worksheet level?
As always, any and all suggestions are welcome, including alternate suggestions or even "why are you even doing this???". :)
To do this programmatically, you can use the ZipFile class from the System.IO.Compression namespace in .NET (available from .NET Framework 4.5).
Rename the file to add a ".zip" extension, and then open the file as a ZIP archive. Look in the resulting "xl" folder for an entry called "vbaProject.bin", and delete it. Remove the .zip extension. Macros gone.
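The ZIP approach can be sketched in C# roughly as follows. This is a minimal sketch, not a complete solution: the class and method names are hypothetical, and it only deletes the xl/vbaProject.bin entry. The package's [Content_Types].xml and xl/_rels/workbook.xml.rels may still refer to the deleted part, so Excel may ask to repair the file unless those references are cleaned up too.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class MacroStripper
{
    // Deletes the VBA project part from a workbook in place.
    // Returns true if the part was found and removed.
    public static bool RemoveVbaProject(string workbookPath)
    {
        // The workbook is already a ZIP archive, so it can be opened
        // directly without renaming it to .zip first.
        using (ZipArchive archive = ZipFile.Open(workbookPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = archive.Entries.FirstOrDefault(e =>
                e.FullName.Equals("xl/vbaProject.bin", StringComparison.OrdinalIgnoreCase));

            if (entry == null)
            {
                return false; // no macros stored in this package
            }

            entry.Delete(); // only allowed in Update mode
            return true;
        }
    }
}
```

On .NET Framework, ZipFile lives in the System.IO.Compression.FileSystem assembly, so the project needs a reference to it as well as to System.IO.Compression.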
Your best bet is to save the workbook as an xlsx, close it, open it, then save as a format of your choice.
This will strip the macros and is robust. It will also work if the VBA is locked for viewing.
Closing and reopening the workbook is necessary otherwise the macros are retained.
If you need to use C# to do this, I agree that it would be easier to load the source Excel file into C# and create a new target file, copying over only the cells and sheets you need. Especially if you're doing this for a large number of Excel files, I would recommend creating a small console app that, when given a workbook, automatically generates a new workbook containing just the data.
One tool that I've found extremely useful and easy to use for such tasks is EPPlus.

Merging two excel files in c# without using interop

I have to merge two excel files containing one sheet in each of them and I have to generate a third file containing two sheets corresponding to the two original sheets.
This task can be done using interop, and the code works, but when the same code is run on a system that does not have MS Office installed, the process fails with an error.
Can you please guide me as to which DLL files need to be included, or how this merging could be done without using interop?
Thanks in advance.
From what I've experienced, there is unfortunately no framework way of doing this (without writing your own excel file reader). I happened across this interesting library which does just that.
http://exceldatareader.codeplex.com/
So far it has worked for our needs and requires no interop.
You should use an external component to work with Excel files. I use Syncfusion XlsIO.
If you only have raw data (no formulas, etc.) you could also just save the files using the XML Spreadsheet 2003 (*.xml) format (it's very easy to read) and process the data using standard XML tools.
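As a rough illustration of the XML Spreadsheet 2003 route, the sketch below merges two such documents with LINQ to XML by appending the second workbook's Worksheet elements to a copy of the first. It assumes both files contain raw data only: the Styles section of the second workbook is not merged, so any style IDs its cells use would need separate handling. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SpreadsheetXmlMerger
{
    // Namespace used by the XML Spreadsheet 2003 format (also the "ss" prefix).
    private static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Returns a new document holding the sheets of both workbooks.
    public static XDocument Merge(XDocument first, XDocument second)
    {
        XDocument merged = new XDocument(first); // deep copy, inputs stay untouched

        // Sheet names must be unique within a workbook (case-insensitive in Excel).
        HashSet<string> usedNames = new HashSet<string>(
            merged.Root.Elements(Ss + "Worksheet")
                  .Select(ws => (string)ws.Attribute(Ss + "Name"))
                  .Where(name => name != null),
            StringComparer.OrdinalIgnoreCase);

        foreach (XElement sheet in second.Root.Elements(Ss + "Worksheet"))
        {
            XElement copy = new XElement(sheet);
            string name = (string)copy.Attribute(Ss + "Name") ?? "Sheet";

            // Rename on a clash: "Sheet1" becomes "Sheet1 (2)", and so on.
            string unique = name;
            for (int i = 2; usedNames.Contains(unique); i++)
            {
                unique = name + " (" + i + ")";
            }

            copy.SetAttributeValue(Ss + "Name", unique);
            usedNames.Add(unique);
            merged.Root.Add(copy);
        }

        return merged;
    }
}
```

Typical use would be `SpreadsheetXmlMerger.Merge(XDocument.Load(pathA), XDocument.Load(pathB)).Save(pathOut)`, where the three paths are placeholders for your own files.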

Access the VBA of a spreadsheet from within another sheet's VBA (or C# console app)?

I have a batch of spreadsheets with lots of VBA code and I wanted to write something which could check for function dependencies. I am changing code and need to find out which files depend on functions I am modifying.
Is there a way to access the VBA modules/functions of spreadsheet X either from a C# console application, or from VBA in spreadsheet Y (just a different spreadsheet to X)?
I want to access the VBA as text/anything parsable.
This page has examples of reading and writing code.
The Reference you need to add is this one:
The two sections you will be most interested in are:
Listing All Procedures In A Module
Reading A Procedure Declaration

excel processing in c#

I am developing a job application. Each job generates an Excel file. I will have 50 parallel jobs, so 50 Excel files will be generated in parallel.
I am using C# 3.5 and Excel 2003. The problem is that I am unable to instantiate more Excel objects; I get a COMException. So do I need to do the Excel processing one at a time? Do you have any solution for this?
Pls help me.
Edit:
The Excel generation doesn't need user interaction. It's a scheduled job.
I am generating Excel (xls) files. I need to do formatting and coloring in Excel, so I can't use CSV. I have now synchronized the code, so only one job processes Excel at a time. But it takes too much time, since only one Excel job is processed at a time.
Would any kind of Excel pooling logic help? Please direct me.
To do this you would need to start 50 instances of Excel, and that would not work.
You have 3 options:
1. Use the Open XML SDK 2.0 for Microsoft Office. This allows you to write to an Excel file as if it were an XML file. No need to start Excel.
2. Use SharePoint Excel Services. This allows you to do server-side Excel processing. No need to start Excel. The problem with this is that the SharePoint version that includes Excel Services is expensive.
3. Process the files one at a time, as you suggested.
I once had this kind of problem.
My solution was to use HTML Excel. Honestly, it's a crude solution, but it works pretty well and is very easy to implement.
1 Create an HTML template a bit like this:
<html>
<body>
name - <div>$<name>$</div>
</body>
</html>
2 Read your template as a string and replace the placeholder (templatePath is a placeholder name for the path to your template file):
string template = System.IO.File.ReadAllText(templatePath);
template = template.Replace("$<name>$", "John");
3 Then save the string as an .xls file.
Creating a template with multiple sheets is also very easy:
1 Create an Excel template like you normally do in Excel.
2 In the cell whose value you want to replace, type $$name$$. Note that above I used $<name>$, but for this approach I suggest changing to $$name$$, because Excel will HTML-encode the < and > characters when it saves.
3 You can create as many sheets as you like.
4 'Save As' .mht.
5 Later, rename file.mht to .xls and call string.Replace("$$name$$", "John") on its contents.
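The replacement step could be sketched as below, assuming the $$name$$ placeholder style and a template that is plain HTML text. The class and method names are hypothetical. The sketch also HTML-encodes each value before inserting it, which the original steps do not mention, so that characters such as & or < in the data do not break the markup.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class HtmlExcelTemplate
{
    // Replaces every $$key$$ placeholder in the template with its encoded value.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> pair in values)
        {
            template = template.Replace(
                "$$" + pair.Key + "$$",
                WebUtility.HtmlEncode(pair.Value));
        }
        return template;
    }

    // Reads the template file, fills it, and writes the result (e.g. to an .xls path).
    public static void FillToFile(string templatePath, string outputPath,
                                  IDictionary<string, string> values)
    {
        File.WriteAllText(outputPath, Fill(File.ReadAllText(templatePath), values));
    }
}
```

Since this is only string processing, each of the 50 jobs can run it in parallel without starting Excel.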
Does the generation of the 50 Excel files require user interaction?
If not, you can create a nightly job that generates the files.
If the output is a plain-text kind of format (or CSV), you don't need Excel.
Also, you can use one Excel instance to generate all the files, one after the other.
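One way to arrange "one after the other" when many jobs run in parallel is to funnel the Excel work through a single worker thread. The sketch below is only an illustration of that idea: the class name is hypothetical, the Excel automation itself is left to the delegate passed in, and it relies on BlockingCollection and Task, which need .NET 4 or later rather than the 3.5 mentioned in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class SingleWorkerQueue : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _worker;

    public SingleWorkerQueue()
    {
        // One dedicated thread drains the queue, so queued jobs never overlap.
        _worker = new Thread(() =>
        {
            foreach (Action item in _work.GetConsumingEnumerable())
            {
                item();
            }
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    // Queues a job; the returned task completes (or faults) when the job has run.
    public Task Enqueue(Action job)
    {
        TaskCompletionSource<bool> done = new TaskCompletionSource<bool>();
        _work.Add(() =>
        {
            try { job(); done.SetResult(true); }
            catch (Exception ex) { done.SetException(ex); }
        });
        return done.Task;
    }

    // Stops accepting work, lets queued jobs finish, then releases resources.
    public void Dispose()
    {
        _work.CompleteAdding();
        _worker.Join();
        _work.Dispose();
    }
}
```

Each of the 50 jobs would do its own data preparation in parallel and call Enqueue only for the part that touches Excel, so the shared instance is the only serialized step.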
