I have been handed a critical macro that takes an old-school file full of invoices, which thankfully is quite consistent. The macro reads this file, moves the data around to make it consistent, and then generates a three-tab spreadsheet, which is pretty much three CSVs. From these three CSVs it then generates another spreadsheet, which has a tab for each invoice. The number of invoices can vary a lot.
It works, everyone is happy. We would like to put this out on the web with some security. For now, have it so that the user:
1) Logs in, uploads the old-school file and presses Process, which will then spit out the same spreadsheet with each tab being an invoice.
2) Store the data in a database for future growth and use of this data, as well as reporting.
I'm teaching myself ASP.NET and C# and think this would be a great learning project. Before I jump into it, can this realistically be done and what would others recommend in this case? Should I simply re-write based off the logic in the macro or is there a way to port over existing VBA code?
You can do it with the Excel COM API, but this tends to lead to memory leaks; I would not recommend it.
Microsoft has Excel Services which allow you to run Excel Spreadsheets on the server. But it is very expensive and may not support Macros.
SpreadsheetGear may be able to do it, but I have not tested it myself.
I would recommend that you rewrite the application in C#. You would get a better solution, and it may not take you any longer than getting the spreadsheet running on the server.
Using the Excel COM API from a web application is difficult. There are security issues which are non-trivial to address. If you wanted to retain the Excel processing, you could build some sort of out-of-band process which monitors an upload directory and, when it detects a new file, kicks off the transformation of the Excel file as the old macro used to.
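As a rough illustration of that out-of-band idea, here is a minimal polling sketch. It is written in Python only for brevity, and the same shape would apply to a C# service. The names `process_upload`, `poll_once` and `watch` are hypothetical, and the "transformation" is a placeholder rather than the real macro logic.

```python
import time
from pathlib import Path


def process_upload(path: Path, out_dir: Path) -> Path:
    """Hypothetical stand-in for the real transformation step."""
    result = out_dir / (path.stem + ".processed")
    # Placeholder work: the real job would build the per-invoice workbook.
    result.write_text(path.read_text().upper())
    return result


def poll_once(upload_dir: Path, out_dir: Path, seen: set) -> list:
    """Process every file in upload_dir not handled yet; return the outputs."""
    outputs = []
    for path in sorted(upload_dir.iterdir()):
        if path.is_file() and path.name not in seen:
            outputs.append(process_upload(path, out_dir))
            seen.add(path.name)
    return outputs


def watch(upload_dir: Path, out_dir: Path, interval: float = 2.0) -> None:
    """Poll forever; meant to run in its own process, separate from the web app."""
    seen = set()
    while True:
        poll_once(upload_dir, out_dir, seen)
        time.sleep(interval)
```

The web application then only has to save the upload into the watched directory and later look for the finished file, so it never talks to Excel itself.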
There is no easy transition from VBA to C#, since all the VBA code assumes the existence of Excel, which may not be the case. However, you can call macros in workbooks using the COM API.
Driving Excel from C# is surprisingly hard to get 100% right. Conversely, driving Excel from a VB6 application is surprisingly easy. But, calling this from a web application makes it harder, since you need to deal both with security and concurrency (2 users at once will trip over each other).
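One common way to keep two users from tripping over each other is to funnel every request through a single worker, so that only one job runs at a time. The sketch below shows that idea in Python for brevity. `SerialWorker` is a hypothetical name, and it illustrates only the queueing, not a fix for the security issues.

```python
import queue
import threading


class SerialWorker:
    """Runs submitted jobs one at a time on a single background thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            func, args, done, box = self._jobs.get()
            try:
                box["result"] = func(*args)
            except Exception as exc:  # hand the error back to the caller
                box["error"] = exc
            finally:
                done.set()

    def submit(self, func, *args):
        """Block the caller until its job has run, then return the result."""
        done, box = threading.Event(), {}
        self._jobs.put((func, args, done, box))
        done.wait()
        if "error" in box:
            raise box["error"]
        return box["result"]
```

Each web request would call something like `worker.submit(convert, path)` on one shared `worker`. Requests then wait in line instead of running the spreadsheet job concurrently, at the cost of throughput.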
Microsoft don't support the use of Excel on the server (apart from Excel Services), so don't expect any help there. SpreadsheetGear is suited to this, but you'd have to pay for it.
You say this would make a good learning project - I'd disagree; it's likely to put you off programming altogether. This particular mix doesn't have a "nice" solution - it's a case of finding the least-unpleasant hack. If you want to learn ASP.NET & C#, I'd say find another pet project.
Related
A client wants to "Web-enable" a spreadsheet calculation -- the user to specify the values of certain cells, then show them the resulting values in other cells.
(They do NOT want to show the user a "spreadsheet-like" interface. This is not a UI question.)
They have a huge spreadsheet with lots of calculations over many, many sheets. But, in the end, only two things matter -- (1) you put numbers in a couple cells on one sheet, and (2) you get corresponding numbers off a couple cells in another sheet. The rest of it is a black box.
I want to present a UI to the user to enter the numbers they want, then I'd like to programmatically open the Excel file, set the numbers, tell it to re-calc, and read the result out.
Is this possible/advisable? Is there a commercial component that makes this easier? Are there pitfalls I'm not considering?
(I know I can use Office Automation to do this, but I know it's not recommended to do that server-side, since it tries to run in the context of a user, etc.)
A lot of people are saying I need to recreate the formulas in code. However, this would be staggeringly complex.
It is possible, but not advisable (and officially unsupported).
You can interact with Excel through COM or the .NET Primary Interop Assemblies, but this is meant to be a client-side process.
On the server side, no display or desktop is available, and any unexpected dialog box (for example) will make your web app hang; your app will behave unreliably.
Also, attaching an Excel process to each request isn't exactly a low-resource approach.
Working out the black box and re-implementing it in a proper programming language is clearly the better (as in "more reliable and faster") option.
Related reading: KB257757: Considerations for server-side Automation of Office
You definitely don't want to be using interop on the server side; it's bad enough using it as a kludge on the client side.
I can see two options:
Figure out the spreadsheet logic. This may benefit you in the long term by making the business logic a known quantity, and in the short term you may find that there are actually bugs in the spreadsheet (I have encountered tons of monster spreadsheets used for years that turn out to have simple bugs in them - everyone just assumed the answers must be right)
Evaluate SpreadSheetGear.NET, which is basically a replacement for interop that does it all without Excel (it replicates a huge chunk of Excel's non-visual logic and IO in .NET)
Although this is certainly possible using ASP.NET, it's very inadvisable. It's un-scalable and prone to concurrency errors.
Your best bet is to analyze the spreadsheet calculations and duplicate them. Now, granted, your business is not going to like the time it takes to do this, but it will (presumably) give them a more usable system.
Alternatively, you can simply serve up the spreadsheet to users from your website, in which case you do almost nothing.
Edit: If your stakeholders really insist on using Excel server-side, I suggest you take a good hard look at Excel Services as @John Saunders suggests. It may not get you everything you want, but it'll get you quite a bit, and should solve some of the issues you'll end up with trying to do it server-side with ASP.NET.
That's not to say that it's a panacea; your mileage will certainly vary. And SharePoint isn't exactly cheap to buy or maintain. In fact, short-term costs could easily be dwarfed by long-term costs if you go the SharePoint route--but it might be the best option to fit a requirement.
I still suggest you push back in favor of coding all of your logic in a separate .NET module. That way you can use it both server-side and client-side. Excel can easily pass calculations to a COM object, and you can very easily publish your .NET library as COM objects. In the end, you'd have a much more maintainable and usable architecture.
Neglecting the discussion whether it makes sense to manipulate an excel sheet on the server-side, one way to perform this would probably look like adopting the
Microsoft.Office.Interop.Excel.dll
Using this library, you can tell Excel to open a Spreadsheet, change and read the contents from .NET. I have used the library in a WinForm application, and I guess that it can also be used from ASP.NET.
Still, consider the concurrency problems already mentioned... However, if the sheet is accessed infrequently, why not...
The simplest way to do this might be to:
Upload the Excel workbook to Google Docs -- this is very clean, in my experience
Use the Google Spreadsheets Data API to update the data and return the numbers.
Here's a link to get you started on this, if you want to go that direction:
http://code.google.com/apis/spreadsheets/overview.html
Let me be more adamant than others have been: do not use Excel server-side. It is intended to be used as a desktop application, meaning it is not intended to be used from random different threads, possibly multiple threads at a time. You're better off writing your own spreadsheet than trying to use Excel (or any other Office desktop product) from a server.
This is one of the reasons that Excel Services exists. A quick search on MSDN turned up this link: http://blogs.msdn.com/excel/archive/category/11361.aspx. That's a category list, so contains a list of blog posts on the subject. See also Microsoft.Office.Excel.Server.WebServices Namespace.
It sounds like you're saying that the user has the spreadsheet open on their local system, and you want a web site to manipulate that local spreadsheet?
If that's the case, you can't really do that. Even Office automation won't help, unless you want to require them to upload the sheet to the server and download a new altered version.
What you can do is create a web service to do the calculations and add some VBA or VSTO code to the Excel sheet to talk to that service.
I have a program which watches a folder on the server. When new files (flat files) come in, the program (C#) reads the data and bulk inserts it into a table. It works fine.
Now we are extending the system. This means the data files could be in different formats (flat file, CSV, TXT, Excel, ...) or have different columns (which we need to map to the columns in the table).
My question is: is C# the best choice for this, or is SSIS a better choice?
Thanks
I wouldn’t necessarily choose one or the other, but would choose depending on the file type and the amount of processing. For some file types it’s probably easier to go with C#, and for others SSIS works better.
Do you have someone on your team who is good with SSIS? It’s much easier to find a C# dev to do the job for you than to find someone who knows SSIS.
How likely is it that the requirements/formats are going to be updated in the future? That’s also an important thing to keep in mind.
I do agree with what others said, that SSIS is more powerful and offers support for more complex transformations, but the question is: do you really need this?
It depends on your context. Having different formats should not by itself push the decision toward SSIS. With the C# program: you can continue with it, because it has been running stably. It is easy to deploy, specific to your domain, and easy to configure as well.
With SSIS: the configuration is more complicated and requires a developer with deep knowledge of SSIS. The administration cost is higher than for the C# program. However, it is easy to visualize (it has a diagram that makes the integration flow easier to see).
From my viewpoint, if the integration process does not involve complicated business rules, you should go with the C# program. Otherwise, SSIS is more powerful if the integration process requires complicated rules. Hope this helps.
In the C# application I guess you are using the SqlBulkCopy component, and compared to SSIS it's not that powerful. So if your data size becomes huge, the C# application will become slower.
If you are familiar with SSIS, my suggestion is to go with SSIS. In SSIS, you can implement the same end-to-end solution you have developed in C#, right from checking for files in a specific folder to loading the data into the database.
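To make the "easy to configure" point concrete, the column mapping can live in data rather than in code. The sketch below is in Python for brevity, and the same structure ports directly to C#. The source names (`vendor_a`, `vendor_b`), column names and `normalize` function are all hypothetical, and only delimited text is handled here.

```python
import csv
import io

# Hypothetical target table columns, in insert order.
TARGET_COLUMNS = ["customer_id", "amount", "invoice_date"]

# Hypothetical per-source mappings: source column name -> target column name.
MAPPINGS = {
    "vendor_a": {"CustID": "customer_id", "Total": "amount", "Date": "invoice_date"},
    "vendor_b": {"customer": "customer_id", "amt": "amount", "dt": "invoice_date"},
}


def normalize(text, source, delimiter=","):
    """Parse delimited text and return rows as lists in target column order."""
    mapping = MAPPINGS[source]
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        # Rename the source columns, then emit values in table order.
        row = {target: record[src] for src, target in mapping.items()}
        rows.append([row[col] for col in TARGET_COLUMNS])
    return rows
```

With this shape, supporting a new file layout means adding one entry to `MAPPINGS`. The bulk insert step stays unchanged because it always receives rows in the same column order.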
My friend said he was going to create an application inside of Excel. I told him that maybe he meant macros but he seemed convinced he could create a typical CRUD application INSIDE of Excel.
Is this true?
You're both right. You can use VBA inside Excel and some form functionality to create a fully functional CRUD process with a UI inside of Excel, and you could persist that data to your workbook or to some other storage area (text, XML, Access, another DBMS). It would not be a full application, per se, as it is limited to running inside of the Excel app, but it would be something more than a simple macro of "do these pre-defined steps in order."
Sure. Why would you want to?
The short answer is that using VBA, you can create background worker methods that can interface with other Office apps, or with .NET/COM code. However, if you want to add complex business logic to an Excel presentation layer, my first thought would be to create the application in C#, and use the .NET Framework wrappers for Office interop. The first advantage is that you use Excel SOLELY for presentation, supporting an MVC-ish software design. Second, you keep the code where you expect to find it; in code, not embedded in a document.
You might use Excel/VBA because:
You have VBA--a fully-loaded programming language (though the OO needs work).
Scalar functions are overloaded to work with arrays.
A decent IDE and debug facility.
Excel provides a rich event-driven platform and extends VBA's capabilities with spreadsheet behaviour that "just happens" but would take a lot of coding in a conventional language.
Form widgets that you can put anywhere, not just on a form.
Simple but adequate vector graphics.
Charts, charts and more charts--all dynamic.
Automatic persistence or, if it's called for, interfaces to just about every file and database medium, including XML and cloud services.
Relational tables are a native structure.
If it weren't past midnight, I'm sure I could think of some more good reasons, but hey....
Sure you can: use VBA to populate cells with data from a DB, and when the cells change values, update the database.
But why you would is the bigger question here.
It is true. VBA can summon COM, which can do pretty powerful things. I used an Excel file for receiving reports built by a macro inside it that searches many remote databases to group and aggregate information. You can modify the registry, make it run programs, make it restart the PC, show messages, create and edit files, make it use Word or Access, call .NET functionality. Anything that doesn't require complex rendering of something.
I'm working on an application that generates a relatively large amount of Word output. Currently, we're using Word Interop services to do the document creation, but it's quite slow, especially in older (pre-2007) versions of Office. We'd like to speed up the generation.
I haven't done a lot of profiling yet, but I'm pretty confident that the problem is that we're making tons of COM calls. I'm hoping that profiling will yield a subset of calls that are slower than the others, but my gut tells me that it's probably a question of COM overhead (or Word Interop overhead), and not just a few slow calls.
Also, the product can generate HTML output, and that process (a) is very fast, and (b) uses pretty much the same codepaths, just with a different subclass for the HTML-specific pieces of functionality. So I'm pretty sure that our algorithm isn't fundamentally slow.
So... I'm looking for suggestions for alternate ways to accelerate the generation of Word files.
We can't just rename the generated HTML files to .doc, and we can't generate RTF instead -- in both cases, important formatting information gets lost, and in the RTF case, inlined graphics don't work robustly.
One of the approaches we're evaluating is programmatically generating and opening a Word file (via interop) from a template that has a macro that knows how to consume a flat file and create the requisite output. We're interested in feedback about that approach, as well as any other ideas for speeding things up.
If you can afford it, I'd recommend the Aspose.Words product. It is very fast, and Word does not need to be installed.
Also, it's much easier to use than Office interop.
Your macro approach is exactly how we sped up slow Excel interop (using version 2003, I think).
We found (at least with Excel) that much of the slowness was due to repeated individual calls via the interop. We started to bunch up commands (i.e. format large ranges and then change specific cells as required, rather than formatting each cell individually), and logically moved on to macros.
I think that the macro + template approach would happily translate.
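The effect of bunching up commands is easy to see by counting calls across the boundary. The Python sketch below uses a hypothetical `CountingSheet` stand-in, not a real interop object, which does nothing but count the calls made to it. It shows the call pattern only and makes no claim about actual timings.

```python
class CountingSheet:
    """Stand-in for an interop sheet object; it only counts calls made to it."""

    def __init__(self):
        self.calls = 0
        self.bold = set()

    def set_bold_cell(self, row, col):
        self.calls += 1  # one boundary crossing per cell
        self.bold.add((row, col))

    def set_bold_range(self, first_row, first_col, last_row, last_col):
        self.calls += 1  # one boundary crossing for the whole block
        for r in range(first_row, last_row + 1):
            for c in range(first_col, last_col + 1):
                self.bold.add((r, c))


def format_per_cell(sheet, rows, cols):
    """Format every cell with its own call."""
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            sheet.set_bold_cell(r, c)


def format_batched(sheet, rows, cols):
    """Format the same block with a single range call."""
    sheet.set_bold_range(1, 1, rows, cols)
```

For a 100 x 10 block, `format_per_cell` makes 1000 calls and `format_batched` makes 1, with the same cells formatted in both cases. If each call carries a fixed overhead, that difference dominates the run time.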
I find it quite unbelievable that the interop API is such a mess.
A lot of methods have no comments and seem to have been done very poorly.
Has anyone else experienced the same, and if so, what library do you use to control Excel from C#?
The obvious practical problem with the VSTO/COM Interop technology is the overhead incurred when transitioning between worksheet and managed code. (And if you're trying to talk to Excel without the help of VSTO, stop doing so and save yourself some huge headaches). I thought VSTO did a pretty good job of providing a close analog of the Excel object model in the managed environment - certainly I didn't need to spend much time trying to understand much more about .NET Interop.
For longer-running automation activities the overhead's not so much of a problem. The same concerns as with VBA automation apply: reduce calls across the interface as far as possible to get best performance.
For smaller, faster worksheet function-type work (the sort of thing where we might write an XLL, say) that overhead can be a killer. ExcelDNA seems to be a great way into delivering managed code through the XLL model - and the price is right.
SpreadsheetGear for .NET is an Excel compatible spreadsheet component for .NET. It will not enable you to control Excel, but it will give you an Excel compatible spreadsheet engine for ASP.NET / WinForms / etc... that can create, read, modify, view, edit, format, calculate, print and write Excel workbooks and charts from .NET. Since SpreadsheetGear is 100% safe managed code, there is no per-call performance penalty like you get with Excel.
The SpreadsheetGear API is very similar to Excel's API - except for the fact that many APIs are more strongly typed so they tend to be easier to use from C# than Excel's API.
You can see a feature list here, live ASP.NET reporting / charting / dashboard / calculation samples for VB and C# here and download the free trial here.
Disclaimer: I own SpreadsheetGear LLC