Quickest way to import data from a database into Excel - C#

I am creating a C# plugin for Excel and have noticed that there seem to be many ways to import data from a database into a sheet in Excel.
I am targeting Excel 2010 and wondered whether anyone has already done this research and knows what the quickest way to load the data is.
I can already guess that anything crossing the COM boundary is going to be slow, so I have to minimize that. I can stick all the data into one 2D array and load it that way. Loading 0.5 million rows with 10 columns takes around 5.5 seconds (assuming I already have all the data in the array). I don't know whether that is good or bad.
...but like I said, there are a lot of ways to get the data in, and I would like to use the fastest.
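For reference, the single-assignment interop write described above can be sketched like this (illustrative only; it assumes a reference to Microsoft.Office.Interop.Excel and an already-open worksheet):

```csharp
using Excel = Microsoft.Office.Interop.Excel;

// One Range.Value2 assignment crosses the COM boundary once,
// instead of once per cell.
static void WriteBlock(Excel.Worksheet sheet, object[,] data)
{
    int rows = data.GetLength(0);
    int cols = data.GetLength(1);

    Excel.Range target = sheet.Range[
        sheet.Cells[1, 1],
        sheet.Cells[rows, cols]];
    target.Value2 = data;
}
```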

Have you ever tried SqlBulkCopy?

Create a database query in the Excel sheet, specifying a connection string, target range and query string. Have the query executed from within Excel.
See http://www.dicks-clicks.com/excel/ExternalData3.htm for an example.
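The same in-sheet query can also be set up from code via interop with QueryTables.Add; the connection string, destination cell and SQL below are placeholders:

```csharp
using Excel = Microsoft.Office.Interop.Excel;

// Attach a database query to the sheet itself; Excel pulls the data
// when the QueryTable is refreshed.
static void AddQuery(Excel.Worksheet sheet)
{
    Excel.QueryTable qt = sheet.QueryTables.Add(
        Connection: "OLEDB;Provider=SQLOLEDB;Data Source=.;Initial Catalog=Sales;Integrated Security=SSPI",
        Destination: sheet.Range["A1"],
        Sql: "SELECT * FROM dbo.Prices");
    qt.Refresh(BackgroundQuery: false);   // synchronous refresh
}
```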

Related

C# Microsoft.Data.Analysis Dataframe to SQL Server

I want to load my Microsoft.Data.Analysis DataFrame into a SQL Server table. I somehow learnt that I should use Entity Framework for that, but I haven't found anything similar to Python's pandas.DataFrame.to_sql() method from SQLAlchemy. Is there an easy way to achieve that?
I've already tried Googling this, of course, but sadly that did not lead to any results. Is it possible at all?
Thanks in advance for any help, and have a lovely day.
Right now, no. The Microsoft.Data.Analysis namespace is somewhat ... aspirational and can't even be used to load data from a database. It's an attempt to create something like Pandas Dataframes in the future and has nothing at all to do with Entity Framework.
If you want a DataFrame-like type in .NET, check the Deedle library which is used in F# data analysis programming.
Another option is to keep using Python, or learn Python, Pandas and Notebooks. Even Visual Studio Code and Azure Data Studio offer better support for Pandas and Notebooks than Microsoft.Data.Analysis.
The problem is that until recently Microsoft put all its effort into ML, not analysis. And Microsoft.Data.Analysis is part of the ML repository, so it has received little attention since its introduction 2 years ago.
This changed in March 2022, when the DataFrame (Microsoft.Data.Analysis) Tracking Issue was created to track and prioritize what programmers wanted from a .NET DataFrame type. The request for loading from a database has been open for 2 years without progress.
Loading from SQL
If you want to use Microsoft.Data.Analysis.DataFrame right now you'll have to write code similar to the CSV loading code:
Create a list of DataFrameColumns from a DataReader's schema. This can be retrieved with DbDataReader.GetSchemaTable.
Create a DataFrame with those columns
For each row, append the list of values to the DataFrame. The values can be retrieved with DbDataReader.GetValues.
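The three steps above could be sketched as follows. This is an assumption-laden sketch: it needs the Microsoft.Data.Analysis package and any DbDataReader (e.g. from SqlClient), and the type mapping is reduced to string vs. double, where a real loader would cover the full schema:

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using Microsoft.Data.Analysis;

static DataFrame Load(DbDataReader reader)
{
    // 1. One DataFrameColumn per field in the reader's schema.
    var columns = new List<DataFrameColumn>();
    for (int i = 0; i < reader.FieldCount; i++)
    {
        string name = reader.GetName(i);
        columns.Add(reader.GetFieldType(i) == typeof(string)
            ? (DataFrameColumn)new StringDataFrameColumn(name, 0)
            : new PrimitiveDataFrameColumn<double>(name, 0));
    }

    // 2. A DataFrame built from those columns.
    var df = new DataFrame(columns);

    // 3. Append each row's values.
    var values = new object[reader.FieldCount];
    while (reader.Read())
    {
        reader.GetValues(values);
        df.Append(values, inPlace: true);
    }
    return df;
}
```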
Loading from Excel
The same technique can be used if the Excel file is loaded using a library like ExcelDataReader that exposes the data through a DataReader. The library doesn't implement all methods though, so some tweaking may be needed if, e.g., GetValues doesn't work.
Writing to SQL
That's harder because you can't just send a stream of rows to a table. A DataFrame is a collection of columns (Series), not rows. You'd have to construct the INSERT SQL command from the column names, then execute it with data from each row.
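Building the command text from the column names is plain string work; the table and column names below are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Build a parameterized INSERT from a list of column names.
static string BuildInsert(string table, IReadOnlyList<string> columns)
{
    string cols = string.Join(", ", columns.Select(c => $"[{c}]"));
    string pars = string.Join(", ", columns.Select((c, i) => $"@p{i}"));
    return $"INSERT INTO [{table}] ({cols}) VALUES ({pars})";
}
```

Executed per row with the row's values bound to @p0, @p1, and so on.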
A dirty way would be to convert the DataFrame into a DataTable and save that using a DataAdapter. That would create yet another copy of the data in memory though.
A better way would be to create a DataReader wrapper over a DataFrame and pass that reader to, e.g., SqlBulkCopy.
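A rough sketch of that wrapper idea, showing only the members SqlBulkCopy actually drives (FieldCount, Read, GetValue); a complete class would have to implement the full IDataReader interface, with the unused members throwing NotSupportedException:

```csharp
using System;
using Microsoft.Data.Analysis;

sealed class DataFrameDataReader /* : IDataReader */
{
    private readonly DataFrame _df;
    private long _row = -1;

    public DataFrameDataReader(DataFrame df) => _df = df;

    public int FieldCount => _df.Columns.Count;
    public bool Read() => ++_row < _df.Rows.Count;
    public object GetValue(int i) => _df.Columns[i][_row] ?? DBNull.Value;
    public string GetName(int i) => _df.Columns[i].Name;
    // ...remaining IDataReader members omitted; each would throw...
}

// Usage sketch (connection string and table name are placeholders):
// using (var bulk = new SqlBulkCopy(connectionString))
// {
//     bulk.DestinationTableName = "dbo.Prices";
//     bulk.WriteToServer(new DataFrameDataReader(df));
// }
```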

Create an Excel PivotTable from a DataTable

I have a webservice that hosts a large mass of pricing data and returns the data relevant to some prescribed query parameters. The data comes back as a DataTable object (in C#) - the object type itself doesn't matter so much as the fact that the data goes directly into memory and is not on a spreadsheet in the host Excel object.
Now, I want to create a PivotTable off of this data.
I've been looking high and low on the web, and I can't see anyone explaining how to do this. Is it impossible? It seems foolish to suggest VSTO as the only supported way of consuming webservice data going forward, yet make PivotTables off of that data impossible.
The only solutions I have are kludges, and I want to make sure there isn't a graceful solution before I do one of these ugly things:
Dump the DataTable to an Excel sheet and point the PivotTable at an Excel range. This is far from ideal because I'm either doing row-wise deletion over the entire dataset (slow as heck) or peaking at 2x memory consumption.
Dump the DataTable to the filesystem and point the PivotTable at a flat file. This is even worse, but at least it doesn't have the memory drawback.
Are these really the only ways to do this operation? There has to be something more graceful.
DataTable: http://msdn.microsoft.com/en-us/library/system.data.datatable.aspx
PivotCache: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.pivottable.pivotcache(v=office.11).aspx
Excel has to be able to see and access the data to make a PivotTable from it. So you have to make sure that the data is someplace that the PivotTable loader can read. Further, Excel is COM-based and can neither see nor process .NET objects.
It's pretty much just that simple.
Your choices are:
Load the data into an Excel range
Save the data to a file
Store the data into a database (Access, SQL Server, etc.)
Store the data in a data warehouse (SSAS, offline Cube, etc.)
That's it. The only other remotely possible way would be to implement the COM interfaces necessary to present as an OLE DB or an ODBC data source, but that would be one heck of a lot of work.
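The first choice (range-backed PivotTable) can be sketched via interop like this; the sheet layout, destination cell and PivotTable name are placeholders:

```csharp
using Excel = Microsoft.Office.Interop.Excel;

static void BuildPivot(Excel.Workbook book, Excel.Worksheet sheet,
                       System.Data.DataTable table)
{
    // 1. Dump the DataTable (with a header row) into the sheet in one COM call.
    var values = new object[table.Rows.Count + 1, table.Columns.Count];
    for (int c = 0; c < table.Columns.Count; c++)
        values[0, c] = table.Columns[c].ColumnName;
    for (int r = 0; r < table.Rows.Count; r++)
        for (int c = 0; c < table.Columns.Count; c++)
            values[r + 1, c] = table.Rows[r][c];

    Excel.Range source = sheet.Range[
        sheet.Cells[1, 1],
        sheet.Cells[table.Rows.Count + 1, table.Columns.Count]];
    source.Value2 = values;

    // 2. Point a PivotCache at the range and create the PivotTable.
    Excel.PivotCache cache = book.PivotCaches().Create(
        Excel.XlPivotTableSourceType.xlDatabase, source);
    cache.CreatePivotTable(
        sheet.Cells[1, table.Columns.Count + 2], "PricingPivot");
}
```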

Excel Data Processing with VSTO?

I find myself in possession of an Excel spreadsheet containing about 3,000 rows of data that represent either additions or changes that I need to make to a SQL table. As you can imagine, that's a bit too much to handle manually. For a number of reasons beyond my control, I can't simply use an SSIS package or another simpler method to get these changes into the database. The only option I have is to create SQL scripts that will apply the changes represented in the spreadsheet to MS SQL 2005.
I have absolutely no experience with Office automation or VSTO. I've tried looking online, but most of the tutorials I've seen seem a bit confusing to me.
So, my thought is that I'd use .NET and VSTO to iterate through the rows of data (or use LINQ, whatever makes sense) and determine if the item involved is an insert or an update item. There is color highlighting in the sheet to show the delta, so I suppose I could use that or I could look up some key data to establish if the entry exists. Once I establish what I'm dealing with, I could call methods that generate a SQL statement that will either insert or update the data. Inserts would be extremely easy, and I could use the delta highlights to determine which fields need to be updated for the update items.
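The iteration described above might be sketched like this (interop; the column positions, the white-fill highlight test, and the table and column names are all illustrative assumptions, and real code would parameterize the values rather than concatenate them):

```csharp
using System;
using Excel = Microsoft.Office.Interop.Excel;

static void GenerateSql(Excel.Worksheet sheet)
{
    Excel.Range used = sheet.UsedRange;
    int lastCol = used.Columns.Count;

    for (int r = 2; r <= used.Rows.Count; r++)          // row 1 = headers
    {
        Excel.Range key = (Excel.Range)used.Cells[r, 1];
        Excel.Range name = (Excel.Range)used.Cells[r, 2];

        // Delta rows are highlighted; 16777215 is the color of a
        // plain white / no-fill cell.
        bool isUpdate = Convert.ToDouble(key.Interior.Color) != 16777215d;

        string sql = isUpdate
            ? $"UPDATE Items SET Name = '{name.Value2}' WHERE Id = {key.Value2}"
            : $"INSERT INTO Items (Id, Name) VALUES ({key.Value2}, '{name.Value2}')";

        // Drop the generated SQL into the first free cell of the row.
        ((Excel.Range)used.Cells[r, lastCol + 1]).Value2 = sql;
    }
}
```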
I would be fine with either outputting the SQL to a file, or even adding the text of the SQL for a given row in the final cell of that row.
Any direction to some sample code, examples, how-tos or whatever would lead me in the right direction would be most appreciated. I'm not picky. If there's some tool I'm unaware of or a way to use an existing tool that I haven't thought of to accomplish the basic mission of generating SQL to accomplish the task, then I'm all for it.
If you need any other information feel free to ask.
Cheers,
Steve
I suggest that before trying VSTO, you keep things simple and get some experience solving such a problem with Excel VBA. IMHO that is the easiest way of learning the Excel object model, especially because you have the macro recorder at hand. You can re-use this knowledge later when you think you have to switch to C#, VSTO, Automation or (better!) Excel-DNA.
For Excel VBA, there are lots of tutorials out there, here is one:
http://www.excel-vba.com/excel-vba-contents.htm
If you need to know how to execute arbitrary SQL commands like INSERT or UPDATE within a VBA program, look into this SO post:
Excel VBA to SQL Server without SSIS
Here is another SO post showing how to get data from an SQL server into an Excel spreadsheet:
Accessing SQL Database in Excel-VBA

Excel file parsing/scraping using .NET

Hi experts, I am trying to parse an Excel file. Its structure is very complex. The possible ways I know of are:
Use Office interop libraries
Use the OLEDB provider and read the Excel file into a DataSet.
But the issue is the file's complexity: some columns, cells or rows are blank, etc.
What are the best possible ways to do this?
Thanks in advance.
I can recommend the ExcelDataReader (licensed under LGPL I think). It loads both .xls and .xlsx files, and lets you get the spreadsheet as a DataSet, with each worksheet being an individual DataTable. As far as I know from the scenarios I have used it in, it honours blank rows, empty cells, etc. Try it and see if you think it will handle your "very complex" structure. [I do notice one negative review on the site - but the rest are pretty positive. I've experienced an issue reading .xlsx if a worksheet is renamed]
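A sketch of typical ExcelDataReader usage from that era (CreateBinaryReader for .xls, CreateOpenXmlReader for .xlsx; the classic releases used the Excel namespace, and the file name here is a placeholder):

```csharp
using System;
using System.Data;
using System.IO;
using Excel; // ExcelDataReader's classic namespace

using (var stream = File.Open("prices.xls", FileMode.Open, FileAccess.Read))
using (IExcelDataReader reader = ExcelReaderFactory.CreateBinaryReader(stream))
{
    DataSet ds = reader.AsDataSet();      // one DataTable per worksheet
    DataTable first = ds.Tables[0];
    Console.WriteLine("{0} rows x {1} columns",
        first.Rows.Count, first.Columns.Count);
}
```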
I've also used the OLEDB approach in the past, but be warned that this has real problems in the way it tries to infer datatypes in the first few rows. If the datatype changes for a column, then this may well infer it wrongly. To make matters worse, when it does get it wrong, it will often return null as the value, making it difficult (or impossible) to tell a true null value from a datatype that changed after the first six or seven rows.
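For what it's worth, the OLEDB route looks like the sketch below. Setting IMEX=1 in the extended properties asks the Jet/ACE provider to treat mixed-type columns as text, which blunts (but does not fully fix) the inference problem described above; the path and sheet name are placeholders:

```csharp
using System;
using System.Data;
using System.Data.OleDb;

string cs = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\book.xlsx;" +
            "Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\";";
using (var conn = new OleDbConnection(cs))
using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn))
{
    var table = new DataTable();
    adapter.Fill(table);   // Fill opens and closes the connection itself

    foreach (DataRow row in table.Rows)
        if (row.IsNull(0))
        {
            // A blank or type-mismatched cell arrives as DBNull.
        }
}
```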
Personally, I prefer to either use the OLEDB way, which is a bit clunky at times, or a third-party library that has put in the time/effort/energy to get access to the data.
SyncFusion has a pretty nice library for this.
I have my users first save the Excel spreadsheet as a CSV file. Then they upload the CSV file to my app. That makes it much simpler to parse.
I've used OLEDB myself to read uploaded Excel files, and it presents no real problems (except for nulls in fields instead of blanks, which can be checked with IsDBNull). Also, third-party open source tools like NPOI and Excel2007ReadWrite (http://www.codeproject.com/KB/office/OpenXML.aspx) can be useful.
I have thoroughly evaluated both of these third party tools, and both are pretty stable and easy to integrate. I would recommend NPOI for Excel 2003 files, and Excel2007ReadWrite for Excel 2007 files.
It sounds like you have a good understanding of the task at hand. You'll have to write business logic to untangle the complexities of the spreadsheet format and extract the data you're looking for.
It seems to me that VTSO/Interop is the best platform strategy for 2 reasons:
Access to the spreadsheet data will be a small part of the effort needed for your solution. So if using OLEDB saves a little time in data access, it will probably be irrelevant in terms of the overall project scope.
You may need to examine the contents of individual cells closely and take context information like formatting into account. With interop, you get full visibility of cell contents, context, and other sheet level context information like named ranges and lists. It is a risk to assume you won't need this type of information while decoding the spreadsheet.

How can I save large amounts of data in C#?

I'm writing a program in C# that will save lots of data points and then later make a graph. What is the best way to save these points?
Can I just use a really long array or should I use a text file or excel file or something like that?
Additional information: It probably wont be more than a couple thousand. And it would be good if I could access it from a windows mobile app. Basically a user will be able to save times that something happens at, and then the app will use the data to find a cross correlation.
If it's millions or even thousands of records, I would probably look at using a database. You can get SQL Server 2008 Express for free, or use MySQL, or something like that.
If you go that route, LINQ to SQL makes database access a piece of cake in .NET. Entity Framework is also available, but LINQ to SQL probably has a quicker time-to-implement.
If you use a text file or Excel file, etc., you'll still need to load it back into memory to plot the graph.
So if you're collecting data over a long period of time, or you want to plot the graph some time in the future, write them to a plain text file. When you're ready to plot the graph, load the file up and plot the graph.
If the data collection is within a short period of time, don't bother writing to a file - it'll just add steps to the process for nothing.
A really easy way of doing this would be to serialize your object list with something like XmlSerializer or BinaryFormatter, which automatically formats your data into a readable and writable form, so that when your program needs to load the data, all you have to do is deserialize it (one line of code).
Alternatively, if you have very many records, I suggest trying to use a database. It's quite easy to interface C# with SQL Server (there's a free version called Express Edition) or MySQL, and storing and retrieving huge amounts of data is not a pain. This would be the most efficient way to accomplish your task.
Depending on how much data you have and whether you want to accomplish this with one line of code (serialization) or interface with a separate product (the database approach), you can choose either of the above. Of course, if you wanted to, you could just manually write the contents of your data to a text file or CSV file, as you suggested, but from personal experience I recommend the methods explained above.
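The serialization route, sketched with XmlSerializer (BinaryFormatter works the same way for a compact binary format; the file name and point type here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

var points = new List<double> { 1.5, 2.25, 3.0 };
var serializer = new XmlSerializer(typeof(List<double>));

// Save the whole list in one call...
using (var output = File.Create("points.xml"))
    serializer.Serialize(output, points);

// ...and load it back with one call.
List<double> loaded;
using (var input = File.OpenRead("points.xml"))
    loaded = (List<double>)serializer.Deserialize(input);
```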
It probably won't be more than a couple thousand. And it would be good if I could access it from a Windows Mobile app. Basically, a user will be able to save the times at which something happens, and then the app will use the data to find a cross-correlation.
Is there any need for interoperability with other processes? If so, it's time to swot up on file formats.
However, from the sound of it, you're asking a matter of "style", with no real requirement to open the file anywhere but your own app. I'd suggest using a BinaryWriter for the task.
If debugging is an issue, a human-readable format might be preferable, but would be considerably larger than the binary equivalent.
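A minimal sketch of that BinaryWriter route for the timestamp data described in the question (the file path and record layout - a count, then one 64-bit tick value per event - are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static void Save(string path, IList<DateTime> events)
{
    using (var writer = new BinaryWriter(File.Create(path)))
    {
        writer.Write(events.Count);
        foreach (DateTime e in events)
            writer.Write(e.Ticks);       // 8 bytes per event
    }
}

static List<DateTime> Load(string path)
{
    using (var reader = new BinaryReader(File.OpenRead(path)))
    {
        int n = reader.ReadInt32();
        var result = new List<DateTime>(n);
        for (int i = 0; i < n; i++)
            result.Add(new DateTime(reader.ReadInt64()));
        return result;
    }
}
```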
Probably the quickest way to do it would be using binary serialization.