I find myself in possession of an Excel spreadsheet containing about 3,000 rows of data, each representing either an addition or a change that I need to make to a SQL table. As you can imagine, that's a bit too much to handle manually. For a number of reasons beyond my control, I can't simply use an SSIS package or another simpler method to get these changes into the database. The only option I have is to create SQL scripts that will apply the changes represented in the spreadsheet to MS SQL 2005.
I have absolutely no experience with Office automation or VSTO. I've tried looking online, but most of the tutorials I've seen seem a bit confusing to me.
So, my thought is that I'd use .NET and VSTO to iterate through the rows of data (or use LINQ, whatever makes sense) and determine if the item involved is an insert or an update item. There is color highlighting in the sheet to show the delta, so I suppose I could use that or I could look up some key data to establish if the entry exists. Once I establish what I'm dealing with, I could call methods that generate a SQL statement that will either insert or update the data. Inserts would be extremely easy, and I could use the delta highlights to determine which fields need to be updated for the update items.
I would be fine with either outputting the SQL to a file, or even adding the text of the SQL for a given row in the final cell of that row.
Any direction to some sample code, examples, how-tos or whatever would lead me in the right direction would be most appreciated. I'm not picky. If there's some tool I'm unaware of or a way to use an existing tool that I haven't thought of to accomplish the basic mission of generating SQL to accomplish the task, then I'm all for it.
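To make the plan concrete, here is a minimal C# sketch of the statement-generation step itself. The table name, key column, and the way values arrive are all hypothetical placeholders, and reading the cells via VSTO/interop is left out; string values get embedded quotes doubled, while dates and decimals would need culture-invariant formatting on top of this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Render a value as a T-SQL literal; strings get embedded quotes doubled.
static string Lit(object v) =>
    v == null ? "NULL"
    : v is string s ? "'" + s.Replace("'", "''") + "'"
    : Convert.ToString(v);

// One INSERT per new row, all columns.
static string InsertSql(string table, IList<KeyValuePair<string, object>> row) =>
    $"INSERT INTO {table} ({string.Join(", ", row.Select(c => c.Key))}) " +
    $"VALUES ({string.Join(", ", row.Select(c => Lit(c.Value)))});";

// One UPDATE per changed row, restricted to the delta-highlighted columns.
static string UpdateSql(string table, string keyCol, object keyVal,
                        IList<KeyValuePair<string, object>> changed) =>
    $"UPDATE {table} SET " +
    string.Join(", ", changed.Select(c => $"{c.Key} = {Lit(c.Value)}")) +
    $" WHERE {keyCol} = {Lit(keyVal)};";

Console.WriteLine(InsertSql("Customers",
    new List<KeyValuePair<string, object>> {
        new("Id", 1), new("Name", "O'Brien")
    }));
// → INSERT INTO Customers (Id, Name) VALUES (1, 'O''Brien');
```

Each generated statement can then be appended to a script file, or written into the row's last cell as described.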
If you need any other information feel free to ask.
Cheers,
Steve
I suggest that before trying VSTO, you keep things simple and get some experience solving this kind of problem with Excel VBA. IMHO that is the easiest way of learning the Excel object model, especially because you have the macro recorder at hand. You can re-use this knowledge later if you decide you have to switch to C#, VSTO, Automation, or (better!) Excel DNA.
For Excel VBA, there are lots of tutorials out there, here is one:
http://www.excel-vba.com/excel-vba-contents.htm
If you need to know how to execute arbitrary SQL commands like INSERT or UPDATE within a VBA program, look into this SO post:
Excel VBA to SQL Server without SSIS
Here is another SO post showing how to get data from an SQL server into an Excel spreadsheet:
Accessing SQL Database in Excel-VBA
I want to load my Microsoft.Data.Analysis DataFrame into a SQL Server table. I somehow learnt that I should use Entity Framework for that, but I haven't found a solution similar to Python's pandas DataFrame.to_sql() method (which uses SQLAlchemy). Is there an easy way to achieve that?
I've already tried Googling this, of course, but sadly that did not lead to any results. Is it possible at all?
Thanks in advance for any help and have a lovely day
Right now, no. The Microsoft.Data.Analysis namespace is somewhat ... aspirational and can't even be used to load data from a database. It's an attempt to create something like Pandas Dataframes in the future and has nothing at all to do with Entity Framework.
If you want a DataFrame-like type in .NET, check out the Deedle library, which is used for data analysis programming in F#.
Another option is to keep using Python, or learn Python, Pandas and Notebooks. Even Visual Studio Code and Azure Data Studio offer better support for Pandas and Notebooks than Microsoft.Data.Analysis.
The problem is that until recently Microsoft put all its effort into ML, not analysis. And Microsoft.Data.Analysis is part of the ML repository, so it has received little attention since its introduction two years ago.
This changed in March 2022, when the DataFrame (Microsoft.Data.Analysis) Tracking Issue was created to track and prioritize what programmers wanted from a .NET DataFrame type. The request for loading from a database has been open for two years without progress.
Loading from SQL
If you want to use Microsoft.Data.Analysis.DataFrame right now you'll have to write code similar to the CSV loading code:
Create a list of DataFrameColumns from a DataReader's schema. This can be retrieved with DbDataReader.GetSchemaTable.
Create a DataFrame with those columns
For each row, append the list of values to the dataframe. The values could be retrieved with DbDataReader.GetValues
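The three steps above can be sketched in C#. This is only a sketch: it assumes the Microsoft.Data.Analysis NuGet package, and the exact column types and Append overloads vary between package versions, so treat the names here as assumptions to check against the version you install.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using Microsoft.Data.Analysis;   // NuGet package; API details vary by version

static DataFrame LoadFromReader(DbDataReader reader)
{
    // 1. One DataFrameColumn per field, typed from the reader's schema.
    var columns = new List<DataFrameColumn>();
    for (int i = 0; i < reader.FieldCount; i++)
    {
        string name = reader.GetName(i);
        Type t = reader.GetFieldType(i);
        if (t == typeof(int))           columns.Add(new PrimitiveDataFrameColumn<int>(name));
        else if (t == typeof(double))   columns.Add(new PrimitiveDataFrameColumn<double>(name));
        else if (t == typeof(DateTime)) columns.Add(new PrimitiveDataFrameColumn<DateTime>(name));
        else                            columns.Add(new StringDataFrameColumn(name, 0));
    }

    // 2. A DataFrame over those columns.
    var df = new DataFrame(columns);

    // 3. Append each row; GetValues fills the buffer in one call.
    var buffer = new object[reader.FieldCount];
    while (reader.Read())
    {
        reader.GetValues(buffer);
        var row = new List<KeyValuePair<string, object>>();
        for (int i = 0; i < buffer.Length; i++)
            row.Add(new KeyValuePair<string, object>(
                reader.GetName(i), buffer[i] is DBNull ? null : buffer[i]));
        df.Append(row, inPlace: true);   // overload assumed; check your package version
    }
    return df;
}
```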
Loading from Excel
The same technique can be used if the Excel file is loaded using a library like ExcelDataReader that exposes the data through a DataReader. The library doesn't implement all methods though, so some tweaking may be needed if, e.g., GetValues doesn't work.
Writing to SQL
That's harder, because you can't just send a stream of rows to a table; a DataFrame is a collection of Series, not rows. You'd have to construct the INSERT SQL command from the column names, then execute it with data from each row.
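Sketching that step in C#: the command text is derived once from the column names, then executed per row. The names here are plain strings so the string handling stands alone; with Microsoft.Data.Analysis they would come from the DataFrame's columns, and the execution step (binding each @-parameter and calling ExecuteNonQuery per row) is only indicated in a comment:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Build "INSERT INTO t (a, b) VALUES (@a, @b)" from the column names.
static string BuildInsert(string table, IReadOnlyList<string> columns) =>
    $"INSERT INTO {table} ({string.Join(", ", columns)}) " +
    $"VALUES ({string.Join(", ", columns.Select(c => "@" + c))})";

Console.WriteLine(BuildInsert("Sales", new[] { "Region", "Amount" }));
// → INSERT INTO Sales (Region, Amount) VALUES (@Region, @Amount)
// Then, for each row: set each @-parameter from that row's value
// and execute the command (e.g. SqlCommand.ExecuteNonQuery).
```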
A dirty way would be to convert the DataFrame into a DataTable and save that using a DataAdapter. That would create yet another copy of the data in memory though.
A better way would be to create a DataReader wrapper over a DataFrame, and pass that reader to, e.g., SqlBulkCopy.
I am creating a C# plugin for Excel and have noticed that there seem to be many ways that you can actually import data from a database into a sheet in Excel.
I am targeting Excel 2010 and wondered whether anyone has already done this research and knows what the quickest way to load the data is?
I can already guess that anything breaching the COM boundary is going to be slow, so I have to minimize that. So I can stick all the data into one 2D array and load it that way. Loading 0.5 million rows with 10 columns takes around 5.5 seconds (assuming I already have all the data in the array). I don't know whether that is good or bad.
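For reference, the "one 2D array, one assignment" approach mentioned above looks like this. It is a sketch with hypothetical variable names; it needs the Excel interop assemblies, so it is shown for illustration rather than as runnable code:

```csharp
// sheet is a Microsoft.Office.Interop.Excel.Worksheet obtained elsewhere;
// rowCount and colCount are hypothetical placeholders.
object[,] data = new object[rowCount, colCount];
// ... fill data from the database ...

// One COM transition for the whole block instead of one per cell:
Excel.Range target = sheet.Range[sheet.Cells[1, 1], sheet.Cells[rowCount, colCount]];
target.Value2 = data;
```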
...but like I said, there are a lot of ways to get the data in, and I would like to use the fastest.
Have you ever tried SqlBulkCopy?
Create a database query in the Excel sheet, specifying a connection string, target range and query string. Have the query executed from within Excel.
See http://www.dicks-clicks.com/excel/ExternalData3.htm for an example.
I would like to have fields in my Excel sheets be bound (in both directions) to a data source, in this case an Access DB.
For example, I would like to have an Excel sheet 'select' a particular record, say a customer, and then load information on that customer into the worksheet. Then, any changes made to that worksheet would be pushed back to the data store, making Excel a nice front end to the data.
Can this be done? From what I can tell "Get External Data" options in Excel are one way routes. My development background is heavy in ASP.NET C# and SQL.
Excel is designed to deal with datasets and not so much with single records. For what you are trying to do with a single record, you would be far better off building a form in Access, but as I don't know your environment or your organisation's limitations, I'll make a suggestion.
Since you've obviously got a bit of SQL and coding skill check out this post for an option that would work for you - Updating Access Database from Excel Worksheet Data
You can get or put as much data as you want and can join tables too. It's a good basic get and then push set up.
Hi experts, I am trying to parse an Excel file. Its structure is very complex. The possible ways I know of are:
Use the Office interop libraries
Use the OLEDB provider and read the Excel file into a DataSet
But the issue is its complexity: some columns, cells or rows are blank, etc.
What are the best possible ways to do this ?
Thanks in advance.
I can recommend the ExcelDataReader (licensed under LGPL I think). It loads both .xls and .xlsx files, and lets you get the spreadsheet as a DataSet, with each worksheet being an individual DataTable. As far as I know from the scenarios I have used it in, it honours blank rows, empty cells, etc. Try it and see if you think it will handle your "very complex" structure. [I do notice one negative review on the site - but the rest are pretty positive. I've experienced an issue reading .xlsx if a worksheet is renamed]
I've also used the OLEDB approach in the past, but be warned that this has real problems in the way it tries to infer datatypes in the first few rows. If the datatype changes for a column, then this may well infer it wrongly. To make matters worse, when it does get it wrong, it will often return null as the value, making it difficult (or impossible) to tell a true null value from a datatype that changed after the first six or seven rows.
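One partial mitigation if OLEDB is used anyway: setting IMEX=1 in the Extended Properties tells the driver to treat intermixed columns as text rather than guessing a single type, and the number of rows sampled for type inference is governed by the TypeGuessRows registry value. A typical connection string (the file path is a placeholder):

```
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\path\to\workbook.xlsx;
Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1"
```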
Personally, I prefer to either use the OLEDB way, which is a bit clunky at times, or use a third-party library that has put in the time and effort to get access to the data.
SyncFusion has a pretty nice library for this.
I have my users first save the Excel spreadsheet as a CSV file. Then they upload the CSV file to my app. That makes it much simpler to parse.
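As an aside on why CSV is so much simpler to handle: a usable line parser fits in a few lines of C#. This minimal version handles quoted fields and doubled quotes; for production use a dedicated CSV reader, since this one does not handle newlines inside quoted fields:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Parse one CSV line into fields, honouring quoted fields and "" escapes.
static List<string> ParseCsvLine(string line)
{
    var fields = new List<string>();
    var sb = new StringBuilder();
    bool inQuotes = false;
    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (inQuotes)
        {
            if (c == '"' && i + 1 < line.Length && line[i + 1] == '"') { sb.Append('"'); i++; }
            else if (c == '"') inQuotes = false;
            else sb.Append(c);
        }
        else if (c == '"') inQuotes = true;
        else if (c == ',') { fields.Add(sb.ToString()); sb.Clear(); }
        else sb.Append(c);
    }
    fields.Add(sb.ToString());
    return fields;
}

Console.WriteLine(string.Join(" | ", ParseCsvLine("1,\"Smith, John\",\"said \"\"hi\"\"\"")));
// → 1 | Smith, John | said "hi"
```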
I've used OLEDB myself to read uploaded Excel files, and it presents no real problems (except for nulls in fields instead of blanks, which can be checked with IsDBNull). Also, third-party open source tools like NPOI and Excel2007ReadWrite (http://www.codeproject.com/KB/office/OpenXML.aspx) can be useful.
I have thoroughly evaluated both of these third party tools, and both are pretty stable and easy to integrate. I would recommend NPOI for Excel 2003 files, and Excel2007ReadWrite for Excel 2007 files.
It sounds like you have a good understanding of the task at hand. You'll have to write business logic to untangle the complexities of the spreadsheet format and extract the data you're looking for.
It seems to me that VSTO/interop is the best platform strategy, for two reasons:
Access to the spreadsheet data will be a small part of the effort needed for your solution. So if using OLEDB saves a little time in data access, it will probably be irrelevant in terms of the overall project scope.
You may need to examine the contents of individual cells closely and take context information like formatting into account. With interop, you get full visibility of cell contents, context, and other sheet level context information like named ranges and lists. It is a risk to assume you won't need this type of information while decoding the spreadsheet.
I need to import the data from a .csv file into a database table (MS SQL Server 2005). SQL BULK INSERT seems like a good option, but the problem is that my DB server is not on the same box as my web server. This question describes the same issue; however, I don't have any control over my DB server and can't share any folders on it.
I need a way to import my .csv programmatically (C#). Any ideas?
EDIT: this is a part of a website, where user can populate the table with .csv contents, and this would happen on a weekly basis, if not more often
You have several options:
SSIS
DTS
custom application
Any of these approaches ought to get the job done. If it is just scratch work it might be best to write a throwaway app in your favorite language just to get the data in. If it needs to be a longer-living solution you may want to look into SSIS or DTS as they are made for this type of situation.
Try Rhino-ETL; it's an open-source ETL engine written in C# that can even use Boo for simple ETL scripts, so you don't need to compile it all the time.
The code can be found here:
https://github.com/hibernating-rhinos/rhino-etl
The guy who wrote it:
http://www.ayende.com/blog
The group lists have some discussions about it; I actually added bulk insert for Boo scripts a while ago.
http://groups.google.com/group/rhino-tools-dev
http://groups.google.com/group/rhino-tools-dev/browse_thread/thread/2ecc765c1872df19/d640cd259ed493f1
If you download the code there are several samples, also check the google groups list if you need more help.
I ended up using CSV Reader. I saw a reference to it in one of Jon Skeet's answers, but I can't find it again to link to it.
How big are your datasets? Unless they are very large you can get away with parameterized insert statements. You may want to load to a staging table first for peace of mind or performance reasons.