Best Practice: writing data from DGV to SQL Server table - c#

I have an unbound DataGridView with one visible field.
The user can copy data from the clipboard into this DGV, in a similar manner to this article.
Now I'd like to move this data into a table on SQL Server.
It's been suggested to me to do the following:
Create a stored procedure that takes a single parameter and writes that input to a table
Loop through the items in the DGV, feeding each into the stored procedure and thereby writing them to the table
Can I not just grab all the items in the DGV and insert them into the target table at once, without having to loop?
Or is the loop method (with up to 2,000 iterations) the best practice in such a situation? (Or is there no particular best practice?!)

If you are looking at using a stored proc, then you can follow some of the examples of passing arrays of values proposed by Erland Sommarskog.
Take a look at:
http://www.sommarskog.se/arrays-in-sql-2008.html <- For SS 2008, based around table-valued parameters.
http://www.sommarskog.se/arrays-in-sql-2005.html <- Options for SS 2005. I've used the XML method quite a few times and found it quite useful.
If you are using SS 2008, then you could possibly investigate his example of using the datatable as a source.
Not sure if these are considered best practice or not, but it is certainly food for thought.
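Not part of the articles above, but to make the SS 2008 DataTable/TVP route concrete, here is a minimal sketch that pushes the grid's single visible column to the server in one call; the dbo.StringList table type, the dbo.ImportValues procedure and the target table are assumed, illustrative names, and the method would live in the form's code-behind:

// One-time T-SQL setup (illustrative names):
//   CREATE TYPE dbo.StringList AS TABLE (Value NVARCHAR(200) NOT NULL);
//   CREATE PROCEDURE dbo.ImportValues @Items dbo.StringList READONLY
//   AS INSERT INTO dbo.TargetTable (Value) SELECT Value FROM @Items;

using System.Data;
using System.Data.SqlClient;
using System.Windows.Forms;

static void SaveGridToServer(DataGridView dgv, string connectionString)
{
    // Copy the single visible column of the unbound DGV into a DataTable
    // whose shape matches the table type.
    var items = new DataTable();
    items.Columns.Add("Value", typeof(string));
    foreach (DataGridViewRow row in dgv.Rows)
    {
        if (!row.IsNewRow && row.Cells[0].Value != null)
            items.Rows.Add(row.Cells[0].Value.ToString());
    }

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.ImportValues", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        var p = cmd.Parameters.AddWithValue("@Items", items);
        p.SqlDbType = SqlDbType.Structured;   // table-valued parameter
        p.TypeName = "dbo.StringList";
        conn.Open();
        cmd.ExecuteNonQuery();                // one round-trip for all rows
    }
}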

Related

Trying to get an UPSERT working on a set of data using dapper

I'm trying to get an upsert working on a collection of IDs (not the primary key - that's an identity int column) on a table using dapper. This doesn't need to be a dapper function; I'm just including that in case it helps.
I'm wondering if it's possible (either through straight SQL or using a dapper function) to run an upsert on a collection of IDs (specifically an IEnumerable of ints).
I really only need a simple example to get me started:
I have three objects of type Foo:
{ "ExternalID" : 1010101, "DescriptorString" : "I am a descriptive string", "OtherStuff" : "This is some other stuff" }
{ "ExternalID" : 1010122, "DescriptorString" : "I am a descriptive string123", "OtherStuff" : "This is some other stuff123" }
{ "ExternalID" : 1033333, "DescriptorString" : "I am a descriptive string555", "OtherStuff" : "This is some other stuff555" }
I have a table called Bar, with those same column names (where only 1033333 exists):
Table Bar:
ID | ExternalID | DescriptorString             | OtherStuff
1  | 1033333    | I am a descriptive string555 | This is some other stuff555
Well, since you said that this didn't need to be dapper-based ;-), I will say that the fastest and cleanest way to get this data upserted is to use Table-Valued Parameters (TVPs) which were introduced in SQL Server 2008. You need to create a User-Defined Table Type (one time) to define the structure, and then you can use it in either ad hoc queries or pass to a stored procedure. But this way you don't need to export to a file just to import, nor do you need to convert it to XML just to convert it back to a table.
Rather than copy/paste a large code block, I have noted three links below where I have posted the code to do this (all here on S.O.). The first two links are the full code (SQL and C#) to accomplish this (the 2nd link being the most analogous to what you are trying to do). Each is a slight variation on the theme (which shows the flexibility of using TVPs). The third is another variation, but not the full code, as it just shows the differences from one of the first two in order to fit that particular situation.
In all 3 cases, the data is streamed from the app into SQL Server. There is no creating of any additional collection or external file; you use what you currently have and only need to duplicate the values of a single row at a time to be sent over. On the SQL Server side, it all comes through as a populated table variable. This is far more efficient than taking data you already have in memory, converting it to a file (takes time and disk space) or XML (takes CPU and memory) or a DataTable (for SqlBulkCopy; takes CPU and memory) or something else, only to rely on an external factor such as the filesystem (the files will need to be cleaned up, right?) or needing to parse out of XML.
How can I insert 10 million records in the shortest time possible?
Pass Dictionary<string,int> to Stored Procedure T-SQL
Storing a Dictionary<int,string> or KeyValuePair in a database
Now, there are some issues with the MERGE command (see Use Caution with SQL Server's MERGE Statement) that might be a reason to avoid using it. So, I have posted the "upsert" code that I have been using for years to an answer on DBA.StackExchange:
How to avoid using Merge query when upserting multiple data using xml parameter?
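For orientation only (this is a condensed sketch, not the code from the linked answers), the streaming TVP shape looks roughly like this; dbo.FooType and dbo.UpsertFoo are assumed names for a table type matching Foo and a procedure that performs the insert/update against Bar keyed on ExternalID:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class Foo
{
    public int ExternalID { get; set; }
    public string DescriptorString { get; set; }
    public string OtherStuff { get; set; }
}

static IEnumerable<SqlDataRecord> ToRecords(IEnumerable<Foo> foos)
{
    // One reusable SqlDataRecord; ADO.NET consumes each yielded row as it is
    // produced, so only a single row's values exist client-side at any time.
    var meta = new[]
    {
        new SqlMetaData("ExternalID", SqlDbType.Int),
        new SqlMetaData("DescriptorString", SqlDbType.NVarChar, 200),
        new SqlMetaData("OtherStuff", SqlDbType.NVarChar, 200)
    };
    var record = new SqlDataRecord(meta);
    foreach (var foo in foos)
    {
        record.SetInt32(0, foo.ExternalID);
        record.SetString(1, foo.DescriptorString);
        record.SetString(2, foo.OtherStuff);
        yield return record;
    }
}

static void UpsertFoos(IEnumerable<Foo> foos, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.UpsertFoo", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        var p = cmd.Parameters.Add("@Items", SqlDbType.Structured);
        p.TypeName = "dbo.FooType";
        p.Value = ToRecords(foos);   // streamed, never materialized as a DataTable
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}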

Reduce number of database calls

I have a stored procedure which accepts five parameters and performs an update on a table:
Update Table
Set field = @Field
Where col1 = @Para1 and Col2 = @Para2 and Col3 = @Para3 and col4 = @Para4
From the user's perspective, you can select multiple values for each of the condition parameters.
For example, you can select 2 options which need to match Col1 in the database table (and which need to be passed as @Para1).
So I am storing all the selected values in separate lists.
At the moment I am using foreach loop to do the update
foreach (var g in _list1)
{
    foreach (var o in _list2)
    {
        foreach (var l in _list3)
        {
            foreach (var a in _list4)
            {
                UpdateData(g, o, l, a);
            }
        }
    }
}
I am sure this is not a good way of doing it, since it results in a large number of database calls. Is there any way I can avoid the loops and achieve the same result with a minimum number of DB calls?
Update
I am looking for some other approach than Table-Valued Parameters
You can bring the query to this form:
Update Table Set field = @Field Where col1 IN (...) and Col2 IN (...) and Col3 IN (...) and col4 IN (...)
and pass parameters this way: https://stackoverflow.com/a/337792/580053
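A rough illustration of that shape (table and column names are placeholders, and a C# 7 local function is used to keep it short):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

static void UpdateData(string connectionString, string field,
                       IList<int> list1, IList<int> list2,
                       IList<int> list3, IList<int> list4)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand())
    {
        cmd.Connection = conn;

        // Builds "@a0,@a1,..." for one list and registers the parameters.
        string InClause(string prefix, IList<int> values)
        {
            var names = values.Select((v, i) => prefix + i).ToArray();
            for (int i = 0; i < values.Count; i++)
                cmd.Parameters.AddWithValue(names[i], values[i]);
            return string.Join(",", names);
        }

        cmd.CommandText =
            "UPDATE dbo.MyTable SET Field = @Field" +
            $" WHERE Col1 IN ({InClause("@a", list1)})" +
            $" AND Col2 IN ({InClause("@b", list2)})" +
            $" AND Col3 IN ({InClause("@c", list3)})" +
            $" AND Col4 IN ({InClause("@d", list4)})";
        cmd.Parameters.AddWithValue("@Field", field);

        conn.Open();
        cmd.ExecuteNonQuery();   // one call instead of the nested loops
    }
}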
One possible way would be to use Table-Valued Parameters to pass the multiple values per condition to the stored procedure. This would reduce the loops in your code and should still provide the functionality that you are looking for.
If I am not mistaken they were introduced in SQL Server 2008, so as long as you don't have to support 2005 or earlier they should be fine to use.
Consider using the MS Data Access Application Block from the Enterprise Library for the UpdateDataSet command.
Essentially, you would build a datatable where each row is a parameter set, then you execute the "batch" of parameter sets against the open connection.
You can do the same without that of course, by building a string that has several update commands in it and executing it against the DB.
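A sketch of that batching idea, with placeholder names (and C# 7 tuples assumed); each parameter set becomes one UPDATE statement and the whole batch goes over in a single round-trip (SQL Server caps a command at roughly 2,100 parameters, so very large sets would need splitting):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

static void UpdateBatch(string connectionString, string field,
                        IEnumerable<(int G, int O, int L, int A)> combinations)
{
    var sql = new StringBuilder();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand { Connection = conn })
    {
        int i = 0;
        foreach (var c in combinations)
        {
            // One statement per parameter set, all executed in one ExecuteNonQuery.
            sql.AppendLine(
                "UPDATE dbo.MyTable SET Field = @Field " +
                $"WHERE Col1 = @g{i} AND Col2 = @o{i} AND Col3 = @l{i} AND Col4 = @a{i};");
            cmd.Parameters.AddWithValue($"@g{i}", c.G);
            cmd.Parameters.AddWithValue($"@o{i}", c.O);
            cmd.Parameters.AddWithValue($"@l{i}", c.L);
            cmd.Parameters.AddWithValue($"@a{i}", c.A);
            i++;
        }
        cmd.Parameters.AddWithValue("@Field", field);
        cmd.CommandText = sql.ToString();
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}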
Since table-valued parameters are off limits to you, you may consider an XML-based approach:
Build an XML document containing the four columns that you would like to pass.
Change the signature of your stored procedure to accept a single XML-valued parameter instead of four scalar parameters
Change the code of your stored procedure to perform the updates based on the XML that you get
Call your new stored procedure once with the XML that you constructed in memory using the four nested loops.
This should reduce the number of round-trips, and speed up the overall execution time. Here is a link to an article explaining how inserting many rows can be done at once using XML; your situation is somewhat similar, so you should be able to use the approach outlined in that article.
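A minimal sketch of that approach, using assumed names (dbo.UpdateFromXml, dbo.MyTable) rather than the linked article's code; the reworked procedure is shown as a comment, and the client builds the XML in memory and makes a single call:

// Assumed reworked procedure:
//   CREATE PROCEDURE dbo.UpdateFromXml @Field NVARCHAR(100), @Criteria XML
//   AS
//   UPDATE t SET t.Field = @Field
//   FROM dbo.MyTable t
//   JOIN @Criteria.nodes('/rows/row') AS x(r)
//     ON  t.Col1 = x.r.value('@c1', 'INT') AND t.Col2 = x.r.value('@c2', 'INT')
//     AND t.Col3 = x.r.value('@c3', 'INT') AND t.Col4 = x.r.value('@c4', 'INT');

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Xml.Linq;

static void UpdateViaXml(string connectionString, string field,
                         IEnumerable<int> list1, IEnumerable<int> list2,
                         IEnumerable<int> list3, IEnumerable<int> list4)
{
    // Flatten the four nested loops into one XML document in memory.
    var doc = new XElement("rows",
        from g in list1 from o in list2 from l in list3 from a in list4
        select new XElement("row",
            new XAttribute("c1", g), new XAttribute("c2", o),
            new XAttribute("c3", l), new XAttribute("c4", a)));

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.UpdateFromXml", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@Field", field);
        cmd.Parameters.Add("@Criteria", SqlDbType.Xml).Value = doc.ToString();
        conn.Open();
        cmd.ExecuteNonQuery();   // a single round-trip for all combinations
    }
}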
So long as you have the freedom to update the structure of the stored procedure, the method I would suggest for this would be to use a table-valued parameter instead of the multiple parameters.
A good example which goes into both server and database code for this can be found at: http://www.codeproject.com/Articles/39161/C-and-Table-Value-Parameters
Why are you using a stored procedure for this? In my opinion you shouldn't use SPs for simple CRUD operations. The real power of stored procedures is in heavy calculations and things like that.
Table-valued parameters would be my choice, but since you are looking for another approach, why not go the simpler way and just dynamically construct a bulk/mass update query in your server-side code and run it against the DB?

Updating millions of rows after a calculation

I am looking for advice on how I should do the following:
I have a table in SQL Server with about 3-6 million records and 51 columns.
Only one column needs to be updated, after calculating a value from the data in 45 of the other columns.
I already have the maths done in C#, and I am able to create a DataTable out of it (with millions of records, yes).
Now I want to get those values into the database in the most efficient manner. The options I know are:
Run an update query for every record, as I loop over the data reader to do the math and build the DataTable.
Create a temporary table, use SqlBulkCopy to copy the data in, and then use a MERGE statement (a sketch of this option follows below).
Though it would be very hard to do, try to build a function within SQL to do all the math and just run a simple update without any condition, updating everything at once.
I am not sure which method is faster or better. Any ideas?
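For the second option, a rough sketch (with placeholder table and column names): bulk-copy the computed values into a session-scoped temp table, then apply them with one set-based UPDATE; a MERGE would slot into the same place if inserts were also needed:

using System.Data;
using System.Data.SqlClient;

static void ApplyCalculatedColumn(DataTable results, string connectionString)
{
    // 'results' holds the key plus the calculated value, e.g. columns Id and Col46.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (var create = new SqlCommand(
            "CREATE TABLE #Staging (Id INT PRIMARY KEY, Col46 FLOAT NOT NULL);", conn))
        {
            create.ExecuteNonQuery();
        }

        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#Staging" })
        {
            bulk.BatchSize = 50000;
            bulk.BulkCopyTimeout = 0;
            bulk.WriteToServer(results);     // millions of rows, streamed in batches
        }

        using (var update = new SqlCommand(
            "UPDATE t SET t.Col46 = s.Col46 " +
            "FROM dbo.BigTable t JOIN #Staging s ON s.Id = t.Id;", conn))
        {
            update.CommandTimeout = 0;       // a large update can exceed the default 30s
            update.ExecuteNonQuery();
        }
    }   // #Staging disappears when the connection is closed
}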
EDIT: Why I am afraid of using a stored procedure
First, I have no idea how to write one; I am pretty new to this. Though maybe it is time to do it now.
My formula is: take one column, apply one formula to it along with an additional constant value (which is also part of the column name), then take all 45 columns and apply another formula.
The result will be stored in the 46th column.
Thanks.
If you have a field that contains a calculation from other fields in the database, it is best to make it a calculated field or to maintain it through a trigger so that anytime the data is changed from any source, the calculation is maintained.
You can create a .NET function which can be called directly from SQL; here is a link on how to create one: http://msdn.microsoft.com/en-us/library/w2kae45k%28v=vs.90%29.aspx. After you have created the function, run a simple update statement.
Can't you create a scalar-valued function in C# and call it as part of a computed column?
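A bare-bones illustration of that SQL CLR idea; the names and the two-input formula are placeholders, not the poster's real 45-column calculation:

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Compiled into an assembly, registered with CREATE ASSEMBLY and exposed
    // via CREATE FUNCTION ... EXTERNAL NAME on the server.
    [SqlFunction(IsDeterministic = true, IsPrecise = false)]
    public static SqlDouble CalcCol46(SqlDouble col1, SqlDouble col2, SqlDouble constant)
    {
        if (col1.IsNull || col2.IsNull || constant.IsNull)
            return SqlDouble.Null;
        // Stand-in for the real formula over the 45 source columns.
        return (col1 * constant) + col2;
    }
}
// Once registered, a single statement updates every row:
//   UPDATE dbo.BigTable SET Col46 = dbo.CalcCol46(Col1, Col2, 0.5);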

Best way to split up a long file: programming or SQL?

I have a database table (in MS Access) of GPS information with a record of speed, location (lat/long) and bearing of a vehicle for every second. There is a field that shows time like this: 2007-09-25 07:59:53. The problem is that this table has merged information from several files that were collected on this project. So, for example, 2007-09-25 07:59:53 to 2007-09-25 08:15:42 could be one file, and after a gap of more than 10 seconds, the next file will start, like 2007-09-25 08:15:53 to 2007-09-25 08:22:12. I need to populate a file number field in this table, and the separating criterion for each file will be that the gap in time between one file and the next is more than 10 seconds. I did this using C# code by iterating over the table, comparing each record to the next, and changing the file number whenever the gap is more than 10 seconds.
My question is, should this type of problem be solved using programming or is it better solved using a SQL query? I can load the data into a database like SQL Server, so there is no limitation to what tool I can use. I just want to know the best approach.
If it is better to solve this using SQL, will I need to use cursors?
When solving this using programming (for example C#), what is an efficient way to update a table when 20,000+ records need to be updated based on an updated DataSet? I used the DataAdapter.Update() method and it seemed to take a long time to update the table (30 minutes or so).
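On the last point, one thing worth trying before changing the approach entirely is batching the adapter's updates instead of sending one statement per row; a minimal sketch, assuming a GpsLog table keyed on Id:

using System.Data;
using System.Data.SqlClient;

static void SaveFileNumbers(DataTable gpsData, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter())
    {
        var update = new SqlCommand(
            "UPDATE dbo.GpsLog SET FileNumber = @FileNumber WHERE Id = @Id;", conn);
        update.Parameters.Add("@FileNumber", SqlDbType.Int, 0, "FileNumber");
        update.Parameters.Add("@Id", SqlDbType.Int, 0, "Id");
        update.UpdatedRowSource = UpdateRowSource.None;   // required for batching

        adapter.UpdateCommand = update;
        adapter.UpdateBatchSize = 500;    // 500 rows per round-trip instead of 1

        conn.Open();
        adapter.Update(gpsData);          // only rows with RowState Modified are sent
    }
}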
Assuming SQL Server 2008 and CTEs from your comments:
The best time to use SQL is generally when you are comparing or evaluating large sets of data.
Iterative programming languages like C# are better suited to more expansive analysis of individual records or analysis of rows one at a time (Row By Agonizing Row, or RBAR).
For examples of recursive CTEs, see here. MS has a good reference.
Also, depending on data structure, you could do this with a normal JOIN:
SELECT <stuff>
FROM MyTable T
INNER JOIN MyTable T2
    ON T2.timefield = DATEADD(minute, -10, T.timefield)
WHERE T2.pk = (SELECT MIN(pk) FROM MyTable WHERE pk > T.pk)

What's the best way to use SqlBulkCopy to fill a really large table?

Nightly, I need to fill a SQL Server 2005 table from an ODBC source with over 8 million records. Currently I am using an insert statement against a linked server, with a select similar to this:
Insert Into SQLStagingTable Select * from OpenQuery(ODBCSource, 'Select * from SourceTable')
This is really inefficient and takes hours to run. I'm in the middle of coding a solution using SqlBulkCopy, similar to the code found in this question.
The code in that question first populates a DataTable in memory and then passes that DataTable to SqlBulkCopy's WriteToServer method.
What should I do if the populated DataTable uses more memory than is available on the machine it is running on (a server with 16GB of memory in my case)?
I've thought about using the overloaded OdbcDataAdapter Fill method, which allows you to fill only the records from x to n (where x is the start index and n is the number of records to fill). However, that could turn out to be an even slower solution than what I currently have, since it would mean re-running the select statement on the source a number of times.
What should I do? Just populate the whole thing at once and let the OS manage the memory? Should I populate it in chunks? Is there another solution I haven't thought of?
The easiest way would be to use ExecuteReader() against your ODBC data source and pass the resulting IDataReader to the WriteToServer(IDataReader) overload.
Most data reader implementations will only keep a very small portion of the total results in memory.
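A minimal sketch of that reader-to-bulk-copy pipe, with placeholder names for the ODBC connection, the source query and the staging table:

using System.Data.Odbc;
using System.Data.SqlClient;

static void CopyFromOdbc(string odbcConnectionString, string sqlConnectionString)
{
    using (var source = new OdbcConnection(odbcConnectionString))
    using (var sourceCmd = new OdbcCommand("SELECT * FROM SourceTable", source))
    using (var destination = new SqlConnection(sqlConnectionString))
    {
        source.Open();
        destination.Open();

        using (var reader = sourceCmd.ExecuteReader())
        using (var bulk = new SqlBulkCopy(destination))
        {
            bulk.DestinationTableName = "dbo.SQLStagingTable";
            bulk.BulkCopyTimeout = 0;     // large loads can exceed the default timeout
            bulk.BatchSize = 10000;       // commit in chunks to keep memory flat
            bulk.WriteToServer(reader);   // streams rows; no DataTable held in memory
        }
    }
}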
SSIS performs well and is very tweakable. In my experience 8 million rows is not out of its league. One of my larger ETLs pulls in 24 million rows a day and does major conversions and dimensional data warehouse manipulations.
If you have indexes on the destination table, you might consider disabling them until the records have been inserted.
