I have to do several inserts into one table. Right now I have a list of objects that I iterate over, and for each object I do an ExecuteNonQuery with the insert statement.
I want to know if there is a faster way to do this.
Done this way, inserting 800 records takes a couple of minutes.
In Java I have used the executeBatch method, which exists for this purpose; is there anything similar in C#?
Regards!
Inserts are [relatively] fast/cheap. Commits are slow/expensive.
Unless it is a high latency connection, multiple (as in, hundreds of) insert statements should be just fine.
"...takes a couple minutes..." sounds like transactions are not being used (and thus there are likely 800 commits -- ouch!). One of the easiest ways to control transactions in C# is to use a TransactionScope:
using (var tx = new TransactionScope())
using (var connection = ...) { // open the connection inside the scope so it enlists in the ambient transaction
    foreach (var row in rows) {
        // insert row
    }
    // commit all at once
    tx.Complete();
}
I would only consider other approaches if limiting the number of transactions (and thus commits) is not sufficient to meet the functional requirements.
Happy coding.
Your statement could be something like this:
insert into my_table (field1, field2, ...)
values (row1_value1, row1_value2, ...),
(row2_value1, row2_value2, ...), ...
So if you build the statement up in, for example, a StringBuilder, you could batch 20, 50 or 100 rows per statement and issue far fewer insert statements.
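A minimal sketch of that idea, assuming a hypothetical table my_table(field1, field2), a MyRow type, a rows list and a connectionString (none of which come from the original question):

// Sketch only: MyRow, rows, connectionString and the table/column names are placeholders.
// The multi-row VALUES syntax requires SQL Server 2008 or later.
// Needs: using System.Collections.Generic; using System.Data.SqlClient; using System.Linq; using System.Text;
static void InsertInBatches(IList<MyRow> rows, string connectionString)
{
    const int batchSize = 100; // keeps the parameter count well under SQL Server's 2100 limit
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        for (int i = 0; i < rows.Count; i += batchSize)
        {
            var batch = rows.Skip(i).Take(batchSize).ToList();
            var sb = new StringBuilder("insert into my_table (field1, field2) values ");
            using (var cmd = new SqlCommand { Connection = connection })
            {
                for (int j = 0; j < batch.Count; j++)
                {
                    if (j > 0) sb.Append(", ");
                    sb.AppendFormat("(@f1_{0}, @f2_{0})", j);
                    cmd.Parameters.AddWithValue("@f1_" + j, batch[j].Field1);
                    cmd.Parameters.AddWithValue("@f2_" + j, batch[j].Field2);
                }
                cmd.CommandText = sb.ToString();
                cmd.ExecuteNonQuery(); // one round-trip per batch instead of one per row
            }
        }
    }
}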
Take a look at Performing Batch Operations Using DataAdapters (ADO.NET) if you want to do everything from your .NET app.
If you want the fastest possible performance you could use the bcp utility (assuming you're using SQL Server.) You would write your data to a delimited file and use bcp to perform the inserts.
What about writing all your insert statements in one string, separated by ';', and passing that string to ExecuteNonQuery?
To check for errors, just compare the number of inserts with the number of rows affected that ExecuteNonQuery returns.
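A rough sketch of that check, assuming you already have an open connection and a List&lt;string&gt; of single-row insert statements (both are placeholders, not from the question):

// Sketch: insertStatements is a List<string>, connection is an open SqlConnection.
// ExecuteNonQuery returns the total rows affected for the whole batch (with NOCOUNT off).
string batchSql = string.Join(";" + Environment.NewLine, insertStatements.ToArray());
using (var cmd = new SqlCommand(batchSql, connection))
{
    int rowsAffected = cmd.ExecuteNonQuery();
    if (rowsAffected != insertStatements.Count)
    {
        // at least one insert did not affect a row
        throw new InvalidOperationException(
            string.Format("Expected {0} inserts, but {1} rows were affected.", insertStatements.Count, rowsAffected));
    }
}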
I am using SqlDataReader for data migration. How can I increase the number of records inserted into the destination at a time?
I want to increase the number of records inserted into the destination at a time.
Then this is unrelated to SqlDataReader, and you'll need to look at whatever tool you're using for the insert.
If you're using SqlBulkCopy, then this should be as simple as changing the .BatchSize.
If you're using other mechanisms, you'll have to be specific. For example, if you're using an SP to do the inserts that only handles one row at a time, one option might be to use MARS and overlapping async operations; I have a utility method that I use for this type of sequential fixed-depth overlapping (which is very different to what Parallel.ForEach would do, for example, even with a fixed max-DOP).
If you're using an insert that works via TDS-based table-parameters, then: just buffer that much data locally before calling the operation.
If you're using an ORM such as EF: refer to the ORM's insert documentation.
But: to emphasize: the one thing that doesn't get a vote on this is: the data-reader.
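For the SqlBulkCopy case, a minimal sketch (the destination connection string, table name and source reader are placeholders, not from the question):

// Sketch: sourceReader is the SqlDataReader you are migrating from;
// destinationConnectionString and "dbo.TargetTable" are placeholders.
using (var bulkCopy = new SqlBulkCopy(destinationConnectionString))
{
    bulkCopy.DestinationTableName = "dbo.TargetTable";
    bulkCopy.BatchSize = 5000;            // rows sent to the server per batch
    bulkCopy.BulkCopyTimeout = 0;         // no timeout for long-running migrations
    bulkCopy.WriteToServer(sourceReader); // streams rows straight from the reader
}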
I have a stored procedure which accepts five parameters and performs an update on a table:
Update Table
Set field = @Field
Where col1 = @Para1 and Col2 = @Para2 and Col3 = @Para3 and col4 = @Para4
From the user's perspective you can select multiple values for each of the condition parameters.
For example, you can select 2 options which need to match Col1 in the database table (and which need to be passed as @Para1).
So I am storing all the selected values in separate lists.
At the moment I am using nested foreach loops to do the update:
foreach (var g in _list1)
{
    foreach (var o in _list2)
    {
        foreach (var l in _list3)
        {
            foreach (var a in _list4)
            {
                UpdateData(g, o, l, a);
            }
        }
    }
}
I am sure this is not a good way of doing this, since it results in a large number of database calls. Is there any way I can avoid the loops and achieve the same result with a minimum number of database calls?
Update
I am looking for an approach other than Table-Valued Parameters.
You can bring the query into this form:
Update Table Set field = @Field Where col1 IN {} and Col2 IN {} and Col3 IN {} and col4 IN {}
and pass parameters this way: https://stackoverflow.com/a/337792/580053
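A rough sketch of building those IN lists as parameters (the BuildInClause helper, fieldValue and connection are mine for illustration; the column names and _list1.._list4 come from the question):

// Sketch: turns a list into "col IN (@g0, @g1, ...)" and registers the parameters.
// Needs: using System.Collections.Generic; using System.Data.SqlClient;
static string BuildInClause<T>(SqlCommand cmd, string column, string prefix, IList<T> values)
{
    var names = new List<string>();
    for (int i = 0; i < values.Count; i++)
    {
        string name = "@" + prefix + i;
        cmd.Parameters.AddWithValue(name, values[i]);
        names.Add(name);
    }
    return column + " IN (" + string.Join(", ", names.ToArray()) + ")";
}

// Usage: one UPDATE replaces the whole nested loop.
using (var cmd = new SqlCommand { Connection = connection })
{
    cmd.CommandText = "Update Table Set field = @Field Where "
        + BuildInClause(cmd, "col1", "g", _list1) + " and "
        + BuildInClause(cmd, "Col2", "o", _list2) + " and "
        + BuildInClause(cmd, "Col3", "l", _list3) + " and "
        + BuildInClause(cmd, "col4", "a", _list4);
    cmd.Parameters.AddWithValue("@Field", fieldValue);
    cmd.ExecuteNonQuery();
}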
One possible way would be to use Table-Valued Parameters to pass the multiple values per condition to the stored procedure. This would reduce the loops in your code and should still provide the functionality that you are looking for.
If I am not mistaken they were introduced in SQL Server 2008, so as long as you don't have to support 2005 or earlier they should be fine to use.
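If TVPs do become an option, the client side might look roughly like this (the table type dbo.KeyList, the procedure name and the assumption that the values are strings are all mine, not from the question):

// Sketch: assumes a table type such as CREATE TYPE dbo.KeyList AS TABLE (Value nvarchar(50))
// and a procedure whose parameters (@Para1..@Para4) are of that type and READONLY.
var list1Table = new DataTable();
list1Table.Columns.Add("Value", typeof(string));
foreach (var g in _list1)
    list1Table.Rows.Add(g);

using (var cmd = new SqlCommand("dbo.UpdateTableProc", connection)) // hypothetical procedure name
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@Para1", list1Table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.KeyList";
    // ... build and add @Para2..@Para4 the same way, plus @Field ...
    cmd.ExecuteNonQuery();
}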
Consider using the MS Data Access Application Block from the Enterprise Library for the UpdateDataSet command.
Essentially, you would build a datatable where each row is a parameter set, then you execute the "batch" of parameter sets against the open connection.
You can do the same without that of course, by building a string that has several update commands in it and executing it against the DB.
Since table-valued parameters are off limits to you, you may consider an XML-based approach:
Build an XML document containing the four columns that you would like to pass.
Change the signature of your stored procedure to accept a single XML-valued parameter instead of four scalar parameters
Change the code of your stored procedure to perform the updates based on the XML that you get
Call your new stored procedure once with the XML that you constructed in memory using the four nested loops.
This should reduce the number of round-trips, and speed up the overall execution time. Here is a link to an article explaining how inserting many rows can be done at once using XML; your situation is somewhat similar, so you should be able to use the approach outlined in that article.
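A rough sketch of the client side of that idea (the element names, the procedure name and its parameters are assumptions for illustration):

// Sketch: builds one XML document from the four lists and sends it in a single call.
// Needs: using System.Data; using System.Data.SqlClient; using System.Linq; using System.Xml.Linq;
var doc = new XElement("rows",
    from g in _list1
    from o in _list2
    from l in _list3
    from a in _list4
    select new XElement("row",
        new XAttribute("p1", g),
        new XAttribute("p2", o),
        new XAttribute("p3", l),
        new XAttribute("p4", a)));

using (var cmd = new SqlCommand("dbo.UpdateTableXml", connection)) // hypothetical procedure name
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@Field", SqlDbType.NVarChar).Value = fieldValue;
    cmd.Parameters.Add("@Rows", SqlDbType.Xml).Value = doc.ToString();
    cmd.ExecuteNonQuery(); // the procedure shreds @Rows with nodes()/value() and runs one UPDATE
}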
As long as you have the freedom to update the structure of the stored procedure, the method I would suggest for this would be to use a table-valued parameter instead of the multiple parameters.
A good example which goes into both server and database code for this can be found at: http://www.codeproject.com/Articles/39161/C-and-Table-Value-Parameters
Why are you using a stored procedure for this? In my opinion you shouldn't use SPs for simple CRUD operations. The real power of stored procedures is in heavy calculations and things like that.
Table-valued parameters would be my choice, but since you are looking for another approach, why don't you go the simpler way and just dynamically construct a bulk/mass update query in your server-side code and run it against the DB?
History
I have a list of "records" (3,500) which I save to XML and compress on exit of the program. Since:
the number of the records increases
only around 50 records need to be updated on exit
saving takes about 3 seconds
I needed another solution -- embedded database. I chose SQL CE because it works with VS without any problems and the license is OK for me (I compared it to Firebird, SQLite, EffiProz, db4o and BerkeleyDB).
The data
The record structure: 11 fields, 2 of which make up the primary key (nvarchar + byte). The other fields are bytes, datetimes, doubles and ints.
I don't use any relations, joins, indices (except for the primary key), triggers, views, and so on. It is a flat dictionary actually -- pairs of key + value. I modify some of them, and then I have to update them in the database. From time to time I add some new "records" and I need to store (insert) them. That's all.
LINQ approach
I have a blank database (file), so I make 3,500 inserts in a loop (one by one). I don't even check if the record already exists because the db is blank.
Execution time? 4 minutes, 52 seconds. I fainted (mind you: XML + compress = 3 seconds).
SQL CE raw approach
I googled a bit, and despite such claims as here:
LINQ to SQL (CE) speed versus SqlCe
stating that it is the fault of SQL CE itself, I gave it a try.
The same loop, but this time the inserts are made with SqlCeResultSet (TableDirect mode, see: Bulk Insert In SQL Server CE) and SqlCeUpdatableRecord.
The outcome? Are you sitting comfortably? Well... 0.3 seconds (yes, a fraction of a second!).
The problem
LINQ is very readable; raw operations are quite the opposite. I could write a mapper which translates all column indexes to meaningful names, but it seems like reinventing the wheel -- after all, it is already done in... LINQ.
So maybe there is a way to tell LINQ to speed things up? QUESTION -- how do I do it?
The code
LINQ
foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
    PrimLibrary.Database.Progress record = new PrimLibrary.Database.Progress();

    record.Text = entry.Text;
    record.Direction = (byte)entry.dir;
    db.Progress.InsertOnSubmit(record);
    record.Status = (byte)entry.LastLearningInfo.status.Value;
    // ... and so on

    db.SubmitChanges();
}
Raw operations
SqlCeCommand cmd = conn.CreateCommand();
cmd.CommandText = "Progress";
cmd.CommandType = System.Data.CommandType.TableDirect;
SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable);

foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
    SqlCeUpdatableRecord record = rs.CreateRecord();

    int col = 0;
    record.SetString(col++, entry.Text);
    record.SetByte(col++, (byte)entry.dir);
    record.SetByte(col++, (byte)entry.LastLearningInfo.status.Value);
    // ... and so on

    rs.Insert(record);
}
Do more work per transaction.
Commits are generally very expensive operations for a typical relational database as the database must wait for disk flushes to ensure data is not lost (ACID guarantees and all that). Conventional HDD disk IO without specialty controllers is very slow in this sort of operation: the data must be flushed to the physical disk -- perhaps only 30-60 commits can occur a second with an IO sync between!
See the SQLite FAQ: INSERT is really slow - I can only do few dozen INSERTs per second. Ignoring the different database engine, this is the exact same issue.
Normally, LINQ2SQL creates a new implicit transaction inside SubmitChanges. To avoid this implicit transaction/commit (commits are expensive operations) either:
Call SubmitChanges less (say, once outside the loop) or;
Setup an explicit transaction scope (see TransactionScope).
One example of using a larger transaction context is:
using (var ts = new TransactionScope()) {
    // LINQ2SQL will automatically enlist in the transaction scope.
    // SubmitChanges now will NOT create a new transaction/commit each time.
    DoImportStuffThatRunsWithinASingleTransaction();

    // Important: Make sure to COMMIT the transaction.
    // (The transaction used for SubmitChanges is committed to the DB.)
    // This is when the disk sync actually has to happen,
    // but it only happens once, not 3500 times!
    ts.Complete();
}
However, the semantics of an approach using a single transaction or a single call to SubmitChanges are different than that of the code above calling SubmitChanges 3500 times and creating 3500 different implicit transactions. In particular, the size of the atomic operations (with respect to the database) is different and may not be suitable for all tasks.
For LINQ2SQL updates, changing the optimistic concurrency model (disabling it or just using a timestamp field, for instance) may result in small performance improvements. The biggest improvement, however, will come from reducing the number of commits that must be performed.
Happy coding.
I'm not positive on this, but it seems like the db.SubmitChanges() call should be made outside of the loop. Maybe that would speed things up?
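Applied to the LINQ code from the question, that would be a sketch like this (same entities as above; note that, as the earlier answer points out, the atomicity semantics change to one big unit of work):

// Queue all inserts first, then flush them with a single SubmitChanges call
// (one transaction/commit instead of ~3,500).
foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
    var record = new PrimLibrary.Database.Progress();
    record.Text = entry.Text;
    record.Direction = (byte)entry.dir;
    record.Status = (byte)entry.LastLearningInfo.status.Value;
    // ... and so on
    db.Progress.InsertOnSubmit(record);
}
db.SubmitChanges(); // single commit for the whole batch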
I'm currently using SQL Server 2008 in my project to store and fetch data. This has worked perfectly so far: I can fetch 20,000 records in less than 50 ms (JSON). But I'm facing a problem with inserts: in my project I need to be able to insert something like 100,000 records every minute, and this seems to be very slow with SQL Server.
I've tried to use another database (a NoSQL DB), MongoDB, which is very fast at storing data (5 s compared to SQL Server's 270 s), but not as fast as SQL Server at fetching data (20,000 => 180 ms).
So I'm asking here if there is any way to make SQL Server faster at storing, or to make MongoDB faster at fetching (I'm not an expert in MongoDB; I only know the very basic things about it).
public static void ExecuteNonQuery(string sql)
{
    SqlConnection con = GetConnection();
    con.Open();
    SqlCommand cmd = new SqlCommand(sql, con);
    try
    {
        cmd.ExecuteNonQuery();
    }
    finally
    {
        con.Close();
    }
}
SQL's Insert function
public IEnumerable<T> GetRecords<T>(System.Linq.Expressions.Expression<Func<T, bool>> expression, int from, int to) where T : class, new()
{
    return _db.GetCollection<T>(collectionName).Find<T>(expression).Skip(from).Limit(to).Documents;
}
Mongo's Select function (MongoDB 1.6)
Update
Data structure: (int) Id, (string) Data
I guess that you are executing each insert in a transaction of its own (an implicit transaction is created if you do not provide one explicitly). As SQL Server needs to ensure that each transaction is committed to the hard drive, every transaction carries an overhead that is very significant.
To get things to go faster, try to perform many inserts (a thousand or so) in a single ExecuteNonQuery() call. Also, do not open and close the connection for every statement; keep it open (so the inserts can share one transaction) across several inserts.
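A rough sketch of that advice applied to the helper from the question (the rows collection, its Id/Data members and the table name MyTable are placeholders based on the data structure above):

// Sketch: ~1000 inserts per ExecuteNonQuery call, one open connection,
// one explicit transaction (and thus one commit) per batch.
using (SqlConnection con = GetConnection())
{
    con.Open();
    for (int i = 0; i < rows.Count; i += 1000)
    {
        var batch = rows.Skip(i).Take(1000).ToList();
        using (SqlTransaction tx = con.BeginTransaction())
        using (SqlCommand cmd = con.CreateCommand())
        {
            cmd.Transaction = tx;
            var sb = new StringBuilder();
            for (int j = 0; j < batch.Count; j++)
            {
                sb.AppendFormat("insert into MyTable (Id, Data) values (@id{0}, @data{0});", j);
                cmd.Parameters.AddWithValue("@id" + j, batch[j].Id);
                cmd.Parameters.AddWithValue("@data" + j, batch[j].Data);
            }
            cmd.CommandText = sb.ToString();
            cmd.ExecuteNonQuery(); // one round-trip for the whole batch
            tx.Commit();           // one disk sync per 1000 rows instead of one per row
        }
    }
}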
You should have a look at the SqlBulkCopy Class
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
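For this particular workload, a minimal SqlBulkCopy sketch could look like this (the rows source and the table/column names are assumptions based on the (int) Id, (string) Data structure above):

// Sketch: bulk-loads buffered rows into a table with (Id int, Data nvarchar) columns.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Data", typeof(string));
foreach (var row in rows)                  // rows: whatever in-memory source holds the records
    table.Rows.Add(row.Id, row.Data);

using (SqlConnection con = GetConnection())
{
    con.Open();
    using (var bulkCopy = new SqlBulkCopy(con))
    {
        bulkCopy.DestinationTableName = "MyTable"; // placeholder table name
        bulkCopy.BatchSize = 10000;
        bulkCopy.WriteToServer(table);
    }
}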
MongoDB is very fast on reads and writes. 50k reads and writes per second is doable on commodity hardware, depending on the data size. In addition to that, you always have the option to scale out with sharding and replica sets, but as said: 20k operations per second is nothing for MongoDB.
Generally, the speed of inserting data into the database is a function of the complexity of the operation.
If your inserts are significantly slow, then it points to optimisation problems with the inserts. Identify exactly what SQL insert statements your program is generating and then use the database's EXPLAIN function to figure out what operations the underlying database is performing. This often gives you a clue as to how you need to change your setup to increase the speed of these operations.
It might mean you have to change your database, or it might mean batching your inserts into a single call rather than inserting each item separately.
I see you are setting up and closing the connection each time; this takes significant time in itself. Try using a persistent connection.
I'm trying to write a program to convert a large amount of data from a legacy SQL Express system to a newer system based on SQL CE. Here's a quick snapshot of what's going on:
Most of the tables in the SQL Express install are small (< 10K records)
One table is --extremely-- large, and is well over 1 million records
For the smaller tables I can use LINQ just fine -- but the large table gives me problems. The standard way of:
foreach (var dataRow in ...)
{
    table.InsertOnSubmit(dataRow);
}
database.SubmitChanges();
is painfully slow and takes several hours to complete. I've even tried doing some simple "bulk" operations to avoid one giant submission at the end of the loop, i.e.:
foreach (var dataRow in ...)
{
    if (count == BULK_LIMIT)
    {
        count = 0;
        database.SubmitChanges();
    }
    count++;
    table.InsertOnSubmit(dataRow);
}

// Final submit, to catch the last BULK_LIMIT item block
database.SubmitChanges();
I've tried a variety of bulk sizes, from relatively small values like 1K-5K to larger sizes up to 300K.
Ultimately I'm stuck and the process takes roughly the same amount of time (several hours) regardless of the bulk size.
So - does anyone know of a way to crank up the speed? The typical solution would be to use SqlBulkCopy, but that isn't compatible with SQL CE.
A couple of notes:
Yes, I really do want all the records in SQL CE, and yes, I've set up the connection to allow the database to max out at 4 GB.
Yes, I really do need every last one of the 1M+ records.
The stuff in each data row is all primitive, and is a mix of strings and timestamps.
The size of the legacy SQL Express database is ~400 MB.
Thanks in advance - all help is appreciated!
-- Dan
Use a parameterised INSERT statement: prepare a command, set the parameter values in a loop and reuse the same command for each INSERT (a sketch follows below).
Remove any indexes and re-apply them after you have performed all the INSERTs.
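A minimal sketch of the reused parameterised command, assuming an open SqlCeConnection (ceConnection) and placeholder table/column names, since the real schema isn't shown in the question:

// Sketch: one prepared, parameterised INSERT reused for every row; a single transaction
// around the whole batch. "TargetTable", the columns and sourceRows are placeholders.
using (SqlCeCommand cmd = ceConnection.CreateCommand())
{
    cmd.CommandText = "INSERT INTO TargetTable (Name, Stamp) VALUES (@name, @stamp)";
    cmd.Parameters.Add("@name", SqlDbType.NVarChar, 256);
    cmd.Parameters.Add("@stamp", SqlDbType.DateTime);
    cmd.Prepare();

    using (SqlCeTransaction tx = ceConnection.BeginTransaction())
    {
        cmd.Transaction = tx;
        foreach (var dataRow in sourceRows)
        {
            cmd.Parameters["@name"].Value = dataRow.Name;
            cmd.Parameters["@stamp"].Value = dataRow.Stamp;
            cmd.ExecuteNonQuery();
        }
        tx.Commit(); // one commit for the whole batch
    }
}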
Update: Chris Tacke has the fastest solution here using SqlCeResultset: Bulk Insert In SQLCE