I want to insert many rows (constructed from Entity Framework objects) into SQL Server. The problem is that some string properties are longer than the corresponding column in the database, which causes an exception, and then none of the rows can be inserted.
So I wonder if there is a way to tell SqlBulkCopy to automatically truncate any over-length values. Of course, I could check and substring each property that exceeds the column length before inserting it into the DataTable, but that would slow down the whole program.
Always use a staging/load table for bulk actions.
Then you can process, clean, scrub etc. the data before flushing it to the real table. This includes LEFTs, lookups, de-duplication etc.
So:
Load a staging table with wide columns
Flush from staging to the "real" table using INSERT realtable (..) SELECT LEFT(..), .. FROM Staging (see the sketch below)
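A minimal sketch of that flow in C# - the table names, column names and lengths are hypothetical, and connection is assumed to be an open SqlConnection (System.Data.SqlClient):

// 1. Bulk-load into the wide staging table; nothing can overflow here.
using (var bulk = new SqlBulkCopy(connection))
{
    bulk.DestinationTableName = "dbo.Staging";
    bulk.WriteToServer(dataTable);
}

// 2. Flush to the real table, truncating with LEFT() on the way.
using (var cmd = new SqlCommand(@"
    INSERT dbo.RealTable (Name, Description)
    SELECT LEFT(Name, 50), LEFT(Description, 200)
    FROM dbo.Staging;", connection))
{
    cmd.ExecuteNonQuery();
}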
Unfortunately there is no direct way of doing that with SqlBulkCopy. SQL bulk inserts are by nature almost "dumb", but that's why they are so fast. They are only minimally logged and give little feedback (beyond the SqlRowsCopied event), so if something fails, there's not much information. What you're looking for would, in a way, run counter to the purpose of this class.
But there are two possible ways:
You can try the SqlBulkCopyOptions enumeration (passed to the SqlBulkCopy constructor) with SqlBulkCopyOptions.CheckConstraints ("Check constraints while data is being inserted. By default, constraints are not checked.").
Or you can use SqlBulkCopyOptions.FireTriggers ("When specified, cause the server to fire the insert triggers for the rows being inserted into the database.") and handle the exception in a SQL Server INSERT trigger. A sketch of the constructor usage follows below.
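The options are flags combined on the constructor. A minimal sketch, with a hypothetical target table (note that CheckConstraints makes offending rows fail fast; it does not truncate them):

using (var bulk = new SqlBulkCopy(connectionString,
    SqlBulkCopyOptions.CheckConstraints | SqlBulkCopyOptions.FireTriggers))
{
    bulk.DestinationTableName = "dbo.TargetTable"; // hypothetical
    bulk.WriteToServer(dataTable);
}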
Try using the SqlTransaction class together with the SqlBulkCopy class.
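A minimal sketch of that combination, assuming an open SqlConnection and a hypothetical target table; the transaction is passed to the SqlBulkCopy constructor so that a failure rolls back every row copied so far:

using (var tx = connection.BeginTransaction())
{
    try
    {
        using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, tx))
        {
            bulk.DestinationTableName = "dbo.TargetTable"; // hypothetical
            bulk.WriteToServer(dataTable);
        }
        tx.Commit();
    }
    catch
    {
        tx.Rollback(); // nothing is left half-inserted
        throw;
    }
}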
Related
I have 5 tables; for each table I want to delete the data and insert 100 records.
All operations should happen in a single transaction.
In real terms, 100 records is not much; if performance is your aim, you could try using SqlBulkCopy - this uses raw TDS, and supports transactions via the optional constructor argument. In terms of how to feed data in: SqlBulkCopy accepts either DataTable or a data reader; "FastMember" allows you to treat a List<T> or similar as a data-reader, if that helps (see example at the bottom of this page). However, you should be careful to actually time that against doing the same thing with regular parameterized TSQL operations, as for a number like 100 it isn't guaranteed that SqlBulkCopy would be faster.
An alternative to SqlBulkCopy would be "table valued parameters"; this requires more config, as the user-defined type needs to be created on the server etc., but it can work.
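A sketch of the single-transaction delete-and-reload using FastMember's ObjectReader, as mentioned above; the table name, column list and the records collection are hypothetical:

// using FastMember;  (NuGet package "FastMember")
using (var tx = connection.BeginTransaction())
{
    using (var del = new SqlCommand("DELETE FROM dbo.Table1", connection, tx))
        del.ExecuteNonQuery();

    using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, tx))
    using (var reader = ObjectReader.Create(records, "Id", "Name")) // members in column order
    {
        bulk.DestinationTableName = "dbo.Table1";
        bulk.WriteToServer(reader);
    }

    // repeat the delete + copy for the remaining four tables, then:
    tx.Commit();
}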
I am facing an issue that I hope to get solved here. I have 3 different tables in a DataSet and I want to insert them into a database table.
I know I can do this using SqlBulkCopy, but there is a catch: I want to check whether the data already exists in the database, and if it does, I want it to be updated instead of inserted.
And if the data doesn't exist in the database table, I want it inserted. Any help on this would be appreciated.
I know I could iterate through each record and fire a procedure that checks for its existence, then updates or inserts accordingly. But the data size is huge and iterating through each record would be time-consuming, so I don't want to use this approach.
Disclaimer: I'm the owner of the project Bulk Operations
This project lets you BulkInsert, BulkUpdate, BulkDelete, and BulkMerge (upsert).
Under the hood, it does almost what @marc_s suggested: use SqlBulkCopy into a temporary table, then run a MERGE statement to insert or update depending on the primary key.
var bulk = new BulkOperation(connection);
bulk.BulkMerge(dt);
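If you'd rather not take a dependency, here is a minimal hand-rolled sketch of the same staging + MERGE pattern (the target table, key and columns are hypothetical; everything runs on one connection so the temp table stays visible):

// 1. Create a staging table with the target's schema on this session.
using (var create = new SqlCommand(
    "SELECT TOP 0 * INTO #Staging FROM dbo.Target;", connection))
    create.ExecuteNonQuery();

// 2. Bulk copy the DataTable into staging.
using (var bulk = new SqlBulkCopy(connection))
{
    bulk.DestinationTableName = "#Staging";
    bulk.WriteToServer(dt);
}

// 3. Upsert from staging into the target on the primary key.
using (var cmd = new SqlCommand(@"
    MERGE dbo.Target AS t
    USING #Staging AS s ON t.Id = s.Id
    WHEN MATCHED THEN UPDATE SET t.Name = s.Name
    WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (s.Id, s.Name);", connection))
{
    cmd.ExecuteNonQuery();
}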
I have a web application that is written in MVC.Net using C# and LINQ-to-SQL (SQL Server 2008 R2).
I'd like to query the database for some values, and also insert those values into another table for later use. Obviously, I could do a normal select, then take those results and do a normal insert, but that will result in my application sending the values back to the SQL server, which is a waste as the server is where the values came from.
Is there any way I can get the select results in my application and insert them into another table without the information making a roundtrip from the SQL server to my application and back again?
It would be cool if this was in one query, but that's less important than avoiding the roundtrip.
Assume whatever basic schema you like, I'll be extrapolating your simple example to a much more complex query.
Can I Insert the Results of a Select Statement Into Another Table Without a Roundtrip?
From a "single-query" and/or "avoid the round-trip" perspective: Yes.
From a "doing that purely in Linq to SQL" perspective: Well...mostly ;-).
The three pieces required are:
The INSERT...SELECT construct:
By using this we get half of the goal in that we have selected data and inserted it. And this is the only way to keep the data entirely at the database server and avoid the round-trip. Unfortunately, this construct is not supported by Linq-to-SQL (or Entity Framework): Insert/Select with Linq-To-SQL
The T-SQL OUTPUT clause:
This allows for doing what is essentially the tee command in Unix shell scripting: save and display the incoming rows at the same time. The OUTPUT clause simply takes the set of inserted rows and sends it back to the caller, providing the other half of the goal. Unfortunately, this is also not supported by Linq-to-SQL (or Entity Framework). This type of operation can be achieved across multiple queries without OUTPUT, but nothing is really gained: you would either need to a) dump the initial results into a temp table, insert from there into the target table, and select the temp table back to the caller, or b) have some way of knowing which rows in the target table were just inserted so that they can be properly selected back to the caller.
The DataContext.ExecuteQuery<TResult> (String, Object[]) method:
This is needed due to the two required T-SQL pieces not being supported directly in Linq-to-SQL. And even if the clunky approach to avoiding the OUTPUT clause is done (assuming it could be done in pure Linq/Lambda expressions), there is still no way around the INSERT...SELECT construct that would not be a round-trip.
Hence, multiple queries that are all pure Linq/Lambda expressions equates to a round-trip.
The only way to truly avoid the round-trip should be something like:
var _MyStuff = db.ExecuteQuery<Stuffs>(#"
INSERT INTO dbo.Table1 (Col1, Col2, Col2)
OUTPUT INSERTED.*
SELECT Col1, Col2, Col3
FROM dbo.Table2 t2
WHERE t2.Col4 = {0};",
_SomeID);
And just in case it helps anyone (since I already spent the time looking it up :), the equivalent command for Entity Framework is: Database.SqlQuery<TElement> (String, Object[])
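For example, a sketch reusing the hypothetical names from the query above (ctx is assumed to be your DbContext):

var _MyStuff = ctx.Database.SqlQuery<Stuffs>(@"
    INSERT INTO dbo.Table1 (Col1, Col2, Col3)
    OUTPUT INSERTED.*
    SELECT Col1, Col2, Col3
    FROM dbo.Table2 t2
    WHERE t2.Col4 = {0};",
    _SomeID).ToList(); // ToList() forces execution of the lazily-enumerated query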
Try this query, adjusted to your requirement:

INSERT INTO IndentProcessDetails (DemandId, DemandMasterId, DemandQty)
SELECT DemandId, DemandMasterId, DemandQty FROM DemandDetails
I'm looking for an efficient way of inserting records into SQL server for my C#/MVC application. Anyone know what the best method would be?
Normally I've just done a while loop and insert statement within, but then again I've not had quite so many records to deal with. I need to insert around half a million, and at 300 rows a minute with the while loop, I'll be here all day!
What I'm doing is looping through a large holding table and using its rows to create records in a different table. I've set up some functions to look up data needed for the new table, and this is no doubt adding to the drain.
So here is the query I have. Extremely inefficient for large amounts of data!
Declare @HoldingID int
Set @HoldingID = (Select min(HoldingID) From Holding)

While @HoldingID IS NOT NULL
Begin
    Insert Into Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
    Select
        dbo.GetHubIDFromName(StartHubName),
        dbo.GetHubIDFromName(EndHubName),
        dbo.GetBusIDFromName(CompanyName),
        JourneyNo, 1
    From Holding
    Where HoldingID = @HoldingID

    Set @HoldingID = (Select MIN(HoldingID) From Holding Where HoldingID > @HoldingID)
End
I've heard about set-based approaches - is there anything that might work for the above problem?
If you want to insert a lot of data into SQL Server then you should use bulk inserts - there is a command-line tool called the bcp utility for this, and also a C# wrapper for performing bulk copy operations (SqlBulkCopy), but under the covers they are all using BULK INSERT.
Depending on your application you may want to insert your data into a staging table first, and then either MERGE or INSERT INTO ... SELECT to transfer those rows from the staging table(s) to the target table(s) - if you have a lot of data then this will take some time, however it will be a lot quicker than performing the inserts individually.
If you want to speed this up then there are various things that you can do, such as changing the recovery model or tweaking / removing triggers and indexes (depending on whether or not this is a live database). If it's still really slow then you should look into doing this process in batches (e.g. 1000 rows at a time; see the sketch below).
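With SqlBulkCopy the batching is built in; a minimal sketch, assuming an open connection and a DataTable loaded from the holding table (the Journeys table name comes from the question):

using (var bulk = new SqlBulkCopy(connection))
{
    bulk.DestinationTableName = "dbo.Journeys";
    bulk.BatchSize = 1000;     // commit rows to the server in 1000-row batches
    bulk.BulkCopyTimeout = 0;  // no timeout for a large load
    bulk.WriteToServer(holdingTable);
}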
This should do in a single set-based statement exactly what your loop is doing now:
Insert Into Journeys(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
ORDER BY HoldingID ASC
You can (probably) do it in one statement of the form:
INSERT INTO JOURNEYS
SELECT * FROM HOLDING;
Without more information about your schema it is difficult to be absolutely sure.
SQL Server 2008 introduced table-valued parameters. These allow you to insert multiple rows in a single trip to the database (sent as one large blob), without using a temporary table. This article describes how it works (step four in the article):
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
It differs from bulk inserts in that you do not need special utilities and that all constraints and foreign keys are checked.
I quadrupled my throughput using this together with parallelizing the inserts, and am now at a sustained 15,000 inserts/second into the same table - a regular table with indexes and over a billion rows.
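A minimal sketch of a TVP call; it assumes a user-defined table type has already been created on the server (e.g. CREATE TYPE dbo.IdNameType AS TABLE (Id int, Name nvarchar(50))) - the type, table and column names here are hypothetical:

var rows = new DataTable();
rows.Columns.Add("Id", typeof(int));
rows.Columns.Add("Name", typeof(string));
rows.Rows.Add(1, "example"); // add all rows client-side, send them in one trip

using (var cmd = new SqlCommand(
    "INSERT dbo.Target (Id, Name) SELECT Id, Name FROM @rows;", connection))
{
    var p = cmd.Parameters.AddWithValue("@rows", rows);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.IdNameType"; // must match the server-side table type
    cmd.ExecuteNonQuery();
}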
How would I get the primary key ID number from a Table without making a second trip to the database in LINQ To SQL?
Right now, I submit the data to a table, and make another trip to figure out what id was assigned to the new field (in an auto increment id field). I want to do this in LINQ To SQL and not in Raw SQL (I no longer use Raw SQL).
Also, the second part of my question: I am always careful to keep track of the ID of the user who is online, because I'd rather look up their information in various tables using their ID than a GUID or a username, which are longer strings. I do this because I think SQL Server doing a numeric compare is much (?) more efficient than comparing a username (string) or even a GUID (a very long string). My question is: am I more concerned than I should be? Is the difference worth always keeping the user ID (int32) in, say, session state?
@RedFilter provided some interesting/promising leads for the first question. I am at this stage unable to try them - can anyone confirm the changes he recommended in the comments section of his answer?
If you have a reference to the object, you can just use that reference and read the primary key after you call db.SubmitChanges(). The LINQ object will automatically update its identity/primary key field to reflect the new value assigned to it by SQL Server.
Example (vb.net):
Dim db As New NorthwindDataContext
Dim prod As New Product
prod.ProductName = "cheese!"
db.Products.InsertOnSubmit(prod)
db.SubmitChanges()
MessageBox.Show(prod.ProductID.ToString())
You could probably include the above code in a function and return the ProductID (or equivalent primary key) and use it somewhere else.
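The same thing in C#, wrapped in a function as suggested (a sketch; the context and entity names follow the Northwind example above):

int InsertProduct(string name)
{
    using (var db = new NorthwindDataContext())
    {
        var prod = new Product { ProductName = name };
        db.Products.InsertOnSubmit(prod);
        db.SubmitChanges();    // LINQ to SQL refreshes the identity here
        return prod.ProductID; // already populated; no second query needed
    }
}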
EDIT: If you are not doing atomic updates, you could add each new product to a separate Collection and iterate through it after you call SubmitChanges. I wish LINQ provided a 'database sneak peek' like a dataset would.
Unless you are doing something out of the ordinary, you should not need to do anything extra to retrieve the primary key that is generated.
When you call SubmitChanges on your Linq-to-SQL datacontext, it automatically updates the primary key values for your objects.
Regarding your second question - there may be a small performance improvement by doing a scan on a numeric field as opposed to something like varchar() but you will see much better performance either way by ensuring that you have the correct columns in your database indexed. And, with SQL Server if you create a primary key using an identity column, it will by default have a clustered index over it.
Linq to SQL automatically sets the identity value of your class with the ID generated when you insert a new record. Just access the property. I don't know if it uses a separate query for this or not, having never used it, but it is not unusual for ORMs to require another query to get back the last inserted ID.
Two ways you can do this independent of Linq To SQL (that may work with it):
1) If you are using SQL Server 2005 or higher, you can use the OUTPUT clause:
Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, or DELETE statement. These results can be returned to the processing application for use in such things as confirmation messages, archiving, and other such application requirements. Alternatively, results can be inserted into a table or table variable.
2) Alternatively, you can construct a batch INSERT statement like this:
insert into MyTable
(field1)
values
('xxx');
select scope_identity();
which works at least as far back as SQL Server 2000.
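From C# you would run that batch with ExecuteScalar, something like this sketch (the table and column are the hypothetical ones above; SCOPE_IDENTITY() returns numeric, hence the conversion):

using (var cmd = new SqlCommand(
    "insert into MyTable (field1) values (@v); select scope_identity();", connection))
{
    cmd.Parameters.AddWithValue("@v", "xxx");
    int newId = Convert.ToInt32(cmd.ExecuteScalar()); // identity of the row just inserted
}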
In T-SQL, you could use the OUTPUT clause, saying:
INSERT table (columns...)
OUTPUT inserted.ID
SELECT columns...
So if you can configure LINQ to use that construct for doing inserts, then you can probably get it back easily. But whether LINQ can get a value back from an insert, I'll let someone else answer that.
Calling a stored procedure from LINQ that returns the ID as an output parameter is probably the easiest approach.
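For completeness, the plain ADO.NET shape of such a call - the procedure name and parameters are hypothetical, and with LINQ to SQL you would normally drag the procedure onto the designer and call the generated method instead:

using (var cmd = new SqlCommand("dbo.InsertProduct", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@Name", "cheese!");
    var idParam = cmd.Parameters.Add("@NewId", SqlDbType.Int);
    idParam.Direction = ParameterDirection.Output;
    cmd.ExecuteNonQuery();
    int newId = (int)idParam.Value; // the ID assigned by the insert
}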