Lately, in the apps I've been developing, I have been checking the number of rows affected by an insert, update, or delete against the database and logging an error if the number is unexpected. For example, on a simple insert, update, or delete of one row, if ExecuteNonQuery() returns any number other than one, I consider that an error and log it. I also realize, as I type this, that I don't even try to roll back the transaction when that happens, which is not best practice and should definitely be addressed. Anyway, here's code to illustrate what I mean:
I'll have a data layer function that makes the call to the db:
public static int DLInsert(Person person)
{
Database db = DatabaseFactory.CreateDatabase("dbConnString");
using (DbCommand dbCommand = db.GetStoredProcCommand("dbo.Insert_Person"))
{
db.AddInParameter(dbCommand, "#FirstName", DbType.Byte, person.FirstName);
db.AddInParameter(dbCommand, "#LastName", DbType.String, person.LastName);
db.AddInParameter(dbCommand, "#Address", DbType.Boolean, person.Address);
return db.ExecuteNonQuery(dbCommand);
}
}
Then a business layer call to the data layer function:
public static bool BLInsert(Person person)
{
if (DLInsert(person) != 1)
{
// log exception
return false;
}
return true;
}
And in the code-behind or view (I do both webforms and mvc projects):
if (BLInsert(person))
{
// carry on as normal with whatever other code after successful insert
}
else
{
// throw an exception that directs the user to one of my custom error pages
}
The more I use this type of code, the more I feel like it is overkill. Especially in the code-behind/view. Is there any legitimate reason to think a simple insert, update, or delete wouldn't actually modify the correct number of rows in the database? Is it more plausible to only worry about catching an actual SqlException and then handling that, instead of doing the monotonous check for rows affected every time?
Thanks. Hope you all can help me out.
UPDATE
Thanks everyone for taking the time to answer. I still haven't 100% decided what setup I will use going forward, but here's what I have taken away from all of your responses.
Trust the DB and .Net libraries to handle a query and do their job as they were designed to do.
Use transactions in my stored procedures to roll back the query on any errors, and potentially use RAISERROR to throw those errors back to the .NET code as a SqlException, which I could handle with a try/catch (see the sketch below). This approach would replace the problematic return-code checking.
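To make sure I understand that second point, here's a rough sketch of what I'm picturing on the .NET side. DLInsert and dbo.Insert_Person are from my example above; the stored procedure would wrap its work in BEGIN TRY / BEGIN CATCH, ROLLBACK in the CATCH block, and RAISERROR with severity 16 or higher so ADO.NET surfaces it as a SqlException (System.Data.SqlClient):
try
{
    DLInsert(person);   // ExecuteNonQuery against dbo.Insert_Person
}
catch (SqlException ex)
{
    // log the error raised by the stored procedure, then rethrow or
    // translate it into a business-level failure
    Console.Error.WriteLine(ex.Message);
    throw;
}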
Would there be any issue with the second bullet point that I am missing?
I guess the question becomes, "Why are you checking this?" If it's just because you don't trust the database to perform the query, then it's probably overkill. However, there could exist a logical reason to perform this check.
For example, I worked at a company once where this method was employed to check for concurrency errors. When a record was fetched from the database to be edited in the application, it would come with a LastModified timestamp. Then the standard CRUD operations in the data access layer would include a WHERE LastModified = @LastModified clause when doing an UPDATE and check the record-modified count. If no record was updated, it would assume a concurrency error had occurred.
I felt it was kind of sloppy for concurrency checking (especially the part about assuming the nature of the error), but it got the job done for the business.
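As a rough sketch of what that looked like (the table, column, and property names here are invented, and connection is assumed to be an open SqlConnection):
// Optimistic concurrency via a timestamp column: the UPDATE only matches if
// nobody has modified the row since it was read.
using (var cmd = new SqlCommand(
    "UPDATE Person SET FirstName = @FirstName, LastModified = GETUTCDATE() " +
    "WHERE PersonId = @PersonId AND LastModified = @LastModified", connection))
{
    cmd.Parameters.AddWithValue("@FirstName", person.FirstName);
    cmd.Parameters.AddWithValue("@PersonId", person.Id);
    cmd.Parameters.AddWithValue("@LastModified", person.LastModified);

    if (cmd.ExecuteNonQuery() == 0)
    {
        // no row matched the original timestamp, so assume another user
        // changed the record since it was loaded
        throw new DBConcurrencyException("Person was modified by another user.");
    }
}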
What concerns me more in your example is the structure of how this is being accomplished. The 1 or 0 being returned from the data access code is a "magic number." That should be avoided. It's leaking an implementation detail from the data access code into the business logic code. If you do want to keep using this check, I'd recommend moving the check into the data access code and throwing an exception if it fails. In general, return codes should be avoided.
Edit: I just noticed a potentially harmful bug in your code as well, related to my last point above. What if more than one record is changed? It probably won't happen on an INSERT, but could easily happen on an UPDATE. Other parts of the code might assume that != 1 means no record was changed. That could make debugging very problematic :)
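If you do decide to keep the check, here's a sketch of what I mean by moving it into the data access code, reusing your DLInsert (DataException is just one reasonable exception type to use here):
public static void DLInsert(Person person)
{
    Database db = DatabaseFactory.CreateDatabase("dbConnString");
    using (DbCommand dbCommand = db.GetStoredProcCommand("dbo.Insert_Person"))
    {
        db.AddInParameter(dbCommand, "@FirstName", DbType.String, person.FirstName);
        db.AddInParameter(dbCommand, "@LastName", DbType.String, person.LastName);
        db.AddInParameter(dbCommand, "@Address", DbType.String, person.Address);

        int rowsAffected = db.ExecuteNonQuery(dbCommand);
        if (rowsAffected != 1)
        {
            // the actual count goes into the message, so "0 rows" and
            // "more than 1 row" remain distinguishable when debugging
            throw new DataException(
                "dbo.Insert_Person affected " + rowsAffected + " rows; expected exactly 1.");
        }
    }
}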
On the one hand, most of the time everything should behave the way you expect, and on those times the additional checks don't add anything to your application. On the other hand, if something does go wrong, not knowing about it means that the problem may become quite large before you notice it. In my opinion, the little bit of additional protection is worth the little bit of extra effort, especially if you implement a rollback on failure. It's kinda like an airbag in your car... it doesn't really serve a purpose if you never crash, but if you do it could save your life.
I've always preferred to RAISERROR in my sproc and handle exceptions rather than counting rows. This way, if you update a sproc to do something else, like logging/auditing, you don't have to worry about keeping the row counts in check.
Though if you like the second check in your code, or would prefer not to deal with exceptions/RAISERROR, I've seen teams return 0 on successful sproc executions for every sproc in the db, and return another number otherwise.
It is absolutely overkill. You should trust that your core platform (.NET libraries, SQL Server) works correctly; you shouldn't be worrying about that.
Now, there are some related instances where you might want to test, like if transactions are correctly rolled back, etc.
If there is a need for that check, why not do it within the database itself? You save yourself a round trip, and it's done at a more 'centralized' stage: if you check in the database, you can be assured the rule is applied consistently for any application that hits that database. Whereas if you put the logic in the UI, you need to make sure that every UI application that hits that particular database applies the correct logic, and does so consistently.
Related
Scenario: my query variable is dynamic; there are 4 possible values for it depending on the report type (_reportType). That means there are 4 different queries, and some of them don't have @STAFF in the WHERE condition. So my question is: is it safe to just leave my
dBCommand.AddParameter("@STAFF", staff)
there, or should I include an if-else condition just to be safe?
Like this
if(_reportType == 1)
{
dBCommand.AddParameter("#STAFF", staff);
}
else if (_reportType == 2)
{
//code
}
else if (_reportType == 3)
{
//code
}
else
{
//Don't add dBCommand.AddParameter("@STAFF", staff);
}
Is it safe to just leave AddParameter("@STAFF", staff) there even though I'm not going to use it in the query?
For example, I'm going to write
dBCommand.Initialize(string.Format(query, "RetailTable"), batch);
dBCommand.AddParameter("#STAFF", staff);
But the query value doesn't have #STAFF in the WHERE condition
It should generally be ok to specify unused parameters, aside from the minor overhead of sending the value to the server. The exception is if you execute DDL queries that have a restriction of being the only statement in the batch (e.g. CREATE VIEW). Those would fail due to the parameter.
There are 2 glaring bad practices in your approach:
1. Generating dynamic queries within the code.
This approach has many drawbacks and possible security loopholes. You should almost always avoid doing that.
Please go through the following links to understand this more:
https://codingsight.com/dynamic-sql-vs-stored-procedure/
https://www.ecanarys.com/Blogs/ArticleID/112/SQL-injection-attack-and-prevention-using-stored-procedure
2. Trying to use a generic WHERE clause that fits all your variations.
This approach is a disaster waiting to happen, regardless of whether the query is written in your application code OR in a stored procedure.
This is an ugly code-smell and a maintenance nightmare.
No developer can ever be 100% sure that no change will be required during the lifespan of the application, due to the simple fact that the client WILL need enhancements on a regular basis.
So, even if this approach works for you for a small period of time, it will eventually backfire.
Assume that, over time, a few more filter parameters are added due to new requirements. Now imagine what your code would look like and the problems it could create if they are not handled properly, especially when YOU are not the one making those changes. Scary, right?
Always write code that will not only be easier to read and understand, but also easy to enhance and maintain, regardless of the person writing the code.
So, IMHO, you should add those if-else conditions OR use switch-case blocks to safeguard yourself and your client (see the sketch below). It may look like overkill at the start, but it will surely pay off in the future.
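As a rough sketch only (queryWithStaffFilter and queryWithoutStaffFilter are hypothetical names, not from your code), the idea is to keep each report type's query and its parameters together so they cannot drift apart:
switch (_reportType)
{
    case 1:
        dBCommand.Initialize(string.Format(queryWithStaffFilter, "RetailTable"), batch);
        dBCommand.AddParameter("@STAFF", staff);
        break;
    case 2:
        dBCommand.Initialize(string.Format(queryWithoutStaffFilter, "RetailTable"), batch);
        break;
    // report types 3 and 4 follow the same pattern
    default:
        throw new ArgumentOutOfRangeException("_reportType");
}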
Hope this helps!
I get the feeling that it is, but I wanted to confirm:
is it bad form to do something like:
try
{
SqlUpload(table);
}
catch(PrimaryKeyException pke)
{
DeleteDuplicatesInTable(table);
SqlUpload(table);
}
catch(Exception ex)
{
Console.Write(ex);
}
Is it better to do this to potentially save on efficiency in case the table doesn't have duplicates, or is it better to run the delete duplicates bit anyway? (I'm assuming conditions where the table will upload as long as there are no duplicates within the table itself). Also, given how try-catch statements impact performance, would it even be faster to do it this way at all?
I apologize for the crude nature of this example, but it was just to illustrate a point.
Exceptions can be used correctly in transaction management, but that is not the case in this example. On first look, it appeared that this was similar to what Linq2Sql's DataContext class does in its call to SubmitChanges(). However, that analogy was incorrect. (Please see Chris Marisic's comment on my post for an accurate criticism of the comparison.)
On exceptions
In general, if there is some issue that is likely to be encountered, you should check for it first. Exceptions should be used when a response is truly "exceptional" (meaning it is unexpected in the context of proper usage). If the proper usage of a function within a completely valid context throws an exception, then you are probably using exceptions incorrectly.
Excerpt from DataContext.SubmitChanges
This code shows an example of the correct usage of exceptions in transaction management.
Note: just because Microsoft does it doesn't automatically mean it's right. However, their code does have a pretty good track record.
bool flag = false;
DbTransaction transaction = null;
try
{
try
{
if (this.provider.Connection.State == ConnectionState.Open)
{
this.provider.ClearConnection();
}
if (this.provider.Connection.State == ConnectionState.Closed)
{
this.provider.Connection.Open();
flag = true;
}
transaction = this.provider.Connection.BeginTransaction(IsolationLevel.ReadCommitted);
this.provider.Transaction = transaction;
new ChangeProcessor(this.services, this).SubmitChanges(failureMode);
this.AcceptChanges();
this.provider.ClearConnection();
transaction.Commit();
}
catch
{
if (transaction != null)
{
try
{
transaction.Rollback();
}
catch
{
}
}
throw;
}
return;
}
finally
{
this.provider.Transaction = null;
if (flag)
{
this.provider.Connection.Close();
}
}
Yes, this type of code is considered bad form in .NET.
You would be better off writing code similar to:
if (HasPrimaryKeyViolations(table))
    DeletePrimaryKeyViolations(table);

SqlUpload(table);
Reading this code, I would assume that primary key violations are an exceptional case and not expected. If they are expected, remove the duplicates beforehand; do not rely on exception handling for an expected case.
In all common languages/interpreters/compilers, exception handling is implemented to have a minimal performance impact when an exception isn't raised -- under the hood, adding an exception handler is usually just pushing a single value onto a stack or something similar. Just adding a try block won't usually have a performance impact.
On the other hand, when an exception is actually raised, things can get very slow very fast. It's the trade-off for being able to add the try block without worrying about the cost, and it's usually seen as acceptable, because you only take the performance hit if something unexpected has already gone wrong somewhere else.
So, in theory, if there is a condition that you expect to happen, use an if instead. It's semantically better, because it expresses to the reader that the bad condition is probably going to happen from time to time (e.g., the user types in some invalid input), while the try expresses something that you hope never happens (the data source is corrupt). As above, it's also going to be easier on performance.
Of course, rules are defined by their exceptions (no pun intended). In practice, there are two situations where this becomes the wrong answer:
First, if you are performing a complex operation like parsing a file, and it's all-or-nothing -- if one field in the file is corrupt, you want to bail on the whole operation. Exceptions allow you to jump out of the whole complex process up to an exception handler encapsulating the entire parse. Sure, you could litter the parsing code with checks, return values, and checks on those return values -- but it's going to be a lot cleaner just to throw the exception and let it rise up to the top of the operation. Even if you expect that the input is going to be bad sometimes, if there isn't a reasonable way to handle the error exactly at the point where it occurs, use exceptions to let the error rise up to a more appropriate place to handle it. That's really what exceptions were for in the first place -- getting rid of all that error-handling code down in the details, and moving it to one consolidated, reasonable place.
Second, a library might not let you make the choice. For example, int.TryParse is a good alternative to int.Parse if the input hasn't already been vetted. On the other hand, a library might not have a non-exception-throwing option. In that case, don't brew up your own code to check without exceptions -- though it might be bad form to use exception handling to check for an expected condition, it's worse form to duplicate the functionality of the library just to avoid an exception. Just use the try/catch, and maybe add a snide comment about how you didn't WANT to do it, but the library authors MADE you :).
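To make the int.TryParse point concrete (a sketch only; userInput, configuredValue, UseValue, and ShowValidationMessage are placeholder names):
// Expected bad input: check, don't catch.
int value;
if (int.TryParse(userInput, out value))
{
    UseValue(value);
}
else
{
    ShowValidationMessage("Please enter a whole number.");
}

// Input that should already be valid by the time it gets here: let it throw.
int vetted = int.Parse(configuredValue);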
For your particular example, probably stick with the exception. While exception handling isn't considered 'fast,' it's still faster than a round trip to the database, and there isn't going to be a reasonable way to check for that condition without sending the command anyway. Plus, hitting a database means interfacing with an external system -- that in itself is a pretty good reason to expect the unexpected -- and exception handling almost always makes sense when you are leaving your own controlled environment.
Or, more specifically to your example, you might consider using a stored procedure with a MERGE statement, using the source data in table to update or insert as appropriate; an update will be a little friendlier on all fronts than a delete-then-insert for existing keys.
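For what it's worth, the shape of such a statement might look roughly like the sketch below. All the table, column, and parameter names are invented, connection is assumed to be an open SqlConnection, and it handles one row at a time just to show the form; in practice you would feed the procedure the whole table, for example via a table-valued parameter:
// Rough sketch of an upsert via MERGE instead of delete-then-insert.
const string upsertSql = @"
    MERGE dbo.TargetTable AS target
    USING (SELECT @Id AS Id, @Name AS Name) AS source
        ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET Name = source.Name
    WHEN NOT MATCHED THEN
        INSERT (Id, Name) VALUES (source.Id, source.Name);";

using (var cmd = new SqlCommand(upsertSql, connection))
{
    cmd.Parameters.AddWithValue("@Id", row.Id);
    cmd.Parameters.AddWithValue("@Name", row.Name);
    cmd.ExecuteNonQuery();
}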
Try/catch blocks are expensive in terms of performance, so don't use them as control structures. Instead, use triggers at the database level.
One problem is that any exception thrown by the call to SqlUpload() inside the catch block will crash the application, because there is no further exception handling there.
You'll probably get a few different opinions on this, but mine is that try...catch should be used for things that shouldn't normally happen (although sometimes it's unavoidable). So by that rule, you should ask whether the duplicates appear in the table through normal execution of the program, or whether they should not exist at all given allowable execution of the program.
Just to clarify, I'm saying "normal" usage of the program, not "correct" usage (e.g. when I test it the duplicates don't appear (correct usage) but when the customer uses it they do (perhaps incorrect but normal), so I need to get rid of them). It's more that the duplicates would only appear in a way that the program cannot control (e.g. sometimes someone goes into the database and adds a duplicate row (hopefully not a normal situation)).
Something like duplicate rows, though, is likely a symptom of some other bug, so you shouldn't mask it but try and find the root cause so that the deletion isn't necessary.
I have to import about 30k rows from a CSV file into my SQL database, and sadly this takes 20 minutes.
Troubleshooting with a profiler shows me that DbSet.Add is taking the most time, but why?
I have these Entity Framework Code-First classes:
public class Article
{
// About 20 properties, each property doesn't store excessive amounts of data
}
public class Database : DbContext
{
public DbSet<Article> Articles { get; set; }
}
For each item in my for loop I do:
db.Articles.Add(article);
Outside the for loop I do:
db.SaveChanges();
It's connected to my local SQL Express server, but I guess nothing is written until SaveChanges is called, so I guess the server isn't the problem.
As per Kevin Ramen's comment (Mar 29)
I can confirm that setting db.Configuration.AutoDetectChangesEnabled = false makes a huge difference in speed
Running Add() on 2324 items with the default settings took 3 min 15 sec on my machine; disabling the auto-detection resulted in the operation completing in 0.5 sec.
http://blog.larud.net/archive/2011/07/12/bulk-load-items-to-a-ef-4-1-code-first-aspx
I'm going to add to Kervin Ramen's comment by saying that if you are only doing inserts (no updates or deletes) then you can, in general, safely set the following properties before doing any inserts on the context:
DbContext.Configuration.AutoDetectChangesEnabled = false;
DbContext.Configuration.ValidateOnSaveEnabled = false;
I was having a problem with a once-off bulk import at my work. Without setting the above properties, adding about 7500 complicated objects to the context was taking over 30 minutes. Setting the above properties (so disabling EF checks and change tracking) reduced the import down to seconds.
But, again, I stress: only use this if you are doing inserts. If you need to mix inserts with updates/deletes, you can split your code into two paths and disable the EF checks for the insert part, then re-enable the checks for the update/delete path. I have used this approach successfully to get around the slow DbSet.Add() behaviour.
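As a sketch of what I mean (articlesToImport is a placeholder for your collection; db is the context from the question):
// Insert-only fast path: disable the checks, add everything, save, then
// restore the original settings before any updates/deletes on this context.
bool detectChanges = db.Configuration.AutoDetectChangesEnabled;
bool validate = db.Configuration.ValidateOnSaveEnabled;
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
try
{
    foreach (var article in articlesToImport)
    {
        db.Articles.Add(article);
    }
    db.SaveChanges();
}
finally
{
    db.Configuration.AutoDetectChangesEnabled = detectChanges;
    db.Configuration.ValidateOnSaveEnabled = validate;
}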
Each item in a unit-of-work has overhead, as it must check (and update) the identity manager, add to various collections, etc.
The first thing I would try is batching into, say, groups of 500 (change that number to suit), starting with a fresh (new) object-context each time - as otherwise you can reasonably expect telescoping performance. Breaking it into batches also prevents a megalithic transaction bringing everything to a stop.
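Something along these lines, as a sketch only, using the Database context from the question (articles is a placeholder for the parsed CSV rows, the batch size of 500 is arbitrary, and Skip/Take need using System.Linq):
// Batch the import: a fresh context per batch keeps the change tracker small
// and avoids one giant transaction.
const int batchSize = 500;
for (int i = 0; i < articles.Count; i += batchSize)
{
    using (var db = new Database())
    {
        foreach (var article in articles.Skip(i).Take(batchSize))
        {
            db.Articles.Add(article);
        }
        db.SaveChanges();
    }
}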
Beyond that: SqlBulkCopy. It is designed for large imports with minimal overhead. It isn't EF, though.
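For comparison, a minimal SqlBulkCopy sketch (connectionString and articlesDataTable are placeholders, and the DataTable's columns are assumed to match the destination table):
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        // destination table name is assumed; map columns explicitly if they differ
        bulkCopy.DestinationTableName = "dbo.Articles";
        bulkCopy.WriteToServer(articlesDataTable);
    }
}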
There is an extremely easy-to-use and very fast extension here:
https://efbulkinsert.codeplex.com/
It's called "Entity Framework Bulk Insert".
The extension itself is in the namespace EntityFramework.BulkInsert.Extensions. So to bring the extension method into scope, add a using directive:
using EntityFramework.BulkInsert.Extensions;
And then you can do this
context.BulkInsert(entities);
BTW, if you do not wish to use this extension for some reason, you could also try, instead of running db.Articles.Add(article) for each article, building up a list of several articles at a time and then using AddRange (new in EF version 6, along with RemoveRange) to add them to the DbContext together.
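Roughly like this (articles here stands for whatever batch of entities you have built up):
using (var db = new Database())
{
    db.Articles.AddRange(articles);   // one call instead of thousands of Add() calls
    db.SaveChanges();
}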
I haven't really tried this, but my approach would be to use the ODBC driver to load the file into a DataTable and then pass that DataTable to a SQL stored procedure as a table-valued parameter.
For the first part, try:
http://www.c-sharpcorner.com/UploadFile/mahesh/AccessTextDb12052005071306AM/AccessTextDb.aspx
For the second part try this for SQL procedure:
http://www.builderau.com.au/program/sqlserver/soa/Passing-table-valued-parameters-in-SQL-Server-2008/0,339028455,339282577,00.htm
Then create a SqlCommand object in C# and add to its Parameters collection a SqlParameter whose SqlDbType is SqlDbType.Structured.
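Something like this, as a sketch (the procedure name, parameter name, and table type name are placeholders):
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportArticles", connection))
{
    command.CommandType = CommandType.StoredProcedure;

    // pass the DataTable as a table-valued parameter
    SqlParameter tableParam = command.Parameters.AddWithValue("@Articles", dataTable);
    tableParam.SqlDbType = SqlDbType.Structured;
    tableParam.TypeName = "dbo.ArticleTableType";   // user-defined table type on the server

    connection.Open();
    command.ExecuteNonQuery();
}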
Well, I hope it helps.
I am going through this MSDN article by noted DDD expert Udi Dahan, where he makes a great observation that he said took him years to realize: "Bringing all e-mail addresses into memory would probably get you locked up by the performance police. Even having the domain model call some service, which calls the database, to see if the e-mail address is there is unnecessary. A unique constraint in the database would suffice."
In a LOB presentation that captures some add-or-edit scenario, you wouldn't enable the Save-type action until all edits were considered valid, so the first trade-off of doing the above is that you need to enable Save anyway and be prepared to notify the user if the uniqueness constraint is violated. But how best to do that, say, with NHibernate?
I figure it needs to follow the lines of the pseudo-code below. Does anyone do something along these lines now?
Cheers,
Berryl
try
{
    // attempt the save / session flush here
}
catch (GenericADOException)
{
// "Abort due to constraint violation\r\ncolumn {0} is not unique", columnName)
//(1) determine which db column violated uniqueness
//(2) potentially map the column name to something in context to the user
//(3) throw that can be translated into a BrokenRule for the UI presentation
//(4) reset the nhibernate session
}
The pessimistic approach is to check for unique-ness before saving; the optimistic approach is to attempt the save and handle the exception. The biggest problem with the optimistic approach is that you have to be able to parse the exception returned by the database to know that it's a specific unique constraint violation rather than the myriad of other things that can go wrong.
For this reason, it's much easier to check uniqueness before saving. It's a trivial database call to make this check: something along the lines of select 1 from Users where Email = 'newuser@somewhere.com'. It's also a better user experience to notify the user that the value is a duplicate (perhaps they already registered with the site?) before making them fill out the rest of the form and click save.
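A minimal sketch of that pre-check, written with plain ADO.NET for clarity (with NHibernate you'd issue the equivalent query through the session; the table and column names are placeholders, and connection is assumed to be an open SqlConnection):
using (var cmd = new SqlCommand(
    "select count(*) from Users where Email = @Email", connection))
{
    cmd.Parameters.AddWithValue("@Email", email);
    bool isDuplicate = (int)cmd.ExecuteScalar() > 0;
    if (isDuplicate)
    {
        // surface a broken rule / validation message in the UI instead of saving
    }
}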
The unique constraint should definitely be in place, but the UI should check that the address is unique at the time it is entered on the form.
I have a simple question for you (I hope) :)
I have pretty much always used void as a "return" type when doing CRUD operations on data.
E.g. consider this code:
public void Insert(IAuctionItem item) {
if (item == null) {
AuctionLogger.LogException(new ArgumentNullException("item is null"));
}
_dataStore.DataContext.AuctionItems.InsertOnSubmit((AuctionItem)item);
_dataStore.DataContext.SubmitChanges();
}
and then consider this code:
public bool Insert(IAuctionItem item) {
if (item == null) {
AuctionLogger.LogException(new ArgumentNullException("item is null"));
}
_dataStore.DataContext.AuctionItems.InsertOnSubmit((AuctionItem)item);
_dataStore.DataContext.SubmitChanges();
return true;
}
It actually just comes down to whether or not you should notify the caller that something was inserted (and went well)?
I typically go with the first option there.
Given your code, if something goes wrong with the insert there will be an Exception thrown.
Since you have no try/catch block around the Data Access code, the calling code will have to handle that Exception...thus it will know both if and why it failed. If you just returned true/false, the calling code will have no idea why there was a failure (it may or may not care).
I think it would make more sense if, in the case where "item == null", you returned "false". That would indicate it is a case you expect to happen not infrequently, and that you therefore don't want it to raise an exception, while the calling code can still handle the "false" return value.
As it stands, you'll either return "true" or there'll be an exception - that doesn't really help you much.
Don't fight the framework you happen to be in. If you are writing C code, where return values are the most common mechanism for communicating errors (for lack of a better built in construct), then use that.
The .NET base class libraries use exceptions to communicate errors, and their absence means everything is okay. Because almost all code uses the BCL, much of it will be written to expect exceptions; when it gets to a library written as if C# were C with no support for exceptions, each invocation will need to be wrapped in an if (!myObject.DoSomething) { Console.WriteLine("Damn"); } block.
For the next developer to use your code (which could be you after a few years when you've forgotten how you originally did it), it will be a pain to start writing all the calling code to take advantage of having error conditions passed as return values, as changes to values in an output parameter, as custom events, as callbacks, as messages to queue or any of the other imaginable ways to communicate failure or lack thereof.
I think it depends. Imagine that your user wants to add a new post to a forum, and the add fails for some reason. If you don't tell the user, they will never know that something went wrong. The best way is to throw another exception with a nice message for them.
And if it does not relate to the user, and you have already logged it to the database log, then you don't need to worry about returning anything at all.
I think it is a good idea to notify the user whether the operation went well or not. Regardless of how much you test your code and try to think outside the box, it is likely that at some point during its lifetime the software will encounter a problem you did not cater for, making it behave incorrectly. Notifications, in my opinion, allow the user to take action, a sort of Plan B, if you like, for when the program fails. That action can be a simple workaround, or informing people in the IT department so that they can fix it.
I'd rather click that extra "OK" button than learn that something went wrong when it is too late.
You should stick with void. If you need more data, use dedicated output values or variables for it, since you'll often need specific data (and it can be more than one number/string), and an exception mechanism is a good solution for handling errors.
So if you want to know how many rows were affected, whether a stored procedure returned something, etc., a single return value will limit you.