I have studied Dapper and ADO.NET and performed select tests on both, and I found that sometimes ADO.NET is faster than Dapper and sometimes it is the reverse. I understand this could be a database issue, as I am using SQL Server. It is said that reflection is slow, and I am using reflection in my ADO.NET code. So can anyone tell me which approach is the fastest?
Here is what I coded.
Using ADO.NET
DashboardResponseModel dashResp = null;
SqlConnection conn = new SqlConnection(connStr);
try
{
    SqlCommand cmd = new SqlCommand("spGetMerchantDashboard", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@MID", mid);
    conn.Open();
    var dr = cmd.ExecuteReader();
    // MapToList is my reflection-based mapping extension on the data reader
    List<MerchantProduct> lstMerProd = dr.MapToList<MerchantProduct>();
    List<MerchantPayment> lstMerPay = dr.MapToList<MerchantPayment>();
    if (lstMerProd != null || lstMerPay != null)
    {
        dashResp = new DashboardResponseModel();
        dashResp.MerchantProduct = lstMerProd ?? new List<MerchantProduct>();
        dashResp.MerchantPayment = lstMerPay ?? new List<MerchantPayment>();
    }
    dr.Close();
}
finally
{
    conn.Close();
}
return dashResp;
Using Dapper
DashboardResponseModel dashResp = null;
var multipleresult = db.QueryMultiple("spGetMerchantDashboard",
    new { mid = mid }, commandType: CommandType.StoredProcedure);
var merchantproduct = multipleresult.Read<MerchantProduct>().ToList();
var merchantpayment = multipleresult.Read<MerchantPayment>().ToList();
if (merchantproduct.Count > 0 || merchantpayment.Count > 0)
    dashResp = new DashboardResponseModel
    {
        MerchantProduct = merchantproduct,
        MerchantPayment = merchantpayment
    };
return dashResp;
Dapper basically straddles ADO.NET as a very thin abstraction - so in theory it can't be faster than well written ADO.NET code (although to be honest: most people don't write well written ADO.NET code).
It can be virtually indistinguishable, though; assuming you're using just Dapper (not any of the things that sit on top of it) then it doesn't include any query generation, expression tree / DSL parsing, complex model configuration, or any of those other things that tend to make full ORMs more flexible but more expensive.
Instead: it focuses just on executing user-supplied queries and mapping results; what it does is to generate all of the materialization code (how to map MerchantProduct to your columns) via IL-emit and cache that somewhere. Likewise it prepares much of the parameter preparation code in the same way. So at runtime it is usually just fetching two delegate instances from cache and invoking them.
Since the combination of (latency to the RDBMS + query execution cost + network bandwidth cost of the results) is going to be much higher than the overhead of fetching two delegates from dictionaries, we can essentially ignore that cost.
In short: it would be rare that you can measure a significant overhead here.
As a minor optimization to your code: prefer AsList() to ToList() to avoid creating a copy.
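Applying that suggestion to the Dapper snippet from the question would look roughly like this (AsList() is Dapper's extension that hands back the already-buffered list instead of copying it):

var multipleresult = db.QueryMultiple("spGetMerchantDashboard",
    new { mid = mid }, commandType: CommandType.StoredProcedure);

// Read<T>() buffers into a List<T> by default; AsList() returns that list
// directly, whereas ToList() would copy it into a second list.
var merchantproduct = multipleresult.Read<MerchantProduct>().AsList();
var merchantpayment = multipleresult.Read<MerchantPayment>().AsList();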
Theory:
Dapper is a micro-ORM, or a Data Mapper. It internally uses ADO.NET. Additionally, Dapper maps the ADO.NET data structures (the DataReader, say) to your custom POCO classes. As this is additional work Dapper does, in theory, it cannot be faster than ADO.NET.
The following is copied from one of the comments (@MarcGravell) on this answer:
it can't be faster than the raw API that it sits on top of; it can, however, be faster than the typical ADO.NET consuming code - most code that consumes ADO.NET tends to be badly written, inefficient etc; and don't even get me started on DataTable :)
This comparison assumes that ADO.NET is used properly and in an optimized way. Otherwise, the result may be the opposite; but that is not the fault of ADO.NET. If ADO.NET is used incorrectly, it may under-perform compared to Dapper. This is what often happens when ADO.NET is used directly, bypassing Dapper.
Practical:
In most cases, Dapper performs on par with ADO.NET (the difference is negligible). Internally, Dapper implements many of the optimizations recommended for ADO.NET that fall within its scope. It also enforces many good ADO.NET coding practices that ultimately improve performance (and security).
As mapping is a core part of Dapper, it is heavily optimized through IL generation. This makes Dapper a better choice than manual mapping in code.
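As a conceptual sketch only (this is not Dapper's actual code; the cache key and the Name column are invented, and MerchantProduct is assumed to be the class from the question): the idea is that the mapper for a given query/type pair is built once, cached, and then simply invoked per row.

using System;
using System.Collections.Concurrent;
using System.Data;

static class MapperCache
{
    // one delegate per (query, type) pair, built once and reused for every row
    static readonly ConcurrentDictionary<string, Func<IDataRecord, MerchantProduct>> Mappers =
        new ConcurrentDictionary<string, Func<IDataRecord, MerchantProduct>>();

    public static Func<IDataRecord, MerchantProduct> GetMapper(string cacheKey)
    {
        return Mappers.GetOrAdd(cacheKey, _ =>
            // Dapper emits IL at this point; a hand-written delegate shows the shape of the result
            record => new MerchantProduct
            {
                Name = record.GetString(record.GetOrdinal("Name"))
            });
    }
}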
Refer to this blog post, which explains how Dapper was invented and how it is optimized for performance: https://samsaffron.com/archive/2011/03/30/How+I+learned+to+stop+worrying+and+write+my+own+ORM
In the following scenarios, Dapper MAY be slower:
If the returned data structure is large (which increases the mapping time), Dapper will be slightly slower. But this is equally true for ADO.NET as well. As said earlier, the mapper part of Dapper is heavily optimized, so it is still a better choice than manual mapping in code. Further, Dapper provides a buffered parameter; if it is set to false, Dapper does not materialize the list, but simply hands each item to you through an iterator (see the sketch after this list). Refer to the comment on this answer by @Marc.
Dapper does not implement provider-specific features, as it is written over IDbConnection. This may hurt performance in those very rare cases, but it can be worked around if you implement an interface that tells Dapper how to do it.
Dapper does not support preparing statements. That may be an issue in very few cases. Read this blog post.
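A small sketch of that buffered switch (the query text, connection, and Process method are illustrative, not taken from the question):

// with buffered: false the rows are streamed to the caller one at a time
// instead of being materialized into a list up front
var products = db.Query<MerchantProduct>(
    "SELECT * FROM MerchantProduct",
    buffered: false);

foreach (var product in products)
{
    Process(product);   // each row is mapped as it is read from the data reader
}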
With this slight and rare performance hit, you get huge benefits, including strongly typed data structures and much less, more manageable code. That is a really big gain.
There are many performance comparison statistics for Dapper (versus other ORMs and ADO.NET) available on the net; have a look in case you are interested.
Related
I'm doing a lot of work with Entity Framework, like millions of inserts and updates.
However, over time it gets slower and slower...
I tried using some ways to improve performance, like:
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
I also tried:
db.Table.AsNoTracking();
When I change all these things it really does get faster. However, the memory used starts to increase until it gives me an exception.
Has anyone had this situation?
Thanks
The DbContext stores all the entities you have fetched or added to a DbSet. As others have suggested, you need to dispose of the context after each group of operations (a set of closely-related operations - e.g. a web request) and create a new one.
In the case of inserting millions of entities, that might mean creating a new context every 1,000 entities for example. This answer gives you all you need to know about inserting thousands of entities.
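A rough sketch of that batching pattern (MyDbContext and MyEntity are placeholders for your own types, and the EF6-style Configuration flags are assumed):

public void BulkInsert(IEnumerable<MyEntity> entities, int batchSize = 1000)
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        foreach (var entity in entities)
        {
            context.Set<MyEntity>().Add(entity);

            if (++count % batchSize == 0)
            {
                context.SaveChanges();
                context.Dispose();                 // throw away the tracked entities
                context = new MyDbContext();       // start over with an empty change tracker
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }

        context.SaveChanges();                     // flush the final partial batch
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }
}

Combined with turning off AutoDetectChangesEnabled (as in the question), this keeps memory flat because no single context ever tracks more than one batch.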
If you are doing only inserts and updates, try executing raw SQL via db.Database.ExecuteSqlCommand(sql, parameters) so the change tracker is bypassed entirely.
Entity Framework keeps all attached objects in memory, so having millions of them may exhaust your memory.
https://github.com/loresoft/EntityFramework.Extended offers a clean interface for doing faster bulk updates and deletes. I think it only works with SQL Server, but it may give you a quick solution to your performance issue.
Deletes can be done like this:
context.Users.Where(u => u.FirstName == "Firstname").Delete();
Updates can be done in a similar fashion:
context.Tasks.Where(t => t.StatusId == 1).Update(t => new Task { StatusId = 2 });
For millions of inserts and updates, everything gave out-of-memory errors; I tried it all.
It only worked for me when I stopped using the context and used ADO.NET or another micro-ORM like Dapper.
I noticed some delay in loading content while using transactions to edit the content (testing this situation is a bit hard for me, as I don't know how best to test it).
I have some doubts about transaction usage. There are some minor issues and things I should understand about transactions, and these parts are related to this question:
When should we use transactions in a CMS of our own making?
My case-specific notes:
Should I use transactions in any CMS, while we have sprocs for insert, update, retrieve, ...?
Is using transactions only necessary when we are working with more than one table?
The transaction strategy I used:
The Add Product method (which uses the add-product sproc):
TransactionOptions txOptions = new TransactionOptions();
using (TransactionScope txScope = new TransactionScope(TransactionScopeOption.Required, txOptions))
{
    try
    {
        connection.Open();
        command.ExecuteNonQuery();
        LastInserted = (int)pInsertedID.Value;
        txScope.Complete();
    }
    catch (Exception ex)
    {
        logErrors.Warn(ex.Message);
    }
    finally
    {
        command.Dispose();
        connection.Close();
    }
}
Transactions may help to ensure consistency of the database. For example, if a stored procedure used to add a product inserts data in more than one table, and something fails along the way, a transaction might be helpful to rollback the whole operation, thus the database is free of half-baked products (e.g. lacking some critical info in related tables).
Transaction scopes (TransactionScope) are used to provide an ambient implicit transaction for whatever code runs inside a code block. These scopes may help to severely simplify the code, however, they also may add complexities in multithreaded environments (unfortunately, I don't know quite a lot about such cases).
Therefore, the code you provided would probably make sense for ensuring the database's consistency, especially if the command touches more than one table. It may add some performance overhead; however, you would be better off relying on gathered profiling data rather than on any sort of feeling before conducting any optimizations (i.e. try to gather some quantitative data on how much slower things are under transactions). Modern database engines usually handle transactions quite efficiently; in my own experience, I have never had to remove a transaction because of its performance overhead.
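To make that concrete, here is an illustrative sketch (the sproc names, parameters, connection string, and product object are invented): if the second command throws, the first insert is rolled back as well, so no half-baked product is left behind.

using (var txScope = new TransactionScope())
using (var connection = new SqlConnection(connStr))
{
    connection.Open();   // the connection enlists in the ambient transaction

    using (var insertProduct = new SqlCommand("spInsertProduct", connection))
    {
        insertProduct.CommandType = CommandType.StoredProcedure;
        insertProduct.Parameters.AddWithValue("@Name", product.Name);
        insertProduct.ExecuteNonQuery();
    }

    using (var insertDetails = new SqlCommand("spInsertProductDetails", connection))
    {
        insertDetails.CommandType = CommandType.StoredProcedure;
        insertDetails.Parameters.AddWithValue("@Description", product.Description);
        insertDetails.ExecuteNonQuery();   // if this throws, neither insert is committed
    }

    txScope.Complete();   // commit only when every step succeeded
}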
What's the common approach to designing applications that strongly rely on lazy evaluation in C# (LINQ, IEnumerable, IQueryable, ...)?
Right now I usually attempt to make every query as lazy as possible, using yield return and LINQ queries, but at runtime this usually leads to "too lazy" behavior, where every query gets built from its beginning, obviously resulting in severe visible performance degradation.
What I usually do is put ToList() calls somewhere to cache the data, but I suspect this approach might be incorrect.
What are the appropriate / common ways to design this sort of application from the very beginning?
I find it useful to classify each IEnumerable into one of three categories.
fast ones - e.g. lists and arrays
slow ones - e.g. database queries or heavy calculations
non-deterministic ones - e.g. list.Select(x => new { ... })
For category 1, I tend to keep the concrete type when appropriate: arrays, IList, etc.
For category 3, those are best kept local within a method, to avoid hard-to-find bugs.
Then we have category 2, and as always when optimizing performance, measure first to find the bottlenecks.
A few random thoughts - as the question itself is loosely defined:
Lazy is good only when the result might not be used, and hence is loaded only when needed. Most operations, however, do need the data to be loaded, so laziness does not help there.
Laziness can cause difficult bugs. We have seen it all with data contexts in ORMs.
Lazy is good when it comes to MEF
Pretty broad question and unfortunately you're going to hear this a lot: It depends. Lazy-loading is great until it's not.
In general, if you're using the same IEnumerables over and over it might be best to cache them as lists.
But rarely does it make sense for your callers to know this either way. That is, if you're getting IEnumerables from a repository or something, it is best to let the repository do its job. It might cache it as a list internally or it might build it up every time. If your callers try to get too clever they might miss changes in the data, etc.
I would suggest doing a ToList in your DAL before returning the DTO
public IList<UserDTO> GetUsers()
{
    using (var db = new DbContext())
    {
        return (from u in db.tblUsers
                select new UserDTO()
                {
                    Name = u.Name
                }).ToList();
    }
}
In the example above you have to do a ToList() before the DbContext scope ends.
If you need a certain sequence of data to be cached, call one of the conversion operators (ToList, ToArray, etc.) on that sequence. Otherwise just use lazy evaluation.
Build your code around your data. What data is volatile and needs to be pulled fresh each time? Use lazy evaluation and don't cache. What data is relatively static and only needs to be pulled once? Cache that data in memory so you don't pull it unnecessarily.
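A minimal sketch of that split (the Catalog type and the two loader methods are made up for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

public class Catalog
{
    // relatively static data: pulled once on first access, then served from memory
    private readonly Lazy<IReadOnlyList<string>> _categories =
        new Lazy<IReadOnlyList<string>>(() => LoadCategoriesFromDb().ToList());

    public IReadOnlyList<string> Categories => _categories.Value;

    // volatile data: stays lazy and is pulled fresh every time it is enumerated
    public IEnumerable<string> CurrentStock() => LoadStockFromDb();

    private static IEnumerable<string> LoadCategoriesFromDb() { yield return "Books"; }
    private static IEnumerable<string> LoadStockFromDb() { yield return "Item-1"; }
}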
Deferred execution and caching all items with .ToList() are not the only options. A third option is to cache the items while you are iterating, by using a lazy list.
The execution is still deferred, but each item is only yielded once. An example of how this works:
public class LazyListTest
{
    private int _count = 0;

    public void Test()
    {
        var numbers = Enumerable.Range(1, 40);
        var numbersQuery = numbers.Select(GetElement).ToLazyList(); // cache lazily

        var total = numbersQuery.Take(3)
            .Concat(numbersQuery.Take(10))
            .Concat(numbersQuery.Take(3))
            .Sum();

        Console.WriteLine(_count);
    }

    private int GetElement(int value)
    {
        _count++;
        // Some slow stuff here...
        return value * 100;
    }
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
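For reference, here is a minimal sketch of what such a memoizing wrapper can look like (a simplified illustration of the idea, not the linked implementation, and not thread-safe):

using System.Collections;
using System.Collections.Generic;

public static class LazyListExtensions
{
    public static IEnumerable<T> ToLazyList<T>(this IEnumerable<T> source)
    {
        return new LazyList<T>(source);
    }

    private sealed class LazyList<T> : IEnumerable<T>
    {
        private readonly List<T> _cache = new List<T>();
        private readonly IEnumerator<T> _source;

        public LazyList(IEnumerable<T> source)
        {
            _source = source.GetEnumerator();
        }

        public IEnumerator<T> GetEnumerator()
        {
            int index = 0;
            while (true)
            {
                if (index < _cache.Count)
                {
                    yield return _cache[index++];   // replay an item that was already pulled
                }
                else if (_source.MoveNext())
                {
                    _cache.Add(_source.Current);    // pull a new item from the source exactly once
                    yield return _cache[index++];
                }
                else
                {
                    yield break;                    // source exhausted
                }
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}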
I am wondering about the general performance of LINQ. I admit that it comes in handy, but how performant is LINQ? I know that this is a broad question, so I want to ask about a particular example:
I have an anonymous type:
var users = reader.Select(user => new MembershipUser(reader.Name, reader.Age));
And now, I want to convert it to the MembershipUserCollection.
So I do it like this:
MembershipUserCollection membershipUsers = new MembershipUserCollection();
users.ToList().ForEach(membershipUsers.Add); //what is the complexity of this line?
What is the complexity of the last line? Is it n^2 ?
Does the ToList() method iterate over each element of users and add it to the list?
Or does ToList() work differently? Because if it does not, I find it hard to justify using the last line of the code instead of simply:
foreach (var user in users)
{
membershipUsers.Add(user);
}
Your example isn't particularly good for your question because ToList() isn't really in the same class of extension methods as the other ones supporting LINQ. The ToList() extension method is a conversion operation, not a query operation. The real values in LINQ are deferred execution of a composite query built by combining several LINQ query operations and improved readability. In LINQ2SQL you also get the advantage of constructing arbitrary queries that get pushed to the DB server for actual execution, taking advantage of optimizations that the DB may have in place to improve performance.
In general, I would expect that the question of performance largely comes down to how well you construct the actual queries and has a lot more to do with how well the programmer knows the tools and data than how well the tool is implemented. In your case, it makes no sense to construct a temporary list just to be able to invoke the convenience ForEach method on it if all you care about is performance. You'd be better off simply iterating over the enumeration you already have (as you suspect). LINQ won't stop a programmer from writing bad code, though it may disguise bad code for the person who doesn't understand how LINQ works.
It's always the case that you can construct an equivalent program not using LINQ for any program using LINQ. It may be that you can actually improve on the performance. I would submit, though, that LINQ makes it much easier to write readable code than non-LINQ solutions. By that, I mean more compact and understandable code. It also makes it easier to write composable code, which, when executed in a deferred manner, can perform better than non-LINQ compositions. By breaking the code into composable parts, you simplify it and improve understandability.
I think the trick here is to really understand where LINQ makes sense rather than treat it as a shiny, new tool that you need to now use for every problem you have. The nice part of this shiny, new tool, though, is that it really does come in handy in a lot of situations.
It's O(n) - since .ToList() iterates once through the enumeration and copies the elements into the resulting List<T> (whose insertion is O(1)). Thus the complexity is fine.
The actual issue you might see is that you create a completely new, temporary List<T> just to copy its contents into another list (and afterwards discard it).
I suspect that's just due to the convenience of having a .ForEach() method on List<T>. One could nonetheless code a direct implementation for IEnumerable<T> (sketched below), which would save this superfluous copy - or just write
foreach (var user in users) membershipUsers.Add(user)
which is basically what you want to express after all ;-)
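A tiny sketch of that direct implementation (an ordinary extension method of my own, not part of the BCL):

using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // ForEach over any IEnumerable<T>, so no temporary List<T> has to be built
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
            action(item);
    }
}

// usage: users.ForEach(membershipUsers.Add);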
Converting to a list will have the same complexity as iterating over the sequence, which may really be anything depending on how the sequence is generated. A normal Select over an in-memory list is O(n).
The performance of using ForEach on a List vs a foreach loop comes down to the overhead of invoking a delegate vs the overhead of creating and using an enumerator. I cannot say which one is quicker, but if both are used on an in-memory list, the complexity is the same.
I have, let's say, thousands of Customer records and I have to show them on a web form. Also, I have one CustomerEntity which has 10 properties. So when I fetch data using a DataReader and convert it into List<CustomerEntity>, I am required to loop through the data two times.
So is the use of generics advisable in such a scenario? If yes, then what will my application's performance be?
For example:
In the CustomerEntity class, I have CustomerId and CustomerName properties, and I'm getting 100 records from the Customer table.
Then, to prepare the list, I've written the following code:
while (dr.Read())
{
    // creation of a new CustomerEntity object
    // code for getting the properties of CustomerEntity
    for (var index = 0; index < MyProperties.Count; index++)
    {
        MyProperties[index].SetValue(CustEntityObject, dr.GetValue(index));
    }
    // adding the CustomerEntity object to List<CustomerEntity>
}
How can I avoid these two loops? Is there any other mechanism?
I'm not really sure how generics tie into data volume; they are unrelated concepts... It also isn't clear to me why this requires you to read everything twice. But yes: generics are fine when used in volume (why wouldn't they be?). Of course, the best way to find a problem is profiling (either server performance or bandwidth - perhaps more the latter in this case).
Of course the better approach is: don't show thousands of records on a web form; what is the user going to do with that? Use paging, searching, filtering, ajax, etc - every trick imaginable - but don't send thousands of records to the client.
Re the updated question; the loop for setting properties isn't necessarily bad. This is an entirely appropriate inner loop. Before doing anything, profile to see if this is actually a problem. I suspect that sheer bandwidth (between server and client, or server and database) is the biggest issue. If you can prove that this loop is a problem, there are things you can do to optimise:
switch to using PropertyDescriptor (rather than PropertyInfo), and use HyperDescriptor to make it a lot faster
write code with DynamicMethod to do the job - requires some understanding of IL, but very fast
write a .NET 3.5 / LINQ Expression to do the same and use .Compile() - like the second point, but (IMO) a bit easier
I can add examples for the first and third bullets; I don't really want to write an example for the second, simply because I wouldn't write that code myself that way any more (I'd use the 3rd option where available, else the 1st).
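As an illustration of the third bullet (my own sketch, not Marc's example): build one compiled setter per property up front, cache the delegates, and reuse them for every row instead of calling PropertyInfo.SetValue each time.

using System;
using System.Linq.Expressions;
using System.Reflection;

static class SetterFactory
{
    // builds an Action<object, object> that assigns 'value' to 'property' on 'target'
    public static Action<object, object> BuildSetter(PropertyInfo property)
    {
        var target = Expression.Parameter(typeof(object), "target");
        var value = Expression.Parameter(typeof(object), "value");

        var body = Expression.Call(
            Expression.Convert(target, property.DeclaringType),
            property.GetSetMethod(),
            Expression.Convert(value, property.PropertyType));

        return Expression.Lambda<Action<object, object>>(body, target, value).Compile();
    }
}

Inside the reader loop you would then invoke the cached delegate, e.g. setter(CustEntityObject, dr.GetValue(index)), which is far cheaper per row than reflection-based SetValue.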
It is very difficult to say what the performance will be, but consider these things:
Generics provide type safety.
If you're going to display 10,000 records on the page, your application will probably be unusable. If records are being paged, consider returning only those records that are actually needed for the page you are on (see the paging sketch below).
You shouldn't need to loop through the data twice. What are you doing with the data?
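A minimal paging sketch of that idea (pageIndex and pageSize are illustrative parameters; CustomerEntity is assumed to be the class from the question):

public List<CustomerEntity> GetCustomersPage(IQueryable<CustomerEntity> customers,
                                             int pageIndex, int pageSize)
{
    return customers
        .OrderBy(c => c.CustomerId)      // a stable ordering is required for paging
        .Skip(pageIndex * pageSize)      // skip the rows of earlier pages
        .Take(pageSize)                  // fetch only the rows this page actually shows
        .ToList();
}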