What does Future() mean in NHibernate? - c#

I'm new to NHibernate.
The description for IEnumerable Future(); says the following:
// Summary:
// Get a enumerable that when enumerated will execute a batch of queries in
// a single database roundtrip
Just wondering what it means; the description has nothing to do with the word 'future'.

Future allows you to execute two or more SQL queries in a single roundtrip, as long as the database supports it.
It's also almost transparent, so you'll want to use Futures whenever possible. If NHibernate can't execute the queries in a single roundtrip, it will execute them in two or more, as expected.
From http://ayende.com/blog/3979/nhibernate-futures
Let us take a look at the following piece of code:
using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
    var blogs = s.CreateCriteria<Blog>()
        .SetMaxResults(30)
        .List<Blog>();
    var countOfBlogs = s.CreateCriteria<Blog>()
        .SetProjection(Projections.Count(Projections.Id()))
        .UniqueResult<int>();

    Console.WriteLine("Number of blogs: {0}", countOfBlogs);
    foreach (var blog in blogs)
    {
        Console.WriteLine(blog.Title);
    }

    tx.Commit();
}
This code generates two queries to the database.
Two roundtrips to the database are expensive; we can see that it took us 114 ms to get the data. We can do better than that. Let us tell NHibernate that it is free to do the optimization in any way that it likes:
using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
    var blogs = s.CreateCriteria<Blog>()
        .SetMaxResults(30)
        .Future<Blog>();
    var countOfBlogs = s.CreateCriteria<Blog>()
        .SetProjection(Projections.Count(Projections.Id()))
        .FutureValue<int>();

    Console.WriteLine("Number of blogs: {0}", countOfBlogs.Value);
    foreach (var blog in blogs)
    {
        Console.WriteLine(blog.Title);
    }

    tx.Commit();
}
Instead of going to the database twice, we go only once, sending both queries together. The speed difference is quite dramatic: 80 ms instead of 114 ms, so we saved about 30% of the total data access time, 34 ms in all.
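The same batching is available from NHibernate's LINQ provider through the ToFuture/ToFutureValue extension methods in NHibernate.Linq. A minimal sketch, assuming the same Blog mapping and session factory as the criteria example above:

using NHibernate.Linq;

using (var s = sf.OpenSession())
using (var tx = s.BeginTransaction())
{
    // Neither line hits the database yet; both queries are queued.
    var blogs = s.Query<Blog>()
        .Take(30)
        .ToFuture();
    var countOfBlogs = s.Query<Blog>()
        .ToFutureValue(q => q.Count());

    // Touching either result executes the whole batch in one roundtrip.
    Console.WriteLine("Number of blogs: {0}", countOfBlogs.Value);
    foreach (var blog in blogs)
    {
        Console.WriteLine(blog.Title);
    }

    tx.Commit();
}

Nothing is sent to the database until countOfBlogs.Value is read or blogs is first enumerated; at that point both queries go out together.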

Related

Does the foreach loop invocation result in fetching all records from the database, and is there a way to control this behaviour?

using (AdventureWorksEntities context = new AdventureWorksEntities())
{
    var query = context.Products
        .Select(product => new
        {
            ProductId = product.ProductID,
            ProductName = product.Name
        });

    Console.WriteLine("Product Info:");
    foreach (var productInfo in query)
    {
        Console.WriteLine("Product Id: {0} Product name: {1} ",
            productInfo.ProductId, productInfo.ProductName);
    }
}
I understand that the query gets executed on the database server when the foreach loop starts executing. Assuming the database server returns the entire result to the .NET application, the entire data set is held in the query variable.
Is there any way to limit this, so as to prevent memory overflow issues?
As stated here, using foreach directly on an EF IQueryable streams the data, loading each entity into memory individually.
This reduces memory consumption, with the caveat that the underlying table(s) will be locked for the duration of the loop.
Note that, as highlighted by David in the comments, you should also disable change tracking to make entities eligible for garbage collection after each iteration.
This can be done for an individual query using AsNoTracking:
foreach (var productInfo in query.AsNoTracking())
Or globally using QueryTrackingBehavior.
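QueryTrackingBehavior is an EF Core API (in EF6 the per-query AsNoTracking above is the equivalent). A minimal EF Core sketch of the global option, with a hypothetical connection string:

using Microsoft.EntityFrameworkCore;

public class AdventureWorksEntities : DbContext
{
    public DbSet<Product> Products { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        // Every query from this context now defaults to no-tracking,
        // so materialized entities are GC-eligible as soon as you drop them.
        => options
            .UseSqlServer("<connection string>")  // hypothetical
            .UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);
}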

Big Batch of Entity Framework Updates Much Slower Than Batching Myself

Updating a bunch of records is much slower using what I think are standard Entity Framework techniques than batching the same queries it would generate myself. For 250 records I see Entity Framework being about 10 times as slow; for 1,000 records it goes up to about 20 times slower.
When I log the database activity for Entity Framework, I see it is generating the same basic queries I would generate myself, but it seems to be running them one at a time instead of all at once, even though I only call SaveChanges once. Is there any way to ask it to run the queries all at once?
I can't do a simple mass SQL update because in my real use case each row needs to be processed separately to determine what to set the fields to.
Sample timing code is below:
var stopwatchEntity = new System.Diagnostics.Stopwatch();
var stopwatchUpdate = new System.Diagnostics.Stopwatch();
using (var dbo = new ProjDb.dbo("Server=server;Database=database;Trusted_Connection=True;"))
{
    var resourceIds = dbo.Resources.Select(r => r.ResourceId).Take(250).ToList();
    //dbo.Database.Log += (s) => System.Diagnostics.Debug.WriteLine(s);

    stopwatchEntity.Start();
    foreach (var resourceId in resourceIds)
    {
        var resource = new ProjDb.Models.dbo.Resource { ResourceId = resourceId };
        dbo.Resources.Attach(resource);
        resource.IsBlank = false;
    }
    dbo.SaveChanges();
    stopwatchEntity.Stop();

    stopwatchUpdate.Start();
    var updateStr = "";
    foreach (var resourceId in resourceIds)
        updateStr += "UPDATE Resources SET IsBlank = 0 WHERE ResourceId = " + resourceId + ";";
    dbo.Database.ExecuteSqlCommand(updateStr);
    stopwatchUpdate.Stop();

    MessageBox.Show(stopwatchEntity.Elapsed.TotalSeconds.ToString("f") + ", " +
        stopwatchUpdate.Elapsed.TotalSeconds.ToString("f"));
}
As @EricEJ and @Kirchner reported, EF6 doesn't support batch updates. However, some third-party libraries do.
Disclaimer: I'm the owner of the project Entity Framework Plus
EF+ Batch Update allows updating multiples rows with the same value/formula.
For example:
context.Resources
    .Where(x => resourceIds.Contains(x.ResourceId))
    .Update(x => new Resource() { IsBlank = false });
Since entities are not loaded in the context, you should get the best performance available.
Read more: http://entityframework-plus.net/batch-update
Disclaimer: I'm the owner of the project Entity Framework Extensions
If the value must differ from one row to another, this library offers a BulkUpdate feature. It is a paid library, but it supports pretty much everything you need for performance:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
For example:
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
context.BulkMerge(customers);
Entity Framework 6 does not support batching; EF Core does.
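For illustration, a minimal EF Core sketch: SaveChanges batches the generated statements automatically, and the batch size can be tuned with MaxBatchSize. The context name and connection string here are hypothetical; resourceIds is the list from the question:

using System.Linq;
using Microsoft.EntityFrameworkCore;

public class ResourceContext : DbContext
{
    public DbSet<Resource> Resources { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("<connection string>",  // hypothetical
            sql => sql.MaxBatchSize(100));              // statements per roundtrip
}

using (var db = new ResourceContext())
{
    foreach (var r in db.Resources.Where(r => resourceIds.Contains(r.ResourceId)))
        r.IsBlank = false;

    // EF Core packs these UPDATEs into batched commands:
    // a few roundtrips instead of one per row.
    db.SaveChanges();
}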

EF 6 performance while updating multiple records with different values in same table

I have a table whose values are updated on a conditional basis, and when I am calling
db.SaveChanges()
there is a huge performance drop.
I am also setting the properties
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
but the results are still not as expected.
Edit 1:
using (var db = new MyEntities())
{
    db.Configuration.AutoDetectChangesEnabled = false;
    db.Configuration.ValidateOnSaveEnabled = false;

    foreach (var acc in myacclist)
    {
        // will update my account objects here
    }

    db.SaveChanges();
}
Unfortunately, there is no way to get good performance from Entity Framework with SaveChanges alone.
SaveChanges makes a database round-trip for every record to update, so if you currently have 10,000 accounts, 10,000 database round-trips are performed.
Setting AutoDetectChangesEnabled and ValidateOnSaveEnabled to false is usually a very bad idea and will not really improve the performance, since the number of database round-trips is the real issue.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows to dramatically improve performance by performing:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
Example:
using (var db = new MyEntities())
{
    foreach (var acc in myacclist)
    {
        // will update my account objects here
    }

    db.BulkSaveChanges();
}
Building an UPDATE query in a StringBuilder and executing it for every 1,000 records improved my performance:
using (var db = new MyEntities())
{
    StringBuilder finalquery = new StringBuilder();
    int i = 0;
    foreach (var acc in myacclist)
    {
        i++;
        // will update my account objects here and build stmnt
        finalquery.Append(stmnt);
        if (i % 1000 == 0)
        {
            db.Database.ExecuteSqlCommand(finalquery.ToString());
            finalquery.Clear();
        }
    }
    // flush whatever is left over from the last partial batch
    if (finalquery.Length > 0)
        db.Database.ExecuteSqlCommand(finalquery.ToString());
}

sql nhibernate performance for loop

I have the following logic:
loop through a list of ids, get the associated entity, and for that entity, loop through another list of ids and get another entity. Code is below:
foreach (var docId in docIds)
{
    var doc = new EntityManager<Document>().GetById(docId);
    foreach (var tradeId in tradeIds)
    {
        var trade = new EntityManager<Trade>().GetById(tradeId);
        if (doc.Trade.TradeId != trade.TradeId)
        {
            Document newDoc = new Document(doc, trade, 0);
            new EntityManager<Document>().Add(newDoc);
        }
    }
}
My question is mainly about SQL performance. Obviously there will be a bunch of selects happening, as well as some inserts. Is this a bad way to go about doing something like this?
Should I, instead, use a session and get a list of all entities that match the list of ids (with one select statement) and then loop afterwards?
This is based only on my experience, but you can test it yourself.
If the Trade entity isn't very big and the number of entities won't exceed about 1,000, reading all the entities and looping afterwards will be much preferable.
If the count is over 1,000, it's better to call a stored procedure that joins against a temp table containing your ids.
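As an illustration of the read-all-then-loop approach, a sketch using NHibernate's LINQ provider; sessionFactory and the DocumentId property name are assumptions, since the question only shows the EntityManager wrapper:

using NHibernate.Linq;

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // Two SELECTs total instead of one per id.
    var docs = session.Query<Document>()
        .Where(d => docIds.Contains(d.DocumentId))  // DocumentId is assumed
        .ToList();
    var trades = session.Query<Trade>()
        .Where(t => tradeIds.Contains(t.TradeId))
        .ToList();

    foreach (var doc in docs)
        foreach (var trade in trades)
            if (doc.Trade.TradeId != trade.TradeId)
                session.Save(new Document(doc, trade, 0));

    tx.Commit();
}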

How do I reduce the memory footprint with large datasets in EF5?

I'm trying to pull a large-ish dataset (1.4 million records) from a SQL Server and dump it to a file in a WinForms application. I've attempted to do it with paging, so that I'm not holding too much in memory at once, but the process continues to grow its memory footprint as it runs. About 25% of the way through, it was taking up 600,000 K. Am I doing the paging wrong? Can I get some suggestions on how to keep the memory usage from growing so much?
var query = (from organizations in ctxObj.Organizations
             where organizations.org_type_cd == 1
             orderby organizations.org_ID
             select organizations);

int recordCount = query.Count();
int skipTo = 0;
int take = 1000;

if (recordCount > 0)
{
    while (skipTo < recordCount)
    {
        if (skipTo + take > recordCount)
            take = recordCount - skipTo;

        foreach (Organization o in query.Skip(skipTo).Take(take))
        {
            writeRecord(o);
        }
        skipTo += take;
    }
}
The object context will keep objects in memory until it is disposed. I would recommend disposing the context after each batch to prevent the memory footprint from continuing to grow.
You can also use AsNoTracking() (http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx), since you are not saving back to the database.
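A sketch of the dispose-per-batch idea combined with AsNoTracking, assuming a context class named MyEntities and the same query shape as the question:

int skipTo = 0;
const int take = 1000;

while (true)
{
    // A fresh, short-lived context per page keeps the tracked
    // entity graph from accumulating across the whole export.
    using (var ctx = new MyEntities())
    {
        var page = ctx.Organizations
            .AsNoTracking()                  // don't track within the page either
            .Where(o => o.org_type_cd == 1)
            .OrderBy(o => o.org_ID)
            .Skip(skipTo)
            .Take(take)
            .ToList();

        if (page.Count == 0)
            break;

        foreach (var o in page)
            writeRecord(o);

        skipTo += page.Count;
    }
}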
Get rid of paging and use AsNoTracking.
Test Code
static void Main(string[] args)
{
    var sw = new Stopwatch();
    sw.Start();
    using (var context = new MyEntities())
    {
        var query = (from organizations in context.LargeSampleTable.AsNoTracking()
                     where organizations.ErrorID != null
                     orderby organizations.ErrorID
                     select organizations); // large sample table, 146994 rows

        foreach (MyObject o in query)
        {
            writeRecord(o);
        }
    }
    sw.Stop();
    Console.WriteLine("Completed after: {0}", sw.Elapsed);
    Console.ReadLine();
}

private static void writeRecord(MyObject o)
{
    ;
}
Test Case Result:
Memory Consumption reduced: 96%
Execution Time reduced: 50%
Interpretation
AsNoTracking benefits memory usage for obvious reasons: we don't have to maintain references to the entities as we load them into memory, so objects are GC-eligible almost immediately. Combine lazy evaluation with AsNoTracking and there is no need for paging, and context disposal can be deferred.
While this is a single test, the large number of rows and the exclusion of most external factors make it a good representation of the general case.
A few things.
Calling Count() runs your query. You then run it a second time to get the results. You don't need to do this.
The memory you're seeing is due to loading entities into memory. If you only need a subset of fields, project to an anonymous type (or a simpler named type). This avoids change tracking and its overhead.
Used this way, EF can be a nice strongly typed API over lightweight SQL queries.
Something like this should do the trick:
var query = from o in ctxObj.Organizations
            where o.org_type_cd == 1
            orderby o.org_ID
            select new { Id = o.org_ID, Name = o.Name };

foreach (var org in query)
{
    write(org.Id, org.Name);
}
Why not just use the standard System.Data.SqlClient.SqlConnection class? You can read the results of a command row by row using the SqlDataReader class and write each row to the file. You have full control, and can guarantee that your code references only one row of the results at a time.
using (var writer = new System.IO.StreamWriter(fileName))
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var cmd = new SqlCommand(
        "SELECT * FROM Organizations WHERE org_type_cd = 1 ORDER BY org_ID", conn))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            int id = (int)reader["org_ID"];
            int org_type_cd = (int)reader["org_type_cd"];
            writer.WriteLine(...);
        }
    }
}
Entity Framework isn't meant to solve every problem or to be your exclusive data access framework. It's meant to make simple CRUD operations easier to write. Dealing with millions of rows is a good use case for a more specialized solution.
