So I set up an SQLite in-memory database for my unit testing.
I'm planning to copy constants/lookups from the real database to the in-memory one.
I got stuck when inserting the records into the in-memory database.
List<SqlLookup> lookups;
using (var session = new DataSession())
{
    lookups = session.QueryOver<SqlLookup>().List();
}

using (var session = new InMemoryDataSession())
{
    foreach (var lu in lookups)
    {
        session.Merge(lu);
    }
}
The problem is that even if the original data already has its Id set, that Id is not saved to the db; NHibernate still generates an auto-incremented one.
Since new Ids are being saved, the mappings are updated, and an exception is thrown when updating an old record whose Id has already been used:
{"Row was updated or deleted by another transaction (or unsaved-value
mapping was incorrect)
Is there a workaround for this?
If not, is this the right way to seed these constant data to my in memory database?
Are there other solutions aside from running SQL scripts for the data? Maintaining hundreds of lookup scripts would be hard.
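One workaround I've been wondering about is mapping the lookup Id with an assigned generator in a test-only configuration, so the Ids copied from the real database are kept instead of regenerated. A rough sketch, assuming Fluent NHibernate is used for the in-memory mappings and that SqlLookup exposes Id and Name properties (names are illustrative):
using FluentNHibernate.Mapping;

// Hypothetical mapping used only by the in-memory test configuration.
// With an "assigned" generator, Save/Merge persists the Id copied from the
// real database instead of generating a new auto-incremented value.
public class SqlLookupTestMap : ClassMap<SqlLookup>
{
    public SqlLookupTestMap()
    {
        Table("SqlLookup");
        Id(x => x.Id).GeneratedBy.Assigned();  // keep the original Id
        Map(x => x.Name);                      // illustrative column mapping
    }
}
I'm not sure this is the right approach though, hence the question.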
Thank you.
Related
Querying using LINQ in EF6 is giving old values while the database has updated values
In the code below, the first run works fine, but after the table tblReferenceNumber gets new rows the query returns only the old values, while I expect it to also retrieve the new records.
AuthDBEntities db = new AuthDBEntities();
tblReferenceNumber LRefNum = db.tblReferenceNumber.OrderByDescending(ab => ab.ID).First();
string lrNum = LRefNum.ReferenceNumber;
Why am I getting the old values?
How can I fix it?
Each instance of DbContext has a cache. If you are using the same instance of db as the last time you got that record, then you will end up getting the cached data.
You can use:
db.Entry(LRefNum).Reload();
to force it to get fresh data.
More info about the caching here: http://codethug.com/2016/02/19/Entity-Framework-Cache-Busting/
But that also raises the question of why you are getting the same record twice with the same instance of db. Did you perhaps declare db as static?
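If db really is long-lived (for example static), creating a short-lived context per operation avoids the stale cache entirely. A minimal sketch reusing the names from the question:
string lrNum;
using (var db = new AuthDBEntities())
{
    // A fresh context has an empty first-level cache, so this always hits the database.
    tblReferenceNumber LRefNum = db.tblReferenceNumber
                                   .OrderByDescending(ab => ab.ID)
                                   .First();
    lrNum = LRefNum.ReferenceNumber;
}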
When you use .First() you materialize the query and load the data into memory.
So when you try to get values from LRefNum you are referring to the data in memory instead of the database.
Yet another EF performance question... I have an XLSX file which I'm processing, inserting its records into the database. The problem is that the XLSX data is unnormalized and I have to normalize it within the DB, which means making a lot of DB calls while processing. So I have a main for loop which goes through the XLSX and then checks within the DB whether records already exist. It goes something like this:
List<MainEntity> mainEntities = new List<MainEntity>();
for (int xlsIterator = 2; xlsIterator <= 300; xlsIterator++)
{
    MainEntity mainEntity = new MainEntity();
    mainEntity.data1 = XLSData1;
    mainEntity.data2 = XLSData2;

    RelatedEntity relatedEntity = repository.FindSingleBy(x => x.MyId.Equals(data));
    mainEntity.RelatedEntity = relatedEntity;

    mainEntities.Add(mainEntity);
}
mainEntities.ForEach(s => context.MainEntities.Add(s));
context.SaveChanges();
Saving the changes is fine - the problem is with that call from the repository to the database within the for loop:
RelatedEntity relatedEntity= repository.FindSingleBy(x => x.MyId.Equals(data));
For each record from the XLSX file I'm querying the database, and it kills the performance. 300 records take about a minute to process, and I have other related entities that I need to retrieve and attach to the main entity... Is there any way to optimize this approach, or should I just give up on Entity Framework and move this functionality to a stored procedure on the database side? I really would like to keep all the logic in the business layer, but I will be processing thousands of records, and waiting more than 10 minutes for a page to finish is ridiculous. I'm thinking of just bulk loading the entire Excel file into a table and then writing a stored procedure to process it...
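One idea I'm toying with, instead of the per-row lookup, is loading all the related entities once before the loop and resolving them from a dictionary in memory. A rough sketch, assuming I can collect the MyId values from the XLSX up front (myIds below) and that the context exposes a RelatedEntities set (both are assumptions, not my current code):
// One query for all related rows, then in-memory lookups inside the loop.
var relatedById = context.RelatedEntities
                         .Where(x => myIds.Contains(x.MyId))
                         .ToDictionary(x => x.MyId);

List<MainEntity> mainEntities = new List<MainEntity>();
for (int xlsIterator = 2; xlsIterator <= 300; xlsIterator++)
{
    MainEntity mainEntity = new MainEntity();
    mainEntity.data1 = XLSData1;
    mainEntity.data2 = XLSData2;

    RelatedEntity relatedEntity;
    relatedById.TryGetValue(data, out relatedEntity);  // no DB round-trip per row
    mainEntity.RelatedEntity = relatedEntity;

    mainEntities.Add(mainEntity);
}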
Any help/insights are highly appreciated! Thanks!
I'm testing Entity Framework with an Azure SQL DB.
When inserting 1 record, the action takes 400ms. When adding 20 it is 2500ms.
400ms for inserting 1 record via EF seems like a lot.
What is the normal performance rate for EF?
Am I doing something wrong?
I'm aware that bulk insertion can be improved, but I thought that a single insert could be done a lot faster!?
var start = DateTime.Now;

testdbEntities testdbEntities = new testdbEntities();
for (int i = 0; i < 20; i++)
    testdbEntities.Users.Add(new User { Name = "New user" });
testdbEntities.SaveChanges();

var end = DateTime.Now;
var timeElapsed = (end - start).TotalMilliseconds;
All common tricks like:
AutoDetectChangesEnabled = false
Use AddRange over Add
Etc.
None of these will help here since, as you have already noticed, the performance problem is not within Entity Framework but with SQL Azure.
SQL Azure may look pretty cool at first, but it's slow as hell unless you pay for a very good Premium database tier.
As Evk recommended, you should try to execute a simple SQL command like "SELECT 1"; you will notice it probably takes more than 100ms, which is ridiculously slow.
Solution:
Move to a better SQL Azure Tier
Move away from SQL Azure
Disclaimer: I'm the owner of the project Entity Framework Extensions
Another solution is to use this library, which batches multiple queries/bulk operations. However, even if this library is very fast, you will still need a better SQL Azure tier, since it looks like every database round-trip takes more than 200ms in your case.
Each insert results in a commit and causes a log harden (flush to disk). Writing in batches does not have to result in one flush per insert (until the log buffers are full). So try to batch the inserts somehow, for example using table-valued parameters (TVPs).
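For illustration, a minimal table-valued parameter sketch using plain ADO.NET (this assumes a user-defined table type dbo.UserTvp with a single Name column and a dbo.Users table already exist on the server; these names are illustrative, not from the question):
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void BulkInsertUsers(string connectionString, IEnumerable<string> names)
{
    // Shape the rows to match the dbo.UserTvp table type.
    var rows = new DataTable();
    rows.Columns.Add("Name", typeof(string));
    foreach (var name in names)
        rows.Rows.Add(name);

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("INSERT INTO dbo.Users (Name) SELECT Name FROM @rows", conn))
    {
        var p = cmd.Parameters.AddWithValue("@rows", rows);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.UserTvp";

        conn.Open();
        cmd.ExecuteNonQuery();  // one round-trip and one commit for the whole batch
    }
}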
You can disable the auto detect changes during your insert. It can really improve performance. https://msdn.microsoft.com/en-us/data/jj556205.aspx
I hope it helps :)
Most EF applications make use of persistence-ignorant POCO entities and snapshot change tracking. This means that there is no code in the entities themselves to keep track of changes or notify the context of changes.
When using most POCO entities, the determination of how an entity has changed (and therefore which updates need to be sent to the database) is handled by the Detect Changes algorithm. Detect Changes works by detecting the differences between the current property values of the entity and the original property values that are stored in a snapshot when the entity was queried or attached.
Snapshot change detection takes a copy of every entity in the system when it is added to the Entity Framework tracking graph. Then, as entities change, each entity is compared to its snapshot to see whether anything changed. This happens by calling the DetectChanges method. What's important to know about DetectChanges is that it has to go through all of your tracked entities each time it's called, so the more stuff you have in your context the longer it takes to traverse.
What Auto Detect Changes does is plug into events which happen on the context and call DetectChanges as they occur.
Whenever you add a new User object, EF internally tracks it and keeps the current state of the newly added object in its snapshot.
For bulk insert operations, EF calls DetectChanges on every Add, and each call scans everything that is already tracked. So the execution time required for a bulk insert is (time required to insert all records + time required for repeatedly updating the EF context).
You can make your DB insertion noticeably faster by disabling AutoDetectChanges. So your code will look like:
using (var context = new YourContext())
{
try
{
context.Configuration.AutoDetectChangesEnabled = false;
// do your DB operations
}
finally
{
context.Configuration.AutoDetectChangesEnabled = true;
}
}
I am working with a situation where we are dealing with money transactions.
For example, I have a table of users wallets, with their balance in that row.
UserId; WalletId; Balance
Now in our website and web services, every time a certain transaction happens, we need to:
check that there are enough funds available to perform that transaction;
deduct the costs of the transaction from the balance.
What is the correct way to go about locking that row / entity for the entire duration of my transaction?
From what I have read there are some solutions where EF marks an entity and then compares that mark when it saves it back to the DB; however, what does it do when another user / program has already edited the amount?
Can I achieve this with EF? If not what other options do I have?
Would calling a stored procedure possibly allow for me to lock the row properly so that no one else can access that row in the SQL Server whilst program A has the lock on it?
EF doesn't have a built-in locking mechanism; you would probably need to use a raw query, something like:
using (var scope = new TransactionScope(...))
{
using (var context = new YourContext(...))
{
var wallet =
context.ExecuteStoreQuery<UserWallet>("SELECT UserId, WalletId, Balance FROM UserWallets WITH (UPDLOCK) WHERE ...");
// your logic
scope.Complete();
}
}
You can set the isolation level on the transaction in Entity Framework to ensure no one else can change it:
YourDataContext.Database.BeginTransaction(IsolationLevel.RepeatableRead)
RepeatableRead
Summary:
Locks are placed on all data that is used in a query, preventing other users from updating the data. Prevents non-repeatable reads but phantom rows are still possible.
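A minimal end-to-end sketch of the same idea (the UserWallets set name follows this thread; walletId and cost are placeholder variables, and IsolationLevel here is System.Data.IsolationLevel):
using (var tx = YourDataContext.Database.BeginTransaction(IsolationLevel.RepeatableRead))
{
    var wallet = YourDataContext.UserWallets.Single(w => w.WalletId == walletId);

    if (wallet.Balance < cost)
        throw new InvalidOperationException("Insufficient funds.");

    wallet.Balance -= cost;
    YourDataContext.SaveChanges();
    tx.Commit();  // locks taken inside the transaction are held until here
}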
The whole point of a transactional database is that the consumer of the data determines how isolated their view of the data should be.
Irrespective of whether your transaction is serialized, someone else can perform a dirty read on the same data that you just changed but did not commit.
You should firstly concern yourself with the integrity of your view, and only then accept a degradation of the quality of that view to improve system performance where you are sure it is required.
Wrap everything in a TransactionScope with Serializable isolation level and you personally cannot really go wrong. Only drop the isolation level when you see it is genuinely required (i.e. when getting things wrong sometimes is OK).
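A minimal sketch of that advice (YourContext and UserWallets follow the naming used earlier in this thread; IsolationLevel here is System.Transactions.IsolationLevel):
var options = new TransactionOptions { IsolationLevel = IsolationLevel.Serializable };

using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
using (var context = new YourContext())
{
    // read the wallet, check the funds, deduct the cost, SaveChanges() ...
    scope.Complete();  // commit; locks are released when the scope is disposed
}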
Someone asks about this here: SQL Server: preventing dirty reads in a stored procedure
I am working with a very large data set, roughly 2 million records. I have the code below but get an out-of-memory exception after it has processed around three batches, about 600,000 records. I understand that as it loops through each batch, Entity Framework lazy loads and keeps building up the full 2 million records in memory. Is there any way to unload a batch once I've processed it?
ModelContext dbContext = new ModelContext();
IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.OrderBy(t => t.TownID).Batch(200000);
foreach (var batch in towns)
{
SearchClient.Instance.IndexMany(batch, SearchClient.Instance.Settings.DefaultIndex, "Town", new SimpleBulkParameters() { Refresh = false });
}
Note: The Batch method comes from this project: https://code.google.com/p/morelinq/
The search client is this: https://github.com/Mpdreamz/NEST
The issue is that when you get data from EF there are actually two copies of the data created: one which is returned to the user and a second which EF holds onto and uses for change detection (so that it can persist changes to the database). EF holds this second set for the lifetime of the context, and it's this set that's running you out of memory.
You have 2 options to deal with this:
renew your context each batch (see the sketch after the AsNoTracking example below)
use .AsNoTracking() in your query, e.g.:
IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.AsNoTracking().OrderBy(t => t.TownID).Batch(200000);
This tells EF not to keep a copy for change detection. You can read a little more about what AsNoTracking does and its performance impact on my blog: http://blog.staticvoid.co.nz/2012/4/2/entity_framework_and_asnotracking
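For the first option, a rough sketch of what renewing the context per batch could look like, paging with Skip/Take instead of the MoreLinq Batch helper and combined with AsNoTracking (the Town, TownID and SearchClient names come from the question; the batch size is the same 200,000):
const int batchSize = 200000;
int processed = 0;

while (true)
{
    List<Town> batch;
    using (var dbContext = new ModelContext())  // fresh context per batch
    {
        batch = dbContext.Towns
                         .AsNoTracking()
                         .OrderBy(t => t.TownID)
                         .Skip(processed)
                         .Take(batchSize)
                         .ToList();
    }

    if (batch.Count == 0)
        break;

    SearchClient.Instance.IndexMany(batch, SearchClient.Instance.Settings.DefaultIndex,
        "Town", new SimpleBulkParameters() { Refresh = false });

    processed += batch.Count;
}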
I wrote a migration routine that reads from one DB and writes (with minor changes in layout) into another DB (of a different type) and in this case, renewing the connection for each batch and using AsNoTracking() did not cut it for me.
Note that this problem occurs using a '97 version of JET. It may work flawlessly with other DBs.
However, the following algorithm did solve the out-of-memory issue:
use one connection for reading and one for writing/updating
read with AsNoTracking()
every 50 rows or so written/updated, check the memory usage and, as needed, recover memory and reset the output DB context (and connected tables):
var before = System.Diagnostics.Process.GetCurrentProcess().VirtualMemorySize64;
if (before > 800000000)
{
    dbcontextOut.SaveChanges();
    dbcontextOut.Dispose();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    dbcontextOut = dbcontextOutFunc();
    tableOut = Dynamic.InvokeGet(dbcontextOut, outputTableName);
}