Multi Threading with LINQ to SQL - c#

I am writing a WinForms application. I am pulling data from my database, performing some actions on that data set and then plan to save it back to the database. I am using LINQ to SQL to perform the query to the database because I am only concerned with 1 table in our database so I didn't want to implement an entire ORM for this.
I have it pulling the dataset from the DB. However, the dataset is rather large. So currently what I am trying to do is separate the dataset into 4 relatively equal sized lists (List<object>).
Then I have a separate background worker to run through each of those lists, perform the action and report its progress while doing so. I have it planned to consolidate those sections into one big list once all 4 background workers have finished processing their section.
But I keep getting an error while the background workers are processing their unique list. Do the objects maintain their tie to the DataContext for the LINQ to SQL even though they have been converted to List objects? Any ideas how to fix this? I have minimal experience with multi-threading so if I am going at this completely wrong, please tell me.
Thanks guys. If you need any code snippets or any other information just ask.
Edit: Oops. I completely forgot to give the error message. In the DataContext designer.cs it gives the error An item with the same key has already been added. on the SendPropertyChanging function.
private void Setup(){
List<MyObject> quarter1 = _listFromDB.Take(5000).ToList();
bgw1.RunWorkerAsync();
}
private void bgw1_DoWork(object sender, DoWorkEventArgs e){
e.Result = functionToExecute(bgw1, quarter1);
}
private List<MyObject> functionToExecute(BackgroundWorker caller, List<MyObject> myList)
{
int progress = 0;
foreach (MyObject obj in myList)
{
string newString1 = createString();
obj.strText = newString;
//report progress here
caller.ReportProgress(progress++);
}
return myList;
}
This same function is called by all four workers and is given a different list for myList based on which worker is called the function.

Because a real answer has yet to be posted, I'll give it a shot.
Given that you haven't shown any LINQ-to-SQL code (no usage of DataContext) - I'll take an educated guess that the DataContext is shared between the threads, for example:
using (MyDataContext context = new MyDataContext())
{
// this is just some random query, that has not been listed - ToList()
// thus query execution is defered. listFromDB = IQueryable<>
var listFromDB = context.SomeTable.Where(st => st.Something == true);
System.Threading.Tasks.Task.Factory.StartNew(() =>
{
var list1 = listFromDB.Take(5000).ToList(); // runs the SQL query
// call some function on list1
});
System.Threading.Tasks.Task.Factory.StartNew(() =>
{
var list2 = listFromDB.Take(5000).ToList(); // runs the SQL query
// call some function on list2
});
}
Now the error you got - An item with the same key has already been added. - was because the DataContext object is not thread safe! A lot of stuff happens in the background - DataContext has to load objects from SQL, track their states, etc. This background work is what throws the error (because each thread is running the query, the DataContext gets accessed).
At least this is my own personal experience. Having come across the same error while sharing the DataContext between multiple threads. You only have two options in this scenario:
1) Before starting the threads, call .ToList() on the query, making listFromDB not an IQueryable<>, but an actual List<>. This means that the query has already ran and the threads operate on an actual List, not on the DataContext.
2) Move the DataContext definition into each thread. Because the DataContext is no longer shared, no more errors.
The third option would be to re-write the scenario into something else, like you did (for example, make everything sequential on a single background thread)...

First of all, I don't really see why you'd need multiple worker threads at all. (are theses lists in seperate databases / tables / servers? Do you really want to show 4 progress bars if you have 4 lists or are you somehow merging these progress reportings into one weird progress bar:D
Also, you're trying to speed up processing updates to your databases, but you don't send linq to sql any SAVES, so you're not really batching transactions, you'll just save everything at the end in one big transaction, is that really what you're aiming for? the progress bar will just stop at 100% and then spend a lot of time on the SQL side.
Just create one background thread and process everything synchronously, but batch a save transaction every couple of rows (i'd suggest something like every 1000 rows, but you should experiment with this) , it'll be fast, even with millions of rows,
If you really need this multithreaded solution:
The "another blabla with the same key has been added" error suggests that you are adding the same item to multiple "mylists", or adding the same item to the same list twice, otherwise how would there be any errors at all?

Using Parallel LINQ (PLINQ), you can take benefit of multiple CPU cores for processing your data. But if your application is going to run on single-core CPU, then splitting data into peaces wouldn't give you performance benefits instead it will incur some context-change overhead.
Hope it Helps

Related

BackgroundWorker with SqlConnection

I know I'm having a massive derp moment here and this is probably quite easy to actually do - I have had a search around and read a few articles but i'm still struggling a little, so any feedback or pointers to useful resources would be greatly appreciated!
Anyway I have a class called PopulateDatagridViews which I have various functions in, one of which is called ExecuteSqlStatement, this function is simple enough, it initializes an SQL connection and returns a DataTable populated with the results of the SQL query. Within the same class I also have various functions that use string builders to build up SQL statements. (Not ideal, I know.)
I create a PopulateDatagridViews object in my GUI thread and use it to set various datagrid views with with the returned DataTables. For example:
dataGridViewVar.DataSource = populateDgv.GetCustomers();
Naturally a problem I'm having is that the more data to be read from the database, the longer the U.I is unresponsive. I would like to shift the process of retrieving data via the PopulateDatagridViews to a separate thread or BackgroundWorker so as prevent the main GUI thread from locking up whilst this is processed.
I realise I can create a BackgroundWorker to do this and place in the DoWork handler a call to the appropriate function within my PopulateDatagridViews.
I figure I could create a BackgroundWorker for each individual function inside my PopulateDatagridViews class, but surely there is a more efficient way to do this? I'd very much appreciate a point in the right direction on this as it's driving me around the bend!
Additional Info: I use version 4.0 of the .Net framework.
I strongly suggest that you use TPL (Task Parallel Library) http://msdn.microsoft.com/en-us/library/dd537609.aspx
In your case you will create first task to pull some data and than start second task after first is completed to update UI.
I`ll try to find code that i write for similar problem.
Edit: Adding code
Task<return_type> t1 = new Task<return_type>(() =>
{
//do something to take some result
return some_result; //return it
});
t1.Start();
Task t2 = t1.ContinueWith((some_arg_that_represent_previous_task_obj) =>{//ContinueWith guarantees that t2 is started AFTER t1 is executed!
//Update your GUI here
//if you need result from previos task: some_arg_that_represent_previous_task_obj.Result //Your dataset or whatever
}, TaskScheduler.FromCurrentSynchronizationContext()); //VERY important - you must update gui from same thread that created it! (you will have cross thread exeption if you dont add TaskScheduler.FromCurrentSynchronizationContext()
Hope it helps.
Well in that case I recommend reading this msdn article to get some ideas. Afterwards you should look for some tutorials, because the msdn is not the best source to learn things. ;o)

How to call multiple invoke operations one by one?

I have some invoke operations (all different) that I need to call based on the user selected items that are stored in an ObservableCollection and return a string/int value.
Now when the selection is only one item it is straight forward, I can call and use the Completed event and get my return value.
I have approx <= 8 items in the list I need to iterate and perform the invoke operation on each.
I see the foreach will not really wait for the InvokeOperation to finish and just continuously iterates till the end of the list making them run in parallel..sort of...
How do I perform only one operation at once and iterate only when the previous operation is completed (irrespective of the result)?
1 by 1 execution of InvokeOperations is what I'm looking for..any clues, hints..?
Let me know if I'm unclear..
Cheers.
EDIT: The InvokeOperation(s) are different from each other. Each of them doin different operations on the DB which can be time consuming.The main reason for looking into executing them 1-1 is to update the user screen with the output (success/fail) for each of them and not do all at once.
//Pseudo-code
foreach (var item in SelectedItems)
{
var id = item.ID;
switch(id)
{
case: 1
InvokeOperataion<int> inv = context.PerformUpdateFor_1(item);
inv.Completed += (s,a) => {
//get the value assign it to Textblock.
};
break;
case: 2
InvokeOperataion<int> inv = context.PerformUpdateFor_2(item);
inv.Completed += (s,a) => {
//get the value assign it to Textblock.
};
break;
//Other cases similar to this
}
}
Take a look at Reactive Extensions for Silverlight. You will easily be able to wrap your RIA Services InvokeOperation in an IObservable which represents its asynchronous completion. One way to do this involves the Observable.Create method.
After that, you will be able to compose the collection of IObservables in many ways: executing them in sequence using Observable.Concat, executing them in parallel using Observable.ForkJoin, filtering them, joining them etc.
I have used this successfully for RIA Services' load operations, and to a smaller extent invoke operations. The composability of observables in my opinion is much nicer to work with than writing callback methods explicitly.
Here is a blog post outlining the approach for load operations, but the approach will be very similar: Linq Query Expression Syntax with RIA Domain Context

Multi threading C# application with SQL Server database calls

I have a SQL Server database with 500,000 records in table main. There are also three other tables called child1, child2, and child3. The many to many relationships between child1, child2, child3, and main are implemented via the three relationship tables: main_child1_relationship, main_child2_relationship, and main_child3_relationship. I need to read the records in main, update main, and also insert into the relationship tables new rows as well as insert new records in the child tables. The records in the child tables have uniqueness constraints, so the pseudo-code for the actual calculation (CalculateDetails) would be something like:
for each record in main
{
find its child1 like qualities
for each one of its child1 qualities
{
find the record in child1 that matches that quality
if found
{
add a record to main_child1_relationship to connect the two records
}
else
{
create a new record in child1 for the quality mentioned
add a record to main_child1_relationship to connect the two records
}
}
...repeat the above for child2
...repeat the above for child3
}
This works fine as a single threaded app. But it is too slow. The processing in C# is pretty heavy duty and takes too long. I want to turn this into a multi-threaded app.
What is the best way to do this? We are using Linq to Sql.
So far my approach has been to create a new DataContext object for each batch of records from main and use ThreadPool.QueueUserWorkItem to process it. However these batches are stepping on each other's toes because one thread adds a record and then the next thread tries to add the same one and ... I am getting all kinds of interesting SQL Server dead locks.
Here is the code:
int skip = 0;
List<int> thisBatch;
Queue<List<int>> allBatches = new Queue<List<int>>();
do
{
thisBatch = allIds
.Skip(skip)
.Take(numberOfRecordsToPullFromDBAtATime).ToList();
allBatches.Enqueue(thisBatch);
skip += numberOfRecordsToPullFromDBAtATime;
} while (thisBatch.Count() > 0);
while (allBatches.Count() > 0)
{
RRDataContext rrdc = new RRDataContext();
var currentBatch = allBatches.Dequeue();
lock (locker)
{
runningTasks++;
}
System.Threading.ThreadPool.QueueUserWorkItem(x =>
ProcessBatch(currentBatch, rrdc));
lock (locker)
{
while (runningTasks > MAX_NUMBER_OF_THREADS)
{
Monitor.Wait(locker);
UpdateGUI();
}
}
}
And here is ProcessBatch:
private static void ProcessBatch(
List<int> currentBatch, RRDataContext rrdc)
{
var topRecords = GetTopRecords(rrdc, currentBatch);
CalculateDetails(rrdc, topRecords);
rrdc.Dispose();
lock (locker)
{
runningTasks--;
Monitor.Pulse(locker);
};
}
And
private static List<Record> GetTopRecords(RecipeRelationshipsDataContext rrdc,
List<int> thisBatch)
{
List<Record> topRecords;
topRecords = rrdc.Records
.Where(x => thisBatch.Contains(x.Id))
.OrderBy(x => x.OrderByMe).ToList();
return topRecords;
}
CalculateDetails is best explained by the pseudo-code at the top.
I think there must be a better way to do this. Please help. Many thanks!
Here's my take on the problem:
When using multiple threads to insert/update/query data in SQL Server, or any database, then deadlocks are a fact of life. You have to assume they will occur and handle them appropriately.
That's not so say we shouldn't attempt to limit the occurence of deadlocks. However, it's easy to read up on the basic causes of deadlocks and take steps to prevent them, but SQL Server will always surprise you :-)
Some reason for deadlocks:
Too many threads - try to limit the number of threads to a minimum, but of course we want more threads for maximum performance.
Not enough indexes. If selects and updates aren't selective enough SQL will take out larger range locks than is healthy. Try to specify appropriate indexes.
Too many indexes. Updating indexes causes deadlocks, so try to reduce indexes to the minimum required.
Transaction isolational level too high. The default isolation level when using .NET is 'Serializable', whereas the default using SQL Server is 'Read Committed'. Reducing the isolation level can help a lot (if appropriate of course).
This is how I might tackle your problem:
I wouldn't roll my own threading solution, I would use the TaskParallel library. My main method would look something like this:
using (var dc = new TestDataContext())
{
// Get all the ids of interest.
// I assume you mark successfully updated rows in some way
// in the update transaction.
List<int> ids = dc.TestItems.Where(...).Select(item => item.Id).ToList();
var problematicIds = new List<ErrorType>();
// Either allow the TaskParallel library to select what it considers
// as the optimum degree of parallelism by omitting the
// ParallelOptions parameter, or specify what you want.
Parallel.ForEach(ids, new ParallelOptions {MaxDegreeOfParallelism = 8},
id => CalculateDetails(id, problematicIds));
}
Execute the CalculateDetails method with retries for deadlock failures
private static void CalculateDetails(int id, List<ErrorType> problematicIds)
{
try
{
// Handle deadlocks
DeadlockRetryHelper.Execute(() => CalculateDetails(id));
}
catch (Exception e)
{
// Too many deadlock retries (or other exception).
// Record so we can diagnose problem or retry later
problematicIds.Add(new ErrorType(id, e));
}
}
The core CalculateDetails method
private static void CalculateDetails(int id)
{
// Creating a new DeviceContext is not expensive.
// No need to create outside of this method.
using (var dc = new TestDataContext())
{
// TODO: adjust IsolationLevel to minimize deadlocks
// If you don't need to change the isolation level
// then you can remove the TransactionScope altogether
using (var scope = new TransactionScope(
TransactionScopeOption.Required,
new TransactionOptions {IsolationLevel = IsolationLevel.Serializable}))
{
TestItem item = dc.TestItems.Single(i => i.Id == id);
// work done here
dc.SubmitChanges();
scope.Complete();
}
}
}
And of course my implementation of a deadlock retry helper
public static class DeadlockRetryHelper
{
private const int MaxRetries = 4;
private const int SqlDeadlock = 1205;
public static void Execute(Action action, int maxRetries = MaxRetries)
{
if (HasAmbientTransaction())
{
// Deadlock blows out containing transaction
// so no point retrying if already in tx.
action();
}
int retries = 0;
while (retries < maxRetries)
{
try
{
action();
return;
}
catch (Exception e)
{
if (IsSqlDeadlock(e))
{
retries++;
// Delay subsequent retries - not sure if this helps or not
Thread.Sleep(100 * retries);
}
else
{
throw;
}
}
}
action();
}
private static bool HasAmbientTransaction()
{
return Transaction.Current != null;
}
private static bool IsSqlDeadlock(Exception exception)
{
if (exception == null)
{
return false;
}
var sqlException = exception as SqlException;
if (sqlException != null && sqlException.Number == SqlDeadlock)
{
return true;
}
if (exception.InnerException != null)
{
return IsSqlDeadlock(exception.InnerException);
}
return false;
}
}
One further possibility is to use a partitioning strategy
If your tables can naturally be partitioned into several distinct sets of data, then you can either use SQL Server partitioned tables and indexes, or you could manually split your existing tables into several sets of tables. I would recommend using SQL Server's partitioning, since the second option would be messy. Also built-in partitioning is only available on SQL Enterprise Edition.
If partitioning is possible for you, you could choose a partion scheme that broke you data in lets say 8 distinct sets. Now you could use your original single threaded code, but have 8 threads each targetting a separate partition. Now there won't be any (or at least a minimum number of) deadlocks.
I hope that makes sense.
Overview
The root of your problem is that the L2S DataContext, like the Entity Framework's ObjectContext, is not thread-safe. As explained in this MSDN forum exchange, support for asynchronous operations in the .NET ORM solutions is still pending as of .NET 4.0; you'll have to roll your own solution, which as you've discovered isn't always easy to do when your framework assume single-threadedness.
I'll take this opportunity to note that L2S is built on top of ADO.NET, which itself fully supports asynchronous operation - personally, I would much prefer to deal directly with that lower layer and write the SQL myself, just to make sure that I fully understood what was transpiring over the network.
SQL Server Solution?
That being said, I have to ask - must this be a C# solution? If you can compose your solution out of a set of insert/update statements, you can just send over the SQL directly and your threading and performance problems vanish.* It seems to me that your problems are related not to the actual data transformations to be made, but center around making them performant from .NET. If .NET is removed from the equation, your task becomes simpler. After all, the best solution is often the one that has you writing the smallest amount of code, right? ;)
Even if your update/insert logic can't be expressed in a strictly set-relational manner, SQL Server does have a built-in mechanism for iterating over records and performing logic - while they are justly maligned for many use cases, cursors may in fact be appropriate for your task.
If this is a task that has to happen repeatedly, you could benefit greatly from coding it as a stored procedure.
*of course, long-running SQL brings its own problems like lock escalation and index usage that you'll have to contend with.
C# Solution
Of course, it may be that doing this in SQL is out of the question - maybe your code's decisions depend on data that comes from elsewhere, for example, or maybe your project has a strict 'no-SQL-allowed' convention. You mention some typical multithreading bugs, but without seeing your code I can't really be helpful with them specifically.
Doing this from C# is obviously viable, but you need to deal with the fact that a fixed amount of latency will exist for each and every call you make. You can mitigate the effects of network latency by using pooled connections, enabling multiple active result sets, and using the asynchronous Begin/End methods for executing your queries. Even with all of those, you will still have to accept that there is a cost to shipping data from SQL Server to your application.
One of the best ways to keep your code from stepping all over itself is to avoid sharing mutable data between threads as much as possible. That would mean not sharing the same DataContext across multiple threads. The next best approach is to lock critical sections of code that touch the shared data - lock blocks around all DataContext access, from the first read to the final write. That approach might just obviate the benefits of multithreading entirely; you can likely make your locking more fine-grained, but be ye warned that this is a path of pain.
Far better is to keep your operations separate from each other entirely. If you can partition your logic across 'main' records, that's ideal - that is to say, as long as there aren't relationships between the various child tables, and as long as one record in 'main' doesn't have implications for another, you can split your operations across multiple threads like this:
private IList<int> GetMainIds()
{
using (var context = new MyDataContext())
return context.Main.Select(m => m.Id).ToList();
}
private void FixUpSingleRecord(int mainRecordId)
{
using (var localContext = new MyDataContext())
{
var main = localContext.Main.FirstOrDefault(m => m.Id == mainRecordId);
if (main == null)
return;
foreach (var childOneQuality in main.ChildOneQualities)
{
// If child one is not found, create it
// Create the relationship if needed
}
// Repeat for ChildTwo and ChildThree
localContext.SaveChanges();
}
}
public void FixUpMain()
{
var ids = GetMainIds();
foreach (var id in ids)
{
var localId = id; // Avoid closing over an iteration member
ThreadPool.QueueUserWorkItem(delegate { FixUpSingleRecord(id) });
}
}
Obviously this is as much a toy example as the pseudocode in your question, but hopefully it gets you thinking about how to scope your tasks such that there is no (or minimal) shared state between them. That, I think, will be the key to a correct C# solution.
EDIT Responding to updates and comments
If you're seeing data consistency issues, I'd advise enforcing transaction semantics - you can do this by using a System.Transactions.TransactionScope (add a reference to System.Transactions). Alternately, you might be able to do this on an ADO.NET level by accessing the inner connection and calling BeginTransaction on it (or whatever the DataConnection method is called).
You also mention deadlocks. That you're battling SQL Server deadlocks indicates that the actual SQL queries are stepping on each other's toes. Without knowing what is actually being sent over the wire, it's difficult to say in detail what's happening and how to fix it. Suffice to say that SQL deadlocks result from SQL queries, and not necessarily from C# threading constructs - you need to examine what exactly is going over the wire. My gut tells me that if each 'main' record is truly independent of the others, then there shouldn't be a need for row and table locks, and that Linq to SQL is likely the culprit here.
You can get a dump of the raw SQL emitted by L2S in your code by setting the DataContext.Log property to something e.g. Console.Out. Though I've never personally used it, I understand the LINQPad offers L2S facilities and you may be able to get at the SQL there, too.
SQL Server Management Studio will get you the rest of the way there - using the Activity Monitor, you can watch for lock escalation in real time. Using the Query Analyzer, you can get a view of exactly how SQL Server will execute your queries. With those, you should be able to get a good notion of what your code is doing server-side, and in turn how to go about fixing it.
I would recommend moving all the XML processing into the SQL server, too. Not only will all your deadlocks disappear, but you will see such a boost in performance that you will never want to go back.
It will be best explained by an example. In this example I assume that the XML blob already is going into your main table (I call it closet). I will assume the following schema:
CREATE TABLE closet (id int PRIMARY KEY, xmldoc ntext)
CREATE TABLE shoe(id int PRIMARY KEY IDENTITY, color nvarchar(20))
CREATE TABLE closet_shoe_relationship (
closet_id int REFERENCES closet(id),
shoe_id int REFERENCES shoe(id)
)
And I expect that your data (main table only) initially looks like this:
INSERT INTO closet(id, xmldoc) VALUES (1, '<ROOT><shoe><color>blue</color></shoe></ROOT>')
INSERT INTO closet(id, xmldoc) VALUES (2, '<ROOT><shoe><color>red</color></shoe></ROOT>')
Then your whole task is as simple as the following:
INSERT INTO shoe(color) SELECT DISTINCT CAST(CAST(xmldoc AS xml).query('//shoe/color/text()') AS nvarchar) AS color from closet
INSERT INTO closet_shoe_relationship(closet_id, shoe_id) SELECT closet.id, shoe.id FROM shoe JOIN closet ON CAST(CAST(closet.xmldoc AS xml).query('//shoe/color/text()') AS nvarchar) = shoe.color
But given that you will do a lot of similar processing, you can make your life easier by declaring your main blob as XML type, and further simplifying to this:
INSERT INTO shoe(color)
SELECT DISTINCT CAST(xmldoc.query('//shoe/color/text()') AS nvarchar)
FROM closet
INSERT INTO closet_shoe_relationship(closet_id, shoe_id)
SELECT closet.id, shoe.id
FROM shoe JOIN closet
ON CAST(xmldoc.query('//shoe/color/text()') AS nvarchar) = shoe.color
There are additional performance optimizations possible, like pre-computing repeatedly invoked Xpath results in a temporary or permanent table, or converting the initial population of the main table into a BULK INSERT, but I don't expect that you will really need those to succeed.
sql server deadlocks are normal & to be expected in this type of scenario - MS's recommendation is that these should be handled on the application side rather than the db side.
However if you do need to make sure that a stored procedure is only called once then you can use a sql mutex lock using sp_getapplock. Here's an example of how to implement this
BEGIN TRAN
DECLARE #mutex_result int;
EXEC #mutex_result = sp_getapplock #Resource = 'CheckSetFileTransferLock',
#LockMode = 'Exclusive';
IF ( #mutex_result < 0)
BEGIN
ROLLBACK TRAN
END
-- do some stuff
EXEC #mutex_result = sp_releaseapplock #Resource = 'CheckSetFileTransferLock'
COMMIT TRAN
This may be obvious, but looping through each tuple and doing your work in your servlet container involves a lot of per-record overhead.
If possible, move some or all of that processing to the SQL server by rewriting your logic as one or more stored procedures.
If
You don't have a lot of time to spend on this issue and need it to fix it right now
You are sure that your code is done so that different thread will NOT modify the same record
You are not afraid
Then ... you can just add "WITH NO LOCK" to your queries so that MSSQL doesn't apply the locks.
To use with caution :)
But anyway, you didn't tell us where the time is lost (in the mono-threaded version). Because if it's in the code, I'll advise you to write everything in the DB directly to avoid continuous data exchange. If it's in the db, I'll advise to check index (too much ?), i/o, cpu etc.

new objects added during long loop

We currently have a production application that runs as a windows service. Many times this application will end up in a loop that can take several hours to complete. We are using Entity Framework for .net 4.0 for our data access.
I'm looking for confirmation that if we load new data into the system, after this loop is initialized, it will not result in items being added to the loop itself. When the loop is initialized we are looking for data "as of" that moment. Although I'm relatively certain that this will work exactly like using ADO and doing a loop on the data (the loop only cycles through data that was present at the time of initialization), I am looking for confirmation for co-workers.
Thanks in advance for your help.
//update : here's some sample code in c# - question is the same, will the enumeration change if new items are added to the table that EF is querying?
IEnumerable<myobject> myobjects = (from o in db.theobjects where o.id==myID select o);
foreach (myobject obj in myobjects)
{
//perform action on obj here
}
It depends on your precise implementation.
Once a query has been executed against the database then the results of the query will not change (assuming you aren't using lazy loading). To ensure this you can dispose of the context after retrieving query results--this effectively "cuts the cord" between the retrieved data and that database.
Lazy loading can result in a mix of "initial" and "new" data; however once the data has been retrieved it will become a fixed snapshot and not susceptible to updates.
You mention this is a long running process; which implies that there may be a very large amount of data involved. If you aren't able to fully retrieve all data to be processed (due to memory limitations, or other bottlenecks) then you likely can't ensure that you are working against the original data. The results are not fixed until a query is executed, and any updates prior to query execution will appear in results.
I think your best bet is to change the logic of your application such that when the "loop" logic is determining whether it should do another interation or exit you take the opportunity to load the newly added items to the list. see pseudo code below:
var repo = new Repository();
while (repo.HasMoreItemsToProcess())
{
var entity = repo.GetNextItem();
}
Let me know if this makes sense.
The easiest way to assure that this happens - if the data itself isn't too big - is to convert the data you retrieve from the database to a List<>, e.g., something like this (pulled at random from my current project):
var sessionIds = room.Sessions.Select(s => s.SessionId).ToList();
And then iterate through the list, not through the IEnumerable<> that would otherwise be returned. Converting it to a list triggers the enumeration, and then throws all the results into memory.
If there's too much data to fit into memory, and you need to stick with an IEnumerable<>, then the answer to your question depends on various database and connection settings.
I'd take a snapshot of ID's to be processed -- quickly and as a transaction -- then work that list in the fashion you're doing today.
In addition to accomplishing the goal of not changing the sample mid-stream, this also gives you the ability to extend your solution to track status on each item as it's processed. For a long-running process, this can be very helpful for progress reporting restart / retry capabilities, etc.

Autocomplete textbox freezes while executing query. Must be a better way!

everyone! I searched the best I could and did not find exactly the help I was looking for.
Problem
AutoCompleteTextbox FREEZES and "eats" characters while query is performed
Asking for
Mimic Google Instant functionality
Background
First things first: C#, WPF, .NET 4.0
Ok, now that's out of the way, I'm trying to find the best way to implement a dynamic AutoComplete Textbox, which queries a database for results after each letter typed.
The following code gets executed when the AutoCompleteTextBox's TextChanged event is fired:
public void Execute(object sender, object parameter)
{
//removed some unnecessary code for the sake of being concise
var autoCompleteBox = sender as AutoCompleteTextBox;
var e = parameter as SearchTextEventArgs;
var result = SearchUnderlyings(e.SearchText);
autoCompleteBox.ItemsSource = result;
}
Now, let's say that SearchUnderlyings(e.SearchText) takes an average of 600-1100ms - during that time, the textbox is frozen and it "eats" any keys pressed. This is an annoying problem I've been having. For some reason, the LINQ in SearchUnderlyings(e.SearchText) is running in the GUI thread. I tried delegating this to a background thread, but still same result.
Ideally, I would like the textbox to work the way Google Instant does - but I don't want to be "killing" threads before the server/query can return a result.
Anyone have experience or can offer some guidance which will allow me to query as I type without freezing the GUI or killing the server?
Thank you guys!
This line:
var result = SearchUnderlyings(e.SearchText);
Runs synchronously, locking the UI thread. The way to cure this would be to switch to an asynchronous pattern, where you start the query, and then do something when it finishes.
This article demonstrates it pretty nicely, and shows some solutions - http://www.codeproject.com/KB/cs/AsyncMethodInvocation.aspx
What is probably killing you is setting the binding source over and over again (which is why running the query on a background thread doesn't make a difference).
You might consider the algorithm as a whole. Depending on your data, you could wait until the user enters the first three characters and then do one large query against the database. Bind the item source once. Each character typed afterwards just performs a filter against your data that is already cached on the client. That way you are not hitting the database over and over (which is going to be terribly expensive).
Or consider just bringing back three or so results from the DB to keep your service serialization time down.
So, we kind of hacked something quick. By making the calls to SearchUnderlyings(e.SearchText) asynchronous, my GUI thread is no longer blocked and the textbox is no longer "eating" key presses. By adding the lastQueryTag == _lastQuery check, we are trying to ensure some thread-safety, allowing only the most recent query to set the ItemsSource.
Perhaps not the most ideal or elegant solution. I am still open to further critiques and suggestions. Thank you!
private long _lastQuery = DateTime.Now.Ticks;
public void Execute(object sender, object parameter)
{
var autoCompleteBox = sender as AutoCompleteTextBox;
var e = parameter as SearchTextEventArgs;
// removed unecessary code for clarity
long lastQueryTag = _lastQuery = DateTime.Now.Ticks;
Task.Factory.StartNew(() =>
{
var result = SearchUnderlyings(e.SearchText);
System.Windows.Application.Current.Dispatch(() =>
{
if (lastQueryTag == _lastQuery)
autoCompleteBox.ItemsSource = result;
});
});
}

Categories