How to separate parallel requests? - c#

I'll try to explain the issue with a simplified console application example; however, the real project is an ASP.NET MVC 3 application.
Given two tables, TestReport (Id, Text, Ready) and TestReportRef (which references a report through FK_ReportId), imagine the following scenario:
The user creates a report (a row in TestReport, where Text is the report content and Ready is a bool flag saying whether the report is ready to be processed). By default Ready is false, i.e. not ready.
The user wants the report to be processed, so he submits it; Ready is set to true here.
The system allows the report to be recalled as long as it has not been processed yet. When the report is recalled, Ready is set back to false. Conversely, when the report is processed, a row referencing the report by its Id is created in TestReportRef.
Now imagine that, at the very same moment:
the user wants to recall the report;
the report is added to the processing list.
Because both can happen simultaneously, errors may occur: the report can end up with Ready == false while also being referenced in TestReportRef.
Here is a simple console example of how this may happen:
var cs = "my connection string";
var dc = new TestDataContext(cs);
dc.TestReport.InsertOnSubmit(new TestReport
{
Text = "My report content",
Ready = true //ready at once
});
dc.SubmitChanges();
Action recallReport = () =>
{
var _dc = new TestDataContext(cs);
var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
if (report != null && !report.TestReportRef.Any())
{
Thread.Sleep(1000);
report.Ready = false;
_dc.SubmitChanges();
}
};
Action acceptReport = () =>
{
var _dc = new TestDataContext(cs);
var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
if (report != null && !report.TestReportRef.Any())
{
Thread.Sleep(1000);
_dc.TestReportRef.InsertOnSubmit(new TestReportRef
{
FK_ReportId = report.Id
});
_dc.SubmitChanges();
}
};
var task1 = new Task(recallReport);
var task2 = new Task(acceptReport);
task1.Start();
task2.Start();
task1.Wait();
task2.Wait();
foreach (var t in dc.TestReport)
{
Console.WriteLine(string.Format("{0}\t{1}\t{2}", t.Id, t.Text, t.Ready));
}
foreach (var t in dc.TestReportRef)
{
Console.WriteLine("ref id:\t" + t.FK_ReportId);
}
Thread.Sleep(1000); is added to ensure that both tasks check the same state.
The example may seem contrived, but I hope it explains the issue I'm dealing with.
How can I avoid this? Making the repository a singleton doesn't seem like a good idea. Should I use a shared mutex (one for all web requests) to serialize write operations only?
Or is there a pattern I should use in this kind of scenario?
This is only a simplified example of one of my scenarios; several others can run into a similar discrepancy. Ideally, this kind of overlap would be made impossible.

Why not add a version column to the Report table? Each task starts by reading the current version; when it finishes, the operation succeeds only if the version is still the one it read, otherwise it fails. If the operation succeeds, it updates the version to version + 1. The check and the increment must happen atomically (for example in a single UPDATE ... WHERE Version = @readVersion), otherwise two tasks could still both pass the check. This is a form of optimistic locking, which assumes that conflicts can occur but are not frequent.
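A minimal in-memory sketch of that version check, assuming a lock can stand in for the database performing the compare-and-increment as one atomic UPDATE. All names are hypothetical and no LINQ to SQL is involved. A Barrier makes both operations read the same version before either commits (the role Thread.Sleep(1000) plays in the question), and exactly one of them wins; note that accepting must bump the report's version too, otherwise the conflict goes unnoticed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// In-memory stand-in for one TestReport row: Ready, a Version column, and whether a
// TestReportRef row points at it. The lock plays the part of the database running
// "UPDATE ... SET ..., Version = Version + 1 WHERE Id = @id AND Version = @readVersion" atomically.
var rowLock = new object();
bool ready = true;
bool referenced = false;
int version = 0;

int ReadVersion()
{
    lock (rowLock) return version;
}

// Apply the change only if nobody modified the row since readVersion was read.
bool TryCommit(int readVersion, Action apply)
{
    lock (rowLock)
    {
        if (version != readVersion) return false; // conflict: the other operation got there first
        apply();
        version++;                                // every successful write bumps the version
        return true;
    }
}

// Test-only: both operations read the same version before either commits.
using var bothHaveRead = new Barrier(2);

bool Recall()
{
    int v = ReadVersion();
    bothHaveRead.SignalAndWait();
    return TryCommit(v, () => ready = false);
}

bool Accept()
{
    int v = ReadVersion();
    bothHaveRead.SignalAndWait();
    return TryCommit(v, () => referenced = true); // accepting must bump the version as well
}

// LongRunning gives each operation its own thread, so both can block on the barrier at once.
var recallTask = Task.Factory.StartNew(() => Recall(), TaskCreationOptions.LongRunning);
var acceptTask = Task.Factory.StartNew(() => Accept(), TaskCreationOptions.LongRunning);
bool recalled = recallTask.Result;
bool accepted = acceptTask.Result;

Console.WriteLine($"recalled={recalled}, accepted={accepted}, ready={ready}, referenced={referenced}");
```

The losing operation gets `false` back and can report "the report was already processed" (or "already recalled") instead of leaving the row inconsistent.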
UPDATE
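If you are using LINQ to SQL, have a look at the UpdateCheck parameter: [Column(UpdateCheck = UpdateCheck.Always)]. With it, SubmitChanges throws a ChangeConflictException when the row was modified after it was read.
This can be useful to handle concurrency in your case. Note that the accept path must also modify the TestReport row (for example, by bumping the version) for the conflict to be detected, since inserting into TestReportRef alone does not touch TestReport.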

Related

EntityFramework and handling duplicate primary key/concurrency/race conditions situations

I wrote a library, referenced by numerous applications, that tracks who is online and which application and page they are viewing.
The data is stored, using EF6, in a SQL Server 2008 table that tracks each user's username (primary key), application, page and timestamp. I only want to store the latest request for each person, so each username should be stored only once.
The library code, which is called from the Global.asax of each application looks like this:
public static void Add(ApplicationType application, string username, string pageRequested)
{
using (var db = new CommonDAL()) // EF context
{
var exists = db.ActiveUsers.Find(username);
if (exists != null)
db.ActiveUsers.Remove(exists);
var activeUser = new ActiveUser() { ApplicationID = application.Value(), Username = username, PageRequested = pageRequested, TimeRequested = DateTime.Now };
db.ActiveUsers.Add(activeUser);
db.SaveChanges();
}
}
I'm intermittently getting the error Violation of PRIMARY KEY constraint 'PK_tblActiveUser_Username'. Cannot insert duplicate key in object 'dbo.tblActiveUser'. The duplicate key value is (xxxxxxxx)
My best guess is this: Request A comes in and removes the existing username. Request B (from the same user) then comes in, tries to remove the username, and sees that nothing exists. Request A then adds the username, and Request B then tries to add it too. The error frequently seems to be triggered when a web server sends a client a 401 status, which again points to multiple requests within a short period of time triggering this.
I'm having trouble reproducing this race condition in unit tests, as I haven't done much async programming before; I tried creating async tests with delays to simulate multiple simultaneous slow requests. I've tried using (var transaction = new TransactionScope()) and using (var transaction = db.Database.BeginTransaction(System.Data.IsolationLevel.ReadCommitted)) to lock the requests so that request A completes before request B begins, but I can't verify that either one fixes the issue because I can't reproduce the situation reliably.
1) What is the right way to prevent the exception (the most recent request should be the one that is ultimately stored)?
2) What is the right way to write a unit test proving that this works?
Since you only want to store the latest item, you could use "last update wins" and avoid the race over who inserts first: the database handles the locks, and whoever calls update last (the most recent request) is what ends up in the table.
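As an in-memory analogy only (not EF or SQL), ConcurrentDictionary.AddOrUpdate shows the "last update wins" behaviour: concurrent writers for the same key never fail with a duplicate key, and whichever write lands last is what remains. In the database, the equivalent is updating the existing row rather than deleting and re-inserting it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// In-memory analogue of the ActiveUser table: one entry per username (the primary key).
var activeUsers = new ConcurrentDictionary<string, (string Page, DateTime When)>();

void RecordRequest(string username, string page)
{
    (string Page, DateTime When) row = (page, DateTime.UtcNow);
    // AddOrUpdate inserts or overwrites atomically, so a second request never hits a "duplicate key".
    activeUsers.AddOrUpdate(username, row, (_, _) => row);
}

// Simulate many near-simultaneous requests from the same user.
Parallel.For(0, 100, i => RecordRequest("alice", $"/page/{i}"));

Console.WriteLine($"entries: {activeUsers.Count}, stored page: {activeUsers["alice"].Page}");
```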
Something like the following should handle primary key errors in the edge case where a brand-new user has two requests at the same time, while avoiding an "infinite" loop of errors (well, until a stack overflow exception anyway).
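using System;
using System.Data.Entity.Infrastructure; // DbUpdateException (EF6)
using System.Data.SqlClient;             // SqlException

public static void Add(ApplicationType application,
                       string username,
                       string pageRequested,
                       int recursionCount = 0)
{
    const int MaxRetries = 3; // hypothetical limit; pick whatever suits you

    using (var db = new CommonDAL()) // EF context
    {
        var exists = db.ActiveUsers.Find(username);
        if (exists != null)
        {
            // Update the existing row in place (last update wins) instead of Remove + Add
            exists.ApplicationID = application.Value();
            exists.PageRequested = pageRequested;
            exists.TimeRequested = DateTime.Now;
        }
        else
        {
            var activeUser = new ActiveUser
            {
                ApplicationID = application.Value(),
                Username = username,
                PageRequested = pageRequested,
                TimeRequested = DateTime.Now
            };
            db.ActiveUsers.Add(activeUser);
        }
        try
        {
            db.SaveChanges();
        }
        catch (DbUpdateException ex) when (IsPrimaryKeyViolation(ex))
        {
            // A concurrent request inserted the same username first; retrying takes the update path
            if (recursionCount < MaxRetries)
            {
                // recursionCount + 1, not recursionCount++ (which would pass the old value)
                Add(application, username, pageRequested, recursionCount + 1);
            }
            else
            {
                throw;
            }
        }
    }
}

// SQL Server error 2627: PRIMARY KEY/UNIQUE constraint violation; 2601: duplicate key in a unique index
private static bool IsPrimaryKeyViolation(DbUpdateException ex)
{
    var sqlEx = ex.GetBaseException() as SqlException;
    return sqlEx != null && (sqlEx.Number == 2627 || sqlEx.Number == 2601);
}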
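The same bounded retry can also be written as a loop with an exception filter instead of recursion. This is a minimal in-memory sketch: `WithRetry` is a hypothetical helper, and InvalidOperationException stands in for the DbUpdateException/primary-key check; the simulated save fails twice before succeeding.

```csharp
using System;

// Runs `attempt`, retrying while `isTransient` accepts the failure, up to maxAttempts in total.
// On the last attempt the filter is false, so the exception propagates to the caller.
T WithRetry<T>(Func<T> attempt, Func<Exception, bool> isTransient, int maxAttempts)
{
    for (int tryNumber = 1; ; tryNumber++)
    {
        try
        {
            return attempt();
        }
        catch (Exception ex) when (tryNumber < maxAttempts && isTransient(ex))
        {
            // Swallow and loop: the next attempt re-reads the current state.
        }
    }
}

// Simulated save that fails twice with a "duplicate key" error before succeeding.
int calls = 0;
string result = WithRetry(
    () =>
    {
        calls++;
        if (calls < 3) throw new InvalidOperationException("duplicate key");
        return "saved";
    },
    ex => ex is InvalidOperationException,
    maxAttempts: 5);

Console.WriteLine($"{result} after {calls} attempts");
```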
As for unit testing this, it will be very hard unless you insert an artificial delay or can force both threads to run at the same time. Depending on the issue, the timing window of a race condition can be in the millisecond range. Tasks may not work because they are not guaranteed to run at the same time: you hand them to the background thread pool and they run when they can. Old-school threads may work, but I don't know how to force it, since the window between the read and the remove/create is most likely 5 ms or less.
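One way to force the interleaving deterministically, sketched on in-memory stand-ins rather than EF: put a test-only synchronization point (a System.Threading.Barrier) between the read and the write, so both "requests" finish reading before either writes, and run them on dedicated threads. In the real Add method this would need a hypothetical test seam between Find and SaveChanges. The insert-only variant then fails exactly once, while the upsert variant never does, which is the kind of assertion question 2 asks for.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Runs two "requests" for the same username at the same time. Both read first, then the
// Barrier holds them until both have read, forcing the interleaving from the question.
// `write` returns false when the insert fails with a duplicate key.
int RunRace(Func<ConcurrentDictionary<string, string>, string, string, bool> write)
{
    var table = new ConcurrentDictionary<string, string>(); // stand-in for the ActiveUser table
    int duplicateKeyErrors = 0;
    using var bothHaveRead = new Barrier(participantCount: 2);

    void Request(string page)
    {
        bool exists = table.ContainsKey("alice");  // like db.ActiveUsers.Find(username)
        bothHaveRead.SignalAndWait();              // wait until the other request has read too
        if (!exists && !write(table, "alice", page))
            Interlocked.Increment(ref duplicateKeyErrors);
    }

    // Dedicated threads, so both requests are guaranteed to be running at the same time.
    var first = new Thread(() => Request("/a"));
    var second = new Thread(() => Request("/b"));
    first.Start(); second.Start();
    first.Join(); second.Join();
    return duplicateKeyErrors;
}

// Insert-only, like the original Remove + Add: the second insert hits the "primary key".
int insertErrors = RunRace((t, user, page) => t.TryAdd(user, page));

// Upsert, like "last update wins": an existing key is overwritten instead of failing.
int upsertErrors = RunRace((t, user, page) => { t.AddOrUpdate(user, page, (_, _) => page); return true; });

Console.WriteLine($"insert-only errors: {insertErrors}, upsert errors: {upsertErrors}");
```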

Restart the thread when it will stop

In the question "Why I need to overload the method when use it as ThreadStart() parameter?", I got the following solution for saving a file in a separate thread (the file must be saved whenever a PersonEntity instance is deleted or added):
private ObservableCollection<PersonEntitiy> allStaff;
private Thread dataFileTransactionsThread;
public staffRepository() {
allStaff = getStaffDataFromTextFile();
dataFileTransactionsThread = new Thread(UpdateDataFileThread);
}
public void UpdateDataFile(ObservableCollection<PersonEntitiy> allStaff)
{
dataFileTransactionsThread.Start(allStaff);
// If you want to wait until the save finishes, uncomment the following line
// dataFileTransactionsThread.Join();
}
private void UpdateDataFileThread(object data) {
var allStaff = (ObservableCollection<PersonEntitiy>)data;
System.Diagnostics.Debug.WriteLine("dataFileTransactions Thread Status:"+ dataFileTransactionsThread.ThreadState);
string containsWillBeSaved = "";
// ...
File.WriteAllText(fullPathToDataFile, containsWillBeSaved);
System.Diagnostics.Debug.WriteLine("Data Save Successfull");
System.Diagnostics.Debug.WriteLine("dataFileTransactions Thread Status:" + dataFileTransactionsThread.ThreadState);
}
Now, if two PersonEntity instances are deleted one after another, a System.Threading.ThreadStateException ("Thread is still executing or has not finished yet. Restart is impossible.") occurs.
I understand what this exception means in general; however, the following fix is not enough, because the next save will simply be skipped:
if (!dataFileTransactionsThread.IsAlive) {
dataFileTransactionsThread.Start(allStaff);
}
Probably it's better to restart the thread once it has finished and then save the file again. However, the code must also handle three or more instances being deleted in a row. Conceptually it's simple: only the newest allStaff collection is needed, so previous unsaved allStaff collections are no longer necessary.
How can I implement the above concept in C#?
I'm going to suggest using Microsoft's Reactive Framework (NuGet package "System.Reactive").
Then you can do this:
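using System;
using System.Collections.Generic;
using System.Collections.Specialized; // NotifyCollectionChangedEventHandler/EventArgs
using System.IO;
using System.Linq;
using System.Reactive.Concurrency;    // Scheduler
using System.Reactive.Linq;           // Observable, Throttle, ObserveOn

IObservable<List<PersonEntitiy>> query =
    Observable
        .FromEventPattern<NotifyCollectionChangedEventHandler, NotifyCollectionChangedEventArgs>(
            h => allStaff.CollectionChanged += h, h => allStaff.CollectionChanged -= h)
        .Throttle(TimeSpan.FromSeconds(2.0))
        .Select(x => allStaff.ToList())
        .ObserveOn(Scheduler.Default);

IDisposable subscription =
    query
        .Subscribe(u =>
        {
            // Build the file content from the snapshot u, not from the live allStaff collection
            string containsWillBeSaved = "";
            // ...
            File.WriteAllText(fullPathToDataFile, containsWillBeSaved);
            System.Diagnostics.Debug.WriteLine("Data Save Successful");
        });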
This code watches your allStaff collection for changes. After each change it waits 2 seconds to see whether any other changes come through; if none do, it takes a copy of your collection (this is crucial for the threading to work) and saves that copy.
It will save no more than once every 2 seconds, and only when there has been at least one change. Note that Throttle restarts the 2-second wait on every change, so while changes keep arriving less than 2 seconds apart, nothing is saved until they stop.
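If adding Rx is not an option, the "only the newest allStaff collection matters" idea from the question can be sketched with plain tasks and a lock. This is an assumption-laden sketch, not the answer's code: strings stand in for PersonEntitiy, and a Thread.Sleep plus a list of written snapshots stand in for building the text and calling File.WriteAllText. Because a Thread object can never be restarted, it starts a new Task whenever no worker is running instead of reusing one thread. Unlike Throttle it does not wait for a quiet period; it saves as soon as the previous save finishes, skipping snapshots that were superseded in the meantime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Callers overwrite a single pending snapshot; at most one background worker saves
// whatever is newest, looping until nothing is pending.
var gate = new object();
List<string>? pending = null;         // newest unsaved snapshot, or null
bool saving = false;                  // true while a worker is running
Task worker = Task.CompletedTask;
var saved = new List<List<string>>(); // records each write, for demonstration

void RequestSave(IEnumerable<string> staff)
{
    var snapshot = staff.ToList(); // copy on the caller's thread, before the collection changes again
    lock (gate)
    {
        pending = snapshot;        // replaces any older snapshot that was not saved yet
        if (saving) return;        // the running worker will pick it up
        saving = true;
        worker = Task.Run(() => SaveLoop());
    }
}

void SaveLoop()
{
    while (true)
    {
        List<string> toSave;
        lock (gate)
        {
            if (pending == null) { saving = false; return; }
            toSave = pending;
            pending = null;
        }
        Thread.Sleep(50);  // stands in for building the text and File.WriteAllText(...)
        saved.Add(toSave); // only one worker runs at a time, so no lock is needed here
    }
}

// Three quick changes, like deleting three entities in a row.
var staff = new List<string> { "Ann" };
RequestSave(staff);
staff.Add("Bob");
RequestSave(staff);
staff.Add("Cid");
RequestSave(staff);

Task last;
lock (gate) last = worker;
last.Wait();

Console.WriteLine($"writes: {saved.Count}, last written: {string.Join(",", saved[^1])}");
```

How many writes happen depends on timing, but the last one always contains the newest collection.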

Nopcommerce Update entity issue

Using NopCommerce 3.8, Visual Studio 2015 Professional.
I have created a plugin that is responsible for making RESTful calls to my Web API, which exposes a different DB from Nop's.
The process runs via a Nop task; it successfully pulls the data back and I can step through and manipulate it as I see fit, no issues so far.
The issue comes when I try to update a record in the product table: I perform the update... but nothing happens. No change, no error.
I believe this is because the context has no idea about my newly instantiated product object, but I'm drawing a blank on what I need to do in my particular case.
Similar questions usually reference a "model" object passed as a parameter to the method; "model" has a ToEntity method, which seems to be the answer in similar questions on Stack Overflow.
However, my example doesn't have a ToEntity class/method, possibly because my parameter is actually a list of products. To clarify, here is my code.
Method in RestClient.cs
public async Task<List<T>> GetAsync()
{
try
{
var httpClient = new HttpClient();
var json = await httpClient.GetStringAsync(ApiControllerURL);
var taskModels = JsonConvert.DeserializeObject<List<T>>(json);
return taskModels;
}
catch (Exception e)
{
return null;
}
}
Method in my Service Class
public async Task<List<MWProduct>> GetProductsAsync()
{
RestClient<MWProduct> restClient = new RestClient<MWProduct>(ApiConst.Products);
var productsList = await restClient.GetAsync();
InsertSyncProd(productsList.Select(x => x).ToList());
return productsList;
}
private void InsertSyncProd(List<MWProduct> inserted)
{
var model = inserted.Select(x =>
{
switch (x.AD_Action)
{
case "I":
//_productService.InsertProduct(row);
break;
case "U":
UpdateSyncProd(inserted);
.....
Then the method to bind and update
private void UpdateSyncProd(List<MWProduct> inserted)
{
var me = inserted.Select(x =>
{
var productEnt = _productRepos.Table.FirstOrDefault(ent => ent.Sku == x.Sku.ToString());
if(productEnt != null)
{
productEnt.Sku = x.Sku.ToString();
productEnt.ShortDescription = x.ShortDescription;
productEnt.FullDescription = x.FullDescription;
productEnt.Name = x.Name;
productEnt.Height = x.Pd_height != null ? Convert.ToDecimal(x.Pd_height) : 0;
productEnt.Width = x.Pd_width != null ? Convert.ToDecimal(x.Pd_width) : 0;
productEnt.Length = x.Pd_depth != null ? Convert.ToDecimal(x.Pd_depth) : 0;
productEnt.UpdatedOnUtc = DateTime.UtcNow;
}
//TODO: set to entity so context knows and can update
_productService.UpdateProduct(productEnt);
return productEnt;
});
}
So, as you can see, I get the data and pass it to a particular method depending on its action code. In that method I iterate over the list, pull the entity from the table, and then update it via the product service using the modified entity.
So what am I missing here? I'm sure it's one step, and I think it may be either because 1) the context still has no idea about the entity in question, or 2) my calls are incorrect.
Summary
The update does not take effect, possibly because the context has no knowledge of the entity, or because my approach is wrong (probably both).
UPDATE:
I added some logger.InsertLog calls all around my service; it runs through fine, all the way to the update call. But when I check the product in the admin section, nothing has changed.
I have provided the full plugin source, as I think this may have something to do with the rest of the code setup.
UPDATE:
Added the following for testing in my Execute method:
var myprod = _productRepos.GetById(4852);
myprod.ShortDescription = "db test";
_productRepos.Update(myprod);
This successfully updates the product description. I moved my methods from my service into the task class, but still no luck. The more I look at it, the more I think my async code is killing off the DB context somehow.
I turned off async, bound the GetById result to a new product, and also replaced the lambda for the switch with a foreach loop. That finally seems to update the results.
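One thing worth checking, offered as a likely explanation rather than a confirmed one: in UpdateSyncProd as posted, the result of inserted.Select(...) is assigned to `me` and never enumerated. LINQ's Select is deferred, so its lambda (and the UpdateProduct call inside it) never runs, which would match "no change, no error" and would explain why switching to a foreach loop made the updates appear. The sketch uses hypothetical in-memory data to show the effect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: SKUs to sync and a log of "UpdateProduct" calls.
var skus = new List<int> { 101, 102, 103 };
var updateCalls = new List<int>();

// Like `var me = inserted.Select(x => { ...; _productService.UpdateProduct(...); ... })`:
// Select only builds a query, so the lambda body has not run yet.
var query = skus.Select(sku =>
{
    updateCalls.Add(sku); // side effect inside the lambda
    return sku;
});
int callsAfterSelect = updateCalls.Count;

// Enumerating the query (ToList, foreach, ...) is what actually executes the lambda.
_ = query.ToList();
int callsAfterEnumeration = updateCalls.Count;

Console.WriteLine($"after Select: {callsAfterSelect}, after ToList: {callsAfterEnumeration}");
```

For side effects like database updates, a plain foreach over the list is clearer than a Select whose result is thrown away.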
I can't confirm whether async is the culprit. Currently the Web API seems to return the same result even though the data has changed (some weird default caching in .NET Core?), so I'm creating a new question for that.
UPDATE: It appears the issue stems from poor debugging of async code. In each case I was trying to iterate over an await call; simply put, I was trying to iterate over a collection that technically may or may not have completed yet, and, probably due to poor debugging, I was not aware of it.
So the answer: await your collection, then iterate over it.
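A minimal sketch of "await the collection, then iterate", using a hypothetical GetProductsAsync in place of RestClient<MWProduct>.GetAsync(). It awaits the complete list first and only then loops over it with foreach, guarding against the null that the original GetAsync returns on failure. Top-level statements and top-level await assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for RestClient<MWProduct>.GetAsync(): like the original,
// it returns the list asynchronously and may return null if the call fails.
async Task<List<string>?> GetProductsAsync()
{
    await Task.Delay(50); // simulates the HTTP round trip
    return new List<string> { "SKU-1", "SKU-2", "SKU-3" };
}

var processed = new List<string>();

// 1) Await the whole collection first...
List<string>? products = await GetProductsAsync();

// 2) ...then iterate over the completed list with a plain foreach.
foreach (var sku in products ?? new List<string>())
{
    processed.Add(sku); // stands in for looking up the entity and calling UpdateProduct
}

Console.WriteLine($"processed {processed.Count} products");
```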

Executing part of code exactly 1 time inside Parallel.ForEach

I have to query our company's CRM solution (Oracle's Right Now) for our 600k users, and update each user there if it exists or create it if it doesn't. To know whether a user already exists in Right Now, I consume a third-party WS, and with 600k users this can be a real pain because of the time each response takes (around 1 second). So I changed my code to use Parallel.ForEach, bringing each record's query down to just 0.35 seconds, and adding it to a List<User> of records to be created or to be updated (Right Now is kinda dumb, so I need to separate them into 2 lists and call 2 distinct WS methods).
My code used to run perfectly before multithreading, but it took too long. The problem is that I can't make a batch too large, or I get a timeout when I try to update or create via the web service. So I'm sending around 500 records at once, but when the critical code section runs, it executes many times.
Parallel.ForEach(boDS.USERS.AsEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = -1 }, row =>
{
...
user = null;
user = QueryUserById(row["USER_ID"].ToString().Trim());
if (user == null)
{
isUpdate = false;
gObject.ID = new ID();
}
else
{
isUpdate = true;
gObject.ID = user.ID;
}
... fill user attributes as generic fields ...
gObject.GenericFields = listGenericFields.ToArray();
if (isUpdate)
listUserUpdate.Add(gObject);
else
listUserCreate.Add(gObject);
if (i == batchSize - 1 || i == (boDS.USERS.Rows.Count - 1))
{
UpdateProcessingOptions upo = new UpdateProcessingOptions();
CreateProcessingOptions cpo = new CreateProcessingOptions();
upo.SuppressExternalEvents = false;
upo.SuppressRules = false;
cpo.SuppressExternalEvents = false;
cpo.SuppressRules = false;
RNObject[] results = null;
// <Critical_code>
if (listUserCreate.Count > 0)
{
results = _service.Create(_clientInfoHeader, listUserCreate.ToArray(), cpo);
}
if (listUserUpdate.Count > 0)
{
_service.Update(_clientInfoHeader, listUserUpdate.ToArray(), upo);
}
// </Critical_code>
listUserUpdate = new List<RNObject>();
listUserCreate = new List<RNObject>();
}
i++;
});
I thought about using a lock or a mutex, but that won't help, since the other threads will just wait and then execute it afterwards. I need that part of the code to execute only ONCE, in only ONE thread. Is it possible? Can anyone shed some light?
Thanks and kind regards,
Leandro
As you stated in the comments you're declaring the variables outside of the loop body. That's where your race conditions originate from.
Let's take variable listUserUpdate for example. It's accessed randomly by parallel executing threads. While one thread is still adding to it, e.g. in listUserUpdate.Add(gObject); another thread could already be resetting the lists in listUserUpdate = new List<RNObject>(); or enumerating it in listUserUpdate.ToArray().
You really need to refactor that code to
make each loop run as independent from each other as you can by moving variables inside the loop body and
access data in a synchronizing way using locks and/or concurrent collections
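One way to apply both points, as a sketch with hypothetical stand-ins (ExistsInCrm for QueryUserById, string ids for the DataRows, and comments where _service.Create/_service.Update would be called): the parallel loop only looks users up and drops each result into a thread-safe ConcurrentBag, with no shared counter or list resets. After Parallel.ForEach returns, a single thread sends fixed-size batches, so each batch is sent exactly once. Enumerable.Chunk assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins: user ids to look up, and a fake "does this user exist in the CRM?" call.
var userIds = Enumerable.Range(1, 1234).Select(i => $"U{i}").ToList();
bool ExistsInCrm(string id) => int.Parse(id.Substring(1)) % 3 == 0;

// Phase 1 (parallel): each iteration only touches its own locals plus thread-safe collections.
var toCreate = new ConcurrentBag<string>();
var toUpdate = new ConcurrentBag<string>();
Parallel.ForEach(userIds, new ParallelOptions { MaxDegreeOfParallelism = 8 }, id =>
{
    if (ExistsInCrm(id)) toUpdate.Add(id);
    else toCreate.Add(id);
});

// Phase 2 (sequential): send fixed-size batches exactly once each, on a single thread.
const int batchSize = 500;
var sentBatches = new List<(string Kind, int Count)>();
foreach (var batch in toCreate.Chunk(batchSize)) sentBatches.Add(("create", batch.Length)); // _service.Create(...)
foreach (var batch in toUpdate.Chunk(batchSize)) sentBatches.Add(("update", batch.Length)); // _service.Update(...)

foreach (var (kind, count) in sentBatches) Console.WriteLine($"{kind}: {count}");
```

Sending batches after the loop trades some overlap between lookups and uploads for code where the "critical section" cannot run more than once per batch.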
You can use the Double-checked locking pattern. This is usually used for singletons, but you're not making a singleton here so generic singletons like Lazy<T> do not apply.
It works like this:
Separate out your shared data into some sort of class:
class QuerySharedData {
// All the write-once-read-many fields that need to be shared between threads
public QuerySharedData() {
// Compute all the write-once-read-many fields. Or use a static Create method if that's handy.
}
}
In your outer class add the following:
readonly object padlock = new object();
volatile QuerySharedData data;
In your thread's callback delegate, do this:
if (data == null)
{
lock (padlock)
{
if (data == null)
{
data = new QuerySharedData(); // this does all the work to initialize the shared fields
}
}
}
var localData = data;
Then use the shared query data from localData. By grouping the shared query data into a subordinate class, you avoid having to make its individual fields volatile.
More about volatile in Part 4 ("Advanced Threading") of Joseph Albahari's "Threading in C#".
Update: my assumption here is that all the classes and fields held by QuerySharedData are read-only once initialized. If this is not true, for instance if you initialize a list once but then add to it from many threads, this pattern will not work for you, and you will have to consider using thread-safe collections.
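A runnable sketch of the double-checked locking above, with one deviation to flag: C# does not allow volatile on local variables, so this top-level version uses Volatile.Read/Volatile.Write instead of a volatile field, and a string array stands in for QuerySharedData. The counter shows that the expensive initialization runs once even when many threads call GetSharedData concurrently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

object padlock = new object();
string[]? data = null;   // stands in for the QuerySharedData instance
int initializations = 0;

string[] GetSharedData()
{
    // First check without the lock: cheap once data has been published.
    var local = Volatile.Read(ref data);
    if (local == null)
    {
        lock (padlock)
        {
            // Second check under the lock: another thread may have initialized it meanwhile.
            local = data;
            if (local == null)
            {
                Interlocked.Increment(ref initializations);
                local = new[] { "computed", "once" }; // the expensive one-time work
                Volatile.Write(ref data, local);       // publish only the fully built object
            }
        }
    }
    return local;
}

Parallel.For(0, 1000, _ => GetSharedData());
Console.WriteLine($"initialized {initializations} time(s)");
```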

Quartz.Net - update/delete jobs/triggers

I'm using Quartz to pull the latest tasks (from another source); it then adds each one as a job, creates triggers, etc. per task. Easy.
However, sometimes tasks change (so they already exist), in which case I would like to change them (to keep it simple, let's say their Description). The code below updates a specific task's description with the given date.
private static void SetLastPull(DateTime lastPullDateTime)
{
var lastpull = sched.GetJobDetail("db_pull", "Settings");
if(lastpull != null)
{
lastpull.Description = lastPullDateTime.ToString();
}
else
{
var newLastPull = new JobDetail("db_pull", "Settings", typeof(IJob));
newLastPull.Description = lastPullDateTime.ToString();
var newLastPullTrigger = new CronTrigger("db_pull", "Settings", "0 0 0 * 12 ? 2099");
sched.ScheduleJob(newLastPull, newLastPullTrigger);
}
}
I'm assuming that after lastpull.Description = lastPullDateTime.ToString(); I should call something to save the change to the database. Is there a way to do that in Quartz, or do I have to update it by other means?
You can't change (update) a job once it has been scheduled. You can only re-schedule it (with any changes you might want to make) or delete it and create a new one.
