In the question "Why do I need to overload the method when using it as a ThreadStart() parameter?", I got the following solution for saving a file on a separate thread (the file must be saved whenever a PersonEntity instance is deleted or added):
private ObservableCollection<PersonEntity> allStaff;
private Thread dataFileTransactionsThread;

public staffRepository() {
    allStaff = getStaffDataFromTextFile();
    dataFileTransactionsThread = new Thread(UpdateDataFileThread);
}

public void UpdateDataFile(ObservableCollection<PersonEntity> allStaff)
{
    dataFileTransactionsThread.Start(allStaff);
    // If you want to wait until the save finishes, uncomment the following line
    // dataFileTransactionsThread.Join();
}

private void UpdateDataFileThread(object data) {
    var allStaff = (ObservableCollection<PersonEntity>)data;
    System.Diagnostics.Debug.WriteLine("dataFileTransactions Thread Status: " + dataFileTransactionsThread.ThreadState);
    string contentToBeSaved = "";
    // ... serialize allStaff into contentToBeSaved ...
    File.WriteAllText(fullPathToDataFile, contentToBeSaved);
    System.Diagnostics.Debug.WriteLine("Data Save Successful");
    System.Diagnostics.Debug.WriteLine("dataFileTransactions Thread Status: " + dataFileTransactionsThread.ThreadState);
}
Now, if two PersonEntity instances are deleted in sequence, a System.Threading.ThreadStateException occurs: "Thread is still executing or don't finished yet. Restart is impossible."
I understand what this exception means in general; however, the following workaround is not enough, because next time the file will simply not be saved:
if (!dataFileTransactionsThread.IsAlive) {
    dataFileTransactionsThread.Start(allStaff);
}
It would probably be better to wait for the thread to finish and then save the file again. However, the code also needs to handle the case where three or more instances are deleted in sequence. At the concept level this is simple: only the newest allStaff collection matters, so any previous unsaved allStaff collections are no longer necessary.
How can I implement this concept in C#?
I'm going to suggest using Microsoft's Reactive Framework (NuGet: System.Reactive).
Then you can do this:
IObservable<List<PersonEntity>> query =
    Observable
        .FromEventPattern<NotifyCollectionChangedEventHandler, NotifyCollectionChangedEventArgs>(
            h => allStaff.CollectionChanged += h, h => allStaff.CollectionChanged -= h)
        .Throttle(TimeSpan.FromSeconds(2.0))
        .Select(x => allStaff.ToList())
        .ObserveOn(Scheduler.Default);
IDisposable subscription =
    query
        .Subscribe(u =>
        {
            string contentToBeSaved = "";
            // ... serialize u into contentToBeSaved ...
            File.WriteAllText(fullPathToDataFile, contentToBeSaved);
            System.Diagnostics.Debug.WriteLine("Data Save Successful");
        });
This code watches your allStaff collection for changes. On every change it waits 2 seconds to see whether any further changes come through; if none do, it takes a copy of your collection (this is crucial for the threading to work) and saves that copy.
It will save no more than once every 2 seconds, and only when there has been at least one change.
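If you'd rather avoid the Rx dependency, the same idea (only the newest snapshot matters; older unsaved ones can be discarded) can be hand-rolled with one long-lived background thread and a single "pending snapshot" slot that each save request overwrites. This is a minimal sketch under the question's assumptions (PersonEntity, fullPathToDataFile); the _pendingSnapshot field, _saveRequested event, and SaveWorker method are names introduced here:

private readonly object _saveLock = new object();
private readonly AutoResetEvent _saveRequested = new AutoResetEvent(false);
private ObservableCollection<PersonEntity> _pendingSnapshot; // newest unsaved snapshot, or null

public void UpdateDataFile(ObservableCollection<PersonEntity> allStaff)
{
    lock (_saveLock)
    {
        // Overwrite any snapshot that was never saved: only the newest one matters.
        _pendingSnapshot = new ObservableCollection<PersonEntity>(allStaff);
    }
    _saveRequested.Set();
}

// Start once, e.g. in the constructor:
// new Thread(SaveWorker) { IsBackground = true }.Start();
private void SaveWorker()
{
    while (true)
    {
        _saveRequested.WaitOne();
        ObservableCollection<PersonEntity> snapshot;
        lock (_saveLock)
        {
            snapshot = _pendingSnapshot;
            _pendingSnapshot = null;
        }
        if (snapshot == null) continue;
        string contentToBeSaved = ""; // ... serialize snapshot here ...
        File.WriteAllText(fullPathToDataFile, contentToBeSaved);
    }
}

Because the worker thread is created once and never restarted, the ThreadStateException from the question cannot occur, no matter how many deletions happen back to back.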
Related
I have the following code in C#:
(_StoreQueue is a ConcurrentQueue)
var S = _StoreQueue.FirstOrDefault(_ => _.TimeStamp == T);
if (S == null)
{
lock (_QueueLock)
{
// try again
S = _StoreQueue.FirstOrDefault(_ => _.TimeStamp == T);
if (S == null)
{
S = new Store(T);
_StoreQueue.Enqueue(S);
}
}
}
The system collects data in real time (fairly high frequency, around 300-400 calls/second) and puts it into bins (Store objects) that each represent a 5-second interval. The bins sit in a queue while they are being filled, and the queue is emptied as the data is processed and written.
So, when data arrives, a check is done to see whether there is a bin for that timestamp (rounded to 5 seconds); if not, one is created.
Since this is quite heavily multi-threaded, the system goes with the following logic:
If there is a bin, it is used to put data.
If there is no bin, a lock is taken and, within that lock, the check is done again to make sure a bin wasn't created by another thread in the meantime; if there is still no bin, one gets created.
With this system, the lock is taken roughly once every 2k calls.
I am trying to see if there is a way to remove the lock, mostly because I think there has to be a better solution than the double check.
An alternative I have been considering is to create empty bins ahead of time, which would entirely remove the need for any locks, but the search for the right bin would become slower, as it would have to scan the list of pre-built bins to find the proper one.
Using a ConcurrentDictionary can fix the issue you are having. Here I assumed a type of double for your TimeStamp property, but it can be anything, as long as you make the ConcurrentDictionary key match that type.
class Program
{
    // static so it is accessible from Main; keyed by the (rounded) timestamp
    static ConcurrentDictionary<double, Store> _StoreQueue = new ConcurrentDictionary<double, Store>();

    static void Main(string[] args)
    {
        var T = 17d;
        // add the Store with key 17 only if it does not already exist
        _StoreQueue.GetOrAdd(T, new Store(T));
    }

    public class Store
    {
        public double TimeStamp { get; set; }

        public Store(double timeStamp)
        {
            TimeStamp = timeStamp;
        }
    }
}
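One detail worth noting: GetOrAdd with a value argument constructs the Store even when the key already exists. If construction is expensive, the overload that takes a value factory only builds the object when the key is missing (under contention the factory can still run more than once, but only one result is stored):

// Store is only constructed when the key is absent; under contention the
// factory may run on several threads, but a single winning value is kept.
var store = _StoreQueue.GetOrAdd(T, timeStamp => new Store(timeStamp));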
I have to query my company's CRM solution (Oracle's Right Now) for our 600k users, and update them there if they exist or create them if they don't. To know whether a user already exists in Right Now, I consume a third-party web service, and with 600k users this can be a real pain, because each call takes around 1 second to get a response. So I changed my code to use Parallel.ForEach, querying each record in just 0.35 seconds and adding it to a List<User> of records to be created or updated (Right Now is kinda dumb, so I need to separate them into 2 lists and call 2 distinct web service methods).
My code used to run perfectly before multithreading, but it took too long. The problem is that I can't make a batch too large, or I get a timeout when I try to update or create via the web service. So I'm sending around 500 records at a time, and when the critical code part runs, it executes many times concurrently.
Parallel.ForEach(boDS.USERS.AsEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = -1 }, row =>
{
...
user = null;
user = QueryUserById(row["USER_ID"].Trim());
if (user == null)
{
isUpdate = false;
gObject.ID = new ID();
}
else
{
isUpdate = true;
gObject.ID = user.ID;
}
... fill user attributes as generic fields ...
gObject.GenericFields = listGenericFields.ToArray();
if (isUpdate)
listUserUpdate.Add(gObject);
else
listUserCreate.Add(gObject);
if (i == batchSize - 1 || i == (boDS.USERS.Rows.Count - 1))
{
UpdateProcessingOptions upo = new UpdateProcessingOptions();
CreateProcessingOptions cpo = new CreateProcessingOptions();
upo.SuppressExternalEvents = false;
upo.SuppressRules = false;
cpo.SuppressExternalEvents = false;
cpo.SuppressRules = false;
RNObject[] results = null;
// <Critical_code>
if (listUserCreate.Count > 0)
{
results = _service.Create(_clientInfoHeader, listUserCreate.ToArray(), cpo);
}
if (listUserUpdate.Count > 0)
{
_service.Update(_clientInfoHeader, listUserUpdate.ToArray(), upo);
}
// </Critical_code>
listUserUpdate = new List<RNObject>();
listUserCreate = new List<RNObject>();
}
i++;
});
I thought about using a lock or mutex, but that isn't going to help me, since the threads would just wait and execute the code afterwards anyway. I need some solution so that that part of the code executes only ONCE, on only ONE thread. Is it possible? Can anyone shed some light?
Thanks and kind regards,
Leandro
As you stated in the comments you're declaring the variables outside of the loop body. That's where your race conditions originate from.
Let's take the variable listUserUpdate as an example. It's accessed randomly by threads executing in parallel. While one thread is still adding to it, e.g. in listUserUpdate.Add(gObject);, another thread could already be resetting the list in listUserUpdate = new List<RNObject>(); or enumerating it in listUserUpdate.ToArray().
You really need to refactor that code to
make each loop iteration run as independently of the others as you can, by moving the variables inside the loop body, and
access shared data in a synchronized way, using locks and/or concurrent collections (see the sketch below).
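The sketch below uses the Parallel.ForEach overload with thread-local state, so every worker thread batches and flushes its own private lists. It assumes the names from the question (boDS, _service, _clientInfoHeader, cpo, upo, batchSize); BuildUser is a hypothetical helper standing in for the elided attribute-filling code:

Parallel.ForEach(
    boDS.USERS.AsEnumerable(),
    // localInit: each worker thread gets its own pair of batch lists
    () => new { Create = new List<RNObject>(), Update = new List<RNObject>() },
    (row, loopState, batches) =>
    {
        var gObject = BuildUser(row); // hypothetical: fill user attributes as generic fields
        var user = QueryUserById(row["USER_ID"].ToString().Trim());
        if (user == null) { gObject.ID = new ID(); batches.Create.Add(gObject); }
        else { gObject.ID = user.ID; batches.Update.Add(gObject); }

        // Flush this thread's own batch; no other thread ever sees these lists.
        if (batches.Create.Count >= batchSize)
        {
            _service.Create(_clientInfoHeader, batches.Create.ToArray(), cpo);
            batches.Create.Clear();
        }
        if (batches.Update.Count >= batchSize)
        {
            _service.Update(_clientInfoHeader, batches.Update.ToArray(), upo);
            batches.Update.Clear();
        }
        return batches;
    },
    // localFinally: flush whatever is left when a worker thread finishes
    batches =>
    {
        if (batches.Create.Count > 0) _service.Create(_clientInfoHeader, batches.Create.ToArray(), cpo);
        if (batches.Update.Count > 0) _service.Update(_clientInfoHeader, batches.Update.ToArray(), upo);
    });

With this shape there is no shared mutable list at all, so no lock is needed around the web service calls (assuming _service itself tolerates concurrent calls).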
You can use the double-checked locking pattern. It is usually used for singletons, but you're not making a singleton here, so generic singleton helpers like Lazy<T> do not apply.
It works like this:
Separate out your shared data into some sort of class:
class QuerySharedData {
// All the write-once-read-many fields that need to be shared between threads
public QuerySharedData() {
// Compute all the write-once-read-many fields. Or use a static Create method if that's handy.
}
}
In your outer class, add the following:
private readonly object padlock = new object();
private volatile QuerySharedData data;
In your thread's callback delegate, do this:
if (data == null)
{
    lock (padlock)
    {
        if (data == null)
        {
            data = new QuerySharedData(); // this does all the work to initialize the shared fields
        }
    }
}
var localData = data;
Then use the shared query data from localData. By grouping the shared query data into a subordinate class, you avoid having to make each of its individual fields volatile.
More about volatile here: Part 4: Advanced Threading.
Update: my assumption here is that all the classes and fields held by QuerySharedData are read-only once initialized. If this is not true, for instance if you initialize a list once but add to it from many threads, this pattern will not work for you. You will have to consider using things like Thread-Safe Collections.
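For that mutable case, a hedged sketch of the thread-safe-collection variant (the element type string is just illustrative): replace the plain list inside QuerySharedData with a concurrent collection, so adds after initialization are safe without extra locking:

using System.Collections.Concurrent;

class QuerySharedData
{
    // Unlike List<T>, ConcurrentBag<T> tolerates concurrent Adds from many threads.
    public ConcurrentBag<string> Results { get; } = new ConcurrentBag<string>();
}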
I'll try to explain the issue with a simplified console application example; however, the real project is an ASP.NET MVC3 application.
Given the following tables (TestReport and TestReportRef), imagine the following scenario:
a user creates a report (a line in TestReport, where Text is the report's string content and Ready is a bool flag saying whether the report is ready to be processed); by default Ready is set to false, i.e. not ready.
the user wants the report to be processed, so he submits it; Ready is set to true here.
The system gives an opportunity to recall the report, as long as it has not been processed yet. So, when the report is recalled, Ready is set back to false. Conversely, when the report is processed, a line referencing the report by its Id is created in TestReportRef.
Now imagine that at one and the same moment
user wants to recall the report;
the report is added to the process list;
Since these can happen simultaneously, errors may occur: the report will end up with Ready == false while also being referenced in TestReportRef.
Here is a simple console example of how this may happen:
var cs = "my connection string";
var dc = new TestDataContext(cs);
dc.TestReport.InsertOnSubmit(new TestReport
{
Text = "My report content",
Ready = true //ready at once
});
dc.SubmitChanges();
Action recallReport = () =>
{
var _dc = new TestDataContext(cs);
var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
if (report != null && !report.TestReportRef.Any())
{
Thread.Sleep(1000);
report.Ready = false;
_dc.SubmitChanges();
}
};
Action acceptReport = () =>
{
var _dc = new TestDataContext(cs);
var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
if (report != null && !report.TestReportRef.Any())
{
Thread.Sleep(1000);
_dc.TestReportRef.InsertOnSubmit(new TestReportRef
{
FK_ReportId = report.Id
});
_dc.SubmitChanges();
}
};
var task1 = new Task(recallReport);
var task2 = new Task(acceptReport);
task1.Start();
task2.Start();
task1.Wait();
task2.Wait();
foreach (var t in dc.TestReport)
{
Console.WriteLine(string.Format("{0}\t{1}\t{2}", t.Id, t.Text, t.Ready));
}
foreach (var t in dc.TestReportRef)
{
Console.WriteLine("ref id:\t" + t.FK_ReportId);
}
Thread.Sleep(1000); is added to ensure that both tasks check one and the same situation.
The given example may sound awkward; however, I hope it explains the issue I'm dealing with.
How can I avoid this? Making the repository a singleton doesn't seem like a good idea. Shall I use some shared mutex (one for all web requests) to serialize write operations only?
Or is there a pattern I should use in this kind of scenario?
This is only a simplified example of one of the scenarios I have; there are several scenarios that may run into a similar discrepancy. The best thing would be to make this kind of conflict impossible, I guess.
Why not add a version column to the Report table? A task starts by recording the current version; when the task ends, if the version is still the same as the recorded one, the operation is OK, otherwise it fails. If the operation is OK, update the version to version + 1. This is a form of optimistic locking; it implicitly assumes that conflicts may occur but are not frequent.
UPDATE
If you are using LINQ to SQL, you may want to look at the UpdateCheck parameter: [Column(UpdateCheck = UpdateCheck.Always)].
This can be useful for handling concurrency in your case.
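As a hedged sketch of how this could look with LINQ to SQL (attribute placement is illustrative; designer-generated entity code would normally carry these attributes): marking the column makes a concurrent change surface as a conflict instead of being silently overwritten, and the conflict can then be handled:

// On the entity: include Ready in the UPDATE's WHERE clause...
[Column(UpdateCheck = UpdateCheck.Always)]
public bool Ready { get; set; }

// ...or use a dedicated version column instead:
// [Column(IsVersion = true)]
// public System.Data.Linq.Binary Version { get; set; }

try
{
    report.Ready = false; // recall the report
    _dc.SubmitChanges();  // throws if another request changed the row in between
}
catch (ChangeConflictException)
{
    // The report was processed (or otherwise changed) first:
    // reload it and report the failed recall to the user.
}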
I'm using Quartz to pull the latest tasks (from another source); it then adds each one as a job and creates triggers etc. per task. Easy.
However, sometimes tasks change (and therefore already exist as jobs), so I would like to change one of their properties (let's say, to keep it simple, the Description). The code below updates a specific task's description with a given date.
private static void SetLastPull(DateTime lastPullDateTime)
{
var lastpull = sched.GetJobDetail("db_pull", "Settings");
if(lastpull != null)
{
lastpull.Description = lastPullDateTime.ToString();
}
else
{
var newLastPull = new JobDetail("db_pull", "Settings", typeof(IJob));
newLastPull.Description = lastPullDateTime.ToString();
var newLastPullTrigger = new CronTrigger("db_pull", "Settings", "0 0 0 * 12 ? 2099");
sched.ScheduleJob(newLastPull, newLastPullTrigger);
}
}
I'm assuming that after I do lastpull.Description = lastPullDateTime.ToString(); I should call something to save the changes to the database. Is there a way to do this in Quartz, or do I have to update it by other means?
You can't change (update) a job once it has been scheduled. You can only re-schedule it (with any changes you might want to make) or delete it and create a new one.
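With the Quartz.NET 1.x API used in the question, the delete-and-recreate route might look like this sketch (whether you must also re-create the trigger depends on your setup; deleting a job removes its triggers along with it):

private static void SetLastPull(DateTime lastPullDateTime)
{
    // Remove the stored job (and its triggers)...
    sched.DeleteJob("db_pull", "Settings");

    // ...then schedule a fresh copy carrying the updated description.
    var newLastPull = new JobDetail("db_pull", "Settings", typeof(IJob));
    newLastPull.Description = lastPullDateTime.ToString();
    var newLastPullTrigger = new CronTrigger("db_pull", "Settings", "0 0 0 * 12 ? 2099");
    sched.ScheduleJob(newLastPull, newLastPullTrigger);
}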
I have a thread which produces data in the form of simple object (record). The thread may produce a thousand records for each one that successfully passes a filter and is actually enqueued. Once the object is enqueued it is read-only.
I have one lock, which I acquire once the record has passed the filter, and I add the item to the back of the producer_queue.
On the consumer thread, I acquire the lock, confirm that the producer_queue is not empty,
set consumer_queue to equal producer_queue, create a new (empty) queue, and set it on producer_queue. Without any further locking I process consumer_queue until it's empty and repeat.
Everything works beautifully on most machines, but on one particular dual-quad server I see in ~1/500k iterations an object that is not fully initialized when I read it out of consumer_queue. The condition is so fleeting that when I dump the object after detecting the condition the fields are correct 90% of the time.
So my question is this: how can I assure that the writes to the object are flushed to main memory when the queue is swapped?
Edit:
On the producer thread:
(producer_queue above is m_fillingQueue; consumer_queue above is m_drainingQueue)
private void FillRecordQueue() {
while (!m_done) {
int count;
lock (m_swapLock) {
count = m_fillingQueue.Count;
}
if (count > 5000) {
Thread.Sleep(60);
} else {
DataRecord rec = GetNextRecord();
if (rec == null) break;
lock (m_swapLock) {
m_fillingQueue.AddLast(rec);
}
}
}
}
In the consumer thread:
private DataRecord Next(bool remove) {
bool drained = false;
while (!drained) {
if (m_drainingQueue.Count > 0) {
DataRecord rec = m_drainingQueue.First.Value;
if (remove) m_drainingQueue.RemoveFirst();
if (rec.Time < FIRST_VALID_TIME) {
throw new InvalidOperationException("Detected invalid timestamp in Next(): " + rec.Time + " from record " + rec);
}
return rec;
} else {
lock (m_swapLock) {
m_drainingQueue = m_fillingQueue;
m_fillingQueue = new LinkedList<DataRecord>();
if (m_drainingQueue.Count == 0) drained = true;
}
}
}
return null;
}
The producer is rate-limited (it sleeps when the queue backs up), so it can't get far ahead of the consumer.
The behavior I see is that sometimes the Time field reads as DateTime.MinValue; by the time I construct the string to throw the exception, however, it's perfectly fine.
Have you tried the obvious: is the microcode update applied on the fancy 8-core box (via a BIOS update)? Did you run Windows Update to get the latest processor driver?
At first glance it looks like you're locking your containers, so I am recommending the systems approach, as it sounds like you're not seeing this issue on a good ol' dual-core box.
Assuming these are in fact the only methods that interact with the m_fillingQueue variable, and that DataRecord cannot be changed after GetNextRecord() creates it (read-only properties, hopefully?), then on the face of it the code appears to be correct.
In which case I suggest that GregC's answer would be the first thing to check: make sure the failing machine is fully updated (OS / drivers / .NET Framework), because the lock statement should involve all the memory barriers required to ensure that the rec variable is fully flushed out of any caches before the object is added to the list.
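If it does turn out to be a memory-visibility problem, one low-risk option is to let the framework publish the records for you: ConcurrentQueue<T> (.NET 4 and later) performs the necessary fencing internally, which makes the hand-rolled swap unnecessary. A simplified sketch under the question's assumptions (m_done, GetNextRecord, and the rate limit are kept; the peek-without-remove behavior of Next(bool) is omitted):

using System.Collections.Concurrent;

private readonly ConcurrentQueue<DataRecord> m_queue = new ConcurrentQueue<DataRecord>();

private void FillRecordQueue()
{
    while (!m_done)
    {
        if (m_queue.Count > 5000) { Thread.Sleep(60); continue; }
        DataRecord rec = GetNextRecord();
        if (rec == null) break;
        m_queue.Enqueue(rec); // safe publication: consumers see a fully constructed record
    }
}

private DataRecord Next()
{
    DataRecord rec;
    return m_queue.TryDequeue(out rec) ? rec : null;
}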