I'm using Quartz to pull the latest tasks (from another source); it then adds each task as a job, creates triggers, etc. - Easy.
However, sometimes tasks change (i.e. they already exist), so I would like to change one of their properties (to keep it simple, let's say the Description). The code below updates a specific task's description with a given date.
private static void SetLastPull(DateTime lastPullDateTime)
{
    var lastpull = sched.GetJobDetail("db_pull", "Settings");
    if (lastpull != null)
    {
        lastpull.Description = lastPullDateTime.ToString();
    }
    else
    {
        var newLastPull = new JobDetail("db_pull", "Settings", typeof(IJob));
        newLastPull.Description = lastPullDateTime.ToString();
        var newLastPullTrigger = new CronTrigger("db_pull", "Settings", "0 0 0 * 12 ? 2099");
        sched.ScheduleJob(newLastPull, newLastPullTrigger);
    }
}
I'm assuming that after I do lastpull.Description = lastPullDateTime.ToString(); I should call something to save the changes to the database. Is there a way to do it in Quartz, or do I have to resort to other means to update it?
You can't change (update) a job once it has been scheduled. You can only re-schedule it (with any changes you might want to make) or delete it and create a new one.
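For illustration, a minimal sketch of the delete-and-recreate route, reusing the Quartz.NET 1.x API and the sched scheduler instance from the question:

private static void SetLastPull(DateTime lastPullDateTime)
{
    // DeleteJob removes the job and its associated triggers
    // (it simply returns false if the job doesn't exist yet).
    sched.DeleteJob("db_pull", "Settings");

    // Re-create the job with the updated description and schedule it again.
    var newLastPull = new JobDetail("db_pull", "Settings", typeof(IJob));
    newLastPull.Description = lastPullDateTime.ToString();
    var newLastPullTrigger = new CronTrigger("db_pull", "Settings", "0 0 0 * 12 ? 2099");
    sched.ScheduleJob(newLastPull, newLastPullTrigger);
}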
I have an API that people are calling and I have a database containing statistics of the number of requests. All API requests are made by a user in a company. There's a row in the database per user per company per hour. Example:
| CompanyId | UserId | Date             | Requests |
|-----------|--------|------------------|----------|
| 1         | 100    | 2020-01-30 14:00 | 4527     |
| 1         | 100    | 2020-01-30 15:00 | 43       |
| 2         | 201    | 2020-01-30 14:00 | 161      |
To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:
public class StatisticsService
{
    private readonly IDatabase database;
    private readonly Dictionary<string, CompanyStats> statsByCompany;
    private DateTime lastTick = DateTime.MinValue;

    public StatisticsService(IDatabase database)
    {
        this.database = database;
        this.statsByCompany = new Dictionary<string, CompanyStats>();
    }

    private class CompanyStats
    {
        public CompanyStats(List<UserStats> userStats)
        {
            UserStats = userStats;
        }

        public List<UserStats> UserStats { get; set; }
    }

    private class UserStats
    {
        public UserStats(string userId, int requests, DateTime hour)
        {
            UserId = userId;
            Requests = requests;
            Hour = hour;
            Updated = DateTime.MinValue;
        }

        public string UserId { get; set; }
        public int Requests { get; set; }
        public DateTime Hour { get; set; }
        public DateTime Updated { get; set; }
    }
}
Every time someone calls the API, I'm calling an increment method on the StatisticsService:
public void Increment(string companyId, string userId)
{
    var utcNow = DateTime.UtcNow;
    EnsureCompanyLoaded(companyId, utcNow);
    var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
    var stats = statsByCompany[companyId];
    var userStats = stats.UserStats.FirstOrDefault(ls => ls.UserId == userId && ls.Hour == currentHour);
    if (userStats == null)
    {
        var userStatsToAdd = new UserStats(userId, 1, currentHour);
        userStatsToAdd.Updated = utcNow;
        stats.UserStats.Add(userStatsToAdd);
    }
    else
    {
        userStats.Requests++;
        userStats.Updated = utcNow;
    }
}
The method loads the company into the cache if it's not already there (I'll show EnsureCompanyLoaded in a bit). It then checks whether there is a UserStats object for this hour for the user and company. If not, it creates one and sets Requests to 1. If other requests have already been made for this user, company, and current hour, it increments the number of requests by 1.
EnsureCompanyLoaded as promised:
private void EnsureCompanyLoaded(string companyId, DateTime utcNow)
{
    if (statsByCompany.ContainsKey(companyId)) return;
    var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
    var userStats = new List<UserStats>();
    userStats.AddRange(database.GetAllFromThisMonth(companyId));
    statsByCompany[companyId] = new CompanyStats(userStats);
}
The details behind loading the data from the database are hidden away behind the GetAllFromThisMonth method and not important to my question.
Finally, I have a timer that stores any updated results to the database every 5 minutes or when the process shuts down:
public void Tick(object state)
{
    var utcNow = DateTime.UtcNow;
    var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
    foreach (var companyId in statsByCompany.Keys)
    {
        var usersToUpdate = statsByCompany[companyId].UserStats.Where(ls => ls.Updated > lastTick);
        foreach (var userStats in usersToUpdate)
        {
            database.Save(GenerateSomeEntity(userStats.Requests));
            userStats.Updated = DateTime.MinValue;
        }
    }

    // If we moved into a new month since the last tick, clear the entire cache
    if (lastTick.Month != utcNow.Month)
    {
        statsByCompany.Clear();
    }
    lastTick = utcNow;
}
I've done some single-threaded testing of the code and the concept seems to work as expected. Now I want to make this thread-safe but cannot seem to figure out the best way to implement it. I've looked at ConcurrentDictionary, which might be needed. The main problem isn't with the dictionary methods, though. If two threads call Increment simultaneously, they could both end up in the EnsureCompanyLoaded method. I know of the lock concept in C#, but I'm afraid of just locking on every invocation and slowing down performance that way.
Has anyone needed something similar, and do you have some good pointers on which direction I could go?
When keeping counters in memory like this you have two options:
1. Keep in memory the actual historic value of the counter
2. Keep in memory only the differential increment of the counter
I have used both approaches and I've found the second to be simpler, faster and safer. So my suggestion is to stop loading UserStats from the database, and just increment the in-memory counter starting from 0. Then every 5 minutes call a stored procedure that inserts or updates the related database record accordingly (while zeroing the in-memory value). This way you'll eliminate the race conditions at the loading phase, and you'll ensure that every call to Increment is consistently fast.
For thread-safety you can use either a normal Dictionary with a lock, or a ConcurrentDictionary without a lock. The first option is more flexible, the second more efficient. If you choose Dictionary+lock, use the lock only for protecting the internal state of the Dictionary. Don't lock while updating the database. Before updating each counter, take the current value from the dictionary and remove the entry in one atomic operation, then issue the database command; other threads will be able to recreate the entry again if needed. The ConcurrentDictionary class contains a TryRemove method that can be used to achieve this goal without locking:
public bool TryRemove(TKey key, out TValue value);
It also contains a ToArray method that returns a snapshot of the entries in the dictionary. At first glance it seems that the ConcurrentDictionary suits your needs, so you could use it as a basis of your implementation and see how it goes.
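A rough sketch of that differential approach (the tuple key and the UpsertRequestCount method are assumptions standing in for your schema and the stored-procedure call):

using System;
using System.Collections.Concurrent;

public class StatisticsService
{
    // Key: company/user/hour; value: requests accumulated since the last flush.
    private readonly ConcurrentDictionary<(string CompanyId, string UserId, DateTime Hour), int> deltas =
        new ConcurrentDictionary<(string CompanyId, string UserId, DateTime Hour), int>();

    public void Increment(string companyId, string userId)
    {
        var utcNow = DateTime.UtcNow;
        var hour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
        // Atomically create the entry with value 1, or increment the existing value.
        deltas.AddOrUpdate((companyId, userId, hour), 1, (_, current) => current + 1);
    }

    // Called by the 5-minute timer (and once more at shutdown).
    public void Flush(IDatabase database)
    {
        // ToArray takes a moment-in-time snapshot of the entries.
        foreach (var entry in deltas.ToArray())
        {
            // TryRemove atomically takes the accumulated delta out of the dictionary;
            // a concurrent Increment simply recreates the entry with a fresh count.
            if (deltas.TryRemove(entry.Key, out var delta))
            {
                // Hypothetical upsert: adds `delta` to the row for this
                // company/user/hour, inserting the row if it doesn't exist yet.
                database.UpsertRequestCount(entry.Key.CompanyId, entry.Key.UserId, entry.Key.Hour, delta);
            }
        }
    }
}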
> To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:
If you want to avoid update race conditions, you should stop doing exactly that.
Databases are designed, on purpose, to prevent simple update race conditions. This is a simple counting-up operation: a single DML statement, implicitly protected by transactions, journaling and locks. Indeed, that is why calling them a lot is costly.
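For illustration, the whole counting-up job can be one round-trip that the database serializes by itself. This is a hypothetical T-SQL upsert; the Stats table name is assumed, the columns come from the question, and you'd still want a transaction or MERGE if duplicate first-inserts for an hour are a concern:

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"UPDATE Stats SET Requests = Requests + 1
      WHERE CompanyId = @companyId AND UserId = @userId AND [Date] = @hour;
      IF @@ROWCOUNT = 0
          INSERT INTO Stats (CompanyId, UserId, [Date], Requests)
          VALUES (@companyId, @userId, @hour, 1);", conn))
{
    cmd.Parameters.AddWithValue("@companyId", companyId);
    cmd.Parameters.AddWithValue("@userId", userId);
    cmd.Parameters.AddWithValue("@hour", currentHour);
    conn.Open();
    cmd.ExecuteNonQuery();
}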
By adding that service you are fighting the concurrency handling that is already there. You are also moving a DB job outside of the DB, and moving DB jobs outside of the DB is just going to cause issues.
If your worry is speed: please read the Speed Rant.
Maybe a Distributed Database design is the droid you are looking for? Distributed databases have had a massive surge in popularity since mobile devices proliferated, for both speed and reliability reasons.
In general, to make your code thread-safe:
Use concurrent collections, such as ConcurrentDictionary
Make sure you understand concepts such as the lock statement, Monitor.Wait and Monitor.PulseAll (the tutorials cover them). Locks can be slow when the code being locked on performs IO (such as disk writes/reads), but for something in RAM there is no need to worry about them. If you really have some lengthy operation such as IO or HTTP requests, consider using a ConcurrentQueue and learn about the consumer-producer pattern to process work in queues with many workers (see the sketch below).
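A minimal sketch of that consumer-producer idea, using BlockingCollection (which wraps a ConcurrentQueue by default); the string work items and the Console output are placeholders for your real work:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class WorkQueueSketch
{
    static void Main()
    {
        // Bounded queue: producers block (back-pressure) when it is full.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // Consumer: drains the queue and performs the slow IO work.
            var consumer = Task.Run(() =>
            {
                // GetConsumingEnumerable blocks until items arrive and
                // completes once CompleteAdding has been called.
                foreach (var item in queue.GetConsumingEnumerable())
                    Console.WriteLine("Processing " + item);
            });

            // Producers: cheap Add calls from any number of threads.
            for (int i = 0; i < 10; i++)
                queue.Add("work item " + i);

            queue.CompleteAdding(); // signal that no more work is coming
            consumer.Wait();
        }
    }
}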
You can also try a Redis server to cache the database without needing to design something from scratch.
You can also make your service a singleton and update the database only after the value changes. For reading the value, you have already stored it in your service.
In the question Why I need to overload the method when use it as ThreadStart() parameter?, I got the following solution for the save-file-in-a-separate-thread problem (it's required to save the file when an instance of PersonEntity is deleted or added):
private ObservableCollection<PersonEntitiy> allStaff;
private Thread dataFileTransactionsThread;

public staffRepository()
{
    allStaff = getStaffDataFromTextFile();
    dataFileTransactionsThread = new Thread(UpdateDataFileThread);
}

public void UpdateDataFile(ObservableCollection<PersonEntitiy> allStaff)
{
    dataFileTransactionsThread.Start(allStaff);
    // If you want to wait until the save finishes, uncomment the following line
    // dataFileTransactionsThread.Join();
}

private void UpdateDataFileThread(object data)
{
    var allStaff = (ObservableCollection<PersonEntitiy>)data;
    System.Diagnostics.Debug.WriteLine("dataFileTransactions Thread Status: " + dataFileTransactionsThread.ThreadState);

    string containsWillBeSaved = "";
    // ...
    File.WriteAllText(fullPathToDataFile, containsWillBeSaved);

    System.Diagnostics.Debug.WriteLine("Data Save Successful");
    System.Diagnostics.Debug.WriteLine("dataFileTransactions Thread Status: " + dataFileTransactionsThread.ThreadState);
}
Now, if two instances of PersonEntity are deleted in sequence, a System.Threading.ThreadStateException occurs: the thread is still executing or hasn't finished yet, so restarting it is impossible.
I understand the meaning of this exception as a whole; however, the following guard will not be enough: the next time, the file will simply not be saved.
if (!dataFileTransactionsThread.IsAlive)
{
    dataFileTransactionsThread.Start(allStaff);
}
Probably it's better to wait until the thread finishes and then start it again to save the file. However, the code must also handle the case where three or more instances are deleted in sequence. At the conceptual level it's simple: only the newest allStaff collection is needed, so the previously unsaved allStaff collections are not necessary anymore.
How can I realize the above concept in C#?
I'm going to suggest using Microsoft's Reactive Framework. NuGet "System.Reactive".
Then you can do this:
IObservable<List<PersonEntity>> query =
    Observable
        .FromEventPattern<NotifyCollectionChangedEventHandler, NotifyCollectionChangedEventArgs>(
            h => allStaff.CollectionChanged += h,
            h => allStaff.CollectionChanged -= h)
        .Throttle(TimeSpan.FromSeconds(2.0))
        .Select(x => allStaff.ToList())
        .ObserveOn(Scheduler.Default);

IDisposable subscription =
    query
        .Subscribe(u =>
        {
            string containsWillBeSaved = "";
            // ...
            File.WriteAllText(fullPathToDataFile, containsWillBeSaved);
            System.Diagnostics.Debug.WriteLine("Data Save Successful");
        });
This code will watch your allStaff collection for changes and then, for every change, it will wait 2 seconds to see if any other changes come through; if they don't, it takes a copy of your collection (this is crucial for the threading to work) and saves it.
It will save no more than once every 2 seconds, and it will only save when there has been at least one change.
I am having a problem processing the return values of multiple tasks written with the Task Parallel Library.
My code does the following:
1. Creates a set of keys and objects with data coming from a third party
2. Starts a task for each key in that dictionary
3. In each of those tasks, again a set of tasks will be started with the user objects associated with that key (from step 2 above)
4. Once all the inner tasks have executed, I want to update the user object based on the return value.
So far I have come up with the below code:
void ProcessKeys(ExchangeKey eKey)
{
    // Create a list of tasks
    var retValues = new List<Task<ReturnStatus>>();

    // The ContainsKey here is commented out, but I have a comparer method which can compare the compound keys
    if (TaskProcessor.lstInputsFromUser.Count > 0) // && TaskProcessor.lstInputsFromUser.ContainsKey(eKey))
    {
        // Every time dData will be filled with data from the outside world
        OrderBook myOrder = new OrderBook();
        // Remove the key as the next object will be inserted somewhere else when new data arrives
        dData.TryGetValue(eKey, out myOrder);
        if (myOrder != null)
        {
            foreach (var indiElement in TaskProcessor.lstInputsFromUser[eKey])
            {
                // If the data has already run for the expected number of cycles, don't run again
                if (indiElement.NoOfCycles != indiElement.TotalNoOfCycles)
                {
                    var singleRetVal = new Task<ReturnStatus>(() => ProcessData(myOrder, indiElement));
                    singleRetVal.Start();
                    retValues.Add(singleRetVal);
                }
            }
        }

        // Wait till all tasks are completed.
        // Here also I am facing the problem of how to wait on the 1-exchange-key and 5-tasks policy
        Task.WaitAll(retValues.ToArray());

        var results = retValues.Select(t => t.Result).ToList();
        if (results != null)
        {
            Console.WriteLine("Shall print two times only");
            foreach (var retResult in results)
            {
                if (TaskProcessor.ReturnStatusAfterCalc.Keys.Contains(retResult.ID))
                {
                    TaskProcessor.ReturnStatusAfterCalc[retResult.ID] = retResult;
                    var myObj = TaskProcessor.lstInputsFromUser[eKey].FirstOrDefault(indi => indi.ID == retResult.ID);
                    if (myObj != null)
                    {
                        // Here I am unable to update my original object
                        myObj.NoOfCycles += retResult.CyclesCompleted;
                        // arEvent.Set();
                        // Task.Run(() => Update database with myObj.NoOfCycles value for lstInputsFromUser[eKey]);
                    }
                }
                else
                {
                    // TaskProcessor.ReturnStatusAfterCalc[retResult.ID] = new ReturnStatus();
                    TaskProcessor.ReturnStatusAfterCalc.TryAdd(retResult.ID, retResult);
                    var myObj = TaskProcessor.lstInputsFromUser[eKey].FirstOrDefault(indi => indi.ID == retResult.ID);
                    if (myObj != null)
                    {
                        // Here I am unable to update my original object
                        myObj.NoOfCycles += retResult.CyclesCompleted;
                        // Task.Run(() => Update database with myObj.NoOfCycles value for lstInputsFromUser[eKey]);
                        // arEvent.Set();
                    }
                }
            }
        }
    }
}
The main problem is that the code runs the inner set of tasks more times than required. I am also facing a problem updating the actual object with the return values, since the inner tasks execute more than once.
I request anyone to help me on this. While(true) is bad, but I will replace it with a set event combined with a Timer.
After a couple of comments, I feel my example and explanation do not actually convey the problem. Sorry for the trouble, guys.
Problem description:
Say I have 10 symbols from an exchange. User 1 is subscribed to 3 symbols and User 2 is subscribed to 7 symbols. The exchange will keep sending data for these 10 symbols, and User 1 and User 2 will give me certain prices at which to place orders on these 10 symbols. I will be running 10 tasks to check the prices of the incoming 10 symbols. Later, User 1 also adds price inputs for the remaining 7 symbols. So I have to run 13 different tasks, which take 13 user objects and the 10-symbol data, as each user object might contain different prices and conditions.
EDIT:
With help from Panagiotis Kanavos, I realised that Parallel.ForEach can be used to store the return results and that explicit Tasks are not required. The main logic is presented here; the rest of the unnecessary code has been removed as per the comments.
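For reference, a rough sketch of that realisation (reusing lstInputsFromUser, ProcessData and myOrder from the code above):

// Collect each ReturnStatus into a thread-safe bag instead of managing Tasks by hand.
var results = new ConcurrentBag<ReturnStatus>();

Parallel.ForEach(TaskProcessor.lstInputsFromUser[eKey], indiElement =>
{
    // Skip elements that have already run for the expected number of cycles.
    if (indiElement.NoOfCycles != indiElement.TotalNoOfCycles)
        results.Add(ProcessData(myOrder, indiElement));
});

// Parallel.ForEach blocks until every element has been processed,
// so at this point `results` holds one status per processed element.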
I have to query my company's CRM solution (Oracle's Right Now) for our 600k users, and update them there if they exist or create them if they don't. To find out whether a user already exists in Right Now, I consume a third-party WS, and with 600k users this can be a real pain due to the time it takes to get each response (around 1 second). So I managed to change my code to use Parallel.ForEach, querying each record in just 0.35 seconds, and adding it to a List<User> of records to be created or to be updated (Right Now is kinda dumb, so I need to separate them into 2 lists and call 2 distinct WS methods).
My code used to run perfectly before multithreading, but took too long. The problem is that I can't make a batch too large or I get a timeout when I try to update or create via the Web Service. So I'm sending around 500 records at once, and when the critical code part runs, it executes many times.
Parallel.ForEach(boDS.USERS.AsEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = -1 }, row =>
{
    ...
    user = null;
    user = QueryUserById(row["USER_ID"].Trim());
    if (user == null)
    {
        isUpdate = false;
        gObject.ID = new ID();
    }
    else
    {
        isUpdate = true;
        gObject.ID = user.ID;
    }

    // ... fill user attributes as generic fields ...
    gObject.GenericFields = listGenericFields.ToArray();

    if (isUpdate)
        listUserUpdate.Add(gObject);
    else
        listUserCreate.Add(gObject);

    if (i == batchSize - 1 || i == (boDS.USERS.Rows.Count - 1))
    {
        UpdateProcessingOptions upo = new UpdateProcessingOptions();
        CreateProcessingOptions cpo = new CreateProcessingOptions();
        upo.SuppressExternalEvents = false;
        upo.SuppressRules = false;
        cpo.SuppressExternalEvents = false;
        cpo.SuppressRules = false;

        RNObject[] results = null;

        // <Critical_code>
        if (listUserCreate.Count > 0)
        {
            results = _service.Create(_clientInfoHeader, listUserCreate.ToArray(), cpo);
        }
        if (listUserUpdate.Count > 0)
        {
            _service.Update(_clientInfoHeader, listUserUpdate.ToArray(), upo);
        }
        // </Critical_code>

        listUserUpdate = new List<RNObject>();
        listUserCreate = new List<RNObject>();
    }

    i++;
});
I thought about using a lock or mutex, but that isn't going to help me, since the threads will just wait and execute afterwards. I need some solution to execute that part of the code only ONCE, in only ONE thread. Is it possible? Can anyone shed some light?
Thanks and kind regards,
Leandro
As you stated in the comments, you're declaring the variables outside of the loop body. That's where your race conditions originate.
Let's take the variable listUserUpdate as an example. It's accessed concurrently by parallel threads: while one thread is still adding to it, e.g. in listUserUpdate.Add(gObject);, another thread could already be resetting the list in listUserUpdate = new List<RNObject>(); or enumerating it in listUserUpdate.ToArray().
You really need to refactor that code to:
- make each loop iteration as independent from the others as you can, by moving variables inside the loop body, and
- access shared data in a synchronized way, using locks and/or concurrent collections (see the sketch after this list).
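A rough sketch of that refactoring, keeping the names from the question; BuildGenericObject is a hypothetical helper standing in for the elided "fill user attributes" part, and the batch size of 500 comes from the question:

// Thread-safe queues replace the shared lists; iterations only ever add to them,
// so no iteration can observe another one resetting or enumerating them.
var toCreate = new ConcurrentQueue<RNObject>();
var toUpdate = new ConcurrentQueue<RNObject>();

Parallel.ForEach(boDS.USERS.AsEnumerable(), row =>
{
    // All per-iteration state lives inside the loop body.
    var user = QueryUserById(row["USER_ID"].ToString().Trim());
    var gObject = BuildGenericObject(row, user); // hypothetical helper
    if (user == null)
        toCreate.Enqueue(gObject);
    else
        toUpdate.Enqueue(gObject);
});

// Issue the web-service calls once, after the loop, in timeout-sized batches.
SendInBatches(toCreate, batch => _service.Create(_clientInfoHeader, batch, cpo));
SendInBatches(toUpdate, batch => _service.Update(_clientInfoHeader, batch, upo));

void SendInBatches(ConcurrentQueue<RNObject> source, Action<RNObject[]> send)
{
    var batch = new List<RNObject>();
    while (source.TryDequeue(out var item))
    {
        batch.Add(item);
        if (batch.Count == 500) { send(batch.ToArray()); batch.Clear(); }
    }
    if (batch.Count > 0) send(batch.ToArray());
}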
You can use the double-checked locking pattern. This is usually used for singletons, but you're not making a singleton here, so generic singleton helpers like Lazy<T> do not apply.
It works like this:
Separate out your shared data into some sort of class:
class QuerySharedData
{
    // All the write-once-read-many fields that need to be shared between threads

    public QuerySharedData()
    {
        // Compute all the write-once-read-many fields. Or use a static Create method if that's handy.
    }
}
In your outer class add the following:
object padlock = new object();
volatile QuerySharedData data;
In your thread's callback delegate, do this:
if (data == null)
{
    lock (padlock)
    {
        if (data == null)
        {
            data = new QuerySharedData(); // this does all the work to initialize the shared fields
        }
    }
}
var localData = data;
Then use the shared query data from localData. By grouping the shared query data into a subordinate class you avoid the necessity of making its individual fields volatile.
More about volatile here: Part 4: Advanced Threading.
Update: my assumption here is that all the classes and fields held by QuerySharedData are read-only once initialized. If this is not true, for instance if you initialize a list once but add to it from many threads, this pattern will not work for you. You will have to consider using things like Thread-Safe Collections.
I'll try to explain the issue with a simplified console application example; however, the real project is an ASP.NET MVC3 application.
Given the following tables, TestReport (with Id, Text and Ready columns) and TestReportRef (referencing reports by FK_ReportId), imagine the following scenario:
1. The user creates a report (a line in TestReport, where Text is the report's string content and Ready is a bool flag saying whether the report is ready to be processed); by default Ready is set to false, i.e. not ready.
2. The user wants the report to be processed, so he submits it; Ready is set to true here.
The system gives an opportunity to recall the report, if it has not been processed yet. So, when the report is recalled, Ready is set back to false. Conversely, when the report is processed, a line referencing the report by its Id is created in TestReportRef.
Now imagine that at one and the same moment:
- the user wants to recall the report;
- the report is added to the process list.
Since this can happen simultaneously, errors may occur: the report ends up with Ready == false while also being referenced in TestReportRef.
Here is a simple console example of how this may happen:
var cs = "my connection string";
var dc = new TestDataContext(cs);

dc.TestReport.InsertOnSubmit(new TestReport
{
    Text = "My report content",
    Ready = true // ready at once
});
dc.SubmitChanges();

Action recallReport = () =>
{
    var _dc = new TestDataContext(cs);
    var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
    if (report != null && !report.TestReportRef.Any())
    {
        Thread.Sleep(1000);
        report.Ready = false;
        _dc.SubmitChanges();
    }
};

Action acceptReport = () =>
{
    var _dc = new TestDataContext(cs);
    var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
    if (report != null && !report.TestReportRef.Any())
    {
        Thread.Sleep(1000);
        _dc.TestReportRef.InsertOnSubmit(new TestReportRef
        {
            FK_ReportId = report.Id
        });
        _dc.SubmitChanges();
    }
};

var task1 = new Task(recallReport);
var task2 = new Task(acceptReport);

task1.Start();
task2.Start();

task1.Wait();
task2.Wait();

foreach (var t in dc.TestReport)
{
    Console.WriteLine(string.Format("{0}\t{1}\t{2}", t.Id, t.Text, t.Ready));
}
foreach (var t in dc.TestReportRef)
{
    Console.WriteLine("ref id:\t" + t.FK_ReportId);
}
Thread.Sleep(1000); is added to ensure that both tasks check one and the same situation.
The given example may sound awkward; however, I hope it explains the issue I'm dealing with.
How can I avoid this? Making the repository a singleton doesn't seem to be a good idea. Shall I use some shared mutex (one for all web requests) to separate write operations only?
Or is there a pattern I should use in this kind of scenario?
This is only a simplified example of one of the scenarios I have; there are several scenarios in which a similar discrepancy may arise. The best thing would be to make this kind of intersection impossible, I guess.
Why not add a version column to the Report table? A task starts by tracking the current version; when the task ends, the operation succeeds only if the version is still the same as the tracked one, otherwise it fails. If the operation succeeds, update the version to version + 1. This is a sort of optimistic lock; it implicitly supposes that conflicts may occur, but that they are not frequent.
UPDATE
If you are using LINQ to SQL, maybe you can have a look at the UpdateCheck parameter: [Column(UpdateCheck=UpdateCheck.Always)]
This can be useful for handling concurrency in your case.
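A sketch of how that might look (an attribute-mapped LINQ to SQL entity; the table and column names come from the question, the rest is assumed):

[Table(Name = "TestReport")]
public class TestReport
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int Id { get; set; }

    [Column]
    public string Text { get; set; }

    // UpdateCheck.Always puts the original Ready value into the UPDATE's
    // WHERE clause, so a concurrent change makes SubmitChanges throw.
    [Column(UpdateCheck = UpdateCheck.Always)]
    public bool Ready { get; set; }
}

// Caller side: treat a conflict as "the other operation got there first".
try
{
    report.Ready = false;
    _dc.SubmitChanges();
}
catch (ChangeConflictException)
{
    // The report was accepted (or recalled) concurrently; reload and re-check
    // instead of overwriting the other operation's result.
}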