Creating a thread-safe version of a C# statistics service

I have an API that people are calling and I have a database containing statistics of the number of requests. All API requests are made by a user in a company. There's a row in the database per user per company per hour. Example:
| CompanyId | UserId| Date | Requests |
|-----------|-------|------------------|----------|
| 1 | 100 | 2020-01-30 14:00 | 4527 |
| 1 | 100 | 2020-01-30 15:00 | 43 |
| 2 | 201 | 2020-01-30 14:00 | 161 |
To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:
public class StatisticsService
{
private readonly IDatabase database;
private readonly Dictionary<string, CompanyStats> statsByCompany;
private DateTime lastTick = DateTime.MinValue;
public StatisticsService(IDatabase database)
{
this.database = database;
this.statsByCompany = new Dictionary<string, CompanyStats>();
}
private class CompanyStats
{
public CompanyStats(List<UserStats> userStats)
{
UserStats = userStats;
}
public List<UserStats> UserStats { get; set; }
}
private class UserStats
{
public UserStats(string userId, int requests, DateTime hour)
{
UserId = userId;
Requests = requests;
Hour = hour;
Updated = DateTime.MinValue;
}
public string UserId { get; set; }
public int Requests { get; set; }
public DateTime Hour { get; set; }
public DateTime Updated { get; set; }
}
}
Every time someone calls the API, I'm calling an increment method on the StatisticsService:
public void Increment(string companyId, string userId)
{
var utcNow = DateTime.UtcNow;
EnsureCompanyLoaded(companyId, utcNow);
var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
var stats = statsByCompany[companyId];
var userStats = stats.UserStats.FirstOrDefault(ls => ls.UserId == userId && ls.Hour == currentHour);
if (userStats == null)
{
var userStatsToAdd = new UserStats(userId, 1, currentHour);
userStatsToAdd.Updated = utcNow;
stats.UserStats.Add(userStatsToAdd);
}
else
{
userStats.Requests++;
userStats.Updated = utcNow;
}
}
The method loads the company into the cache if it is not already there (I will publish EnsureCompanyLoaded in a bit). It then checks whether there is a UserStats object for this hour for the user and company. If not, it creates one and sets Requests to 1. If other requests have already been made for this user, company, and current hour, it increments the number of requests by 1.
EnsureCompanyLoaded as promised:
private void EnsureCompanyLoaded(string companyId, DateTime utcNow)
{
if (statsByCompany.ContainsKey(companyId)) return;
var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
var userStats = new List<UserStats>();
userStats.AddRange(database.GetAllFromThisMonth(companyId));
statsByCompany[companyId] = new CompanyStats(userStats);
}
The details behind loading the data from the database are hidden away behind the GetAllFromThisMonth method and not important to my question.
Finally, I have a timer that stores any updated results to the database every 5 minutes or when the process shuts down:
public void Tick(object state)
{
var utcNow = DateTime.UtcNow;
var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
foreach (var companyId in statsByCompany.Keys)
{
var usersToUpdate = statsByCompany[companyId].UserStats.Where(ls => ls.Updated > lastTick);
foreach (var userStats in usersToUpdate)
{
database.Save(GenerateSomeEntity(userStats.Requests));
userStats.Updated = DateTime.MinValue;
}
}
// If we moved into new month since last tick, clear entire cache
if (lastTick.Month != utcNow.Month)
{
statsByCompany.Clear();
}
lastTick = utcNow;
}
I've done some single-threaded testing of the code and the concept seems to work out as expected. Now I want to make this thread-safe but cannot figure out the best way to implement it. I've looked at ConcurrentDictionary, which might be needed. The main problem isn't the dictionary methods, though. If two threads call Increment simultaneously, they could both end up in the EnsureCompanyLoaded method. I know the concept of lock in C#, but I'm afraid that locking on every invocation will slow down performance.
Anyone needed something similar and have some good pointers in which direction I could go?

When keeping counters in memory like this you have two options:
Keep in memory the actual historic value of the counter
Keep in memory only the differential increment of the counter
I have used both approaches and I've found the second to be simpler, faster and safer. So my suggestion is to stop loading UserStats from the database, and just increment the in-memory counter starting from 0. Then every 5 minutes call a stored procedure that inserts or updates the related database record accordingly (while zero-ing the in-memory value). This way you'll eliminate the race conditions at the loading phase, and you'll ensure that every call to Increment will be consistently fast.
For thread-safety you can use either a normal Dictionary with a lock, or a ConcurrentDictionary without lock. The first option is more flexible, and the second more efficient. If you choose Dictionary+lock, use the lock only for protecting the internal state of the Dictionary. Don't lock while updating the database. Before updating each counter take the current value from the dictionary and remove the entry in an atomic operation, and then issue the database command while other threads will be able to recreate the entry again if needed. The ConcurrentDictionary class contains a TryRemove method that can be used to achieve this goal without locking:
public bool TryRemove (TKey key, out TValue value);
It also contains a ToArray method that returns a snapshot of the entries in the dictionary. At first glance it seems that the ConcurrentDictionary suits your needs, so you could use it as a basis of your implementation and see how it goes.
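The differential approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the DeltaCounter class and its Drain method are hypothetical names that do not appear in the question, the key is assumed to be a (company, user, hour) value tuple (C# 7 or later), and the database write that would consume the drained values is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper: keeps only the increments made since the last flush.
public sealed class DeltaCounter
{
    private readonly ConcurrentDictionary<(string CompanyId, string UserId, DateTime Hour), int> deltas =
        new ConcurrentDictionary<(string CompanyId, string UserId, DateTime Hour), int>();

    // Called on every API request; never touches the database.
    public void Increment(string companyId, string userId, DateTime utcNow)
    {
        var hour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0, DateTimeKind.Utc);
        // Starts at 1 for a new key, otherwise adds 1 to the stored delta.
        deltas.AddOrUpdate((companyId, userId, hour), 1, (key, current) => current + 1);
    }

    // Called from the timer: removes each entry and returns its delta.
    // TryRemove hands back the value and removes the entry in one atomic step,
    // so a concurrent Increment is either included here or starts a fresh entry.
    public List<KeyValuePair<(string CompanyId, string UserId, DateTime Hour), int>> Drain()
    {
        var drained = new List<KeyValuePair<(string CompanyId, string UserId, DateTime Hour), int>>();
        foreach (var key in deltas.Keys) // Keys is a copy, so removing while iterating is fine
        {
            if (deltas.TryRemove(key, out var delta))
            {
                drained.Add(new KeyValuePair<(string CompanyId, string UserId, DateTime Hour), int>(key, delta));
            }
        }
        return drained;
    }
}
```

Each drained delta would then be added to the stored Requests value by the database (for example through the insert-or-update stored procedure mentioned above), outside of any lock.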

> To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:
If you want to avoid Update race conditions, you should stop doing exactly that.
Databases prevent simple update race conditions by design; that is their purpose. This is a simple counting-up operation: a single DML statement, implicitly protected by transactions, journaling and locks. Indeed, that is why calling them a lot is costly.
By adding that service you are fighting the concurrency handling that is already there. You are also moving a DB job outside of the DB, and moving DB jobs outside of the DB is just going to cause issues.
If your worry is speed:
Please read the Speed Rant.
Maybe a Distributed Database Design is the droid you are looking for? They have had a massive surge in popularity since mobile devices proliferated, both for speed and reliability reasons.

In general, to make your code thread-safe:
Use concurrent collections, such as ConcurrentDictionary
Make sure you understand concepts such as the lock statement, Monitor.Wait and Monitor.PulseAll. Locks can be slow if IO operations (such as disk reads/writes) are performed while the lock is held, but for something in RAM this is not necessary to worry about. If you really have a lengthy operation such as IO or HTTP requests, consider using ConcurrentQueue and learn about the producer-consumer pattern to process queued work with many workers.
You can also try a Redis server to cache the database without needing to design something from scratch.
You can also make your service a singleton and update the database only after a value changes. For reading a value, you already have it stored in your service.
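As a rough illustration of the producer-consumer pattern mentioned in the list above, the sketch below uses BlockingCollection wrapped around a ConcurrentQueue. The WriteQueueDemo class and its Run method are hypothetical, and appending to a list stands in for the slow IO (a database write in the question's case).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical demo: request threads only enqueue, one worker does the slow part.
public static class WriteQueueDemo
{
    public static List<string> Run(IEnumerable<string> items)
    {
        var saved = new List<string>();
        using (var queue = new BlockingCollection<string>(new ConcurrentQueue<string>()))
        {
            // Consumer: blocks while the queue is empty, ends after CompleteAdding.
            var consumer = Task.Run(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                {
                    saved.Add(item); // stand-in for slow IO; only this one task touches the list
                }
            });

            // Producer: enqueueing is cheap, so callers are not held up by the IO.
            foreach (var item in items)
            {
                queue.Add(item);
            }

            queue.CompleteAdding(); // signal that no more items will arrive
            consumer.Wait();        // wait until everything queued has been processed
        }
        return saved;
    }
}
```

With several consumers the shared list would need its own synchronization; a single consumer keeps the sketch simple.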

Related

Optimization challenges with Objectcache in ASP.NET

One of our clients has a Job Application Web Site built with ASP.NET and .NET Framework 4.8.
Over the past few weeks, owing to some performance issues on the main database server, we have started optimizing certain critical features of the application. One such feature is the ability for applicants to search and apply for jobs. There are two broad aspects to this:-
Applicants login and search for jobs, using a set of optional filters
Admins approve jobs (an approved job would immediately show up in the job search results for applicants)
To optimize this feature, we started using ObjectCache to store the Jobs Data and every search request is performed against this cache, instead of running a query on the database. So far we have seen good improvement in application performance when data is fetched from the cache and filters applied via C# code.
As of now, we have a singleton instance of Objectcache, with a lock in place for thread safety:
using System.Runtime.Caching; // ObjectCache and MemoryCache live here
using System.Threading;
public class JobsDataCache
{
private static ObjectCache jobsDataCache = null;
private static readonly object _lock = new object();
private JobsDataCache() { }
public static ObjectCache GetInstance()
{
if (jobsDataCache == null)
{
lock(_lock)
{
if (jobsDataCache == null)
{
jobsDataCache = new MemoryCache("JobsDataCache");
}
}
}
return jobsDataCache;
}
}
These are the service class methods that provide search results and also manage the cache instance:-
public SearchJobsResponse SearchJobs(SearchJobsParam param, string user, bool isTestUser)
{
try
{
// Method to evaluate and refresh the data cache
EvaluateCache();
//... Remaining Logic for filtering and returning data to controller
}
}
private void EvaluateCache()
{
lock (_lock)
{
var SearchJobsData = JobsDataCache.GetInstance().Get("SearchJobsData");
// If there is data in cache, then assign to result set and return
if (SearchJobsData != null)
{
result = (List<SearchApplyJobs>)SearchJobsData;
}
else
{
// Refresh the cache - fetch latest data from DB
RefreshCacheData();
}
}
}
private void RefreshCacheData()
{
var GlobalQuery = ";with ROWCTE AS (" +
"SELECT t.Ad_Number, t.JobType, c.CategoryID, t.Cert_Code, d.District, t.District_Name, t.End_Date, t.InstructionalShowing, " +
"t.Job_Description, t.Long_Description_String, t.Job_Number, t.Post_Date, t.Region_Code, t.Region_Name, d.Short_Name, t.Start_Date, z.ZIP_Code, z.Latitude, z.Longitude " +
"FROM ApplicationType c " +
"JOIN Job_Ad t ON t.ApplicationType = c.ApplicationTypeID " +
"JOIN District d ON t.District_Code = d.District " +
"JOIN ZIPInfo z ON z.ZIP_Code = d.Zipcode" +
" WHERE (CONVERT(DATE, t.Post_Date) <= CONVERT(DATE, GETDATE()) AND CONVERT(DATE, t.End_Date) >= CONVERT(DATE, GETDATE())))" +
"SELECT Ad_Number, JobType, CategoryID, Cert_Code, District, District_Name, CAST(End_Date AS datetime) AS End_Date, " +
"InstructionalShowing, Job_Description, Long_Description_String, Job_Number, CAST(Post_Date AS datetime) AS Post_Date, Region_Code, " +
"Region_Name AS RegionCode, Short_Name, CAST(Start_Date AS datetime) AS Start_Date, ZIP_Code, Latitude, Longitude " +
"FROM ROWCTE ORDER BY Job_Number";
result = identityConnection.Database.SqlQuery<SearchJobs>(GlobalQuery).ToList();
if (result.Count > 0)
{
CacheItemPolicy policy = new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddHours(2) };
JobsDataCache.GetInstance().Add("SearchJobsData", result, policy);
}
}
// Method that will be used to refresh the cache when a job is approved
public void ClearCacheAndEvaluate()
{
lock(_lock)
{
var data = JobsDataCache.GetInstance().Get("SearchJobsData");
if (data != null)
{
JobsDataCache.GetInstance().Remove("SearchJobsData");
RefreshCacheData();
}
}
}
As far as the job search goes, this approach is working really well. However, when it comes to admins approving jobs, we realized that the cache may have to be refreshed (get the latest data from the DB) every time a job is approved.
Based on usage statistics, there could be anywhere between 15 - 35 jobs approved per day, with perhaps a few minutes to few hours between approvals, based on the admin's discretion (it is a manual task and not automated yet).
From the bandwidth perspective, there is a possibility of a job search happening every minute (around 1500 - 2000 applicants are logged in during peak time) versus job approvals happening every few minutes to few hours. However, we are not able to get around the fact that the cache will have to be refreshed after every job approval.
We have already tried to optimize the Job Search queries on the database side, but there are infrastructure issues which we are not able to investigate / troubleshoot as we do not have access to the server. The cache solution looks very promising, but there is this challenge of keeping it up to date in regular intervals, and that means a round trip to the database.
The only possible solution I have been able to think is that we refresh the cache after a certain number of approvals, let's say 5 - 7. But since this is a manual task, there might be extended periods of time when this number has not been reached and the cache does not have latest data. Given this situation, should we completely ditch the cache approach and keep focusing on creating optimized queries on the database side ?
The improved performance in the jobs search with cache would keep the client and users very happy, but if there is a slight delay owing to cache refreshes after every job approval, we are not sure what kind of an impression that would have on the client and users.
Any ideas that would help us retain the cache approach and provide a decent user experience would be really appreciated from this community. Happy to share further information and code if necessary.
Thanks

Trying to find a lock-less solution for a C# concurrent queue

I have the following code in C#:
(_StoreQueue is a ConcurrentQueue)
var S = _StoreQueue.FirstOrDefault(_ => _.TimeStamp == T);
if (S == null)
{
lock (_QueueLock)
{
// try again
S = _StoreQueue.FirstOrDefault(_ => _.TimeStamp == T);
if (S == null)
{
S = new Store(T);
_StoreQueue.Enqueue(S);
}
}
}
The system is collecting data in real time (fairly high frequency, around 300-400 calls / second) and puts it in bins (Store objects) that represent a 5 second interval. These bins are in a queue as they get written and the queue gets emptied as data is processed and written.
So, when data is arriving, a check is done to see if there is a bin for that timestamp (rounded by 5 seconds), if not, one is created.
Since this is quite heavily multi-threaded, the system goes with the following logic:
If there is a bin, it is used to put data.
If there is no bin, a lock is taken and, within that lock, the check is done again to make sure the bin wasn't created by another thread in the meantime. If there is still no bin, one gets created.
With this system, the lock is used roughly once every 2k calls.
I am trying to see if there is a way to remove the lock, mostly because I'm thinking there has to be a better solution than the double check.
An alternative I have been thinking about is to create empty bins ahead of time and that would entirely remove the need for any locks, but the search for the right bin would become slower as it would have to scan the list pre-built bins to find the proper one.
Using a ConcurrentDictionary can fix the issue you are having. Here I assumed a type double for your TimeStamp property, but it can be anything, as long as you make the ConcurrentDictionary key match the type.
using System.Collections.Concurrent;
class Program
{
// static so that it can be used from the static Main method
static ConcurrentDictionary<double, Store> _StoreQueue = new ConcurrentDictionary<double, Store>();
static void Main(string[] args)
{
var T = 17d;
// add a store for 17 only if one does not exist yet;
// the factory overload avoids allocating a Store when the key is already present
_StoreQueue.GetOrAdd(T, t => new Store(t));
}
public class Store
{
public double TimeStamp { get; set; }
public Store(double timeStamp)
{
TimeStamp = timeStamp;
}
}
}
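Applied to the 5-second bins from the question, the same idea might look like the sketch below. It rests on several assumptions: timestamps are taken to be non-negative whole Unix seconds, Bin and BinTable are hypothetical names, and the values are wrapped in Lazy<T> because the GetOrAdd value factory can run more than once when threads race, even though only one result is stored.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical bin type standing in for the question's Store class.
public sealed class Bin
{
    public long TimeStamp { get; }
    public Bin(long timeStamp) { TimeStamp = timeStamp; }
}

public sealed class BinTable
{
    private const long BinSeconds = 5;

    private readonly ConcurrentDictionary<long, Lazy<Bin>> bins =
        new ConcurrentDictionary<long, Lazy<Bin>>();

    // Rounds a (non-negative) timestamp down to the start of its 5-second bin.
    public static long RoundDown(long unixSeconds) => unixSeconds - (unixSeconds % BinSeconds);

    // Returns the bin for the timestamp, creating it if needed, without an explicit lock.
    public Bin GetBin(long unixSeconds)
    {
        var key = RoundDown(unixSeconds);
        // Racing threads may each build a Lazy wrapper, but only one wrapper is stored,
        // and only the stored wrapper ever constructs a Bin.
        return bins.GetOrAdd(key, k => new Lazy<Bin>(() => new Bin(k))).Value;
    }
}
```

Unlike a queue, the dictionary does not keep insertion order, so the consumer side would have to pick bins by key (for example the oldest timestamp) and remove them with TryRemove.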

Update quantity issue with Concurrent Transactions C#

I have developed an application for purchasing my products online.
I have a product "Umbrellas" in my store with 100 pieces.
But there is an issue with concurrent purchases.
If two purchases happen concurrently, AvailableQty is updated incorrectly. Let's say two transactions happen concurrently with purchase quantities of 100 and 50. Ideally, the first transaction (purchase qty 100) should succeed, as we have 100 pieces in stock. The second transaction should return an error, because after the first transaction the balance is 0 (100 - 100) and the stock is insufficient. But in the above scenario both transactions succeed and the balance now shows as -50.
This works correctly when the two transactions run separately; it is only an issue when they happen CONCURRENTLY. The reason is that both transactions hit the availability check at the same time, and at that moment the condition is satisfied for both because the DB table has not yet been updated with the latest quantity.
How can I correct this?
public bool UpdateStock(int productId, int purchaseQty)
{
using(var db = new MyEntities())
{
var stock = db.Products.Find(productId);
if (stock.AvailableQty >= purchaseQty) // Condition to check the availablity
{
stock.AvailableQty = stock.AvailableQty - purchaseQty;
db.SaveChanges();
return true;
}
else
{
return false;
}
}
}
This is a typical thread concurrency issue, which can be resolved in multiple ways. One of them is using a simple lock statement:
public class StockService
{
private readonly object _availableQtyLock = new object();
public bool UpdateStock(int productId, int purchaseQty)
{
using (var db = new MyEntities())
{
lock (_availableQtyLock)
{
var stock = db.Products.Find(productId);
if (stock.AvailableQty >= purchaseQty) // Condition to check the availablity
{
stock.AvailableQty = stock.AvailableQty - purchaseQty;
db.SaveChanges();
return true;
}
return false;
}
}
}
}
Only one thread at a time can hold the lock on _availableQtyLock, which means other threads have to wait for the first thread to release the lock on that object.
Take into account that this is the simplest (and possibly slowest) way of dealing with concurrency. There are other ways to do thread synchronization, e.g. Monitor, Semaphore, SemaphoreSlim, etc. Since it's hard to tell which one will suit your needs best, you'll need to do proper performance/stress testing, but my advice would be to start with the simplest.
Note: As others mentioned in the comments, concurrency issues can be handled at the DB level as well, which would indeed be more suitable, especially since a lock only synchronizes threads within a single process. But if you don't want to or can't introduce any DB changes, this would be the way to go.

Ensure concurrent (money) transactions in Entity Framework?

Assume I have an account_profile table, which has Score field that is similar to an account's money (the database type is BIGINT(20) and the EntityFramework type is long, because I don't need decimal). Now I have the following function:
public long ChangeScoreAmount(int userID, long amount)
{
var profile = this.Entities.account_profile.First(q => q.AccountID == userID);
profile.Score += amount;
this.Entities.SaveChanges();
return profile.Score;
}
However, I am afraid that when ChangeScoreAmount is called multiple times concurrently, the final amount won't be correct.
Here are my current solutions I am thinking of:
Adding a lock with a static locking variable in the ChangeScoreAmount function, since the class itself may be instantiated multiple times when needed. It looks like this:
public long ChangeScoreAmount(int userID, long amount)
{
lock (ProfileBusiness.scoreLock)
{
var profile = this.Entities.account_profile.First(q => q.AccountID == userID);
profile.Score += amount;
this.Entities.SaveChanges();
return profile.Score;
}
}
The problem is, I have never tried a lock on static variable, so I don't know if it is really safe and if any deadlock would occur. Moreover, it may be bad if somewhere else outside this function, a change to Score field is applied midway.
OK this is no longer an option, because my server application will be run on multiple sites, that means the locking variable cannot be used
Creating a stored procedure in the database and calling that stored procedure in the function. However, I don't know if there is an "atomic" way to create that stored procedure so that it can only be called once at a time, since I still need to retrieve the value, change it, and then update it again?
I am using MySQL Community 5.6.24 and MySQL .NET Connector 6.9.6 in case it matters.
NOTE: My server application may be run on multiple server machines.
You can use SQL transactions with the repeatable read isolation level instead of locking in the application. For example, you can write:
public long ChangeScoreAmount(int userID, long amount)
{
using (var ts = new TransactionScope(TransactionScopeOption.RequiresNew,
new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
{
var profile = this.Entities.account_profile.First(q => q.AccountID == userID);
profile.Score += amount;
this.Entities.SaveChanges();
ts.Complete();
return profile.Score;
}
}
The transaction guarantees that the account_profile record will not be changed in the database by others until you commit or roll back.

Ideas on logic/algorithm and how to prevent race in threaded writes to SqlServer

I have the following logic:
public void InQueueTable(DataTable Table)
{
int incomingRows = Table.Rows.Count;
if (incomingRows >= RowsThreshold)
{
// asyncWriteRows(Table)
return;
}
if ((RowsInMemory + incomingRows) >= RowsThreshold)
{
// copy and clear internal table
// asyncWriteRows(copyTable)
}
internalTable.Merge(Table);
}
There is one problem with this algorithm:
Given RowsThreshold = 10000:
If incomingRows puts RowsInMemory over RowsThreshold: (1) asynchronously write out data, (2) merge incoming data
If incomingRows is over RowsThreshold, asynchronously write incoming data
But what if??? Assume a second thread spins up and calls asyncWriteRows(xxxTable); also, that each thread owning the asynchronous method will be writing to the same table in SqlServer: Does SqlServer handle this sort of multi-threaded write functionality to the same table?
Follow up
Based on Greg D's suggestion:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString,
SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.UseInternalTransaction))
{
// perform bulkcopy
}
Regardless, I still have the issue of signaling the asyncWriteRows(copyTable). The algorithm needs to determine the need to go ahead and copy internalTable, clear internalTable, and asyncWriteRows(copyTable). I think that what I need to do is move the internalTable.Copy() call to its own method:
private DataTable CopyTable (DataTable srcTable)
{
lock (key)
{
return srcTable.Copy();
}
}
...and then the following changes to the InQueue method:
public void InQueueTable(DataTable Table)
{
int incomingRows = Table.Rows.Count;
if (incomingRows >= RowsThreshold)
{
// asyncWriteRows(Table)
return;
}
if ((RowsInMemory + incomingRows) >= RowsThreshold)
{
// copy and clear internal table
// asyncWriteRows(CopyTable(internalTable))
}
internalTable.Merge(Table);
}
...finally, add a callback method:
private void WriteCallback(IAsyncResult iaSyncResult)
{
int rowCount = (int)iaSyncResult.AsyncState;
if (RowsInMemory >= rowCount)
{
asyncWriteRows(CopyTable(internalTable));
}
}
This is what I have determined as a solution. Any feedback?
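One way to make the copy-and-clear step safe against concurrent callers is to swap the buffer inside a lock and do the write outside it. The sketch below is a simplification under assumptions, not the code from the question: RowBuffer is a hypothetical class, it buffers a generic list rather than a DataTable, it merges first and flushes afterwards, and the writeBatch delegate stands in for asyncWriteRows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical buffer: collects rows and hands off a full batch once the threshold is reached.
public sealed class RowBuffer<T>
{
    private readonly object gate = new object();
    private readonly int threshold;
    private readonly Action<List<T>> writeBatch; // stand-in for asyncWriteRows
    private List<T> pending = new List<T>();

    public RowBuffer(int threshold, Action<List<T>> writeBatch)
    {
        this.threshold = threshold;
        this.writeBatch = writeBatch;
    }

    public void Enqueue(IReadOnlyCollection<T> incoming)
    {
        List<T> toWrite = null;
        lock (gate)
        {
            pending.AddRange(incoming);
            if (pending.Count >= threshold)
            {
                // Swap instead of copy + clear: the full list now belongs to this caller only,
                // and other threads continue with a fresh, empty list.
                toWrite = pending;
                pending = new List<T>();
            }
        }

        // The slow write happens outside the lock so other callers are not blocked by it.
        if (toWrite != null)
        {
            writeBatch(toWrite);
        }
    }
}
```

Because each batch is owned by exactly one caller after the swap, no two threads ever write the same rows, whatever the database does with concurrent inserts.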
Is there some reason you can't use transactions?
I'll admit now that I'm not an expert in this field.
With transactions and cursors you will get lock escalation if your operation is large. E.g. your operation will start locking a row, then a page then a table if it needs to, preventing other operations from functioning.
The idiot that I was assumed that SQL Server would just queue these blocked operations up and wait for locks to be released, but it just returns errors and it's up to the API programmer to keep retrying (someone correct me if I'm wrong, or if it's fixed in a later version).
If you are happy to be reading possibly old data that you then copy over, like we were, we changed our isolation mode to stop the server blocking operations unnecessarily.
ALTER DATABASE [dbname] SET READ_COMMITTED_SNAPSHOT ON;
You may also alter your insert statements to use NOLOCK. But please read up on this.
