One of our clients has a job application web site built with ASP.NET on .NET Framework 4.8.
Over the past few weeks, owing to some performance issues on the main database server, we have started optimizing certain critical features of the application. One such feature is the ability for applicants to search and apply for jobs. There are two broad aspects to this:
Applicants login and search for jobs, using a set of optional filters
Admins approve jobs (an approved job would immediately show up in the job search results for applicants)
To optimize this feature, we started using ObjectCache to store the jobs data, and every search request is performed against this cache instead of running a query on the database. So far we have seen a good improvement in application performance when data is fetched from the cache and the filters are applied via C# code.
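For illustration, this is roughly what the in-memory filtering looks like (a simplified sketch; the filter properties on SearchJobsParam and the exact fields on SearchApplyJobs are assumptions rather than our exact code):
// Hypothetical sketch of filtering the cached list with LINQ instead of querying the DB.
// Requires: using System.Linq;
private List<SearchApplyJobs> ApplyFilters(List<SearchApplyJobs> cachedJobs, SearchJobsParam param)
{
    IEnumerable<SearchApplyJobs> query = cachedJobs;

    if (!string.IsNullOrEmpty(param.District))                  // optional district filter (assumed property)
        query = query.Where(j => j.District_Name == param.District);

    if (param.CategoryID.HasValue)                              // optional category filter (assumed property)
        query = query.Where(j => j.CategoryID == param.CategoryID.Value);

    if (!string.IsNullOrEmpty(param.Keyword))                   // optional keyword filter (assumed property)
        query = query.Where(j => j.Job_Description != null
            && j.Job_Description.IndexOf(param.Keyword, StringComparison.OrdinalIgnoreCase) >= 0);

    return query.OrderBy(j => j.Job_Number).ToList();
}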
As of now, we have a singleton instance of ObjectCache, with a lock in place for thread safety:
using System.Runtime.Caching;
public class JobsDataCache
{
private static ObjectCache jobsDataCache = null;
private static readonly object _lock = new object();
private JobsDataCache() { }
public static ObjectCache GetInstance()
{
if (jobsDataCache == null)
{
lock(_lock)
{
if (jobsDataCache == null)
{
jobsDataCache = new MemoryCache("JobsDataCache");
}
}
}
return jobsDataCache;
}
}
These are the service class methods that provide the search results and also manage the cache instance:
public SearchJobsResponse SearchJobs(SearchJobsParam param, string user, bool isTestUser)
{
try
{
// Method to evaluate and refresh the data cache
EvaluateCache();
//... Remaining Logic for filtering and returning data to controller
}
catch (Exception)
{
    // Exception handling elided in the original snippet
    throw;
}
}
private void EvaluateCache()
{
lock (_lock)
{
var SearchJobsData = JobsDataCache.GetInstance().Get("SearchJobsData");
// If there is data in cache, then assign to result set and return
if (SearchJobsData != null)
{
result = (List<SearchApplyJobs>)SearchJobsData;
}
else
{
// Refresh the cache - fetch latest data from DB
RefreshCacheData();
}
}
}
private void RefreshCacheData()
{
var GlobalQuery = ";with ROWCTE AS (" +
"SELECT t.Ad_Number, t.JobType, c.CategoryID, t.Cert_Code, d.District, t.District_Name, t.End_Date, t.InstructionalShowing, " +
"t.Job_Description, t.Long_Description_String, t.Job_Number, t.Post_Date, t.Region_Code, t.Region_Name, d.Short_Name, t.Start_Date, z.ZIP_Code, z.Latitude, z.Longitude " +
"FROM ApplicationType c " +
"JOIN Job_Ad t ON t.ApplicationType = c.ApplicationTypeID " +
"JOIN District d ON t.District_Code = d.District " +
"JOIN ZIPInfo z ON z.ZIP_Code = d.Zipcode" +
" WHERE (CONVERT(DATE, t.Post_Date) <= CONVERT(DATE, GETDATE()) AND CONVERT(DATE, t.End_Date) >= CONVERT(DATE, GETDATE())))" +
"SELECT Ad_Number, JobType, CategoryID, Cert_Code, District, District_Name, CAST(End_Date AS datetime) AS End_Date, " +
"InstructionalShowing, Job_Description, Long_Description_String, Job_Number, CAST(Post_Date AS datetime) AS Post_Date, Region_Code, " +
"Region_Name AS RegionCode, Short_Name, CAST(Start_Date AS datetime) AS Start_Date, ZIP_Code, Latitude, Longitude " +
"FROM ROWCTE ORDER BY Job_Number";
result = identityConnection.Database.SqlQuery<SearchApplyJobs>(GlobalQuery).ToList();
if (result.Count > 0)
{
CacheItemPolicy policy = new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddHours(2) };
JobsDataCache.GetInstance().Add("SearchJobsData", result, policy);
}
}
// Method that will be used to refresh the cache when a job is approved
public void ClearCacheAndEvaluate()
{
lock(_lock)
{
var data = JobsDataCache.GetInstance().Get("SearchJobsData");
if (data != null)
{
JobsDataCache.GetInstance().Remove("SearchJobsData");
RefreshCacheData();
}
}
}
As far as the job search goes, this approach is working really well. However, when it comes to admins approving jobs, we realized that the cache may have to be refreshed (get the latest data from the DB) every time a job is approved.
Based on usage statistics, there could be anywhere between 15 - 35 jobs approved per day, with perhaps a few minutes to a few hours between approvals, at the admin's discretion (it is a manual task and not automated yet).
From the bandwidth perspective, there is a possibility of a job search happening every minute (around 1500 - 2000 applicants are logged in during peak time) versus job approvals happening every few minutes to a few hours. However, we are not able to get around the fact that the cache will have to be refreshed after every job approval.
We have already tried to optimize the job search queries on the database side, but there are infrastructure issues which we are not able to investigate or troubleshoot, as we do not have access to the server. The cache solution looks very promising, but there is this challenge of keeping it up to date at regular intervals, and that means a round trip to the database.
The only possible solution I have been able to think of is that we refresh the cache after a certain number of approvals, let's say 5 - 7. But since this is a manual task, there might be extended periods of time when this number has not been reached and the cache does not have the latest data. Given this situation, should we completely ditch the cache approach and keep focusing on creating optimized queries on the database side?
The improved performance in the jobs search with cache would keep the client and users very happy, but if there is a slight delay owing to cache refreshes after every job approval, we are not sure what kind of an impression that would have on the client and users.
Any ideas that would help us retain the cache approach and still provide a decent user experience would be really appreciated. Happy to share further information and code if necessary.
Thanks
Related
I have an API that people are calling and I have a database containing statistics of the number of requests. All API requests are made by a user in a company. There's a row in the database per user per company per hour. Example:
| CompanyId | UserId | Date             | Requests |
|-----------|--------|------------------|----------|
| 1         | 100    | 2020-01-30 14:00 | 4527     |
| 1         | 100    | 2020-01-30 15:00 | 43       |
| 2         | 201    | 2020-01-30 14:00 | 161      |
To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:
public class StatisticsService
{
private readonly IDatabase database;
private readonly Dictionary<string, CompanyStats> statsByCompany;
private DateTime lastTick = DateTime.MinValue;
public StatisticsService(IDatabase database)
{
this.database = database;
this.statsByCompany = new Dictionary<string, CompanyStats>();
}
private class CompanyStats
{
public CompanyStats(List<UserStats> userStats)
{
UserStats = userStats;
}
public List<UserStats> UserStats { get; set; }
}
private class UserStats
{
public UserStats(string userId, int requests, DateTime hour)
{
UserId = userId;
Requests = requests;
Hour = hour;
Updated = DateTime.MinValue;
}
public string UserId { get; set; }
public int Requests { get; set; }
public DateTime Hour { get; set; }
public DateTime Updated { get; set; }
}
}
Every time someone calls the API, I'm calling an increment method on the StatisticsService:
public void Increment(string companyId, string userId)
{
var utcNow = DateTime.UtcNow;
EnsureCompanyLoaded(companyId, utcNow);
var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
var stats = statsByCompany[companyId];
var userStats = stats.UserStats.FirstOrDefault(ls => ls.UserId == userId && ls.Hour == currentHour);
if (userStats == null)
{
var userStatsToAdd = new UserStats(userId, 1, currentHour);
userStatsToAdd.Updated = utcNow;
stats.UserStats.Add(userStatsToAdd);
}
else
{
userStats.Requests++;
userStats.Updated = utcNow;
}
}
The method loads the company into the cache if it is not already there (I will publish EnsureCompanyLoaded in a bit). It then checks whether there is a UserStats object for this hour for the user and company. If not, it creates one and sets Requests to 1. If other requests have already been made for this user, company, and current hour, it increments the number of requests by 1.
EnsureCompanyLoaded as promised:
private void EnsureCompanyLoaded(string companyId, DateTime utcNow)
{
if (statsByCompany.ContainsKey(companyId)) return;
var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
var userStats = new List<UserStats>();
userStats.AddRange(database.GetAllFromThisMonth(companyId));
statsByCompany[companyId] = new CompanyStats(userStats);
}
The details behind loading the data from the database are hidden away behind the GetAllFromThisMonth method and not important to my question.
Finally, I have a timer that stores any updated results to the database every 5 minutes or when the process shuts down:
public void Tick(object state)
{
var utcNow = DateTime.UtcNow;
var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
foreach (var companyId in statsByCompany.Keys)
{
var usersToUpdate = statsByCompany[companyId].UserStats.Where(ls => ls.Updated > lastTick);
foreach (var userStats in usersToUpdate)
{
database.Save(GenerateSomeEntity(userStats.Requests));
userStats.Updated = DateTime.MinValue;
}
}
// If we moved into new month since last tick, clear entire cache
if (lastTick.Month != utcNow.Month)
{
statsByCompany.Clear();
}
lastTick = utcNow;
}
I've done some single-threaded testing of the code and the concept seems to work out as expected. Now I want to make this thread-safe but cannot seem to figure out the best way to implement it. I've looked at ConcurrentDictionary, which might be needed. The main problem isn't the dictionary methods themselves, though. If two threads call Increment simultaneously, they could both end up in the EnsureCompanyLoaded method. I know of the concept of lock in C#, but I'm afraid that just locking on every invocation would slow down performance.
Has anyone needed something similar and got some good pointers on which direction I could go?
When keeping counters in memory like this you have two options:
Keep in memory the actual historic value of the counter
Keep in memory only the differential increment of the counter
I have used both approaches and I've found the second to be simpler, faster and safer. So my suggestion is to stop loading UserStats from the database, and just increment the in-memory counter starting from 0. Then every 5 minutes call a stored procedure that inserts or updates the related database record accordingly (while zeroing the in-memory value). This way you'll eliminate the race conditions at the loading phase, and you'll ensure that every call to Increment is consistently fast.
For thread-safety you can use either a normal Dictionary with a lock, or a ConcurrentDictionary without a lock. The first option is more flexible, the second more efficient. If you choose Dictionary + lock, use the lock only for protecting the internal state of the Dictionary. Don't lock while updating the database. Before updating each counter, take the current value from the dictionary and remove the entry in an atomic operation, and then issue the database command while other threads remain able to recreate the entry if needed. The ConcurrentDictionary class contains a TryRemove method that can be used to achieve this goal without locking:
public bool TryRemove (TKey key, out TValue value);
It also contains a ToArray method that returns a snapshot of the entries in the dictionary. At first glance it seems that the ConcurrentDictionary suits your needs, so you could use it as a basis of your implementation and see how it goes.
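A minimal sketch of that differential-counter idea (the value-tuple key, the Flush method, and the UpsertRequests call on your IDatabase are assumptions I'm making for illustration, not part of your code):
// Requires: using System.Collections.Concurrent; using System.Linq;
private readonly ConcurrentDictionary<(string CompanyId, string UserId, DateTime Hour), int> pending
    = new ConcurrentDictionary<(string, string, DateTime), int>();

public void Increment(string companyId, string userId)
{
    var utcNow = DateTime.UtcNow;
    var hour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);
    // Atomically add 1 to the in-memory differential counter, starting from 0.
    pending.AddOrUpdate((companyId, userId, hour), 1, (key, current) => current + 1);
}

public void Flush() // called by the 5-minute timer
{
    foreach (var key in pending.Keys.ToArray()) // snapshot of the current keys
    {
        // Atomically take the counter out; other threads can recreate the entry meanwhile.
        if (pending.TryRemove(key, out var delta) && delta > 0)
        {
            // One DB call per counter: insert the row or add 'delta' to the existing one.
            // UpsertRequests is an assumed method/stored-procedure wrapper on IDatabase.
            database.UpsertRequests(key.CompanyId, key.UserId, key.Hour, delta);
        }
    }
}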
To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:
If you want to avoid Update race conditions, you should stop doing exactly that.
Databases, by design and by purpose, prevent simple update race conditions. This is a simple counting-up operation: a single DML statement, implicitly protected by transactions, journaling and locks. Indeed, that is why calling them a lot is costly.
By adding that service, you are fighting the concurrency handling that is already there. You are also moving a DB job outside of the DB, and moving DB jobs outside of the DB is just going to cause issues.
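For example, the counting-up can stay a single parameterized statement that the database serializes for you. This is only a sketch: the RequestStats table name, the connection handling and the omitted insert-for-a-new-hour (which would need a unique key or MERGE to stay fully safe) are assumptions.
// Requires: using System.Data.SqlClient;
const string incrementSql =
    "UPDATE RequestStats SET Requests = Requests + 1 " +
    "WHERE CompanyId = @companyId AND UserId = @userId AND [Date] = @hour";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(incrementSql, connection))
{
    command.Parameters.AddWithValue("@companyId", companyId);
    command.Parameters.AddWithValue("@userId", userId);
    command.Parameters.AddWithValue("@hour", currentHour);
    connection.Open();
    command.ExecuteNonQuery(); // the database's own locking handles concurrent increments
}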
If your worry is speed:
Please read the Speed Rant.
Maybe a distributed database design is the droid you are looking for? They have had a massive surge in popularity since mobile devices proliferated, for both speed and reliability reasons.
In general, to make your code thread-safe:
Use concurrent collections, such as ConcurrentDictionary
Make sure you understand concepts such as the lock statement, Monitor.Wait and Monitor.PulseAll. Locks can be slow when I/O operations (such as disk reads/writes) are what is being locked on, but for something in RAM there is no need to worry. If you really have some lengthy operation such as I/O or HTTP requests, consider using ConcurrentQueue and learn about the producer-consumer pattern to process work in queues with many workers (example)
You can also try a Redis server to cache the database without the need to design something from scratch.
You can also make your service a singleton, and update the database only after a value changes. For reading a value, you already have it stored in your service.
I have developed an application for purchasing my products online. I have a product, "Umbrellas", in my store with 100 pieces.
But there is an issue with concurrent purchases.
If two concurrent purchases happen, AvailableQty will update incorrectly. Let's say there are two transactions happening concurrently with purchase quantities of 100 and 50. Ideally, the first transaction (purchase qty 100) should succeed, as we have 100 in stock, and the second transaction should return an error, because after the first transaction the balance is 0 (100 - 100) and the stock is insufficient. But in the above scenario both transactions succeed, and the balance now shows as -50.
This works correctly when the two transactions are separate, but it is an issue when the two transactions happen CONCURRENTLY. The reason is that with concurrent transactions the availability check runs at the same time for both, and at that moment the condition is satisfied because the DB table has not yet been updated with the latest quantity.
How can I correct this?
public bool UpdateStock(int productId, int purchaseQty)
{
using(var db = new MyEntities())
{
var stock = db.Products.Find(productId);
if (stock.AvailableQty >= purchaseQty) // Condition to check the availability
{
stock.AvailableQty = stock.AvailableQty - purchaseQty;
db.SaveChanges();
return true;
}
else
{
return false;
}
}
}
This is a typical thread concurrency issue which can be resolved in multiple ways; one of them is using a simple lock statement:
public class StockService
{
private readonly object _availableQtyLock = new object();
public bool UpdateStock(int productId, int purchaseQty)
{
using (var db = new MyEntities())
{
lock (_availableQtyLock)
{
var stock = db.Products.Find(productId);
if (stock.AvailableQty >= purchaseQty) // Condition to check the availability
{
stock.AvailableQty = stock.AvailableQty - purchaseQty;
db.SaveChanges();
return true;
}
return false;
}
}
}
}
Only one thread can acquire an exclusive lock on _availableQtyLock, which means other threads will have to wait for the first thread to release the lock on that object.
Take into account that this is the simplest (and possibly slowest) way of dealing with concurrency; there are other ways to do thread synchronization, e.g. Monitor, Semaphore, fast SlimLock, etc. Since it's hard to tell which one will suit your needs best, you'll need to do proper performance/stress testing, but my advice would be to start with the simplest.
Note: As others mentioned in the comments, concurrency can be handled at the DB level as well, which would indeed be more suitable, but if you don't want to / can't introduce any DB changes, this would be the way to go. A sketch of the DB-level approach follows below.
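For completeness, here is that DB-level route, assuming SQL Server and the same EF context; the Products table and ProductId column names are assumptions. The idea is to let the database check and decrement in one atomic statement and treat zero affected rows as "insufficient stock":
// Requires: using System.Data.SqlClient;
public bool UpdateStock(int productId, int purchaseQty)
{
    using (var db = new MyEntities())
    {
        // ExecuteSqlCommand returns the number of affected rows; 0 means the stock was insufficient.
        int rowsAffected = db.Database.ExecuteSqlCommand(
            "UPDATE Products SET AvailableQty = AvailableQty - @qty " +
            "WHERE ProductId = @id AND AvailableQty >= @qty",
            new SqlParameter("@qty", purchaseQty),
            new SqlParameter("@id", productId));

        return rowsAffected == 1;
    }
}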
I wrote a library, referenced by numerous applications, that tracks who is online and which application and page they are viewing.
The data is stored, using EF6, in a Sql Server 2008 table which tracks their username (primary key), application, page and timestamp. I only want to store the latest request for each person so each username should only be stored once.
The library code, which is called from the Global.asax of each application looks like this:
public static void Add(ApplicationType application, string username, string pageRequested)
{
using (var db = new CommonDAL()) // EF context
{
var exists = db.ActiveUsers.Find(username);
if (exists != null)
db.ActiveUsers.Remove(exists);
var activeUser = new ActiveUser() { ApplicationID = application.Value(), Username = username, PageRequested = pageRequested, TimeRequested = DateTime.Now };
db.ActiveUsers.Add(activeUser);
db.SaveChanges();
}
}
I'm intermittently getting the error Violation of PRIMARY KEY constraint 'PK_tblActiveUser_Username'. Cannot insert duplicate key in object 'dbo.tblActiveUser'. The duplicate key value is (xxxxxxxx)
What I can only guess is happening is Request A comes in, removes the existing username. Request B (from same user) then comes in, tries to remove the username, sees nothing exists. Request A then adds the username. Request B then tries to add the username. The error frequently seems to be triggered when a web server sends a client a 401 status, which again points to multiple requests within a short period of time triggering this.
I'm having trouble mocking this race condition using unit tests as I haven't done much async programming before, but tried to create async tests with delays to mock multiple simultaneous slow requests. I've tried to use using (var transaction = new TransactionScope()) and using (var transaction = db.Database.BeginTransaction(System.Data.IsolationLevel.ReadCommitted)) to lock the requests so request A can complete before request B begins but can't verify either one fixes the issue as I can't mock the situation reliably.
1) Which is the right way to prevent the exception (the most recent request is the one that ultimately gets stored)?
2) Which is the right way to write a unit test to prove this is working?
Since you only want to store the latest item, you could use a "last update wins" approach and avoid the race condition on who can insert first; the database handles the locks, and the last update call (which is the most recent) is what ends up in the table.
Something like the following should handle any primary key errors if you run into concurrency issues in the edge case where a brand-new user has 2 requests at the same time, and it avoids an "infinite" loop of errors (well, until a stack overflow exception, anyway).
public static void Add(ApplicationType application,
string username,
string pageRequested,
int recursionCount = 0)
{
using (var db = new CommonDAL()) // EF context
{
var exists = db.ActiveUsers.Find(username);
if (exists != null)
{
// Last update wins: overwrite the existing row with the latest request's values
exists.ApplicationID = application.Value();
exists.PageRequested = pageRequested;
exists.TimeRequested = DateTime.Now;
}
else
{
var activeUser = new ActiveUser
{
ApplicationID = application.Value(),
Username = username,
PageRequested = pageRequested,
TimeRequested = DateTime.Now
};
db.ActiveUsers.Add(activeUser);
}
try
{
db.SaveChanges();
}
catch (DbUpdateException) // e.g. the primary key violation from a concurrent insert of the same username
{
    if (recursionCount < 3) // retry limit is arbitrary
    {
        Add(application, username, pageRequested, recursionCount + 1);
}
else
{
throw;
}
}
}
}
As for unit testing this, it will be very hard unless you insert an artificial delay or can force both threads to run at the same time. Sometimes the timing on these race conditions is in the millisecond range, depending on the issue. Tasks may not work because they are not guaranteed to run at the same time; you throw them onto the background thread pool and they run when they can. Old-school threads may work, but I don't know how to force it, since the time between the read and the remove & create is most likely in the 5 ms range or less.
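If you still want to try it, one option is to start both calls behind a Barrier so they hit SaveChanges as close together as possible. This is only a sketch (the ActiveUserTracker class name, the ApplicationType.SomeApp value and the MSTest attributes are assumptions), and it will not reproduce the race on every run:
// Requires: using System.Threading; using System.Threading.Tasks;
[TestMethod]
public async Task Add_TwoSimultaneousRequestsForNewUser_DoesNotThrow()
{
    var barrier = new Barrier(2); // both tasks wait here until the other is ready

    Task Call() => Task.Run(() =>
    {
        barrier.SignalAndWait();
        ActiveUserTracker.Add(ApplicationType.SomeApp, "newuser", "/home"); // assumed class/values
    });

    await Task.WhenAll(Call(), Call()); // the test fails if either call throws
}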
Assume I have an account_profile table, which has a Score field that is similar to an account's money (the database type is BIGINT(20) and the Entity Framework type is long, because I don't need decimal). Now I have the following function:
public long ChangeScoreAmount(int userID, long amount)
{
var profile = this.Entities.account_profile.First(q => q.AccountID == userID);
profile.Score += amount;
this.Entities.SaveChanges();
return profile.Score;
}
However, I am afraid that when ChangeScoreAmount is called multiple times concurrently, the final amount won't be correct.
Here are my current solutions I am thinking of:
Adding a lock with a static locking variable in the ChangeScoreAmount function, since the class itself may be instantiated multiple times when needed. It looks like this:
public long ChangeScoreAmount(int userID, long amount)
{
lock (ProfileBusiness.scoreLock)
{
var profile = this.Entities.account_profile.First(q => q.AccountID == userID);
profile.Score += amount;
this.Entities.SaveChanges();
return profile.Score;
}
}
The problem is, I have never tried a lock on a static variable, so I don't know if it is really safe and whether any deadlock could occur. Moreover, it may be bad if, somewhere outside this function, a change to the Score field is applied midway.
OK, this is no longer an option, because my server application will be run on multiple sites, which means the locking variable cannot be used.
Creating a stored procedure in the database and calling that stored procedure in the function. However, I don't know if there is an "atomic" way to create that stored procedure so that it can only be called once at a time, since I still need to retrieve the value, change it, and then update it again.
I am using MySQL Community 5.6.24 and MySQL .NET Connector 6.9.6 in case it matters.
NOTE: My server application may be run on multiple server machines.
You can use SQL transactions with the repeatable read isolation level instead of locking in the application. For example, you can write:
public long ChangeScoreAmount(int userID, long amount)
{
using(var ts = new TransactionScope(TransactionScopeOption.RequiresNew,
new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
{
var profile = this.Entities.account_profile.First(q => q.AccountID == userID);
profile.Score += amount;
this.Entities.SaveChanges();
ts.Complete();
return profile.Score;
}
}
The transaction guarantees that the account_profile record will not be changed in the DB until you commit or roll back.
As always, I'm quite the noob, as I'm sure you will see from both my code and question. For practice I'm currently writing a Xamarin.Android app for a game called Eve Online. People there mine resources from planets to make cash. These mines have to be reset at different intervals, and the real pros can have up to 30 characters doing it. Each character can have 5 planets, and usually there are at least 2 mines (extractors) on each, so there could be 300 timers going.
In my app you save your characters in an SQLite DB, and every hour an IntentService runs through the API and checks your timers and whether they're expired or not. This is how I do that:
public async Task PullPlanets(long KeyID, long CharacterID, string VCode, string CharName)
{
XmlReader lesern = XmlReader.Create("https://api.eveonline.com/char/PlanetaryColonies.xml.aspx?keyID=" + KeyID + "&vCode=" + VCode + "&characterID=" + CharacterID);
while (lesern.Read())
{
long planet = 0;
string planetName;
planet = Convert.ToInt64(lesern.GetAttribute("planetID"));
planetName = lesern.GetAttribute("planetName");
if ((planet != 0) && (planetName != null))
{
planets.Add(planet);
planetNames.Add(planetName);
await GetExpirationTimes(CharName, planet, planetName, KeyID, CharacterID, VCode);
}
}
lesern.Close ();
}
public async Task GetExpirationTimes(string CharName, long planetID, string planetName, long KeyID, long CharacterID, string VCode)
{
string planet = planetID.ToString();
XmlReader lesern = XmlReader.Create("https://api.eveonline.com/char/PlanetaryPins.xml.aspx?keyID=" + KeyID + "&vCode=" + VCode + "&characterID=" + CharacterID + "&planetID=" + planet);
while (lesern.Read())
{
string expTime;
expTime = lesern.GetAttribute("expiryTime");
if ((expTime != null) && (expTime != "0001-01-01 00:00:00"))
{
allInfo.Add (new AllInfo (CharName, planetName, Convert.ToDateTime (expTime)));
}
}
lesern.Close ();
SendOrderedBroadcast (stocksIntent, null);
}
}
After this, it sends the times back to my Activity, where they get added to an extractor. It seems to work pretty well, although I've only been able to test with 2 characters and a total of 14 extractors so far. An AlarmManager in the Activity calls the service every hour, and it sends a notification. When the user opens the Activity, it pulls the list from the service, sorts it, and displays it. I would welcome input on whether this is the way to do it.
I do see a problem on the horizon, though. The Eve API blocks if an app surpasses 30 API calls per second. I'm pretty sure someone with 30 characters would do that. So I'm wondering if I should add something to delay each call if a certain number is passed. This is how I make the first XML call:
var table = db.Table<CharsList> ();
foreach (var e in table) {
long KeyIDOut = Convert.ToInt64(e.KeyID);
long CharIDOut = Convert.ToInt64(e.CharacterID);
string VCodeOut = e.VCode.ToString();
string navnOut = e.Name.ToString();
PullPlanets(KeyIDOut, CharIDOut, VCodeOut, navnOut);
}
CheckTimes ();
}
Is it viable to add something like this?
if (table.Count > 10) {
foreach (var e in table) {
//start the first characters call
Thread.Sleep(100);
    }
}
The service is an IntentService and not on the UI thread. I guess this would bring the calls under 30 a second, but I have never used Thread.Sleep and fear what else could happen in my code. Are there other things that could help me not blow the limit? Can this code handle 300 extractors?
I believe you are generally right in your approach. I had to do a similar thing for a reddit client I was writing, except their limit is one request per second or so.
The only problem I see with your setup is that you assume that Thread.Sleep sleeps for exactly the amount of time you give it. Spurious wakeups are possible in some cases, so what I would suggest is that you give it a smaller value, save the last time you accessed the service, and then put a loop around the sleep call that terminates once enough time has passed.
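A rough sketch of that idea (the 34 ms interval, the field names, and where you put them are assumptions; the point is the saved timestamp and the loop around the sleep):
// Requires: using System.Threading;
static readonly TimeSpan MinInterval = TimeSpan.FromMilliseconds(34); // keeps you under ~30 calls/second
static DateTime lastCall = DateTime.MinValue;

static void WaitForNextSlot()
{
    // Loop around short sleeps instead of trusting one long Sleep to be exact.
    while (DateTime.UtcNow - lastCall < MinInterval)
    {
        Thread.Sleep(5);
    }
    lastCall = DateTime.UtcNow;
}

// Usage in the loop over characters/planets, before each API request:
// WaitForNextSlot();
// await PullPlanets(KeyIDOut, CharIDOut, VCodeOut, navnOut);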
Finally, if you are going to be firing up a lot of intent services for relatively short amounts of work, you might want to have a normal service with a thread to handle the work; that way it will only have to be created once, but it is still off the UI thread.