Threads, Events and databases - c#

Ok so I am not very familiar with databases so there may be a simple solution that I am not aware of.
I have a SQL database that is to be managed by a class in my C# application. What I want the class to do is constantly check the database to see if there is new data. If there is new data, I want it to trigger an event that another class will be listening to. Now I'm guessing that I need to implement a thread that will check the database every millisecond or so. However, what would I need to look for in order to fire my event? Can the database notify the class when there is a new entry?

If you are using MS SQL Server, you can use the SqlDependency class from the .NET Framework to get notifications about database changes.
Other database systems may have similar mechanisms in their driver packages.
If you cannot use that for whatever reason, you will need a Thread to poll the database periodically.
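For illustration, here is a minimal SqlDependency sketch (my own, not from the original answer). The connection string, table and column names (dbo.Messages, Id, Payload) are placeholders, and the query must follow the query-notification rules (explicit column list, two-part table name):
// using System.Data.SqlClient;
SqlDependency.Start(connectionString);   // once per application / connection string

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT Id, Payload FROM dbo.Messages", connection))
{
    var dependency = new SqlDependency(command);
    dependency.OnChange += (sender, e) =>
    {
        // e.Info describes the change; a SqlDependency fires only once,
        // so re-run the query and subscribe again here.
    };

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // read the current data
        }
    }
}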

1. If you want the database to inform your application about a change, you can use Service Broker (first enable Service Broker on your database, then write some code to "attach" to it). On the application side you will need the SqlDependency class.
Helpful links:
Enable Broker
Query Notifications in SQL Server
If you want to watch multiple queries, be aware that Service Broker is a little heavy.
2. If you want your application to do all the work, you have to create a function that checks the CHECKSUM for the selected table; each time you keep the last checksum, and if you find any difference you "hit" the database to get the new data.
You have to decide which side is going to do the work!
Hope it helps.
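To illustrate the polling idea in point 2, here is a rough sketch (my own, not from the answer) that compares a SQL Server aggregate checksum against the last value seen; dbo.MyTable is a placeholder for your own table:
// using System.Data.SqlClient;
private static long? lastChecksum;

static bool TableHasChanged(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.MyTable", connection))
    {
        connection.Open();
        object result = command.ExecuteScalar();
        long current = result == DBNull.Value ? 0 : Convert.ToInt64(result);

        bool changed = lastChecksum.HasValue && current != lastChecksum.Value;
        lastChecksum = current;
        return changed;
    }
}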

Other than using SqlDependency, you can use a Timer, or SqlCacheDependency if you are using ASP.NET or MVC with the Cache object. 1 ms intervals are not recommended though, as you probably won't complete your check before the next one starts, and your database load will be very high as a result. You could also make use of the Timer.AutoReset property so you don't have calls tripping over each other.
Edit 2: This MSDN example shows how you can use SqlDependency, including having to enable Query Notifications (MSDN). There are many considerations for using SqlDependency; for example, it was really designed for web servers where a limited number of watchers would be created, not so much for desktop applications, so keep that in mind. There is a good article on BOL on this called Planning for Notifications which emphasises that query notifications are useful
if the data in the query changes relatively infrequently, if the application does not require an instantaneous update when the data changes, and if the query meets the requirements and restrictions outlined in Creating a Query for Notification
In your question you suggest the need for 1 ms latency, so maybe the Dependency classes are not the best way for you (also see my later comment on your latency requirement).
EDIT: For example (using the timer):
using System;
using System.Timers;

class Program
{
    static void Main(string[] args)
    {
        Timer timer = new Timer(1);
        timer.Elapsed += timer_Elapsed;
        timer.AutoReset = false;   // re-armed manually after each check completes
        timer.Enabled = true;

        Console.ReadLine();        // keep the process alive while the timer runs
    }

    static void timer_Elapsed(object sender, ElapsedEventArgs e)
    {
        Timer timer = (Timer)sender;
        try
        {
            // do the checks here
        }
        finally
        {
            // re-enable the timer to check again very soon
            timer.Enabled = true;
        }
    }
}
As for what to check, it depends on what changes you are actually looking to detect. Here are some ideas:
table row count (but dangerous if a row is added and deleted since the last check)
max value of the table id column (only works if you have a numeric identity field that is increasing, and only works to check for new rows; see the sketch below)
check individual columns for changes in specific rows you want to watch
use a row CHECKSUM in a column to check for changes on individual rows
ask writers to update a separate table with a change reference id that you can check
use audit tables to record changes, and check for new audit records
You need to better define the scope of your change monitoring before you can get a good answer to this.
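As an illustration of the "max value of the id column" idea from the list above, here is a rough sketch; the table and column names (dbo.Orders, Id) are placeholders, and it only detects newly inserted rows:
// using System.Data.SqlClient;
private static long lastSeenId;

static bool NewRowsExist(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT ISNULL(MAX(Id), 0) FROM dbo.Orders", connection))
    {
        connection.Open();
        long maxId = Convert.ToInt64(command.ExecuteScalar());

        bool changed = maxId > lastSeenId;
        lastSeenId = maxId;
        return changed;
    }
}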
Latency
Also ask yourself if you really need 1ms latency on change updates. If you do, a different approach entirely might be better. For example you may need to use a notification mechanism by the data writers to the parts of your application that need to know an update has occurred right now.


Handling Tens of Thousands of Events C# Elegantly from TableDependency

I have a project where a part of it monitors changes made to an SQL database. I am using the SQL Table Dependency NuGet Package to monitor changes so I can update the UI when changes are made.
The issue I have is that there is a function in my program that can add 50-99k rows to a table in the database. The event gets triggered as many times as there are rows added. This is problematic because I do not want to update the UI 99k times; I want to update it at most once or twice. How I am handling it right now is that when I detect 5 events triggered within a certain timespan, I DeInit the TableDependency, then a delayed task re-enables it after a few seconds and also triggers a UI update at the end so it won't miss anything while it was disabled temporarily.
I also tried using a static bool for rate limiting, instead of DeIniting and ReIniting the TableDependency, but it sometimes takes 30-90 s because the event handler cannot reasonably keep up with them all. I think the DeInit works better because removing the callbacks from the event handler appears to clear the pending events. I could not find another way to clear the massive queue of events from the event handler.
I tried delving into Reactive Extensions and using the Throttle function. This worked OK except for the fact that the first event received would not trigger an update; it would wait until the events died off to trigger (I realize this is by design). This makes the program feel unresponsive for a while, because by the time the events are triggered SQL has already added all the rows, so all it really needs is the first event and the last event to update at most.
The reason I am trying to find an alternative is that TableDependency sometimes (no idea how to replicate this yet) orphans trigger scripts in SQL on the table with invalid ids, and it causes fatal exceptions when the DB instance (I am using EF6 Core) runs SaveChanges(). I theorize that running the DeInit and Init functions frequently is at best not helping the issue and at worst the direct cause of it. So I am trying to find some way to avoid frequently DeIniting and ReIniting the TableDependency, but still have my UI updates feel responsive without bad performance.
DeInit function:
private static void DeInitDependency(TableType tableType)
{
    if (tableType == TableType.Event)
    {
        eventTableDependency.Stop();
        eventTableDependency.Dispose();
        eventTableDependency.OnChanged -= SqlDependencyEventTable_OnChanged;
        eventTableDependency.OnError -= DepEvent_OnError;
        eventTableDependency.OnStatusChanged -= DepEvent_OnStatusChanged;
        eventChangeTrackingStarted = false;
    }
    else if (tableType == TableType.Location)
    {
        locationTableDependency.Stop();
        locationTableDependency.Dispose();
        locationTableDependency.OnChanged -= SqlDependencyLocationTable_OnChanged;
        locationTableDependency.OnError -= DepLocation_OnError;
        locationTableDependency.OnStatusChanged -= DepLocation_OnStatusChanged;
        locationChangeTrackingStarted = false;
    }
}
Init/Reinit Function:
public static void InitDependency(TableType tableType)
{
    try
    {
        // Set connection string to SQL
        string dbConnectionString = "";
        dbConnectionString = sqlCore.generateConnectionString();

        if (tableType == TableType.Event)
        {
            // Create dependency and connect
            eventTableDependency = new SqlTableDependency<NextGenGui.Models.Event>(dbConnectionString, executeUserPermissionCheck: false);
            eventTableDependency.OnChanged += SqlDependencyEventTable_OnChanged;
            eventTableDependency.OnError += DepEvent_OnError;
            eventTableDependency.OnStatusChanged += DepEvent_OnStatusChanged;
            eventTableDependency.Start();
            eventChangeTrackingStarted = true;
            Debug.WriteLine("Event SQL TRACKING STARTED!");
        }
        else if (tableType == TableType.Location)
        {
            locationTableDependency = new SqlTableDependency<Models.Location>(dbConnectionString, executeUserPermissionCheck: false);
            locationTableDependency.OnChanged += SqlDependencyLocationTable_OnChanged;
            locationTableDependency.OnError += DepLocation_OnError;
            locationTableDependency.OnStatusChanged += DepLocation_OnStatusChanged;
            locationTableDependency.Start();
            locationChangeTrackingStarted = true;
            Debug.WriteLine("Location SQL TRACKING STARTED!");
        }
    }
    catch (Exception ex)
    {
        Debug.WriteLine(ex);
        if (ex.Message.Contains("Service broker"))
        {
            InitSQLBrokerSetting();
        }
    }
}
It sounds like you need one event per business-level operation, instead of one event per table update. If that's the case, then you're going to have to look at implementing your own solution. Topics like SignalR and ServiceBus are good starting points to look into. This stream of business operations is a useful thing to implement anyway, for auditing and caching.
It's worth pointing out that you don't have to completely replace the SQL Table Dependency in one go. You can start with just the tables that are causing problems from the bulk operations.
You effectively need to debounce the event signaling. Rather than removing the event handlers (which means you won't know what rows have changed during that period), could your handler be changed to simply set markers in memory based on what the current UI cares about? For instance, if the UI is displaying one or more key records that you care about, the handler knows which IDs are relevant, and if any of those rows are touched the markers are set; a periodic check then looks at the markers and refreshes the view. This might be a single row the user is viewing, or a set of row IDs based on something like search results, etc.
If instead the UI reflects a summary of all data state and any row change would impact it, then perhaps consider an in-memory representation that can be updated by the event handler and periodically checked, refreshing the view if necessary.
Ultimately it depends on what the view is currently displaying and its relationship to the data being monitored. Libraries like SignalR are typically employed for server-to-client signaling, where actions invoked by one client can be relayed to other clients so they can update data or refresh their view. When the change comes from the database, you would probably want to implement some manner of filtering and processing to detect when relevant changes have come in and raise a signal for a stale-view check to pick up on and refresh.
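As a rough sketch of the marker/dirty-flag idea (my own illustration, not tied to any particular UI framework): the change handler only flips a flag, and a periodic UI timer does the actual refresh. RefreshView and the timer wiring are placeholders for your own code:
private static int eventTableDirty;   // 0 = clean, 1 = dirty

// Call this from your existing OnChanged handler instead of touching the UI:
private static void MarkEventTableDirty()
{
    System.Threading.Interlocked.Exchange(ref eventTableDirty, 1);
}

// Called by a UI timer (e.g. a DispatcherTimer) every second or two:
private static void OnRefreshTimerTick(object sender, EventArgs e)
{
    if (System.Threading.Interlocked.Exchange(ref eventTableDirty, 0) == 1)
    {
        RefreshView();   // query the database once and update the UI
    }
}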

Is there a way to lock a concurrent dictionary from being used

I have this static class
static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process
Api starts and initializes an empty dictionary
A background job starts and runs once every day to reload the dictionary from the database
Requests come in to read from the dictionary or update a specific city in the dictionary
My problem
If a request comes in to update the city
I update the database
If the update was successful, update the city object in the dictionary
At the same time, the background job started and queried all cities before I updated the specific city
The request finishes and the dictionary city now has the old values because the background job finished last
My solution I thought about first
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way when the background job starts, it can lock/reserve the dictionary only for itself and when it's done it will release it for other requests to be used.
Then a request might have been waiting for the dictionary to be released and update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the Api wants to access the cache but it's not loaded?
When the Api starts I block requests to this particular "Location" project until the background job marks IsReady to true. The cache I implemented is thread safe until I add the background job.
How much time does it take to reload the cache?
I would say less than 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background job problem by keeping track of date times, similar to an "object version" approach. I won't be taking this path, as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me so that I can update the DB and cache at the same time. So I might remove the background job after all or just run it once a week. Thank you for all the answers. I am OK with a possible data inconsistency with the way I am updating the objects, because if one route updates two specific values and another route updates two different specific values, the possibility of a problem is very small.
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration
An update will only happen to objects that the user owns and the rate at which the user might update those objects is most likely once every minute. So that reduces the possibility of a problem by a lot for this specific example.
Most of my cache objects are related only to a specific user so it relates with bullet point 1.
The application owns the data, I don't. So I should never manually update the database unless it's critical.
Memory might be a problem, but 1,000,000 normal-ish objects take somewhere between 80 MB and 150 MB. I can have a lot of objects in memory to gain performance and reduce the load on the database.
Having a lot of objects in memory will put pressure on garbage collection, and that is not good, but I don't think it's bad at all for me because garbage collection only runs when memory gets low, and all I have to do is plan ahead to make sure there is enough memory. Yes, it will run because of day-to-day operations, but it won't have a big impact.
All of these considerations just so that I can have an in memory cache right at my finger tips.
I would suggest adding an UpdatedAt/CreatedAt property to your LocationCityContract or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check if the item you're about to add/update with is newer than the existing object, like so:
public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }

    // In case of system clock synchronization, consider making CreatedAt
    // a long and using Environment.TickCount64. See comment from #Theodor
    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}

// Use it like...
static class LocationMemoryCache
{
    public static readonly
        ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}

// From some request...
var newItem = new CacheItem<LocationCityContract>(newLocation);
// or the background job...
var newItem = new CacheItem<LocationCityContract>(newLocation, updateStart);

LocationMemoryCache.LocationCities
    .AddOrUpdate(
        newLocation.Id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);
When a request wants to update the cache entry they do as above with the timestamp of whenever they finished adding the item to the database (see notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from DB as the first thing in the background job, but instead you read them one at a time and update the cache accordingly. In that case updateStart should instead be set right before reading each value (we could call it itemReadStart instead).
Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
Note:
Since this approach is not synchronizing (locking) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests want to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after updating each, it might not truly reflect which one was updated last. Since you're OK with a 24-hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you, as the background job will fix it when run.
As #Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means, don't use LocationMemoryCache.LocationCities[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache.LocationCities[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem<LocationCityContract>(newLoc);
LocationMemoryCache.LocationCities
    .AddOrUpdate(...); /* <- like above */
By not locking the whole dictionary you avoid having requests being blocked by each other because they're trying to update the cache at the same time. If the first point is not acceptable you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically. This avoids blocking requests that are trying to update other locations so you minimize the risk of requests affecting each other.
No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will occasionally result in some threads being blocked unnecessarily, but it is also very simple and it is easy to reason about its correctness. Essentially all access to the dictionary and the database will be serialized:
Every time a thread wants to read an object stored in the dictionary, it will first have to take the lock, and keep the lock until it's done reading the object.
Every time a thread wants to update the database and then the corresponding object, it will first have to take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
Every time the background job wants to replace the current dictionary with a new dictionary, it will first have to take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.
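A minimal sketch of that approach (illustrative only; it assumes LocationCityContract has an Id property and hides the dictionary behind a few helper methods):
static class LocationMemoryCache
{
    private static readonly object _gate = new();
    private static Dictionary<int, LocationCityContract> _cities = new();

    public static bool TryGetCity(int id, out LocationCityContract city)
    {
        lock (_gate) { return _cities.TryGetValue(id, out city); }
    }

    public static void UpdateCity(LocationCityContract city, Action updateDatabase)
    {
        lock (_gate)
        {
            updateDatabase();          // write to the DB while holding the lock
            _cities[city.Id] = city;   // then update the cached copy
        }
    }

    public static void ReloadAll(Func<Dictionary<int, LocationCityContract>> loadFromDatabase)
    {
        lock (_gate)
        {
            _cities = loadFromDatabase();   // background job: query and swap under the lock
        }
    }
}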

new objects added during long loop

We currently have a production application that runs as a Windows service. Many times this application will end up in a loop that can take several hours to complete. We are using Entity Framework for .NET 4.0 for our data access.
I'm looking for confirmation that if we load new data into the system, after this loop is initialized, it will not result in items being added to the loop itself. When the loop is initialized we are looking for data "as of" that moment. Although I'm relatively certain that this will work exactly like using ADO and doing a loop on the data (the loop only cycles through data that was present at the time of initialization), I am looking for confirmation for co-workers.
Thanks in advance for your help.
//update: here's some sample code in C# - the question is the same: will the enumeration change if new items are added to the table that EF is querying?
IEnumerable<myobject> myobjects = (from o in db.theobjects where o.id == myID select o);
foreach (myobject obj in myobjects)
{
    // perform action on obj here
}
It depends on your precise implementation.
Once a query has been executed against the database then the results of the query will not change (assuming you aren't using lazy loading). To ensure this you can dispose of the context after retrieving query results--this effectively "cuts the cord" between the retrieved data and that database.
Lazy loading can result in a mix of "initial" and "new" data; however once the data has been retrieved it will become a fixed snapshot and not susceptible to updates.
You mention this is a long running process; which implies that there may be a very large amount of data involved. If you aren't able to fully retrieve all data to be processed (due to memory limitations, or other bottlenecks) then you likely can't ensure that you are working against the original data. The results are not fixed until a query is executed, and any updates prior to query execution will appear in results.
I think your best bet is to change the logic of your application such that, when the loop logic is determining whether it should do another iteration or exit, you take the opportunity to load the newly added items into the list. See the pseudo-code below:
var repo = new Repository();
while (repo.HasMoreItemsToProcess())
{
    var entity = repo.GetNextItem();
    // process the entity here
}
Let me know if this makes sense.
The easiest way to assure that this happens - if the data itself isn't too big - is to convert the data you retrieve from the database to a List<>, e.g., something like this (pulled at random from my current project):
var sessionIds = room.Sessions.Select(s => s.SessionId).ToList();
And then iterate through the list, not through the IEnumerable<> that would otherwise be returned. Converting it to a list triggers the enumeration, and then throws all the results into memory.
If there's too much data to fit into memory, and you need to stick with an IEnumerable<>, then the answer to your question depends on various database and connection settings.
I'd take a snapshot of IDs to be processed -- quickly and as a transaction -- then work that list in the fashion you're doing today.
In addition to accomplishing the goal of not changing the sample mid-stream, this also gives you the ability to extend your solution to track status on each item as it's processed. For a long-running process, this can be very helpful for progress reporting, restart/retry capabilities, etc.
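A rough sketch of that snapshot idea, reusing the context and entity set names from the question (db.theobjects, a numeric id key); the filtering, processing and progress tracking are left as placeholders:
List<int> idsToProcess = db.theobjects
    .Select(o => o.id)
    .ToList();                           // the query runs here; the work list is now fixed

foreach (int id in idsToProcess)
{
    var obj = db.theobjects.Find(id);    // re-read the current row when processing it
    // perform action on obj here, and optionally record progress per id
}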

Inserting data in background/async task what is the best way?

I have a very quick/lightweight MVC action that is requested very often, and I need to maintain minimal response time under heavy load.
What I need to do is, from time to time and depending on conditions, insert a small amount of data into SQL Server (log a unique id for statistics, for ~1-5% of requests).
I don't need the inserted data for the response, and if I lose some of it because of an application restart or something, I'll survive.
I imagine that I could somehow queue the inserts and do them in the background, maybe even with some kind of buffering - like waiting until the queue collects 100 inserts and then doing them in one pass.
I'm pretty sure that somebody must have done/seen such an implementation before; there's no need to reinvent the wheel, so if somebody could point me in the right direction, I would be thankful.
You could trigger a background task from your controller action that will do the insertion (fire and forget):
public ActionResult Insert(SomeViewModel model)
{
    Task.Factory.StartNew(() =>
    {
        // do the inserts
    });
    return View();
}
Be aware though that IIS could recycle the application at any time which would kill any running tasks.
Create a class that will store the data that needs to be pushed to the server, and a queue to hold the objects:
Queue<LogData> loggingQueue = new Queue<LogData>();

public class LogData
{
    public int DataToLog { get; set; }   // whatever fields you need to log
}
Then create a timer or some other mechanism within the app that will be triggered every now and then to post the queued data to the database.
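As a rough sketch of that idea (my own illustration, using a thread-safe ConcurrentQueue instead of a plain Queue and a background timer; InsertBatch is a placeholder for your own SQL insert code):
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class LogBuffer
{
    private static readonly ConcurrentQueue<LogData> _queue = new ConcurrentQueue<LogData>();
    private static readonly System.Threading.Timer _flushTimer =
        new System.Threading.Timer(_ => Flush(), null, 5000, 5000);   // flush every 5 seconds

    public static void Enqueue(LogData item) => _queue.Enqueue(item);

    private static void Flush()
    {
        var batch = new List<LogData>();
        while (batch.Count < 100 && _queue.TryDequeue(out var item))
        {
            batch.Add(item);
        }
        if (batch.Count > 0)
        {
            InsertBatch(batch);   // one round trip for up to 100 rows
        }
    }
}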
I agree with @Darin Dimitrov's approach, although I would add that you could simply use this task to write to MSMQ on the machine. From there you could write a service that reads the queue and inserts the data into the database. That way you could throttle the service that reads data, or even move the queue onto a different machine.
If you wanted to take this one step further you could use something like nServiceBus and a pub/sub model to write the events into the database.

How do I speed up DbSet.Add()?

I have to import about 30k rows from a CSV file into my SQL database; this sadly takes 20 minutes.
Troubleshooting with a profiler shows me that DbSet.Add is taking the most time, but why?
I have these Entity Framework Code-First classes:
public class Article
{
    // About 20 properties, each property doesn't store excessive amounts of data
}

public class Database : DbContext
{
    public DbSet<Article> Articles { get; set; }
}
For each item in my for loop I do:
db.Articles.Add(article);
Outside the for loop I do:
db.SaveChanges();
It's connected to my local SQL Express server, but I guess nothing is written until SaveChanges is called, so I guess the server won't be the problem....
As per Kevin Ramen's comment (Mar 29)
I can confirm that setting db.Configuration.AutoDetectChangesEnabled = false makes a huge difference in speed
Running Add() on 2,324 items took 3 min 15 sec on my machine by default; disabling the auto-detection resulted in the operation completing in 0.5 sec.
http://blog.larud.net/archive/2011/07/12/bulk-load-items-to-a-ef-4-1-code-first-aspx
I'm going to add to Kervin Ramen's comment by saying that if you are only doing inserts (no updates or deletes) then you can, in general, safely set the following properties before doing any inserts on the context:
DbContext.Configuration.AutoDetectChangesEnabled = false;
DbContext.Configuration.ValidateOnSaveEnabled = false;
I was having a problem with a once-off bulk import at my work. Without setting the above properties, adding about 7500 complicated objects to the context was taking over 30 minutes. Setting the above properties (so disabling EF checks and change tracking) reduced the import down to seconds.
But, again, I stress: only use this if you are doing inserts. If you need to mix inserts with updates/deletes, you can split your code into two paths and disable the EF checks for the insert part and then re-enable the checks for the update/delete path. I have used this approach successfully to get around the slow DbSet.Add() behaviour.
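For illustration, a rough sketch of that insert-only path (assuming an EF 6-style DbContext named db and a collection articlesToImport; names are placeholders for your own code):
db.Configuration.AutoDetectChangesEnabled = false;
db.Configuration.ValidateOnSaveEnabled = false;
try
{
    foreach (var article in articlesToImport)
    {
        db.Articles.Add(article);
    }
    db.SaveChanges();
}
finally
{
    // Re-enable the checks before any code path that mixes updates/deletes.
    db.Configuration.AutoDetectChangesEnabled = true;
    db.Configuration.ValidateOnSaveEnabled = true;
}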
Each item in a unit-of-work has overhead, as it must check (and update) the identity manager, add to various collections, etc.
The first thing I would try is batching into, say, groups of 500 (change that number to suit), starting with a fresh (new) object-context each time - as otherwise you can reasonably expect telescoping performance. Breaking it into batches also prevents a megalithic transaction bringing everything to a stop.
Beyond that; SqlBulkCopy. It is designed for large imports with minimal overhead. It isn't EF though.
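A minimal SqlBulkCopy sketch (illustrative only; the destination table, columns and the Article properties are placeholders for your own schema):
// using System.Data;
// using System.Data.SqlClient;
using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.Articles";
    bulkCopy.BatchSize = 500;

    var table = new DataTable();
    table.Columns.Add("Title", typeof(string));
    table.Columns.Add("Price", typeof(decimal));
    foreach (var article in articles)
    {
        table.Rows.Add(article.Title, article.Price);
    }

    bulkCopy.WriteToServer(table);
}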
There is an extremely easy to use and very fast extension here:
https://efbulkinsert.codeplex.com/
It's called "Entity Framework Bulk Insert".
The extension itself is in the namespace EntityFramework.BulkInsert.Extensions. So to make the extension method available, add the using:
using EntityFramework.BulkInsert.Extensions;
And then you can do this
context.BulkInsert(entities);
BTW - if you do not wish to use this extension for some reason, you could also try, instead of running db.Articles.Add(article) for each article, building a list of several articles each time and then using AddRange (new in EF version 6, along with RemoveRange) to add them to the DbContext together.
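For example, the AddRange variant could look roughly like this (assuming articles is the list parsed from the CSV file):
db.Articles.AddRange(articles);   // one call instead of one Add() per row
db.SaveChanges();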
I haven't really tried this, but my approach would be to use an ODBC driver to load the file into a DataTable and then use a SQL stored procedure, passing the table to the procedure as a parameter.
For the first part, try:
http://www.c-sharpcorner.com/UploadFile/mahesh/AccessTextDb12052005071306AM/AccessTextDb.aspx
For the second part try this for SQL procedure:
http://www.builderau.com.au/program/sqlserver/soa/Passing-table-valued-parameters-in-SQL-Server-2008/0,339028455,339282577,00.htm
Then create a SqlCommand object in C# and add to its Parameters collection a SqlParameter of type SqlDbType.Structured.
Well, I hope it helps.
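A rough sketch of that table-valued-parameter call on the C# side (it assumes a user-defined table type dbo.ArticleTableType and a stored procedure dbo.ImportArticles already exist on the server; all names are placeholders):
// using System.Data;
// using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("Title", typeof(string));
table.Columns.Add("Price", typeof(decimal));
// ... fill the DataTable from the CSV file ...

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportArticles", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@articles", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.ArticleTableType";

    connection.Open();
    command.ExecuteNonQuery();
}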
